sat-ebook-20150618.pdf

Chapter 3

Successful Backtesting

Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics, statistics, software development and market/exchange microstructure. I couldn’t hope to cover all of those topics in one chapter, so I’m going to split them into two or three smaller pieces. What will we discuss in this section? I’ll begin by defining backtesting and then describe the basics of how it is carried out. Then I will elaborate on the biases we touched upon in previous chapters.

In subsequent chapters we will look at the details of strategy implementations that are often barely mentioned or ignored elsewhere. We will also consider how to make the backtesting process more realistic by including the idiosyncrasies of a trading exchange. Then we will discuss transaction costs and how to correctly model them in a backtest setting. We will end with a discussion on the performance of our backtests and finally provide detailed examples of common quant strategies.

Let’s begin by discussing what backtesting is and why we should carry it out in our algorithmic trading.

Why Backtest Strategies?
Algorithmic trading stands apart from other investment approaches because, as a consequence of abundant data availability, we can more reliably form expectations about future performance from past performance. The process by which this is carried out is known as backtesting.
In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a stream of historical financial data, which leads to a set of trading signals. Each trade (which we will mean here to be a "round-trip" of two signals) will have an associated profit or loss. The accumulation of this profit/loss over the duration of your strategy backtest will lead to the total profit and loss (also known as the "P & L" or "PnL"). That is the essence of the idea, although of course the "devil is always in the details"!
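To make the idea concrete, here is a minimal sketch of how round-trip trades accumulate into a PnL curve. The `RoundTrip` class, its field names and the three example trades are my own illustrative assumptions, and all transaction costs are ignored at this stage.

```python
from dataclasses import dataclass


@dataclass
class RoundTrip:
    """One completed trade: an entry signal followed by an exit signal."""
    entry_price: float
    exit_price: float
    quantity: int    # number of units traded
    direction: int   # +1 for a long trade, -1 for a short trade

    def pnl(self) -> float:
        # Profit or loss of this single round trip, ignoring all costs.
        return self.direction * (self.exit_price - self.entry_price) * self.quantity


def cumulative_pnl(trades):
    """Running total of profit and loss, one entry per completed trade."""
    total = 0.0
    curve = []
    for trade in trades:
        total += trade.pnl()
        curve.append(total)
    return curve


# Hypothetical trades, purely for illustration.
trades = [
    RoundTrip(entry_price=100.0, exit_price=105.0, quantity=10, direction=+1),
    RoundTrip(entry_price=105.0, exit_price=107.0, quantity=10, direction=-1),
    RoundTrip(entry_price=107.0, exit_price=110.0, quantity=5, direction=+1),
]
print(cumulative_pnl(trades))  # [50.0, 30.0, 45.0]
```

The final element of the curve is the total PnL of the backtest. The losing short trade in the middle shows up as a dip in the curve.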
What are key reasons for backtesting an algorithmic strategy?
- Filtration - If you recall from the previous chapter on Strategy Identification, our goal at the initial research stage was to set up a strategy pipeline and then filter out any strategy that did not meet certain criteria. Backtesting provides us with another filtration mechanism, as we can eliminate strategies that do not meet our performance needs.
- Modelling - Backtesting allows us to (safely!) test new models of certain market phenomena, such as transaction costs, order routing, latency, liquidity or other market microstructure issues.
- Optimisation - Although strategy optimisation is fraught with biases, backtesting allows us to increase the performance of a strategy by modifying the quantity or values of the parameters associated with that strategy and recalculating its performance.
  
- Verification - Our strategies are often sourced externally, via our strategy pipeline. Backtesting a strategy helps to confirm that it has not been incorrectly implemented. Although we will rarely have access to the signals generated by external strategies, we will often have access to performance metrics such as the Sharpe Ratio and Drawdown characteristics. Thus we can compare them with those of our own implementation.
    
Backtesting provides a host of advantages for algorithmic trading. However, it is not always possible to straightforwardly backtest a strategy. In general, as the frequency of the strategy increases, it becomes harder to correctly model the microstructure effects of the market and exchanges. This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is a particular problem where the execution system is the key to the strategy performance, as with ultra-high frequency algorithms.
Unfortunately, backtesting is fraught with biases of all types and we will now discuss them in depth.

Backtesting Biases

There are many biases that can affect the performance of a backtested strategy. Unfortunately, these biases have a tendency to inflate the performance rather than detract from it. Thus you should always consider a backtest to be an idealised upper bound on the actual performance of the strategy. It is almost impossible to eliminate biases from algorithmic trading so it is our job to minimise them as best we can in order to make informed decisions about our algorithmic strategies.

There are four major biases that I wish to discuss: Optimisation Bias, Look-Ahead Bias, Survivorship Bias and Cognitive Bias.

Optimisation Bias
This is probably the most insidious of all backtest biases. It involves adjusting or introducing additional trading parameters until the strategy performance on the backtest data set is very attractive. However, once live the performance of the strategy can be markedly different. Other names for this bias are "curve fitting" and "data-snooping bias".
Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters. "Parameters" in this instance might be the entry/exit criteria, look-back periods, averaging periods (i.e. the moving average smoothing parameter) or volatility measurement frequency. Optimisation bias can be minimised by keeping the number of parameters to a minimum and increasing the quantity of data points in the training set. However, one must also be careful of the latter, as older training points can be subject to a prior regime (such as a regulatory environment) and thus may not be relevant to your current strategy.
One method to help mitigate this bias is to perform a sensitivity analysis. This means varying the parameters incrementally and plotting a "surface" of performance. Sound, fundamental reasoning for parameter choices should, with all other factors considered, lead to a smoother parameter surface. If you have a very jumpy performance surface, it often means that a parameter is not reflecting a real phenomenon and is instead an artefact of the test data. There is a vast literature on multi-dimensional optimisation algorithms and it is a highly active area of research. I won’t dwell on it here, but keep it in the back of your mind when you find a strategy with a fantastic backtest!
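As a rough sketch of what a sensitivity analysis might look like in code, the following evaluates a simple long-only moving-average crossover over a grid of short and long look-back windows. The strategy, the function names, the window ranges and the synthetic random-walk prices are all assumptions chosen for illustration, not a recommended strategy.

```python
import random


def moving_average(prices, window, t):
    """Mean of the `window` prices ending at index t (inclusive)."""
    return sum(prices[t - window + 1 : t + 1]) / window


def crossover_return(prices, short_window, long_window):
    """Total return of a long-only moving-average crossover.

    The position held from t to t+1 is decided using prices up to and
    including t only, so no future data enters the decision.
    """
    equity = 1.0
    for t in range(long_window - 1, len(prices) - 1):
        short_ma = moving_average(prices, short_window, t)
        long_ma = moving_average(prices, long_window, t)
        if short_ma > long_ma:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def performance_surface(prices, short_windows, long_windows):
    """Map each (short, long) pair with short < long to its total return."""
    return {
        (s, lg): crossover_return(prices, s, lg)
        for s in short_windows
        for lg in long_windows
        if s < lg
    }


# Synthetic random-walk prices, purely for illustration.
random.seed(42)
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0003, 0.01)))

surface = performance_surface(prices, range(5, 30, 5), range(40, 120, 20))
for (s, lg), ret in sorted(surface.items()):
    print(f"short={s:3d} long={lg:3d} return={ret:+.2%}")
```

Reading down the printed grid, neighbouring parameter pairs should give broadly similar returns. A single pair that stands far above its neighbours is a warning sign of curve fitting rather than evidence of a good parameter choice.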
Look-Ahead Bias
Look-ahead bias is introduced into a backtesting system when future data is accidentally included at a point in the simulation where that data would not have actually been available. If we are running the backtest chronologically and we reach time point N, then look-ahead bias occurs if data is included for any point N + k, where k > 0. Look-ahead bias errors can be incredibly subtle. Here are three examples of how look-ahead bias can be introduced:
- Technical Bugs - Arrays/vectors in code often have iterators or index variables. Incorrect offsets of these indices can lead to a look-ahead bias by incorporating data at N + k for k > 0.
- Parameter Calculation - Another common example of look-ahead bias occurs when calculating optimal strategy parameters, such as with linear regressions between two time series. If the whole data set (including future data) is used to calculate the regression coefficients, and these are then retroactively applied to a trading strategy for optimisation purposes, then future data is being incorporated and a look-ahead bias exists.
- Maxima/Minima - Certain trading strategies make use of extreme values in any time period, such as incorporating the high or low prices in OHLC data. However, since these maximal/minimal values can only be calculated at the end of a time period, a look-ahead bias is introduced if these values are used during the current period. It is always necessary to lag high/low values by at least one period in any trading strategy making use of them.
  
As with optimisation bias, one must be extremely careful to avoid introducing look-ahead bias. It is often the main reason why trading strategies underperform their backtests significantly in "live trading".
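The lagging of high/low values described above can be sketched as follows. The helper names `lag` and `breakout_signals`, the breakout rule itself and the sample bars are hypothetical and only serve to show the one-period shift.

```python
def lag(values, periods=1):
    """Shift a series so that index t holds the value from t - periods.

    The first `periods` entries have no history and are set to None.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1 to avoid look-ahead")
    return ([None] * periods + list(values))[: len(values)]


def breakout_signals(highs, closes):
    """True at bar t when the close at t exceeds the high of bar t - 1.

    Only the previous bar's high is used, because the current bar's high
    is not known until that bar has finished.
    """
    prior_highs = lag(highs, 1)
    return [ph is not None and c > ph for ph, c in zip(prior_highs, closes)]


# Hypothetical bars, purely for illustration.
highs = [10.0, 11.0, 12.0, 11.5]
closes = [9.5, 10.5, 11.8, 11.9]
print(breakout_signals(highs, closes))  # [False, True, True, False]
```

Note that the signal at bar t is itself only known once bar t has closed, so in a backtest the resulting trade should be executed no earlier than the following bar.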
Survivorship Bias
Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflated performance for certain strategy types. It occurs when strategies are tested on datasets that do not include the full universe of prior assets that may have been chosen at a particular point in time, but only consider those that have "survived" to the current time.
As an example, consider testing a strategy on a random selection of equities before and after the 2001 market crash. Some technology stocks went bankrupt, while others managed to stay afloat and even prospered. If we had restricted this strategy only to stocks which made it through the market drawdown period, we would be introducing a survivorship bias because they have already demonstrated their success to us. In fact, this is just another specific case of look-ahead bias, as future information is being incorporated into past analysis.
There are two main ways to mitigate survivorship bias in your strategy backtests:
- Survivorship Bias Free Datasets - In the case of equity data it is possible to purchase datasets that include delisted entities, although they are not cheap and only tend to be utilised by institutional firms. In particular, Yahoo Finance data is NOT survivorship bias free, and this is commonly used by many retail algo traders. One can also trade on asset classes that are not prone to survivorship bias, such as certain commodities (and their futures derivatives).
- Use More Recent Data - In the case of equities, utilising a more recent data set mitigates the possibility that the stock selection chosen is weighted to "survivors", simply as there is less likelihood of overall stock delisting in shorter time periods. One can also start building a personal survivorship-bias free dataset by collecting data from the current point onward. After 3-4 years, you will have a solid survivorship-bias free set of equities data with which to backtest further strategies.
We will now consider certain psychological phenomena that can influence your trading performance.
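A survivorship-bias free dataset lets the backtest ask "which assets could I have chosen on this date?" rather than "which assets exist today?". The sketch below illustrates the difference. The tickers, dates and the `LISTINGS` record layout are invented for the example and do not correspond to any real data vendor format.

```python
from datetime import date

# Hypothetical listing records: (ticker, first trading day, delisting day or None).
LISTINGS = [
    ("AAA", date(1998, 1, 5), None),
    ("BBB", date(1999, 6, 1), date(2001, 9, 28)),
    ("CCC", date(2000, 3, 15), date(2002, 2, 11)),
    ("DDD", date(2003, 7, 1), None),
]


def universe_on(as_of, listings=LISTINGS):
    """Tickers that were actually tradable on the date `as_of`."""
    return sorted(
        ticker
        for ticker, listed, delisted in listings
        if listed <= as_of and (delisted is None or as_of < delisted)
    )


def survivors(listings=LISTINGS):
    """Tickers still listed today: the biased universe."""
    return sorted(ticker for ticker, _, delisted in listings if delisted is None)


print(universe_on(date(2000, 6, 1)))  # ['AAA', 'BBB', 'CCC']
print(survivors())                    # ['AAA', 'DDD']
```

A backtest run in mid-2000 on the survivors list would silently exclude the two stocks that later delisted and would include one that had not yet been listed, which is exactly the future information we want to keep out.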
Cognitive Bias

This particular phenomenon is not often discussed in the context of quantitative trading. However, it is discussed extensively in regard to more discretionary trading methods. When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trending equity curve, calculate the compounded annual return, Sharpe ratio and even drawdown characteristics and be satisfied with the results. As an example, the strategy might possess a maximum relative drawdown of 25% and a maximum drawdown duration of 4 months. This would not be atypical for a momentum strategy. It is straightforward to convince oneself that it is easy to tolerate such periods of losses because the overall picture is rosy. However, in practice, it is far harder!

If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will see periods of similar drawdown in live trading. These periods of drawdown are psychologically difficult to endure. I have observed first hand what an extended drawdown can be like, in an institutional setting, and it is not pleasant, even if the backtests suggest such periods will occur. The reason I have termed it a "bias" is that a strategy which would otherwise be successful is often stopped from trading during times of extended drawdown, which leads to significant underperformance compared to a backtest. Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavy influence on profitability. The takeaway is that if you see drawdowns of a certain percentage and duration in the backtests, then you should expect them to occur in live trading environments, and you will need to persevere in order to reach profitability once more.
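The two drawdown figures mentioned above can be computed directly from an equity curve. The following is a minimal sketch. The function name `drawdown_stats` and the sample curve are my own, and duration is measured in periods of whatever frequency the curve is sampled at.

```python
def drawdown_stats(equity):
    """Return (maximum relative drawdown, maximum drawdown duration).

    Relative drawdown is the fall from the running peak as a fraction of
    that peak. Duration counts consecutive periods spent below the peak.
    """
    if not equity:
        raise ValueError("equity curve must contain at least one value")
    peak = equity[0]
    max_drawdown = 0.0
    duration = 0
    max_duration = 0
    for value in equity:
        if value >= peak:
            # New high-water mark: any drawdown in progress has ended.
            peak = value
            duration = 0
        else:
            duration += 1
            max_drawdown = max(max_drawdown, (peak - value) / peak)
            max_duration = max(max_duration, duration)
    return max_drawdown, max_duration


# Hypothetical equity curve, purely for illustration.
equity = [100.0, 110.0, 99.0, 88.0, 104.5, 121.0, 115.0]
max_dd, max_len = drawdown_stats(equity)
print(f"max drawdown = {max_dd:.1%}, longest drawdown = {max_len} periods")
```

Here the curve falls from a peak of 110 to a trough of 88, a 20% drawdown, and spends three periods below that peak before recovering. It is worth looking at both numbers and asking honestly whether you could sit through them with real capital.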

Exchange Issues
1. Order Types
  One choice that an algorithmic trader must make is how and when to make use of the different exchange orders available. This choice usually falls into the realm of the execution system, but we will consider it here as it can greatly affect strategy backtest performance. There are two types of order that can be carried out: market orders and limit orders.
  A market order executes a trade immediately, irrespective of available prices. Thus large trades executed as market orders will often get a mixture of prices as each subsequent limit order on the opposing side is filled. Market orders are considered aggressive orders since they will almost certainly be filled, albeit with a potentially unknown cost.
  Limit orders provide a mechanism for the strategy to determine the worst price at which the trade will get executed, with the caveat that the trade may be only partially filled or not filled at all. Limit orders are considered passive orders since they are often unfilled, but when they are filled the price is guaranteed. An individual exchange’s collection of limit orders is known as the limit order book, which is essentially a queue of buy and sell orders at certain sizes and prices.
  When backtesting, it is essential to model the effects of using market or limit orders correctly. For high-frequency strategies in particular, backtests can significantly outperform live trading if the effects of market impact and the limit order book are not modelled accurately.
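The difference between the two order types can be sketched with a toy limit order book. This is a deliberately simplified model under my own assumptions: the book is a static list of (price, size) ask levels, there is no queue priority or latency, and the names `fill_buy_order` and `average_price` are hypothetical.

```python
def fill_buy_order(asks, quantity, limit_price=None):
    """Fill a buy order against ask levels sorted from best (lowest) price.

    With limit_price=None the order behaves like a market order and keeps
    consuming levels at any price. With a limit price it stops at the first
    level priced above the limit. Returns (fills, unfilled quantity).
    """
    fills = []
    remaining = quantity
    for price, size in asks:
        if remaining <= 0:
            break
        if limit_price is not None and price > limit_price:
            break
        taken = min(size, remaining)
        fills.append((price, taken))
        remaining -= taken
    return fills, remaining


def average_price(fills):
    """Volume-weighted average price of a list of (price, quantity) fills."""
    total_quantity = sum(quantity for _, quantity in fills)
    if total_quantity == 0:
        raise ValueError("no fills to average")
    return sum(price * quantity for price, quantity in fills) / total_quantity


# Hypothetical ask side of a book, purely for illustration.
asks = [(100.00, 200), (100.05, 300), (100.10, 500)]

market_fills, market_left = fill_buy_order(asks, 600)
limit_fills, limit_left = fill_buy_order(asks, 600, limit_price=100.05)

print(market_fills, market_left, round(average_price(market_fills), 4))
print(limit_fills, limit_left)
```

The market order is filled completely but across three price levels, so its average price is worse than the best ask. The limit order never pays more than 100.05 but leaves 100 units unfilled. A backtest that assumes every order fills in full at the best quoted price captures neither effect.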
2. Price Consolidation
  There are particular issues related to backtesting strategies when making use of daily data in the form of Open-High-Low-Close (OHLC) figures, especially for equities. Note that this is precisely the form of data given out by Yahoo Finance, which is a very common source of data for retail algorithmic traders!
  Cheap or free datasets, while suffering from survivorship bias (which we have already discussed above), are also often composite price feeds from multiple exchanges. This means that the extreme points (i.e. the open, close, high and low) of the data are very susceptible to "outlying" values due to small orders at regional exchanges. Further, these values are also sometimes more likely to be tick-errors that have yet to be removed from the dataset.
  This means that if your trading strategy makes extensive use of any of the OHLC points specifically, backtest performance can differ from live performance as orders might be routed to different exchanges depending upon your broker and your available access to liquidity. The only way to resolve these problems is to make use of higher frequency data or obtain data directly from an individual exchange itself, rather than a cheaper composite feed.
3. Forex Trading and ECNs
  The backtesting of foreign exchange strategies is somewhat trickier to implement than that of equity strategies. Forex trading occurs across multiple venues and Electronic Communication Networks (ECNs). The bid/ask prices achieved on one venue can differ substantially from those on another venue. In the backtest, one must be extremely careful to make use of pricing information from the particular venue you will be trading on, as opposed to a consolidated feed from multiple venues, as this will be significantly more indicative of the prices you are likely to achieve going forward.
  Another idiosyncrasy of the foreign exchange markets is that brokers themselves are not obligated to share trade prices/sizes with every trading participant, since this is their proprietary information[6]. Thus it is more appropriate to use bid-ask quotes in your backtests and to be extremely careful of the variation of transaction costs between brokers/venues.
4. Shorting Constraints

When carrying out short trades in the backtest it is necessary to be aware that some equities may not have been available to short, either due to a lack of stock available to borrow or due to a market constraint, such as the US SEC banning the shorting of financial stocks during the 2008 market crisis.

Ignoring such constraints can severely inflate backtesting returns, so be careful to include short sale constraints within your backtests, or avoid shorting altogether if you believe there are likely to be liquidity constraints in the instruments you trade.

Transaction Costs
One of the most prevalent beginner mistakes when implementing trading models is to neglect (or grossly underestimate) the effects of transaction costs on a strategy. Though it is often assumed that transaction costs only reflect broker commissions, there are in fact many other ways that costs can be accrued on a trading model. The three main types of cost that must be considered are:
1. Commission
  The most direct form of transaction costs incurred by an algorithmic trading strategy are commissions and fees. All strategies require some form of access to an exchange, either directly or through a brokerage intermediary ("the broker"). These services incur an incremental cost with each trade, known as commission.
  Brokers generally provide many services, although quantitative algorithms only really make use of the exchange infrastructure. Hence brokerage commissions are often small on a per-trade basis. Brokers also charge fees, which are costs incurred to clear and settle trades. Further to this are taxes imposed by regional or national governments. For instance, in the UK there is a stamp duty to pay on equities transactions. Since commissions, fees and taxes are generally fixed, they are relatively straightforward to implement in a backtest engine (see below).
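Because these charges follow fixed rules, they reduce to a small function in a backtest. The sketch below assumes a per-share commission with a minimum ticket charge and an optional purchase tax levied on buys only. The function name and every rate shown are illustrative placeholders, so substitute your own broker's actual schedule.

```python
def fixed_trade_cost(quantity, price, per_share=0.005, minimum=1.00,
                     tax_rate=0.0, is_buy=True):
    """Commission plus tax for a single trade, in account currency.

    per_share and minimum describe a hypothetical commission schedule.
    tax_rate is applied to the traded value on purchases only, which is
    an assumption modelled loosely on a stamp-duty style tax.
    """
    shares = abs(quantity)
    commission = max(minimum, per_share * shares)
    tax = tax_rate * shares * price if is_buy else 0.0
    return commission + tax


# A small order hits the minimum charge, a larger one pays per share.
print(fixed_trade_cost(100, 50.0))
print(fixed_trade_cost(1000, 50.0))
# The same larger order with an illustrative 0.5% purchase tax.
print(fixed_trade_cost(1000, 50.0, tax_rate=0.005))
```

In the third case the tax on the traded value dwarfs the commission, which shows why taxes cannot be left out of a backtest for strategies that trade frequently.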
2. Slippage
  Slippage is the difference in price achieved between the time when a trading system decides to transact and the time when a transaction is actually carried out at an exchange. Slippage is a considerable component of transaction costs and can make the difference between a very profitable strategy and one that performs poorly. Slippage is a function of the underlying asset volatility, the latency between the trading system and the exchange and the type of strategy being carried out.
  An instrument with higher volatility is more likely to be moving and so prices between signal and execution can differ substantially. Latency is defined as the time difference between signal generation and point of execution. Higher frequency strategies are more sensitive to latency issues and improvements of milliseconds on this latency can make all the difference towards profitability. The type of strategy is also important. Momentum systems suffer more from slippage on average because they are trying to purchase instruments that are already moving in the forecast direction. The opposite is true for mean-reverting strategies, as these strategies trade against the direction in which the instrument is currently moving.
3. Market Impact
  Market impact is the cost incurred to traders due to the supply/demand dynamics of the exchange (and asset) through which they are trying to trade. A large order on a relatively illiquid asset is likely to move the market substantially as the trade will need to access a large component of the current supply. To counter this, large block trades are broken down into smaller "chunks" which are transacted periodically, as and when new liquidity arrives at the exchange. On the opposite end, for highly liquid instruments such as the S&P500 E-Mini index futures contract, low volume trades are unlikely to adjust the "current price" in any great amount.
  More illiquid assets are characterised by a larger spread, which is the difference between the current bid and ask prices on the limit order book. This spread is an additional transaction cost associated with any trade. Spread is a very important component of the total transaction cost, as evidenced by the myriad of UK spread-betting firms whose advertising campaigns express the "tightness" of their spreads for heavily traded instruments.
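A simple way to see the cost of the spread, together with slippage, is to compute the price actually achieved when crossing it. The sketch below assumes market orders that buy at the ask and sell at the bid, with slippage modelled as a fixed number of basis points of the mid price moving against the trade. Both the function names and that slippage model are simplifying assumptions of mine and not a calibrated market-impact model.

```python
def effective_fill_price(bid, ask, side, slippage_bps=0.0):
    """Price achieved by a market order; side is +1 to buy, -1 to sell.

    The order crosses the spread (buy at ask, sell at bid) and then
    suffers adverse slippage of slippage_bps basis points of the mid.
    """
    mid = (bid + ask) / 2.0
    touch = ask if side > 0 else bid
    return touch + side * mid * slippage_bps / 10_000.0


def round_trip_cost(bid, ask, quantity, slippage_bps=0.0):
    """Cost of buying then immediately selling with unchanged quotes."""
    buy_price = effective_fill_price(bid, ask, +1, slippage_bps)
    sell_price = effective_fill_price(bid, ask, -1, slippage_bps)
    return (buy_price - sell_price) * quantity


# Hypothetical quotes: a 4 cent spread around a mid price of 100.
print(round(round_trip_cost(99.98, 100.02, 1000), 2))
print(round(round_trip_cost(99.98, 100.02, 1000, slippage_bps=1.0), 2))
```

With no slippage the round trip costs the full spread on every unit traded. Adding one basis point of slippage per side raises that cost by half again in this example, so a strategy whose expected edge per trade is of a similar size to the spread will not survive these costs.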

Backtesting vs Reality

In summary, there is a staggering array of factors that can be simulated in order to generate a realistic backtest. The dangers of overfitting, poor data cleansing, incorrect handling of transaction costs, market regime change and trading constraints often lead to a backtest performance that differs substantially from a live strategy deployment.

Thus one must be very aware that future performance is very unlikely to match historical performance directly. We will discuss these issues in further detail when we come to implement an event-driven backtesting engine near the end of the book.

Algorithmic Trading

Saturday, September 1, 2018