Survivorship Bias in Backtests: An Example (with R Code)
The problem
It’s fun trying to make money off of the stock market, and I’ve tried all sorts of strategies. One of the things I’ve learned to watch out for is survivorship bias, which can make a terrible strategy look fantastic.
The problem is simple. If you start out with a “current” list of stocks, the historical data you use to develop your strategy won’t include stocks that went to 0. By filtering out the biggest losers, it becomes very easy to develop a strategy that looks good but won’t actually work when you try to implement it.
It can also be more subtle. If you grab a list of mutual funds offered by a particular company, for example, you won’t see any of the funds that were discontinued due to poor performance (or other reasons).
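To see how much dropping the biggest losers can matter, here is a small base-R simulation. It is a sketch under made-up assumptions, not real market data: 500 hypothetical stocks with zero expected daily gain and a 3% daily standard deviation over one year, where any stock that ends the year down more than 50% is treated as missing from the "current" list. The dropouts are by construction the worst performers, so the survivors' mean growth comes out higher than the mean across all stocks, even though no stock has any real edge.

```r
# Toy simulation of survivorship bias (all inputs are made up).
set.seed(123)
n.stocks <- 500L
n.days <- 252L  # roughly one year of trading days

# Daily gains for each stock: mean 0, SD 3%, one column per stock,
# so no stock has any real edge
gains <- matrix(rnorm(n.days * n.stocks, mean = 0, sd = 0.03),
                nrow = n.days, ncol = n.stocks)

# Total growth (%) of each stock over the period
growth <- (apply(1 + gains, 2, prod) - 1) * 100

# A "current" list only contains stocks that didn't end down more than 50%
survived <- growth > -50

mean.all <- mean(growth)
mean.survivors <- mean(growth[survived])

cat(sprintf("Dropped out: %d of %d stocks\n", sum(!survived), n.stocks))
cat(sprintf("Mean growth, all stocks:     %.1f%%\n", mean.all))
cat(sprintf("Mean growth, survivors only: %.1f%%\n", mean.survivors))
```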
An example with individual S&P 500 stocks
Tickers now vs. tickers then
A common strategy is to try to identify a subset of S&P 500 companies that are likely to outperform the others. Something like “hold the 10 S&P 500 stocks with the highest growth over the past month.”
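A rule like that takes only a few lines of R. The sketch below assumes a matrix of daily closing prices with one named column per ticker, and uses 21 trading days as a stand-in for "one month"; `top_k_momentum()` is a hypothetical helper, not part of any package. Note that the rule can only pick from the columns you give it, so if the matrix is built from today's constituents, the backtest inherits the survivorship bias described above.

```r
# Hypothetical helper: return the k tickers with the highest trailing
# return over the last `lookback` trading days (21 days ~ one month).
# `prices` is a matrix of daily closing prices, one named column per ticker.
top_k_momentum <- function(prices, k = 10, lookback = 21) {
  n <- nrow(prices)
  stopifnot(n > lookback, k <= ncol(prices))
  trailing <- prices[n, ] / prices[n - lookback, ] - 1
  names(sort(trailing, decreasing = TRUE))[seq_len(k)]
}

# Toy example: 22 days of prices for three made-up tickers
prices <- cbind(A = seq(100, 121, length.out = 22),  # +21% over 21 days
                B = rep(50, 22),                      # flat
                C = seq(20, 30, length.out = 22))     # +50% over 21 days
top_k_momentum(prices, k = 2)  # "C" "A"
```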
Suppose we want to use data from the beginning of 2019 to build some sort of strategy. Consider two approaches:
- Get the list of S&P 500 tickers as of today (Nov. 10, 2019)
- Get the list of S&P 500 tickers at the beginning of the time period for the backtest (Jan. 1, 2019)
It seems like it wouldn’t make a huge difference, since we’re only talking about an 11-month period.
We can get the list of ticker symbols from the Wikipedia page for current S&P 500 companies, which has a table listing each company’s ticker symbol.
By navigating through the Wikipedia revision history, I found that the following stocks were part of the S&P on Jan. 1 but not on Nov. 10:
"APC" "BHGE" "BHF" "DWDP" "ESRX" "FLR" "FL" "GT" "HRS"
"HCP" "JEF" "LLL" "MAT" "KORS" "NKTR" "NFX" "PCG" "RHT"
"SCG" "SYMC" "TMK" "TSS"
Most of these companies dropped out for reasons other than poor performance (ticker symbol changed, merger/acquisition, etc.). After some Googling, I identified 6 of the 22 as having dropped out due to market cap considerations.
Meanwhile, these companies were added during 2019:
"AMCR" "ATO" "BKR" "CPRI" "CDW" "CE" "CTVA" "DOW" "DD"
"FRC" "GL" "PEAK" "IEX" "LHX" "LVS" "LDOS" "MKTX" "NLOK"
"NVR" "TMUS" "TFX" "WAB"
Of these, I believe 12 were added because of growth in market cap.
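The bookkeeping behind these two lists is just a pair of set differences between two snapshots of the constituent list. Below is a minimal base-R sketch; `compare_snapshots()` and `roll_back()` are hypothetical helpers, and the snapshots are tiny toy subsets rather than the full lists. `roll_back()` rebuilds the earlier list from the current one by undoing the changes, which is what the Jan. 1 approach needs if you only have the current list plus a record of changes.

```r
# Hypothetical helpers for index-membership snapshots
# (character vectors of tickers).
compare_snapshots <- function(then, now) {
  list(removed = setdiff(then, now),  # in the index then, not now
       added   = setdiff(now, then))  # in the index now, not then
}

# Rebuild the earlier list from the current one by undoing the changes
roll_back <- function(now, removed, added) {
  c(setdiff(now, added), removed)
}

# Toy snapshots: tiny subsets of the real lists, for illustration only
jan1  <- c("AAPL", "MSFT", "FL", "MAT", "KORS")
nov10 <- c("AAPL", "MSFT", "CDW", "NVR", "CPRI")

changes <- compare_snapshots(jan1, nov10)
changes$removed                                   # "FL" "MAT" "KORS"
changes$added                                     # "CDW" "NVR" "CPRI"
roll_back(nov10, changes$removed, changes$added)  # same tickers as jan1
```

KORS and CPRI show the catch: Michael Kors Holdings renamed itself Capri Holdings, so a plain diff counts one company as both a removal and an addition. That is why the raw lists overstate how many companies actually dropped out, and why some manual checking is still needed.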
Swapping out losers for winners
Let’s compare the 12 extra stocks we’d pick up by using the Nov. 10 list with the 6 we’d lose (stocks that were in the index on Jan. 1 and belong in the backtest).
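First, load the packages and define the two groups of tickers:
library("stocks")    # provides load_gains() and calc_metrics()
library("magrittr")  # %>% pipe
tickers.leftout <- c("BHF", "FLR", "FL", "MAT", "NKTR", "PCG")
tickers.addedin <- c("ATO", "CDW", "CE", "FRC", "GL", "IEX",
                     "LVS", "LDOS", "MKTX", "NVR", "TMUS", "TFX")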
Second, load daily gains from Jan. 1, 2019 onward and calculate each stock’s total growth and maximum drawdown (MDD):
df <- c(tickers.leftout, tickers.addedin) %>%
load_gains(from = "2019-01-01") %>%
calc_metrics(c("growth", "mdd"))
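calc_metrics() does this work for us, but the two metrics are simple enough to write out. The base-R sketch below uses the standard definitions (total growth is the compounded gain over the period; max drawdown is the largest percentage drop from a running peak). `total_growth()` and `max_drawdown()` are hypothetical helpers, and calc_metrics()'s own implementation may differ in details.

```r
# Hypothetical helpers; `gains` is a vector of daily gains as
# proportions (0.01 = +1%).
total_growth <- function(gains) {
  (prod(1 + gains) - 1) * 100  # compounded growth (%)
}

max_drawdown <- function(gains) {
  value <- cumprod(c(1, 1 + gains))  # value of $1 invested, day by day
  peak <- cummax(value)              # highest value reached so far
  max(1 - value / peak) * 100        # largest peak-to-trough drop (%)
}

gains <- c(0.10, -0.20, 0.05)
total_growth(gains)  # -7.6
max_drawdown(gains)  # 20
```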
Third, plot total growth vs. MDD:
df$Group <- c(rep("Removed from backtest", 6),   # rows follow the ticker order above
              rep("Added to backtest", 12))
library("ggplot2")
library("ggrepel")  # geom_label_repel()
ggplot(df, aes(x = `Max drawdown (%)`, y = `Growth (%)`,
               color = Group, label = Fund)) +
  geom_point() +
  geom_label_repel(show.legend = FALSE) +
  labs(title = "Growth vs. Max drawdown")
As you can see, the stocks we added were big gainers (mean +39.8%), and the ones we removed were big losers (mean -18.5%).
Two of the dropouts, BHF and MAT, actually weren’t losers. Apparently, inclusion in the S&P isn’t entirely based on market cap, and for whatever reason these two stocks were relegated to the S&P MidCap 400 despite fairly strong growth.
Anyway, you can imagine how swapping stocks whose mean growth differs by nearly 60 percentage points (+39.8% vs. -18.5%) can skew a backtest. It’s always going to make strategies look better than they are, and the longer the period, the better they’ll look.
And don’t fall into the trap of thinking survivorship bias will only explain part of a backtested strategy’s impressive returns. It can very easily explain all of it. Trust me on that one.
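The claim that the bias grows with the length of the backtest period is easy to check with a toy simulation. The base-R sketch below rests on purely hypothetical assumptions: 2,000 stocks with zero-mean daily gains and a 3% daily standard deviation, where a stock counts as a dropout if it ends the period down more than 50%. It compares the mean growth of all stocks with that of the survivors for backtests of 1, 3, 5, and 10 years. The numbers are exaggerated relative to real markets, but the longer horizons should show both more dropouts and a much bigger gap between the survivors and the full universe.

```r
# Toy simulation: survivorship bias vs. backtest length (made-up inputs).
set.seed(1)

# For one backtest length, simulate n.stocks stocks with zero-mean daily
# gains (SD sd.daily) and drop any stock that ends down more than 50%.
survivor_gap <- function(n.years, n.stocks = 2000, sd.daily = 0.03) {
  n.days <- 252 * n.years
  # Total log growth of each stock = sum of its daily log gains
  log.growth <- replicate(n.stocks,
                          sum(log1p(rnorm(n.days, mean = 0, sd = sd.daily))))
  growth <- (exp(log.growth) - 1) * 100  # total growth (%)
  survived <- growth > -50
  c(years       = n.years,
    pct.dropped = 100 * mean(!survived),
    all         = mean(growth),
    survivors   = mean(growth[survived]),
    gap         = mean(growth[survived]) - mean(growth))
}

# One row per backtest length
results <- t(sapply(c(1, 3, 5, 10), survivor_gap))
round(results, 1)
```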