September 13, 2018

Perils of Data Mining

Since my book was published, others have tried to improve on its Global Equity Momentum (GEM) model. There is nothing wrong with trying to improve on prior work. That is how society progresses.

But such attempts can have data mining, overfitting, and selection bias issues. Data mining is when you search through data to find more variables or better model parameters. These may not hold up going forward, especially if you have a limited amount of testing data.
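The point that mined parameters may not hold up can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "strategies" are pure random noise with a true average return of zero, and the function name, the number of strategies, the 4% monthly volatility, and the 10-year halves are all hypothetical choices made for illustration.

```python
import random
import statistics


def best_of_random_strategies(n_strategies=200, n_months=240, seed=0):
    """Pick the best of many skill-free strategies in-sample, then look out-of-sample.

    Every strategy is random noise with a true mean return of zero (an
    illustrative assumption), so any in-sample edge is luck.
    """
    rng = random.Random(seed)
    half = n_months // 2
    results = []
    for _ in range(n_strategies):
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        in_sample = statistics.mean(returns[:half])
        out_of_sample = statistics.mean(returns[half:])
        results.append((in_sample, out_of_sample))
    # "Data mining": keep whichever strategy looked best on the first half.
    return max(results, key=lambda pair: pair[0])


in_sample, out_of_sample = best_of_random_strategies()
print(f"Best strategy, in-sample monthly mean:     {in_sample:+.2%}")
print(f"Same strategy, out-of-sample monthly mean: {out_of_sample:+.2%}")
```

The winner always looks good in-sample because it was chosen for that reason. Since none of these strategies has any real edge, nothing in the second half is expected to repeat it.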

Many develop quant models by data mining 15 to 20 years of ETF or mutual fund data, which is as far back as that data goes. Here is a chart from my book showing how regimes change considerably every 15 years.


Parameters chosen from only 15 or 20 years of data are likely to give disappointing results going forward as regimes change. Even 40 years of data may not be enough to inspire full confidence.
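One rough way to see why short histories inspire little confidence is the standard error of an average annual return, which shrinks only with the square root of the number of years. The sketch below assumes independent yearly returns and a hypothetical 15% annual volatility. Neither assumption comes from the book, and regime changes, which this ignores, would likely make the uncertainty larger.

```python
import math


def mean_return_std_error(annual_vol: float, years: int) -> float:
    """Standard error of the average annual return, assuming independent years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return annual_vol / math.sqrt(years)


ASSUMED_VOL = 0.15  # hypothetical annual volatility, for illustration only

for years in (15, 20, 40):
    se = mean_return_std_error(ASSUMED_VOL, years)
    # A rough 95% band is about two standard errors either side of the estimate.
    print(f"{years:>2} years of data: average return known to about +/- {2 * se:.1%}")
```

Doubling the history from 20 to 40 years narrows the band by a factor of only about 1.4, not by half.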

Model overfitting happens when models are too complex. John von Neumann said, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.”

Selection bias is when you know what your testing results are likely to be ahead of time and build a model incorporating that information. You might select data or your data starting point knowing it will give good results while ignoring other possibilities.

Here is the most egregious example of selection bias that I know. An advisory firm invited me to dinner to discuss licensing my proprietary models. I thought this odd since they already had their own momentum-based models. At dinner I asked them why their published results went back only 13 years when there was more data available. They said it was because investors do not like to see drawdowns greater than 20%!

Selection bias, model overfitting, and data mining issues may not be obvious or intentional. Here is what I have done to try to avoid these problems.

Use lots of data


Our first GEM backtest began in 1974. We were constrained by the amount of bond data that we had then. When I acquired more bond data, we extended our backtest to 1971, where we were limited by the amount of MSCI non-U.S. stock index data. The extra 3 years of performance gave us an out-of-sample period covering the 1973-74 bear market. GEM performed well out-of-sample by being out of stocks during most of the bear market.

We recently gained access to non-MSCI stock index data and were able to extend our GEM backtest to Jan 1950. Asness, Israelov, and Liew (2017) in “International Diversification Works (Eventually)” also used 1950 as a starting date for their study. During World War II almost no one invested globally. The Templeton Growth Fund that began in 1954 was the first international fund available to U.S. investors.




Historical data and analysis should not be taken as an indication or guarantee of any future performance. Future performance of GEM may differ significantly from historical performance. Please see our Disclaimer page for additional disclosures.

While you should never rely solely on past performance, GEM did continue to outperform during the 1950s and 1960s.

Respect prior studies and well-established ideas

Researchers have studied long-short momentum more than any other factor in finance. Geczy and Samonov (2017) looked at momentum applied to geographically diversified stock indices, bonds, currencies, commodities, stock sectors, and U.S. stocks back to 1801. Momentum outperformed buy-and-hold in all these areas. The best results were with global stock indices, shown below as “Equity”. These are what we use with momentum.


Source: Geczy & Samonov (2017), “Two Centuries of Multi-Asset Momentum (Equities, Bonds, Currencies, Commodities, Sectors, and Stocks)”

We use a 12-month momentum lookback because Cowles & Jones found it worked well in 1937. So did Jegadeesh & Titman in their seminal momentum research done in the 1990s. Greyserman and Kaminski (2014) showed that trend-following momentum with a 12-month lookback outperformed buy-and-hold back to the year 1223!

Keep things simple


We prefer to hold stocks as much as we can since they have the most proven risk premium. We keep things simple by being in U.S. or non-U.S. stock indices according to their relative strength over the preceding 12 months. For non-U.S. stocks, we avoid selection bias by being in as broad an index as possible. That is the MSCI All Country World Index ex-U.S. (ACWI ex-U.S.). It includes all non-U.S. MSCI developed and emerging countries weighted by their market capitalization. When the trend in stocks is negative according to 12-month absolute momentum, we exit stocks for the safety of aggregate bonds. We have always tried to follow Einstein’s advice of keeping things as simple as possible but no simpler. Let us now look at variations of dual momentum appearing on the internet.
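The rules just described can be sketched in a few lines of Python. This is a simplified illustration, not the exact model from the book. The function names, the use of month-end prices, the choice to judge the stock trend by the U.S. index, and the zero hurdle are all assumptions made here for the example.

```python
from typing import Sequence


def trailing_return(prices: Sequence[float], lookback: int = 12) -> float:
    """Return over the last `lookback` periods of a month-end price series."""
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 prices")
    return prices[-1] / prices[-1 - lookback] - 1.0


def gem_signal(us_prices: Sequence[float],
               non_us_prices: Sequence[float],
               lookback: int = 12,
               hurdle: float = 0.0) -> str:
    """Return 'US', 'NON_US', or 'BONDS' for the coming month."""
    us_ret = trailing_return(us_prices, lookback)
    non_us_ret = trailing_return(non_us_prices, lookback)
    # Absolute momentum. ASSUMPTION: the stock trend is judged by the U.S.
    # index against a hurdle (0.0 here); the hurdle is a placeholder.
    if us_ret <= hurdle:
        return "BONDS"
    # Relative momentum: hold whichever stock index has been stronger.
    return "US" if us_ret >= non_us_ret else "NON_US"


# Hypothetical month-end index levels (13 values give one 12-month return).
us = [100 + i for i in range(13)]          # up 12% over 12 months
non_us = [100 + 2 * i for i in range(13)]  # up 24% over 12 months
print(gem_signal(us, non_us))
```

With only one lookback and one hurdle, there is very little here to tune, which is the point of keeping the model simple.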

Shorter lookback periods


In my book I show that a 12-month lookback period outperformed 3-, 6-, and 9-month lookback periods with GEM. For years now, my website’s FAQ page has described in more detail why 12 months works best. Yet there are those who still believe that because shorter lookbacks are more sensitive to market changes, they should give better results.

A 3-month lookback has performed well over the past 20 years. If you were to look only at that data, you might feel reassured about using a shorter lookback period. But this starts to unravel in 1979-80, when the markets became very choppy. Choppiness gives both lower returns and higher drawdowns. Here are GEM results from Jan 1971 through Aug 2018 comparing 12- and 3-month lookback periods.



                      12 MONTHS    3 MONTHS
CAGR (%)                   16.6        13.6
ANNUAL STD DEV (%)         12.2        11.8
SHARPE RATIO               0.94        0.74
WORST DRAWDOWN (%)        -16.8       -23.3


Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index.
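For readers who want to reproduce statistics like those in the table on their own data, here is a minimal sketch of how the four measures are commonly computed from monthly returns. The exact conventions behind the table are not stated here, so the choices below (annualizing volatility by the square root of 12, an arithmetic Sharpe ratio, and the function and parameter names) are assumptions.

```python
import math
import statistics
from typing import Dict, Sequence


def performance_summary(monthly_returns: Sequence[float],
                        monthly_risk_free: float = 0.0) -> Dict[str, float]:
    """Summarize a monthly return series (0.01 means +1%)."""
    n = len(monthly_returns)
    if n < 2:
        raise ValueError("need at least two monthly returns")

    growth, peak, worst_drawdown = 1.0, 1.0, 0.0
    for r in monthly_returns:
        growth *= 1.0 + r                 # compound each month's return
        peak = max(peak, growth)          # highest portfolio value so far
        worst_drawdown = min(worst_drawdown, growth / peak - 1.0)

    cagr = growth ** (12.0 / n) - 1.0
    annual_std = statistics.stdev(monthly_returns) * math.sqrt(12.0)
    excess = [r - monthly_risk_free for r in monthly_returns]
    # ASSUMPTION: arithmetic annualized excess return over annualized volatility.
    sharpe = (statistics.mean(excess) * 12.0 / annual_std
              if annual_std > 0 else float("nan"))
    return {"cagr": cagr, "annual_std": annual_std,
            "sharpe": sharpe, "worst_drawdown": worst_drawdown}


# Hypothetical monthly returns, for illustration only.
sample = [0.02, -0.01, 0.03, -0.04, 0.01, 0.02,
          0.00, 0.01, -0.02, 0.03, 0.01, -0.01]
for name, value in performance_summary(sample).items():
    print(f"{name:>15}: {value:.4f}")
```

Worst drawdown here is the largest peak-to-trough decline measured at month-end, so intra-month declines would not show up.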

There are also tax advantages to a 12-month lookback. Using it, GEM trades on average 1.3 times per year. Seventy percent of GEM’s gains are long-term, while 100% of its losses are short-term. With a shorter lookback, trading increases and these tax advantages disappear. A 12-month lookback worked well in the pioneering momentum research done in 1937 and 1993. Using a 12-month lookback reduces data mining concerns and seasonality bias.

EAFE instead of ACWI ex-U.S. 


There are several websites that show GEM results and issue GEM signals using an ETF for the MSCI EAFE index rather than the broader MSCI ACWI ex-U.S. index. Emerging markets and Canada are missing from the MSCI EAFE index. They make up 24% of the MSCI ACWI ex-U.S. index. Here are the GEM results using each index since the MSCI ACWI ex-U.S. index was introduced in December 1988.



                      ACWI ex-US       EAFE
CAGR (%)                    15.6       14.3
ANNUAL STD DEV (%)          13.1       13.0
SHARPE RATIO                0.84       0.73
WORST DRAWDOWN (%)         -17.0      -17.0


Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index.

GEM earned 130 basis points more in annual return using the MSCI ACWI ex-U.S. rather than the MSCI EAFE index. I see no reason not to expect a difference in future returns as well. I would thus stay away from using MSCI EAFE ETFs if possible.
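The 130 basis point figure is simply the difference between the two CAGRs in the table (15.6% versus 14.3%), and a gap of that size compounds into a large difference in ending wealth. The short sketch below shows the arithmetic. The 30-year horizon and the function names are hypothetical, chosen only for illustration.

```python
def basis_point_gap(cagr_a_pct: float, cagr_b_pct: float) -> float:
    """Difference between two annual returns, in basis points (1% = 100 bp)."""
    return (cagr_a_pct - cagr_b_pct) * 100.0


def terminal_wealth(cagr_pct: float, years: float, start: float = 1.0) -> float:
    """Value of `start` after compounding at `cagr_pct` percent for `years` years."""
    return start * (1.0 + cagr_pct / 100.0) ** years


ACWI_EX_US_CAGR = 15.6  # from the table above
EAFE_CAGR = 14.3        # from the table above
HORIZON_YEARS = 30      # hypothetical horizon

print(f"Gap: {basis_point_gap(ACWI_EX_US_CAGR, EAFE_CAGR):.0f} basis points")
ratio = (terminal_wealth(ACWI_EX_US_CAGR, HORIZON_YEARS)
         / terminal_wealth(EAFE_CAGR, HORIZON_YEARS))
print(f"Ending wealth ratio after {HORIZON_YEARS} years: {ratio:.2f}x")
```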

U.S. small and mid-cap stocks


Some think using a total U.S. stock market index should do better than the S&P 500 index since broader indices include small and mid-cap stocks. We can easily check that.

The broader U.S. stock indices have performed similarly to one another. I will use the Russell 3000 since it has the longest price history.


Here are GEM results comparing the S&P 500 to the Russell 3000 from the Russell 3000's start in January 1979.



                         S&P 500    Russell 3000
CAGR (%)                    18.7           17.06
ANNUAL STD DEV (%)          13.9            13.9
SHARPE RATIO                0.99            0.86
WORST DRAWDOWN (%)         -16.8           -23.3


Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index.

Broader indices may give worse results because there is no small-cap premium, despite what many believe. See here and here for more on this.

Long-term bonds

Some prefer to use long-term Treasury bonds as a safe harbor when they exit stocks because stocks and bonds have been negatively correlated. They expect to earn better returns from long-duration bonds when stocks are weak.

It is true that stocks and bonds have been negatively correlated in recent years. But that has not always been the case. In fact, stock-bond correlations are as likely to be positive as negative over the long run.

Source: “Equity-Bond Correlation: A Historical Perspective”, Graham Capital Management Research Note, September 2017
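Anyone can check how stable the stock-bond relationship has been by computing a rolling correlation on their own return data. The sketch below is a generic illustration, not the method used in the research note cited above. The window length, function names, and sample returns are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(stocks: Sequence[float],
                        bonds: Sequence[float],
                        window: int) -> List[float]:
    """Correlation over each trailing `window` of periods, oldest window first."""
    if len(stocks) != len(bonds):
        raise ValueError("series must have the same length")
    if window < 2 or window > len(stocks):
        raise ValueError("window must be between 2 and the series length")
    return [pearson(stocks[i - window:i], bonds[i - window:i])
            for i in range(window, len(stocks) + 1)]


# Hypothetical monthly returns, for illustration only.
stock_returns = [0.02, -0.03, 0.01, 0.04, -0.02, 0.03, -0.01, 0.02]
bond_returns = [0.01, 0.01, -0.01, 0.02, 0.01, -0.01, 0.00, 0.01]
for value in rolling_correlation(stock_returns, bond_returns, window=6):
    print(f"{value:+.2f}")
```

With real data, a multi-year window applied over many decades would show how often the sign of the correlation has changed.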

Long-term bonds have had similar returns to intermediate bonds despite their higher volatility. The return of intermediate bonds includes compensation for reinvestment risk once these bonds mature. Long bonds do not have this reinvestment risk and do not require a risk premium for it.

Here are GEM results with the Barclays U.S. Aggregate bond index versus the Barclays 20 Year Treasury bond index from when both became available in January 1976.



AGG Bonds
20 YR Treasuries
CAGR
17.5
18.1
ANNUAL STD DEV
12.5
13.6
SHARPE RATIO
0.98
0.94
WORST DRAWDOWN
                             -16.8
                             -17.0


Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index.

On a risk-adjusted basis, 20-year Treasuries did not outperform intermediate-term bonds despite low stock-bond correlations and a strong bull market in bonds. Under more normal market conditions, long-term bonds with their higher risk are likely to be at a disadvantage as a safe asset. See here for more on long-term bonds as a crisis asset.

Other markets

Whenever a sector or factor fund is strong, I get emails asking if I have looked at adding it to GEM. To answer these questions, I used long-term index data to see if adding any of the following would have improved GEM results: small cap, value, low volatility, quality, stock momentum, equal weight, REITs, commodities, and the NASDAQ 100. None added value to GEM.

There are quantitative models that get better results by including larger-than-market-cap allocations to emerging markets (EMs) in their backtests. EMs did particularly well in the late 1990s and mid-2000s when newly liberated EM countries had rapid export growth and large capital flows.

EMs show an improvement in GEM if you use only the MSCI EM data, which begins in 1988. When I added pre-MSCI EM data to GEM, the results were disappointing. Drawdowns and volatility increased substantially. The same thing happened with sector rotation when I obtained an additional 20 years of sector data back to 1973. These examples again show the importance of longer data sets. The statistician W. Edwards Deming said, “In God we trust. All others bring data.”

Recent data mining example


I have gotten emails recently asking about a strategy called “Accelerating Dual Momentum” that was inspired by GEM. This model looks at 20 years of mutual fund data. Based on that data, the developer uses long-term Treasury bonds as his safe harbor asset and a combination of short lookback periods. We have seen that these may not be the best choices based on longer-term data.

The developer also questions using the MSCI ACWI ex-U.S. index of large and mid-cap stocks as the best vehicle for non-U.S. equities. His argument is that companies are more globalized now, so the correlation between U.S. and non-U.S. companies is higher than it once was. This may be true. But the following chart from my website’s FAQ page shows something else happening. The relative strength difference between U.S. and non-U.S. equities is due mostly to macro-economic conditions reflected in the strength or weakness of the U.S. dollar.


The developer’s solution to what he perceives as a correlation problem is to use a small to midcap international stock fund in place of a large cap international fund.

The starting date of the small to midcap MSCI ACWI ex-U.S. index is May 1994. There is not enough history there to make a good assessment of non-U.S. small to midcap performance. But we know that small cap international stocks do not show a statistically significant size premium as noted here.

An ACWI small to midcap ex-U.S. index fund began only in 2009. So in place of an index fund, the developer uses an actively managed small to midcap international fund. Looking at a large universe of funds, one can always find, after the fact, a few actively managed funds that have outperformed similar index funds. But there is a problem with that approach. Fund performance may be persistent over the short run due to momentum. But over the long run, there is no meaningful relationship between past and future fund performance. See here for more on this.

At the end of his discussion, the developer presents a chart showing the rolling real return of GEM versus his model. Even after all his data mining efforts, the real return of both models over the past 30 years has been about the same. If the developer had calculated GEM correctly, GEM would have been the winner.

What is surprising

It is not surprising that people try to develop or modify models using 20 years of data. Most do not realize how much uncertainty exists when you use this amount of data. What is surprising, though, is how many people ignore contrary evidence. Much of the information I presented here is in my book. All of it, and more, is on my website’s FAQ page. Yet people still cling to their prior beliefs.

Perhaps this should not surprise me. In the 1960s and several times since then, academics have shown that actively managed funds underperform index funds. Yet fifty years later, only 35% of total U.S. fund assets are in index strategies.