September 13, 2018

Perils of Data Mining

From the time my book was published others have tried to improve upon the book’s Global Equity Momentum (GEM) model. There is nothing wrong with trying to improve on prior work. That is how progress is made.

But such attempts can suffer from data mining, overfitting, and ex-post selection bias. Data mining is when you search through data to develop or optimize a model. Such models often do not hold up going forward, especially if you have a limited amount of testing data.
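Here is a minimal sketch of why a data-mined model can look good and still fail going forward. It is an illustration only, not anything taken from GEM. It assumes 1,000 hypothetical strategies with no skill at all (normally distributed monthly returns with a true mean of zero and an assumed 4% monthly volatility), picks the one with the best 20-year in-sample record, and then checks that same strategy on a fresh 20 years. The function name and every parameter are assumptions made for the example.

```python
import random
import statistics


def data_mining_demo(n_strategies=1000, n_months=240, monthly_vol=0.04, seed=0):
    """Pick the best of many zero-skill strategies in-sample, then test it out-of-sample.

    Every simulated strategy is pure noise with a true expected return of zero,
    so any in-sample edge comes from the search itself (assumed setup).
    """
    rng = random.Random(seed)
    best_in_sample = None
    best_out_of_sample = None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        out_of_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        in_mean = statistics.fmean(in_sample)
        # Keep the strategy with the highest in-sample average return.
        if best_in_sample is None or in_mean > best_in_sample:
            best_in_sample = in_mean
            best_out_of_sample = statistics.fmean(out_of_sample)
    return best_in_sample, best_out_of_sample


best_in, best_out = data_mining_demo()
print(f"best in-sample mean monthly return: {best_in:.4%}")
print(f"same strategy out-of-sample:        {best_out:.4%}")
```

The winner is chosen only because it was the luckiest of the group, so its in-sample average tends to overstate what it can deliver afterward.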

Many develop quant models by data mining 15 to 20 years of ETF or mutual fund data, which is as far back as that data will let you go. Here is a chart from my book showing how regimes change considerably every 15 years.


Searching for parameters based on only 15 or 20 years of data is likely to give disappointing results going forward as regimes change. Even 40 years of data may not be enough to inspire full confidence.
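One rough way to see how little 15 or 20 years of data tells you is the standard error of an average annual return, which shrinks only with the square root of the number of years. The sketch below assumes independent yearly returns, which regime changes would violate, so treat it as an optimistic bound. The 12.2% annual volatility is borrowed from the 12-month GEM results reported later in this post purely as an illustrative input. The function names are hypothetical.

```python
import math


def standard_error_of_mean(annual_vol_pct, years):
    """Standard error (in percentage points) of an average annual return.

    Assumes independent, identically distributed yearly returns (a simplification).
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return annual_vol_pct / math.sqrt(years)


def approx_95_pct_band(annual_vol_pct, years):
    """Half-width of an approximate 95% confidence band around the average return."""
    return 1.96 * standard_error_of_mean(annual_vol_pct, years)


for years in (15, 20, 40, 48):
    band = approx_95_pct_band(12.2, years)
    print(f"{years:>2} years of data: average annual return known to about +/- {band:.1f} points")
```

Doubling the history from 20 to 40 years narrows the band by only about 30%, which is why even 40 years may not inspire full confidence.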

Model overfitting happens when models are too complex. John von Neumann said, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.”

Selection bias is when you know what your testing results are likely to be ahead of time and build a model incorporating that information. You might select data or your data starting point knowing it will give good results while ignoring other possibilities.

Here is the most egregious example of selection bias that I know. An advisory firm invited me to dinner to discuss licensing my proprietary models. I thought this odd since they already had their own momentum-based models. At dinner I asked them why their published results went back only 13 years when there was more data available. They said it was because investors do not like to see drawdowns greater than 20%!

Selection bias, model overfitting, and data mining issues may not be obvious or intentional. Here is what I have done to try to avoid these problems.

Use lots of data


Our first GEM backtest began in 1974. We were constrained by the amount of bond data that we had then. When I acquired more bond data, we extended our backtest to 1971, where we were then limited by the amount of MSCI non-U.S. stock index data. The extra 3 years of performance gave us an out-of-sample period covering the 1973-74 bear market. GEM performed well out-of-sample by being out of stocks during most of the bear market.

We recently gained access to non-MSCI stock index data and were able to extend our GEM backtest to Jan 1950. Asness, Israelov, and Liew (2017) in “International Diversification Works (Eventually)” also used 1950 as a starting date for their study. During World War II almost no one invested globally. Capital controls made it impossible to purchase equities in hostile countries. The Templeton Growth Fund, which began in 1954, was the first international fund available to U.S. investors.




Historical data and analysis should not be taken as an indication or guarantee of any future performance. Future performance of GEM may differ significantly from historical performance. Please see our Disclaimer page for additional disclosures.

It is encouraging that GEM continued to outperform during the 1950s and 1960s. See the Performance page of our website for more details.

Respect prior studies and well-established ideas

Researchers have studied long-short momentum more than any factor in finance. Geczy and Samonov (2017) looked at momentum applied to geographically diversified stock indices, bonds, currencies, commodities, stock sectors, and U.S. stocks back to 1801. Momentum outperformed buy-and-hold in all these areas. The best results were with global stock indices shown below as “Equity”. These are what we use with momentum.


Source: Geczy & Samonov (2017), “Two Centuries of Multi-Asset Momentum (Equities, Bonds, Currencies, Commodities, Sectors, and Stocks)”

We use a 12-month momentum look back because Cowles & Jones found in 1937 that it worked well. So did Jegadeesh & Titman in their seminal momentum research in the 1990s. A 12-month momentum look back soundly beat buy-and-hold from the beginning of stock market trading in the 1600s and with other assets back to 1223, as reported by Greyserman & Kaminski (2014).

Keep things simple


We prefer to hold stocks as much as we can since they have the most proven risk premium. We keep things simple by being in U.S. or non-U.S. stock indices according to their relative strength over the preceding 12 months. For non-U.S. stocks, we avoid selection bias by being in as broad an index as possible. That is the MSCI All Country World Index ex-U.S. (ACWI ex-US). It includes all non-U.S. MSCI developed and emerging countries weighted by their market capitalization. When the trend in stocks is negative according to 12-month absolute momentum, we exit stocks for the safety of aggregate bonds. We always try to follow Einstein’s advice of keeping things as simple as possible but no simpler. Let us now look at variations of dual momentum appearing on the internet.
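The rule just described can be sketched in a few lines. This is a simplified illustration and not the exact GEM specification. It assumes month-end total return index levels as inputs, applies the absolute momentum test to whichever stock index is stronger, and treats a trailing return at or below a hurdle (zero by default) as a negative trend. Those choices, the function names, and the signal labels are all assumptions made for the example.

```python
def trailing_return(prices, lookback=12):
    """Total return over the last `lookback` months from month-end index levels."""
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 month-end observations")
    return prices[-1] / prices[-1 - lookback] - 1.0


def dual_momentum_signal(us_prices, non_us_prices, hurdle_return=0.0, lookback=12):
    """Return "US", "NON_US", or "BONDS" for the coming month (illustrative rule)."""
    us = trailing_return(us_prices, lookback)
    non_us = trailing_return(non_us_prices, lookback)

    # Relative momentum: pick the stronger stock index (ties go to U.S., an assumption).
    if us >= non_us:
        leader, leader_return = "US", us
    else:
        leader, leader_return = "NON_US", non_us

    # Absolute momentum: if the leader's own trend is not positive, exit to bonds.
    if leader_return <= hurdle_return:
        return "BONDS"
    return leader


# Hypothetical index levels: flat for a year, then one move in the final month.
us_index = [100.0] * 12 + [110.0]
non_us_index = [100.0] * 12 + [105.0]
print(dual_momentum_signal(us_index, non_us_index))
```

The whole model has one look back parameter and three possible holdings, which leaves very little room for overfitting.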

Shorter lookback periods


In my book I show that a 12-month look back period outperformed 3, 6, and 9-month look back periods with GEM. For years now, my website’s FAQ page has described in more detail why a 12-month look back works best. Yet there are those who still believe that because shorter look backs are more sensitive to market changes, they should give better results.

A 3-month look back performed well over the past 20 years. If you were to look only at that data, you might feel reassured about using a shorter look back period. But this starts to unravel in 1979-80 when the markets were very choppy. Choppiness gives both lower returns and higher drawdowns. Here are GEM results from Jan 1971 through Aug 2018 comparing 12 and 3-month look back periods.


                       12 MONTHS    3 MONTHS
CAGR (%)                    16.6        13.6
ANNUAL STD DEV (%)          12.2        11.8
SHARPE RATIO                0.94        0.74
WORST DRAWDOWN (%)         -16.8       -23.3

Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index.
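For readers who want to reproduce the four statistics in the table above on their own return series, here is a minimal sketch. The conventions are assumptions, since this post does not spell them out: monthly returns are decimals, volatility is annualized by the square root of 12, the Sharpe ratio uses average excess return over a monthly risk-free series (zero if none is supplied), and drawdown is measured on month-end values only. The function name is hypothetical.

```python
import math
import statistics


def performance_summary(monthly_returns, monthly_risk_free=None):
    """Summarize monthly returns given as decimals (0.01 means 1%).

    Conventions here are assumed for illustration, not taken from the article.
    """
    n = len(monthly_returns)
    if n < 2:
        raise ValueError("need at least two monthly returns")
    if monthly_risk_free is None:
        monthly_risk_free = [0.0] * n  # assumption: zero risk-free rate
    if len(monthly_risk_free) != n:
        raise ValueError("risk-free series must match the return series length")

    growth = 1.0          # value of 1 unit invested at the start
    peak = 1.0            # highest month-end value seen so far
    worst_drawdown = 0.0  # most negative peak-to-trough decline
    for r in monthly_returns:
        growth *= 1.0 + r
        peak = max(peak, growth)
        worst_drawdown = min(worst_drawdown, growth / peak - 1.0)

    cagr = growth ** (12.0 / n) - 1.0
    annual_std = statistics.stdev(monthly_returns) * math.sqrt(12)

    excess = [r - rf for r, rf in zip(monthly_returns, monthly_risk_free)]
    excess_std = statistics.stdev(excess) * math.sqrt(12)
    if excess_std > 0:
        sharpe = statistics.fmean(excess) * 12 / excess_std
    else:
        sharpe = float("nan")  # undefined when returns never vary

    return {
        "cagr": cagr,
        "annual_std": annual_std,
        "sharpe": sharpe,
        "worst_drawdown": worst_drawdown,
    }


# Hypothetical four-month series, only to show the call.
print(performance_summary([0.10, -0.20, 0.05, 0.05]))
```

Different sources annualize and measure drawdowns differently, so numbers computed this way may not match a published table exactly.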

There are also tax advantages to a 12-month look back. Using 12 months, GEM trades on average 1.3 times per year. Seventy percent of GEM’s gains are long-term, while 100% of its losses are short-term. There is more trading and these tax advantages disappear if you use a shorter look back period.

Source: Greyserman and Kaminski, Trend Following with Managed Futures, John Wiley & Sons, Inc., 2014

Using a 12-month lookback also reduces data mining concerns and seasonality bias.

EAFE instead of ACWI ex-U.S. 


There are websites that show dual momentum results and issue signals using an ETF for the MSCI EAFE index rather than the broader MSCI ACWI ex-U.S. index. Emerging markets and Canada are missing from the MSCI EAFE index. They make up 24% of the MSCI ACWI ex-US index. Here are the GEM results using each index since 1989. The MSCI ACWI ex-U.S. index was introduced in December 1988.


                       ACWI ex-US    EAFE
CAGR (%)                     15.6    14.3
ANNUAL STD DEV (%)           13.1    13.0
SHARPE RATIO                 0.84    0.73
WORST DRAWDOWN (%)          -17.0   -17.0


GEM earned 130 basis points more in annual return using the MSCI ACWI ex-U.S. rather than the MSCI EAFE index holding all else constant. I would stay away from using MSCI EAFE ETFs if at all possible.
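The arithmetic behind the 130 basis points, and what such a gap means when compounded, can be checked with a short sketch. The 29-year horizon is an assumed round number for "since 1989", and the function names are hypothetical.

```python
def basis_point_gap(cagr_a_pct, cagr_b_pct):
    """Difference between two annual returns in basis points (1 point = 100 bps)."""
    return round((cagr_a_pct - cagr_b_pct) * 100)


def terminal_wealth_ratio(cagr_a_pct, cagr_b_pct, years):
    """How much larger ending wealth is under A than under B after `years`."""
    return ((1 + cagr_a_pct / 100) / (1 + cagr_b_pct / 100)) ** years


gap = basis_point_gap(15.6, 14.3)
ratio = terminal_wealth_ratio(15.6, 14.3, years=29)  # assumed horizon
print(f"gap: {gap} bps per year")
print(f"ending wealth with ACWI ex-US is {ratio:.2f} times the EAFE version")
```

A gap that looks small on an annual basis becomes a large difference in ending wealth over decades.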

U.S. small and mid-cap stocks


Some think using a total U.S. stock market index should do better than the S&P 500 index since broader indices include small and midcap stocks. We can easily check that out.

The broader U.S. stock indices have performed similarly to one another. I will use the Russell 3000 since it has the longest price history.


Here are GEM results comparing the S&P 500 to the Russell 3000 from when the Russell 3000 began trading in January 1979.


                       S&P 500    Russell 3000
CAGR (%)                  18.7           17.06
ANNUAL STD DEV (%)        13.9            13.9
SHARPE RATIO              0.99            0.86
WORST DRAWDOWN (%)       -16.8           -23.3


Broader indices may give worse results because there is no small cap premium, despite what many think. See here and here for more on this.

Long-term bonds

Some prefer to use long-term Treasury bonds as a safe harbor when they exit stocks because stocks and bonds have been negatively correlated. They then think they will earn better returns being in long duration bonds when stocks are weak.

It is true that stocks and bonds have been negatively correlated in recent years. But that has not always been the case. In fact, stock-bond correlations are as likely to be positive as negative over the long run.

Source: “Equity-Bond Correlation: A Historical Perspective”, Graham Capital Management Research Note, September 2017
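A claim like this is usually checked with rolling correlations. Here is a minimal sketch of that calculation, assuming two equal-length lists of monthly returns and a 36-month window. The window length, the function names, and the toy data are assumptions for illustration and are not taken from the Graham Capital note.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series never moves
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(stock_returns, bond_returns, window=36):
    """Correlation over each trailing `window` of months, oldest window first."""
    if len(stock_returns) != len(bond_returns):
        raise ValueError("series must have the same length")
    return [
        pearson(stock_returns[end - window:end], bond_returns[end - window:end])
        for end in range(window, len(stock_returns) + 1)
    ]


def share_positive(correlations):
    """Fraction of windows with a positive correlation, ignoring undefined ones."""
    valid = [c for c in correlations if not math.isnan(c)]
    if not valid:
        raise ValueError("no valid correlations")
    return sum(c > 0 for c in valid) / len(valid)


# Toy data only: bonds moving exactly opposite to stocks.
stocks = [0.01, -0.02, 0.03, -0.01, 0.02, 0.00]
bonds = [-r for r in stocks]
print(rolling_correlation(stocks, bonds, window=3))
```

Running `share_positive` on a long history is one simple way to see how often the sign of the stock-bond relationship has flipped.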

Long-term bonds have had similar long-run returns to intermediate bonds despite their higher volatility. The return of intermediate bonds includes compensation for reinvestment risk once these bonds mature. Long bonds do not have this reinvestment risk and do not require a risk premium for it. Therefore, their relative return suffers.

Here are GEM results with the Barclays U.S. Aggregate bond index versus the Barclays 20 Year Treasury bond index from when both became available in January 1976.


                       AGG Bonds    20 YR Treasuries
CAGR (%)                    17.5                18.1
ANNUAL STD DEV (%)          12.5                13.6
SHARPE RATIO                0.98                0.94
WORST DRAWDOWN (%)         -16.8               -17.0


On a risk-adjusted basis, 20-year Treasuries did not outperform intermediate-term bonds despite low stock-bond correlations and a strong bull market in bonds over the past 35 years. Under normal market conditions, long-term bonds with their higher risk are more likely to be at a disadvantage as a safe asset. See here for more on long-term bonds as a crisis asset.

Other markets

Whenever a sector or factor fund is strong, I get emails asking if I have looked at adding it to GEM. To answer these questions, I used long-term index data to see if adding any of the following would have improved GEM results: small cap, value, growth, low volatility, quality, stock momentum, equal weight, REITs, commodities, and the NASDAQ 100. None added value to GEM.

There are some quantitative models that get better results by including larger than market cap allocations to emerging markets (EMs) in their backtests. EMs did particularly well in the late 1990s and mid 2000s when newly liberated EM countries had rapid export growth and large capital flows.

EMs improve GEM results if you use only the MSCI EM data, which begins in 1988. When I added additional pre-MSCI EM data to GEM, the results were disappointing. Drawdowns and volatility increased substantially. The same thing happened with sector rotation when I obtained an additional 20 years of sector data back to 1973. These examples again show the importance of longer data sets. The statistician W. Edwards Deming said, “In God we trust. All others bring data.”

Recent data mining example


I have gotten emails recently asking about a strategy called “Accelerating Dual Momentum” that was inspired by GEM. This model looks at 20 years of mutual fund data. Based on that data, the developer uses long-term Treasury bonds as his safe harbor asset and a combination of short look back periods. We have seen that these may not be the best choices based on longer term data. There is also good reason for taxable accounts to avoid short look back periods.

The developer also questions using the MSCI ACWI ex-U.S. index of large and mid-cap stocks as the best vehicle for non-U.S. equities. His argument is that companies are more globalized now, so the correlation between U.S. and non-U.S. companies is higher than it once was. This may be true. But the following chart from my website’s FAQ page shows something else happening. The relative strength difference between U.S. and non-U.S. equities is due mostly to macro-economic conditions reflected in the strength or weakness of the U.S. dollar.


Relative strength momentum is aided by a strong home country bias that keeps U.S. investors from fully exploiting strength in non-U.S. equities when they are outperforming U.S. equities.

The developer’s solution to what he perceives as a correlation problem is to use a small to midcap international stock fund in place of a large cap international fund.

The starting date of the small to midcap MSCI ACWI ex-U.S. index is May 1994. There is not enough history there to make a good assessment of non-U.S. small to midcap performance. But we know that small cap international stocks do not show a statistically significant size premium as noted here.

An ACWI small to midcap ex-U.S. index fund began only in 2009. So in place of an index fund, the developer uses an actively managed small to midcap international fund. Looking at a large universe of funds, one can always after the fact find a few actively managed funds that have outperformed similar index funds. But there is a problem with that approach. Fund performance may be persistent over the short-run due to momentum. But over the long run, there is no meaningful relationship between past and future fund performance. See here for more on this.

At the end of his discussion, the developer presents a chart showing the rolling real return of GEM versus his model back to 1871. But prior to 1970, he does not use international stocks. There was thus no dual momentum. There is only absolute momentum with the S&P 500 and government bonds (10-year bonds after 1953 and longer-term government bonds prior to that). He uses bond yields and not their total return. He further misrepresents GEM by using EAFE instead of ACWI ex-US when he does use international stocks from 1970 forward. We saw earlier that the narrower EAFE index understates GEM performance by 130 bps annually from when ACWI ex-US was introduced in late 1988. Selection bias is where you use a subset of your data to make your results look better or, in this case, worse than they really are.

Even after all his data mining efforts and distortions of GEM and dual momentum, the developer shows the trailing 30-year annualized real return of both models since the mid-1990s to be about the same. Rather than trying to overtweak performance by manipulating limited amounts of data, others would do better keeping in mind the simple principles and behavioral biases that make dual momentum successful.

What is surprising

It is not surprising that people try to develop or modify models using only 20 or so years of data. Most do not realize how much uncertainty exists with 20 years of data. What is surprising is how many people ignore evidence given to them. Much of the information here is in my book. All the information and more is on my website’s FAQ page. But people still cling to their prior beliefs and ignore contrary evidence.

Perhaps this should not surprise me. In the 1960s and several times since then, academics have shown that actively managed funds underperform passive index funds. Yet fifty years later, only 35% of total U.S. fund assets are in index strategies. Behavioral biases are surely hard to overcome.