August 12, 2017

Book Review: Standard Deviations, Flawed Assumptions, Tortured Data and Other Ways to Lie with Statistics

Years ago, when asked to recommend some good investment books, I often suggested ones dealing with the psychological issues influencing investor behavior. These focused on investor fear and greed, showing “what fools these mortals be.” Here are examples: Devil Take the Hindmost: A History of Financial Speculation by Edward Chancellor, and Extraordinary Popular Delusions and the Madness of Crowds by Charles MacKay.

In recent years, there has been a wealth of similar material in the form of behavioral finance and behavioral economics. I now suggest that investors do an internet search on these topics. To better understand investing and investors, you should be familiar with concepts like herd mentality, recency bias, confirmation bias, overconfidence, overreaction, loss aversion, and the disposition effect.

An enjoyable introduction to this field is Richard Thaler’s Misbehaving: The Making of Behavioral Economics. Here is an extensive bibliography for those who want to do a more in-depth study.

Importance of Statistical Analysis

Now that quantitative investment approaches (factors, indexing, rules-based models) are becoming prominent, you need to also be able to properly evaluate quantitative methods. A lively book on the foundations of statistical analysis is The Seven Pillars of Statistical Wisdom by Stephen Stigler. An engaging and more nuanced view of the subject is Robert Abelson's Statistics As Principled Argument.

What I recommend most is Standard Deviations, Flawed Assumptions, Tortured Data and Other Ways to Lie with Statistics by economist Gary Smith. Everyone should benefit from reading this book.


Smith’s premise is that we yearn to make an uncertain world more certain and to predict the unpredictable. This makes us susceptible to statistical deceptions. The investment world is especially susceptible now that it is more model-based and data driven.

It is easy to lie with statistics but hard to tell the truth without them. Smith takes up the challenge of sorting good from bad using insightful stories and entertaining examples. Here are some salient topics with real-world cases that Smith covers:
 
•    Survivorship and self-selection biases
•    Overemphasis on short-term results
•    Underestimating the role of chance
•    Results distorted by self-interest
•    Correlation is not causation
•    Regression to the mean
•    Law of small numbers
•    Confounding factors
•    Misleading graphs
•    Gambler's fallacy

Critical Thinking

Smith is not afraid to point out mistakes by the economic establishment.  He mentions  errors by University of Chicago economist Steven Levitt of Freakonomics fame.  Smith also discusses research made popular by two Harvard professors, Reinhart and Rogoff. The professors concluded that a nation’s economic growth is imperiled when its ratio of government debt to GDP exceeds 90%. Smith points out serious problems with their work due to inadvertent errors, selective omissions of data, and questionable research procedures.

Publish-or-perish pressure can contribute to errors in academic research. Economic self-interest, as in medical and financial research, can also cause errors. Smith helps us see how important it is to look at research critically instead of blindly accepting what is presented.

Theory Ahead of Data

Throughout his book Smith focuses on the potential perils of deriving theories from data. He gives examples of the Texas sharpshooter fallacy (aka the Feynman trap). Here, a man with a gun but no skill fires a large number of bullets at the side of a barn. He then paints a bullseye around the spot with the most bullet holes. In another version, the sharpshooter fires lots of bullets at lots of targets. He then finds a target he hit and forgets the rest. Predicting what the data looks like after examining the data is easy but meaningless. Smith says:

Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense, and it needs to be tested with uncontaminated data.
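To see how easily chance produces a "discovery," here is a minimal sketch in which the trading rule, lookback range, and seed are all hypothetical, chosen only for illustration. It ransacks purely random returns for the best trend rule, which then evaporates on fresh random data:

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed, illustration only

# Purely random daily "returns" -- there is no pattern to find.
train = rng.normal(0, 0.01, 2500)
test = rng.normal(0, 0.01, 2500)

def rule_profit(returns, lookback):
    """Toy trend rule: hold the next day only after a positive trailing sum."""
    total = 0.0
    for t in range(lookback, len(returns)):
        if returns[t - lookback:t].sum() > 0:
            total += returns[t]
    return total

# "Discover" the best lookback by ransacking the training data.
best = max(range(2, 200), key=lambda lb: rule_profit(train, lb))
print(f"best in-sample lookback: {best}")
print(f"in-sample profit:     {rule_profit(train, best):+.3f}")
print(f"out-of-sample profit: {rule_profit(test, best):+.3f}")
```

The best in-sample rule shows a profit by construction. The out-of-sample run is what Smith means by testing with uncontaminated data.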

Financial market researchers often use data to help invent a theory or develop a trading method. Generating a theory or method by ransacking data is a perilous undertaking. Tortured data will always confess something. Data pillaged without theory leads to bogus inferences.

Data grubbing can uncover patterns that are nothing more than coincidence. Smith points to the South Sea stock bubble as an example where investors saw a pattern: buy the stock at a certain price and sell it at a higher price. But they didn’t think about whether it made any sense.

Smith addresses those who take a quantitative approach to investing. He says quants have “a naïve confidence that historical patterns are a reliable guide to the future, and a dependence on theoretical assumptions that are mathematically convenient but dangerously unrealistic.”

Common Sense

Smith’s solution is to first make sure that one’s approach makes sense. He agrees with the great mathematician Pierre-Simon Laplace, who said probabilities are nothing but common sense reduced to calculation.

Smith says we should be cautious of calculating without thinking. I remember a case at the Harvard Business School where we looked at numbers trying to figure out why Smucker’s new ketchup was not doing well as a mass market item. The reason turned out to be that no one wanted to buy ketchup in a jam jar. We need to look past the numbers to see if what we are doing makes sense.

I often see data-derived trading approaches fit to past data without considering whether the approaches conform to known market principles. If one does come up with models having sensible explanations, they should then be tested on new data not corrupted by data grubbing. Whenever you deviate from the market portfolio, you are saying you are right and the market is wrong. There should be some good reasons and plenty of supporting data for believing this is true. [1]

Self-Deception

I have seen some take the concept of dual momentum and make it more complicated with additional parameters. They may hold back half their data for model validation. They call this out-of-sample testing, but that is a questionable call.

Do you think you would hear about these models if their “out-of-sample” tests showed poor results? Would they discard their models and move on? Chances are they would search for other parameters that gave satisfactory results on both the original and holdout data.

Momentum is robust enough that a test on holdout data might look okay right from the beginning. But that is likely due to momentum’s overall strength and pervasiveness. It is questionable whether you really have something better than a simpler approach. You may have just fit past data better, which is easy to do by adding more parameters.

Keep It Simple

There are some areas I wish Smith had addressed more. First is the importance of having simple models, à la Occam’s razor. Simple models dramatically reduce the possibility of spurious results from overfitting data.

Overfitting is a serious problem in financial research. With enough parameters, you can fit almost any data to get attractive past results. But these usually do not hold up going forward.

John von Neumann said that with four parameters he could fit an elephant, and with five he could make it wiggle its trunk. In the words of Edsger Dijkstra, “Simplicity is a great virtue, but it requires hard work to achieve it and education to appreciate it. And...complexity sells better.”
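Von Neumann’s quip is easy to demonstrate. In this hypothetical sketch (the data and seed are invented for illustration), higher-degree polynomials fit noisy data ever more closely in-sample while predicting a fresh draw from the same process ever more poorly:

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical data, illustration only
x = np.linspace(0, 1, 40)
y = 2 * x + rng.normal(0, 0.3, 40)        # true relation: linear plus noise
y_new = 2 * x + rng.normal(0, 0.3, 40)    # fresh draw from the same process

for degree in (1, 4, 12):
    coefs = np.polyfit(x, y, degree)                        # fit in-sample
    in_mse = np.mean((np.polyval(coefs, x) - y) ** 2)       # in-sample fit
    out_mse = np.mean((np.polyval(coefs, x) - y_new) ** 2)  # predictive fit
    print(f"degree {degree:2d}: in-sample MSE {in_mse:.3f}, "
          f"out-of-sample MSE {out_mse:.3f}")
```

More parameters always improve the in-sample fit. Whether they improve anything else is the question.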

In Data We Trust

Smith briefly mentions the law of small numbers popularized by Kahneman and Tversky. But I wish Smith had gone more into the importance of abundant data and how it helps us avoid biased results.

According to de Moivre’s observation, accuracy is proportional to the square root of the number of observations. To have half the standard error, you need four times the data.
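In symbols, the standard error of an estimate based on $n$ independent observations with volatility $\sigma$ is

$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}}, \qquad \frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\,\frac{\sigma}{\sqrt{n}},$$

so quadrupling the number of observations halves the standard error.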

The stock market has had major regime changes about every 15 years. A model that worked well during one regime may fail during the next. We want a robust model that holds up across all regimes. To determine that, you need plenty of data. In the words of Sherlock Holmes, “Data! Data! Data! I can’t make bricks without clay!”

Theories Without Data

Smith talks a lot about the issue of data grubbing without theory. But he also mentions the opposite problem of theory without adequate data analysis. Smith cites as an example the limits-to-growth theories of Malthus, Forrester, and Meadows. Smith contends they did not make any attempt to see whether historical data supported or refuted their theories. Most economists now dismiss these theories.

But Smith may not have considered later information on this topic. Here and here are sources that make this subject more provocative. Smith says it is good to think critically. This is true even when approaching a book as thoughtful as Smith’s.

[1] Even if a model makes sense, you need to make sure it continues to do so. Investment approaches can become overutilized and ineffective when they attract too much capital. See here and here.

July 14, 2017

Trend Following Research

There have been hundreds of research papers on relative strength momentum since the seminal work by Jegadeesh and Titman in 1993. [1] Relative momentum has been shown to work both in- and out-of-sample, within and across most asset classes. Theoretical results have been consistent, persistent, and robust.

Research on trend following absolute momentum got a much later start. The first paper on “Time Series Momentum” was by Moskowitz, Ooi, and Pedersen (2012). [2]


This was followed by my "Absolute Momentum: A Simple Rules-Based Strategy and Universal Trend Following Overlay" in 2013.


Since then, there have been other good absolute momentum research papers. But absolute momentum and trend following in general have still not gotten the attention they deserve. Major fund sponsors offer single or multi-factor products using relative momentum. But not a single one incorporates absolute momentum as a trend filter.

Absolute momentum can enhance expected returns just like relative momentum. But, unlike relative momentum, absolute momentum can also reduce expected downside risk exposure. It performs best in extreme market environments, making it an excellent portfolio diversifier.
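As a rough sketch of the mechanics (the 12-month lookback on monthly data is an assumption for illustration, not the specific rule of any paper or product), an absolute momentum filter holds the risky asset only while its trailing return beats cash:

```python
import numpy as np

def absolute_momentum(asset, tbill, lookback=12):
    """Toy absolute momentum overlay on monthly return series (equal-length
    arrays). Hold the asset when its trailing `lookback`-month return beats
    T-bills; otherwise hold T-bills."""
    asset, tbill = np.asarray(asset), np.asarray(tbill)
    held = []
    for t in range(lookback, len(asset)):
        trail_asset = np.prod(1 + asset[t - lookback:t]) - 1
        trail_tbill = np.prod(1 + tbill[t - lookback:t]) - 1
        # Positive trend versus cash: stay invested. Otherwise step aside.
        held.append(asset[t] if trail_asset > trail_tbill else tbill[t])
    return np.array(held)
```

The filter's return enhancement comes from sidestepping extended downtrends, which is why its value shows up most in extreme markets.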

Moving Averages

Let us look at other trend following research over the past few years. In 2014, Lemperiere et al. applied exponential moving averages to futures since 1960. They examined spot commodities and stock indices since 1800. [3] Their “Two Centuries of Trend Following” showed a t-statistic of 5 on excess returns since 1960 and a t-statistic of 10 on excess returns since 1800. These results were after accounting for the upward drift of the markets. The effect was stable across time and asset classes. There was also no degradation of long-term trend strength in recent years.


In "Timing the Market with a Combination of Moving Averages," Glabadanidis (2016) presented ample evidence of the timing ability of a combination of simple moving averages applied to U.S. stocks.

A comprehensive treatment of moving averages is in the book Market Timing with Moving Averages: The Anatomy and Performance of Trading Rules  by Valeriy Zakamulin. This book will be published in September. Zakamulin has already written academic papers on moving average methods.

In the book, Zakamulin analyzed eight different types of moving averages along with absolute momentum. He applied these to stocks, stock indices, bonds, currencies, and commodities since 1857. He showed that these strategies can protect portfolios from losses when needed the most. Zakamulin's conclusion was that trend following represents a prudent investment approach for medium and long-term investors.

What was especially interesting to me was Zakamulin’s 159-year test on the S&P Composite Index. He looked at the frequency of positive results using a 10-year rolling performance window. Absolute momentum came in first and second place among the strategies tested. It also held 7 out of the top 10 highest positive rankings.

Absolute Momentum

There have been at least a half-dozen noteworthy studies during the past few years that focused on absolute momentum.

In Trend Following with Managed Futures, Greyserman and Kaminski (2014) applied absolute momentum to stock indices, bonds, commodities, and currencies all the way back to 1223! They held assets long or short depending on the trend of the last 12 months. The authors found that trend following was much more effective than buy-and-hold. Sizes of the five largest drawdowns were also reduced by an average of one-third.

In “The Trend is Your Friend: Time-Series Momentum Strategies Across Equity and Commodity Markets,” Georgopoulou and Wang (2016) found that absolute momentum was significant, consistent, and robust across conventional asset classes from 1969 to 2015.


In “Trend Following: Equity and Bond Crisis Alpha,” Hamill, Rattray & Van Hemert (2016) applied absolute momentum to global diversified markets from 1960 through 2015. Absolute momentum performed consistently and was particularly strong during the worst equity and bond environments.



In “The Enduring Effect of Time Series Momentum on Stock Returns Over Nearly 100 Years,” D’Souza et al. (2016) found significant profits from absolute momentum applied to individual U.S. stocks from 1927 to 2014 and to international stocks from 1975. Unlike relative momentum, absolute momentum did well in both up and down markets. Absolute momentum fully subsumed relative momentum and was not subsumed by any other factor. The combination of relative and absolute momentum (dual momentum) earned a striking 1.88% per month (t-statistic 5.6).


In “Two Centuries of Multi-Asset Momentum (Equities, Bonds, Currencies, Commodities, Sectors and Stocks),” Geczy and Samonov (2017) applied relative momentum to country indices, bonds, currencies, commodities, sectors, and U.S. stocks over the past 215 years. But they also showed that absolute momentum (which they called “trend”) had highly significant positive results in every asset class.


In “Time-Series and Cross-Sectional Momentum Strategies under Alternative Implementation Strategies,” Bird, Gao, and Yeung (2017) found that both relative and absolute momentum generated positive returns in 24 major stock markets from 1990 through 2012. But absolute momentum was clearly superior. With appropriate cutoffs, absolute momentum outperformed in all 24 markets. The authors concluded that momentum is best implemented using absolute momentum.


A recent study of absolute momentum by Hurst, Ooi, and Pedersen (2017) is an extension of their earlier paper, “A Century of Evidence on Trend-Following Investing.” In it, the authors studied the performance of trend-following across global markets (commodities, bond indices, equity indices, currency pairs) since 1880. They found that in each decade since 1880, absolute momentum delivered positive average returns. This was accomplished with low correlation to traditional asset classes and after adjustments for fees and trading costs.

Absolute momentum performed well across different macro environments and in 8 out of 10 of the largest crisis periods. It performed best during extreme up and down markets in U.S. stocks.


  
Implications

The non-acceptance of absolute momentum as a trend filter by most fund sponsors in the face of strong evidence of its effectiveness has three likely explanations. The first is that research information disperses very slowly through the investment community.  The second is that investors prefer to follow the crowd, even if that means losing more in down markets. They may also be especially averse to experiencing trend following whipsaws. The third reason is the possible long-standing bias against trend following. This has been difficult to dislodge, despite strong evidence of its effectiveness.

Based on the above research results, trend may very well be the strongest factor. Yet it is also the most ignored factor. This is good news for those of us using it.

There are serious questions about the real-time efficacy of factor-based investing, especially for the future. This is because of overexploitation and capacity constraints with some factors, as explained here and here. It looks like trend followers have nothing to worry about there.

[1] Many academic papers refer to relative momentum as cross-sectional, even though some applications are cross-asset, not cross-sectional.
[2] Academic papers often refer to absolute momentum as time series momentum. But all momentum is based on time series. Geczy and Samonov (2017) repeatedly characterize all momentum as time series momentum in their latest paper.
[3] For a theoretical justification of trend following, see Zhu and Zhou (2009), “Technical Analysis: An Asset Allocation Perspective on the Use of Moving Averages.”

June 10, 2017

Real Time Factor Performance

According to S&P DJ Indices, 92% of all actively managed stock funds failed to beat their benchmarks over the past 15 years. This should come as no surprise. Similar results were published more than 20 years ago. This information has caused a move away from active stock selection and toward index funds or systematic approaches.

Money managers have recently moved more in the direction of factor-based and so-called smart beta investing. But as I pointed out in my February blog post, “Factor Zoo or Unicorn Ranch?”, there are some serious issues with this type of investing, not the least of which is the shortfall between actual and theoretical returns.

Theoretical results are in academic papers and all over the internet. Very little information is available on the real-time performance of factor-based investments.

Lack of Real Time Performance Studies

The Loughran and Houge study in 2006 was a rare look at real-time factor performance. In it, the authors showed that there was no significant difference in performance between U.S. value and growth mutual funds from 1965 through 2001. The authors concluded by saying the idea that value generates superior long-run performance is an “illusion”.

This was the only study I could find that examined actual rather than theoretical results of a popular investment factor. But now there is another study. Last month Arnott, Kalesnik, and Wu (AKW) published an article called, “The Incredible Shrinking Factor Return.” 

AKW examined actual versus theoretical performance of four factors well known to investors, using 5,323 mutual funds from January 1991 through December 2016. These factors are market, value, size, and momentum.

Two Step Approach

To determine their results, AKW did a two-stage (Fama-MacBeth) regression. In stage 1, they regressed mutual fund returns against the excess return of each factor to estimate each fund’s average factor loadings. In stage 2, they regressed fund returns against the average factor loadings to get the return of each fund per unit of factor exposure. These realized returns were then compared to the theoretical factor returns.
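Here is a hedged sketch of the two-pass idea using plain OLS (the array shapes and variable names are my assumptions, not AKW's exact procedure):

```python
import numpy as np

def fama_macbeth(fund_returns, factor_returns):
    """Two-pass regression. fund_returns: T x N array of fund excess
    returns; factor_returns: T x K array of factor excess returns.
    Returns the average realized premium per unit of factor exposure."""
    T, N = fund_returns.shape
    X = np.column_stack([np.ones(T), factor_returns])

    # Pass 1: time-series regression per fund -> factor loadings (betas).
    betas = np.array([np.linalg.lstsq(X, fund_returns[:, i], rcond=None)[0][1:]
                      for i in range(N)])                     # N x K

    # Pass 2: each period, cross-sectional regression of fund returns on
    # the loadings -> that period's return per unit of factor exposure.
    Z = np.column_stack([np.ones(N), betas])
    premia = np.array([np.linalg.lstsq(Z, fund_returns[t], rcond=None)[0][1:]
                       for t in range(T)])                    # T x K

    return premia.mean(axis=0)
```

Averaging the cross-sectional slopes over time is what turns noisy period-by-period estimates into a realized premium per factor.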

This approach is a good one since it incorporates factor covariances to determine factor premia. Comparing actual to theoretical performance can reveal data mining, selection, and survivorship biases. It can also identify the effects of management fees, bid-ask spreads, and transaction costs.

In many academic papers, more than half the profits come from shorting stocks. But shorting may be expensive and sometimes impossible to do. Looking here at long-only mutual fund performance removes those unrealistic profits.
  
Performance Shortfalls

Here are AKW’s regression results using 25 years of fund data from January 1991 through December 2016:


We see a 50% shortfall in the performance of the market factor. This is not surprising. For many years, other research has shown this effect. High beta tends to underperform low beta on a risk-adjusted basis going forward.
 
In the AKW regression, the size factor shows a small but insignificant improvement in actual versus theoretical returns. This may be data noise.

Value is the most commonly used factor. AKW’s regression shows that value fund managers captured only 60% of the value premium since 1991. This complements the recent findings of Kok, Ribando, and Sloan (2017). They claim that outside the initial evaluation period of 1963 to 1981, the evidence of a value premium is weak to non-existent. Value is now suspect on both a theoretical and an actual basis.

The largest shortfall AKW discovered is with momentum. The realized momentum return of live portfolios was close to zero compared to a theoretical return of around 6% per year. Stock momentum alpha has not been positive since 2002. AKW say transaction costs play a major role as the source of slippage between theoretical and realized factor returns. In their words, “…higher turnover strategies, such as momentum, have trading costs that may be large enough to wipe out the premium completely if enough money is following the strategy.”

AKW conclude their study by asking: if 10,000 quants all pursue the same factor tilts, how likely is it that these factors will add value?

Skepticism and Pushback

Skepticism toward new information may be a good thing. More research and analysis can help advance what we know about the world.

Corey Hoffstein offers a critical response to the AKW study in an article he calls “A Simulation Based Rebuttal to Research Affiliates.” Corey points out one should not overlook style drift as a significant source of error. Return estimates can be inaccurate if managers switch investing styles. In support of this, Corey shows 3-year rolling betas versus full period betas for the Vanguard Wellington Fund (VWELX). His data is from January 1994 through July 2016.
  
Corey’s logic is like saying one should be suspicious of the 10% average return of the S&P 500 index over the past 50 years because yearly returns have varied from -37% to 38%.  One would never expect to earn 10% every year going forward.

Research by pension consultants shows that 3-year performance by equity managers is mean reverting. This may explain some of the difference between full period and 3-year rolling window returns. With 3-year rolling returns, some, and perhaps a lot, of the variation in returns may be random noise.

In addition to the full data set, AKW looks at an expanding window of returns that incorporates all the data available up to that point. An expanding window regression converges to the full sample factor betas toward the end of the sample period. When AKW compares expanding window regressions to full period ones, they get comparable results.

Corey’s second argument is that you can attribute a portion of the shortfall AKW identified to estimation error. Factor loading estimates are noisy. Estimation error in the independent variables creates a pull toward zero in the beta coefficients. This causes a downward bias in the factor premia estimates from the second stage of AKW’s regression.
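The attenuation effect itself is easy to demonstrate. In this hypothetical simulation (sample size, seed, and noise levels are invented for illustration), noise added to the regressor shrinks the estimated slope toward zero by the classic factor var(x) / (var(x) + var(noise)):

```python
import numpy as np

rng = np.random.default_rng(3)  # hypothetical simulation, illustration only
n, true_beta = 100_000, 1.0
x_true = rng.normal(0, 1, n)
y = true_beta * x_true + rng.normal(0, 1, n)

for noise_sd in (0.0, 0.5, 1.0):
    x_obs = x_true + rng.normal(0, noise_sd, n)  # regressor measured with error
    beta_hat = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
    theory = true_beta / (1 + noise_sd ** 2)     # classic attenuation factor
    print(f"noise sd {noise_sd}: estimated beta {beta_hat:.2f} "
          f"(theory {theory:.2f})")
```

The open question is not whether this bias exists, but how large it is in AKW's actual data.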

Corey offers no direct evidence of how much bias there is in the AKW regression. Instead, Corey conducts a simulation of 1,000 hypothetical funds using normally distributed betas.

There are some good reasons why simulations are rarely used in financial markets research. Simulations are dependent on distributional assumptions that are usually unrealistic with financial markets. Market returns are not independent, and their underlying distributions may be non-stationary.

In his simulation, Corey assumes that returns are normally distributed, which is not the case for mutual fund returns. Nor does Corey show that estimation errors have the same distribution scale as the betas themselves. If one is going to use simulated data, it would be better to run additional simulations with other distributions.

Academic researchers prefer to use as much real data as they can rather than simulated data. The AKW regression uses 25 years of actual mutual fund data, which should be enough to minimize the influence of tracking error on AKW's results.

Corey uses only one fund, VWELX, with his simulation to estimate how much downward bias there might be in the AKW regression. He looks at the differences in standard deviation between full period and 3-year rolling estimates of VWELX’s beta coefficients.


Corey concludes there may be significant downward bias in the AKW regression estimates. But Corey does not explain why there are different degrees of slippage for the different factors. Even if you accept his simulation, results from 1994 through 2016 for only one fund may just be an outlier. Other funds may conform well to AKW’s results. In the end, Corey says, “our results do not fully refute AKW’s evidence”.

AKW also mention this downward bias in betas due to estimation error in the independent variables. They conduct six different robustness tests that help mitigate that error, and those results are consistent with AKW’s core findings.

We Are All Biased

I applaud Corey’s skepticism with regard to unexpected research findings. I also applaud him when he says, “…published research in finance is often like a back test. Rarely do you see any that does not support the firm’s products or existing views.” 

We see that with many advisors and fund managers, as well as throughout the blogosphere. But we should keep in mind that this logic works both ways. Those who adhere to alternative approaches are often the ones who challenge new ideas. We should apply some healthy skepticism to both sides of such controversies.

April 12, 2017

Lessons Learned from Sports Investing

Wee Willie Keeler was one of the greatest contact hitters in baseball. One year, 30 of Keeler’s 33 home runs were inside the park. Keeler’s motto was, “Keep your eye clear, and hit ‘em where they ain’t.”

I have always tried to do that by focusing on underexploited investment opportunities. In the 1970s that meant stock options. In the 1980s I had success with managed futures.

Also in the 1980s I had a family member who bet on football games. He knew I invested using data-driven quantitative methods, so he asked me to take a look at betting NFL home underdogs. I was reluctant at first but then obliged him. I was surprised to discover profit opportunities there.

I became intrigued with the possibility of exploiting inefficiencies in that market. There were no computer-based sports databases back then and almost no published sports research. So I hired a few UC Berkeley students to go through the data and help me test betting strategies.

After we had a stable of successful angles, I put one of these students on a bus to Reno each weekend. Encouraged by our early results, I expanded this research to include all sports, both pro and college.

I focused on areas where the linemakers were not paying enough attention, such as game time weather conditions or mean reversion in team stats. My wife never understood why I was always so interested in the wind direction at Wrigley Field.

We even came up with a player stat based Monte Carlo simulator that predicted the outcome of every baseball game. It gave us an edge early in the season before others figured out the impact of all the off-season player trades.
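To give a flavor of how such a simulator works (the run-scoring model, inputs, and seed below are hypothetical; the real model was built from detailed player stats), you simulate the game many times and count how often each side wins:

```python
import numpy as np

def home_win_probability(home_exp_runs, away_exp_runs, sims=100_000):
    """Toy Monte Carlo: draw each team's runs from a Poisson at the team's
    expected-runs rate and count how often the home side wins."""
    rng = np.random.default_rng(4)            # hypothetical seed
    home = rng.poisson(home_exp_runs, sims)
    away = rng.poisson(away_exp_runs, sims)
    wins = (home > away).mean()
    ties = (home == away).mean()
    return wins + 0.5 * ties                  # split ties for a two-way line

# Hypothetical expected-runs inputs built up from player stats:
print(home_win_probability(4.8, 4.2))
```

Compare the simulated win probability to the implied probability in the betting line, and you have an estimate of your edge.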

One of my research assistants continued to analyze sports after graduation. He became Vice President of Basketball Operations for an NBA championship team. He is now VP of Basketball Strategy and Data Analysis with another NBA team.

Our biggest edge came from betting against public biases. For example, teams that showed poor performance in their last game were often under bet in their next game. As with stock market investing, mean reversion and public myopia were rampant in sports wagering. (My best indicator of positive future results has always been when investors overreact to short-term losses or underperformance and close out their accounts.)

Issues with Doing Well

As we continued to do well, some bookmakers would no longer take our action. One let us bet early so he could use that information to move his lines. Another became very friendly and would bring us other bookmakers’ lines as soon as they were available. This way he could know most of our plays and bet right along with us.

Afternoons we would hang large marking boards on the walls of our investment office and write down the betting lines from all our outs. Fortunately, we had very few office visitors!

I had a 12-foot BUD (Big Ugly Dish to get all the satellite feeds) installed at my house and would watch as many games as I could. That was the problem. Sports wagering was causing me to neglect my family, so I set it aside.

Looking back on my sports activities, I realize now that I learned valuable lessons that helped make me a better researcher and investor. Here are some of them:

Always Have an Edge

When I went to Nevada with friends, I would never play casino games. When they asked why, and I said, “I don’t gamble,” they would laugh. They knew I was betting tens of thousands of dollars every week on sporting events. I always wanted a positive expectation of profit before assuming any risk. To me, this is what distinguished what I was doing from gambling.

Most of those who invest actively have little or no edge. You cannot have an advantage doing what everyone is doing. You would generally be better off investing in low-cost passive index funds. As I indicated in my last blog post, factor-based investing may soon pose the same problem. My need for a positive expectation led me instead to the little exploited niche of dual momentum investing.

Do Your Homework

Betting lines, like financial markets, are mostly efficient. The only way to be confident you have an edge is through thorough research using plenty of data. Doing your homework gives you confidence. It helps you stay with your approach despite short-term fluctuations in the value of your investments.

For investors, this can mean not doing what everyone else is doing. Herding is a powerful behavioral instinct, but it can lead to mediocre or worse investment returns. You need to have a healthy dose of skepticism about all strategies that differ from the market portfolio. This also means looking beyond academic studies. You need to be aware of how strategies actually perform in real time in light of scalability and liquidity issues. And you need to consider how they will perform in the future as they attract more capital. [1]

Keep Things Simple

Selection bias, over-optimization, and model overfitting are serious problems in both sports and non-sports research. If you keep tweaking a strategy, it isn’t difficult to find betting angles that look like they have over 60% winners. But these almost never hold up in real time.

With sports wagering you need 52.4% winners to break even after costs. Sports betting legend Lem Banker became wealthy with an overall winning percentage of around 57%.
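The 52.4% figure follows from the standard -110 line, where a bettor risks $110 to win $100. The breakeven win rate $p$ solves

$$100\,p - 110\,(1 - p) = 0 \quad\Rightarrow\quad p = \frac{110}{210} \approx 52.4\%.$$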

Sports research taught me the importance of having a simple strategy with intuitive logic behind it. You also need plenty of backtest data across different markets. This is what led me to momentum investing. It is simple, logical, and supported by over 200 years of backtest validation across nearly all markets.

Have Realistic Expectations

If you win 57% of your sports bets, you are still going to have some serious losing streaks. You just have to accept this. Warren Buffett is often quoted as saying the #1 rule of investing is to not lose money, and the #2 rule is to never forget rule #1. That is nonsense. Buffett’s Berkshire Hathaway was down more than 50% twice during the past 15 years. Yet Buffett has still done well. Confidence in your approach and emotional discipline are really what you need once you have a proven edge.

Expecting to consistently win at sports much more than 60% of the time is unrealistic. Expecting to beat the markets most of the time on a short-term basis is also unrealistic. Here is the percentage of time that Global Equities Momentum (GEM) featured in my book outperformed the S&P 500 index over various periods since 1971:

Time horizon    % of time GEM outperformed the S&P 500
3 months        52%
1 year          55%
3 years         71%
5 years         85%
10 years        99%
Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index. Please see our Disclaimer page for more information.

Over one year or less, GEM did not do much better than a coin flip. But over 5 or more years, those results change considerably. Patience is important whether you are a traditional investor or a sports bettor with a 57% win rate. Warren Buffett did have the right idea when he said the stock market is a mechanism for transferring wealth from the impatient to the patient.
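For readers who want to reproduce this kind of table with their own data, here is a minimal sketch of the rolling-window calculation (the function and variable names are mine; monthly return series are assumed):

```python
import numpy as np

def pct_windows_outperformed(strategy, benchmark, window):
    """Share of rolling `window`-month periods in which the strategy's
    cumulative return beat the benchmark's. Inputs: monthly return series."""
    s = np.cumprod(1 + np.asarray(strategy))   # strategy wealth index
    b = np.cumprod(1 + np.asarray(benchmark))  # benchmark wealth index
    wins = [s[t] / s[t - window] > b[t] / b[t - window]
            for t in range(window, len(s))]
    return float(np.mean(wins))
```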

Leave Your Opinions at the Door

To be an effective sports bettor, you need to forget your likes and dislikes and go where the data takes you. The same is true for investing. I have seen many investors disregard or override their strategies when these conflicted with their hopes or cherished beliefs. Some close their accounts or decline to open new accounts because of their behavioral biases or fears. To be a winner over the long run, you need to be a good loser over the short run. You can do this if you have a proven edge with a simple approach, have done your homework, and have realistic expectations. Go Patriots!


[1] For more on this, see my blog post "Factor Zoo or Unicorn Ranch" and Research Affiliates' "The Incredible Shrinking Factor Return".

February 22, 2017

Factor Zoo or Unicorn Ranch?


According to Morningstar, as of June 2016, the assets in smart beta exchange traded products totaled $490 billion. BlackRock forecasts smart beta using size, value, quality, momentum, and low-volatility will reach $1 trillion by 2020 and $2.4 trillion by 2025. This annual growth rate of 19% is double the growth rate of the entire ETF market. Are factors the cure-all for our investment needs? Or are they like the “active management” everyone wanted instead of passive index funds in the 1970s?

No one then wanted to be just average. Ironically, this gave many investors below-average returns as they used the same information to compete against one another. Superior performance was usually due more to luck than to skill. But Bill McNabb, CEO of Vanguard, points out that passive index funds have been in the top quartile of long-term performance.

Factor-based investors and advisors now think they have an advantage. They base this belief on the results of theoretical asset pricing models, many of which have failed empirically.

Asset pricing models look at long-term long/short returns without taking into account the price impact of trading. Factors that looked good on paper may be lacking in robustness, pervasiveness, persistence, or intuitiveness. So let's look at this more closely.

Does Size Matter?

The small cap size premium was first identified by Banz in 1981. His results were influenced by extreme outliers from the 1930s.

Looking at more recent history, the oldest small cap index is the Russell 2000. It started in January 1979. Here is the Russell 2000 annual return and volatility over the life of the index compared to the S&P 500 index.

The Russell 2000 underperformed the S&P 500 by 1.3% annually and had a substantially higher standard deviation. The Russell 2000 thus underperformed on both a risk-adjusted and a non-risk-adjusted basis.[1]

Here is a chart comparing the Sharpe ratios of all small and large cap stocks over a longer period of time. Small cap stocks failed to show significantly higher risk-adjusted profits than large cap stocks.

In the table below, long-only small caps slightly outperformed large caps globally since 1982. But small caps have underperformed large caps in the U.S. since 1926. Where is the outperformance that Banz talked about?

According to Shumway and Warther (1998) in “The Delisting Bias in CRSP's Nasdaq Data and its Implications for the Size Effect,” small caps originally showed a premium because their returns had an upward bias due to inaccurate returns on delisted stocks. When this bias was removed, the small cap anomaly disappeared.

In “Transaction Costs and the Small Firm Effect,” Stoll and Whitney (1983) showed that transaction costs offset a significant portion of the small cap size premium.

Some researchers say a small cap premium still exists if you combine size with other factors. In other words, size can be important depending on what you do with it.

Front Running

Some attribute the poor performance of the Russell 2000 index to the actions of front runners. Index replicators follow formulas for trading. They have little control over what and when to trade. Their trades are also known by the public ahead of time.

I pointed out in my last post that front runners cost S&P GSCI index investors 3.6% in annual return. Front running can happen with any index or factor-based strategy having known portfolio rebalancing dates.

Front runners can initiate trades ahead of index replicators or smart beta fund managers. They then take profits after the replicators and fund managers finish their trading. Front runners thereby capture part of the factor or index return at the expense of index and fund investors.

If I were still managing hedge funds, I might front run rules-based strategies like value or momentum. These strategies often hold less liquid, more volatile stocks that offer the highest front running profits. Momentum would be a particularly attractive target. Its high portfolio turnover means more opportunities for profit. 

Value - The Price is Right?

We all like bargains. Advisors and fund sponsors play off that desire by promoting the idea of a value premium. This past month I read two investment blogs saying cheap value stocks have outperformed the market by 4% per year. This may be a case of theoretical results differing from actual ones. Or it may be a case of the value premium never existing in the first place. According to Asness et al., the only period where there seemed to be a significant positive premium in large-cap stocks was over the in-sample 1963-1981 period. Over a longer 88-year period, there was no significant value premium.

A few months back, I referenced a study by Loughran and Houge that is worth mentioning again. These authors looked at the performance of all U.S. equity funds from 1962 through 2001. They used the prior 36 months to sort funds by style (top versus bottom quartile) and size (top versus bottom half).

Equal Weighted Mutual Fund Returns 1965 to 2002

              Growth    Value    Difference    t-stat
Large Cap     11.30     11.41      0.11         -.05
Small Cap     14.52     14.10     -0.42         -.16

Source: Loughran and Houge (2006), “Do Investors Capture the Value Premium?”

From 1965 through 2001, the average large cap growth fund returned 11.30% per year, while the average large cap value fund returned 11.41%. This 0.11% outperformance of value over growth among large caps was insignificant.



With small caps, the authors were very surprised at the results. Small cap value funds earned 14.10%, while small cap growth funds returned 14.52%. Small cap value underperformed small cap growth by 0.42% per year.  


Israel and Moskowitz (2012) showed that the value premium is insignificant among the two largest quintiles of stocks and is concentrated among small cap stocks. [2] So, why did small cap value funds underperform small cap growth funds?

Loughran and Hough said wide bid-ask spreads and the price impact of trading worked against the capture of a value premium in small-cap stocks. For value investing in general, they concluded, “We propose that the value premium is simply beyond reach…investors should harbor no illusion that pursuit of a value style will generate superior long-run performance.”

Some who want to believe in the superiority of value or small cap investing point to the performance of the Dimensional Fund Advisors (DFA) funds. Their U.S. Small Cap Portfolio (DFSTX), which began in March 1992, was the first factor-based small-cap fund. DFA's U.S. Large Cap Value Portfolio (DFLVX) and U.S. Small Cap Value Portfolio (DFSVX) funds began in February and March of 1993. All these funds have positive alphas with respect to the market, but none of the alphas is statistically significant.[3] To the extent that the DFA funds have done reasonably well, it may not be due only to their factor tilts.


DFA serves as a market maker in the stocks they hold. This means they can be patient when adjusting portfolio positions. This reduces their costs of trading in exchange for some additional tracking error. Using a buy-sell range also reduces turnover and trading costs. Holding a very large number of securities reduces the price impact of DFA's trading.

DFA has also benefited from not being tied to an index and thereby subject to front running costs. DFA has been aggressive in lending securities, as well. Additionally, DFA has avoided IPOs and stocks with high borrowing costs. 


Stocks with high borrowing costs usually have a large short interest. This means there is a limited supply of stock available for borrowing. Studies here, here, and here show that heavily shorted stocks have negative abnormal returns while lightly shorted stocks outperform their benchmarks.


Source: Boehmer et al. (2009), “The Good News in Short Interest”

Risk Factors

People may not remember that factors were once called “risk factors.” Value funds are known for tracking error that can persist for 10 or more years. Tracking error induced by value traps is a form of risk. It can cause investors and money managers to liquidate their positions at inopportune times.

Another risk is scalability. It might not be possible for popular strategies like value to always maintain an advantage over the market. This is particularly true of value stocks that are often out-of-favor and ignored. That can make them less liquid and more expensive to trade.

In “A Taxonomy of Anomalies and Their Trading Costs,” Novy-Marx and Velikov (2015) looked at how capital levels can affect factor trading profits. Their calculations showed that excess profits disappear once the amount in value strategies exceeds $20.7 to $50.6 billion.


The Novy-Marx and Velikov capital levels are based on a turnover reducing approach. It buys value stocks ranked in the top 10th or 30th percentile. But it does not liquidate them until stocks drop out of the top 50th percentile. DFA, MSCI and others use a similar turnover reducing approach. 

Here is a chart showing the amount of capital invested now in dedicated U.S. large and mid-cap value funds. It does not include managed accounts, hedge funds, and many of the other 400+ funds having the word “value” in their names.

U.S. Large Cap Value Index Funds             Assets
iShares Russell 1000 Value (IWD)             $35.2 b
Vanguard Value (VTV)                         $27.6 b
DFA US Large Cap Value I (DFLVX)             $19.7 b
iShares S&P 500 Value (IVE)                  $13.1 b
iShares Russell Mid Cap Value (IWS)          $9.4 b
Vanguard Mid Cap Value (VOE)                 $6.6 b
TIAA-CREF Large Cap Value Index (TRLCX)      $6.3 b
DFA US Large Cap Value III (DFUVX)           $3.4 b
Schwab US Large Cap Value (SCHV)             $2.9 b
Total Value Assets                           $124.3 b
The $124.3 billion in value funds exceeds the upper bounds where Novy-Marx and Velikov say value profits would disappear.

Momentum – the Premier Anomaly

Momentum is the strongest market anomaly based on academic research. Momentum has been studied now for more than 25 years. It meets all the tests of robustness, pervasiveness, persistence, and intuitiveness. It is with investability that momentum falls short.[4]

Momentum performs best in focused, concentrated portfolios. Momentum is a high turnover strategy. Momentum stocks are often volatile with wide bid-ask spreads. Trading billions of dollars in a modest number of volatile stocks is bound to impact trade execution. It would be like trying to force a dozen people through a small door opening.

Academics have long been concerned about the price impact of momentum trading. The first to study this were Lesmond, Schill, and Zhou (2002) in “The Illusive Nature of Momentum Profits.” They found that momentum creates an illusion of profit opportunity when none really exists. Two years later, Korajczyk and Sadka (2004) determined that profit opportunities could vanish once the amount invested in momentum-based strategies reaches $5 billion.

Counter to these findings, Frazzini, Israel, and Moskowitz (2012) from AQR, based on 12 years of proprietary data, argued that the potential scale of momentum is more than an order of magnitude greater than previous studies suggested. They said this capacity could increase even further by using optimized trading methods.

More recently, Ratcliffe, Miranda and Ang (2016) from BlackRock also suggested that a greater amount of capital could be traded using momentum. But they also made this disclaimer, “The exercise we conduct in this paper is hypothetical and involves several unrealistic assumptions.”

In contrast to these two studies, Fisher, Shah, and Titman (2015), using observed bid-ask spreads, got results much closer to those of Lesmond et al. and Korajczyk and Sadka than to those of Frazzini et al.

Novy-Marx and Velikov (2015) also determined the capacity for stock momentum before profits would vanish.


This is close to the $5 billion amount where Korajczyk and Sadka said momentum profits would disappear. Novy-Marx and Velikov used an optimization algorithm to stay in trades longer, as discussed by Frazzini et al.

Here is a table of the amounts invested in U.S. momentum exchange traded products:
 
This is a conservative listing. It does not include mutual funds, managed accounts, or hedge funds. Even so, it exceeds the level of assets where both Novy-Marx and Velikov and Korajczyk and Sadka say momentum profits should no longer exist.

Here is a table from the most recent study of factor capacity. It is by Beck, Hsu, Kalesnik and Kostka (2016) in “Will Your Factor Deliver? An Examination of Factor Robustness and Implementation Costs.” They used a different method than Novy-Marx and Velikov to compute factor capacity.

With $10 billion invested in large cap momentum, the value added by momentum goes from +2.7% per year before transaction costs to -3.4% after transaction costs. This is with monthly portfolio rebalancing. If you rebalance quarterly instead of monthly, your additional annual return goes from +2.0% before trading costs to -1.6% afterwards. Expected high future growth in factor-based investing should make this worse.

This situation is much like the one I discussed in my last post. Those offering commodity products to the public said passive commodities are still a worthwhile diversification. But a larger number of independent researchers, with no products to promote, said the opposite. Who shall we believe?

To help us answer that question, let us look at the performance of the oldest publicly available momentum funds. First is the PowerShares DWA Momentum ETF (PDP) managed by Dorsey Wright. It began on March 1, 2007. The second is the AQR Large Cap Momentum (AMOMX) mutual fund. It began on July 9, 2009.

From its start through January 2017, PDP had an annual return of 6.44%, while its Russell 3000 Growth benchmark returned 8.67%. This is an average annual return shortfall of 2.23%. PDP has had a focused portfolio of 100 momentum stocks.

AMOMX had an annual return of 14.55% since its inception, while its Russell 1000 Growth benchmark returned 16.11%. This is an average annual shortfall of 1.56%. These are short periods of time for evaluating performance, but they do suggest some caution.

Besides managing seven momentum mutual funds, AQR uses momentum with their multi-style funds and large hedge fund. Even though the Frazzini et al. paper said stock momentum could handle considerably more capital, AQR now spreads their large-cap U.S. momentum holdings among 496 stocks. This is half that fund’s available universe of 1,000 stocks.

Quality

We can find intuitive reasons why size, value, and momentum might provide a premium based on risk, investor behavior, or market structure. This becomes more challenging with quality. Why should quality stocks be mispriced by the market? There is no reason to believe that higher quality stocks are riskier than lower quality stocks. It is also hard to find behavioral factors or structural impediments that would explain why investors would neglect high quality stocks, causing them to command a premium. It is not surprising, then, that there are few signs of a premium or of premium persistence across multiple definitions of quality.

Cakici (2015) found only marginal evidence that gross profitability (a subset of quality) exists globally. Hsu and Kalesnik (2014) reported in “Finding Smart Beta in the Factor Zoo” that two measures of quality (gross profitability and ROE) in international stocks from 1987 through 2013 showed no significant improvement in Sharpe ratio over lower quality stocks. They also found no evidence of a significant advantage using four measures of quality from 1967 through 2013 in U.S. stocks:


Multi Factor Portfolios

West, Kalesnik and Clements (2016) in “How Not to Get Fired in Smart Beta Investing” included quality in a multi-factor environment.

They determined that quality, value, and momentum are a non-robust combination. Why is this important?

More multi-factor ETFs were created in the last two years than any other category of ETF. In a January 2016 survey by Greenwich Associates, 57% of institutional investors said they were already using multi-factor funds in some way, and 48% said they planned to increase their use soon.

Multi-factor portfolios have less volatility and reduced tracking error compared to single factor portfolios. In “A Smoother Path to Outperformance with Multi-Factor Smart Beta Investing,” Brightman, Kalesnik, Li and Shim (2017) show that annual volatility drops from 16.4% for an average factor to 15.2-15.6% for a multi-factor portfolio. This reduction is desirable. But those familiar with portfolio theory know that factor portfolio returns are a weighted average of individual factor returns. If factor returns are disappointing due to lack of scalability (value and momentum), data accuracy and persistence (size), or robustness (quality), multi-factor returns will also be disappointing. In addition, multi-factor portfolios can face greater uncertainty due to selection bias and increased data mining.
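In symbols, with sleeve weights $w_i$ summing to one,

$$r_{\text{multi-factor}} = \sum_i w_i\, r_i,$$

so diversification across factors can damp volatility, but the portfolio return can never exceed the weighted average of the individual factor returns.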
 
On the positive side, a multi-factor approach can cut benchmark tracking error in half. But would it really matter if 10 years of factor-based underperformance were reduced to 5 years? Small cap value once underperformed the market for 42 consecutive months. If that had been 21 months, would it have made much difference? Most investors would have been gone long before then.

Low Volatility

In a Brown Brothers Harriman survey of 175 financial advisors and institutional investors, low volatility was the most popular smart-beta choice. 44% of respondents chose low volatility over other factors. There is no risk-based reason why low-volatility/low beta stocks should outperform their counterparts. But a case can be made that leverage constraints can cause high volatility stocks to be bid up so they are overpriced relative to low-volatility stocks.

However, low volatility is the most problematic factor. The first cautionary sign is a chart of pre-1967 performance in the appendix of Novy-Marx’s (2016) paper “Understanding Defensive Equity.” Volatility and beta are estimated using daily data from the prior year when available. Otherwise, Novy-Marx uses 5 years of monthly data.

There is little difference between the lowest and highest volatility quintiles. With respect to beta, low beta is the worst performer, while high beta turns in the second-best performance. These results contradict those found by Novy-Marx and others since 1968.

Novy-Marx also pointed out that the vast majority of low volatility profits since 1968 came from the short side. He showed that most of the benefits from low volatility investing could be achieved simply by eliminating small growth stocks from one’s portfolio.

In “The Limits to Arbitrage and the Low-Volatility Anomaly,” Li, Sullivan and Garcia-Feijoo (2014) found that the excess return associated with low volatility was present only in the first month after portfolio formation. Additionally, excess return has been weak since 1990. They also found that the low volatility premium was offset by high transaction costs. It was largely eliminated if you omitted stocks priced under $5 per share. It also was not present in equal weight portfolios.

Garcia-Feijoo, Kochard, Sullivan and Wang (2015) in “Low-Volatility Cycles: The Influence of Valuation and Momentum on Low-Volatility Portfolios,” showed that the excess return from low-volatility is reliably positive only when low-volatility stocks are much cheaper than high volatility stocks as shown by a high book-to-price (B/P) ratio.

Using U.S. stock data from 1929 through 2010, van Vliet (2012) found low-volatility has had time-varying exposure to the value factor. When low-volatility stocks had value exposure, they returned an average of 9.5% annually versus the market’s 7.5%. But when low-volatility stocks had growth exposure, they returned 10.8% annually versus the market’s 12.2%.

Getting back to the idea of short interest, Jordan and Riley (2016) show in “The Long and Short of the Vol Anomaly,” that short interest dominates the low-volatility effect from July 1991 through December 2012.

 
High volatility stocks with low short interest had extraordinarily positive returns. High volatility stocks with high short interest had extraordinarily poor returns. Low volatility stocks had a similar, but less dramatic, disparity in performance based on short interest. Short interest has had a large impact on low-volatility performance.

Summarized here are the issues associated with the low-volatility premium:

•    Weak since 1990
•    Absent in higher priced stocks
•    Exists mostly on the short side
•    Largely offset by transaction costs
•    Reliably positive only when cheap
•    Not present in equal weight portfolios
•    Present only in the first month after formation

Less Downside Risk

With all these negatives, one might wonder why low-volatility has been the fastest growing factor. This may have to do with investors thinking low-volatility has less risk exposure than the market. It is not surprising that investors are more risk-averse now. They have experienced two bear markets over the past 20 years where stocks lost half their value.

How much risk reduction is there really from low-volatility investing? To find out, I accessed the online data provided by van Vliet and De Koning. They used the 1000 largest NYSE, AMEX, and NASDAQ stocks over $1 per share in the CRSP database. Stocks were equal weighted and sorted into deciles based on their volatility over the past 36 months. These portfolios were rebalanced quarterly.

I accessed the data starting in January 1934 to avoid the extreme returns of the late 1920s and early 1930s. I used the top two low-volatility deciles, representing 200 stocks, which is a typical fund-size portfolio. I compared the performance of the low-volatility portfolio to the S&P 500 and to a robust version of trend following absolute momentum that I use in my proprietary dual momentum models. Absolute momentum holds the S&P 500 when the model is in stocks and intermediate U.S. Government bonds when the model is out of stocks. Data is from Ibbotson Associates. 
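Here is a hedged sketch of the volatility sort described above (the pandas mechanics are my assumptions, and monthly rebalancing is used for brevity; the source portfolios were rebalanced quarterly):

```python
import numpy as np
import pandas as pd

def low_vol_returns(monthly, lookback=36, top_deciles=2):
    """monthly: DataFrame of monthly stock returns (one column per stock).
    Each month, rank stocks by trailing `lookback`-month volatility and
    equal-weight the calmest `top_deciles` deciles."""
    vol = monthly.rolling(lookback).std()
    out = []
    for t in range(lookback, len(monthly)):
        ranks = vol.iloc[t - 1].rank(pct=True)        # percentile of trailing vol
        calm = ranks[ranks <= top_deciles / 10].index  # lowest-volatility names
        out.append(monthly.iloc[t][calm].mean())       # equal-weight their returns
    return pd.Series(out, index=monthly.index[lookback:])
```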

Jan 1934 – Dec 2014       S&P 500    Low-Volatility    Absolute Momentum
CAGR                       11.1%        12.3%              13.2%
Standard Deviation         15.8%        12.3%              11.3%
Sharpe Ratio               0.53         0.73               0.85
Worst Drawdown            -50.9%       -40.1%             -31.5%
Worst U.S. Bear Markets 1934 – 2014

                          S&P 500    Low-Volatility    Absolute Momentum
Jul 2007 – Feb 2009       -50.9%       -38.3%             +5.0%
Apr 2000 – Sep 2002       -43.8%       +24.2%            +17.4%
Jan 1973 – Sep 1974       -41.8%       -37.5%             +2.0%
Nov 1968 – Jun 1970       -29.3%       -22.9%             -2.9%
Mar 1937 – Mar 1938       -50.5%       -40.1%             -9.1%
Results are hypothetical, are NOT an indicator of future results, and do NOT represent returns that any investor actually attained. Indexes are unmanaged, do not reflect management or trading fees, and one cannot invest directly in an index. Please see our Disclaimer page for more information.

The low-volatility portfolio outperformed the S&P 500. But absolute momentum was more effective at both reducing drawdown and enhancing return. 

For those who want more evidence on the efficacy of trend following, here are the results from Greyserman and Kaminski’s application of 12-month absolute momentum to stock indices, bonds, commodities, and currencies back to the year 1223! Assets were held long or short depending on the trend of the last 12 months. Trend following absolute momentum was much more effective than buy-and-hold. The sizes of the five largest drawdowns were also reduced by an average of one-third.





Source: Greyserman and Kaminski (2014), Trend Following with Managed Futures

The viability of trend-following momentum back to the 13th century is strong evidence that it is not an artifact of data mining.

Conclusion

There are research papers and articles almost every day extolling the virtues of factor investing based on studies of historical stock data. Factors may look good in theory and on paper. But whether they can provide superior risk-adjusted real-world returns after costs is another story. Too many people may be thinking they can do better than the market by using higher cost factor-based investing. Not everyone can be above average. There is a good chance many of these strategies will underperform, especially as the amount of capital invested in them continues to grow.

Even now, each factor that I looked at failed to hold up under one or more of these tests: robustness, persistence, pervasiveness, intuitiveness, and investability. Those who are prudent and truly interested in evidence-based investing will be cautious. Others will continue to accept what they have been told by product sponsors and a small number of academic theorists.


[1] For more on the Russell 2000 index and its issues, see Alpha Architect's "A Better Way to Buy the Russell 2000".
[2] Results were similar using valuation measures other than book-to-price. Kok, Ribando, and Sloan (2017) also found "remarkably consistent results" using different valuation ratios and weightings.
[3] When I looked at the 41 factor-based funds with more than 5 years of price history, only three had an alpha that was statistically significant at the 5% level.
[4] This applies to stock momentum. Research shows that momentum works best when applied to geographically diversified stock indices. See "Momentum for Buy-and-Hold Investors".