August 12, 2017

Book Review: Standard Deviations, Flawed Assumptions, Tortured Data and Other Ways to Lie with Statistics

Years ago, when asked to recommend some good investment books, I often suggested ones dealing with the psychological issues influencing investor behavior. These focused on investor fear and greed, showing “what fools these mortals be.” Two examples are Devil Take the Hindmost: A History of Financial Speculation by Edward Chancellor, and Extraordinary Popular Delusions and the Madness of Crowds by Charles Mackay.

In recent years, there has been a wealth of similar material in the form of behavioral finance and behavioral economics. I now suggest that investors do an internet search on these topics. To better understand investing and investors, you should be familiar with concepts like herd mentality, recency bias, confirmation bias, overconfidence, overreaction, loss aversion, and the disposition effect.

An enjoyable introduction to this field is Richard Thaler’s Misbehaving: The Making of Behavioral Economics. Here is an extensive bibliography for those who want to do a more in-depth study.

Importance of Statistical Analysis

Now that quantitative investment approaches (factors, indexing, rules-based models) are becoming prominent, you also need to be able to evaluate quantitative methods properly. A lively book on the foundations of statistical analysis is The Seven Pillars of Statistical Wisdom by Stephen Stigler. An engaging and more nuanced view of the subject is Robert Abelson’s Statistics As Principled Argument.

What I recommend most is Standard Deviations, Flawed Assumptions, Tortured Data and Other Ways to Lie with Statistics by economist Gary Smith. Everyone can benefit from reading this book.


Smith’s premise is that we yearn to make an uncertain world more certain and to predict the unpredictable. This makes us susceptible to statistical deceptions. The investment world is especially vulnerable now that it is more model-based and data-driven.

It is easy to lie with statistics but hard to tell the truth without them. Smith takes up the challenge of sorting good from bad using insightful stories and entertaining examples. Here are some salient topics with real-world cases that Smith covers:
 
•    Survivorship and self-selection biases
•    Overemphasis on short-term results
•    Underestimating the role of chance
•    Results distorted by self-interest
•    Correlation is not causation
•    Regression to the mean
•    Law of small numbers
•    Confounding factors
•    Misleading graphs
•    Gambler’s fallacy
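Several items on this list (the role of chance, overemphasis on short-term results, regression to the mean) show up together in a small simulation. This is only an illustrative sketch under a deliberately extreme assumption: every manager has zero skill, so each period’s excess return is an independent standard-normal draw. The function name, manager count, and seed are hypothetical choices, not anything taken from Smith’s book.

```python
import random
import statistics


def regression_to_mean(n_managers=20_000, top_fraction=0.10, seed=7):
    """Return the average score of the period-1 'stars' in period 1 and period 2."""
    rng = random.Random(seed)
    # Assumption: zero skill, so each period's excess return is an independent
    # standard-normal draw (measured in standard deviations).
    period1 = [rng.gauss(0.0, 1.0) for _ in range(n_managers)]
    period2 = [rng.gauss(0.0, 1.0) for _ in range(n_managers)]
    k = int(n_managers * top_fraction)
    # Rank managers on period-1 results and keep the top group.
    stars = sorted(range(n_managers), key=lambda i: period1[i], reverse=True)[:k]
    before = statistics.fmean(period1[i] for i in stars)
    after = statistics.fmean(period2[i] for i in stars)
    return before, after


before, after = regression_to_mean()
print(f"stars in period 1: {before:+.2f} sd, same managers in period 2: {after:+.2f} sd")
```

The top tenth’s first-period edge (roughly 1.75 standard deviations) disappears in the second period because it was luck all along. If some real skill were mixed in, results would regress toward the mean rather than all the way to it.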

Critical Thinking

Smith is not afraid to point out mistakes by the economic establishment. He mentions errors by University of Chicago economist Steven Levitt, of Freakonomics fame. Smith also discusses research made popular by two Harvard professors, Reinhart and Rogoff. The professors concluded that a nation’s economic growth is imperiled when its ratio of government debt to GDP exceeds 90%. Smith points out serious problems with their work due to inadvertent errors, selective omissions of data, and questionable research procedures.

The pressure to publish or perish can contribute to errors in academic research. Economic self-interest, as in medical and financial research, can also cause errors. Smith helps us see how important it is to look at research critically instead of blindly accepting what is presented.

Theory Ahead of Data

Throughout his book, Smith focuses on the potential perils of deriving theories from data. He gives examples of the Texas sharpshooter fallacy (aka the Feynman trap). In one version, a man with a gun but no skill fires a large number of bullets at the side of a barn. He then paints a bullseye around the spot with the most bullet holes. In another version, the sharpshooter fires lots of bullets at lots of targets, then points to a target he happened to hit and forgets the rest. Predicting what the data looks like after examining the data is easy but meaningless. Smith says:

Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense, and it needs to be tested with uncontaminated data.
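One way to see clusters in random data is to rebuild the sharpshooter’s barn. In this hypothetical sketch, shots land uniformly at random on a wall divided into a 10-by-10 grid, and the bullseye is painted afterward on whichever cell collected the most hits. The shot count, grid size, and seed are arbitrary assumptions.

```python
import random
from collections import Counter


def paint_bullseye(shots=1_000, grid=10, seed=3):
    """Fire random shots at a grid x grid wall, then find the busiest cell."""
    rng = random.Random(seed)
    # Each shot lands uniformly at random: no aiming, no skill.
    hits = Counter(
        (int(rng.random() * grid), int(rng.random() * grid)) for _ in range(shots)
    )
    cell, count = hits.most_common(1)[0]
    expected = shots / grid**2  # average hits per cell by pure chance
    return cell, count, expected


cell, count, expected = paint_bullseye()
print(f"bullseye painted at {cell}: {count} hits vs {expected:.0f} expected by chance")
```

Every cell averages 10 hits, yet the busiest cell will typically hold noticeably more. Drawing the target there after the fact says nothing about the shooter.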

Financial market researchers often use data to help invent a theory or develop a trading method. Generating a theory or method by ransacking data is a perilous undertaking. Tortured data will always confess to something. Pillaged data without theory leads to bogus inferences.
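Torturing data can be mimicked by testing hundreds of trading rules that have no edge at all and keeping the best one. In this sketch, each hypothetical rule’s daily returns (in percent) are pure noise with a true mean of zero. The number of rules, the number of days, and the seed are arbitrary assumptions.

```python
import random
import statistics


def best_of_many(n_rules=500, days_in=250, days_out=250, seed=11):
    """Return the in-sample and out-of-sample mean daily return (%) of the best rule."""
    rng = random.Random(seed)

    def noise(days):
        # Assumption: no rule has an edge; daily returns are N(0, 1%) noise.
        return [rng.gauss(0.0, 1.0) for _ in range(days)]

    rules = [(noise(days_in), noise(days_out)) for _ in range(n_rules)]
    # The data-mining step: keep whichever rule looked best on the data we studied.
    best_in, best_out = max(rules, key=lambda rule: statistics.fmean(rule[0]))
    return statistics.fmean(best_in), statistics.fmean(best_out)


in_sample, out_of_sample = best_of_many()
print(f"best rule in-sample: {in_sample * 252:+.0f}% a year, "
      f"out-of-sample: {out_of_sample * 252:+.0f}% a year")
```

The winner’s impressive backtest comes entirely from selection. On data it was not chosen on, its expected return is the same as every other rule’s: zero.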

Data grubbing can uncover patterns that are nothing more than coincidence. Smith points to the South Sea Bubble as an example where investors saw a pattern: buy the stock at a certain price and sell it at a higher price. But they didn’t think about whether it made any sense.

Smith addresses those who take a quantitative approach to investing. He says quants have “a naïve confidence that historical patterns are a reliable guide to the future, and a dependence on theoretical assumptions that are mathematically convenient but dangerously unrealistic.”

Common Sense

Smith’s solution is to first make sure that one’s approach makes sense. He agrees with the great mathematician Pierre-Simon Laplace, who said probabilities are nothing but common sense reduced to calculation.

Smith says we should be cautious of calculating without thinking. I remember a case at the Harvard Business School where we looked at numbers trying to figure out why Smucker’s new ketchup was not doing well as a mass market item. The reason turned out to be that no one wanted to buy ketchup in a jam jar. We need to look past the numbers to see if what we are doing makes sense.

I often see data-derived trading approaches fit to past data without considering whether the approaches conform to known market principles. If a model does have a sensible explanation, it should then be tested on new data not corrupted by data grubbing. Whenever you deviate from the market portfolio, you are saying you are right and the market is wrong. There should be some good reasons and plenty of supporting data for believing this is true. [1]

Self-Deception

I have seen some take the concept of dual momentum and make it more complicated with additional parameters. They may hold back half their data for model validation. They call this out-of-sample testing, but that is a questionable call.

Do you think you would hear about these models if their “out-of-sample” tests showed poor results? Would they discard their models and move on? Chances are they would search for other parameters that gave satisfactory results on both the original and holdout data.
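This concern can be simulated directly. Assuming, purely for illustration, that no candidate parameter set has any real edge, the sketch keeps generating candidates until one clears the same bar on both the original and the holdout data, then scores the survivor on data no one has examined. Treating each parameter set’s results as independent noise draws is a simplification, and the bar, sample size, and seed are hypothetical.

```python
import random
import statistics


def search_until_holdout_passes(bar=0.1, days=250, max_tries=10_000, seed=5):
    """Try candidates until one beats `bar` on both halves, then test it on fresh data."""
    rng = random.Random(seed)

    def mean_return(n):
        # Assumption: no candidate has a real edge; daily returns (%) are N(0, 1).
        return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))

    for attempt in range(1, max_tries + 1):
        original, holdout = mean_return(days), mean_return(days)
        # "Out-of-sample validation" that is quietly repeated until it passes.
        if original > bar and holdout > bar:
            fresh = mean_return(days)  # data nobody has looked at
            return attempt, original, holdout, fresh
    raise RuntimeError("no candidate passed; raise max_tries or lower bar")


attempt, original, holdout, fresh = search_until_holdout_passes()
print(f"passed on try {attempt}: original {original:+.3f}, "
      f"holdout {holdout:+.3f}, fresh {fresh:+.3f} (% per day)")
```

Once the holdout has been used to choose among candidates, passing it no longer counts as independent evidence; only the untouched data reveals that the survivor has no edge.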

Momentum is robust enough that a test on holdout data might look okay right from the beginning. But that is likely due to momentum’s overall strength and pervasiveness. It is questionable whether you really had something better than a simpler approach. You may have just fit past data better, which is easy to do by adding more parameters.

Keep It Simple

There are some areas I wish Smith had addressed more. First is the importance of having simple models, à la Occam’s razor. Simple models dramatically reduce the chance of spurious results from overfitting data.

Overfitting is a serious problem in financial research. With enough parameters, you can fit almost any data to get attractive past results. But these usually do not hold up going forward.
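A toy curve-fitting experiment shows how extra parameters buy a better fit to the past at the expense of the future. The assumptions are ours, not Smith’s: the true relationship is a straight line plus Gaussian noise, the simple model is a two-parameter least-squares line, and the complex model is a polynomial with one parameter per data point (Lagrange interpolation), so it passes through every observation. The point count, noise level, trial count, and helper names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line, a 2-parameter model. Returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def interpolate(xs, ys, x):
    """Evaluate the degree len(xs)-1 polynomial through every point (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != i:
                basis *= (x - xm) / (xi - xm)
        total += yi * basis
    return total


def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def compare(n_points=10, noise=0.1, trials=200, seed=1):
    """Average in-sample and out-of-sample errors for the line and the polynomial."""
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    new_xs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # unseen points in between
    totals = {"line_in": 0.0, "line_out": 0.0, "poly_in": 0.0, "poly_out": 0.0}
    for _ in range(trials):
        # Assumption: the true relationship is y = 0.5 * x plus Gaussian noise.
        ys = [0.5 * x + rng.gauss(0.0, noise) for x in xs]
        new_ys = [0.5 * x + rng.gauss(0.0, noise) for x in new_xs]
        slope, intercept = fit_line(xs, ys)
        totals["line_in"] += mse([slope * x + intercept for x in xs], ys)
        totals["line_out"] += mse([slope * x + intercept for x in new_xs], new_ys)
        totals["poly_in"] += mse([interpolate(xs, ys, x) for x in xs], ys)
        totals["poly_out"] += mse([interpolate(xs, ys, x) for x in new_xs], new_ys)
    return {name: total / trials for name, total in totals.items()}


for name, error in compare().items():
    print(f"{name:>8}: {error:.4f}")
```

The polynomial’s in-sample error is zero by construction, yet on new points drawn from the same process its error is typically several times the line’s, mostly from wild swings near the ends of the data range.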

John von Neumann said that with four parameters he can fit an elephant, and with five he can make it wiggle its trunk. In the words of Edsger Dijkstra, "Simplicity is a great virtue, but it requires hard work to achieve it and education to appreciate it. And...complexity sells better."

In Data We Trust

Smith briefly mentions the law of small numbers popularized by Kahneman and Tversky. But I wish Smith had said more about the importance of abundant data and how it helps us avoid biased results.
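The law of small numbers can be checked with exact binomial arithmetic rather than simulation. This sketch asks how often a fair coin shows 70% or more heads in 10 tosses versus 100 tosses; the 70% cutoff and the helper name are arbitrary illustrations.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Exact binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


small_sample = prob_at_least(7, 10)    # 70%+ heads in 10 fair tosses
large_sample = prob_at_least(70, 100)  # 70%+ heads in 100 fair tosses
print(f"10 tosses: {small_sample:.4f}, 100 tosses: {large_sample:.6f}")
```

A lopsided result that turns up about 17% of the time in 10 tosses (176 of 1,024 equally likely sequences) happens well under 0.01% of the time in 100. Small samples routinely produce extreme-looking results by chance alone.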

According to de Moivre’s observation, the accuracy of an average improves only with the square root of the number of observations: its standard error is inversely proportional to the square root of the sample size. To halve the standard error, you need four times the data.
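De Moivre’s relationship, usually written SE = σ/√n for the standard error of a sample mean, can be checked both by formula and by simulation. The sketch assumes independent, normally distributed observations, which real market returns only approximate; the sample sizes, trial count, seed, and function names are arbitrary.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """de Moivre's relationship: standard error of a sample mean = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_standard_error(n, sigma=1.0, trials=2_000, seed=2):
    """Empirical check: the spread of many sample means, each built from n draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)


formula_ratio = standard_error(1.0, 400) / standard_error(1.0, 100)
simulated_ratio = simulated_standard_error(400) / simulated_standard_error(100)
print(f"4x the data -> standard error ratio {formula_ratio:.2f} (formula), "
      f"{simulated_ratio:.2f} (simulation)")
```

Going from 100 to 400 observations cuts the standard error in half, not to a quarter, which is why reliable conclusions about investment models demand so much history.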

The stock market has had major regime changes about every 15 years. A model that worked well during one regime may fail during the next. We want a robust model that holds up across all regimes, and determining that requires plenty of data. In the words of Sherlock Holmes, “Data! Data! Data! I can’t make bricks without clay!”

Theories Without Data

Smith talks a lot about the problem of data groping without theory. But he also mentions the opposite problem: theory without adequate data analysis. Smith cites as examples the limits-to-growth theories of Malthus, Forrester, and Meadows, contending that they made no attempt to see whether historical data supported or refuted their theories. Most economists now dismiss these theories.

But Smith may not have considered later information on this topic. Here and here are sources that make this subject more provocative. Smith says it is good to think critically. This is true even when approaching a book as thoughtful as Smith’s.

[1] Even if a model makes sense initially, you need to make sure it continues to do so. Investment approaches can become overused and ineffective when they attract too much capital. See here and here.