How to Lie With Statistics (in the Age of Big Data)
When I emailed my mom last month to tell her the awesome news about the book I’m writing, she emailed me back the following:
i.e, A modern-day How to Lie with Statistics (1954), avail on Amazon for $9.10. Love, Mom
That was her whole email. She’s never been very verbose, in person or electronically. Too busy hacking.
Even so, she gave me enough to go on, so I bought the book and recently read it. It was awesome, and I recommend it to anyone who hasn’t read it, or hasn’t read it recently. It’s a quick read and is available as a free PDF download.
The goal of the book is to demonstrate all the ways marketers, journalists, accountants, and sometimes even statisticians can bias your interpretation of statistical facts or even just confuse you into thinking something is true when it’s not. It’s illustrated as well, which is fun and often funny.
The author shows, for example, how graphs can be presented in very misleading ways. My favorite, because it happens to be my pet peeve, is the “growth chart” whose y-axis runs from 1400 to 1402, so that things look like they’ve grown an enormous amount because 0 appears nowhere on the axis. And then of course there’s the chart with no numbers at all, so you don’t even know what you’re looking at.
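To put a number on the truncated-axis trick, here is a minimal sketch. The two values are made up, chosen only so they fit on a 1400-to-1402 axis, and `drawn_height_ratio` is a hypothetical helper: it compares how tall two bars are drawn when the axis starts at 0 versus at 1400.

```python
def drawn_height_ratio(before, after, axis_min=0.0):
    """How many times taller the 'after' bar is drawn than the 'before' bar
    when the y-axis starts at axis_min instead of 0."""
    return (after - axis_min) / (before - axis_min)

# Hypothetical values, chosen to sit inside a 1400-1402 axis.
before, after = 1400.5, 1401.5

honest = drawn_height_ratio(before, after)           # axis starts at 0
truncated = drawn_height_ratio(before, after, 1400)  # axis starts at 1400

print(f"growth as drawn on an axis from 0:    {honest - 1:.2%}")
print(f"growth as drawn on an axis from 1400: {truncated - 1:.0%}")
```

With the axis starting at 1400, the second bar is drawn three times as tall as the first, even though the underlying number grew by well under one percent.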
A few things don’t translate. For example, he makes a big deal about how people say “average” without specifying whether they mean the arithmetic mean or the median. Nowadays “average” is generally taken to mean the former (am I wrong?).
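Even if “average” now usually means the mean, the book’s point still bites whenever the data are skewed. A minimal sketch with made-up incomes (in thousands of dollars), using Python’s standard `statistics` module:

```python
import statistics

# Hypothetical household incomes, in thousands of dollars; one very high earner.
incomes = [30, 35, 40, 45, 1000]

mean_income = statistics.mean(incomes)      # arithmetic mean, dragged up by the outlier
median_income = statistics.median(incomes)  # middle value, unaffected by how big the outlier is

print(f"mean:   {mean_income}")
print(f"median: {median_income}")
```

Here the mean is 230 and the median is 40, so “the average income is $230,000” and “the average income is $40,000” are both technically true statements about the same five households.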
It’s also fascinating to see how the culture has changed. Many of his examples involving race, and many involving women, would be very different nowadays. And the idea that you could run a randomized experiment that gives half the people polio vaccines and withholds them from the other half, when polio is a real threat that leaves children paralyzed, seems really strange.
Many of the examples (there are hundreds) also refer to the Great Depression and the recovery since then, and the assumptions of 1954 are bizarrely different from those you see in 2014 (and, I’d guess, from those we’ll see in 2024, though I hope I’m wrong). Specifically, many of the lies people were telling with statistics seem aimed at downplaying their profits so they wouldn’t look excessive. Can you imagine?!
One of the reasons I read this book, of course, was to see if my book really is a modern version of that one. And I have to say that many of the issues do not translate, but some of them do, in interesting ways.
Even the reason many of them don’t translate is kind of interesting: in the age of big data, we often don’t even see charts of the data, so how could we be misled by them? In other words, the presumption is that the data is so big as to be inaccessible. Google doesn’t bother showing us the numbers, and it doesn’t have to, since we use its services anyway.
The most transferable tips on how to lie with statistics probably stem from discussions of the following topics:
- Selection bias (for example, “everyone who responded to our poll is happy with our service”)
- Survivorship bias (for example, “companies that have been in the S&P for 30 years have great stock performance”)
- Confusing people about topic A by discussing a related but not directly relevant topic B, which the book calls a “semi-attached figure.”
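The first two biases share a mechanism: the number you get to see is computed on a filtered subset. A minimal sketch of both, where every parameter is a hypothetical assumption rather than anything from the book. The poll assumes happy customers are six times as likely to respond as unhappy ones. The stock simulation gives every firm zero expected return (20% yearly volatility, in log terms) and assumes a firm drops out of the index if its value ever falls to half its starting level.

```python
import math
import random
import statistics

def poll_selection_bias(n_happy=600, n_unhappy=400, rate_happy=0.30, rate_unhappy=0.05):
    """True share of happy customers vs. the share among poll respondents.
    All counts and response rates are hypothetical; uses expected counts."""
    true_share = n_happy / (n_happy + n_unhappy)
    resp_happy = n_happy * rate_happy
    resp_unhappy = n_unhappy * rate_unhappy
    observed_share = resp_happy / (resp_happy + resp_unhappy)
    return true_share, observed_share

def survivorship_demo(n_firms=10_000, years=30, volatility=0.2, seed=0):
    """Mean yearly log-return of all firms vs. only the firms that 'survived'.
    Hypothetical model: returns are pure noise with mean 0, and a firm is
    dropped if its value ever falls below half of where it started."""
    rng = random.Random(seed)
    floor = math.log(0.5)  # hypothetical delisting threshold, in log terms
    everyone, survivors = [], []
    for _ in range(n_firms):
        log_value = 0.0
        survived = True
        returns = []
        for _ in range(years):
            r = rng.gauss(0.0, volatility)  # zero expected return for every firm
            returns.append(r)
            log_value += r
            if log_value < floor:
                survived = False
        avg = statistics.fmean(returns)
        everyone.append(avg)
        if survived:
            survivors.append(avg)
    return statistics.fmean(everyone), statistics.fmean(survivors), len(survivors)

true_share, observed_share = poll_selection_bias()
print(f"happy customers: {true_share:.0%} overall, {observed_share:.0%} of respondents")

all_mean, survivor_mean, n_survivors = survivorship_demo()
print(f"mean yearly return, all firms: {all_mean:.2%}")
print(f"mean yearly return, {n_survivors} survivors: {survivor_mean:.2%}")
```

In the poll, 60% of customers are happy but 90% of respondents are. In the simulation, no firm has any edge, yet the survivors show a clearly positive average return simply because the losers were filtered out before anyone looked.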
The last one is the most relevant, I believe. In the age of big data, partly because the data is “too big” to really look at, we spend an amazing amount of time talking about how a model measures something we care about (a teacher’s value, or how good a candidate is for a job) when in fact the model measures something quite different (test scores, demographic data).
If we were aware of those discrepancies we’d have way more skepticism, but we’re intimidated by the size of the data and the complexity of the models.
A final point. For the most part, that crucial big-data issue of complexity isn’t addressed in the book. It kind of makes me pine for the olden days, except not really if I’m black, a woman, or at risk of being exposed to polio.
UPDATES: First, my bad for not understanding that, at the time, the polio vaccine wasn’t known to work, or even to be safe, so of course there were trials. I was speaking from the perspective of the present day, when it seems obvious that it works. For that matter, I’m not even sure the vaccine being tested was the one that ended up working.
Second, I showed my mom this post and her response was perfect:
Glad you liked it! Love, Mom