
On being a data science skeptic: due out soon

July 11, 2013

A few months ago, at the end of January, I wrote a post about Bill Gates’s naive views on the objectivity of data. One of the commenters, “CitizensArrest,” asked me to take a look at a related essay written by Susan Webber entitled “Management’s Great Addiction: It’s time we recognized that we just can’t measure everything.”

Webber’s essay is really excellent, not to mention impressively prescient considering it was published in 2006, before the credit crisis. The format of the essay is simple: it brings up and explains various dangers in the context of measurement and modeling of business data, and calls for finding a space in business for skepticism. What an idea! Imagine if that had actually happened in finance when it should have back in 2006.

Please go read her essay; it’s short.

Recently, when O’Reilly asked me to write an essay, I thought back to this short piece and decided to use it as a template for explaining why I think there’s a just-as-desperate need for skepticism in 2013 here in the big data world as there was back then in finance.

Whereas most of Webber’s essay talks about people blindly accepting numbers as true, objective, precise, and important, and the related tragic consequences, I’ve added a small wrinkle to this discussion. Namely, I’m also concerned about the people who underestimate the power of data.

Most of this disregard for unintended consequences is blithe and unintentional (and some of it isn’t), but even so it can be hugely damaging, especially to the individuals being modeled: think foreclosed homes due to crappy housing-related models in the past, and think creepy models and the death spiral of modeling for the present and future.

Anyhoo, I’m actively writing it now, and it’ll be coming out soon. Stay tuned!

Categories: data science, finance, modeling
  1. July 11, 2013 at 7:42 am

    Historically speaking, this issue arises on a dismally recurring basis, (q.v.) Phrenology, Taylorism, and Cargo Cult Science. It’s the kind of science you’d expect from a Mr. Know It All — Not❢ who’d never been through the rigors and cautionary tales of a graduate level applied statistics course.


    • Higby
      July 12, 2013 at 10:08 am

      Statistics unleashes its own demons — the data it produces MUST be about something “real,” and ends up reifying (whatever).

      You probably need to read, Stephen Jay Gould, The Mismeasure of Man, to get a sense of this.

      And what about Taylorism? Sure, time-studies were stupid — but the basic idea, “scientific management,” is that Corporate managers can (and should) strip and codify the expertise of “workers,” and use that to take ownership of all aspects of production.

      MOOCs do just this, if you want a relevant recent example.

      The idea that management should take control is fairly recent (David Montgomery).


  2. Abe Kohen
    July 11, 2013 at 7:48 am

    Webber is an interesting read. Read Kuhn, Popper, et al., way back when as an undergrad in a class on Philosophy of Science. I was amazed how at D.E. Shaw the quest was to quantify EVERYTHING. It had an impact on me, and I tend to quantify things more than before but not as much as the D.E. Shaw people. A few years back I read an interesting book by Gilovich: “How We Know What Isn’t So: The Fallibility of Human Reason in Everyday Life.” Definitely worth a read. When will your book be out?


  3. jmhl
    July 11, 2013 at 8:04 am

    An interesting counterpoint to Webber’s essay (which is very good) is Doug Hubbard’s book “How To Measure Anything: Finding the Value of Intangibles in Business”.


  4. July 11, 2013 at 9:35 am

    I don’t know if you’re in the New York area, but this meetup group could be interesting to you: http://www.meetup.com/The-NYC-Data-Skeptics-Meetup/


  5. ledflyd
    July 11, 2013 at 9:36 am

    Never mind… I just noticed you’re one of the organizers… see you there!


    • July 11, 2013 at 9:43 am

      Hey I was just about to tell you that! Awesome.


      p.s. This kinda reminds me of the fact that Amazon recommended I buy my own book.


  6. Klassy
    July 11, 2013 at 10:18 am

    Thanks for the link. Saved it for later reading, but this caught my eye: “An overabundance on metrics can lead to ‘knowing the price of everything and the value of nothing’.” This reminds me of a chapter on the Quantified Self movement in Evgeny Morozov’s To Save Everything Click Here. Until I read the book, whenever I heard the term quantified self I assumed it was being used as a pejorative. I did not know that there were people who were actually proud of being part of this “movement”!


  7. mathematrucker
    July 11, 2013 at 11:11 am

    I wish that at least the slightly more incisive term “systemic fraud crisis” would have somehow managed to beat out the egregiously benign “credit crisis”, but then, with language control being among the elites’ most prized weapons of choice in the battle against democracy, it comes as no surprise to me that it didn’t. Oops, excuse me, I meant to say “rout”.


  8. July 11, 2013 at 11:16 am

    I am a research neurophysiologist. People in this field are very interested in modeling, but models are subjected to rigorous null hypothesis testing before being used in the clinical setting. Do business data scientists test their models before applying them to conduct business? If not, why not?


    • July 11, 2013 at 2:14 pm

      There are of course those who do, mostly academic researchers, but these had already moved on to more evolved models of business, like learning organizations and whole systems thinking, by the end of the last millennium. But your average agenda-driven half-track double-think tankers have a different agenda than wanting to know the truth. They have a preset product to sell, and so they have regressed to archaic autocratic models of business.


    • Higby
      July 12, 2013 at 10:20 am

      Talk about the rhetorical force of a name! “Rigorous null hypothesis testing” ! I love it!

      All this emerged from French mathematicians and astronomers trying to quantify the error in their measurements, something that is clearly impossible to do. If you could remove the error, then you would have a perfect representation! Yet, all measurement — even the measurement of error — is flawed, imprecise. Besides, how can you quantify something that is not there? is not actual?


  9. July 11, 2013 at 11:44 am

    You bet there’s a need for skeptics… One book I read a couple years ago and mention on my blog is “Proofiness: The Dark Arts of Mathematical Deception” by Charles Seife at NYU. He does a good job, and keep in mind the book is a couple years old, but he was ahead of the game. I use the video he did in January of 2012 a lot, as he does a good job with a lot of issues and he’s entertaining as well. People get to hear there’s no algorithm for the perfect butt or for happiness, and he digs into a lot of fabricated study numbers, etc. It’s one of the videos (along with one of yours) I keep in my footer. I try:) The Quant documentary with Derman and Wilmott is very good too, and I keep promoting it as well, as it speaks at a level where the layman gets something out of it, like the bottles of beer analogy, so the layman can come away with some understanding of hedge fund workings.

    Hopefully the data science skeptic group will grow in other cities too; we need it. Yesterday the ultimate slap at data mining and selling came out: the game “Data Dealer,” which is going on Kickstarter:) I think the overall focus here is to be an awakening, but if they make a few dollars along the way I don’t think they will mind.


    Learn how to sell your profile to an insurance company…(grin)…and get the dope off the dating sites:) “Play against each other and hack each others data base soon…connect with other data dealers and take over the world…buy profiles like a nurse who is selling access to the hospital patient data base and the buyers will come knocking start companies and online ventures, “…the video is done well for making their point as you do laugh at some of it and then you get disgusted in the same breath:)

    If this is not a big enough slap in the face for regulators, namely the FTC for one and others in healthcare too, I don’t know what is:) The scary thing here too is: will something like this worm its way in as part of the new norm? I hope not. A couple years ago I wrote a bit of satire saying that data addiction and abuse could be the next 12-step program on the horizon, but when I see what’s out there now I think I scared myself into skepticism.

    Yes indeed there is a big need for skeptics and I do think the abuse might be more widespread than we could be willing to accept as a society. Keep us posted on the essay!


  10. Higby
    July 12, 2013 at 10:31 am

    I’m glad that Webber mentioned Vietnam War statistics — Robert McNamara was part of the Ford brain trust that quantified everything with charts and graphs. The focus on statistics corrupted everything. http://en.wikipedia.org/wiki/Robert_McNamara

    And from Webber: “statistical inferences … most commonly used in business, are not conclusive. First, under even the best of circumstances, measurements are not perfectly accurate. Second, the sample chosen for study may not truly represent the population as a whole.” This last one is my biggest gripe: when do we ever have a total population for a sample? Never! Only when you have a total population can you measure error.


  11. July 12, 2013 at 10:09 pm

    Looking forward to your post. There was a session at the Boston chapter of the American Statistical Association last year, discussing the claim by Peter Norvig and some others that the era of Big Data and Data Science represented “the end of Statistics” as a discipline. The advocates for the position at the meeting, adopting it as protagonists in a debate, if not as believers, pressed the case. There was widespread skepticism, even if one could argue that those, like myself, had ulterior motives. I think the basic discomfort regarding this attitude is that there are deep, unstated assumptions in the view that more data is always better, and that the remedy to overcoming any imperfection is to mindlessly take more data. That, of course, is the street or “pop” view of data science. Maybe it is practiced that way in some circles. I do know data scientists who are, essentially, statisticians, and are properly mindful of the pitfalls of poor data collection and the unchecked assumption.

    The people I trust the least in that business are those who point to some commercial success with their conclusions and models to justify that ‘they must be doing something right’. Yes, they are making money in some market niche. It is not necessarily any kind of verification that their models mean anything, other than conveying convincing stories, at least in my opinion.

