Poseurs should not own the backlash against data science poseurs

March 7, 2013

I’ve noticed a recent trend in coverage of data science. Namely, there’s backlash against the hype and the over-promising, intentional or not, of data science and data scientists. People are beginning to develop smell tests for big data and raise incredulous eyebrows at certain claims.

This is a good thing. We data scientists should welcome the backlash, first because it’s inevitable, and second because it allows us to have a much-needed conversation about how to behave and what is reasonable to claim or even hope for with respect to big data. There is a poseur problem in big data, after all.

But, fellow data nerds, let’s take this as a cue to start an internal discussion about data science skepticism. Let’s make sure that it’s coming from our community, or at least the surrounding technical community, rather than from yet another set of poseurs who don’t actually know what data is and would only serve to lampoon and discredit our emerging field rather than improve it. We should be the ones leading the charge and admitting when we’re full of shit. We need to own the backlash.

Let me give you an example. A serious data scientist friend of mine was recently asked to be interviewed as part of a conversation on data science skepticism. After thinking hard about what her contribution could be, she wrote back to accept the offer, but was then told she was “off the hook” because they’d found someone else who was “perfect for the assignment.” It turned out to be a journalist who had previously interviewed her. That was his credential for this conversation.

But how can you actually have informed skepticism if you are not yourself an expert?

Another example. David Brooks recently wrote a column wherein he declared himself a data science skeptic and then followed that up by referring to no fewer than eight random statistical studies that made no coherent sense and had no overall point. My conclusion: this is the wrong man to lead the charge against poseurs in data science.

If we are going to rebel against big data soundbites, let’s not do it in soundbites. Instead, let’s talk to people on the inside, who see specific problems in the field and are willing to talk openly about them.

I liked the recent Strata talk by Kate Crawford entitled “Untangling Algorithmic Illusions from Reality in Big Data” (h/t Alan Fekete), which discusses bias in data using very concrete examples and asks us to examine the objectivity of our “facts”.

For example, she talked about a smartphone app that finds potholes in Boston and reports them to the city. On the one hand it’s cool, but on the other hand, if naively applied, it would mean that richer neighborhoods like Lincoln would get better services than Roxbury, presumably because more residents there carry smartphones running the app. She made an important point: data analysis is not objective, which most people know, but often the data itself isn’t objective either. It was collected in a certain way, with particular selection biases.
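To make the selection-bias point concrete, here is a minimal Python sketch. It assumes, hypothetically, that two areas have exactly the same number of potholes and that a pothole only gets reported when someone running the app drives over it. The area names, adoption rates, crew count, and the proportional-allocation rule are all invented for illustration; none of it comes from the talk or from real city data.

```python
# Hypothetical sketch: every name and number here is invented for
# illustration and is NOT real Boston data.


def expected_reports(potholes, app_usage):
    """Reports the city would receive if a pothole is logged only when a
    resident running the app happens to drive over it (assumed model)."""
    return {area: potholes[area] * app_usage[area] for area in potholes}


def allocate_repairs(reports, crews):
    """Naive policy: split repair crews in proportion to reports received."""
    total = sum(reports.values())
    return {area: crews * count / total for area, count in reports.items()}


# Both areas have the same number of real potholes...
true_potholes = {"richer_area": 100, "poorer_area": 100}
# ...but app adoption differs (assumed rates).
app_usage = {"richer_area": 0.6, "poorer_area": 0.2}

reports = expected_reports(true_potholes, app_usage)
crews = allocate_repairs(reports, crews=40)

for area in true_potholes:
    print(f"{area}: {true_potholes[area]} potholes, "
          f"{reports[area]:.0f} reports, {crews[area]:.0f} crews")
```

With these made-up numbers the naive policy sends three times as many crews to the richer area, even though the underlying need is identical. Nothing in the analysis step is biased; the skew is entirely in how the data was collected.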

We need more conversations like this, or else we will leave a hole to be filled by loud, uninformed skeptics who would be right to raise the alarm.

One last thing. I’m aware that tons of people, especially serious academic statisticians and computer scientists, criticize data scientists for a totally different reason, namely that we are overly self-promoting (although academics have their own status plays).

But I don’t apologize for that. The truth is, a data scientist is a hybrid between a business person and a researcher. And this is a good thing, not a bad thing: it means the world gets direct access to the modeler, and can challenge any hyperbolic claims by asking for details, rather than having to go through a marketing person who acts (usually quite poorly) as a nerd interpreter. I for one would rather represent my work directly to the world (and be called a self-promoter) than be kept in the back room.

Categories: data science, rant
  1. March 7, 2013 at 7:21 am

    “Another example. David Brooks recently wrote a column wherein he declared himself a data science skeptic and then followed that up by referring to no fewer than eight random statistical studies that made no coherent sense and had no overall point. My conclusion: this is the wrong man to lead the charge against poseurs in data science.”

    You don’t get to pick your critics. I bet the Wall Street bankers would much rather have wonky academics giving them grief than the OWS. Valid concerns can come from unusual places. Keep up the good work.

  2. March 7, 2013 at 8:00 am

    I think it’s more a matter of data science integrity — the integral unity of research, public service, and teaching — than being a business person. These days business persons are too easy to buy.

  3. JSE
    March 7, 2013 at 8:49 am

    “The truth is, a data scientist is a hybrid between a business person and a researcher. And this is a good thing, not a bad thing”

    I think it’s neither a bad thing nor a good thing, or, better, there are good things about it and bad things about it. The good includes what you mention. The bad is that, when you are a business person, your communications are to some extent motivated; so when I hear claim X from someone whose livelihood directly depends on my accepting claim X, I have to figure out to what extent that person actually believes claim X, and to what extent they are saying what they have to say to stay in business. Under ideal circumstances the things you say to get paid are the things you sincerely believe, but circumstances are not always ideal!

    This is relevant to the question discussed here earlier: why it is hard to say “I don’t know.” It is hard to get _paid_ to say “I don’t know,” when there are other people who say that they do know, even when you really don’t know, and neither do the other people. And yet saying “I don’t know,” sometimes in a sharp tone of voice, is what science demands that we do.

    • March 7, 2013 at 8:53 am

      I’d claim it’s still a good thing, for two reasons. Keep in mind, of course, that we’re talking about data people inside companies. First, the data person is typically less likely to lie outright than the marketer who would replace them. Second, it’s easier to tell if they’re lying (if you can smell bullshit) because you can ask them technical questions.

      In other words, I’m saying it’s better than the alternative in a clearly biased situation.

      • JSE
        March 7, 2013 at 11:39 am

        I see your point — I was imagining the contrast between the hybrid and the academic data scientist, not that between the hybrid and the back-room corporate data scientist. (And you are of course right that academic scientists have some motivation mixed with their communication too, even if it’s more about status than direct financial incentive.)

  4. March 7, 2013 at 8:51 am

    Thank you for writing this! We always seem to be on the same page. Data Science is OUR life, our bread and butter and our PASSION, and I for one get very upset at wannabes and poseurs. If you don’t know what you are talking about, stay the heck out of my field! For almost two decades I have worked behind closed doors with my head down and my ears to the data. I won’t let this part of my life be overshadowed by KNOW-NOTHING people who don’t even have a clue what ETL or heteroscedastic regression is! Thank you my friend :)

  5. BK
    March 7, 2013 at 2:53 pm

    There’s nothing new here: people who have no head for numbers have been saying ‘you can lie with statistics, so let’s not use any statistics’ for centuries.

    Any time a new machine replaces the old way of doing things, some people react with curiosity and skepticism, and some react by throwing a wooden shoe at it. There’s little point in directly addressing or even acknowledging the people throwing shoes, because they’re not going to change their minds even given a landslide of data. At best, they’ll change their minds when enough of the level-headed skeptics are comfortable; at worst, we just have to wait for them to retire.
