A good data scientist is hard to find
As a data scientist at an internet start-up, I am something of a quantitative handyman. I go where there is need for quantitative thinking. The business model of my company is super quantitative, which means I have lots of work. I have recently categorized the kinds of things I do into 4 bins:
- I visualize data for business people to digest. This is a kind of fancy data science-y way of saying I design reports. It’s actually a hugely critical part of the business, since our clients are less quantitative than we are and need to feel like they understand the situation, so clear, honest, and easily digestible visuals are a priority.
- I forecast behavior using models. This means I forecast what users on a website will do, based on their attributes and on what users with similar attributes have done in the past. I also do things like stress test the business itself, to answer questions like: what would happen to our revenue stream if one of our advertisers jumped out of the auction?
- I measure. This is where the old-school statistics comes in: deciding whether things are statistically significant and what our confidence intervals are. It’s related to reporting as well, but it’s a separate task.
- I help decide whether business ideas are quantitatively reasonable. Will there be enough data to answer this question? How long will we need to collect data to have a statistically significant answer to that? This is kind of like being a McKinsey consultant on data steroids.
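That last question, how long we’d need to collect data for a statistically significant answer, usually comes down to a back-of-the-envelope power calculation. Here’s a minimal sketch using the standard two-proportion normal approximation; the function name and the conversion rates are made-up illustrations, not numbers from my actual work:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided two-proportion z-test.

    Textbook normal-approximation formula: p1 is the baseline rate,
    p2 the rate we hope to detect.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, e.g. ~1.96
    z_beta = NormalDist().inv_cdf(power)           # power quantile, e.g. ~0.84
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting a lift from a 2% to a 2.5% conversion rate takes over
# ten thousand users per arm -- which may mean weeks of traffic.
n = sample_size_two_proportions(0.02, 0.025)
```

The punchline for the business person is the denominator: halving the effect size you want to detect quadruples the sample you need, which is often the difference between a feasible experiment and a fantasy.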
So why is it so hard to find a good data scientist?
Here’s why. Most data scientists don’t really think that the last two items above (measuring and vetting business ideas) are their job. It is far less sexy to honestly find the confidence interval of a prediction than it is to model behavior. Data scientists are considered magical when they forecast behavior that was hitherto unknown, and they are considered total downers when they tell their CEO, “hey, there’s just not enough data to start that business you want to start,” or “hey, this data is actually really fat-tailed and our confidence intervals suck.”
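To make the fat-tailed complaint concrete, here’s a toy simulation of my own (not data from my actual job): draw samples from a Pareto distribution with infinite variance and check how often the textbook normal 95% confidence interval for the mean actually covers the true mean. It lands well below the advertised 95%.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the toy run is reproducible

def pareto_sample(alpha, n):
    # Pareto(x_min=1) via inverse-CDF sampling: x = (1 - U)^(-1/alpha)
    return [(1 - random.random()) ** (-1 / alpha) for _ in range(n)]

# For alpha = 1.5 the mean is finite, alpha/(alpha-1) = 3,
# but the variance is infinite -- a genuinely fat tail.
true_mean = 3.0

hits, trials = 0, 2000
for _ in range(trials):
    xs = pareto_sample(1.5, 50)
    m, half = mean(xs), 1.96 * stdev(xs) / sqrt(len(xs))
    if m - half <= true_mean <= m + half:
        hits += 1

coverage = hits / trials  # noticeably less than the nominal 0.95
```

When the intervals you hand your CEO are “95%” in name only, saying so out loud is exactly the downer job I’m describing.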
In other words, it’s something like what the head of risk management had to face at a big bank taking risks in 2007. There’s a responsibility to warn people that too much confidence in the models is bad, but then there’s the political reality of the situation, where you just want to be liked and you don’t actually have the power to stop the relevant decisions anyway. And there’s the added issue at a start-up that the models are yours, and you want them to be liked (and to be invincible).
It’s far easier to focus on visualizing and modeling, or, to stay even sexier and more mystical, on modeling alone, and let the business make decisions that could ultimately not work out, or act on data that’s pure noise.
How do you select for a good data scientist? Look for one who speaks clearly and directly, and who emphasizes skepticism. Look for one who is ready to vent about how people trust models too much, and who is pushy enough to speak up at a meeting and be the annoying person who holds people back from drinking too much Kool-Aid.