
Two rants about hiring a data scientist

September 18, 2012

I had a great time yesterday handing out #OWS Alternative Banking playing cards to press, police, and protesters all over downtown Manhattan, and I’m planning to write a follow-up post soon about whether Occupy is or is not dead and whether we do or do not wish it to be and for what reason (spoiler alert: I wish it were because I wish all the problems Occupy seeks to address had been solved).

But today I’m taking a break to do some quick, good old-fashioned venting.

——-

First rant: I hate it when I hear business owners say they want to hire data scientists but only if they know SQL, because for whatever reason they think candidates aren’t serious if they haven’t learned something as important as that.

That’s hogwash!

If I’m working at a company that has a Hive, why would I bother learning SQL? Especially if I’ve presumably got some quantitative chops and can learn something like SQL in a matter of days. It would be a waste of my time to do it in advance of actually using it.

I think people get on this pedestal because:

  1. It’s hard for them to learn SQL so they assume it’s hard for other smart people. False.
  2. They have only worked in environments where a SQL database was the main way to get data. No longer true.

By the way, you can replace “SQL” above with any programming language, although SQL seems to be the most common one where people hold it against you with some kind of high and mighty attitude.

——-

Second rant: I hate it when I hear data scientists dismiss domain expertise as unimportant. They act like they’re such good data miners that they’ll find out anything the domain experts knew and then some within hours, i.e. in less time than it would take to talk to a domain expert carefully about their knowledge.

That’s dumb!

If you’re not listening well, then you’re missing out on the best signals of all. Get over your misanthropic, aspy self and do a careful interview. Pay attention to what happens over time and why, how long effects take, and the signals that they have begun or ended. You will then have a menu of signals to check, and you can start with them and move on to variations of them as appropriate.

If you ignore domain expertise, you are just going to overfit your model to weird noisy signals, in addition to finding a few real ones and ignoring others that are very important but unintuitive (to you).

——-

I wanted to balance my rants so I don’t appear anti-business or anti-data scientist. What they have in common is understanding the world a little bit from the other person’s point of view, taking that other view seriously, and giving respect where it’s due.

Categories: data science, rant
  1. JSE
    September 18, 2012 at 9:01 am

    I would add to your rant about domain expertise: remember that domain expertise takes time to acquire, even if you’re smart!

  2. September 18, 2012 at 10:04 am

    The challenge is biggest when a company is hiring its first data scientist. It often comes down to someone with no training in the field trying to evaluate someone who may be an expert in the field. Even if the interviewer is the CTO or a great developer, he/she may not be qualified to evaluate how much statistics or probability theory a candidate knows. I always advise people to get outside help when they recruit their first data scientist. It is well worth their $ to pay a practicing data scientist as an interview consultant. The consulting data scientist should easily be able to come to the conclusion that if you know Python, C or R very well, you’ll likely learn SQL in 3 days.

  3. Ivan
    September 18, 2012 at 10:54 am

    Well, you should probably understand what a relational database is if you are to work with it. The SQL language itself is easy.

  4. September 18, 2012 at 11:27 am

    You realize that OWS only exists because the credit bubble popped and the floor cracked under the financiers, right?

    And that the problem is in the credit origination system, which is controlled by Washington, D.C. and not Manhattan, right?

  5. jim
    September 18, 2012 at 3:37 pm

    Some of us misanthropic aspies do in fact listen to domain experts, particularly experts in domains that misanthropic aspies would have a hard time fitting into.

  6. Larry Headlund
    September 18, 2012 at 6:31 pm

    Second rant first (domain experts): When I first began hearing the term, someone asked “What other kind of expert is there?” With the exception of Irwin Corey, we are all, at best, limited experts. Anyone who thinks that their talent is the super talent that renders some specialized knowledge obsolete is asking to be mugged by reality.

    First rant, SQL. I’m with you all the way if SQL knowledge is just a checkoff item or they don’t want to put any time into a new hire, expecting instant integration. Also, someone who has learned other programming languages can be expected to add another (with varying degrees of difficulty depending on the languages, i.e., C++ -> Java easy, Fortran -> Java not so much). You could be expected to be able to do some things in only a few days.

    However …

    You seem to be adopting a version of your second rant in your first one: Your quantitative chops are so awesome that you transcend mere knowledge of SQL; you can learn all you need to know in a matter of days. Apparently in this case SQL knowledge, a domain expertise, is irrelevant.

  7. MathCommando
    September 18, 2012 at 11:51 pm

    The SQL requirements are nothing. It’s the VBA requirements that drive me into a homicidal rage. These idiots think they can do everything with it, even HFT. Last year I interviewed for a pricing job at an energy company and was told that they were building a custom-made version of Windows to run on a supercomputer. The only reason to do that is to run VBA on it! They are so afraid of anything else that they won’t even run plugins in Excel. They think VBA is top dog because it’s all they are capable of. It’s only hard to use because it’s not meant for real programming. Mention Matlab, Python or C++ and they’ll look at you like a witch doctor. I’m unemployed right now and had a shot at a job being a VBA jockey and had to pass because I’ve done it before. The only thing worse than being the smartest guy in the office is being the smartest guy in the office who is forced to use VBA. I felt like this guy:

  8. Bindicap
    September 19, 2012 at 12:16 am

    I’m sympathetic to the spirit of rant 1 (smart people can pick up specific skills as needed), but I think practicality pushes strongly the other way. SQL isn’t a niche programming language; it’s a very common query language, with hooks all over. It’s reasonable to expect at least experimentation with some SQL db. You can easily set up and administer your own MySQL database with phpMyAdmin, for instance. Understanding RDBMS concepts is a relevant skill.

    I totally agree you should run away from shops that want VBA expertise…. Wow, what energy company is that….

    Totally agree on rant 2!

  9. voloch
  10. September 21, 2012 at 3:37 pm

    A programmer or domain expert can pick up enough stats to do some light machine learning. A statistician or domain expert can pick up enough SQL to do some light database programming. A programmer or statistician can pick up some domain knowledge.

    Why not just hire jacks-and-jills-of-all-trades for everything? I think the answer is that no matter how smart you are, you don’t become a domain expert, statistics expert, or programming expert in two days.

    Oh, the stories I could (and do!) tell about the abuse of SQL and other computing resources by Ph.D.s who, in their defense, typically have solid domain knowledge and statistics knowledge.

    Also, I want to add that having a Ph.D. in stats is no guarantee someone can apply even a simple technique like logistic regression to a real problem any more than a Ph.D. in computer science guarantees someone can apply SQL effectively to a real problem. I had a very hard time convincing the person who hired me for my first professional software job (in speech recognition) that someone who’d been a professor and researcher at Bell Labs had any practical skills whatsoever; in my experience, that kind of skepticism is justified. My first year on the job was one of the hardest things I’ve ever done in terms of learning and getting up to speed.
