October 4, 2012

This is written by Rachel Schutt and crossposted from her Columbiadatascience blog.

Data is information and is extremely powerful. Models and algorithms that use data can literally change the world. Quantitatively minded people have always been able to solve important problems, and there has always been data, so none of this is new.

What is new is the massive amount of data we have on all aspects of our lives, from the micro to the macro. The data we have from government, finance, education, the environment, social welfare, health, entertainment, and the internet will be used to make policy decisions and to build products that are woven back into the fabric of our culture.

I want you, my students, to be the ones doing it. I look around the classroom and see a group of thoughtful, intelligent people who want to do good, and are absolutely capable of doing it.

I don’t call myself a “data scientist”. I call myself a statistician. I refuse to be called a data scientist because, as it’s currently used, it’s a meaningless, arbitrary marketing term. However, the existence of the term and the apparent “sexiness” of the profession draw attention to data and open up opportunities. So we need Next-Gen Data Scientists. That’s you! Here’s what I mean when I say Next-Gen Data Scientist:

  • Next-Gen Data Scientists have humility. They don’t lie about their credentials and they don’t spend most of their efforts on self-promotion.
  • Next-Gen Data Scientists have integrity. Their work is not about trying to be “cool” or solving some “cool” problem. It’s about being a problem solver and finding simple, elegant solutions (or complicated ones, if necessary).
  • Next-Gen Data Scientists don’t try to impress with complicated algorithms and models that don’t work.
  • Next-Gen Data Scientists spend a lot more time trying to get data into shape than anyone cares to admit.
  • Next-Gen Data Scientists have the experience or education to actually know what they’re talking about. They’ve put their time in.
  • Next-Gen Data Scientists are skeptical – skeptical about models themselves, how they can fail, and the ways they’re used or can be misused.
  • Next-Gen Data Scientists make sure they know what they’re talking about before running around trying to show everyone else they exist.
  • Next-Gen Data Scientists have a variety of skills, including coding, statistics, machine learning, visualization, communication, and math.
  • Next-Gen Data Scientists do enough Science to merit the word “Scientist”: someone who tests hypotheses and welcomes challenges and alternative theories.
  • Next-Gen Data Scientists are solving a new breed of problem that surrounds the structure and exploration of data and the computational issues surrounding it.
  • Next-Gen Data Scientists don’t find religion in tools, methods or academic departments. They are versatile and interdisciplinary.
  • Next-Gen Data Scientists are highly skilled and ought to get paid well enough that they don’t have to worry too much about money.
  • Next-Gen Data Scientists don’t let money blind them to the point that their models are used for unethical purposes.
  • Next-Gen Data Scientists seek out opportunities to solve problems of social value.
  • Next-Gen Data Scientists understand the implications and consequences of the models they’re building.
  • Next-Gen Data Scientists collaborate and cooperate.
  • Next-Gen Data Scientists bring their humanity with them to problem solving and algorithm/model-building.
