Rachel Schutt speaks at Strata tomorrow about Next-Gen data science
I’m excited about Rachel Schutt’s talk at Strata Santa Clara tomorrow at 1:30 PST. I don’t think it’s being live-streamed, unfortunately, but maybe we will eventually get our hands on a video.
The topic is next-gen data science and data scientists, which is explained in her abstract:
Data Science is an emerging field in industry, yet not well-defined as an academic discipline (or even in industry for that matter). I proposed the “Introduction to Data Science” course at Columbia in March, 2012. This was the first course at Columbia that had the term “Data Science” in the title. I had three primary motivations:
1) Bringing industry to students: I wanted to give students an education in what it’s like to be a data scientist in industry and give them some of the skills data scientists have. This is based on my experience as a lead analyst on the Google+ Data Science team. But I didn’t want to limit them to only my way of seeing the world, so each week, guest speakers from the NYC tech community came to teach the class.
2) I wanted to think more deeply about the science of data science: Data Science has the potential to be a deep and profound research discipline impacting all aspects of our lives. Columbia University and Mayor Bloomberg announced the Institute for Data Sciences and Engineering in July, 2012. This course created an opportunity to develop the theory of Data Science and to formalize it as a legitimate science.
3) Personal Challenge: I kept hearing from data scientists in industry that you can’t teach data science in a classroom or university setting and I took that on as a challenge. I wanted to test the hypothesis that it was possible to train awesome data scientists in the classroom.
In February 2013, 2 months will have passed since the class ended. I’ll be able to reflect on how the class went, how I thought about the curriculum, how I engaged the NYC tech community to be involved in the class, who the students were, whether I had impact on them, etc.
Rachel wrote a blog for the class and had a great post about being a next-gen data scientist. She has high hopes for the students in the class and wrote an aspirational list for them. It started with the idea of being more focused on integrity than on self-promotion, and it ended with bringing one’s humanity to the job.
When Rachel talks about it, it seems possible that one could use data science to actually make the world a better place rather than simply add to the hype and to the predatory nature of the current modeling space. (See this article for a perfect example of the predatory modeling side. It doesn’t specifically talk about models, but believe me, they’re there, helping the payday lenders and the banks choose who to trap and who to ignore. I’ve talked to people who worked on earlier generations of those models.)
Rachel also gave a TEDxWomen talk at Barnard on the subject of bringing humanity to modeling. Here’s the video of her talk. And while I make fun of TED talks a lot, mostly because their ideas and delivery are overly polished, one thing I love about Rachel’s is how raw and powerful it is. Go Rachel!


A wonderful sentiment and a worthy and necessary goal for all science, not just data science. As you know from your own experience and from the dissatisfaction and yearnings expressed by Occupy, the commandeering of science, economics and politics by the forces of greed and narrow self-interest is ever-present. As the world of massive data grows in ways invisible to most of us, we need more Rachels and MathBabes to help counter that human stain.