What can be achieved by Data Science?

August 18, 2014 Cathy O'Neil, mathbabe

This is a guest post by Sophie Chou, who recently graduated from Columbia in Computer Science and is on her way to the MIT Media Lab. Crossposted on Sophie’s blog.

“Data Science” is one of my least favorite tech buzzwords, second probably only to “Big Data”, which in my opinion should always be printed followed by a winky face (after all, my data is bigger than yours). It’s mostly a marketing ploy used by companies to attract talented scientists, statisticians, and mathematicians, who, at the end of the day, will probably be working on some sort of advertising problem or other.

Still, you have to admit, it does have a nice ring to it. Thus the title Democratizing Data Science, a vision paper which I co-authored with two cool Ph.D. students at MIT CSAIL, William Li and Ramesh Sridharan.

The paper focuses on the latter part of the situation mentioned above. Namely, how can we direct these data scientists, aka scientists who interact with the data pipeline throughout the problem-solving process (whether they be computer scientists or programmers or statisticians or mathematicians in practice) towards problems focused on societal issues?

In the paper, we briefly define Data Science (asking ourselves what the heck it even means), then question what it means to democratize the field, and to what end that may be achieved. In other words, the current applications of Data Science, a new but growing field, in both research and industry, have the potential for great social impact, but in reality, resources are rarely distributed in a way that optimizes the social good.

We’ll be presenting the paper at the KDD Conference next Sunday, August 24th, at 11am as a highlight talk in the Bloomberg Building, 731 Lexington Avenue, NY, NY. It will be more like an open conversation than a lecture, and audience participation and opinions are very welcome.

The conference on Sunday at Bloomberg is free, although you do need to register. There are three “tracks” going on that morning, “Data Science & Policy”, “Urban Computing”, and “Data Frameworks”. Ours is in the 3rd track. Sign up here!

If you don’t have time to make it, give the paper a skim anyway, because if you’re on Mathbabe’s blog you probably care about some of the things we talk about.

Comments (4)

medicalquackblog

August 18, 2014 at 11:07 am

Well, I’m glad we have folks like you out there, Cathy, as the other side keeps figuring out ways to profit. I posted this over the weekend, and you’re more in the know than me to talk about it, but I just got the news out. Maybe you can elaborate on whether the utilities need better data scientists to compete, or whether the Congestion Contracts are really being worked by financial companies, or maybe both :) At any rate, it still ticks you off to read about it, and it’s some really good investigative reporting from the New York Times.

http://ducknetweb.blogspot.com/2014/08/quant-run-investment-companies-cashing.html

- Guest2
  
  August 19, 2014 at 10:23 am
  
  Interesting! “Making money” is the prime motive with these folks. Anthony Giddens calls this the moral sequestration of modern life — the “evaporation” of morals.
  
  So, how do we counter this? Utilities and HHS have to hire their own quants? Then, regulators have to hire MORE quants, to make sure that the government quants aren’t bribed??? It never ends …
  
- rtg
  
  August 19, 2014 at 3:46 pm
  
  Am I being naive to think that there could be a regulatory solution here? It sounds like the banks are profiting by acting effectively as electricity re-sellers. So they outcompete the utilities to purchase this extra capacity during spikes, but then they resell to utilities (since what do they want with the actual power itself?). This is a bit like trading on futures markets when you don’t actually expect to take delivery of the pork or whatever. Couldn’t regulators limit the market to utilities? Or would that defeat the purpose (I don’t know enough)?
  
  Overall, this strikes me as just another in a long line of examples of how financial markets (which do serve an important and legitimate purpose in matching capital to projects/ideas/etc.) are being twisted for the sole purpose of making money. Reselling mortgages makes sense as a way to spread the risk out beyond just the entities that are capable of working directly with buyers…this can even help keep rates down for consumers. But when you start re-packaging the mortgages for the sole purpose of making money without any regard for what the asset is, you get into perverse activity. Perhaps this, more than anything, is the “Original Sin” of Data Science. There is a lot of talk about data-driven understanding, but I’ve encountered few problems where you don’t also need to couple in some domain expertise to avoid perverse results. Quantitative trading is the most extreme example of this. The equations are remarkably good at optimizing around what you ask them to (until they change the underlying state of things and render themselves no longer accurate)…but it’s rarely good for society to maximize profits over all else.
  
vznvzn

August 19, 2014 at 6:33 pm

this seems to be a start on “ethics of data science”. agreed with the paper that economics tends to distort its focus. on the other hand at times everything about winner-take-all capitalism tends to get distorted… more on big data & the recent facebook academic study messup
