I’m in Berkeley this week, where I gave two talks (here are my slides from Monday’s talk on recommendation engines, and here are my slides from Tuesday’s talk on modeling) and I’ve been hanging out with math nerds and college friends and enjoying the amazing food and cafe scene. This is the freaking life, people.
Here’s what’s been on my mind lately: the urgent need for good data journalism. If you read this Washington Post blog by Max Fisher you will get at one important angle of the problem. The article talks about the need for journalists to be competent in basic statistics and exploratory data analysis to do reasonable reporting on data, in this case the state of journalistic freedoms.
And you might think that, as long as journalists report on other stuff that’s not data heavy, they’re safe. But I’d argue that the proliferation of data is leaking into all corners of our culture, and basic data and computing literacy is becoming increasingly vital to the job of journalism.
Here’s what I’m not saying (a la Miss Disruption): learn to code, journalists, and everything will be cool. To be clear, having data skills is necessary but not sufficient.
So it’s more like, if you don’t learn to code, and even more importantly if you don’t learn to be skeptical of the models and the data, then you will have yet another obstacle between you and the truth.
Here’s one way to think about it. A few days ago I wrote a post about different ways to define and regulate discriminatory acts. On the one hand you have acts or processes that are “effectively discriminatory” and on the other you have acts or processes that are “intentionally discriminatory.”
In this day and age, we have complicated, opaque, and proprietary models: in other words, a perfect hiding place for bad intentions. It would be idiotic for someone with the intention of being discriminatory to do so outright. It’s much easier to embed such a thing in an opaque model where it will seem unintentional and will probably never be discovered at all.
But how is an investigative journalist even going to approach that? The first thing they need is to arm themselves with the right questions and the right attitude. And it would help if they or someone on their team could run tests on the data and the algorithm as well.
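To make "running a test on the data and algorithm" concrete, here is a minimal sketch of one such check a reporting team might run: the "four-fifths rule" from US employment guidelines, which flags a model when the lowest group's approval rate falls below 80% of the highest group's. The function, the group names, and the decision data are all hypothetical, invented for illustration.

```python
# Hypothetical disparate-impact check against an opaque model's decisions.
# All data below is fabricated for illustration.

def disparate_impact_ratio(outcomes):
    """outcomes: list of (group, approved) pairs, approved is True/False.

    Returns the ratio of the lowest group approval rate to the highest.
    Under the four-fifths rule, a ratio below 0.8 is a red flag worth
    investigating (not proof of intent, just a place to start digging).
    """
    counts = {}
    for group, approved in outcomes:
        total, yes = counts.get(group, (0, 0))
        counts[group] = (total + 1, yes + (1 if approved else 0))
    rates = {g: yes / total for g, (total, yes) in counts.items()}
    return min(rates.values()) / max(rates.values())

# Fabricated decisions attributed to some opaque scoring model:
decisions = (
    [("group_a", True)] * 80 + [("group_a", False)] * 20   # 80% approved
    + [("group_b", True)] * 50 + [("group_b", False)] * 50  # 50% approved
)

ratio = disparate_impact_ratio(decisions)
print(round(ratio, 3))   # 0.5 / 0.8 = 0.625
print(ratio < 0.8)       # True: below the four-fifths threshold
```

A single summary statistic like this obviously doesn't settle anything on its own, but it shows the kind of question a journalist with modest coding skills can put to a dataset directly instead of taking the model's fairness on faith.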
I’m not saying that we’re going to suddenly have do-everything superhuman journalists. Just as the list of job requirements for data scientists is outrageously long and nobody can be an expert at everything, we will have to form teams of journalists that, as a whole, have lots of computing and investigative expertise.
The alternative is that the models go unchallenged, which is a really bad idea.
Here’s a perfect example of what I think needs to happen more often: when ProPublica reverse-engineered Obama’s political messaging model.