I’m in Berkeley this week, where I gave two talks (here are my slides from Monday’s talk on recommendation engines, and here are my slides from Tuesday’s talk on modeling) and I’ve been hanging out with math nerds and college friends and enjoying the amazing food and cafe scene. This is the freaking life, people.
Here’s what’s been on my mind lately: the urgent need for good data journalism. If you read this Washington Post blog by Max Fisher you will get at one important angle of the problem. The article talks about the need for journalists to be competent in basic statistics and exploratory data analysis to do reasonable reporting on data, in this case the state of journalistic freedoms.
And you might think that, as long as journalists report on other stuff that’s not data heavy, they’re safe. But I’d argue that the proliferation of data is leaking into all corners of our culture, and basic data and computing literacy is becoming increasingly vital to the job of journalism.
Here’s what I’m not saying (a la Miss Disruption): learn to code, journalists, and everything will be cool. To be clear, having data skills is necessary but not sufficient.
So it’s more like, if you don’t learn to code, and even more importantly if you don’t learn to be skeptical of the models and the data, then you will have yet another obstacle between you and the truth.
Here’s one way to think about it. A few days ago I wrote a post about different ways to define and regulate discriminatory acts. On the one hand you have acts or processes that are “effectively discriminatory” and on the other you have acts or processes that are “intentionally discriminatory.”
In this day and age, we have complicated, opaque, and proprietary models: in other words, a perfect hiding place for bad intentions. It would be idiotic for someone with the intention of being discriminatory to do so outright. It’s much easier to embed such a thing in an opaque model where it will seem unintentional and will probably never be discovered at all.
But how is an investigative journalist even going to approach that? The first thing they need is to arm themselves with the right questions and the right attitude. And it would help if they or someone on their team could run tests on the data and the algorithm as well.
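To make "running a test on the data and algorithm" concrete, here is a minimal sketch of one such check a reporting team might run: the "four-fifths rule" from US employment guidelines, which flags a model when the lowest group's approval rate falls below 80% of the highest group's. The function, the group names, and the decision data are all hypothetical, invented for illustration.

```python
# Hypothetical disparate-impact check against an opaque model's decisions.
# All data below is fabricated for illustration.

def disparate_impact_ratio(outcomes):
    """outcomes: list of (group, approved) pairs, approved is True/False.

    Returns the ratio of the lowest group approval rate to the highest.
    Under the four-fifths rule, a ratio below 0.8 is a red flag worth
    investigating (not proof of intent, just a place to start digging).
    """
    counts = {}
    for group, approved in outcomes:
        total, yes = counts.get(group, (0, 0))
        counts[group] = (total + 1, yes + (1 if approved else 0))
    rates = {g: yes / total for g, (total, yes) in counts.items()}
    return min(rates.values()) / max(rates.values())

# Fabricated decisions attributed to some opaque scoring model:
decisions = (
    [("group_a", True)] * 80 + [("group_a", False)] * 20   # 80% approved
    + [("group_b", True)] * 50 + [("group_b", False)] * 50  # 50% approved
)

ratio = disparate_impact_ratio(decisions)
print(round(ratio, 3))   # 0.5 / 0.8 = 0.625
print(ratio < 0.8)       # True: below the four-fifths threshold
```

A single summary statistic like this obviously doesn't settle anything on its own, but it shows the kind of question a journalist with modest coding skills can put to a dataset directly instead of taking the model's fairness on faith.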
I’m not saying that we’re going to suddenly have do-everything superhuman journalists. Just as the list of job requirements for data scientists is outrageously long and nobody can be an expert at everything, we will have to form teams of journalists that, as a whole, have lots of computing and investigative expertise.
The alternative is that the models go unchallenged, which is a really bad idea.
Here’s a perfect example of what I think needs to happen more often: when ProPublica reverse-engineered Obama’s political messaging model.