Navigating the mindset for data journalism

August 8, 2014 Cathy O'Neil, mathbabe

I’ve been working my butt off this summer starting up a data journalism program and teaching in it. I couldn’t ask for a better crew of students and instructors: engaged, intelligent, brave, and eager to learn. And my class has been amazing, thanks to the incredible guest speakers who have given their time to us. On Tuesday we were honored to have danah boyd come talk about her new book It’s Complicated, and yesterday Julie Steele talked to us about visualization and how our technological tools affect our design, which was fabulous and also super useful for the class projects.

I feel like it’s the picture-perfect situation for the emerging field of data journalism to be defined and developed. Even so, there are real obstacles to getting this right that I hadn’t anticipated. Let me focus on obstacles that exist within the academy, since that’s what I’ve been confronting these past few weeks and months.

Basically, as everyone knows, academia is severely partitioned between departments, both physically and culturally. Data journalism sits more or less between journalism and computer science, and both of those fields have cultures that are unintentionally hostile to a thriving new descendant. Let me exaggerate for effect, which is what I do.

In cartoonish form, introductory computer science classes are competitive weeder classes that promote a certain kind of narrow, clever, problem-solving approach. If you get your code to work, and work fast, you’re done, and you move quickly to the next question because there’s an avalanche of work and technical issues to plow through.

You don’t get that much time to think, and you almost never address the question of how to do things differently, or why syntax is inconsistent between different parts of Python, or more generally why a computer language is the way it is, how it could have been designed differently, and what history made it so, because you don’t have time and you have to learn learn learn. In other words, it’s kind of the least context-laden and most content-heavy way of learning that you can imagine. You impress people by what you can make work, and how fast, and it is a deep but narrow way of working, kind of like efficient well-digging.

Now let’s paint an equally exaggerated vision of the journalist training. A good journalist collects a ton of information to create a kind of palette for the topic in question, and dives straight into ambiguity or history or bias or contradiction to learn even more, and then starts to build a thesis after such comprehensive information collection has occurred. In other words, the context is what makes a topic interesting and important and newsworthy, and the human and gripping example is critical to illustrate the topic as well as to make it into a story rather than a set of facts. You impress people by your ability to synthesize an incredible breadth of knowledge and then find the hook that makes it a compelling story and draw it out and make it real. This is a broad filtering method where you don’t take the next step until you know you should.

To make it even more dumbed down, journalists are ever aware of the things they know they don’t know, and desperately want to fill in their knowledge gaps because otherwise they feel fraudulent, like they’re jumping to unwarranted conclusions. Computer scientists don’t care about not knowing things as long as their programs work. They can be blithe with respect to messy human details, which of course means they sometimes don’t notice or figure out their data has selection bias because they got an answer, but also means they are super efficient.

Now you can see why it’s a tough thing to teach journalists to code, and it’s also a tough thing to expect coders to become journalists. Both sides emphasize a kind of learning and a definition of success that the other side is blind to.

What would a middle ground look like? In the ideal scenario, it would be a place that appreciates and uses the power of data and programming and spends the time learning the history and searching the inherent human bias of data collection and analysis. That scenario is exciting, but it clearly takes time to build and represents a real investment both by the academic institutions that build it and by the media that eventually hire the data journalists coming from it.

In other words, the outside world has to actually want to hire the emerging thoughtful fruit of that labor. That brings me to other problems for data journalism that largely live outside the academic world, which I might blog about at some other time.

Categories: data journalism

Comments (11)

Jon Awbrey

August 8, 2014 at 7:40 am

☞http://m.org.sagepub.com/content/8/2/269.abstract

- Cathy O'Neil, mathbabe
  
  August 8, 2014 at 7:44 am
  
  Thanks!
  
  - Jon Awbrey
    
    August 8, 2014 at 9:56 am
    
    Forgive the brevity, I was typing a cellphie, upside-down, still abed. Now I’ve had my walk I can be a little more peripatetic.
    
    Tensions like you describe are a natural part of all cross-&-trans-disciplinary efforts. I encountered them especially acutely in my time consulting on research as I tried to create computational tools to bridge the gaps between expert and novice researchers and between qualitative and quantitative research methods.
    
    The paper I linked came out of joint research Susan Awbrey and I did on integrative scholarship, learning organizations, etc. There are links to more papers along those lines on my Academia.edu page.
    
Ryan Shaw

August 8, 2014 at 7:49 am

> What would a middle ground look like? In the ideal scenario, it would be a place that appreciates and uses the power of data and programming and spends the time learning the history and searching the inherent human bias of data collection and analysis.

FWIW, that is roughly the kind of place that information schools have been trying to become for the past couple of decades: http://ischools.org/about/history/

JSE

August 8, 2014 at 10:51 am

The only problem with “I-Schools” is that every time someone mentions one I start singing “ROCK ROCK ROCK ROCK ROCK N ROLL I-SCHOOL.”

Guest2

August 8, 2014 at 10:53 am

I love the dichotomy — decontextualized “technical” skills versus the totality of embeddedness and media skills. However, since tech skills are high-status relative to journalism, the split between them is unbridgeable.

Even the early history of computer programmers, split between the machine-language crowd and the language programmers, is instructive: they contested each other’s legitimacy until they went their separate ways.

Think about your target audience.

What you are talking about here is a new sub-discipline, perhaps starting out as the diffusion of new skill-sets among journalists. You can hope that this group reaches a self-perpetuating threshold density, builds an organization, and figures out where its resources — new resources — can come from. Right now, it looks more like a new social movement (where recruiting, high emotional energy are important).

Perhaps if you can convert a high-status journalist — one that is established — this could kick-start the process of formation.

cat

August 8, 2014 at 11:28 am

I know you were painting with a broad brush, but the first few semesters of a CS program would look different if it weren’t remedial Computer Science. I guess the theory is if you can’t code you can’t implement any of the ‘science’ part they’ll teach you later on.

Having flunked Calculus a few times, I would characterize Math programs exactly the same way you characterize Computer Science, since college was my first exposure to Math with a capital M.

vznvzn

August 8, 2014 at 1:07 pm

yeah edu/academia does not seem to value anything other than “right answers” and this has gotten way worse with stdized testing. planning on blogging on this myself at some point. it’s slightly better in college but it does seem like the edu system encourages a narrow-verging-on-tunnel-vision kind of thinking. this even shows up in online forums dedicated to knowledge/experts eg on stackexchange. fyi saw this great movie/documentary on the massive shifts going on in journalism wrt digital, assange etc from the pov of the NYT newsroom. think you & readers will like it. NYT page 1. http://www.imdb.com/title/tt1787777/

Auros

August 8, 2014 at 10:50 pm

Personally, I felt quite a few of my CS classes were *too* focused on abstractions — why a language should be this way or that, and how languages implement lambda calculus — and didn’t give enough training on how to actually get stuff done. There’s a lot of conflict in CS between the theory folks and the technicians.

- Auros
  
  August 8, 2014 at 10:58 pm
  
  Also… My specialty was linguistics, and I was taking it at the school where that discipline was presided over by Frederick Jelinek, the person who invented statistical speech recognition. That may be one of the best fields to drop somebody into to teach them the limitations of the quick-and-dirty approach you’re talking about. Go too fast, and you end up over-fitting your training data (so you fail when presented with novel test data), or fall prey to a variety of other classic blunders. With text-based problems, like Google’s search and Facebook’s various newsfeed algorithms, “failure” is kind of invisible unless it’s really catastrophic. With speech rec, even if you’re really pretty good, you’ll get regularly reminded of your imperfections:
  
  http://krugman.blogs.nytimes.com/2011/02/05/voice-recognition-still-imperfect/
  
  http://krugman.blogs.nytimes.com/2013/11/01/speech-recognition-still-imperfect/
  
Bill

August 9, 2014 at 10:08 am

A real CS department spends time on automata, algorithms, language concepts, etc. What you are describing is a diploma mill to churn out programmers. Now many people who enter CS departments with “programming experience” believe that what you describe is all that matters and complain bitterly about having to learn abstract concepts or do any kind of analysis. They avoid as much as possible taking courses that aren’t about teaching them the latest programming language/graphics toolkit/etc. They also avoid taking any software engineering classes because they’ve never worked on anything which was all that complicated or had to be maintained for years. Good departments refuse to let them get away with this.

While I’m sure that the journalists that you’ve run into at Columbia are the paragons of virtue that you describe, from what I’ve seen actually delivered as journalism in the USA, I strongly doubt that many members of the profession even come close to what you describe on a daily basis.

In the end, both journalists and programmers are under deadline pressure, which makes it difficult for even those who know the right way to do things to take the time to do so. Unfortunately, I’m afraid I can’t think of any way to change the situation.
