August 22, 2014

So, Jeff Larson from ProPublica came yesterday to talk to us at the Lede Program, and damn that guy is awesome.

Jeff Larson, my data journalism hero!

First, he showed us his work with the ProPublica Message Machine, where they first crowdsourced, then reverse engineered Obama’s political targeting algorithm. Turns out they used decision trees for that, so we got to talk about decision trees. But since it was an awesome project important to democracy, we also got to talk about democracy.

After that lengthy discussion, Jeff told us about using clustering algorithms to find interesting emails in foreign languages (and in particular, to sort out the spam). He mentioned both cosine similarity and k-means, which was cool because the Lede students already knew about those, and for a moment the class was like, “hey we can do this!” and it was true.
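For the curious, here's roughly what the cosine similarity part looks like. This is just my own minimal sketch, not Jeff's code: the toy emails are made up, the helper names `bag_of_words` and `cosine_similarity` are mine, and I'm assuming the simplest possible representation (raw word counts). The idea is that two spammy emails share most of their words and so score close to 1, while a spam email and a real one share almost nothing and score close to 0.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a message into word counts (hypothetical helper, lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example messages: two that look like spam, one that does not.
emails = [
    "win a free prize now",
    "free prize win now click",
    "meeting notes from the budget committee",
]
vectors = [bag_of_words(e) for e in emails]

spam_pair = cosine_similarity(vectors[0], vectors[1])   # lots of shared words
mixed_pair = cosine_similarity(vectors[0], vectors[2])  # no shared words
print(spam_pair, mixed_pair)
```

A clustering algorithm like k-means would then group messages whose vectors point in similar directions, which is how the spam ends up in its own pile.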

But just then, he showed us how to bypass captcha pages, at least 90% of the time, using neural networks. He seemed to somehow remain humble whilst explaining that he did this over a lunch break. Then the class was like, “holy shit this guy is a crazy genius!” and that was true too. 

Then Jeff led the entire program downtown to the ProPublica offices and gave us a tour. Some of the other data journalists came in and told us what they were up to, which was super awesome but also top secret, so I can’t tell you anything else about it. Suffice it to say they were all very awesome and only one of them had formal CS training (Jeff was a lit major!), so the day was overall very inspiring and thought-provoking.

