Here at the Lede Program we’ve been getting lots of different perspectives on what data journalism is and what it could be. As usual I will oversimplify for the sake of clarity, and apologies in advance to anyone I might offend.
The old school version of data journalism, which is called computer assisted reporting, maintains that a data story is first and foremost a story and should be viewed as such: you are investigating and interrogating the data as you would a witness, and the data isn’t itself a story but rather a way of gathering evidence for the claims posed in the story. Every number cited needs to be independently supported with a secondary source.
Really important journalism lives in this context and is supported by the data, and the journalists in this realm are FOIA experts and speak truth to power in an exciting way. Think leaks and whistleblowers.
The new school vision of data journalism – again, entirely oversimplified – is about creating interesting data interactives that allow people to see how the news affects them, whether that means a map of “stuff happening” where they can see the stuff happening near them, or a big dataset that people can interact with in a tailored way, or a jury duty quiz that allows people to see how answers might get them kicked off or kept on a jury.
I imagine that some of these new-fangled approaches don’t even seem like stories at all to the old-school journalists, who want to see a bad guy caught, or a straight-up story told with a twist and a surprise and a “human face”. I’m not sure many of them would even get past the pitch stage if proffered to a curmudgeonly editor (and all editors are curmudgeonly, that’s just a fact).
The new interactive stories do not tell one story. Instead, they tell a bunch of stories to a bunch of people, and that interaction itself becomes the story. They also educate the public in a somewhat untamed way: by interacting with a database a reader can see variations in time, or in space, or in demographic, at least if the data is presented carefully.
Similarly, by seeing how each question on a jury duty quiz nudges you towards the plaintiff or the defendant, you can begin to see how seemingly innocuous information collected about you accumulates, which is how profiles are formed, on and offline.
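To make that accumulation concrete, here is a minimal sketch of how a quiz like that might keep score. The questions and the nudge values are invented purely for illustration (they are not taken from any real quiz or from actual jury selection practice); the only point is that each small answer shifts a running total.

```python
from itertools import accumulate

# Hypothetical answers: a positive nudge leans toward the plaintiff,
# a negative nudge leans toward the defendant. All values are made up.
answers = [
    ("Have you ever been sued?", -1),
    ("Have you ever filed an insurance claim?", +1),
    ("Do you manage employees?", -2),
    ("Have you been injured at work?", +2),
    ("Do you own a business?", -1),
]

def running_lean(answers):
    """Return the running total of nudges after each answer."""
    return list(accumulate(nudge for _, nudge in answers))

def label(score):
    """Turn a numeric lean into a human-readable label."""
    if score > 0:
        return "leans plaintiff"
    if score < 0:
        return "leans defendant"
    return "neutral"

profile = running_lean(answers)
for (question, _), score in zip(answers, profile):
    print(f"{question:45s} running lean: {score:+d} ({label(score)})")
```

No single answer decides anything here, but the running total still ends up somewhere, which is the sense in which a profile gets built one innocuous question at a time.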
I am super excited to announce that best-selling British author Nafeez Ahmed will be speaking at the Alt Banking group this Sunday. The title of his talk is Mass Surveillance and the Crisis of Civilization: The inevitable collapse of the old paradigm and the potential for the rise of the new.
Ahmed is an international security scholar and investigative journalist and executive director of the Institute for Policy Research & Development. He writes for The Guardian on the geopolitics of interconnected environmental, energy and economic crises, and is currently on tour in the United States to launch his science fiction novel, Zero Point.
As advance reading for this talk, we recommend browsing through his Guardian articles, including the widely read June 2014 piece, Pentagon preparing for mass civil breakdown. He’s also recently published on occupy.com an article entitled Exposed: Pentagon Funds New Data-Mining Tools To Track and Kill Activists, Part I.
Details: Ahmed will speak from 2-3pm on Sunday, August 24th, in room 409 of the International Affairs Building of Columbia University at W. 118th Street and Amsterdam Ave. After that we will have our regular meeting from 3-5pm in the same room, followed by food and drinks at Amsterdam Tapas. Please join us! And if you can’t this weekend but want to be on our mailing list, please email that request to email@example.com.
I was away over the weekend (apologies to Aunt Pythia fans!) and super busy yesterday, but this morning I finally had a chance to read Ethan Zuckerman’s Atlantic piece entitled The Internet’s Original Sin, which was sent to me by my friend Ernest Davis.
Here’s the thing, Zuckerman gets lots of things right in the article. Most importantly, the inherent conflict between privacy and the advertisement-based economy of the internet:
Demonstrating that you’re going to target more and better than Facebook requires moving deeper into the world of surveillance—tracking users’ mobile devices as they move through the physical world, assembling more complex user profiles by trading information between data brokers.
Once we’ve assumed that advertising is the default model to support the Internet, the next step is obvious: We need more data so we can make our targeted ads appear to be more effective.
This is well said, and important to understand.
Here’s where Zuckerman goes a little too far in my opinion:
Outrage over experimental manipulation of these profiles by social networks and dating companies has led to heated debates amongst the technologically savvy, but hasn’t shrunk the user bases of these services, as users now accept that this sort of manipulation is an integral part of the online experience.
It is a mistake to assume that “users accept this sort of manipulation” because not everyone has stopped using Facebook. Facebook is, after all, an hours-long daily habit for an enormous number of people, and it’s therefore sticky. People don’t give up addictive habits overnight. But it doesn’t mean they are feeling the same way about Facebook that they did 4 years ago. People are adjusting their opinion of the user experience as that user experience is increasingly manipulated and creepy.
An analogy should be drawn to something like smoking, where the rates have gone way down since we all found out it is bad for you. People stopped smoking even though it is really hard for most people (and impossible for some).
We should instead be thinking longer term about what people will be willing to leave Facebook for. What is the social networking model of the future? What kind of minimum privacy protections will convince people they are safe (enough)?
And, most importantly, will we even have reasonable minimum protections, or will privacy be entirely commoditized, whereby only premium pay members will be protected, while the rest of us will be thrown to the dogs?
“Data Science” is one of my least favorite tech buzzwords, second probably to “Big Data”, which in my opinion should always be printed followed by a winky face (after all, my data is bigger than yours). It’s mostly a marketing ploy used by companies to attract talented scientists, statisticians, and mathematicians, who, at the end of the day, will probably be working on some sort of advertising problem or other.
Still, you have to admit, it does have a nice ring to it. Thus the title Democratizing Data Science, a vision paper which I co-authored with two cool Ph.D students at MIT CSAIL, William Li and Ramesh Sridharan.
The paper focuses on the latter part of the situation mentioned above. Namely, how can we direct these data scientists, aka scientists who interact with the data pipeline throughout the problem-solving process (whether they be computer scientists or programmers or statisticians or mathematicians in practice) towards problems focused on societal issues?
In the paper, we briefly define Data Science (asking ourselves what the heck it even means), then question what it means to democratize the field, and to what end that may be achieved. In other words, the current applications of Data Science, a new but growing field, in both research and industry, have the potential for great social impact, but in reality, resources are rarely distributed in a way to optimize the social good.
We’ll be presenting the paper at the KDD Conference next Sunday, August 24th at 11am as a highlight talk in the Bloomberg Building, 731 Lexington Avenue, NY, NY. It will be more like an open conversation than a lecture, and audience participation and opinion are very welcome.
The conference on Sunday at Bloomberg is free, although you do need to register. There are three “tracks” going on that morning, “Data Science & Policy”, “Urban Computing”, and “Data Frameworks”. Ours is in the 3rd track. Sign up here!
If you don’t have time to make it, give the paper a skim anyway, because if you’re on Mathbabe’s blog you probably care about some of these things we talk about.
I’ve loved math since I can remember. When I was 5 I played with spirographs and learned about periodicity, which made me understand prime numbers as colorful patterns on a page. I always thought 5-fold symmetry was the most beautiful.
Then I got to college at UC Berkeley and in my second semester was privileged to learn algebra (and later, Galois Theory!) from Ken Ribet, who became my very good friend. He brought me to have dinner with all sorts of amazing mathematicians, like Serge Lang and J.P. Serre and Barry Mazur and John Tate and of course his Berkeley colleagues Hendrik Lenstra and Robert Coleman and many others. Many of the main characters behind the story of solving Fermat’s Last Theorem were people I had met at dinner parties at Ken’s house, including of course Ken himself. Math was discussed in between slices of Cheese Board Pizza and fresh salad mixes from the Berkeley Bowl.
How lucky was I?!?
And I knew it, at least partially. Really the best thing about these generous and wonderful people was how joyful they were about the serious business of doing math. It was a pleasure to them, and it made them smile and even appear wistful if I’d mention my difficulties with tensor products, say.
They were incredibly inviting to me, and honestly I was spoiled. I had been invited into this society because I loved math and because I was devoting myself to it, and that was enough for them. Math is, after all, not an individual act, it is a community effort, and progress is to be celebrated and adored. And it wasn’t just any community, it was a really really nice group of guys who loved what they did for a living and wanted other cool and smart people to join.
I mention all this because I want to clarify how fucking cool it can be to be a mathematician, and what kind of group involvement and effort it can feel like, even though many of the final touches on the proofs are made inside closed offices. Being part of such a community, where math is so revered and celebrated, it is its own reward to be able to prove a theorem and tell your friends about it.
Hey, guess what? This is true too! We always suspected it but now we can use it! How cool is that?
Now that I’ve explained how much I love math (and I still love math very much), let me explain why I hate the Fields Medal. Namely, because that group effort is utterly lost and is replaced with a synthetic and false myth of the individual genius working in isolation.
Here’s the thing, and I can say this now pretty confidently, journalism has rules about writing stories that don’t really work for math. When journalists are told to “put a face on the story,” they end up with all face and no story.
How else is a journalist going to write about progress in some esoteric field? The mathematics itself is naturally not within arm’s reach: mathematics is by nature deep and uses multiple layers of metaphor and notation which even trained mathematicians grapple with, never mind a new result on the very far edge of what is known. So it makes sense that the story becomes about the mathematician himself or herself.
It’s not just journalists, though. Certain mathematicians do their best to represent research mathematics, and sometimes it’s awesome, sometimes it kind of works, and sometimes it ends up being laughably or even embarrassingly simplistic. That’s the thing about math, it’s deep. It’s hard to boil down to a nut graf.
So here’s the thing, the Fields Medal is easy to understand (“it’s the Nobel Prize for math!”) but it’s incredibly and dangerously misleading. It gives the impression that we have these superstars who “have it” and then we have a bunch of wandering nerds who “don’t really have it.” That stereotype is a bad advertisement for mathematics and for mathematicians, who are actually much more generous and community-spirited than that.
Plus, now that I’m in full rant mode, can I just mention that the 40-year-old age limit for the award is just terrible and obviously works against certain people, especially women or men who take parenting seriously? I am not even going to explain that because it’s so freaking clear, and as a 42-year-old woman myself, may I say I’m just getting started. And yes, the fact that a woman has won the Fields Medal is a good thing, but it’s a silver lining on an otherwise big old rain cloud which I do my best to personally blow away.
And, lest I seem somehow mean to the Fields Medal winners, of course they are great mathematicians! Yes, yes they are! They’re all great, and there are many great mathematicians who never get awards, and doing great math and making progress is its own reward, and those mathematicians who do great work tend to be the ones who already have lots of resources and don’t need more, but I’m not saying they shouldn’t be celebrated, because they’re awesome, no question about it.
Here’s what I’d like to see: serious outward-facing science journalism centered around, or at least instructive towards, the incredible collaborative effort that is modern mathematics.
Everyone I know who codes uses stackoverflow.com for absolutely everything.
Just yesterday I met a cool coding chick who was learning python and pandas (of course!) with the assistance of stackoverflow. It is exactly what you need to get stuff working, and it’s better than having a friend to ask, even a highly knowledgeable friend, because your friend might be busy or might not know the answer, or even if your friend knew the answer her answer isn’t cut-and-paste-able.
If you are someone who has never used stackoverflow for help, then let me explain how it works. Say you want to know how to load a JSON file into python but you don’t want to write a script for that because you’re pretty sure someone already has. You just search for “import json into python” and you get results with vote counts:
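For what it’s worth, the kind of cut-and-paste-able answer that search tends to turn up looks roughly like the sketch below. It uses only python’s standard `json` module; the helper name `load_json` and the sample file are my own hypothetical choices, there just so the snippet runs on its own.

```python
import json
import os
import tempfile

def load_json(path):
    """Read a JSON file and return the parsed Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Write a tiny hypothetical file first so the example is self-contained.
sample = {"name": "example", "values": [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    loaded = load_json(path)

print(loaded["values"])
```

In real use you would skip the temporary file and just call `load_json` on whatever file you already have.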
Also, every math nerd I know uses and contributes to mathoverflow.net. It’s not just for math facts and questions, either, there are interesting discussions going on there all the time. Here’s an example of a comment in response to understanding the philosophy behind the claimed proof of the ABC Conjecture:
OK well hold on tight because now there’s a new online forum, but not about coding and not about math. It’s about all the other STEM subjects, which since we’ve removed math might need to be called STE subjects, which is not catchy.
So far only statistics is open, but other stuff is coming very soon. Specifically it covers, or soon will cover, the following fields:
- Cognitive Sciences
- Computer Sciences
- Earth and Planetary Sciences
- Science & Math Education
- History of Science and Mathematics
- Applied Mathematics, and
I’m super excited for this site, it has serious potential to make people’s lives better. I wish it had a category for Data Sciences, and for Data Journalism, because I’d probably be more involved in those categories than most of the above, but then again most data science-y questions could be inserted into one of the above. I’ll try to be patient on this one.
Here’s a screen shot of an existing Stats question on the site:
There’s an interesting and horrible New York Times story by Jessica Silver-Greenberg about a PayDay loan syndicate being run out of New York State. The syndicate consists of twelve companies owned by a single dude, Carey Vaughn Brown, with help from a corrupt lawyer and another corrupt COO. Manhattan District Attorneys are charging him and his helpers with usury under New York law.
The complexity of the operation was deliberate and intended to obscure the chain of events that would start with a New Yorker looking for quick cash online and end with a predatory loan. Borrowers would interface with a company called MyCashNow.com, which would immediately pass their application on to a bunch of other companies in different states or overseas.
Important context: in New York, the usury law caps interest rates at 25 percent annually, and these PayDay operations were charging between 350 and 650 percent annually. Also key, the usury laws apply to where the borrower is, not where the lender is, so even though some of the companies were located (at least on paper) in the West Indies, they were still breaking the law.
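To get a feel for what those rates mean in dollars, here is a rough back-of-the-envelope sketch. The 25, 350 and 650 percent figures are the ones above; the $500 loan, the two-week term, and the use of simple interest are my own assumptions for illustration, since real payday loans are usually priced as flat fees and may roll over.

```python
# Annual rates from the story, written as fractions.
NY_CAP_APR = 0.25       # New York usury cap cited above
PAYDAY_APR_LOW = 3.50   # 350 percent
PAYDAY_APR_HIGH = 6.50  # 650 percent

def simple_interest(principal, apr, days):
    """Interest owed on `principal` after `days` at annual rate `apr` (simple interest)."""
    return principal * apr * days / 365

# Hypothetical loan: $500 for two weeks (assumed, not from the story).
principal = 500.0
days = 14

for name, apr in [
    ("NY cap", NY_CAP_APR),
    ("PayDay low", PAYDAY_APR_LOW),
    ("PayDay high", PAYDAY_APR_HIGH),
]:
    cost = simple_interest(principal, apr, days)
    multiple = apr / NY_CAP_APR
    print(f"{name:12s} {apr:.0%} APR: ${cost:.2f} in interest ({multiple:.0f}x the cap)")
```

Under those assumptions the legal maximum works out to under $5 of interest for the two weeks, while the rates in the story come to roughly $67 to $125, or 14 to 26 times the cap.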
They don’t know exactly how big the operation was in New York, but one clue is that in 2012, one of the twelve companies had $50 million in proceeds from New York.
Here’s my question: how did MyCashNow.com advertise? Did it use Google ads, or Facebook ads, or something else, and if so, what were the attributes of the desperate New Yorkers that it looked for to do its predatory work?
One side of this is that vulnerable people were somehow targeted. The other side is that well-off people were not, which meant they didn’t see ads like this, which makes it harder for people like the Manhattan District Attorney to even know about shady operations like this.