Critical Questions for Big Data by danah boyd & Kate Crawford

July 1, 2014 Cathy O'Neil, mathbabe

I’m teaching a class called The Platform this summer in the Lede Program, starting in mid-July. Here’s the course description:

This course begins with the idea that computing tools are the products of human ingenuity and effort. They are never neutral and carry with them the biases of their designers and their design process. “Platform studies” is a new term used to describe investigations into these relationships between computing technologies and the creative or research products that they help to generate. Understanding how data, code, and algorithms affect creative practices can be an effective first step toward critical thinking about technology. This will not be purely theoretical, however: specific case studies, technologies, and project work will make the ideas concrete.

Since my first class is coming soon, I’m actively thinking about what to talk about and which readings to assign. I’ve got wonderful guest lecturers coming, and for the most part the class will focus on them and their topics, but for the first class I want to give the students an overview of a very large subject.

I’ve decided that danah boyd and Kate Crawford’s recent article, Critical Questions for Big Data, is pretty much perfect for this goal. I’ve read and written a lot about big data, but even so I’m impressed by how clearly and comprehensively they have laid out their provocations. And although I’ve heard many of the ideas and examples before, some are new to me and directly related to the theme of the class, for example:

Twitter and Facebook are examples of Big Data sources that offer very poor archiving and search functions. Consequently, researchers are much more likely to focus on something in the present or immediate past – tracking reactions to an election, TV finale, or natural disaster – because of the sheer difficulty or impossibility of accessing older data.

Of course, the students in the Lede are journalists, not the academic researchers the article mostly addresses, and they are not necessarily working with big data per se. Even so, they are increasingly working with social media data, and they will probably cover big data even if they don’t directly analyze it. So I think the article is still relevant to them. Put another way, one thing we will attempt in class is to examine the extent to which the authors’ provocations are relevant.

Here’s another gem, directly related to the Facebook experiment I discussed yesterday:

As computational scientists have started engaging in acts of social science, there is a tendency to claim their work as the business of facts and not interpretation. A model may be mathematically sound, an experiment may seem valid, but as soon as a researcher seeks to understand what it means, the process of interpretation has begun. This is not to say that all interpretations are created equal, but rather that not all numbers are neutral.

In fact, between this article and that case study, combined with a discussion of the students’ projects and some related statistical experiments, I’m pretty much set for my first day.

I also hope to invite at least one of the authors to come talk to the class, although I know they are both incredibly busy. The first author, danah boyd, recently published a book called It’s Complicated: The Social Lives of Networked Teens, and also runs the Data & Society Research Institute, a NYC-based think/do tank focused on social, cultural, and ethical issues arising from data-centric technological development. I’m hoping she comes and talks about the work she’s starting up there.

Categories: data journalism, data science

Comments (7)

Thomas Karnofsky

July 1, 2014 at 11:11 am

In case you didn’t catch this post at NC, it has some remarkable, informed comments on how easily attainable sets of data (if you have the right, incredibly expensive equipment) are affecting the conduct of biomedical research. I have a vision of unemployed PhDs giving up on “real” careers, setting up shoestring labs in their basements, and making great discoveries. Of course, who’s got a basement these days?

http://www.nakedcapitalism.com/2014/06/crapification-of-biomedical-research.html

- Cathy O'Neil, mathbabe
  
  July 1, 2014 at 11:13 am
  
  thanks!
  
Arthur Wilke

July 1, 2014 at 12:09 pm

The Kramer et al. Facebook “experiment” poses a problem for researchers at institutions (e.g., universities), organizations, and individuals receiving federal money: Facebook user agreements do not appear, as this item (http://laboratorium.net/archive/2014/06/28/as_flies_to_wanton_boys) suggests, to meet the human subjects review standards of federally mandated Institutional Review Boards. In some cases journalists might highlight the different rules that apply, while periodically reminding readers of experiments of this kind and of the trend in social media whereby the customer is a commodity, not an independent user of a public service provider. Meanwhile, the “upset” some have expressed about the experiment may simply be part of a market response or an unintended consequence. That would be a topic of interest for conventional social researchers.

For providers such as Facebook the lesson may be to continue to do experiments, include attorneys in order to enjoy lawyer-client privilege and simply use the results but not publish them: definitely a challenge for investigative journalists.

abekohen

July 1, 2014 at 12:59 pm

Might I suggest: women’s health. The frequency of pelvic exams and mammograms, and the use of evidence-based medicine. What does the data show? Is there data? What are the ethical implications? The ethics of cost vs. undiagnosed diseases (and potential death).

Arthur Wilke

July 1, 2014 at 1:43 pm

Apparently Facebook was aware of some of the human subjects research problems with its “experiment”: http://pando.com/2014/07/01/facebook-didnt-update-its-data-policy-until-after-the-emotion-study-making-the-whole-thing-even-less-defensible/

Arthur Wilke

July 2, 2014 at 2:33 pm

Coauthor and Facebook employee Adam Kramer defended the Facebook experiment as part of an effort to improve service to Facebook users (http://abcnews.go.com/blogs/headlines/2014/06/author-defends-study-that-experiments-with-facebook-users-emotions/), but he is quoted as saying that the benefits of the research “may not have justified all of this anxiety” (http://www.sfgate.com/opinion/editorials/article/Facebook-shouldn-t-experiment-with-users-feeds-5591383.php). Meanwhile PANDO (http://pando.com/2014/07/02/european-agencies-start-questioning-facebook-about-its-emotion-study/) reports possible concerns about the experiment from European agencies.

The responses to the experiment, their distribution, and the eventual shelf-life of this story, compared, for example, with that of the publication of some of Edward Snowden’s trove of NSA documents, may be a more challenging question than whether or not mood can be tweaked. Both Facebook and the NSA, however, have vested interests in secrecy. Tentatively, as to matters of importance to citizens rather than “friends,” the Snowden revelations look more important, but domestically they are possibly not as provocative as the revelation of the Facebook experiment. Researcher Kramer’s concern about the well-being of psyches may be a clue to the dominant framing in the political economy of information. For journalists, familiarity with the “insider’s” perspective and tools is valuable, and if exercised, even as critique or criticism, it is an assist for those of us who rely on their reporting.

LBB

July 11, 2014 at 7:19 am

Consider “Dataclysm,” a book by Christian Rudder.
