I recently devoured Julia Angwin’s new book Dragnet Nation: A Quest for Privacy, Security, and Freedom in a World of Relentless Surveillance. I actually met Julia a few months ago and talked to her briefly about her upcoming book when I visited the ProPublica office downtown, so it was an extra treat to finally get my hands on the book.
First off, let me just say this is an important book, and it provides a crucial, well-described view into the private data behind the models I get so worried about. After reading it you have a good idea of the data landscape, as well as of the many things that can currently go wrong for you personally with the associated loss of privacy. For that reason alone I think this book should be widely read. It’s informational.
Julia takes us along her journey of trying to stay off the grid, and for me the most fascinating parts are her “data audit” (Chapter 6), where she tries to figure out what data about her is out there and who has it, and the attempts she makes to clean the web of her data and generally speaking “opt out”, which starts in Chapter 7 but extends beyond that when she makes the decision to get off of gmail and LinkedIn. Spoiler alert: her attempts do not succeed.
From the get-go Julia is not a perfectionist, which is a relief. She’s a working mother with a web presence, and she doesn’t want to live in paranoid fear of being tracked. Rather, she wants to make the trackers work harder. She doesn’t want to hand herself over to them on a silver platter. Even that is already very, very hard.
In fact, she goes pretty far, and pays for quite a few different esoteric privacy services; along the way she explores questions like how you decide to trust the weird people who offer those services. At some point she finds herself with two phones – including a “burner”, which made me think she was a character in House of Cards – and one of them was wrapped up in tin foil to avoid the GPS tracking. That was a bit far for me.
Early on in the book she compares the tracking of a present-day U.S. citizen with surveillance in East Germany, and she makes the point that the Stasi would have been amazed by all this technology.
Very true, but here’s the thing. The culture of fear was very different then, and although there’s all this data out there, important distinctions need to be made: both what the data is used for and the extent to which people feel threatened by that usage are very different now.
Julia brought these distinctions up as well, and quoted sci-fi writer David Brin: the key question is, who has access? And what do they do with it?
Probably the most interesting moment in the book was when she described the so-called “Wiretapper’s Ball”, a private conference of private companies selling surveillance hardware and software to governments to track their citizens. Like maybe the Ukrainian government used such stuff when they texted warning messages to protesters.
She quoted the Wiretapper’s Ball organizer Jerry Lucas as saying “We don’t really get into asking, ‘Is it in the public’s interest?’”
That’s the closest the book got to what I consider the critical question: to what extent is the public’s interest being pursued, if at all, by all of these data trackers and data miners?
And if the answer is “to no extent, by anyone,” what does that mean in the longer term? Julia doesn’t go much into this from an aggregate viewpoint, since her perspective is both individual and current.
At the end of the book, she makes a few interesting remarks. First, it’s just too much work to stay off the grid, and moreover privacy protection has become entirely commoditized. In other words, you have to be either incredibly sophisticated or incredibly rich to get this done, at least right now. My guess is that, in the future, it will be more about the latter category: privacy will be enjoyed only by those people who can afford it.
Julia also mentions near the end that, even though she didn’t want to get super paranoid, she found herself increasingly inside a world based on fear and well on her way to becoming a “data survivalist,” which didn’t sound pleasant. It is not a lot of fun to be the only person caring about the tracking in a world of blithe acceptance.
Julia also had some ways of measuring a tracking system, or “dragnet” as she calls it, which seem to me a good place to start.
Yesterday an exciting ProPublica article entitled Machine Bias came out. Written by Julia Angwin, author of Dragnet Nation, and Jeff Larson, data journalist extraordinaire, the piece explains in human terms what it looks like when algorithms are biased.
Specifically, they looked into a class of models I featured in my upcoming book, Weapons of Math Destruction, called “recidivism risk” scoring models. These models score criminal defendants, and the scores are given to judges to help them decide, for example, how long a prison sentence to impose. Higher recidivism scores are supposed to correlate with a higher likelihood of returning to prison, and people who have been assigned high scores also tend to get sentenced to longer prison terms.
What They Found
Angwin and Larson studied the recidivism risk model called COMPAS. Starting with COMPAS scores for 10,000 criminal defendants in Broward County, Florida, they looked at the difference between who was predicted to get rearrested by COMPAS versus who actually did. This was a direct test of the accuracy of the risk model. The highlights of their results:
- Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent).
- White defendants were often predicted to be less risky than they were. Our analysis found that white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
- The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants.
- Black defendants were also twice as likely as white defendants to be misclassified as being a higher risk of violent recidivism. And white violent recidivists were 63 percent more likely to have been misclassified as a low risk of violent recidivism, compared with black violent recidivists.
- The violent recidivism analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 77 percent more likely to be assigned higher risk scores than white defendants.
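The core of this kind of audit is simply comparing error rates across groups. Here is a minimal sketch in Python of the false-positive / false-negative comparison described in the bullets above. To be clear, this is not ProPublica’s actual code, and the toy data and function names are my own illustration:

```python
from collections import defaultdict

def error_rates(records):
    """Given (group, predicted_high_risk, reoffended) tuples, return
    per-group false positive and false negative rates.

    False positive: labeled high risk but did NOT reoffend.
    False negative: labeled low risk but DID reoffend.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted_high, reoffended in records:
        c = counts[group]
        if reoffended:
            c["pos"] += 1          # actual recidivists
            if not predicted_high:
                c["fn"] += 1       # missed by the model
        else:
            c["neg"] += 1          # non-recidivists
            if predicted_high:
                c["fp"] += 1       # wrongly flagged as high risk
    return {
        g: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else 0.0,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else 0.0,
        }
        for g, c in counts.items()
    }

# Made-up toy data: (group, labeled high risk?, reoffended within 2 years?)
toy = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", False, False), ("B", False, True), ("B", False, True), ("B", True, True),
]
rates = error_rates(toy)
```

With these made-up numbers, group A gets a high false positive rate and group B a high false negative rate, which is exactly the asymmetry ProPublica reported: one group disproportionately flagged when it shouldn’t be, the other disproportionately excused when it shouldn’t be.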
Here’s one of their charts (lower scores mean lower risk).
How They Found It
ProPublica is awesome and has the highest standards in data journalism. Which is to say, they published their methodology, including a description of the (paltry) history of other studies that looked into racial differences for recidivism risk scoring methods. They even have the data and the IPython notebook they used for their analysis on GitHub.
They made heavy use of the open records law in Florida to do their research, including the original scores, the subsequent arrest records, and the classification of each person’s race. That data allowed them to build their analysis. They tracked both “recidivism” and “violent recidivism” and tracked both the original scores and the error rates. Take a look.
How Important Is This?
This is a triumph for the community of people (like me!) who have been worrying about exactly this kind of thing but who haven’t had hard proof until now. In my book I made multiple arguments for why we should expect this exact result for recidivism risk models, but I didn’t have a report to point to. So, in that sense, it’s extremely useful.
More broadly, it sets the standard for how to do this analysis. The transparency involved is hugely important, because nobody will be able to say they don’t know how these statistics were computed. They are basic questions by which every recidivism risk model should be measured.
Until now, recidivism risk models have been deployed naively, in judicial systems all across the country, and judges in those systems have been presented with such scores as if they are inherently “fair.”
But now, people deploying these models – and by people I mostly mean Department of Corrections decision-makers – will have pressure to make sure the models are audited for racism before using them. And they can do this kind of analysis in-house with much less work. I hope they do.
I have been told by my editor to take a look at the books already out there on big data to make sure my book hasn’t already been written. For example, today I’m set to read Robert Scheer’s They Know Everything About You: How Data-Collecting Corporations and Snooping Government Agencies Are Destroying Democracy.
This book, like others I’ve already read and written about (Bruce Schneier’s Data and Goliath, Frank Pasquale’s Black Box Society, and Julia Angwin’s Dragnet Nation), is primarily concerned with individual freedom and privacy, whereas my book is primarily concerned with social justice issues, and each chapter gives an example of how big data is being used as a tool against the poor, against minorities, against the mentally ill, or against public school teachers.
Not that my book is entirely different from the above books, but the relationship is something like what I spelled out last week when I discussed the four political camps in the big data world. So far the books I’ve found are focused on the corporate angle or the privacy angle. There may also be books focused on the open data angle, but I’m guessing they have even less in common with my book, which focuses on the ways big data increases inequality and further alienates already alienated populations.
If any of you know of a book I should be looking at, please tell me!
Last Friday I was honored to be part of a super interesting and provocative conference at UC Berkeley’s Law School called Open Data: Addressing Privacy, Security, and Civil Rights Challenges.
What I loved about this conference is that it explicitly set out to talk across boundaries of the data world. That’s unusual.
Broadly speaking, there are four camps in the “big data” world:
- The corporate big data camp. This involves the perspective that we use data to know our customers, tailor our products to their wants and needs, and, generally speaking, keep our data secret so as to maximize profits. The other side of this camp is the public, seen as consumers.
- The security crowd. These are people like Bruce Schneier, whose book I recently read. They worry about individual freedom and liberty, and how mass surveillance and dragnets are degrading our existence. I have a lot of sympathy for their view, although their focus is not mine. The other side of this camp is the NSA, on the one hand, and hackers, on the other, who exploit weak data and privacy protections.
- The open data crowd. The people involved with this movement are split into two groups. The first consists of activists like Aaron Swartz and Carl Malamud, whose basic goal is to make publicly available things that theoretically, and often by law, should be publicly available, like court proceedings and scientific research, and the Sunlight Foundation, which focuses on data about politics. The second group of “open data” folks come from government itself, and are constantly espousing the win-win-win aspects of opening up data: win for companies, who make more profit, win for citizens, who have access to more and better information, and win for government, which benefits from more informed citizenry and civic apps. The other side of this camp is often security folks, who point out how much personal information often leaks through the cracks of open data.
- Finally, the camp I’m in, which is either the “big data and civil rights” crowd, or more broadly the people who worry about how this avalanche of big data is affecting the daily lives of citizens, not only when we are targeted by the NSA or by someone stealing our credit cards, but when we are born poor versus rich, and so on. The other side of this camp is represented by the big data brokers who sell information and profiles about everyone in the country, and sometimes the open data folks who give out data about citizens that can be used against them.
The thing is, all of these camps have their various interests, and can make good arguments for them. Even more importantly, they each have their own definition of the risks, as well as the probability of those risks.
For example, I care about hackers and people unreasonably tracked and targeted by the NSA, but I don’t think about that nearly as much as I think about how easy it is for poor people to be targeted by scam operations when they google for “how do I get food stamps”. As another example, when I saw Carl Malamud talk the other day, he obviously puts some attention into having social security numbers of individuals protected when he opens up court records, but it’s not obvious that he cares as much about that issue as someone who is a real privacy advocate would.
Anyway, we didn’t come to many conclusions in one day, but it was great for us all to be in one room and start the difficult conversation. To be fair, the “corporate big data camp” was not represented in that room as far as I know, but that’s because they’re too busy lobbying for a continuation of little to no regulation in Washington.
And given that we all have different worries, we also have different suggestions for how to address those worries; there is no one ideal regulation that will fix everything, and for that matter some people involved don’t believe that government regulations can ever work, and that we need citizen involvement above all, especially when it comes to big data in politics. A mishmash, in other words, but still an important conversation to begin.
I’d like it to continue! I’d like to see some public debates between different representatives of these groups.
I’m on record complaining about how journalists dumb down stories in blind pursuit of “naming the victim” or otherwise putting a picture on the story.
But then again, sometimes that’s exactly what you need to do, especially when the story is super complicated. Case in point: the Snowden revelations story.
In the past two weeks I’ve seen the Academy Award-winning feature-length film CitizenFour, I’ve read Bruce Schneier’s recent book, Data and Goliath: The Hidden Battles To Collect Your Data And Control Your World, and finally I watched John Oliver’s recent Snowden episode.
They were all great in their own way. I liked Schneier’s book: it was a quick read, and I’d recommend it to people who want to know more than Oliver’s interview shows us. He’s very, very smart, incredibly well informed, and almost completely reasonable (unlike this review).
To be honest, though, when I recommend something to other people, I pick John Oliver’s approach; he cleverly puts the dick pic on the story (in the video, you have to reset it to the beginning).
Here’s the thing that I absolutely love about Oliver’s interview. He’s not absolutely smitten by Snowden, but he recognizes Snowden’s goal, and makes it absolutely clear what it means to people using the handy use case of how nude pictures get captured in the NSA dragnets. It is really brilliant.
Compared to Schneier’s book, Oliver is obviously not as informational. Schneier is a world-wide expert on security, and gives us real details on which governmental programs know what and how. But honestly, unless you’re interested in becoming a security expert, that isn’t so important. I’m a tech nerd and even for me the details were sometimes overwhelming.
Here’s what I want to concentrate on. In the last part of the book, Schneier suggests all sorts of ways that people can protect their own privacy, using all sorts of encryption tools and so on. He frames it as a form of protest, but it seems like a LOT of work to me.
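To give a flavor of why do-it-yourself encryption feels like so much work, here is a toy sketch of a one-time pad. This is my own illustration, not anything from Schneier’s book: notice that every message needs its own random key, exactly as long as the message, which you then have to store and share securely — and that key-management burden is the real cost:

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """One-time pad: XOR the message with a fresh random key of equal length.
    Returns (key, ciphertext); the key must be kept secret and never reused."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR again with the same key to recover the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ct = otp_encrypt(b"meet me at noon")
recovered = otp_decrypt(key, ct)
```

The cipher itself is three lines; everything hard about practical privacy tools (key storage, key exchange, never reusing a key) lives outside the code, which is exactly why it’s a form of protest that most people won’t sustain.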
Compare that to my favorite part of the Oliver interview, when Oliver asks Snowden (starting at minute 30:28 in the above interview) if we should “just stop taking dick pics.” Snowden’s answer is no: changing what we normally do because of surveillance is a loss of liberty, even if it’s dumb.
I agree, which is why I’m not going to stop blabbing my mouth off everywhere (I don’t actually send naked pictures of myself to people, I think that’s a generational thing).
One last thing I can’t resist saying, and which Schneier discusses at length: almost every piece of data collected about us by our government is more or less for sale anyway. Just think about that. It is more meaningful for people worried about large scale discrimination, like me, than it is for people worried about case-by-case pinpointed governmental acts of power and suppression.
Or, put it this way: when we are up in arms about the government having our dick pics, we forget that so do our phones, and so do Facebook and Snapchat, not to mention all the backups in the cloud somewhere.