Archive for the ‘feedback loop’ Category

Looking for big data reading suggestions

I have been told by my editor to take a look at the books already out there on big data to make sure my book hasn’t already been written. For example, today I’m set to read Robert Scheer’s They Know Everything About You: How Data-Collecting Corporations and Snooping Government Agencies Are Destroying Democracy.

This book, like others I’ve already read and written about (Bruce Schneier’s Data and Goliath, Frank Pasquale’s Black Box Society, and Julia Angwin’s Dragnet Nation), is primarily concerned with individual freedom and privacy, whereas my book is primarily concerned with social justice issues; each chapter gives an example of how big data is being used as a tool against the poor, against minorities, against the mentally ill, or against public school teachers.

Not that my book is entirely different from the above books, but the relationship is something like what I spelled out last week when I discussed the four political camps in the big data world. So far the books I’ve found are focused on the corporate angle or the privacy angle. There may also be books focused on the open data angle, but I’m guessing they have even less in common with my book, which focuses on the ways big data increases inequality and further alienates already alienated populations.

If any of you know of a book I should be looking at, please tell me!

Driving While Black in the Bronx

This is the story of Q, a black man living in the Bronx, who kindly allowed me to interview him about his recent experience. The audio recording of my interview with him is available below as well.

Q was stopped by two rookie officers on foot while driving a new car in the Bronx, the fourth time he had been stopped that week. The officers told Q to “give me your fucking license,” and Q refused to produce his license, objecting to the tone of the request. When Q asked why he had been stopped, the officer said it was because of his tinted back windows, even though many other cars on the same block, including some right next to him, had similarly tinted windows. Q decided to start recording the interaction on his phone after one of the cops used the n-word.

After a while seven cop cars came to the scene, and eventually a more polite policeman asked Q to produce his license, which he did. They brought him in, claiming they had a warrant for him. Q knew he didn’t actually have a warrant, but when he asked, they said it was a warrant for littering. It sounded like an excuse to arrest him because Q was arguing. He recorded them saying, “We should just lock this black guy up.”

They brought him to the precinct, where Q asked for a phone call. He needed to unlock his phone to get the phone number, and when he did, the policeman took his phone and ran out of the room. Q later found out his recordings had been deleted.

After a while he was assigned a legal aid lawyer to go before a judge. Q asked the lawyer why he was locked up. She said there was no warrant on his record and that he’d been locked up for disorderly conduct. This was the third charge he’d heard about.

He had given up his car keys, his cell phone, his money, his watch and his house keys, all in different packages. When he went back to pick up his property while his white friend waited in the car, the people inside the office claimed they couldn’t find anything except his cell phone. They told him to come back at 9pm when the arresting officer would come in. Then Q’s white friend came in, and after Q explained the situation to him in front of the people working there, they suddenly found all of his possessions. Q thinks they assumed his friend was a lawyer because he was white and well dressed.

They took the starter plug out of his car as well, and he got his cell phone back with no videos. The ordeal lasted 12 hours altogether.

“The sad thing about it,” Q said, “is that it happens every single day. If you’re wearing a suit and tie it’s different, but when you’re wearing something fitted and some jeans, you’re treated as a criminal. It’s sad that people have to go through this on a daily basis, for what?”

Here’s the raw audio file of my interview with Q:


Fingers crossed – book coming out next May

As it turns out, it takes a while to write a book, and then another few months to publish it.

I’m very excited today to announce that my book, tentatively entitled Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, will be published in May 2016, in time to appear on summer reading lists and well before the election.

Fuck yeah! I’m so excited.

p.s. Fight for 15 is happening now.

Fairness, accountability, and transparency in big data models

As I wrote about already, last Friday I attended a one-day workshop in Montreal called FATML: Fairness, Accountability, and Transparency in Machine Learning. It was part of the NIPS machine learning conference, and there were tons of nerds there, and I mean tons. I wanted to give a report on the day, as well as some observations.

First of all, I am super excited that this workshop happened at all. When I left my job at Intent Media in 2011 with the intention of studying these questions and eventually writing a book about them, they were, as far as I know, on nobody else’s radar. Now, thanks to the organizers Solon and Moritz, there are communities of people from law, computer science, and policy circles coming together to exchange ideas and strategies to tackle the problems. This is what progress feels like!

OK, so on to what the day contained and my copious comments.

Hannah Wallach

Sadly, I missed the first two talks, and the introduction to the day, because of two airplane cancellations (boo American Airlines!). I arrived in the middle of Hannah Wallach’s talk, the abstract of which is located here. Her talk was interesting, and I liked her idea of having social scientists partner with data scientists and machine learning specialists. I do want to mention, though, that although there’s a remarkable history of social scientists working within tech companies (at Bell Labs and Microsoft, say), we don’t see that in finance at all, nor does it seem poised to happen. In other words, we certainly can’t count on social scientists to be on hand when important mathematical models are getting ready for production.

Also, I liked Hannah’s three categories of models: predictive, explanatory, and exploratory. Even though I don’t think a given model necessarily falls neatly into one category or another, they still give us a way to think about what we do when we make models. As an example, we think of recommendation models as ultimately predictive, but they are (often) predicated on the ability to understand people’s desires as made up of distinct and consistent dimensions of personality (like when we use PCA or something equivalent). In this sense we are also exploring how to model human desire and consistency. For that matter, I guess you could say any model is at heart an exploration into whether the underlying toy model makes any sense, but that question is dramatically less interesting when you’re using linear regression.
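To make the PCA point concrete, here’s a tiny sketch of my own (not anything from Hannah’s talk), where a made-up user-by-item ratings matrix gets compressed into two latent “taste” dimensions. Reconstructing the matrix is the predictive use; staring at the factors is the exploratory question of whether taste really decomposes that way.

```python
# A minimal sketch (my illustration, not from the talk): recommendation as a
# search for latent "taste" dimensions, via truncated SVD / PCA-style factoring.
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical user-by-item ratings matrix (0 means unrated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 1],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)

# Compress users onto two latent dimensions of "taste".
svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(ratings)   # shape (n_users, 2)
item_factors = svd.components_              # shape (2, n_items)

# Reconstructing the matrix fills in the unrated cells (the predictive use),
# while inspecting the factors asks whether desire really decomposes into a
# few consistent dimensions (the exploratory use).
predicted = user_factors @ item_factors
print(np.round(predicted, 1))
```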

Anupam Datta and Michael Tschantz

Next up Michael Tschantz reported on work with Anupam Datta on Google profiles and Google ads. They started with Google’s privacy policy, which I can’t find but which claims you won’t receive ads based on things like your health problems. Starting with a bunch of browsers with no cookies, and thinking of each of them as a fake user, they ran experiments to see what actually happened, both to the ads shown to those fake users and to the Google ad profiles for each of them. They found that, at least sometimes, they did get the “wrong” kind of ad, although whether Google can be blamed or whether the advertiser had broken Google’s rules isn’t clear. They also found that fake “women” and “men” (who did not differ on any other variable, including their searches) were offered drastically different ads related to job searches, with the men being offered way more ads for $200K+ jobs. These turned out to be basically coaching sessions for getting good jobs, so again the advertisers could have decided that men are more willing to pay for such coaching.

An issue I enjoyed talking about was brought up in this talk, namely the question of whether such a finding is entirely evanescent or whether we can call it “real.” Since Google constantly updates its algorithm, and since ad budgets are coming and going, even the same experiment performed an hour later might have different results. In what sense can we then call any such experiment statistically significant, or even persuasive? Also, in real life we don’t have clean browsers, so what happens when we have dirty browsers and we’re logged into Gmail and Facebook? By then there are so many variables it’s hard to say what leads to what, but should that make us stop trying?
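Just to give a flavor of the significance question, here’s a hedged sketch with made-up counts (these are not the Datta–Tschantz numbers): a plain chi-squared test of whether the “male” and “female” fake profiles saw a high-paying-job ad at different rates. A tiny p-value only says the difference in that particular run wasn’t chance; since the auction and budgets shift hour to hour, the next run could look different, which is exactly the evanescence problem.

```python
# Made-up counts, not the actual experimental data: did "male" and "female"
# fake profiles see the high-paying-job ad at significantly different rates?
from scipy.stats import chi2_contingency

#        saw the ad, didn't see it
table = [[1852, 8148],   # hypothetical "male" profiles
         [318,  9682]]   # hypothetical "female" profiles

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.1f}, p = {p_value:.2e}")
# Significance here speaks only to this snapshot of the ad ecosystem.
```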

From my perspective, I’d like to see more research into questions like, of the top 100 advertisers on Google, who saw the majority of the ads? What was the economic, racial, and educational makeup of those users? A similar but different (because of the auction) question would be to reverse-engineer the advertisers’ Google ad targeting methodologies.

Finally, the speakers mentioned a transparency failure on Google’s part. In your advertising profile, for example, you cannot see (and therefore cannot change) your marital status, but advertisers can target you based on that variable.

Sorelle Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian

Next up we had Sorelle talk to us about her work with two guys with enormous names. They think about how to make stuff fair, the heart of the question of this workshop.

First, if we included race in a resume-sorting model, we’d probably see a negative impact because of historical racism. Even if we removed race but included other attributes correlated with race (say, zip code), this effect would remain. And it’s hard to know exactly when we’ve removed the relevant attributes, but one thing these guys did was define that precisely.
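Here’s a tiny synthetic sketch of my own (not from their paper) of why dropping the protected attribute doesn’t help when a proxy stays in: a made-up “zip code” variable that lines up with race 90% of the time lets a model recover race almost perfectly.

```python
# Synthetic illustration (mine, not theirs): a proxy correlated with race
# carries most of the information that "removing race" was supposed to remove.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
race = rng.integers(0, 2, n)                                # protected attribute (0/1)
zip_group = np.where(rng.random(n) < 0.9, race, 1 - race)   # proxy, ~90% aligned with race

# Train a model to predict race from the proxy alone.
X_train, X_test, y_train, y_test = train_test_split(
    zip_group.reshape(-1, 1), race, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("race recovered from the zip proxy alone:", round(clf.score(X_test, y_test), 2))
# prints roughly 0.9: the "removed" attribute is still effectively in the model
```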

Second, say you now have some idea of which categories are given unfair treatment; what can you do? One thing suggested by Sorelle et al. is to first rank people within each category, assigning each person a percentile in their given category, and then to use the “forgetful function” and only consider that percentile. So, if we decided at a math department that we want 40% of graduate students to be women, to achieve this goal with this method we’d rank the men and women independently, and we’d offer enough spots to the top women to meet that quota and, separately, enough spots to the top men to fill the rest. Note that, although it comes from a pretty fancy setting, this is essentially affirmative action. That’s not, in my opinion, an argument against it. It’s in fact yet another argument for it: if we know women are systemically undervalued, we have to fight against that somehow, and this seems like the best and simplest approach.
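Here’s a toy version of that per-group percentile idea as I understood it, with made-up applicant scores (my sketch, not their actual construction): rank each group separately, forget the raw scores, and fill the 40/60 cohort from the top of each group.

```python
# Toy sketch of the "rank within group, then forget the raw score" idea,
# using made-up applicant scores; not the authors' actual construction.
import numpy as np

rng = np.random.default_rng(1)
scores = {"women": np.sort(rng.normal(600, 50, 30))[::-1],   # 30 applicants, best first
          "men":   np.sort(rng.normal(620, 50, 70))[::-1]}   # 70 applicants, best first

cohort_size = 10
quota = {"women": int(0.4 * cohort_size), "men": int(0.6 * cohort_size)}

for group, s in scores.items():
    # each person's percentile *within their own group*; the raw score is
    # "forgotten" once this is computed
    percentiles = 1 - np.arange(len(s)) / len(s)
    admitted = percentiles[:quota[group]]
    print(f"{group}: admit {quota[group]}, down to within-group percentile "
          f"{admitted[-1]:.2f}")
```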

Ed Felten and Josh Kroll

After lunch Ed Felten and Josh Kroll jointly described their work on making algorithms accountable. Basically they suggested a trustworthy, encrypted system of paper trails that would support a given algorithm (it doesn’t really matter which) and create verifiable proofs that the algorithm was used faithfully and fairly in a given situation. Of course, we’d really only consider an algorithm to be used “fairly” if the algorithm itself is fair, but putting that aside, this addresses the question of whether the same algorithm was used for everyone, and things like that. In lawyer speak, this is called “procedural fairness.”

So, for example, if we thought we could, we might want to run the algorithm that determines punishment for drug use through this system, and we might find that the rules are applied differently to different people. This system would catch that kind of problem, at least ideally.
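I won’t try to reproduce their actual cryptography (they use real commitment schemes and zero-knowledge machinery), but the flavor of the paper-trail idea can be faked with a plain hash commitment: publish a digest of the algorithm version and the inputs and output at decision time, and let an auditor check later that the revealed record matches. All the names below are made up.

```python
# Toy flavor only: a bare hash commitment standing in for the real
# cryptographic protocols. The record fields and names are hypothetical.
import hashlib
import json

def commit(record: dict) -> str:
    """Digest published at decision time; the full record is revealed later."""
    blob = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

decision = {
    "algorithm_version": "sentencing-model-v2",   # hypothetical
    "inputs": {"offense": "possession", "priors": 1},
    "output": "probation",
}

digest = commit(decision)
print("published commitment:", digest)

# Later, an auditor re-hashes the revealed record and checks it matches,
# which supports the "same procedure was applied to everyone" kind of claim.
assert commit(decision) == digest
```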

David Robinson and Harlan Yu

Next up we talked to David Robinson and Harlan Yu about their work in Washington, D.C. with policy makers and civil rights groups around machine learning and fairness. These two have been active with civil rights groups and were an important part of both the Podesta Report, which I blogged about here, and the drafting of the Civil Rights Principles of Big Data.

The question of what policy makers understand, and how to communicate with them, came up several times in this discussion. We decided that, to combat the cherry-picked examples we see in Congressional subcommittee meetings, we need to have cherry-picked examples of our own to illustrate what can go wrong. That sounds bad, but put it another way: people respond to stories, especially stories with innocent victims who have been wronged. So we are on the lookout.

Closing panel with Rayid Ghani and Foster Provost

I was on the closing panel with Rayid Ghani and Foster Provost, and we each had a few minutes to speak and then there were lots of questions and fun arguments. To be honest, since I was so in the moment during this panel, and also because I was jonesing for a beer, I can’t remember everything that happened.

As I remember, Foster talked about an algorithm he had created that does its best to “explain” the decisions of a complicated black box algorithm. In real life our algorithms are huge, messy, and uninterpretable, but this algorithm does its part to add interpretability to the outcomes of that huge black box. The example he gave was understanding why a given person’s Facebook “likes” made a black box algorithm predict they were gay: it displays, in order of importance, which likes added the most predictive power to that prediction.
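I don’t know the details of Foster’s method, so here’s only a crude stand-in for the idea, on made-up data: train some black box on synthetic “likes,” then for one person rank their likes by how much the predicted probability drops when each like is removed.

```python
# Crude stand-in (not Foster's actual algorithm): rank one person's "likes"
# by how much removing each one changes a black box's prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_users, n_likes = 2000, 20
likes = rng.integers(0, 2, (n_users, n_likes))
# synthetic label driven mostly by likes 3 and 7, so we know the "answer"
label = (likes[:, 3] + likes[:, 7] + rng.random(n_users) > 1.5).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(likes, label)

person = likes[0].copy()
base = black_box.predict_proba([person])[0, 1]

contributions = {}
for j in np.flatnonzero(person):            # only the likes this person actually has
    perturbed = person.copy()
    perturbed[j] = 0
    contributions[j] = base - black_box.predict_proba([perturbed])[0, 1]

for like, delta in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"like #{like}: removing it moves the prediction by {delta:+.3f}")
```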

[Aside, can anyone explain to me what happens when such an algorithm comes across a person with very few likes? I’ve never understood this very well. I don’t know about you, but I have never “liked” anything on Facebook except my friends’ posts.]

Rayid talked about his work trying to develop a system for teachers to understand which students were at risk of dropping out, and for that system to be fair, and he discussed the extent to which that system could or should be transparent.

Oh yeah, and that reminds me that, after describing my book, we had a pretty great argument about whether credit scoring models should be open source, and what that would mean, and what feedback loops that would engender, and who would benefit.

Altogether a great day, and a fantastic discussion. Thanks again to Solon and Moritz for their work in organizing it.

Educational feedback loops in China and the U.S.

Today I want to discuss a recent review in the New York Review of Books of a new book entitled Who’s Afraid of the Big Bad Dragon? Why China Has the Best (and Worst) Education System in the World by Yong Zhao (hat tip Alex). The review was written by Diane Ravitch, an outspoken critic of No Child Left Behind, Race to the Top, and the Common Core.

You should read the review; it’s well written and convincing, at least to me. I’ve been studying these issues and have devoted a large chunk of my book to these feedback loops as they’ve played out in this country. Here are the steps I see, which are largely reflected in Ravitch’s review:

  1. Politicians get outraged about a growing “achievement gap” (whereby richer or whiter students get better test scores than poorer or browner students) and/or a “lack of international competitiveness” (whereby students in countries like China get higher international standardized test scores than U.S. students).
  2. The current president decides to “get tough on education,” which translates into new technology and way more standardized tests.
  3. The underlying message is that teachers and students and possibly parents are lazy and need to be “held accountable” to improve test scores. The even deeper assumption is that test scores are the way to measure quality of learning.
  4. Once there’s lots of attention being given to test scores, lots of things start happening in response (the “feedback loop”).
  5. For example, widespread cheating by students and teachers and principals, especially when teachers and principals get paid based on test performance.
  6. Also, well-off students get more and better test prep, so the achievement gap gets wider.
  7. Even just the test scores themselves lead to segregation by class: parents who can afford it move to towns with “better schools,” measured by test scores.
  8. International competitiveness doesn’t improve. But we’ve actually never been highly ranked since we started measuring this.

What Zhao’s book adds to this is how much worse it all is in China. Especially the cheating. My favorite excerpt from the book:

Teachers guess possible [test] items, companies sell answers and wireless cheating devices to students, and students engage in all sorts of elaborate cheating. In 2013, a riot broke out because a group of students in Hubei Province were stopped from executing the cheating scheme their parents purchased to ease their college entrance exam.

Ravitch adds after that that ‘an angry mob of two thousand people smashed cars and chanted, “We want fairness. There is no fairness if you do not let us cheat.”’

To be sure, the stakes in China are way higher. Test scores are incredibly important and determine which careers are open to people. But according to Zhao, this selection process, which is quite old, has stifled creativity in the Chinese educational system (so, in other words, test scores are the wrong way to measure learning, in part because of the feedback loop). He blames this obsession with test scores for the fact that, for example, no Chinese native has received a Nobel Prize since 1949: the winners of that selection process are not naturally creative.

Furthermore, Zhao claims, the Chinese educational system stifles individuality and forces conformity. It is an authoritarian tool.

In that light, I guess we should be proud that our international scores are lower than China’s; maybe it is evidence that we’re doing something right.

I know that, as a parent, I am sensitive to these issues. I want my kids to have discipline in some ways, but I don’t want them to learn to submit themselves to an arbitrary system for no good reason. I like the fact that they question why they should do things like go to bed on time, and exercise regularly, and keep their rooms cleanish, and I encourage their questions, even while I know I’m kind of ruining their chances at happily working in a giant corporation and being a conformist drone.

This parenting style of mine, which I believe is pretty widespread, seems reasonable to me because, at least in my experience, I’ve gotten further by being smart and clever than by being exactly what other people have wanted me to be. And I’m glad I live in a society that rewards quirkiness and individuality.

Big Data’s Disparate Impact

Take a look at this paper by Solon Barocas and Andrew D. Selbst entitled Big Data’s Disparate Impact.

It deals with the question of whether current anti-discrimination law is equipped to handle the kind of unintentional discrimination and digital redlining we see emerging in some “big data” models (and that we suspect are hidden in a bunch more). See for example this post for more on this concept.

The short answer is no, our laws are not equipped.

Here’s the abstract:

This article addresses the potential for disparate impact in the data mining processes that are taking over modern-day business. Scholars and policymakers had, until recently, focused almost exclusively on data mining’s capacity to hide intentional discrimination, hoping to convince regulators to develop the tools to unmask such discrimination. Recently there has been a noted shift in the policy discussions, where some have begun to recognize that unintentional discrimination is a hidden danger that might be even more worrisome. So far, the recognition of the possibility of unintentional discrimination lacks technical and theoretical foundation, making policy recommendations difficult, where they are not simply misdirected. This article provides the necessary foundation about how data mining can give rise to discrimination and how data mining interacts with anti-discrimination law.

The article carefully steps through the technical process of data mining and points to different places within the process where a disproportionately adverse impact on protected classes may result from innocent choices on the part of the data miner. From there, the article analyzes these disproportionate impacts under Title VII. The Article concludes both that Title VII is largely ill equipped to address the discrimination that results from data mining. Worse, due to problems in the internal logic of data mining as well as political and constitutional constraints, there appears to be no easy way to reform Title VII to fix these inadequacies. The article focuses on Title VII because it is the most well developed anti-discrimination doctrine, but the conclusions apply more broadly because they are based on the general approach to anti-discrimination within American law.

I really appreciate this paper, because it’s an area I know almost nothing about: discrimination law and the standards for evidence of discrimination.

Sadly, what this paper explains to me is how very far away we are from anything resembling what we need to actually address the problems. For example, even in this paper, where the writers are well aware that training on historical data can unintentionally codify discriminatory treatment, they still seem to assume that the people who build and deploy models will “notice” this treatment. From my experience working in advertising, that’s not actually what happens. We don’t measure the effects of our models on our users. We only see whether we have gained an edge in terms of profit, which is very different.

Essentially, as modelers, we don’t humanize the people on the other side of the transaction, which prevents us from worrying about discrimination or even being aware of it as an issue. It’s so far from “intentional” that it’s almost a ridiculous accusation to make. Even so, it may well be a real problem and I don’t know how we as a society can deal with it unless we update our laws.
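Just to make “measuring the effects of our models on our users” concrete, here’s one standard check I have in mind (my example, not something the paper prescribes): the four-fifths rule of thumb from employment law, comparing selection rates across groups. The counts are made up.

```python
# The four-fifths (80%) rule of thumb, with made-up counts: compare the
# selection rates a model produces for two groups.
def disparate_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of the lower group's selection rate to the higher one's."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    low, high = sorted([rate_a, rate_b])
    return low / high

ratio = disparate_impact_ratio(selected_a=180, total_a=1000,   # hypothetical group A
                               selected_b=300, total_b=1000)   # hypothetical group B
print(f"impact ratio = {ratio:.2f} (below 0.80 is the usual red flag)")
```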

Upcoming data journalism and data ethics conferences

Today I’m super excited to go to the opening launch party of danah boyd’s Data and Society. Data and Society has a bunch of cool initiatives, but I’m particularly interested in their Council for Big Data, Ethics, and Society. They were the people who helped make the Podesta Report on Big Data as good as it was. There will be a mini-conference this afternoon that I’m looking forward to very much. Brilliant folks doing great work and talking to each other across disciplinary lines; I can’t get enough of that stuff.

This weekend

This coming Saturday I’ll be moderating a panel called Spotlight on Data-Driven Journalism: The job of a data journalist and the impact of computational reporting in the newsroom at the New York Press Club Conference on Journalism. The panelists are going to be great:

  • John Keefe @jkeefe, Sr. editor, data news & J-technology, WNYC
  • Maryanne Murray @lightnosugar, Global head of graphics, Reuters
  • Zach Seward @zseward, Quartz
  • Chris Walker @cpwalker07, Dir., data visualization, Mic News

The full program is available here.

December 12th

In mid-December I’m on a panel myself at the Fairness, Accountability, and Transparency in Machine Learning Conference in Montreal. This conference seems to directly take up the call of the Podesta Report I mentioned above, and seeks to provide further research into the dangers of “encoding discrimination in automated decisions”. Amazing! So glad this is happening and that I get to be part of it. Here are some questions that will be taken up at this one-day conference (more information here):

  • How can we achieve high classification accuracy while eliminating discriminatory biases? What are meaningful formal fairness properties?
  • How can we design expressive yet easily interpretable classifiers?
  • Can we ensure that a classifier remains accurate even if the statistical signal it relies on is exposed to public scrutiny?
  • Are there practical methods to test existing classifiers for compliance with a policy?