Big data, disparate impact, and the neoliberal mindset



September 7, 2015 Cathy O'Neil, mathbabe

When you’re writing a book for the general public’s consumption, you have to keep things pretty simple. You can’t spend a lot of time theorizing about why some stuff is going on, you have to focus on what’s happening, and how bad it is, and who’s getting screwed. Anything beyond that and you’ll be called a conspiracy theorist by some level of your editing team.

But the good thing about writing a blog is that you can actually say anything you like. That’s one reason I cling so strongly to mathbabe; I need to be able to write stuff that’s mildly conspiracy-theoretical. After all, just because you’re paranoid doesn’t mean nobody’s out to get you, right?

Anyhoo, I’m going to throw out a theory about big data, disparate impact, and the neoliberal mindset. First I need to set it up a bit.

Did you hear about this recent story whereby Facebook just got a patent to measure someone’s creditworthiness by looking at who their friends are and what their credit scores are? The idea is, you are more likely to be able to pay back your loans if the people you’re friends with pay back their loans.

On the one hand, it sounds possibly true: richer people tend to have richer friends, and so if there’s not very much information about someone, but that person is nevertheless inferred to be “friends with rich people,” then they might be a better bet for paying back loans.

On the other hand, it also sounds like an unfair way to distribute loans: most of us are friends with a bunch of people from high school, and if I happened to go to a high school filled with poor kids, then loans for me would be ruled out by this method.

This leads to the concept of disparate impact, which was beautifully explained in this recent article called When Big Data Becomes Bad Data (hat tip Marc Sobel). The idea is, when your process (or algorithm) favors one group of people over another, intentionally or not, it might be considered unfair and thus illegal. There’s lots of precedent for this in the courts, and recently the Supreme Court upheld it as a legitimate argument in Fair Housing Act cases.
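
To make the idea concrete, here is a minimal sketch of how one might measure a disparity in outcomes. Everything in it is an assumption for illustration: the applicants, the group labels, the friends' credit scores, the cutoff, and the scoring rule itself (approve if the average of your friends' scores clears a threshold) are invented, and nothing here is taken from the Facebook patent or from any legal test. The rule never looks at group membership, yet the approval rates still differ by group.

```python
from statistics import mean

# Hypothetical toy data: each applicant has a group label and the credit
# scores of their friends. All numbers are invented for illustration.
applicants = [
    {"group": "A", "friend_scores": [720, 700, 690]},
    {"group": "A", "friend_scores": [680, 710, 650]},
    {"group": "A", "friend_scores": [600, 640, 700]},
    {"group": "A", "friend_scores": [730, 750, 705]},
    {"group": "B", "friend_scores": [610, 590, 640]},
    {"group": "B", "friend_scores": [700, 690, 720]},
    {"group": "B", "friend_scores": [580, 620, 600]},
    {"group": "B", "friend_scores": [630, 600, 655]},
]

CUTOFF = 660  # hypothetical approval threshold on the friends' average score


def approve(applicant, cutoff=CUTOFF):
    """Approve when the average of the friends' scores meets the cutoff."""
    return mean(applicant["friend_scores"]) >= cutoff


def approval_rate(applicants, group):
    """Fraction of applicants in the given group who are approved."""
    members = [a for a in applicants if a["group"] == group]
    return sum(approve(a) for a in members) / len(members)


def impact_ratio(applicants, group, reference):
    """Approval rate of one group divided by that of a reference group."""
    return approval_rate(applicants, group) / approval_rate(applicants, reference)


if __name__ == "__main__":
    print("approval rate, group A:", approval_rate(applicants, "A"))
    print("approval rate, group B:", approval_rate(applicants, "B"))
    print("impact ratio, B vs A:", impact_ratio(applicants, "B", "A"))
```

A ratio well below 1 says the rule approves one group far less often than the other, whatever the intent behind it. Whether a given gap counts as unfair or illegal is a question for the law and for context, not something this arithmetic can settle.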

It’s still not clear whether a “disparate impact” argument can be used in the case of algorithms, though. And there are plenty of people who work in the field of big data who dismiss this possibility altogether, and who even claim that things like the Facebook idea above are entirely legitimate. I had an argument on my Slate Money podcast last Friday about this very question.

Here’s my theory as to why it’s so hard for people to understand. They have been taken over in these matters by a neoliberal thought process, whereby every person is told to behave rationally, as an individual, and to seek maximum profit. It’s like an invisible hand on a miniature scale, acting everywhere and at all times.

Since this ideology has us acting as individuals, and ignoring group dynamics, the disparate impact argument is difficult if not impossible to understand. Why would anyone want to loan money to a poor person? That wouldn’t make economic sense. Or, more relevantly, why would anyone not distinguish between a poor person and a rich person before making a loan? That’s the absolute heart of how the big data movement operates. Changing that would be like throwing away money.

Since every interaction boils down to game theory and strategies for winning, “fairness” doesn’t come into the equation (note, the more equations the better!) of an individual’s striving for more opportunity and more money. Fairness isn’t even definable unless you give context, and context is exactly what this mindset ignores.

Here’s how I talk to someone when this subject comes up. I right away distinguish between the goal of the lender – namely, accuracy and profit – and the goal of the public at large, namely that we have a reasonable financial system that doesn’t exacerbate the current inequalities or send people into debt spirals. This second goal has a lot to do with fairness and definitely pertains broadly to groups of people. Then, after setting that up, we can go ahead and discuss the newest big data idea, as long as we remember to look at it through both lenses.


Comments (20)

jeremiah757

September 7, 2015 at 8:56 am

You jumped from “people who pay back their loans” to “rich people” without explanation. Are there no struggling but conscientious borrowers? Are there no wealthy but irresponsible borrowers? I know a few business contacts who, while rich, are totally not creditworthy. You often jump to this rich versus poor divide, as if there are no other factors to consider.

Jon Awbrey

September 7, 2015 at 8:58 am

On a related note —

http://inquiryintoinquiry.com/2015/09/01/basal-ingredients-of-society-•-prologue/

Klondike Jack

September 7, 2015 at 9:31 am

A bit tangential, having to do with vouchers and charter schools, but here’s a fine discussion of the myth of the possibility of rational action due to inequality of agency and information. When you factor in the use of disinformation and the willful avoidance of full disclosure, the probability of a completely rational decision being made quickly approaches zero. http://horacemannleague.blogspot.com/2013/01/asymmetric-information-parental-choice.html

- Jon Awbrey
  
  September 7, 2015 at 12:08 pm
  
  Not too tangential, actually …
  
  Vouchers are a tool for injecting a viral species of market dynamics — the dynamics of private individuals competing in a zero-sum game for individual commodities — into a sphere where those dynamics are out of place and where they are ultimately destructive. That is the sphere of cooperative, public, social institutions that must rise above the myopia of short-term private interests if they are to sustain themselves at all over the long haul of history.
  
  http://inquiryintoinquiry.com/2015/09/07/basal-ingredients-of-society-%E2%80%A2-7/
  
jeremiah757

September 7, 2015 at 9:36 am

Oh, and then you move on to disparate impact, which is about racial discrimination. Again without explanation, we are to believe that “people who don’t pay back their loans” is code for “black people.” Some might call that racism.

- Auros
  
  September 11, 2015 at 3:50 am
  
  So here’s the thing. You don’t actually need a conspiracy for the financial system to move money from a poor / disfavored class to a capital-owning class. You can just have individuals each acting in their own self-interest, within the law (even if only barely, or sometimes walking over the legal line but not suffering enough consequences to deter them).
  
  Take the history of home contract sales and block busting, which you can read about at great length in one chapter of Ta-Nehisi Coates’ epic case for reparations.
  
  http://www.theatlantic.com/magazine/archive/2014/06/the-case-for-reparations/361631/
  
  Or take the more recent situations where sub-prime loans, and especially loans that had no interest at first but then “exploded” into ARMs with balloon-payments — were systematically marketed to poor, and especially minority, communities. I actually had a landlady who lost ownership of the house I was in b/c she was the victim of just such a loan. She bought the house on the advice of a real estate agent of her own race, whom she had met at church. He did the same thing to a lot of other people, as well, and when the bubble popped, he skipped town with his profits. Yay for affinity scams.
  
  Cathy is moving from the methods of identifying people who have more assets (and thus are better loan risks) to issues of economic class and racial disparate impact because that is a realistic assessment of what has actually happened in the American financial system, over and over, since the early 20th century.
  
AQ

September 7, 2015 at 10:39 am

On the one hand, it sounds possibly true: richer people tend to have richer friends, and so if there’s not very much information about someone, but that person is nevertheless inferred to be “friends with rich people,” then they might be a better bet for paying back loans.

Really, how do we know this is true? They may have cash flow and wealth but do they pay their bills? If they don’t pay their bills, do businesses complain? Does the government throw them in jail? My mind wandered to Donald Trump’s comments about using bankruptcy, anecdotal stories about high-end hotel bills not being paid, my dad’s stories about how difficult it was to collect on construction work done for upper income people back in the day in Chicago, large corporations who pay when they more or less want to. Really I see the difference here as people with low incomes tend to have more of their income pickpocketed by either the state or state-sanctioned mechanisms.

That said, I’d suggest that the algorithms build in the same mechanisms because of course everyone KNOWS rich people are better at paying back loans. Heck, they don’t even need loans in the first place, do they? So if we want to get them to accept our ‘loans’ then they deserve special privileges, don’t they? Maybe just a handshake deal.

Even if you don’t agree with anything I’ve said, it should at least go to all of the cultural messaging that anyone building an algorithm would have to actively counteract to build a non-biased algorithm or even one that doesn’t actively enforce our current capitalistic values. Or wait… feature not bug.

JB

September 7, 2015 at 11:48 am

Well said. Thanks for this.

AQ

September 7, 2015 at 11:59 am

Sorry, had another tangent thought on this while my mind was wandering and thought I’d share to see what others thought. What if the whole purpose of algorithms is really to establish cash flow or rather “rents?” If we think about this in terms of establishing rentier income then who exactly are the most reliable ‘subjects’ to extract rents from? (state enforced even)

AQ

September 7, 2015 at 12:16 pm

And to take it a step further, if you play by the rules then you’ll be fine. The ‘system’ aka algorithm is completely fair and unbiased. It can’t be rigged because everyone is ‘subjected’ to the same baseline criteria (big data). Human opinion has been removed.

Bang, you get systematic extraction and at the same time a grand cultural whisper which says if you’re not okay with this then it’s your own fault. Or your neighbor’s fault. Or you hang out with the wrong people or you don’t play by society’s tacit rules. Anything but looking at the system itself because the system can’t be flawed or rigged, right?!?

Thanks for the thought provoking post. Won’t hog up the space anymore.

Chuck Carlstrom

September 7, 2015 at 1:47 pm

I find your post interesting, but interestingly your fear of big data I see as the hope of big data. The fears you emphasize already exist. Contrary to the maxim that 2 wrongs don’t make a right, an additional “wrong” may mitigate the previous wrong.
I start with the presumption that if it were possible to perfectly uncover the true probabilities of an individual’s ex ante chances of bankruptcy that would be perfectly fair. More data will move us closer to that ideal. That is, those that are completely different from their friends gain as a new signal contradicts the old one. Therefore there are fewer individuals harmed from using these signals. The question is then whether the remaining individuals are harmed further (and to a greater extent than those that gain). Relatedly there are individuals unfairly advantaged by one piece of data then unfairly disadvantaged by another. In the aggregate these will likely average out to a large extent.

- Tim McDermott
  
  September 7, 2015 at 6:29 pm
  
  I think big data is only a solution if we eliminate the agency of individual humans, and groups of humans who know that we really don’t want to give loans to _those_ people. So we will tweak our algorithms — in ways that are buried in thousands/millions of lines of code — so that we just don’t make many loans in that zip code or this area over here. No malice here, just algorithms.
  
abekohen

September 7, 2015 at 4:40 pm

For many people there is a big difference between FB friends and real-life friends, even if there is some overlap. I wouldn’t trust a credit ranking system based on FB friends.

Tim McDermott

September 7, 2015 at 6:23 pm

For me, the wonderful insight in this post is, to rephrase, that much of the dysphoria loose today comes from a reductionist view of society. If I, and every other ‘I’ out there, just takes care of my own business, and only my own business, then the world improves.

Reductionism is the belief that we can understand something by recursively understanding its parts. This approach dominated science for nearly 4 centuries and produced spectacular advances. But the discovery of chaos (complex dynamical systems) killed reductionism in the sciences (except medicine). To solve the sort of questions left over after all the reductionist fruit has been picked, we need to look at systems.

But what if a large plurality, or even a majority, of our fellow citizens believe that the best way to comprehend a frog is to cut it up? Pay no attention to what tadpoles eat, and how that affects ponds and streams; pay no attention to what frogs eat; pay no attention to what eats frogs. It gets a lot easier to think that caring about an endangered species is just a way to mess with peoples’ livelihood.

What if a majority of the folks who actually vote don’t understand how the world really works?

Chuck Carlstrom

September 7, 2015 at 6:50 pm

My comment, like your post, was about the use of big data in general. FB was an example, as it was in your post. Sort of a silly example in that we don’t know if that is the intended use by FB AND as you point out it may not provide any information. The beauty is that if it provides no information then the lenders would end up not using it.

Nick Ryan

September 7, 2015 at 9:26 pm

To me, this also seems like a problem of economics. Solutions like loaning based on Facebook friends may seem like a great idea if it allows more people to get loans or makes our loan rates more accurate. These are the efficiency gains from big data. The fundamental trade-off being made in microeconomics is usually efficiency vs. distribution. Unfortunately, most economists pay lip service to the fact that this trade-off exists and then ignore it, saying that distribution is irrelevant if a given policy is more economically efficient: “The winners can always pay the losers”. If we forced economists, and policy makers, and companies to think more about the distributional effects of their policies, we would see more just outcomes than we do today. I realize that this isn’t actionable advice, but it is something I am working through.

mclaren

September 8, 2015 at 12:31 am

Big Data is the mathematical embodiment of the false and pernicious maxim attributed to Margaret Thatcher: “There is no such thing as society, there are only people” — as false and as evil a doctrine as has ever emerged from the fertile mind of man.
Again and again, we find that mathematics applied to economics is the last refuge of the despot too cowardly to own hi/r own taste for savage oppression. “Oh, but I didn’t discriminate against you…it was the _algorithm_ that did that!”

Gyan

September 8, 2015 at 7:54 am

It seems to me you are confusing levels with relatives – both rich and poor people can be equivalent risks as long as their debt service levels are proportionate to their incomes. Naturally this means smaller loans for poor people and larger ones for rich people. This has nothing to do with fairness per se but if you lend money with the expectation of getting it back, this result must necessarily follow. If you don’t, it’s called a grant, and poor people should be getting those if collectively as a society, we come to the decision that it’s beneficial for us to pool our resources and re-distribute it to poor people.

shah8

September 8, 2015 at 3:59 pm

Just in case alphaville deletes:

Okay…uh, I should probably point out that Noah Smith is basically out of his mind about “loan fairness”.

1) Cathy O’Neil is essentially talking about lending based on data artifacts that are degrees of separation away from material lending questions, where Noah Smith unnecessarily assumes that lending based on these practices would be efficient and profitable.

2) We’ve spent a lot of academic time and thought as to how people become underserved when it comes to banking, and then remedies, such as microfinance. Cathy O’Neil is obviously taking from that tradition when she cites how fairness only matters with lots of context (though in her article, she really needed to assert how laws and legal foundations are providers of context, for markets and other stuff). Noah Smith claiming that unscrupulous poor people will crowd out “hard working but naive” poor people is a classic and common bad faith argument, whether Smith intended to argue in bad faith or not. It’s an exogenous affair that can easily be remediated by secondary actions should policymakers feel the need. For example, policymakers don’t feel the need to improve the numbers of eligible people for welfare or disability, but they feel the need to improve the uptake for Obamacare. It’s merely a second question, not part of the original question about “fairness”.

3) The reason why the legal system creates the context for what is fair, and accepts questions about disparate impact is because:

a) People try to prosecute social questions through their control of economic institutions all the time. Like redlining. The techniques are described as being about profit, when they are anything but, or are based on maximum extraction/plunder of the economic lives of “undesirables”. “Broken windows” and other sorts of policing fads are fundamentally based on using data in bad faith so as to maintain control of nonwhites in public spaces.

b) Much of the time the practices are legally banned because if too many people did *superprofitable action x*, the market would cease to exist, or be untrusted by people who *have* to use it, thereby creating a tragedy of the commons. The controversy of high frequency trading is an example of this sort of thinking.

RSW

September 9, 2015 at 1:09 am

Big Data is Bad Data when it is used asymmetrically by bad people. Otherwise it’s just data. If it is in fact true that a certain group of people has a higher probability of credit issues, it’s better that we know about it and address it. Sometime in the future, when government is able to construct a useful tax code, it is feasible to imagine that one could impose different income taxes on interest based on the ‘environmental risk’ (other than the credit history) of the borrower, effectively raising the rates on the already privileged. But in order to do that the good data scientists will need to determine this risk.

If it’s immoral or illegal to determine someone’s credit based on available information, then only bad people and criminals will determine someone’s credit based on available information.
