The tricky thing about disparate impact

October 2, 2015 Cathy O'Neil, mathbabe

Today I’m fascinated by the story described in this three-part American Banker series on the Consumer Financial Protection Bureau’s (CFPB’s) use of disparate impact, written by Rachel Witkowski. Disparate impact, according to the article, is a legal theory that says lenders can be penalized if they have a neutral policy that creates an adverse impact against a protected class of borrowers, regardless of intent.

Witkowski reports on the CFPB trying to understand and punish auto lenders for their process for figuring out fees and interest rates on auto loans. In general, the auto dealers, who work in partnerships with auto lenders, have discretion to add on some interest rate and pocket the difference. They seem to be pocketing fatter differences for certain populations, specifically black car buyers.

The problem is, it’s hard to measure exactly how much fatter, who is getting screwed, and by how much. And in the world of law and punishment, it’s not enough to prove that there’s been a disparate impact – you have to actually make restitution to the victims. So for example, the CFPB is in discussions with Ally Financial over exactly this problem, and the question is how much money they give to which borrowers as a refund.

The first reason this is hard to get right is that auto dealers and lenders don’t actually collect race information, in contrast to mortgage lending, where it’s a requirement of the lending process, specifically to ward against redlining. So the CFPB, in its investigation, has to rely on proxy data like zip codes and names to guess the race of a given borrower. In fact their methodology is described in this white paper, but unsurprisingly the auto lenders under scrutiny complain it is not sufficiently transparent.
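To make the proxy idea concrete, here is a minimal sketch of how a surname-based guess and a geography-based guess could be combined with Bayes' rule. It is not the CFPB's actual methodology; the function name, the race categories, and every number below are made up for illustration, and the sketch assumes surname and geography are independent once you know a person's race.

```python
def combine_proxies(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, geography) via Bayes' rule.

    Assumes surname and geography are independent given race.
    Both inputs are dicts keyed by the same race categories.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: weight / total for race, weight in unnormalized.items()}


# Hypothetical inputs: share of each race among people with this surname,
# and the share of each race's population living in this zip code.
surname_probs = {"black": 0.30, "white": 0.60, "other": 0.10}
geo_likelihood = {"black": 0.020, "white": 0.005, "other": 0.010}

posterior = combine_proxies(surname_probs, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.2f}")
```

With these invented inputs, a surname that leans white gets flipped to "probably black" by the zip code. The output is still only a probability, never a label.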

What that translates into is the possibility that some white car buyers will get refunded accidentally and some black car buyers won’t, even if there were shenanigans going on with their car loan. From my perspective as a data person, this tells me that, as long as we have problems like this, we should probably require race to be recorded in a car loan.

That’s not the only problem, though. The thing about these modern cases of measuring disparate impact is that it’s a model, and models are extremely squishy things. Two people asked to build a disparate impact model on the same data will likely come up with different answers, because all sorts of decisions have to be made on the way. From the article:

Each financial regulator has its own method for determining disparities and harm in fair-lending cases, and each of those cases can differ depending on the business model of the bank and what variables the regulators will consider. The Federal Reserve, for instance, generally adds controls, such as geography, to the statistical model if the bank’s business model indicates that certain pricing criteria can influence the price or markup, according to a 2013 Fed presentation.
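Here is a toy illustration of how much the choice of controls can matter. The loan records are invented, and the two functions are just one hypothetical way each of two modelers might measure the markup gap: one compares raw averages, the other compares within each region first and then averages the regional gaps with equal weight.

```python
from statistics import mean

# Invented records: (group, region, dealer markup in percentage points).
loans = [
    ("black", "X", 2.0), ("black", "X", 2.0), ("black", "X", 2.0),
    ("white", "X", 1.8),
    ("black", "Y", 1.0),
    ("white", "Y", 0.8), ("white", "Y", 0.8), ("white", "Y", 0.8),
]


def raw_gap(records):
    """Difference in mean markup, black minus white, with no controls."""
    black = [m for group, _, m in records if group == "black"]
    white = [m for group, _, m in records if group == "white"]
    return mean(black) - mean(white)


def gap_controlling_for_region(records):
    """Average of the within-region gaps, weighting each region equally."""
    regions = sorted({region for _, region, _ in records})
    gaps = [raw_gap([r for r in records if r[1] == region]) for region in regions]
    return mean(gaps)


print(f"no controls:       {raw_gap(loans):.2f}")
print(f"region controlled: {gap_controlling_for_region(loans):.2f}")
```

On this made-up data the uncontrolled gap is about 0.70 points and the region-controlled gap is about 0.20 points. Same loans, two defensible-sounding models, and a settlement that differs by a factor of three or so.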

Given this uncertainty, plus the uncertainty of the race of the borrowers, you end up firmly in a land of statistics, where each borrower is assigned a probability of being minority and a probability of having been screwed. Then the question becomes, do we err on the side of under- or over-refunding these borrowers? The lenders, who are paying for this all, tend to lean on the side of not giving any money away at all unless we’re sure.
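The under- versus over-refunding question can be sketched with a few lines of arithmetic. Everything here is hypothetical: the borrowers, their probabilities, the estimated overcharges, and the two refund rules, which are just illustrations and not anything the CFPB or the lenders have proposed.

```python
# Invented borrowers: probability of being a minority borrower and the
# estimated overcharge in dollars if they were in fact discriminated against.
borrowers = [
    {"id": "A", "p_minority": 0.9, "overcharge": 200.0},
    {"id": "B", "p_minority": 0.6, "overcharge": 300.0},
    {"id": "C", "p_minority": 0.2, "overcharge": 100.0},
]


def threshold_refunds(people, cutoff):
    """Pay the full overcharge only when the probability clears the cutoff."""
    return {
        b["id"]: (b["overcharge"] if b["p_minority"] >= cutoff else 0.0)
        for b in people
    }


def weighted_refunds(people):
    """Pay everyone their overcharge scaled by their probability."""
    return {b["id"]: b["p_minority"] * b["overcharge"] for b in people}


strict_total = sum(threshold_refunds(borrowers, cutoff=0.8).values())
lenient_total = sum(threshold_refunds(borrowers, cutoff=0.5).values())
weighted_total = sum(weighted_refunds(borrowers).values())

print(f"strict cutoff (0.8):  ${strict_total:.0f}")
print(f"lenient cutoff (0.5): ${lenient_total:.0f}")
print(f"probability-weighted: ${weighted_total:.0f}")
```

With these numbers the strict cutoff pays out $200, the lenient one $500, and the probability-weighted rule $380. The "only when we're sure" rule is the cheapest for the lender, and it leaves borrower B, who is more likely than not to have been overcharged, with nothing.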

In this particular story, specifically in part 3, there’s even an expert consultant named Dr. Bernard Siskin who happens to work for both sides – the banks and the CFPB. The excuse for that questionable arrangement is that there aren’t enough statisticians who can do this work (my hand is raised!), but the end result is that Siskin seems to help the banks complain about exactly this issue: which version of the disparate impact model is to be used, and what kind of attributes will be controlled for, so that they can each get the least expensive settlement.

Here’s my theory. This is a big new field in statistics and data science, and this is just the tip of the iceberg. We will be seeing a large amount of work being done and tools being made which aim to measure and audit processes and algorithms, whether they are auto loans that discriminate against minority borrowers or car computers that bypass emissions tests. And we will have to develop standards by which we measure a company’s work. The standards won’t be perfect, mind you, and people will end up getting away with certain things, but at least we won’t have the gaming that’s obviously going on now, because there will be a set way, hopefully reasonably thought out, to measure discrimination, or lying, or cheating, or what have you.

That’s the field I want to go into. Building models that call bullshit on other models.


Comments (5)

howardat58

October 2, 2015 at 9:31 am

So should this stuff be applied to Puerto Rico, USA, would all the population be described as latino, or the disliked term hispanic? And where there is a full range of colors, from very black to pasty white, and fairly evenly spread, what will they do? Looks rather like affirmative action to me.

Bank Treasury Person

October 2, 2015 at 10:29 am

I had some experience with this at an FI where I used to work and first of all this is just a dumb practice because dealers convince buyers who don’t require financing to “just take the loan, pay it off next month and I’ll knock $1000 off the price”, and then pocket the bonus from the lender.

But the interesting thought from a lender’s perspective is that all bankers understand that spreads are lower on customers who negotiate harder on rate and higher on customers who don’t (whether deposits or loans). If the data ultimately reveals that different races are more or less likely to push hard on rate (which would not be surprising), does that mean that customer-level pricing should be banned?

medicalquackblog

October 2, 2015 at 10:40 am

I’m not a big fan of Richard Cordray, who runs the CFPB. He has his own data collection efforts taking place, like buying all our credit card data “scored” from Argus. I think when he was appointed he had to be given the Jamie Dimon stamp of approval. There’s also the project run by Lewin Group, a subsidiary of United Healthcare now with HUD and HHS to get all your medical records and see what your current state is before that becomes available to consumers as well. I call it excess scoring of US consumers with folks abusing risk numbers. Here’s the big Argus project and banks and others buy your scored data by the gazillions.

http://ducknetweb.blogspot.com/2014/08/argus-analytics-produces-share-of.html

I think Cordray’s efforts are good for only finding low hanging fruit and who knows what else he’s into, but bottom line is the CFPB answers to the Office of the Comptroller of the Currency.

Zathras

October 2, 2015 at 11:12 am

It’s important to fill in the history of disparate impact, which the article does not do. A legal theory of disparate impact did not start with regulators. Instead, it was started by private lawyers making claims of discrimination against any number of entities. Being caught red-handed discriminating is not common, so it became a kind of indirect proof of discrimination. Over time it became a claim in its own right. The practice of insurance companies setting rates by zip code (“redlining”) is the prototypical case of disparate impact. And the analytics were simple. Once the courts accepted the theory of disparate impact, only then did regulators really start writing the rules into the CFR.

Based on the history, I would have to expect that, when it comes to these types of issues, it is wrong to expect regulators to take the lead. Instead, watch what the private attorneys do. If they can make headway in these types of claims (and right now there is a real knowledge gap among law firms for doing this type of rigorous analytics), eventually the regulators will jump on board. But the private attorneys have to make the headway first.

mooms

October 2, 2015 at 12:44 pm

Clear problem, in statistical terms one needs to integrate out race and model uncertainty. This can be done, but efficiency and implementation are expensive! No knowledge without assumptions…
