Recidivism risk algorithms are inherently discriminatory

January 4, 2017 Cathy O'Neil, mathbabe

A few people have been sending me, via Twitter or email, this unsurprising article about how recidivism risk algorithms are inherently racist.

I say unsurprising because I’ve recently read a 2011 paper by Faisal Kamiran and Toon Calders entitled “Data preprocessing techniques for classification without discrimination,” which explicitly describes the trade-off between accuracy and discrimination in algorithms in the presence of biased historical data (Section 4, starting on page 8).

In other words, when you have a dataset that has a “favored” group of people and a “discriminated” group of people, and you’re deciding on an outcome that has historically been awarded to the favored group more often – in this case, it would be a low recidivism risk rating – then you cannot expect to maximize accuracy and keep the discrimination down to zero at the same time.

Discrimination is defined in the paper as the difference, between the favored group and the discriminated group, in the percentage of each group’s members who get the positive treatment. So if 50% of whites are considered low-risk and 30% of blacks are, that’s a discrimination score of 0.20.
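
To make the definition concrete, here is a minimal Python sketch of that score. The function name, the argument names, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def discrimination_score(outcomes, groups, favored, deprived):
    """Positive-outcome rate of the favored group minus that of the deprived group.

    outcomes: list of 1 (positive treatment, e.g. rated low-risk) or 0
    groups:   list of group labels, aligned with outcomes
    """
    def positive_rate(group):
        members = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(members) / len(members)

    return positive_rate(favored) - positive_rate(deprived)


# Toy data mirroring the example above: 5 of 10 whites and 3 of 10 blacks rated low-risk.
groups = ["white"] * 10 + ["black"] * 10
low_risk = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7

score = discrimination_score(low_risk, groups, "white", "black")
print(round(score, 2))  # 0.2
```

A score of zero would mean both groups receive the positive treatment at the same rate.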

The paper goes on to show that the trade-off between accuracy and discrimination, which can be achieved through various means, is linear or sub-linear depending on how it’s done. Which is to say, for every percentage point by which you reduce discrimination, you can expect to lose a fraction of a percentage point of accuracy.
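
As a rough illustration of one of those means: the paper’s preprocessing techniques include reweighing, where each training example gets a weight equal to the count its (group, label) combination would have if group and label were independent, divided by the count actually observed. The sketch below is an assumption-laden reading of that idea rather than the authors’ code; the function names and the toy data are made up for the example.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each (group, label) cell by expected count / observed count,
    where "expected" assumes group and label are independent."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return {
        (g, c): (group_counts[g] * label_counts[c]) / (n * cell_counts[(g, c)])
        for (g, c) in cell_counts
    }


def weighted_positive_rate(groups, labels, weights, group):
    """Share of a group's total weight that sits on positive (1) labels."""
    pairs = [(g, c) for g, c in zip(groups, labels) if g == group]
    total = sum(weights[p] for p in pairs)
    positive = sum(weights[p] for p in pairs if p[1] == 1)
    return positive / total


# Hypothetical toy data: 5 of 10 whites and 3 of 10 blacks labeled low-risk (1).
groups = ["white"] * 10 + ["black"] * 10
low_risk = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7

weights = reweighing_weights(groups, low_risk)
white_rate = weighted_positive_rate(groups, low_risk, weights, "white")
black_rate = weighted_positive_rate(groups, low_risk, weights, "black")
print(round(white_rate, 2), round(black_rate, 2))  # 0.4 0.4
```

After reweighing, both groups have the same weighted low-risk rate in the training data. The accuracy cost shows up later, when a classifier trained with these weights is scored against the original, biased labels.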

It’s an interesting paper, well written, and you should take a look. But in any case, what it means for recidivism risk algorithms is this: any algorithm that is optimized for “catching the bad guys” (that is, for accuracy, as these algorithms are) and that completely ignores the discrepancy between high risk scores for blacks and for whites can be expected to be discriminatory in the above sense, because we know the data to be biased*.

* The bias is due to the history of heightened scrutiny of black neighborhoods by police, which we know as broken windows policing and which makes blacks more likely to be arrested for a given crime, as well as the inherent racism and classism in our justice system itself, so brilliantly explained by Michelle Alexander in her book The New Jim Crow, which makes them more likely to be severely punished for a given crime.

Comments (5)

rsterbal

January 4, 2017 at 8:09 am

The approach should be to give positive outcomes to the same percentage of each group.

Lloyd Lofthouse

January 4, 2017 at 10:20 am

A worse danger is when the police or government decides to act on this biased information and launches a preemptive strike, arresting people and throwing them in prison when they haven’t done anything because the algorithm predicts high odds that they might do something in the future. Guilty by algorithm without any crime being committed: off to prison to lower the risk of crimes that haven’t happened yet, haven’t even been thought of and might never happen.

For instance, President G. W. Bush lying about WMDs as justification to start a preemptive war in Iraq because of the odds that Dictator Saddam Hussein might, in the future, have WMDs and start another war with his neighbors.

MikeM

January 4, 2017 at 1:27 pm

Your point about biased algorithms is right on. They suffer, among other faults, from the ecological fallacy: the assumption that group behavior can be ascribed to individuals within the group.

But I think that you should go further back than you did in your footnote. Let’s put a few things in their correct order. Prior to “the inherent racism and classism in our justice system” (undeniably so) is the inherent racism and classism in governmental (and corporate) policies that segregated African-Americans and kept them down. This led inevitably to fewer resources provided to those neighborhoods (education, social services, infrastructure maintenance, etc.), which in turn led to increased crime and violence in those neighborhoods.

Yes, more police attention is devoted to those neighborhoods; but let’s not forget that 95 percent of the residents in them do not commit crimes. These residents are at greater risk of victimization, which is why the police are more active in these neighborhoods. In other words, police resources are allocated, not on the basis of race, but on the basis of crime rates, violent acts, and shootings, all of which are more prevalent in those neighborhoods, stemming from the aforementioned biased policies of the past. The “heightened scrutiny of black neighborhoods” is in fact a heightened scrutiny of more dangerous neighborhoods, which exist not because of police policies but policies above their pay grade.

This is not to excuse the execrable behavior of individual racist police officers, who have killed unarmed civilians with impunity; only the presence of cameras has given the lie to their assertions of imminent danger.

- RTG
  
  January 4, 2017 at 6:12 pm
  
  This is an important and well-articulated point. I wanted to say something about not knowing if positive outcomes are equally warranted across groups in response to rsterbal, and I think you capture the reason why this matters.
  
  It’s also what makes tackling racism so complicated at this point in our history. Extremely racist policy decisions made decades ago have inextricably conflated urban poverty with race. This leaves any attempt to mitigate unequal outcomes of policies along racial lines vulnerable to accusations of overlooking crime committed by blacks. And I would say that response to BLM provides a good litmus test for whether people truly recognize the historic injustices that have now been institutionalized in every aspect of our society from community planning, to housing, to the justice system, to education, to…
  
  So my next question is, “What can be done about this? And does data have a role to play in mitigating the impacts of historic racism?” I think the answer to the second part is “yes”, but I honestly don’t know how.
  
Rebecca

January 4, 2017 at 6:16 pm

I read the paper you referenced and it is TERRIFIC but way over the head of my students (I created and teach the Advanced Analytics in Higher Education program at Arizona State University; my students are higher ed administrators with an analytic bent, not mathematicians, computer scientists or statisticians).

I would love to see someone record a video or write a practitioner paper on methods for dealing with discrimination in training data. I know you have several that talk about how we need to be aware and why this is important (and I make them read many of those), but boiling down papers like the Kamiran/Calders piece into actionable steps an analyst could take would make it easier to both get the word out that these techniques exist AND change the conversation from “we need to do this” to “why didn’t you?”.
