Speaking at QCon London on auditing algorithms

March 7, 2016 Cathy O'Neil, mathbabe

Today I’m flying to London to join the QCon conference, which is for professional software folks. I’m speaking on Wednesday afternoon in the Data Science vertical, and the title for my talk is “How Do We Audit Algorithms?”. They also interviewed me for the conference.

The speaker before me is discussing the nitty-gritty of recidivism modeling, otherwise known as algorithms that help judges and parole boards decide what to do with prisoners depending on their “risk of returning to the justice system”. Given how deeply racist and anti-poor our justice system is, it’s a big question whether or how data-driven studies or algorithms can improve it.

In other words, the need for auditing algorithms could not be more front and center given the talk before mine. So I’m going to use it as a use case.

As for how we actually do audits, I’m cobbling together stuff that is known, current research, and a long list of to-dos. A very recent paper that I’ll talk about is entitled Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit, and it seems to contain a pretty good set of tools – written in Python, no less – that can help a curious person audit a data-driven system. However, it seems to lack real tools for the case where the “protected user attributes” are not supplied.

So, if you have a dataset showing the history of a bunch of prisoners, their recidivism scores, and their subsequent sentencing lengths, you’d like to know whether the algorithm was biased against blacks or against poor people. But if you don’t have the column “race” or “income,” it’s a lot harder to do that analysis.
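For the easy case, where the protected column is present, a first-pass check can be as simple as comparing average scores across groups. Here is a minimal sketch of that idea. The field names (`group`, `risk_score`) and the toy records are made up for illustration, and this is not the FairTest API. A gap in averages is only a flag for further investigation, not proof of bias on its own.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(records, group_key, score_key="risk_score"):
    """Average the score column separately for each value of the protected column."""
    scores = defaultdict(list)
    for row in records:
        scores[row[group_key]].append(row[score_key])
    return {group: mean(values) for group, values in scores.items()}


def largest_gap(group_means):
    """Difference between the highest and lowest group averages."""
    return max(group_means.values()) - min(group_means.values())


# Toy, invented records: "group" stands in for a protected attribute column.
records = [
    {"group": "A", "risk_score": 7},
    {"group": "A", "risk_score": 5},
    {"group": "A", "risk_score": 6},
    {"group": "B", "risk_score": 3},
    {"group": "B", "risk_score": 2},
    {"group": "B", "risk_score": 4},
]

group_means = mean_score_by_group(records, "group")
gap = largest_gap(group_means)
print(group_means, gap)
```

A real audit would also want to control for legitimate explanatory variables and check the same gap on outcomes such as sentence length, not just on the score.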

The best thing you can do, besides trying to collect such data in the future, might be something along these lines, where you do your best to infer race from zip codes and last names. But not all modelers even have that, so it gets tricky pretty fast.
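To make the proxy idea concrete, here is one way such an inference could be sketched: a naive Bayes combination of two lookups, one keyed on surname and one keyed on zip code. This is my assumption about what that could look like, not a description of the linked method. It assumes surname and zip code are independent given group membership, and every probability below is invented. In practice the tables would have to come from some external source such as census data.

```python
def combine_proxies(p_given_surname, p_given_zip, prior):
    """Combine two proxy-based estimates of group membership.

    Assumes surname and zip code are conditionally independent given the
    group, so P(g | surname, zip) is proportional to
    P(g | surname) * P(g | zip) / P(g).
    """
    unnormalized = {
        g: p_given_surname[g] * p_given_zip[g] / prior[g] for g in prior
    }
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


# Invented numbers for two hypothetical groups "A" and "B".
prior = {"A": 0.5, "B": 0.5}            # overall population shares
p_given_surname = {"A": 0.8, "B": 0.2}  # lookup for one surname
p_given_zip = {"A": 0.6, "B": 0.4}      # lookup for one zip code

posterior = combine_proxies(p_given_surname, p_given_zip, prior)
print(posterior)
```

The output is a probability per group, not a label, so any downstream disparity estimate should carry that uncertainty along rather than treating the inferred group as ground truth.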

As usual, all thoughts and references are deeply appreciated.


Comments (12)

sorelle

March 7, 2016 at 9:51 am

Doesn’t address your second point, but I (and some other regular readers) have a recent arXiv paper on the black-box auditing question: http://arxiv.org/abs/1602.07043
Code here: https://github.com/cfalk/BlackBoxAuditing

- Cathy O'Neil, mathbabe
  
  March 7, 2016 at 9:56 am
  
  Fantastic, thanks so much!
  
Noemi Fabry

March 7, 2016 at 10:22 am

Can I throw a spanner in the works?

When we are talking about algorithms in the public sphere of a democratically accountable society, isn’t the first and most fundamental question the question of metrics itself?

What can we really, genuinely measure? What are the limits of fitting lived reality into neat little digits? And are there other, more useful ways of capturing such experiences? I.e., the old “all models are wrong, some models are useful” school of statistics.

I work in health care and am particularly interested in reporting systems (safety etc.) and the limits of numeric representation and analysis of reality.

Probably not the right thing to say to a mathbabe? But maybe if the desired outcome is a better, more engaged, engaging and nurturing society, algorithms are sometimes not the solution. How about starting with an exploration of what is measurable (“Grenzen der Messbarkeit”, Hans-Georg Gadamer) when considering algorithms as solutions?

mike_bader

March 7, 2016 at 10:53 am

The housing and job audit study work in sociology could be a helpful literature from which to draw for inspiration, even if the methods differ. HUD tests fair housing laws in this country by sending matched pairs of auditors of different races and genders to attempt to rent or purchase housing. Diane Sawyer did a great piece on this twenty years ago to see it in action and This American Life did a radio story on the process in Chicago three years ago. The most recent report on housing discrimination is available here.

Another great application of this type of work has been to look at the influence of a criminal record on employment, which seems particularly relevant given the lede for your talk.

More recently, work has been done to use Craigslist callbacks about homes and other online methods to suss out discrimination. I’d be happy to dig up those citations if they would be helpful.

I am sure that your field has already incorporated much of this, but if nothing else it might be helpful for relating to work on similar topics going on in other fields.

- mike_bader
  
  March 7, 2016 at 2:01 pm
  
  About that bit on Craigslist and housing discrimination studies, there is a talk at Hunter College sociology this week that might interest you (or others): https://twitter.com/huntersociology/status/706900600848572417
  
MikeM

March 7, 2016 at 11:36 pm

I wrote a book entitled “Recidivism” (https://www.academia.edu/10061829/Recidivism) over 30 years ago, which you might want to look at. The second half develops algorithms for estimating it, using incomplete distributions (i.e., assuming that the CDF does not rise to 1 — everyone fails), but the first half discusses some other ethical issues.

- Cathy O'Neil, mathbabe
  
  March 8, 2016 at 8:02 am
  
  Thanks!
  
MikeM

March 7, 2016 at 11:45 pm

An additional (linguistic) thought: the word “recidivism” implies full agency on the part of the individual. That is, saying that a person recidivates puts the entire burden on the ex-offender, ignoring the part that the external environment plays in his/her failure. Instead, we should look at the other side of the coin and talk about the ex-offender’s probability of survival in the face of the difficulties s/he faces. So ethical issues arise even in the terms used.

- Cathy O'Neil, mathbabe
  
  March 8, 2016 at 8:02 am
  
  I agree absolutely; that’s one of my major points.
  
mytrialsite

March 10, 2016 at 3:07 am

Hi Cathy, I caught the first half of your talk yesterday but sadly had to duck out early to catch a train. I found it really interesting, and I think these are really important issues for us to think about as big data and machine learning start to play an ever-increasing role in society. Definitely something I’ll be reading and thinking about some more, so thank you.

- Cathy O'Neil, mathbabe
  
  March 10, 2016 at 5:01 am
  
  Thanks! Definitely think that some in the audience were taken aback but overall I thought it went well.
  
  - Rod Smith
    
    March 13, 2016 at 4:01 am
    
    Yes, I was a bit surprised at some of the reaction too. Perhaps you hit a nerve!
    