How to fix recidivism risk models
Yesterday I wrote a post about the unsurprising discriminatory nature of recidivism models. Today I want to add to that post with an important goal in mind: we should fix recidivism models, not trash them altogether.
The truth is, the current justice system is fundamentally unfair, so throwing out algorithms just because they are also unfair is not a solution. Instead, let’s improve the algorithms, and then ask whether judges are actually using them at all.
The great news is that the paper I mentioned yesterday has three methods to do just that, and in fact there are plenty of papers that address this question with various approaches that get increasingly encouraging results. Here are brief descriptions of the three approaches from the paper:
- Massaging the training data. In this approach the training data is adjusted so that it carries less bias. Specifically, the class label is switched for some people in the preferred population from + to -, i.e. from the good outcome to the bad outcome, and for some people in the discriminated population from - to +. The paper explains how to choose these switches carefully (using a ranker’s continuous scores and a threshold, so the flips happen near the decision boundary).
- Reweighing the training data. The idea here is that with certain kinds of models, you can give weights to training data, and with a carefully chosen weighting system you can adjust for bias.
- Sampling the training data. This is similar to reweighing, except that the weights are constrained to be nonnegative integers: in effect, training examples are duplicated or dropped.
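To make the massaging idea concrete, here is a toy sketch of my own (not the paper’s code). It assumes each example is a hypothetical `(group, label, score)` tuple, where `score` is a ranker’s estimate of the probability of the good outcome, and it flips a fixed number of borderline labels in each direction:

```python
def massage(examples, num_swaps):
    """Toy label-massaging sketch (hypothetical data encoding).

    examples: list of (group, label, score) tuples, where group 0 is the
    discriminated population, group 1 the preferred one, label 1 the good
    outcome, and score is a ranker's estimate of P(good outcome).

    Promotes the num_swaps highest-scoring negatives in group 0 and
    demotes the num_swaps lowest-scoring positives in group 1, i.e. it
    flips the labels closest to the decision boundary.
    """
    # Candidates for promotion: group-0 examples labeled bad, best scores first.
    promoted = sorted(
        (i for i, (g, y, s) in enumerate(examples) if g == 0 and y == 0),
        key=lambda i: examples[i][2], reverse=True)[:num_swaps]
    # Candidates for demotion: group-1 examples labeled good, worst scores first.
    demoted = sorted(
        (i for i, (g, y, s) in enumerate(examples) if g == 1 and y == 1),
        key=lambda i: examples[i][2])[:num_swaps]
    out = list(examples)
    for i in promoted:
        g, _, s = out[i]
        out[i] = (g, 1, s)  # flip bad -> good
    for i in demoted:
        g, _, s = out[i]
        out[i] = (g, 0, s)  # flip good -> bad
    return out
```

In the paper, the number of swaps isn’t arbitrary: it’s chosen so that the positive rates of the two populations come out equal after the flips.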
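Reweighing is even easier to sketch. The standard weighting (this is the well-known Kamiran–Calders formula, though my encoding of groups and labels here is made up) gives each example the weight it would have if group membership and outcome were statistically independent:

```python
from collections import Counter

def reweigh(groups, labels):
    """Reweighing sketch: compute one weight per training example so that
    group and label are independent under the weighted distribution.

    groups: list of group memberships (hypothetical encoding: 0 =
    discriminated population, 1 = preferred population).
    labels: list of outcomes (0 = bad, 1 = good).

    Each example (g, y) gets weight
        w(g, y) = P(G=g) * P(Y=y) / P(G=g, Y=y),
    which up-weights under-represented (group, label) combinations and
    down-weights over-represented ones.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]
```

A model that accepts per-example weights (most classifiers do, via something like a `sample_weight` argument) can then be trained as usual on the original data.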
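And sampling can be sketched the same way: instead of attaching fractional weights, resize each (group, label) cell of the training data to its “expected” size by duplicating or dropping examples. This is my own toy version, with a hypothetical encoding of groups and labels:

```python
import random
from collections import Counter, defaultdict

def resample(pairs, seed=0):
    """Sampling sketch: the integer-weight analogue of reweighing.

    pairs: list of (group, label) training examples (hypothetical
    encoding: group 0/1, label 0 = bad outcome, 1 = good outcome).

    Each (group, label) cell is resized to its expected size under
    independence, n_group * n_label / n, by randomly dropping examples
    from over-represented cells and duplicating examples in
    under-represented ones.
    """
    rng = random.Random(seed)
    n = len(pairs)
    group_counts = Counter(g for g, _ in pairs)
    label_counts = Counter(y for _, y in pairs)
    cells = defaultdict(list)
    for pair in pairs:
        cells[pair].append(pair)
    out = []
    for (g, y), members in cells.items():
        target = round(group_counts[g] * label_counts[y] / n)
        if target <= len(members):
            out.extend(rng.sample(members, target))            # drop extras
        else:
            out.extend(members)
            out.extend(rng.choices(members, k=target - len(members)))  # duplicate
    return out
```

After resampling, the positive rate is (up to rounding) the same in both groups, and any ordinary training procedure can be run on the new dataset with no weights at all.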
In all of these examples, the training data is “preprocessed” so that you can train a model on “unbiased” data, and importantly, at the time of usage you will not need to know the protected status of the individual you’re scoring. This is, I understand, legally a critical assumption, since there are anti-discrimination laws which forbid you to “consider” the race of someone when deciding whether to hire them, and so on.
In other words, we’re constrained by anti-discrimination law to not use all the information that might help us avoid discrimination. This constraint, generally speaking, prevents us from doing as good a job as possible. Even so, there are things we could do:
- We might not think that we need to “remove all the discrimination.” Maybe we stratify the data by violent crime convictions first, and then within each resulting bin we work to remove discrimination.
- We might also use the racial and class discrepancies in recidivism risk rates as an opportunity to experiment with interventions that might lower those discrepancies. In other words, why are there discrepancies, and what can we do to diminish them?
- To be clear, I do not claim that this is a trivial process. It will in fact require lots of conversations about the nature of justice and the goals of sentencing. Those are conversations we should have.
- Moreover, there’s the question of balancing the conflicting goals of various stakeholders, which makes this an even more complicated ethical problem.
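The stratification idea above can also be sketched in code. This is my own toy illustration, assuming hypothetical records of the form `(stratum, group, label)`, where the stratum might be a prior-violent-conviction bucket: we compute reweighing-style weights separately within each bin, so discrimination is removed per stratum rather than across the whole population.

```python
from collections import Counter, defaultdict

def stratified_weights(records):
    """Stratified-debiasing sketch (hypothetical data encoding).

    records: list of (stratum, group, label) tuples, e.g. stratum = a
    prior-violent-conviction bucket, group 0/1, label 0 = bad outcome.

    Within each stratum, each example (g, y) gets the reweighing weight
        w(g, y) = P(G=g) * P(Y=y) / P(G=g, Y=y),
    with all probabilities computed inside that stratum only.
    """
    by_stratum = defaultdict(list)
    for i, (s, _, _) in enumerate(records):
        by_stratum[s].append(i)
    weights = [0.0] * len(records)
    for idxs in by_stratum.values():
        n = len(idxs)
        group_counts = Counter(records[i][1] for i in idxs)
        label_counts = Counter(records[i][2] for i in idxs)
        joint_counts = Counter((records[i][1], records[i][2]) for i in idxs)
        for i in idxs:
            _, g, y = records[i]
            weights[i] = ((group_counts[g] / n) * (label_counts[y] / n)
                          / (joint_counts[(g, y)] / n))
    return weights
```

The point of the stratification is that a group with more prior convictions is compared only against others in the same bin, so we equalize outcomes within bins without erasing the bins themselves.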