Creepy model watch



February 21, 2012 Cathy O'Neil, mathbabe

I really feel like I can’t keep up with all of the creepy models coming out and the news articles about them, so I think I’ll just start making a list. I would appreciate readers adding to my list in the comment section. I think I’ll move this to a separate page on my blog if it comes out nice.

I recently blogged about a model that predicts student success in for-profit institutions, which I claim is really mostly about student debt and default,
but here’s a model which actually goes ahead and predicts default directly: it’s a new payday-like loan model. Oh good, because the old payday models didn’t make enough money or something.
Of course there’s the teacher value-added model, which I’ve blogged about multiple times, most recently here. And here’s a paper I’d like everyone to read before they listen to anyone argue one way or the other about the model (h/t Joshua Batson). The abstract is stunning: “Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I examine whether value-added estimates from three separate reading achievement tests provide similar answers about teacher performance. I find moderate-sized rank correlations, ranging from 0.15 to 0.58, between the estimates derived from different tests. Although the tests vary to some degree in content, scaling, and sample of students, these factors do not explain the differences in teacher effects. Instead, test timing and measurement error contribute substantially to the instability of value-added estimates across tests.” Just in case that didn’t come through, the author is saying that teacher value-added scores are very, very noisy: the same teacher can rank quite differently depending on which test is used.
That reminds me, credit scoring models are old but very very creepy, wouldn’t you agree? What’s in them that they want to conceal them?
Did you read about how Target predicts pregnancy? Extremely creepy.
I’m actually divided about whether it’s the creepiest though, because I think the sheer amount of information that Facebook collects about us is the most depressing thing of all.

Before I became a modeler, I wasn’t personally offended by the idea that people could use my information. I thought, I’ve got nothing to hide, and in fact maybe it will make my life easier and more efficient for the machine to know me and my habits.

But here’s how I think now that I’m a modeler and I see how this stuff gets made and how it gets applied: we are each giving up our data, it’s so easy to do that we don’t think about it, and it’s being used to funnel people into success or failure in a feedback loop. And the modelers, the people responsible for creating these things and implementing them, are always already the successes: they are educated and are given good terms on their credit cards and mortgages because they have a nifty high-tech job. So the makers get to think of how much easier and more convenient their lives are now that the models see how dependable they are as consumers.

But when there are funnels, there’s always someone who gets funneled down.

Think about how it works with insurance. The idea of insurance is to pool people so that when one person gets sick, the medical costs for that person are paid from the common fund. Everyone pays a bit so it doesn’t break the bank.

But if we have really good information, we begin to see how likely people are to get sick. So we can stratify the pool. Since I almost never get sick, and when I do it’s just strep throat, I get put into a very nice pool with other people who never get sick, and we pay very very little and it works out great for us. But other people have worse luck of the DNA draw and they get put into the “pretty sick” pool and their premium gets bigger as their pool gets sicker until they are really sick and the premium is actually unaffordable. We are left with a system where the people who need insurance the most can’t be part of the system anymore. Too much information ruins the whole idea of insurance and pooled risk.
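
To put rough numbers on this, here is a minimal sketch in Python. The pool sizes and costs are made up for illustration, and the `premium` helper is hypothetical: it assumes a break-even premium equal to the pool’s average expected cost, which ignores overhead, profit, and everything else a real insurer would price in.

```python
def premium(expected_costs):
    """Break-even premium for a pool: the average expected annual cost of its members."""
    return sum(expected_costs) / len(expected_costs)

# Hypothetical expected annual medical costs (illustrative numbers only).
healthy = [500] * 80     # 80 people who almost never get sick
sick = [20000] * 20      # 20 people with worse luck of the DNA draw

pooled = premium(healthy + sick)   # everyone shares one pool
healthy_only = premium(healthy)    # the "very nice" pool after stratification
sick_only = premium(sick)          # the "pretty sick" pool after stratification

print(f"one shared pool:   {pooled:.0f} per person")
print(f"healthy pool only: {healthy_only:.0f} per person")
print(f"sick pool only:    {sick_only:.0f} per person")
```

With these made-up numbers everyone pays 4,400 in the shared pool. Once the pool is split, the healthy group pays 500 and the sick group pays 20,000, which is just their own expected cost: at that point the sick group is no longer insured in any meaningful sense.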

I think modern modeling is analogous. When people offer deals, they can first check to see if the people they are offering deals to are guaranteed to pay back everything. In other words, the businesses (understandably) want to make very certain they are going to profit from each and every customer, and they are getting more and more able to do this. That’s great for customers with perfect credit scores, and it makes it easier for people with perfect credit scores to keep their perfect credit scores, because they are getting the best deals.

But people with bad credit scores get the rottenest deals, which makes a larger and larger percentage of their take-home pay (if they even get a job, considering their credit scores) go towards fees and high interest rates. This of course creates an environment in which it’s difficult to improve their credit score, so they default and their credit score gets worse instead of better.

So there you have it, a negative (that is, harmful and self-reinforcing) feedback loop and a death spiral of modeling.
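
The spiral can be sketched as a toy simulation. Everything here is an assumption made for illustration: the linear pricing rule in `interest_rate`, the income and debt figures, and the score adjustments in `simulate` are invented, and no real scoring or lending model works exactly this way. The only point is to show how pricing on the score can push two similar borrowers in opposite directions.

```python
def interest_rate(score):
    """Hypothetical pricing rule: the lower the score, the higher the annual rate."""
    # Linear from 5% at a score of 850 up to 35% at a score of 300 (made-up numbers).
    return 0.05 + 0.30 * (850 - score) / 550


def simulate(score, years=10, income=30000, debt=20000, affordable_share=0.15):
    """Return the year-by-year path of a toy credit score under score-based pricing."""
    history = [score]
    for _ in range(years):
        interest_due = debt * interest_rate(score)
        if interest_due <= affordable_share * income:
            # Payment is affordable: paid on time, score improves.
            score = min(850, score + 20)
        else:
            # Payment is unaffordable: missed, score drops, unpaid interest is added to the debt.
            score = max(300, score - 40)
            debt += interest_due
        history.append(score)
    return history


print("starts at 540:", simulate(540))
print("starts at 520:", simulate(520))
```

Under these assumptions the borrower who starts at 540 climbs to 740 after ten years, while the one who starts at 520, only 20 points lower, is pushed down to the floor of 300. The worse score produces the worse deal, and the worse deal produces the worse score.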

Categories: data science

Comments (18)

Constantine Costes

February 21, 2012 at 8:38 am

If you have not already seen it, check out the movie Gattaca, which has a kind of creepy data model at its core.

charles sereno

February 21, 2012 at 1:56 pm

Hoist by your own words — “I’m going to make a prediction, namely that there will be two different systems in place in 15 years. Neither will involve traditional publishers, but one of them will keep that refereeing system intact whereas the other will be more of a crowd-sourced referee system.” (mathbabe)
Isn’t this the stuff of models? If not, explain what you mean by a “model.”

- Cathy O'Neil, mathbabe
  
  February 21, 2012 at 1:59 pm
  
  I’m not sure what the question is. I’m a believer in models by the way, I just don’t think all models should be made and used just because they can be.
  
  - charles sereno
    
    February 21, 2012 at 2:55 pm
    
    Sorry for being opaque. I thought your predictions would be apt ‘grist’ for the model ‘mill’ you requested. It’s over my head, but it’s OK by me.
    
Emanuel Derman

February 21, 2012 at 2:02 pm

Latest questionable use of data : http://www.nytimes.com/2012/02/21/us/politics/campaigns-use-microtargeting-to-attract-supporters.html?_r=1&hp

- Cathy O'Neil, mathbabe
  
  February 21, 2012 at 2:07 pm
  
  Wow, great example. Thanks!
  
  Cathy
  
McHandler

February 22, 2012 at 7:05 am

Camelot!
Camelot!
It’s only a model…
Shhhhh!

wgersen

February 22, 2012 at 7:41 am

Here’s a link to a New York Times article I blogged about in January:

It seems that no one knows for sure that value added modeling works but everyone wants to believe it works because it reduces a complicated skill (teaching) to the kind of neat mathematical formula that economists and politicians love… a formula that has a false precision to it that lends itself to rating… and worst of all, a formula that reinforces the notion that the only important thing in education is standardized test results….

Tim Cerino

February 22, 2012 at 11:20 am

So then a question: What kinds of projects, areas of research, etc. are better to focus on? Given that a statistician / data-scientist needs gainful employment and/or productive research topics, what areas are best? Or least-worst?

Totally agree with you that the creepy universe includes aggressive customer profiling, pricing frameworks that hose those with worse credit scores/less money/poorer health, etc. Oh, and designing securities which exploit investors. The list goes on.

So then what about the other end of the spectrum? I like your post about “What data science _should_ be doing” – – can we extend the dialog and exchange of ideas in that direction?

- Cathy O'Neil, mathbabe
  
  February 22, 2012 at 11:23 am
  
  Tim,
  
  Great question, and yes I’m composing a post about this very thing. The answer is, there are lots of great things we can and should be doing. The trick is getting paid for them. I’d love help thinking about this!
  
  Cathy
  
  - Tim Cerino
    
    February 22, 2012 at 3:17 pm
    
    I think there are many parallels between data science today and fixed income derivatives circa 1992. While there were some snakes in the grass in the early days of derivatives, I think most people in the industry were filled with optimism and excitement at all of the new gee-whiz modeling tools that were emerging. Econ theory kind-of-sort-of said the market would sort everything out in the end, so nobody had to worry too much about repercussions, so it was thought.
    
    I think data sci could potentially pose comparable amounts of frankenstein-risk, extrapolating forward 10-20 years.
    
    So I think it would be great if we start thinking about our ethical compass now, rather than ex-post. Watching the finance world try to sort this out after the fact has obviously not been pretty. Or very effective.
    
    With that said, have any of you read Emanuel Derman’s “Models. Behaving. Badly.”? In it, he offers a “Modeler’s Code of Ethics” – – fairly mild, but certainly a gesture in the right direction.
    
    Does this topic warrant some kind of working group?
    
    - Cathy O'Neil, mathbabe
      
      February 22, 2012 at 3:36 pm
      
      I couldn’t agree more. And yes this warrants more than a mere discussion. I think this is an urgent issue. I reviewed Derman’s book in a post a few months ago… I will link to it when I get to a computer but now I am on my phone. Are you in NY? Want to start a working group on this with me?
      
Greg Marquez

February 22, 2012 at 11:28 am

What my wife and I have been discussing for a while now is the possibility that companies may be “modeling” who will pay unwarranted fees, penalties, etc. added to bills without questioning them. Or who will fail to pay their bill on time, triggering late fees and penalties, if the billing date is moved to an earlier date.

Is that possible? Is it happening?

- Cathy O'Neil, mathbabe
  
  February 22, 2012 at 11:33 am
  
  happening. has been happening in credit card world for decades in fact.
  
  - Greg Marquez
    
    February 22, 2012 at 4:28 pm
    
    Don’t know what this means, if anything, for creepy model watching but somebody from DEUTSCHE BANK, in New York, just clicked through to our website using the link in our comment above.
    
csissoko

February 22, 2012 at 1:09 pm

“the businesses (understandably) want to make very certain they are going to profit from each and every customer, and they are getting more and more able to do this”

The problem is worse than that: cross-subsidization is common. For financial services it’s frequently the case that those who need the services least — i.e. the rich — get paid to participate, while those who truly need the services cover not only their own costs, but pay to incentivize the rich to participate. Credit cards definitely fall into this category and I suspect that that’s part of what was going on in the mortgage market too.

Click to access ppdp1003.pdf

http://www.pbs.org/wgbh/pages/frontline/creditcards/interviews/mehta.html?utm_campaign=videoplayer&utm_medium=fullplayer&utm_source=relatedlink

Stephen Purpura (@spurpura)

February 23, 2012 at 3:34 am

Modelers can only create negative feedback loops if society lets them. There is nothing to prevent us from setting up rules that detect and prevent negative feedback loops. It’s a hard problem. But hardly impossible.

travellingactuary

December 1, 2013 at 5:49 pm

The idea of the negative feedback loop in insurance is fascinating. If insurance companies become sufficiently skilled at modeling to identify a significant proportion of people who will claim, and hence charge an unaffordable rate for those people, they will set up a kind of catch-22 where the only people who are able to take out insurance are those who don’t need it; therefore being able to afford insurance becomes a sign that you don’t require insurance, so no one will take out insurance and the industry will vanish, or more likely revert to smaller friendly societies which are essentially savings clubs which manage a pool of emergency funds.
