Another death spiral of modeling: e-scores

August 20, 2012 Cathy O'Neil, mathbabe

Yesterday my friend and fellow Occupier Suresh sent me this article from the New York Times.

It’s something I knew was already happening somewhere, but I didn’t know the perpetrators would be quite so proud of themselves as they are; on the other hand I’m also not surprised, because people making good money on mathematical models rarely take the time to consider the ramifications of those models. At least that’s been my experience.

So what have these guys created? It’s basically a modern internet version of a credit score, without all the burdensome regulation that comes with it. Namely, they collect all kinds of information about people on the web, anything they can get their hands on, including each person’s physical and web addresses, phone number, Google searches, purchases, and clicks. From that they create a so-called “e-score” which evaluates how much you are worth to a given advertiser or credit card company or mortgage company or insurance company.

Some important issues I want to bring to your attention:

Credit scores are regulated, and in particular the regulations disallow the use of racial information, whereas these e-scores are completely unregulated and can use whatever information they can gather (which is a lot). Not that credit score models are open source: they aren’t, so we don’t know if they are using variables correlated with race (like zip code). But still, there is some effort to protect people from outrageous and unfair profiling. I never thought I’d be thinking of credit scoring companies as the good guys, but it is what it is.
These e-scores are only going for max pay-out, not default risk. So, from the point of view of a credit card company, the ideal customer is someone who makes only the minimum payment month after month, never paying off the balance. That person would have a higher e-score than someone who pays off their balance every month, although presumably the minimum payer would have a lower credit score, since they are living more on the edge of insolvency.
Not that I need to mention this, but this is the ultimate in predatory modeling: every person is scored based on their ability to make money for the advertiser or insurance company in question, based on any kind of ferreted-out information available. It’s really time for everyone to have two accounts, one for normal use, including filling out applications for mortgages and credit cards and buying things, and the second for sensitive Google searches on medical problems and such.
Finally, and I’m happy to see that the New York Times article noticed this and called it out, this is the perfect setup for the death spiral of modeling that I’ve mentioned before: people considered low value will be funneled away from good deals, which will give them bad deals, which will put them into an even tighter pinch with money because they’re being nickel-and-dimed and paying high interest rates, which will make them even lower value.
A model like this is hugely scalable and valuable for a given advertiser.
Therefore, this model can seriously contribute to our problem of increasing inequality.
How can we resist this? It’s time for some rules on who owns personal information.
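
To make the death spiral concrete, here is a toy simulation in Python. Everything in it is made up for illustration: the score cutoff, the two interest rates, the debt and payment amounts, and the rule for updating the score are all my assumptions, and none of it comes from the article or from any real scoring product. Two people start with identical finances and differ only by a small gap in their starting score, one just above the cutoff and one just below.

```python
def offered_rate(score, cutoff=0.5, good_rate=0.08, bad_rate=0.25):
    """Hypothetical rule: a score at or above the cutoff gets the good deal."""
    return good_rate if score >= cutoff else bad_rate


def simulate(score, years=5, debt=10_000.0, yearly_payment=2_000.0):
    """Follow one person for a few years and return their score history.

    All numbers and rules here are invented for illustration only.
    """
    history = [score]
    for _ in range(years):
        # The score decides which interest rate this person is offered.
        interest = debt * offered_rate(score)
        new_debt = max(0.0, debt + interest - yearly_payment)
        # Hypothetical scoring rule: the score falls when debt grows and
        # rises when debt shrinks, clamped to the range [0, 1].
        score = max(0.0, min(1.0, score - (new_debt - debt) / 10_000.0))
        debt = new_debt
        history.append(score)
    return history


# Same debt, same yearly payment; only the starting score differs.
high = simulate(score=0.55)
low = simulate(score=0.45)

print("starts just above cutoff:", [round(s, 2) for s in high])
print("starts just below cutoff:", [round(s, 2) for s in low])
```

With these made-up numbers the person who starts just above the cutoff pays down their debt and drifts upward, while the person who starts just below it is charged enough interest that the same payment no longer covers it, so their debt grows and their score keeps falling. The point is only that a small initial gap plus a score-dependent price is enough to produce divergence; real systems are far messier.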

Categories: data science, open source tools, rant

Comments (2)

Melissa

February 14, 2013 at 7:25 am

Nice post. Still, there is some effort to protect people from outrageous and unfair profiling. I never thought I’d be thinking of credit scoring companies as the good guys, but it is what it is. Thanks a lot for posting this article.

wishboom

July 12, 2013 at 11:26 am

I don’t know how I missed this post. I’ve had this conversation several times about insurance (at least people are talking about it there), but I hadn’t considered the parallels with credit. Great post, thank you.
