For the record, the lecturer, Scott Page, is by all accounts a great guy, and indeed he seems super nice on video. I’d love to have him over for dinner with my family someday (Professor Page, please come to my house for dinner when you’re in town).
In spite of liking him, though, pretty much every example he gives as pro-modeling is, for me, an anti-modeling example. Maybe I should make a complementary series of YouTube comment videos. That’s not totally true, of course; I probably just don’t notice the things we agree on. But I do notice the topics on which we disagree:
- He talks a lot about how models make us clearer thinkers. But what he really seems to mean is that they make us stupider thinkers. His example is that, in order to decide who to vote for for president, we can model this decision as depending on two things: the likability of the person in question (presumably he assumes we want our president to be likable), and the extent to which that person is “as left” or “as right” as we are. I don’t know about you, but I actually care about specific issues and where people stand on them, and which issues I consider likely to come up for consideration in the next election cycle. Like, if I like someone for his “stick it to the banks” approach but he’s anti-abortion, then I think about whether abortion is likely to actually become illegal. And by the way, I don’t particularly care if my president is likable, I’d rather have him or her effective.
- He bizarrely chooses “financial interconnectedness” as a way of seeing how cool models are, and he shows a graph where the nodes are the financial institutions (Goldman Sachs, JP Morgan, etc.) and the edges are labeled with an interconnectedness score, bigger meaning more interconnected. He claims that, according to this graph, back in 2008 we knew to bail out AIG and that it was definitely okay to let Lehman fail. I’m wondering if he really meant that this was an example of how your model could totally fail because your “interconnectedness scoring” sucked, but he didn’t seem to be tongue in cheek.
- He then talked about measuring the segregation of a neighborhood, either by race or by income, and he used New York and Chicago as examples. I won’t go into lots of details, but he gave a score to each block, like the census maps do with coloring, and he used those scores to develop a new score which was supposed to measure the segregation of each block. The problem I have with this segregation score is that it depends very heavily on the definition of the overall area you are considering. If you enlarge your definition of New York City to include the suburbs, then the segregation score of New York City may (probably would) be completely different. This seems to be a really terrible characteristic of such a metric.
- My second problem with his segregation score is that, at the end, he had overall segregation numbers for Philly and Detroit, and then showed the maps and mentioned that, looking at the maps, you wouldn’t really notice that one is more segregated than the other (Philly more than Detroit), but knowing the scores you do know that. Umm... I’d rather say that if you are getting scores that are not fundamentally obvious from looking at these pictures, then maybe it’s because your score sucks. What does having a “good segregation score” mean if not that it captures something you can see through a picture?
- One thing I liked was a demonstration of Schelling’s Segregation Model, which shows that, if you have a group of people who are not all that individually racist, you can still end up with a neighborhood which is very segregated.
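For anyone who wants to poke at Schelling’s model, here is a minimal sketch of it in Python. It is not the version shown in the lecture: the grid size, the share of empty cells, the 30% tolerance threshold, the wrap-around edges, and the rule that unhappy agents jump to a random empty cell are all assumptions I picked to keep the example short.

```python
import random

def occupied_neighbors(grid, r, c):
    """Types of the occupied cells among the 8 neighbors (edges wrap around)."""
    n = len(grid)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            value = grid[(r + dr) % n][(c + dc) % n]
            if value is not None:
                found.append(value)
    return found

def like_fraction(grid, r, c):
    """Share of an agent's occupied neighbors that have its own type."""
    nbrs = occupied_neighbors(grid, r, c)
    if not nbrs:
        return 1.0  # an agent with no neighbors has nothing to object to
    return sum(1 for v in nbrs if v == grid[r][c]) / len(nbrs)

def mean_similarity(grid):
    """Average like_fraction over all agents: a crude segregation measure."""
    n = len(grid)
    values = [like_fraction(grid, r, c)
              for r in range(n) for c in range(n) if grid[r][c] is not None]
    return sum(values) / len(values)

def simulate_schelling(n=20, empty_frac=0.1, threshold=0.3, max_rounds=100, seed=0):
    """Return (mean similarity before, mean similarity after) for one run."""
    rng = random.Random(seed)
    n_empty = int(n * n * empty_frac)
    n_agents = n * n - n_empty
    cells = [None] * n_empty + ["A"] * (n_agents // 2) + ["B"] * (n_agents - n_agents // 2)
    rng.shuffle(cells)
    grid = [cells[i * n:(i + 1) * n] for i in range(n)]
    before = mean_similarity(grid)
    for _ in range(max_rounds):
        # An agent is unhappy if fewer than `threshold` of its neighbors are like it.
        unhappy = [(r, c) for r in range(n) for c in range(n)
                   if grid[r][c] is not None and like_fraction(grid, r, c) < threshold]
        if not unhappy:
            break
        rng.shuffle(unhappy)
        for r, c in unhappy:
            # Everyone who was unhappy at the start of the round moves to a random empty cell.
            empties = [(i, j) for i in range(n) for j in range(n) if grid[i][j] is None]
            i, j = rng.choice(empties)
            grid[i][j], grid[r][c] = grid[r][c], None
    return before, mean_similarity(grid)

if __name__ == "__main__":
    before, after = simulate_schelling()
    print(f"mean share of like neighbors: {before:.2f} before, {after:.2f} after")
```

The thing to watch is the gap between the threshold (each agent is content being in a 30% minority-of-like-neighbors situation) and the average share of like neighbors once everyone has settled.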
I’m looking forward to watching more videos with my skeptical eye. After all, the guy is really a sweetheart, and I do really care about the idea of teaching people about modeling.
Crossposted from the Alternative Banking Blog.
I wanted to mention a few things that have been going on with the Alternative Banking group lately.
- The Occupy the SEC group submitted their public comments last week on the Volcker Rule and got AMAZING press. See here for a partial list of articles that have been written about these incredible folks.
- Hey, did you notice something about that last link? Yeah, Alt Banking now has a blog! Woohoo! One of our members, Nathan, has been updating it and he’s doing a fine job. I love how he mentions Jeremy Lin when discussing derivatives.
- Alt Banking also has a separate suggested reading list page on the new blog. Please add to it!
- We just submitted a short letter as a public comment to the new Consumer Financial Protection Bureau regulation which gives them oversight powers on debt collectors and credit score bureaus. We basically told them to make credit score models open source (and I wasn’t even in the initial conversation about what we should say to these guys! Open source rules!!):
I really feel like I can’t keep up with all of the creepy models coming out and the news articles about them, so I think I’ll just start making a list. I would appreciate readers adding to my list in the comment section. I think I’ll move this to a separate page on my blog if it comes out nice.
- I recently blogged about a model that predicts student success in for-profit institutions, which I claim is really mostly about student debt and default,
- but here’s a model which actually goes ahead and predicts default directly, it’s a new payday-like loan model. Oh good, because the old payday models didn’t make enough money or something.
- Of course there’s the teacher value-added model which I’ve blogged about multiple times, most recently here. And here’s a paper I’d like everyone to read before they listen to anyone argue one way or the other about the model (h/t Joshua Batson). The abstract is stunning: “Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I examine whether value-added estimates from three separate reading achievement tests provide similar answers about teacher performance. I find moderate-sized rank correlations, ranging from 0.15 to 0.58, between the estimates derived from different tests. Although the tests vary to some degree in content, scaling, and sample of students, these factors do not explain the differences in teacher effects. Instead, test timing and measurement error contribute substantially to the instability of value-added estimates across tests.” Just in case that didn’t come through, they are saying that the results of the teacher value-added test scores are very very noisy.
- That reminds me, credit scoring models are old but very very creepy, wouldn’t you agree? What’s in them that they want to conceal them?
- Did you read about how Target predicts pregnancy? Extremely creepy.
- I’m actually divided about whether it’s the creepiest though, because I think the sheer volume of information that Facebook collects about us is the most depressing thing of all.
Before I became a modeler, I wasn’t personally offended by the idea that people could use my information. I thought, I’ve got nothing to hide, and in fact maybe it will make my life easier and more efficient for the machine to know me and my habits.
But here’s how I think now that I’m a modeler and I see how this stuff gets made and how it gets applied: we are each giving up our data, it’s so easy to do that we don’t think about it, and it’s being used to funnel people into success or failure in a feedback loop. And the modelers, the people responsible for creating these things and implementing them, are always already the successes: they are educated and are given good terms on their credit cards and mortgages because they have a nifty high tech job. So the makers get to think of how much easier and more convenient their lives are now that the models see how dependable they are as consumers.
But when there are funnels, there’s always someone who gets funneled down.
Think about how it works with insurance. The idea of insurance is to pool people so that when one person gets sick, the medical costs for that person are paid from the common fund. Everyone pays a bit so it doesn’t break the bank.
But if we have really good information, we begin to see how likely people are to get sick. So we can stratify the pool. Since I almost never get sick, and when I do it’s just strep throat, I get put into a very nice pool with other people who never get sick, and we pay very very little and it works out great for us. But other people have worse luck of the DNA draw and they get put into the “pretty sick” pool and their premium gets bigger as their pool gets sicker until they are really sick and the premium is actually unaffordable. We are left with a system where the people who need insurance the most can’t be part of the system anymore. Too much information ruins the whole idea of insurance and pooled risk.
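To put toy numbers on the pooling argument, here is a small sketch. The figures are made up for illustration (100 people, 90 with an expected yearly medical cost of $1,000 and 10 with an expected cost of $20,000), and it ignores overhead and profit, so “premium” here just means the average expected cost of whoever is in the pool.

```python
def fair_premium(expected_costs):
    """Break-even premium for a pool: the average expected cost of its members."""
    return sum(expected_costs) / len(expected_costs)

# Hypothetical population, not real actuarial data.
healthy = [1_000] * 90   # expected yearly cost per healthy person
sick = [20_000] * 10     # expected yearly cost per sick person

pooled = fair_premium(healthy + sick)   # everyone in one pool
healthy_only = fair_premium(healthy)    # stratified: the "never sick" pool
sick_only = fair_premium(sick)          # stratified: the "pretty sick" pool

print(pooled)        # 2900.0
print(healthy_only)  # 1000.0
print(sick_only)     # 20000.0
```

With one pool everyone pays $2,900. Once the pool is split using better information, the healthy group pays $1,000 and the sick group is asked for $20,000, which is exactly the situation where the people who most need insurance get priced out.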
I think modern modeling is analogous. When people offer deals, they can first check to see if the people they are offering deals to are guaranteed to pay back everything. In other words, the businesses (understandably) want to make very certain they are going to profit from each and every customer, and they are getting more and more able to do this. That’s great for customers with perfect credit scores, and it makes it easier for people with perfect credit scores to keep their perfect credit scores, because they are getting the best deals.
But people with bad credit scores get the rottenest deals, which makes a larger and larger percentage of their take-home pay (if they even get a job considering their credit scores) go towards fees and high interest rates. This of course creates an environment in which it’s difficult to improve their credit score, so they default and their credit score gets worse instead of better.
So there you have it, a self-reinforcing feedback loop and a death spiral of modeling.
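Here is a toy version of that spiral, just to show the mechanics. Every rule in it is an assumption of mine and not how any real credit bureau or lender works: the interest rate falls linearly from 30% at a score of 500 to 5% at a score of 800, the debt and income stay fixed, and the score moves up 10 points a year if interest eats less than 8% of income and down 10 points otherwise.

```python
def interest_rate(score):
    """Hypothetical pricing rule: 5% at a score of 800 or more, 30% at 500 or less."""
    clamped = max(500, min(800, score))
    return 0.05 + (800 - clamped) / 300 * 0.25

def simulate_borrower(score, years, debt=20_000, income=40_000):
    """Follow one borrower's score under the made-up rules above."""
    for _ in range(years):
        # Share of income going to interest this year.
        burden = debt * interest_rate(score) / income
        # Cheap credit is easy to keep up with; expensive credit is not.
        score += 10 if burden < 0.08 else -10
        score = max(300, min(850, score))
    return score

# Two borrowers with the same debt and the same income, different starting scores.
print(simulate_borrower(750, years=5))  # 800
print(simulate_borrower(600, years=5))  # 550
```

The two borrowers are identical except for where they start, and the pricing rule alone is enough to push them in opposite directions.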
I recently read this New York Times article about a company that figures out how to get the best deal when you rent a car. The company is called AutoSlash and the idea is you book with them and they keep looking for good deals, coupons, or free offers every day until you actually need the car.
Wait a minute, a data science model that actually directly improves the lives of its customers? Why can’t we have more of these? Obviously the car companies absolutely hate this idea. But what are they going to do, stop offering online shopping?
Why don’t we see this in every category of shopping? It seems to me that you could do something like this and start a meta-marketplace, where you buy something and then, depending on how long you’re willing to wait until delivery, the model looks for a better online deal, in exchange for a small commission. Then you’d have to make sure that on average the commission is paying for itself with better deals, but my guess is it would work if you allowed it a few days to search per purchase. Or if you really are a doubter, fix a minimum wait time and let the company take some (larger) percentage of the difference between the initial price and the eventual best price.
Another way of saying this is that when you go online to buy something, depending on the scale (say it’s on the expensive side), you probably shop around for a few days or weeks. Why do that in person? Just have a computer do it for you and tell you at the end what deal it gave you. Don’t get bombarded by ads, let the computer get bombarded by ads for you.
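The “take a percentage of the difference” version of this idea is easy to sketch. This is not how AutoSlash works (I have no idea what their code looks like); the function name, the 25% commission share, and the prices are all hypothetical.

```python
def watch_prices(initial_price, daily_quotes, commission_share=0.25):
    """Track the best quote seen during the wait and split the savings.

    Returns (best price found, commission, total the customer pays).
    """
    best = initial_price
    for quote in daily_quotes:
        if quote < best:
            best = quote  # rebook whenever a cheaper deal shows up
    savings = initial_price - best
    commission = commission_share * savings
    return best, commission, best + commission

# Customer books at $200, then the model checks one quote per day for five days.
best, commission, total = watch_prices(200, [210, 185, 190, 170, 175])
print(best, commission, total)  # 170 7.5 177.5
```

Because the commission is a share of the savings, the customer never pays more than the initial price, and the company only gets paid when it actually finds a better deal.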
What is it that grad students do all day? Well if you’re Zachary Abel in the M.I.T. math department, then the answer may be that you fiddle with paperclips and make awesome nerdy and beautiful sculpture (I found his page through the God Plays Dice blog). Here’s my favorite sculpture from his site:
Be sure to read the explanations he gives of the things he’s made; they are very cool and sometimes come with animations.
Adele spoof on Gingrich:
The House of the Rising Sun, nerdstyle (h/t Emil):
Yesterday I read this New York Times article on how the ECB is trading its short term Greek bonds, with Greece, for longer term bonds.
Specifically, in order to avoid holding bonds that Greece is officially planning to “voluntarily” default on, the ECB is turning in that super crappy crap for other bonds that Greece hasn’t yet decided how much they’ll default on.
Just to spell it out even more, the plan to get private bondholders more excited about trusting the European bond market has been this:
- have the ECB step in (around the beginning of 2012) and provide liquidity and faith in the bond market,
- negotiate that the Greek bonds maturing in March 2012 are given a 70% haircut,
- make sure credit default swaps on those bonds are not activated (which is why we need it to be “voluntary”),
- change the terms of the bonds’ contracts so that the holdouts of this voluntary deal can be safely ignored, and
- have the ECB trade those bonds for longer-dated bonds at the last minute so they don’t actually have to take losses.
I’m not sure about you, but if I’m a private European debt holder my confidence in the bond market is not stronger right now. The argument for why the ECB is doing this is that they aren’t allowed to be seen giving money to Greece, by their charter. It’s odd to me that this charter, of all the various rules that have been broken here, is the one that is being fixated on as the important one we can’t break.
There are complicated politics going on, I am sure. I’m no expert in European politics, but this is about as European and about as political as things get.
Ignoring all of that, as a private bondholder, I’m putting an “ECB back-door swap” premium on all of my European debt from now on. Except maybe for German debt, since I think Germany would rather jump out of the Euro altogether than default on its debt. But every other country is fair game. Bottom line: I short French debt today.