This is a guest post by Todd Schneider. You can read the full post with additional analysis on Todd’s personal site.
[M]ortgages were acknowledged to be the most mathematically complex securities in the marketplace. The complexity arose entirely out of the option the homeowner has to prepay his loan; it was poetic that the single financial complexity contributed to the marketplace by the common man was the Gordian knot giving the best brains on Wall Street a run for their money. Ranieri’s instincts that had led him to build an enormous research department had been right: Mortgages were about math.
The money was made, therefore, with ever more refined tools of analysis.
—Michael Lewis, Liar’s Poker (1989)
Fannie Mae and Freddie Mac began reporting loan-level credit performance data in 2013 at the direction of their regulator, the Federal Housing Finance Agency. The stated purpose of releasing the data was to “increase transparency, which helps investors build more accurate credit performance models in support of potential risk-sharing initiatives.”
The GSEs went through a nearly $200 billion government bailout during the financial crisis, motivated in large part by losses on loans that they guaranteed, so I figured there must be something interesting in the loan-level data. I decided to dig in with some geographic analysis, an attempt to identify the loan-level characteristics most predictive of default rates, and more. The code for processing and analyzing the data is all available on GitHub.
The “medium data” revolution
In the not-so-distant past, an analysis of loan-level mortgage data would have cost a lot of money. Between licensing data and paying for expensive computers to analyze it, you could have easily incurred costs north of a million dollars per year. Today, in addition to Fannie and Freddie making their data freely available, we’re in the midst of what I might call the “medium data” revolution: personal computers are so powerful that my MacBook Air is capable of analyzing the entire 215 GB of data, representing some 38 million loans, 1.6 billion observations, and over $7.1 trillion of origination volume. Furthermore, I did everything with free, open-source software.
What can we learn from the loan-level data?
Loans originated from 2005-2008 performed dramatically worse than loans that came before them! That should be an extraordinarily unsurprising statement to anyone who was even slightly aware of the U.S. mortgage crisis that began in 2007:
About 4% of loans originated from 1999 to 2003 became seriously delinquent at some point in their lives. The 2004 vintage showed some performance deterioration, and then the vintages from 2005 through 2008 show significantly worse performance: more than 15% of all loans originated in those years became distressed.
From 2009 through the present, the performance has been much better, with fewer than 2% of loans defaulting. Of course part of that is that it takes time for a loan to default, so the most recent vintages will tend to have lower cumulative default rates while their loans are still young. But there has also been a dramatic shift in lending standards so that the loans made since 2009 have been much higher credit quality: the average FICO score used to be 720, but since 2009 it has been more like 765. Furthermore, if we look 2 standard deviations from the mean, we see that the low end of the FICO spectrum used to reach down to about 600, but since 2009 there have been very few loans with FICO scores below 680:
Tighter agency standards, coupled with a complete shutdown in the non-agency mortgage market, including both subprime and Alt-A lending, mean that there is very little credit available to borrowers with low credit scores (a far more difficult question is whether this is a good or bad thing!).
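For readers who want to reproduce the "2 standard deviations from the mean" view of FICO scores mentioned above, here is a minimal sketch in Python. The scores are made-up toy values and the function name `fico_band` is hypothetical; the actual analysis code lives in the GitHub project and may well compute this differently.

```python
import statistics

def fico_band(scores, k=2.0):
    """Return (low, mean, high), where low/high sit k standard deviations from the mean."""
    mean = statistics.fmean(scores)
    # Population standard deviation; a sample standard deviation (statistics.stdev)
    # would be an equally reasonable choice for this kind of summary.
    sd = statistics.pstdev(scores)
    return mean - k * sd, mean, mean + k * sd

# Toy scores, invented purely for illustration (not taken from the GSE data).
toy_scores = [700, 720, 740]
low, mean, high = fico_band(toy_scores)
print(f"mean={mean:.0f}, low end={low:.0f}, high end={high:.0f}")
```

Running the same function separately on each origination year would give the kind of band described in the text, with the low end rising as lending standards tighten.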
Default rates increased everywhere during the bubble years, but some states fared far worse than others. I took every loan originated between 2005 and 2007, broadly considered to be the height of reckless mortgage lending, bucketed loans by state, and calculated the cumulative default rate of loans in each state:
Four states in particular jump out as the worst performers: California, Florida, Arizona, and Nevada. Just about every state experienced significantly higher than normal default rates during the mortgage crisis, but these four states, often labeled the “sand states”, experienced the worst of it.
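The state-level calculation described above (filter to the 2005 to 2007 vintages, bucket by state, compute the share of loans that ever defaulted) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the record layout, the `ever_defaulted` flag, and the toy loans are invented, and the real processing code in the GitHub project may be organized quite differently.

```python
from collections import defaultdict

# Hypothetical toy records: (loan_id, state, origination_year, ever_defaulted).
toy_loans = [
    ("a1", "CA", 2006, True),
    ("a2", "CA", 2005, False),
    ("a3", "NV", 2007, True),
    ("a4", "NV", 2006, True),
    ("a5", "TX", 2006, False),
    ("a6", "TX", 2004, True),  # outside the 2005-2007 window, so it is skipped
]

def cumulative_default_rate_by_state(loans, first_year=2005, last_year=2007):
    """Share of loans originated in [first_year, last_year] that ever defaulted, per state."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for _loan_id, state, year, ever_defaulted in loans:
        if first_year <= year <= last_year:
            totals[state] += 1
            if ever_defaulted:
                defaults[state] += 1
    return {state: defaults[state] / totals[state] for state in totals}

rates = cumulative_default_rate_by_state(toy_loans)
for state, rate in sorted(rates.items()):
    print(f"{state}: {rate:.0%}")
```

With 38 million real loans the same grouping would more likely be done in a database or a data-frame library, but the logic is the same: count loans and defaults per bucket, then divide.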
If you’re interested in more technical discussion, including an attempt to identify which loan-level variables are most correlated with default rates (the number one being the home price adjusted loan to value ratio), read the full post on toddwschneider.com, and be sure to check out the project on GitHub if you’d like to do your own data analysis.
There’s a really interesting article over at the Wall Street Journal today, written by Andrea Fuller and entitled The Watchdogs of College Education Rarely Bite. The article discusses the accreditation system for colleges, and how it is more or less dysfunctional. Here’s an example from the article of how they are failing to do a good job:
At Bluefield State College in West Virginia, accreditors from the Higher Learning Commission suggested in 2011 that new electronic signs on campus might be difficult for students to read while driving, according to a copy of the report. The report didn’t mention the college’s graduation rate of 25% or less since 2006.
There is troubling evidence presented in the article that we should definitely pay attention to. It’s quite possible that the accreditors are being paid off, or at least have insufficient reason to come down hard on terribly performing schools. I hope we spend time rethinking the whole system.
However, I think it’s interesting to think about the metrics of success that were used in the article. It’s also an important step towards designing a more “data-driven” accreditation approach.
So, for the most part, the article described things in terms of graduation rates and student loan defaults. Not a bad start if you wanted to measure a school: you want high graduation rates, and you want low student loan default rates. Also, they did a good thing, namely compared these numbers to a baseline. In this case their baseline was the average for the schools that have lost accreditation since 2000. Here’s their plot:
Again, these are important metrics, but the logic of the above chart seems to be, if there is a school with a lower graduation rate or a higher default rate than these baseline numbers, or both, then you should also lose your accreditation.
And by the way, I’m not really disagreeing – there are too many bad schools out there, and this seems like a pretty good way of finding truly terrible outliers. Even so, as a data nerd, I need to make the argument that these statistics are highly misleading, or can be.
Say you are trying to compare two schools, and one has a higher graduation rate than the other. Do you conclude that the one with a higher graduation rate is better? Well, no. It could just graduate people because it pushes people through the classes without really teaching them anything. Or, the other one could be lower because it takes a chance on more students. In other words, a graduation rate can be lower or higher for good or bad reasons, and taken alone is not a great indicator. Lots of community colleges, moreover, are set up to be transfer schools, and the students deliberately start at that school, then transfer to 4-year colleges, thus lowering the overall graduation rate. It’s a good thing that such schools exist, and we wouldn’t want to close them all down.
Similarly, higher default rates on student loans could be an artifact of a school taking chances on students that otherwise have fewer options, or a bad economy, or even just the type of education that is offered. Engineering schools tend to graduate students who find jobs quickly and easily, but that doesn’t mean every school should become an engineering school. So I wouldn’t compare default rates of two colleges and conclude that the college with a low default rate is necessarily better.
What I’m coming to is that deciding whether a given college has become a failure is actually pretty tricky, and we can complain – and should complain, apparently – about the current system of accreditation, but we can’t claim that it’s as simple as looking at two metrics and deciding what the cut-off is. Choosing a perfect threshold would be tricky.
Or rather, we could do something like that, but then it might have weird effects. If we closed all the schools that don’t keep graduation rates high and default rates low, we might see non-engineering students pushed out of the system, or we might see schools create partnerships with corporations and become federal aid-funded corporate training centers, or we might just see (even more) widespread fraud in terms of reporting such things.
There’s a game of chicken going on in Europe, whereby the moneylenders (the European Central Bank, the IMF, and the European Commission) are trying to get Greece to pay back money they previously borrowed, but Greece doesn’t have any extra cash to do it. Clive Crook gave a good summary of the situation at Bloomberg View yesterday.
I sometimes like to imagine that Europe is a family, and Greece is a member of that family who really isn’t doing well. Greece owes the other family members money, but is also really ill and spends most of its time on the couch, coughing and feverish. The other family members want their money back, of course, but seeing how sick Greece is, are reluctant to actually kick a family member out on the street.
It’s not a perfect metaphor, since Greece is actually a country, and the people making big decisions about how debt payments will work in Greece are not the same people that suffer when they run out of jobs, medicine, and pension payments. But it’s gotten a bit more like that recently with the actual election of its Prime Minister, whereas before it was being run by an appointed technocrat from the central bank.
On the other hand, it is a pretty good metaphor, mostly because the grand European vision is very much one of a family, and pushing Greece out because of failure to pay money it doesn’t really have would be shameful to many who still believe in that vision.
So, going with the metaphor for the moment, I’d like to suggest an idea that came up in my Occupy meeting last Sunday when we were talking about how actual families would solve this problem. Namely, they wouldn’t. The sick person would be allowed to stay, even though they didn’t pay back the money. And everyone would be annoyed, but family is family.
It strikes me that the concept of “opting in” to some service or society has become strained, even more than usual.
We have become more or less used to the idea that we’ll check on agreement boxes, written in inscrutable legalese, in order to get free stuff. We will do that without ever reading the box or understanding what we’ve gotten ourselves into. That’s a form of passive opting in, which depends on us barely noticing things.
But there’s a new, even more ridiculous usage of the term “opt in” that has been popping up. It’s gone beyond passive action to what you could only describe as inaction. Two examples.
The first one comes from Belgium, where they’ve decided that people have not, in fact, opted in to Facebook’s tracking and surveillance mechanism by clicking on a link that brings them to Facebook. They want people to actually click the legalese box before being tracked. Of course, their concepts of privacy are much stronger than ours, but they have an important point: opting in requires doing something, and it doesn’t count if the “doing something,” which is in this case clicking on an innocuous link, has nothing to do with terms of service.
Second example. When I was getting prepared to give my Personal Democracy Forum talk the other day (the link for the talk is here), the speaker before me, who was talking about microtargeting in politics, mentioned to me that what they do isn’t so bad. She suggested that, when they send specific political messages to certain people and not others, they only even have that information about those voters because, after all, they provided it.
I was confused, so I asked her, “Are you saying that you only send messages to people that have somehow opted in to political messaging?”
“Yes,” she responded, “they opted in by registering to vote.”
Again, that’s a severe misuse of the term “opting in.” Registering to vote is simply a part of being a citizen, and does not even indirectly imply a willingness to be tracked. We should all be automatically registered anyway, although now I’m worried about what that might mean we’re signing up for.
I know it’s almost summer, not because of the sticky heat but because it’s almost time to go to Clearwater, the music festival started by Pete Seeger in 1969 to organize around cleaning up the Hudson River. It takes place in the beautiful Croton Point Park, outside the town of Croton-on-Hudson.
I’m going to be a volunteer this weekend, along with three other members of my band, and my first shift is serving dinner in the volunteers’ kitchen on Friday from 5 to 9pm. That means I have to pitch my tent in the volunteers’ camping area before I start, because it will be dark and confusing after that. But that means I have to get there super early, and I’m likely traveling on the train with my tent, sleeping bag, sleeping roll, pillow, clothes, bathroom bag, and of course my fiddle. That’s a lot of stuff to haul so I’m hoping it’s not 90 degrees or something on Friday.
But of course the really exciting thing, besides jamming with my bandmates in the evenings, is the music itself. The performers are listed here, and I’m super excited about The Lone Bellow and The Felice Brothers, but of course learning about new talent is what summer music festivals are all about.
Aunt Pythia is in a rush this morning, people! She is going to see Jurassic World, in Imax 3D no less, and she needs to finish this here advice column quickly in order to make time for the Saturday New York Times crossword puzzle (har har). So here goes.
So read, enjoy, comment, and before you leave,
ask Aunt Pythia any question at all at the bottom of the page!
Dear Aunt Pythia,
I’ve noticed a bunch of Masters’ in Data Science programs have been launched at various reputable universities lately. Can you vouch for the usefulness/value of any of them? Or would you say they are largely the product of big-name schools wanting to make a few bucks off the “Big Data” hype train?
Don’t Wanna Get Scammed!
Yeah, my guess is both. I mean, I think most of them would teach you something, but I’m pretty sure these programs are also cash cows. As to their usefulness, one thing I’ve noticed is how few of the programs want to hire someone who has actually worked as a data scientist in a company. That doesn’t mean there is no internal person, in the academic institution, who knows a given skill, but it probably means that there’s not much direct advice for people going into this field.
To be any more specific, you’d have to name a program for me to look into.
Hmm. Gotta think of a sex question for Aunt Pythia.
I’m a guy and I feel really guilty that I have sexual thoughts in a professional setting (although I do keep them to myself). For example, when women give math talks, I notice I am analyzing their sexiness – are they thin or at least quasi-thin, how ideal their boobs and curves look, how revealing and/or form-fitting their coverings, how well is their boob support functioning, and speaking of curves and forms, I imagine relating my pole to their holes, after removing our pairs of pants and busting out my canonical divisor (ya know, the thing that kind of rhymes with genus) I’m at the cusp of an, um, singularity. More thoughts follow: Are they on their period, are their periods irregular? I compare their height in their heels, the depth of their voice, and the dimensions of their bust. When the latter two match up, I’ve found it possible to reverse a variety of positions, even if things aren’t completely smooth. My thoughts are quite wild and perverse and I feel somewhat ashamed for thinking these thoughts. Are these concerns rational, irrational? Do you think respectable, upstanding “nice-to-women” male members of the math profession have these thoughts, or is it just dirty minded guys like myself? Do you think lesbians have these thoughts?
Umm, this started out as a totally real question, but then my love of math superseded my love for women’s bodies. I think the same thing happens in the talk … eventually I’m able to pay attention to the math.
Do women check out guys while they are giving math talks? What might their thoughts be like?
I think mostly everyone, or at least every adult, has thoughts along these lines. The question is, how long does it take for someone to “eventually” pay attention to the math? I think that’s critical, and it might depend on how much interaction they have with the opposite sex in their regular life, or how well they’ve been sleeping, or whether they’ve gotten exercise lately, or any number of things.
Obviously it’s better for both audience members and the speaker at a math talk if the math is the center of attention, but there’s no way to remove our humanness entirely; at the end of the day it’s a person, in front of other people, explaining some beautiful thing, and there’s bound to be human interactions.
And that’s not a bad thing. I remember concocting a crush on the speaker, male or female, of most talks I went to in order to enjoy their talk more. It worked!
So, if there’s advice to give, I’d say stop feeling guilty about checking out women, do keep your deeper desires to yourself, and enjoy the math. And if possible, try to crush out on the men too.
Dear Aunt Pythia,
Intersecting Feminism and porn To overcome objectification
Wow, no I hadn’t heard of her, but I love her talk! I’ll check her out. I hope others do too.
Hi Aunt Pythia,
I wrote to you about a year ago when I realized that I wanted to leave my PhD program for a data science job (‘Slightly Hyperventilating’). You gave encouraging advice–thanks! I ended up taking a job a little too prematurely into my search, but it’s allowed me to improve my programming skills and rub shoulders with internet user behavior datasets which is awesome. But now I’m on the job market again and excited to find a new team!
Here’s my question: at my current company, there’s a ton of tension between the engineers and the analytics people. It’s really weird and gross and counter-productive and stops me from learning from them which is what I want to do. How common is it that engineering teams look down their noses at stats-leaning, data analyzing folks? And what questions do I ask in the interview to find this out? What other indicators should I look for?
Seeking Nice Engineers
Oh my god, you were on the luxury winnebago edition of Aunt Pythia. I remember it well. Sigh.
So, great! You did everything right. The thing about data science jobs is that they don’t last forever. People are expected to jump ship once they get the basic idea of stuff and the learning curve decelerates, or when the politics of the office get annoying. In your case the latter has occurred, so go for it.
And no, I don’t have experience with nasty programmers. Most of the people I’ve worked with have been incredibly sweet. I mean, there’s some macho brogrammer posturing every now and then, but I have never seen that dominating. Just find a new job, and keep in touch!
People, people! Aunt Pythia loves you so much. And she knows that you love her. She feels the love. She really really does.
Well, here’s your chance to spread your Aunt Pythia love to the world! Please ask her a question. She will take it seriously and answer it if she can.
I’m visiting my good friend Aaron in Atlanta, Georgia, this week, with my youngest son. So far we’ve gone swimming twice in an incredibly large pool (100 meter lanes), had ridiculously delicious barbeque (Daddy D’z), and checked out the local “growler shop” to prepare for last night’s NBA finals game.
Don’t know what a growler shop is? Neither did I, but if you like beer, you’re going to want to learn. It’s basically a take-away bar, with an enormous number of beers that you can sample and of course purchase, at great prices. The growler shop we went to is called My Friend’s Growler Shop, and it has two very adorable and friendly sommeliers named Camric and John:
We ended up tasting a bunch of beers but taking home Eventide Kölsch, which comes from a local brewery and is a variation of Grolsch, and Left Hand Milk Stout, which is as close to a meal in a drink as you can get if you’ve been weaned.
Why doesn’t New York have growler shops? As Camric and John explained to me, each state has different interpretations of a federal law that prohibits reselling of beers in anything other than their original containers. Law is weird, but what it means is that New York State laws would only allow a shop to sell beer from a single brewery, which is super disappointing.
Also, if you don’t love beer, there are such things as growler wine shops, but those are also in Georgia and not New York.