Every now and then when I complain about the Value-Added Model (VAM), people send me links to recent papers written by Raj Chetty, John Friedman, and Jonah Rockoff, like this one entitled Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood or its predecessor Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates.
I think I’m supposed to come away impressed, but that’s not what happens. Let me explain.
Their data set of student scores starts in 1989, well before the current value-added teaching climate began. That means teachers weren’t teaching to the test like they are now. Therefore saying that the current VAM works because a retrograded VAM worked in 1989 and the 1990s is like saying I must like blueberry pie now because I used to like pumpkin pie. It’s comparing apples to oranges, or blueberries to pumpkins.
I’m surprised by the fact that the authors don’t seem to make any note of the difference in data quality between pre-VAM and current conditions. They should know all about feedback loops; any modeler should. And there’s nothing like telling teachers they might lose their job to create a mighty strong feedback loop. For that matter, just consider all the cheating scandals in the D.C. area where the stakes were the highest. Now that’s a feedback loop. And by the way, I’ve never said the VAM scores are totally meaningless, but just that they are not precise enough to hold individual teachers accountable. I don’t think Chetty et al address that question.
So we can’t trust old VAM data. But what about recent VAM data? Where’s the evidence that, in this climate of high-stakes testing, this model is anything but random?
If it were a good model, we’d presumably be seeing a comparison of current VAM scores and current other measures of teacher success and how they agree. But we aren’t seeing anything like that. Tell me if I’m wrong, I’ve been looking around and I haven’t seen such comparisons. And I’m sure they’ve been tried, it’s not rocket science to compare VAM scores with other scores.
The lack of such studies reminds me of how we never hear about scientific studies on the results of Weight Watchers. There’s a reason such studies never see the light of day, namely because whenever they do those studies, they decide they’re better off not revealing the results.
And if you’re thinking that it would be hard to know exactly how to rate a teacher’s teaching in a qualitative, trustworthy way, then yes, that’s the point! It’s actually not obvious how to do this, which is the real reason we should never trust a so-called “objective mathematical model” when we can’t even decide on a definition of success. We should have the conversation of what comprises good teaching, and we should involve the teachers in that, and stop relying on old data and mysterious college graduation results 10 years hence. What are current 6th grade teachers even supposed to do about studies like that?
Note I do think educators and education researchers should be talking about these questions. I just don’t think we should punish teachers arbitrarily to have that conversation. We should have a notion of best practices that slowly evolve as we figure out what works in the long-term.
So here’s what I’d love to see, and what would be convincing to me as a statistician: all sorts of qualitative ways of measuring teachers alongside their VAM scores, so that we could compare them and make sure they agree with each other and with themselves over time. In other words, at the very least we should demand an explanation of how some teachers get totally ridiculous and inconsistent scores from one year to the next and from one VAM to the next, even in the same year.
We need some ground truth, people, and some common sense as well. Instead we’re seeing retired education professors pull statistics out of thin air, and it’s an all-out war of supposed mathematical objectivity against the civil servant.
This is a great book. It’s well written, clear, and it focuses on important issues. I did not check all of the claims made by the data but, assuming they hold up, the book makes two hugely important points which hopefully everyone can understand and debate, even if we don’t all agree on what to do about them.
First, the authors explain the insufficiency of monetary policy to get the country out of recession. Second, they suggest a new way to structure debt.
To explain these points, the authors do something familiar to statisticians: they think about distributions rather than averages. So rather than talking about how much debt there was, or how much the average price of houses fell, they talked about who was in debt, and where they lived, and which houses lost value. And they make each point carefully, with the natural experiments inherent in our cities due to things like available land and income, to try to tease out causation.
Their first main point is this: the financial system works against poor people (“borrowers”) much more than rich people (“lenders”) in times of crisis, and the response to the financial crisis exacerbated this discrepancy.
The crisis fell on poor people much more heavily: they were wiped out by the plummeting housing prices, whereas rich people just lost a bit of their wealth. Then the government stepped in and protected creditors and shareholders but didn’t renegotiate debt, which protected lenders but not borrowers. This is a large reason we are seeing so much increasing inequality and why our economy is stagnant. They make the case that we should have bailed out homeowners not only because it would have been fair but because it would have been helpful economically.
The authors looked into what actually caused the Great Recession, and they come to a startling conclusion: that the banking crisis was an effect, rather than a cause, of enormous household debt and consumer pull-back. Their narrative goes like this: people ran up debt, then started to pull back, and as a result the banking system collapsed, as it was utterly dependent on ever-increasing debt. Moreover, the financial system did a very poor job of figuring out how to allocate capital and the people who made those loans were not adequately punished, whereas the people who got those loans were more than reasonably punished.
About half of the run-up of household debt was explained by home equity extraction, where people took out money from their home to spend on stuff. This is partly due to the fact that, in the meantime, wages were stagnant and home equity was a big thing and was hugely available.
But the authors also made the case that, even so, the bubble wasn’t directly caused by rising home valuations but rather by securitization and the creation of “financial innovation” which made investors believe they were buying safe products which were in fact toxic. In their words, securities are invented to exploit “neglected risks” (my experience working in a financial risk firm absolutely agrees with this; whenever you hear the phrase “financial innovation,” please interpret it to mean “an instrument whose risk hides somewhere in the creases that investors are not yet aware of”).
They make the case that debt access by itself elevates prices and builds bubbles. In other words, it was the sausage factory itself, producing AAA-rated ABS CDO’s, that grew the bubble.
Next, they talked about what works and what doesn’t, given this distributional way of looking at the household debt crisis. Specifically, monetary policy is insufficient, since it works through the banks, who are unwilling to lend to the poor who are already underwater, and only rich people benefit from cheap money and inflated markets. Even at its most extreme, the Fed can at most avoid deflation but it cannot really help create inflation, which is what debtors need.
Fiscal policy, which is to say things like helicopter money drops or added government jobs, paid by taxpayers, is better but it makes the wrong people pay – high income earners vs. high wealth owners – and isn’t as directly useful as debt restructuring, where poor people get a break and it comes directly from rich people who own the debt.
There are obstacles to debt restructuring, which are mostly political. Politicians are impotent in times of crisis, as we’ve seen, so instead of waiting forever for that to happen, we need a new kind of debt contract that automatically gets restructured in times of crisis. Such a new-fangled contract would make the financial system actually spread out risk better. What would that look like?
The authors give two examples, for mortgages and student debt. The student debt example is pretty simple: how quickly you need to pay back your loans depends in part on how many jobs there are when you graduate. The idea is to cushion the borrower somewhat from macro-economic factors beyond their control.
Next, for mortgages, they propose something they call the shared-responsibility mortgage. The idea here is to have, say, a 30-year mortgage as usual, but if houses in your area lose value, your principal and monthly payments go down in a commensurate way. So if there’s a 30% drop, your payments go down 30%. To compensate the lenders for this loss-share, the borrowers also share the upside: 5% of capital gains are given to the lenders in the case of a refinancing.
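To make the mechanics concrete, here is a minimal sketch of how such a contract could compute a payment. The function names are mine, and I am assuming that payments track a local price index proportionally and never rise above the original amount; the contract terms described above only pin down the 30% example and the 5% gain share.

```python
def srm_payment(base_payment, index_at_origination, index_now):
    """Monthly payment scaled down with a local house price index.

    Assumption: the payment falls in proportion to the index and is
    capped at the original payment when prices rise.
    """
    ratio = min(1.0, index_now / index_at_origination)
    return base_payment * ratio


def lender_gain_share(sale_price, purchase_price, share=0.05):
    """Lender's cut of any capital gain (5% in the proposal); zero on a loss."""
    gain = max(0.0, sale_price - purchase_price)
    return share * gain


# A 30% drop in the local index cuts a $2,000 payment by 30%.
reduced = srm_payment(2000, index_at_origination=100, index_now=70)

# A $50,000 gain on refinancing sends 5% of it to the lender.
upside = lender_gain_share(350_000, 300_000)

print(reduced, upside)
```

Note that everything hinges on `index_now`, which is exactly the number somebody will have an incentive to manipulate.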
In the case of a recession, the creditors take losses but the overall losses are smaller because we avoid the foreclosure feedback loops. It also acts as a form of stimulus to the borrowers, who are more likely to spend money anyway.
If we had had such mortgage contracts in the Great Recession, the authors estimate that it would have been worth a stimulus of $200 billion, which would have in turn meant fewer jobs lost and many fewer foreclosures and a smaller decline of housing prices. They also claim that shared-responsibility mortgages would prevent bubbles from forming in the first place, because of the fear of creditors that they would be sharing in the losses.
A few comments. First, as a modeler, I am absolutely sure that once my monthly mortgage payment is directly dependent on a price index, that index is going to be manipulated. Similarly as a college graduate trying to figure out how quickly I need to pay back my loans. And depending on how well that manipulation works, it could be a disaster.
Second, it is interesting to me that the authors make no mention of the fact that, for many forms of debt, restructuring is already a typical response. Certainly for commercial mortgages, people renegotiate their principal all the time. We can address the issue of how easy it is to negotiate principal directly by talking about standards in contracts.
Having said that I like the idea of having a contract that makes restructuring automatic and doesn’t rely on bypassing the very real organizational and political frictions that we see today.
Let me put it this way. If we saw debt contracts being written like this, where borrowers really did have down-side protection, then the people of our country might start actually feeling like the financial system was working for them rather than against them. I’m not holding my breath for this to actually happen.
I am now part of the administrative bloat over at Columbia. I am non-faculty administration, tasked with directing a data journalism program. The program is great, and I’m not complaining about my job. But I will be honest, it makes me uneasy.
Although I’m in the Journalism School, which is in many ways separated from the larger university, I now have a view into how things got so bloated. And how they might stay that way, as well: it’s not clear that, at the end of my 6-month gig, on September 16th, I could hand my job over to any existing person at the J-School. They might have to replace me, or keep me on, with a real live full-time person in charge of this program.
There are good and less good reasons for that, but overall I think there exists a pretty sound argument for such a person to run such a program and to keep it good and intellectually vibrant. That’s another thing that makes me uneasy, although many administrative positions have less of an easy sell attached to them.
I was reminded of this fact of my current existence when I read this recent New York Times article about the administrative bloat in hospitals. From the article:
And studies suggest that administrative costs make up 20 to 30 percent of the United States health care bill, far higher than in any other country. American insurers, meanwhile, spent $606 per person on administrative costs, more than twice as much as in any other developed country and more than three times as much as many, according to a study by the Commonwealth Fund.
A comprehensive study published by the Delta Cost Project in 2010 reported that between 1998 and 2008, America’s private colleges increased spending on instruction by 22 percent while increasing spending on administration and staff support by 36 percent. Parents who wonder why college tuition is so high and why it increases so much each year may be less than pleased to learn that their sons and daughters will have an opportunity to interact with more administrators and staffers— but not more professors.
There are similarities and there are differences between the university and the medical situations.
A similarity is that people really want to be educated, and people really need to be cared for, and administrations have grown up around these basic facts, and at each stage they seem to be adding something either seemingly productive or vitally needed to contain the complexity of the existing machine, but in the end you have enormous behemoths of organizations that are much too complex and much too expensive. And as a reality check on whether that’s necessary, take a look at hospitals in Europe, or take a look at our own university system a few decades ago.
And that also points out a critical difference: the health care system is ridiculously complicated in this country, and in some sense you need all these people just to navigate it for a hospital. And ObamaCare made that worse, not better, even though it also has good aspects in terms of coverage.
The university system, by contrast, made itself complicated. It wasn’t externally forced into complexity, unless you count the US News & World Report gaming that seems inescapable.
You might have heard about the recent study entitled Higher social class predicts increased unethical behavior. In it, the authors figure out seven ways to measure the extent to which rich people are bigger assholes than poor people, a plan that works brilliantly every time.
What they term “unethical behavior” comes down to stuff like cutting off people and cars in an intersection, cheating in a game, and even stealing candy from a baby.
The authors also show that rich people are more likely to think of greed as good, and that attitude is sufficient to explain their feelings of entitlement. Another way of saying this is that, once you “account for greed feelings,” being rich doesn’t make you more likely to cheat.
I’d like to go one step further and ask, why do rich people think greed is good? A couple of things come to mind.
First, rich people rarely get arrested, and even when they are arrested, their experiences are very different and much less likely to end up with a serious sentence. Specifically, the fees are not onerous for the rich, and fancier lawyers do better jobs for the rich (by the way, in Finland, speeding tickets are on a sliding scale depending on the income of the perpetrator). It’s easy to think greed is good if you never get punished for cheating.
Second, rich people are examples of current or legacy winners in the current system, and that feeling that they have won leaks onto other feelings of entitlement. They have faith in the system to keep them from having to deal with consequences because so far so good.
Finally, some people deliberately judge that they can afford to be assholes. They are insulated from depending on other people because they have money. Who needs friends when you have resources?
Of course, not all rich people are greed-is-good obsessed assholes. But there are some that specialize in it. They call themselves Libertarians. PayPal founder Peter Thiel is one of their heroes.
Here’s some good news: some of those people intend to sail off on a floating country. Thiel is helping fund this concept. The only problem is, they all are so individualistic it’s hard for them to agree on ground rules and, you know, a process by which to decide things (don’t say government!).
This isn’t a new idea, but for some reason it makes me very happy. I mean, wouldn’t you love it if a good fraction of the people who cut you off in traffic got together and decided to leave town? I’m thinking of donating to that cause. Do they have a Kickstarter yet?
I gave a talk to the invitation-only NYC CTO Club a couple of weeks ago about my fears about big data modeling, namely:
- that big data modeling is discriminatory,
- that big data modeling increases inequality, and
- that big data modeling threatens democracy.
I had three things on my “to do” list for the audience of senior technologists, namely:
- test internal, proprietary models for discrimination,
- help regulators like the CFPB develop reasonable audits, and
- get behind certain models being transparent and publicly accessible, including credit scoring, teacher evaluations, and political messaging models.
Given the provocative nature of my talk, I was pleasantly surprised by the positive reception I was given. Those guys were great – interactive, talkative, and very thoughtful. I think it helped that I wasn’t trying to sell them something.
Even so, I shouldn’t have been surprised when one of them followed up with me to talk about a possible business model for “fairness audits.” The idea is that, what with the recent bad press about discrimination in big data modeling (some of the audience had actually worked with the Podesta team), there will likely be a business advantage to being able to claim that your models are fair. So someone should develop those tests that companies can take. Quick, someone, monetize fairness!
One reason I think this might actually work – and more importantly, be useful – is that I focused on “effects-based” discrimination, which is to say testing a model by treating it like a black box and seeing how it works on different inputs and gives different outputs. In other words, I want to give a resume-sorting algorithm different resumes with similar qualifications but different races. An algorithmically induced randomized experiment, if you will.
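A rough sketch of what such a black-box test could look like is below. Everything in it is hypothetical: `paired_audit` is a name I made up, and `toy_model` is a deliberately biased stand-in for whatever proprietary scoring function a company actually runs. The only thing the audit needs is the ability to call the model.

```python
from statistics import mean


def paired_audit(model, applicants, attribute, values):
    """Score each applicant once per attribute value, holding all else fixed.

    Returns the average score per value; a gap between values suggests the
    model's output depends on the attribute.
    """
    results = {v: [] for v in values}
    for applicant in applicants:
        for v in values:
            variant = dict(applicant, **{attribute: v})  # copy with one field changed
            results[v].append(model(variant))
    return {v: mean(scores) for v, scores in results.items()}


def toy_model(resume):
    """Hypothetical black box, biased on purpose so the audit has something to find."""
    score = resume["years_experience"]
    if resume["group"] == "B":
        score -= 1
    return score


applicants = [
    {"years_experience": 3, "group": "A"},
    {"years_experience": 7, "group": "A"},
]
averages = paired_audit(toy_model, applicants, "group", ["A", "B"])
gap = averages["A"] - averages["B"]
print(averages, gap)
```

A real audit would need many more inputs and a significance test on the gap, but the shape is the same: identical qualifications, one attribute varied, outputs compared.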
From the business perspective, a test that allows a model to remain a black box feels safe, because it does not require true transparency, and allows the “secret sauce” to remain secret.
One thing, though. I don’t think it makes too much sense to have a proprietary model for fairness auditing. In fact the way I was imagining this was to develop an open-source audit model that the CFPB could use. What I don’t want, and which would be worse than nothing, would be if some private company developed a proprietary “fairness audit” model that we cannot trust and would claim to solve the very real problems listed above.
Update: something like this is already happening for privacy compliance in the big data world (hat tip David Austin).
Need to be both nerdy and outraged today.
I’ve noticed something. When something shitty happens to me, and I’m complaining to a group of friends about it, I sometimes say something like “that only happened to me because I’m a woman.”
Now, first of all, I want to be clear, I’m no victim. I don’t let sexism get me down. In fact when I say something like that it usually is a coping mechanism to separate that person’s actions from my own actions, and to help me figure out what to do next. Usually I let it slide off of me and continue on my merry way.
But here’s where it’s weird. If I’m with a bunch of women friends, their immediate reaction is always the same: “hell yes, that bitch/ bastard is just a sexist fuck.” But if I’m with a bunch of male friends of mine, the reaction is very likely to be different: “oh, I don’t think there’s any reason to assume it was sexist. That guy/ girl is just an asshole.”
What it comes down to is priors. My prior is that there is sexism in the world, and it happens all the fucking time, especially to women with perceived power (or to women with no power whatsoever), and so when someone treats me or someone else badly, I do assume we should look into the sexism angle. It’s a natural choice, and Occam’s razor suggests it is involved.
So when Jill Abramson got fired, a bunch of the world’s women were like, those fuckers fired her because she is a powerful, take-no-bullshit woman, and if she’d been a man she would have been expected to act like a dick, but because she’s a woman they couldn’t handle it.
And a bunch of the world’s men were like, wow, I wonder what happened?
So, yes, now I have a prior on people’s priors on sexism, and I think men’s and women’s sexism priors are totally different. I can even explain it.
Men are men, so they don’t experience sexism. So they don’t update their priors like women do. Plus, because there is rarely a moment when an event or reaction is officially deemed “sexist,” men even categorize events differently than women (as discussed above), so even when they do update their prior, it is differently updated, partly because their prior is that nothing is sexist unless proven to be, since it’s so freaking unlikely, according to their prior.
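For the nerdy half of this post, here is a small sketch of how two people can see the same event and end up in very different places. It is plain Bayes' rule; the priors and likelihoods are numbers I made up purely for illustration, not measurements of anything.

```python
def posterior(prior, p_event_if_sexist, p_event_if_not):
    """Bayes' rule: probability that sexism was involved, given the event."""
    evidence = prior * p_event_if_sexist + (1 - prior) * p_event_if_not
    return prior * p_event_if_sexist / evidence


# Illustrative assumption: the event is twice as likely if sexism is at work.
p_if_sexist, p_if_not = 0.8, 0.4

high_prior = posterior(0.50, p_if_sexist, p_if_not)  # someone who sees sexism often
low_prior = posterior(0.05, p_if_sexist, p_if_not)   # someone who rarely sees it

print(f"{high_prior:.2f} vs {low_prior:.2f}")
```

Same evidence, very different conclusions, and that is before accounting for the two people not even agreeing on what counts as evidence.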
This is a guest post written by Stephanie Yang and reposted from her blog. Stephanie and I went to graduate school at Harvard together. She is now a quantitative analyst living in New York City, and will be joining the data science team at Foursquare next month.
Last week’s hysterical report by the Daily Show’s Samantha Bee on federally funded penis pumps contained a quote which piqued our quantitative interest. Listen carefully at the 4:00 mark, when Ilyse Hogue proclaims authoritatively:
“Statistics show that probably some of our members of congress have a vested interest in having penis pumps covered by Medicare!”
Ilyse’s wording is vague, and intentionally so. Statistically, a lot of things are “probably” true, and many details are contained in the word “probably”. In this post we present a simple statistical model to clarify what Ilyse means.
First we state our assumptions. We assume that penis pumps are uniformly distributed among male Medicare recipients and that no man has received two pumps. These are relatively mild assumptions. We also assume that what Ilyse refers to as “members of Congress [with] a vested interest in having penis pumps covered by Medicare” specifically means male members of congress who received a penis pump covered by federal funds. Of course, one could argue that female members of congress could have a vested interest in penis pumps as well, but we do not want to go there.
Now the number crunching. According to the report, Medicare has spent a total of $172 million supplying penis pumps to recipients, at “360 bucks a pop.” This means a total of 478,000 penis pumps bought from 2006 to 2011.
45% of the current 49,435,610 Medicare recipients are male. In other words, Medicare bought one penis pump for every 46.5 eligible men. Inverting this, we can say that 2.15% of male Medicare recipients received a penis pump.
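The arithmetic above is easy to reproduce. The sketch below uses only the figures quoted in this post; the variable names are ours, and we round the pump count to the nearest thousand the same way the text does.

```python
total_spend = 172_000_000   # dollars Medicare spent on pumps, 2006 to 2011
cost_per_pump = 360         # "360 bucks a pop"
recipients = 49_435_610     # current Medicare recipients
male_share = 0.45           # fraction of recipients who are male

# Round to the nearest thousand, matching the 478,000 figure in the text.
pumps = round(total_spend / cost_per_pump, -3)

eligible_men = recipients * male_share
men_per_pump = eligible_men / pumps
pump_rate = pumps / eligible_men

print(f"pumps bought: {pumps:,.0f}")
print(f"eligible men per pump: {men_per_pump:.1f}")
print(f"share of eligible men with a pump: {pump_rate:.2%}")
```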
There are currently 128 members of congress (32 senators plus 96 representatives) who are males over the age of 65 and therefore Medicare-eligible. The probability that none of them received a federally funded penis pump is:

$$(1 - 0.0215)^{128} \approx 6.2\%$$
In other words, the chance of at least one member of congress having one of said penis pumps is 93.8%, which is just shy of the 95% confidence that most statisticians agree on as significant. In order to get to 95% confidence, we need a total of 138 male members of congress who are over the age of 65, and this has not happened yet as of 2014. Nevertheless, the estimate is close enough for us to agree with Ilyse that there is probably some member of congress who has one.
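For readers who want to check the numbers, here is a short sketch of the calculation under the assumptions above (each eligible man independently has a 2.15% chance of having received a pump). The function names are ours.

```python
import math

p = 0.0215   # per-man pump rate derived earlier in the post
n = 128      # Medicare-eligible male members of congress


def prob_at_least_one(n, p):
    """Chance that at least one of n men has a pump: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def smallest_n_for(confidence, p):
    """Smallest headcount n with 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(f"{prob_at_least_one(n, p):.1%}")
print(smallest_n_for(0.95, p))
```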
Is it possible that there are two or more penis pump recipients in congress? We did notice that Ilyse’s quote refers to plural members of congress. Under the assumptions laid out above, the probability of having at least two federally funded penis pumps in congress is:

$$1 - (1 - 0.0215)^{128} - 128 \cdot 0.0215 \cdot (1 - 0.0215)^{127} \approx 76\%$$
Again, we would say this is probably true, though not nearly with the same amount of confidence as before. In order to reach 95% confidence that there are two or more congressional federally funded penis pumps, we would need somewhat more than 200 Medicare-eligible males in congress (about 219 under the assumptions above), which is unlikely to happen anytime soon.
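The two-or-more case can be checked the same way. This is a sketch under the same independence assumption, subtracting the "none" and "exactly one" binomial terms; the function names are ours, and the headcount search is a simple loop rather than anything clever.

```python
def prob_at_least_two(n, p):
    """Chance that two or more of n men have a pump (binomial model)."""
    none = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - none - exactly_one


def smallest_n_for_two(confidence, p):
    """Smallest headcount n at which prob_at_least_two reaches the confidence level."""
    n = 2
    while prob_at_least_two(n, p) < confidence:
        n += 1
    return n


p = 0.0215
print(f"{prob_at_least_two(128, p):.0%}")
print(smallest_n_for_two(0.95, p))
```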
Note: As a corollary to these calculations, I became the first developer in the history of mankind to type the following command:
git merge --squash penispump.