VAM shouldn’t be used for tenure
I recently read a New York Times “Room for Debate” discussion on the teacher value-added model (VAM) and whether it’s fair.
I’ve blogged a few times about this model and I think it’s crap (see, for example, this prior post, entitled “The Value Added Model Sucks”).
One thing I noticed about the Room for Debate piece is that the two most pro-VAM talking heads (this guy and this guy) both quoted the same paper, written by Dan Goldhaber and Michael Hansen, called “Assessing the Potential of Using Value-Added Estimates of Teacher Job Performance for Making Tenure Decisions,” which you can download here.
Looking at the paper, I don’t think it’s a very good resource if you want to argue for tenure decisions based on VAM, but I guess it’s one of those things where they don’t expect you to actually do the homework.
For example, they admit that year-to-year scores for the same teacher are only correlated at between 20% and 50% (page 4). But then they go on to say that, if you average two or more years in a row, these correlations go up (page 4). I’m wondering whether that’s just because the averaged windows share underlying years of data, in which case of course the correlations go up, mechanically. They aren’t precise enough at that point to convince me they did this carefully.
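To be clear, I don’t know exactly how Goldhaber and Hansen computed their averaged correlations, so this is just a toy simulation of my suspicion (all the numbers in it are made up): if the multi-year averages share even one underlying year of data, you get a healthy-looking correlation out of yearly scores that are literally pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000

# Pure noise: each teacher's yearly "VAM score" has NO stable component at all.
scores = rng.normal(size=(n_teachers, 3))  # columns are years 1, 2, 3
year1, year2, year3 = scores.T

# Year-to-year correlation of the raw scores: ~0, as it should be for noise.
r_raw = np.corrcoef(year1, year2)[0, 1]

# Two-year averages that share year 2: correlation jumps to ~0.5,
# purely because the two averages contain the same data.
avg_12 = (year1 + year2) / 2
avg_23 = (year2 + year3) / 2
r_overlap = np.corrcoef(avg_12, avg_23)[0, 1]

print(f"raw year-to-year correlation:          {r_raw:.2f}")
print(f"overlapping two-year average correl.:  {r_overlap:.2f}")
```

If the paper’s averaged correlations were computed on non-overlapping windows, this objection doesn’t apply; the point is just that the text isn’t precise enough to rule it out.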
But it doesn’t matter, because when teachers are up for tenure they have one or two scores, that’s it. So the fact that 17 years of scores, averaged together, carry actual information, even if true, is irrelevant. The point is that we are asking whether one or two scores, from a test with 20-50% year-to-year correlation, are sufficiently accurate and precise to decide on someone’s job. And by the way, in my post the correlation of teachers’ scores for the same year in the same subject was 24%, so I’m guessing we should lean towards the bottom of that scale for accuracy.
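To put some flesh on that: if you model a teacher’s yearly score as a stable “true quality” plus independent noise (my assumption, not anything from the paper), then a 24% year-to-year correlation means only about 24% of a single year’s score variance is signal. Here’s a little simulation of that model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
r = 0.24  # assumed year-to-year correlation = signal's share of score variance

quality = rng.normal(scale=np.sqrt(r), size=n)         # stable teacher effect
noise = rng.normal(scale=np.sqrt(1 - r), size=(2, n))  # independent yearly noise
year1 = quality + noise[0]
year2 = quality + noise[1]

# How well does a single year's score track the quality it's supposed to measure?
r_single = np.corrcoef(year1, quality)[0, 1]            # ~ sqrt(0.24) ≈ 0.49

# Averaging two years helps, but not much: reliability goes from 0.24
# to 2r/(1+r) ≈ 0.39 (the Spearman-Brown formula).
r_avg = np.corrcoef((year1 + year2) / 2, quality)[0, 1]  # ~ sqrt(0.39) ≈ 0.62

print(f"single-year score vs. true quality: {r_single:.2f}")
print(f"two-year average vs. true quality:  {r_avg:.2f}")
```

In other words, even under this generous model, where a “true quality” exists and the noise averages out cleanly, one or two scores are a blurry photograph of the thing you’re firing people over.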
This is ludicrous. Can you imagine being told you can’t keep your job because of a number that imprecise? I’m grasping for an analogy, but it’s something like getting tenure as a professor based on what someone you’ve never met heard about your reputation while he was drunk at a party. Maddening. And I can’t imagine it’s attracting more good people to the trade. I’d walk the other way if I heard about this.
The reason the paper is quoted so much is that it looks at longer-term data to see whether early-career VAM scores have predictive power for student outcomes more than 11 years later. However, it’s for one data set in North Carolina, and the testing actually happened in 1995 (page 6), before the testing culture really took over (an important factor). They also clearly exclude any teacher whose paperwork is unavailable or unclear, as well as small classes (page 7), which presumably means excluding special-ed kids. Moreover, they admit they don’t really know whether the kids were actual students of the teacher who proctored the tests (page 6).
Altogether, it’s a different set-up from the idiosyncratic, real-world situation faced by actual teachers, whose tenure decisions are actually being made based on one or two hugely noisy numbers.
I’m not a huge fan of tenure, and like everyone else who cares about this stuff I want educators to be held accountable for being good teachers, but this is pseudo-science.
I’m still obsessed with the idea that people would know how crappy this stuff is if we could get our hands on the VAM itself and set something up where people could test its robustness directly, by putting in their information and seeing how their score would change based on how many kids they had in their class, etc.
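Since nobody will hand us the actual model, here’s only the flavor of what such a robustness tool would show, using a completely made-up, vastly over-simplified “VAM” where a teacher’s score is just her class’s average test-score gain (real VAMs regress on covariates, but the class-size noise logic is the same):

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_vam_score(class_size, teacher_effect=0.0, student_sd=1.0):
    """Hypothetical, over-simplified VAM: the class's mean test-score gain."""
    gains = teacher_effect + rng.normal(scale=student_sd, size=class_size)
    return gains.mean()

# Same teacher (true effect of exactly zero), different class sizes:
# watch how much the score bounces around purely from roster luck.
spreads = {}
for size in (10, 25, 60):
    scores = [toy_vam_score(size) for _ in range(1000)]
    spreads[size] = np.std(scores)
    print(f"class size {size:3d}: score spread (sd) ≈ {spreads[size]:.2f}")
```

A teacher with a small class gets a much noisier number than one with a big class, for the same underlying teaching. A real version of this tool, pointed at the real model, is exactly what I wish teachers could play with.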