VAM versus what?
A few astute readers pointed out to me that in the past few days I both slammed the value-added model (VAM) for teachers and complained about people who reject something without offering an alternative. Good point, and today I’d like to start that discussion.
What should we be doing instead of VAM?
First of all, I do think that not rating teachers at all is better than the current system. So my “compare to the status quo” argument goes through in this instance: VAM is actively discouraging teachers, whereas leaving them alone entirely would neither discourage nor encourage anyone. So doing nothing beats the current system.
At the same time, I am a realist, and I think there should be, ultimately, a system of evaluating teachers, just as there is a system for evaluating me at work. The difference between my workplace, of 45 people, and the NYC public schools is scale. It makes sense to have a very large and consistent evaluation system in the NYC public schools, whereas my job can have an ad hoc inconsistent system without it being a problem.
There’s another problem which is nearly impossible to tease apart from this discussion. Namely, the fact that what’s going on in NYC is a disingenuous political game between Bloomberg and the teachers’ union. Just to emphasize how important that fight is, let’s keep in mind that as of now, although the union is much weaker than it historically has been, it still has the tenure system. So any evaluation model, VAM or not, is somewhat irrelevant for “removing bad teachers” given that they have tenure and tenure still means something.
Probably the best way to decouple the “Bloomberg vs. union/tenure” issue (a massive one here in NYC) from the “VAM versus other” question is to think nationally rather than citywide.
The truth is, the VAM is being tried out all over the country (although I don’t have hard numbers on this) and the momentum is for it to be used more and more. I predict within 10 years it will be done systematically everywhere in the country.
And, sadly, that’s my prediction whether or not the underlying model is any good! The truth is, there is a large contingent of technocrats who want control over the evaluation system and believe in the models, whether or not those models are producing pure noise. In other words, they believe in “data-driven decisioning” as a holy grail even though there’s scant evidence that it will work in schools. And they don’t want to back down now, even though the model sucks, because they feel they’d be losing momentum on the overall data-driven approach.
One thing I know for sure is that we should continue to be aware of how bad the current models are, and I want to set up an open source version of the models (see this post to get an idea of how it could work) to exhibit that. In other words, even if we don’t turn off the models altogether, can’t we at least minimize their importance while their quality is this bad? The first step is to plainly exhibit how bad they are.
It’s hard for me to decide what to do next, though. I’m essentially a modeler who is hugely skeptical of models. In fact, I don’t think using purely quantitative models to evaluate teachers is the right thing to do, period. Yet I feel like if it’s definitely going to happen, it’s better for people like me to be in the middle of it, pointing out how badly the proposed (or in-use) models are actually performing, and improving them.
One thing I know I’d do if I were to be put in charge of creating a better model: I’d train on data where the teacher is actually rated as a good teacher or not. In other words, I wouldn’t proxy “good teacher” by “if your students scored better than expected on tests”. A good model would be trained on data where there would be an expert teacher scorer, who would go into 500 classrooms and carefully evaluate the actual teachers, based on things like whether the teacher asked questions, or got the kids engaged, or talked too much or too little, or imposed too much busy work, etc. Then the model would be trying to mimic this expert.
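To make that concrete, here’s a minimal sketch of what “train on expert ratings” could look like, with entirely invented data: each row is one observed classroom, the features are things an observer can actually see, and the labels are an expert’s 1–5 ratings. The model then tries to mimic the expert on new classrooms. The features, numbers, and the simple linear fit are all my assumptions for illustration, not anyone’s real evaluation system.

```python
# Hypothetical sketch: train a model to mimic an expert observer's ratings,
# instead of proxying "good teacher" with test-score residuals.
# All features, weights, and scores below are invented for illustration.

def fit_linear(features, scores, lr=0.5, epochs=5000):
    """Fit y ~ w.x + b by plain gradient descent (pure-Python sketch)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, scores):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w, b

# Each row: [fraction of discussion time held by students (not the teacher),
#            fraction of kids who spoke at least once]
observations = [
    [0.6, 0.8],
    [0.5, 0.7],
    [0.2, 0.3],
    [0.1, 0.2],
]
expert_scores = [4.5, 4.0, 2.0, 1.5]  # the expert's 1-5 rating of each classroom

w, b = fit_linear(observations, expert_scores)

# Score a new classroom the expert never visited.
new_classroom = [0.55, 0.75]
predicted = sum(wi * xi for wi, xi in zip(w, new_classroom)) + b
print(round(predicted, 1))
```

The point of the sketch is the training target: the label is a human expert’s judgment of the teaching, not a test-score residual, so the model’s errors are errors against something teachers might actually recognize as “good teaching.”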
Of course there are lots of really complicated issues to sort out, and they are *totally unavoidable*. This is why I’m so skeptical of models, by the way: people think you can simplify stuff when you actually can’t. There’s nothing simple about teaching and whether someone’s a good teacher. It’s just plain complex. A simple model loses too much information.
Here’s one. Different people have different ideas of what good teaching is. A possible solution: maybe we could have 5 different “expert models,” each based on a different person’s definition of good teaching, and every teacher could be evaluated under every model. We’d still need to find 5 experts that teachers trust.
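A sketch of what that might look like, with three invented experts standing in for the five (every expert name, feature, and weight here is made up): each expert is a different weighting of the same classroom observations, and we report every expert’s score separately rather than collapsing them into one number.

```python
# Hypothetical sketch: score one teacher's classroom under several
# "expert models", each encoding a different definition of good teaching.
# The experts, features, and weights below are all invented.

# Features observed in one classroom (fractions in [0, 1]).
observation = {
    "student_talk_share": 0.55,   # fraction of discussion time held by students
    "kids_participating": 0.70,   # fraction of kids who spoke at least once
    "busywork_share": 0.20,       # fraction of class time spent on busywork
}

# Each expert model: weights over the same features (positive = good).
expert_models = {
    "discussion-focused": {"student_talk_share": 4.0, "kids_participating": 1.0, "busywork_share": -1.0},
    "engagement-focused": {"student_talk_share": 1.0, "kids_participating": 4.0, "busywork_share": -1.0},
    "anti-busywork":      {"student_talk_share": 1.0, "kids_participating": 1.0, "busywork_share": -4.0},
}

def score(model, obs):
    """Linear score under one expert's definition of good teaching."""
    return sum(weight * obs[feat] for feat, weight in model.items())

# Report every expert's score separately; no single number hides the disagreement.
scores = {name: round(score(m, observation), 2) for name, m in expert_models.items()}
for name, s in scores.items():
    print(f"{name}: {s}")
```

The same classroom scores very differently under different definitions, which is exactly the disagreement a single-number rating would hide.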
Here’s another. The kind of teacher-specific attributes collected for this would be different from the VAM’s: things that happen inside a classroom (like the percentage of time the teacher talks versus the students, the tone of the discussion, the number and percentage of kids involved in the discussion, etc.), which are harder to capture accurately. These are hard technological hurdles.
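Supposing you could record who is speaking and for how long (a big supposition), computing the features themselves is the easy part. A toy sketch, with an invented log of (speaker, seconds) speaking turns:

```python
# Hypothetical sketch of in-classroom features computed from an invented
# log of (speaker, seconds) speaking turns. Capturing such a log reliably
# is the hard technological part; this just shows the easy arithmetic after.

turns = [
    ("teacher", 120), ("kid_3", 15), ("teacher", 40), ("kid_7", 20),
    ("kid_3", 10), ("teacher", 60), ("kid_1", 25),
]
class_size = 25  # invented

teacher_time = sum(sec for who, sec in turns if who == "teacher")
student_time = sum(sec for who, sec in turns if who != "teacher")
teacher_talk_share = teacher_time / (teacher_time + student_time)

distinct_speakers = {who for who, _ in turns if who != "teacher"}
participation = len(distinct_speakers) / class_size

print(round(teacher_talk_share, 2), round(participation, 2))
```

The arithmetic is trivial; the hurdle is getting an accurate `turns` log out of a noisy real classroom in the first place.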
I think one of the most important questions is whether we can come up with an evaluation system that would be sufficiently reasonable and transparent that the teachers themselves would get on board.
I’d love to hear more ideas.