VAM versus what?
A few astute readers pointed out to me that in the past few days I both slammed the value-added model (VAM) for teachers and complained about people who reject something without offering an alternative. Good point, and today I’d like to start that discussion.
What should we be doing instead of VAM?
First of all, I do think that not rating teachers at all is better than the current system. So my “compare to the status quo” argument goes through in this instance: VAM is actively discouraging teachers, whereas leaving them alone entirely would neither discourage nor encourage anyone. So doing nothing beats the current system.
At the same time, I am a realist, and I think there should be, ultimately, a system of evaluating teachers, just as there is a system for evaluating me at work. The difference between my workplace, of 45 people, and the NYC public schools is scale. It makes sense to have a very large and consistent evaluation system in the NYC public schools, whereas my job can have an ad hoc inconsistent system without it being a problem.
There’s another problem which is nearly impossible to tease apart from this discussion. Namely, the fact that what’s going on in NYC is a disingenuous political game between Bloomberg and the teachers’ union. Just to emphasize how important that fight is, let’s keep in mind that as of now, although the union is much weaker than it historically has been, it still has the tenure system. So any evaluation model, VAM or not, is somewhat irrelevant for “removing bad teachers” given that they have tenure and tenure still means something.
Probably the best way to decouple the “Bloomberg vs. union/tenure” issue (a massive one here in NYC) from the “VAM versus other” question is to think nationally rather than citywide.
The truth is, the VAM is being tried out all over the country (although I don’t have hard numbers on this) and the momentum is for it to be used more and more. I predict within 10 years it will be done systematically everywhere in the country.
And, sadly, that’s my prediction whether or not the underlying model is any good! The truth is, there is a large contingent of technocrats who want control over the evaluation system and believe in the models, whether or not those models are producing pure noise. In other words, they believe in “data-driven decisioning” as a holy grail even though there’s scant evidence that it will work in schools. And they don’t want to back down now, even though the model sucks, because they feel they’d be losing momentum on the overall data-driven approach.
One thing I know for sure is that we should continue to be aware of how bad the current models are, and I want to set up an open source version of the models (see this post to get an idea of how it could work) to exhibit that. In other words, even if we don’t turn off the models altogether, can’t we at least minimize their importance while their quality is this bad? The first step is to plainly exhibit how bad they are.
It’s hard for me to decide what to do next, though. I’m essentially a modeler who is hugely skeptical of models. In fact, I don’t think using purely quantitative models to evaluate teachers is the right thing to do, period. Yet I feel like if it’s definitely going to happen, it’s better for people like me to be in the middle of it, pointing out how badly the proposed (or in-use) models are actually performing, and improving them.
One thing I know I’d do if I were to be put in charge of creating a better model: I’d train on data where the teacher is actually rated as a good teacher or not. In other words, I wouldn’t proxy “good teacher” by “if your students scored better than expected on tests”. A good model would be trained on data where there would be an expert teacher scorer, who would go into 500 classrooms and carefully evaluate the actual teachers, based on things like whether the teacher asked questions, or got the kids engaged, or talked too much or too little, or imposed too much busy work, etc. Then the model would be trying to mimic this expert.
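To make that concrete, here’s a minimal sketch of what “train on expert ratings” could look like, with entirely invented data: each row is one observed classroom, the features are things an observer can actually see, and the labels are an expert’s 1–5 ratings. The model then tries to mimic the expert on new classrooms. The features, numbers, and the simple linear fit are all my assumptions for illustration, not anyone’s real evaluation system.

```python
# Hypothetical sketch: train a model to mimic an expert observer's ratings,
# instead of proxying "good teacher" with test-score residuals.
# All features, weights, and scores below are invented for illustration.

def fit_linear(features, scores, lr=0.5, epochs=5000):
    """Fit y ~ w.x + b by plain gradient descent (pure-Python sketch)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, scores):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w, b

# Each row: [fraction of discussion time held by students (not the teacher),
#            fraction of kids who spoke at least once]
observations = [
    [0.6, 0.8],
    [0.5, 0.7],
    [0.2, 0.3],
    [0.1, 0.2],
]
expert_scores = [4.5, 4.0, 2.0, 1.5]  # the expert's 1-5 rating of each classroom

w, b = fit_linear(observations, expert_scores)

# Score a new classroom the expert never visited.
new_classroom = [0.55, 0.75]
predicted = sum(wi * xi for wi, xi in zip(w, new_classroom)) + b
print(round(predicted, 1))
```

The point of the sketch is the training target: the label is a human expert’s judgment of the teaching, not a test-score residual, so the model’s errors are errors against something teachers might actually recognize as “good teaching.”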
Of course there are lots of really complicated issues to sort out, and they are *totally unavoidable*. This is why I’m so skeptical of models, by the way: people think you can simplify stuff when you actually can’t. There’s nothing simple about teaching and whether someone’s a good teacher. It’s just plain complex. A simple model loses too much information.
Here’s one. Different people have different ideas of what good teaching is. A possible solution: maybe we could have 5 different “expert models,” each based on a different person’s definition of good teaching, and every teacher could be evaluated under every model. We’d still need to find 5 experts that teachers trust.
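A sketch of what that might look like, with three invented experts standing in for the five (every expert name, feature, and weight here is made up): each expert is a different weighting of the same classroom observations, and we report every expert’s score separately rather than collapsing them into one number.

```python
# Hypothetical sketch: score one teacher's classroom under several
# "expert models", each encoding a different definition of good teaching.
# The experts, features, and weights below are all invented.

# Features observed in one classroom (fractions in [0, 1]).
observation = {
    "student_talk_share": 0.55,   # fraction of discussion time held by students
    "kids_participating": 0.70,   # fraction of kids who spoke at least once
    "busywork_share": 0.20,       # fraction of class time spent on busywork
}

# Each expert model: weights over the same features (positive = good).
expert_models = {
    "discussion-focused": {"student_talk_share": 4.0, "kids_participating": 1.0, "busywork_share": -1.0},
    "engagement-focused": {"student_talk_share": 1.0, "kids_participating": 4.0, "busywork_share": -1.0},
    "anti-busywork":      {"student_talk_share": 1.0, "kids_participating": 1.0, "busywork_share": -4.0},
}

def score(model, obs):
    """Linear score under one expert's definition of good teaching."""
    return sum(weight * obs[feat] for feat, weight in model.items())

# Report every expert's score separately; no single number hides the disagreement.
scores = {name: round(score(m, observation), 2) for name, m in expert_models.items()}
for name, s in scores.items():
    print(f"{name}: {s}")
```

The same classroom scores very differently under different definitions, which is exactly the disagreement a single-number rating would hide.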
Here’s another. The kind of teacher-specific attributes collected for this would be different from the VAM’s: things that happen inside a classroom (like the percentage of time the teacher talks versus the students, the tone of the discussion, the number and percentage of kids involved in the discussion, etc.), which are harder to capture accurately. These are hard technological hurdles.
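Supposing you could record who is speaking and for how long (a big supposition), computing the features themselves is the easy part. A toy sketch, with an invented log of (speaker, seconds) speaking turns:

```python
# Hypothetical sketch of in-classroom features computed from an invented
# log of (speaker, seconds) speaking turns. Capturing such a log reliably
# is the hard technological part; this just shows the easy arithmetic after.

turns = [
    ("teacher", 120), ("kid_3", 15), ("teacher", 40), ("kid_7", 20),
    ("kid_3", 10), ("teacher", 60), ("kid_1", 25),
]
class_size = 25  # invented

teacher_time = sum(sec for who, sec in turns if who == "teacher")
student_time = sum(sec for who, sec in turns if who != "teacher")
teacher_talk_share = teacher_time / (teacher_time + student_time)

distinct_speakers = {who for who, _ in turns if who != "teacher"}
participation = len(distinct_speakers) / class_size

print(round(teacher_talk_share, 2), round(participation, 2))
```

The arithmetic is trivial; the hurdle is getting an accurate `turns` log out of a noisy real classroom in the first place.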
I think one of the most important questions is whether we can come up with an evaluation system that would be sufficiently reasonable and transparent that the teachers themselves would get on board.
I’d love to hear more ideas.