Value-added model doesn’t find bad teachers, causes administrators to cheat

March 31, 2013 Cathy O'Neil, mathbabe

There’ve been a couple of articles in the past few days about teacher Value-Added Testing that have enraged me.

If you haven’t been paying attention, the Value-Added Model (VAM) is now being used in a majority of the states (source: the Economist):

But it gives out nearly random numbers, as gleaned from looking at the same teachers with two scores (see this previous post). There’s a 24% correlation between the two numbers. Note that some people are awesome with respect to one score and complete shit on the other score:
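To get a feel for what a 24% correlation means in practice, here is a minimal simulation sketch. It is not the VAM and uses no real teacher data: it assumes both scores are standard normal and linked only by a correlation of 0.24, and the function names are made up for illustration. It reports how much variance the two scores share and how often a teacher in the top fifth on one score lands in the bottom fifth on the other.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_scores(n, r, seed=0):
    """Draw n pairs of standard-normal scores with population correlation r.

    Assumption: scores are bivariate normal; real VAM scores need not be.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        first.append(x)
        second.append(r * x + math.sqrt(1 - r * r) * z)
    return first, second


def share_top_then_bottom(first, second, frac=0.2):
    """Share of the top `frac` on score 1 who fall in the bottom `frac` on score 2."""
    n = len(first)
    k = int(n * frac)
    order1 = sorted(range(n), key=lambda i: first[i])
    order2 = sorted(range(n), key=lambda i: second[i])
    top1 = set(order1[-k:])
    bottom2 = set(order2[:k])
    return len(top1 & bottom2) / k


first, second = simulate_scores(20000, 0.24)
sample_r = pearson(first, second)
flip_share = share_top_then_bottom(first, second)

print(f"sample correlation: {sample_r:.3f}")
print(f"shared variance (r^2): {sample_r ** 2:.3f}")
print(f"top fifth on score 1 landing in bottom fifth on score 2: {flip_share:.1%}")
```

With a correlation of 0.24 the shared variance is about 0.24² ≈ 0.06, so knowing one score tells you very little about the other. If the two scores were completely unrelated, the flip share would be 20%.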

Final thing you need to know about the model: nobody really understands how it works. It relies on error terms of an error-riddled model. It’s opaque, and no teacher can have their score explained to them in Plain English.

Now, with that background, let’s look into these articles.

First, there’s this New York Times article from yesterday, entitled “Curious Grade for Teachers: Nearly All Pass”. The article describes how teachers are nowadays being judged using a (usually) 50/50 combination of classroom observations and VAM scores. This is different from the past, when evaluations were based only on classroom observations.

What they’ve found is that the percentage of teachers found “effective or better” has stayed high in spite of the new system – the numbers are all over the place but typically between 90 and 99 percent of teachers. In other words, the number of teachers that are fingered as truly terrible hasn’t gone up too much. What a fucking disaster, at least according to the NYTimes, which seems to go out of its way to make its readers understand how very much high school teachers suck.

A few things to say about this.

Given that the VAM is nearly a random number generator, this is good news – it means they are not trusting the VAM scores blindly. Of course, it still doesn’t mean that the right teachers are getting fired, since half of the score is random.
Another point the article mentions is that failing teachers are leaving before the reports come out. We don’t actually know how many teachers are affected by these scores.
Anyway, what is the right number of teachers to fire each year, New York Times? And how did you choose that number? Oh wait, you quoted someone from the Brookings Institution: “It would be an unusual profession that at least 5 percent are not deemed ineffective.” Way to explain things so scientifically! It’s refreshing to know exactly how the army of McKinsey alums approach education reform.
The overall article gives us the impression that if we were really going to do our job and “be tough on bad teachers,” then we’d weight the Value-Added Model way more. But instead we’re being pussies. Wonder what would happen if we weren’t pussies?
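The point about half the score being random can be sketched with a toy simulation. Everything here is an assumption for illustration: each teacher has a hypothetical "true quality", the classroom observation is that quality plus noise, and the VAM score is modeled as pure noise (a more extreme version of "nearly random"). The sketch compares how well an observation-only score and a 50/50 composite pick out the bottom 5% by true quality.

```python
import random


def simulate_composite(n=20000, obs_noise=1.0, seed=1):
    """Return (quality, observation, composite) tuples for n made-up teachers."""
    rng = random.Random(seed)
    teachers = []
    for _ in range(n):
        quality = rng.gauss(0, 1)                        # hypothetical true quality
        observation = quality + rng.gauss(0, obs_noise)  # noisy classroom observation
        vam = rng.gauss(0, 1)                            # assumption: VAM is pure noise
        composite = 0.5 * observation + 0.5 * vam        # the 50/50 weighting
        teachers.append((quality, observation, composite))
    return teachers


def bottom_overlap(teachers, score_index, frac=0.05):
    """Share of the bottom `frac` by a given score who are also bottom `frac` by quality."""
    n = len(teachers)
    k = int(n * frac)
    by_quality = sorted(range(n), key=lambda i: teachers[i][0])[:k]
    by_score = sorted(range(n), key=lambda i: teachers[i][score_index])[:k]
    return len(set(by_quality) & set(by_score)) / k


teachers = simulate_composite()
obs_only = bottom_overlap(teachers, 1)
half_vam = bottom_overlap(teachers, 2)

print(f"bottom 5% correctly flagged, observation only: {obs_only:.1%}")
print(f"bottom 5% correctly flagged, 50/50 with noise:  {half_vam:.1%}")
```

Under these assumptions, mixing in a noise term can only dilute whatever signal the observation carries, so the composite flags fewer of the truly weakest teachers. Weighting the noisy half more heavily would make that worse.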

The second article explained just that. It also came from the New York Times (h/t Suresh Naidu), and it was the story of a School Chief in Atlanta who took the VAM scores very very seriously.

What happened next? The teachers cheated wildly, changing the answers on their students’ tests. There was a big cover-up, lots of nasty political pressure, and a lot of good people feeling really bad, blah blah blah. But maybe we can take a step back and think about why this might have happened. Can we do that, New York Times? Maybe it had to do with the $500,000 in “performance bonuses” that the School Chief got for such awesome scores?

Let’s face it, this cheating scandal, and others like it (which may never come to light), was not hard to predict (as I explain in this post). In fact, as a predictive modeler, I’d argue that this cheating problem is the easiest thing to predict about the VAM, considering how it’s being used as an opaque mathematical weapon.

Categories: data science, math education, modeling, rant, statistics

Comments (22)

RZ0

March 31, 2013 at 9:03 am

One thing to always remember is that it costs money to perform these evaluations, particularly the classroom observation. The Times story is not clear, but it sounds like Florida at one time based the observation grade on 20 minutes of observation a year. Given the small sample (20 minutes out of, say, 1,000 hours of classroom instruction per year), it would be difficult to grant any other grade but pass.
I wonder what a person would have to do to fail. Shoot a kid?
Meanwhile, if you base an evaluation on, say, 10 hours of work a year, a high school would need another FTE just to perform evaluations – a cost of $100,000 a year minimum (counting benefits) – pure overhead.

- Cathy O'Neil, mathbabe
  
  March 31, 2013 at 9:47 am
  
  I’m ready for sensors to get cheap so we can use them in classrooms.
  
  - UW
    
    April 1, 2013 at 10:34 am
    
    What kind of sensors? Can you elaborate?
    
jim cooper

March 31, 2013 at 12:54 pm

Now that it has passed into law, WI Act 166 requires Wisconsin teacher evaluations to be tied to test scores. This was part of Gov. Walker’s first round of reforms, known as Read to Lead, which was the committee he created and appointed himself chairman of. Also on this committee was VARC, the Value-Added Research Center at UW-Madison.

- Cathy O'Neil, mathbabe
  
  March 31, 2013 at 1:04 pm
  
  Thanks. Do you know a way to get your hands on the underlying models?
  
  - UW
    
    April 1, 2013 at 10:17 am
    
    This is their web site: http://varc.wceruw.org/
    
Matt DeLand (@deland)

March 31, 2013 at 1:19 pm

If we care about the output of a human driven process, we should focus on the inputs. Judging based on outputs will shift attention and energy away from the real problems, and is not helpful in the long run. Focusing on outputs is easy – but results in just that, a focus on the outputs.

MiaH

March 31, 2013 at 5:14 pm

I’m an elementary school teacher and I’ve followed your posts about VAM with a lot of interest–thank you for your work and advocacy! In your last post (March 6), you said there was a 24% correlation between the two scores. Is this looking at a different data set, or is it just a typo in today’s post?

- Cathy O'Neil, mathbabe
  
  April 11, 2013 at 2:49 pm
  
  Thanks, I fixed it. 24% is correct.
  
beewhy2012

March 31, 2013 at 9:22 pm

The Brookings boffins say, “It would be an unusual profession that at least 5 percent are not deemed ineffective.” I’d love to know where that stat comes from.
One informative article on this subject can be found here (http://www.huffingtonpost.com/leonie-haimson/factchecking-waiting-for-_b_802900.html) where it is reported that in the case of Illinois the estimated annual rate of disbarment for lawyers is something like 0.05% and loss of license for doctors is 0.3%.
The number of teachers denied tenure, counseled out or fired on an annual basis is harder to come by for Illinois, but ranges from 0.2% in Houston to a relatively steep 10% – reported as 40% over 4 years (have I got that right?) – in NYC.
A system that sets a predetermined number of apprehended incompetents – how many “should” be fired/denied a license in any given year – is bound to be prey to cheating, fudging, prevarication on both sides, especially when that number is unrealistically high. Properly evaluating teachers is a difficult game and, as one commenter above has noted, would cost far more than most systems are willing to pay.
That begs the question of the purpose of an evaluation system itself. If the ed biz practised what it preaches and truly strove for some sort of universal or wide-ranging success rate among students, then evaluation of teachers should more properly be aimed at improving the performance of as many as possible instead of simply dumping those whose effectiveness as assessed by a VAM is evidently contingent on random conditions that can vary so widely from one year to the next.

Doug K

April 1, 2013 at 1:38 pm

following the slow-moving crisis of Education in America leaves me feeling like a nutcase conspiracy theorist, bring me my tin-foil helmet, stat!
..
but it’s hard to see imbecilities like VAM foisted upon an unsuspecting public, and reconcile this with any kind of good faith explanation for why it is happening.
It seems to be a billionaire-driven scheme to turn education into a profit center, like US healthcare (since that is working so well).

Overview at:

A Confederacy of Reformers

Teacher perspective:
http://edgator.com/?p=283

As beewhy2 notes, Leonie Haimson is doing a good job of covering the wreckage.

dotkaye

April 1, 2013 at 1:43 pm

following the slow-moving crisis of Education in America leaves me feeling like a nutcase conspiracy theorist, bring me my tin-foil helmet, stat!
..
but it’s hard to see imbecilities like VAM foisted upon an unsuspecting public, and reconcile this with any kind of good faith explanation for why it is happening.
It seems to be a billionaire-driven scheme to turn education into a profit center, like US healthcare (since that is working so well).

Overview at:

A Confederacy of Reformers

Teacher perspective:
http://edgator.com/?p=283

As beewhy2 notes, Leonie Haimson is doing a good job of covering the wreckage as it piles up..

dotkaye

April 1, 2013 at 2:15 pm

two more comments on VAM as currently implemented in education:

Is this what the future holds? Or are some of you living it now? http://t.co/OS8ckxilz5—
Arthur Goldstein (@TeacherArthurG) April 01, 2013

Tennessee is planning to use VAM scores to deny welfare to families with underperforming children. Dickens would not have thought to make a villain so cruel..
http://www.knoxnews.com/news/2013/mar/31/bill-tying-student-performance-to-welfare-in/

- Cathy O'Neil, mathbabe
  
  April 1, 2013 at 3:23 pm
  
  Holy shit.
  
  - beewhy2012
    
    April 1, 2013 at 10:14 pm
    
    Exactly! One step back to the Poor House and Debtor’s Prisons.
    
Matt Erickson

April 2, 2013 at 12:28 am

Since we are now tweaking our mathematical model to fit our preconception about the number of bad teachers – rather than tweaking our preconceptions to fit the data – let’s extend this analogy to the student testing as well. Since any test will be discarded if it has near a 100% pass rate, we must admit that we will always insist some children be ‘left behind.’ If we can see how silly this tail-chasing exercise is in terms of evaluating teachers, perhaps those teachers might better understand how their students feel.

lovemyjob

April 6, 2013 at 1:11 pm

I teach in Florida and we get 2 peer evaluations (someone from the district) and 2 principal evaluations. Two are full class time evals and 2 are 20 min “pop in” evals. Idk where the NY Times got their sources but it is not correct. If you are a new teacher you get even more evaluations. The evaluations do cost a lot of money, and it depends on the peer and principal whether the evaluations help or hurt a good teacher.

Allen Knutson

April 12, 2013 at 9:39 pm

Off-topic, but that scatter plot (!) reminds me of nothing so much as the classic Electron Band Structure In Germanium, My Ass.

- Cathy O'Neil, mathbabe
  
  April 12, 2013 at 9:42 pm
  
  nice!
  
Al

April 28, 2013 at 4:12 am

Relying on VAM is unhelpful and cruel to teachers, but if I were in the position of running a school system I might see it as my best option. To me the root cause of this nonsense is the intransigence of teachers unions. The current system encourages bad teachers, who don’t even want to be teachers anymore, to keep their jobs because their pay and benefits are tied to seniority. If, after 20 years as a teacher, you want to change to a different school system, or change professions, you probably won’t because no one else can offer you the same salary you’re currently getting, and you accepted low pay when you were younger to get it. I don’t think this system is good for anyone, not least the teachers who hate their jobs. Yet unions have been completely unwilling to back off of seniority-based pay, even when pay is increased overall but separated from seniority, and even when the changes are only applied to new teachers.

So, if I were in the position of running a school system, and someone offered me a system which produced a random number between 0 and 1 for each teacher, and if that number was below 0.1 I had the option of firing the teacher, I would take it. Not because I believed VAM was effective, but because it would give me a tiny modicum of control over a system which has been hijacked by a special-interest group.

- Cathy O'Neil, mathbabe
  
  April 28, 2013 at 6:53 am
  
  Keep in mind that the 10% of teachers being randomly chosen for getting fired in this system are not tenured. So it’s exactly the teachers you don’t want to get rid of that are leaving.
  
eatingon1

July 8, 2013 at 12:20 pm

Thanks Mathbabe. You are now posted in the badass teachers association on facebook. Just keep driving home the math. That is what I do. I told them that it is the same as drawing straws.
