The “One of Many” Fallacy

September 30, 2016 Cathy O'Neil, mathbabe

I’ve been on book tour for nearly a month now, and I’ve come across a bunch of arguments pushing against my book’s theses. I welcome them, because I want to be informed. So far, though, I haven’t been convinced I made any egregious errors.

Here’s an example of an argument I’ve seen consistently when it comes to the defense of the teacher value-added model (VAM) scores, and sometimes the recidivism risk scores as well: namely, that the teacher’s VAM scores were “one of many considerations” used to establish a teacher’s overall score. The use of something that is unfair is less unfair, in other words, if you also use other things which are fair and balance it out.

If you don’t know what a VAM is, or what my critique about it is, take a look at this post, or read my book. The very short version is that it’s little better than a random number generator.

The obvious irony of the “one of many” argument, besides the mathematical one I will make below, is that the VAM was supposed to actually have a real effect on teachers’ assessments, and that effect was meant to be valuable and objective. So any argument about it which basically implies that it’s okay to use it because it has very little power seems odd and self-defeating.

Sometimes it’s true that a single inconsistent or badly conceived ingredient in an overall score is diluted by the other stronger and fairer assessment constituents. But I’d argue that this is not the case for how teachers’ VAM scores work in their overall teacher evaluations.

Here’s what I learned by researching and talking to people who build teacher scores: most of the other things they use – primarily scores derived from categorical evaluations by principals, teachers, and outside observers – have very little variance. Almost all teachers are considered “acceptable” or “excellent” by those measurements, so they all turn into the same number or numbers when scored. That’s not a lot to work with, if the bottom 60% of teachers have essentially the same score, and you’re trying to locate the worst 2% of teachers.

The VAM was brought in precisely to introduce variance to the overall mix. You introduce numeric VAM scores so that there’s more “spread” between teachers, so you can rank them and you’ll be sure to get teachers at the bottom.

But if those VAM scores are actually meaningless, or at least extremely noisy, then what you have is “spread” without accuracy. And it doesn’t help to mix in the other scores.

In a statistical sense, even if you allow 50% or more of a given teacher’s score to consist of non-VAM information, the VAM score will still dominate the variance of a teacher’s score. Which is to say, the VAM score will account for much more than 50% of the information that distinguishes one teacher’s score from another’s.
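To make the variance claim concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not real teacher data: the VAM scores are drawn uniformly from 0 to 100, the non-VAM scores take only two values (70 for “acceptable”, 80 for “excellent”), the two components are weighted 50/50, and they are treated as uncorrelated. The function name `variance_share` is hypothetical.

```python
import random
import statistics

def variance_share(vam_scores, other_scores, vam_weight=0.5):
    """Fraction of composite-score variance contributed by the VAM term.

    Assumes the two components are uncorrelated, so the composite variance
    is the sum of the two weighted variances (the covariance term is ignored).
    """
    w = vam_weight
    var_vam = (w ** 2) * statistics.pvariance(vam_scores)
    var_other = ((1 - w) ** 2) * statistics.pvariance(other_scores)
    return var_vam / (var_vam + var_other)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 1000

# Hypothetical VAM scores, spread across the whole 0..100 range.
vam = [rng.uniform(0, 100) for _ in range(n)]

# Hypothetical observation-based scores: nearly everyone lands on one of
# two adjacent values, so there is very little spread.
other = [rng.choice([70, 80]) for _ in range(n)]

share = variance_share(vam, other)
print(f"VAM share of composite variance: {share:.0%}")
```

Under these assumptions the VAM term carries nearly all of the variance even though it is nominally only half of the score, because the other half barely varies from teacher to teacher.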

An extreme version of this is to think about making the non-VAM 50% of a teacher’s score always exactly the same. Denote it by 50. When we take the population of teacher VAM scores and average each one with 50, the resulting scores lie between 25 and 75 instead of 0 and 100, but besides being squished into a smaller range, they haven’t changed with respect to each other. Their relative rankings, in particular, do not change. So whoever was unlucky enough to get a bad VAM score will still be on the bottom.

[Figure: plot of y = (x + 50)/2, which maps a VAM score x between 0 and 100 to a combined score y between 25 and 75]

This holds true for other choices of “50” as well.
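The rank-preservation point can be checked with a few lines of Python. This is only a sketch: the five VAM scores are made up, and the helper names `blend` and `ranking` are hypothetical.

```python
def blend(vam_score, fixed_other=50.0):
    """Average a VAM score with a constant non-VAM component: y = (x + c) / 2."""
    return (vam_score + fixed_other) / 2

def ranking(scores):
    """Indices of the teachers ordered from lowest score to highest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Made-up VAM scores for five teachers.
vam = [3, 97, 41, 60, 12]

blended = [blend(x) for x in vam]

# The blended scores are squeezed into a narrower range,
# but the ordering of the teachers is untouched.
assert ranking(vam) == ranking(blended)

# The same holds for any other constant in place of 50.
for constant in (0, 20, 80, 100):
    assert ranking(vam) == ranking([blend(x, constant) for x in vam])
```

Averaging with a constant is an increasing function of the VAM score, which is why the ordering cannot change regardless of the constant chosen.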

A word about recidivism risk scores. It’s true that judges use all sorts of information in determining a defendant’s sentencing, or bail, or parole. But if one of the most trusted and most statistically variable inputs is flawed – and in this case racist – then a similar argument to the above could be made, and the conclusion would be as follows: the overall effect of using flawed recidivism risk scores is stronger, rather than weaker, than one might expect given their weighting. We have to be more worried about it, not less.


Comments (17)

howardat58

September 30, 2016 at 7:02 am

Reblogged this on Saving school math and commented:
Nice one!

medicalquackblog

September 30, 2016 at 9:28 am

Just saw a write up on the book at the Boston Globe yesterday. When I read this and saw the word “value” again, it’s like what you might call the “Value Cult”. It’s everywhere with using that same terminology, in other words “value” seems to be the key word being used to get folks to buy in to the models. It is all over healthcare and sure there are models that help but it seems some of this is of course going over the top with selling value perception as a goal, but are the goals actually attainable and what do people physically have to do to attain them? I see that as being a big problem as people are getting frustrated and mad with a lot of models, especially now that artificial intelligence is being sprinkled on top.

Medical billing models for one are driving everyone nuts as there’s now a new patent pending model that takes the codes from a medical visit from the doctor, simulates the data entered and determines if the reimbursement for the visit matches a “group” in enough areas to include MD notes and more to allow the insurer to down code a claim to a lower value and pay less. This hits both doctors and hospitals so it becomes a “gaming” mechanism for them to make sure they don’t fall into the pit when billing, and the models tend to change quite a bit based on using machine learning to create new groups, etc. to compare against…yeah keep pushing this awareness to look for bad math!

James Gray

September 30, 2016 at 11:08 am

Reblogged this on jamesgray2.

mike_bader

September 30, 2016 at 12:24 pm

I find it ironic that the industry so enamored with psychometric tests of “achievement” can’t figure out a better psychometric assessment to use when evaluating personnel.

davidwlocke

September 30, 2016 at 8:44 pm

The VAM scores exist so teachers can lose their pensions. Ed reform is really Ed deform. There never was any intention of improving the education of our kids.

Ori

October 1, 2016 at 10:31 pm

Do serious, sensible people use the “one of many” argument to defend flawed scores like VAM? Perhaps I’m just not getting the argument, but aren’t they saying that if something is “just a little stupid” (rather than “truly awfully moronic”), then it’s smart to do it?

It’s like saying that adding a few anchovies to a chocolate cake is sensible because it won’t change its taste much. That’s a very weak argument for doing so.

This is even before taking into account the important argument you’re presenting here: we’re not just adding anchovies to the mix, we’re adding arsenic. If it’s the decisive ingredient, the other ingredients might help disguise the poison, but they won’t counter its effect.

- Aaron Lercher
  
  October 3, 2016 at 11:53 am
  
  The explanation is on page 10 of “Weapons of Math Destruction.” A low ranking by VAM is interpreted as a likelihood of being a bad teacher. The claim is made that it’s just a probability or, more vaguely, “one factor.”
  Yet when anyone fights back, the countervailing evidence must be ironclad, since anyone arguing against a bad rank is seen as engaging in special pleading. Teachers, for example, are said to be protecting themselves from accountability: “Unions bad bad bad!”
  The only effective arguments are systematic, such as in “Weapons.” But few people are capable of articulating such complex arguments. It’s not easy, but Cathy makes it as easy as possible in her book.
  It’s as if some people are bakers, while the rest are consumers. Then the bakers say that “just a little” arsenic in a cake is good for all of us, because it makes us stronger or weeds out the weak, and many of us believe it.
  
Patrick Honner

October 2, 2016 at 6:48 pm

This is as much about political re-branding as it is about fallacious arguments. Initially the major education reformers argued that VAM was the most important component of fixing education. As VAM was slowly and consistently shown to be invalid, the argument quietly shifted to “VAM is just one component”. Those who originally made the case for VAM as essential pivoted to “Well, no one ever said we should *only* use VAM”.

It’s this kind of political maneuvering–with VAM, as well as with metrics for student achievement and school improvement–that has prevented politicians and reformers from being held accountable for their own failed policies.

LArs

October 3, 2016 at 3:46 pm

“if those VAM scores are actually meaningless, or at least extremely noisy, then what you have is “spread” without accuracy.”

I think this confuses the issues of accuracy and precision.

The fact that a teacher’s VAM score can change wildly from one year to the next and even during a single year from one value added model (or student standardized test) to another is really a statement of lack of precision.

Accuracy has to do with how close a given score is to the “true” score, but it’s not even clear what “true” score is/means in the case of VAM.

The whole VAM construct seems extremely confused and ill-defined, if not downright circular.

“Good/effective” teachers are defined as those who get high VAM scores and “poor/ineffective” teachers as those who get low scores. So, of course, those who get high VAM scores are going to be rated “good” teachers, and those who get low scores are going to be rated “poor” teachers.

That a teacher can go from “good” one year to “poor” the next (or vice versa) is a sure sign that there is something seriously amiss. But for some very odd reason, such blatant contradiction does not even seem to faze the people who develop and push VAM.

Such inconsistency should actually be extremely disconcerting — to say nothing of embarrassing — for the people who develop the models. It means that their models are basically crap.

How people can spend their careers working on such crap is a mystery bigger than life itself.

Guest2

October 4, 2016 at 6:51 am

VAM is just the tip of the proverbial iceberg, unfortunately.

Take IQ tests — they were designed to differentiate among individuals in the same way as VAM; they were designed to produce the same kind of distribution as output.
They were specifically designed to, because it was a priori postulated that something called intelligence was spread out across the population in a measurable way, just like weight and height, and that this was important for administrators to know.

This was the key idea of Francis Galton, that this would mimic nature, and now the Galtonian legacy haunts our age as never before, from standardized testing of every kind, to every kind of statistics.

Columbia University played an important role in spreading these ideas, and there is an excellent essay on this, but the best source for how the ‘well of knowledge’ has been permanently poisoned (and why) is Donald A. MacKenzie’s Statistics in Britain, 1865-1930 (1981). There is always Stephen Jay Gould, and Diane Paul has a lovely little book on the subject as well.

Bryan

October 4, 2016 at 4:43 pm

I couldn’t agree with you more, but have one additional caveat, especially related to recidivism. People aren’t a coin toss. I think too many people (policy makers in particular) assume people’s behavior is like a random act. For example, years ago, a university professor came out with a study claiming football coaches should go for it way more than they do on 4th down. He looked at the success rates and determined that the chances of making a first down were too good to pass up. Well worth the risk. Of course going for it on 4th down isn’t a random, flip-of-the-coin chance. It’s the result of a tremendous amount of study, planning, and practice. Coaches go for it when they’re pretty sure they can make it. My point, I hope, is that people’s actions aren’t random, independent trials subject to laws of probability. The use of data and statistics in social studies needs to reflect that there is a human element (independent, free thought; emotions) that is very difficult to quantify.

Flanigan Sean

October 5, 2016 at 7:05 pm

I don’t think it is credible to attack a thesis per se, but its supportive elements. Just imagine the explosion of combinations of reading an article, then reading its citations, and then reading their citations, and on and on. That is a lot of work. It is easy to poke holes in anyone’s thesis based upon that alone, because they could cite one single research design in an attempt to construe it as a house of cards.

- Lloyd Lofthouse
  
  October 9, 2016 at 11:43 am
  
  VAM is a thesis – do you mean theory? For centuries there was that flat earth theory and anyone who dared to disagree was persecuted by the Church.
  
  Ask Copernicus and Galileo what that was like.
  
  The VAM “thesis” is just as valid as the Flat Earth “Thesis” and the autocratic corporate education industry and its supporters all the way to the White House that celebrates when teachers are fired and public schools closed is similar to the Church back when the world was still flat before scientists inflated it into a sphere.
  
Lloyd Lofthouse

October 8, 2016 at 11:34 am

Reblogged this on Crazy Normal – the Classroom Exposé and commented:
We must stop Bill Gates!

Lloyd Lofthouse

October 8, 2016 at 11:36 am

Trump isn’t the only (alleged) billionaire to stop. Bill Gates, the Koch brothers, the Walmart Walton family, Eli Broad, etc. Back in the 18th century, the French showed the world how to do it.

Madelyn Griffith-Haynie, MCC, SCAC

October 8, 2016 at 1:46 pm

“There are three kinds of lies: lies, damned lies, and statistics.” ~ attributed to the British Prime Minister Benjamin Disraeli.

xx,
mgh
(Madelyn Griffith-Haynie – ADDandSoMuchMore dot com)
– ADD Coach Training Field founder; ADD Coaching co-founder –
“It takes a village to educate a world!”

wgersen

October 9, 2016 at 6:11 am

I’m late to the game here, but I have to believe that some statistician in DC knew all of this and still signed off on the concept because VAM is cheap, easy, and intuitively appealing. They also made the accurate political calculation that anyone who pushed back would be seen as either a union apologist or a pointy-headed and fuzzy thinking Ivy League statistician.
