Eugene Stern: How Value Added Models are Like Turds

This is a guest post by Eugene Stern, originally posted on his blog.


“Why am I surrounded by statistical illiterates?” — Roger Mexico in Gravity’s Rainbow

Oops, they did it again. This weekend, the New York Times put out this profile of William Sanders, the originator of evaluating teachers using value-added models based on student standardized test results. It is statistically illiterate, uses math to mislead and intimidate, and is utterly infuriating.

Here’s the worst part:

When he began calculating value-added scores en masse, he immediately saw that the ratings fell into a “normal” distribution, or bell curve. A small number of teachers had unusually bad results, a small number had unusually good results, and most were somewhere in the middle.

And later:

Up until his death, Mr. Sanders never tired of pointing out that none of the critiques refuted the central insight of the value-added bell curve: Some teachers are much better than others, for reasons that conventional measures can’t explain.

The implication here is that value added models have scientific credibility because they look like math — they give you a bell curve, you know. That sounds sort of impressive until you remember that the bell curve is also the world’s most common model of random noise. Which is what value added models happen to be.

Just to replace the Times’s name dropping with some actual math, bell curves are ubiquitous because of the Central Limit Theorem, which says that any variable that adds up many similar-looking but independent factors looks like a bell curve, no matter what the individual factors are. Take, for example, the number of heads you get in 100 coin flips. Each single flip is binary, but when you flip a coin over and over, one flip doesn’t affect the next, and out comes a bell curve. Or how about height? It depends on lots of factors: heredity, diet, environment, and so on, and you get a bell curve again. The Central Limit Theorem is wonderful because it helps explain the world: it tells you why you see bell curves everywhere. It also tells you that random fluctuations that don’t mean anything tend to look like bell curves too.
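If you want to see the coin flip example for yourself, here is a minimal sketch in Python. The function names, the 10,000 repetitions, and the fixed seed are my own illustrative choices, not anything from the Times article. It counts heads in 100 fair flips, repeats that many times, and prints a crude sideways histogram.

```python
import random
import statistics
from collections import Counter


def simulate_head_counts(num_trials=10_000, flips_per_trial=100, seed=0):
    """Return the number of heads in each of num_trials runs of fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_trial))
        for _ in range(num_trials)
    ]


def text_histogram(counts, bin_width=2, scale=25):
    """Render a sideways histogram, one '#' per `scale` trials in a bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    lines = []
    for start in sorted(bins):
        bar = "#" * (bins[start] // scale)
        lines.append(f"{start:3d}-{start + bin_width - 1:3d} {bar}")
    return "\n".join(lines)


counts = simulate_head_counts()
print("mean:", statistics.fmean(counts))   # theory says 50
print("sd:  ", statistics.pstdev(counts))  # theory says sqrt(100 * 0.25) = 5
print(text_histogram(counts))
```

The bars should pile up around 50 and thin out symmetrically on both sides, even though every individual flip is just heads or tails.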

So, just to take another example, if I decided to rate teachers by the size of the turds that come out of their ass, I could wave around a lovely bell-shaped distribution of teacher ratings, sit back, and wait for the Times article about how statistically insightful this is. Because back in the bad old days, we didn’t know how to distinguish between good and bad teachers, but the Turd Size Model™ produces a shiny, mathy-looking distribution — so it must be correct! — and shows us that teacher quality varies for reasons that conventional measures can’t explain.
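To make the point concrete, here is a toy sketch in Python. Everything in it is an assumption of mine for illustration: the "rating" is just the sum of 50 small random factors that have nothing to do with teaching, and the cutoff of two standard deviations for "unusually" good or bad is arbitrary. It is not based on any real value-added data and says nothing about what real scores contain. It only shows that meaningless numbers produce the same picture.

```python
import random
import statistics


def noise_rating(rng, num_factors=50):
    """A made-up 'rating': the sum of many small, unrelated random factors."""
    return sum(rng.uniform(-1, 1) for _ in range(num_factors))


def summarize(ratings):
    """Share of ratings more than 2 sd below the mean, above it, or in between."""
    mean = statistics.fmean(ratings)
    sd = statistics.pstdev(ratings)
    low = sum(r < mean - 2 * sd for r in ratings) / len(ratings)
    high = sum(r > mean + 2 * sd for r in ratings) / len(ratings)
    return {
        "unusually bad": low,
        "unusually good": high,
        "in the middle": 1 - low - high,
    }


rng = random.Random(1)  # fixed seed so the run is repeatable
ratings = [noise_rating(rng) for _ in range(10_000)]
summary = summarize(ratings)
for label, share in summary.items():
    print(f"{label}: {share:.1%}")
```

A small number of "teachers" come out unusually bad, a small number unusually good, and most land in the middle, which is exactly the pattern the article treats as an insight.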

Or maybe we should just rate news articles based on turd size, so this one could get a Pulitzer.

Trump’s Path-Independent Theory of Mind

My newest Bloomberg View Column:

Donald Trump’s Path-Independent Theory of Mind: How the U.S. president is like a Google ad test

You can see all of my Bloomberg View columns here.

Unreliable Data Can Threaten Democracy

My newest Bloomberg Column about politically driven data finagling:

Unreliable Data Can Threaten Democracy

Also, you can see all my Bloomberg columns here.

100 Day Blanket

I’m a bit behind with posting my latest gargantuan knitting project. I call it the 100 Day Blanket because I bought the yarn on the day after the election in an effort to counterbalance my wildly unbalanced thoughts and emotions, and I finished it 100 days after the inauguration. It was a very successful coping mechanism for anxiety.

Given that it has 144 squares in it, and that there were about 10 weeks between the election and the inauguration plus another 100 days after that, I knitted nearly one square per day on average. Actually, it took me a couple of weeks to gather the courage to put it all together, so I’d say I really did just continuously knit for a while there.
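For anyone who wants to check the knitting math, here is a quick sketch in Python. It rounds "about 10 weeks" to 70 days and "a couple of weeks" to 14 days, both of which are approximations of mine.

```python
SQUARES = 144
DAYS_ELECTION_TO_INAUGURATION = 10 * 7  # "about 10 weeks", rounded
DAYS_AFTER_INAUGURATION = 100
ASSEMBLY_DAYS = 2 * 7                   # "a couple of weeks" of not knitting, rounded

total_days = DAYS_ELECTION_TO_INAUGURATION + DAYS_AFTER_INAUGURATION
knitting_days = total_days - ASSEMBLY_DAYS

print(f"{SQUARES / total_days:.2f} squares per day overall")
print(f"{SQUARES / knitting_days:.2f} squares per day while actually knitting")
```

Under those roundings it works out to roughly 0.85 squares per day overall and roughly 0.92 once the assembly weeks are taken out.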

Because, dude, that’s a lot of nervous energy. I should also mention that I knitted numerous pussy hats and other smaller projects during that same period. Serious question: what do non-knitters do to deal with their anxiety?

Without further ado, the 100 Day Blanket:


Please don’t look too carefully at our messy side tables.

Here’s a glamour shot:


And a couple of shots of putting it together:


This took place at our friends’ ‘Happy House’ upstate.


One quarter at a time!


The VAM Might Finally be Dead

My latest Bloomberg View column, probably my favorite so far:

Don’t Grade Teachers With a Bad Algorithm

I’d Rather Not Merge With Robots, Thank You

My newest column in Bloomberg View, in which I argue that Yuval Harari is putting us all on:

I’d Rather Not Merge With Robots, Thank You

Anonymous Guest Post: Mentorship Problems for Women in Tech

This is an anonymous guest post.

Mentorship is important in any field. In the tech industry, it is essential. In tech, one’s network is key for learning about the existence of smaller startups, where the financial upside is often higher than at big companies due to stock grants. For a culture that emphasizes meritocracy so heavily, tech is much more of a who-is-who than I ever realized before moving out to the Bay Area as an engineer last year. Not only that, but it is especially difficult to access this network as a woman. I believe that the informal culture of tech, in which professional and social mix to an extent that it is unclear whether an interaction is professional or romantic, harms women in finding mentorship. Ultimately, those with real power and influence in Silicon Valley are in a network of their own.

I learned this firsthand when I met with my first Very Important Person (VIP). This VIP invited me to meet at The Battery, described on its website as a “unique sort of social destination” featuring “an eager, inquisitive bunch, always curious, always on the hunt for new ideas and problems to solve…Here is where they came to refill their cups. To tell stories. To swap ideas. To eschew status but enjoy the company of those they respected. Here is where they came to feel at home on an evening out.” For an easy annual payment of $2400.

I was initially surprised when this VIP decided to meet with me, given how difficult I had found it to get face time with anyone. I was even more surprised when he talked at me for nearly an hour (ignoring my prepared questions), until his next meeting – a tall blonde girl – arrived. Being just out of college and naive, I thought nothing of it, even though he mentioned during our meeting how he “just wanted to get laid in college.” Then the emails and texts started coming. Over the course of the next month, I received email after email from this person, sent to all three of my email addresses, which he somehow got, and later texts to my cell phone, saying he “wanted to see me again” among other things. I will never be 100 percent sure about his intent. At the same time, why on earth would a VIP be so interested in seeing me again?

Whatever his intent, I am confident that it wasn’t mentorship. Despite my having prepared specific questions for our meeting that I wanted advice on, he instead talked at me for the full hour. I think that was the most upsetting piece of it for me. I wanted mentorship, and instead ended up getting weird emails and texts.

I am not the only one of my friends with a Battery story. I’ve been told that there is a secret bar behind the regular bar, which is where things get really weird.

This VIP is certainly an outlier. Only a small fraction of men have creepy intent. And yet, I am sure that plenty of white men aged 35 to 50 (the “older generation” by tech standards that I am trying to access) don’t want to talk to me precisely because they worry about how it would look. Getting coffee with a young woman can look like a date even if it is not, and men in positions of power are especially wary of sexual harassment allegations.

I believe that the informal culture of hoodies and happy hours makes it more difficult for women to access mentorship. A college classmate who works in politics remarked that senior people in politics are more willing to chat with her, sometimes for hours. The informal culture of tech, in which men frequently grab a drink with a male mentor but often do not feel comfortable doing the same with a woman, means that it is difficult for women to get access. At least in politics, it is clearer whether a mentor is inappropriately hitting on you in a professional setting, because that setting is clearly professional.

What about senior female mentors? I have pursued this strategy as well with some limited success, but feel that there are simply not enough senior women to go around for this to be a viable solution. Attrition rates, coupled with the fact that this industry was far more hostile to women 10+ years ago, mean that senior women are few and far between, as well as stretched thin. It is essential to connect with mentors of all genders.

I am enormously grateful to those who have provided me with mentorship, including peers just a year or two above me who have helped fill in some of the gaps. That being said, I feel that as long as succeeding in tech involves being well-connected in a way that women and minorities in tech are not, diversity in the industry will stall. The past year has felt more like The Social Network than I ever could have imagined – creepy but well-connected mentors, hiring decisions made over drinks, and all.

