Here’s an interesting paper I’m reading this morning (hat tip Suresh Naidu) entitled Chameleons: The Misuse of Theoretical Models in Finance and Economics written by Paul Pfleiderer. The paper introduces the useful concept of chameleon models, defined in the following diagram:
Pfleiderer provides some examples of chameleon models, and also takes on the Milton Friedman argument that we shouldn’t judge a model by its assumptions but rather by its predictions (personally I think this is largely dependent on the way a model is used; the larger the stakes, the more the assumptions matter).
I like the term, and I think I might use it. I also like the point he makes that it’s really about usage. Most models are harmless until they are used as political weapons. Even the value-added teacher model could be used to identify school systems that need support, although in the current climate of distorted data due to teaching to the test and cheating, I think the signal is probably very slight.
This recent NYTimes article entitled Health Researchers Will Get $10.1 Million to Counter Gender Bias in Studies spelled out a huge problem that kind of blows me away as a statistician (and as a woman!).
Namely, they have recently decided over at the NIH, which funds medical research in this country, that we should probably check to see how women’s health are affected by drugs, and not just men’s. They’ve decided to give “extra money” to study this special group, namely females.
Here’s the bizarre and telling explanation for why most studies have focused on men and excluded women:
Traditionally many investigators have worked only with male lab animals, concerned that the hormonal cycles of female animals would add variability and skew study results.
Let’s break down that explanation, which I’ve confirmed with a medical researcher is consistent with the culture.
If you are afraid that women’s data would “skew study results,” that means you think the “true result” is the result that works for men. Because adding women’s data would add noise to the true signal, that of the men’s data. What?! It’s an outrageous perspective. Let’s take another look at this reasoning, from the article:
Scientists often prefer single-sex studies because “it reduces variability, and makes it easier to detect the effect that you’re studying,” said Abraham A. Palmer, an associate professor of human genetics at the University of Chicago. “The downside is that if there is a difference between male and female, they’re not going to know about it.”
Ummm… yeah. So instead of testing the effect on women, we just go ahead and optimize stuff for men and let women just go ahead and suffer the side effects of the treatment we didn’t bother to study. After all, women only comprise 50.8% of the population, they won’t mind.
This is even true for migraines, where 2/3rds of migraine sufferers are women.
One reason they like to exclude women: they have periods, and they even sometimes get pregnant, which is confusing for people who like to have clean statistics (on men’s health). In fact my research contact says that traditionally, this bias towards men in clinical trials was said to protect women because they “could get pregnant” and then they’d be in a clinical trial while pregnant. OK.
I’d like to hear more about who is and who isn’t in clinical trials, and why.
I’ve been writing my book, and I’m on chapter 4 right now, which is tentatively entitled Feedback Loops In Education. I’m studying the enormous changes in primary and secondary education that have occurred since the “data-driven” educational reform movement started with No Child Left Behind in 2001.
Here’s the issue I’m having writing this chapter. Things have really changed in the last 13 years, it’s really incredible how much money and politics – and not education – are involved. In fact I’m finding it difficult to write the chapter without sounding like a wingnut conspiracy theorist. Because that’s how freaking nuts things are right now.
On the one hand you have the people who believe in the promise of educational data. They are often pro-charter schools, anti-tenure, anti-union, pro-testing, and are possibly personally benefitting from collecting data about children and then sold to commercial interests. Privacy laws are things to bypass for these people, and the way they think about it is that they are going to improve education with all this amazing data they’re collecting. Because, you know, it’s big data, so it has to be awesome. They see No Child Left Behind and Race To The Top as business opportunities.
On the other hand you have people who do not believe in the promise of educational data. They believe in public education, and are maybe even teachers themselves. They see no proven benefits of testing, or data collection and privacy issues for students, and they often worry about job security, and public shaming and finger-pointing, and the long term consequences on children and teachers of this circus of profit-seeking “educational” reformers. Not to mention that none of this recent stuff is addressing the very real problems we have.
As it currently stands, I’m pretty much part of the second group. There just aren’t enough data skeptics in the first group to warrant my respect, and there’s way too much money and secrecy around testing and “value-added models.” And the politics of the anti-tenure case are ugly and I say that even though I don’t think teacher union leaders are doing themselves many favors.
But here’s the thing, it’s not like there could never be well-considered educational experiments that use data and have strict privacy measures in place, the results of which are not saved to individual records but are lessons learned for educators, and, it goes without saying, are strictly non-commercial. There is a place for testing, but not as a punitive measure but rather as a way of finding where there are problems and devoting resources to it. The current landscape, however, is so split and so acrimonious, it’s kind of impossible to imagine something reasonable happening.
It’s too bad, this stuff is important.
My friend Jan Zilinsky recently sent me this blogpost from the NeuroCritic which investigates the repercussions of having biomarkers held against individuals.
In this case, the biomarker was in the brain and indicated a propensity for taking financial risks. Or maybe it didn’t really – the case wasn’t closed – but that was the idea, and the people behind the research mentioned three times in 8 pages that policy makers might want to use already available brain scans to figure out which populations or individuals would be at risk. Here’s an excerpt from their paper:
Our finding suggests the existence of a simple biomarker for risk attitude, at least in the midlife [sic] population we examined in the northeastern United States. … If generalized to other groups, this finding will also imply that individual risk attitudes could, at least to some extent, be measured in many existing medical brain scans, potentially offering a tool for policy makers seeking to characterize the risk attitudes of populations.
The way the researchers did their tests was, as usual, to have them play artificial games of chance and see how different people strategized, and how their brains were different.
Studies like this are common and I don’t see a reason they won’t become even more common. The question is how we’re going to use them. Here’s a nasty way I could imagine they get used: when you apply for a job, you fill in a questionnaire that puts you into a category, and then people can see what biomarkers are typical for that category, and what the related health risks look like, and then they can decide whether to hire you. Not getting hired doesn’t say anything about your behaviors, just what happens with “people like you”.
I’m largely sidestepping the issue of accuracy. It’s quite likely that, at an individual level, many such predictions will be inaccurate but could still be used by commercial interests – and even be profitable – even so.
In the best case scenario, we would use such knowledge strictly to help people stay healthy. In the worst case, we have a system whereby people are judged by their biomarkers and not their behavior. If there were ever a case for regulation, I think this is it.
I just finished reading a fascinating article from Bloomberg BusinessWeek about a man who claims to have reverse-engineered the admission processes at Ivy League colleges (hat tip Jan Zilinsky).
His name is Steven Ma, and as befits an ex-hedge funder, he has built an algorithm of sorts to work well with both the admission algorithms at the “top 50 colleges,” and the US News & World Report model which defines which colleges are in the “to 50.” It’s a huge modeling war that you can pay to engage in.
Ma is a salesman too: he guarantees that a given high-school kid will get into a top school, your money back. In other words he has no problem working with probabilities and taking risks that he think are likely to pay off and that make the parents willing to put down huge sums. Here’s an example of a complicated contract he developed with one family:
After signing an agreement in May 2012, the family wired Ma $700,000 over the next five months—before the boy had even applied to college. The contract set out incentives that would pay Ma as much as $1.1 million if the son got into the No. 1 school in U.S. News’ 2012 rankings. (Harvard and Princeton were tied at the time.) Ma would get nothing, however, if the boy achieved a 3.0 GPA and a 1600 SAT score and still wasn’t accepted at a top-100 college. For admission to a school ranked 81 to 100, Ma would get to keep $300,000; schools ranked 51 to 80 would let Ma hang on to $400,000; and for a top-50 admission, Ma’s payoff started at $600,000, climbing $10,000 for every rung up the ladder to No. 1.
He’s also interested in reverse-engineering the “winning essay” in conjunction with after-school activities:
With more capital—ThinkTank’s current valuation to potential investors is $60 million—Ma hopes to buy hundreds of completed college applications from the students who submitted them, along with the schools’ responses, and beef up his algorithm for the top 50 U.S. colleges. With enough data, Ma plans to build an “optimizer” that will help students, perhaps via an online subscription, choose which classes and activities they should take. It might tell an aspiring Stanford applicant with several AP classes in his junior year that it’s time to focus on becoming president of the chess or technology club, for example.
This whole college coaching industry reminds me a lot of financial regulation. We complicate the rules to the point where only very well-off insiders know exactly how to bypass the rules. To the extent that getting into one of these “top schools” actually does give young people access to power, influence, and success, it’s alarming how predictable the whole process has become.
Here’s a thought: maybe we should have disclosure laws about college coaching and prep? Or would those laws be gamed too?
I’ve been sent this recent New York Times article by a few people (thanks!). It’s called Grading Teachers, With Data From Class, and it’s about how standardized tests are showing themselves to be inadequate to evaluate teachers, so a Silicon Valley-backed education startup called Panorama is stepping into the mix with a data collection process focused on student evaluations.
Putting aside for now how much this is a play for collecting information about the students themselves, I have a few words to say about the signal which one gets from student evaluations. It’s noisy.
So, for example, I was a calculus teacher at Barnard, teaching students from all over the Columbia University community (so, not just women). I taught the same class two semesters in a row: first in Fall, then in Spring.
Here’s something I noticed. The students in the Fall were young (mostly first semester frosh), eager, smart, and hard-working. They loved me and gave me high marks on all categories, except of course for the few students who just hated math, who would typically give themselves away by saying “I hate math and this class is no different.”
The students in the Spring were older, less eager, probably just as smart, but less hard-working. They didn’t like me or the class. In particular, they didn’t like how I expected them to work hard and challenge themselves. The evaluations came back consistently less excited, with many more people who hated math.
I figured out that many of the students had avoided this class and were taking it for a requirement, didn’t want to be there, and it showed. And the result was that, although my teaching didn’t change remarkably between the two semesters, my evaluations changed considerably.
Was there some way I could have gotten better evaluations from that second group? Absolutely. I could have made the class easier. That class wanted calculus to be cookie-cutter, and didn’t particularly care about the underlying concepts and didn’t want to challenge themselves. The first class, by contrast, had loved those things.
My conclusion is that, once we add “get good student evaluations” to the mix of requirements for our country’s teachers, we are asking for them to conform to their students’ wishes, which aren’t always good. Many of the students in this country don’t like doing homework (in fact most!). Only some of them like to be challenged to think outside their comfort zone. We think teachers should do those things, but by asking them to get good student evaluations we might be preventing them from doing those things. A bad feedback loop would result.
I’m not saying teachers shouldn’t look at student evaluations; far from it, I always did and I found them useful and illuminating, but the data was very noisy. I’d love to see teachers be allowed to see these evaluations without there being punitive consequences.
Any time I see an article about the evaluation system for teachers in New York State, I wince. People get it wrong so very often. Yesterday’s New York Times article written by Elizabeth Harris was even worse than usual.
First, her wording. She mentioned a severe drop in student reading and math proficiency rates statewide and attributed it to a change in the test to the Common Core, which she described as “more rigorous.”
The truth is closer to “students were tested on stuff that wasn’t in their curriculum.” And as you can imagine, if you are tested on stuff you didn’t learn, your score will go down (the Common Core has been plagued by a terrible roll-out, and the timing of this test is Exhibit A). Wording like this matters, because Harris is setting up her reader to attribute the falling scores to bad teachers.
Harris ends her piece with a reference to a teacher-tenure lawsuit: ‘In one of those cases, filed in Albany in July, court documents contrasted the high positive teacher ratings with poor student performance, and called the new evaluation system “deficient and superficial.” The suit said those evaluations were the “most highly predictive measure of whether a teacher will be awarded tenure.”’
In other words, Harris is painting a picture of undeserving teachers sneaking into tenure in spite of not doing their job. It’s ironic, because I actually agree with the statement that the new evaluation system is “deficient and superficial,” but in my case I think it is overly punitive to teachers – overly random, really, since it incorporates the toxic VAM model – but in her framing she is implying it is insufficiently punitive.
Let me dumb Harris’s argument down even further: How can we have 26% English proficiency among students and 94% effectiveness among teachers?! Let’s blame the teachers and question the legitimacy of tenure.
Indeed, after reading the article I felt like looking into whether Harris is being paid by David Welch, the Silicon Valley dude who has vowed to fight teacher tenure nationwide. More likely she just doesn’t understand education and is convinced by simplistic reasoning.
In either case, she clearly needs to learn something about statistics. For that matter, so do other people who drag out this “blame the teacher” line whenever they see poor performance by students.
Because here’s the thing. Beyond obvious issues like switching the content of the tests away from the curriculum, standardized test scores everywhere are hugely dependent on the poverty levels of students. Some data:
It’s not just in this country, either:
The conclusion is that, unless you think bad teachers have somehow taken over poor schools everywhere and booted out the good teachers, and good teachers have taken over rich schools everywhere and booted out the bad teachers (which is supposed to be impossible, right?), poverty has much more of an effect than teachers.
Just to clarify this reasoning, let me give you another example: we could blame bad journalists for lower rates of newspaper readership at a given paper, but since newspaper readership is going down everywhere we’d be blaming journalists for what is a cultural issue.
Or, we could develop a process by which we congratulate specific policemen for a reduced crime rate, but then we’d have to admit that crime is down all over the country.
I’m not saying there aren’t bad teachers, because I’m sure there are. But by only focusing on rooting out bad teachers, we are ignoring an even bigger and harder problem. And no, it won’t be solved by privatizing and corporatizing public schools. We need to address childhood poverty. Here’s one more visual for the road: