How proxies fail

June 4, 2013

A lot of the time, perfectly well-meaning data goals go terribly wrong. Many of these failures stem from the same issue, namely using proxies.

Here’s how it works. People focus on a problem. It’s a real problem, but it’s hard to collect data that answers the exact question one would like to ask (how well are students learning? how well is the company functioning? how do we measure risk?).

People have trouble measuring the object in question directly, so they reasonably ask, how do we measure this problem?

They’re smart, so they come up with something, say some metric (standardized test scores, share price, VaR). It’s not perfect, though, and so they discuss in detail all the inadequacies of the metric. Even so, they’d really like to address this issue, so they decide to try it.

Then they start using it – hey, it works pretty well in spite of its known issues! We have something to focus on, to improve on!

Then two things happen. First, the people who were so thoughtful at the beginning slowly forget the inadequacies of the metric, or are replaced by people who never had that conversation. Slowly the community involved with this proxy starts thinking it is a perfect measurement of the thing they actually care about. For all intents and purposes, of course, it is, because that’s what is being measured, and that’s how their paychecks are defined.

Second, the discrepancy between the proxy and the original underlying problem becomes more and more of a problem in itself. As people game the proxy and focus intensely on it, it stops doing a good job as a stand-in for the original problem. Sadly, that original problem, which was important, is ignored.

This is a tough problem to solve because we always have the urge to address problems, and we always make do with imperfect proxies and metrics. My guess at the best way to deal with the ensuing problems is to always keep at least a few different ways to look at and quantify a problem, and to keep in mind the inadequacies of each. Have a dashboard approach, and of course always be on the lookout for metrics that are being gamed. It’s a hard sell, of course, because it requires deeper understanding and thoughtful interpretation.
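To make the dashboard idea a little more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the eight schools, the two proxies (test_scores and portfolio_review), the noise values, the 0.9 threshold, and especially the gaming model, in which weaker schools inflate their test scores the most. The only point is that when one proxy is gamed it tends to stop tracking the others, and that disagreement is something a dashboard can watch for without ever observing the "true" quantity.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def diverging_pairs(dashboard, threshold=0.9):
    """Return (metric_a, metric_b, r) for pairs correlated below threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(dashboard.items(), 2):
        r = pearson(a, b)
        if r < threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical data: eight schools with an unobservable "true" quality.
quality = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.1, -0.1]
noise_b = [-0.2, 0.4, -0.4, 0.2, 0.3, -0.3, -0.1, 0.1]

# Before any gaming, both proxies are quality plus some measurement noise.
honest = {
    "test_scores": [q + n for q, n in zip(quality, noise_a)],
    "portfolio_review": [q + n for q, n in zip(quality, noise_b)],
}

# Assumed gaming model: weaker schools inflate their test scores the most.
gamed = dict(honest)
gamed["test_scores"] = [
    s + 0.8 * (9 - q) for s, q in zip(honest["test_scores"], quality)
]

print(diverging_pairs(honest))  # proxies that still agree are not flagged
print(diverging_pairs(gamed))   # the gamed proxy drifts away from its peer
```

A flagged pair doesn't tell you which metric is being gamed, or whether either is; it only tells you where to start asking questions, which is the thoughtful-interpretation part that no dashboard does for you.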

Categories: modeling
  1. June 4, 2013 at 8:52 am

    Rightly said, it’s the never-ending quest to find the elusive measure. I agree with having different takes on the problem…

  2. June 4, 2013 at 9:20 am

    “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” ― Upton Sinclair

  3. June 4, 2013 at 9:23 am

    And here is a related article on wikipedia. http://en.wikipedia.org/wiki/Campbell%27s_Law

    • Glen S. McGhee
      June 4, 2013 at 4:56 pm

      I am also a big fan of Campbell’s Law — here is a variation that I am also exceedingly fond of: http://en.wikipedia.org/wiki/Goodhart%27s_law

      “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

      Come to think of it, researchers/academics sometimes call this decline “reactivity.” Not an especially helpful term, but at least they have a word for it!
      http://onlinelibrary.wiley.com/doi/10.1348/135910710X492341/abstract
      Relevant ref, eh?

      • beewhy2012
        June 9, 2013 at 9:56 am

        Thanks for these two extremely illuminating references. They are at the core of the never-ending struggle to be human.

  4. Glen S. McGhee
    June 4, 2013 at 10:42 am

    In philosophy, this is an epistemological problem — what is called the hermeneutic problem.
    http://en.wikipedia.org/wiki/Hermeneutic_circle

    The obstacle to measuring anything (and everything) is that nothing can be measured directly, and there is always an interpretive step involved that runs the risk of distorting the output.

    Measurement theory, as I understand it, finds multiple paths around this — including “Item Response” grading of standardized tests. Item Response attempts to construct thought-processes, and uses this construct to interpret results.

  5. Eric
    June 4, 2013 at 10:48 am

    I have two thoughts. The first is procedural: would it be of any benefit during these discussions to insist on multiple proxies? At the very least, it would be more difficult to game three proxies than a single proxy, and it seems like it would be easier to keep the discussion alive, especially if the proxies did not all track one another.

    My second thought is more political – how much is the choice of proxy determined by our inability to agree on what should be measured? To take the standardized test score as a measurement of scholastic achievement, how much is the use of that as a proxy determined by disagreements about what constitutes ‘scholastic achievement’ or ‘school success’ (and now ‘teacher success’)? And how rare are discussions both of what ‘scholastic achievement’ is and of what the purpose of public education is?

    As I was writing this, a third thought occurred to me: how much does lobbying from business interests in the testing field drive the discussion of measurement? (Or how much does financial lobbying drive how stock prices are used as a proxy for national and world economic health? Etc.) Here the problem is that a proxy is being chosen specifically because it can be gamed later… (Or used to generate revenue by its existence in the case of standardized testing)…

  6. June 4, 2013 at 11:09 am

    Is one way of counteracting this effect collecting different types of data using different metrics, and then comparing the two systems to see how much variation there is?

  7. Zathras
    June 4, 2013 at 11:36 am

    Using a proxy is unavoidable, since data from the past is a proxy for data from the future. Mistakes happen when people forget they are using proxies.

  8. June 4, 2013 at 3:25 pm

    Ben Goldacre reports a striking example of this in Bad Pharma (p10-11). In the 1980s doctors thought that arrhythmic heart action indicated a risk of heart attack. They noted that people who’d had heart attacks often had arrhythmic hearts so they prescribed anti-arrhythmic drugs to these people to reduce heart arrhythmias. The drugs reduced the arrhythmias but increased the risk of another heart attack! Goldacre says “well over 100,000 people died unnecessarily”.

  9. Abe Kohen
    June 5, 2013 at 10:42 am

    Amen. Fully concur. Seen it over and over.

  10. CGS
    June 5, 2013 at 11:42 am

    An econometrics course had me silently laughing as the professor explained to his PhD students that years of education was a good proxy for intelligence. Wishful thinking?

  11. zakdavid
    June 10, 2013 at 8:02 pm

    Nicely put. This is a very generalizable way of thinking about how this can happen over and over again.

    I think you may have posted on William K Black’s “control fraud” theory before? If not, it goes well with this post.

  12. Ion
    June 20, 2013 at 8:03 am

    Obesity is now a disease, per the AMA. But the definition of obesity is a mass to squared height ratio of over 30 kg/m^2. Now, if the allopaths think it’s the right thing to do, I’m not particularly opposed to labeling anyone with over 25% of their body weight in white adipose tissue as diseased, and maybe putting them in work camps or installing chips in their heads. Whatever’s necessary for public health. And I don’t doubt BMI is generally a good proxy for fatness. But now people who are simply heavy have been labeled as diseased by a pretty influential organization. It’s a proxy gone haywire.
