Archive for the ‘statistics’ Category

Plumping up darts

Someone asked me a math question the other day and I had fun figuring it out. I thought it would be nice to write it down.

So here’s the problem. You are getting to see sample data and you have to infer the underlying distribution. In fact you happen to know you’re getting draws – which, because I’m a basically violent person, I like to think of as throws of a dart – from a uniform distribution from 0 to some unknown d, and you need to figure out what d is. All you know is your data, so in particular you know how many dart throws you’ve gotten to see so far. Let’s say you’ve seen n draws.

In other words, given x_1, x_2, x_3, \dots, x_n, what’s your best guess for d?

First, in order to simplify, note that all that really matters for estimating d is \max_{i \in \{1, \dots, n\}} x_i and how big n is.

Next, note you might as well assume that d=1 and you just don’t know it yet.

With this set-up, you’ve rephrased the question like this: if you throw n darts at the interval [0,1], then where do you expect the right-most dart – the maximum – to land?

It’s obvious from this phrasing that, as n goes to infinity, you can expect the right-most dart to get closer and closer to 1. Moreover, you can look at the simplest case, where n=1, and since the uniform distribution is symmetric, you can see the answer is 1/2. Then you might guess that the overall answer, which depends on n and goes to 1 as n goes to infinity, is n/(n+1). It makes intuitive sense, but how do you prove that?

Start with a small case where you know the answer. For n=1 we just need to know what the expected value of max(x_1) is, and since there’s one dart, the max is just x_1 itself, which is to say we need to compute a simple integral to find the expected value (note it’s coming in handy here that I’ve normalized the interval from 0 to 1 so I don’t have to divide by the width of the interval):

\int_0^1 x \, dx = (x^2/2) |_0^1 = 1/2,

and we recover what we already know. In the next case, we need to integrate over two variables (same comment here, don’t have to divide by area of the 1×1 square base):

\int_0^1 \int_0^1 \max(x_1, x_2) \, dx_1 \, dx_2.

If you think about it, though, x_1 and x_2 play symmetric parts in this matter, so you can assume without loss of generality that x_1 is bigger, as long as we only let x_2 range between 0 and x_1, and then multiply the end result by 2:

 = 2 \int_0^1 \int_0^{x_1} x_1 \, dx_2 dx_1.

But that simplifies to:

= 2 \int_0^1 x_1^2 \, dx_1 = 2 (x_1^3/3) |_0^1 = 2/3.

Let’s do the general case. It’s an n-fold integral over the maximum of all n darts, and again without loss of generality x_1 is the maximum as long as we remember to multiply the whole thing by n. We end up computing:

= n \int_0^1 \int_0^{x_1} \int_0^{x_1} \cdots \int_0^{x_1} x_1 \, dx_n \cdots dx_3 dx_2 dx_1.

But this collapses to:

n \int_0^1 x_1^n \, dx_1 = n (x_1^{n+1}/(n+1)) |_0^1 = n/(n+1).

To finish the original question, take the maximum value in your collection of draws and multiply it by the plumping factor (n+1)/n to get a best estimate of the parameter d.
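
If you’d rather check this numerically than trust the integrals, here’s a quick Python simulation sketch (just a sanity check, nothing rigorous):

    import random

    # Sanity check: n uniform draws on [0, d]; average the maximum over many
    # trials and compare with the theoretical value d * n / (n + 1). Also check
    # that the "plumped up" estimate (n + 1) / n * max recovers d on average.

    def simulate(d=1.0, n=5, trials=100_000):
        total_max = 0.0
        total_est = 0.0
        for _ in range(trials):
            m = max(random.uniform(0, d) for _ in range(n))
            total_max += m
            total_est += (n + 1) / n * m
        return total_max / trials, total_est / trials

    mean_max, mean_est = simulate()
    print("average max:     ", round(mean_max, 4), "theory:", round(5 / 6, 4))
    print("average estimate:", round(mean_est, 4), "true d:", 1.0)

With d = 1 and n = 5, the average maximum comes out near 5/6 and the plumped-up estimate hovers right around 1, as promised.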

Categories: math, statistics

Educational accountability scores get politically manipulated again

My buddy Jordan Ellenberg just came out with a fantastic piece in Slate entitled “The Case of the Missing Zeroes: An astonishing act of statistical chutzpah in the Indiana schools’ grade-changing scandal.”

Here are the leading sentences of the piece:

Florida Education Commissioner Tony Bennett resigned Thursday amid claims that, in his former position as superintendent of public instruction in Indiana, he manipulated the state’s system for evaluating school performance. Bennett, a Republican who created an A-to-F grading protocol for Indiana schools as a way to promote educational accountability, is accused of raising the mark for a school operated by a major GOP donor.

Jordan goes on to explain exactly what happened and how that manipulation took place. Turns out it was a pretty outrageous and easy-to-understand lie about missing zeroes which didn’t make any sense. You should read the whole thing, Jordan is a great writer and his fantasy about how he would deal with a student trying the same scam in his calculus class is perfect.

A few comments to make about this story overall.

  1. First of all, it’s another case of a mathematical model being manipulated for political reasons. It just happens to be a really simple mathematical model in this case, namely a weighted average of scores.
  2. In other words, the lesson learned for corrupt politicians in the future may well be to make sure the formulae are more complicated and thus easier to game.
  3. Or in other words, let’s think about other examples of this kind of manipulation, where people in power manipulate scores after the fact for their buddies. Where might it be happening now? Look no further than the Value-Added Model for teachers and schools, which literally nobody understands or could prove is being manipulated in any given instance.
  4. Taking a step further back, let’s remind ourselves that educational accountability models in general are extremely ripe for gaming and manipulation due to their high stakes nature. And the question of who gets the best opportunity to manipulate their scores is, as shown in this example of the GOP-donor-connected school, often a question of who has the best connections.
  5. In other words, I wonder how much the system can be trusted to give us a good signal on how well schools actually teach (at least how well they teach to the test).
  6. And if we want that signal to be clear, maybe we should take away the high stakes and literally measure it, with no consequences. Then, instead of punishing schools with bad scores, we could see how they need help.
  7. The conversation doesn’t profit from our continued crazy high expectations and fundamental belief in the existence of a silver bullet, the latest one being the Kipp Charter Schools – read this reality check if you’re wondering what I’m talking about (hat tip Jordan Ellenberg).
  8. As any statistician could tell you, any time you have an “educational experiment” involving highly motivated students, parents, and teachers, it will seem like a success. That’s called selection bias. The proof of the pudding lies in the scaling up of the method.
  9. We need to think longer term and consider how we’re treating good teachers and school administrators who have to live under arbitrary and unfair systems. They might just leave.

How much is the Stacks Project graph like a random graph?

This is a guest post from Jordan Ellenberg, a professor of mathematics at the University of Wisconsin. Jordan’s book, How Not To Be Wrong, comes out in May 2014. It is crossposted from his blog, Quomodocumque, and tweeted about at @JSEllenberg.

Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project. Summary: you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependency. So this graph (hopefully!) doesn’t have any directed cycles. (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle! I wish LaTeX could do that.)

Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v. And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.

Why is this so? Does it suggest some underlying structure? I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.

Because I am trying to get strong at sage, I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows: start with N vertices, and a function f on natural numbers k that decays with k, and then connect vertex N to vertex N-k (if there is such a vertex) with probability f(k). The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the Stacks Project is not a strictly linear thing like a book).
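
A rough Python sketch of the model just described (a simplified stand-in, with a cutoff on the range of k as a shortcut) looks like this:

    import random

    # The model described above: vertices 1..N, and vertex n points back to
    # vertex n - k with probability f(k). The range of k is cut off at max_k
    # as a shortcut, which barely matters for a fast-decaying f.

    def random_dag(N=1000, f=lambda k: (2 / 3) ** k, max_k=60):
        edges = {n: [] for n in range(1, N + 1)}
        for n in range(1, N + 1):
            for k in range(1, min(n, max_k + 1)):
                if random.random() < f(k):
                    edges[n].append(n - k)
        return edges

    # For each vertex, count the vertices and edges of its descendant subgraph
    # (everything reachable from it); these are the pairs that get plotted.
    def descendant_counts(edges, v):
        seen, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(edges[u])
        return len(seen), sum(len(edges[u]) for u in seen)

    edges = random_dag()
    points = [descendant_counts(edges, v) for v in edges]
    # e.g. with matplotlib, plt.scatter(*zip(*points)) reproduces the scatterplot.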

Here’s how Cathy’s plot looks for a graph generated by N = 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 as suggested in Cathy’s post.

[Plot: edges vs. vertices of descendant subgraphs, N = 1000, f(k) = (2/3)^k]

Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.

Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:

[Plot: edges vs. vertices of descendant subgraphs, f(k) = 1/k^2]

My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!

Update: Although you know what must be happening here? It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected to all but c of the vertices below it, for a suitable constant c.

I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are not measuring the same graph again and again.

It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay. Under those circumstances, you’d have substantially less overlap between the descendant trees; do you still get the approximately linear relationship between edges and nodes?

Categories: guest post, math, statistics

Math fraud in pensions

I wrote a post three months ago talking about how we don’t need better models but we need to stop lying with our models. My first example was municipal debt and how various towns and cities are in deep debt partly because their accounting for future pension obligations allows them to be overly optimistic about their investments and underfund their pension pots.

This has never been more true than it is right now, and, as this New York Times Dealbook article explains, it was a major factor in Detroit’s bankruptcy filing this past week. But make no mistake: even in places where they don’t end up declaring bankruptcy, something is going to shake out because of these broken models, and it isn’t going to be extra money for retired civil servants.

It all comes down to wanting to avoid putting required money away and hiring quants (in this case actuaries) to make that seem like it’s mathematically acceptable. It’s a form of mathematical control fraud. From the article:

When a lender calculates the value of a mortgage, or a trader sets the price of a bond, each looks at the payments scheduled in the future and translates them into today’s dollars, using a commonplace calculation called discounting. By extension, it might seem that an actuary calculating a city’s pension obligations would look at the scheduled future payments to retirees and discount them to today’s dollars.

But that is not what happens. To calculate a city’s pension liabilities, an actuary instead projects all the contributions the city will probably have to make to the pension fund over time. Many assumptions go into this projection, including an assumption that returns on the investments made by the pension fund will cover most of the plan’s costs. The greater the average annual investment returns, the less the city will presumably have to contribute. Pension plan trustees set the rate of return, usually between 7 percent and 8 percent.

In addition, actuaries “smooth” the numbers, to keep big swings in the financial markets from making the pension contributions gyrate year to year. These methods, actuarial watchdogs say, build a strong bias into the numbers. Not only can they make unsustainable pension plans look fine, they say, but they distort the all-important instructions actuaries give their clients every year on how much money to set aside to pay all benefits in the future.

One caveat: if the pensions have actually been making between 7 percent and 8 percent on their investments every year then all is perhaps well. But considering that they typically invest in bonds, not stocks – which is a good thing – we’re likely seeing much smaller returns than that, which means the yearly contributions to the local pension plans are nowhere near enough.
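
To see how much that assumed rate of return matters, here’s a toy present-value calculation in Python (made-up numbers, not the article’s):

    # Toy numbers (not the article's): the same stream of promised payments,
    # discounted at an assumed 7.5% return versus a bond-like 4%.

    def present_value(annual_payment, years, rate):
        # sum of payment / (1 + rate)^t for t = 1..years
        return sum(annual_payment / (1 + rate) ** t for t in range(1, years + 1))

    payment = 100_000_000  # hypothetical: $100M per year promised to retirees
    years = 30

    for rate in (0.075, 0.04):
        pv = present_value(payment, years, rate)
        print(f"discount rate {rate:.1%}: liability today = ${pv / 1e9:.2f} billion")

    # Roughly $1.18 billion at 7.5% versus $1.73 billion at 4%: the rosier
    # assumption makes about half a billion dollars of obligations vanish
    # from the books.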

What’s super interesting about this article is that it goes into the action on the ground inside the Actuary community, since their reputations are at stake in this battle:

A few years ago, with the debate still raging and cities staggering through the recession, one top professional body, the Society of Actuaries, gathered expert opinion and realized that public pension plans had come to pose the single largest reputational risk to the profession. A Public Plans Reputational Risk Task Force was convened. It held some meetings, but last year, the matter was shifted to a new body, something called the Blue Ribbon Panel, which was composed not of actuaries but public policy figures from a number of disciplines. Panelists include Richard Ravitch, a former lieutenant governor of New York; Bradley Belt, a former executive director of the Pension Benefit Guaranty Corporation; and Robert North, the actuary who shepherds New York City’s five big public pension plans.

I’m not sure what happened here, but it seems like a bunch of people in a profession, the actuaries, got worried that they were being used by politicians, and decided to investigate, but then that initiative got somehow replaced by a bunch of politicians. I’d love to talk to someone on the inside about this.

Categories: finance, math, modeling, statistics

Measuring Up by Daniel Koretz

This is a guest post by Eugene Stern.

Now that I have kids in school, I’ve become a lot more familiar with high-stakes testing, which is the practice of administering standardized tests with major consequences for students who take them (you have to pass to graduate), their teachers (who are often evaluated based on standardized test results), and their school districts (state funding depends on test results). To my great chagrin, New Jersey, where I live, is in the process of putting such a teacher evaluation system in place (for a lot more detail and criticism, see here).

The excellent John Ewing pointed me to a pretty comprehensive survey of standardized testing called “Measuring Up,” by Harvard Ed School prof Daniel Koretz, who teaches a course there about this stuff. If you have any interest in the subject, the book is very much worth your time. But in case you don’t get to it, or just to whet your appetite, here are my top 10 takeaways:

  1. Believe it or not, most of the people who write standardized tests aren’t idiots. Building effective tests is a difficult measurement problem! Koretz makes an analogy to political polling, which is a good reminder that a test result is really a sample from a distribution (if you take multiple versions of a test designed to measure the same thing, you won’t do exactly the same each time), and not an absolute measure of what someone knows. It’s also a good reminder that the way questions are phrased can matter a great deal.

  2. The reliability of a test is inversely related to the standard deviation of this distribution: a test is reliable if your score on it wouldn’t vary very much from one instance to the next. That’s a function of both the test itself and the circumstances under which people take it. More reliability is better, but the big trade-off is that increasing the sophistication of the test tends to decrease reliability. For example, tests with free form answers can test for a broader range of skills than multiple choice, but they introduce variability across graders, and even the same person may grade the same test differently before and after lunch. More sophisticated tasks also take longer to do (imagine a lab experiment as part of a test), which means fewer questions on the test and a smaller cross-section of topics being sampled, again meaning more noise and less reliability. (See the simulation sketch after this list.)

  3. A complementary issue is bias, which is roughly about people doing better or worse on a test for systematic reasons outside the domain being tested. Again, there are trade-offs: the more sophisticated the test, the more extraneous skills beyond those being tested it may be bringing in. One common way to weed out such questions is to look at how people who score the same on the overall test do on each particular question: if you get variability you didn’t expect, that may be a sign of bias. It’s harder to do this for more sophisticated tests, where each question is a bigger chunk of the overall test. It’s also harder if the bias is systematic across the test.

  4. Beyond the (theoretical) distribution from which a single student’s score is a sample, there’s also the (likely more familiar) distribution of scores across students. This depends both on the test and on the population taking it. For example, for many years, students on the eastern side of the US were more likely to take the SAT than those in the west, where only students applying to very selective eastern colleges took the test. Consequently, the score distributions were very different in the east and the west (and average scores tended to be higher in the west), but this didn’t mean that there was bias or that schools in the west were better.

  5. The shape of the score distribution across students carries important information about the test. If a test is relatively easy for the students taking it, scores will be clustered to the right of the distribution, while if it’s hard, scores will be clustered to the left. This matters when you’re interpreting results: the first test is worse at discriminating among stronger students and better at discriminating among weaker ones, while the second is the reverse.

  6. The score distribution across students is an important tool in communicating results (you may not know right away what a score of 600 on a particular test means, but if you hear it’s one standard deviation above a mean of 500, that’s a decent start). It’s also important for calibrating tests so that the results are comparable from year to year. In general, you want a test to have similar means and variances from one year to the next, but this raises the question of how to handle year-to-year improvement. This is particularly significant when educational goals are expressed in terms of raising standardized test scores.

  7. If you think in terms of the statistics of test score distributions, you realize that many of those goals of raising scores quickly are deluded. Koretz has a good phrase for this: the myth of the vanishing variance. The key observation is that test score distributions are very wide, on all tests, everywhere, including countries that we think have much better education systems than we do. The goals we set for student score improvement (typically, a high fraction of all students taking a test several years from now are supposed to score above some threshold) imply a great deal of compression at the lower end of this distribution – compression that has never been seen in any country, anywhere. It sounds good to say that every kid who takes a certain test in four years will score as proficient, but that corresponds to a score distribution with much less variance than you’ll ever see. Maybe we should stop lying to ourselves?

  8. Koretz is highly critical of the recent trend to report test results in terms of standards (e.g., how many students score as “proficient”) instead of comparisons (e.g., your score is in the top 20% of all students who took the test). Standards and standard-based reporting are popular because it’s believed that American students’ performance as a group is inadequate. The idea is that being near the top doesn’t mean much if the comparison group is weak, so instead we should focus on making sure every student meets an absolute standard needed for success in life. There are three (at least) problems with this. First, how do you set a standard – i.e., what does proficient mean, anyway? Koretz gives enough detail here to make it clear how arbitrary the standards are. Second, you lose information: in the US, standards are typically expressed in terms of just four bins (advanced, proficient, partially proficient, basic), and variation inside the bins is ignored. Third, even standards-based reporting tends to slide back into comparisons: since we don’t know exactly what proficient means, we’re happiest when our school, or district, or state places ahead of others in the fraction of students classified as proficient.

  9. Koretz’s other big theme is score inflation for high-stakes tests: if everyone is evaluated based on test scores, everyone has an incentive to get those scores up, whether or not that actually has much correlation with learning. If you remember anything from the book or from this post, remember this phrase: sawtooth pattern. The idea is that when a new high-stakes standardized test appears, average scores start at some base level, go up quickly as people figure out how to game the test, then plateau. If the test is replaced with another, the same thing happens: base, rapid growth, plateau. Repeat ad infinitum. Koretz and his collaborators did a nice experiment in which they went back to a school district in which one high-stakes test had been replaced with another and administered the first test several years later. Now that teachers weren’t teaching to the first test, scores on it reverted back to the original base level. Moral: score inflation is real, pervasive, and unavoidable, unless we bite the bullet and do away with high-stakes tests.

  10. While Koretz is sympathetic toward test designers, who live the complexity of standardized testing every day, he is harsh on those who (a) interpret and report on test results and (b) set testing and education policy, without taking that complexity into account. Which, as he makes clear, is pretty much everyone who reports on results and sets policy.
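
To make points 1 and 2 above concrete, here’s a minimal simulation (mine, not Koretz’s): one student with a fixed chance of getting any item right takes many versions of a short test and of a long one, and you can watch the spread of observed scores – the noise that limits reliability – shrink as the test gets longer.

    import random
    import statistics

    # One student with a fixed "true" chance of getting any item right takes
    # many versions of a test; the observed score is a noisy sample around
    # what they actually know, and the noise shrinks as the test gets longer.

    def simulate_scores(p_correct=0.7, num_items=20, administrations=10_000):
        scores = []
        for _ in range(administrations):
            correct = sum(random.random() < p_correct for _ in range(num_items))
            scores.append(100 * correct / num_items)  # score out of 100
        return scores

    for num_items in (20, 100):
        scores = simulate_scores(num_items=num_items)
        print(num_items, "items: mean", round(statistics.mean(scores), 1),
              "std dev", round(statistics.stdev(scores), 1))

    # The mean is about 70 either way, but the spread drops from roughly 10
    # points to under 5 when the test is five times longer.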

Final thoughts

If you think it’s a good idea to make high-stakes decisions about schools and teachers based on standardized test results, Koretz’s book offers several clear warnings.

First, we should expect any high-stakes test to be gamed. Worse yet, the more reliable tests, being more predictable, are probably easier to game (look at the SAT prep industry).

Second, the more (statistically) reliable tests, by their controlled nature, cover only a limited sample of the domain we want students to learn. Tests trying to cover more ground in more depth (“tests worth teaching to,” in the parlance of the last decade) will necessarily have noisier results. This noise is a huge deal when you realize that high-stakes decisions about teachers are made based on just two or three years of test scores.

Third, a test that aims to distinguish “proficiency” will do a worse job of distinguishing students elsewhere in the skills range, and may be largely irrelevant for teachers whose students are far away from the proficiency cut-off. (For a truly distressing example of this, see here.)

With so many obstacles to rating schools and teachers reliably based on standardized test scores, is it any surprise that we see results like this?

Tonight: first Data Skeptics Meetup, Suresh Naidu

I’m psyched to see Suresh Naidu tonight in the first Data Skeptics Meetup. He’s talking about Political Uses and Abuses of Data and his abstract is this:

While a lot has been made of the use of technology for election campaigns, little discussion has focused on other political uses of data. From targeting dissidents and tax-evaders to organizing protests, the same datasets and analytics that let data scientists do prediction of consumer and voter behavior can also be used to forecast political opponents, mobilize likely leaders, solve collective problems and generally push people around. In this discussion, Suresh will put this in a 1000 year government data-collection perspective, and talk about how data science might be getting used in authoritarian countries, both by regimes and their opponents.

Given the recent articles highlighting this kind of stuff, I’m sure the topic will provoke a lively discussion – my favorite kind!

Unfortunately the Meetup is full but I’d love you guys to give suggestions for more speakers and/or more topics.

The politics of data mining

At first glance, data miners inside governments, start-ups, corporations, and political campaigns are all doing basically the same thing. They’ll all need great engineering infrastructure, good clean data, a working knowledge of statistical techniques and enough domain knowledge to get things done.

We’ve seen recent articles that are evidence for this statement: Facebook data people move to the NSA or other government agencies easily, and Obama’s political campaign data miners have launched a new data mining start-up. I am a data miner myself, and I could honestly work at any of those places – my skills would translate, if not my personality.

I do think there are differences, though, and here I’m not talking about ethics or trust issues, I’m talking about pure politics[1].

Namely, the world of data mining is divided into two broad categories: people who want to cause things to happen and people who want to prevent things from happening.

I know that sounds incredibly vague, so let me give some examples.

In start-ups, irrespective of what you’re actually doing (what you’re actually doing is probably incredibly banal, like getting people to click on ads), you feel like you’re the first person ever to do it, at least on this scale, or at least with this dataset, and that makes it technically challenging and exciting.

Or, even if you’re not the first, at least what you’re creating or building is state-of-the-art and is going to be used to “disrupt” or destroy lagging competition. You feel like a motherfucker, and it feels great[2]!

The same thing can be said for Obama’s political data miners: if you read this article, you’ll know they felt like they’d invented a new field of data mining, and a cult along with it, and it felt great! And although it’s probably not true that they did something all that impressive technically, in any case they did a great job of applying known techniques to a different data set, and they got lots of people to allow access to their private information based on their trust of Obama, and they mined the fuck out of it to persuade people to go out and vote and to go out and vote for Obama.

Now let’s talk about corporations. I’ve worked in enough companies to know that “covering your ass” is a real thing, and can overwhelm a given company’s other goals. And the larger the company, the more the fear sets in and the more time is spent covering one’s ass and less time is spent inventing and staying state-of-the-art. If you’ve ever worked in a place where it takes months just to integrate two different versions of SalesForce you know what I mean.

Those corporate people have data miners too, and in the best case they are somewhat protected from the conservative, risk averse, cover-your-ass atmosphere, but mostly they’re not. So if you work for a pharmaceutical company, you might spend your time figuring out how to draw up the numbers to make them look good for the CEO so he doesn’t get axed.

In other words, you spend your time preventing something from happening rather than causing something to happen.

Finally, let’s talk about government data miners. If there’s one thing I learned when I went to the State Department Tech@State “Moneyball Diplomacy” conference a few weeks back, it’s that they are the most conservative of all. They spend their time worrying about a terrorist attack and how to prevent it. It’s all about preventing bad things from happening, and that makes for an atmosphere where causing good things to happen takes a back seat.

I’m not saying anything really new here; I think this stuff is pretty uncontroversial. Maybe people would quibble over when a start-up becomes a corporation (my answer: mostly they never do, but certainly by the time of an IPO they’ve already done it). Also, of course, there are ass-coverers in start-ups and there are risk-takers in corporations and maybe even in government, but they don’t dominate.

If you think through things in this light, it makes sense that Obama’s data miners didn’t want to stay in government and decided to go work on advertising stuff. And although they might have enough clout and buzz to get hired by a big corporation, I think they’ll find it pretty frustrating to be dealing with the cover-my-ass types that will hire them. It also makes sense that Facebook, which spends its time making sure no other social network grows enough to compete with it, works so well with the NSA.

1. If you want to talk ethics, though, join me on Monday at Suresh Naidu’s Data Skeptics Meetup where he’ll be talking about Political Uses and Abuses of Data.

2. This is probably why start-up guys are so arrogant.

Book out for early review

I’m happy to say that the book I’m writing with Rachel Schutt called Doing Data Science is officially out for early review. That means a few chapters which we’ve deemed “ready” have been sent to some prominent people in the field to see what they think. Thanks, prominent and busy people!

It also means that things are (knock on wood) wrapping up on the editing side. I’m cautiously optimistic that this book will be a valuable resource for people interested in what data scientists do, especially people interested in switching fields. The range of topics is broad, which I guess means that the most obvious complaint about the book will be that we didn’t cover things deeply enough, and perhaps that the level of pre-requisite assumptions is uneven. It’s hard to avoid.

Thanks to my awesome editor Courtney Nash over at O’Reilly for all her help!

And by the way, we have an armadillo on our cover, which is just plain cool:

[Book cover image]

Salt it up, baby!

An article in yesterday’s Science Times explained that limiting the salt in your diet doesn’t actually improve health, and could in fact be bad for you. That’s a huge turn-around for a public health rule that has run very deep.

How can this kind of thing happen?

Well, first of all epidemiologists use crazy models to make predictions on things, and in this case what happened was they saw a correlation between high blood pressure and high salt intake, and they saw a separate correlation between high blood pressure and death, and so they linked the two.

Trouble is, while very low salt intake might lower blood pressure a little bit, it also, for whatever reason, makes people die a wee bit more often.

As this Scientific American article explains, that “little bit” is actually really small:

Over the long-term, low-salt diets, compared to normal diets, decreased systolic blood pressure (the top number in the blood pressure ratio) in healthy people by 1.1 millimeters of mercury (mmHg) and diastolic blood pressure (the bottom number) by 0.6 mmHg. That is like going from 120/80 to 119/79. The review concluded that “intensive interventions, unsuited to primary care or population prevention programs, provide only minimal reductions in blood pressure during long-term trials.” A 2003 Cochrane review of 57 shorter-term trials similarly concluded that “there is little evidence for long-term benefit from reducing salt intake.”

Moreover, some people react to changing their salt intake with higher, and some with lower blood pressure. Turns out it’s complicated.

I’m a skeptic, especially when it comes to epidemiology. None of this surprises me, and I don’t think it’s the last bombshell we’ll be hearing. But this meta-analysis also might have flaws, so hold your breath for the next pronouncement.

One last thing – they keep saying that it’s too expensive to do this kind of study right, but I’m thinking that by now they might realize the real cost of not doing it right is a loss of the public’s trust in medical research.

Categories: modeling, statistics

The rise of big data, big brother

I recently read an article off the newsstand called The Rise of Big Data.

It was written by Kenneth Neil Cukier and Viktor Mayer-Schoenberger and it was published in the May/June 2013 edition of Foreign Affairs, which is published by the Council on Foreign Relations (CFR). I mention this because CFR is an influential think tank, filled with powerful insiders, including people like Robert Rubin himself, and for that reason I want to take this view on big data very seriously: it might reflect the policy view before long.

And if I think about it, compared to the uber naive view I came across last week when I went to the congressional hearing about big data and analytics, that would be good news. I’ll write more about it soon, but let’s just say it wasn’t everything I was hoping for.

At least Cukier and Mayer-Schoenberger discuss their reservations regarding “big data” in this article. To contrast this with last week, it seemed like the only background material for the hearing, at least for the congressmen, was the McKinsey report talking about how sexy data science is and how we’ll need to train an army of them to stay competitive.

So I’m glad it’s not all rainbows and sunshine when it comes to big data in this article. Unfortunately, whether because they’re tied to successful business interests, or because they just haven’t thought too deeply about the dark side, their concerns seem almost token, and their examples bizarre.

The article is unfortunately behind the pay wall, but I’ll do my best to explain what they’ve said.

Datafication

First they discuss the concept of datafication, and their example is how we quantify friendships with “likes”: it’s the way everything we do, online or otherwise, ends up recorded for later examination in someone’s data storage units. Or maybe multiple storage units, and maybe for sale.

They formally define it later in the article as a process:

… taking all aspects of life and turning them into data. Google’s augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks.

Datafication is an interesting concept, although as far as I can tell they did not coin the word, and it has led me to consider its importance with respect to intentionality of the individual.

Here’s what I mean. We are being datafied, or rather our actions are, and when we “like” someone or something online, we are intending to be datafied, or at least we should expect to be. But when we merely browse the web, we are unintentionally, or at least passively, being datafied through cookies that we might or might not be aware of. And when we walk around in a store, or even on the street, we are being datafied in a completely unintentional way, via sensors or Google glasses.

This spectrum of intentionality ranges from us gleefully taking part in a social media experiment we are proud of to all-out surveillance and stalking. But it’s all datafication. Our intentions may run the gamut but the results don’t.

They follow up their definition in the article, once they get to it, with a line that speaks volumes about their perspective:

Once we datafy things, we can transform their purpose and turn the information into new forms of value

But who is “we” when they write it? What kinds of value do they refer to? As you will see from the examples below, mostly that translates into increased efficiency through automation.

So if at first you assumed they mean we, the American people, you might be forgiven for re-thinking the “we” in that sentence to be the owners of the companies which become more efficient once big data has been introduced, especially if you’ve recently read this article from Jacobin by Gavin Mueller, entitled “The Rise of the Machines” and subtitled “Automation isn’t freeing us from work — it’s keeping us under capitalist control.” From the article (which you should read in its entirety):

In the short term, the new machines benefit capitalists, who can lay off their expensive, unnecessary workers to fend for themselves in the labor market. But, in the longer view, automation also raises the specter of a world without work, or one with a lot less of it, where there isn’t much for human workers to do. If we didn’t have capitalists sucking up surplus value as profit, we could use that surplus on social welfare to meet people’s needs.

The big data revolution and the assumption that N=ALL

According to Cukier and Mayer-Schoenberger, the Big Data revolution consists of three things:

  1. Collecting and using a lot of data rather than small samples.
  2. Accepting messiness in your data.
  3. Giving up on knowing the causes.

They describe these steps in rather grand fashion, by claiming that big data doesn’t need to understand cause because the data is so enormous. It doesn’t need to worry about sampling error because it is literally keeping track of the truth. The way the article frames this is by claiming that the new approach of big data is letting “N = ALL”.

But here’s the thing, it’s never all. And we are almost always missing the very things we should care about most.

So for example, as this InfoWorld post explains, internet surveillance will never really work, because the very clever and tech-savvy criminals that we most want to catch are the very ones we will never be able to catch, since they’re always a step ahead.

Even the example from their own article, election night polls, is itself a great non-example: even if we poll absolutely everyone who leaves the polling stations, we still don’t count people who decided not to vote in the first place. And those might be the very people we’d need to talk to to understand our country’s problems.

Indeed, I’d argue that the assumption we make that N=ALL is one of the biggest problems we face in the age of Big Data. It is, above all, a way of excluding the voices of people who don’t have the time or don’t have the energy or don’t have the access to cast their vote in all sorts of informal, possibly unannounced, elections.

Those people, busy working two jobs and spending time waiting for buses, become invisible when we tally up the votes without them. To you this might just mean that the recommendations you receive on Netflix don’t seem very good, because most of the people who bother to rate things on Netflix are young and have different tastes than you, which skews the recommendation engine towards them. But there are plenty of much more insidious consequences stemming from this basic idea.

Another way in which the assumption that N=ALL can matter is that it often gets translated into the idea that data is objective. Indeed the article warns us against not assuming that:

… we need to be particularly on guard to prevent our cognitive biases from deluding us; sometimes, we just need to let the data speak.

And later in the article,

In a world where data shape decisions more and more, what purpose will remain for people, or for intuition, or for going against the facts?

This is a bitch of a problem for people like me who work with models, know exactly how they work, and know exactly how wrong it is to believe that “data speaks”.

I wrote about this misunderstanding here, in the context of Bill Gates, but I was recently reminded of it in a terrifying way by this New York Times article on big data and recruiter hiring practices. From the article:

“Let’s put everything in and let the data speak for itself,” Dr. Ming said of the algorithms she is now building for Gild.

If you read the whole article, you’ll learn that this algorithm tries to find “diamond in the rough” types to hire. A worthy effort, but one that you have to think through.

Why? Say you compare women and men with the exact same qualifications who have been hired in the past. Looking into what happened next, you learn that those women tended to leave more often, get promoted less often, and give more negative feedback on their environments than the men did. Your model might then be tempted to recommend the man over the woman the next time the two show up, rather than looking into the possibility that the company doesn’t treat female employees well.

In other words, ignoring causation can be a flaw, rather than a feature. Models that ignore causation can add to historical problems instead of addressing them. And data doesn’t speak for itself, data is just a quantitative, pale echo of the events of our society.
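
Here’s a deliberately crude sketch of that failure mode (a toy example with made-up data, not Gild’s algorithm): train a “who will work out?” model on outcomes from a workplace that penalizes women, and the model learns to penalize women too.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy data: everyone has the same distribution of qualifications, but women
    # get an extra penalty in their recorded outcomes because of how the
    # workplace treats them -- the causal story the model never sees.
    rng = np.random.default_rng(0)
    n = 5000
    qualification = rng.normal(0, 1, n)
    is_woman = rng.integers(0, 2, n)
    logit = 1.5 * qualification - 1.0 * is_woman
    worked_out = rng.random(n) < 1 / (1 + np.exp(-logit))

    model = LogisticRegression().fit(
        np.column_stack([qualification, is_woman]), worked_out)
    print("coefficient on is_woman:", model.coef_[0][1])
    # The coefficient comes out strongly negative: trained on these outcomes,
    # the model recommends the man over an equally qualified woman instead of
    # flagging the workplace problem.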

Some cherry-picked examples

One of the most puzzling things about the Cukier and Mayer-Schoenberger article is how they chose their “big data” examples.

One of them, the ability for big data to spot infection in premature babies, I recognized from the congressional hearing last week. Who doesn’t want to save premature babies? Heartwarming! Big data is da bomb!

But if you’re going to talk about medicalized big data, let’s go there for reals. Specifically, take a look at this New York Times article from last week where a woman traces the big data footprints, such as they are, back in time after receiving a pamphlet on living with Multiple Sclerosis. From the article:

Now she wondered whether one of those companies had erroneously profiled her as an M.S. patient and shared that profile with drug-company marketers. She worried about the potential ramifications: Could she, for instance, someday be denied life insurance on the basis of that profile? She wanted to track down the source of the data, correct her profile and, if possible, prevent further dissemination of the information. But she didn’t know which company had collected and shared the data in the first place, so she didn’t know how to have her entry removed from the original marketing list.

Two things about this. First, it happens all the time, to everyone, but especially to people who don’t know better than to search online for diseases they actually have. Second, the article seems particularly spooked by the idea that a woman who does not have a disease might be targeted as being sick and face crazy consequences down the road. But what about a woman who actually is sick? Does that person somehow deserve to have their life insurance denied?

The real worries about the intersection of big data and medical records, at least the ones I have, are completely missing from the article. Although they did mention that “improving and lowering the cost of health care for the world’s poor” will inevitably make it “necessary to automate some tasks that currently require human judgment.” Increased efficiency once again.

To be fair, they also talked about how Google tried to predict the flu in February 2009 but got it wrong. I’m not sure what they were trying to say except that it’s cool what we can try to do with big data.

Also, they discussed a Tokyo research team that collects data on 360 pressure points with sensors in a car seat, “each on a scale of 0 to 256.” I think that last part about the scale was added just so they’d have more numbers in the sentence – so mathematical!

And what do we get in exchange for all these sensor readings? The ability to distinguish drivers, so I guess you’ll never have to share your car, and the ability to sense if a driver slumps, to either “send an alert or automatically apply brakes.” I’d call that a questionable return for my investment of total body surveillance.

Big data, business, and the government

Make no mistake: this article is about how to use big data for your business. It goes ahead and suggests that whoever has the biggest big data has the biggest edge in business.

Of course, if you’re interested in treating your government office like a business, that’s gonna give you an edge too. Their example of Bloomberg’s big data initiative boils down to efficiency gains (read: we can do more with less, i.e. we can start firing government workers, or at least never hire more).

As for regulation, it is pseudo-dealt with via the discussion of market dominance. We are meant to understand that the only role government can or should have with respect to data is how to make sure the market is working efficiently. The darkest projected future is that of market domination by Google or Facebook:

But how should governments apply antitrust rules to big data, a market that is hard to define and is constantly changing form?

In particular, no discussion of how we might want to protect privacy.

Big data, big brother

I want to be fair to Cukier and Mayer-Schoenberger, because they do at least bring up the idea of big data as big brother. Their topic is serious. But their examples, once again, are incredibly weak.

Should we find likely-to-drop-out boys or likely-to-get-pregnant girls using big data? Should we intervene? Note the intention of this model would be the welfare of poor children. But how many models currently in production are targeting that demographic with that goal? Is this in any way at all a reasonable example?

Here’s another weird one: they talked about the bad metric used by US Secretary of Defense Robert McNamara in the Viet Nam War, namely the number of casualties. By framing this in the current language of statistics, though, the article gives the impression that we could just be super careful about our metrics in the future and: problem solved. As we experts in data know, however, it’s a political decision, not a statistical one, to choose a metric of success. And it’s the guy in charge who makes that decision, not some quant.

Innovation

If you end up reading the Cukier and Mayer-Schoenberger article, please also read Julie Cohen’s draft of a soon-to-be published Harvard Law Review article called “What Privacy is For” where she takes on big data in a much more convincing and skeptical light than Cukier and Mayer-Schoenberger were capable of summoning up for their big data business audience.

I’m actually planning a post soon on Cohen’s article, which contains many nuggets of thoughtfulness, but for now I’ll simply juxtapose two ideas surrounding big data and innovation, giving Cohen the last word. First from the Cukier and Mayer-Schoenberger article:

Big data enables us to experiment faster and explore more leads. These advantages should produce more innovation

Second from Cohen, where she uses the term “modulation” to describe, more or less, the effect of datafication on society:

When the predicate conditions for innovation are described in this way, the problem with characterizing privacy as anti-innovation becomes clear: it is modulation, not privacy, that poses the greater threat to innovative practice. Regimes of pervasively distributed surveillance and modulation seek to mold individual preferences and behavior in ways that reduce the serendipity and the freedom to tinker on which innovation thrives. The suggestion that innovative activity will persist unchilled under conditions of pervasively distributed surveillance is simply silly; it derives rhetorical force from the cultural construct of the liberal subject, who can separate the act of creation from the fact of surveillance. As we have seen, though, that is an unsustainable fiction. The real, socially-constructed subject responds to surveillance quite differently—which is, of course, exactly why government and commercial entities engage in it. Clearing the way for innovation requires clearing the way for innovative practice by real people, by preserving spaces within which critical self-determination and self-differentiation can occur and by opening physical spaces within which the everyday practice of tinkering can thrive.

How to reinvent yourself, nerd version

I wanted to give this advice today just in case it’s useful to someone. It’s basically the way I went about reinventing myself from being a quant in finance to being a data scientist in the tech scene.

In other words, many of the same skills but not all, and many of the same job description elements but not all.

The truth is, I didn’t even know the term “data scientist” when I started my job hunt, so for that reason I think it’s possibly good and useful advice: if you follow it, you may end up getting a great job you don’t even know exists right now.

Also, I used this advice yesterday on my friend who is trying to reinvent himself, and he seemed to find it useful, although time will tell how much – let’s see if he gets a new job soon!

Here goes.

  • Write a list of things you like about jobs: learning technical stuff, managing people, whatever floats your boat.
  • Next, write a list of things you don’t like: being secretive, no vacation, office politics, whatever. Some people hate working with “dumb people” but some people can’t stand “arrogant people”. It makes a huge difference actually.
  • Next, write a list of skills you have: python, basic statistics, math, managing teams, smelling a bad deal, stuff like that. This is probably the most important list, so spend some serious time on it.
  • Finally, write a list of skills you don’t have that you wish you did: hadoop, knowing when to stop talking, stuff like that.

Once you have your lists, start going through LinkedIn by cross-searching for your preferred city and a keyword from one of your lists (probably the “skills you have” list).

Every time you find a job that you think you’d like to have, record the skills it lists that you don’t have, the name of the company, and your guess on a scale of 1-10 of how much you’d like the job, in a spreadsheet or at least a file. This last part is where you use the “stuff I like” and “stuff I don’t like” lists.

And when you’ve done this for a long time, like you made it your job for a few hours a day for at least a few weeks, then do some wordcounts on this file, preferably using a command line script to add to the nerdiness, to see which skills you’d need to get which jobs you’d really like.
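
In case it helps, here’s roughly what I mean by that wordcount step – a tiny Python script (a sketch only; it assumes the notes file has one job per line with the missing skills separated by commas):

    import sys
    from collections import Counter

    # Count how often each missing skill shows up in the notes file.
    # Assumes one job per line, with the skills you lack separated by commas.
    # Hypothetical usage: python skill_counts.py job_notes.txt

    def skill_counts(path):
        counts = Counter()
        with open(path) as f:
            for line in f:
                counts.update(s.strip().lower() for s in line.split(",") if s.strip())
        return counts

    if __name__ == "__main__":
        for skill, count in skill_counts(sys.argv[1]).most_common(15):
            print(count, skill)

Run it on your notes file and the skills at the top of the list are the ones worth learning next.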

Note LinkedIn is not an oracle: it doesn’t have every job in the world (although it might have most jobs you could ever get), and the descriptions aren’t always accurate.

For example, I think companies often need managers of software engineers, but they never advertise for managers of software engineers. They advertise for software engineers, and then let them manage if they have the ability to, and sometimes even if they don’t. But even in that case I think it makes sense: engineers don’t want to be managed by someone they think isn’t technical, and the best way to get someone who is definitely technical is just to get another engineer.

In other words, sometimes the “job requirements” data on LinkedIn is dirty, but it’s still useful. And thank god for LinkedIn.

Next, make sure your LinkedIn profile is up-to-date and accurate, and that your ex-coworkers have written letters for you and endorsed you for your skills.

Finally, buy a book or two to learn the new skills you’ve decided to acquire based on your research. I remember bringing a book on Bayesian statistics to my interview for a data scientist position. I wasn’t all the way through the book, and my boss didn’t even know enough to interview me on that subject, but it didn’t hurt him to see that I was independently learning stuff because I thought it would be useful, and it didn’t hurt to be on top of that stuff when I started my new job.

What I like about this is that it looks for jobs based on what you want rather than what you already know you can do. It’s in some sense the dual method to what people usually do.

How much math do scientists need to know?

I’m catching up with reading the “big data news” this morning (via Gil Press) and I came across this essay by E. O. Wilson called “Great Scientist ≠ Good at Math”. In it, he argues that most of the successful scientists he knows aren’t good at math, and he doesn’t see why people get discouraged from being scientists just because they suck at math.

Here’s an important excerpt from the essay:

Over the years, I have co-written many papers with mathematicians and statisticians, so I can offer the following principle with confidence. Call it Wilson’s Principle No. 1: It is far easier for scientists to acquire needed collaboration from mathematicians and statisticians than it is for mathematicians and statisticians to find scientists able to make use of their equations.

Given that he’s written many papers with mathematicians and statisticians, then, he is not claiming that math itself is not part of great science, just that he hasn’t been the one that supplied the mathy bits. I think this is really key.

And it resonates with me: I’ve often said that the cool thing about working on a data science team in industry, for example, is that different people bring different skills to the table. I might be an expert on some machine learning algorithms, while someone else will be a domain expert. The problem requires both skill sets, and perhaps no one person has all that knowledge. Teamwork kinda rocks.

Another thing he exposes with Wilson’s Principle No. 1, though, which doesn’t resonate with me, is a general lack of understanding of what mathematicians are actually trying to accomplish with “their equations”.

It is a common enough misconception to think of the quant as a guy with a bunch of tools but no understanding or creativity. I’ve complained about that before on this blog. But when it comes to professional mathematicians, presumably including his co-authors, a prominent scientist such as Wilson should realize that they are doing creative things inside the realm of mathematics simply for the sake of understanding mathematics.

Mathematicians, as a group, are not sitting around wishing someone could “make use of their equations.” For one thing, they often don’t even think about equations. And for another, they often think about abstract structures with no goal whatsoever of connecting them back to, say, how ants live in colonies. And that’s cool and beautiful too, and it’s not a failure of the system. That’s just math.

I’m not saying it wouldn’t be fun for mathematicians to spend more time thinking about applied science. I think it would be fun for them, actually. Moreover, as the next few years and decades unfold, we might very well see a large-scale shrinkage in math departments and basic research money, which could force the issue.

And, to be fair, there are probably some actual examples of mathy-statsy people who are thinking about equations that are supposed to relate to the real world but don’t. Those guys should learn to be better communicators and pair up with colleagues who have great data. In my experience, this is not a typical situation.

One last thing. The danger in ignoring the math yourself, if you’re a scientist, is that you probably aren’t that great at knowing the difference between someone who really knows math and someone who can throw around terminology. You can’t catch charlatans, in other words. And, given that scientists do need real math and statistics to do their research, this can be a huge problem if your work ends up being meaningless because your team got the math wrong.

Categories: modeling, news, statistics

Guest post by Julia Evans: How I got a data science job

This is a guest post by Julia Evans. Julia is a data scientist & programmer who lives in Montréal. She spends her free time these days playing with data and running events for women who program or want to — she just started a Montréal chapter of pyladies to teach programming, and co-organizes a monthly meetup called Montréal All-Girl Hack Night for women who are developers.

I asked mathbabe a question a few weeks ago saying that I’d recently started a data science job without having too much experience with statistics, and she asked me to write something about how I got the job. Needless to say I’m pretty honoured to be a guest blogger here :) Hopefully this will help someone!

Last March I decided that I wanted a job playing with data, since I’d been playing with datasets in my spare time for a while and I really liked it. I had a BSc in pure math, an MSc in theoretical computer science and about 6 months of work experience as a programmer developing websites. I’d taken one machine learning class and zero statistics classes.

In October, I left my web development job with some savings and no immediate plans to find a new job. I was thinking about doing freelance web development. Two weeks later, someone posted a job posting to my department mailing list looking for a “Junior Data Scientist”. I wrote back and said basically “I have a really strong math background and am a pretty good programmer”. This email included, embarrassingly, the sentence “I am amazing at math”. They said they’d like to interview me.

The interview was a lunch meeting. I found out that the company (Via Science) was opening a new office in my city, and was looking for people to be the first employees at the new office. They work with clients to make predictions based on their data.

My interviewer (now my manager) asked me about my role at my previous job (a little bit of everything — programming, system administration, etc.), my math background (lots of pure math, but no stats), and my experience with machine learning (one class, and drawing some graphs for fun). I was asked how I’d approach a digit recognition problem and I said “well, I’d see what people do to solve problems like that, and I’d try that”.
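In hindsight that answer really is how a first pass goes. Here’s a minimal sketch of the standard approach with scikit-learn; this is just my illustration, not something we wrote in the interview:

```python
# A minimal first pass at digit recognition with scikit-learn.
# Illustrative sketch only; the dataset and model are my choices here.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 1,797 labeled 8x8 images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=2000)  # a standard off-the-shelf baseline
model.fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
```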

I also talked about some data visualizations I’d worked on for fun. They were looking for someone who could take on new datasets and be independent and proactive about creating models, figuring out what the most useful thing to model is, and getting more information from clients.

I got a call back about a week after the lunch interview saying that they’d like to hire me. We talked a bit more about the work culture, starting dates, and salary, and then I accepted the offer.

So far I’ve been working here for about four months. I work with a machine learning system developed inside the company (there’s a paper about it here). I’ve spent most of my time working on code to interface with this system and make it easier for us to get results out of it quickly. I alternate between working on this system (using Java) and using Python (with the fabulous IPython Notebook) to quickly draw graphs and make models with scikit-learn to compare our results.
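To give a flavor of the Python side of that, here’s the kind of quick scikit-learn comparison I mean. The data and models below are stand-ins, not anything from work:

```python
# Quick model comparison of the sort you'd run in an IPython notebook.
# Synthetic data and arbitrary model choices; nothing here is client work.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=1)

candidates = [("ridge", Ridge(alpha=1.0)),
              ("random forest", RandomForestRegressor(n_estimators=200, random_state=1))]

for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```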

I like that I have real-world data (sometimes, lots of it!) where there’s not always a clear question or direction to go in. I get to spend time figuring out the relevant features of the data or what kinds of things we should be trying to model. I’m beginning to understand what people say about data-wrangling taking up most of their time. I’m learning some statistics, and we have a weekly Friday seminar series where we take turns talking about something we’ve learned in the last few weeks or introducing a piece of math that we want to use.

Overall I’m really happy to have a job where I get data and have to figure out what direction to take it in, and I’m learning a lot.

We don’t need more complicated models, we need to stop lying with our models

The financial crisis has given rise to a series of catastrophes related to mathematical modeling.

Time after time you hear people speaking in baffled terms about mathematical models that somehow didn’t warn us in time, that were too complicated to understand, and so on. If you have somehow missed such public displays of throwing the model (and quants) under the bus, stay tuned below for examples.

A common response to these problems is to call for those models to be revamped, to add features that will cover previously unforeseen issues, and generally speaking, to make them more complex.

For a person like myself, who gets paid to “fix the model,” it’s tempting to do just that, to assume the role of the hero who is going to set everything right with a few brilliant ideas and some excellent training data.

Unfortunately, reality is staring me in the face, and it’s telling me that we don’t need more complicated models.

If I go to the trouble of fixing up a model, say by adding counterparty risk considerations, then I’m implicitly assuming the problem with the existing models is that they’re being used honestly but aren’t mathematically up to the task.

But this is far from the case – most of the really enormous failures of models are explained by people lying. Before I give three examples of the “big models failing because someone is lying” phenomenon, let me add one more important thing.

Namely, if we replace okay models with more complicated models, as many people are suggesting we do, without first addressing the lying problem, it will only allow people to lie even more. This is because the complexity of a model itself is an obstacle to understanding its results, and more complex models allow more manipulation.

Example 1: Municipal Debt Models

Many municipalities are in shit tons of trouble with their muni debt. This is in part because of the big banks taking advantage of them, but it’s also in part because they often lie with models.

Specifically, they know what their obligations for pensions and school systems will be in the next few years, and in order to pay for all that, they use a model which estimates how well their savings will pay off in the market, or however they’ve invested their money. But they use vastly over-exaggerated numbers in these models, because that way they can minimize the amount of money to put into the pool each year. The result is that pension pools are being systematically and vastly under-funded.
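To see how much leverage the assumed return has, here’s a toy sinking-fund calculation, with made-up numbers, of the annual contribution needed to meet a fixed future obligation under different assumed rates of return:

```python
# Toy sinking-fund calculation: the annual contribution needed to cover a
# fixed future pension obligation, as a function of the assumed return.
# All numbers are invented for illustration.
def annual_contribution(obligation, years, assumed_return):
    r = assumed_return
    return obligation * r / ((1 + r) ** years - 1)

obligation = 1_000_000_000  # a $1B obligation due in 30 years
for r in (0.045, 0.06, 0.08):
    c = annual_contribution(obligation, 30, r)
    print(f"assumed return {r:.1%}: contribute ${c:,.0f} per year")

# Bumping the assumed return from 4.5% to 8% roughly halves the stated
# annual cost, which is exactly the temptation described above.
```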

Example 2: Wealth Management

I used to work at Riskmetrics, where I saw first-hand how people lie with risk models. But that’s not the only thing I worked on. I also helped out building an analytical wealth management product. This software was sold to banks, and was used by professional “wealth managers” to help people (usually rich people, but not mega-rich people) plan for retirement.

We had a bunch of bells and whistles in the software to impress the clients – Monte Carlo simulations, fancy optimization tools, and more. But in the end, the banks and their wealth managers put in their own market assumptions when they used it. Specifically, they put in the forecast market growth for stocks, bonds, alternative investing, etc., as well as the assumed volatility of those categories and indeed the entire covariance matrix representing how correlated the market constituents are to each other.

The result is this: no matter how honest I would try to be with my modeling, I had no way of preventing the model from being misused and misleading to the clients. And it was indeed misused: wealth managers put in absolutely ridiculous assumptions of fantastic returns with vanishingly small risk.
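As a rough illustration of why those inputs matter so much, here’s a toy Monte Carlo retirement projection under two sets of assumptions: a sober one and the kind of fantasy numbers I’m talking about. Everything is invented, and a real product takes a whole covariance matrix of asset classes, but one asset class makes the point:

```python
# Toy Monte Carlo retirement projection under two sets of market assumptions.
# All parameters are invented; a real product takes a whole covariance matrix.
import numpy as np

def final_wealth(mean_return, vol, start=500_000, annual_draw=30_000,
                 years=30, n_paths=20_000, seed=0):
    rng = np.random.default_rng(seed)
    wealth = np.full(n_paths, float(start))
    for _ in range(years):
        returns = rng.normal(mean_return, vol, n_paths)
        wealth = np.maximum(wealth * (1 + returns) - annual_draw, 0.0)
    return wealth

sober   = final_wealth(mean_return=0.05, vol=0.15)
fantasy = final_wealth(mean_return=0.09, vol=0.06)

for label, w in (("sober assumptions", sober), ("fantasy assumptions", fantasy)):
    print(f"{label}: ruin probability {(w == 0).mean():.1%}, "
          f"median final wealth ${np.median(w):,.0f}")
```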

Example 3: JP Morgan’s Whale Trade

I saved the best for last. JP Morgan’s $6.2 billion trading loss, the so-called “Whale Loss,” was recently investigated by a Senate Subcommittee. This is an excerpt (page 14) from the resulting report, which is well worth reading in full:

While the bank claimed that the whale trade losses were due, in part, to a failure to have the right risk limits in place, the Subcommittee investigation showed that the five risk limits already in effect were all breached for sustained periods of time during the first quarter of 2012. Bank managers knew about the breaches, but allowed them to continue, lifted the limits, or altered the risk measures after being told that the risk results were “too conservative,” not “sensible,” or “garbage.” Previously undisclosed evidence also showed that CIO personnel deliberately tried to lower the CIO’s risk results and, as a result, lower its capital requirements, not by reducing its risky assets, but by manipulating the mathematical models used to calculate its VaR, CRM, and RWA results. Equally disturbing is evidence that the OCC was regularly informed of the risk limit breaches and was notified in advance of the CIO VaR model change projected to drop the CIO’s VaR results by 44%, yet raised no concerns at the time.

I don’t think there could be a better argument explaining why new risk limits and better VaR models won’t help JPM or any other large bank. The manipulation of existing models is what’s really going on.
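To get a sense of how much room there is to tune a reported VaR number without touching a single position, here’s a toy historical VaR calculation run on the same simulated P&L history with different lookback windows. This isn’t the actual CIO model change, just an illustration of how much modeling choices alone can move the number:

```python
# Toy illustration: the same portfolio P&L history produces very different
# "99% VaR" numbers depending on the lookback window chosen. The P&L is
# simulated (a long volatile stretch followed by a recent calm stretch).
import numpy as np

rng = np.random.default_rng(42)
pnl = np.concatenate([rng.normal(0, 3.0, 400),   # volatile period (older)
                      rng.normal(0, 1.0, 104)])  # calm period (most recent)

def historical_var(pnl_history, lookback, level=0.99):
    window = pnl_history[-lookback:]
    return -np.quantile(window, 1 - level)  # report the loss as a positive number

for lookback in (60, 252, 504):
    print(f"99% one-day VaR with a {lookback}-day lookback: "
          f"{historical_var(pnl, lookback):.2f}")
```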

Just to be clear on the models and modelers as scapegoats, even in the face of the above report, please take a look at minute 1:35:00 of the C-SPAN coverage of  former CIO head Ina Drew’s testimony when she’s being grilled by Senator Carl Levin (hat tip Alan Lawhon, who also wrote about this issue here).

Ina Drew firmly shoves the quants under the bus, pretending to be surprised by the failures of the models even though, considering she’d been at JP Morgan for 30 years, she might know just a thing or two about how VaR can be manipulated. Why hasn’t Sarbanes-Oxley been used to put that woman in jail? She’s not even at JP Morgan anymore.

Stick around for a few minutes in the testimony after Levin’s done with Drew, because he’s on a roll and it’s awesome to watch.

Categories: finance, modeling, news, rant, statistics

Value-added model doesn’t find bad teachers, causes administrators to cheat

There’ve been a couple of articles in the past few days about teacher Value-Added Testing that have enraged me.

If you haven’t been paying attention, the Value-Added Model (VAM) is now being used in a majority of the states (source: the Economist):

[Figure: map of states using value-added models, via the Economist]

But it gives out nearly random numbers, as gleaned from looking at the same teachers with two scores (see this previous post). There’s a 24% correlation between the two numbers. Note that some people are awesome with respect to one score and complete shit on the other score:

[Scatterplot: the same teachers’ value-added scores in two consecutive years]
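If you want a feel for how weak a 24% year-over-year correlation is, it’s easy to simulate. The scores below are synthetic, not the actual VAM data:

```python
# Simulate paired "teacher scores" with a 0.24 year-over-year correlation
# and see how often a top-quartile teacher lands in the bottom quartile
# the next year. Synthetic data, not the actual VAM scores.
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.24
year1 = rng.normal(size=n)
year2 = rho * year1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

top_year1 = year1 > np.quantile(year1, 0.75)
bottom_year2 = year2 < np.quantile(year2, 0.25)

print("empirical correlation:", round(np.corrcoef(year1, year2)[0, 1], 3))
print("P(bottom quartile next year | top quartile this year):",
      round((top_year1 & bottom_year2).mean() / top_year1.mean(), 3))
# With zero signal that conditional probability would be exactly 0.25;
# the simulation shows how little a 0.24 correlation moves it.
```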

Final thing you need to know about the model: nobody really understands how it works. It relies on error terms of an error-riddled model. It’s opaque, and no teacher can have their score explained to them in Plain English.

Now, with that background, let’s look into these articles.

First, there’s this New York Times article from yesterday, entitled “Curious Grade for Teachers: Nearly All Pass”. The article describes how teachers are now judged using a (usually) 50/50 combination of classroom observations and VAM scores. This is a change from the past, when evaluations were based on classroom observations alone.

What they’ve found is that the percentage of teachers found “effective or better” has stayed high in spite of the new system – the numbers are all over the place but typically between 90 and 99 percent of teachers. In other words, the number of teachers that are fingered as truly terrible hasn’t gone up too much. What a fucking disaster, at least according to the NYTimes, which seems to go out of its way to make its readers understand how very much high school teachers suck.

A few things to say about this.

  1. Given that the VAM is nearly a random number generator, this is good news – it means they are not trusting the VAM scores blindly. Of course, it still doesn’t mean that the right teachers are getting fired, since half of the score is random.
  2. Another point the article mentions is that failing teachers are leaving before the reports come out. We don’t actually know how many teachers are affected by these scores.
  3. Anyway, what is the right number of teachers to fire each year, New York Times? And how did you choose that number? Oh wait, you quoted someone from the Brookings Institute: “It would be an unusual profession that at least 5 percent are not deemed ineffective.” Way to explain things so scientifically! It’s refreshing to know exactly how the army of McKinsey alums approach education reform.
  4. The overall article gives us the impression that if we were really going to do our job and “be tough on bad teachers,” then we’d weight the Value-Added Model way more. But instead we’re being pussies. Wonder what would happen if we weren’t pussies?

The second article explained just that. It also came from the New York Times (h/t Suresh Naidu), and it was the story of a School Chief in Atlanta who took the VAM scores very, very seriously.

What happened next? The teachers cheated wildly, changing the answers on their students’ tests. There was a big cover-up, lots of nasty political pressure, and a lot of good people feeling really bad, blah blah blah. But maybe we can take a step back and think about why this might have happened. Can we do that, New York Times? Maybe it had to do with the $500,000 in “performance bonuses” that the School Chief got for such awesome scores?

Let’s face it, this cheating scandal, and others like it (which may never come to light), was not hard to predict (as I explain in this post). In fact, as a predictive modeler, I’d argue that this cheating problem is the easiest thing to predict about the VAM, considering how it’s being used as an opaque mathematical weapon.

Guest Post SuperReview Part III of VI: The Occupy Handbook Part I and a little Part II: Where We Are Now

March 21, 2013 7 comments

Whattup.

Moving on from Lewis’ cute Bloomberg column reprint, we come to the next essay in the series:

The Widening Gyre: Inequality, Polarization, and the Crisis by Paul Krugman and Robin Wells

Indefatigable pair Paul Krugman and Robin Wells (KW hereafter) contribute one of the several original essays in the book, but the content ought to be familiar if you read the New York Times, know something about economics or practice finance. Paul Krugman is prolific, and it isn’t hard to be prolific when you have to rewrite essentially the same column every week; a question: are there other columnists who have been so consistently right yet have failed to propose anything that the polity would adopt? Political failure notwithstanding, Krugman leaves gems in every paragraph for the reader new to all this. The title “The Widening Gyre” comes from an apocalyptic William Butler Yeats poem. In this case, Krugman and Wells tackle the problem of why the government responded so poorly to the crisis. In their words:

By 2007, America was about as unequal as it had been on the eve of the Great Depression – and sure enough, just after hitting this milestone, we lunged into the worst slump since the Depression. This probably wasn’t a coincidence, although economists are still working on trying to understand the linkages between inequality and vulnerability to economic crisis.

Here, however, we want to focus on a different question: why has the response to crisis been so inadequate? Before financial crisis struck, we think it’s fair to say that most economists imagined that even if such a crisis were to happen, there would be a quick and effective policy response [editor's note: see Kautsky et al 2016 for a partial explanation]. In 2003 Robert Lucas, the Nobel laureate and then president of the American Economic Association, urged the profession to turn its attention away from recessions to issues of longer-term growth. Why? Because he declared, the “central problem of depression-prevention has been solved, for all practical purposes, and has in fact been solved for many decades.”

Famous last words from Professor Lucas. Nevertheless, the curious failure to apply what was once the conventional wisdom on a useful scale intrigues me for two reasons. First, most political scientists suggest that democracy, versus authoritarian system X, leads to better outcomes for two reasons.

1. Distributional – you get a nicer distribution of wealth (possibly more productivity for complicated macro reasons); economics suggests that since people are mostly envious and poor people have rapidly increasing utility in wealth, democracy’s tendency to share the wealth better maximizes some stupid social welfare criterion (typically, Kaldor-Hicks efficiency).

2. Information – democracy is a better information aggregation system than dictatorship and an expanded polity makes better decisions beyond allocation of produced resources. The polity must be capable of learning and intelligent OR vote randomly if uninformed for this to work. While this is the original rigorous justification for democracy (first formalized in the 1800s by French rationalists), almost no one who studies these issues today believes one-person one-vote democracy better aggregates information than all other systems at a national level. “Well Leon,” some knave comments, “we don’t live in a democracy, we live in a Republic with a president…so shouldn’t a small group of representatives better be able to make social-welfare maximizing decisions?” Short answer: strong no, and US Constitutionalism has some particularly nasty features when it comes to political decision-making.

Second, KW suggest that the presence of extreme wealth inequalities acts like a democracy-disabling virus at the national level. According to KW, extreme wealth inequalities perpetuate themselves in a way that undermines both “nice” features of a democracy when it comes to making regulatory and budget decisions.* Thus, to get better economic decision-making from our elected officials, a good intermediate step would be to make our tax system more progressive or expand Medicare or Social Security or…Well, we have a lot of good options here. Of course, for mathematically minded thinkers, this raises the following question: if we could enact so-called progressive economic policies to cure our political crisis, why haven’t we done so already? What can/must change for us to do so in the future? While I believe that the answer to this question is provided by another essay in the book, let’s take a closer look at KW’s explanation of how wealth inequality throws sand into the gears of our polity. They propose four explanations, and the numbering scheme below is mine:

1. The most likely explanation of the relationship between inequality and polarization is that the increased income and wealth of a small minority has, in effect bought the allegiance of a major political party…Needless to say, this is not an environment conducive to political action.

2. It seems likely that this persistence [of financial deregulation] despite repeated disasters had a lot to do with rising inequality, with the causation running in both directions. On the one side, the explosive growth of the financial sector was a major source of soaring incomes at the very top of the income distribution. On the other side, the fact that the very rich were the prime beneficiaries of deregulation meant that as this group gained power – simply because of its rising wealth – the push for deregulation intensified. These impacts of inequality on ideology did not [end] in 2008…[they] left us incapacitated in the face of crisis.

3. Conservatives have always seen [Keynesian economics] as the thin edge of the wedge: concede that the government can play a useful role in fighting slumps, and the next thing you know we’ll be living under socialism.

4. [Krugman paraphrasing Kalecki] Every widening of state activity is looked upon by business with suspicion, but the creation of employment by government spending has a special aspect which makes the opposition particularly intense. Under a laissez-faire system the level of employment depends to a great extent on the so-called state of confidence…. This gives capitalists a powerful indirect control over government policy: everything which may shake the state of confidence must be avoided because it would cause an economic crisis.

All of these are true to an extent. Two are related to the features of a particular policy position that conservatives don’t like (countercyclical spending) and their cost will dissipate if the economy improves. Isn’t it the case that most proponents and beneficiaries of financial liberalization are Democrats? (Wall Street mostly supported Obama in 08 and barely supported Romney in 12 despite Romney giving the house away). In any case, while KW aren’t big on solutions they certainly have a strong grasp of the problem.

Take a Stand: Sit In by Phillip Dray

As the railroad strike of 1877 had led eventually to expanded workers’ rights, so the Greensboro sit-in of February 1, 1960, helped pave the way for passage of the Civil Rights Act of 1964 and the Voting Rights Act of 1965. Both movements remind us that not all successful protests are explicit in their message and purpose; they rely instead on the participants’ intuitive sense of justice. [28]

I’m not the only author to have taken note of this passage as particularly important, but I am the only author who found the passage significant and did not start ranting about so-called “natural law.” Chronicling the (hitherto unknown-to-me) history of the Great Upheaval, Dray does a great job relating some important moments in left protest history to the OWS history. This is actually an extremely important essay and I haven’t given it the time it deserves. If you read three essays in this book, include this in your list.

Inequality and Intemperate Policy by Raghuram Rajan (no URL, you’ll have to buy the book)

Rajan’s basic ideas are the following: inequality has gotten out of control:

Deepening income inequality has been brought to the forefront of discussion in the United States. The discussion tends to center on the Croesus-like income of John Paulson, the hedge fund manager who made a killing in 2008 betting on a financial collapse and netted over $3 billion, about seventy-five-thousand times the average household income. Yet a more worrying, everyday phenomenon that confronts most Americans is the disparity in income growth rates between a manager at the local supermarket and the factory worker or office assistant. Since the 1970s, the wages of the former, typically workers at the ninetieth percentile of the wage distribution in the United States, have grown much faster than the wages of the latter, the typical median worker.

But American political ideologies typically rule out the most direct responses to inequality (i.e. redistribution). The result is a series of stop-gap measures that do long-run damage to the economy (as defined by sustainable and rising income levels and full employment), but temporarily boost the consumption level of lower classes:

It is not surprising, then, that a policy response to rising inequality in the United States in the 1990s and 2000s – whether carefully planned or chosen as the path of least resistance – was to encourage lending to households, especially but not exclusively low-income ones, with the government push given to housing credit just the most egregious example. The benefit – higher consumption – was immediate, whereas paying the inevitable bill could be postponed into the future. Indeed, consumption inequality did not grow nearly as much as income inequality before the crisis. The difference was bridged by debt. Cynical as it may seem, easy credit has been used as a palliative by successive administrations that have been unable to address the deeper anxieties of the middle class directly. As I argue in my book Fault Lines, “Let them eat credit” could well summarize the mantra of the political establishment in the go-go years before the crisis.

Why should you believe Raghuram Rajan? Because he’s one of the few guys who called the crisis ahead of time and tried to warn the Fed.

A solid essay providing a more direct link between income inequality and bad policy than KW do.

The 5 Percent by Michael Hiltzik

The 5 percent’s [consisting of the seven million Americans who, in 1934, were sixty-five and older] protests coalesced as the Townsend movement, launched by a sinewy midwestern farmer’s son and farm laborer turned California physician. Francis Townsend was a World War I veteran who had served in the Army Medical Corps. He had an ambitious, and impractical, plan for a federal pension program. Although during its heyday in the 1930s the movement failed to win enactment of its [editor's note: insane] program, it did play a critical role in contemporary politics. Before Townsend, America understood the destitution of its older generations only in abstract terms; Townsend’s movement made it tangible. “It is no small achievement to have opened the eyes of even a few million Americans to these facts,” Bruce Bliven, editor of the New Republic, observed. “If the Townsend Plan were to die tomorrow and be as completely forgotten as miniature golf, mah-jongg, or flinch [editor's note: everything old is new again], it would still have left some sedimented flood marks on the national consciousness.” Indeed, the Townsend movement became the catalyst for the New Deal’s signal achievement, the old-age program of Social Security. The history of its rise offers a lesson for the Occupy movement in how to convert grassroots enthusiasm into a potent political force – and a warning about the limitations of even a nationwide movement.

Does the author live up to the promises of this paragraph? Is the whole essay worth reading? Does FDR give in to the people’s demands and pass Social Security?!

Yes to all. Read it.

Hidden in Plain Sight by Gillian Tett (no URL, you’ll have to buy the book)

This is a great essay. I’m going to outsource the review and analysis to:

http://beyoubesure.com/2012/10/13/generation-lost-lazy-afraid/

because it basically sums up my thoughts. You all, go read it.

What Good is Wall Street? by John Cassidy

If you know nothing about Wall Street, then the essay is worth reading, otherwise skip it. There are two common ways to write a bad article in financial journalism. First, you can try to explain tiny index price movements via news articles from that day/week/month. “Shares in the S&P moved up on good news in Taiwan today,” that kind of nonsense. While the news and price movements might be worth knowing for their own sake, these articles are usually worthless because no journalist really knows who traded and why (theorists might point out even if the journalists did know who traded to generate the movement and why, it’s not clear these articles would add value – theorists are correct).

The other way, the Cassidy! way, is to ask some subgroup of American finance what they think about other subgroups in finance. High frequency traders think iBankers are dumb and overpaid, but HFT, on the other hand, provides an extremely valuable service – keeping ETFs cheap, providing liquidity, and keeping shares at the right level. iBankers think prop-traders add no value, but that without iBanking M&A services, American manufacturing/farmers/whatever would cease functioning. Low speed prop-traders think that HFT just extracts cash from dumb money, but prop-traders are the reddest-blooded American capitalists, taking the right risks and bringing knowledge into the markets. Insurance hates hedge funds, hedge funds hate the bulge bracket, the bulge bracket hates the ratings agencies, who hate insurance and on and on.

You can spit out dozens of articles about these catty and tedious rivalries (invariably claiming that financial sector X, rivals for institutional cash with Y, “adds no value”) and learn nothing about finance. Cassidy writes the article taking the iBankers side and surprises no one (this was originally published as an article in The New Yorker).

Your House as an ATM by Bethany McLean

Ms. McLean is immensely talented. It was always pretty obvious that the bottom twenty percent, i.e. the vast majority of subprime loan recipients, who are generally poor at planning, were using mortgages to get quick cash rather than to buy houses. Regulators and high finance, after resisting for a good twenty years, gave in for reasons explained in Rajan’s essay.

Against Political Capture by Daron Acemoglu and James A. Robinson (sorry I couldn’t find a URL, for this original essay you’ll have to buy the book).

A legit essay by a future Nobelist in Econ. Read it.

A Nation of Business Junkies by Arjun Appadurai

Anthro-hack Appadurai writes:

I first came to this country in 1967. I have been either a crypto-anthropologist or professional anthropologist for most of the intervening years. Still, because I came here with an interest in India and took the path of least resistance in choosing to retain India as my principal ethnographic referent, I have always been reluctant to offer opinions about life in these United States.

His instincts were correct. The essay reads like an old man complaining about how bad the weather is these days. Skip it.

Causes of Financial Crises Past and Present: The Role of This-Time-Is-Different Syndrome by Carmen M. Reinhart and Kenneth S. Rogoff

Editor Byrne has amazing powers of persuasion, or a lot of authors have had essays in the desk drawer that they were waiting for an opportunity to publish. In any case, Rogoff and Reinhart (RR hereafter) have summed up a couple hundred studies and two of their books in a single executive summary and given it to whoever buys The Occupy Handbook. Value. RR are Republicans and the essay appears to be written in good faith (unlike some people *cough* Tyler Cowen and Veronique de Rugy *cough*). RR do a great job discovering and presenting stylized facts about financial crises past and present. What to expect next? A couple of national defaults and maybe a hyperinflation or two.

Government As Tough Love by Robert Shiller as interviewed by Brandon Adams (buy the book)!

Shiller has always been ahead of the curve. In 1981, he wrote a cornerstone paper in behavioral finance at a time when the field was in its embryonic stages. In the early 1990s, he noticed insufficient attention was paid to real estate values, despite their overwhelming importance to personal wealth levels; this led him to create, along with Karl E. Case, the Case-Shiller index – now the Case-Shiller Home Price Indices. In March 2000**, Shiller published Irrational Exuberance, arguing that U.S. stocks were substantially overvalued and due for a tumble. [Editor's note: what Brandon Adams fails to mention, but what's surely relevant, is that Shiller also called the subprime bubble and re-released Irrational Exuberance in 2005 to sound the alarms a full three years before The Subprime Solution]. In 2008, he published The Subprime Solution, which detailed the origins of the housing crisis and suggested innovative policy responses for dealing with the fallout. These days, one of his primary interests is neuroeconomics, a field that relates economic decision-making to brain function as measured by fMRIs.

Shiller is basically a champ and you should listen to him.

Shiller was disappointed but not surprised when governments bailed out banks in extreme fashion while leaving the contracts between banks and homeowners unchanged. He said, of Hank Paulson, “As Treasury secretary, he presented himself in a very sober and collected way…he did some bailouts that benefited Goldman Sachs, among others. And I can imagine that they were well-meaning, but I don’t know that they were totally well-meaning, because the sense of self-interest is hard to clean out of your mind.”

Shiller understates everything.

Verdict: Read it.

And so, we close our discussion of part I. Moving on to part II:

In Ms. Byrne’s own words:

Part 2, “Where We Are Now,” which covers the present, both in the United States and abroad, opens with a piece by the anthropologist David Graeber. The world of Madison Avenue is far from the beliefs of Graeber, an anarchist, but it’s Graeber who arguably (he says he didn’t do it alone) came up with the phrase “We Are the 99 percent.” As Bloomberg Businessweek pointed out in October 2011, during month two of the Occupy encampments that Graeber helped initiate and three months after the publication of his Debt: The First 5,000 Years, “David Graeber likes to say that he had three goals for the year: promote his book, learn to drive, and launch a worldwide revolution. The first is going well, the second has proven challenging and the third is looking up.” Graeber’s counterpart in Chile can loosely be said to be Camila Vallejo, the college undergraduate, pictured on page 219, who, at twenty-three, brought the country to a standstill. The novelist and playwright Ariel Dorfman writes about her and about his own self-imposed exile from Chile, and his piece is followed by an entirely different, more quantitative treatment of the subject. This part of the book also covers the indignados in Spain, who, before Occupy began, “occupied” the public squares of Madrid and other cities – using, as the basis for their claim that the parks could legally be slept in, a thirteenth-century right granted to shepherds who moved, and still move, their flocks annually.

In other words, we’re in “Occupy is the hero we deserve, but not the hero we need” territory here.

*Addendum 1: Some have suggested that it’s not the wealth inequality that ought to be reduced, but the democratic elements of our system. California’s terrible decision-making resulting from its experiments with direct democracy notwithstanding, I would like to stay in the realm of the sane.

**Addendum 2: Yes, Shiller managed to get the book published the week before the crash. Talk about market timing.

Guest Post SuperReview Part II of VI: The Occupy Handbook Part I: How We Got Here

March 20, 2013 8 comments

Whatsup.

This is a review of Part I of The Occupy Handbook. Part I consists of twelve pieces ranging in quality from excellent to awful. But enough from me, in Janet Byrne’s own words:

Part 1, “How We Got Here,” takes a look at events that may be considered precursors of OWS: the stories of a brakeman in 1877 who went up against the railroads; of the four men from an all-black college in North Carolina who staged the first lunch counter sit-in of the 1960s; of the out-of-work doctor whose nationwide, bizarrely personal Townsend Club movement led to the passage of Social Security. We go back to the 1930s and the New Deal and, in Carmen M. Reinhart and Kenneth S. Rogoff‘s “nutshell” version of their book This Time Is Different: Eight Centuries of Financial Folly, even further.

Ms. Byrne did a bang-up job getting one Nobel Prize Winner in economics (Paul Krugman), two future Economics Nobel Prize winners (Robert Shiller, Daron Acemoglu) and two maybes (sorry Raghuram Rajan and Kenneth Rogoff) to contribute excellent essays to this section alone. Powerhouse financial journalists Gillian Tett, Michael Hiltzik, John Cassidy, Bethany McLean and the prolific Michael Lewis all drop important and poignant pieces into this section. Arrogant yet angry anthropologist Arjun Appadurai writes one of the worst essays I’ve ever had the misfortune of reading, and the ubiquitous Brandon Adams makes his first of many mediocre appearances interviewing Robert Shiller. Clocking in at 135 pages, this is the shortest section of the book, yet it varies the most in quality. You can skip Professor Appadurai and Cassidy’s essays, but the rest are worth reading.

Advice from the 1 Percent: Lever Up, Drop Out by Michael Lewis

Framed as a strategy memo circulated among one-percenters, Lewis’ satirical piece written after the clearing of Zucotti Park begins with a bang.

The rabble has been driven from the public parks. Our adversaries, now defined by the freaks and criminals among them, have demonstrated only that they have no idea what they are doing. They have failed to identify a single achievable goal.

Indeed, the absurd fixation on holding Zuccotti Park and refusal to issue demands because doing so “would validate the system” crippled Occupy Wall Street (OWS). So far OWS has had a single, but massive, success: it shifted the conversation back to the United States’ out-of-control wealth inequality and managed to do so in time for the election, sealing the deal on Romney. In this manner, OWS functioned as a holding action by the 99% in the interests of the 99%.

We have identified two looming threats: the first is the shifting relationship between ambitious young people and money. There’s a reason the Lower 99 currently lack leadership: anyone with the ability to organize large numbers of unsuccessful people has been diverted into Wall Street jobs, mainly in the analyst programs at Morgan Stanley and Goldman Sachs. Those jobs no longer exist, at least not in the quantities sufficient to distract an entire generation from examining the meaning of their lives. Our Wall Street friends, wounded and weakened, can no longer pick up the tab for sucking the idealism out of America’s youth. We on the committee are resigned to all elite universities becoming breeding grounds for insurrection, with the possible exception of Princeton.

Michael Lewis speaks from experience; he is a Princeton alum and a 1 percenter himself. More than that however, he is also a Wall Street alum from Salomon Brothers during the 1980s snafu and wrote about it in the original guide to Wall Street, Liar’s Poker. Perhaps because of his atypicality (and dash of solipsism), he does not have a strong handle on human(s) nature(s). By the time of his next column in Bloomberg, protests had broken out at Princeton.

Ultimately ineffectual, but still better than…

Lewis was right in the end, but more so than anyone sympathetic to the movement might like. OccupyPrinceton now consists of only two bloggers, one of whom has graduated and deleted all his work from an already quiet site, and another who is a senior this year. OccupyHarvard contains a single poorly written essay on the front page. Although OccupyNewHaven outlasted the original Occupation, Occupy Yale no longer exists. Occupy Dartmouth hasn’t been active for over a year, although it has a rather pathetic Twitter feed here. Occupy Cornell, Brown, Caltech, MIT and Columbia don’t exist, but some have active Facebook pages. Occupy Michigan State, Rutgers and NYU appear to have had active branches as recently as eight months ago, but have gone silent since. Functionally, Occupy Berkeley and its equivalents at UC Berkeley predate the Occupy movement and continue, but Occupy Stanford hasn’t been active for over a year. Anecdotally, I recall my friends expressing some skepticism that any cells of the Occupy movement still existed.

As for Lewis’ other points, I’m extremely skeptical about “examined lives” being undermined by Wall Street. As someone who started in math and slowly worked his way into finance, I can safely say that I’ve been excited by many of the computing, economic, and theoretical problems quants face in their day-to-day work, and I’m typical. I, and everyone who has lived long enough, know a handful of geniuses who have thought long and hard about the kinds of lives they want to lead and realized that A. there is no point to life unless you make one and B. making money is as good a point as any. I know one individual who, after working as a professional chemist prior to college, decided to, in his words, “fuck it and be an iBanker.” He’s an associate at DB. At elite schools, my friend’s decision is the rule rather than the exception: roughly half of Harvard will take jobs in finance and consulting (for finance) this year. Another friend, an exception, quit a promising career in operations research to travel the world as a pick-up artist. Could one really say that either the operations researcher or the chemist failed to examine their lives or that with further examinations they would have come up with something more “meaningful”?

One of the social hacks to give the lie to Lewis-style idealism-emerging-from-an-attempt-to-examine-one’s-life is to ask freshpeople at Ivy League schools what they’d like to do when they graduate and observe their choices four years later. The optimal solution for a sociopath just admitted to a top school might be to claim they’d like to do something in the Peace Corps, science, or volunteering for the social status. Then go on to work in academia, finance, law or tech, or marriage and household formation with someone who works in the former. This path is functionally similar to what many “average” elite college students will do, sociopathic or not. Lewis appears to be sincere in his misunderstanding of human(s) nature(s). In another book he reveals that he was surprised at the reaction to Liar’s Poker – most students who had read the book “treated it as a how-to manual” and cynically asked him for tips on how to land analyst jobs in the bulge bracket. It’s true that there might be some things money can’t buy, but an immensely pleasurable, meaningful life does not seem to be one of them. Today, for the vast majority of humans in the Western world, expectations of sufficient levels of cold hard cash are necessary conditions for happiness.

In short, and contra Lewis, little has changed. As of this moment, Occupy has proven so harmless to existing institutions that during her opening address Princeton University’s president Shirley Tilghman called on the freshmen in the class of 2016 to “Occupy” Princeton. No freshpeople have taken up her injunction. (Most?) parts of Occupy’s failure to make a lasting impact on college campuses appear to be structural; Occupy might not have succeeded even with better strategy. As the Ivy League became more and more meritocratic and better at discovering talent, many of the brilliant minds that would have fallen into the 99% and become its most effective advocates have been extracted and reached their so-called career potential, typically defined by income or status level. More meritocratic systems undermine instability by making the most talented individuals part of the class-to-be-overthrown, rather than the overthrowers of that system. In an even somewhat meritocratic system, minor injustices can be tolerated: Asians and poor rural whites are classes where there is obvious evidence of discrimination relative to “merit and the decision to apply” in elite gatekeeper college admissions (and thus, life outcomes generally) and neither group expresses revolutionary sentiment on a system-threatening scale, even as the latter group’s life expectancy has begun to decline from its already low levels. In the contemporary United States it appears that even as people’s expectations of material security evaporate, the mere possibility of wealth bolsters and helps to secure inequities in existing institutions.

Lewis continues:

Hence our committee’s conclusion: we must be able to quit American society altogether, and they must know it. The modern Greeks offer the example in the world today that is, the committee has determined, best in class. Ordinary Greeks seldom harass their rich, for the simple reason that they have no idea where to find them. To a member of the Greek Lower 99 a Greek Upper One is as good as invisible.

He pays no taxes, lives no place and bears no relationship to his fellow citizens. As the public expects nothing of him, he always meets, and sometimes even exceeds, their expectations. As a result, the chief concern of the ordinary Greek about the rich Greek is that he will cease to pay the occasional visit.

Michael Lewis is a wise man.

I can recall a conversation with one of my Professors; an expert on Democratic Kampuchea (American: Khmer Rouge), she explained that for a long time the identity of the oligarchy ruling the country was kept secret from its citizens. She identified this obvious subversion of republican principles (how can you have control over your future when you don’t even know who runs your region?) as a weakness of the regime. Au contraire, I suggested, once you realize your masters are not gods, but merely humans with human characteristics, that they: eat, sleep, think, dream, have sex, recreate, poop and die – all their mystique, their claims to superior knowledge divine or earthly are instantly undermined. De facto segregation has made upper classes in the nation more secure by allowing them to hide their day-to-day opulence from people who have lost their homes, job and medical care because of that opulence. Neuroscience will eventually reveal that being mysterious makes you appear more sexy, socially dominant, and powerful, thus making your claims to power and dominance more secure (Kautsky et. al. 2018).*

If the majority of Americans manage to recognize that our two-tiered legal system has created a class whose actual claim to the US’s immense wealth stems from, for the most part, a toxic combination of Congressional pork, regulatory and enforcement agency capture, and inheritance rather than merit, there will be hell to pay. Meanwhile, resentment continues to grow. Even on the extreme right one can now regularly read things like:

Now, I think I’d be downright happy to vote for the first politician to run on a policy of sending killer drones after every single banker who has received a post-2007 bonus from a bank that received bailout money. And I’m a freaking libertarian; imagine how those who support bombing Iraqi children because they hate us for our freedoms are going to react once they finally begin to grasp how badly they’ve been screwed over by the bankers. The irony is that a banker-assassination policy would be entirely constitutional according to the current administration; it is very easy to prove that the bankers are much more serious enemies of the state than al Qaeda. They’ve certainly done considerably more damage.

Wise financiers know when it’s time to cash in their chips and disappear. Rarely, they can even pull it off with class.

The rest of part I reviewed tomorrow. Hang in there people.

Addendum 1: If your comment amounts to something like “the Nobel Prize in Economics is actually called the The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel” and thus “not a real Nobel Prize” you are correct, yet I will still delete your comment and ban your IP.

*Addendum 2: More on this will come when we talk about the Saez-Delong discussion in part III.

Black Scholes and the normal distribution

There have been lots of comments and confusion, especially in this post, over what people in finance do or do not assume about how the markets work. I wanted to dispel some myths (at the risk of creating more).

First, there’s a big difference between quantitative trading and quantitative risk. And there may be a bunch of other categories that also exist, but I’ve only worked in those two arenas.

Markets are not efficient

In quantitative trading, nobody really thinks that “markets are efficient.” That’s kind of ridiculous, since then what would be the point of trying to make money through trading? We essentially make money because they aren’t. But of course that’s not to say they are entirely inefficient. Some approaches to removing inefficiency, and some markets, are easier than others. There can be entire markets that are so old and well-combed-over that the inefficiencies (that people have thought of) have been more or less removed and so, to make money, you have to be more thoughtful. A better way to say this is that the inefficiencies that are left are smaller than the transaction costs that would be required to remove them.

It’s not clear where “removing inefficiency” ends and where a different kind of trading begins, by the way. In some sense all algorithmic trades that work for any amount of time can be thought of as removing inefficiency, but then it becomes a useless concept.

Also, you can see from the above that traders have a vested interest to introduce new kinds of markets to the system, because new markets have new inefficiencies that can be picked off.

This kind of trading is very specific to a certain kind of time horizon as well. Traders and their algorithms typically want to make money in the average year. If there’s an inefficiency with a time horizon of 30 years it may still exist but few people are patient enough for it (I should add that we also probably don’t have good enough evidence that they’d work, considering how quickly the markets change). Indeed the average quant shop is going in the opposite direction, of high speed trading, for that very reason, to find the time horizon at which there are still obvious inefficiencies.

Black-Scholes

A long long time ago, before Black Monday in 1987, people didn’t know how to price options. Then Black-Scholes came out and traders started using the Black-Scholes (BS) formula and it worked pretty well, until Black Monday came along and people suddenly realized the assumptions in BS were ridiculous. Ever since then people have adjusted the BS formula. Everyone.

There are lots of ways to think about how to adjust the formula, but a very common one is through the volatility smile. This allows us to remove the BS assumption of constant volatility (of the underlying stock) and replace it with whatever inferred volatility is actually traded on in the market for that strike price and that maturity. As this commenter mentioned, the BS formula is still used here as a convenient reference to do this calculation.  If you extend your consideration to any maturity and any strike price (for the same underlying stock or thingy) then you get a volatility surface by the same reasoning.
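To make the “convenient reference” point concrete, here’s roughly what backing an implied volatility out of a quoted option price looks like: the textbook Black-Scholes formula plus a one-dimensional root-finder, with made-up numbers.

```python
# Back an implied volatility out of a quoted call price by inverting the
# textbook Black-Scholes formula. Repeating this across strikes and
# maturities is what traces out the smile / surface described above.
from math import exp, log, sqrt
from scipy.optimize import brentq
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r):
    # Find the sigma at which the Black-Scholes price matches the quote.
    return brentq(lambda sigma: bs_call(S, K, T, r, sigma) - price, 1e-6, 5.0)

# Made-up example: a one-year call struck 10% out of the money, quoted at 4.50.
print(round(implied_vol(price=4.50, S=100.0, K=110.0, T=1.0, r=0.01), 4))
```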

Two things to mention. First, you can think of the volatility smile/ surface as adjusting the assumption of constant volatility, but you can also ascribe to it an adjustment of the assumption of a normal distribution of the underlying stock. There’s really no way to extricate those two assumptions, but you can convince yourself of this by a thought experiment: if the volatility stays fixed but the presumed shape of the distribution of the stocks gets fatter-tailed, for example, then option prices (for options that are far from the current price) will change, which will in turn change the implied volatility according to the market (i.e. the smile will deepen). In other words, the smile adjusts for more than one assumption.

The other thing to mention: although we’ve done a relatively good job adjusting to market reality when pricing an option, when we apply our current risk measures like Value-at-Risk (VaR) to options, we still assume a normal distribution of risk factors (one of the risk factors, if we were pricing options, would be the implied volatility). So in other words, we might have a pretty good view of current prices, but it’s not at all clear we know how to make reasonable scenarios of future pricing shifts.

Ultimately, this assumption of normal distributions of risk factors in calculating VaR is actually pretty important in terms of our view of systemic risks. We do it out of computational convenience, by the way. That and because when we use fatter-tailed assumptions, people don’t like the answer.
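For a toy sense of what the normality assumption costs, compare the VaR you’d report under a normal model with the VaR of returns that have the same volatility but fatter tails (a Student-t here, purely for illustration):

```python
# Compare the VaR reported under a normal assumption with the VaR of
# returns that have the same volatility but fat tails (Student-t, df=4).
# Toy single risk factor, not a production risk engine.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
daily_vol = 0.01  # 1% daily volatility in both worlds

t_returns = rng.standard_t(df=4, size=1_000_000)
t_returns *= daily_vol / t_returns.std()   # rescale to the same volatility

for level in (0.99, 0.999):
    normal_var = -norm.ppf(1 - level) * daily_vol
    fat_var = -np.quantile(t_returns, 1 - level)
    print(f"{level:.1%} VaR: normal {normal_var:.4f} vs fat-tailed {fat_var:.4f}")
# The two agree in the middle of the distribution and diverge exactly
# where a risk measure is supposed to be paying attention.
```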

Categories: finance, modeling, statistics

Team Turnstile: how do NYC neighborhoods recover from extreme weather events?

I wanted to give you the low-down on a data hackathon I participated in this weekend, which was sponsored by the NYU Institute for Public Knowledge on the topic of climate change and social information. We were assigned teams and given a very broad mandate. We had only 24 hours to do the work, so it had to be simple.

Our team consisted of Venky Kannan, Tom Levine, Eric Schles, Aaron Schumacher, Laura Noren, Stephen Fybish, and me.

We decided to think about the effects of super storms on different neighborhoods. In particular, to measure the recovery time of the subway ridership in various neighborhoods using census information. Our project was inspired by this “nofarehikes” map of New York which tries to measure the impact of a fare hike on the different parts of New York. Here’s a copy of our final slides.

Also, it’s not directly related to climate change, but rather rests on the assumption that with climate change comes more frequent extreme weather events, which seems to be an existing myth (please tell me if the evidence is or isn’t there for that myth).

We used three data sets: subway ridership by turnstile, which only exists since May 2010, the census of 2010 (which is kind of out of date but things don’t change that quickly) and daily weather observations from NOAA.

Using the weather map and relying on some formal definitions while making up some others, we came up with a timeline of extreme weather events:

[Figure: timeline of extreme weather events]

Then we looked at subway daily ridership to see the effect of the storms or the recovery from the storms:

[Figure: daily subway ridership around the storm events]

We broke it down to individual stations. Here’s a closeup around Sandy:

[Figure: station-level daily ridership around Sandy]

Then we used the census tracts to understand wealth in New York:

[Figure: census-tract map of wealth in New York]

And of course we had to know which subway stations were in which census tracts. This isn’t perfect because we didn’t have time to assign “empty” census tracts to some nearby subway station. There are on the order of 2,000 census tracts but only on the order of 800 subway stations. But again, 24 hours isn’t a lot of time, even to build clustering algorithms.
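The matching itself is a standard nearest-neighbor problem once you have coordinates. Something like the sketch below, with made-up coordinates standing in for the real tract centroids and station locations, is about the level of sophistication 24 hours allows:

```python
# Assign each census tract to its nearest subway station by centroid
# distance. Coordinates here are randomly generated stand-ins; with real
# data you'd use tract centroids from the census shapefiles and the MTA
# station list (and ideally a proper geodesic distance, not Euclidean).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
station_coords = rng.uniform(size=(800, 2))   # roughly 800 stations
tract_coords = rng.uniform(size=(2000, 2))    # roughly 2,000 census tracts

tree = cKDTree(station_coords)
distances, nearest_station = tree.query(tract_coords)
print("tract 0 is closest to station", int(nearest_station[0]),
      "at distance", round(float(distances[0]), 3))
```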

Finally, we attempted to put the data together to measure which neighborhoods have longer-than-expected recovery times after extreme weather events. This is our picture:

[Figure: neighborhoods ranked by recovery time after extreme weather events]

Interestingly, it looks like the neighborhoods of Manhattan are most impacted by severe weather events, which is not in line with our prior [Update: I don't think we actually computed the impact on a given resident, but rather just the overall change in rate of ridership versus normal. An impact analysis would take into account the relative wealth of the neighborhoods and would probably look very different].
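For what it’s worth, one simple way to operationalize “recovery time,” and this is an illustrative sketch rather than exactly what we coded up that weekend, is the number of days it takes daily ridership to climb back to within some tolerance of its pre-storm baseline:

```python
# Days until daily ridership returns to within 10% of its pre-storm
# baseline. Illustrative sketch with a fake series, not the hackathon code.
import numpy as np
import pandas as pd

def recovery_days(daily_ridership, storm_date, baseline_days=28, tol=0.10):
    baseline = daily_ridership.loc[:storm_date].tail(baseline_days).mean()
    after = daily_ridership.loc[storm_date:]
    recovered = after[after >= (1 - tol) * baseline]
    if recovered.empty:
        return np.nan
    return (recovered.index[0] - pd.Timestamp(storm_date)).days

# Fake station: steady ridership, a crash at the storm, then a slow climb back.
days = pd.date_range("2012-10-01", periods=60, freq="D")
values = np.concatenate([np.full(28, 100_000), np.linspace(10_000, 100_000, 32)])
station = pd.Series(values, index=days)
print(recovery_days(station, "2012-10-29"))  # prints 27 for this fake series
```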

There are tons of caveats, I’ll mention only a few here:

  • We didn’t have time to measure the extent to which the recovery time took longer because the subway stopped running versus other reasons people might not use the subway. But our data is good enough to do this.
  • Our data might have been overwhelmingly biased by Sandy. We’d really like to do this with much longer-term data, but the granular subway ridership data has not been available for long. But the good news is we can do this from now on.
  • We didn’t have bus data at the same level, which is a huge part of whether someone can get to work, especially in the outer boroughs. This would have been great and would have given us a clearer picture.
  • When someone can’t get to work, do they take a car service? How much does that cost? We’d love to have gotten our hands on the alternative ways people got to work and how that would impact them.
  • In general we’d have liked to measure the impact relative to each neighborhood’s median salary.
  • We would also have loved to have measured the extent to which each neighborhood consisted of salary versus hourly wage earners to further understand how a loss of transportation would translate into an impact on income.

Unintended Consequences of Journal Ranking

I just read this paper, written by Björn Brembs and Marcus Munafò and entitled “Deep Impact: Unintended consequences of journal rank”. It was recently posted on the Computer Science arXiv (h/t Jordan Ellenberg).

I’ll give you a rundown on what it says, but first I want to applaud the fact that it was written in the first place. We need more studies like this, which examine the feedback loop of modeling at a societal level. Indeed this should be an emerging scientific or statistical field of study in its own right, considering how many models are being set up and deployed on the general public.

Here’s the abstract:

Much has been said about the increasing bureaucracy in science, stifling innovation, hampering the creativity of researchers and incentivizing misconduct, even outright fraud. Many anecdotes have been recounted, observations described and conclusions drawn about the negative impact of impact assessment on scientists and science. However, few of these accounts have drawn their conclusions from data, and those that have typically relied on a few studies. In this review, we present the most recent and pertinent data on the consequences that our current scholarly communication system has had on various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings and retractions). These data confirm previous suspicions: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether, in favor of a library-based scholarly communication system, will ultimately be necessary. This new system will use modern information technology to vastly improve the filter, sort and discovery function of the current journal system.

The key points in the paper are as follows:

  • There’s a growing importance of science and trust in science
  • There’s also a growing rate (x20 from 2000 to 2010) of retractions, with scientific misconduct cases growing even faster to become the majority of retractions (to an overall rate of 0.02% of published papers)
  • There’s a larger and growing “publication bias” problem – in other words, an increasing unreliability of published findings
  • One problem: initial “strong effects” get published in high-ranking journals, but subsequent “weak results” (which are probably more reasonable) are published in low-ranking journals
  • The formal “Impact Factor” (IF) metric, defined in more detail below, is highly correlated with “journal rank”.
  • There’s a higher incidence of retraction in high-ranking (measured through “high IF”) journals.
  • “A meta-analysis of genetic association studies provides evidence that the extent to which a study over-estimates the likely true effect size is positively correlated with the IF of the journal in which it is published”
  • Can the higher incidence of retraction in high-ranking journals be explained by the higher visibility of those journals? They think not: journal rank is a bad predictor of future citations, for example. [mathbabe inserts her opinion: this part needs more argument.]
  • “…only the most highly selective journals such as Nature and Science come out ahead over unselective preprint repositories such as ArXiv and RePEc”
  • Are there other measures of excellence that would correlate with IF? Methodological soundness? Reproducibility? No: “In fact, the level of reproducibility was so low that no relationship between journal rank and reproducibility could be detected.”
  • More about Impact Factor: The IF is a metric for the number of citations to articles in a journal (the numerator), normalized by the number of articles in that journal (the denominator). Sounds good! But:
  • For a given journal, the IF is not simply calculated but negotiated – the publisher can (and does) get certain articles excluded from the denominator while citations to them still count in the numerator. Even retroactively!
  • The IF is also not reproducible – errors are found and left unexplained.
  • Finally, IF is likely skewed by the fat-tailedness of citations (certain articles get lots, most get few). Wouldn’t a more robust measure be given by the median?
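
To illustrate that last point, here's a quick toy sketch in Python (mine, not the authors'; the lognormal parameters are completely invented) showing how a fat-tailed citation distribution drags the mean, which is basically what the IF reports, way above the median:

import random
import statistics

random.seed(0)

# Toy citation counts for 200 articles in one journal's citation window,
# drawn from a fat-tailed lognormal distribution (parameters are invented).
citations = [int(random.lognormvariate(1.0, 1.5)) for _ in range(200)]

mean_citations = statistics.mean(citations)      # roughly what the IF reports
median_citations = statistics.median(citations)  # what a typical article gets
top_decile_share = sum(sorted(citations)[-20:]) / sum(citations)

print(f"mean (IF-style): {mean_citations:.1f}")
print(f"median:          {median_citations:.1f}")
print(f"top 10% of articles account for {top_decile_share:.0%} of all citations")

With tails like that, a handful of blockbuster papers accounts for most of the citations, so the mean tells you very little about what a typical article in the journal gets, which is exactly why the median looks like the more robust choice.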

Conclusion

  1. Journal rank is a weak to moderate predictor of scientific impact
  2. Journal rank is a moderate to strong predictor of both intentional and unintentional scientific unreliability
  3. Journal rank is expensive, delays science and frustrates researchers
  4. Journal rank as established by IF violates even the most basic scientific standards, but predicts subjective judgments of journal quality

Long-term Consequences

  • “IF generates an illusion of exclusivity and prestige based on an assumption that it will predict subsequent impact, which is not supported by empirical data.”
  • “Systemic pressures on the author, rather than increased scrutiny on the part of the reader, inflate the unreliability of much scientific research. Without reform of our publication system, the incentives associated with increased pressure to publish in high-ranking journals will continue to encourage scientists to be less cautious in their conclusions (or worse), in an attempt to market their research to the top journals.”
  • “It is conceivable that, for the last few decades, research institutions world-wide may have been hiring and promoting scientists who excel at marketing their work to top journals, but who are not necessarily equally good at conducting their research. Conversely, these institutions may have purged excellent scientists from their ranks, whose marketing skills did not meet institutional requirements. If this interpretation of the data is correct, we now have a generation of excellent marketers (possibly, but not necessarily also excellent scientists) as the leading figures of the scientific enterprise, constituting another potentially major contributing factor to the rise in retractions. This generation is now in charge of training the next generation of scientists, with all the foreseeable consequences for the reliability of scientific publications in the future.”

The authors suggest that we need a new kind of publishing platform. I wonder what they’d think of the Episciences Project.
