Analyzing the complexity of the Stacks Project graphs

So yesterday I told you about the cool new visualizations now available on Johan’s Stacks Project.

But how do we use these visualizations to infer something about either mathematics or, at the very least, the way we think about mathematics? Here’s one way we thought of with Pieter.

So, there’s a bunch of results, and each of them has its own subgraph of the entire graph which positions that result as the “base node” and shows all the other results which it logically depends on.

And each of those graphs has structure and attributes, the stupidest two of which are just the counts of the nodes and edges. So for each result, we have an ordered pair (#nodes, #edges). What can we infer about mathematics from these pairs?
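
Here’s a minimal sketch of how such a pair could be computed, assuming the direct references are stored as a dictionary from each tag to the tags cited in its proof. The name `direct_refs`, the function name, and the tags are all made up for illustration; this is not the Stacks Project’s own code or data.

```python
from collections import deque


def dependency_subgraph_counts(direct_refs, root):
    """Return (#nodes, #edges) of the subgraph of results that `root` depends on.

    `direct_refs` maps a tag to the tags cited directly in its proof
    (a hypothetical representation, assumed for this sketch).
    """
    seen = {root}
    queue = deque([root])
    edges = 0
    while queue:
        tag = queue.popleft()
        # Each reachable result is dequeued once, so each direct reference
        # inside the subgraph is counted exactly once.
        for ref in set(direct_refs.get(tag, ())):
            edges += 1
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return len(seen), edges


# Made-up toy graph: a theorem, two lemmas, and a definition.
direct_refs = {
    "T1": ["L1", "L2"],
    "L1": ["D1"],
    "L2": ["D1", "L1"],
    "D1": [],
}
pairs = {tag: dependency_subgraph_counts(direct_refs, tag) for tag in direct_refs}
print(pairs)
```

In this toy graph the definition `D1` comes out as (1, 0), which is the kind of point that shows up for results with nothing to prove.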

Here’s a scatter plot of the nodes-vs-edges for each of the 10,445 results (email me if you want to play with this data yourself):

nodes_vs_edges_stacks_project

I also put a best-fit line in, just to illustrate that the scatter plot is super close to linear but not perfectly linear.
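
For anyone who wants to try this on their own pairs, here’s a rough sketch of an ordinary least-squares fit plus a crude measure of how tightly the points hug the line. The numbers in `toy_pairs` are invented and are not the Stacks Project data, and the function names are mine.

```python
def best_fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def max_residual(points, slope, intercept):
    """Largest vertical distance between any point and the fitted line."""
    return max(abs(y - (slope * x + intercept)) for x, y in points)


# Invented (#nodes, #edges) pairs, for illustration only.
toy_pairs = [(1, 0), (2, 1), (3, 3), (4, 5), (10, 16)]
slope, intercept = best_fit_line(toy_pairs)
print(slope, intercept, max_residual(toy_pairs, slope, intercept))
```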

So there are a bunch of comments I can make about this, but I’ll limit myself to the following:

  1. There are a lot of points at (1,0), corresponding to remarks, axioms, beginning lemmas, definitions, and tags for sections.
  2. As a data person, let me just say that data is never this clean. There’s something going on, some internal structure to these graphs that we should try to understand.
  3. By “clean” I’m not exactly referring to the fact that things look pretty linear, although that’s weird and we should think about that. What I really mean is that things are so close to the curve that is being approximated. They’re all within a very tight border of this imaginary line. It’s super amazing.
  4. Let’s pretend it’s just plain straight. Does that make sense, that as graphs get more complex the edges don’t get more dense than some multiple (1.86) of the number of nodes?
  5. Kind of: remember, we don’t depict all logical dependency edges, just the ones that are directly referred to in the proof of a result. So right off the bat you are less surprised that the edges aren’t growing quadratically in the number of nodes, even though the number of possible edges is of course quadratic in the number of nodes.
  6. Think about it this way: assume that every result that requires proof (so, that’s not a (1,0) result) refers to exactly 2 other results in its proof. Then those two child results each correspond to some subgraph of the entire graph, and say their subgraphs each have something like twice as many edges as nodes. Then, ignoring overlap, we’d see two graphs with a 2:1 ratio, then we’d see that parent node, plus two edges, one leading to each child result, which is also a 2:1 ratio, and the disjoint union of all those graphs gives us a large graph with a 2:1 ratio.
  7. Then if you imagine now allowing the overlap, the ratio goes down a bit on average. In this toy model, the discrepancy between 2.0 and the slope we actually see, 1.86, is a measurement of the collapse of the two child graphs, which can be taken as a proxy for how much the two supporting results overlap as notions.
  8. Of course, not every result has exactly two children.
  9. Plus it doesn’t really explain how ridiculously consistent the plot above is. What would?
  10. If you think about it, the only real explanation of the consistency above is my husband’s brain.
  11. In other words, he’s humming along, thinking about stacks, and at some point, when he thinks things have gotten complicated enough, he says to himself “It’s time to wrap this stuff up and call it a result!” and then he does so. That moment, when he’s decided things are getting complicated enough, is very consistent internally to his brain.
  12. In other words, if someone else created the stacks project, I’d expect to see another kind of plot, possibly also very consistent, but possibly with a different slope.
  13. Also it’d be interesting to compare this plot to another kind of citation network graph, like the papers in the arXiv. Has anyone made that?
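
The toy model in items 6 and 7 can also be checked by direct counting. Here’s a minimal sketch, assuming a small made-up graph (the tags A through E are hypothetical) in which every result with a proof cites exactly two others. In that setup each such result contributes exactly two edges, so #edges = 2 × (#nodes − #(1,0) results).

```python
def count_nodes_edges_leaves(direct_refs):
    """Count nodes, direct-reference edges, and (1,0) results in a graph.

    `direct_refs` maps each tag to the tags cited in its proof
    (a hypothetical representation, assumed for this sketch).
    """
    nodes = set(direct_refs)
    for refs in direct_refs.values():
        nodes.update(refs)
    edges = sum(len(set(refs)) for refs in direct_refs.values())
    # A (1,0) result is one that cites nothing.
    leaves = sum(1 for tag in nodes if not direct_refs.get(tag))
    return len(nodes), edges, leaves


# Made-up graph: B and C both rely on the same two basic results D and E.
toy = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}
n, e, leaves = count_nodes_edges_leaves(toy)
ratio = e / n
print(n, e, leaves, ratio)
```

Read this way, the edge-to-node ratio in the toy model is 2 × (1 − share of (1,0) results), so a slope of 1.86 would go with about 7% of the nodes being (1,0) results. That’s just the toy model talking, not a claim about the real data, where results don’t all cite exactly two others.
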
Categories: math, modeling

The Stacks Project gets ever awesomer with new viz

Crossposted on Not Even Wrong.

Here’s a completely biased interview I did with my husband A. Johan de Jong, who has been working with Pieter Belmans on a very cool online math project using d3js. I even made up some of his answers (with his approval).

Q: What is the Stacks Project?

A: It’s an open source textbook and reference for my field, which is algebraic geometry. It builds foundations starting from elementary college algebra and going up to algebraic stacks. It’s a self-contained exposition of all the material there, which makes it different from a research textbook or the experience you’d have reading a bunch of papers.

We were quite neurotic setting it up – everything has a proof, other results are referenced explicitly, and it’s strictly linear, which is to say there’s a strict ordering of the text so that all references are always to earlier results.

Of course the field itself has different directions, some of which are represented in the stacks project, but we had to choose a way of presenting it which allowed for this idea of linearity (of course, any mathematician thinks we can do that for all of mathematics).

Q: How has the Stacks Project website changed?

A: It started out as just a place you could download the pdf and tex files, but then Pieter Belmans came on board and he added features such as full text search, tag look-up, and a commenting system. In this latest version, we’ve added a whole bunch of features, but the most interesting one is the dynamic generation of dependency graphs.

We’ve had some crude visualizations for a while, and we made t-shirts from those pictures. I even had this deal where, if people found mathematical mistakes in the Stacks Project, they’d get a free t-shirt, and I’m happy to report that I just last week gave away my last t-shirt. Here’s an old picture of me with my adorable son (who’s now huge).

tshirt_stacks

Q: Talk a little bit about the new viz.

A: First a word about the tags, which we need to understand the viz.

Every mathematical result in the Stacks Project has a “tag”, which is a four-character code, and which is a permanent reference for that result, even as other results are added before or after that one (by the way, Cathy O’Neil figured this system out).

The graphs show the logical dependencies between these tags, represented by arrows between nodes. You can see this structure in the above picture already.

So for example, if tag ABCD refers to Zariski’s Main Theorem, and tag ADFG refers to Nakayama’s Lemma, then since Zariski depends on Nakayama, there’s a logical dependency, which means the node labeled ABCD points to the node labeled ADFG in the entire graph.

Of course, we don’t really look at the entire graph, we look at the subgraph of results which a given result depends on. And we don’t draw all the arrows either, we only draw the arrows corresponding to direct references in the proofs. Which is to say, in the subgraph for Zariski, there will be a path from node ABCD to node ADFG, but not necessarily a direct link.
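
The difference between a direct link and a path can be sketched in a few lines. This assumes the same kind of hypothetical dictionary of direct references; ABCD and ADFG are the placeholder tags from the example above, and WXYZ is an invented intermediate result.

```python
def depends_on(direct_refs, source, target):
    """True if some chain of direct references leads from source to target."""
    stack = [source]
    seen = {source}
    while stack:
        tag = stack.pop()
        for ref in direct_refs.get(tag, ()):
            if ref == target:
                return True
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return False


# Invented example: ABCD cites WXYZ in its proof, and WXYZ cites ADFG.
direct_refs = {"ABCD": ["WXYZ"], "WXYZ": ["ADFG"], "ADFG": []}

has_direct_link = "ADFG" in direct_refs["ABCD"]      # an arrow would be drawn
has_path = depends_on(direct_refs, "ABCD", "ADFG")   # a logical dependency exists
print(has_direct_link, has_path)
```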

Q: Can we see an example?

A: Let’s move to an example for result 01WC, which refers to the proof that “a locally projective morphism is proper”.

First, there are two kinds of heat maps. Here’s one that defines distance as the maximum (directed) distance from the root node. In other words, how far down in the proof is this result needed? In this case the main result 01WC is bright red with a black dotted border, and any result that 01WC depends on is represented as a node. The edges are directed, although the arrows aren’t drawn, but you can figure out the direction by how the color changes. The dark blue colors are the leaf nodes that are farthest away from the root.

Another way of saying this is that the redder results are the results that are closer to it in meaning and sophistication level.

Note if we had defined the distance as the minimum distance from the root node (to come soon hopefully), then we’d have a slightly different and also meaningful way of thinking about “redness” as “relevance” to the root node.
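
To make the two notions of distance concrete, here’s a rough sketch that computes both on a made-up graph. It assumes the graph has no cycles (which the strict ordering of the text guarantees) and uses a hypothetical dictionary of direct references; it is not the code behind the actual visualizations.

```python
from collections import deque


def min_distance_from_root(direct_refs, root):
    """Fewest direct references needed to get from root to each result (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        tag = queue.popleft()
        for ref in direct_refs.get(tag, ()):
            if ref not in dist:
                dist[ref] = dist[tag] + 1
                queue.append(ref)
    return dist


def max_distance_from_root(direct_refs, root):
    """Longest chain of direct references from root to each result.

    Assumes an acyclic graph. Uses recursion, so a very deep graph
    would need an iterative version.
    """
    order, seen = [], set()

    def visit(tag):
        seen.add(tag)
        for ref in direct_refs.get(tag, ()):
            if ref not in seen:
                visit(ref)
        order.append(tag)  # appended only after everything it cites

    visit(root)
    dist = {tag: 0 for tag in order}
    # Reversed post-order puts every result before the results it cites.
    for tag in reversed(order):
        for ref in direct_refs.get(tag, ()):
            dist[ref] = max(dist[ref], dist[tag] + 1)
    return dist


# Invented graph: R cites A and B, A cites B, B cites C.
toy_refs = {"R": ["A", "B"], "A": ["B"], "B": ["C"], "C": []}
print(min_distance_from_root(toy_refs, "R"))
print(max_distance_from_root(toy_refs, "R"))
```

In the toy graph, B is one step from the root by the minimum metric (R cites it directly) but two steps by the maximum metric (via A), which is the kind of difference the two colorings would show.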

This is a screenshot but feel free to play with it directly here. For all of the graphs, hovering over a result will cause the statement of the result to appear, which is awesome.

Stacks Project — Force (depth) 01WC

Next, let’s look at another kind of heat map where the color is defined as maximum distance from some leaf node in the overall graph. So dark blue nodes are basic results in algebra, sheaves, sites, cohomology, simplicial methods, and other chapters. The link is the same, you can just toggle between the different metrics.

Stacks Project — Force (height) 01WC
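
The “height” metric can be sketched the same way: for each result, take the longest chain of direct references leading down to a result that cites nothing. This is a minimal sketch on an invented graph, assuming no cycles; the function name and data layout are my own assumptions.

```python
from functools import lru_cache


def heights(direct_refs):
    """Map each tag to the length of its longest chain down to a leaf.

    Assumes an acyclic graph in which every tag appears as a key.
    """
    @lru_cache(maxsize=None)
    def height(tag):
        refs = direct_refs.get(tag, ())
        if not refs:
            return 0  # a leaf: nothing is cited
        return 1 + max(height(ref) for ref in refs)

    return {tag: height(tag) for tag in direct_refs}


# Invented graph: R cites A and B, A cites B, B cites C.
toy_refs = {"R": ["A", "B"], "A": ["B"], "B": ["C"], "C": []}
toy_heights = heights(toy_refs)
print(toy_heights)
```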

Next we delved further into how results depend on those different topics. Here, again for the same result, we can see the extent to which that result depends on results from the various chapters. If you scroll over the nodes you can see more details. This is just a screenshot but you can play with it yourself here and you can collapse it in various ways corresponding to the internal hierarchy of the project.

Stacks Project — Collapsible 01WC

Finally, we have a way of looking at the logical dependency graph directly, where each result node is labeled with a tag and colored by “type”: whether it’s a lemma, proposition, theorem, or something else, and it also annotates the results which have separate names. Again a screenshot but play with it here, it rotates!

Stacks Project — Cluster 01WC

Check out the whole project here, and feel free to leave comments using the comment feature!

Larry Summers being set up to fail?

I’m back from PyData, which was a lot of fun and filled with super nice nerdy people. My prezi slides are now available here.

I have time for one thought: a bunch of people have chatted me up recently with the theory that Larry Summers is being put in the running for the Fed Chair alongside Janet Yellen just so that, when Yellen gets the call, we can all breathe a sigh of relief it didn’t go to Summers.

In other words, it’s a wholly political ploy so that Obama can look like a hero for women everywhere when he chooses Yellen, and so that we can all conclude that at least Obama’s learned this one lesson with regards to dealing with the ongoing financial crisis: Summers isn’t the solution.

Depending on my mood I sometimes buy into this theory, but obviously I’m still worried.

Categories: finance, news

PyData talk today

Not much time because I’m giving a keynote talk at the PyData 2013 conference in Cambridge today, which is being held at the Microsoft NERD conference center.

It’s gonna be videotaped so I’ll link to that when it’s ready.

My title is “Storytelling With Data” but for whatever reason on the schedule handed out yesterday the name had been changed to “Scalable Storytelling With Data”. I’m thinking of addressing this name change in my talk – one of the points of the talk, in fact, is that with great tools, we don’t need to worry too much about the scale.

Plus since it’s Sunday morning I’m going to make an effort to tie my talk into an Old Testament story, which is totally bizarre since I’m not at all religious but for some reason it feels right. Please wish me luck.

Aunt Pythia’s advice

It’s a speed advice column today, folks, because I’m blogging whilst sitting at the PyData 2013 conference [Aside: I believe in Travis Oliphant, the nerd Santa Claus, do you?]. I’ll try to keep it to the point yet amusing slash provocative.

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

And please, Submit your question for Aunt Pythia at the bottom of this page!

——

Dear Aunt Pythia,

I’m having a baby soon, and I’m planning to be the primary caregiver for a few months (from 3 months onward). I’m hoping that I’ll be able to get some research done at the same time, but I’m not sure how practical that is. What should I expect? Do you have any tips for juggling baby care and math research? (assuming no teaching and minimal responsibilities around the department.)

Baffled About Birth Year

Dear BABY,

Other people are gonna tell you encouraging things like, “oh you can do it!” or “If anybody can do it, it’s you!” but not me.

Don’t get me wrong, I’m not telling you you can’t do it, but by acting like it’s just a matter of proper planning, I’d be underselling how much work you’re signing up for, and how fucking hard it really is going to be.

So here’s the real deal: it’s the hardest thing you’ll ever do (hopefully). You know how grad school was hard? This is like having to write a thesis once a year while living 24/7 with someone whose only goal is for you to not get that done.

Which is to say: be incredibly proud of yourself every day you survive this period, and don’t add an ounce of guilt to yourself that you can avoid. Guilt doesn’t help. And also, the system is set up badly for you, to be sure, but don’t dwell on it too much, that also doesn’t help while you’re in it.

In terms of very practical advice: pay through the nose for good babysitting and daycare, it’s worth the investment so that you don’t have to worry about whether your kid is getting love and attention. Go into debt, borrow money, or whatever, but get it set up so that you actually feel jealous of your kid, and specifically so you know your kid is better off with that situation for the next few hours than being with you.

Finally, when you feel crazy and insane and underproductive, know that it’ll get better, for sure, by the time the kids can wipe their own asses, and that you won’t regret having those beautiful children nor trying to get something else done too. Never apologize for needing to cry and vent about how hard this period is, and if you’re around people who don’t get it, find new people.

Good luck!

Cathy

——

Aunt Pythia,

How do I dress to make people think I am an adult? I’m a 25-year-old woman, and I’m getting a bit tired of people asking me if I’m a student.

I think they ask me this because I only wear jeans and nerdy t-shirts. I basically only own jeans and nerdy t-shirts, plus some cardigans. I am not at all interested in skirts or girly things, but I’m open to wearing slightly nicer clothes. Like more cardigans? Messenger bags that aren’t falling apart? Urk.

People on the internet claim that I need to pluck my eyebrows to be taken seriously, but fuck that shit.

Shopping Is Hard! Let’s Do Math

Dear Sihldm,

First, I gotta say I was expecting a bit more from that sign-off. I really don’t see what “Sihldm” is supposed to mean, but maybe I’m just out of the loop.

Second, I’m gonna say something kind of controversial. Namely, I think the single attribute that makes people take me seriously is the fact that I’m overweight (and that, nowadays, I have grey hair, which also helps).

I think people just stop thinking “girl” and start thinking “woman” when confronted with me, and that totally works to my advantage. Controversial because, according to the social contract, I’m supposed to feel consistently bad about my weight, but here’s an example where I’m like, wow I’ve never been underestimated as a “girl”.

So, my advice to you is: pack on like 100 pounds.

Just kidding, probably not a great plan, nor possible.

Here’s another try: whenever you’re giving a talk or starting a class, wear wool slacks and a sweater. For whatever reason people take you super seriously when you do, even if you’re not fat, and even if you’re short. If it’s summer, go for summer slacks and silk shirts, although not the kind of silk that shows sweat stains easily, those are embarrassing.

And if it’s not a special event like a talk or the first day of class, then fuck it, be yourself.

Good luck!

Cathy

——

Dear Aunt Pythia,

My husband stays home with the children, but in spite of a graduate degree in engineering and graduate work in mathematics, seems incapable of maintaining a clean house.

My question is, if 95% of the time he doesn’t sort the mail, 75% of the time he doesn’t vacuum, 50% of the time he doesn’t wash the dishes, and 80% of the time he doesn’t wipe the kitchen counters, what is the probability that he doesn’t actually see dirt? (He is color blind.)

Buried in junk mail

Dear Bijm,

Bijm? Really?

Are the kids healthy? Happy? Do they get fed non-dorito-like food? I’d say be grateful. If and when you can afford it get housekeeping, but don’t make the mistake I see so much of allowing resentment to build up over chores.

Also, keep in mind that the kids will be able to help with the chores soon. And by “soon” I mean “probably already”. Buy cute toy-like vacuum cleaners and make up a game about getting all the dirt. Make it part of the dessert ritual that the counters need to be clean first. Move your bills to online payments.

And enjoy your sexy househusband!! [Important aside: is he willing to wear an apron and nothing else when he cooks? Please answer privately, preferably with jpeg-formatted evidence.]

Aunt Pythia

——

Please submit your well-specified, fun-loving, cleverly-abbreviated ethical quandary to Aunt Pythia!

Categories: Aunt Pythia

Huma’s Little Weiner Problem

This is a guest post by my friend Laura Strausfeld.

As an unlicensed psychotherapist, here’s my take on why Huma Abedin is supporting her husband Anthony Weiner’s campaign for mayor:

It’s all about the kid.

Jordan Weiner is 19 months old. When he’s 8 or 9—or 5, and wearing google glasses—maybe he’ll google his name and read about his father’s penis. Either that, or one of his buddies at school may ask him about his father’s penis. Jordan might then ask his mommy and daddy about his father’s penis and they’ll tell him either 1) your daddy was a great politician, but had to resign from Congress because he admitted to showing people his penis, which we recommend you don’t do, especially when you’re a grownup and on twitter; or 2) your daddy was a great politician and ran a very close race for mayor—that’s right, your daddy was almost mayor of New York City!—but he lost because people said he showed people his penis and that’s none of anybody’s business.

Let’s look at this from Huma’s perspective. She’s got a child for a husband, with a weird sexual addiction that on the positive side, doesn’t appear to carry the threat of STDs. But her dilemma is not about her marriage. The marriage is over. What she cares about is Jordan. And this is where she’s really fucked. Whatever happens, Anthony will always be her child’s father.

That bears repeating. You’ve got a child you love more than anything in the world, will sacrifice anything for, and will always now be stigmatized as the son of a celebrity-sized asshole. What are your choices?

The best scenario for Huma is if Anthony becomes mayor. Then she can divorce his ass, get primary custody and protect her child from growing up listening to penis jokes about his loser father. There will be jokes, but at least they’ll be about the mayor’s penis. And with a whole lot of luck, they might even be about how his father’s penis was a lot smaller in the mind of the public than his policies.

Weiner won’t get my vote, however. And for that, I apologize to you, Jordan. You have my sympathy, Huma.

Categories: guest post

Radhika Nagpal is a role model for fun people everywhere

Can I hear an amen for Radhika Nagpal, the brave woman who explained to the world recently how she lived through being a tenure-track professor at Harvard without losing her soul?

You should really read Nagpal’s guest blogpost from Scientific American (hat tip Ken Ribet) yourself, but here’s just a sneak preview, namely her check list of survival tactics that she describes in more detail later in the piece:

  • I decided that this is a 7-year postdoc.
  • I stopped taking advice.
  • I created a “feelgood” email folder.
  • I work fixed hours and in fixed amounts.
  • I try to be the best “whole” person I can.
  • I found real friends.
  • I have fun “now”.

I really love this list, especially the “stop taking advice” part. I can’t tell you how much crap advice you get when you’re a tenure-track woman in a technical field. Nagpal was totally right to decide to ignore it, and I wish I’d taken her advice to ignore people’s advice, even though that sounds like a logical contradiction.

What I like the most about her list was her insistence on being a whole person and having fun – I have definitely had those rules since forever, and I didn’t have to make them explicit, I just thought of them as obvious, although maybe it was for me because my alternative was truly dark.

It’s just amazing how often people are willing to make themselves miserable and delay their lives when they’re going for something ambitious. For some reason, they argue, they’ll get there faster if they’re utterly submissive to the perceived expectations.

What bullshit! Why would anyone be more efficient at learning, at producing, or at creating when they’re sleep-deprived and oppressed? I don’t get it. I know this sounds like a matter of opinion but I’m super sure there’ll be some study coming out describing the cognitive bias which makes people believe this particular piece of baloney.

Here’s some advice: go get laid, people, or whatever it is that you really enjoy, and then have a really good night’s sleep, and you’ll feel much more creative in the morning. Hell, you might even think of something during the night – all my good ideas come to me when I’m asleep.

Even though her description of tenure-track life resonates with me, this problem, of individuals needlessly sacrificing their quality of life, isn’t confined to academia by any means. For example I certainly saw a lot of it at D.E. Shaw as well.

In fact I think it happens anywhere where there’s an intense environment of expectation, with some kind of incredibly slow-moving weeding process – academia has tenure, D.E. Shaw has “who gets to be a Managing Director”. People spend months or even years in near-paralysis wondering if their superiors think they’re measuring up. Gross!

Ultimately it happens to someone when they start believing in the system. Conversely the only way to avoid that kind of oppression is to live your life in denial of the system, which is what Nagpal achieved by insisting on thinking of her tenure-track job as having no particular goal.

Which didn’t mean she didn’t work hard and get her personal goals done, and I have tremendous respect for her work ethic and drive. I’m not suggesting that we all get high-powered positions and then start slacking. But we have to retain our humanity above all.

Bottom line, let’s perfect the art of ignoring the system when it’s oppressive, since it’s a useful survival tactic, and also intrinsically changes the system in a positive way by undermining it. Plus it’s way more fun.

Categories: math, musing, women in math

MOOCs, their failure, and what is college for anyway?

Have you read this recent article in Slate about how they canceled online courses at San Jose State University after more than half the students failed? The failure rate ranged from 56 to 76 percent for five basic undergrad classes with a student enrollment limit of 100 people.

Personally, I’m impressed that so many people passed them considering how light-weight the connection is in such course experiences. Maybe it’s because they weren’t free – they cost $150.

It all depends on what you were expecting, I guess. It raises the question of what college is for anyway.

I was talking to a business guy about the MOOC potential for disruption, and he mentioned that, as a Yale undergrad himself, he never learned a thing in classes, that in fact he skipped most of his classes to hang out with his buddies. He somehow thought MOOCs would be a fine replacement for that experience. However, when I asked him whether he still knew any of his buddies from college, he acknowledged that he does business with them all the time.

Personally, this confirms my theory that education is more about making connections than education per se, and although I learned a lot of math in college, I also made a friend who helped me get into grad school and even introduced me to my thesis advisor.

Proprietary credit score model now embedded in law

I’ve blogged before about how I find it outrageous that the credit scoring models are proprietary, considering the impact they have on so many lives.

The argument given for keeping them secret is that otherwise people would game the models, but that really doesn’t make sense.

After all, the models that the big banks have to deal with through regulation aren’t secret, and they game those models all the time. It’s one of the main functions of the banks, in fact, to figure out how to game the models. So either we don’t mind gaming or we don’t hold up our banks to the same standards as our citizens.

Plus, let’s say the models were open and people started gaming the credit score models – what would that look like? A bunch of people paying their electricity bill on time?

Let’s face it: the real reason the models are secret is that the companies who set them up make more money that way, pretending to have some kind of secret sauce. What they really have, of course, is a pretty simple model and access to an amazing network of up-to-date personal financial data, as well as lots of clients.

Their fear is that, if their model gets out, anyone could start a credit scoring agency, but actually it wouldn’t be so easy – if I wanted to do it, I’d have to get all that personal data on everyone. In fact, if I could get all that personal data on everyone, including the historical data, I could easily build a credit scoring model.

So anyhoo, it’s all about money, that and the fact that we’re living under the assumption that it’s appropriate for credit scoring companies to wield all this power over people’s lives, including their love lives.

It’s like we have a secondary system of secret laws where we don’t actually get to see the rules, nor do we get to point out mistakes or reasonably refute them. And if you’re thinking “free credit report,” let’s be clear that that only tells you what data goes into the model, it doesn’t tell you how it’s used.

As it turns out, though, it’s now more than like a secondary system of laws – it’s become embedded in our actual laws. Somehow the proprietary credit scoring company Equifax is now explicitly part of our healthcare laws. From this New York Times article (hat tip Matt Stoller):

Federal officials said they would rely on Equifax — a company widely used by mortgage lenders, social service agencies and others — to verify income and employment and could extend the initial 12-month contract, bringing its potential value to $329.4 million over five years.

Contract documents show that Equifax must provide income information “in real time,” usually within a second of receiving a query from the federal government. Equifax says much of its information comes from data that is provided by employers and updated each payroll period.

Under the contract, Equifax can use sources like credit card applications but must develop a plan to indicate the accuracy of data and to reduce the risk of fraud.

Thanks Equifax, I guess we’ll just trust you on all of this.

If we bailed out the banks, why not Detroit? (#OWS)

I wrote a post yesterday to discuss the fact that, as we’ve seen in Detroit and as we’ll soon see across the country, the math isn’t working out on pensions. One of my commenters responded, saying I was falling for a “very right wing attack on defined benefit pensions.”

I think it’s a mistake to think like that. If people on the left refuse to discuss reality, then who owns reality? And moreover, who will act and towards what end?

Here’s what I anticipate: just as “bankruptcy” in the realm of airlines has come to mean “a short period wherein we toss our promises to retired workers and then come back to life as a company”, I’m afraid that Detroit may signal the emergence of a new legal device for cities to do the same thing, especially the tossing out of promises to retired workers part. A kind of coordinated bankruptcy if you will.

It comes down to the following questions. For whom do laws work? Who can trust that, when they enter a legal obligation, it will be honored?

From Trayvon Martin to the people who have been illegally foreclosed on, we’ve seen the answer to that.

And then we might ask, for whom are laws written or exceptions made? And the answer to that might well be for banks, in times of crisis of their own doing, and so they can get their bonuses.

I’m not a huge fan of the original bailouts, because they ignored the social and legal contracts in the opposite way: failures should fail and people who are criminals should go to jail. It didn’t seem fair then, and it still doesn’t now, as JP Morgan posts record $6.4 billion profits in the same quarter that it’s trying to settle a $500 million market manipulation charge.

It’s all very well to rest our arguments on the sanctity of the contract, but if you look around the edges you’ll see whose contracts get ripped up because of fraudulent accounting, and whose bonuses get bigger.

And it brings up the following question: if we bailed out the banks, why not the people of Detroit?

Categories: #OWS, finance, rant

Math fraud in pensions

I wrote a post three months ago talking about how we don’t need better models but we need to stop lying with our models. My first example was municipal debt and how various towns and cities are in deep debt partly because their accounting for future pension obligations allows them to be overly optimistic about their investments and underfund their pension pots.

This has never been more true than it is right now and, as this New York Times Dealbook article explains, it was a major factor in Detroit’s bankruptcy filing this past week. But don’t make any mistake: even in places where they don’t end up declaring bankruptcy, something is going to shake out because of these broken models, and it isn’t going to be extra money for retired civil servants.

It all comes down to wanting to avoid putting required money away and hiring quants (in this case actuaries) to make that seem like it’s mathematically acceptable. It’s a form of mathematical control fraud. From the article:

When a lender calculates the value of a mortgage, or a trader sets the price of a bond, each looks at the payments scheduled in the future and translates them into today’s dollars, using a commonplace calculation called discounting. By extension, it might seem that an actuary calculating a city’s pension obligations would look at the scheduled future payments to retirees and discount them to today’s dollars.

But that is not what happens. To calculate a city’s pension liabilities, an actuary instead projects all the contributions the city will probably have to make to the pension fund over time. Many assumptions go into this projection, including an assumption that returns on the investments made by the pension fund will cover most of the plan’s costs. The greater the average annual investment returns, the less the city will presumably have to contribute. Pension plan trustees set the rate of return, usually between 7 percent and 8 percent.

In addition, actuaries “smooth” the numbers, to keep big swings in the financial markets from making the pension contributions gyrate year to year. These methods, actuarial watchdogs say, build a strong bias into the numbers. Not only can they make unsustainable pension plans look fine, they say, but they distort the all-important instructions actuaries give their clients every year on how much money to set aside to pay all benefits in the future.

One caveat: if the pensions have actually been making between 7 percent and 8 percent on their investments every year then all is perhaps well. But considering that they typically invest in bonds, not stocks – which is a good thing – we’re likely seeing much smaller returns than that, which means the yearly contributions to those local pension plans have been too small, and the plans themselves are in dire straits.
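
To make the discounting point concrete, here’s a minimal sketch in Python. The payment stream (a flat $100 a year for 30 years) and the two rates are made-up numbers for illustration, not Detroit’s or anyone else’s actual figures.

```python
def present_value(payments, rate):
    """Discount a list of end-of-year payments back to today's dollars."""
    return sum(p / (1 + rate) ** t for t, p in enumerate(payments, start=1))

# Hypothetical promise: $100 owed to retirees at the end of each of the next 30 years.
payments = [100.0] * 30

pv_high = present_value(payments, 0.08)  # discounting at an assumed 8 percent return
pv_low = present_value(payments, 0.04)   # discounting at a more bond-like 4 percent

print(round(pv_high, 2), round(pv_low, 2))
```

The same promises look a lot more expensive at the lower rate, which is why the choice of assumed return matters so much.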

What’s super interesting about this article is that it goes into the action on the ground inside the actuary community, since their reputations are at stake in this battle:

A few years ago, with the debate still raging and cities staggering through the recession, one top professional body, the Society of Actuaries, gathered expert opinion and realized that public pension plans had come to pose the single largest reputational risk to the profession. A Public Plans Reputational Risk Task Force was convened. It held some meetings, but last year, the matter was shifted to a new body, something called the Blue Ribbon Panel, which was composed not of actuaries but public policy figures from a number of disciplines. Panelists include Richard Ravitch, a former lieutenant governor of New York; Bradley Belt, a former executive director of the Pension Benefit Guaranty Corporation; and Robert North, the actuary who shepherds New York City’s five big public pension plans.

I’m not sure what happened here, but it seems like a bunch of people in a profession, the actuaries, got worried that they were being used by politicians, and decided to investigate, but then that initiative got somehow replaced by a bunch of politicians. I’d love to talk to someone on the inside about this.

Categories: finance, math, modeling, statistics

Aunt Pythia’s advice

Aunt Pythia is back and, since her family has finally been reunited, sleeping well. Thank goodness! Hallelujah!

I’m psyched to be getting some great questions from the math community. If you’re a math nerd, and even if you’re not, please:

Submit your question for Aunt Pythia at the bottom of this page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

I’ve been thinking a lot about your remark from this previous post:

Like a lot of academics, he understands ambition in one narrow field, and doesn’t even relate to not wanting to be successful in this realm

That has really resonated with me. I am trying to make it as an academic, and I admit I am super boring because all I really care about is math and exercise, and I’m not really smart enough or care enough to have an informed opinion of much else.

Unfortunately this makes it hard to attract women, and the ones I have gone out on dates with said that I am not very engaging. On top of that most women want children, and I have read (and agree with) your post on why wanting children is ridiculous. I am also not located in a region where I have any colleagues or even graduate students working in my area of math to talk math with and so I feel pretty isolated on so many levels.

What does it take to become a math professor at an Ivy League caliber institution (e.g. Harvard, MIT, Columbia, Princeton)? Does one have to be working/thinking about math for much of one’s day? I presume you have an inside view.

Math is Titillating

Dear MiT,

First of all thanks for bringing up that previous answer. I have gotten a lot of people writing in saying I misinterpreted his description of taking extra time to finish his Ph.D.; most people generally think he only took one extra year whereas I read it as two extra years, which makes a big difference. Given this, I was probably too harsh on the guy, although I still think grad students should go to seminars.

As an aside, when did we start using “last year” to mean “this year” and “next year” to mean “next year” but stopped using “this year” to mean anything?

Now on to your question. Do you have to be thinking about math all the time to get a great job? Probably. There are exceptions but they’re rare, as you know.

Let’s face it, this wasn’t really a question for Aunt Pythia. I think you just identified with the description of being boring and only caring about getting a fancy math job, since that’s all you actually care about, as evidenced by your question.

But hey, I’m Aunt Pythia, so I’ve got advice for you anyway.

Don’t feel bad about it! It’s just how you’re programmed, it’s fine. You love math and not much else! Shout it loud from the rooftops and you might just find a girl nerd who’s psyched with your boring self. Just please don’t expect everyone else to be like you, especially your graduate students.

Aunt Pythia

——

Dear Aunt Pythia,

I’m a math professor in a bit of an ethical quandary.

There is a researcher in my field who is widely known (by those in the field) to be a Certified Asshole (CA). He cuts down other people and their work, often in underhanded and awful ways. The people in question are often women (but not always) and often young (grad students or postdocs). He is a tenured full prof at a Very Good School, though, so those who don’t know him respect the position and his publication record. They consider him to be a Serious Person instead of the CA that he is.

In our recent round of hiring, I read the packet of a very talented graduating student who is applying for postdocs. This student has a few publications already including one very, very nice result. He is also a current collaborator of mine, and I know him a bit personally.

The letter in the student’s application from CA (another collaborator of the student) is underhanded and sabotaging. It says nothing outright negative, of course, but has key phrases like “promising teaching career at a liberal arts school” or somesuch. It also manages to be self-aggrandizing about CA himself rather than praising the grad student and his work.

This student did not get any offers this year, and I know he will be on the market again this year. I can’t help thinking that this letter is hurting his chances for a research postdoc. CA is not his advisor. While it would help to have a good letter from a person in a position such as CA’s, I don’t think this particular letter is helping him.

I can’t figure out an ethical way to help the student. I can’t come out and tell him what’s in the letter. I can’t really say anything even alluding to that. Is there anything I can do to help him?

Better yet, is there anything I can do to hurt CA even though I am in a more junior position at a less well-respected school?

Math is Awesome, People Suck

Dear MAPS,

What a rich question! There are so many issues here, I do believe we could start an entire blog addressing just this ethical quandary, worked out in its entirety.

First of all, I agree that there is an ethical quandary, mostly because you read the CA letter.

If you’d told your friend not to get a letter from the CA beforehand, because he’s a known shitty letter writer, I think that would have been fine and not unethical. But given that you didn’t, and that your friend got that letter, and that you read the letter, it would now seem like spying to go back and tell your friend to get a new letter in the next round. After all, if you’d read the letter and it was great, then you wouldn’t be telling your friend to go get a new letter writer.

As an aside, it doesn’t make sense to me that, during the hiring process, people read the folders of their current collaborators – doesn’t that seem ripe for this kind of conflict of interest?

Now just a few words on “shitty letter writers” before we go on to actual advice. There are different kinds of shitty letter writers, which I’ll split into two broad categories: the tough letter writer, who has consistently high standards and doesn’t wax poetic about anyone ever, and the narcissistic letter writer, who is inconsistent with their praise, sometimes cold sometimes hot, depending on idiosyncratic things like whether they like the young person’s personality and whether they’ve seen enough citations to the narcissist’s own work.

In the large and relatively functional system that is recommendation letters for math jobs, the tough letter writer is a pretty familiar concept, and the system has adapted more or less to its existence. In other words, people who read a lot of letters in a lot of folders get to know the letter writers and they say stuff to themselves along the lines of, oh this guy never writes good letters, so given that, this letter is actually pretty good!

Of course that’s not to say that it’s a perfect system of adaptation to such tough letter writing biases: for sure there are hiring committees unfamiliar with those letter writers, and for those students who have those tough letters, they inevitably suffer in such situations.

On the other hand, if you tried to explicitly adjust for this problem, you could be inviting other, even bigger problems. For example, if you had a public yet anonymous webpage which scored every letter writer on a scale of toughness, then the young people looking for jobs might feel like to compete, they’d need to only get letters from people who always write good letters (they exist), and then the entire system would fail because the letters would contain less and less information. That would be a problem.

OK, what about the narcissist letter writer? That’s harder, since they’re not consistently tough, but rather they’re tough on people they just don’t like for whatever reason. It’s much much harder for people on hiring committees to spot the narcissists, and thus those narcissists probably do lots of damage. Luckily they’re also less common than the tough letter writers, but of course they exist.

I’d like to respond to your last question, about wanting to hurt CA, who I’m guessing is a narcissist letter writer, even though the question is posed strangely.

I don’t think it’s unethical, when you’re counseling any person in your field from now on, to explicitly suggest not using that guy, or for that matter any narcissist letter writer. Of course, this is before you’ve read the putative letter, and of course the person might think you’re wrong and might ignore your advice (and of course, you might be wrong).

My advice to you about the person who didn’t get a job this year (note usage of “this year”): make sure they’re aware of how much letters count, and how different writers are known for different styles, and tell them to consider getting new letters. Ask them to explicitly ask their letter writers whether their letters are good, and define “good”, something I always counsel people to do when they ask for letters.  I don’t think you can do much more than this.

But I’m eager to hear what Jason Starr thinks, he’s always very thoughtful!

Best,

Aunt Pythia

——

Dear Aunt Pythia,

You write an amazing blog that

  • lets your readership get to know you as a person and
  • showcases your interests and expertise without
  • too much compartmentalizing.

Help a sister out with some advice for how to achieve similar results?

Bridging Lives Online Gets Gnarly Yo

Dear BLOGGY,

My advice is to

  • Set aside time every day to write. Consistency is your friend.
  • Choose a (possibly imaginary) friend of yours each day to write to – your audience – that is on your side but will also ask clarifying questions, and explain something to them that you find interesting. That’s a blog post!
  • Also, explain one idea well, then stop. People can barely stand one idea before losing interest.

Good luck, I know you’re gonna rock it!!!

Love,

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated ethical quandary to Aunt Pythia!

Categories: Aunt Pythia

The Stop and Frisk sleight of hand

I’m finishing up an essay called “On Being a Data Skeptic” in which I catalog different standard mistakes people make with data – sometimes unintentionally, sometimes intentionally.

It occurred to me, as I wrote it, and as I read the various press conferences with departing mayor Bloomberg and Police Commissioner Raymond Kelly when they addressed the Stop and Frisk policy, that they are guilty of making one of these standard mistakes. Namely, they use a sleight of hand with respect to the evaluation metric of the policy.

Recall that an evaluation metric for a model is the way you decide whether the model works. So if you’re predicting whether someone would like a movie, you should go back and check whether your recommendations were good, and revise your model if not. It’s a crucial part of the model, and a poor choice for it can have dire consequences – you could end up optimizing to the wrong thing.

[Aside: as I’ve complained about before, the Value Added Model for teachers doesn’t have an evaluation method of record, which is a very bad sign indeed about the model. And that’s a Bloomberg brainchild as well.]

So what am I talking about?

Here’s the model: stopping and frisking suspicious-looking people in high-crime areas will improve the safety and well-being of the city as a whole.

Here’s Bloomberg/Kelly’s evaluation method: the death rate by murder has gone down in New York during the policy. However, that rate is highly variable and depends just as much on whether there’s a crack epidemic going on as anything else. Or maybe it’s improved medical care. Truth is people don’t really know. In any case ascribing credit for the plunging death rate to Stop and Frisk is a tenuous causal argument. Plus since Stop and Frisk events have decreased drastically recently, we haven’t seen the murder rate shoot up.

Here’s another possible evaluation method: trust in the police. And considering that 400,000 innocent black and Latino New Yorkers were stopped last year under this policy (here are more stats), versus fewer than 50,000 whites, and most of them were young men, it stands to reason that the average young minority male feels less trust towards police than the average young white male. In fact, this is an amazing statistic put together by the NYCLU from 2011:

The number of stops of young black men exceeded the entire city population of young black men (168,126 as compared to 158,406).

If I’m a black guy I have an expectation of getting stopped and frisked at least once per year. How does that make me trust cops?
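
Here’s the arithmetic behind that expectation, as a quick Python check using only the two NYCLU numbers quoted above. It treats stops as if they were spread evenly over the population of young black men, which is a simplifying assumption; in reality some people get stopped repeatedly and others never.

```python
stops = 168_126       # stops of young black men in 2011 (NYCLU figure quoted above)
population = 158_406  # city population of young black men (same source)

# Average number of stops per person per year, assuming an even spread.
stops_per_person = stops / population

print(f"{stops_per_person:.2f} stops per person per year")
```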

Let’s choose an evaluation method closer to what we can actually control, and let’s optimize to it.

Update: a guest columnist fills in for David Brooks, hopefully not for the last time, and gives us his take on Kelly, Obama, and racial profiling.

Categories: data science, modeling, rant

Money in politics: the BFF project

This is a guest post by Peter Darche, an engineer at DataKind and recent graduate of NYU’s ITP program.  At ITP he focused primarily on using personal data to improve personal social and environmental impact.  Prior to graduate school he taught in NYC public schools with Teach for America and Uncommon Schools.

We all ‘know’ that money influences the way congressmen and women legislate; at least we certainly believe it does.  According to a poll conducted by law professor Larry Lessig for his book Republic Lost, 75% of respondents (Republican and Democrat) said that ‘money buys results in Congress.’

And we have good reason to believe so. With astronomical sums of campaign money flowing into the system and costly, public-welfare reducing legislation coming out, it’s the obvious explanation.

But what does that explanation really tell us? Yes, a congresswoman’s receiving millions of dollars from an industry then voting with that industry’s interests reeks of corruption. But, when that industry is responsible for 80% of her constituents’ jobs the causation becomes much less clear and the explanation much less informative.

The real devil is in the details. It is in the ways that money has shaped her legislative worldview over time and in the small, particular actions that tilt her policy one way rather than another.

In the past finding these many and subtle ways would have taken a herculean effort: untold hours collecting campaign contributions, voting records, speeches, and so on. Today however, due to the efforts of organizations like the Sunlight Foundation and Center for Responsive Politics, this information is online and programmatically accessible; you can write a few lines of code and have a computer gather it all for you.

The last few months Cathy O’Neil, Lee Drutman (a Senior Fellow at the Sunlight Foundation), myself and others have been working on a project that leverages these data sources to attempt to unearth some of these particular facts. By connecting all the avenues by which influence is exerted on the legislative process to the actions taken by legislators, we’re hoping to find some of the detailed ways money changes behavior over time.

The idea is this: first, find and aggregate what data exists related to the ways influence can be exerted on the legislative process (data on campaign contributions, lobbying contributions, etc), then find data that might track influence manifesting itself in the legislative process (bill sponsorships, co-sponsorships, speeches, votes, committee memberships, etc). Finally, connect the interest group or industry behind the influence to the policies and see how they change over time.

One immediate and attainable goal for this project, for example, is to create an affinity score between legislators and industries, or in other words a metric that would indicate the extent to which a given legislator is influenced by and acts in the interest of a given industry.
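
To give a flavor of what such a score could look like, here is a rough Python sketch. Everything in it is an assumption made for illustration: the toy records, the field names, and the formula itself (the share of a legislator’s contributions that come from an industry, multiplied by the share of industry-relevant votes cast that industry’s way). It is not the project’s actual method.

```python
# Hypothetical toy records; real data would come from public contribution and vote sources.
contributions = [
    {"legislator": "A", "industry": "health", "amount": 5000},
    {"legislator": "A", "industry": "energy", "amount": 15000},
    {"legislator": "B", "industry": "health", "amount": 10000},
]
votes = [
    {"legislator": "A", "industry": "health", "with_industry": True},
    {"legislator": "A", "industry": "health", "with_industry": False},
    {"legislator": "B", "industry": "health", "with_industry": True},
]

def affinity(legislator, industry):
    """Assumed formula: (share of money from industry) * (share of votes with industry)."""
    total = sum(c["amount"] for c in contributions if c["legislator"] == legislator)
    from_industry = sum(
        c["amount"] for c in contributions
        if c["legislator"] == legislator and c["industry"] == industry
    )
    relevant = [
        v for v in votes
        if v["legislator"] == legislator and v["industry"] == industry
    ]
    # No money or no relevant votes on record: call the affinity zero.
    if total == 0 or not relevant:
        return 0.0
    money_share = from_industry / total
    vote_share = sum(v["with_industry"] for v in relevant) / len(relevant)
    return money_share * vote_share

print(affinity("A", "health"), affinity("B", "health"))
```

A score like this says nothing about causation on its own, which is exactly the problem raised earlier about the congresswoman whose constituents work in the industry; it is only a starting point for finding patterns worth a closer look.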

So far most of our efforts have focused on finding, collecting, and connecting the records of influence and legislative behavior. We’ve pulled in lobbying and campaign contribution data, as well as sponsored legislation, co-sponsored legislation, speeches and votes. We’ve connected the instances of influence to legislative actions for a given legislator and visualized it on a timeline showing the entirety of a legislator’s career.

Here’s an example of how one might use the timeline. The example below is of Nancy Pelosi’s career. Each green circle represents a campaign contribution she received, and is grouped within a larger circle by the month it was recorded by the FEC. Above are colored rectangles representing legislative actions she took during the time-period in focus (indigo are votes, orange speeches, red co-sponsored bills, blue sponsored bills). Some of the green circles are highlighted because the events have been filtered for connection to health professionals.

Changing the filter to Health Services/HMOs, we see different contributions coming from that industry as well as a co-sponsored bill related to that industry.

Mousing over the bill indicates it’s a proposal to amend the Social Security Act to provide Medicaid coverage to low-income individuals with HIV. Further, looking around at speeches, one can see a relevant speech about children’s health insurance. Clicking on the speech reveals the text.

By combining data about various events, and allowing users to filter and dive into them, we’re hoping to leverage our natural pattern-seeking capabilities to find specific hypotheses to test. Once an interesting pattern has been found, the tool would allow one to download the data and conduct analyses.

Again, it’s just a start, and the timeline and other project-related code are internal prototypes created to start seeing some of the connections. We wanted to open it up to you all though to see what you all think and get some feedback. So, with its pre-alphaness in mind, what do you think about the project generally and the timeline specifically?  What works well – helps you gain insights or generate hypotheses about the connection between money and politics – and what other functionality would you like to see?

The demo version can be found here with data for the following legislators:

  • Nancy Pelosi
  • John Boehner
  • Cathy McMorris Rodgers
  • Eric Cantor
  • James Lankford
  • John Cornyn
  • James Clyburn
  • Kevin McCarthy
  • Steny Hoyer

Note: when the timeline is revealed, click and drag over content at the bottom of the timeline to reveal the focus events.

THIS REQUIRES YOUR MOCKERY

My title today is the subject line of a message I received from my buddy Jordan Ellenberg. Thanks for making things so easy for me to blog this morning, Jordan!

So here’s the subject: a Silicon Valley entrepreneur’s self-help book, including advice on how to quantify and measure your sex life, among other things – every other thing, in fact.

Just in case you’ve missed it, there’s a movement afoot among certain people to collect data about themselves on the level of heart rate, daily exercise and eating patterns, and the like, with the goal of self-improvement.

It’s got a name – the Quantified Self movement – and if I haven’t mentioned it before, it’s because honestly, it’s too easy, and I generally speaking like a challenge.

I saw a bunch of these guys at the health analytics conference I went to a couple of months ago, and let me tell you, they’re weird, and they know it, and they don’t care.

They honestly feel sorry for people who don’t have an Ironman Triathlon (or four) to train for via wireless Excel spreadsheets. I mean, how do those people know whether they’ve actually improved? How do they know if they’ve eaten enough carbs? How do they know if they’ve slept??

As far as these Quantified Selfers (QSers) are concerned, it’s only a matter of time before everyone is, like them, making themselves perfect, and they’re the vanguard with nothing to be defensive about.

So anyhoo, those QS guys are convinced that they’re accomplishing something with all of their number collecting and crunching, like maybe they’ll live forever or something (after curing cancer), and they’re just so douchey I feel sorry for them. Blogging about them and trashing them would be like a mean older kid in the playground telling a bunch of little kids that there’s no Santa Claus.

Why do that? Why pop their bubble?

Here’s why: it’s just plain fun, especially now that they’ve ventured into sexy territory with their spreadsheets.

Here are a couple of questions for the Quantified Sexual Selfers (QSSers) in the audience, please get back to me.

  1. Yes or no: nothing says “hot ‘n’ steamy” like a fitbit readout of historical orgasms.
  2. Where does the sensor band get attached, and does it come with a vibrating option?
  3. Are your orgasms more satisfying before or after syncing your daily data with Stephen Wolfram’s?
  4. What’s your metric of success, and how do you know your girlfriend ain’t gaming the system?
Categories: modeling, musing

Aunt Pythia’s advice

Aunt Pythia is ever so pleased to be here today, on her 41st birthday no less, spewing forth questionable advice that nobody will be willing to go on the record as having read, but which she knows in her heart each reader secretly treasures.

Now, when Aunt Pythia was on her death bed two weeks ago, the call was raised for more questions, and quickly. And readers, you responded, which brings tears to Aunt Pythia’s eyes, it really does. It brought her back from the brink and she’s eternally grateful.

The problem is, though, this: some of these questions are of dubious substance. To be honest, they’re very short, not extremely well thought out or juicy, and don’t pose an existential conundrum.

Of course, one doesn’t want to look a gift horse in the mouth, so I’ve arranged to answer these questions in speed-round fashion today. I hope you enjoy it, and please don’t forget:

Submit your existential conundrums to Aunt Pythia at the bottom of this page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

What should I do when, after posting a video from Vi Hart, a reader responds “I’ve got to marry that girl.”?

Math Guy

Dear Math Guy,

Offer to administer the wedding! Turns out you can get certified as a minister with an app called “OrdainThyself”.


Aunt Pythia

——

Dear Aunt Pythia,

If you were a flavor of ice cream, what flavor of ice cream would you be?

Sleepless in Seattle

Dear SiS,

Not sure about me, but my kids would all be Ben & Jerry’s Coffee Heath Bar Crunch, which I ate pretty much continuously and exclusively during my three pregnancies.

Not me, but I had that same stoned expression.


I hope that helps!

Aunt Pythia

——

Dear Aunt Pythia,

I am a 24 year-old grad student, and I’ve noticed the following trend in my life: When I was younger (read, 14 and older), I always was attracted to people around 19 years of age which was too old for me. But now, I’m still attracted to people around 19 years of age, which is quickly getting too young for me. What should I do???

Feeling a little bit like a Cougar…

Dear Wanna-be Cougar,

Just as I can’t claim to be part of the generation of 20-somethings that refuse to make appointments more than 17 minutes in advance, and then only by text, you cannot claim to be a cougar, sorry. That’s reserved for women who are at least 40, possibly 41, and there’s no extra room at this table.

Not me, but I do share the sentiment


In terms of your “problem,” it’s one of those things you can’t control, as far as I know, so just take the posture of bewildered amusement at your own desires, and make sure you don’t do anything illegal or weird.

Smooches,

Aunt Pythia

——

Dear Aunt Pythia,

Since I know how fond you are of bridge, I have a question about slam bidding: Given the fact that you and your partner have a guaranteed slam, what is the probability that you will bid into that slam? What are the ways to maximize that probability, in terms of convention? What are the easiest ways to invite slam to your partner? What is your opinion of cue bidding, and what are the least confusing ways to cue bid?

Seeker Abling Young Cardsharks

Dear SAYC,

I appreciate how your sign-off is code for how I should answer this question.

But even so, I’m going to go with my gut here: when I’m in a perceived slam with my partner, I always make sure to stare knowingly into his or her eyes, with raised eyebrows, and mouth the word “slam”, Colbert-style.

Me.


If that isn’t getting through I squeeze his or her knee under the table. Works every time. For me, bridge is all about being fun and ridiculous, and I never follow the rules unless it’s more fun to do so.

I hope that helps!

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

The creepy mindset of online credit scoring

Usually I like to think through abstract ideas – thought experiments, if you will – and not get too personal. I take exceptions for certain macroeconomists who are already public figures but most of the time that’s it.

Here’s a new category of people I’ll call out by name: CEOs who defend creepy models using the phrase “People will trade their private information for economic value.”

That’s a quote from Douglas Merrill, CEO of Zest Finance, from this video recorded at a recent data conference in Berkeley (hat tip Rachel Schutt). It was a panel discussion, the putative topic of which was something like “Attacking the structure of everything”, whatever that’s supposed to mean (I’m guessing it has something to do with being proud of “disrupting shit”).

Do you know the feeling you get when you’re with someone who’s smart, articulate, who probably buys organic eggs from a nice farmer’s market, but who doesn’t show an ounce of sympathy for people who aren’t successful entrepreneurs? When you’re with someone who has benefitted so entirely and so consistently from the system that they have an almost religious belief that the system is perfect and they’ve succeeded through merit alone?

It’s something in between the feeling that, maybe you’re just naive because you’ve led such a blessed life, or maybe you’re actually incapable of human empathy, I don’t know which because it’s never been tested.

That’s the creepy feeling I get when I hear Douglas Merrill speak, but it actually started earlier, when I got the following email almost exactly one year ago via LinkedIn:

Hi Catherine,

Your profile looked interesting to me.

I’m seeking stellar, creative thinkers like you, for our team in Hollywood, CA. If you would consider relocating for the right opportunity, please read on.

You will use your math wizardry to develop radically new methods for data access, manipulation, and modeling. The outcome of your work will result in game-changing software and tools that will disrupt the credit industry and better serve millions of Americans.

You would be working alongside people like Douglas Merrill – the former CIO of Google – along with a handful of other ex-Googlers and Capital One folks. More info can be found on our LinkedIn company profile or at www.ZestFinance.com.

At ZestFinance we’re bringing social responsibility to the consumer loan industry.

Do you have a few moments to talk about this? If you are not interested, but know someone else who might be a fit, please send them my way!

I hope to hear from you soon. Thank you for your time.

Regards,
Adam

Wow, let’s “better serve millions of Americans” through manipulation of their private data, and then let’s call it being socially responsible! And let’s work with Capital One which is known to be practically a charity.

What?

Message to ZestFinance: “getting rich with predatory lending” doesn’t mean “being socially responsible” unless you have a really weird definition of that term.

Going back to the video, I have a few more tasty quotes from Merrill:

  1. First when he’s describing how he uses personal individual information scraped from the web: “All data is credit data.”
  2. Second, when he’s comparing ZestFinance to FICO credit scoring: “Context is developed by knowing thousands of things about you. I know you as a person, not just you via five or six variables.”

I’d like to remind people that, in spite of the creepiness here, and the fact that his business plan is a death spiral of modeling, everything this guy is talking about is totally legal. And as I said in this post, I’d like to see some pushback to guys like Merrill as well as to the NSA.

Categories: data science, rant

On being a data science skeptic: due out soon

A few months ago, at the end of January, I wrote a post about Bill Gates’s naive views on the objectivity of data. One of the commenters, “CitizensArrest,” asked me to take a look at a related essay written by Susan Webber entitled “Management’s Great Addiction: It’s time we recognized that we just can’t measure everything.”

Webber’s essay is really excellent, not to mention impressively prescient considering it was published in 2006, before the credit crisis. The format of the essay is simple: it brings up and explains various dangers in the context of measurement and modeling of business data, and calls for finding a space in business for skepticism. What an idea! Imagine if that had actually happened in finance when it should have back in 2006.

Please go read her essay, it’s short.

Recently, when O’Reilly asked me to write an essay, I thought back to this short piece and decided to use it as a template for explaining why I think there’s a just-as-desperate need for skepticism in 2013 here in the big data world as there was back then in finance.

Whereas most of Webber’s essay talks about people blindly accepting numbers as true, objective, precise, and important, and the related tragic consequences, I’ve added a small wrinkle to this discussion. Namely, I also devote attention to the people who underestimate the power of data.

Most of this disregard for unintended consequences is blithe and unintentional (and some of it isn’t), but even so it can be hugely damaging, especially to the individuals being modeled: think foreclosed homes due to crappy housing-related models in the past, and think creepy models and the death spiral of modeling for the present and future.

Anyhoo, I’m actively writing it now, and it’ll be coming out soon. Stay tuned!

Categories: data science, finance, modeling

PyData and a few other things

So here’s the thing about being a parent of benign neglect: it’s no walk in the park. I talk a big game, but the truth is I’ve had trouble getting to sleep from the anxiety. To distract myself I’ve been watching Law & Order episodes on Netflix until the wee hours of the night.

Two things about this plan suck. First, my husband is in Amsterdam, which means he’s 6 time zones away from our oldest son whereas I’m only 3, but somehow that means I’m shouldering 99.5% of the responsibility to worry (there’s some universal geographic law of parenting at work there but I don’t know how to formulate it). Second, half of the L&O episodes involve either children getting maimed or killed or child killers. Not restful but I freaking can’t stop!

In any case, not much extra energy to spring out of bed and write the blog, so apologies for a sparse period for mathbabe. For whatever reason I woke up this morning in time to blog, however, so as to not miss an opportunity it’s gonna be in list form:

  1. I’ve been invited to keynote at PyData in Cambridge, MA at the end of the month – me and Travis Oliphant! I’m still coming up with the title and abstract for my talk, but it’s going to be something about storytelling with data using the IPython Notebook. Please make suggestions!
  2. I was in a Wall Street Journal article about Larry Summers, talking about whether he’s got a good personality to take over from Ben Bernanke, i.e. should we trust our lives and our future with him. I say nope. What’s funny is that my uncle, economist Bob Hall, is also referred to in the same article. The journalist didn’t know we’re related until after the article came out and Uncle Bob informed him.
  3. Hey, can we give it up for Eliot Spitzer? The powers that be are down about that guy presumably for having sex with prostitutes but really because he’s a threat. I say legalize prostitution, unionize the prostitutes à la the Dutch, and put Spitzer in charge of something involving money and corruption: he’s smart and fearless. Who’s with me?
  4. It looks like good news: the Consumer Financial Protection Bureau might be cracking down on illegal debt collector tactics. Update: wait, the fines are fractions of 1% of the revenue these guys made on their unfair practices. Can we please have a rule that when you get caught breaking the law, the fine will be large enough so it’s no longer profitable?

Categories: news, open source tools

Measuring Up by Daniel Koretz

This is a guest post by Eugene Stern.

Now that I have kids in school, I’ve become a lot more familiar with high-stakes testing, which is the practice of administering standardized tests with major consequences for students who take them (you have to pass to graduate), their teachers (who are often evaluated based on standardized test results), and their school districts (state funding depends on test results). To my great chagrin, New Jersey, where I live, is in the process of putting such a teacher evaluation system in place (for a lot more detail and criticism, see here).

The excellent John Ewing pointed me to a pretty comprehensive survey of standardized testing called “Measuring Up,” by Harvard Ed School prof Daniel Koretz, who teaches a course there about this stuff. If you have any interest in the subject, the book is very much worth your time. But in case you don’t get to it, or just to whet your appetite, here are my top 10 takeaways:

  1. Believe it or not, most of the people who write standardized tests aren’t idiots. Building effective tests is a difficult measurement problem! Koretz makes an analogy to political polling, which is a good reminder that a test result is really a sample from a distribution (if you take multiple versions of a test designed to measure the same thing, you won’t do exactly the same each time), and not an absolute measure of what someone knows. It’s also a good reminder that the way questions are phrased can matter a great deal.

  2. The reliability of a test is inversely related to the standard deviation of this distribution: a test is reliable if your score on it wouldn’t vary very much from one instance to the next. That’s a function of both the test itself and the circumstances under which people take it. More reliability is better, but the big trade-off is that increasing the sophistication of the test tends to decrease reliability. For example, tests with free form answers can test for a broader range of skills than multiple choice, but they introduce variability across graders, and even the same person may grade the same test differently before and after lunch. More sophisticated tasks also take longer to do (imagine a lab experiment as part of a test), which means fewer questions on the test and a smaller cross-section of topics being sampled, again meaning more noise and less reliability.

  3. A complementary issue is bias, which is roughly about people doing better or worse on a test for systematic reasons outside the domain being tested. Again, there are trade-offs: the more sophisticated the test, the more extraneous skills beyond those being tested it may be bringing in. One common way to weed out such questions is to look at how people who score the same on the overall test do on each particular question: if you get variability you didn’t expect, that may be a sign of bias. It’s harder to do this for more sophisticated tests, where each question is a bigger chunk of the overall test. It’s also harder if the bias is systematic across the test.

  4. Beyond the (theoretical) distribution from which a single student’s score is a sample, there’s also the (likely more familiar) distribution of scores across students. This depends both on the test and on the population taking it. For example, for many years, students on the eastern side of the US were more likely to take the SAT than those in the west, where only students applying to very selective eastern colleges took the test. Consequently, the score distributions were very different in the east and the west (and average scores tended to be higher in the west), but this didn’t mean that there was bias or that schools in the west were better.

  5. The shape of the score distribution across students carries important information about the test. If a test is relatively easy for the students taking it, scores will be clustered to the right of the distribution, while if it’s hard, scores will be clustered to the left. This matters when you’re interpreting results: the first test is worse at discriminating among stronger students and better at discriminating among weaker ones, while the second is the reverse.

  6. The score distribution across students is an important tool in communicating results (you may not know right away what a score of 600 on a particular test means, but if you hear it’s one standard deviation above a mean of 500, that’s a decent start). It’s also important for calibrating tests so that the results are comparable from year to year. In general, you want a test to have similar means and variances from one year to the next, but this raises the question of how to handle year-to-year improvement. This is particularly significant when educational goals are expressed in terms of raising standardized test scores.

  7. If you think in terms of the statistics of test score distributions, you realize that many of those goals of raising scores quickly are deluded. Koretz has a good phrase for this: the myth of the vanishing variance. The key observation is that test score distributions are very wide, on all tests, everywhere, including countries that we think have much better education systems than we do. The goals we set for student score improvement (typically, a high fraction of all students taking a test several years from now are supposed to score above some threshold) imply a great deal of compression at the lower end of this distribution – compression that has never been seen in any country, anywhere. It sounds good to say that every kid who takes a certain test in four years will score as proficient, but that corresponds to a score distribution with much less variance than you’ll ever see. Maybe we should stop lying to ourselves?

  8. Koretz is highly critical of the recent trend to report test results in terms of standards (e.g., how many students score as “proficient”) instead of comparisons (e.g., your score is in the top 20% of all students who took the test). Standards and standards-based reporting are popular because it’s believed that American students’ performance as a group is inadequate. The idea is that being near the top doesn’t mean much if the comparison group is weak, so instead we should focus on making sure every student meets an absolute standard needed for success in life. There are three (at least) problems with this. First, how do you set a standard – i.e., what does proficient mean, anyway? Koretz gives enough detail here to make it clear how arbitrary the standards are. Second, you lose information: in the US, standards are typically expressed in terms of just four bins (advanced, proficient, partially proficient, basic), and variation inside the bins is ignored. Third, even standards-based reporting tends to slide back into comparisons: since we don’t know exactly what proficient means, we’re happiest when our school, or district, or state places ahead of others in the fraction of students classified as proficient.

  9. Koretz’s other big theme is score inflation for high-stakes tests: if everyone is evaluated based on test scores, everyone has an incentive to get those scores up, whether or not that actually has much correlation with learning. If you remember anything from the book or from this post, remember this phrase: sawtooth pattern. The idea is that when a new high-stakes standardized test appears, average scores start at some base level, go up quickly as people figure out how to game the test, then plateau. If the test is replaced with another, the same thing happens: base, rapid growth, plateau. Repeat ad infinitum. Koretz and his collaborators did a nice experiment in which they went back to a school district in which one high-stakes test had been replaced with another and administered the first test several years later. Now that teachers weren’t teaching to the first test, scores on it reverted back to the original base level. Moral: score inflation is real, pervasive, and unavoidable, unless we bite the bullet and do away with high-stakes tests.

  10. While Koretz is sympathetic toward test designers, who live the complexity of standardized testing every day, he is harsh on those who (a) interpret and report on test results and (b) set testing and education policy, without taking that complexity into account. Which, as he makes clear, is pretty much everyone who reports on results and sets policy.
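
To put rough numbers on the vanishing variance argument in item 7, here is a minimal sketch in Python. It assumes scores are normally distributed, which is a simplification of mine and not a claim from the book. The mean of 500 and standard deviation of 100 echo the example in item 6; the cutoff, the 50-point gain, and the 95% target are hypothetical, as are the function names.

```python
from statistics import NormalDist


def fraction_above(mean, sd, cutoff):
    """Share of students scoring above `cutoff`, assuming normal scores."""
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)


def sd_needed(mean, cutoff, target_fraction):
    """Standard deviation at which exactly `target_fraction` of students
    score above `cutoff`, assuming normal scores and a mean above the cutoff."""
    if not (0.5 < target_fraction < 1.0) or mean <= cutoff:
        raise ValueError("need mean > cutoff and 0.5 < target_fraction < 1")
    # z-score of the cutoff relative to the mean (negative here).
    z = NormalDist().inv_cdf(1.0 - target_fraction)
    return (cutoff - mean) / z


if __name__ == "__main__":
    cutoff = 500  # hypothetical "proficient" threshold

    # Today: mean 500, SD 100, so half the students clear the cutoff.
    print(round(fraction_above(500, 100, cutoff), 3))

    # Suppose the average rises by 50 points and the spread stays the same.
    print(round(fraction_above(550, 100, cutoff), 3))

    # Spread required for 95% of students to clear the cutoff at mean 550.
    print(round(sd_needed(550, cutoff, 0.95), 1))
```

Under these assumptions, a 50-point rise in the average leaves about 69% of students above the cutoff, and getting 95% above it would require the standard deviation to shrink from 100 to roughly 30. That is the kind of compression Koretz says has never been observed.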

Final thoughts

If you think it’s a good idea to make high-stakes decisions about schools and teachers based on standardized test results, Koretz’s book offers several clear warnings.

First, we should expect any high-stakes test to be gamed. Worse yet, the more reliable tests, being more predictable, are probably easier to game (look at the SAT prep industry).

Second, the more (statistically) reliable tests, by their controlled nature, cover only a limited sample of the domain we want students to learn. Tests trying to cover more ground in more depth (“tests worth teaching to,” in the parlance of the last decade) will necessarily have noisier results. This noise is a huge deal when you realize that high-stakes decisions about teachers are made based on just two or three years of test scores.
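
The link between how many questions a test has and how noisy its results are can be sketched with a toy model in Python. The model is an assumption of mine, not something from the book: a student answers each question correctly with the same fixed probability, independently of the others, so the percent-correct score follows a binomial distribution. The 70% success rate and the test lengths of 100 and 25 questions are hypothetical.

```python
import math
import random
import statistics


def score_sd(p_correct, n_questions):
    """Analytic standard deviation, in percentage points, of the
    percent-correct score under the binomial toy model."""
    return 100.0 * math.sqrt(p_correct * (1.0 - p_correct) / n_questions)


def simulate_scores(p_correct, n_questions, n_sittings, seed=0):
    """Percent-correct scores for one student sitting the test repeatedly."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sittings):
        correct = sum(rng.random() < p_correct for _ in range(n_questions))
        scores.append(100.0 * correct / n_questions)
    return scores


if __name__ == "__main__":
    for n_questions in (100, 25):
        sims = simulate_scores(0.7, n_questions, n_sittings=2000)
        print(
            n_questions,
            round(score_sd(0.7, n_questions), 2),    # analytic SD
            round(statistics.pstdev(sims), 2),       # simulated SD
        )
```

In this toy model, cutting the test from 100 questions to 25 doubles the analytic standard deviation of the same student’s score, from about 4.6 to about 9.2 percentage points. Real tests are messier than this, but the direction of the effect is the one described above: fewer, bigger tasks mean noisier scores.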

Third, a test that aims to distinguish “proficiency” will do a worse job of distinguishing students elsewhere in the skills range, and may be largely irrelevant for teachers whose students are far away from the proficiency cut-off. (For a truly distressing example of this, see here.)

With so many obstacles to rating schools and teachers reliably based on standardized test scores, is it any surprise that we see results like this?