An Interview And A Notebook

Interview on Junk Charts

Yesterday I was featured on Kaiser Fung’s Junk Charts blog in an interview where he kindly refers to me as a “Numbersense Pro”. Before this week, my strongest connection with Kaiser Fung was through Andrew Gelman’s meta-review of my review and Kaiser’s review of Nate Silver’s book The Signal and the Noise.

IPython Notebook in Data Journalism

Speaking of Nate Silver, Brian Keegan, a quantitative social scientist from Northeastern University, recently built a very cool IPython notebook (hat tip Ben Zaitlen), complete with a blog post in markdown on the need for openness in journalism (also available here). It revisits a FiveThirtyEight article originally written by Walt Hickey on the subject of women in film. Keegan’s notebook is truly a model of open data journalism, and the underlying analysis is also interesting, so I hope you have time to read it.

Defining poverty #OWS

I am always amazed by my Occupy group, and yesterday’s meeting was no exception. We decided to look into redefining the poverty line, and although the conversation took a moving and deeply philosophical turn, I’ll probably only have time to talk about the nuts and bolts of formulas this morning.

The poverty line, or technically speaking the “poverty threshold,” is the same as it was in 1964 when it was invented except for being adjusted for inflation via the CPI.

In the early 1960s, it was noted that poor families spent about a third of their money on food. To build an “objective” measure of poverty, then, they decided to measure the cost of an “economic food budget” for a family of a given size and then multiply that cost by 3.

Does that make sense anymore?

Well, no. Food has gotten a lot cheaper since 1964, and other stuff hasn’t. According to the following chart, which I got from The Atlantic, poor families now spend about one sixth of their money on food:

Rich people spend even less on food.


Now if you think about it, the formula should be more like “economic food budget” * 6, which would effectively double all the numbers.
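To make the arithmetic concrete, here is a minimal Python sketch of the threshold rule described above: divide a family’s minimal food budget by the share of income poor families spend on food. That is the same as multiplying by 3 when the share is a third, and by 6 when it’s a sixth. The function name and the $6,000 food budget are hypothetical, picked only for illustration.

```python
from fractions import Fraction


def poverty_threshold(food_budget: Fraction, food_share: Fraction) -> Fraction:
    """Scale a minimal "economic food budget" up to a full poverty threshold.

    food_share is the fraction of income poor families spend on food, so the
    multiplier is its reciprocal: a share of 1/3 gives x3, 1/6 gives x6.
    """
    if not 0 < food_share <= 1:
        raise ValueError("food_share must be in (0, 1]")
    return food_budget / food_share


# Hypothetical annual food budget for one family size, in dollars.
food_budget = Fraction(6000)

old_rule = poverty_threshold(food_budget, Fraction(1, 3))  # 1960s: food is a third of spending
new_rule = poverty_threshold(food_budget, Fraction(1, 6))  # today: food is about a sixth

print(old_rule, new_rule, new_rule / old_rule)  # 18000 36000 2
```

Whatever food budget you plug in, the ratio between the two rules is exactly 2, which is the “effectively double all the numbers” point.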

Does this matter? Well, yes. Various programs like Medicare and Medicaid determine eligibility based on poverty. Also, the U.S. Census Bureau measures poverty in our country using this yardstick. If we doubled those numbers, we would see a huge surge in the official poverty count.

Not that we’d be capturing everyone even then. The truth is, in some locations, like New York, rent is so high that the formula would likely need even more adjustment. Then again, food is expensive there too, so maybe the base “economic food budget” would simply need adjusting.

As usual the key questions are, what are we accomplishing with such a formula, and who is “we”?

Categories: #OWS, modeling, statistics

Aunt Pythia’s advice

Aunt Pythia is psyched to be writing today after missing a couple of days of regular posts. Aunt Pythia loves you people and understands how much you rely on her ridiculous advice, so she really goes out of her way to get up on Saturdays, stretch out on the couch in her underwear, armed only with a laptop and copious coffee, and spout utter nonsense. She knows you love it too, and want it to continue indefinitely. So please, after enjoying today’s bilge:

think of something to ask Aunt Pythia at the bottom of the page!

By the way, if you don’t know what the hell Aunt Pythia is talking about, go here for past advice columns and here for an explanation of the name Pythia.


Dear Aunt Pythia,

I am an undergraduate at a liberal arts college applying to REUs. If I don’t get into any, I won’t have any opportunities to do research before applying to PhD programs. Would that mean I won’t get into grad school either? What options do I have to prove I am research-ready?

Possibly Not Research Ready

Dear PNRP,

I’m just blown away by the list of REUs that have sprung up since I was a wee lass. I mean, I went to one, at Duluth, run by the incomparable Joe Gallian, but I’m more or less sure it was the only one around back then. He’s been doing it since 1977, and although I wasn’t at the very first one, I was early enough that all the participants’ names could fit on one shirt. Holy crap, there’s a picture of me at my REU on this page:


Man, we played a lot of bridge that summer.

OK sorry for the nostalgic stream of consciousness. I will now attempt to answer your question.

First of all, given that very few people used to do REUs before grad school, I obviously don’t think one is strictly necessary. On the other hand, given how many now exist, I’m guessing it’s become a common if not vital signaling device for getting into grad school (readers, weigh in!). They’ve also probably gotten easier to get into. Definitely apply to many of them.

If you somehow didn’t apply to enough and it’s too late and you don’t get in anywhere, don’t despair. Look around for a teacher at your school, at a nearby school, or even online who is willing to do a reading course with you and develop some kind of senior-thesis-type project, or an open problem to solve.

I feel that I need to add that most of these programs don’t actually ask you to solve open problems. It’s more like a peek at graduate school math and a research mindset than an expectation that you will publish a paper. I know because I’ve taught at a couple since my college years.

Good luck!

Aunt Pythia


Dear Aunt Pythia,

What kind of skills/classes do I need to break into data science as an undergrad? It seems like a really interesting field and I don’t know whether I’m qualified to jump into an internship or not. Currently a math major without any stats classes under my belt.

Data Internship Youngster

Dear DIY,

First steps: take a CS class in a scripting language like Python, take probability and statistics, and read my recent book or at least skim it at the bookstore.

Second steps, if you have time: take classes on machine learning, Bayesian statistics, and ethics.

Third steps, if you have even more time: take an advanced programming class, an optimization or information theory class, and become an anthropologist.

In the meantime, keep an eye on the curricula of the industry data science camps now popping up everywhere, for example at the Microsoft Research Center.

Good luck!

Aunt Pythia


Dear Pythia,

Thank you for answering my question about “fairness” rankings by mentioning the Gini coefficient and upward mobility study, both of which I found interesting and hadn’t seen before.

Though obviously major, money wasn’t the only thing I had in mind. Judicial systems that imprison unfairly – possibly due to unfair laws to begin with – unfair job and housing discrimination, unfair environmental conditions and situations (Bhopal comes to mind), reasonable access to medical care, or lack thereof – all of these could be tossed into a fairness score as well.

I guess that in the end though, “fairness” may be a little too vague and subjective a term to be attached to any meaningful objective ranking. Fortunately the world already has lots of watchdog organizations that observe and report on objectively measurable facets of human life. OWS is one such organization.

Thanks again,


First of all, that’s Aunt Pythia to you. Har har.

Actually, even though it doesn’t appear that you’ve asked another question, I want to thank you for giving me an opening to my favorite recent rant.

In the context of my weekly Occupy meetings, I’ve been thinking more and more about the outrageous prison system in our own country and the multitude of mostly minority young men in that system. It’s a truly disgusting and predatory big business. As one of my co-occupiers said, if you’re too poor for us to make money off of you directly, we will throw you in prison and make money off of your incarceration.

Which brings me to your idea of measuring that kind of unfairness, even within our own country, and indeed even within the city of New York. Here’s the idea I’ve been tossing around inside my head.

It’s long been noted that the rate of marijuana use is similar for whites and blacks, but blacks are going to jail far more often for possession, essentially because of Stop & Frisk. In other words, blacks are more likely to get caught and less likely to have a fancy lawyer to get them out of trouble.

It brings up a host of questions, but I’ll focus on one: what is the relative chance that someone can get away with a mistake?

In other words, think about it this way. We all make mistakes, and young men (and women) are especially impulsive and lacking in judgment. So instead of asking whether they make mistakes, ask what the chances are that those mistakes will land them in jail or prison. I feel like those probabilities might be a good start at what you’re getting at. Do you agree?
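Here is one way to pin down “the chances that a mistake lands you in jail” as a number: for each group, divide the number of people jailed for an offense by the estimated number of people committing it, then compare groups. This is a minimal Python sketch, assuming offense rates can be estimated separately (say, from surveys); the group labels and every count and rate below are made up purely to show the arithmetic, not real data.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int
    offense_rate: float  # assumed estimate of the share of the group making the "mistake"
    jailed: int          # number of people in the group jailed for it

    def p_jail_given_mistake(self) -> float:
        """Estimated probability that someone in this group who makes the mistake is jailed."""
        offenders = self.population * self.offense_rate
        return self.jailed / offenders


def relative_chance(a: Group, b: Group) -> float:
    """How many times more likely a mistake is to land someone from group a in jail than from group b."""
    return a.p_jail_given_mistake() / b.p_jail_given_mistake()


# Hypothetical numbers: same population, same offense rate, different jail counts.
group_a = Group("A", population=1_000_000, offense_rate=0.12, jailed=600)
group_b = Group("B", population=1_000_000, offense_rate=0.12, jailed=3_600)

print(relative_chance(group_b, group_a))  # roughly 6
```

When the offense rates are equal, as the post suggests they roughly are for marijuana use, the comparison reduces to comparing jail counts per capita; the hard part in practice is estimating the offense rates honestly.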


Aunt Pythia


Dear Aunt P,

My question may appear a little blunt but it’s one that’s troubled me for ages and I can’t think of any other way to ask it, so here goes: Does clitoris size reflect sexuality?

I mean, might larger be associated with more dominant or further along the hetero/homo-sexual scale, for example?

My follow-up question is, how would one go about assessing this? No, I don’t mean for you to say ‘with warm hands and a micrometer’, but to suggest the mathematical parameters and procedures.

Jenny Taylor

Dear …umm… Jenny,

I’m going to say no. I have the following reason for this answer, with exactly zero evidence gathered and assessed: it’s patently untrue of penises, which we all think about all the time in this society, so why should it be true of clitorides? Yes, that’s the plural of clitoris, I looked it up.

Now it’s true that a given woman’s clitoris ebbs and flows depending on how sexually stimulated she is, but other than that I think you can just assume randomness.

As far as follow-up, I’m gonna have to say: none needed, but if you want to turn this into a weird pick-up line at a bar, then more power to you.

Aunt Pythia


Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

What Monsanto and college funds have in common


I recently read this letter to the editor written by Catharine Hill, the President of Vassar College, explaining why reducing family contributions in college tuition and fees isn’t a good idea. It was in response to this Op-Ed by Steve Cohen about the onerous “E.F.C.” system.

Let me dumb down the debate a bit for the sake of simplicity. Steve is on one side basically saying, “college costs too damn much, it didn’t used to cost this much!” and Catharine is on the other side saying, “colleges need to compete! If you’re not willing to pay then someone else will!”

Here’s the thing: there’s an arms race among colleges driving up costs. Through some perverse combination of gaming the U.S. News & World Report rankings model, responding to the incentives of the federal student loan system, and political decisions methodically removing funding from state colleges, college costs have been rising wildly.

And when you have an arms race, as I’ve learned from Tom Slee, the only solution is an armistice. In this case an armistice would translate into something like an agreement among colleges to set a maximum and reasonable tuition and fee structure. Sounds good, right? But an armistice won’t happen if the players in question are benefitting from the arms race. In this case parents are suffering but colleges are largely benefitting.
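The arms-race logic can be sketched as a textbook two-player game, a prisoner’s dilemma. In this minimal Python sketch, two colleges each choose to “restrain” or “escalate” spending. The payoff numbers are hypothetical, chosen so that escalating is each college’s best move no matter what the other does, even though both would do better by jointly restraining.

```python
from itertools import product

ACTIONS = ("restrain", "escalate")

# Hypothetical payoffs to (row college, column college); higher is better for that college.
PAYOFFS = {
    ("restrain", "restrain"): (3, 3),
    ("restrain", "escalate"): (0, 5),
    ("escalate", "restrain"): (5, 0),
    ("escalate", "escalate"): (1, 1),
}


def best_response(opponent_action: str, player: int) -> str:
    """Action that maximizes `player`'s payoff (player 0 = row, 1 = column) given the opponent's action."""
    def payoff(my_action: str) -> int:
        profile = (my_action, opponent_action) if player == 0 else (opponent_action, my_action)
        return PAYOFFS[profile][player]
    return max(ACTIONS, key=payoff)


def nash_equilibria() -> list[tuple[str, str]]:
    """Pure-strategy profiles in which each college is best-responding to the other."""
    return [
        (a, b) for a, b in product(ACTIONS, repeat=2)
        if best_response(b, 0) == a and best_response(a, 1) == b
    ]


print(nash_equilibria())  # [('escalate', 'escalate')]
```

The only stable outcome is mutual escalation, even though mutual restraint pays each college 3 instead of 1; an armistice is a binding agreement that takes escalation off the table. Note that these payoffs count only the colleges. In the college-cost case the bill lands on parents, so mutual escalation may not hurt the colleges at all, which is exactly why nobody signs.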


This recent Salon article detailing the big data approach that Monsanto is taking to their massive agricultural empire is in the same boat.

The idea is that Monsanto has bought up a bunch of big data firms and satellite firms to perform predictive analytics on a massive scale for farming. And they are offering farmers who are already internal to the Monsanto empire the chance to benefit from their models.

Farmers are skeptical of using the models, because they are worried about how much data Monsanto will be able to collect about them if they do.

But here’s the thing, farmers: Monsanto already has all your data, and will have it forever, due to their surveillance. They will know exactly what you plant, where, and how densely.

And what they are offering you is probably actually a benefit to you, but of course the more important thing for them is that they are explicitly creating an arms race between Monsanto farmers and non-Monsanto farmers.

In other words, if they give Monsanto farmers an extra boost, it will lead other farmers to the conclusion that, without such a boost, they won’t be able to keep up, and they will be forced into the Monsanto system by economic necessity.

Again an arms race, and again no armistice in sight, since Monsanto is doing this deliberately for the sake of its bottom line. Assuming their models are good, the only way for non-Monsanto farmers to avoid this is to build their own predictive models, but clearly that would require enormous investment.
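One hypothetical way the “forced in by economic necessity” mechanism could play out is a price squeeze: adopters grow more, extra supply pushes the market price down, and non-adopters whose revenue no longer covers their costs switch over, which pushes the price down further. The Python sketch below is an illustration under made-up assumptions (one market with a linear price curve, fixed yields, and “adopt once you’re losing money” behavior); the function and all its parameters are hypothetical, and it is not a model of Monsanto’s actual business.

```python
def forced_adoption(costs: list[int], adopted: set[int], base_yield: int = 10,
                    boost: int = 5, demand_intercept: float = 10.0,
                    demand_slope_divisor: int = 10) -> list[list[int]]:
    """Return, round by round, which farmers get squeezed into adopting.

    Hypothetical model: all farmers sell into one market whose price falls
    linearly as total output rises. Adopters get `boost` extra yield. A
    non-adopter whose revenue no longer covers its cost adopts next round.
    """
    adopted = set(adopted)  # copy so the caller's set isn't mutated
    rounds = []
    while True:
        output = sum(base_yield + (boost if i in adopted else 0) for i in range(len(costs)))
        price = demand_intercept - output / demand_slope_divisor
        squeezed = {i for i in range(len(costs))
                    if i not in adopted and price * base_yield < costs[i]}
        if not squeezed:
            return rounds
        rounds.append(sorted(squeezed))
        adopted |= squeezed


# Five hypothetical farmers with different costs; farmer 0 adopts first.
print(forced_adoption(costs=[40, 42, 44, 46, 48], adopted={0}))  # [[3, 4], [1, 2]]
```

With these numbers farmers 3 and 4 are squeezed first, and their extra output squeezes farmers 1 and 2 a round later; by the end the market price has fallen from 4.5 to 2.5, so even the original adopter is earning less than when it started.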

Categories: arms race, modeling

Navigating sexism does not mean accepting sexism

Not enough time for a full post this morning, but I’d like people to read a New York Times article ironically entitled Moving Past Gender Barriers to Negotiate a Raise (hat tip Laura Strausfeld). It has amazing and awful tidbits like the following:

“It’s totally unfair because we don’t require the same thing of men. But if women want to be successful in this domain, they need to pay attention to this.”

If you read on you realize that what they mean by “pay attention to” is “roll over and conform to stereotypes”. Super gross, and fuck that.

I feel like this is a more subtle, New York Times version of Susan Patton’s terrible advice for young women on snaring husbands. What happened to the feminists?!!

Categories: rant

Lobbyists have another reason to dominate public commenting #OWS

Before I begin this morning’s rant, I need to mention that, since I’ve recently taken on a new job and I’m still trying to write a book, I don’t expect to be able to blog as regularly as I have been. It pains me to say it, but my posts will become more intermittent until this book is finished. I’ll miss you more than you’ll miss me!

On to today’s bullshit modeling idea, which was sent to me by both Linda Brown and Michael Crimmins. It’s a new model built in part by Andrei Kirilenko, the former chief economist of the Commodity Futures Trading Commission (CFTC), who is now a finance professor at MIT Sloan. In case you don’t know, the CFTC is the regulator in charge of futures and swaps.

I’ll excerpt this New York Times article which describes the model:

The algorithm, he says, uncovers key word clusters to measure “regulatory sentiment” as pro-regulation, anti-regulation or neutral, on a scale from -1 to +1, with zero being neutral.

If the number assigned to a final rule is different from the proposed one and closer to the number assigned to all the public comments, then it can be inferred that the agency has taken the public’s views into account, he says.
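Taken at face value, the inference rule in the excerpt is simple enough to write down. Here is a minimal Python sketch of it as I read the newspaper’s description, not the paper’s actual code; the function name is made up, and since the excerpt doesn’t say which end of the -1 to +1 scale is pro-regulation, the example values are just numbers on that scale.

```python
def took_public_into_account(proposed: float, final: float, comments: float) -> bool:
    """Paraphrase of the rule quoted above (hypothetical helper, not the paper's code).

    Each argument is a "regulatory sentiment" score on the -1 to +1 scale:
    for the proposed rule, the final rule, and all public comments pooled.
    The agency counts as responsive if the final score differs from the
    proposed one AND sits closer to the comments' score.
    """
    for score in (proposed, final, comments):
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [-1, 1]")
    moved = final != proposed
    closer_to_comments = abs(final - comments) < abs(proposed - comments)
    return moved and closer_to_comments


print(took_public_into_account(proposed=0.4, final=0.1, comments=-0.3))  # True
print(took_public_into_account(proposed=0.4, final=0.4, comments=-0.3))  # False
```

Everything hinges on that single `comments` number standing in for “the public”.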

Some comments:

  1. I know really smart people who use similar sentiment algorithms on word clusters. I have no beef with the underlying NLP algorithm.
  2. What I do have a problem with is the apparent assumption that “the number assigned to all the public comments” makes any sense, and in particular that it reflects “the public’s views”.
  3. It sounds like the algorithm dumps all the public comment letters into a pot and mixes it together to get an overall score. The problem with this is that the industry insiders and their lobbyists overwhelm public commenting systems.
  4. For example, go take a look at the list of public letters for the Volcker Rule. It’s not unlike this graphic on the meetings of the regulators on the Volcker Rule.
  5. Besides dominating the sheer number of letters, I’ll bet the length of each letter is also much longer on average for such parties with very fancy lawyers.
  6. Now think about how the NLP algorithm will deal with this in a big pot: it will be dominated by the language of the pro-industry insiders.
  7. Moreover, if such a model were used directly, say to check whether public comment letters were heeded in a given case, lobbyists would have even more reason to overwhelm public commenting systems.
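To see how points 3, 5 and 6 play out, here is a minimal Python sketch comparing “dump everything in one pot” (every word counts once, so long letters count more) with one alternative that first averages within each kind of commenter. The letters, scores and word counts are invented, the industry letters are simply assumed to lean one way (negative here), and the per-group weighting is just one illustrative choice, not something the paper or the article proposes.

```python
from collections import defaultdict

# Hypothetical comment letters: (who wrote it, sentiment score in [-1, 1], length in words).
letters = [
    ("industry", -0.8, 12_000),
    ("industry", -0.7, 9_000),
    ("industry", -0.9, 15_000),
    ("public", 0.6, 300),
    ("public", 0.9, 250),
]


def pooled_score(letters):
    """'Everything in one pot': a word-weighted average, so long letters dominate."""
    total_words = sum(n for _, _, n in letters)
    return sum(score * n for _, score, n in letters) / total_words


def per_group_score(letters):
    """Average within each group of commenters first, then weight the groups equally."""
    by_group = defaultdict(list)
    for who, score, _ in letters:
        by_group[who].append(score)
    group_means = [sum(v) / len(v) for v in by_group.values()]
    return sum(group_means) / len(group_means)


print(round(pooled_score(letters), 3), round(per_group_score(letters), 3))
```

Here the pooled score comes out near -0.79, essentially the industry’s position, while the group-balanced score lands close to zero; the same letters tell a very different story depending on how the pot is stirred.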

The take-away is that this is an amazing example of a so-called objective mathematical model set up to legitimize the watering down of financial regulation by lobbyists.


Update: I’m willing to admit I might have spoken too soon. I look forward to reading the paper on this algorithm and taking a deeper look instead of relying on a newspaper.

Categories: #OWS, finance, modeling, rant

Aunt Pythia’s advice

Aunt Pythia is so very pleased to bring you more of her pearls of wisdom this nearly-believably-spring morning.

In celebration of above-freezing temperatures, she’s extra cheerful and she welcomes the clouds and drizzle. After all, late March showers bring late April flowers, or something like that! Let there be blooming and cleansing!

And please, after you enjoy Aunt Pythia’s wisdom, and possibly after you clean out the front closet, please don’t forget to:

think of something to ask Aunt Pythia at the bottom of the page!

By the way, if you don’t know what the hell Aunt Pythia is talking about, go here for past advice columns and here for an explanation of the name Pythia.


Dear Aunt Pythia,

What the hell is goin’ on with Bitcoin? Will it survive into the future (or something else akin to it) or is it ultimately doomed???

Bitcoin Boogie

p.s. – I hope you realize you’ll probably have more success explaining quantum mechanics to me than Bitcoin.

Dear BB,

I promise not to try to explain Bitcoin’s underlying algorithms. But I think I can still answer your questions.

First of all, Bitcoin has been in the news lately in bad or confusing ways, first with the exchange (Mt. Gox) that went bankrupt, and second because regulators and institutional authorities are having trouble figuring out what Bitcoins are.

Even so, these hiccups are just growing pains, according to Coinbase co-founder and former Goldman Sachs foreign exchange trader Fred Ehrsam, who is quoted as saying inspiring things like:

I would go to the bathroom and trade bitcoin on my smartphone and then return to my real desk to do my real job trading real currency.

If you don’t know about it, Coinbase is the “digital wallet” company that you’d probably sign up with if you wanted Bitcoins and you weren’t a huge nerd or a criminal willing to do things on the technical down-low: it makes owning Bitcoins easy, like signing up for a normal checking account.

And they are seeing lots of people joining, and they just got Overstock to accept Bitcoins as payment. So Ehrsam and people like him are pretty positive, and you never know.

Between you and me, though, I think the biggest competitor out there is Google, which has plans to let people send money over Gmail (hat tip Suresh Naidu). Instead of paying heavy fees, you – guess what – tell Google about your checking accounts and other financial information. I see this potentially competing with banks, Apple, and of course PayPal, which sucks.

I hope that helps!

Aunt Pythia


Dear Aunt Pythia,

I am originally from a country where it’s normal to be sentimental. I am easily moved to tears and worry that this annoys others around me. Of course I can take counter-measures, for example I try to steady myself if the music is becoming emotional or before viewing some breathtakingly beautiful scenery, or, when news about a disaster or a sad film is being shown on the television I discreetly leave the room before it affects me.

I would like to be strong enough to withstand what appears to provoke no reaction in people here. Do you have any suggestions?

Too Sensitive

Dear Too,

I hear you, I’m a huge crier too. I blame the Irish side of my family.

What I do is I playfully prepare people I’m around, for their own comfort, and especially when they are not familiar with this side of me. So when I feel some sentimentality coming on, I’ll announce, “Hey I’m about to totally cry, because that’s what I do! Please bear with me and please ignore the tears, I’ll be OK in 10 minutes or less.” and then I’ll laugh, usually out of embarrassment.

That way they will know I realize it’s about me, not them, and that they’re not responsible for comforting me in any way. It works great, and it’s easy for me to do because I’m an extrovert. If you’re shy, it’s going to be harder, but the alternative is often that you have to explain yourself while you’re crying, which I think is worse.

Good luck!

Auntie P


Dear Aunt Pythia,

I am but a humble traveler trying to win you over with a Firefly reference and desperately seeking your advice.

Come July, I will find myself in New York for a week. I will be in need of a place to stay and some things to do while I’m visiting your fine city.

I have been looking on Airbnb for a place to stay, rather than a hotel or a hostel, but am overwhelmed by all the options. Do I stay in Brooklyn or Lower Manhattan? Harlem or the Upper West Side? I am a young data analyst from New Zealand; what do I know of New York neighborhoods?

And then there is the sightseeing, do I go and tick off all the tourist spots or are there better things for me to do with my time? Do you know any secret spots filled with good food, great coffee and devoid of the fanny-pack wearing, obnoxiously-photographing tourist hordes?


Seeking Habitation In New York

P.S. In New Zealand we call fanny-packs ‘bum-bags’. A fanny in NZ is something entirely different!


I don’t know from Firefly, sorry. But I’ll answer you anyway and let readers add their opinions.

I’d suggest you stay in a different neighborhood every night or two. That way you get to see more of New York, and any annoyance is short-lived. Most of your time will be spent traveling from place to place, so pack light. Make sure at least one night is in Astoria, Queens, which is just cool and kind of the epitome of the melting pot.

The reason I suggest this is that, for me, official tourist destinations are incredibly boring and expensive for what they offer (and what they offer is bum-bag bearing tourists, which you can already see in NZ anyway). I mean, if you think you’ll regret not going to the top of the Empire State Building, then by all means go, but go 10 minutes before they open and depart quickly.

Authentic sight-seeing in New York City consists, in my opinion, of walking through neighborhoods and checking out bars and restaurants and the local cultural gathering places. Look for live music in each neighborhood you stay in, if you like that sort of thing. Or if you are into food, make a plan for a foodie tour of each neighborhood. Yum!

Aunt Pythia


Dear Aunt Pythia,

In searching online dating profiles in New York City (I live nearby), I encounter a bunch of profiles of finance professionals working in, say, investment banking. After reading your blog, I have become convinced that people who work in banking

1) are morally bankrupt
1.5) are swindlers
2) are not very thoughtful in regards to the concerns of the 99%
3) are greedy
4) are arrogant … they think they are the best and the brightest, and point to the fake wealth they created to justify their salary
5) are overworked, stressed out at work, and their job is slowly killing them physically and emotionally
6) have expectations of a lavish lifestyle (nothing wrong with that, just not for me…I can’t compete, and perhaps mo money mo problems)

Am I right or am I right? Should I even bother expressing an interest in these profiles?

Just Pondering

Dear Just,

There are two questions here, which I’d like to pose separately.

First, are investment bankers morally bankrupt swindlers who ignore lesser folk and hate their jobs?

Second, how do I optimize my chances of finding love – or at least great sex with a tolerable partner – on an online dating site?

The answer to the first question is, of course not. There are plenty of people in finance, and even in investment banking, who are perfectly nice and even sensitive and empathetic. On the other hand, there is some story explaining why they’re there, and it often exposes a weird side to them. On the other other hand, who here doesn’t have a weird side? On the whole I’d say, never disqualify someone on one attribute, especially if they otherwise seem great and you find yourself liking them at a basic human level.

The answer to the second question is a lot trickier, though, and is related to the first in the following sense: if you are playing the numbers – which is all you can do on these websites – then you might well decide to avoid investment bankers. After all, you only have so much time and so many free Friday nights, and you want to optimize for the best chance of liking someone. All you have is demographic information like their job and age, and even if you gather more information through emails, you might first want to filter out red flags, and you might find “investment banker” to be a red flag.

As an aside, I would love someone to do a quantitative and qualitative investigation to see how people have changed their dating and mating habits through online dating. It seems like the most profound area of the internet affecting cultural practices.

My bottom-line suggestion is to try to find a date through a friend of a friend. Good luck!

Aunt Pythia


Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!


Categories: Aunt Pythia
