
Tailored political ads threaten democracy

Not sure if you saw this recent New York Times article on the new data-driven political ad machines. Consider, for example, the 2013 Virginia governor's race, won by Terry McAuliffe:

…the McAuliffe campaign invested heavily in both the data and the creative sides to ensure it could target key voters with specialized messages. Over the course of the campaign, he said, it reached out to 18 to 20 targeted voter groups, with nearly 4,000 Facebook ads, more than 300 banner display ads, and roughly three dozen different pre-roll ads — the ads seen before a video plays — on television and online.

Now I want you to close your eyes and imagine what kind of numbers we will see for the current races, not to mention the upcoming presidential election.

What’s crazy to me about the Times article is that it never questions the implications of this movement. The biggest problem, it seems, is that the analytics have surpassed the creative work of making ads: there are too many segments of populations to tailor the political message to, and not enough marketers to massage those particular messages for each particular segment. I’m guessing that there will be more money and more marketers in the presidential campaign, though.

Translation: politicians can and will send different messages to individuals on Facebook, depending on what they think we want to hear. Not that politicians follow through with all their promises now – they don’t, of course – but imagine what they will say when they can make a different promise to each group. We will all be voting for slightly different versions of a given story. We won’t even know when the politician is being true to their word – which word?

This isn’t the first case of different messages for different groups, of course. Romney’s “47%” speech was a famous example of messaging tailored to super-rich donors. But it was secretly recorded by a bartender working the event. There will be no such bartenders around when people read their emails and see ads on Facebook.

I’m not the only person worried about this. For example, ProPublica studied this in Obama’s last campaign (see this description). But given the scale of the big data political ad operations now in place, there’s no way they – or anyone, really – can keep track of everything going on.

There are lots of ways that “big data” is threatening democracy. Most of the time, it does so by taking decisions out of open discussion and handing them to anonymous and inaccessible quants; think evidence-based sentencing or value-added modeling for teachers. But these tailored campaign ads are a more direct attack on the concept of a well-informed public choosing its leaders.

Categories: data science, modeling, rant

Core Econ: a free economics textbook

Today I want to tell you guys about Core Econ, a free (although you do have to register) textbook that my buddy Suresh Naidu is teaching out of this semester and is also contributing to, along with a bunch of other economists.


This was obviously not taken in New York.

It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.

Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is “teaching economics as if the last three decades happened.”

What does that mean? Let’s take a look at the first few chapters of the curriculum (the full list is here):

  1. The capitalist revolution
  2. Innovation and the transition from stagnation to rapid growth
  3. Scarcity, work and progress
  4. Strategy, altruism and cooperation
  5. Property, contract and power
  6. The firm and its employees
  7. The firm and its customers

Once you register, you can download a given chapter in pdf form. So I did that for Chapter 6, The firm and its employees, and here’s a screenshot of the first page:

Still dry but at least real.


The chapter immediately dives into a discussion of Apple and Foxconn. Interesting! Topical! Like, it might actually help you understand the newspaper!! Can you imagine that?

The project is still in beta version, so give it some time to smooth out the rough edges, but I’m pretty excited about it already. It has super high production values and will squarely compete with the standard textbooks and curriculums, which is a good thing, both because it’s good stuff and because it’s free.

The war against taxes (and the unmarried)

The American Enterprise Institute, a conservative think tank, is releasing a report today. It’s called For richer, for poorer: How family structures economic success in America, and there is also an event in DC today from 9:30am until 12:15pm that will be livestreamed. The report looks at statistics across various races and income levels to see how marriage is associated with increased hours worked and income, especially for men.

It uses a technique called the “fixed-effects model,” and since I’d never studied it, I looked it up on the Wikipedia page, in this worked-out example of massage prices in various cities on Josh Blumenstock’s webpage, and in this example on Richard Williams’s webpage, which also uses a logit model, about girls in and out of poverty.

The critical thing to know about fixed effects models is that we need more than one snapshot of an object of interest – in this case a person who is or isn’t married – in order to use that person as a control against themselves. So in 1990 Person A is 18 and unmarried, but in 2000 he is 28 and married, and makes way more money. Similarly, in 1990 Person B is 18 and unmarried, but in 2000 he is 28 and still unmarried, and makes more money but not quite as much more money as Person A.
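To make the person-as-their-own-control idea concrete, here is a minimal sketch of the textbook "within" fixed-effects estimator in Python with NumPy. It is not the AEI's code or specification: the function name `within_estimator` and the incomes below are hypothetical, chosen only to mirror Persons A and B. Each person's own average is subtracted from their rows, so anything constant about that person drops out, and a dummy for the year 2000 soaks up the raise that both people got.

```python
import numpy as np

def within_estimator(person_ids, X, y):
    """Fixed-effects ("within") regression: demean X and y within each person,
    then run ordinary least squares on the demeaned data (no intercept needed)."""
    person_ids = np.asarray(person_ids)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_dm = np.empty_like(X)
    y_dm = np.empty_like(y)
    for pid in np.unique(person_ids):
        rows = person_ids == pid
        # Subtracting each person's own mean removes their fixed, unobserved traits.
        X_dm[rows] = X[rows] - X[rows].mean(axis=0)
        y_dm[rows] = y[rows] - y[rows].mean()
    coef, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return coef

# Hypothetical panel mirroring Persons A and B (income in $1,000s).
# Columns: married (0/1), observed in 2000 (0/1).
people = ["A", "A", "B", "B"]
X = [[0, 0],   # A in 1990: unmarried
     [1, 1],   # A in 2000: married
     [0, 0],   # B in 1990: unmarried
     [0, 1]]   # B in 2000: still unmarried
income = [20, 60, 30, 55]

married_effect, year_effect = within_estimator(people, X, income)
print(round(married_effect, 2), round(year_effect, 2))  # 15.0 25.0
```

Here B's raise identifies the year effect, and only A's extra raise is attributed to marriage. This controls for fixed traits of each person, but it still does not establish causality: anything that changes over time along with marriage gets loaded onto the marriage coefficient.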

The AEI report cannot claim causality – and even notes as much on page 8 of the report – so instead it talks about a bunch of “suggested causal relationships” between marriage and income. But really what they are seeing is that, as men get more hours at work, they also tend to get married. Not sure why the married thing would cause the hours, though. As women get married, they tend to work fewer hours. I’m guessing this is because pregnancy causes both.

The AEI report concludes, rightly, that people who get married, and come from homes where there were married parents, make more money. But that doesn’t mean we can “prescribe” marriage to a population and expect to see that effect. Causality is a bitch.

On the other hand, that’s not what the AEI says we should do. Instead, the AEI is recommending (what else?) tax breaks to encourage people to get married. Most bizarre of their suggestions, at least to me, is to expand tax benefits for single, childless adults to “increase their marriageability.” What? Isn’t that also an incentive to stay single and childless?

What I’m worried about is that this report will be cleverly marketed, using the phrase “fixed effects,” to make it seem like they have indeed proven “mathematically” that individuals, yet again, are to blame for our nation’s structural employment problems, and that if they would only get married already we’d all be OK and have great jobs. All problems will be solved by tax breaks.

Categories: economics, modeling, rant

Guest post: Clustering and predicting NYC taxi activity

This is a guest post by Deepak Subburam, a data scientist who works at Tessellate.



Greetings fellow Mathbabers! At Cathy’s invitation, I am writing here about, a public service web app my co-founder and I have developed. It overlays estimated taxi activity on a Google map around you, expressed as the expected number of passenger pickups and dropoffs in the current hour. We modeled these estimates from the recently released 2013 NYC taxi trips dataset comprising 173 million trips, the same dataset that Cathy’s post last week on deanonymization referenced. Our work will not help you stalk your favorite NYC celebrity, but it can guide your search for a taxi and maybe save you some commute time. My writeup below takes you through the four broad stages of our work: data extraction and cleaning, clustering, modeling, and visualization.

We extract three columns from the data: the longitude and latitude GPS coordinates of the passenger pickup or dropoff location, and the timestamp. We make no distinction between pickups and dropoffs, since both of these events imply an available taxicab at that location. The data was generally clean, with a very small fraction of a percent of coordinates looking bad, e.g. in the middle of the Hudson River. These coordinate errors get screened out by the clustering step that follows.

We cluster the pickup and dropoff locations into areas of high density, i.e. where many pickups and dropoffs happen, to determine where on the map it is worth making and displaying estimates of taxi activity. We rolled our own algorithm, a variation on heatmap generation, after finding existing clustering algorithms such as K-means unsuitable: we are seeking centroids of areas of high density rather than cluster membership per se. See the figure below, which shows the cluster centers identified by our algorithm on a square-mile patch of Manhattan. The axes represent longitude and latitude; the small blue crosses mark a random sample of pickups and dropoffs; and the red numbers mark the identified cluster centers, in descending order of activity.

Taxi activity clusters
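The post doesn't spell out its heatmap variant, so the following is only a guess at the general idea: a minimal NumPy sketch that bins points into a 2D histogram, blurs it with a small [1, 2, 1] kernel, and keeps cells that are local maxima above a density threshold. The function name `density_peaks`, its `bins` and `min_density` parameters, and the toy coordinates are all hypothetical, not Tessellate's code.

```python
import numpy as np

def density_peaks(lons, lats, bins=50, min_density=2.0):
    """Find centers of high-density areas via a smoothed 2D histogram ("heatmap").

    Returns a list of (lon, lat, density) tuples, densest first.
    """
    counts, lon_edges, lat_edges = np.histogram2d(lons, lats, bins=bins)
    n0, n1 = counts.shape

    # Blur the heatmap with a 3x3 binomial kernel so nearby events reinforce each other.
    kernel = np.outer([1, 2, 1], [1, 2, 1]) / 16.0
    padded = np.pad(counts, 1)
    blurred = np.zeros_like(counts)
    for i in range(3):
        for j in range(3):
            blurred += kernel[i, j] * padded[i:i + n0, j:j + n1]

    # A cell is a peak if it is dense enough and no lower than any of its 8 neighbors.
    ext = np.pad(blurred, 1, constant_values=-np.inf)
    is_peak = blurred >= min_density
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                is_peak &= blurred >= ext[1 + di:1 + di + n0, 1 + dj:1 + dj + n1]

    # Report each peak at the center of its histogram cell.
    lon_centers = (lon_edges[:-1] + lon_edges[1:]) / 2
    lat_centers = (lat_edges[:-1] + lat_edges[1:]) / 2
    peaks = [(lon_centers[i], lat_centers[j], blurred[i, j]) for i, j in np.argwhere(is_peak)]
    return sorted(peaks, key=lambda p: p[2], reverse=True)

# Made-up data: two busy corners plus one stray bad coordinate.
lons = [-73.985] * 100 + [-73.975] * 60 + [-74.02]
lats = [40.758] * 100 + [40.750] * 60 + [40.73]
for lon, lat, density in density_peaks(lons, lats):
    print(f"{lon:.4f} {lat:.4f} {density:.1f}")
```

An isolated bad coordinate never reaches the density threshold, which is one way a step like this can screen out errors such as points in the Hudson River.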

We then model taxi activity at each cluster. We discretize time into hourly intervals: for each cluster, we sum all pickups and dropoffs that occur in each hour of 2013. So our datapoints are now triples of the form [<cluster>, <hour>, <activity>], with <hour> being some hour in 2013 and <activity> being the number of pickups and dropoffs that occurred during hour <hour> in cluster <cluster>. We then regress each <activity> against the <activity> values of neighboring clusters and neighboring times. This regression smooths estimates across time and space, smoothing out the effects of special events or weather in the prior year that don’t repeat this year. It required some tricky choices about arranging and aligning the various data elements; not technically difficult or maybe even interesting, but nevertheless likely the better part of an hour at a whiteboard to explain. In other words, typical data science. We then extrapolate these predictions to 2014 by mapping each hour in 2014 to the most similar hour in 2013. So we now have, at each cluster location and for each hour in 2014, a prediction of the number of passenger pickups and dropoffs.
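The exact features aren't given, so here is a hedged sketch of this kind of smoothing regression, assuming the hourly counts sit in a NumPy array with one row per cluster and one column per hour, and that each cluster has a hypothetical list of neighboring clusters. Each count is predicted from the same cluster's previous and next hours plus the mean of its neighbors in the same hour, fit by least squares; the fitted values serve as the smoothed estimates. The name `fit_smoother` and the toy counts are made up.

```python
import numpy as np

def fit_smoother(activity, neighbors):
    """Regress each cluster-hour count on nearby counts and return the fit.

    activity:  array of shape (n_clusters, n_hours) with hourly pickup+dropoff counts.
    neighbors: neighbors[c] is a list of cluster indices considered adjacent to c.
    Returns (coefficients, smoothed predictions, observed targets).
    """
    activity = np.asarray(activity, dtype=float)
    n_clusters, n_hours = activity.shape
    rows, targets = [], []
    for c in range(n_clusters):
        for t in range(1, n_hours - 1):  # skip first/last hour: no neighbor on one side
            nbr = activity[neighbors[c], t].mean() if neighbors[c] else 0.0
            rows.append([activity[c, t - 1], activity[c, t + 1], nbr, 1.0])
            targets.append(activity[c, t])
    X = np.array(rows)
    y = np.array(targets)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef, y

# Toy example: 3 clusters in a row, 6 hours of made-up counts.
counts = np.array([[5, 8, 12, 9, 7, 6],
                   [6, 10, 15, 11, 8, 7],
                   [4, 7, 10, 8, 6, 5]])
coef, smoothed, observed = fit_smoother(counts, neighbors=[[1], [0, 2], [1]])
print(np.round(smoothed, 1))
```

Because the target's own value is never a feature, a one-off spike at a single cluster-hour gets pulled toward what its surroundings suggest, which is the smoothing effect the paragraph describes.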
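The post doesn't say how "most similar hour" was defined. One simple assumption is to match the same weekday and hour of day 52 weeks earlier; the sketch below does that with Python's standard `datetime` module. The function name is hypothetical, and this rule ignores holidays, which a real mapping would probably need to handle.

```python
from datetime import datetime, timedelta

def most_similar_2013_hour(ts):
    """Map a 2014 hour to the 2013 hour with the same weekday and time of day,
    52 weeks (364 days) earlier. Holidays are ignored."""
    candidate = ts - timedelta(weeks=52)
    if candidate.year != 2013:
        # For 2014 inputs, only December 31 lands outside 2013 after the shift;
        # step back one more week to stay in 2013 on the same weekday.
        candidate -= timedelta(weeks=1)
    return candidate

print(most_similar_2013_hour(datetime(2014, 1, 1, 8)))  # 2013-01-02 08:00:00
```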

We display these predictions by overlaying them on a Google map at the corresponding cluster locations. We round <activity> to values like 20 or 30 to avoid giving users number dyslexia. We color the labels based on these values, using black-body radiation color temperatures for the color scale, since that is one of two color scales where the ordering of change is perceptually intuitive.
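A small sketch of this display step, assuming a round-half-up rule to the nearest 10 and a crude black, red, yellow, white ramp as a stand-in for the black-body palette. The post doesn't give its exact colors or rounding rule, so both functions here (`round_activity`, `label_color`) are hypothetical.

```python
import math

def round_activity(count, step=10):
    """Round half up to the nearest multiple of `step` (e.g. 24 -> 20, 25 -> 30)."""
    return int(math.floor(count / step + 0.5)) * step

# Anchor colors for a rough black-body-style ramp: black, red, yellow, white.
_RAMP = [(0, 0, 0), (255, 0, 0), (255, 255, 0), (255, 255, 255)]

def label_color(value, vmax):
    """Map value in [0, vmax] (clamped) to a hex color by linear interpolation along _RAMP."""
    f = min(max(value / vmax, 0.0), 1.0)
    seg = min(int(f * (len(_RAMP) - 1)), len(_RAMP) - 2)
    t = f * (len(_RAMP) - 1) - seg
    lo, hi = _RAMP[seg], _RAMP[seg + 1]
    r, g, b = (round(a + (c - a) * t) for a, c in zip(lo, hi))
    return f"#{r:02x}{g:02x}{b:02x}"

for raw in (4, 24, 25, 58, 130):
    shown = round_activity(raw)
    print(raw, shown, label_color(shown, vmax=100))
```

The ramp only ever gets brighter as activity rises, which is the property that makes the ordering easy to read at a glance.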

If you live in New York, we hope you find useful. Regardless, we look forward to receiving any comments.

Aunt Pythia’s advice

You guys know how much Aunt Pythia loves you, right (answer: a ton)?

OK, good. Because that means I can be honest with you. The truth is, I’ve been getting some very weird questions recently, and I’ve had to throw out a bunch of them, sifting through the weeds to find some tulips.

It’s not that I mind it when you guys make up questions. By all means, make shit up! It’s just that the made-up questions have to actually be interesting, or at least they have to have an embedded question which I can answer. So please, no more fantasies of poop in pots, thank you very much!

And just to get that image out of your minds, let me brag about my most recent knitted gift for one of my best friends:

Pattern available here:

I also knitted a matching cap. Very very cute.

Pattern available here, yarn here.

OK, all good? Fantastic! I hope you enjoy today’s tulipy column, and after you’re done,

please think of something interesting, reasonable, and non-excrement related

to ask Aunt Pythia at the bottom of the page!

By the way, if you don’t know what the hell Aunt Pythia is talking about, go here for past advice columns and here for an explanation of the name Pythia.


Aunt Pythia,

My partner (female) and I (also female) have been together for about ten years now. Over that timespan, she’s gained about 100 pounds. Not due to any illness, or pregnancy, just to inactivity and poor eating habits.

I don’t know how to put this any better, but I’m just simply not attracted to her in her current state. I’m actively turned off by her body shape. I know we’ll grow old, and our bodies will change naturally, but we’re not *that* old just yet. And it frustrates me that this is a result of her poor choices–this is ultimately under her control.

I have no desire to leave her. We have kids, she’s my best friend, I love her. I wish there were a switch I could turn on to be… well… turned on. From all the advice I’ve found online, I’m an asshole for feeling this way. I know weight issues are deep rooted and difficult to tackle, I’m empathetic. But this doesn’t change the fact that I’m just not attracted to the weight.

Sincerely not an asshole

Dear Sincerely,

Does this mean you guys aren’t having sex? And neither of you having sex with other people? And are you staying together because of the kids?

Look, there are plenty of marriages that become, over time, not very sexy, and for various reasons. When that happens and there are no kids, I always suggest breaking up. Because yes, it’s great to have a best friend, but if you are also a sexual person it just won’t do to live with your best friend and never get laid.

When there are kids, like there are with you, I’d suggest (possibly) staying together for the kids but (definitely) having sex with other people. The hardest part of this plan is the initial conversation, but if you aren’t having sex right now then it probably won’t really come as a surprise to anyone.

It’s not really a single conversation, of course, and it also isn’t really a negotiation: you are telling your partner that you need sex in your life and you’re going to go find it. And there’s no need to tell her all the details once it’s happening. It can be hard to say, but it’s likely still the kindest and most direct route.

What you don’t say is that if she loses 100 pounds you will be faithful. That would be hurtful and, if you’ve ever examined dieting data, useless. The truth is, it doesn’t really matter why you’re not having sex, just that you’re not having sex. Plus, other people will find your partner super hot.

Once you have that conversation, you will both be free to go be desired and be desirous, which is a better place for both of you.

Good luck!

Aunt Pythia


Dear Aunt Pythia,

Is data science an IT function, or a business function?

I work at a large financial services firm as a data scientist. At our company, we have data scientists on both sides of the wall, integrated into a data architecture group (me) or in analytics hubs across the lines of business (others).

I often question why I’m in IT. My business counterparts and I often do very similar work, but we sit in vastly different cultures. And I personally feel the culture of business (at least in our company) is more agile and responsive than IT, which is far slower-paced and more monolithic.

Where do you see data science groups sitting? And how can I make the best of my position sitting close to IT?

Caught between two worlds

Dear Caught,

Interesting, for various reasons. First, I think of finance quants as the original data scientists, so it’s funny to me that finance firms are explicitly hiring “data scientists.”

Second, I think of data scientists as living in a third group, outside either IT or business. In some sense the modern data scientist’s job is to translate between those two worlds without being in either of them. But since that’s obviously not how they thought about it in your company, I think the best advice I can give you is to look around for another job. Turns out there are quite a few jobs out there for people with data skills.

You might have to take a pay cut, though. Finance firms tend to pay IT people well, partly for the experience of working in what is often a massively boring place.

Auntie P


Dear Aunt Pythia,

My friend is going for a doctorate at a top department. He has the chance to work with a world-renowned scientist who scares the living daylights out of him. He knows he can never possibly meet his advisor’s expectations. So my friend does everything he can to work on the challenging problems he’s assigned alone, but he does occasionally relent and ask a question. In a few key diagrams his advisor shows him how the problem could be solved. My friend says it’s like an epiphany, so beautiful and simple, and that he dreams of someday being that good.

Meanwhile I am going for a doctorate at a reasonably good department. I am working with a well-funded professor who is known for landing her students top-notch postdocs with amazing mathematicians. All good, except that she is very demanding and I never seem to be able to meet her expectations. I do everything I can to work on the obscure problems she assigns me but do occasionally give up and ask a question. In a few key diagrams she shows me how the problem could be solved and, boy, I feel like a complete idiot and wonder if I should even be getting a PhD.

Should I find a new advisor or should I just quit?

Brainy Incensed Adolescent Student Earning his Doctorate


Wait, what? Am I supposed to believe these stories? Or is this some kind of test about how things seem when it’s a man versus a woman that is your advisor? I’m a bit confused.

In any case, the options you’ve given – find a new advisor or quit – are missing the most obvious option, which is to continue, because being a graduate student in math, whether your advisor is a man or a woman, is a period when you constantly feel like an idiot. Constantly. So you have no perspective whatsoever.

The most important information you have given me about your future prospects is that your advisor has successful students. So just close your eyes and pretend you might be one of them someday, and keep trying, and keep asking questions, and keep feeling like an idiot, because that’s what learning feels like.

Aunt Pythia


Dear Aunt Pythia,

What are your thoughts on John Cochrane’s post on inequality?  I’m especially curious given that you two seem to have the exact opposite view of e.g. Dodd-Frank.

Fake Name

Dear Fake,

I have trouble reading stuff by people who only refer to taxes as “confiscatory”, so I only skimmed this. But my general feeling is that this man has spent a lifetime figuring out how to use fancy language to avoid the very simple concept of fairness. Particularly when he says:

Maybe the poor should rise up and overthrow the rich, but they never have. Inequality was pretty bad on Thomas Jefferson’s farm. But he started a revolution, not his slaves.

Sounds pretty smug to me, almost like an invitation.

Aunt Pythia


Dear Aunt Pythia,

I read some of your posts on working in a hedge fund. Working as a quant, is there a difference between working for a hedge fund vs. investment bank – in terms of feeling ok about the work that you do? Is that possible at all? And how do you recognize a good, honest hedge fund?


Dear T,

Hahahahahaha! Good one.

Aunt Pythia


Dear Aunt Pythia,

You call yourself “super healthy fat woman”. What is your definition of healthy? How is that different from your definition of “super healthy”?


Dear NYC,

Most days I bike 12 miles. I just got a checkup and all my tests and levels are perfect. I feel incredibly strong and healthy on a daily basis and I haven’t yet reached the period of my life where I get easily injured. For me, that qualifies as “super healthy.” I’m not saying I couldn’t be healthier, say if I had better endurance running, which is hard for me because of my weight, or biking up steep hills, again hard for me.

I usually only mention this stuff because I am, happily, a counter example to the tired stereotype of the lazy fat woman. I have never been lazy, and my weight has basically nothing to do with my exercise levels.

Auntie P


Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Click here for a form.

Categories: Aunt Pythia

The class warfare of Halloween

What’s the best thing about Halloween, the dress-up or the candy? Or is it the fact that, for that one night, you can go up to people’s houses and ring their bell and talk to them when they answer the door, and if you’re a kid you can even demand and receive a gift? (Update: I asked my 6-year-old this question and he answered immediately: “it’s eating the candy afterwards.”)

For me it’s always been about the way social rules get thrown out the window and there’s a celebration of generosity and neighborliness. Costumes are an excuse to tell each other how amazing we look, and candy is an excuse to symbolically exchange a token of friendship.

I pretty much had kids in part so I could start going trick-or-treating again, that’s how much I love it. And yes, I went trick-or-treating well into my teens; it was embarrassing for everyone except my best friends who went with me. Near the end there we’d use the phrase “tricks or beer!” just to make fun of ourselves for being too old to do it. But it was addictive and magical nonetheless because of the human interactions and the broken rules. Thrilling.

Even when I was a grown-up and before I had kids, I was super psyched to live in Somerville, Massachusetts where the trick-or-treating was an intense activity – people would drive to my street with piles of trick-or-treaters because we had the exact right density of buildings and everyone on the block would sit outside cheering on the little groups of candy-grabbers. Later on the older kids would come, and we’d leave whatever was left of our stash in big bowls on the porch. And even when we’d bought 12 bags of candy, it was never very expensive, and money wasn’t the point anyway. The point was the freedom.

At least that was my naive opinion until a friend of mine (subject line “this question made me want to nuke connecticut from orbit”) forwarded me this recent Dear Prudence advice column entitled Kids from poorer neighborhoods keep coming to trick-or-treat in mine. Do I have to give them candy?

Read the column, unless you think you might barf. It’s exactly as bad as you think it is. The good news is that Prudence’s answer is spot on and includes the phrase:

Your whine makes me kind of wish that people from the actual poor side of town come this year not with scary costumes but with real pitchforks.

To tell you the truth, I’d never seen a whiff of class warfare in Halloween until this ridiculous question. But now, having thought about what Halloween represents, as an alternative – if very brief – economic system in which we all actually share (versus the so-called “sharing economy”), I can understand why someone who intensely examines and frets about their place in the hierarchy might find some way to distort it.

Instead of reveling in the inherent rule-breaking nature of Halloween, in other words, this person is threatened by it and wants to control it and make it conform to the class-based system they are familiar with. At least that’s my interpretation, because obviously it’s not really about how much Halloween candy costs.

Or maybe that person is just a witch (or a warlock).


Categories: economics

Links (with annotation)

I’ve been heads down writing this week but I wanted to share a bunch of great stuff coming out.

  1. Here’s a great interview with machine learning expert Michael Jordan on various things including the big data bubble (hat tip Alan Fekete). I had a similar opinion over a year ago on that topic. Update: here’s Michael Jordan ranting about the title for that interview (hat tip Akshay Mishra). I never read titles.
  2. Have you taken a look at Janet Yellen’s speech on inequality from last week? She was at a conference in Boston about inequality when she gave it. It’s a pretty amazing speech – she acknowledges the increasing inequality, for example, and points to four areas we can focus on: childhood poverty and public education, college costs, inheritances, and business creation. One thing she didn’t mention: quantitative easing, or anything else the Fed has actual control over. Plus she hid behind the language of economics in terms of how much to care about any of this or what she or anyone else could do. On the other hand, maybe it’s the most we could expect from her. The Fed has, in my opinion, already been overreaching with QE and we can’t expect it to do the job of Congress.
  3. There’s a cool event at the Columbia Journalism School tomorrow night called #Ferguson: Reporting a Viral News Story (hat tip Smitha Corona) which features sociologist and writer Zeynep Tufekci among others (see for example this article she wrote), with Emily Bell moderating. I’m going to try to go.
  4. Just in case you didn’t see this, Why Work Is More And More Debased (hat tip Ernest Davis).
  5. Also: Poor kids who do everything right don’t do better than rich kids who do everything wrong (hat tip Natasha Blakely).
  6. Jesse Eisinger visits the defense lawyers of the big banks and writes about his experience (hat tip Aryt Alasti).

After writing this list, with all the hat tips, I am once again astounded at how many awesome people send me interesting things to read. Thank you so much!!

