mathbabe

Huma’s Little Weiner Problem

July 26, 2013 Cathy O'Neil, mathbabe

This is a guest post by my friend Laura Strausfeld.

As an unlicensed psychotherapist, here’s my take on why Huma Abedin is supporting her husband Anthony Weiner’s campaign for mayor:

It’s all about the kid.

Jordan Weiner is 19 months old. When he’s 8 or 9—or 5, and wearing Google Glass—maybe he’ll google his name and read about his father’s penis. Either that, or one of his buddies at school may ask him about his father’s penis. Jordan might then ask his mommy and daddy about his father’s penis and they’ll tell him either 1) your daddy was a great politician, but had to resign from Congress because he admitted to showing people his penis, which we recommend you don’t do, especially when you’re a grownup and on twitter; or 2) your daddy was a great politician and ran a very close race for mayor—that’s right, your daddy was almost mayor of New York City!—but he lost because people said he showed people his penis and that’s none of anybody’s business.

Let’s look at this from Huma’s perspective. She’s got a child for a husband, with a weird sexual addiction that, on the positive side, doesn’t appear to carry the threat of STDs. But her dilemma is not about her marriage. The marriage is over. What she cares about is Jordan. And this is where she’s really fucked. Whatever happens, Anthony will always be her child’s father.

That bears repeating. You’ve got a child you love more than anything in the world and would sacrifice anything for, and he will now always be stigmatized as the son of a celebrity-sized asshole. What are your choices?

The best scenario for Huma is if Anthony becomes mayor. Then she can divorce his ass, get primary custody and protect her child from growing up listening to penis jokes about his loser father. There will be jokes, but at least they’ll be about the mayor’s penis. And with a whole lot of luck, they might even be about how his father’s penis was a lot smaller in the mind of the public than his policies.

Weiner won’t get my vote, however. And for that, I apologize to you, Jordan. You have my sympathy, Huma.

Categories: guest post

Radhika Nagpal is a role model for fun people everywhere

July 25, 2013 Cathy O'Neil, mathbabe

Can I hear an amen for Radhika Nagpal, the brave woman who explained to the world recently how she lived through being a tenure-track professor at Harvard without losing her soul?

You should really read Nagpal’s guest blogpost from Scientific American (hat tip Ken Ribet) yourself, but here’s a sneak preview, namely her checklist of survival tactics, which she describes in more detail later in the piece:

I decided that this is a 7-year postdoc.
I stopped taking advice.
I created a “feelgood” email folder.
I work fixed hours and in fixed amounts.
I try to be the best “whole” person I can.
I found real friends.
I have fun “now”.

I really love this list, especially the “stop taking advice” part. I can’t tell you how much crap advice you get when you’re a tenure-track woman in a technical field. Nagpal was totally right to decide to ignore it, and I wish I’d taken her advice to ignore people’s advice, even though that sounds like a logical contradiction.

What I like most about her list is her insistence on being a whole person and having fun – I have definitely had those rules since forever, and I never had to make them explicit; I just thought of them as obvious, although maybe that was because, for me, the alternative was truly dark.

It’s just amazing how often people are willing to make themselves miserable and delay their lives when they’re going for something ambitious. For some reason, they argue, they’ll get there faster if they’re utterly submissive to the perceived expectations.

What bullshit! Why would anyone be more efficient at learning, at producing, or at creating when they’re sleep-deprived and oppressed? I don’t get it. I know this sounds like a matter of opinion but I’m super sure there’ll be some study coming out describing the cognitive bias which makes people believe this particular piece of baloney.

Here’s some advice: go get laid, people, or whatever it is that you really enjoy, and then have a really good night’s sleep, and you’ll feel much more creative in the morning. Hell, you might even think of something during the night – all my good ideas come to me when I’m asleep.

Even though her description of tenure-track life resonates with me, this problem, of individuals needlessly sacrificing their quality of life, isn’t confined to academia by any means. For example, I certainly saw a lot of it at D.E. Shaw as well.

In fact, I think it happens anywhere there’s an intense environment of expectation combined with some kind of incredibly slow-moving weeding process – academia has tenure, D.E. Shaw has “who gets to be a Managing Director”. People spend months or even years in near-paralysis wondering whether their superiors think they’re measuring up. Gross!

Ultimately, it happens to someone when they start believing in the system. Conversely, the only way to avoid that kind of oppression is to live your life in denial of the system, which is what Nagpal achieved by insisting on thinking of her tenure-track job as having no particular goal.

Which didn’t mean she didn’t work hard and get her personal goals done, and I have tremendous respect for her work ethic and drive. I’m not suggesting that we all get high-powered positions and then start slacking. But we have to retain our humanity above all.

Bottom line: let’s perfect the art of ignoring the system when it’s oppressive, since it’s a useful survival tactic and also intrinsically changes the system in a positive way by undermining it. Plus it’s way more fun.

Categories: math, musing, women in math

MOOCs, their failure, and what is college for anyway?

July 24, 2013 Cathy O'Neil, mathbabe

Have you read this recent article in Slate about how San Jose State University canceled its online courses after more than half the students failed? The failure rate ranged from 56 to 76 percent across five basic undergraduate classes with an enrollment limit of 100 students.

Personally, I’m impressed that so many people passed, considering how lightweight the connection is in such courses. Maybe it’s because they weren’t free – they cost $150.

It all depends on what you were expecting, I guess. It raises the question of what college is for anyway.

I was talking to a business guy about the MOOC potential for disruption, and he mentioned that, as a Yale undergrad himself, he never learned a thing in classes, that in fact he skipped most of his classes to hang out with his buddies. He somehow thought MOOCs would be a fine replacement for that experience. However, when I asked him whether he still knew any of his buddies from college, he acknowledged that he does business with them all the time.

Personally, this confirms my theory that college is more about making connections than about education per se: although I learned a lot of math in college, I also made a friend who helped me get into grad school and even introduced me to my thesis advisor.

Categories: math education, open source tools

Proprietary credit score model now embedded in law

July 23, 2013 Cathy O'Neil, mathbabe

I’ve blogged before about how I find it outrageous that the credit scoring models are proprietary, considering the impact they have on so many lives.

The argument given for keeping them secret is that otherwise people would game the models, but that really doesn’t make sense.

After all, the models that the big banks have to deal with through regulation aren’t secret, and they game those models all the time. It’s one of the main functions of the banks, in fact, to figure out how to game the models. So either we don’t mind gaming, or we don’t hold our banks to the same standards as our citizens.

Plus, let’s say the models were open and people started gaming the credit score models – what would that look like? A bunch of people paying their electricity bill on time?

Let’s face it: the real reason the models are secret is that the companies who set them up make more money that way, pretending to have some kind of secret sauce. What they really have, of course, is a pretty simple model and access to an amazing network of up-to-date personal financial data, as well as lots of clients.

Their fear is that, if their model gets out, anyone could start a credit scoring agency, but actually it wouldn’t be so easy – if I wanted to do it, I’d have to get all that personal data on everyone. In fact, if I could get all that personal data on everyone, including the historical data, I could easily build a credit scoring model.

So anyhoo, it’s all about money, that and the fact that we’re living under the assumption that it’s appropriate for credit scoring companies to wield all this power over people’s lives, including their love lives.

It’s like we have a secondary system of secret laws where we don’t actually get to see the rules, nor do we get to point out mistakes or reasonably refute them. And if you’re thinking “free credit report,” let’s be clear that it only tells you what data goes into the model; it doesn’t tell you how that data is used.

As it turns out, though, it’s now more than like a secondary system of laws – it’s become embedded in our actual laws. Somehow the proprietary credit scoring company Equifax is now explicitly part of our healthcare laws. From this New York Times article (hat tip Matt Stoller):

Federal officials said they would rely on Equifax — a company widely used by mortgage lenders, social service agencies and others — to verify income and employment and could extend the initial 12-month contract, bringing its potential value to $329.4 million over five years.

Contract documents show that Equifax must provide income information “in real time,” usually within a second of receiving a query from the federal government. Equifax says much of its information comes from data that is provided by employers and updated each payroll period.

Under the contract, Equifax can use sources like credit card applications but must develop a plan to indicate the accuracy of data and to reduce the risk of fraud.

Thanks Equifax, I guess we’ll just trust you on all of this.

Categories: finance, modeling, open source tools, rant

If we bailed out the banks, why not Detroit? (#OWS)

July 22, 2013 Cathy O'Neil, mathbabe

I wrote a post yesterday to discuss the fact that, as we’ve seen in Detroit and as we’ll soon see across the country, the math isn’t working out on pensions. One of my commenters responded, saying I was falling for a “very right wing attack on defined benefit pensions.”

I think it’s a mistake to think like that. If people on the left refuse to discuss reality, then who owns reality? And moreover, who will act and towards what end?

Here’s what I anticipate: just as “bankruptcy” in the realm of airlines has come to mean “a short period wherein we toss our promises to retired workers and then come back to life as a company”, I’m afraid that Detroit may signal the emergence of a new legal device for cities to do the same thing, especially the tossing out of promises to retired workers part. A kind of coordinated bankruptcy if you will.

It comes down to the following questions. For whom do laws work? Who can trust that, when they enter a legal obligation, it will be honored?

From Trayvon Martin to the people who have been illegally foreclosed on, we’ve seen the answer to that.

And then we might ask, for whom are laws written or exceptions made? And the answer to that might well be for banks, in times of crisis of their own doing, and so they can get their bonuses.

I’m not a huge fan of the original bailouts, because they ignored the social and legal contracts in the opposite direction: that failures should fail and that criminals should go to jail. It didn’t seem fair then, and it still doesn’t now, as JPMorgan posts a record $6.4 billion profit in the same quarter that it’s trying to settle a $500 million market manipulation charge.

It’s all very well to rest our arguments on the sanctity of the contract, but if you look around the edges you’ll see whose contracts get ripped up because of fraudulent accounting, and whose bonuses get bigger.

And it brings up the following question: if we bailed out the banks, why not the people of Detroit?

Categories: #OWS, finance, rant

Math fraud in pensions

July 21, 2013 Cathy O'Neil, mathbabe

I wrote a post three months ago talking about how we don’t need better models but we need to stop lying with our models. My first example was municipal debt and how various towns and cities are in deep debt partly because their accounting for future pension obligations allows them to be overly optimistic about their investments and underfund their pension pots.

This has never been more true than it is right now, and, as this New York Times DealBook article explains, it was a major factor in Detroit’s bankruptcy filing this past week. But make no mistake: even in places that don’t end up declaring bankruptcy, something is going to shake out because of these broken models, and it isn’t going to be extra money for retired civil servants.

It all comes down to wanting to avoid putting required money away and hiring quants (in this case actuaries) to make that seem like it’s mathematically acceptable. It’s a form of mathematical control fraud. From the article:

When a lender calculates the value of a mortgage, or a trader sets the price of a bond, each looks at the payments scheduled in the future and translates them into today’s dollars, using a commonplace calculation called discounting. By extension, it might seem that an actuary calculating a city’s pension obligations would look at the scheduled future payments to retirees and discount them to today’s dollars.

But that is not what happens. To calculate a city’s pension liabilities, an actuary instead projects all the contributions the city will probably have to make to the pension fund over time. Many assumptions go into this projection, including an assumption that returns on the investments made by the pension fund will cover most of the plan’s costs. The greater the average annual investment returns, the less the city will presumably have to contribute. Pension plan trustees set the rate of return, usually between 7 percent and 8 percent.

In addition, actuaries “smooth” the numbers, to keep big swings in the financial markets from making the pension contributions gyrate year to year. These methods, actuarial watchdogs say, build a strong bias into the numbers. Not only can they make unsustainable pension plans look fine, they say, but they distort the all-important instructions actuaries give their clients every year on how much money to set aside to pay all benefits in the future.

One caveat: if the pension funds have actually been earning between 7 percent and 8 percent on their investments every year, then all is perhaps well. But considering that they typically invest in bonds, not stocks – which is a good thing – we’re likely seeing much smaller returns than that, which means the cities’ yearly contributions are too small and their pension plans are in dire straits.
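To see why the assumed rate of return matters so much, here is a minimal Python sketch that treats the plan’s assumed return as the rate used to discount a stream of promised payments to today’s dollars. The payment stream ($1 million a year for 30 years) and the 4 percent comparison rate are hypothetical numbers chosen for illustration, not figures from the article or from Detroit.

```python
def present_value(payments, rate):
    """Discount annual payments to today's dollars.

    Assumes payments[0] is paid one year from now, payments[1] two years
    from now, and so on.
    """
    return sum(p / (1 + rate) ** year for year, p in enumerate(payments, start=1))


# Hypothetical obligation: $1 million a year to retirees for 30 years.
promised = [1_000_000] * 30

optimistic = present_value(promised, 0.075)  # midpoint of the 7-8 percent assumption
cautious = present_value(promised, 0.04)     # hypothetical lower return

print(f"Needed today at 7.5%: ${optimistic:,.0f}")
print(f"Needed today at 4.0%: ${cautious:,.0f}")
print(f"Extra needed if returns are only 4%: ${cautious - optimistic:,.0f}")
```

With these made-up numbers, the 7.5 percent assumption says roughly $11.8 million set aside today is enough, while a 4 percent return would require roughly $17.3 million, about 46 percent more. That gap is money nobody puts away.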

What’s super interesting about this article is that it goes into the action on the ground inside the actuarial community, since actuaries’ reputations are at stake in this battle:

A few years ago, with the debate still raging and cities staggering through the recession, one top professional body, the Society of Actuaries, gathered expert opinion and realized that public pension plans had come to pose the single largest reputational risk to the profession. A Public Plans Reputational Risk Task Force was convened. It held some meetings, but last year, the matter was shifted to a new body, something called the Blue Ribbon Panel, which was composed not of actuaries but public policy figures from a number of disciplines. Panelists include Richard Ravitch, a former lieutenant governor of New York; Bradley Belt, a former executive director of the Pension Benefit Guaranty Corporation; and Robert North, the actuary who shepherds New York City’s five big public pension plans.

I’m not sure what happened here, but it seems like a bunch of people in a profession, the actuaries, got worried that they were being used by politicians, and decided to investigate, but then that initiative got somehow replaced by a bunch of politicians. I’d love to talk to someone on the inside about this.

Categories: finance, math, modeling, statistics

Aunt Pythia’s advice

July 20, 2013 Cathy O'Neil, mathbabe

Aunt Pythia is back and, since her family has finally been reunited, sleeping well. Thank goodness! Hallelujah!

I’m psyched to be getting some great questions from the math community. If you’re a math nerd, and even if you’re not, please:

Submit your question for Aunt Pythia at the bottom of this page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

I’ve been thinking a lot about your remark from this previous post:

Like a lot of academics, he understands ambition in one narrow field, and doesn’t even relate to not wanting to be successful in this realm

That has really resonated with me. I am trying to make it as an academic, and I admit I am super boring because all I really care about is math and exercise; I’m not smart enough, nor do I care enough, to have an informed opinion about much else.

Unfortunately, this makes it hard to attract women, and the ones I have gone out on dates with said that I am not very engaging. On top of that, most women want children, and I have read (and agree with) your post on why wanting children is ridiculous. I am also not located in a region where I have any colleagues or even graduate students working in my area of math to talk math with, so I feel pretty isolated on so many levels.

What does it take to become a math professor at an Ivy League-caliber institution (e.g. Harvard, MIT, Columbia, Princeton)? Does one have to be working/thinking about math for much of one’s day? I presume you have an inside view.

Math is Titillating

Dear MiT,

First of all thanks for bringing up that previous answer. I have gotten a lot of people writing in saying I misinterpreted his description of taking extra time to finish his Ph.D.; most people generally think he only took one extra year whereas I read it as two extra years, which makes a big difference. Given this, I was probably too harsh on the guy, although I still think grad students should go to seminars.

As an aside, when did we start using “last year” to mean “this year” and “next year” to mean “next year” but stopped using “this year” to mean anything?

Now on to your question. Do you have to be thinking about math all the time to get a great job? Probably. There are exceptions but they’re rare, as you know.

Let’s face it, this wasn’t really a question for Aunt Pythia. I think you just identified with the description of being boring and only caring about getting a fancy math job, since that’s all you actually care about, as evidenced by your question.

But hey, I’m Aunt Pythia, so I’ve got advice for you anyway.

Don’t feel bad about it! It’s just how you’re programmed, it’s fine. You love math and not much else! Shout it loud from the rooftops and you might just find a girl nerd who’s psyched with your boring self. Just please don’t expect everyone else to be like you, especially your graduate students.

Aunt Pythia

——

Dear Aunt Pythia,

I’m a math professor in a bit of an ethical quandary.

There is a researcher in my field who is widely known (by those in the field) to be a Certified Asshole (CA). He cuts down other people and their work, often in underhanded and awful ways. The people in question are often women (but not always) and often young (grad students or postdocs). He is a tenured full prof at a Very Good School, though, so those who don’t know him respect the position and his publication record. They consider him to be a Serious Person instead of the CA that he is.

In our recent round of hiring, I read the packet of a very talented graduating student who is applying for postdocs. This student has a few publications already including one very, very nice result. He is also a current collaborator of mine, and I know him a bit personally.

The letter in the student’s application from CA (another collaborator of the student) is underhanded and sabotaging. It says nothing outright negative, of course, but has key phrases like “promising teaching career at a liberal arts school” or some such. It also manages to be self-aggrandizing about CA himself rather than praising the grad student and his work.

This student did not get any offers this year, and I know he will be on the market again this year. I can’t help thinking that this letter is hurting his chances for a research postdoc. CA is not his advisor. While it would help to have a good letter from a person in a position such as CA’s, I don’t think this particular letter is helping him.

I can’t figure out an ethical way to help the student. I can’t come out and tell him what’s in the letter. I can’t really say anything even alluding to that. Is there anything I can do to help him?

Better yet, is there anything I can do to hurt CA even though I am in a more junior position at a less well-respected school?

Math is Awesome, People Suck

Dear MAPS,

What a rich question! There are so many issues here, I do believe we could start an entire blog addressing just this ethical quandary, worked out in its entirety.

First of all, I agree that there is an ethical quandary, mostly because you read the CA letter.

If you’d told your friend not to get a letter from the CA beforehand, because he’s a known shitty letter writer, I think that would have been fine and not unethical. But given that you didn’t, and that your friend got that letter, and that you read the letter, it would now seem like spying to go back and tell your friend to get a new letter in the next round. After all, if you’d read the letter and it was great, then you wouldn’t be telling your friend to go get a new letter writer.

As an aside, it doesn’t make sense to me that, during the hiring process, people read the folders of their current collaborators – doesn’t that seem ripe for this kind of conflict of interest?

Now just a few words on “shitty letter writers” before we go on to actual advice. There are different kinds of shitty letter writers, which I’ll split into two broad categories: the tough letter writer, who has consistently high standards and doesn’t wax poetic about anyone ever, and the narcissistic letter writer, who is inconsistent with their praise, sometimes cold sometimes hot, depending on idiosyncratic things like whether they like the young person’s personality and whether they’ve seen enough citations to the narcissist’s own work.

In the large and relatively functional system that is recommendation letters for math jobs, the tough letter writer is a pretty familiar concept, and the system has adapted more or less to its existence. In other words, people who read a lot of letters in a lot of folders get to know the letter writers and they say stuff to themselves along the lines of, oh this guy never writes good letters, so given that, this letter is actually pretty good!

Of course, that’s not to say it’s a perfect system of adaptation to such tough-letter-writing biases: there are certainly hiring committees unfamiliar with those letter writers, and students with those tough letters inevitably suffer in such situations.

On the other hand, if you tried to explicitly correct for this problem, you could be inviting other, even bigger problems. For example, if you had a public yet anonymous webpage that scored every letter writer on a scale of toughness, young people looking for jobs might feel that, to compete, they’d need to get letters only from people who always write good letters (they exist), and then the entire system would fail because the letters would contain less and less information. That would be a problem.

OK, what about the narcissist letter writer? That’s harder, since they’re not consistently tough, but rather they’re tough on people they just don’t like for whatever reason. It’s much, much harder for people on hiring committees to spot the narcissists, and thus those narcissists probably do lots of damage. Luckily they’re also less common than the tough letter writers, but of course they exist.

I’d like to respond to your last question, about wanting to hurt CA, who I’m guessing is a narcissist letter writer, even though the question is posed strangely.

I don’t think it’s unethical, when you’re counseling any person in your field from now on, to explicitly suggest not using that guy, or for that matter any narcissist letter writer. Of course, this is before you’ve read the putative letter, and of course the person might think you’re wrong and might ignore your advice (and of course, you might be wrong).

My advice to you about the person who didn’t get a job this year (note usage of “this year”): make sure they’re aware of how much letters count, and how different writers are known for different styles, and tell them to consider getting new letters. Ask them to explicitly ask their letter writers whether their letters are good, and define “good”, something I always counsel people to do when they ask for letters. I don’t think you can do much more than this.

But I’m eager to hear what Jason Starr thinks; he’s always very thoughtful!

Best,

Aunt Pythia

——

Dear Aunt Pythia,

You write an amazing blog that lets your readership get to know you as a person and showcases your interests and expertise without too much compartmentalizing.

Help a sister out with some advice for how to achieve similar results?

Bridging Lives Online Gets Gnarly Yo

Dear BLOGGY,

My advice is to

Set aside time every day to write. Consistency is your friend.
Each day, choose a (possibly imaginary) friend to write to – your audience – someone who is on your side but will also ask clarifying questions, and explain something to them that you find interesting. That’s a blog post!
Also, explain one idea well, then stop. People can barely stand one idea before losing interest.

Good luck, I know you’re gonna rock it!!!

Love,

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated ethical quandary to Aunt Pythia!

Categories: Aunt Pythia

The Stop and Frisk sleight of hand

July 19, 2013 Cathy O'Neil, mathbabe

I’m finishing up an essay called “On Being a Data Skeptic” in which I catalog different standard mistakes people make with data – sometimes unintentionally, sometimes intentionally.

It occurred to me, as I wrote it and as I read about the various press conferences in which departing Mayor Bloomberg and Police Commissioner Raymond Kelly addressed the Stop and Frisk policy, that they are guilty of one of these standard mistakes. Namely, they use a sleight of hand with respect to the evaluation metric of the policy.

Recall that an evaluation metric for a model is the way you decide whether the model works. So if you’re predicting whether someone would like a movie, you should go back and check whether your recommendations were good, and revise your model if not. It’s a crucial part of the model, and a poor choice for it can have dire consequences – you could end up optimizing to the wrong thing.
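Here is a minimal Python sketch of that checking step for the movie example, assuming the simplest possible metric: precision, the fraction of recommended movies the person actually ended up liking. The titles, ratings, and the 0.6 revision threshold are all made up for illustration; choosing the metric is the real design decision, and a different choice could make the same model look good or bad.

```python
def precision(recommended, liked):
    """Fraction of recommended items that the person turned out to like."""
    if not recommended:
        return 0.0
    hits = sum(1 for movie in recommended if movie in liked)
    return hits / len(recommended)


# Hypothetical data: what the model suggested vs. what the viewer later rated highly.
recommended = ["Alien", "Heat", "Up", "Fargo"]
liked = {"Alien", "Up", "Clue"}

score = precision(recommended, liked)
print(f"precision = {score:.2f}")  # 2 of 4 recommendations were liked -> 0.50

if score < 0.6:  # hypothetical threshold for "time to revise the model"
    print("Recommendations underperforming; revise the model.")
```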

[Aside: as I’ve complained about before, the Value Added Model for teachers doesn’t have an evaluation method of record, which is a very bad sign indeed about the model. And that’s a Bloomberg brainchild as well.]

So what am I talking about?

Here’s the model: stopping and frisking suspicious-looking people in high-crime areas will improve the safety and well-being of the city as a whole.

Here’s Bloomberg/Kelly’s evaluation method: the murder rate has gone down in New York during the policy. However, that rate is highly variable and depends as much on whether there’s a crack epidemic going on as on anything else. Or maybe it’s improved medical care. Truth is, people don’t really know. In any case, ascribing credit for the plunging murder rate to Stop and Frisk is a tenuous causal argument. Plus, even though the number of stops has dropped drastically recently, we haven’t seen the murder rate shoot up.

Here’s another possible evaluation method: trust in the police. Considering that 400,000 innocent black and Latino New Yorkers were stopped last year under this policy (here are more stats), versus fewer than 50,000 whites, and that most of them were young men, it stands to reason that the average young minority male trusts the police less than the average young white male does. In fact, here is an amazing statistic for 2011, compiled by the NYCLU:

The number of stops of young black men exceeded the entire city population of young black men (168,126 as compared to 158,406).

If I’m a young black guy, I can expect to be stopped and frisked at least once a year. How does that make me trust cops?
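The arithmetic behind that expectation, as a tiny Python sketch using the NYCLU figures quoted above. Note that this is an average: some young men are stopped many times and others never, so it gives the expected number of stops per person rather than a guarantee that every individual is stopped.

```python
# NYCLU figures for 2011, as quoted above.
stops_of_young_black_men = 168_126
young_black_men_in_nyc = 158_406

# Average number of stops per young black man in the city that year.
stops_per_person = stops_of_young_black_men / young_black_men_in_nyc
print(round(stops_per_person, 2))  # 1.06
```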

Let’s choose an evaluation method closer to what we can actually control, and let’s optimize to it.

Update: a guest columnist fills in for David Brooks, hopefully not for the last time, and gives us his take on Kelly, Obama, and racial profiling.

Categories: data science, modeling, rant

Money in politics: the BFF project

July 16, 2013 Cathy O'Neil, mathbabe

This is a guest post by Peter Darche, an engineer at DataKind and recent graduate of NYU’s ITP program. At ITP he focused primarily on using personal data to improve personal social and environmental impact. Prior to graduate school he taught in NYC public schools with Teach for America and Uncommon Schools.

We all ‘know’ that money influences the way congressmen and women legislate; at least we certainly believe it does. According to a poll conducted by law professor Larry Lessig for his book Republic, Lost, 75% of respondents (Republican and Democrat) said that ‘money buys results in Congress.’

And we have good reason to believe so. With astronomical sums of campaign money flowing into the system and costly, public-welfare reducing legislation coming out, it’s the obvious explanation.

But what does that explanation really tell us? Yes, a congresswoman receiving millions of dollars from an industry and then voting with that industry’s interests reeks of corruption. But when that industry is responsible for 80% of her constituents’ jobs, the causation becomes much less clear and the explanation much less informative.

The real devil is in the details. It is in the ways that money has shaped her legislative worldview over time and in the small, particular actions that tilt her policy one way rather than another.

In the past, finding these many and subtle ways would have taken a herculean effort: untold hours collecting campaign contributions, voting records, speeches, and so on. Today, however, thanks to the efforts of organizations like the Sunlight Foundation and the Center for Responsive Politics, this information is online and programmatically accessible; you can write a few lines of code and have a computer gather it all for you.

For the last few months, Cathy O’Neil, Lee Drutman (a Senior Fellow at the Sunlight Foundation), I, and others have been working on a project that leverages these data sources to try to unearth some of these particular facts. By connecting all the avenues through which influence is exerted on the legislative process to the actions legislators take, we’re hoping to find some of the detailed ways money changes behavior over time.

The idea is this: first, find and aggregate what data exists related to the ways influence can be exerted on the legislative process (data on campaign contributions, lobbying contributions, etc.), then find data that might track influence manifesting itself in the legislative process (bill sponsorships, co-sponsorships, speeches, votes, committee memberships, etc.). Finally, connect the interest group or industry behind the influence to the policies and see how they change over time.

One immediate and attainable goal for this project, for example, is to create an affinity score between legislators and industries, or in other words a metric that would indicate the extent to which a given legislator is influenced by and acts in the interest of a given industry.
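The post doesn’t say how such an affinity score would be computed, so here is one hypothetical formulation, just to make the idea concrete. It assumes two inputs the project might derive: each contribution tagged with an industry, and each industry-related vote tagged with whether it sided with that industry’s stated position. The score multiplies the industry’s share of a legislator’s contribution dollars by the fraction of industry-tagged votes that went the industry’s way, giving a number between 0 and 1. The names `Contribution`, `Vote`, and `affinity_score` are illustrative, not the project’s code.

```python
from dataclasses import dataclass

@dataclass
class Contribution:
    industry: str   # industry code attached to the contribution (assumed available)
    amount: float   # dollars

@dataclass
class Vote:
    industry: str   # industry the bill is tagged with (hypothetical tagging step)
    aligned: bool   # True if the vote matched the industry's stated position (assumed known)

def affinity_score(contributions, votes, industry):
    """Hypothetical affinity score in [0, 1]: the industry's share of a
    legislator's contribution dollars times the fraction of that legislator's
    industry-tagged votes that sided with the industry."""
    total = sum(c.amount for c in contributions)
    from_industry = sum(c.amount for c in contributions if c.industry == industry)
    money_share = from_industry / total if total else 0.0

    relevant = [v for v in votes if v.industry == industry]
    vote_share = sum(v.aligned for v in relevant) / len(relevant) if relevant else 0.0

    return money_share * vote_share

# Made-up example: 75% of the money, 50% of the relevant votes aligned.
contribs = [Contribution("Health Professionals", 30000), Contribution("Oil & Gas", 10000)]
votes = [Vote("Health Professionals", True), Vote("Health Professionals", False), Vote("Oil & Gas", True)]
print(affinity_score(contribs, votes, "Health Professionals"))  # 0.375
```

A product is only one choice; it is zero whenever either the money or the voting alignment is absent, which is a deliberately conservative reading of “influenced by and acts in the interest of.”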

So far most of our efforts have focused on finding, collecting, and connecting the records of influence and legislative behavior. We’ve pulled in lobbying and campaign contribution data, as well as sponsored legislation, co-sponsored legislation, speeches and votes. We’ve connected the instances of influence to legislative actions for a given legislator and visualized it on a timeline showing the entirety of a legislator’s career.
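As a rough illustration of what “connecting” influence to legislative actions might look like, here is a minimal sketch, assuming every event has already been tagged with a date, a kind, and an industry. It merges both streams into one chronological timeline and, for each legislative action, lists the same-industry contributions or lobbying events from the preceding year. The one-year window, the `Event` type, and the example dates are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Event:
    when: date
    kind: str        # e.g. "contribution", "lobbying", "vote", "speech", "sponsor", "cosponsor"
    industry: str    # industry tag (assumed to be assigned upstream)
    label: str = ""

INFLUENCE_KINDS = {"contribution", "lobbying"}

def build_timeline(influence, actions):
    """Merge influence events and legislative actions into one chronological list."""
    return sorted(influence + actions, key=lambda e: e.when)

def link_actions(timeline, window_days=365):
    """Pair each legislative action with the same-industry influence events
    that occurred within window_days before it (window is an assumption)."""
    window = timedelta(days=window_days)
    linked = []
    for i, event in enumerate(timeline):
        if event.kind in INFLUENCE_KINDS:
            continue
        priors = [
            p for p in timeline[:i]
            if p.kind in INFLUENCE_KINDS
            and p.industry == event.industry
            and event.when - p.when <= window
        ]
        linked.append((event, priors))
    return linked

# Made-up dates; only the 2009 HMO contribution falls inside the window.
influence = [
    Event(date(2009, 1, 5), "contribution", "Health Services/HMOs"),
    Event(date(2007, 1, 5), "contribution", "Health Services/HMOs"),
    Event(date(2009, 2, 1), "contribution", "Oil & Gas"),
]
actions = [Event(date(2009, 6, 1), "cosponsor", "Health Services/HMOs", "Medicaid HIV bill")]
for action, priors in link_actions(build_timeline(influence, actions)):
    print(action.label, [p.when.isoformat() for p in priors])
```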

Here’s an example of how one might use the timeline. The example below is of Nancy Pelosi’s career. Each green circle represents a campaign contribution she received and is grouped within a larger circle by the month in which it was recorded by the FEC. Above are colored rectangles representing legislative actions she took during the time period in focus (indigo are votes, orange speeches, red co-sponsored bills, blue sponsored bills). Some of the green circles are highlighted because the events have been filtered for connection to health professionals.

Changing the filter to Health Services/HMOs, we see different contributions coming from that industry as well as a co-sponsored bill related to that industry.
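The monthly grouping and the industry filter described above can be sketched in a few lines. This is a hypothetical helper, not the prototype’s code: it assumes contributions arrive as `(date, industry, amount)` tuples where the date is the FEC recording date, buckets them by calendar month, and optionally keeps a single industry, the way switching the filter from health professionals to Health Services/HMOs changes which circles light up.

```python
from collections import defaultdict
from datetime import date

def contributions_by_month(contributions, industry=None):
    """Group (date, industry, amount) tuples into (year, month) buckets,
    optionally keeping only one industry. Returns buckets in date order."""
    buckets = defaultdict(list)
    for when, ind, amount in contributions:
        if industry is not None and ind != industry:
            continue  # filtered out, like an un-highlighted circle
        buckets[(when.year, when.month)].append(amount)
    return dict(sorted(buckets.items()))

# Made-up rows for illustration.
rows = [
    (date(2010, 3, 2), "Health Professionals", 500.0),
    (date(2010, 3, 20), "Health Services/HMOs", 1000.0),
    (date(2010, 4, 1), "Health Professionals", 250.0),
]
print(contributions_by_month(rows, "Health Professionals"))
print(contributions_by_month(rows, "Health Services/HMOs"))
```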

Mousing over the bill reveals that it’s a proposal to amend the Social Security Act to provide Medicaid coverage to low-income individuals with HIV. Further, looking around at speeches, one can see a relevant speech about children’s health insurance. Clicking on the speech reveals the text.

By combining data about various events, and allowing users to filter and dive into them, we’re hoping to leverage our natural pattern-seeking capabilities to find specific hypotheses to test. Once an interesting pattern has been found, the tool would allow one to download the data and conduct analyses.

Again, it’s just a start, and the timeline and other project-related code are internal prototypes created to start seeing some of the connections. We wanted to open it up to you all, though, to see what you think and get some feedback. So, with its pre-alphaness in mind, what do you think about the project generally and the timeline specifically? What works well – helps you gain insights or generate hypotheses about the connection between money and politics – and what other functionality would you like to see?

The demo version can be found here with data for the following legislators:

Nancy Pelosi
John Boehner
Cathy McMorris Rodgers
Eric Cantor
James Lankford
John Cornyn
James Clyburn
Kevin McCarthy
Steny Hoyer

Note: when the timeline is revealed, click and drag over content at the bottom of the timeline to reveal the focus events.

Categories: guest post, modeling, open source tools

THIS REQUIRES YOUR MOCKERY

July 14, 2013 Cathy O'Neil, mathbabe 11 comments

My title today is the subject line of a message I received from my buddy Jordan Ellenberg. Thanks for making things so easy for me to blog this morning, Jordan!

So here’s the subject: a Silicon Valley entrepreneur’s self-help book, including advice on how to quantify and measure your sex life, among other things – every other thing, in fact.

Just in case you’ve missed it, there’s a movement afoot among certain people to collect data about themselves on the level of heart rate, daily exercise and eating patterns, and the like, with the goal of self-improvement.

It’s got a name – the Quantified Self movement – and if I haven’t mentioned it before, it’s because honestly, it’s too easy, and I generally speaking like a challenge.

I saw a bunch of these guys at the health analytics conference I went to a couple of months ago, and let me tell you, they’re weird, and they know it, and they don’t care.

They honestly feel sorry for people who don’t have an Ironman Triathlon (or four) to train for via wireless Excel spreadsheets. I mean, how do those people know whether they’ve actually improved? How do they know if they’ve eaten enough carbs? How do they know if they’ve slept??

As far as these Quantified Selfers (QSers) are concerned, it’s only a matter of time before everyone is, like them, making themselves perfect, and they’re the vanguard with nothing to be defensive about.

So anyhoo, those QS guys are convinced that they’re accomplishing something with all of their number collecting and crunching, like maybe they’ll live forever or something (after curing cancer), and they’re just so douchey I feel sorry for them. Blogging about them and trashing them would be like a mean older kid in the playground telling a bunch of little kids that there’s no Santa Claus.

Why do that? Why pop their bubble?

Here’s why: it’s just plain fun, especially now that they’ve ventured into sexy territory with their spreadsheets.

Here are a couple of questions for the Quantified Sexual Selfers (QSSers) in the audience; please get back to me.

Yes or no: nothing says “hot ‘n’ steamy” like a Fitbit readout of historical orgasms.
Where does the sensor band get attached, and does it come with a vibrating option?
Are your orgasms more satisfying before or after syncing your daily data with Stephen Wolfram’s?
What’s your metric of success, and how do you know your girlfriend ain’t gaming the system?

Categories: modeling, musing

Aunt Pythia’s advice

July 13, 2013 Cathy O'Neil, mathbabe 3 comments

Aunt Pythia is ever so pleased to be here today, on her 41st birthday no less, spewing forth questionable advice that nobody will be willing to go on the record as having read, but which she knows in her heart each reader secretly treasures.

Now, when Aunt Pythia was on her death bed two weeks ago, the call was raised for more questions, and quickly. And readers, you responded, which brings tears to Aunt Pythia’s eyes, it really does. It brought her back from the brink and she’s eternally grateful.

The problem is, though, this: some of these questions are of dubious substance. To be honest, they’re very short, not extremely well thought out or juicy, and don’t pose an existential conundrum.

Of course, one doesn’t want to look a gift horse in the mouth, so I’ve arranged to answer these questions in speed-round fashion today. I hope you enjoy it, and please don’t forget:

Submit your existential conundrums to Aunt Pythia at the bottom of this page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

What should I do when, after posting a video from Vi Hart, a reader responds “I’ve got to marry that girl.”?

Math Guy

Dear Math Guy,

Offer to administer the wedding! Turns out you can get certified as a minister with an app called “OrdainThyself”.

Aunt Pythia

——

Dear Aunt Pythia,

If you were a flavor of ice cream, what flavor of ice cream would you be?

Sleepless in Seattle

Dear SiS,

Not sure about me, but my kids would all be Ben & Jerry’s Coffee Heath Bar Crunch, which I ate pretty much continuously and exclusively during my three pregnancies.

Not me, but I had that same stoned expression.

I hope that helps!

Aunt Pythia

——

Dear Aunt Pythia,

I am a 24-year-old grad student, and I’ve noticed the following trend in my life: when I was younger (read: 14 and older), I was always attracted to people around 19 years of age, which was too old for me. But now, I’m still attracted to people around 19 years of age, which is quickly getting too young for me. What should I do???

Feeling a little bit like a Cougar…

Dear Wanna-be Cougar,

Just as I can’t claim to be part of the generation of 20-somethings that refuse to make appointments more than 17 minutes in advance, and then only by text, you cannot claim to be a cougar, sorry. That’s reserved for women who are at least 40, possibly 41, and there’s no extra room at this table.

Not me, but I do share the sentiment

In terms of your “problem,” it’s one of those things you can’t control, as far as I know, so just take the posture of bewildered amusement at your own desires, and make sure you don’t do anything illegal or weird.

Smooches,

Aunt Pythia

——

Dear Aunt Pythia,

Since I know how fond you are of bridge, I have a question about slam bidding: Given the fact that you and your partner have a guaranteed slam, what is the probability that you will bid into that slam? What are the ways to maximize that probability, in terms of convention? What are the easiest ways to invite slam to your partner? What is your opinion of cue bidding, and what are the least confusing ways to cue bid?

Seeker Abling Young Cardsharks

Dear SAYC,

I appreciate how your sign-off is code for how I should answer this question.

But even so, I’m going to go with my gut here: when I’m in a perceived slam with my partner, I always make sure to stare knowingly into his or her eyes, with raised eyebrows, and mouth the word “slam”, Colbert-style.

Me.

If that isn’t getting through I squeeze his or her knee under the table. Works every time. For me, bridge is all about being fun and ridiculous, and I never follow the rules unless it’s more fun to do so.

I hope that helps!

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

The creepy mindset of online credit scoring

July 12, 2013 Cathy O'Neil, mathbabe 12 comments

Usually I like to think through abstract ideas – thought experiments, if you will – and not get too personal. I make exceptions for certain macroeconomists who are already public figures, but most of the time that’s it.

Here’s a new category of people I’ll call out by name: CEOs who defend creepy models using the phrase “People will trade their private information for economic value.”

That’s a quote from Douglas Merrill, CEO of Zest Finance, taken from this video of a recent data conference in Berkeley (hat tip Rachel Schutt). It was a panel discussion, the putative topic of which was something like “Attacking the structure of everything”, whatever that’s supposed to mean (I’m guessing it has something to do with being proud of “disrupting shit”).

Do you know the feeling you get when you’re with someone who’s smart, articulate, who probably buys organic eggs from a nice farmer’s market, but who doesn’t show an ounce of sympathy for people who aren’t successful entrepreneurs? When you’re with someone who has benefitted so entirely and so consistently from the system that they have an almost religious belief that the system is perfect and they’ve succeeded through merit alone?

It’s a feeling somewhere between “maybe they’re just naive because they’ve led such a blessed life” and “maybe they’re actually incapable of human empathy,” and I don’t know which, because it’s never been tested.

That’s the creepy feeling I get when I hear Douglas Merrill speak, but it actually started earlier, when I got the following email almost exactly one year ago via LinkedIn:

Hi Catherine,

Your profile looked interesting to me.

I’m seeking stellar, creative thinkers like you, for our team in Hollywood, CA. If you would consider relocating for the right opportunity, please read on.

You will use your math wizardry to develop radically new methods for data access, manipulation, and modeling. The outcome of your work will result in game-changing software and tools that will disrupt the credit industry and better serve millions of Americans.

You would be working alongside people like Douglas Merrill – the former CIO of Google – along with a handful of other ex-Googlers and Capital One folks. More info can be found on our LinkedIn company profile or at www.ZestFinance.com.

At ZestFinance we’re bringing social responsibility to the consumer loan industry.

Do you have a few moments to talk about this? If you are not interested, but know someone else who might be a fit, please send them my way!

I hope to hear from you soon. Thank you for your time.

Regards,
Adam

Wow, let’s “better serve millions of Americans” through manipulation of their private data, and then let’s call it being socially responsible! And let’s work with Capital One, which is known to be practically a charity.

What?

Message to ZestFinance: “getting rich with predatory lending” doesn’t mean “being socially responsible” unless you have a really weird definition of that term.

Going back to the video, I have a few more tasty quotes from Merrill:

First, when he’s describing how he uses personal individual information scraped from the web: “All data is credit data.”
Second, when he’s comparing ZestFinance to FICO credit scoring: “Context is developed by knowing thousands of things about you. I know you as a person, not just you via five or six variables.”

I’d like to remind people that, in spite of the creepiness here, and the fact that his business plan is a death spiral of modeling, everything this guy is talking about is totally legal. And as I said in this post, I’d like to see some pushback to guys like Merrill as well as to the NSA.

Categories: data science, rant

On being a data science skeptic: due out soon

July 11, 2013 Cathy O'Neil, mathbabe 17 comments

A few months ago, at the end of January, I wrote a post about Bill Gates’s naive views on the objectivity of data. One of the commenters, “CitizensArrest,” asked me to take a look at a related essay written by Susan Webber entitled “Management’s Great Addiction: It’s time we recognized that we just can’t measure everything.”

Webber’s essay is really excellent, not to mention impressively prescient considering it was published in 2006, before the credit crisis. The format of the essay is simple: it brings up and explains various dangers in the context of measurement and modeling of business data, and calls for finding a space in business for skepticism. What an idea! Imagine if that had actually happened in finance when it should have back in 2006.

Please go read her essay, it’s short.

Recently, when O’Reilly asked me to write an essay, I thought back to this short piece and decided to use it as a template for explaining why I think there’s as desperate a need for skepticism in the big data world of 2013 as there was in finance back then.

Whereas most of Webber’s essay talks about people blindly accepting numbers as true, objective, precise, and important, and the related tragic consequences, I’ve added a small wrinkle to this discussion. Namely, I also express concern about the people who underestimate the power of data.

Most of this disregard for unintended consequences is blithe and unintentional (and some of it isn’t), but even so it can be hugely damaging, especially to the individuals being modeled: think foreclosed homes due to crappy housing-related models in the past, and think creepy models and the death spiral of modeling for the present and future.

Anyhoo, I’m actively writing it now, and it’ll be coming out soon. Stay tuned!

Categories: data science, finance, modeling

PyData and a few other things

July 10, 2013 Cathy O'Neil, mathbabe 20 comments

So here’s the thing about being a parent of benign neglect: it’s no walk in the park. I talk a big game, but the truth is I’ve had trouble getting to sleep from the anxiety. To distract myself I’ve been watching Law & Order episodes on Netflix until the wee hours of the night.

Two things about this plan suck. First, my husband is in Amsterdam, which means he’s 6 time zones away from our oldest son whereas I’m only 3, but somehow that means I’m shouldering 99.5% of the responsibility to worry (there’s some universal geographic law of parenting at work there but I don’t know how to formulate it). Second, half of the L&O episodes involve either children getting maimed or killed or child killers. Not restful but I freaking can’t stop!

In any case, not much extra energy to spring out of bed and write the blog, so apologies for a sparse period for mathbabe. For whatever reason I woke up this morning in time to blog, however, so as to not miss an opportunity it’s gonna be in list form:

I’ve been invited to keynote at PyData in Cambridge, MA at the end of the month – me and Travis Oliphant! I’m still coming up with the title and abstract for my talk, but it’s going to be something about storytelling with data using the IPython Notebook. Please make suggestions!
I was in a Wall Street Journal article about Larry Summers, talking about whether he’s got a good personality to take over from Ben Bernanke, i.e. should we trust our lives and our future with him. I say nope. What’s funny is that my uncle, economist Bob Hall, is also referred to in the same article. The journalist didn’t know we’re related until after the article came out and Uncle Bob informed him.
Hey, can we give it up for Eliot Spitzer? The powers that be are down on that guy, presumably for having sex with prostitutes but really because he’s a threat. I say legalize prostitution, unionize the prostitutes à la the Dutch, and put Spitzer in charge of something involving money and corruption; he’s smart and fearless. Who’s with me?
It looks like good news: the Consumer Financial Protection Bureau might be cracking down on illegal debt collector tactics. Update: wait, the fines are fractions of 1% of the revenue these guys made on their unfair practices. Can we please have a rule that when you get caught breaking the law, the fine will be large enough so it’s no longer profitable?
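The rule proposed in the last item is simple arithmetic, so here is a minimal sketch of it. The `p_caught` parameter is my own addition, an assumed probability of getting caught: if not every violation is punished, the fine has to exceed the profit divided by that probability before breaking the law stops paying in expectation. All the dollar figures below are made up for illustration.

```python
def fine_is_deterrent(fine, illicit_profit, p_caught=1.0):
    """True if the expected penalty outweighs the gain from the illegal practice.

    p_caught is an assumed probability of being caught; with the default of 1.0
    this is just the rule proposed above: the fine must exceed the profit."""
    return p_caught * fine > illicit_profit

def minimum_fine(illicit_profit, p_caught=1.0):
    """Break-even fine in expectation; anything larger makes the practice unprofitable."""
    return illicit_profit / p_caught

# Hypothetical: $100M revenue, a fine of 0.5% of revenue, $10M of illicit profit.
fine = 0.005 * 100_000_000
print(fine_is_deterrent(fine, 10_000_000))       # False: crime still pays
print(minimum_fine(10_000_000, p_caught=0.25))   # 40000000.0 if only 1 in 4 get caught
```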

Categories: news, open source tools

Measuring Up by Daniel Koretz

July 9, 2013 Cathy O'Neil, mathbabe 12 comments

This is a guest post by Eugene Stern.

Now that I have kids in school, I’ve become a lot more familiar with high-stakes testing, which is the practice of administering standardized tests with major consequences for students who take them (you have to pass to graduate), their teachers (who are often evaluated based on standardized test results), and their school districts (state funding depends on test results). To my great chagrin, New Jersey, where I live, is in the process of putting such a teacher evaluation system in place (for a lot more detail and criticism, see here).

The excellent John Ewing pointed me to a pretty comprehensive survey of standardized testing called “Measuring Up,” by Harvard Ed School prof Daniel Koretz, who teaches a course there about this stuff. If you have any interest in the subject, the book is very much worth your time. But in case you don’t get to it, or just to whet your appetite, here are my top 10 takeaways:

Believe it or not, most of the people who write standardized tests aren’t idiots. Building effective tests is a difficult measurement problem! Koretz makes an analogy to political polling, which is a good reminder that a test result is really a sample from a distribution (if you take multiple versions of a test designed to measure the same thing, you won’t do exactly the same each time), and not an absolute measure of what someone knows. It’s also a good reminder that the way questions are phrased can matter a great deal.
The reliability of a test is inversely related to the standard deviation of this distribution: a test is reliable if your score on it wouldn’t vary very much from one instance to the next. That’s a function of both the test itself and the circumstances under which people take it. More reliability is better, but the big trade-off is that increasing the sophistication of the test tends to decrease reliability. For example, tests with free-form answers can test for a broader range of skills than multiple choice, but they introduce variability across graders, and even the same person may grade the same test differently before and after lunch. More sophisticated tasks also take longer to do (imagine a lab experiment as part of a test), which means fewer questions on the test and a smaller cross-section of topics being sampled, again meaning more noise and less reliability.
A complementary issue is bias, which is roughly about people doing better or worse on a test for systematic reasons outside the domain being tested. Again, there are trade-offs: the more sophisticated the test, the more extraneous skills beyond those being tested it may be bringing in. One common way to weed out such questions is to look at how people who score the same on the overall test do on each particular question: if you get variability you didn’t expect, that may be a sign of bias. It’s harder to do this for more sophisticated tests, where each question is a bigger chunk of the overall test. It’s also harder if the bias is systematic across the test.
Beyond the (theoretical) distribution from which a single student’s score is a sample, there’s also the (likely more familiar) distribution of scores across students. This depends both on the test and on the population taking it. For example, for many years, students on the eastern side of the US were more likely to take the SAT than those in the west, where only students applying to very selective eastern colleges took the test. Consequently, the score distributions were very different in the east and the west (and average scores tended to be higher in the west), but this didn’t mean that there was bias or that schools in the west were better.
The shape of the score distribution across students carries important information about the test. If a test is relatively easy for the students taking it, scores will be clustered to the right of the distribution, while if it’s hard, scores will be clustered to the left. This matters when you’re interpreting results: the first test is worse at discriminating among stronger students and better at discriminating among weaker ones, while the second is the reverse.
The score distribution across students is an important tool in communicating results (you may not know right away what a score of 600 on a particular test means, but if you hear it’s one standard deviation above a mean of 500, that’s a decent start). It’s also important for calibrating tests so that the results are comparable from year to year. In general, you want a test to have similar means and variances from one year to the next, but this raises the question of how to handle year-to-year improvement. This is particularly significant when educational goals are expressed in terms of raising standardized test scores.
If you think in terms of the statistics of test score distributions, you realize that many of those goals of raising scores quickly are deluded. Koretz has a good phrase for this: the myth of the vanishing variance. The key observation is that test score distributions are very wide, on all tests, everywhere, including countries that we think have much better education systems than we do. The goals we set for student score improvement (typically, a high fraction of all students taking a test several years from now are supposed to score above some threshold) imply a great deal of compression at the lower end of this distribution – compression that has never been seen in any country, anywhere. It sounds good to say that every kid who takes a certain test in four years will score as proficient, but that corresponds to a score distribution with much less variance than you’ll ever see. Maybe we should stop lying to ourselves?
Koretz is highly critical of the recent trend to report test results in terms of standards (e.g., how many students score as “proficient”) instead of comparisons (e.g., your score is in the top 20% of all students who took the test). Standards and standards-based reporting are popular because it’s believed that American students’ performance as a group is inadequate. The idea is that being near the top doesn’t mean much if the comparison group is weak, so instead we should focus on making sure every student meets an absolute standard needed for success in life. There are at least three problems with this. First, how do you set a standard – i.e., what does proficient mean, anyway? Koretz gives enough detail here to make it clear how arbitrary the standards are. Second, you lose information: in the US, standards are typically expressed in terms of just four bins (advanced, proficient, partially proficient, basic), and variation inside the bins is ignored. Third, even standards-based reporting tends to slide back into comparisons: since we don’t know exactly what proficient means, we’re happiest when our school, or district, or state places ahead of others in the fraction of students classified as proficient.
Koretz’s other big theme is score inflation for high-stakes tests: if everyone is evaluated based on test scores, everyone has an incentive to get those scores up, whether or not that actually has much correlation with learning. If you remember anything from the book or from this post, remember this phrase: sawtooth pattern. The idea is that when a new high-stakes standardized test appears, average scores start at some base level, go up quickly as people figure out how to game the test, then plateau. If the test is replaced with another, the same thing happens: base, rapid growth, plateau. Repeat ad infinitum. Koretz and his collaborators did a nice experiment in which they went back to a school district in which one high-stakes test had been replaced with another and administered the first test several years later. Now that teachers weren’t teaching to the first test, scores on it reverted to the original base level. Moral: score inflation is real, pervasive, and unavoidable, unless we bite the bullet and do away with high-stakes tests.
While Koretz is sympathetic toward test designers, who live the complexity of standardized testing every day, he is harsh on those who (a) interpret and report on test results and (b) set testing and education policy, without taking that complexity into account. Which, as he makes clear, is pretty much everyone who reports on results and sets policy.
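A few of the statistical points in that list can be made concrete with a short sketch. It assumes normal distributions throughout and uses made-up numbers (a score scale with mean 500 and SD 100, measurement noise of 30 points, a cutoff of 450, a 95% proficiency target); none of these figures come from Koretz. It shows a single student’s score as a draw from a distribution whose spread measures (un)reliability, the “one standard deviation above 500” way of reading a 600, and how much the score distribution would have to compress to get 95% of students over a cutoff without moving the mean, which is the myth of the vanishing variance.

```python
import random
from statistics import NormalDist, stdev

def simulate_scores(true_skill, noise_sd, n_forms, seed=0):
    """One student's observed scores on n_forms parallel test forms:
    true skill plus normally distributed measurement noise (an assumption)."""
    rng = random.Random(seed)
    return [true_skill + rng.gauss(0, noise_sd) for _ in range(n_forms)]

def z_score(score, mean, sd):
    """How many standard deviations a score sits above the mean."""
    return (score - mean) / sd

def share_above(cutoff, mean, sd):
    """Fraction of a normal score distribution that lands above a cutoff."""
    return 1 - NormalDist(mean, sd).cdf(cutoff)

def sd_needed(cutoff, mean, target_share):
    """Standard deviation a normal distribution with this mean would have to
    shrink to so that target_share of students clear the cutoff.
    Assumes mean > cutoff and target_share > 0.5."""
    z = NormalDist().inv_cdf(1 - target_share)  # negative for target_share > 0.5
    return (cutoff - mean) / z

# Reliability: the spread of one student's scores across forms tracks noise_sd.
scores = simulate_scores(true_skill=500, noise_sd=30, n_forms=2000)
print(stdev(scores))  # close to 30

# Reporting: 600 is one SD above a mean of 500, roughly the 84th percentile.
print(z_score(600, 500, 100), NormalDist(500, 100).cdf(600))

# Vanishing variance: about 69% clear a cutoff of 450 today; getting 95% over it
# with the same mean would require the SD to shrink from 100 to roughly 30.
print(share_above(450, 500, 100), sd_needed(450, 500, 0.95))
```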

Final thoughts

If you think it’s a good idea to make high-stakes decisions about schools and teachers based on standardized test results, Koretz’s book offers several clear warnings.

First, we should expect any high-stakes test to be gamed. Worse yet, the more reliable tests, being more predictable, are probably easier to game (look at the SAT prep industry).

Second, the more (statistically) reliable tests, by their controlled nature, cover only a limited sample of the domain we want students to learn. Tests trying to cover more ground in more depth (“tests worth teaching to,” in the parlance of the last decade) will necessarily have noisier results. This noise is a huge deal when you realize that high-stakes decisions about teachers are made based on just two or three years of test scores.
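To see why two or three years of scores is so little, here is a minimal simulation under made-up assumptions: each teacher’s yearly result is their true effect plus independent normal noise, with a yearly noise SD of 0.3 and a true gap of 0.1 between a better and a worse teacher (both in test-score SD units, chosen only for illustration). The standard error of a multi-year average shrinks only like one over the square root of the number of years, so with these numbers the truly better teacher still comes out behind roughly a third of the time after three years.

```python
import math
import random

def standard_error(yearly_sd, n_years):
    """Standard error of an average over n_years of independent yearly results."""
    return yearly_sd / math.sqrt(n_years)

def misranking_rate(true_gap, yearly_sd, n_years, trials=10_000, seed=1):
    """Fraction of simulated cases in which a teacher who is truly better by
    true_gap ends up with a lower multi-year average than a truly worse one."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        better = sum(true_gap + rng.gauss(0, yearly_sd) for _ in range(n_years)) / n_years
        worse = sum(rng.gauss(0, yearly_sd) for _ in range(n_years)) / n_years
        flips += better < worse
    return flips / trials

# Hypothetical noise level and gap; compare short and long track records.
for years in (2, 3, 10):
    print(years, round(standard_error(0.3, years), 3), misranking_rate(0.1, 0.3, years))
```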

Third, a test that aims to distinguish “proficiency” will do a worse job of distinguishing students elsewhere in the skills range, and may be largely irrelevant for teachers whose students are far away from the proficiency cut-off. (For a truly distressing example of this, see here.)

With so many obstacles to rating schools and teachers reliably based on standardized test scores, is it any surprise that we see results like this?

Categories: guest post, math education, modeling, statistics

Parenting through benign neglect

July 7, 2013 Cathy O'Neil, mathbabe 11 comments

In 1985, when I was 12 years old, I went to communist Budapest by myself, for a month. I’d met and befriended two Hungarian families when I was 11 and they were living next door to me for a year in Lexington, Massachusetts, and when they went back to Budapest they invited me to visit.

So it wasn’t like I didn’t have a place to sleep when I got there, but even so, my parents decided that yes, a trip across the world into a country that needed a visa to enter, that didn’t have a hard currency, and that didn’t have consistent phone lines at post offices (never mind at people’s homes, that was out of the question) was a great place for their 12-year-old daughter to visit by herself.

I also almost didn’t make the correct connection in Zurich, and I am seriously wondering what would have happened if I’d missed my flight. How would I have connected with my hosts? Where would I have slept? What would I have done for money?

I did make my flight, though, and I did meet my hosts, and the worst thing that happened to me was that when the cows got sick, I got sick – very sick. And to be fair, I turned 13 when I was there.

I came home appreciating milk pasteurization, and to a lesser extent milk homogenization. I was skinnier and less spoiled, I knew what really good peaches tasted like, and I was completely sick of paprika. Overall it was a good trip, and I’m glad I went.

And if I or my parents had been more cautious, I wouldn’t have gone. Goes to show you, sometimes it’s good not to think too hard about what could go wrong.

Unfortunately, I’m older now, and my 13-year-old just got on a plane to San Francisco by himself to attend a Model UN camp at Stanford. And all I can think about is what might go wrong.

Don’t get me wrong, it didn’t stop me from putting him on the plane. I’m trying to channel my parents’ benign neglect child-raising technique from which I benefitted so tremendously. He’s got a working cell phone, plenty of cash, and my BFF Becky will be within driving distance of him over there.

Hey, it’s not like he’s going to North Korea – which is, by the way, where he requested to be sent – and I’m pretty sure the milk there is pasteurized, as long as you avoid farmer’s markets.

Categories: musing

Aunt Pythia: alive and well!

July 6, 2013 Cathy O'Neil, mathbabe 5 comments

Aunt Pythia is just bursting with love and admiration for the courageous and articulate readers that sent in their thought-provoking and/or heart-rending questions in the last week which got her off life support and back into fighting shape.

On the one hand, Aunt Pythia didn’t want to be a histrionic burden to you all, but on the other hand clearly histrionics work, so there it is. Thank you thank you thank you for allowing histrionics to work.

That’s not to say you should rest on your laurels, readers! First of all, Aunt Pythia always needs new questions (you don’t want her to get sick again, right?), and secondly, I’ve heard laurels can be quite prickly.

In other words,

Submit your question for Aunt Pythia at the bottom of this page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

Isn’t the distribution thing kind of REALLY IMPORTANT for how we think of the sexual partner thing? If fifty women are getting it on with one man, while the other 49 men are, uh, monks, or vice versa, depending on the universe you live in, that certainly influences how you think about stereotypes.

Ms. Hold On A Second

Dear HOAS,

Yes it is, but the average should take care of that as long as the sample size is large enough to have that one lucky man represented, as well as the 49 unlucky men, in the correct proportions.
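Here’s the bookkeeping behind that, as a small sketch. If every partnership counted is between one man and one woman, each one adds exactly one to some man’s tally and one to some woman’s tally, so the two totals are always equal. With equal numbers of men and women, the two averages must then match no matter how lopsided the distribution is. The example below is HOAS’s extreme case: 50 men, 50 women, one man with all 50 women, and 49 monks. (With unequal group sizes the totals still match, but the averages scale by the ratio of group sizes.)

```python
from statistics import mean

def partner_counts(pairs, men, women):
    """Count partners per person from a list of (man, woman) partnerships."""
    counts_m = {m: 0 for m in men}
    counts_w = {w: 0 for w in women}
    for m, w in pairs:
        counts_m[m] += 1   # each partnership adds one to a man's tally...
        counts_w[w] += 1   # ...and one to a woman's tally
    return counts_m, counts_w

men = [f"m{i}" for i in range(50)]
women = [f"w{i}" for i in range(50)]
# One man partners with all 50 women; the other 49 men are monks.
pairs = [("m0", w) for w in women]

counts_m, counts_w = partner_counts(pairs, men, women)
print(mean(list(counts_m.values())), mean(list(counts_w.values())))  # both averages equal 1
```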

Let’s go with this a bit. How fat-tailed would sexual practice have to be to make this a problem? After all, there are distributions that defy basic intuition around this – look at the Cauchy Distribution, which has no defined mean or variance, for example. Maybe that’s what’s going on?
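For the curious, here’s what “no defined mean” looks like in practice, as a quick sketch. Standard Cauchy draws can be generated from uniform ones via the inverse CDF, tan(π(U − 1/2)). The running average of normal draws settles down near zero, as the law of large numbers promises, while the running average of Cauchy draws keeps getting yanked around by enormous outliers and never settles. The sample sizes and seed are arbitrary choices. As noted just below, a finite population of people can’t actually behave like this.

```python
import math
import random

def cauchy_sample(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def running_means(draw, n, seed=0):
    """Running average after each of n draws from draw(rng)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += draw(rng)
        means.append(total / i)
    return means

normal_means = running_means(lambda rng: rng.gauss(0, 1), 100_000)
cauchy_means = running_means(cauchy_sample, 100_000)
print(normal_means[999], normal_means[-1])  # shrinks toward 0
print(cauchy_means[999], cauchy_means[-1])  # no tendency to settle
```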

Hold on one cotton-picking second! We have a finite number of people in the world, so obviously this is not what’s going on – the average number of sexual partners exists, even if it’s a pain in the ass to compute!

But I’m willing to believe that there’s a sampling bias at work here. Maybe female prostitutes are excluded from surveys, for example. And if men always included their visits to prostitutes, that would introduce a bias.

I’ll go on record saying I doubt that explains the discrepancy, although to be mathematical about it I’d need to have an estimate of how much prostitute sex happens and with how many men. I don’t have that data but maybe someone does.
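Here is the back-of-the-envelope version of the estimate I’d want. It is a sketch under simple assumptions: the survey covers a fixed number of men and women, and some partnerships the surveyed men report are with women who are never surveyed. Those partnerships raise the men’s average but never show up in the women’s. The numbers below are made-up placeholders, not data.

```python
def reported_means(shared, with_excluded, n_men, n_women):
    """Men's and women's average partner counts when some partners of the
    surveyed men are never surveyed themselves.

    shared:        partnerships between surveyed men and surveyed women
    with_excluded: partnerships between surveyed men and unsurveyed women
    """
    men_mean = (shared + with_excluded) / n_men
    women_mean = shared / n_women
    return men_mean, women_mean

# Hypothetical placeholder numbers, chosen only to illustrate the arithmetic.
men_mean, women_mean = reported_means(shared=7000, with_excluded=500,
                                      n_men=1000, n_women=1000)
print(men_mean, women_mean, men_mean - women_mean)  # 7.5 7.0 0.5
```

With equal-sized groups, the gap is just the excluded partnerships divided by the number of surveyed men. Explaining a gap of g partners per person would therefore take roughly g times the number of surveyed men in prostitute visits, and that is the figure someone would need data on.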

And of course it’s probably not just one thing, but some combination of the surveys being of college students, fewer prostitutes being at college, and some actual lying. But my money’s on the lying every time.

Aunt Pythia

——

Dear Aunt Pythia,

I’m not sure this is the correct forum for this question, but here goes: I come from an economics/econometrics background, where the statistical modeling tool of choice is Stata. I now work at an organization in a capacity that is heavy on statistical modeling, in some cases (but not always) working with “Big Data”.

There is some freedom in terms of the tools we can use, but nobody uses Stata, to my knowledge. As somebody who is just starting out in this industry, I’m trying to get a pulse on which tool I should invest the time into learning, SAS or R. Do you have an opinion either way?

Lonely in Missouri

Dear Lonely,

Always go with the open source option. R, or even better, python. What with pandas and other recent packages, python is just fabulous.

Aunt Pythia

——

Dear Aunt Pythia,

I’m a graduate student in math at a large state research school in the midwest, finishing in 2014. My question is about my advisor and my job plans.

First, here’s what I’m planning to do next year. My wife is a student in the same department as I am, and she’s also finishing next year. We both want to move to a big city. We’d settle for a Philadelphia or Seattle or really anywhere we can live without a car, but by “big city” I really mean New York. We’ve both lived there before and we like it better than anywhere else.

My wife wants a non-academic job. I’m going to apply for research postdocs. I should be a fairly strong candidate, but I’m no superstar and I definitely don’t think it’s assured that I’ll get one, especially with the limited set of places we’re willing to live. And that’s fine! I like the idea of being a professor, but there’s lots of other jobs that I think I’d like too. I know that I wouldn’t like living apart from my wife, or living somewhere that we hate.

My advisor has done a good job of making me into a researcher. The problem is that he’s just a difficult person. Less charitably, he’s an asshole (at least to me). He’s arrogant, rude, and demanding. The one time I ever told him he wasn’t treating me fairly (which I did politely, but in an email), he completely flipped out (in a series of emails) and told me that as his student, I had no right to talk to him that way.

I don’t want to make him sound like a complete monster: he’s from a culture that puts a lot of weight on respect and hierarchy, and I’ve seen him be empathetic and kind. But he absolutely cannot handle it if I disagree with him or don’t do what he says.

In all conversations we’ve had about my future, he seems to have no interest in what I actually want to do. I could have graduated last year, but my department had no problem letting me stay on so that I could finish at the same time as my wife. My advisor was really unhappy about this. His attitude was that a year wasn’t much time to spend away from a spouse (after all, he spent three!), and I should have at least applied for a few prestigious postdocs to maximize my chances of getting one.

Recently, my advisor emailed me just to tell me how disappointed he is in me: I have a bad attitude, I don’t always go to seminars even when he tells me I should, and that I make decisions about my future on my own, instead of in consultation with him. I responded politely (and distantly) to this.

So, here’s the question: should I do anything about all of this? I don’t work with my advisor mathematically anymore, and I’ve been much happier since we stopped. I have other projects to work on and other collaborators to work with, and I think other people in the department would be happy to give me problems or work with me on them. I don’t think my advisor is going to change in any way, and I’m the kind of person who can’t stand to be treated like an underling or told what to do. My advisor has said that he’s still happy to write me a recommendation. What things should I do? I’m hoping your answer is nothing, so that I can continue having as little contact as possible with my advisor.

Feeling Refreshed at the End of an Era

Dear FREE,

Here’s the thing. I have sympathy with some of your story but not with all of it. First I’ll tackle the negative stuff, then I’ll get to the sympathy.

If I understand correctly, you could have graduated last year but instead you’re graduating next year. So you’re staying an extra two years on the department’s dime. Doesn’t that seem a bit strange? How about if you finish and get a job in town as an actuary or something to see if non-academic work suits you? Are you preventing someone from entering the department by being there so long?

Also, you mention that you don’t go to seminars. I don’t think I always went to seminars as a young graduate student, but as I got more senior I appreciated how much language development there is in seminars – even when I didn’t understand the results I learned about how people think and talk about their work by going to seminars. I don’t think it was a waste of my time even though I ultimately left academia. I don’t think it would waste your time to go to seminars.

In other words, you sound like an entitled lazy graduate student, and I’m not so surprised your advisor is fed up with you. And I’m pretty sure your non-academic boss would be even less sympathetic to someone spending an extra two years doing not much.

Now here’s where I do my best to sound nice.

Sounds like your advisor doesn’t get you, possibly because he’s fed up with the above-mentioned issues. Like a lot of academics, he understands ambition in one narrow field, and doesn’t even relate to not wanting to be successful in this realm. That’s probably not going to change, and there’s no reason to take advice from him about how you want to live your life and the decisions you’re making for your family.

So yes, ignore him. But don’t ignore me when I say: stop being an entitled lazy-ass.

Aunt Pythia

——

Ok I’ve never heard of Aunt Pythia, and I know this is too easy for her, but I can’t let her die.

Aunt Pythia,

If each woman I date is an independent trial, and the probability of marrying a woman I date is 0.1, how many women do I have to date before I can be at least 90% sure of getting married? (You can substitute “having sex” for “getting married” if you like.)

Anonymous

Dear Anonymous,

Aunt Pythia appreciates the sentiment, and the question.

Let’s sex up the question just a wee bit and change it from “getting married” to “having sex” as you suggested, and also raise your chances a bit to 17%, out of pure human compassion.

Let’s establish some notation: each time you date some woman we will record it either as a “G” for “got laid” or as a “D” for “dry.” So for example, after 5 women you might have a record like:

DDDGD,

which would mean you got laid with the fourth woman but with no other women.

Are we good on notation?

OK now let’s answer the question. How long do we wait for a G?

The trick is to turn it around and ask it another way: how likely is a reeeeeeally long string of D’s?

Chances of one D are good: (100-17)% = 83%.

Chances of two D’s in a row are less good: 0.83*0.83 ≈ 0.69 = 69%.

Chances of three D’s in a row are even less good: $0.83^3 \approx 57\%$.

If you keep going you’ll notice that the chance of 12 D’s in a row is about 11%, but the chance of 13 D’s in a row is only about 9%. That means that, by dating 13 women or more, your chances of getting laid are better than 90%. If you think it’s really a 10% chance every time, you’ll have to date 22 women for such odds. I’d suggest you invest in a membership on OkCupid or some such.
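The “turn it around” trick takes only a few lines of code. You keep multiplying the chance of an all-D run by (1 − p) until it drops to 10% or below. Or you solve (1 − p)^n ≤ 0.1 directly with logarithms. This is a minimal sketch that assumes, as the letter does, independent dates with a fixed success probability p. The function names are just illustrative.

```python
import math

def dates_needed(p, confidence=0.9):
    """Smallest n such that P(at least one G in n dates) >= confidence,
    assuming independent dates that each end in a G with probability p,
    where 0 < p < 1."""
    n = 0
    all_dry = 1.0  # P(n D's in a row) = (1 - p) ** n
    while 1.0 - all_dry < confidence:
        all_dry *= 1.0 - p
        n += 1
    return n

def dates_needed_closed_form(p, confidence=0.9):
    # Solve (1 - p) ** n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

for p in (0.17, 0.10):
    print(p, dates_needed(p), dates_needed_closed_form(p))
# prints "0.17 13 13" and then "0.1 22 22"
```

The loop mirrors the D-by-D reasoning. The closed form gets the same answer in one step, because the number of dates until the first G is geometrically distributed.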

Good luck!

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

You give me a capital requirement, I’ll give you a derivative to skirt it.

July 3, 2013 Cathy O'Neil, mathbabe 41 comments

I’ve enjoyed reading Anat Admati and Martin Hellwig’s recent book, The Bankers’ New Clothes, which explains a ton of things extremely well, including:

the difference between what’s “good for banks” (i.e. bankers) and what’s good for the public, and how, through unnecessary complexity and shittons of lobbying money, the “good for bankers” case is made much more often and much more vehemently,
that, when there’s a guaranteed backstop for a loan, the person taking out the loan has an incentive to take on more risk, and
that there are two different definitions of “big returns” depending on the context: one means big in absolute value (where -30% is bigger than -10%), the other means big as in more positive (where -10% is bigger than -30%). Believe it or not, this ambiguity can be taken (at least metaphorically) as a cause of confusion when bankers talk to the public. Namely, when the expected return on an investment is, say, 3%, it makes sense for bankers to lever up their bets so they get “bigger returns” in the first sense, especially since there’s essentially no downside for them (a -30% return doesn’t affect them personally, while a 30% return means a huge bonus). The public, on the other hand, would like to see the banks go for “bigger returns” in the second sense, avoiding the -30% scenario altogether through restrained risk-taking.

Admati and Hellwig’s suggestion is to raise capital requirements to much higher levels than we currently have.
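Why higher capital requirements matter comes down to simple leverage arithmetic. If equity is a fraction c of assets, then each unit of equity supports 1/c units of assets, and returns on equity are amplified in both directions. This is a minimal sketch that assumes a hypothetical bank earning the same return on all its assets and paying a fixed, hypothetical interest rate on its debt. It ignores taxes, risk-weighting and the value of the government backstop.

```python
def return_on_equity(asset_return, capital_ratio, borrowing_rate):
    """Return on equity for a bank funded by equity plus debt.

    capital_ratio: equity / assets, e.g. 0.03 means 3% equity, 97% debt.
    asset_return, borrowing_rate: per-period fractions, e.g. 0.03 for 3%.
    """
    leverage = 1 / capital_ratio      # assets per unit of equity
    debt = leverage - 1               # debt per unit of equity
    return leverage * asset_return - debt * borrowing_rate

# Hypothetical inputs: assets gain or lose 3% in a year, debt costs 1%.
for ratio in (0.03, 0.10, 0.25):
    gain = return_on_equity(0.03, ratio, 0.01)
    loss = return_on_equity(-0.03, ratio, 0.01)
    print(f"equity {ratio:.0%} of assets: {gain:+.0%} in a good year, "
          f"{loss:+.0%} in a bad one")
```

With 3% equity, a 3% asset loss wipes out more than all of the equity (a return below -100%), and the shortfall lands on creditors or whoever stands behind them. That is the “bigger returns” asymmetry in the list above. With 25% equity, the same bad year costs shareholders 15%.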

Here’s the thing though, and it’s really a question for you readers. How do derivatives show up on the balance sheet exactly, and what prevents me from building a derivative that avoids adding to my capital requirement but which adds risk to my portfolio?

I’ve been getting a lot of different information from people about whether this is possible, or will be possible once Basel III is implemented, but I haven’t reached anyone yet who is actually expert enough to make a definitive claim one way or the other.

It’s one thing if you’re talking about government interest rate swaps, but how do CDSs, for example, get treated in terms of capital requirements? Is there an implicit probability of default used for accounting purposes? In that case, since such instruments are famously fat-tailed (i.e. the probability of default looks minuscule until it doesn’t), wouldn’t that encourage everyone to invest extremely heavily in instruments that don’t move their capital ratios much but take on outrageous risks? The devil’s in the details here.
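Here is a toy illustration of the worry. It is not a model of how any actual capital rule (Basel III or otherwise) treats CDSs. The idea: if risk is judged by expected loss under some assumed default probability, then selling protection looks nearly free until the default actually arrives. The contract terms and probabilities below are hypothetical placeholders, and the CDS payoff is simplified to one year of premium against a single payout.

```python
def protection_seller_pnl(notional, spread, recovery, default):
    """One-year P&L for the seller of CDS protection (simplified).

    The seller collects spread * notional as premium; if the reference
    entity defaults, it pays notional * (1 - recovery).
    """
    premium = spread * notional
    payout = notional * (1 - recovery) if default else 0.0
    return premium - payout

def expected_pnl(notional, spread, recovery, default_prob):
    # Probability-weighted average of the no-default and default outcomes.
    return ((1 - default_prob) * protection_seller_pnl(notional, spread, recovery, False)
            + default_prob * protection_seller_pnl(notional, spread, recovery, True))

# Hypothetical contract: $100m notional, 0.5% annual spread, 40% recovery.
notional, spread, recovery = 100e6, 0.005, 0.40
for p in (0.001, 0.02):
    print(f"assumed default prob {p:.1%}: expected P&L "
          f"{expected_pnl(notional, spread, recovery, p):+,.0f}")
print(f"P&L in a default year: "
      f"{protection_seller_pnl(notional, spread, recovery, True):+,.0f}")
```

Moving the assumed default probability from 0.1% to 2% turns a comfortable expected profit into an expected loss. Meanwhile, the loss in the year a default actually happens is more than a hundred times the annual premium. Whatever probability the accounting plugs in is doing a lot of work.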

Categories: finance

The regressive domestic complexity tax

July 2, 2013 Cathy O'Neil, mathbabe 27 comments

I’ve been keeping tabs on how hard it is to do my bills. I did my bills last night, and man, I’m telling you, I used all of my organizational abilities, all of my customer service experience, and quite a bit of my alpha femaleness just to get it done. Not to mention I needed more than 2 hours of time, which I squeezed out by starting the bills while waiting for take-out.

By the way, I am not one of those sticklers for doing everything myself – I have an accountant, and I don’t read those forms, I just sign them and pray. But even so, removing tax issues from the conversation, the kind of expertise required to do my monthly bills is ridiculous and getting worse.

Take medical bills. I have three kids, so there are always a few appointments pending, but it’s absolutely amazing to me how often I’m getting charged for appointments unfairly. I recently got charged for a physical for my 10-year-old son, even though I know that physicals are free thanks to ObamaCare.

So I call up my insurance company and complain, spend 15 minutes on the phone waiting, then it turns out he isn’t allowed to have more than one physical in a 12-month period which is why it was charged to me. But wait, he had one last April and one this April, what gives? Turns out last April it was on the 14th and this April it was on the 8th. So less than one year.

But surely, I object, you can’t ask for people to always be exactly 12 months apart or more! It turns out that, yes, they have a 30-day grace period for this exact reason, but for some reason it’s not automatic – it requires a person to call and complain to the insurance company to get their son’s physical covered.
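The rule as the insurance company described it can be written as a date check. This is a sketch under assumptions. The post doesn’t say exactly how the insurer counts 12 months, so this version treats “a year later” as the same calendar date the next year and treats the 30-day grace period as moving that cutoff 30 days earlier. The years are filled in for illustration. The hypothetical `apply_grace` flag stands in for whether somebody called to complain.

```python
from datetime import date, timedelta

def physical_covered(last_visit, this_visit, grace_days=30, apply_grace=True):
    """Hypothetical once-per-12-months rule for annual physicals.

    'Twelve months later' is taken to mean the same calendar date a year on;
    when apply_grace is True, the cutoff moves grace_days earlier.
    """
    try:
        anniversary = last_visit.replace(year=last_visit.year + 1)
    except ValueError:  # last visit on Feb 29: use Mar 1 of the next year
        anniversary = date(last_visit.year + 1, 3, 1)
    cutoff = anniversary - timedelta(days=grace_days) if apply_grace else anniversary
    return this_visit >= cutoff

# Years assumed for illustration; the post only gives April 14 and April 8.
last, this = date(2012, 4, 14), date(2013, 4, 8)
print(physical_covered(last, this, apply_grace=False))  # False: 6 days short
print(physical_covered(last, this))                     # True: inside the grace period
```

Applying the grace period automatically would be a one-argument change, which is rather the point of the complaint.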

Do you see what I mean? This is not actually a coincidence – insurance companies make big money from having non-automatic grace periods, because many people don’t have the time, the patience, and the pushiness to make them do it right, and that’s free money for insurance companies.

There are the (abstract) “rules” and then there’s what actually happens, and it’s a constant battle between the charges you know you shouldn’t be paying and how much your time is worth. For example, if it’s less than $50 I just pay it even if it’s not reasonable. I’m sure other people have different limits.

I see this as a systemic problem. So this isn’t a diatribe against just insurance companies, because I have to jump through about 15 hoops a month like this just to get my paperwork sorted out, and they are mostly not medical issues. This is really a diatribe against complexity, and the regressive tax that complexity projects onto our society.

Rich people have people to work out their paperwork for them. People like me, we don’t have people to do this, but we have the time, skills, and patience to do it ourselves (and the money to buy takeout while we do it). There are plenty of people with no time, or who aren’t organized enough to have all the information they need at their fingertips when they make these calls, or who are too intimidated by customer service phone lines to work it out.

And, as in the example above, there’s usually a perverse incentive for complexity to exist – people give up and pay extra because it’s not worth doing the paperwork. That means it’s always getting worse.

Bottom line: you shouldn’t need to have a college degree and customer service experience to do your bills. I’d love to see an estimate of how much more in unnecessary fees and accounting errors is paid by the poor in this country.

Categories: rant

Payroll cards: “It costs too much to get my money” (#OWS)

July 1, 2013 Cathy O'Neil, mathbabe 7 comments

If this article from yesterday’s New York Times doesn’t make you want to join Occupy, then nothing will.

It’s about how, if you work at a truly crappy job like Walmart or McDonald’s, they’ll pay you with a pre-paid card that charges you for absolutely everything, including checking your balance or taking your money, and will even charge you for not using the card. Because we aren’t nickeling and diming these people enough.

The companies doing this stuff say they’re “making things convenient for the workers,” but of course they’re really paying off the employers, sometimes explicitly:

In the case of the New York City Housing Authority, it stands to receive a dollar for every employee it signs up to Citibank’s payroll cards, according to a contract reviewed by The New York Times.

Thanks for the convenience, payroll card banks!

One thing that makes me extra crazy about this article is how McDonald’s uses its franchise system to keep its hands clean:

For Natalie Gunshannon, 27, another McDonald’s worker, the owners of the franchise that she worked for in Dallas, Pa., she says, refused to deposit her pay directly into her checking account at a local credit union, which lets its customers use its A.T.M.’s free. Instead, Ms. Gunshannon said, she was forced to use a payroll card issued by JPMorgan Chase. She has since quit her job at the drive-through window and is suing the franchise owners.

“I know I deserve to get fairly paid for my work,” she said.

The franchise owners, Albert and Carol Mueller, said in a statement that they comply with all employment, pay and work laws, and try to provide a positive experience for employees. McDonald’s itself, noting that it is not named in the suit, says it lets franchisees determine employment and pay policies.

I actually heard about this newish scheme against the poor when I attended the CFPB Town Hall more than a year ago and wrote about it here. That’s where I heard people complain about Walmart doing this, and also about court-ordered child support being paid out this way.

Just to be clear, these fees are illegal in the context of credit cards, but financial regulation has not touched payroll cards yet. Yet another way that the poor are financialized, which is to say they’re physically and psychologically separated from their money. Get on this, CFPB!

Update: an excellent article about this issue was written by Sarah Jaffe a couple of weeks ago (hat tip Suresh Naidu). It ends with an awesome quote by Stephen Lerner: “No scam is too small or too big for the wizards of finance.”

Categories: #OWS, finance, rant
