May | 2012 | mathbabe

Best case/ worst case: Medicine 50 years from now

May 31, 2012 Cathy O'Neil, mathbabe 5 comments

Best Case

The scientific models and, when possible, the data have been made available to the wider scientific community for vetting. Incorrect or non-robust results are questioned and thrown out by that community; interesting and surprising new results are re-tested on larger data sets, iteratively and under different conditions, to test for universality.

The result is that a person, with the help of their doctor, thorough exams, and information-gathering sessions, and with their informed consent to use this data for their benefit, will have a better idea of what to watch out for in terms of health risks, how to prevent certain diseases that they may be vulnerable to, and how the tried-and-true medicines would affect them.

For example, in spite of the fact that Vioxx gives some people heart attacks, it also really helps other people with joint pain that aspirin or ibuprofen can’t touch. But which people? In the future we may know the answer to this through segmentation models, which group people by their attributes (which could come under the category of daily life conditions, such as how much someone exercises, or under the category of genetic profile).

For example, we recently learned that exercise is not always good for everyone. But instead of using that unlikely possibility as an excuse not to do any exercise, we would be able to look at a given profile and tell a person if they are in the clear and what kind of exercises would be most beneficial to their health.

It wouldn’t solve every problem; people would still die, after all. But it could help people live happier and healthier lives. It depends on the open exchange of ideas among scientists as well as strong regulation about who owns personal data and how it can be used.

Worst Case

The scientific community continues its practice of essentially private data collection and models. Scientific journals become more and more places where, backed by pharmaceutical companies and insurance companies, paid Ph.D.’s boast about their latest breakthrough with no cultural standard of evidence.

Indeed there is progress in segmentation models for disease and medicine, but the data, models, and results are owned exclusively by corporations, specifically insurance companies. This leads to a death spiral in modeling, where the very people who are vulnerable to disease and need medicine or treatment the most are priced out of the insurance system and no longer have access to anything resembling reasonable medical care, even for chronic diseases such as diabetes.

And you won’t need to give your consent for those insurance companies to use your data – they will have already bought all the data that they need to know about you from data collectors, which have been gleaning information about you from your online presence since birth. These companies will know everything about you; they control and sell your data for extra profit. To them, you represent a potential customer and a potential cost, a risk/return profile like any other investment.

Categories: data science

How to talk conservative

May 30, 2012 Cathy O'Neil, mathbabe 12 comments

I finished reading Jonathan Haidt’s “The Righteous Mind: Why Good People are Divided by Politics and Religion” and I have to say, I got a lot out of it. Even if they are just approximations to the truth, it’s interesting to consider his various positions. Near the end he talks about religion and “groupishness,” and how people are too focused on the technical aspects of religious beliefs rather than what a religion accomplishes in a community, which he claims is its main benefit.

But what I found more interesting is the beginning of the book when he discusses the different moral make-up of liberals and conservatives (and libertarians) in this country. Namely, he claims that liberals care primarily about the following three things:

caring for the vulnerable or victimized,
the concept of oppression from bullies – or conversely the concept of liberty, and
the concept of proportional fairness (you deserve a part of the pie since you helped make it, but you wouldn’t deserve any if you hadn’t helped).

By contrast, conservatives care about a larger set of six things, the above three as well as:

the concept of sanctity,
the concept of authority – when it’s just and those in power take proper responsibility, and
the concept of loyalty.

I took away three points. First, liberals are bad at guessing what conservatives think, because they are somewhat blind to these last three things, and when they see conservatives go on about them, they assume conservatives don’t care about the first three, which is wrong, although it’s true that they care about them differently (especially proportional fairness: whereas liberals emphasize leaving nobody out, conservatives emphasize not letting people get extra, especially if it comes from their stuff). Second, if I, as a liberal, want to communicate with a conservative, I have to talk about all six of these with some level of understanding. Finally, statistics and other rational arguments only work if the person you’re talking to already agrees with you or if they are exceptionally open-minded – in any case you have to appeal to their morals before going into stats.

With that in mind, here are two rants against the Stop, Question, and Frisk policy, one written for a liberal audience, one for a conservative audience.

Liberal version

First, the stop, question, and frisk policy targets minority men almost exclusively. Second, almost 90% of the events end up without an arrest, which means it’s unwarranted intrusion and bullying; typically the reason given for the stop is a “furtive movement”, which could be absolutely anything. Finally, there is a quota system in the police department which forces each officer to perform these unwarranted searches whether or not there is cause, which inevitably leads them to target the “least likely to complain,” namely young, poor minorities. We need to stop the police abusing their privileges in this way immediately.

Conservative version

What is the difference between a police force and a gang of men who walk around with guns? The answer, in the best of worlds, is authority, intentionality, and the rule of law. Police have an important job to do, which is to protect us, and to keep the streets safe. And when they do a good job, we admire them for that and count on them for their protection. But imagine if, instead of seeing your neighborhood cop as someone you can count on, he instead consistently stops you on your way home from school or work and asks you suspicious questions, and sometimes even takes your keys from your pocket, and, while you’re locked in the police car, enters your apartment and terrorizes your family. This makes you feel like you are the bad guy, even though you did nothing wrong. After a while, it would make you and your neighborhood less trusting of the authority of the cops, which would lead to reckless behavior and lawlessness, because your rights are no longer being protected. We need to stop the policy of Stop, Question, and Frisk in order to make sure the police never become just a bunch of bullies with guns.

Categories: musing

When “extend and pretend” becomes “delay and pray”

May 29, 2012 Cathy O'Neil, mathbabe 2 comments

When banks have non-performing loans, they sometimes don’t want to admit it. So instead of calling it a loss, because the debtor can’t pay, they simply rewrite the contract so that it has been extended. This way the debtor is not technically behind in payments and the creditor can pretend that the corresponding debt on their books is worth something. It’s called extend and pretend, and it’s not new.

And actually, this ploy sometimes works. After all, sometimes the debtor just needs a bit more time – they could be temporarily unable to pay for whatever reason. Indeed it would be a convenient option for people who are just in need of a few more months to get back on their feet and not lose their house (typically this offer is not extended to individuals, since their loans are too small to fret over).

Make no mistake: there is a real incentive for the banks to do this. Currently the worst example of this method is in Spain, where the banks are finding it politically impossible to admit their losses. The government doesn’t want to hear it, because they will need to bail them out, and their borrowing costs are already precariously high. The Eurozone leaders don’t want to think Spain is as bad off as Greece, because they can’t handle that kind of problem. The investors don’t want to hear it because their investments will be worth less once the news comes out (an example of asymmetric information if there ever was one – shouldn’t investors already know how much extending and pretending is really going on?). And of course the lenders themselves don’t want to admit they are working at an insolvent institution, especially when they probably each know other institutions that are even more insolvent.

What are the chances that this method of delay and pray will work for Spain? With an enormous housing bubble and 24% unemployment, not good. Most of the bad loans that have been extended after non-payment are housing market related. Half of the lenders are zombies, which means insolvent but still technically open for business. Essentially the numbers are just too high and now everybody knows it (see this Bloomberg article for the low-down on Spain).

So what should Spain be doing?

I like to point to the example of Iceland, which admitted its debts early on (although it has to be admitted they didn’t have much of a choice), defaulted on a bunch of international debt, bailed out their citizens from onerous home debt, and is recovering nicely (see this Bloomberg article for more on Iceland).

Oh, and let me add that they (Iceland) are indicting and jailing the bankers who got them into the mess, to the tune of 200 indictments. Considering the U.S. has a population 981 times as large, that would be equivalent to us indicting 196,341 bankers. In fact we’ve indicted no top bank executive, although everyone will be relieved to know the SEC “sanctioned” 39 people for the housing market debacle. Phew!
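The scaling in that comparison is a single multiplication, and it can be checked in a few lines. This is only a sketch: the helper name is made up for illustration, and the inputs are the two numbers quoted above. With the rounded ratio of 981 the product comes to 196,200; the 196,341 quoted above presumably came from an unrounded ratio of about 981.7, which is an assumption since the post doesn’t say.

```python
# Hypothetical helper (not from the post): scale a count by a population ratio.
def scale_by_population(count, population_ratio):
    return count * population_ratio


icelandic_indictments = 200  # figure quoted in the post
us_to_iceland_ratio = 981    # rounded population ratio quoted in the post

print(scale_by_population(icelandic_indictments, us_to_iceland_ratio))  # prints 196200
```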

Unfortunately, it would be tough for Spain to repeat that act: it depended on the fact that Iceland has control over its economic choices, but Spain is part of the Eurozone and as such is embedded in a huge network of agreements and debts and currency with the other Eurozone nations.

In some sense, Spain is being forced into the zombie bank situation by a lack of options. Unless I’m missing something – would love to be wrong!

Categories: finance

Biking in New York City

May 28, 2012 Cathy O'Neil, mathbabe 1 comment

I’m a huge fan of biking around the city. I like to commute to work, from the Columbia University neighborhood up at 116th and Broadway to just below Houston on Varick. Since both my house and my work are within blocks of the west side of Manhattan, I can bike the whole way along the west side bike path (see, for example, this map).

It’s a gorgeous ride along the Hudson River, and there’s not one day I ride it without appreciating not being stuck in the traffic next to me on the West Side Highway. Okay, actually, last Monday was one, when I got caught in a huge thunderstorm. Luckily I had dry clothes, but for some reason no dry socks (note to self: bare feet with wet leather boots is gross). I’m also happy not to be on the subway (1 line) on Monday mornings when people are extra grumpy about going to work.

I don’t bike when it’s (already) raining, or when it’s icy, and it’s always a bummer when daylight saving time ends, because it means it’s already dark by the time I leave work. But otherwise I am on the lookout for great biking days and opportunities.

A few weeks ago, on the first really gorgeous day of spring, I biked from one Occupy meeting to another, the first one up at Columbia and the second in Union Square (to see my friend Suresh Naidu speak about Radical Economics 101). I biked through Central Park, which was bursting with spring joy, and then all the way to Union Square down Broadway, which now has a beautiful bike lane. The only annoying part was Times Square, which is so full of tourists you have to walk your bike. So that’s a good sign, when the pedestrians are more dangerous than the cars.

And I also bike on other streets, although after being doored a few times and breaking someone’s windshield with my head (a long time ago in Berkeley but still) I am hugely defensive: I pretty much assume every moving car is trying to hit me and every parked car’s door is about to open. Even so, there are quite a few quiet streets I can feel safe biking down, in the middle, and although it’s not very fast, it’s certainly faster than walking. A great way to explore the city.

And I’m not alone: here’s a great essay by David Byrne in a recent New York Times Opinion column entitled “This is How We Ride”. It’s a beautifully written piece, and he describes the joys of biking in the city perfectly. He mentions that there’s a new bike-share initiative starting this summer, where there will be 10,000 bikes for rent at 420 bike stations in Manhattan, Long Island City, and Brooklyn.

That’s awesome, even if I will have to share the bike lane with even more enthusiasts. The rides are limited to 30 minutes, so not a full commute for me, but it means that if I’m already downtown and want to get to the East Side (which is always hard – I like to say that going to the East Side is like going to L.A. in terms of logistical difficulties) I will be able to hop on a bike and cross town. Cool!

Categories: musing

Everybody lies (except me)

May 27, 2012 Cathy O'Neil, mathbabe 5 comments

There’s an interesting article in the Wall Street Journal from yesterday about lying. The article explains that everybody lies a little bit and, yes, some people are serious liars, but the little lies are the more destructive because they are so pervasive.

It also explains that people only lie the amount they can get away with to themselves (besides maybe the out-and-out huge liars, but who knows what they’re thinking?).

When I read this article, of course, I thought to myself, I don’t lie even a little bit! And that kind of proved their point.

So here’s the thing. They also explained that people lie a bit more when they are in a situation where the consequences of lying are more abstract (think: finance) and that they lie more when they are around people they perceive as cheating (think: finance). So my conclusion is that finance is populated by liars, but that’s because of the culture that already exists there: most people just amble in as honest as anyone else and become that way.

Of course, every field has that problem, so it’s really not fair to single out finance. Except it is fair to single out any place where you can cheat easily, where there are ample opportunities to lie and profit off of lies.

One cool thing about the article is that they have a semi-solution, namely to remind people of moral rules right before the moment of possible lying. This can be reciting the ten commandments or swearing on a bible, which for some reason also works for atheists (but wouldn’t stop me from lying!), or could be as simple as making someone sign their name just before lying (or, even better, just before not lying) on their auto insurance forms.

Can we use this knowledge somehow in setting up the system of finance?

The result that people are less likely to lie when they know who the victim of their lie is may explain something about how, back when banks lent out money to people and held the money on their books, we had less fraud (but not zero fraud of course). The idea of personally knowing who the other person is in a transaction seems kind of important.

The idea that we make people swear they are telling the truth and sign their name seems easy enough, but obviously not infallible considering the robo-signing stuff. I wonder if we can use more tricks of the honesty trade and do things like make sure each person signing is also being videotaped or something; maybe that would also help.

Unfortunately another thing the article said was that having been taught ethics some time in the past actually doesn’t help. So it’s less to do with knowledge and more to do with habit (or opportunity), it seems. Food for thought as I’m planning the ethics course for data scientists.

Categories: data science, finance, musing

All the good data nowadays is private – what’s the point of having a data science Ph.D.?

May 25, 2012 Cathy O'Neil, mathbabe 14 comments

I go back and forth on whether there should be an undergrad major or Ph.D. program on data science. On the one hand, I am convinced it’s a burgeoning field which will need all the smart people it can get in the next few years or decades. On the other hand, I’m just not sure how capable academics really are at teaching the required skills. Let me explain.

It’s not that professors aren’t super smart and great at what they do. But the truth is, they typically don’t have access to the kind of data that’s now available to data scientists working in Google or Facebook or other tech companies (see this recent New York Times article on the subject). Even where I work, which is a medium sized start-up, I have access to data which many academics would kill for. This means I get to play with an incredibly rich resource, assuming I have built up the toolset to do so.

So while academics are creating (unrealistic) models of “influence” based on weird assumptions about how information gets propagated through networks, nerds at Facebook and Google and Foursquare just get to see it happen in real time. There’s an enormous advantage to having the data at your fingertips – you get good results fast. But then since it’s all proprietary you can’t publish it (a topic for another post).

Another thing: since academics typically don’t have this kind of big data, they also don’t have to create tools or methods for taming huge data. Sometimes I hear statisticians say that data science is just statistics, but they are typically missing the point of this “taming” aspect of data science. Namely, if we use state-of-the-art proven statistical methods on 15 terabytes of data and it takes 50 years to come up with an answer, then guess what, it doesn’t work.

At the same time, data science isn’t purely about algorithmic running time either, and a computer scientist without a good statistical background would be equally wrong if they said that data science is just machine learning.

For that matter, data science also isn’t purely speculative research – there’s a bottom-line business aspect to it, and the intention is (usually) to make profit. But there’s no way someone with a business degree who doesn’t know how to model can be a data scientist either.

End result: To teach data science for reals, you’d need to form an interdisciplinary department across business, computer science, applied math, and statistics. Even so, I’m not sure how well strictly academic departments can really teach the nitty gritty of data science, even if they do collaborate across departments, because they just don’t have good enough data (and by the way, this is a huge “if” – it seems politically impossible in some of the universities I’ve talked to).

On the other hand, I think it’s a good idea to try, because it is a great opportunity to teach at least some basic stuff and to instill a code of ethics in young data scientists.

The way things work now, the tech industry takes in former mathematicians, physicists, computer scientists, and statisticians and puts them on projects creating models of human behavior (I’ll include finance in that category) that are infinitely scalable and sometimes nearly infinitely scaled. Nobody is ever taught to stop and think about how their models are going to be used and how to think about the long-term effects of their models.

In spite of all the data problems and political obstacles, I feel that for the sake of this conversation, i.e. of personal responsibility of a modeler, we should go ahead and make a program, because it’s important and it isn’t gonna happen in your typical finance firm or tech startup.

Categories: data science

Favorite bands

May 24, 2012 Cathy O'Neil, mathbabe 5 comments

My 9-year-old’s favorite bands (and favorite songs):

Queen (Bohemian Rhapsody)
AC/DC (Back in Black)
ABBA (Fernando)
Green Day (American Idiot)
Weird Al Yankovic (Canadian Idiot)

Categories: musing

The engaged skeptic

May 23, 2012 Cathy O'Neil, mathbabe 7 comments

Last night I read this article by Jane Brody in the New York Times, which was about staying optimistic and the various benefits of a can-do attitude, including health benefits.

At one point in her essay she defines optimism like this:

She wrote, “People can learn to be more optimistic by acting as if they were more optimistic,” which means “being more engaged with and persistent in the pursuit of goals.”

If you behave more optimistically, you will be likely to keep trying instead of giving up after an initial failure. “You might succeed more than you expected,” she wrote. Even if the additional effort is not successful, it can serve as a positive learning experience, suggesting a different way to approach a similar problem the next time.

But in another part of her essay it has been transformed:

Avoid negative self-talk. Instead of focusing on prospects of failure, dwell on the positive aspects of a situation.

In college, I would approach every exam, even those I had barely studied for, with the thought that I was going to do well. Time after time, this turned out to be a self-fulfilling prophecy.

So which is it? Does being optimistic mean I’ll be more engaged with and persistent in the pursuit of goals, or does it mean I’ll barely study for an exam and then talk myself into thinking I’ll do well? Because those two ways sound pretty different, if not downright opposite. And who wants to be around a lazy optimist?

I don’t want to quibble, but I think there’s a common and important conflation of the two ideas of engagement and naivety, and I’d like to separate them.

It’s possible, and very possibly more interesting, to have a can-do attitude but not be optimistic, or in other words to be an engaged skeptic.

Just because I work hard and devote myself to something doesn’t mean I’ve fooled myself into thinking it will be a piece of cake. But it does mean I don’t think it’s impossible, and it will only work if I try to make it work. It’s not likely to work but it’s worth trying. Many very hard and very worthwhile things are like that.

Finally, when Brody says “Focus on situations that you can control, and forget those you can’t”, I’d argue that’s often code for letting yourself off too easy.

I claim that, as an engaged skeptic, you shouldn’t really forget anything, because you should figure out how you can maybe affect it after all, or in some small way, or the system it lives in, even if it’s in the future, and even if the chances are it won’t work.

Categories: rant

An open source credit rating agency now exists!

May 22, 2012 Cathy O'Neil, mathbabe 5 comments

I was very excited that Marc Joffe joined the Alternative Banking meeting on Sunday to discuss his new open source credit rating model for municipal and governmental defaults, called Public Sector Credit Framework, or PSCF. He’s gotten some great press, including this article entitled, “Are We Witnessing the Start of a Ratings Revolution?”.

Specifically, he has a model which, if you add the relevant data, can give ratings to city, state, or government bonds. I’ve been interested in this idea for a while now, although more at the level of publicly traded companies to start; see this post or this post for example.

His webpage is here, and you will note that his code is available on github, which is very cool, because it means it’s truly open source. From the webpage:

The framework allows an analyst to set up and run a budget simulation model in an Excel workbook. The analyst also specifies a default point in terms of a fiscal ratio. The framework calculates annual default probabilities as the proportion of simulation trials that surpass the default point in a given year.

On May 2, we released the initial version of the software and two sample models – one for the US and one for the State of California – which are available on this page. For the PSCF project to have an impact, we need developers to improve the software and analysts to build models. If you care about the implications of growing public debt or you believe that transparent, open source technology can improve the standard of rating agency practice, please join us.

If you are a developer interested in helping him out, definitely reach out to him; his email is also available on the website.
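To make the quoted description concrete, here is a minimal sketch of the idea of counting the share of simulation trials that cross a default point in each year. It is not Marc’s code and not the PSCF budget model: the function name, the random-walk dynamics for the fiscal ratio, and every parameter value below are assumptions made up purely for illustration.

```python
import random


def simulate_default_probabilities(n_trials=10_000, n_years=10,
                                   start_ratio=0.10, default_point=0.30,
                                   drift=0.01, volatility=0.03, seed=0):
    """Return, for each year, the share of trials whose fiscal ratio
    is above default_point in that year.

    The fiscal ratio here is a toy random walk with normal yearly shocks
    (an assumption for illustration, not the PSCF budget simulation).
    """
    rng = random.Random(seed)
    breaches = [0] * n_years
    for _ in range(n_trials):
        ratio = start_ratio
        for year in range(n_years):
            # Hypothetical dynamics: the ratio moves by a random yearly shock.
            ratio += rng.gauss(drift, volatility)
            if ratio > default_point:
                breaches[year] += 1
    # Proportion of trials past the default point, year by year.
    return [count / n_trials for count in breaches]


if __name__ == "__main__":
    for year, p in enumerate(simulate_default_probabilities(), start=1):
        print(f"year {year}: {p:.4f}")
```

A real model would replace the random walk with an actual budget simulation (revenues, expenditures, interest costs), but the last step, dividing the number of breaching trials by the total number of trials for each year, is the part the quote describes.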

He explained a few things on Sunday I want to share with you. They are all based on the kind of conflict of interest ratings agencies now have because they are paid by the people who they rate. I’ve discussed this conflict of interest many times, most recently in this post.

First, a story about California and state bonds. In the 2000’s, California was rated A, which is much lower than AAA, which is where lots of people want their bond ratings to be. So in order to achieve “AAA status,” California paid a bond insurer which was itself rated AAA. That is, through buying the insurance, the ratings status is transferred. In all, California paid $102 million for this benefit, which is a huge amount of money. What did this really buy though?

At some point their insurer, which was 139 times leveraged, was downgraded to below A level, and that meant that the California bonds were now essentially unbacked, so down to A level, and California had to pay higher interest payments because of this lower rating.

Considering the fact that no state has actually defaulted on their bonds in decades, but insurers have, Marc makes the following points. First, states are consistently under-rated and are paying too much for debt, either through these insurance schemes, where they pay questionable rates for questionable backing, or directly to the investors when their ratings are too low. Second, there is actually an incentive for ratings agencies to under-rate states, namely it gives them more business in rating the insurers etc. In other words they have an ecosystem of ratings rather than a state-by-state set of jobs.

How are taxpayers in California not aware of and incensed by the waste of $102 million? I would put this in the category of “too difficult to understand” for the average taxpayer, but that just makes me more annoyed. That money could have gone towards all sorts of public resources but instead went to insurance company executives.

Marc then went on to discuss his new model, which avoids this revenue model, and therefore conflict of interest, and takes advantage of the new format, XBRL, that is making it possible to automate ratings. It’s my personal belief that it will ultimately be the standardization of financial statements in XBRL format that will cause the revolution, more than anything we can do or say about something like the Volcker rule. Mostly this is because politicians and lobbyists don’t understand what data and models can do with raw standardized data. They aren’t nerdy enough to see it for what it is.

What about a revenue model for PSCF? Right now Marc is hoping for volunteer coders and advertising, but he did mention that there are two German initiatives that are trying to start non-profit, transparent ratings agencies essentially with large endowments. One of them is called INCRA, and you can get info here. The trick is to get $400 million and then be independent of the donors. They have a complicated governance structure in mind to insulate the ratings from the donors. But let’s face it, $400 million is a lot of money, and I don’t see Goldman Sachs in line to donate money. Indeed, they have a vested interest in having all good information kept internal anyway.

We also talked about the idea of having a government agency be in charge of ratings. But I don’t trust that model any more than a for-profit version, because we’ve seen how happy governments are at being downgraded, even when they totally deserve it. Any governmental ratings agencies couldn’t be trusted to impartially rate themselves, or systemically important companies for that matter.

I’m really excited about Marc’s model and I hope it really does start a revolution. I’ll be keeping an eye on things and writing more about it as events unfold.

Categories: data science, finance, open source tools

Buying organic doesn’t make you better than me

May 21, 2012 Cathy O'Neil, mathbabe 38 comments

There was a recent study published here which described how people who viewed organic foods with annoyingly self-righteous names actually behave more selfishly than people who viewed “comfort food” or other, bland categories of food. The abstract:

Recent research has revealed that specific tastes can influence moral processing, with sweet tastes inducing prosocial behavior and disgusting tastes harshening moral judgments. Do similar effects apply to different food types (comfort foods, organic foods, etc.)? Although organic foods are often marketed with moral terms (e.g., Honest Tea, Purity Life, and Smart Balance), no research to date has investigated the extent to which exposure to organic foods influences moral judgments or behavior. After viewing a few organic foods, comfort foods, or control foods, participants who were exposed to organic foods volunteered significantly less time to help a needy stranger, and they judged moral transgressions significantly harsher than those who viewed nonorganic foods. These results suggest that exposure to organic foods may lead people to affirm their moral identities, which attenuates their desire to be altruistic.

I read the original study (and also a hilarious post riffing on it from jezebel.com), and found it interesting that the experimenters at least claimed to be unsure of the outcome of the study in advance (although they did cite another study in which people were more likely to cheat and steal after purchasing “green” products).

Specifically, they thought one of two things could happen: that the sense of elevation caused by staring at the organic labels could make them feel like part of a larger community and therefore more willing to volunteer, or else the “moral piggybacking” on a perceived good deed (i.e. organic food is good for the environment) would make them feel like they’d already done enough, and be less likely to be nice. It turned out to be the latter.

[As an aside, another study cited was one in which people assumed there were fewer calories in chocolate which was described as “fair trade”, which explains something to me about why those kinds of labels are so popular and also so ripe for fraud.]

The results of this study resonate with me: ever since Whole Foods opened I’ve had the impression that the people shopping there thought they’d done enough for the world simply by paying too much for produce and not being able to buy Cheerios (a pet peeve of mine). Haven’t you noticed how rude Whole Foods shoppers are? I’d rather be in a Stop and Shop check-out line any day.

In other words, I’m going through a major case of confirmation bias here. I’ve been a huge skeptic about the organic food movement since it began when I was in college at Berkeley. I’ve challenged a whole bunch of my friends on this (yes I’m an asshole) and I’ve noticed there are essentially two camps. One camp defends organic as good for the environment; the other camp defends organic as more nutritious.

For the environmentalists, my argument is that local produce is better than California organic produce, given that the latter has been shipped across the country. It seems silly to me to be able to purchase organic blueberries imported from somewhere instead of locally grown blueberries. In fact I’m not sure there’s good evidence that organic, locally grown produce is better for the environment than just locally grown produce.

The other camp defends organic as more nutritious, but that really drives me completely nuts, because if you flip that around the message is that we can let the poor people eat the toxic vegetables while we rich people eat the healthy stuff. It’s crazy! If there really is toxicity in our standard produce, then this is a huge problem for the country and we need to address it directly, rather than making a certain class of very expensive food.

Categories: rant, statistics

Stop, Question, and Frisk policy getting stopped, questioned, and frisked

May 20, 2012 Cathy O'Neil, mathbabe Comments off

I’m happy to see that Federal District Court Judge Shira A. Scheindlin has granted class-action status to a lawsuit filed in January 2008 by the Center for Constitutional Rights which challenged the New York Police Department’s stop-and-frisk tactics.

The practice has been growing considerably in the last few years by way of a quota system for officers: an estimated 300,000 people have been stopped and frisked in New York City so far this year.

From the New York Times article on the class-action lawsuit:

In granting class-action status to the case, which was filed in January 2008 by the Center for Constitutional Rights on behalf of four plaintiffs, the judge wrote that she was giving voice to the voiceless.

“The vast majority of New Yorkers who are unlawfully stopped will never bring suit to vindicate their rights,” Judge Scheindlin wrote.

The judge said the evidence presented in the case showed that the department had a “policy of establishing performance standards and demanding increased levels of stops and frisks” that has led to an exponential growth in the number of stops.

But the judge used her strongest language in condemning the city’s position that a court-ordered injunction banning the stop-and-frisk practice would represent “judicial intrusion” and could not “guarantee that suspicionless stops would never occur or would only occur in a certain percentage of encounters.”

Judge Scheindlin said the city’s attitude was “cavalier,” and added that “suspicionless stops should never occur.”

I feel pretty awesome about this progress, since I was the data wrangler on the Data Without Borders datadive weekend and worked with the NYCLU to examine Stop, Question, and Frisk data. Some of that analysis, I’m guessing, has helped give ammunition to people trying to stop the policy – here is the wiki we made that weekend, and here’s another post I wrote a few weeks later.

For example, if you look at this editorial from the New York Times from a few days ago, you see a similar kind of analysis:

Over time, the program has grown to alarming proportions. There were fewer than 100,000 stops in 2002, but the police department carried out nearly 700,000 in 2011 and appears to be on track to exceed that number this year. About 85 percent of those stops involved blacks and Hispanics, who make up only about half the city’s population. Judge Scheindlin said the evidence showed that the unlawful stops resulted from “the department’s policy of establishing performance standards and demanding increased levels of stops and frisks.”

She noted that police officers had conducted tens of thousands of clearly unlawful stops in every precinct of the city, and that in nearly 36 percent of stops in 2009, officers had failed to list an acceptable “suspected crime.” The police are required to have a reasonable suspicion to make a stop. Only 5.37 percent of all stops between 2004 and 2009, the period of data considered by the court, resulted in arrests, an indication that a vast majority of people stopped did nothing wrong. Judge Scheindlin rebuked the city for a “deeply troubling apathy toward New Yorkers’ most fundamental constitutional rights.” The message of this devastating ruling is clear: The city must reform its abusive stop-and-frisk policy.

Woohoo! This is a great example of data analysis where it’s actually used to protect people instead of exploit them, which is pretty rare. It’s also a cool example of how open source data has been used to probe shady practices- but note that there was a separate lawsuit to force the NYPD to open source this Stop, Question, and Frisk data. They did not do it willingly, and they still don’t have the first few years of it publicly available.
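To make the editorial's numbers concrete, here is a back-of-the-envelope sketch of the disparity they imply. It uses only the two quoted figures (about 85 percent of stops, about half the population), reads "about half" as exactly 0.5, and treats stops as if they were spread one per person, so take it as a rough illustration rather than a proper analysis of the data set.

```python
# Figures quoted in the editorial above.
# "About half the city's population" is taken as exactly 0.5 (an assumption).
share_of_stops = 0.85
share_of_population = 0.50

def representation_ratio(stops_share, population_share):
    """How many times more often a group is stopped than its population share predicts."""
    return stops_share / population_share

# Black and Hispanic New Yorkers versus everyone else.
group = representation_ratio(share_of_stops, share_of_population)
everyone_else = representation_ratio(1 - share_of_stops, 1 - share_of_population)

# Ratio of the two: how much more often one group is stopped than the other, per capita.
relative_rate = group / everyone_else

print(round(group, 2))          # 1.7
print(round(everyone_else, 2))  # 0.3
print(round(relative_rate, 2))  # 5.67
```

Under those assumptions, a black or Hispanic New Yorker is stopped at something like five to six times the per-capita rate of everyone else.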

Here’s another thing we could do with such data. My friend Catalina and I were talking yesterday about one of the consequences of the Stop, Question, and Frisk data as follows. From a Time Magazine article on Trayvon Martin:

in the U.S., African Americans and whites take drugs at about the same rate, but black youth are twice as likely to be arrested for it and more than five times more likely to be prosecuted as an adult for drug crimes. In New York City, 87% of residents arrested under the police department’s “stop and frisk” policy are black or Hispanic.

I’d love to see a study that breaks this down in a kind of dual way. If you’re a NYC teenager walking down the street in your own neighborhood with a joint in your pocket, what are your chances of getting put in jail a) if you’re white, b) if you’re black, c) if you’re Hispanic, or d) if you’re Asian?

I think those numbers would really bring home the kind of policy that we’re dealing with here. Let’s see some grad student theses coming out of this data set.

Categories: data science, news, statistics

WTF with girdles?!?

May 19, 2012 Cathy O'Neil, mathbabe 9 comments

The post today has absolutely nothing to do with math, finance, data science, or Occupy Wall Street. I’ll get back to that stuff after venting.

Can I just say, as a bounteous 3-time mother, that I absolutely positively don’t understand the new-found popularity of girdles?

I was going to not mention it because it seems like the girdle-pushing crowd may get more attention than they deserve simply by being thought about, but it seems like it’s hit a certain crest of popularity that forces my hand.

So here’s what happened. For whatever reason I received a SPANX catalog in the mail, and just out of sheer disbelief that there could be a whole catalog of such nonsense, I took a look inside.

And do you know what I found out? I found out that many of the things in the catalog don’t even come in my size! That’s because they go down to like size 4. No, I’m not kidding. Plus, they also have girdles for men, no shit.

Then I came across this NYT article about corsets. From the article:

At Aishti, his store in Jackson Heights, Queens, Moussa Balaghi has begun carrying girdles in size “extra small,” because, to his shock, so many teenagers and even younger girls were coming in to request them. “Only chubby fat girls used to use this; now, everybody is,” he said, shaking his head. “If she has the smallest little thing at her waist, she wants to use this.”

WTF?!

May I ask, what is a young skinny woman doing thinking about crap like this? What is the point of them? I am honestly confused. Is the point to have something strapped around you, keeping you from breathing correctly, keeping you from biking around town or bending over, and generally confirming that you’re imperfect?

I actually object to all girdles, because I like to see people love and accept their bodies, which seems kind of hard when you’re wearing an ace bandage all over your body.

Something is going on here and it smells bad.

Categories: rant

Google’s promotion policy sucks for women

May 18, 2012 Cathy O'Neil, mathbabe 14 comments

I’m going to start this post with an excerpt from a comment of reader JoanDelilah from a couple of weeks ago, commenting on my post The meritocracy myth:

And at the end of the day, this also assumes that it is right and proper for a structure to be in place which requires you to *grab* tough/interesting work to prove yourself, as opposed to it being given to you. There is competition inherent in the foundational world-view behind that statement. Why so much competition? We are supposed to be on the same team and competing with other businesses, right? What about the woman who is happy to crush any assignment she is given but simply doesn’t want to have to compete for the assignments that will “prove” her abilities? Why must she step so far out of her comfort zone just in order for the company that pays her to make use of the talents they are paying her to use?

This really nails down what I see all the time with respect to women getting promoted or even just getting recognized for their achievements.

To paraphrase it, women tend not to compete for recognition as much as men, for whatever reason. Maybe they’ve been socialized not to, maybe it is a simple question of testosterone. I will go into why I think this happens below. But for now let me just say I get super pissed when a system has been set up to diminish the success of people simply because of this personality issue.

Google is one such system. At Google, one must self-promote. I believe the rule is that, after two quarters or so of getting good reviews, you are eligible to self-promote, but you don’t have to.

And guess what? That policy sucks for women. Women don’t do it as often. I’ll bet this is statistically significant, even though I don’t have the numbers. Hey Google, do the math on this policy! And then change it!

Here’s the first part of my theory of why this happens. Women are not as secure in their accomplishments. By the way, note I am not saying women are insecure and men are secure. I think it’s more like men are over-secure and women are realistic, kind of like those studies that show that depressed people are realists and non-depressed people are optimists. I definitely have seen men who actually think they (individually) accomplished something which clearly took a team effort. Women are less likely to “forget” the help they received in making something happen. See this amazing blog rant on the subject from a professor at NYU.

Here’s the second part. Women tend to choose mentors (i.e. bosses or advisors) that are brilliant, thoughtful, and approachable. Typically this also means that those mentors are not the kind of bullying personalities that are best suited to promote their team. Even when you don’t have a choice in who your boss is, I claim this approach to pairing still happens in a business when that business decides who should be the boss of a woman.

Example in pure math: Yau at Harvard is famously dynasty-building with his students, but he’s probably not someone who has a tissue box in his office (to be fair I haven’t checked). I didn’t even consider taking Yau as my advisor, in part because he was super intimidating and seemed to challenge grad students with a ring of fire.

The reward for being brave in a situation like that is that he is fiercely loyal to his students once he accepts them, and helps them get great jobs. My point is that fewer women choose Yau-like personalities as their advisor (although it has to be said that Yau has had women students, including Columbia’s Melissa Liu). And thus fewer women end up with advisors that will land them jobs and give them good advice on how to get ahead. I just don’t think women are thinking about that aspect of a mentor the way men do (it’s also possible that men don’t think about it either but are less likely to shy away from rings of fire in general due to their “optimistic” egos).

I am not saying this is an easy problem to fix, because it’s not, and the best self-promoters will always do well no matter where they work. But I do think Google can do better than this; maybe they could think of something a bit more double-blind like the orchestra auditions.

Categories: math education, rant, women in math

Recovery begins when addiction ends: an open letter to Jamie Dimon (#OWS)

May 17, 2012 Cathy O'Neil, mathbabe 2 comments

Posted here on Naked Capitalism, written by the Alternative Banking group of Occupy Wall Street.

Please spread widely!

Categories: #OWS, finance

Stop with the man-diets already, coffee is good for you.

May 17, 2012 Cathy O'Neil, mathbabe 6 comments

Every now and then my husband goes on a “man-diet”, which is a term I’ve coined meaning a restriction that has absolutely no reasonable goal except the very last event, namely breaking the diet and thus having an awesome moment of relapse.

His favorite man-diet is the coffee man-diet. He’ll suddenly decide that his two espressos in the morning and two more during the day are too much and he “needs to cut down”. He’ll make it one coffee in the morning and one in the afternoon, and he’ll stare wistfully at my second (or third) morning espresso, complain vaguely of headaches, and generally be a space cadet (by which I mean more than usual).

This will go on for about 5 days or so, until one morning when he’ll wake up in an addict’s rage and rebelliously suck down two coffees whilst raving about the magical properties of caffeine.

This happens about once a year. Every time it happens I remind him that coffee is actually good for you and there’s no reason to try to break an addiction that’s doing you good- we don’t try to go off oxygen, do we?

Well here’s more evidence of that. And in case you’re wondering, yes I do ignore all negative evidence of any one of my theories, but in this case I’m pretty sure the evidence I collect on my side is quite a bit stronger than the stuff I ignore. From the Bloomberg article:

The study found that men who drank 2 to 3 cups a day had a 14 percent lower risk of dying from heart disease, 17 percent lower risk of dying from respiratory disease, 16 percent decreased chance of dying from stroke and a 25 percent lower risk of dying from diabetes than those who drank no coffee.

…

Women who consumed 2 to 3 cups of coffee a day had a 15 percent lower chance of dying from heart disease, 21 percent lower risk of dying from respiratory disease, 7 percent decreased chance of dying from stroke and a 23 percent lower risk of dying from diabetes.

Categories: news, rant

The modeling death spiral for public schools

May 16, 2012 Cathy O'Neil, mathbabe 17 comments

There was recently a New York Times article about how the public schools have become super segregated by race.

I’m wondering how much of this can be explained by income rather than by race in combination with the obsession we all have with test scores. Let me explain.

If I’m living in a neighborhood with a neighborhood school and the school seems pretty good, then depending on how picky I am I might just stay living there and let my kids go there.

Now assume that suddenly there are test scores available for all the schools in the area, and it turns out my neighborhood school doesn’t do as well as the school in a surrounding neighborhood. Then, depending on how much I think those test scores matter to my children’s futures, and how many resources I have, I will be tempted to move to that neighborhood for the “better schools” (read: better test scores).

Over time, people with good resources will move to the new neighborhood, which will become more expensive because there’s competition to get in, which in turn will make it easier for that town to raise local taxes to improve the school, and will also attract parents who really care about the quality of the schools, which will improve the school and presumably the test scores of that school, exacerbating the original difference in test scores.

And of course that’s just what’s happened in this country. My parents moved to Lexington Massachusetts for the schools, and they paid a premium for their house for the location and the school system. So I went to a public school but one that increasingly was attended by richer and richer kids.

Income segregated public schools are the new private schools.

In New York City, where there is more to consider than just your neighborhood, because you can get your kids into schools in other neighborhoods, and there’s a whole network of gifted and talented schools as well, it’s a much more complicated dynamic, but the underlying reasons are the same, and they again have to do with segmentation modeling: we know which schools do well on tests and we avoid poorly testing schools if we can.

The availability of the test scores is huge- if I’m thinking of moving to a new city I can just look up the SAT scores of the high schools in the area and try to find a place to live which is in one of the highest-scoring towns.

This is what I call a death spiral of modeling, and it’s the same idea I described here when insurance companies have too much information about you and deny you coverage because you need insurance so bad. And it’s very difficult to get out of a death spiral, because to do so you need to reset the whole system and re-pool resources but in this case people have already moved out of town.
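The feedback loop described above can be sketched as a toy simulation. Everything in it is a made-up simplification for illustration: each number stands for one family's resources, a town's "test score" is assumed to simply equal the average resources of its families, and any family in the lower-scoring town that can afford a hypothetical `move_cost` moves to the higher-scoring one.

```python
def mean(values):
    return sum(values) / len(values)

def simulate(town_a, town_b, move_cost, rounds=3):
    """Return the score gap between the two towns at the start of each round.

    Assumption: a town's 'test score' is the mean resources of its families.
    """
    town_a, town_b = list(town_a), list(town_b)
    gaps = []
    for _ in range(rounds):
        gaps.append(abs(mean(town_a) - mean(town_b)))
        # Whichever town currently scores higher is the one people move to.
        if mean(town_a) >= mean(town_b):
            better, worse = town_a, town_b
        else:
            better, worse = town_b, town_a
        # Families in the lower-scoring town who can afford it will move.
        movers = [r for r in worse if r >= move_cost]
        if len(movers) == len(worse):
            break  # stop before a town empties out, so the mean stays defined
        for r in movers:
            worse.remove(r)
            better.append(r)
    return gaps

# Hypothetical starting point: two towns that are not all that different.
gaps = simulate([0.2, 0.5, 0.9, 1.0], [0.1, 0.4, 0.7, 0.8], move_cost=0.6)
print([round(g, 2) for g in gaps])  # [0.15, 0.43, 0.43]
```

Even in this crude version, a modest initial gap nearly triples after one round of moves and then stays there, because the families who could have re-pooled resources have already left.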

Questions I am thinking about:

Is it dumb to care so much about test scores? On the one hand I don’t want to take chances on my kids, so I will opt for the conservative route, which is to think they should be surrounded by kids who test well, because certainly in extreme cases that kind of thing is likely to be contagious behavior. But maybe we have exaggerated ideas about how contagious these things are or how important test scores really are to our kids’ futures. How would we test that and how would we disseminate the results? And what if we found out that everybody has been acting totally rationally?
Which begs the other question, namely how can we get this system to work better overall for the average student that would be realistic?
Note that in the above discussion I haven’t talked about the teachers at all, which is strange. But from my perspective, our system is all about concentrating kids who test well together, and it’s not all that clear that the teachers matter, although I’m sure they do actually. What am I missing? Is there a way of solving this death spiral problem through awesome teachers?

Categories: musing

Tech firm mindset to avoid like the plague

May 15, 2012 Cathy O'Neil, mathbabe 10 comments

There was recently an article entitled “Silicon Valley Avoids ‘B Players’ Like the Plague” which got my attention. Go ahead and read it, it’s pretty short. Here’s the heart of the story:

And not only are companies able to achieve more with less people, they’re also wary of hiring anyone but the best engineers. This is sometimes called the “bozo factor.” The late Steve Jobs often talked about the importance of hiring nothing but “A players.”

The former Apple chief executive said to an interviewer in 1998: “You’re well advised to go after the cream of the cream. That’s what we’ve done. You can then build a team that pursues the A+ players. A small team of A+ players can run circles around a giant team of B and C players.”

To avoid hiring less than A players, companies can go to extremes. At Violin Memory, managers can spend up to half of their time on screening and interviewing candidates. Reference checking alone can eat up large portions of the day. Candidates typically provide three references, but hiring managers will then tap their own networks to make contact with up to five people who have worked with the person. “Your reputation follows you,” said Vice President of Marketing Matt Barletta.

First, you know it’s going to be dripping with compassion and thoughtfulness if it’s a Steve Jobs quote. Second, I’m sure those managers who spend half their day stalking candidates think they’re super productive, when all it says to me is that the more time you spend spinning your creative genius in an environment like this the better – not a good incentive in an industry that probably needs less spin and more skepticism.

Okay, so they have some nutty ideas about hiring people. You might want to consider how they go about firing people as well. From the article:

Jay Fulcher, chief executive of online video technology start-up Ooyala, says he’s “never fired someone fast enough. By the time you know that it’s time for them to go, it’s already too late.”

Ummm… okay, but maybe the work is amazing? From the article:

“It sucks people in, and it takes away from your family life,” says Vice President of Engineering Kevin Rowett. “We have to figure out, can people tolerate that level of intensity?”

Ummm… sure, okay. But maybe this is some kind of super creative environment where you’re expected to be quirky and spontaneous, and you’re not expected to follow rules? From the article (emphasis mine):

Landing a position at Kaggle, a San Francisco-based start-up that crowdsources data analysis problems, is considered such a score that the company is able to have potential candidates move to San Francisco for one to two weeks and audition for a job.

People, people. What the fuck? Are we still wondering why there aren’t enough women engineers in Silicon Valley?

Caveats: 1) the comments on that article are scathing and worth reading, and 2) this toxic mindset is also apparent in New York.

Categories: rant

Who wants Jamie Dimon’s job?

May 14, 2012 Cathy O'Neil, mathbabe 12 comments

It’s Jamie Dimon day for me today: I’ve offered to write a first draft of an Alternative Banking piece on the JP Morgan $2,000,000,000 “hedging loss” that he announced last week and which resulted in a 12% stock price loss in the past 5 trading days. There are many sordid details to wade through to prepare, but here’s the question I’d like you to think about for a minute.

Who would want to take that job once he resigns or gets booted?

I’m thinking the world is divided into people who are realistic about how ridiculously large and unmanageable any too-big-to-fail bank actually is, and the psychopaths that think they are the guy who can tame the beast. Jamie Dimon was definitely one of the most psychopathic of the original crowd of CEO’s left over from before the credit crisis, and honestly he played his part so well it was amazing. It didn’t hurt that JP Morgan was never the worst example of any kind of underhanded and outrageous risk-taking, until now.

For example: Dimon consistently and vehemently complained about any regulation as “strangulation” for his industry, and as anti-American. He was so adamant that people (read: regulators, politicians, and the Fed) took him very very seriously and he was on the verge of fatally weakening the Volcker Rule. We’ll see how that pans out now, but I’m hopeful for something a bit less pathetic.

Is there anyone left who can take that over? Who has the required psychopathic balls?

Categories: #OWS, finance

Ideas for two thesis problems in data science

May 13, 2012 Cathy O'Neil, mathbabe 11 comments

Natural Language Processing on math overflow

You know about math overflow? It’s a site where grad students in math (or anyone) go and pose questions, and other people can answer them. There are lots of uninteresting, unanswered questions (like questions that are too easy and the person should be able to look up) and there are some really popular ones and some really dumb ones. Sometimes there are interesting ones.

Here’s a thesis idea: come up with a metric for “interestingness” and try to forecast the interestingness of a question from its language. Might as well also try to forecast its popularity while you’re at it. That way, if you make a good model, some of the more interesting questions will get higher in the queue and people will have a better time at the site.
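To give a flavor of what the crudest possible baseline might look like, here is a minimal sketch. It assumes you already have some numeric "interestingness" score per question (choosing that metric is the actual thesis problem; the scores and question texts below are invented), and it just averages, for each word, the scores of the training questions that contain it. A real attempt would want far better features and a proper model.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(questions):
    """questions: list of (text, score) pairs.

    Returns (word_means, global_mean), where word_means maps each word to the
    average score of the questions containing it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in questions:
        for word in set(tokenize(text)):
            totals[word] += score
            counts[word] += 1
    word_means = {w: totals[w] / counts[w] for w in totals}
    global_mean = sum(score for _, score in questions) / len(questions)
    return word_means, global_mean

def predict(text, word_means, global_mean):
    """Average the per-word means of the known words; fall back to the global mean."""
    known = [word_means[w] for w in set(tokenize(text)) if w in word_means]
    return sum(known) / len(known) if known else global_mean

# Invented toy data: (question text, hypothetical interestingness score).
toy = [
    ("Is there a conceptual reason for this duality", 9.0),
    ("Why does this duality fail in characteristic p", 7.0),
    ("How do I compute this integral", 1.0),
    ("Please compute this limit for my homework", 0.0),
]
word_means, global_mean = train(toy)

high = predict("A conceptual duality question", word_means, global_mean)
low = predict("compute my homework integral", word_means, global_mean)
print(high > low)  # True
```

The interesting part of the thesis would be everything this sketch skips: defining the score, handling mathematical notation in the text, and checking the forecasts on questions the model hasn't seen.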

Genealogy graphs in different fields

You know about the mathematics genealogy project? It shows everyone with a Ph.D. in math and considers them to be “descended” from their advisor in a family-tree-like structure. For example, I’m here, and if I go up through my ancestors in 7 steps I get to Jacobi. Actually there are lots of ways to go up since a bunch of people have more than one advisor – I’m also 7 steps away from Poisson, 8 from Lagrange and Laplace, and 9 from Euler. This is probably not because I’m so cool but because there just weren’t many mathematicians back then- probably most people descended from Euler. And because we have this cool data set we can see if that’s true!
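Here's a small sketch of how one might check that kind of claim, assuming the genealogy has been loaded as a mapping from each person to their list of advisors. The data below is an invented toy graph with placeholder names, not the real data set; the idea is just a breadth-first search upward through advisors, which gives the fewest steps to each ancestor even when people have more than one advisor.

```python
from collections import deque

# Hypothetical toy data: person -> list of advisors (more than one is allowed).
advisors = {
    "E": ["C", "D"],
    "D": ["B"],
    "C": ["A"],
    "B": ["A"],
    "F": ["G"],
    "A": [],
    "G": [],
}

def steps_to_ancestors(person, advisors):
    """Breadth-first search upward; returns {ancestor: fewest advisor steps}."""
    dist = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for advisor in advisors.get(current, []):
            if advisor not in dist:
                dist[advisor] = dist[current] + 1
                queue.append(advisor)
    del dist[person]  # a person is not their own ancestor
    return dist

def fraction_descended_from(ancestor, advisors):
    """Fraction of everyone else in the data who has `ancestor` somewhere above them."""
    others = [p for p in advisors if p != ancestor]
    hits = sum(1 for p in others if ancestor in steps_to_ancestors(p, advisors))
    return hits / len(others)

print(steps_to_ancestors("E", advisors)["A"])           # 2
print(round(fraction_descended_from("A", advisors), 2))  # 0.67
```

On the real data you would call `fraction_descended_from` with Euler's entry to see whether "most people" holds up.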

Here’s what I think someone should do, besides visualizing this graph in an awesome way (which by itself would be really cool, has anyone done that?). They should draw the graph for other fields as well and try to see if there are graph properties that characterize mathematics as distinct from other disciplines like Physics or Law or History.

Categories: data science, math

Conspiracy theorists may be right but they can’t explain why

May 12, 2012 Cathy O'Neil, mathbabe 8 comments

I’m still reading Haidt’s “The Righteous Mind: Why Good People Are Divided by Politics and Religion,” very slowly, because I have approximately 15 spare minutes a week set aside for free reading.

The part I read last night had to do with how we use our brain as a press secretary for what we believe, arguing for that policy using all the persuasiveness we can muster, no matter how weak our evidence is.

Specifically, if we see evidence for our point of view, we jump on it. If we see evidence against our point of view (oh shit!), we wrinkle our foreheads and feel stressed out, causing us to search and search until we finally find evidence for our point of view again (whew!).

This all seems right to me, although I may just be excited about it because it was already my point of view.

Haidt then goes on to explain that our pleasure centers are directly stimulated when we go through this process of confirming our view, especially when it was somewhat challenged by contrary evidence, and especially if our view is pretty hardline. Finally, he explains that conspiracy theorists are addicted to that pleasure center stimulation moment like a heroin addict.

Assuming this is correct, it explains something I’ve been super puzzled by with conspiracy theorists. And I should say that, being part of OWS, you get to interact with more than your fair share of such people.

Namely, they can never explain their position. In fact I’d say that this is their characterizing feature: one is dubbed a conspiracy theorist not by the unreasonableness of one’s position but by the way one tries to communicate it to other people. If you just have a strong opinion but can explain it well and persuasively, then you’ll never be considered a conspiracy theorist, although of course you could be considered an asshole (depending on what hardline opinion you harbor).

Example: when I try to engage my conspiracy theorist friends (because I do think they are for the most part dear people), they very often get into the tangential loops where they concentrate on one of the following:

They don’t explain this to anyone/ it’s a secret
It’s too hard to understand
There’s a small group of people who have all the power

In spite of trying to convince them that you are listening, you are smart, and you understand that we don’t live in a perfectly fair system, it’s really hard (but not necessarily impossible) to get them to settle down and tell you why they believe these things. I think their pleasure centers must also get stimulated when they go over these three points, because as I said they get totally distracted and it’s difficult to interrupt them.

And by the way, they are willing to try to explain their theory to you. I think one thing Haidt forgot to mention is that it must be a huge thrill to convert other people to your point of view when you are a conspiracy theorist, because often that seems to be a very serious goal.

When it comes to our current financial system especially, I’m starting to believe that many of these points are overall valid, but it’s kind of tragic how poorly my friends explain them. Maybe part of my blog can be devoted to explaining the “why” of the conspiracy once I think I’ve got a good argument.

One last thing which Haidt mentioned and that I’ve noticed too (but which I’m wary of exclaiming as his key point since, again, I already believed it). Namely, scientists are trained to look at evidence and admit when they are wrong. In the realm of mathematics this is certainly so: if you see something disproved it’s a simple waste of time to keep thinking it’s true, even if you previously fervently believed it.

But of course this only holds in the context of theorems and proofs- I’m not sure mathematicians are any better than anyone else in admitting they’re wrong outside of the context of theorems and proofs. Haidt mentioned that there’s no evidence that moral philosophers are any more moral than other philosophers, for example.

Categories: musing
