rant | mathbabe

Duke deans drop the ball on scientific misconduct

November 10, 2015 Cathy O'Neil, mathbabe

Former Duke University cancer researcher Anil Potti was found guilty of research misconduct yesterday by the federal Office of Research Integrity (ORI), after a multi-year investigation. You can read the story in Science, for example. His punishment is that he won’t do research without government-sponsored supervision for the next five years. Not exactly stiff.

This article also covers the ORI decision, and describes some of the people who suffered from poor cancer treatment because of his lies. Here’s an excerpt:

Shoffner, who had Stage 3 breast cancer, said she still has side effects from the wrong chemotherapy given to her in the Duke trial. Her joints were damaged, she said, and she suffered blood clots that prevent her from having knee surgery now. Of the eight patients who sued, Shoffner said, she is one of two survivors.

What’s interesting to me this morning is that both articles above give the same reason for the initial investigation into his work. Namely, that he had padded his resume, pretending to be a Rhodes Scholar when he wasn’t. That fact was reported by a website called Cancer Letter in 2010.

But here’s the thing, back in 2008 a 3rd-year medical student named Bradford Perez sent the deans at Duke (according to Cancer Letter) a letter explaining that Potti’s lab was fabricating results. And for those of you who can read nerd, please go ahead and read his letter, it is extremely convincing. An excerpt:

Fifty-nine cell line samples with mRNA expression data from NCI-60 with associated radiation sensitivity were split in half to designate sensitive and resistant phenotypes. Then in developing the model, only those samples which fit the model best in cross validation were included. Over half of the original samples were removed. It is very possible that using these methods two samples with very little if any difference in radiation sensitivity could be in separate phenotypic categories. This was an incredibly biased approach which does little more than give the appearance of a successful cross validation.
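To make the letter’s objection concrete, here is a minimal sketch on purely synthetic data. It does not use the NCI-60 data or Potti’s actual model. It assumes 59 samples of random noise and a median split of a made-up sensitivity score, and it uses a simple nearest-centroid classifier as a stand-in for the real model. The "biased" step keeps only the samples that were classified correctly in leave-one-out cross validation. All function names are hypothetical.

```python
import random
import statistics


def make_noise_data(n_samples=59, n_features=50, seed=0):
    """Synthetic stand-in for the NCI-60 data: features and sensitivity are pure noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    sensitivity = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    cutoff = statistics.median(sensitivity)
    # Median split into "resistant" (0) and "sensitive" (1), as described in the letter.
    labels = [1 if s > cutoff else 0 for s in sensitivity]
    return X, sensitivity, labels


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_predictions(X, labels):
    """Leave-one-out predictions from a nearest-centroid classifier."""
    preds = []
    for i in range(len(X)):
        rest = [j for j in range(len(X)) if j != i]
        c0 = centroid([X[j] for j in rest if labels[j] == 0])
        c1 = centroid([X[j] for j in rest if labels[j] == 1])
        preds.append(0 if sq_dist(X[i], c0) <= sq_dist(X[i], c1) else 1)
    return preds


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def run_demo(seed=0):
    X, sensitivity, labels = make_noise_data(seed=seed)
    preds = loo_predictions(X, labels)
    honest = accuracy(preds, labels)
    # Biased step: keep only the samples the model already fit well, then score on them.
    kept = [i for i in range(len(X)) if preds[i] == labels[i]]
    reported = accuracy([preds[i] for i in kept], [labels[i] for i in kept])
    # The two samples closest to the median cutoff get opposite labels,
    # even though their sensitivities may differ only slightly.
    gap = (min(s for s, y in zip(sensitivity, labels) if y == 1)
           - max(s for s, y in zip(sensitivity, labels) if y == 0))
    return {"n_total": len(X), "n_kept": len(kept), "honest": honest,
            "reported": reported, "boundary_gap": gap}


if __name__ == "__main__":
    r = run_demo()
    print(f"honest LOO accuracy on {r['n_total']} noise samples: {r['honest']:.2f}")
    print(f"'accuracy' after keeping {r['n_kept']} well-fit samples: {r['reported']:.2f}")
    print(f"sensitivity gap across the median split: {r['boundary_gap']:.3f}")
```

The features carry no signal, so the honest leave-one-out accuracy should be around chance. The "accuracy" on the retained samples is 100% by construction, and typically about half the samples get thrown away along the way. That is the appearance of a successful cross validation that the letter describes.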

Instead of taking up the matter seriously, the deans pressured Perez to keep quiet. And nothing more happened for two more years.

The good news: Bradford Perez seems to have gotten a perfectly good job.

The bad news: the deans at Duke suck. Unfortunately I don’t know exactly which deans and what their job titles are, but still: why are they not under investigation? What would deans have to do – or not do – to get in trouble? Is there any kind of accountability here?

Categories: education, modeling, news, rant, statistics

Sharing insurance costs with the sharing economy

May 11, 2015 Cathy O'Neil, mathbabe

One consequence of the “sharing economy” that hasn’t been widely discussed, at least as far as I’ve seen, is how the externalities are being absorbed. Specifically, insurance costs.

Maybe because the situation is still evolving, both Uber and AirBnB tell individuals who drive that their primary car insurance should be in use, and tell individual home- or apartment-dwellers that their renters insurance should apply.

In other words, if something goes wrong, the wishful thinking goes, the private, individual insurance plans should kick in.

When people have tried to verify this, however, the responses have been mixed and mostly negative. The insurance companies obviously don’t want to cover a huge number of people for circumstances they didn’t expect when they offered the coverage.

So, if an Uber driver gets into an accident while ferrying a passenger, it’s not clear whether their primary insurance will cover it. It’s even less clear if the driver is using the Uber app and is on their way to get a passenger. Similarly, if an AirBnB guest falls because of a broken staircase, it’s not clear who is supposed to pay for the damages to the person or the staircase. What if the guest burns down the house?

So far I don’t think it’s been fully decided, but I think one of two things could happen.

In the first scenario, the insurance companies will really refuse to cover such things. To do this they will have to have a squad of investigators who somehow make sure the customer in question was or was not hosting a guest or driving a customer. That would involve suspicion and some amount of harassment, which customers don’t like.

In the second scenario, which I think is more likely given the above, the insurance companies will quietly pay for the damages accrued by Uber and AirBnB usage. They won’t advertise this, and if asked, they will discourage any customer from doing stuff like that, but they also won’t actually refuse to pay the costs, which they will simply transfer to the larger pool of customers. It doesn’t really matter to them at all, in fact, as long as they are not the only insurance company with this problem.

That will mean that the quants who figure out the costs of insurance will see their numbers change over time, depending on how much more the insurance is being called into action. I expect this to happen a lot more for Uber drivers, because if you are an Uber driver 40 hours a week, that means you’re always in your car. So our insurance costs will go up in proportion to how many people become Uber drivers. I expect this to happen somewhat more for AirBnB renters, because the house or apartment is in constant use; if it’s being rented by rowdy partiers, all the more. Our renters insurance will go up in proportion to how many people are AirBnB renters.
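Here is a minimal sketch of that cost-spreading argument. It assumes a single insurance pool and uses made-up numbers: a base annual claim cost of $500 per policyholder, and rideshare drivers generating three times the claims of other drivers. None of these numbers come from real insurance data. The point is only the shape of the relationship.

```python
def pooled_premium(base_cost: float, rideshare_share: float,
                   risk_multiplier: float) -> float:
    """Break-even premium when rideshare losses are spread over the whole pool.

    base_cost:        expected annual claims for an ordinary policyholder (hypothetical)
    rideshare_share:  fraction of policyholders who also drive for a rideshare app
    risk_multiplier:  claims of a rideshare driver relative to an ordinary one (hypothetical)
    """
    ordinary_claims = (1 - rideshare_share) * base_cost
    rideshare_claims = rideshare_share * base_cost * risk_multiplier
    # Everyone pays the same premium, so it equals the average expected claim.
    return ordinary_claims + rideshare_claims


if __name__ == "__main__":
    for share in (0.0, 0.02, 0.05, 0.10):
        premium = pooled_premium(base_cost=500.0, rideshare_share=share, risk_multiplier=3.0)
        print(f"{share:>4.0%} rideshare drivers -> premium ${premium:,.2f}")
```

Algebraically, the break-even premium is base_cost × (1 + share × (multiplier - 1)). Under these assumptions it rises linearly with the share of policyholders who drive for Uber.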

That reminds me of a story my dad used to like telling, whereby a friend of his rented out his Cambridge house to a Harvard professor, and when he came back it was totally trashed, including what looked like a bonfire pit in the living room. The professor in question was Timothy Leary.

Anyhoo, my overall conclusion is that the new “sharing economy” businesses really will end up sharing something with the rest of us soon, namely the cost of insurance. We will all be paying more for car insurance and home- or renters-insurance if my guess is accurate. Thanks, guys.

Categories: economics, rant, statistics

The Police State is already here.

April 27, 2015 Cathy O'Neil, mathbabe

The thing that people like Snowden are worried about with respect to mass surveillance has already happened. It’s being carried out by police departments, though, not the NSA, and its targets are black men, not the general population.

Take a look at this incredible Guardian article written by Rose Hackman. Its title asks, “Is the online surveillance of black teenagers the new stop-and-frisk?” Honestly, that’s a pretty tame comparison if you think about the kinds of permanent electronic information that the police are collecting about black boys in Harlem as young as 10 years old.

Some facts about the program:

28,000 residents are being surveilled
300 “crews” (a designation that rises to “gangs” when there are arrests)
Officers trawl Facebook, Instagram, Twitter, YouTube, and other social media for incriminating posts
They pose as young women to gain access to “private” accounts
Parents are not notified
People never get off these surveillance lists
In practice, half of court cases actually use social media data to put people away
NYPD cameras are located all over Harlem as well

We need to limit the kind of information police can collect, and put limits on how discriminatory their collection practices are. As the article points out, white fraternity brothers two blocks away at Columbia University are not on the lists, even though there was a big drug bust in 2010.

Anyone who wonders what a truly scary police surveillance state looks like need look no further than what’s already happening to certain Harlem residents.

Categories: #OWS, discrimination, modeling, rant, white privilege

Workplace Personality Tests: a Cynical View

April 16, 2015 Cathy O'Neil, mathbabe

There’s a frightening article in the Wall Street Journal by Lauren Weber about personality tests people are now forced to take to get shitty jobs in customer calling centers and the like. Some statistics from the article include: 8 out of 10 of the top private employers use such tests, and 57% of employers overall in 2013, a steep rise from previous years.

The questions are meant to be ambiguous so you can’t game them if you are an applicant. For example, yes or no: “I have never understood why some people find abstract art appealing.”

At the end of the test, you get a red light, a yellow light, or a green light. Red-lighted applicants never get an interview, and yellow-lighted ones may or may not. Companies cited in the article use the tests to disqualify more than half their applicants without ever talking to them in person.

The argument for these tests is that, after deploying them, turnover has gone down by 25% since 2000. The people who make and sell personality tests say this is because they’re controlling for personality type and “company fit.”

I have another theory about why people no longer leave shitty jobs, though. First of all, the recession has made people’s economic lives extremely precarious. Nobody wants to lose a job. Second of all, now that everyone is using arbitrary personality tests, the power of the worker to walk off the job and get another job the next week has gone down. By the way, the usage of personality tests seems to correlate with a longer waiting period between applying and starting work, so there’s that disincentive as well.

Workplace personality tests are nothing more than voodoo management tools that empower employers. In fact I’ve compared them in the past to modern day phrenology, and I haven’t seen any reason to change my mind since then. The real “metric of success” for these models is the fact that employers who use them can fire a good portion of their HR teams.

Categories: data science, modeling, rant

Fingers crossed – book coming out next May

April 15, 2015 Cathy O'Neil, mathbabe

As it turns out, it takes a while to write a book, and then another few months to publish it.

I’m very excited today to tentatively announce that my book, which is tentatively entitled Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, will be published in May 2016, in time to appear on summer reading lists and well before the election.

Fuck yeah! I’m so excited.

p.s. Fight for 15 is happening now.

Categories: arms race, credit scores, data journalism, data science, discrimination, economics, education, feedback loop, finance, journalism, law, math education, modeling, musing, open source tools, rant, statistics

Predatory credit score-based insurance fees

April 13, 2015 Cathy O'Neil, mathbabe

I’ve been looking into who uses credit scores – FICO scores or other alternative scores – and I’ve found that the insurance industry is a major user.

Homeowners insurance rates, for example, vary wildly by state depending on what kind of credit score you have, often more than doubling for people with poor credit versus people with excellent credit. This is in spite of the fact that homeowners insurance covers the home and its contents, not the mortgage payments.

Similarly, auto insurance rates vary by credit score, even though someone with a poor credit score isn’t obviously a bad driver. For example, in Maryland, people with bad credit scores can be charged 40% more just for having bad credit scores.

Statistics like this make me wonder, how much of this price discrimination comes from the insurance companies trying to understand and account for actual risk, and how much comes from their understanding that poorer people have fewer options and will simply pay predatory rates?

And just in case you’re a believer in free markets and fair competition, and think such predatory behavior would be whisked away in a competitive market, insurance companies actually target people who don’t shop around and charge them more. In other words, it’s not a free market if not everyone actually has good information.

Tell me if you have more examples like this, I’m a collector!

Categories: credit scores, discrimination, economics, rant

Everyone hates college administrators

April 7, 2015 Cathy O'Neil, mathbabe

If you were wondering why I didn’t blog yesterday, which you probably weren’t (confession: I don’t read other people’s blogs and I don’t listen to any podcasts. So I would never, ever ask anyone to read my blog or listen to my podcast), it was because I was completely confused and irritated by this NYTimes opinion piece on the rising cost of college, written by University of Colorado Law Professor Paul Campos.

I really think the Times needs to either have footnotes or hyperlinks in their opinion pieces, because this guy was playing so fast and loose with his numbers that I had really no idea what he was talking about most of the time. That’s saying something considering that this, the cost of college and its causes, is something I have spent many hours thinking about and researching.

So what happened was, I didn’t have time to fully articulate why his reasoning was muddled and confusing. I spent way too much time trying to figure out where he was getting his data. Waste of time.

Good news, though: my Slate Money co-host Jordan Weissmann has done all that work for us, in his piece aptly entitled The New York Times Offers One of the Worst Explanations You’ll Read of Why College Is So Expensive. Who says procrastination doesn’t work?

As usual, if you’ve ever listened to my podcast (and this isn’t a request for you to do so!), I don’t agree completely with Jordan. However, my delta of agreement with Jordan is very manageable compared to the delta of disagreement I had with Campos. Basically I would quibble with laying any of the blame at the feet of instructors, but since he barely does that, let’s just go with his awesome take-down.

Take-down of what? Well, Campos basically hates college administrators, and pretends there are no other problems in the world except them. It’s a mistake he doesn’t have to make.

I mean really, who doesn’t hate college administrators? As a former college administrator myself, I know it’s universal; I certainly hated myself the entire time.

But that doesn’t mean there are no other factors at all. Reduced public money for colleges is in fact a huge problem, especially when you pair it with the increased federal aid money going to students at corrupt for-profit colleges. Corinthian Colleges obtained $1.4 billion in federal grant and loan dollars in 2010 alone, more than the 10 University of California campuses combined for that same year. This system is in terrible need of repair.

Instead of simply hating on college admin, or rather, in addition to hating on admin, can we start thinking about an alternative no-frills state college system that is truly affordable and gives honest and basic instruction without trying to compete on the U.S. News & World Report stage?

Categories: education, rant

The arbitrary punishment of New York teacher evaluations

April 2, 2015 Cathy O'Neil, mathbabe

The Value-Added Model for teachers (VAM), currently in use all over the country, is a terrible scoring system, as I’ve described before. It is approximately a random number generator.

Even so, it’s still in use, mostly because it wields power over the teacher unions. Let me explain why I say this.

Cuomo’s new budget negotiations with the teachers’ union came up with the following rules around teacher tenure, as I understand them (readers, correct me if I’m wrong):

It will take at least 4 years to get tenure,
A teacher must get at least 3 “effective” or “highly effective” ratings in those four years,
A teacher’s yearly rating depends directly on their VAM score: they are not allowed to get an “effective” or “highly effective” rating if their VAM score comes out as “ineffective.”

Now, I’m ignoring everything else about the system, because I want to distill the effect of VAM.

Let’s think through the math of how likely it is that you’d be denied tenure based only on this random number generator. We will assume only that you otherwise get good ratings from your principal and outside observations. Indeed, Cuomo’s big complaint is that 98% of teachers get good ratings, so this is a safe assumption.

My analysis depends on what qualifies as an “ineffective” VAM score, i.e. what the cutoff is. For now, let’s assume that 30% of teachers receive “ineffective” in a given year, because it has to be some number. Later on we’ll see how things change if that assumption is changed.

That means that 30% of the time, a teacher will not be able to receive an “effective” score, no matter how else they behave, and no matter what their principals or outside observations report for a given year.

Think of it as a biased coin flip, and 30% of the time – for any teacher and for any year – it lands on “ineffective”, and 70% of the time it lands on “effective.” We will ignore the other categories because they don’t matter.

How about if you look over a four year period? To avoid getting any “ineffective” coin flips, you’d need to get “effective” every year, which would happen 0.70^4 = 24% of the time. In other words, 76% of the time, you’d get at least one “ineffective” rating just by chance.

But remember, you don’t need to get an “effective” rating for all four years; you are allowed one “ineffective” rating. The chance of exactly one “ineffective” coin flip and three “effective” flips is 4 × (1 - 0.70) × 0.70^3 = 41%, where the factor of 4 counts which of the four years is the bad one.

Adding those two scenarios together, it means that 65% of the time, over a four year period, you’d get sufficient VAM scores to receive tenure. But it also means that 35% of the time you wouldn’t, through no fault of your own.
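A short script can check the coin-flip arithmetic above. This sketch assumes, as the post does, that each year’s VAM outcome is an independent biased coin flip with a 70% chance of not being “ineffective.” `prob_sufficient_vam` is a hypothetical helper name; it sums the binomial terms for zero bad years and one bad year.

```python
from math import comb


def prob_sufficient_vam(p_effective: float, years: int = 4,
                        max_ineffective: int = 1) -> float:
    """Chance of at most `max_ineffective` "ineffective" years out of `years`,
    treating each year as an independent biased coin flip."""
    q = 1.0 - p_effective  # chance of an "ineffective" VAM score in one year
    return sum(
        comb(years, k) * q**k * p_effective ** (years - k)
        for k in range(max_ineffective + 1)
    )


if __name__ == "__main__":
    p = 0.70  # assumed share of teachers NOT rated "ineffective" in a given year
    print(f"all four years effective: {p**4:.4f}")                        # 0.2401
    print(f"exactly one bad year:     {4 * (1 - p) * p**3:.4f}")          # 0.4116
    print(f"sufficient for tenure:    {prob_sufficient_vam(p):.4f}")      # 0.6517
    print(f"blocked by VAM alone:     {1 - prob_sufficient_vam(p):.4f}")  # 0.3483
```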

This is the political power of a terrible scoring system. Under my 30% assumption, more than a third of teachers are being arbitrarily chosen to be punished by this opaque and unaccountable test.

Let’s go back to my assumption, that 30% of teachers are deemed “ineffective.” Maybe I got this wrong. It directly impacts my numbers above. If the overall probability of being deemed “effective” is $p$, then the overall chance of getting sufficient VAM scores will be $p^4 + 4p^3(1-p)$.

So if I got it totally wrong, and 98% of teachers are described as effective by the VAM model, this would mean almost all teachers, about 99.8% of them, get sufficient VAM scores.

On the other hand, remember that the reason VAM is being pushed so hard by people is that they don’t like it when evaluation systems think too many people are effective. In fact, they’d rather see arbitrary and random evaluation than see most people get through unscathed.

In other words, it is definitely more than 2% of teachers that are called “ineffective,” but I don’t know the true cutoff.

If anyone knows the true cutoff, please tell me so I can compute anew the percentage of teachers that are arbitrarily being kept from tenure.
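Until the true cutoff is known, one option is to tabulate the formula over a range of plausible cutoffs. The sketch below does that and cross-checks each value with a Monte Carlo simulation of the same four coin flips. The list of cutoffs is purely illustrative, and the assumption that years are independent is carried over from the post.

```python
import random


def closed_form(p: float) -> float:
    """p^4 + 4 p^3 (1 - p): chance of at most one "ineffective" year in four."""
    return p**4 + 4 * p**3 * (1 - p)


def simulate(p: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version: four independent biased coin flips per simulated teacher."""
    rng = random.Random(seed)
    sufficient = 0
    for _ in range(trials):
        bad_years = sum(1 for _ in range(4) if rng.random() >= p)  # P(bad) = 1 - p
        if bad_years <= 1:
            sufficient += 1
    return sufficient / trials


if __name__ == "__main__":
    # Illustrative cutoffs only: share of teachers labeled "ineffective" each year.
    for ineffective_share in (0.02, 0.10, 0.20, 0.30, 0.40, 0.50):
        p = 1 - ineffective_share
        print(f"{ineffective_share:>4.0%} ineffective per year -> "
              f"{1 - closed_form(p):6.1%} blocked from tenure "
              f"(simulated {1 - simulate(p):6.1%})")
```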

Categories: education, rant, statistics

A critique of a review of a book by Bruce Schneier

March 17, 2015 Cathy O'Neil, mathbabe

I haven’t yet read Bruce Schneier’s new book, Data and Goliath: The Hidden Battles To Collect Your Data and Control Your World. I plan to in the coming days, while I’m traveling with my kids for spring break.

Even so, I already feel capable of critiquing this review of his book (hat tip Jordan Ellenberg), written by Columbia Business School Professor and Investment Banker Jonathan Knee. You see, I’m writing a book myself on big data, so I feel like I understand many of the issues intimately.

The review starts out flattering, but then it hits this turn:

When it comes to his specific policy recommendations, however, Mr. Schneier becomes significantly less compelling. And the underlying philosophy that emerges — once he has dispensed with all pretense of an evenhanded presentation of the issues — seems actually subversive of the very democratic principles that he claims animates his mission.

That’s a pretty hefty charge. Let’s take a look into Knee’s evidence that Schneier wants to subvert democratic principles.

NSA

First, he complains that Schneier wants the government to stop collecting and mining massive amounts of data in its search for terrorists. Knee thinks this is dumb because it would be great to have lots of data on the “bad guys” once we catch them.

Any time someone uses the phrase “bad guys,” it makes me wince.

But putting that aside, Knee is either ignorant of or completely ignoring what mass surveillance and data dredging actually create: the false positives, the wasted time, money, and attention, not to mention the potential for misuse and hacking. Knee’s response is simply that we normal citizens, Schneier included, just don’t know enough to have an opinion on whether it works, in spite of Schneier knowing Snowden pretty well.
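The false-positive problem is a base-rate effect, and a few lines of arithmetic show its scale. The numbers below are entirely hypothetical. They assume a population of 300 million, 3,000 actual targets, and a screening model that catches 99% of targets while wrongly flagging 1% of everyone else.

```python
def flag_outcomes(population: int, true_targets: int,
                  sensitivity: float, false_positive_rate: float) -> dict:
    """Expected counts when one screening model is run over an entire population."""
    innocents = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * innocents
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Of everyone flagged, what fraction is actually a target?
        "precision": true_positives / (true_positives + false_positives),
    }


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the base-rate effect.
    r = flag_outcomes(population=300_000_000, true_targets=3_000,
                      sensitivity=0.99, false_positive_rate=0.01)
    print(f"targets correctly flagged: {r['true_positives']:,.0f}")
    print(f"innocent people flagged:   {r['false_positives']:,.0f}")
    print(f"flagged people who are targets: {r['precision']:.3%}")
```

Even with that generous accuracy, fewer than one in a thousand flagged people would be actual targets under these assumptions. Each of the roughly three million others represents investigative time and money, plus a new record that could be misused or hacked.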

It’s just like waterboarding – Knee says – we can’t be sure it isn’t a great fucking idea.

Wait, before we move on, who is more pro-democracy, the guy who wants to stop totalitarian social control methods, or the guy who wants to leave it to the opaque authorities?

Corporate Data Collection

Here’s where Knee really gets lost in Schneier’s logic, because – get this – Schneier wants corporate collection and sale of consumer data to stop. The nerve. As Knee says:

Mr. Schneier promotes no less than a fundamental reshaping of the media and technology landscape. Companies with access to large amounts of personal data would be “automatically classified as fiduciaries” and subject to “special legal restrictions and protections.”

That these limits would render illegal most current business models – under which consumers exchange enhanced access by advertisers for free services – does not seem to bother Mr. Schneier.

I can’t help but think that Knee cannot understand any argument that would threaten the business world as he knows it. After all, he is a business professor and an investment banker. Things seem pretty well worked out when you live in such an environment.

By Knee’s logic, even if the current business model is subverting democracy – which I also argue in my book – we shouldn’t tamper with it because it’s a business model.

The way Knee paints Schneier as anti-democratic is by using the classic fallacy in big data which I wrote about here:

Although professing to be primarily preoccupied with respect of individual autonomy, the fact that Americans as a group apparently don’t feel the same way as he does about privacy appears to have little impact on the author’s radical regulatory agenda. He actually blames “the media” for the failure of his positions to attract more popular support.

Quick summary: Americans as a group do not feel this way because they do not understand what they are trading when they trade their privacy. Commercial and governmental interests, meanwhile, are all united in convincing Americans not to think too hard about it. There are very few people devoting themselves to alerting people to the dark side of big data, and Schneier is one of them. It is a patriotic act.

Also, yes, Professor Knee, “the media,” generally speaking, writes down whatever a marketer in the big data world says is true. There are wonderful exceptions, of course.

So, here’s a question for Knee. What if you found out about a threat on the citizenry, and wanted to put a stop to it? You might write a book and explain the threat; the fact that not everyone already agrees with you wouldn’t make your book anti-democratic, would it?

MLK

The rest of the review basically boils down to, “you don’t understand the teachings of the Reverend Dr. Martin Luther King Junior like I do.”

Do you know about Godwin’s law, and its popular corollary that as soon as someone invokes the Nazis in an argument about anything, they’ve lost the argument?

I feel like we need another, similar rule, which says, if you’re invoking MLK and claiming the other person is misinterpreting him while you have him nailed, then you’ve lost the argument.

Categories: data science, economics, journalism, modeling, rant

Creepy big data health models

January 6, 2015 Cathy O'Neil, mathbabe

There’s an excellent Wall Street Journal article by Joseph Walker, entitled Can a Smartphone Tell if You’re Depressed?, that describes a lot of creepy new big data projects going on now in healthcare, in partnership with hospitals and insurance companies.

Some of the models come in the form of apps, created and managed by private, third-party companies that try to predict depression in, for example, postpartum women. They don’t disclose what they are doing to many of the women, or the extent of what they’re doing, according to the article. They own the data they’ve collected at the end of the day and, presumably, can sell it to anyone interested in whether a woman is depressed. For example, future employers. To be clear, this data is generally not covered by HIPAA.

Perhaps the creepiest example is a voice analysis model:

Nurses employed by Aetna have used voice-analysis software since 2012 to detect signs of depression during calls with customers who receive short-term disability benefits because of injury or illness. The software looks for patterns in the pace and tone of voices that can predict “whether the person is engaged with activities like physical therapy or taking the right kinds of medications,” Michael Palmer, Aetna’s chief innovation and digital officer, says.

…

Patients aren’t informed that their voices are being analyzed, Tammy Arnold, an Aetna spokeswoman, says. The company tells patients the calls are being “recorded for quality,” she says.

“There is concern that with more detailed notification, a member may alter his or her responses or tone (intentionally or unintentionally) in an effort to influence the tool or just in anticipation of the tool,” Ms. Arnold said in an email.

In other words, in the name of “fear of gaming the model,” we are not disclosing the creepy methods we are using. Also, considering that the targets of this model are receiving disability benefits, I’m wondering if the real goal is to catch someone off their meds and disqualify them for further benefits or something along those lines. Since they don’t know they are being modeled, they will never know.

Conclusion: we need more regulation around big data in healthcare.

Categories: data journalism, modeling, rant

Wage Gaps Don’t Magically Get Smaller Because Big Data

December 19, 2014 Cathy O'Neil, mathbabe

Today, just a rant. Sorry. I mean, I’m not a perfect person either, and of course that’s glaringly obvious, but this fluff piece from Wired, written by Pam Wikham of Raytheon, is just aggravating.

The title is Big Data, Smaller Wage Gap? and, you know, it almost gives us the impression that she has a plan to close the wage gap using big data, or alternatively an argument that the wage gap will automatically close with the advent of big data techniques. It turns out to be the former, but not really.

After complaining about the wage gap for women in general, and after we get to know how much she loves her young niece, here’s the heart of the plan (emphasis mine, on the actual plan parts of the plan):

Analytics and microtargeting aren’t just for retailers and politicians — they can help us grow the ranks of executive women and close the gender wage gap. Employers analyze who clicked on internal job postings, and we can pursue qualified women who looked but never applied. We can go beyond analyzing the salary and rank histories of women who have left our companies. We can use big data analytics to tell us what exit interviews don’t.

Facebook posts, Twitter feeds and LinkedIn groups provide a trove of valuable intel from ex-employees. What they write is blunt, candid and useful. All the data is there for the taking — we just have to collect it and figure out what it means. We can delve deep into whether we’re promoting the best people, whether we’re doing enough to keep our ranks diverse, whether potential female leaders are being left behind and, importantly, why.

That’s about it, after that she goes back to her niece.

Here’s the thing, I’m not saying it’s not an important topic, but that plan doesn’t seem worthy of the title of the piece. It’s super vague and fluffy and meaningless. I guess, if I had to give it meaning, it would be that she’s proposing to understand internal corporate sexism using data, rather than assuming “data is objective” and that all models will make things better. And that’s one tiny step, but it’s not much. It’s really not enough.

Here’s an idea, and it kind of uses big data, or at least small data, so we might be able to sell it. Ask people in your corporate structure what the actual characteristics are of people they promote, and how they are measured, or if they are measured, and look at the data to see if what they say is consistent with what they do, and whether those characteristics are inherently sexist. It’s a very specific plan and no fancy mathematical techniques are necessary, but we don’t have to tell anyone that.
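Here is a minimal sketch of that check using a handful of made-up records. It groups employees by how they score on the criterion managers say they use (here just a hypothetical “high”/“low” band), then compares promotion rates for men and women within each band. If people who look the same on the stated criterion are promoted at different rates by gender, what the company says and what it does are not consistent.

```python
from collections import defaultdict

# Hypothetical records: (gender, band on the stated promotion criterion, promoted?)
records = [
    ("F", "high", True), ("F", "high", False), ("F", "high", False),
    ("M", "high", True), ("M", "high", True), ("M", "high", False),
    ("F", "low", False), ("F", "low", False),
    ("M", "low", True), ("M", "low", False),
]


def promotion_rates(records):
    """Promotion rate for each (criterion band, gender) group."""
    counts = defaultdict(lambda: [0, 0])  # (band, gender) -> [promoted, total]
    for gender, band, promoted in records:
        counts[(band, gender)][0] += int(promoted)
        counts[(band, gender)][1] += 1
    return {key: promoted / total for key, (promoted, total) in counts.items()}


if __name__ == "__main__":
    rates = promotion_rates(records)
    for band in ("high", "low"):
        gap = rates[(band, "M")] - rates[(band, "F")]
        print(f"{band:>4} on stated criterion: men {rates[(band, 'M')]:.0%}, "
              f"women {rates[(band, 'F')]:.0%}, gap {gap:+.0%}")
```

Real data would need far more records, and small groups call for caution before reading anything into a gap. Still, nothing fancier than counting is required.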

What combats sexism is a clear and transparent description of job requirements and a willingness to follow through. Look at blind orchestra auditions for a success story there. By contrast, my experience with the corporate world is that, when hiring or promoting, they often list a long series of unmeasurable but critical properties like “good cultural fit” and “leadership qualities” on which, for whatever reason, men are rated higher than women.

Categories: data science, rant

The re-emergence of debtors’ prisons

December 1, 2014 Cathy O'Neil, mathbabe

Yesterday at my weekly Occupy meeting we watched videos called To Prison For Poverty by Brave New Films (Part I and Part II) before discussing them. Take a look; they are well done.

It’s not the first time this issue has come up recently; the NPR investigations into court fees from last May, called Guilty and Charged, led to a bunch of reports on issues similar to this. Probably the closest is the one entitled Unpaid Court Fees Land The Poor In 21st Century Debtors’ Prisons.

A few comments:

Ferguson is now famous for having a basically white police force patrolling a basically black populace. But it also has this fines-and-fees-and-jails problem: fines and fees, mostly from traffic violations, accounted for 21% of the city’s budget in 2013. And there were more arrest warrants than people in Ferguson last year, mostly for non-violent offenses.
But the debtors’ prison problem isn’t just a racial issue. The people profiled in the above video were white, which could have been a documentarian’s decision, but in any case is a fact: the poverty-to-prison system is screwing all poor people, not just minorities. This is in spite of the fact that in the landmark 1983 case Bearden v. Georgia, the Supreme Court ruled that jailing people simply because they are too poor to pay a fine is unconstitutional.
This sense that “everyone is screwed” creates solidarity among poor whites and poor blacks, and especially young people. The Ferguson protests have been multi-racial, for example. And if you’ve read The New Jim Crow by Michelle Alexander, you’ll recognize a historical pattern whereby political change happens when poor whites and poor blacks start working together.
One interesting and scary question to emerge from the above stories is, how did so many fees and fines get attached to low-level misdemeanors in the first place? It seems like privatized probation and prison companies have a lot to do with it.
In some cases, they are putting people in jail for days and weeks, which costs the government hundreds of dollars, in order to capture a small fee. That makes no sense.
In other cases, the fees accumulate so fast that the poor person who committed the misdemeanor ends up being responsible for an outrageous amount of money, far surpassing the scale of the original misdeed, and all because they are poor. That also makes no sense.
It’s not just prisons, either; all sorts of functions we consider governmental have been privatized, including health and human services: child welfare services, homeless services, halfway houses, and more.
In the worst cases, the original intent of the agency (“putting people on probation so they don’t have to be in jail”) has been perverted into an entirely different beast (“putting them in jail because they can’t pay their daily $35 probation fees”). The question we’d like to investigate further is, how did that happen and why?

Categories: #OWS, news, rant

What the fucking shit, Barbie?

November 19, 2014 Cathy O'Neil, mathbabe 12 comments

I’m back from Haiti! It was amazing and awesome, and please stand by for more about that, with cultural observations and possibly a slide show if you’re all well behaved.

Today, thanks to my math camp buddy Lenore Cowen, I am going to share with you an amazing blog post by Pamela Ribon. Her post is called Barbie Fucks It Up Again, and it describes a Barbie book entitled Barbie: I Can Be a Computer Engineer.

The other book is called “I Can Be an Actress”

Just to give you an idea of the plot: Barbie’s sister finds Barbie working on a project on her computer, and when she asks about it, Barbie responds:

“I’m only creating the design ideas,” Barbie says, laughing. “I’ll need Steven and Brian’s help to turn it into a real game!”

To which blogger Pamela Ribon comments:

What the fucking shit, Barbie?

Update: Please check out the amazing Amazon reviews of this book (hat tip Chris Wiggins).

BEST UPDATE EVER (hat tip Marko): BARBIE CAN CODE REMIXED

Categories: education, open source tools, rant, women in math

Alt Banking in Huffington Post #OWS

November 11, 2014 Cathy O'Neil, mathbabe 1 comment

Great news! The Alt Banking group had a piece published today in the Huffington Post entitled With Economic Justice For All, about our hopes for the next Attorney General.

For the sake of the essay, we coined the term “marble columns” to mean the opposite of “broken windows.” Instead of getting arrested for nothing, you never get arrested, as long as you work at a company with marble columns. For more, take a look at the whole piece!

Also, my good friend and bandmate Tom Adams (our band, the Tomtown Ramblers, is named after him) will be covering for me on mathbabe for the next few days while I’m away in Haiti. Please make him feel welcome!

Categories: #OWS, economics, finance, journalism, rant

Nerd catcalling

November 6, 2014 Cathy O'Neil, mathbabe 21 comments

This is a guest post by Becky Jaffe.

It has come to my attention that I am a nerd. I take this on good authority from my students, my friends, and, as of this morning, strangers in a coffee shop. I was called a nerd three times today before 10:30 am, while I was standing in line for coffee – which is to say, before I was caffeinated, and therefore utterly defenseless. I asked my accusers for suggestions on how to be less nerdy. Here was their helpful advice:

Guy in coffee shop: “Wear makeup and high heels.”

Another helpful interlocutor: “Use smaller words.”

My student, later in the day: “Care less about ideas.”

A friend: “Think less like NPR and more like C-SPAN.”

What I wish someone had said: “Is that a dictionary in your pocket or are you happy to see me?”

What I learned today is that if I want to avoid being called a nerd, I should be more like Barbie. And I don’t mean the Professor Barbie version, which – get this – does not exist. When I googled “Professor Barbie,” I got “Fashion Professor Barbie.”

So many lessons in gender conformity for one day! This nerd is taking notes.

Categories: Becky Jaffe, guest post, rant

Tailored political ads threaten democracy

October 31, 2014 Cathy O'Neil, mathbabe 14 comments

Not sure if you saw this recent New York Times article on the new data-driven political ad machines. Consider, for example, the 2013 Virginia governor’s race won by Terry McAuliffe:

…the McAuliffe campaign invested heavily in both the data and the creative sides to ensure it could target key voters with specialized messages. Over the course of the campaign, he said, it reached out to 18 to 20 targeted voter groups, with nearly 4,000 Facebook ads, more than 300 banner display ads, and roughly three dozen different pre-roll ads — the ads seen before a video plays — on television and online.

Now I want you to close your eyes and imagine what kind of numbers we will see for the current races, not to mention the upcoming presidential election.

What’s crazy to me about the Times article is that it never questions the implications of this movement. The biggest problem, it seems, is that the analytics have surpassed the creative work of making ads: there are too many segments of populations to tailor the political message to, and not enough marketers to massage those particular messages for each particular segment. I’m guessing that there will be more money and more marketers in the presidential campaign, though.

Translation: politicians can and will send different messages to individuals on Facebook, depending on what they think we want to hear. Not that politicians follow through with all their promises now – they don’t, of course – but imagine what they will say when they can make a different promise to each group. We will all be voting for slightly different versions of a given story. We won’t even know when the politician is being true to their word – which word?

This isn’t the first manifestation of different messages to different groups, of course. Romney’s “47%” speech was a famous example of messaging tailored to super-rich donors. But it was secretly recorded by a bartender working the event, and there will be no such bartenders around when people read their emails and see ads on Facebook.

I’m not the only person worried about this. For example, ProPublica studied this in Obama’s last campaign (see this description). But given the scale of the big data political ad operations now in place, there’s no way they – or anyone, really – can keep track of everything going on.

There are lots of ways that “big data” is threatening democracy. Most of the time, it’s by removing open discussions of how we make decisions and handing those decisions to anonymous and inaccessible quants; think evidence-based sentencing or value-added modeling for teachers. But tailored political campaign ads are a more direct attack on the concept of a well-informed public choosing its leaders.

Categories: data science, modeling, rant

The war against taxes (and the unmarried)

October 28, 2014 Cathy O'Neil, mathbabe 18 comments

The American Enterprise Institute, a conservative think tank, is releasing a report today. It’s called For richer, for poorer: How family structures economic success in America, and there is also an event in DC today from 9:30am til 12:15pm that will be livestreamed. The report looks at statistics across races and income levels on how marriage is associated with increased hours worked and income, especially for men.

It uses a technique called the “fixed-effects model,” and since I’d never studied that, I looked it up on the Wikipedia page, in this worked-out example of massage prices in various cities on Josh Blumenstock’s webpage, and in this example on Richard Williams’s webpage, which combines fixed effects with a logit model to study girls in and out of poverty.

The critical thing to know about fixed effects models is that we need more than one snapshot of an object of interest – in this case a person who is or isn’t married – in order to use that person as a control against themselves. So in 1990 Person A is 18 and unmarried, but in 2000 he is 28 and married, and makes way more money. Similarly, in 1990 Person B is 18 and unmarried, but in 2000 he is 28 and still unmarried, and makes more money but not quite as much more money as Person A.
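To make the Person A / Person B comparison concrete, here is a minimal sketch with made-up incomes (in thousands of dollars); the numbers and the function names are hypothetical and not taken from the AEI report. With only two snapshots per person, a fixed-effects regression gives the same estimate as regressing each person’s change in income on their change in marital status, which is what the code does. The intercept soaks up the raise everyone gets between 1990 and 2000; the slope is the extra raise associated with getting married.

```python
# Hypothetical two-period panel: (person, year, married, income in $1000s).
# The numbers are invented to mirror Person A and Person B above.
panel = [
    ("A", 1990, 0, 20.0),
    ("A", 2000, 1, 60.0),
    ("B", 1990, 0, 20.0),
    ("B", 2000, 0, 50.0),
]

def first_differences(rows):
    """Per person, return (change in married, change in income), later year minus earlier."""
    by_person = {}
    for person, year, married, income in rows:
        by_person.setdefault(person, []).append((year, married, income))
    diffs = []
    for obs in by_person.values():
        obs.sort()  # put each person's two snapshots in year order
        (_, m0, y0), (_, m1, y1) = obs
        diffs.append((m1 - m0, y1 - y0))
    return diffs

def ols_fit(pairs):
    """OLS of dy on dx with an intercept; returns (intercept, slope)."""
    n = len(pairs)
    mx = sum(dx for dx, _ in pairs) / n
    my = sum(dy for _, dy in pairs) / n
    cov = sum((dx - mx) * (dy - my) for dx, dy in pairs)
    var = sum((dx - mx) ** 2 for dx, _ in pairs)
    slope = cov / var
    return my - slope * mx, slope

diffs = first_differences(panel)
print(diffs)           # → [(1, 40.0), (0, 30.0)]
print(ols_fit(diffs))  # → (30.0, 10.0)
```

Person A’s income rises by 40 and Person B’s by 30, so the model attributes the extra 10 to marriage. Each person serves as their own baseline and Person B supplies the common time trend, but this is still an association, not proof that the marriage caused the raise.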

The AEI report cannot claim causality – and even notes as much on page 8 of their report – so instead they talk about a bunch of “suggested causal relationships” between marriage and income. But really what they are seeing is that, as men get more hours at work, they also tend to get married. Not sure why the married thing would cause the hours, though. As women get married, they tend to work fewer hours. I’m guessing this is because pregnancy causes both.

The AEI report concludes, rightly, that people who get married, and come from homes where there were married parents, make more money. But that doesn’t mean we can “prescribe” marriage to a population and expect to see that effect. Causality is a bitch.

On the other hand, that’s not what the AEI says we should do. Instead, the AEI is recommending (what else?) tax breaks to encourage people to get married. Most bizarre of their suggestions, at least to me, is to expand tax benefits for single, childless adults to “increase their marriageability.” What? Isn’t that also an incentive to stay single and childless?

What I’m worried about is that this report will be cleverly marketed, using the phrase “fixed effects,” to make it seem like they have indeed proven “mathematically” that individuals, yet again, are to blame for our nation’s structural employment problems, and that if people would only get married already we’d all be OK and have great jobs. All problems will be solved by tax breaks.

Categories: economics, modeling, rant

What male allies should really be doing

October 10, 2014 Cathy O'Neil, mathbabe 32 comments

Chris Wiggins was kind enough to forward me this article on a recent panel discussion of “Male Allies of Women” at the 2014 Grace Hopper Celebration, which is a big-deal conference for women in tech.

Panelists included Facebook CTO Mike Schroepfer, Google’s SVP of Search Alan Eustace, GoDaddy CEO Blake Irving, and Intuit CTO Tayloe Stansbury. The advice was stale and trite and included things like “speak up,” “lean in,” and “get excited about your ideas like men do.”

Yes, I said GoDaddy.

By far the best part was the audience response – I wish I’d been there just for that part.

There was even a Bingo game based on the phrases people anticipated.

What male allies should really be doing, step 1

Here’s the thing. If you haven’t seen this video of gamer Anita Sarkeesian speaking at the Feminist Frequency conference (hat tip Josh Vekhter), go take a look. It’s a fantastic and articulate diatribe against sexism and misogyny, and it ends with a super reasonable request of the men in the audience and in the world:

Trust women who say they experience sexism.

What’s amazing to me is how hard this is to hear for men in my life. When I repeated this to a couple of them, they actually told me I hadn’t experienced the things I had experienced. It was kind of nuts, and I had to point out to them that they were failing on the most basic level.

Yes, it requires empathy, and observation, and yes it sucks, because once you start seeing it you will be disappointed in the world. Tough shit, it’s reality.

What male allies should really be doing, step 2

Once men start trusting the women they love and admire and work with, then the next thing they can do is start acting on that knowledge.

I don’t know how many times I’ve been the target of sexism in front of other men and somehow it’s my job to confront it and deal with it. Men, step the fuck up: once you can see sexism happening, defend the target and put a stop to it. Speak up and defend your friend, or your wife, or your daughter, or your colleague. Thanks.

Categories: rant

Detroit’s water problem and the Koch brothers

October 6, 2014 Cathy O'Neil, mathbabe 11 comments

Yesterday at the Alt Banking group we discussed the recent Koch brothers article from Rolling Stone Magazine, written by Tim Dickinson. You should read it now if you haven’t already.

There are tons of issues that came up, but one in particular was the control the Koch brothers maintain over information about their activities. If you read the article, you realize that the brothers are die-hard libertarians. At some point, though, they realized that saying so out loud was working against them, specifically when they got into trouble for polluting the environment with their chemical factories, so instead they started talking about how much they love the environment and work to protect it.

It’s not that they stopped polluting, it’s that their rhetoric changed. In fact there’s no reason to think they stopped polluting, since they still had plenty of regulators going after them for various violations. Since their apparent change of heart they’ve also decided to be publicly philanthropic, giving money to hospitals, and Lincoln Center, and even PBS (see how that worked out on Stephen Colbert).

The problem with all this window dressing is that people are actually starting to think the Koch brothers may be good guys after all, and what with the fancy lawyers that the Koch brothers hire to control information about them, the public view is very skewed.

For example, how many economists have they bought and inserted into universities nationwide? We will never really know. There’s no way we can keep a score sheet with “good deeds” on one side and “shitty deeds” on the other. We don’t have enough information for the second side.

The exception to this information control is when they get in trouble with regulators and it becomes a matter of public record. And thank goodness those court documents exist, and thank goodness investigative journalist Tim Dickinson did all the work he did to explain it to us.

A couple of conclusions. First, we complain a lot about the bank settlements for the misdeeds of the big banks. Nobody went to jail, and the system is just as likely to repeat this kind of thing again as it was in 2005. But another problem with this out-of-court settlement process, we now realize, is that we actually don’t know what happened except in big, vague terms. There will be no Tim Dickinson reporting on big banks.

Second, the connection to Detroit. Right now there are 15,000 residents of Detroit whose water has been shut off, basically so the city can privatize the water system with the best deal from Wall Street. Together they owe less than $10 million, a measly $540 on average. The United Nations has called these water shutoffs a violation of the human rights of the people of Detroit.

If you feel bad about that, you can donate to someone’s water bill directly, which is kind of neat.

Or is it? Shouldn’t Obama be declaring a state of emergency in Detroit? Wouldn’t we do that for any other city with 15,000 residents without water? Why is this an exception to that rule? Because the victims are poor? Don’t we recognize Detroit as a place where it’s unusually difficult to find work? Are we going to let utilities shut off people’s heat as well, once winter comes?

Once you think about it, the idea of a “private solution” to the Detroit water emergency seems wrong. In fact, you can almost imagine David Koch coming to the rescue here, as part of his “positive optics” campaign, and bailing out the Detroit citizens and then, for good measure, buying up the water system altogether. A hero!

And if you’re in that mode, you can think about the asymptotic limit of that approach, whereby a few very rich people gradually take control of resources, and then there are intermittent famines of various types in different cities, and the rich people swoop in and heroically save the day whilst scooping up even more ownership of what used to be public infrastructure. And we might thank them every time, because it was a dire situation and they didn’t really need to do that with all their money.

It’s frustrating to live in a country that has so many resources but which can’t seem to get it together to meet the basic human needs of its citizens. We need a basic income, at least for the people in Detroit, at least right now.

Categories: #OWS, economics, rant

Women not represented in clinical trials

September 26, 2014 Cathy O'Neil, mathbabe 13 comments

This recent NYTimes article entitled Health Researchers Will Get $10.1 Million to Counter Gender Bias in Studies spelled out a huge problem that kind of blows me away as a statistician (and as a woman!).

Namely, they have recently decided over at the NIH, which funds medical research in this country, that we should probably check how drugs affect women’s health, and not just men’s. They’ve decided to give “extra money” to study this special group, namely females.

Here’s the bizarre and telling explanation for why most studies have focused on men and excluded women:

Traditionally many investigators have worked only with male lab animals, concerned that the hormonal cycles of female animals would add variability and skew study results.

Let’s break down that explanation, which I’ve confirmed with a medical researcher is consistent with the culture.

If you are afraid that women’s data would “skew study results,” that means you think the “true result” is the result that works for men, and that adding women’s data would just add noise to the true signal, the men’s data. What?! It’s an outrageous perspective. Let’s take another look at this reasoning, from the article:

Scientists often prefer single-sex studies because “it reduces variability, and makes it easier to detect the effect that you’re studying,” said Abraham A. Palmer, an associate professor of human genetics at the University of Chicago. “The downside is that if there is a difference between male and female, they’re not going to know about it.”

Ummm… yeah. So instead of testing the effect on women, we just go ahead and optimize stuff for men and let women just go ahead and suffer the side effects of the treatment we didn’t bother to study. After all, women only comprise 50.8% of the population, they won’t mind.

This is true even for migraines, where two-thirds of sufferers are women.
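Palmer’s trade-off is easy to see in a toy example. The sketch below uses invented per-patient outcomes (change in a symptom score, where more negative is better); none of these numbers come from a real trial. The point is that the “variability” women would add can be a real difference in how the drug works, which a men-only study cannot detect at all.

```python
# Invented trial data: change in symptom score per patient (more negative = better).
male_treated = [-6, -5, -7, -6]
male_control = [-1, 0, -1, 0]
female_treated = [-1, 0, -2, -1]
female_control = [-1, 0, -1, 0]

def mean(xs):
    return sum(xs) / len(xs)

def treatment_effect(treated, control):
    """Difference in mean outcome between treated and control patients."""
    return mean(treated) - mean(control)

# A men-only study sees a large, clean effect...
male_effect = treatment_effect(male_treated, male_control)        # → -5.5
# ...but in this made-up example the drug barely helps women.
female_effect = treatment_effect(female_treated, female_control)  # → -0.5
# Pooling everyone without looking at sex blurs the two into one number.
pooled_effect = treatment_effect(male_treated + female_treated,
                                 male_control + female_control)   # → -3.0
```

Reporting the effect separately by sex, rather than excluding women or pooling blindly, is what surfaces the gap between -5.5 and -0.5.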

One reason they like to exclude women: they have periods, and they even sometimes get pregnant, which is confusing for people who like to have clean statistics (on men’s health). In fact my research contact says that traditionally, this bias towards men in clinical trials was said to protect women because they “could get pregnant” and then they’d be in a clinical trial while pregnant. OK.

I’d like to hear more about who is and who isn’t in clinical trials, and why.

Categories: modeling, news, rant, statistics
