December | 2011 | mathbabe

A New Year’s resolution you can keep

December 31, 2011 Cathy O'Neil, mathbabe 22 comments

Ladies, I know what you’re thinking. But don’t do it. If you need any more proof of the impossibility of losing weight, look no further than this New York Times article which describes it in excruciating, painful detail.

But that’s not even my angle. I’m not going to argue that you shouldn’t try to lose weight because it’s so hard. In fact, it’s possible for some people, as the article describes, as long as they are willing to think about nothing else for the rest of their life. But in fact, even if it wasn’t hard, even if it was an achievable goal, I’d still be arguing against it.

My angle is this: it’s just not interesting enough. You have better things to do than devote yourself to vanity. And plus, I’ve said it before and I’ll say it again, sexy is something you do, not something you look like.

Do you know how boring those people must be who think about food all the time? Have you ever spent time, real time, with someone who is singularly obsessed with food or exercise? I got news for you, if you haven’t, it’s insanely boring. They can only talk about their plan and how it’s going. And I’ve got news for you if you’re one of those people: you are insanely boring.

Get a hobby that involves other people, that gives you a higher sense of purpose. If you’re a lefty, join your local Occupy Wall Street group. Start writing a blog. Start a book club. Read stuff.

Do you know, I’ve heard this story, that some pollster asked a bunch of people what they’d do if they could do anything. It was left as an open question like that, if you could do anything, what would you do? And the majority of women said they’d lose weight (on average 10 pounds). WHAT?! They could have mentioned closing the income gap? Stopping wars? Fighting climate change? Making sure people everywhere had access to clean water? And they chose to lose freaking 10 pounds, are you kidding me?

Actually it’s probably a myth, but it doesn’t really matter, because I believe it. And looking around at the world we live in, with all of the ridiculous assumptions of vanity and, what gets me even more riled up, time spent on crap like that, I can believe that women are typically so bombarded with self-image issues that they can’t think beyond them to bigger issues like wars and clean water and global warming. WTF, culture??

So my New Year’s resolution this year is to fight back against that crap. I’m starting today, one day early, by writing this post. Can I get a fuck yeah?!

I am officially out of the fat ladies’ closet, in full armor, ready to fight against that idea that we have any time to waste with vanity. We have way too much to do, ladies (and gentlemen), and not enough time to do it all. Let’s start now. For starters, I’m going to start to emulate a woman I read about a few years ago on Overheard in New York: “Mom, yeah, I got a Ph.D., I live in my own apartment in Manhattan, and I’ve got a fat ass. How about it?”

I am not ranting against humanity here. I think humanity can be great. But not when our culture is encouraging us to gaze at our navels. We need to actively create greatness, which we can do, but it takes thinking beyond ourselves and asking, what do I really want to do this year?

Categories: rant

Matt Stoller explains politics

December 30, 2011 Cathy O'Neil, mathbabe 3 comments

I’ve never understood politics, partly because they’re complicated, partly because the people who do understand politics are so heavily involved they don’t know how to contextualize for people like me. I’ve come to think of it as a lot like finance, where there’s power to be had by withholding information, and part of that power is wielded simply by inventing a new vocabulary that makes people on the outside feel tired and hopeless. You really need a tour guide, a translator, to walk you through stuff to achieve a decent level of understanding.

I now consider Matt Stoller my personal translator. Matt regularly contributes to Naked Capitalism, my go-to blog for informed, vitriolic insights into the corrupt world of finance. His recent post on Naked Capitalism concerning Ron Paul and liberals beautifully explains how confused modern liberals are when confronted by someone like Ron Paul, who is both unattractive and on their side for a number of reasons. I confess that I’ve been that confused liberal myself at many an #OWS Alternative Banking Working Group meeting, when the Ron Paul fans come and talk about Fed transparency.

But Matt doesn’t just tell a good story, although he does that. He also gives you insight into the process of politics. He peppers his story with helpful, nerdy explanations like this:

An old Congressional hand once told me, and then drilled into my head, that every Congressional office is motivated by three overlapping forces – policy, politics, and procedure. And this is true as far as it goes. An obscure redistricting of two Democrats into one district that will take place in three years could be the motivating horse-trade in a decision about whether an important amendment makes it to the floor, or a possible opening of a highly coveted committee slot on Appropriations due to a retirement might cause a policy breach among leadership. Depending on committee rules, a Sub-Committee chairman might have to get permission from a ranking member or Committee Chairman to issue a subpoena, sometimes he might not, and sometimes he doesn’t even have to tell his political opposition about it. Congress is endlessly complex, because complexity can be a useful tool in wielding power without scrutiny. And every office has a different informal matrix, so you have to approach each of them differently.

Another recent Stoller post that really blew my mind was How the Federal Reserve Fights, which explained Matt’s experiences as a Senior Policy Advisor to Alan Grayson, a congressman on the Financial Services Committee in 2009-2011. Grayson teamed up with Ron Paul to force more transparency at the Fed. It’s an awesome story, but my favorite part, because I’m such a nerd and I love my nerd heroes, is the following:

When it gets down to crunch time, as a staffer going up against a big force of lots of lawyers, you get really tired and cut corners. One obstacle in legislating is that it is really hard to tell what bills do, because they have multiple provisions like “In Section 203, delete “do” and replace with “shall”. You have to constantly reference pieces of the code and compare changes, which gets confusing. It’s like doing “track changes”, but on paper and with multiple versions. This is a problem software could easily solve and I’ve heard that agencies and (probably the Fed) have such software. But I didn’t. So the Fed thought we would do nothing more than cursory reading of Watt’s amendment, and rely on their validators who told us the amendment would increase transparency. And this is where Grayson showed legislative genius. We were exhausted, but he got all the difference pieces of the law, and spent a few hours deciphering exactly what this amendment meant. And he figured out that not only did the amendment not open up the Fed facilities to independent inspection, it actually increased the secrecy of the Fed. If you want the gory details, here’s Grayson’s argument during the markup.
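The “track changes” problem in that passage is easy to sketch in software. What follows is only an illustration: the section text and the `apply_amendment` helper are invented for the example, and it makes no claim about what software the Fed or any agency actually uses.

```python
import difflib


def apply_amendment(text, old, new):
    """Replace the first occurrence of `old` with `new` (hypothetical helper)."""
    if old not in text:
        raise ValueError(f"{old!r} not found in the section text")
    return text.replace(old, new, 1)


# Invented section text, for illustration only.
current_law = [
    "Agencies do report annually.",
    "Results are sent to Congress.",
]

# An amendment of the form: delete "do" and replace with "shall".
as_amended = [apply_amendment(current_law[0], "do", "shall"), current_law[1]]

# unified_diff marks removed lines with '-' and added lines with '+'.
diff = list(
    difflib.unified_diff(
        current_law, as_amended,
        fromfile="current law", tofile="as amended", lineterm="",
    )
)
print("\n".join(diff))
```

The point of the sketch is that the amended text is produced first and then compared against the original, so a reader sees the effect of the amendment rather than the instruction.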

I’m kind of wishing Matt Stoller would write a book about “How Politics Works,” but then again does anybody read books anymore? Is it better for him to just continue to write timely blog posts? I’ll take what I can get.

Categories: #OWS, finance

Information loss

December 29, 2011 Cathy O'Neil, mathbabe 7 comments

When people ask me why the financial system is so complicated, I always say the same thing: because it benefits the insiders of the financial system to make it that way. The more complicated and opaque something is, the more opportunities to extract fees and withhold information. Or rather, to withhold information in order to extract fees.

In some sense you can think of the financial system as a huge “information loss” system, where people get paid based on how much more information they know than you do. Incidentally, this theory flies in the face of most economic assumptions of transparency, and explains the origin of the phrase “dumb money.” And it’s not my idea, it’s kind of an elemental fact for insiders; I’m bringing it up because I want to make sure people are aware of it.

As an analogy, think of the situation when you buy a used car from someone. They tell you some things, like its make and model, and they may let you test drive it, but you end up not knowing how many accidents it’s been in, and stuff like that. Your partial information in general lets them make money.

It’s kind of understandable why there’s so much insider trading going on. Insider trading is the ultimate and most efficient way to profit from information.

Another good example of information loss is with mortgages, and mortgage-backed securities. The original idea behind securitizing mortgages was that investors get to buy pieces of pools of mortgages, which “behave better” than individual mortgages: whereas an individual can refinance (and often does, when interest rates go down) or default, it’s less likely that a majority of the people in a pool refinance or default.

[Let’s ignore for now the issue that the banks got so high on the profits of securitization that the assumption of better behavior of pools got thrown out the window as the underlying mortgages became worse and worse – a gleaming example of information loss.]
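The “behave better” claim can be sketched numerically. The sketch assumes every mortgage defaults independently with the same probability, which is exactly the assumption that stopped holding; the 10% default probability and the pool sizes are made-up numbers.

```python
from math import comb


def prob_majority_default(n_loans, p_default):
    """P(more than half of n_loans default), assuming independent defaults."""
    threshold = n_loans // 2 + 1
    return sum(
        comb(n_loans, k) * p_default**k * (1 - p_default) ** (n_loans - k)
        for k in range(threshold, n_loans + 1)
    )


p = 0.10  # hypothetical chance that any single mortgage defaults
for n in (1, 11, 101):
    # With n = 1 this is just p; it shrinks quickly as the pool grows.
    print(n, prob_majority_default(n, p))
```

If defaults are correlated, say because the whole pool was written against the same bad underwriting, this calculation no longer applies and the pool can behave much more like a single bad mortgage.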

In selling these pools, the banks were charging fees so that you, the investor, wouldn’t have to “deal with the details” of all of the individual mortgages in the pool. This is one way that people withhold information and charge a fee for it, by calling it a chore.

And it is a chore, if you actually do it. However, in the case of mortgages, lots of banks charged that fee for that chore and then never actually did the chore– they kept terrible accounts of the mortgages, and when they started to default in large numbers, started illegally pretending their papers were in order, through “robo-signing,” in order to foreclose quickly.

Here’s something you can do if you have a mortgage. Demand to see your mortgage note. It turns out there’s a legal way for you to ask your bank to trace the ownership of your mortgage through the securitization system, and you can do it for fun, you don’t need to be late on your mortgage payments or anything.

There does seem to be a risk associated with asking to see your note, however, namely to your credit score, which is bullshit. There’s also a form letter of complaint if your bank somehow doesn’t come up with the answer.

Categories: #OWS, finance

Economist versus quant

December 28, 2011 Cathy O'Neil, mathbabe 10 comments

There’s an uneasy relationship between economists and quants. Part of this stems from the fact that each discounts what the other is really good at.

Namely, quants are good at modeling, whereas economists generally are not (I’m sure there are exceptions to this rule, so my apologies to those economists who are excellent modelers); they either oversimplify to the point of uselessness, or they add terms to their models until everything works but by then the models could predict anything. Their worst data scientist flaw, however, is the confidence they have, and that they project, in their overfit models. Please see this post for examples of that overconfidence.

On the other hand, economists are good at big-picture thinking, and are really really good at politics and influence, whereas most quants are incapable of those things, partly because quants are hyper aware of what they don’t know (which makes them good modelers), and partly because they are huge nerds (apologies to those quants who have perspective and can schmooze).

Economists run the Fed, they suggest policy to politicians, and generally speaking nobody else has a plan so they get heard. The sideline show of the two different schools of mainstream economics constantly at war with each other doesn’t lend credence to their profession (in fact I consider it a false dichotomy altogether) but again, who else has the balls and the influence to make a political suggestion? Not quants. They basically wait for the system to be set up and then figure out how to profit.

I’m not suggesting that they team up so that economists can teach quants how to influence people more. That would be really scary. However, it would be nice to team up so that the underlying economic model is either reasonably adjusted to the data, or discarded, and where the confidence of the model’s predictions is better known.

To that end, Cosma Shalizi is already hard at work.

Generally speaking, economic models are ripe for an overhaul. Let’s get open source modeling set up, there’s no time to lose. For example, in the name of opening up the Fed, I’d love to see their unemployment prediction model be released to the public, along with the data used to train it, and along with a metric of success that we can use to compare it to other unemployment models.

Categories: data science, finance, hedge funds

Is Stop, Question and Frisk racist?

December 27, 2011 Cathy O'Neil, mathbabe 8 comments

A few weeks ago I was a “data wrangler” at the first Data Without Borders datadive weekend. My group of volunteer data scientists was exploring the NYPD “Stop, Question and Frisk” data from the previous few years. I blogged about it here and here.

One thing we were interested in exploring was the extent to which this policy, whereby people can be stopped, questioned, and frisked for merely looking suspicious (to the cops), is racist. This is what I said in my second post:

We read Gelman, Fagan and Kiss’s article about using the Stop and Frisk data to understand racial profiling, with the idea that we could test it out on more data or modify their methodology to slightly change the goal. However, they used crime statistics data that we don’t have and can’t find and which are essential to a good study.

As an example of how crucial crime data like this is, if you hear the statement, “10% of the people living in this community are black but 50% of the people stopped and frisked are black,” it sounds pretty damning, but if you add “50% of crimes are committed by blacks” then it sounds less so. We need that data for the purpose of analysis.

Why is crime statistics data so hard to find? If you go to NYPD’s site and search for crime statistics, you get really very little information, which is not broken down by area (never mind x and y coordinates) or ethnicity. That stuff should be publicly available. In any case it’s interesting that the Stop and Frisk data is but the crime stats data isn’t.

I still think it is outrageous that we don’t have open source crime statistics in New York, where Bloomberg claims to be such a friend to data and to openness.

And I also still think that, in order to prove racism in the strict sense of the above discussion, we need that data.

However, my overall opinion has changed about whether we have enough data already to say if this policy is broadly racist. It is. My mind changed reading this article from the New York Times a couple of weeks ago. It was written by a young black man from New York, describing his experiences first-hand being stopped, questioned, and frisked. The entire article is excellently written and you should take a look; here’s an excerpt:

For young people in my neighborhood, getting stopped and frisked is a rite of passage. We expect the police to jump us at any moment. We know the rules: don’t run and don’t try to explain, because speaking up for yourself might get you arrested or worse. And we all feel the same way — degraded, harassed, violated and criminalized because we’re black or Latino. Have I been stopped more than the average young black person? I don’t know, but I look like a zillion other people on the street. And we’re all just trying to live our lives.

The argument for this policy is that it improves crime statistics. For some people, especially if they aren’t young and aren’t constant targets of the policy, it’s probably a price worth paying to live in a less crime-ridden area.

And we all want there to be less crime, of course, but what we really want is something even more fundamental, which is a high quality of life. Part of that is not being victimized by crooks, but another part of that is not being (singled out and) victimized by authority either.

I think a good thought experiment is to consider how they could make the policy colorblind. One obvious way is to have cops in every neighborhood performing stop, question and frisk to random people. The argument against this is, of course, that we don’t have enough cops or enough money to do something like that.

Instead, to be more realistic about resources, we could have groups of cops randomly be assigned to neighborhoods on a given day for such stops. If you think the policy is such a good crime deterrent, then you can even weight the probability of a given neighborhood by the crime rate in that neighborhood. (As an aside, I would love to see whether there’s statistically significant reason to believe that this policy does, in fact, deter crime. So often mayors and policies take credit for lowered crime rates in a given city when in fact crime rates are going down all over the country in a kind of seasonality way.) So in this model the cops are more likely to land in a high-crime area, but eventually by the laws of statistics they will visit every neighborhood.
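The weighted random assignment in this thought experiment is a few lines of code. This is a minimal sketch under stated assumptions: the neighborhood names and crime rates are invented, and `assign_neighborhoods` is a hypothetical helper, not anything the NYPD uses.

```python
import random

# Hypothetical crime rates (incidents per 1,000 residents); not real data.
crime_rates = {"Neighborhood A": 12.0, "Neighborhood B": 6.0, "Neighborhood C": 2.0}


def assign_neighborhoods(rates, n_days, seed=None):
    """Pick one neighborhood per day, with probability proportional to its rate."""
    rng = random.Random(seed)
    names = list(rates)
    weights = [rates[name] for name in names]
    return rng.choices(names, weights=weights, k=n_days)


schedule = assign_neighborhoods(crime_rates, n_days=365, seed=0)
counts = {name: schedule.count(name) for name in crime_rates}
print(counts)
```

Over a year the high-rate neighborhood is picked most often, but every neighborhood with a nonzero weight gets visited, which is the whole point of the thought experiment.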

My guess is that the day the Upper East Side is first chosen randomly, and a white hedge fund manager is stopped, questioned, and frisked by a cop who takes away his key and enters his apartment, terrorizing his family while he’s handcuffed in the back of a cop car, will be the very last day this policy is in place.

Categories: data science

A good data scientist is hard to find

December 26, 2011 Cathy O'Neil, mathbabe 17 comments

As a data scientist at an internet start-up, I am something of a quantitative handyman. I go where there is need for quantitative thinking. Since the business model of my company is super quantitative, this means I have lots of work. I have recently categorized the kind of things I do into 4 bins:

1. I visualize data for business people to digest. This is a kind of fancy data science-y way of saying I design reports. It’s actually a hugely critical part of the business, since our clients are less quantitative than we are and need to feel like they understand the situation, so clear, honest, and easily digestible visuals are a priority.
2. I forecast behavior using models. This means I forecast what users on a website will do, based on their attributes and historical precedent for what people who shared their attributes did in the past, and I also do things like stress test the business itself, in order to answer questions like, what would happen to our revenue stream if one of our advertisers jumped out of the auction?
3. I measure. This is where the old-school statistics comes in, in deciding whether things are statistically significant and what our confidence intervals are. It’s related to reporting as well, but it’s a separate task.
4. I help decide whether business ideas are quantitatively reasonable. Will there be enough data to answer this question? How long will we need to collect data to have a statistically significant answer to that? This is kind of like being a McKinsey consultant on data steroids.
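The measuring and “will there be enough data” questions in the list above can be made concrete with a rough sketch. It uses the textbook normal approximation for a proportion, which is an assumption of this example (it is a poor fit for very small samples or fat-tailed data), and the counts are invented.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def proportion_ci(successes, trials, z=Z_95):
    """Normal-approximation confidence interval for a rate such as conversions."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width


def trials_needed(p_guess, margin, z=Z_95):
    """Trials needed so the interval's half-width is at most `margin`."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


# Hypothetical numbers: 30 conversions out of 1,000 visitors.
low, high = proportion_ci(successes=30, trials=1000)
print(low, high)

# How many visitors to pin a roughly 3% rate down to +/- 0.5 percentage points?
print(trials_needed(p_guess=0.03, margin=0.005))
```

Dividing the required number of trials by daily traffic gives a first answer to “how long will we need to collect data.”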

So why is it so hard to find a good data scientist?

Here’s why. Most data scientists don’t really think that 3 and 4 above are their job. It is far less sexy to try to honestly find the confidence interval of a prediction than it is to model behavior. Data scientists are considered magical when they forecast behavior that was hitherto unknown, and they are considered total downers when they tell their CEO, hey there’s just not enough data to start that business you want to start, or hey this data is actually really fat-tailed and our confidence intervals suck.

In other words, it’s something like what the head of risk management had to face at a big bank taking risks in 2007. There’s a responsibility to warn people that too much confidence in the models is bad, but then there’s the political reality of the situation, where you just want to be liked and you don’t actually have the power to stop the relevant decisions anyway. And there’s the added issue in a start-up that they are your models, and you want them to be liked (and to be invincible).

It’s far easier to focus on visualizing and modeling, or, to stay even sexier and more mystical, on modeling alone, and to let the business make decisions that could ultimately not work out, or act on data that’s pure noise.

How do you select for a good data scientist? Look for one who speaks clearly and directly and who emphasizes skepticism. Look for one who is ready to vent about how people trust models too much, and who is pushy enough to speak up at a meeting and be that annoying person who holds people back from drinking too much kool-aid.

Categories: data science

Steam queen

December 24, 2011 Cathy O'Neil, mathbabe 9 comments

I’m writing today for all those people who are about to receive, or who have just received, fancy espresso machines with milk steaming functionality but who have no idea how to actually steam milk. You too can become a steam queen (or king)! A little background first.

When I wrote earlier about my friend the coffee douche, I mentioned that in high school I worked as a barista at Coffee Connection in Lexington center (Massachusetts). At the time I was considered kind of fancy (or at least I considered myself kind of fancy) because I knew the difference between a cappuccino (just foam) and a cafe au lait (foam and steamed milk). Just to be clear, there was no such thing back then as a “latte”, and the sizes were, “small”, “medium”, and “large,” and nobody used the word “barista”.

It was a pretty repetitive job, but I liked it. I liked the hustle and bustle and meeting all the strange people who would be grumpy or friendly, who would talk to me like a human or order me about. I got to know lots of people that way whom I never would have met otherwise. I enjoyed explaining the different roasts and beans, and asking people about their tastes to try to match them to their coffee beans. I was a kind of coffee douchy matchmaker.

To make things more interesting for myself and the customers, I’d compute people’s bills in my head, to the penny. Massachusetts sales tax back then was 5% so it’s not as hard as it sounds (now it’s 6.25%, what a pain). As long as you can add things up in your head, and know the cutoffs for rounding that the cash register uses, then it’s a piece of cake. I did enjoy every now and then telling people that if they bought their two cappuccinos separately, instead of together, then they’d save a penny. If they asked me how I figured it out I’d say, after all it is a step function, so it stands to reason!
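The penny trick falls out of rounding the tax to the nearest cent. Here is a small sketch; the 5% rate is the one mentioned above, but the $1.49 price and the round-half-up rule are assumptions, since the actual prices and the register’s rounding cutoffs aren’t given.

```python
from decimal import Decimal, ROUND_HALF_UP

TAX_RATE = Decimal("0.05")  # the Massachusetts rate mentioned above
CENT = Decimal("0.01")


def bill(subtotal):
    """Subtotal plus tax, with the tax rounded to the nearest cent (assumed rule)."""
    tax = (subtotal * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return subtotal + tax


price = Decimal("1.49")  # hypothetical cappuccino price

together = bill(price * 2)    # one bill for both drinks
separately = bill(price) * 2  # two bills, each rounded on its own

print(together, separately, together - separately)
```

On a single drink the tax of 7.45 cents rounds down to 7, while on the pair the tax of 14.9 cents rounds up to 15, so splitting the order saves a penny. At other prices the step function can go the other way.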

I also enjoyed the manual labor of it, and on a wet day, when the light grey stone floor would get filthy with mud people had tracked in, I enjoyed mopping it clean after everyone had left, listening to Allman Brothers, Tracy Chapman, and Sinead O’Connor mix tapes really loud.

Now to the point of this post: I got really good at steaming milk. In fact I formally designated myself the Steam Queen of Coffee Connection, and as far as I know nobody has ever challenged me on that. Let me tell you the secret to awesome steamed milk. It’s essentially an interplay between fat content and temperature.

First, use really cold milk, and please don’t let it be skim. Yes, we all think you look really good in your size 2 leather pants, but if you want steamed milk with those leather pants then you should just go for lowfat and spend an extra hour at the gym or something (or whatever it is you do). Because the crucial yumminess of excellently steamed milk bubbles is, you guessed it, butter fat.

If you can go with whole milk, you won’t even need these instructions because whole milk will practically steam itself if left near a steaming apparatus. Come to think of it, steaming half and half should probably be left to small children exclusively as an ego booster.

So I hope I’ve made my point: lowfat milk, at least, and super cold. Now put it into one of those silver cans. For some reason you really do need a silver metallic can, it doesn’t work as well with ceramic cups, probably because you are less aware of the internal temperature. Fill it up between halfway and two thirds of the way. So, like 60% full. The process of steaming will expand it to be full.

The key is that it’s easier to make excellent bubbles when the milk is cold. So do that first: put the steaming nozzle, which is on full blast, just below the surface of the milk, as high as possible without it spitting milk out of your can. You want to create a hydrodynamical feedback loop, where the milk is rotating below and around the nozzle tip, luxuriating in its steaming process. As the milk gets steamed, it expands, so be sure to lower the can to keep the nozzle just below the top.

You want small bubbles, but don’t worry about a few large bubbles, we will deal with them later. Focus on creating that feedback loop, until the expansion is done, and the can is full.

A huge mistake I commonly see is that people think they’re done at this point. You’re not done! The milk below the bubbles is still relatively cold, and nobody wants to drink a cold latte. This is the time to put the nozzle to the very bottom of the can and use your fingers on the can to determine when the milk is sufficiently hot. People are way too impatient at this stage. Wait til it’s hot (by the way, contrary to the advice you may receive from various sources, you don’t need a thermometer for this if you can use your sense of touch)!

Finally, one more thing. Take out the nozzle and let the can sit for between 30 seconds and 2 minutes next to the coffee machine, and in the meantime get the coffee cup and espresso ready. When everything is ready, pick up the can of steamed milk an inch, and drop it to the counter once, firmly. This pops the big bubbles that haven’t popped themselves, and leaves you only delicious little scrumptious bubbles for your delicious latte. Yum!

Categories: rant

Crappy modeling

December 23, 2011 Cathy O'Neil, mathbabe 3 comments

I’m here to vent today about crappy modeling I’m seeing in the world of finance. First up is the 14% correction to home sales figures from 2007 to 2010 that we’ve been seeing from the National Association of Realtors. From the New York Times article we see the following graph:

Supposedly the reason their models went so wrong was that they assumed a bunch of people were selling their houses without real estate agents. But isn’t this something they can check? I’m afraid it doesn’t pass the smell test, especially because it went on for so long and because it worked in their favor, in that the market didn’t seem as bad as it actually was. In other words, they had a reason not to update their model.

Here’s the next on our list, namely unemployment projections from the Fed versus actual unemployment figures, brought to us by Rortybomb:

Again we see outrageously bad modeling, which is always and consistently biased towards good news. Is this better than having no model at all? What kind of model is this biased? At the very least can you shorten your projection lengths to make up for how inaccurate they are, kind of like how weather forecasters don’t predict out past a week?
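One simple way to put a number on “consistently biased” is to compare the average signed error with the average absolute error. The figures below are invented for illustration; they are not the Fed’s projections or actual unemployment data.

```python
from statistics import mean

# Hypothetical (forecast, actual) unemployment rates in percent; not Fed data.
pairs = [(8.0, 9.3), (7.5, 9.6), (8.5, 9.0), (7.8, 8.9)]

# Positive error means unemployment came in worse than forecast.
errors = [actual - forecast for forecast, actual in pairs]

bias = mean(errors)                 # average signed error
mae = mean(abs(e) for e in errors)  # average absolute error

print(bias, mae)
```

When the two numbers are nearly equal, as they are here, every miss is in the same direction. An unbiased but noisy model would show a signed error near zero even if its absolute error were large.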

Finally, I’d like to throw in one last modeling complaint, namely about weekly unemployment filings. It seems to me that every December for the past few years, the projected unemployment filings have been “surprising” economists with how low they are, after seasonal adjustment.

First, seasonal adjustment is a weird thing, and was the subject of one of my earliest posts. We effectively don’t really know what the numbers mean after they are seasonally adjusted. But even so, we can imagine a bit: they look at past years and see how much the filings typically dip or surge (sadly it looks like they typically surge) at the end of the year, and assume the same kind of dip or surge should happen this year.
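Here is a deliberately crude sketch of that idea: estimate how much a late-December week typically exceeds the yearly average, then divide this year’s raw number by that factor. All figures are made up, and real seasonal adjustment procedures are far more elaborate than a single ratio.

```python
from statistics import mean

# Hypothetical weekly filings (in thousands) for the same late-December week
# in three past years, with each year's average weekly filings.
past_decembers = [520, 540, 500]
past_year_averages = [400, 420, 380]

# Seasonal factor: how far above the yearly average this week usually runs.
factor = mean(d / a for d, a in zip(past_decembers, past_year_averages))

raw_this_year = 490  # hypothetical unadjusted figure
adjusted = raw_this_year / factor

print(round(factor, 3), round(adjusted, 1))
```

If this year’s true surge is smaller than the historical one, dividing by the old factor makes the adjusted number look “surprisingly” low, even though nothing surprising happened.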

Here’s my thing. The fact that the filings surge less than expected the fourth or fifth year of an economic slump shouldn’t surprise anyone. These are real people, losing their real jobs with real consequences, right before Christmas. If you were a boss, wouldn’t you have made sure to have already fired someone in the early Fall, or be willing to wait til after the holidays, especially when you know the chances of getting hired again quickly are very slim? Bosses are people too. I do not have statistically significant evidence for this by the way, just a theory.

Categories: data science, finance

Need your vote

December 22, 2011 Cathy O'Neil, mathbabe 1 comment

Footnoted needs your vote on the most outrageous handout to executives in 2011. Here are the candidates:

MF Global agreeing to pay then-CEO Jon Corzine a $1.5 million retention bonus months before the company imploded.
Clear Channel Media Holdings paying $3 million a year to a company controlled by Bob Pittman so that Pittman can fly in a Mystere Falcon 900 that Pittman owns for both business and personal use.
Leo Apotheker collecting around $25 million in severance and other benefits from Hewlett-Packard, including relocation back to France or Belgium after less than a year on the job.
IBM’s outgoing CEO Samuel Palmisano becoming eligible for as much as $170 million in retirement benefits, just by waiting until he was past 60 to announce his retirement.
Nabors Industries agreeing to pay outgoing CEO Eugene Isenberg $100 million in severance on his way out the door.

Vote here.

Also, please read a letter to Jamie Dimon that I enjoyed, from the Reformed Broker blog. From the letter:

So please, do us all a favor and come to the realization that the loathing you feel from your fellow Americans has nothing to do with your “success” or your “wealth” and it has everything to do with the fact that your wealth and success have come at a cost to the rest of us. No one wants your money or opportunities, what they want is the same chance that their parents had to attain these things for themselves. You are viewed, and rightfully so, as part of the machine that has removed this chance for many – and that is what they hate.

Categories: #OWS, finance, news

Why work?

December 21, 2011 Cathy O'Neil, mathbabe 10 comments

In his recent Vanity Fair article, Joseph Stiglitz puts forth the following theory about why the Great Depression was inevitable (and in particular wouldn’t have been prevented by the Fed loosening monetary policy). Namely, that our society was transitioning from an agrarian society to something else- which turned out to be a manufacturing society, kicked off in earnest at the beginning of World War II. He goes on to say that we are now going through another great transition, from manufacturing to something else- he calls it service. And he also says there’s no way monetary policy will fix this trauma either- we need to invest heavily in infrastructure in order to prepare ourselves for the coming service society we will be.

Take a few steps back, and we see this picture: a hundred years ago we got so efficient at farming that we didn’t need everyone to farm to be well fed. Then we figured out how to make things so efficiently that we don’t need to worry about having enough stuff. So now, what are we all working for exactly? If service means we take care of each other (medical stuff) and we educate each other, that is fine, but not everyone is a doctor or a teacher. If service means we spend all our time making video games and entertaining each other, then it seems like we need to rethink this plan.

There are two essays I’ve read about the nature of this change that I think will help us rethink work and how our society values work and how it doles it out. First, there’s this highbrow essay on the language of work. From the essay:

Work deploys a network of techniques and effects that make it seem inevitable and, where possible, pleasurable. Central among these effects is the diffusion of responsibility for the baseline need to work: everyone accepts, because everyone knows, that everyone must have a job. Bosses as much as subordinates are slaves to the larger servomechanisms of work. In effect, work is the largest self-regulation system the universe has so far manufactured, subjecting each of us to a panopticon under which we dare not do anything but work, or at least seem to be working, lest we fall prey to a disapproval all the more powerful for its obscurity. The work idea functions in the same manner as a visible surveillance camera, which need not even be hooked up to anything. No, let’s go further: there need not even be a camera. Like the prisoners in the perfected version of Bentham’s utilitarian jail, workers need no overseer because they watch themselves. When we submit to work, we are guard and guarded at once.

What is less clear is why we put up with this demand-structure of a workplace, why we don’t resist more robustly. As Max Weber noted in his analysis of leadership under capitalism, any ideology must, if it is to succeed, give people reasons to act. It must offer a narrative of identity to those caught within its ambit, otherwise they will not continue to perform, and renew, its reality. As with most truly successful ideologies, the work idea latches on to a very basic feature of human existence: our character as social animals forever competing for relative advantage.

The author Mark Kingwell makes a pretty convincing case that people have bought into work just as they buy into other cultural norms. It underlines the real audacity of the #Occupy Wall Streeters who dared to do something with their time other than be baristas at Starbucks.

Paired with the Stiglitz view of our culture and its future, though, it makes me think about the extent to which we’ve synthesized work. Mark Kingwell points out that one of the major outputs of workplaces is more work, a kind of purely synthetic made-up idea which we all need to believe in as long as we are all convinced about this work-as-cultural imperative.

The quintessential example of work-creating-work comes from finance, of course, where there isn’t even really a product at the end of the day. It’s essentially all completely made up, pushing around numbers on a spreadsheet.

What happens when people question this industry and its associated maniacal belief in work as moral? I say “maniacal” based on the number of hours people put in at most financial firms, sacrificing their families and even their internal lives, not to mention their associated martyred attitudes at having worked so hard.

This article from Bloomberg addresses the issue indirectly. In it, Richard Sennett talks about what bonds people to their colleagues and their workplace. He compares manufacturing jobs in 1970’s Boston to the recent financial services industry, and notes that people nowadays in finance have no loyalty to each other or to their workplace, and also have very little respect for the bosses. He blames this on unthoughtful hierarchical structures and the fact that bosses are essentially incompetent and everybody knows it. He concludes his article as follows:

These employees were relentless judges of their bosses, always on the lookout for details of conversation or behavior to suggest that the executives didn’t deserve their powers and perks. Such vigilance naturally weakened the bosses’ earned authority. And it didn’t make the people judging feel good about themselves either, as they were stuck in the relationship. On the contrary, it was more likely to be embittering than a cause for secret satisfaction.

Even for those workers who have recovered quickly, the crash isn’t something they are likely to forget. The front office may want to get back as quickly as possible to the old regime, to business as usual, but lower down the institutional ladder, people seem to feel that during the long boom something was missing in their lives: the connections and bonds forged at work.

Although those are fine reasons to dismiss loyalty, as I know from experience, I’d like to suggest another reason we are seeing so much disloyalty, namely that people see through the meaninglessness of their job, and are wondering why the system has even been set up this way in the first place.

In other words, I don’t think a better hierarchy and super smart bosses in finance are going to make back office people gung-ho. I think that the credit crisis has clearly exposed what people already suspected, namely that they are working hard but not accomplishing much. If we want people to feel fulfilled, wouldn’t it make more sense for them to work less, and spend more time off with their families and their thoughts? Could we as a society imagine something like that?

Categories: finance, news

How to challenge the SEC

December 20, 2011 Cathy O'Neil, mathbabe 3 comments

This is a guest post by Aaron Abrams and Zeph Landau.

No, not the Securities and Exchange Commission. We are talking about the Southeastern Conference, a collection of 12 (or so) college football teams.

College football is a mess. Depending on your point of view, you can take it to task for many reasons: universities exploiting student athletes, student athletes not getting an education, student athletes getting special treatment, money corrupting everybody’s morals . . .

Putting those issues aside, however, there is virtually unanimous discontent with an aspect of the sport that is very quantitative, namely, how the season ends. Fans hate it, coaches hate it, players hate it, and there is a substantial controversy almost every single year. Only a few people who make lots of money off the current system seem to like it. (Never mind that anyone with a brain thinks there should be a playoff … perhaps that’s the subject of another post.)

Here’s how it works. There are roughly 120 major college football teams and each team plays 12 or 13 games in the fall. Almost all the teams belong to one of eleven conferences — these are like regional leagues — and most of the games they play are against other teams in their conference. (How they schedule their out-of-conference games is an interesting issue that we may write about another day.) At the end of the season, teams are invited to play in bowl games: games hosted at big stadiums with names like the Rose Bowl, the Orange Bowl, etc., that have long traditions. The problem centers around how teams are chosen to compete in these bowl games.

Basically, a cartel made up of six of the eleven conferences (those that historically have been the strongest, including the SEC) created a system, called the BCS, that favors their own teams to get into the 5 most prestigious (and lucrative) bowl games, including the so-called “national championship game” that claims to feature the top two teams in the country. The prestige gained by the 10 teams that compete in these games is matched by large amounts of money, coming mainly from television contracts and ticket sales. For each of these teams, we are talking about 10-20 million dollars that goes to a combination of the team and their conference. This is not a paltry sum for schools facing major budget cuts.

The most blatant problem with this system is that it is literally unfair: the rules of the system are written in such a way that at the beginning of the season, before any games are played and regardless of how good the teams are, the teams from the six “BCS conferences” have a better chance of getting into one of the major bowls than a team from a non-BCS conference. (You can read the rules, but notably, each of the conference champions of the six BCS conferences automatically gets to play in a BCS bowl game; whereas the other conference champions only get to play in a BCS bowl game if various other conditions are met, like they’re highly ranked in the polls).

There are lots of other problems, too, but we’re not really going to talk about those. For instance the method for choosing the top two teams (which is based on both human and computer polls) is deeply and fundamentally flawed. These are the teams that play for the “BCS championship”, so it matters who the top two are. But again, that’s the subject of a different post.

Back to the inherent unfairness. Colleges in the non-BCS conferences are well aware of this situation. Led by Tulane, they filed a lawsuit several years ago and essentially won; the rules used to be even worse before that. But in the face of the continued lack of fairness, colleges from non-BCS conferences have lately taken to responding by trying to get into the BCS conferences, jockeying for opportunity at big money. It has gotten so bad that a BCS conference called the Big East now includes teams from Idaho and California. Realignment has caused the Big 12 to have only 10 teams, while the Big 10 has 12.

But realignment takes a lot of work to pull off, and it only benefits the teams that get into the major conferences. The minor conferences themselves are still left behind. So here is a better idea. If you’re a non-BCS conference, do what any good red-blooded American corporation would do: find a loophole.

Here is one we thought of.

The current rules force the BCS to choose a team from one of the 5 non-BCS conferences if:

(a) a team has won its conference AND is ranked in the top 12, or

(b) a team has won its conference AND is ranked in the top 16 AND is ranked higher than the conference champion from one of the BCS conferences.

What they don’t say is what it means for a team to “win its conference.” Some conferences determine their champion by overall record, and others have a championship game to decide the champion. This is the chink in the armor.

This year two interesting things happened: (1) in the Western Athletic Conference (WAC), Boise State ended up ranked #7, but did not win their conference. The WAC doesn’t have its own championship game, and the conference winner was TCU by virtue of beating Boise State in a game in the middle of the season. However, TCU also lost a game to a team outside their conference, and they ended up ranked only #18. As a result, neither team satisfied condition (a) or (b) above.

And (2) in Conference USA, Houston was undefeated and ranked #6 in the country before the final game of the season, when it lost its conference championship game to USM. The loss dropped Houston to #19 in the rankings, whereas USM, the conference champion, finished with a final ranking of #21. Thus neither of these teams met (a) or (b), either.

Here is what we noticed: if the Western Athletic Conference had a conference championship game, then either Boise State would have won it, been declared the champion, and qualified for a BCS bowl, or else TCU would have won it and would almost surely have ended up with a high enough rank to qualify for a BCS bowl. (As it was, TCU finished the season at #18, but one more victory against a top ten team would very likely have gained them at least two spots. This would have been good enough for (b) to apply, since the champion of the Big East (a BCS conference) was West Virginia, who finished ranked #23.)

On the other hand, if Conference USA hadn’t had a championship game, Houston would have been declared the conference champion (by virtue of being undefeated before the championship game) and they would easily have been ranked highly enough to get into a BCS bowl. Indeed, it has been estimated that their loss to USM cost Conference USA $17 million.

So, what should these non-BCS conferences do? Hold a conference championship game . . . if, and only if, it benefits them. They can decide this during the last week of the season. This year, with nothing to lose and plenty to gain, the WAC would clearly have chosen to have a championship game. With nothing to gain and plenty to lose, Conference USA would have chosen not to.

Bingo. Loophole. 17 million big ones. Cha ching.
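
For the quantitatively inclined, here is a minimal Python sketch of conditions (a) and (b) applied to the two conferences above. The function and variable names are ours, made up for illustration, and the #16 ranking for TCU after a hypothetical title-game win is an assumption based on the "at least two spots" guess above, not an actual result.

```python
def qualifies_for_bcs(rank, won_conference, a_bcs_champion_rank):
    """Check conditions (a) and (b) for a team from a non-BCS conference.

    rank: the team's final ranking (1 is best).
    won_conference: True if the conference declared this team its champion.
    a_bcs_champion_rank: final ranking of some BCS conference champion.
    """
    if not won_conference:
        return False
    rule_a = rank <= 12
    rule_b = rank <= 16 and rank < a_bcs_champion_rank
    return rule_a or rule_b


WEST_VIRGINIA = 23  # Big East champion's final ranking, as quoted in the post

# WAC as it happened (no title game): TCU is champion at #18, Boise State is #7.
wac_actual = (qualifies_for_bcs(18, True, WEST_VIRGINIA)
              or qualifies_for_bcs(7, False, WEST_VIRGINIA))

# WAC with a title game: either Boise State wins it at #7, or TCU wins it
# and (by assumption) climbs two spots to #16.
wac_with_game = (qualifies_for_bcs(7, True, WEST_VIRGINIA)
                 and qualifies_for_bcs(16, True, WEST_VIRGINIA))

# Conference USA as it happened (title game): USM is champion at #21, Houston is #19.
cusa_actual = (qualifies_for_bcs(21, True, WEST_VIRGINIA)
               or qualifies_for_bcs(19, False, WEST_VIRGINIA))

# Conference USA with no title game: Houston is champion at #6.
cusa_without_game = qualifies_for_bcs(6, True, WEST_VIRGINIA)
```

Under these assumptions, `wac_actual` and `cusa_actual` come out false while `wac_with_game` and `cusa_without_game` come out true, which is the whole loophole in four lines.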

Categories: data science, guest post, news

Bloomberg engineering competition goes to Cornell

December 19, 2011 Cathy O'Neil, mathbabe Comments off

This just in. Pretty surprising considering we were supposed to hear the results January 15th. I wrote about this here and here.

I wonder what Columbia is going to do with their plans?? I guess there may be two winners, so still exciting.

Categories: data science, news

A rising tide lifts which boats?

December 19, 2011 Cathy O'Neil, mathbabe 3 comments

My friend Jordan Ellenberg has a really excellent blog post over at Quomodocumque, which is one of my favorite blogs in that it combines hard-core math nerdiness with funny observations about how much the Baltimore Orioles stink (among other things).

In his post he talks about an anti-#OWS article called “The Occupy movement has it all wrong”, by Larry Kaufman, recently published in a Madison, WI newspaper called Isthmus.

Specifically, in that article, Kaufman tries to use the old saying “a rising tide lifts all boats” to argue that most people (in fact, 81% of them) are better off than their parents were. What’s awesome about Jordan is that he goes to the source, a Scott Winship article, and susses out the extent to which that figure is true. Turns out it’s kind of true with a certain way of weighting numbers depending on how many kids there are in the family and because so many women have started working in the past 40 years. Jordan’s conclusion:

So yes: almost all present adults have more money than their parents did. And how did they accomplish this? By having one or two kids instead of three or four, and by sending both parents to work outside the home. Now it can’t be denied that a society in which most familes have two income-earning parents, and the business-hours care of young children is outsourced to daycare and preschool, is more productive from the economic point of view. And I, who grew up with a single sibling and two working parents and went to plenty of preschool, find it downright wholesome. But it is not the kind of development political conservatives typically celebrate.

Another thing that Jordan tears apart from the article is that the original source specifically pointed out stuff that Kaufman seems to have missed, given his political agenda:

Winship also emphasizes the finding that children in Canada and Western Europe have an easier time moving out of poverty than Americans do. This part is absent from Kaufmann’s piece. Maybe he didn’t have the space. Maybe it’s because a comparison with higher-tax economies would make some trouble for his confident conclusion: “the punitive redistribution policies favored by Occupy Madison will divert capital away from productive initiatives that enhance growth and earnings opportunities for all, while doing nothing to build the stable families and “bottom-up” capabilities that are particularly important for helping the poorest Americans escape poverty.”

When the Isthmus is running a more doctrinaire GOP line on poverty than the National Review, the alternative press has arrived at a very strange place indeed.

Go, Jordan!

Let’s go back to that phrase “a rising tide lifts all boats”. It was the basis of Kaufman’s argument, and as Jordan points out was a pretty weak basis, in that the lift was arguably gotten only through sacrifice. But my question is, is that a valid argument to make anyway?

Let’s examine this metaphor a bit. When we think about it positively, and imagine something like the housing bubble which elevated many people’s net worth (ignoring the people who weren’t home owners at all during that time), we can see why “a rising tide lifts all boats” is a good thing: we want the generic imaginary person to do well, and we’re all happy for them to do well.

However, if we turn that phrase around in a negative moment, it’s really not clearly a good perspective. Let’s try it: “an ebbing tide lowers all boats”. Take the example of a housing crash analogously to the above. Firstly it’s not true, since for those people who couldn’t afford housing in the bubble, a more reasonable housing market is a good thing (for some reason people keep forgetting this). Secondly, when we are thinking about lowered boats we worry about those people whose boats are lowered. Who are those people? How much have they lost? Will they be okay?

The answers are, they are the people who were barely able to own the house in the good times. They’ve lost everything. They aren’t okay.

It’s a nearly vapid phrase when you think about it, but it’s used by conservatives a lot to justify policies that only work well in good times.

I’d argue that the real question we should be asking isn’t whether we are all sailing away in boats but how much risk we take on as individuals. I will go into this further in another post, but the gist is that, instead of the unit of measurement being assumed to be dollars, I’d like to reframe the concept of economic health in terms of a unit of risk. Risk is harder to measure than dollars, and there are lots of different kinds of risk, but even so it’s a worthy exercise.

For example, in the housing boom we had people who could barely afford a house get into ridiculous mortgage contracts, with resetting usurious interest rates. They were taking on enormous amounts of risk, in this case risk of being foreclosed on and losing their home. By contrast, people who were well-off at the start of the housing boom are for the most part still well off. There was very little risk for them.

I’d like to offer up an alternative phrase which would capture the risk perspective. Something like, “we should make sure everyone’s boats are water tight and firmly moored to the pier”. Not nearly as catchy, I know. But to make up for it I’m linking to this related video called I’m On a Boat. I’ve actually been looking for excuses to link to this for a while. Here’s a kind of awesome picture from the video:

Categories: #OWS, finance, news

Bloomberg engineering competition gets exciting

December 18, 2011 Cathy O'Neil, mathbabe 2 comments

Stanford has bowed out of the Bloomberg administration’s competition for an engineering center in New York City. From the New York Times article:

Stanford University abruptly dropped out of the intense international competition to build an innovative science graduate school in New York City, releasing its decision on Friday afternoon. A short time later, its main rival in the contest, Cornell, announced a $350 million gift — the largest in its history — to underwrite its bid.

From what I’d heard, Stanford was the expected winner, with Cornell in second place. This changes things, and potentially means that Columbia’s plan for a Data Science and Engineering Institute is still a possibility.

Cool and exciting, because I want that place to be really really good.

It also seems like the open data situation in New York is good and getting better. From the NYC Open Data website:

This catalog supplies hundreds of sets of public data produced by City agencies and other City organizations. The data sets are now available as APIs and in a variety of machine-readable formats, making it easier than ever to consume City data and better serve New York City’s residents, visitors, developer community and all!

Maybe New York will be a role model for good, balancing its reputation as the center of financial shenanigans.

Categories: data science, open source tools

What up, New York Times? (#OWS)

December 17, 2011 Cathy O'Neil, mathbabe 3 comments

There have been people complaining about the #OWS coverage by the New York Times, saying that it’s dismissive and slanted, generally not reporting enough and, when it does report, looking at things from the perspective of the Bloomberg administration.

I have tried to reserve judgment, although I did notice that on the day of the 2-month anniversary of the occupation (a few days after Bloomberg cleared the park), when there were lots of actions and the big march, the NYT didn’t seem to cover anything in the morning at all, whereas the WSJ had live coverage of the hundreds of people trying to close down the exchange and disrupt the morning bell.

And I’m not sure if it’s the reporters or the editors who are responsible for the slanted coverage. It’s sometimes hard to tell.

Except sometimes. Here’s an article about Occupy Frankfurt from two days ago, in which a peaceful protest with a supportive police force is described:

“If all demonstrations went so well we wouldn’t have much to do,” said Michael Jenisch, a spokesman for the Frankfurt Ordnungsamt, or Office of Public Order, which issues permits for public gatherings and has been monitoring the Occupy Frankfurt encampment.

“If they have the staying power, they can camp there all winter,” Mr. Jenisch said. That attitude contrasts with that of the authorities in cities like New York, Oakland or Boston, where the police have evicted protesters from public space, and also with other financial centers in Europe.

That’s all fine, but here’s where I find the coverage outrageous. The article was not on the virtual front page; instead there was a link to the article from the front page, and the teaser line was:

Unlike at other Occupy sites, the Frankfurt protesters are being careful to make their points without inciting police interference.

What? Seriously??

I can’t tell you how often I was down at Zuccotti, wondering why there were so many cops there, wasting our taxpayer money, when the protesters were so incredibly peaceful. Who incited police interference? Was it the sleeping protesters in tents?

The message is not for protesters, on how to incite police aggression. The message here is for American cops, on how to deal with peaceful protesters. New York Times editors, did you even read your own article?

Categories: #OWS, news

Where is Volcker’s letter? (#OWS)

December 16, 2011 Cathy O'Neil, mathbabe 1 comment

At the Alternative Banking working group we are working on publicly commenting on the proposed Volcker Rule. Check out this blog post which addresses the exemption for repos. Keeping in mind that repos brought down MF Global a few weeks ago, this is a hot topic.

Here’s another hot topic, at least to me. Who has a copy of Volcker’s original 3-page letter? The published rumor has it that he wrote a 3-page letter to Obama outlining the goal of the regulation, but I can’t find it anywhere. I do have this quote from Volcker about the proposed 550-page behemoth (taken from a New York Times article):

“I’d write a much simpler bill. I’d love to see a four-page bill that bans proprietary trading and makes the board and chief executive responsible for compliance. And I’d have strong regulators. If the banks didn’t comply with the spirit of the bill, they’d go after them.” – Volcker

Also from the New York Times, a column of Simon Johnson’s on the Volcker rule and what it’s missing.

If anyone knows how to get their hands on the original letter, please tell me, I’d really love to see it. Maybe someone knows Volcker and can just ask him for a copy?

Categories: #OWS, finance, news

Privacy vs. openness

December 15, 2011 Cathy O'Neil, mathbabe 6 comments

I believe in privacy rights when it comes to modern technology and data science models, especially when the models are dealing with things like email or health records. It makes sense, for example, that the data itself is not made public when researchers study diseases and treatments.

Andrew Gelman’s blog post recently brought this up, and clued me into the rules of sharing data coming from the Institutional Review Board (IRB).

The IRB rules deal with questions like whether the study participants have agreed to let their data be shared if the data is first anonymized. But the crucial question is whether it’s really possible to anonymize data at all.

It turns out it’s not that easy, especially if the database is large. There have been famous cases (Netflix prize) where people have been identified even though the data was “anonymized.”

On the other hand, we don’t want people creating and running with secret models with the excuse that they are protecting people’s privacy. First, because the models may not work: we want scientific claims to be substantiated by retesting, for example (this was the point of Gelman’s post). But also we generally want a view into how modelers are using people’s personal information.

Most modeling going on nowadays involving personal information is probably not fueled by academic interest in curing diseases, but rather by the desire to sell stuff and to monitor people.

As two examples, this Bloomberg article describes how annoyed people get when they are being tracked while they’re shopping in malls, even though the actual intrusiveness of the tracking is arguably much worse when people shop online, and this Wall Street Journal article describes the usage of French surveillance systems in the Gadhafi regime.

I think we should separate two issues here, namely the model versus the data. In the cases of public surveillance, like at the mall or online, or something involving public employees, I think people should be able to see how their data is being used even if the entire database is being kept out of their view. This way nobody can say their privacy is being invaded.

For example, if the public school system uses data from students and teachers to score the value added of teaching, then the teachers should have access to the model being used to score them. In particular this would mean they’d be able to see how their score would have changed if certain of their attributes changed, like which school they teach at or how many kids are in their class.
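
To make that concrete, here is a toy Python sketch of the kind of "what if" query I have in mind. Everything in it is hypothetical: the scoring formula, the weights, and the school names are invented for illustration and are not taken from any real value-added model. The point is only that, with access to the model, a teacher could ask how the score moves when one attribute changes.

```python
def what_if(score_fn, record, **changes):
    """Return how much the score changes if some attributes were different.

    score_fn: the model, taken here to be any function from a record to a number.
    record: a dict of the attributes the model uses.
    changes: the attributes to alter, e.g. class_size=20.
    """
    altered = {**record, **changes}
    return score_fn(altered) - score_fn(record)


# Hypothetical per-school adjustments, invented for this example.
SCHOOL_ADJUSTMENT = {"School A": 0.0, "School B": 3.0}


def toy_score(teacher):
    """A made-up scoring formula standing in for the real (unseen) model."""
    return (50.0
            - 0.5 * teacher["class_size"]
            + SCHOOL_ADJUSTMENT[teacher["school"]])


teacher = {"class_size": 30, "school": "School A"}

# How would the score move with 10 fewer kids in the class?
smaller_class_delta = what_if(toy_score, teacher, class_size=20)

# How would it move if the same teacher taught at the other school?
other_school_delta = what_if(toy_score, teacher, school="School B")
```

Note that `what_if` only needs to call the model, not read the database behind it, which is exactly the separation between model and data I'm arguing for.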

It is unlikely that private companies would be happy to expose the models they use to sell merchandise or clicks. If private companies don’t want to reveal their secret sauce, then one possibility is to make their modeling opt-in (rather than opt-out). By the way, right now you can opt out of most things online by consistently clearing your cookies.

I am being pretty extreme here in my suggestions, but even if we don’t go this far, I think it’s clear that we will have to consider these questions and many more questions like this soon. The idea that the online data modeling can be self-regulating is pretty laughable to me, especially when you consider how well that worked in finance. The kind of “stalker apps” that are popping up everywhere are very scary and very creepy to people who like the idea of privacy.

In the meantime we need some nerds to figure out a better way to anonymize data. Please tell me if you know of progress in that field.

Categories: data science

The sin of debt (part 2)

December 13, 2011 Cathy O'Neil, mathbabe 5 comments

I wrote a post about a month ago about the sin of debt and how, for normal people, debt carries a moral weight that isn’t present for corporations (even though corporations are people). Since then I’ve noticed some egregious examples of this phenomenon which I want to share with you.

Warning: this post contains really sickening stuff. I usually try to think of how to solve problems (I really do!) but today I’m just in awe of the problems themselves. Maybe I’ll eventually come up with a “part 3” for this post and have some good, proactive ideas (please help!).

First, there’s this article from the Wall Street Journal which discusses the fact that people are being arrested and put in jail for their credit card, auto loan, and other debt:

More than one-third of U.S. states allow borrowers who can’t or won’t pay to be jailed. Nationwide statistics aren’t known because many courts don’t keep track of warrants by alleged offense, but a tally by The Wall Street Journal earlier this year of court filings in nine counties across the U.S. showed that judges signed off on more than 5,000 such warrants since the start of 2010.

…

Some judges have criticized the use of such warrants, comparing them to a modern-day version of debtors’ prison. Ms. Madigan said she has grown increasingly concerned that borrowers sometimes are being thrown into jail without even knowing they were sued, a problem she blames on sloppy, incomplete or false paperwork submitted to courts.

Outrageous. To contrast that, let’s take a peek at the tone of a Reuters article on AMR’s bankruptcy filing; AMR is the parent company of American Airlines. From the article:

American plans to operate normally while in bankruptcy, but the Chapter 11 filing could punch a hole in the pensions of roughly 130,000 workers and retirees.

AMR pension plans are $10 billion short of what the carrier owes, and any default could be the largest in U.S. history, government pension insurers estimated.

Ray Neidl, aerospace analyst at Maxim Group, said a lack of progress in contract talks with pilots tipped the carrier into Chapter 11, though it has enough cash to operate. The carrier’s passenger planes average 3,000 daily U.S. departures.

“They were proactive,” Neidl said. “They should have adequate cash reserves to get through this.”

So from where I stand, it looks like this company is being applauded for its bankruptcy filing because it’s such a great opportunity to get rid of pesky pensions from its 130,000 workers and retirees. Note there’s nothing about AMR or American Airlines executives being arrested and brought to jail here.

Finally, there’s the “debt as sin” theme amplified by mourning. This Wall Street Journal article describes the practice that debt collection agencies use to harass the living relatives of people who have passed away in debt. From the article:

Debt collectors often tell surviving family members that they aren’t personally responsible for paying the debts of the deceased. But those words barely register with grieving relatives, according to interviews with a dozen lawyers who represent about 60 families pursued for money owed by dead relatives.

“Each call brought up fresh memories of my husband’s death,” Patricia Smith, 56, says about the calls she started getting last year about $1,787.04 in credit-card debt owed by her late husband, Arthur.

The debt-collection calls and letters kept coming and wore her down, says Mrs. Smith, who lives in Jackson, Miss. She agreed to scrounge together $50 a month “just to make the calls stop.”

The Wall Street Journal provides a graphic to explain why this is a growing field for debt collectors:

You might ask what the relevant regulator, the Federal Trade Commission (FTC), is doing about this practice. From the article:

Still, the agency determined the previous guidelines were ineffective and “too constricting,” Mr. Dolan says. So, in July, the agency issued a policy statement. Before the new guidelines, collectors were supposed to discuss a dead person’s debt only with the person’s spouse or someone chosen by the dead person’s estate. But Mr. Dolan said few debtors are formally designating someone to handle their affairs after death, leaving debt-collection firms unable to determine whom to contact for payment of any outstanding bills.

The FTC sought to improve the process and now allows debt-collection firms to contact anyone believed to be handling the estate, including parents, friends and neighbors. Agency officials wanted to resolve a “tension that was emerging” between state and U.S. laws on how collectors can go after money, Mr. Dolan adds. “While people might think it is horrible for collectors to speak with surviving spouses, we have no power to change that.”

FTC officials rejected requests by lawyers representing family members for an outright ban on calling surviving family members. The agency also declined to impose a cooling-off period during which relatives couldn’t be contacted by debt collectors.

Thanks, FTC! Thanks for representing the little guy, with the dead wife.

Categories: finance

Conservation Law of Money

December 12, 2011 Cathy O'Neil, mathbabe 3 comments

Being a mathematician, I’ve always been on the lookout for quantitative statements about the financial system as a whole that explain large economic phenomena. Just for the satisfaction of it all.

For example, I think it’s possible that many bubbles (dot com, Japanese stocks) can be explained simply by saying that, when normal people, rather than professional investors, are putting their money in the market, then the market is gonna go up. And moreover, when that happens, we can throw away any silly ideas we may have been harboring about price as an indicator of true value; the prices are going up because the public is inflating the market with more cash money.

In other words, the opportunities for good investing don’t keep up with the cash flow in those moments. We could go further and say that there’s a kind of seasonality of money flow, which is dominating the market signal at times like these, just like you see in housing markets (when the housing market is functional) in the springtime when humans come out of their hibernation and feel like nesting and mating.

There are other ways for seasonality of money flow to affect the market without introducing newcomers to the market, such as inflation or deflation, where the value of the existing money itself changes, or when there’s an outside force either extracting or adding to the money supply, like Saudi Arabia or China, although that’s more of a continual drag than a sudden jolt.

I think we may be encountering a very real “Conservation Law of Money” situation with the European debt crisis. Namely, there are all these banks that are on the prowl for cash, since they’re worried about being undercapitalized, with good reason: they are still hanging on to many toxic assets from the credit crisis, and in the meantime their enormous government bond holdings are going to pot. On top of that, Basel III, a new regulatory regime, will be in effect in 2015 and requires much more liquid, high-quality assets than they currently own.

But here’s the thing, and it’s not a new observation but it’s an important one: not all of those banks can recover, unless something dramatic happens. From BusinessWeek:

“There aren’t enough assets in the world that are genuinely liquid and of high enough quality to allow all the banks to meet this ratio,” said Barbara Ridpath, chief executive officer of the International Centre for Financial Regulation, a London research group funded by banks and the U.K. government. “And that’s only likely to get worse because of the changing credit quality of some of the sovereigns.”

Mathematically speaking we have an impossibility: way more required stuff than existing stuff inside the current European system. What’s gonna give? Here’s a list of the things I can think of:

1. Basel III will be scrapped and Europe will live with an enormous zombie system,
2. the ECB will start printing money and eventually inflate the system out of insolvency,
3. the banking system will fight to the death over the scarce resources and thereby be massively shrunk (but probably the few surviving banks will be politically well-placed too-big-to-fail behemoths),
4. European citizens will foot the bill through bailouts and then taxes, possibly leading to widespread civil unrest or even war, or
5. outsiders will step in when the price is right (China and/or the Middle East) and end up owning the European banks.

There may be other options as well (please tell me). Of the above though I guess I prefer 3, where Europe ends up with a few huge banks that are highly regulated and well-capitalized. This is the Australian model and I posted about it here. Maybe then the financial system can be allowed to be utility-bank oriented and boring and smart young people will apply their energy outside of financial engineering.

Categories: finance

Resampling

December 11, 2011 Cathy O'Neil, mathbabe 3 comments

I’m enjoying reading and learning about agile software development, which is a method of creating software in teams where people focus on short and medium term “iterations”, with the end goal in sight but without attempting to map out the entire path to that end goal. It’s an excellent idea considering how much time can be wasted by businesses in long-term planning that never gets done. And the movement has its own manifesto, which is cool.

The post I read this morning is by Mike Cohn, who seems heavily involved in the agile movement. It’s a good post, with a good idea, and I have just one nerdy pet peeve concerning it.

I’m a huge fan of stealing good ideas from financial modeling and importing them into other realms. For example, I stole the idea of stress testing of portfolios and use it in stress testing the business itself where I work, replacing scenarios like “the Dow drops 9% in a day” with things like, “one of our clients drops out of the auction.”

I’ve also stolen the idea of “resampling” in order to forecast possible future events based on past data. This is particularly useful when the data you’re handling is not normally distributed, and when you have quite a few data points.

To be more precise, say you want to anticipate what will happen over the next week (5 days) with something. You have 100 days of daily results in the past, and you think the daily results are more or less independent of each other. Then you can take 5 random days in the past and see how that “artificial week” would look if it happened again. Of course, that’s only one artificial week, and you should do that a bunch of times to get an idea of the kind of weeks you may have coming up.

If you do this 10,000 times and then draw a histogram, you have a pretty good sense of what might happen, assuming of course that the 100 days of historical data is a good representation of what can happen on a daily basis.
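The procedure described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the 100 days of history are made up (lognormal draws standing in for real daily results), days are drawn with replacement, and the names `resample_weeks`, `daily_results` and `weekly_totals` are purely illustrative.

```python
import random

def resample_weeks(daily_results, days_per_week=5, n_weeks=10_000, seed=0):
    """Build artificial weeks by drawing past days at random, with replacement.

    Each artificial week is summarized by the sum of its daily results.
    """
    rng = random.Random(seed)
    return [
        sum(rng.choices(daily_results, k=days_per_week))
        for _ in range(n_weeks)
    ]

# Made-up stand-in for 100 days of history (assumption: skewed daily results).
history_rng = random.Random(42)
daily_results = [history_rng.lognormvariate(0.0, 1.0) for _ in range(100)]

# 10,000 artificial weeks; a histogram of this list is the resampled distribution.
weekly_totals = resample_weeks(daily_results)
```

Summing the five days is one choice of weekly summary; if your weekly quantity is built differently (an average, a compounded return), swap out the `sum`.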

Here comes my pet peeve. In Mike Cohn’s blog post, he goes to the trouble of resampling to get a histogram, that is, a distribution of fake scenarios. But instead of using that distribution directly to compute a confidence interval, he only computes its average and standard deviation and then replaces the artificial distribution with a normal distribution with those parameters. From his blog:

Armed with 200 simulations of the ten sprints of the project (or ideally even more), we can now answer the question we started with, which is, How much can this team finish in ten sprints? Cells E17 and E18 of the spreadsheet show the average total work finished from the 200 simulations and the standard deviation around that work.

In this case the resampled average is 240 points (in ten sprints) with a standard deviation of 12. This means our single best guess (50/50) of how much the team can complete is 240 points. Knowing that 95% of the time the value will be within two standard deviations we know that there is a 95% chance of finishing between 240 +/- (2*12), which is 216 to 264 points.

What? This is kind of the whole point of resampling, that you could actually get a handle on non-normal distributions!

For example, let’s say in the above example, your daily numbers are skewed and fat-tailed, like a lognormal distribution or something, and say the weekly numbers are just the sum of 5 daily numbers. Then the weekly numbers will also be skewed and fat-tailed, although less so, and the best estimate of a 95% confidence interval would be to sort the scenarios and look at the 2.5th percentile scenario, the 97.5th percentile scenario and use those as endpoints of your interval.
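To make the comparison concrete, here is a rough Python sketch that computes both intervals from the same resampled weeks: the one read straight off the sorted scenarios, and the "average plus or minus two standard deviations" version. The data is made up (lognormal daily numbers, as an assumption), and the `percentile` helper is a simple index-based rule of my own choosing, not a standard library function.

```python
import random
import statistics

rng = random.Random(0)
# Made-up skewed history: 100 lognormal "daily numbers" (an assumption).
daily = [rng.lognormvariate(0.0, 1.0) for _ in range(100)]
# 10,000 artificial weeks, each the sum of 5 past days drawn with replacement.
weeks = sorted(sum(rng.choices(daily, k=5)) for _ in range(10_000))

def percentile(sorted_values, pct):
    """Simple index-based percentile of an already sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

# Interval read straight off the resampled distribution.
empirical = (percentile(weeks, 2.5), percentile(weeks, 97.5))

# Interval from pretending the distribution is normal: mean +/- 2 standard deviations.
mean = statistics.mean(weeks)
sd = statistics.stdev(weeks)
normal_approx = (mean - 2 * sd, mean + 2 * sd)

print("empirical:", empirical)
print("normal approximation:", normal_approx)
```

With skewed data like this you would typically expect the two intervals to disagree, and the normal version is symmetric around the mean by construction, so it can include values the data could never produce (for instance, a lower endpoint below the smallest possible week).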

The weakness of resampling is the possibility that the data you have isn’t representative of the future. But the strength is that you get to work with an honest-to-goodness distribution and don’t need to revert to assuming things are normally distributed.

Categories: data science, finance, internet startup
