mathbabe

The Greek situation

June 17, 2011 Cathy O'Neil, mathbabe

If you’re anything like me, you eat up the news on the Greek situation whenever and wherever you can. It’s like watching a slow-motion train wreck that takes years to hit. No, even better, it’s like this:

Imagine there’s a family that you know as self-absorbed, undisciplined, and indulgent, especially with their kids: they let their kids watch too much TV, they give their kids every gadget they can’t really afford (flat-screen TVs on credit), they stay up too late, they eat crap food, and they bribe their kids to like them, bringing them presents after every trip. It borders on neglect, for God’s sake, and it will come back to haunt them, you think to yourself. Then imagine seeing them in a crowded restaurant with their kids, older now, and utterly obnoxious and lazy and entitled, screaming at the top of their lungs that whatever the complaint is, it’s definitely not their fault, it’s their stinking parents’ fault, and why should they get a job. It’s an obnoxiously satisfying scene to watch as an exhausted parent who has made sure to feed your own kids broccoli and have them tucked in by 9 with their homework done and their backpacks ready for school the next day.

But here’s the thing: I kind of have to side with the spoiled kids. I mean, it is the parents’ fault if they’ve completely spoiled their kids. As bratty as the kids are, you really can’t blame them for this until they are rational adults.

In summation, Greece is the European version of the Kardashians.

Here’s an article which kind of proves my point. The politicians have spoiled the Greeks for so long, by buying votes with do-nothing government jobs, and simply ignoring the state of the deficit and anything involving money or taxes (mostly because the politicians themselves are the worst of the tax-evaders and don’t want to rock the boat), that the people living there are looking anywhere but at themselves for where the problem lies. In other words, it’s a completely backwards-looking approach with no forwards-looking solution in mind. They are that kid at that restaurant, somewhere in late adolescence but not quite adults.

Another aspect of this crisis is the enormous disconnect between the economists and bankers on the one hand, who have absolute certainty that the banking system must be kept functional at any cost, and the actual people living in a country on the other hand, who don’t want to pay for the mistakes of the rich bankers. What makes this gulf so wide? It’s wide in any country actually, but in Greece you have the extra layer of spoiled entitlement. I’ll talk about this disconnect in my second post about working at D.E. Shaw, where I experienced it first-hand.

Update

After quite a bit of feedback (love feedback!) I’ve decided to add to this post because I think I was too glib and didn’t make my point well. First, let me be clear that I don’t think that the Greek workers are spoiled. I have a lot of compassion for the working people of Greece, especially the youth. The young people of Greece have a broken system, filled with closed guilds, high unemployment, and corrupt politicians. I am extremely empathetic to their plight, and if I were them I’d be protesting in the streets too. What I mean to get across with the spoiled kid thing is that spoiling kids really is neglect and really is the fault of the authorities: it sets someone up to fail, and it gives them no tools to correct systemic mistakes. In this analogy I’m trying to point out that the political class has neglected its people and its duty to create a working system. They have done nothing for those young people, and now they are trying to make inside deals with the European bankers and don’t seem to understand why the actual working (or unemployed) people of Greece don’t see this as a great opportunity.

Categories: finance, news

Data mining contests

June 16, 2011 Cathy O'Neil, mathbabe

So a friend of mine came over last night; he recently became a data scientist at a New York startup too. In fact we have an eerie number of things in common, although he only considered working in finance and never actually went through with it. It was pretty awesome to see him.

He was also pretty into the idea of this blog and making quantitative techniques more open-source and collaborative. And with that goal in mind he sent me these links:

So what do you guys say? Should we work on something together on this blog that may actually help the world and/or make us some prize money? That would be filthy good.

I’m also hoping to get this guy to write a guest post on some quantitative techniques he wants to add to my list. Please comment if you have more suggestions! I will start writing about the list topics very soon.

Categories: data science, news, open source tools

The basics of quantitative modeling

June 16, 2011 Cathy O'Neil, mathbabe

One exciting goal I have for this blog is to articulate the basic methods of quantitative modeling, followed, hopefully, by collaborative real-time examples of how this craft works in practice. Today I just want to outline the techniques; in later posts I will go into more detail on one or more of these points.

Data cleaning: bad data (corrupt) vs. outliers (actual data which have unusual values)
In sample/ out of sample data
Predictive variables: choosing and preparing which ones and how many
Exponential down-weighting of “old” data
Remaining causal: predictive vs. descriptive modeling
Regressions: linear and multivariate with exponentially down-weighted data
Bayesian priors and how to implement them
Open source tools
When do you have enough data?
When do you have statistically significant results?
Visualizing everything
General philosophy of avoiding fitting your model to the data
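To make the "exponential down-weighting" and "regressions with exponentially down-weighted data" items a bit more concrete, here is a minimal sketch of a one-variable weighted least-squares fit in plain Python. The half-life parameterization and the names `exp_weights` and `weighted_linear_fit` are illustrative assumptions, not a standard API; a multivariate version would solve the weighted normal equations instead of using the closed-form slope.

```python
def exp_weights(n, half_life):
    """Weights for n observations ordered oldest to newest.

    The newest observation gets weight 1, and the weight halves every
    `half_life` observations further back (an assumed parameterization).
    """
    decay = 0.5 ** (1.0 / half_life)
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares for y ~ a + b*x. Returns (a, b)."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Usage with made-up data: the slope shifts from 1 to 3 halfway through.
xs = list(range(20))
ys = [x if x < 10 else 3 * x for x in xs]
n = len(xs)
unweighted = weighted_linear_fit(xs, ys, [1.0] * n)             # equal weights
recent = weighted_linear_fit(xs, ys, exp_weights(n, half_life=1))  # recent data dominate
```

With equal weights the old regime drags the fitted slope away from 3, while the down-weighted fit lands close to the recent slope. The half-life is the knob: shorter means more responsive but noisier.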

For those of you reading this who know a thing or two about being a quant, please do tell me if I’ve missed something.

I can’t wait!

Categories: data science, open source tools

Working with Larry Summers (part 1)

June 15, 2011 Cathy O'Neil, mathbabe

This post is continued here and then here.

After I had been working at D.E. Shaw for a few months, I was asked by the American Mathematical Society to write an expository article on leaving academics for finance. Here’s what I wrote. It was infinitely vetted by the legal department, and they removed a bunch of stuff; by the time they approved it I couldn’t remember why I had wanted to write it in the first place. Oh yeah, something about answering a bunch of questions that math grad students kept asking me. The one edit I refused to budge on, I remember, was that they objected to the word “rich” in the sentence “However, it is clear that if you stay in finance for long enough, and are successful, you do become rich”. They wanted to change the word to “wealthy”. As if that was going to soften the blow to the poor suckers who weren’t privileged enough to work at this holy place.

Ever since it was published, I’ve wanted to write a second edition. It would go something like this (taken from a letter I recently wrote to a friend who is applying to another hedge fund):

I actually never really intended to stay in finance; it was just the only “real job” I could get with my number theory skills. In the end I decided I wanted to work at a startup, and there are more internet startups than finance ones. The truth is, there are a bunch of jerks in finance, very likely due to the amount of money floating around, and I noticed a correlation between the size/age of the company and the douchebagginess of the “leaders” of the firms. I don’t know a lot about ****** but word on the street is that they are huge douchebags. On the other hand, I myself don’t regret working with douchebags for four years, because it thickened my skin quite a bit (and in particular made me realize how impotent and feeble the academic douchebags are in comparison) and made me strive for something better. Although to be honest it sometimes really sucked.

I could sum it up pretty well thus: people who are successful for a while think they know everything. People who are rich think they are always right. People who are both successful and rich are absolutely incredible douchebags. It seems like a law of nature (i.e. I can only assume that if I ever become rich and successful I will also become a douchebag; one more reason not to wish too hard for things like that).

So instead I work for *pretty good* money (better than I’d have gotten in academics but not as good as at D.E. Shaw) and I enjoy things like oatmeal in the morning, biking to work on the bike path, my incredibly adorable macho developer colleagues, a really cool hands-off boss, and a bunch of awesome karaoke-loving beer-drinking coworkers who think I have special powers since I can do math. Oh, and the possibility that my numerous stock options in this startup may make me a douchebag someday.

I just want to add that, of course, not everyone I worked with at D.E. Shaw is a douchebag, not even all the leaders. In fact I still have many friends from there. But it’s definitely not a random cut of the population, and I have to believe that the people there would agree with that (and would say it’s worth it).

In part 2 of this post I will talk about what specifically made me decide to leave the hedge fund industry.

Categories: finance, hedge funds, women in math

Why “mathbabe”?

June 13, 2011 Cathy O'Neil, mathbabe

Let me tell you a bit about my childhood.

I grew up in Lexington, MA, which is an upper-middle class liberal suburb of Boston. Most of the people I went to school with had parents that either worked at or went to Harvard or M.I.T. – it was a pretty nerdy, intellectual environment. My parents, both computer scientists, moved there for the public schools.

In spite of that, I was a hopeless, pathetic nerd. My idea of fun was practicing classical piano, watching “Amadeus” over and over again, and factoring license plate numbers in my head. When you add to that the facts that I wore glasses, braces, and was chubby, you are talking about one pathetic young nerd girl. When you top *that* off with the fact that I went through puberty at the wrong time, you can imagine that I went through junior high wondering what everyone was smoking. Oh, and did I mention that my mom hated shopping so I was always wearing one of two pairs of bright pink stretch polyester pants? And that my personal hygiene skills were undeveloped? You get the picture.

I was lucky enough to have a best friend starting in 7th grade, who saved me from many pits of despair (although not all). But come high school, my self-esteem was pretty crappy, and the only things I seemed to be good at, my refuges, were piano and math team.

My parents did an excellent job of not really caring about what I did for the most part, so I wasn’t at all pressured into doing math, and definitely not pressured into doing music. When I came home with an advertisement for a math camp at Hampshire College in western Massachusetts, though, my parents essentially bribed me to go. It didn’t take much convincing; I was intrigued.

Here’s where we get to the title of the post. When I got there, I quickly noticed there were 50 boys and 10 girls. And then I noticed that a bunch of these guys were kind of… cool. They were mostly from places like Stuyvesant and Bronx Science and Evanston, places I’d never heard of but which obviously placed a premium on being a math nerd. Then, and this was the miracle, I noticed that these cool, sexy guys thought I was cool and sexy. OMG, I was a math babe!

It was the first moment I had ever felt like I belonged somewhere, that I was with my peeps. I learned lots of math that first summer, and although most of the specifics kind of wore away over the following year, the feeling that I had a community never did. Actually the one thing I did really learn for good that summer was how to solve the Rubik’s cube using group theory (a subject for another post!). And I distinctly remember carrying around a Rubik’s cube like a piece of platinum my entire junior year of high school, just because it reminded me that I was, in fact, a math babe, at least in one context (although not here! not here whatsoever!).

Which reminds me! This summer, I’m very excited to be going back to the same math camp to teach as a senior staff member. Here’s the list of stuff I have prepared to teach this crop of math studs and math babes:

1) magic squares and generalizations. I just figured out how to generate all 3×3 magic squares! I love those little guys.
2) elementary number theory: fundamental theorem of arithmetic
3) cool geometry stuff like bisectors of angles and sides and all those cool theorems
4) pigeonhole principle, lots of examples
5) Euler’s formula and the Platonic solids
6) cool stuff with perfect numbers and non-perfect numbers
7) proof by induction, lots of examples
8) basic graph theory
9) bipartite graphs and related theorems
10) basic Ramsey theory
11) more number theory
12) Farey fractions
13) continued fractions and the golden ratio
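For the curious, here is a brute-force sketch that finds every 3×3 magic square by checking all 9! = 362,880 arrangements. It assumes "magic square" means the numbers 1 through 9, each used once, with every row, column and both diagonals summing to the same total; it is a lazy computer search, not a clever generating argument, and the names `is_magic` and `all_magic_squares` are just illustrative.

```python
from itertools import permutations


def is_magic(sq):
    """sq is a tuple of 9 numbers in row-major order."""
    target = sum(sq[0:3])
    lines = [
        sq[0:3], sq[3:6], sq[6:9],                      # rows
        sq[0::3], sq[1::3], sq[2::3],                   # columns
        (sq[0], sq[4], sq[8]), (sq[2], sq[4], sq[6]),   # diagonals
    ]
    return all(sum(line) == target for line in lines)


def all_magic_squares():
    """Every arrangement of 1..9 in a 3x3 grid that is magic."""
    return [sq for sq in permutations(range(1, 10)) if is_magic(sq)]


squares = all_magic_squares()
for sq in squares:
    print(sq[0:3], sq[3:6], sq[6:9])
```

The search turns up 8 squares, all rotations and reflections of one another, and every one has 5 in the center; explaining *why* is a nice exercise for the campers.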

I can’t friggin wait!! Please send me more suggestions if I’m missing something that they really need to know. By the way, I’m only teaching the first three weeks, because I couldn’t arrange to be there for all six; during the second half they will be learning more specialized subjects from some very cool mathematicians.

Actually there’s another reason I ultimately decided to call this blog “mathbabe,” namely when I googled it, I was first of all offended that the name wasn’t already taken by some other woman math nerd posting about cool stuff, but what really offended me was that there’s another site with a very similar name which simply shows nearly naked women next to cliff notes on basic math subjects. WTF?!? It is ridiculously obvious to me that math babes should be doing math, not adorning it. So I kind of had to call myself mathbabe after that.

Categories: math education, women in math

What is seasonal adjustment?

June 12, 2011 Cathy O'Neil, mathbabe

One thing that kind of drives me crazy in economic or business news (which I’m frankly addicted to (which makes me incredibly old and boring)) is the lack of precision exactly when there seems to be some actual data: at the very moment when you think you’re going to be told the hard cold facts, so you can make up your own mind about whether the economy is still sucking or is finally recovering, you get a pseudo-statistic with a side of caveat. I make it a point to separate the true bullshit from the stuff that actually is pretty informative if you know what they are talking about. I consider “seasonal adjustment” pretty much in the latter category, although there are exceptions (more on that later).

So what does “seasonal adjustment” mean? Let’s take an example: a common one is home sales. It’s a well-known fact that people don’t buy as many homes in January and February as they do in May and June, due to some combination of people sitting in their houses eating ice cream straight from the Ben & Jerry’s container when it’s cold outside, and people trying to sell their houses being enraged by the dirty snow tracked onto their immaculate rugs during open houses. So people delay house-hunting until spring, and they delay house-selling until house-hunting starts (side note: because of this, desperate people getting divorced or being forced to move often have to sell their houses at major discounts, so always do your house-hunting right after a huge blizzard).

Considering the cyclical and predictable nature of home sales, people want to “seasonally adjust” the data so that they can discern a move that is *not* due to the time of the year; in other words, they want to detect whether a more macroeconomic issue is affecting home sales, such as a recession or a housing glut (or both). It’s a reasonable approach. How does it work exactly?

Say you have a bunch of housing data, maybe 20 years of monthly home sales. You see that every single year the same pattern emerges, more or less. Then you could, for a given year, compute the average sales per month for that year. It’s important to compute this average, as we will see, because one golden rule of adjusting data is that the sum of the adjusted data must equal the sum of the original data; otherwise you introduce a problem that’s bigger than the one you’re solving.

Once you have the average sales per month, you figure out (using all 20 years) the typical divergence from the average that you see in each month, as a percentage of the average per month that year. So for example, January is the worst month for home sales, and in the 20 years of data you see that on average there are 20% fewer home sales in January than in the average month of that year, whereas in June there are typically (in your sample) 15% more sales than in the average month that year. Using this historical data, you come up with a number for each month (-20% for January, 15% for June, etc.). I can finally say what “seasonally adjusted” means: it is the rate of sales for an average month, or for the whole year, implied by these numbers. So if we saw 80,000 home sales in January, and our number for January is -20%, then we will say we have a seasonally adjusted rate of 100,000 sales per month (since 80,000 / (1 - 0.20) = 100,000), or 1.2 million sales per year.
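Here is a minimal sketch of the calendar-year scheme just described, assuming the history arrives as one list of 12 monthly totals per calendar year. The function names and the sales pattern in the usage lines are made up for illustration; no official statistics agency necessarily does it exactly this way.

```python
def seasonal_factors(years):
    """years: list of 12-element lists of monthly sales (Jan..Dec).

    Returns 12 factors: for each calendar month, the average over all years
    of (that month's sales / that year's monthly average - 1).
    """
    n = len(years)
    factors = [0.0] * 12
    for months in years:
        avg = sum(months) / 12
        for m, sales in enumerate(months):
            factors[m] += (sales / avg - 1) / n
    return factors


def seasonally_adjust(sales, factor):
    """Monthly sales rate implied for an average month, given this month's factor."""
    return sales / (1 + factor)


# Usage with made-up numbers: 20 years that all follow the same pattern.
pattern = [-0.20, -0.15, -0.05, 0.05, 0.10, 0.15, 0.10, 0.05, 0.0, 0.0, -0.05, 0.0]
history = [[100_000 * (1 + p) for p in pattern] for _ in range(20)]
factors = seasonal_factors(history)
jan_rate = seasonally_adjust(80_000, factors[0])   # about 100,000 per month
annual_rate = 12 * jan_rate                        # about 1.2 million per year
```

Because each year's deviations average to zero, the 12 factors do too, which is the golden rule in this scheme: spreading a year's total according to the factors hands back exactly that total.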

Note that this system of adjustment follows the golden rule at least for the historical data; by the end of each calendar year, we have attributed the correct overall number of sales, spread out over the months. However, if we start predicting July sales from what we’ve seen from home sales from January to March, taking into account these adjustments, we will also be tacitly assuming an overall number of sales for the year, and the golden rule will probably not hold. This is just another way to say that we won’t really know how many home sales have occurred in a given year until the year is over, so duh. But it’s not hard to believe that knowing these numbers is pretty useful if you want to make a ballpark estimate of the yearly rate of home sales and it’s only March.
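The March ballpark estimate can be sketched as follows, under the assumption that this year has the same seasonal shape as the historical factors (which is exactly the tacit assumption that can break the golden rule). The function name and the numbers are hypothetical.

```python
def estimate_annual_total(partial_sales, factors):
    """Ballpark full-year total from the first k months of a year.

    partial_sales: sales for January onward (k >= 1 months).
    factors: 12 seasonal factors that average to zero, so the (1 + f) terms
    sum to 12 and sum(1 + f for the first k months) / 12 is the share of a
    typical year those months represent.
    """
    k = len(partial_sales)
    expected_share = sum(1 + f for f in factors[:k]) / 12
    return sum(partial_sales) / expected_share


# Made-up factors (they sum to zero) and January-March sales.
factors = [-0.20, -0.15, -0.05, 0.05, 0.10, 0.15, 0.10, 0.05, 0.0, 0.0, -0.05, 0.0]
estimate = estimate_annual_total([80_000, 85_000, 95_000], factors)  # about 1.2 million
```

If spring turns out unusually strong or weak this year, the estimate will be off, and we only find out in December.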

A slightly more sophisticated way of doing this, which doesn’t depend as much on the calendar year, is to use the 20 years of data and a rolling 12-month window (i.e. where we add a month at the front and drop a month off the back, and thus always consider 12 consecutive months at a time) to compute the adjustment for each month relative not to the average for that calendar year, but rather to the average of the 12 previous months. This has the advantage of being a causal model (i.e. a model which only uses data from the past to predict the future; I’ll write a post soon about causal modeling) but has the disadvantage of not following the golden rule, at least over short periods of time. For example, if housing sales are on a slow slide over months and months, this model will consistently fail to predict how low home sale figures should be.
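Here is a rough sketch of the rolling-window version, assuming the data is one flat list of monthly sales, oldest first. Each month is compared only with the average of the 12 months before it, so nothing from the future is used. The function name and the `start_month` parameter are my own illustrative choices.

```python
def trailing_factors(sales, start_month=0):
    """Causal seasonal factors from a trailing 12-month window.

    sales: flat list of monthly sales, oldest first.
    start_month: calendar month of sales[0] (0 = January).
    For each month t with 12 months of history, compute
    sales[t] / mean(sales[t-12:t]) - 1, then average those by calendar month.
    Months with no usable data get None.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for t in range(12, len(sales)):
        trailing_avg = sum(sales[t - 12:t]) / 12
        m = (start_month + t) % 12
        sums[m] += sales[t] / trailing_avg - 1
        counts[m] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up series with no seasonality at all.
flat = [100.0] * 36                              # constant sales
sliding = [1000 * 0.99 ** t for t in range(48)]  # slow 1%-per-month slide
flat_factors = trailing_factors(flat)            # all zero
sliding_factors = trailing_factors(sliding)      # all negative
```

In the sliding series every month sits below its trailing average, so the trend leaks into the "seasonal" factors: they all come out negative and no longer average to zero, one concrete way this causal approach breaks the golden rule.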

The biggest problem with seasonally adjusted numbers is, in my opinion, that the model itself is never described: do we use 20 years of historical data? 3 years? Do we use a rolling window or calendar years? Without this kind of information, I’m frankly left wishing you would just frigging show me the raw data and let me decide whether it’s good news or bad news.

A few comments have trickled in from friends (over email) who are quants, and I wanted to add them here.

First, any predicting is hard and assumes a model, i.e., each year is the same, or each month is the same. In other words, as soon as you are talking about something being surprisingly anything, you are modeling, even when you don’t think you are. Most assumptions go unnoticed in fact. Part of being a good quant is simply being able to list your modeling assumptions.
As we will see when we discuss quant techniques further, a very important metric of a model is how many independent data points go into it; this informs the calculation of statistical significance, for example. The comment, then, is that modeling seasonal adjustment as I’ve described above lowers your count of independent data points by a factor of 12, because you are basically using all 12 months of a year to predict the _next year_, so what looked like 12 data points is really only one. You could instead fit a curve with fewer than 12 parameters to the seasonal differences, but then you risk overfitting by having chosen a family of curves that looks right. More on questions like this when we explore the concept of fitting the model to the data, and in particular how many different models you try on a given data set.
The final comment is this: all predictions likely violate the golden rule, but the point is you at least want one that isn’t biased, so in expectation it matches the rule.

Categories: data science, finance

What’s it take to be a woman in math?

June 11, 2011 Cathy O'Neil, mathbabe

One of the first things I’d like to set people straight on is what it takes to be a woman in math. The short answer is, a warrior. The longer answer starts like this. At least in this country, in this culture*, it requires near-constant resistance to the niggling feeling that you don’t belong, that you are an outsider, and that you will always be an outsider. It takes belief in yourself as an abstract thinker, as a scientist, and as a _source_ of wisdom. This is completely counter to how the average woman has been taught to behave: demurely, modestly, quietly. Unleaderly. And the above description refers only to the psychological barriers, not the underlying mathematics.

Considering how difficult the material itself is, it’s not surprising how many women drop out eventually.

To be fair, we are seeing many more women finishing college degrees in mathematics and Ph.D.s in mathematics, and that is frigging awesome. But we are still not seeing that many professors, not in the numbers you might expect from the Ph.D. programs. Why is this? I think I can explain this at least in part. When one decides to become a math major, it’s a difficult decision in terms of the surrounding cultural expectations, but there’s very good, very consistent feedback (at least outside of Harvard), namely in the form of homework and test grades from undergrad classes. In other words, it may be a weird decision to be a woman in math, but you can *see* your success whenever your homework comes back with a good grade. It’s proof positive that you are doing ok. To some extent in grad school this feedback loop continues, and with luck you have a good advisor who is encouraging and nurturing. However, once outside of grad school the feedback loop all but vanishes and you are left to decide, *within yourself*, whether you are good at what you do. This is when you as a woman (and of course this happens to men too, but for whatever reason, maybe just hormones, maybe culture, not as often) question yourself and then look to the outside world for affirmation, and to be honest that’s a pretty tough moment. Many women leave at that moment.

In some sense I am one of them, because I did leave academics. But I left because I decided I wanted more, so more of a moment of strength than a moment of fear. I got a Ph.D. at Harvard, went to M.I.T. for a post-doc, then became an assistant professor at Barnard College. I got to the point where I was pretty sure I’d be able to get tenure, or in other words to the point that I was sure I deserved tenure, and I looked around and decided, this isn’t the kind of feedback loop I want in my life. I need actual feedback, in real time. I left to be a quant in finance (and since then a data scientist at an internet ad company). I feel very lucky that I could make that decision without fear, and I still consider myself a woman in math, and I still encourage women in math to stay in math or at least stay mathematical.

I think if people understood what women in math need to do in order to just be themselves every day, they would be treated less like anomalies and more like superheroes. It’s a tough thing to do, and they should be respected for it. And they are cool. I mean, what’s cooler than someone who lives as an outsider and has come to terms with that? It’s a strength that not everyone has.

Here’s the thing, I don’t want to end this post on a negative note. In spite of everything I’ve said, being a math babe totally rocks, because math rocks. I hope to convincingly illustrate just how much math rocks in future posts.

* I’ve talked to women outside the US about being mathematicians in their country. One thing that commonly comes up is that in Italy, and to some extent France, it is much more common to see women mathematicians. Why is this? One of my Italian women mathematician friends described it to me like this: in Italy, the academic track to become a mathematician is identical to that of becoming a high school math teacher; indeed the two tracks diverge only after a master’s degree. The outcome of this system is that it is not seen as a particularly glamorous or even difficult profession, perhaps similar to that of an engineer. According to her, truly ambitious Italians become politicians, not mathematicians.

Categories: math education, women in math

Hello world! [stet]

June 11, 2011 Cathy O'Neil, mathbabe

Welcome to my new “mathbabe” blog! I’d like to outline my aspirations for this blog, at least as I see it now.

First, I want to share my experiences as a female mathematician, for the sake of young women wanting to know what things are like as a professional woman mathematician. Second, I want to share my experiences as an academic mathematician and as a quant in finance, and finally as a data scientist in internet advertising. (Wait, did I say finally?)

I also want to share explicit mathematical and statistical techniques that I’ve learned by doing these jobs. For some reason being a quant is treated like a closed guild, and I object to that, because these are powerful techniques that are not that difficult to learn and use.

Next I want to share thoughts and news on subjects such as mathematics and science education, open-source software packages, and anything else I want, since after all this is a blog.

Finally, I want to use this venue to explore new subjects using the techniques I have under my belt, and hopefully develop new ones. I have a few in mind already and I’m really excited by them, and hopefully with time and feedback from readers some progress can be made. I want to primarily focus on things that will actually help people, or at least have the potential to help people, and which lend themselves to quantitative analysis.

Woohoo!

Categories: data science, finance, hedge funds, math education, news, open source tools, women in math
