August | 2011 | mathbabe

Wall Street versus us

August 9, 2011 Cathy O'Neil, mathbabe 3 comments

There have been two articles in the past few days which address the mentality of people working on Wall Street versus the rest of us.

First, we have this article from William Cohan, posted on Bloomberg.com, which is the first part of a series entitled, “Ending the Moral Rot on Wall Street.” This first part doesn’t contain much new; it goes over just how obnoxious and easy to hate the various Goldman Sachs assholes were when they packaged and sold mortgage debris and then emailed their friends about how much money they stood to make. And the second (and perhaps further) parts promise to explain how we are going to address the corruption and greed. My complaint, which is totally unfounded since I haven’t read the next parts, is that this guy is not disagreeing well. In other words, he’s setting up the guys on Wall Street to be monstrous and ethically vapid. This attitude is not going to help us really understand the situation, nor will it lend itself to satisfying solutions. Here’s an example of the kind of “they are monsters” prose that probably won’t help:

These crimes are being committed, he said, by people who “have already made more money than could ever be spent in one lifetime and achieved more impressive success than could ever be chronicled in one obituary. And it begs the question, is corporate culture becoming increasingly corrupt?”

Yes, it certainly does raise that question.

Second, we have this blog post by Mark Cuban, which was originally posted in 2010 but is still relevant. In it, an effort is made to understand the actual mentality of the traders on Wall Street. Namely, they are framed as hackers:

Just as hackers search for and exploit operating system and application shortcomings, traders do the same thing. A hacker wants to jump in front of your shopping cart and grab your credit card and then sell it. A high frequency trader wants to jump in front of your trade and then sell that stock to you. A hacker will tell you that they are serving a purpose by identifying the weak links in your system. A trader will tell you they deserve the pennies they are making on the trade because they provide liquidity to the market.

I recognize that one is illegal, the other is not. That isn’t the important issue.

I agree with this characterization, and moreover I applaud the effort to understand the culture. These guys actually do think they are playing fairly within the context of their “game” (and they do care that it’s legal). To change their mindset we need to actually change the rules of the game, not just complain that they are corrupt, because, like in a religious disagreement, they can easily dismiss such talk as irrelevant to their lives.

Going back to the first article, it says:

That Wall Street executives have been able to avoid any shred of responsibility for their actions in the years leading up to the crisis speaks volumes not only about an abject ethical deterioration but also about the unhealthy alliance that exists between the powerful in Washington and their patrons in New York. Our collective failure to demand redress against a Wall Street culture that remains out of control is one of the more troubling facts of life in America today.

I agree that we do need to demand redress, but not against a culture’s ethical deterioration, which is just far too vague, but rather against individual corrupt actions. In other words we need to make the punishments for well-defined evil deeds clear and we need to follow through with the consequences. In order to do this we need to demand transparency so we can start to even define evil deeds. This means some system of understanding the models that are being used, and the risks being taken, and a market consensus that the models are sufficient. It means the actual threat of losing actual money, or even going to jail, if the models being used are crappy or if it turns out you were lying about the risks you were taking – or even if you were ignorant of them.

Categories: finance, news, rant

Monday morning reading list

August 8, 2011 Cathy O'Neil, mathbabe 2 comments

I’m happy to have found three really interesting articles in the New York Times this morning that I thought I’d share.

First, there’s a book review of “The Theory That Would Not Die,” a book about the history of Bayes’ law and the field of Bayesian statistics. It’s always seemed silly (and amusing) to me that there are such pissing contests between different groups of statisticians (the Bayesians versus the Frequentists), but there you are. And I guess this book is here to explain that partly it’s due to the fact that nobody took Bayes’ law seriously, so the people using it were constantly having to defend themselves. Honestly I’m just psyched that a math book is being reviewed in the first place, and written by a woman no less.

Second, there’s an interesting article about A.I.G. suing Bank of America over the mortgage bonds, with excellent background on how little litigation is actually happening due to the credit crisis, especially by our government. Reading between the lines, I would summarize our government’s attitude along the following lines: “Oh wow, those models are complicated. Since I don’t understand them and I don’t expect you to, even though you relied on them for your business, I will let you off the hook. After all, you can’t go to jail for not understanding math!”

Finally, there’s a really scathing description here of how the politicians are rendering the S.E.C. impotent by giving them too much to do, taking away their power and resources, and generally trying to get micromanaging control over how they do their thing. True, it’s written by a former chairman of the S.E.C., but it’s still not a convincing way to create a powerful regulator (if that’s what anyone wants).

Categories: finance, women in math

Data Viz

August 7, 2011 Cathy O'Neil, mathbabe 11 comments

The picture below is a visualization of the complexity of algebra. The vertices are theorems and the edges between theorems are dependencies. Technically the edges should be directed, since if Theorem A depends on Theorem B, we shouldn’t have it the other way around too!

This comes from data mining my husband’s open source Stacks Project; I should admit that, even though I suggested the design of the picture, I didn’t implement it! My husband used graphviz to generate this picture – it puts heavily connected things in the middle and less connected things on the outside. I’ve also used graphviz to visualize the connections in databases (MySQL automatically generates the graph).
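To make the idea concrete, here is a minimal Python sketch of how one might produce this kind of picture. It is not the actual Stacks Project code: the tags and dependencies below are made up for illustration, and `needed_for` and `to_dot` are hypothetical helper names. It collects everything a given result depends on and writes the graph in graphviz’s DOT format, with an edge from Theorem A to Theorem B when A is needed to prove B.

```python
from collections import deque

# Hypothetical data: each result maps to the results its proof uses directly.
depends_on = {
    "thm_A": [],
    "thm_B": ["thm_A"],
    "thm_C": ["thm_A", "thm_B"],
    "thm_D": ["thm_C"],
}

def needed_for(target, depends_on):
    """Return every result that target depends on, directly or indirectly."""
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

def to_dot(depends_on, keep=None):
    """Emit DOT text; an edge A -> B means A is needed to prove B."""
    lines = ["digraph dependencies {"]
    for result, deps in sorted(depends_on.items()):
        if keep is not None and result not in keep:
            continue  # skip results outside the requested subgraph
        for dep in sorted(deps):
            lines.append(f'    "{dep}" -> "{result}";')
    lines.append("}")
    return "\n".join(lines)

# Only the results needed to prove thm_C, plus thm_C itself.
subgraph = needed_for("thm_C", depends_on) | {"thm_C"}
print(to_dot(depends_on, keep=subgraph))
```

Saving the output to a file and running it through graphviz’s `dot` command (for example `dot -Tpng deps.dot -o deps.png`) should draw the picture.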

Here’s another picture which labels each vertex with a tag. I designed the tag system, which gives each theorem a unique identifier; the hope is that people will be willing to refer to the theorems in the project even though their names and theorem numbers may change (i.e. Theorem 1.3.3 may become Theorem 1.3.4 if someone adds a new result in that section). It’s also directed, showing you dependency (Theorem A points to Theorem B if you need Theorem A to prove Theorem B). This visualizes the results needed to prove Chow’s Lemma:

Categories: data science, math education, open source tools

Adam Smith made me buy a Kindle

August 6, 2011 Cathy O'Neil, mathbabe 2 comments

When I was pregnant with my third son, and working at D.E. Shaw, I got really into reading Adam Smith’s seminal work “Wealth of Nations” on the subway rides to and from work. Once the baby came, though, the problem was that the book is huge, like 1,200 pages, and impossible to read while breastfeeding. In my frustration, and to combat baby brain-rot, I bought a Kindle to continue my reading through many many exhausting hours those first few months. Totally worth it, an investment in my sanity.

This post got me remembering my personal experience with Adam Smith. Adam Smith has really gotten a bum rap. He is generally known for inventing the concept of the invisible hand, which is the idea that, as long as each person is working as hard as they can to personally profit from their labor, the overall economy will benefit from that self-interest. However, it’s often used as an excuse for why regulations are unnecessary, because somehow, the feeling goes, the invisible hand is all we need. To tell you the truth, I don’t even remember seeing that in his book. Maybe it was there, and maybe I was getting barfed on during that page, but he definitely didn’t focus on it. He had other fascinating points though which he did reiterate.

Here’s why Wealth of Nations is so amazing. First, Smith really is incredibly good at explaining how markets work and, considering that he was inventing a field as he was writing, did so extremely well (although at times the book can be a bit repetitive, probably because he never invented notation – he just wrote out the entire phrase whenever he wanted to refer to an idea). The most basic goal of the book is to explain that it makes more sense to trade between countries so that things that are relatively cheaper to make or produce in Country A can be traded for things that are easier for Country B to make, and to generalize that to “between towns” or “between people”.

The examples he uses are really interesting, and include various layered considerations such as whether the goods are easily stored. For example, he maintains that cotton and wools should absolutely have free trade, since there is a clear advantage to having the appropriate climate for the growth of the plants, as well as the long storage. By contrast, he talks about the price of meat in England versus Argentina, being non-storable, and mentions that the price of a cow in Argentina is equal to the tip you need to give a village boy to go catch a cow (I’m paraphrasing because it was almost three years ago).

Another fascinating aspect of the book is that, since he wrote it in the 1770’s, economic conditions were really different, and he talks at length of the peasant classes in various countries. One of the most striking descriptions comes when he describes how much healthier the Irish peasants were compared to the Scottish peasants, because they ate potatoes, whereas the Scots ate oatmeal. It took me a few minutes to realize that he meant that they only ate oatmeal. And he was saying that you could tell, by the way the 20 year olds still had teeth in Ireland, how much better a staple potatoes are than oatmeal.

He also talks about the various economies of South America and Europe and it sounds like they were doing better than Great Britain, especially Holland, which was a huge trading country back then. It’s fascinating just to understand, at the level of the average person, the peasants and the merchants, how incredibly different the world was then, something you don’t get as good a look at reading history books (at least the history books I’ve read).

Adam Smith was certainly pro-business, in the sense that he wanted a functioning and efficient system to work for all of the people in the world. However, he was well aware of the natural tendencies of people in power to abuse that power. He speaks at length against monopolies, which he thinks are a natural tendency, and claims that regulations to prevent such things are absolutely necessary.

He also talks at length about currencies and bank notes and the concept of borrowing money to be paid later. He is a proponent of usury laws- he doesn’t think it’s fair to entrap people into debt that they can’t repay (and back then I believe the consequences for unpaid debt were pretty severe). He also goes into incredible detail in describing the way Scotland went through a credit crisis, caused by a lending bubble, where people were cycling through various banks with different loans, borrowing more money to repay other debts, and which spiraled into a huge mess which caused the banking system to collapse. The Bank of England itself defaulted as well in one of his other historical accounts of lending bubbles.

One really interesting point he made about the credit crises he talks about is that, in those days, if you had money in the form of bank notes and you wanted to use it in another country, you’d have to exchange the notes for gold when you left the country, and then you’d have to exchange the gold back into bank notes when you entered the next country. He claims that this system actually limited the scope of the credit crisis from going beyond the shores of Scotland; he used a kind of conservation of money argument, wherein he considered promised money, i.e. bank notes, to be only probabilistically worth something. Of course there are many parallels to be made to our current credit crisis, but that part about containing the crisis inside a country really makes me think about how much China has lent to the United States.

Adam Smith had one huge blind spot, which was the way he talked about slaves. It was a long time ago and times were different but it’s really hard to read those passages where he talks condescendingly about how naturally lazy slaves are, although he also mentions how little motivation they have. It’s totally brutal, but then again if you read the 1911 Encyclopedia Britannica you will find much the same kind of thing and worse.

Categories: finance, rant

Why should you care about statistical modeling?

August 5, 2011 Cathy O'Neil, mathbabe 3 comments

One of the major goals of this blog is to let people know how statistical modeling works. My plan is to explain as much as I can in simple plain English, with the least amount of confusion, and the maximum amount of elucidation at every possible level, so every reader can take at least a basic understanding away.

Why? What’s so important about you knowing about what nerds do?

Well, there are different answers. First, you may be interested in it from a purely cerebral perspective – you may yourself be a nerd or a potential nerd. Since it is interesting, and since there will be I suspect many more job openings coming soon that use this stuff, there’s nothing wrong with getting technical; it may come in handy.

But I would argue that even if it’s not intellectually stimulating for you, you should know at least the basics of this stuff, kind of like how we should all know how our government is run and how to conserve energy; kind of a modern civic duty, if you will.

Civic duty? Whaaa?

Here’s why. There’s an incredible amount of data out there, more than ever before, and certainly more than when I was growing up. I mean, sure, we always kept track of our GDP and the stock market, that’s old school data collection. And marketers and politicians have always experimented with different ads and campaigns and kept track of what does and what doesn’t work. That’s all data too. But the sheer volume of data that we are now collecting about people and behaviors is positively stunning. Just think of it as a huge and exponentially growing data vat.

And with that data comes data analysis. This is a young field. Even though I encourage every nerd out there to consider becoming a data scientist, I know that if a huge number of them agreed to it today, there wouldn’t be enough jobs out there for everyone. Even so, there will be, and very soon. Each CEO of each internet startup should be seriously considering hiring a data scientist, if they don’t have one already. The power in data mining is immense and it’s only growing. And as I said, the field is young but it’s growing in sophistication rapidly, for good and for evil.

And that gets me to the evil part, and with it the civic duty part.

I claim two things. First, that statistical modeling can and does get out of hand, which I define as when it starts controlling things in a way that is not intended or understood by the people who built the model (or who use the model, or whose lives are affected by the model). And second, that by staying informed about what models are, what they aren’t, what limits they have and what boundaries need to be enforced, we can, as a society, live in a place which is still data-intensive but reasonable.

To give evidence to my first claim, I point you to the credit crisis. In fact finance is a field which is not that different from others like politics and marketing, except that it is years ahead in terms of data analysis. It was and still is the most data-driven, sophisticated place where models rule and the people typically stand back passively and watch (and wait for the money to be transferred to their bank accounts). To be sure, it’s not the fault of the models. In fact I firmly believe that nobody in the mortgage industry, for example, really believed that the various tranches of the mortgage backed securities were in fact risk-free; they knew they were just getting rid of the risk with a hefty reward and they left it at that. And yet, the models were run, and their numbers were quoted, and people relied on them in an abstract way at the very least, and defended their AAA ratings because that’s what the models said. It was a very good example of models being misapplied in situations that weren’t intended or appropriate. The result, as we know, was and still is an economic breakdown when the underlying numbers were revealed to be far far different than the models had predicted.

Another example, which I plan to write more about, is the value-added models being used to evaluate school teachers. In some sense this example is actually more scary than the example of modeling in finance, in that in this case, we are actually talking about people being fired based on a model that nobody really understands. Lives are ruined and schools are closed based on the output of an opaque process which even the model’s creators do not really comprehend (I have seen a technical white paper of one of the currently used value-added models, and it’s my opinion that the writer did not really understand modeling or at best tried not to explain it if he did).

In summary, we are already seeing how statistical modeling can affect, and has affected, all of us. And it’s only going to get more omnipresent. Sometimes it’s actually really nice, like when I go to Pandora.com and learn about new bands besides Bright Eyes (is there really any band besides Bright Eyes?!). I’m not trying to stop cool types of modeling! I’m just saying, we wouldn’t let a model tell us what to name our kids, or when to have them. We just like models to suggest cool new songs we’d like.

Actually, it’s a fun thought experiment to imagine what kind of things will be modeled in the future. Will we have models for how much insurance you need to pay based on your DNA? Will there be modeling of how long you will live? How much joy you give to the people around you? Will we model your worth? Will other people model those things about you?

I’d like to take a pause just for a moment to mention a philosophical point about what models do. They make best guesses. They don’t know anything for sure. In finance, a successful model is a model that makes the right bet 51% of the time. In data science we want to find out who is twice as likely to click a button – but that subpopulation is still very unlikely to click! In other words, in terms of money, weak correlations and likelihoods pay off. But that doesn’t mean they should decide people’s fates.
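A back-of-the-envelope sketch in Python shows why such weak edges still pay. The numbers are assumptions chosen for illustration, not data: even-money bets of one unit each, and a hypothetical base click rate of 0.1%.

```python
import random

def expected_profit(win_prob, n_bets, stake=1.0):
    """Expected total profit from n_bets even-money bets of size stake."""
    return n_bets * stake * (2 * win_prob - 1)

def simulate_bets(win_prob, n_bets, stake=1.0, seed=0):
    """Simulate the same bets once, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    profit = 0.0
    for _ in range(n_bets):
        # Win the stake with probability win_prob, otherwise lose it.
        profit += stake if rng.random() < win_prob else -stake
    return profit

n_bets = 100_000
print("expected profit:", expected_profit(0.51, n_bets))
print("one simulated run:", simulate_bets(0.51, n_bets))

# Hypothetical click rates: doubling a tiny rate still leaves a tiny rate.
base_click_rate = 0.001
lifted_click_rate = 2 * base_click_rate
print("share who still do not click:", 1 - lifted_click_rate)
```

Under these assumptions the expected profit is about 2,000 units over 100,000 bets, even though any single bet is close to a coin flip, and 99.8% of the “twice as likely” group still never clicks. That is fine for making money in aggregate, and a poor basis for a decision about any one person.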

My appeal is this: we need to educate ourselves on how the models around us work so we can spot one that’s a runaway model. We need to assert our right to have power over the models rather than the other way around. And to do that we need to understand how to create them and how to control them. And when we do, we should also demand that any model which does affect us needs to be explained to us in terms we can understand as educated people.

Categories: data science, finance, hedge funds, internet startup, rant

Some R code and a data mining book

August 4, 2011 Cathy O'Neil, mathbabe 2 comments

I’m very pleased to add some R code which does essentially the same thing as my python code for this post, which was about using Bayesian inference to think about women on boards of directors of S&P companies, and for this post, which was about measuring historical volatility for the S&P index. I have added the code to those respective posts. Hopefully the code will be useful for some of you to start practicing manipulating and visualizing data in the two languages.

Thanks very much to Daniel Krasner for providing the R code!

Also, I wanted to mention a really good book I’m reading about data mining, namely “Data Analysis with Open Source Tools,” by Philipp Janert, published by O’Reilly. He wrote it without assuming much mathematics, but in a sophisticated manner. In other words, for people who are mathematicians, the lack of explanation of the math will be fine, but the good news is he doesn’t dumb down the craft of modeling itself. And I like his approach, which is to never complicate stuff with fancy methods and tools unless you have a very clear grasp on what it will mean and why it’s going to improve the situation. In the end this is very similar to the book I would have imagined writing on data analysis, so I’m kind of annoyed that it’s already written and so good.

Speaking of O’Reilly, I’ll be at their “Strata: Making Data Work” conference next month here in New York. Who’s going to meet me there? It looks pretty great, and will be a great chance to meet other people who are as in love with sexy data as I am.

Categories: data science, math education, open source tools, rant

How do you disagree?

August 3, 2011 Cathy O'Neil, mathbabe 14 comments

I remember when I was considering moving to New York from Boston, in late 2004. I came to give a number theory seminar at the CUNY Graduate Center, and afterwards we had a very nice dinner and discussion. Bush had just won re-election, and being typical left-wing academics, we were all disappointed by the news. The most startling aspect of that conversation to me was how often the word “crazy” or “stupid” was used to describe this result. In other words, it seemed like the only way we could come to terms with how half the country had voted for Bush was to describe them as feeble-minded one way or the other.

Gary Gutting wrote a wonderful Opinionator article in today’s New York Times which addresses this issue. It talks about the difference between logical argument and rational thought. He first promotes the idea that we each carry around a developed “picture” of the world:

Conservatives, for example, see business as primarily a source of social and economic good, achieved by the market mechanism of seeking to maximize profit. They therefore think government’s primary duty regarding businesses is to see that they are free to pursue their goal of maximizing profit. Liberals, on the other hand, think that the effort to maximize profit threatens at least as much as it contributes to our societies’ well-being. They therefore think that government’s primary duty regarding businesses is to protect citizens against business malpractice.

He then goes on to say that it’s not irrational to have a picture of the world in mind – we all do it, and it’s an important if not essential way to develop moral, political, and religious views. Moreover, we reasonably view other people’s opinions in the context of our pictures, looking naturally for evidence that ours is right.

But what does qualify as irrational is when we stick to our picture in the face of really good evidence against its consistency:

But although accepting one of these rival pictures is not irrational, inflexible adherence to it can be. Neither picture would be viable without an exception-clause that acknowledges a certain validity to the rival picture. When an issue about regulation comes up, it’s entirely appropriate (and rational) for liberals and conservatives to begin with an inclination to the response generally favored by their picture. But both sides need to attend to the specific facts of the situation at hand and take seriously the possibility that these facts give reason for invoking the exception-clause in their picture. (For example: The risk from that nuclear plant is too big to take for the sake of free market principles, or the severity of our unemployment makes it worthwhile to exempt small businesses from some record-keeping regulations.) When liberals or conservatives become incapable of thinking this way, their positions become irrational.

I’d like to go one step further (because I agree with everything he said) and ask, what can we do to encourage ourselves and the people we disagree with to have this exception-clause out and ready to use?

It seems to me that when you approach a disagreement armed with facts and arguments to prove your point, you may as well concede defeat before you begin – you won’t “win” an argument that way, at least if it’s a deep argument, even if you can leave it feeling like you made the cleverer points, because you will not have persuaded anyone to change their mind. On the other hand, if you approach disagreement genuinely wondering why the other person feels and thinks the way they do, it becomes much easier to home in on the basic cause for conflict, and for each person in the discussion to take out their exception-clause and listen to logical argument. In fact I don’t think logical argument can be useful until this point of readiness has been reached. I will call this approach, where you are each mutually assured of the exception-clause readiness before delving into logical argument, “disagreeing well”.

For example, if I had the time, it would be fascinating to get to know sufficiently many people who voted for Bush in 2004 to be not at all surprised that he won the election. It’s a sad fact about the insularity of my life that I don’t know enough people like that.

More generally, I think a key element of developing your ability to disagree well is to expose yourself to lots of opinions. I am glad to have done a few really different jobs – loading trucks for Fair Foods, barista at Coffee Connection, secretary at a corrupt computer hardware store, student, teacher, quant, professor, data scientist – and met enough people of different classes and backgrounds that I feel relatively exposed to the world- but only the world of the Northeast United States, which is primarily composed of Democrats (although my excursions into the Bluegrass community may be the exception to that rule).

Here’s the irony of disagreeing well: you end up not actually believing your own opinion nearly as much as you thought to begin with. That’s probably why it’s hard to do, because it’s scary to put your belief on the line in an attempt to understand someone else’s viewpoint better. It’s way more work, and it’s for the most part a relationship-building event, with the logical discussion coming in after a long time and sporadically. In particular you can’t plan it and you won’t know how long it will take or even if it will work. I think, though, that to have the most interesting and provocative discussions, we need to do it anyway, even though for the most part you end up more confused than convinced, or convincing.

What about you? How do you disagree well? How do you take out your exception clause and how do you convince other people to do the same?

Categories: rant

Cool example of Bayesian methods applied to education

August 2, 2011 Cathy O'Neil, mathbabe 1 comment

My friend Matt DeLand teamed up recently with Jared Chung to enter a data mining hacking contest sponsored by Donors Choose, which is a well-known online charity connecting low-income classrooms across the country to donors who get to choose which projects to support.

Their goal was to figure out how many of the thousands of projects up for funding were directly related to career preparation, and they performed a nifty Bayesian analysis to do it. Turns out it’s less than 1%!

Here’s their report. It’s really well explained in the 5-page pdf, if you have a few minutes.

Speaking of Donors Choose, it was featured at a HackNY Summer Fellows event I went to last week. The Summer Fellows is essentially like the math camp I taught at for high school students except it’s a computer camp for college students – same level of nerdy loveliness though. The event was a showcase for the fantastically nerdy student hackers, and there were some very impressive exhibits.

The hack involving Donors Choose shows a movie, on a big map of the country, of donations traveling from the donor’s location to the classroom that’s benefitting; played quickly from 2005 or so, it really exhibits how fast the concept grew. It’s not unlike this visualization of the history of the world through the lens of Wikipedia.

Categories: data science, math education, news