Archive for December, 2011

A New Year’s resolution you can keep

Ladies, I know what you’re thinking. But don’t do it. If you need any more proof of the impossibility of losing weight, look no further than this New York Times article which describes it in excruciating, painful detail.

But that’s not even my angle. I’m not going to argue that you shouldn’t try to lose weight because it’s so hard. In fact, as the article describes, it’s possible for some people, as long as they are willing to think about nothing else for the rest of their lives. But even if it weren’t hard, even if it were an achievable goal, I’d still be arguing against it.

My angle is this: it’s just not interesting enough. You have better things to do than devote yourself to vanity. Plus, I’ve said it before and I’ll say it again: sexy is something you do, not something you look like.

Do you know how boring those people must be who think about food all the time? Have you ever spent time, real time, with someone who is singularly obsessed with food or exercise? If you haven’t, I’ve got news for you: it’s insanely boring. They can only talk about their plan and how it’s going. And if you’re one of those people, I’ve got news for you too: you are insanely boring.

Get a hobby that involves other people, that gives you a higher sense of purpose. If you’re a lefty, join your local Occupy Wall Street group. Start writing a blog. Start a book club. Read stuff.

Here’s a story I’ve heard: some pollster asked a bunch of people what they’d do if they could do anything. It was left as an open question like that: if you could do anything, what would you do? And the majority of women said they’d lose weight (on average 10 pounds). WHAT?! They could have mentioned closing the income gap? Stopping wars? Slowing climate change? Making sure people everywhere had access to clean water? And they chose to lose freaking 10 pounds, are you kidding me?

Actually it’s probably a myth, but it doesn’t really matter, because I believe it. And looking around at the world we live in, with all of the ridiculous assumptions of vanity and, what gets me even more riled up, time spent on crap like that, I can believe that women are typically so bombarded with self-image issues that they can’t think beyond them to bigger issues like wars and clean water and global warming. WTF, culture??

So my New Year’s resolution this year is to fight back against that crap. I’m starting today, one day early, by writing this post. Can I get a fuck yeah?!

I am officially out of the fat ladies’ closet, in full armor, ready to fight against the idea that we have any time to waste on vanity. We have way too much to do, ladies (and gentlemen), and not enough time to do it all. Let’s start now. For starters, I’m going to emulate a woman I read about a few years ago on Overheard in New York: “Mom, yeah, I got a Ph.D., I live in my own apartment in Manhattan, and I’ve got a fat ass. How about it?”

I am not ranting against humanity here. I think humanity can be great. But not when our culture is encouraging us to gaze at our navels. We need to actively create greatness, which we can do, but it takes thinking beyond ourselves and asking, what do I really want to do this year?

Categories: rant

Matt Stoller explains politics

I’ve never understood politics, partly because they’re complicated, partly because the people who do understand politics are so heavily involved they don’t know how to contextualize for people like me. I’ve come to think of it as a lot like finance, where there’s power to be had by withholding information, and part of that power is wielded simply by inventing a new vocabulary that makes people on the outside feel tired and hopeless. You really need a tour guide, a translator, to walk you through stuff to achieve a decent level of understanding.

I now consider Matt Stoller my personal translator. Matt regularly contributes to Naked Capitalism, my go-to blog for informed, vitriolic insights into the corrupt world of finance. His recent post on Naked Capitalism concerning Ron Paul and liberals beautifully explains how confused modern liberals are when confronted by someone like Ron Paul, who is both unattractive and on their side for a number of reasons. I confess that I’ve been that confused liberal myself at many an #OWS Alternative Banking Working Group meeting, when the Ron Paul fans come and talk about Fed transparency.

But Matt doesn’t just tell a good story, although he does that. He also gives you insight into the process of politics. He peppers his story with helpful, nerdy explanations like this:

An old Congressional hand once told me, and then drilled into my head, that every Congressional office is motivated by three overlapping forces – policy, politics, and procedure. And this is true as far as it goes. An obscure redistricting of two Democrats into one district that will take place in three years could be the motivating horse-trade in a decision about whether an important amendment makes it to the floor, or a possible opening of a highly coveted committee slot on Appropriations due to a retirement might cause a policy breach among leadership. Depending on committee rules, a Sub-Committee chairman might have to get permission from a ranking member or Committee Chairman to issue a subpoena, sometimes he might not, and sometimes he doesn’t even have to tell his political opposition about it. Congress is endlessly complex, because complexity can be a useful tool in wielding power without scrutiny. And every office has a different informal matrix, so you have to approach each of them differently.

Another recent Stoller post that really blew my mind was How the Federal Reserve Fights, which explained Matt’s experiences as a Senior Policy Advisor to Alan Grayson, a congressman on the Financial Services Committee in 2009-2011. Grayson teamed up with Ron Paul to force more transparency at the Fed. It’s an awesome story, but my favorite part, because I’m such a nerd and I love my nerd heroes, is the following:

When it gets down to crunch time, as a staffer going up against a big force of lots of lawyers, you get really tired and cut corners. One obstacle in legislating is that it is really hard to tell what bills do, because they have multiple provisions like “In Section 203, delete ‘do’ and replace with ‘shall’.” You have to constantly reference pieces of the code and compare changes, which gets confusing. It’s like doing “track changes”, but on paper and with multiple versions. This is a problem software could easily solve and I’ve heard that agencies (and probably the Fed) have such software. But I didn’t. So the Fed thought we would do nothing more than a cursory reading of Watt’s amendment, and rely on their validators who told us the amendment would increase transparency. And this is where Grayson showed legislative genius. We were exhausted, but he got all the different pieces of the law, and spent a few hours deciphering exactly what this amendment meant. And he figured out that not only did the amendment not open up the Fed facilities to independent inspection, it actually increased the secrecy of the Fed. If you want the gory details, here’s Grayson’s argument during the markup.
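The “track changes on paper” problem in that quote really is the kind of thing a few lines of code can handle. Here’s a minimal sketch, assuming amendments arrive as simple “delete X and replace with Y” instructions (real bills use many other instruction forms) and using an invented Section 203 whose wording is hypothetical. It applies the instruction to a copy of the law and then uses Python’s standard-library `difflib` to show exactly which words changed.

```python
import difflib
import re

# Hypothetical statute text: the section number echoes the quote, but the
# wording is invented for illustration and is not from any real bill.
current_law = {
    "203": "The Board do publish an audit of each facility within 90 days.",
}


def apply_strike_insert(law, section, strike, insert):
    """Apply one 'delete X and replace with Y' instruction to a copy of `law`.

    Matches whole words only, and refuses to guess if the struck word does not
    appear exactly once, since an ambiguous instruction needs a human reader.
    """
    text = law[section]
    pattern = re.compile(r"\b" + re.escape(strike) + r"\b")
    hits = len(pattern.findall(text))
    if hits != 1:
        raise ValueError(f"{strike!r} appears {hits} times in section {section}")
    amended = dict(law)
    # A function replacement stops re.sub from interpreting backslashes in `insert`.
    amended[section] = pattern.sub(lambda _match: insert, text, count=1)
    return amended


def word_changes(old_law, new_law):
    """Yield (section, changed words) for every section the amendment touched."""
    for section in sorted(old_law):
        if old_law[section] != new_law.get(section):
            diff = difflib.ndiff(old_law[section].split(),
                                 new_law.get(section, "").split())
            yield section, [line for line in diff if line.startswith(("- ", "+ "))]


amended_law = apply_strike_insert(current_law, "203", "do", "shall")
for section, changes in word_changes(current_law, amended_law):
    print(f"Section {section}: {changes}")
```

The point of keeping the original law untouched and diffing against it is that you can stack many amendments and still see, word by word, what the final text does differently, which is exactly the reading Grayson did by hand.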

I’m kind of wishing Matt Stoller would write a book about “How Politics Works,” but then again does anybody read books anymore? Is it better for him to just continue to write timely blog posts? I’ll take what I can get.

Categories: #OWS, finance

Information loss

When people ask me why the financial system is so complicated, I always say the same thing: because it benefits the insiders of the financial system to make it that way. The more complicated and opaque something is, the more opportunities to extract fees and withhold information. Or rather, to withhold information in order to extract fees.

In some sense you can think of the financial system as a huge “information loss” system, where people get paid based on how much more information they know than you do. Incidentally, this theory flies in the face of most economic assumptions of transparency, and explains the origin of the phrase “dumb money.” And it’s not my idea; it’s kind of an elemental fact for insiders. I’m bringing it up because I want to make sure people are aware of it.

As an analogy, think of the situation when you buy a used car from someone. They tell you some things, like its make and model, and they may let you test drive it, but you end up not knowing how many accidents it’s been in, and stuff like that. Your partial information in general lets them make money.

It’s kind of understandable why there’s so much insider trading going on. Insider trading is the ultimate and most efficient way to profit from information.

Another good example of information loss is with mortgages, and mortgage-backed securities. The original idea behind securitizing mortgages was that investors get to buy pieces of pools of mortgages, which “behave better” than individual mortgages: whereas an individual can refinance (and often does, when interest rates go down) or default, it’s less likely that a majority of the people in a pool refinance or default.
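To see why pools “behave better,” and what that depends on, here’s a small simulation sketch. The default probabilities are hypothetical, chosen only to make the contrast visible. A single loan either defaults or doesn’t, but the default *fraction* of a pool of independent loans barely moves from year to year. The catch is the word independent: if a shared “bad year” hits every loan at once, the pool’s default rate swings around again no matter how many loans you add.

```python
import random
import statistics


def simulate_pool_default_rates(n_loans, n_trials, p_normal, p_bad=None,
                                bad_year_prob=0.0, seed=0):
    """Fraction of loans in a pool that default, for each of `n_trials` simulated years.

    With p_bad=None every loan defaults independently with probability p_normal.
    Otherwise, with probability bad_year_prob a shared "bad year" raises every
    loan's default probability to p_bad at once, so defaults are correlated.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        bad_year = p_bad is not None and rng.random() < bad_year_prob
        p = p_bad if bad_year else p_normal
        defaults = sum(rng.random() < p for _ in range(n_loans))
        rates.append(defaults / n_loans)
    return rates


# Hypothetical probabilities, chosen only to make the contrast visible.
single = simulate_pool_default_rates(1, 1000, p_normal=0.05)
pool = simulate_pool_default_rates(500, 1000, p_normal=0.05)
correlated_pool = simulate_pool_default_rates(500, 1000, p_normal=0.05,
                                              p_bad=0.30, bad_year_prob=0.10)

for name, rates in [("single loan", single), ("independent pool", pool),
                    ("correlated pool", correlated_pool)]:
    print(f"{name:17s} mean={statistics.mean(rates):.3f} "
          f"stdev={statistics.pstdev(rates):.3f}")
```

For independent loans the spread of the pool’s default rate shrinks roughly like one over the square root of the pool size; the correlated case shows that this shrinkage is an assumption about the loans, not a law of nature.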

[Let’s ignore for now the issue that the banks got so high on the profits of securitization that the assumption of better behavior of pools got thrown out the window as the underlying mortgages became worse and worse, a glaring example of information loss.]

In selling these pools, the banks were charging fees so that you, the investor, wouldn’t have to “deal with the details” of all of the individual mortgages in the pool. This is one way that people withhold information and charge a fee for it, by calling it a chore.

And it is a chore, if you actually do it. However, in the case of mortgages, lots of banks charged that fee for that chore and then never actually did the chore: they kept terrible accounts of the mortgages, and when the mortgages started to default in large numbers, the banks started illegally pretending their papers were in order, through “robo-signing,” in order to foreclose quickly.

Here’s something you can do if you have a mortgage: demand to see your mortgage note. It turns out there’s a legal way for you to ask your bank to trace the ownership of your mortgage through the securitization system, and you can do it for fun; you don’t need to be late on your mortgage payments or anything.

There does seem to be a risk associated with asking to see your note, however, namely to your credit score, which is bullshit. There’s also a form letter of complaint if your bank somehow doesn’t come up with the answer.

Categories: #OWS, finance

Economist versus quant

There’s an uneasy relationship between economists and quants. Part of this stems from the fact that each discounts what the other is really good at.

Namely, quants are good at modeling, whereas economists generally are not (I’m sure there are exceptions to this rule, so my apologies to those economists who are excellent modelers); they either oversimplify to the point of uselessness, or they add terms to their models until everything works but by then the models could predict anything. Their worst data scientist flaw, however, is the confidence they have, and that they project, in their overfit models. Please see this post for examples of that overconfidence.
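Here’s a toy sketch of the “add terms until everything works” failure, on synthetic data (a hypothetical straight-line truth plus noise). A two-parameter line is compared with a polynomial that has one parameter per data point. The second model fits the training data perfectly, which is exactly why its perfect fit tells you nothing; the honest test is how it does on points it never saw.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (two parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: one parameter per
    data point, so it reproduces the training data exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    """Hypothetical underlying relationship used to generate the synthetic data."""
    return 2.0 + 0.5 * x


rng = random.Random(1)
train_x = [float(i) for i in range(10)]
test_x = [i + 0.5 for i in range(9)]  # held-out points between the training points
train_y = [truth(x) + rng.gauss(0, 1) for x in train_x]
test_y = [truth(x) + rng.gauss(0, 1) for x in test_x]

for name, fit in [("straight line", fit_line), ("interpolant", fit_interpolant)]:
    model = fit(train_x, train_y)
    print(f"{name:13s} train MSE={mse(model, train_x, train_y):8.3f} "
          f"held-out MSE={mse(model, test_x, test_y):8.3f}")
```

The interpolant’s training error is zero by construction; compare the held-out column to see how much that “perfect” fit is actually worth on new data.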

On the other hand, economists are good at big-picture thinking, and are really really good at politics and influence, whereas most quants are incapable of those things, partly because quants are hyper aware of what they don’t know (which makes them good modelers), and partly because they are huge nerds (apologies to those quants who have perspective and can schmooze).

Economists run the Fed, they suggest policy to politicians, and generally speaking nobody else has a plan so they get heard. The sideline show of the two different schools of mainstream economics constantly at war with each other doesn’t lend credence to their profession (in fact I consider it a false dichotomy altogether) but again, who else has the balls and the influence to make a political suggestion? Not quants. They basically wait for the system to be set up and then figure out how to profit.

I’m not suggesting that they team up so that economists can teach quants how to influence people more. That would be really scary. However, it would be nice if they teamed up so that the underlying economic models are either reasonably fitted to the data or discarded, and so that the confidence of the models’ predictions is better known.

To that end, Cosma Shalizi is already hard at work.

Generally speaking, economic models are ripe for an overhaul. Let’s get open source modeling set up; there’s no time to lose. For example, in the name of opening up the Fed, I’d love to see its unemployment prediction model released to the public, along with the data used to train it, and along with a metric of success that we can use to compare it to other unemployment models.
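What a published “metric of success” could look like, in its simplest form: score every model on the same held-out periods with the same error measures. This is a minimal sketch; RMSE and MAE are just two common choices (not anything the Fed has committed to), and the numbers are made-up toy values, not real unemployment data.

```python
import math


def rmse(forecast, actual):
    """Root mean squared error: penalizes big misses more than small ones."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def mae(forecast, actual):
    """Mean absolute error: the typical size of a miss, in the data's own units."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def leaderboard(forecasts, actual):
    """Score every model on the same held-out periods, best (lowest RMSE) first."""
    for name, forecast in forecasts.items():
        if len(forecast) != len(actual):
            raise ValueError(f"{name} does not cover the same periods as the data")
    scores = [(name, rmse(f, actual), mae(f, actual)) for name, f in forecasts.items()]
    return sorted(scores, key=lambda row: row[1])


# Made-up toy numbers purely to exercise the code; not real unemployment data.
actual = [1.0, 2.0, 3.0]
forecasts = {"model_a": [1.0, 2.1, 3.0], "model_b": [1.4, 2.4, 3.4]}
for name, model_rmse, model_mae in leaderboard(forecasts, actual):
    print(f"{name}: RMSE={model_rmse:.3f} MAE={model_mae:.3f}")
```

The design choice that matters is less the specific metric than the rule that every model is scored on identical, publicly available held-out data.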

Is Stop, Question and Frisk racist?

A few weeks ago I was a “data wrangler” at the first Data Without Borders datadive weekend. My group of volunteer data scientists was exploring the NYPD “Stop, Question and Frisk” data from the previous few years. I blogged about it here and here.

One thing we were interested in exploring was the extent to which this policy, whereby people can be stopped, questioned, and frisked for merely looking suspicious (to the cops), is racist. This is what I said in my second post:

We read Gelman, Fagan and Kiss’s article about using the Stop and Frisk data to understand racial profiling, with the idea that we could test it out on more data or modify their methodology to slightly change the goal. However, they used crime statistics data that we don’t have and can’t find and which are essential to a good study.

As an example of how crucial crime data like this is, if you hear the statement, “10% of the people living in this community are black but 50% of the people stopped and frisked are black,” it sounds pretty damning, but if you add “50% of crimes are committed by blacks” then it sounds less so. We need that data for the purpose of analysis.

Why is crime statistics data so hard to find? If you go to NYPD’s site and search for crime statistics, you get very little information, and it is not broken down by area (never mind x and y coordinates) or ethnicity. That stuff should be publicly available. In any case, it’s interesting that the Stop and Frisk data is publicly available but the crime stats data isn’t.
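The arithmetic behind the 10%/50% example in that quoted post is simple, and writing it out shows why the benchmark matters so much. This is only a sketch of the ratio itself, not the method of the Gelman, Fagan and Kiss article, which is considerably more careful; the 50% crime share is the hypothetical figure from the example.

```python
def representation_ratio(share_of_stops, benchmark_share):
    """How over- (>1) or under- (<1) represented a group is among stops,
    relative to its share of whichever benchmark population you pick."""
    if not 0 < benchmark_share <= 1:
        raise ValueError("benchmark_share must be a fraction in (0, 1]")
    return share_of_stops / benchmark_share


share_of_stops = 0.50  # the 50% figure from the example
vs_population = representation_ratio(share_of_stops, 0.10)  # 10% of residents
vs_crimes = representation_ratio(share_of_stops, 0.50)      # hypothetical 50% of crimes
print(f"relative to population: {vs_population:.1f}x")
print(f"relative to share of crimes: {vs_crimes:.1f}x")
```

Same stops, same data, but a five-fold over-representation against one benchmark and none against the other: the conclusion lives in the choice of denominator, which is why the missing crime data is essential.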

I still think it is outrageous that we don’t have open source crime statistics in New York, where Bloomberg claims to be such a friend to data and to openness.

And I also still think that, in order to prove racism in the strict sense of the above discussion, we need that data.

However, my overall opinion has changed about whether we have enough data already to say if this policy is broadly racist. It is. My mind changed reading this article from the New York Times a couple of weeks ago. It was written by a young black man from New York, describing his experiences first-hand being stopped, questioned, and frisked. The entire article is excellently written and you should take a look; here’s an excerpt:

For young people in my neighborhood, getting stopped and frisked is a rite of passage. We expect the police to jump us at any moment. We know the rules: don’t run and don’t try to explain, because speaking up for yourself might get you arrested or worse. And we all feel the same way — degraded, harassed, violated and criminalized because we’re black or Latino. Have I been stopped more than the average young black person? I don’t know, but I look like a zillion other people on the street. And we’re all just trying to live our lives.

The argument for this policy is that it improves crime statistics. For some people, especially if they aren’t young and aren’t constant targets of the policy, it’s probably a price worth paying to live in a less crime-ridden area.

And we all want there to be less crime, of course, but what we really want is something even more fundamental, which is a high quality of life. Part of that is not being victimized by crooks, but another part of that is not being (singled out and) victimized by authority either.

I think a good thought experiment is to consider how the policy could be made colorblind. One obvious way is to have cops in every neighborhood performing stop, question and frisk on random people. The argument against this is, of course, that we don’t have enough cops or enough money to do something like that.

Instead, to be more realistic about resources, we could have groups of cops randomly assigned to neighborhoods on a given day for such stops. If you think the policy is such a good crime deterrent, then you can even weight the probability of choosing a given neighborhood by its crime rate. (As an aside, I would love to see whether there’s statistically significant reason to believe that this policy does, in fact, deter crime. So often mayors and policies take credit for lowered crime rates in a given city when in fact crime rates are going down all over the country as part of a nationwide trend.) So in this model the cops are more likely to land in a high-crime area, but eventually, by the laws of statistics, they will visit every neighborhood.
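The weighted random assignment described above takes only a few lines to sketch. The neighborhoods and crime rates here are hypothetical, invented for illustration. Each day one neighborhood is drawn with probability proportional to its crime rate, and since a daily draw with probability p is first successful after 1/p days on average, you can also compute how long the low-crime neighborhood typically waits for its turn.

```python
import random
from collections import Counter

# Hypothetical neighborhoods and relative crime rates, invented for illustration.
crime_rate = {"Neighborhood A": 8.0, "Neighborhood B": 4.0, "Neighborhood C": 1.0}


def assign_stop_days(crime_rate, n_days, seed=0):
    """Pick one neighborhood per day, with probability proportional to its crime rate."""
    rng = random.Random(seed)
    names = list(crime_rate)
    return rng.choices(names, weights=[crime_rate[name] for name in names], k=n_days)


def expected_days_until_first_visit(crime_rate, name):
    """Mean wait before `name` is first chosen: 1/p for a daily draw with probability p."""
    return sum(crime_rate.values()) / crime_rate[name]


schedule = assign_stop_days(crime_rate, n_days=365)
print(Counter(schedule))
print(expected_days_until_first_visit(crime_rate, "Neighborhood C"))
```

With these made-up weights, the lowest-crime neighborhood gets about one day in thirteen, which is rare but very far from never.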

My guess is that the first day the Upper East Side is randomly chosen, and a white hedge fund manager is stopped, questioned, and frisked by a cop who takes away his key and enters his apartment, terrorizing his family while he’s handcuffed in the back of a cop car, will be the very last day this policy is in place.

Categories: data science

A good data scientist is hard to find

As a data scientist at an internet start-up, I am something of a quantitative handyman. I go where there is need for quantitative thinking. Since the business model of my company is super quantitative, this means I have lots of work. I have recently sorted the kinds of things I do into four bins:

  1. I visualize data for business people to digest. This is a kind of fancy data science-y way of saying I design reports. It’s actually a hugely critical part of the business, since our clients are less quantitative than we are and need to feel like they understand the situation, so clear, honest, and easily digestible visuals are a priority.
  2. I forecast behavior using models. This means I forecast what users on a website will do, based on their attributes and historical precedent for what people who shared their attributes did in the past, and I also do things like stress test the business itself, in order to answer questions like, what would happen to our revenue stream if one of our advertisers jumped out of the auction?
  3. I measure. This is where old-school statistics comes in: deciding whether things are statistically significant and what our confidence intervals are. It’s related to reporting as well, but it’s a separate task.
  4. I help decide whether business ideas are quantitatively reasonable. Will there be enough data to answer this question? How long will we need to collect data to have a statistically significant answer to that? This is kind of like being a McKinsey consultant on data steroids.
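Bins 3 and 4 often come down to two textbook calculations: how uncertain is the rate I measured, and how much data would I need to pin it down? Here’s a minimal sketch using the normal-approximation (Wald) interval for a proportion, with hypothetical click counts. It is the simplest version and it breaks down for small counts or fat-tailed data, which is exactly when someone needs to say so out loud.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a rate such as a click-through rate."""
    p = successes / n
    half_width = z_score(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def sample_size_for_margin(p_guess, margin, confidence=0.95):
    """Observations needed for the Wald interval's half-width to be at most `margin`."""
    return math.ceil(z_score(confidence) ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Hypothetical numbers: 50 clicks out of 5,000 impressions.
low, high = proportion_ci(50, 5000)
print(f"click-through rate between {low:.4f} and {high:.4f}")
# How many impressions to pin a ~1% rate down to +/- 0.1 percentage points?
print(sample_size_for_margin(0.01, 0.001))
```

The second answer, tens of thousands of impressions, is the kind of number that tells you whether a proposed business question can be answered in a week or in a year.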

So why is it so hard to find a good data scientist?

Here’s why. Most data scientists don’t really think that 3 and 4 above are their job. It is far less sexy to try to honestly find the confidence interval of a prediction than it is to model behavior. Data scientists are considered magical when they forecast behavior that was hitherto unknown, and they are considered total downers when they tell their CEO, hey there’s just not enough data to start that business you want to start, or hey this data is actually really fat-tailed and our confidence intervals suck.

In other words, it’s something like what the head of risk management had to face at a big bank taking risks in 2007. There’s a responsibility to warn people that too much confidence in the models is bad, but then there’s the political reality of the situation, where you just want to be liked and you don’t actually have the power to stop the relevant decisions anyway. And there’s the added issue in a start-up that they are your models, and you want them to be liked (and to be invincible).

It’s far easier to focus on visualizing and modeling, or, to be even sexier and more mystical, on modeling alone, and let the business make decisions that might ultimately not work out, or act on data that’s pure noise.

How do you select for a good data scientist? Look for someone who speaks clearly and directly and emphasizes skepticism. Look for someone who is ready to vent about how people trust models too much, and who is pushy enough to speak up at a meeting and be that annoying person who holds people back from drinking too much Kool-Aid.

Categories: data science

Steam queen

I’m writing today for all those people who are about to receive, or who have just received, fancy espresso machines with milk steaming functionality but who have no idea how to actually steam milk. You too can become a steam queen (or king)! A little background first.

When I wrote earlier about my friend the coffee douche, I mentioned that in high school I worked as a barista at Coffee Connection in Lexington center (Massachusetts). At the time I was considered kind of fancy (or at least I considered myself kind of fancy) because I knew the difference between a cappuccino (just foam) and a cafe au lait (foam and steamed milk). Just to be clear, there was no such thing back then as a “latte”, and the sizes were, “small”, “medium”, and “large,” and nobody used the word “barista”.

It was a pretty repetitive job, but I liked it. I liked the hustle and bustle and meeting all the strange people who would be grumpy or friendly, who would talk to me like a human or order me about. I got to know lots of people that way whom I never would have met otherwise. I enjoyed explaining the different roasts and beans, and asking people about their tastes to try to match them with coffee beans. I was a kind of coffee douchy matchmaker.

To make things more interesting for myself and the customers, I’d compute people’s bills in my head, to the penny. Massachusetts sales tax back then was 5% so it’s not as hard as it sounds (now it’s 6.25%, what a pain). As long as you can add things up in your head, and know the cutoffs for rounding that the cash register uses, then it’s a piece of cake. I did enjoy every now and then telling people that if they bought their two cappuccinos separately, instead of together, then they’d save a penny. If they asked me how I figured it out I’d say, after all it is a step function, so it stands to reason!
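The penny trick works because rounded tax is a step function of the price, so the tax on two items added up separately can differ from the tax on their sum. Here’s a sketch in integer cents. It assumes the register rounds half up to the nearest cent on each transaction, which may not match the actual register cutoffs, and the $1.25 and $1.10 prices are hypothetical, not the real menu.

```python
def tax_cents(price_cents, rate_basis_points=500):
    """Sales tax in cents, rounded half up to the nearest cent.

    ASSUMPTION: round-half-up on each transaction; real register cutoffs may
    differ. rate_basis_points=500 means 5.00%, and 625 would mean 6.25%.
    """
    return (price_cents * rate_basis_points + 5000) // 10000


def total_cents(prices, separately, rate_basis_points=500):
    """Total paid for `prices`, rung up as separate sales or as one sale."""
    if separately:
        return sum(p + tax_cents(p, rate_basis_points) for p in prices)
    subtotal = sum(prices)
    return subtotal + tax_cents(subtotal, rate_basis_points)


two_cappuccinos = [125, 125]  # hypothetical $1.25 each
print(total_cents(two_cappuccinos, separately=False))  # 263
print(total_cents(two_cappuccinos, separately=True))   # 262
```

At $1.25 each, the 6.25 cents of tax rounds down twice when rung up separately, but the combined 12.5 cents rounds up once. The effect can go the other way too: at a hypothetical $1.10 each, buying together is the penny-cheaper option.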

I also enjoyed the manual labor of it, and on a wet day, when the light grey stone floor would get filthy with mud people had tracked in, I enjoyed mopping it clean after everyone had left, listening to Allman Brothers, Tracy Chapman, and Sinead O’Connor mixtapes really loud.

Now to the point of this post: I got really good at steaming milk. In fact I formally designated myself the Steam Queen of Coffee Connection, and as far as I know nobody has ever challenged me on that. Let me tell you the secret to awesome steamed milk. It’s essentially an interplay between fat content and temperature.

First, use really cold milk, and please don’t let it be skim. Yes, we all think you look really good in your size 2 leather pants, but if you want steamed milk with those leather pants then you should just go for lowfat and spend an extra hour at the gym or something (or whatever it is you do). Because the crucial yumminess of excellently steamed milk bubbles is, you guessed it, butter fat.

If you can go with whole milk, you won’t even need these instructions because whole milk will practically steam itself if left near a steaming apparatus. Come to think of it, steaming half and half should probably be left to small children exclusively as an ego booster.

So I hope I’ve made my point: lowfat milk, at least, and super cold. Now put it into one of those silver cans. For some reason you really do need a silver metallic can; it doesn’t work as well with ceramic cups, probably because you are less aware of the internal temperature. Fill it between halfway and two-thirds of the way, so about 60% full. The process of steaming will expand it to fill the can.

The key is that it’s easier to make excellent bubbles when the milk is cold. So do that first: put the steaming nozzle, which is on full blast, just below the surface of the milk, as high as possible without it spitting milk out of your can. You want to create a hydrodynamical feedback loop, where the milk is rotating below and around the nozzle tip, luxuriating in its steaming process. As the milk gets steamed, it expands, so be sure to lower the can to keep the nozzle just below the top.

You want small bubbles, but don’t worry about a few large bubbles, we will deal with them later. Focus on creating that feedback loop, until the expansion is done, and the can is full.

A huge mistake I commonly see is that people think they’re done at this point. You’re not done! The milk below the bubbles is still relatively cold, and nobody wants to drink a cold latte. This is the time to put the nozzle to the very bottom of the can and use your fingers on the can to determine when the milk is sufficiently hot. People are way too impatient at this stage. Wait til it’s hot (by the way, contrary to the advice you may receive from various sources, you don’t need a thermometer for this if you can use your sense of touch)!

Finally, one more thing. Take out the nozzle and let the can sit for between 30 seconds and 2 minutes next to the coffee machine, and in the meantime get the coffee cup and espresso ready. When everything is ready, pick up the can of steamed milk an inch, and drop it to the counter once, firmly. This pops the big bubbles that haven’t popped themselves, and leaves you only delicious little scrumptious bubbles for your delicious latte. Yum!

Categories: rant