Archive for May, 2012

Best case/worst case: Medicine 50 years from now

Best Case

The scientific models and, when possible, the data have been made available to the wider scientific community for vetting. Incorrect or non-robust results are questioned and thrown out by that community; interesting and surprising new results are re-tested on larger data sets, iteratively and under different conditions, to test for universality.

The result is that a person, with the help of their doctor and a thorough exam and information-gathering session, and with their informed consent to use this data for their benefit, will have a better idea of what to watch out for in terms of health risks, how to prevent certain diseases that they may be vulnerable to, and how the tried-and-true medicines would affect them.

For example, in spite of the fact that Vioxx gives some people heart attacks, it also really helps other people with joint pain that aspirin or ibuprofen can’t touch. But which people? In the future we may know the answer to this through segmentation models, which group people by their attributes (which could come under the category of daily life conditions, such as how much someone exercises, or under the category of genetic profile).
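To make the idea of a segmentation model concrete, here is a minimal sketch in Python. It is only an illustration of grouping people by attributes and comparing outcomes per group, not a real medical model: the records are invented toy data, and the attribute names (exercise level, a genetic marker flag) and the function `segment_outcomes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records, invented purely for illustration (NOT real patient data).
# Each record: (exercise level, genetic marker present?, drug helped?, adverse event?)
records = [
    ("high", False, True, False),
    ("high", False, True, False),
    ("high", True, True, True),
    ("low", False, True, False),
    ("low", True, False, True),
    ("low", True, False, True),
]

def segment_outcomes(rows):
    """Group rows by (exercise, marker) and return per-segment outcome rates."""
    counts = defaultdict(lambda: {"n": 0, "helped": 0, "adverse": 0})
    for exercise, marker, helped, adverse in rows:
        seg = counts[(exercise, marker)]
        seg["n"] += 1
        seg["helped"] += int(helped)
        seg["adverse"] += int(adverse)
    # Convert raw counts into rates for each segment.
    return {
        key: {
            "n": seg["n"],
            "helped_rate": seg["helped"] / seg["n"],
            "adverse_rate": seg["adverse"] / seg["n"],
        }
        for key, seg in counts.items()
    }

summary = segment_outcomes(records)
for key in sorted(summary):
    print(key, summary[key])
```

A real model would need far more data, many more attributes, and proper statistical validation before any per-segment rate could be trusted; the sketch only shows the shape of the question "which people?".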

For example, we recently learned that exercise is not always good for everyone. But instead of using that unlikely possibility as an excuse not to do any exercise, we might be able to look at a given profile and tell a person whether they are in the clear and what kind of exercises would be most beneficial to their health.

It wouldn’t solve every problem; people would still die, after all. But it could help people live happier and healthier lives. It depends on the open exchange of ideas among scientists as well as strong regulation about who owns personal data and how it can be used.

Worst Case

The scientific community continues its practice of essentially private data collection and models. Scientific journals increasingly become places where paid Ph.D.s, backed by pharmaceutical companies and insurance companies, boast about their latest breakthroughs with no cultural standard of evidence.

Indeed there is progress in segmentation models for disease and medicine, but the data, models, and results are owned exclusively by corporations, specifically insurance companies. This leads to a death spiral in modeling, where the very people who are vulnerable to disease and need medicine or treatment the most are priced out of the insurance system and no longer have access to anything resembling reasonable medical care, even for chronic diseases such as diabetes.

And you won’t need to give your consent for those insurance companies to use your data – they will have already bought all the data that they need to know about you from data collectors, which have been gleaning information about you from your online presence since birth. These companies will know everything about you; they control and sell your data for extra profit. To them, you represent a potential customer and a potential cost, a risk/return profile like any other investment.

Categories: data science

How to talk conservative

I finished reading Jonathan Haidt’s “The Righteous Mind: Why Good People are Divided by Politics and Religion” and I have to say, I got a lot out of it. Even if his various positions are just approximations to the truth, they’re interesting to consider. Near the end he talks about religion and “groupishness,” and how people are too focused on the technical aspects of religious beliefs rather than what a religion accomplishes in a community, which he claims is its main benefit.

But what I found more interesting is the beginning of the book when he discusses the different moral make-up of liberals and conservatives (and libertarians) in this country. Namely, he claims that liberals care primarily about the following three things:

  1. caring for the vulnerable or victimized,
  2. the concept of oppression from bullies – or conversely the concept of liberty, and
  3. the concept of proportional fairness (you deserve a part of the pie since you helped make it, but you wouldn’t deserve any if you hadn’t helped).

By contrast, conservatives care about a larger set of six things, the above three as well as:

  1. the concept of sanctity,
  2. the concept of authority – when it’s just and those in power take proper responsibility, and
  3. the concept of loyalty.

I took away three points. First, liberals are bad at guessing what conservatives think, because they are somewhat blind to these last three things, and when they see conservatives go on about them, they assume conservatives don’t care about the first three, which is wrong, although it’s true that they care about them differently (especially proportional fairness: whereas liberals emphasize leaving nobody out, conservatives emphasize not letting people get extra, especially if it comes from their stuff). Second, if I, as a liberal, want to communicate with a conservative, I have to talk about all six of these with some level of understanding. Finally, statistics and other rational arguments only work if the person you’re talking to already agrees with you or if they are exceptionally open-minded – in any case you have to appeal to their morals before going into stats.

With that in mind, here are two rants against the Stop, Question, and Frisk policy, one written for a liberal audience, one for a conservative audience.

Liberal version: First, the Stop, Question, and Frisk policy targets minority men almost exclusively. Second, almost 90% of the stops end without an arrest, which means it’s unwarranted intrusion and bullying; typically the reason given for the stop is a “furtive movement,” which could be absolutely anything. Finally, there is a quota system in the police department which forces each officer to perform these unwarranted searches whether or not there is cause, which inevitably leads them to target the “least likely to complain,” namely young, poor minorities. We need to stop the police abusing their privileges in this way immediately.

Conservative version: What is the difference between a police force and a gang of men who walk around with guns? The answer, in the best of worlds, is authority, intentionality, and the rule of law. Police have an important job to do, which is to protect us and to keep the streets safe. And when they do a good job, we admire them for that and count on them for their protection. But imagine if, instead of being someone you can count on, your neighborhood cop consistently stops you on your way home from school or work and asks you suspicious questions, and sometimes even takes your keys from your pocket, and, while you’re locked in the police car, enters your apartment and terrorizes your family. This would make you feel like you are the bad guy, even though you did nothing wrong. After a while, it would make you and your neighborhood less trusting of the authority of the cops, which would lead to reckless behavior and lawlessness, because your rights are no longer being protected. We need to stop the policy of Stop, Question, and Frisk in order to make sure the police never become just a bunch of bullies with guns.

Categories: musing

When “extend and pretend” becomes “delay and pray”

When banks have non-performing loans, they sometimes don’t want to admit it. So instead of calling it a loss, because the debtor can’t pay, they simply rewrite the contract so that it has been extended. This way the debtor is not technically behind in payments and the creditor can pretend that the corresponding debt on their books is worth something. It’s called extend and pretend, and it’s not new.

And actually, this ploy sometimes works. After all, sometimes the debtor just needs a bit more time – they could be temporarily unable to pay for whatever reason. Indeed it would be a convenient option for people who are just in need of a few more months to get back on their feet and not lose their house (typically this offer is not extended to individuals, since their loans are too small to fret over).

Make no mistake: there is a real incentive for the banks to do this. Currently the worst example of this method is in Spain, where the banks are finding it politically impossible to admit their losses. The government doesn’t want to hear it, because it would need to bail the banks out, and its own borrowing costs are already precariously high. The Eurozone leaders don’t want to think Spain is as badly off as Greece, because they can’t handle that kind of problem. The investors don’t want to hear it because their investments will be worth less once the news comes out (an example of asymmetric information if there ever was one – shouldn’t investors already know how much extending and pretending is really going on?). And of course the lenders themselves don’t want to admit they are working at an insolvent institution, especially when they probably each know other institutions that are even more insolvent.

What are the chances that this method of delay and pray will work for Spain? With an enormous housing bubble and 24% unemployment, not good. Most of the bad loans that have been extended after non-payment are related to the housing market. Half of the lenders are zombies, which means insolvent but still technically open for business. Essentially the numbers are just too high and now everybody knows it (see this Bloomberg article for the low-down on Spain).

So what should Spain be doing?

I like to point to the example of Iceland, which admitted its debts early on (although it has to be admitted they didn’t have much of a choice), defaulted on a bunch of international debt, bailed out their citizens from onerous home debt, and is recovering nicely (see this Bloomberg article for more on Iceland).

Oh, and let me add that they (Iceland) are indicting and jailing the bankers who got them into the mess, to the tune of 200 indictments. Considering the U.S. has a population roughly 981 times as large, that would be equivalent to us indicting about 196,000 bankers. In fact we’ve indicted no top bank executive, although everyone will be relieved to know the SEC “sanctioned” 39 people for the housing market debacle. Phew!
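The per-capita comparison above is a one-line scaling, sketched here in Python. The variable and function names are illustrative assumptions; the two inputs are the figures quoted in the paragraph, with the population ratio taken as the rounded value of 981 rather than an official statistic.

```python
# Figures quoted in the post; the ratio is the post's rounded value (an assumption),
# not an official population statistic.
iceland_indictments = 200
us_to_iceland_population_ratio = 981

def scale_by_population(count, ratio):
    """Scale a count from a small country to a larger one by population ratio."""
    return count * ratio

us_equivalent = scale_by_population(iceland_indictments, us_to_iceland_population_ratio)
print(f"{us_equivalent:,}")
```

With the rounded ratio this comes to 196,200; an unrounded ratio would shift the result slightly.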

Unfortunately, it would be tough for Spain to repeat that act: it depended on the fact that Iceland has control over its economic choices, whereas Spain is part of the Eurozone and as such is embedded in a huge network of agreements and debts and currency with the other Eurozone nations.

In some sense, Spain is being forced into the zombie bank situation by a lack of options. Unless I’m missing something – would love to be wrong!

Categories: finance

Biking in New York City

I’m a huge fan of biking around the city. I like to commute to work, from the Columbia University neighborhood up at 116th and Broadway to just below Houston on Varick. Since both my house and my work are within blocks of the west side of Manhattan, I can bike the whole way along the west side bike path (see, for example, this map).

It’s a gorgeous ride along the Hudson River, and there’s not one day I ride it without appreciating not being stuck in the traffic next to me on the West Side Highway. Okay, actually, last Monday was one, when I got caught in a huge thunderstorm. Luckily I had dry clothes, but for some reason no dry socks (note to self: bare feet with wet leather boots is gross). I’m also happy not to be on the subway (1 line) on Monday mornings when people are extra grumpy about going to work.

I don’t bike when it’s (already) raining, or when it’s icy, and it’s always a bummer when daylight saving time ends, because it means it’s already dark by the time I leave work. But otherwise I am on the lookout for great biking days and opportunities.

A few weeks ago, on the first really gorgeous day of spring, I biked from one Occupy meeting to another, the first one up at Columbia and the second in Union Square (to see my friend Suresh Naidu speak about Radical Economics 101). I biked through Central Park, which was bursting with spring joy, and then all the way to Union Square down Broadway, which now has a beautiful bike lane. The only annoying part was Times Square, which is so full of tourists you have to walk your bike. So that’s a good sign, when the pedestrians are more dangerous than the cars.

And I also bike on other streets, although after being doored a few times and breaking someone’s windshield with my head (a long time ago in Berkeley but still) I am hugely defensive: I pretty much assume every moving car is trying to hit me and every parked car’s door is about to open. Even so, there are quite a few quiet streets I can feel safe biking down, in the middle, and although it’s not very fast, it’s certainly faster than walking. A great way to explore the city.

And I’m not alone: here’s a great essay by David Byrne in a recent New York Times Opinion column entitled “This is How We Ride”. It’s a beautifully written piece, and he describes the joys of biking in the city perfectly. He mentions that there’s a new bike-share initiative starting this summer, where there will be 10,000 bikes for rent at 420 bike stations in Manhattan, Long Island City, and Brooklyn.

That’s awesome, even if I will have to share the bike lane with even more enthusiasts. The rides are limited to 30 minutes, so not a full commute for me, but it means that if I’m already downtown and want to get to the East Side (which is always hard – I like to say that going to the East Side is like going to L.A. in terms of logistical difficulties) I will be able to hop on a bike and cross town. Cool!

Categories: musing

Everybody lies (except me)

There’s an interesting article in the Wall Street Journal from yesterday about lying. The article explains that everybody lies a little bit and that, yes, some people are serious liars, but the little lies are the more destructive because they are so pervasive.

It also explains that people lie only as much as they can get away with in their own minds (besides maybe the out-and-out huge liars, but who knows what they’re thinking?).

When I read this article, of course, I thought to myself, I don’t lie even a little bit! And that kind of proved their point.

So here’s the thing. They also explained that people lie a bit more when they are in a situation where the consequences of lying are more abstract (think: finance) and that they lie more when they are around people they perceive as cheating (think: finance). So my conclusion is that finance is populated by liars, but that’s because of the culture that already exists there: most people just amble in as honest as anyone else and become that way.

Of course, every field has that problem, so it’s really not fair to single out finance. Except it is fair to single out any place where you can cheat easily, where there are ample opportunities to lie and profit off of lies.

One cool thing about the article is that they have a semi-solution, namely to remind people of moral rules right before the moment of possible lying. This can be reciting the Ten Commandments or swearing on a Bible, which for some reason also works for atheists (but wouldn’t stop me from lying!), or could be as simple as making someone sign their name just before lying (or, even better, just before not lying) on their auto insurance forms.

Can we use this knowledge somehow in setting up the system of finance?

The result that people are less likely to lie when they know who the victim of their lie is may explain something about why, back when banks lent out money to people and held the loans on their books, we had less fraud (but not zero fraud, of course). The idea of personally knowing who the other person is in a transaction seems kind of important.

The idea that we make people swear they are telling the truth and sign their name seems easy enough, but it’s obviously not infallible considering the robo-signing stuff. I wonder if we can use more tricks of the honesty trade and do things like make sure each person signing is also being videotaped or something; maybe that would also help.

Unfortunately another thing the article said was that having been taught ethics some time in the past actually doesn’t help. So it’s less to do with knowledge and more to do with habit (or opportunity), it seems. Food for thought as I’m planning the ethics course for data scientists.

Categories: data science, finance, musing

All the good data nowadays is private – what’s the point of having a data science Ph.D.?

I go back and forth on whether there should be an undergrad major or Ph.D. program in data science. On the one hand, I am convinced it’s a burgeoning field which will need all the smart people it can get in the next few years or decades. On the other hand, I’m just not sure how capable academics really are at teaching the required skills. Let me explain.

It’s not that professors aren’t super smart and great at what they do. But the truth is, they typically don’t have access to the kind of data that’s now available to data scientists working at Google or Facebook or other tech companies (see this recent New York Times article on the subject). Even where I work, which is a medium-sized start-up, I have access to data which many academics would kill for. This means I get to play with an incredibly rich resource, assuming I have built up the toolset to do so.

So while academics are creating (unrealistic) models of “influence” based on weird assumptions about how information gets propagated through networks, nerds at Facebook and Google and Foursquare just get to see it happen in real time. There’s an enormous advantage to having the data at your fingertips – you get good results fast. But then since it’s all proprietary you can’t publish it (a topic for another post).

Another thing: since academics typically don’t have this kind of big data, they also don’t have to create tools or methods for taming huge data. Sometimes I hear statisticians say that data science is just statistics, but they are typically missing the point of this “taming” aspect of data science. Namely, if we use state-of-the-art proven statistical methods on 15 terabytes of data and it takes 50 years to come up with an answer, then guess what, it doesn’t work.
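To put rough numbers on the “taming” point, here is a back-of-the-envelope sketch in Python. It assumes 1 terabyte = 10^12 bytes and a 365.25-day year, and the 100 MB/s comparison throughput is a hypothetical figure chosen only for illustration; none of these come from a real benchmark.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length
TERABYTE = 10**12                      # assumption: decimal terabyte

def implied_throughput(data_bytes, years):
    """Bytes per second a method achieves if it needs `years` to finish."""
    return data_bytes / (years * SECONDS_PER_YEAR)

def runtime_days(data_bytes, bytes_per_second):
    """Days needed to make one pass over the data at a given throughput."""
    return data_bytes / bytes_per_second / 86400

data = 15 * TERABYTE

# The scenario from the text: 15 TB takes 50 years.
slow_rate = implied_throughput(data, 50)
print(f"implied throughput: {slow_rate:,.0f} bytes/s")

# Hypothetical comparison: a single streaming pass at 100 MB/s.
fast_days = runtime_days(data, 100 * 10**6)
print(f"one pass at 100 MB/s: {fast_days:.2f} days")
```

Under these assumptions the 50-year method is effectively moving about ten kilobytes per second, while a plain streaming pass finishes in under two days. That gap is what the tools and methods have to close.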

At the same time, data science isn’t purely about algorithmic running time either, and a computer scientist without a good statistical background would be equally wrong if they said that data science is just machine learning.

For that matter, data science also isn’t purely speculative research – there’s a bottom-line business aspect to it, and the intention is (usually) to make profit. But there’s no way someone with a business degree who doesn’t know how to model can be a data scientist either.

End result: To teach data science for reals, you’d need to form an interdisciplinary department across business, computer science, applied math, and statistics. And even if they do collaborate across departments (and by the way, this is a huge “if”: it seems politically impossible in some of the universities I’ve talked to), I’m not sure how well strictly academic departments can really teach the nitty-gritty of data science, because they just don’t have good enough data.

On the other hand, I think it’s a good idea to try, because it is a great opportunity to teach at least some basic stuff and to instill a code of ethics in young data scientists.

The way things work now, the tech industry takes in former mathematicians, physicists, computer scientists, and statisticians and puts them on projects creating models of human behavior (I’ll include finance in that category) that are infinitely scalable and sometimes nearly infinitely scaled. Nobody is ever taught to stop and think about how their models are going to be used and how to think about the long-term effects of their models.

In spite of all the data problems and political obstacles, I feel that for the sake of this conversation, i.e. of personal responsibility of a modeler, we should go ahead and make a program, because it’s important and it isn’t gonna happen in your typical finance firm or tech startup.

Categories: data science

Favorite bands

My 9-year-old’s favorite bands (and favorite songs):

  1. Queen (Bohemian Rhapsody)
  2. AC/DC (Back in Black)
  3. ABBA (Fernando)
  4. Green Day (American Idiot)
  5. Weird Al Yankovic (Canadian Idiot)

Categories: musing