## Good for the IASB!

There’s an article here in the Financial Times which describes how the International Accounting Standards Board is complaining publicly about how certain financial institutions are lying through their teeth about how much their Greek debt is worth.

It’s a rare stand for them (in fact the article describes it as “unprecedented”), and it highlights just how much difference the assumptions in your model can make to the end result:

Financial institutions have slashed billions of euros from the value of their Greek government bond holdings following the country’s second bail-out. The extent to which Greek sovereign debt losses were acknowledged has varied, with some banks and insurers writing down their holdings by a half and others by only a fifth.

It all comes down to whether the given institution decided to use a “mark to model” valuation for their Greek debt or a “mark to market” valuation. “Mark to model” valuations are used in accounting when the market is “sufficiently illiquid” that it’s difficult to gauge the market price of a security; however, they’re often used (as IASB is claiming here) as a ruse to be deceptive about true values when you just don’t want to admit the truth.
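To make the gap concrete, here is a minimal sketch of how the same bond position gets carried under the two approaches. The face value and both prices are made-up numbers, chosen only to mirror the "half" versus "a fifth" write-downs quoted above; they are not figures from any bank's accounts.

```python
def writedown(face_value, price_per_100):
    """Loss booked when a bond held at par is re-marked to price_per_100."""
    carrying_value = face_value * price_per_100 / 100.0
    return face_value - carrying_value

# Hypothetical position: 1 billion euros of face value, originally carried at par.
face_value = 1_000_000_000

# Hypothetical prices: the market quotes 50, an internal model says 80.
mark_to_market_loss = writedown(face_value, 50.0)
mark_to_model_loss = writedown(face_value, 80.0)

print(f"mark to market loss:      {mark_to_market_loss:,.0f}")
print(f"mark to model loss:       {mark_to_model_loss:,.0f}")
print(f"loss left unacknowledged: {mark_to_market_loss - mark_to_model_loss:,.0f}")
```

Same bond, same day, and the reported loss differs by hundreds of millions purely because of which price you decide to believe.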

There’s an amusingly technical description of the mark to model valuation for Greek debt used by BNP Paribas here. I’m no accounting expert but my overall takeaway is that it’s a huge stretch to believe that something as large as a sovereign debt market is illiquid and needs mark to model valuation: true, not many people are trading Greek bonds right now, but that’s because they suck so much and nobody wants to sell them at their true price since then they’d have to mark down their holdings. It’s a circular and unacceptable argument.

In any case, it’s nice to see the IASB make a stand. And it’s an example where, although there are two possible assumptions one can make, there really is a better, more reasonable one that should be made.

That reminds me, here’s another example of different assumptions changing the end result by quite a lot. The “trillion dollar mistake” that S&P supposedly made was in fact caused by them making a different assumption than that which the White House was prepared to make:

As it turns out, the sharpshooters were wide of the target. S&P didn’t make an arithmetical error, as Summers would have us believe. Nor did the sovereign-debt analysts show “a stunning lack of knowledge,” as Treasury Secretary Tim Geithner claimed. Rather, they used a different assumption about the growth rate of discretionary spending, something the nonpartisan Congressional Budget Office does regularly in its long-term outlook.

CBO’s “alternative fiscal scenario,” which S&P used for its initial analysis, assumes discretionary spending increases at the same rate as nominal gross domestic product, or about 5 percent a year. CBO’s baseline scenario, which is subject to current law, assumes 2.5 percent annual growth in these outlays, which means less new debt over 10 years.

Is anyone surprised about this? Not me. It also goes under the category of “modeling error”, which is super important for people to know and to internalize: different but reasonable assumptions going into a mathematical model can have absolutely huge effects on the output. Put another way, we won’t be able to infer *anything* from a model unless we have some estimate of the modeling error, and in this case we see the modeling error involves *at least* one trillion dollars.
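Here is a rough sketch of how far apart those two growth assumptions drift over ten years. To avoid inventing budget figures, spending is normalized to 1.0 in the starting year, so the output is in multiples of today's annual discretionary budget rather than in dollars. The function name is mine, and this is just compounding arithmetic, not a reconstruction of what S&P or the CBO actually computed.

```python
def cumulative_spending(base, growth_rate, years):
    """Total spending over `years` when annual spending grows at growth_rate."""
    total = 0.0
    level = base
    for _ in range(years):
        level *= 1.0 + growth_rate
        total += level
    return total

# Spending is normalized to 1.0 in the starting year, so results are
# multiples of today's annual discretionary budget, not dollar figures.
alternative = cumulative_spending(1.0, 0.05, 10)   # grows with nominal GDP
baseline = cumulative_spending(1.0, 0.025, 10)     # current-law baseline

print(f"alternative scenario: {alternative:.2f}x")
print(f"baseline scenario:    {baseline:.2f}x")
print(f"gap over ten years:   {alternative - baseline:.2f}x")
```

Under these two growth rates the ten-year totals differ by roughly 1.7 times the starting year's budget, which is how a single assumption turns into a very large number.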

## Strata data conference

So I’m giving a talk at this conference. I’m talking on Monday, September 19th, to business people, about how they should want to hire a data scientist (or even better, a team of data scientists) and how to go about hiring someone awesome.

Any suggestions?

And should I wear my new t-shirt when I’m giving my talk? Part of the proceeds of these sexy and funny data t-shirts goes to Data Without Borders! A great cause!

## Why log returns?

There’s a nice blog post here by Quantivity which explains why we choose to define market returns using the log function:

$$r_t = \log\left(\frac{p_t}{p_{t-1}}\right),$$

where $p_t$ denotes price on day $t$.

I mentioned this question briefly in this post, when I was explaining how people compute market volatility. I encourage anyone who is interested in this technical question to read that post; it really explains the reasoning well.
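As a small illustration, taking the log return on day t to be log(p_t / p_{t-1}), the sketch below computes both log returns and percentage returns for a short price series. The prices are invented for the example. It shows one of the standard reasons people like log returns: they add up across days, whereas percentage returns have to be compounded.

```python
import math

def log_returns(prices):
    """Daily log returns log(p_t / p_{t-1}) for a list of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def percentage_returns(prices):
    """Daily percentage returns (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only.
prices = [100.0, 102.0, 99.0, 101.0]

logs = log_returns(prices)
pcts = percentage_returns(prices)

# Log returns add up: their sum is the log return over the whole period.
print(sum(logs), math.log(prices[-1] / prices[0]))

# Percentage returns compound instead of adding.
compounded = 1.0
for r in pcts:
    compounded *= 1.0 + r
print(compounded - 1.0, prices[-1] / prices[0] - 1.0)
```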

I wanted to add two remarks to the discussion, however, which actually argue for not using log returns, but instead using *percentage* returns in some situations.

The first is that the assumption of a log-normal distribution of returns, especially over a longer term than daily (say weekly or monthly), is unsatisfactory, because the skew of a log-normal distribution is positive, whereas actual market returns for, say, the S&P are negatively skewed (because we see bigger jumps down in times of panic). You can get lots of free market data here and try this out yourself empirically, but it also makes sense. Therefore when you approximate returns as log-normal, you should probably stick to daily returns.
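To see why the horizon matters, here is a quick sketch using the closed-form skewness of a log-normal variable: if X is normal with standard deviation σ, then exp(X) has skewness (e^{σ²} + 2)·sqrt(e^{σ²} − 1), which is always positive and grows with σ. The 1% daily volatility and the independence across days (so that volatility scales with the square root of time) are assumptions I'm making for illustration.

```python
import math

def lognormal_skewness(sigma):
    """Skewness of exp(X) when X is normal with standard deviation sigma."""
    variance_term = math.exp(sigma ** 2)
    return (variance_term + 2.0) * math.sqrt(variance_term - 1.0)

# Assumption: 1% daily volatility of log returns, independent across days,
# so volatility over n trading days is 0.01 * sqrt(n).
daily_sigma = 0.01
skews = {}
for label, days in [("daily", 1), ("weekly", 5), ("monthly", 21), ("yearly", 252)]:
    sigma = daily_sigma * math.sqrt(days)
    skews[label] = lognormal_skewness(sigma)
    print(f"{label:8s} skewness = {skews[label]:.4f}")
```

At a daily horizon the built-in positive skew is small enough to ignore, but it keeps growing as the horizon lengthens, which is exactly the wrong direction compared with what the market actually does.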

Second, it’s difficult to logically combine log returns with fat-tailed distributional assumptions, even for daily returns, although it’s very tempting to do so because assuming “fat tails” sometimes gives you more reasonable estimates of risk because of the added kurtosis. (I know some of you will ask why not just use no parametric family at all and just bootstrap or something from the empirical data you have- the answer is that you don’t ever have enough to feel like that will be representative of rough market conditions, even when you pool your data with other similar instruments. So instead you try different parametric families and compare.)

Mathematically there’s a problem: when you assume a student-t distribution (a standard choice) of log returns, then you are automatically assuming that the expected value of any such stock in one day is infinity! This is usually not what people expect about the market, especially considering that there does not exist an infinite amount of money (yet!). I guess it’s technically up for debate whether this is an okay assumption but let me stipulate that it’s not what people usually intend.

This happens even at small scale, so for daily returns, and it’s because the moment generating function is undefined for student-t distributions (the moment generating function’s value at 1 is the expected return, in terms of *money*, when you use log returns). We actually saw this problem occur at Riskmetrics, where of course we didn’t see “infinity” show up as a risk number but we saw, every now and then, ridiculously large numbers when we let people combine “log returns” with “student-t distributions.” A solution to this is to use percentage returns when you want to assume fat tails.
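Here's a hedged numerical sketch of that blow-up. It integrates exp(x) against a scaled student-t density, but only over |x| ≤ bound, and then lets the bound grow. The 4 degrees of freedom, the 1% scale, and the integration step are all choices I made for the illustration; this is not how any risk system computes anything. The point is just that exp(x) grows exponentially while the student-t tail decays only polynomially, so the truncated expectation never settles down.

```python
import math

def student_t_pdf(t, nu):
    """Density of a standard student-t distribution with nu degrees of freedom."""
    norm = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return norm * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def truncated_expected_growth(bound, nu=4.0, scale=0.01, step=0.0005):
    """Trapezoid estimate of E[exp(X); |X| <= bound] for X = scale * T, T ~ t(nu)."""
    n = int(round(2 * bound / step))
    total = 0.0
    for i in range(n + 1):
        x = -bound + i * step
        weight = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(x) * student_t_pdf(x / scale, nu) / scale
    return total * step

# Truncated expectation of the one-day growth factor for wider and wider cutoffs.
results = {bound: truncated_expected_growth(bound) for bound in [1, 10, 50, 100]}
for bound, value in results.items():
    print(f"|x| <= {bound:3d}: {value:.6g}")
```

For modest cutoffs the answer looks perfectly sensible (very close to 1), and then it explodes once the cutoff gets wide enough for the exponential to overpower the tail. That is the same flavor of "ridiculously large number" described above.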

## We didn’t make money on TARP!

There’s a pretty good article here by Gretchen Morgenson about how the banks have been treated well compared to average people- and since I went through the exercise of considering whether corporations are people, I’ve decided it’s misleading yet really useful to talk about “treating banks” well- we should keep in mind that this is shorthand for treating the people who control and profit from banks well.

One thing I really like about the article is that she questions the argument that you hear so often from the dudes like Paulson who made the decisions back then, namely that it was better to bail out the banks than to do nothing. Yes, but weren’t there alternatives? Just as the government could have demanded haircuts on the CDS’s they bailed out for AIG, they could have stipulated real conditions for the banks to receive bailout money. This is sort of like saying Obama could have demanded something in return for allowing Bush’s tax cuts for the rich to continue.

But on another issue I think she’s too soft. Namely, she says the following near the end of the article:

As for making money on the deals? Only half-true, Mr. Kane said. “Thanks to the vastly subsidized terms these programs offered, most institutions were eventually able to repay the formal obligations they incurred.” But taxpayers were inadequately compensated for the help they provided, he said. We should have received returns of 15 percent to 20 percent on our money, given the nature of these rescues.

Hold on, where did she get the 15-20%? As far as I’m concerned there’s no way that’s sufficient compensation for the future option to screw up as much as you can, knowing the government has your back. I’d love to see how she modeled the value of that. True, it’s inherently difficult to model, which is a huge problem, but I still think it has to be at least as big as the current credit card return limits! Or how about the Payday Loans interest rates?

I agree with her overall point, though, which is that this isn’t working. All of the things the Fed and the Treasury and the politicians have done since the credit crisis began have alleviated the pain of banks and, to some extent, businesses (like the auto industry). What about the people who were overly optimistic about their future earnings and the value of their house back in 2007, or who were just plain short-sighted, and who are still in debt?

It’s enough to turn you into an anarchist, like David Graeber, who just wrote a book about debt (here’s a fascinating interview with him) and how debt came before money. He thinks we should, as a culture, enact a massive act of debt amnesty so that the people are no longer enslaved to their creditors, in order to keep the peace.

I kind of agree- why is it so much easier for institutions to get bailed out when they’ve promised too much than it is for average people crushed under an avalanche of household debt? At the very least we should be telling people to walk away from their mortgages or credit card debts when it’s in their best interest (and we should help them understand when it *is* in their best interest).

## What is the mission statement of the mathematician?

In the past five years, I’ve been learning a lot about how mathematics is used in the “real world”. It’s fascinating, thought-provoking, exciting, and truly scary. Moreover, it’s something I rarely thought about when I was in academia, and, I’d venture to say, something that most mathematicians don’t think about enough.

It’s weird to say that, because I don’t want to paint academic mathematicians as cold, uncaring or stupid. Indeed the average mathematician is quite nice, wants to make the world a better place (at least abstractly), and is quite educated and knowledgeable compared to the average person.

But there are some underlying assumptions that mathematicians make, without even noticing, that are pretty much wrong. Here’s one: mathematicians assume that people in general understand the assumptions that go into an argument (and in particular understand that there always *are* assumptions). Indeed many people go into math because of the very satisfying way in which mathematical statements are either true or false- this is one of the beautiful things about mathematical argument, and its consistency can give rise to great things: hopefulness about the possibility of people being able to sort out their differences if they would only engage in rational debate.

For a mathematician, nothing is more elevating and beautiful than the idea of a colleague laying out a palette of well-defined assumptions, and building a careful theory on top of that foundation, leading to some new-found clarity. It’s not too crazy, and it’s utterly attractive, to imagine that we could apply this kind of logical process to situations that are not completely axiomatic, that are real-world, and that, as long as people understand the simplifying assumptions that are made, and as long as they understand the estimation error, we could really improve understanding or even prediction of things like the stock market, the education of our children, global warming, or the jobless rate.

Unfortunately, the way mathematical models actually function in the real world is almost the opposite of this. Models are really thought of as nearly magical boxes that are so complicated as to render the results inarguable and incorruptible. Average people are completely intimidated by models, and don’t go anywhere near the assumptions nor do they question the inner workings of the model, the question of robustness, or the question of how many other models could have been made with similar assumptions but vastly different results. Typically people don’t even really understand the idea of errors.

Why? Why are people so trusting of these things that can be responsible for so many important (and sometimes even critical) issues in our lives? I think there are (at least) two major reasons. One touches on things brought up in this article, when it talks about information replacing thought and ideas. People don’t know about how the mortgage models work. So what? They also don’t know how cell phones work or how airplanes really stay up in the air. In some way we are all living in a huge network of trust, where we leave technical issues up to the experts, because after all we can’t be experts in everything.

But there’s another issue altogether, which is why I’m writing this post to mathematicians. Namely, there is a kind of scam going on in the name of mathematics, and I think it’s the *responsibility of mathematicians* to call it out and refuse to let it continue. Specifically, people exploit the trust that the public has in mathematics to endow their models with trust in an artificial and unworthy way. Much in the way that cops flashing their badges can abuse their authority, people flash the mathematics badge to synthesize mathematical virtue.

I think it’s time for mathematicians to start calling on people to stop abusing people’s trust in this way. One goal of this blog is to educate mathematicians about how modeling is used, so they can have a halfway decent understanding of how models are created and used in the name of mathematics, and so mathematicians can start talking about where mathematics actually plays a part and where politics, or greed, or just plain ignorance sometimes takes over.

By the way, I think mathematicians also have another responsibility which they are shirking, or said another way they should be taking on another project, which is to educate people about how mathematics is used. This is very close to the concept of “quantitative literacy” which is explained in this recent article by Sol Garfunkel and David Mumford. I will talk in another post about what mathematicians should be doing to promote quantitative literacy.

## Lagged autocorrelation plots

I wanted to share with you guys a plot I drew with python the other night (the code is at the end of the post) using blood glucose data that I’ve talked about previously in this post and I originally took a look at in this post.

First I want to motivate lagged autocorrelation plots. The idea is, given that you want to forecast something, say in the form of a time series (so a value every day or every ten minutes or whatever), the very first thing you can do is try to use *past values* to forecast the next value. In other words, you want to squeeze as much juice out of that orange as you can before you start using outside variables to predict future values.

Of course this won’t always work- it will only work, in fact, if there’s some correlation between past values and future values. To estimate how much “signal” there is in such an approach, we draw the correlation between values of the time series for various lags. At no (=0) lag, we are comparing a time series to itself so the correlation is perfect (=1). Typically there are a few lags after 0 which show some positive amount of correlation, then it quickly dies out.
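For an evenly sampled series, the computation described above can be sketched in a few lines: shift the series by the lag and take the ordinary Pearson correlation between the overlapping parts. The function names and the synthetic sine-wave series (with an arbitrarily chosen period of 288 samples) are mine, just to have something deterministic to look at.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_autocorrelation(series, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    if lag == 0:
        return correlation(series, series)
    return correlation(series[:-lag], series[lag:])

# Synthetic series: a pure cycle with a period of 288 samples, 10 periods long.
series = [math.sin(2 * math.pi * i / 288) for i in range(288 * 10)]

print(lagged_autocorrelation(series, 0))     # a series against itself
print(lagged_autocorrelation(series, 144))   # half a period later
print(lagged_autocorrelation(series, 288))   # one full period later
```

A perfectly periodic series like this one never lets its correlation die out, of course; real data has noise, which is why the interesting question is how fast the correlation decays.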

We could also look at correlations between *returns* of the values, or *differences* of the values, in various situations. It depends on what you’re really trying to predict: if you’re trying to predict the change in value (which is usually what quants in finance do, since they want to bet on stock market changes for example), probably the latter will make more sense, but if you actually care about the value itself, then it makes sense to compute the raw correlations. In my case, since I’m interested in forecasting the blood glucose levels, which essentially have maxima and minima, I do care about the actual number instead of just the relative change in value.

Depending on what kind of data it is, and how scrutinized it is, and how much money can be made by betting on the next value, the correlations will die out more or less quickly. Note that, for example, if you did this with daily S&P returns and saw a nontrivial positive correlation after 1 lag, so the next day, then you could have a super simple model, namely bet that whatever happened yesterday will happen again today, and you would statistically make money on that model. At the same time, it’s a general fact that as “the market” recognizes and bets on trends, they tend to disappear. This means that such a simple, positive one-day correlation of returns would be “priced in” very quickly and would therefore disappear with new data. This tends to happen a lot with quant models- as the market learns the model, the predictability of things decreases.
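Here's a toy version of that super simple model, as a sketch only: each day, bet one unit in the direction of yesterday's move and collect today's return on that bet. Both return series are invented, and the P&L just sums returns, ignoring compounding and transaction costs.

```python
def repeat_yesterday_pnl(returns):
    """Sum of daily P&L from betting one unit in the direction of yesterday's move."""
    pnl = 0.0
    for yesterday, today in zip(returns, returns[1:]):
        if yesterday > 0:
            position = 1.0
        elif yesterday < 0:
            position = -1.0
        else:
            position = 0.0
        pnl += position * today
    return pnl

# Two made-up return series, for illustration only.
trending = [0.01, 0.02, 0.01, -0.01, -0.02, -0.01, 0.01, 0.02]
choppy = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.02, -0.02]

trending_pnl = repeat_yesterday_pnl(trending)
choppy_pnl = repeat_yesterday_pnl(choppy)
print(trending_pnl)   # positive: moves tend to persist
print(choppy_pnl)     # negative: moves tend to reverse
```

The strategy makes money exactly when the one-day autocorrelation is positive and loses when it is negative, which is why a visible positive lag-1 correlation in a heavily traded market doesn't tend to last.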

However, in cases where there’s less money riding on the patterns, we can generally expect to see more linkage between lagged values. Since nobody is making money betting on blood glucose levels inside someone’s body, I had pretty high hopes for this analysis. Here’s the picture I drew:

What do you see? Basically I want you to see that the correlation is quite high for small lags, then dies down with a small resuscitation near 300 (hey, it turns out that 288 lags equals one day! So this autocorrelation lift is probably indicating a daily cyclicality of blood glucose levels). Here’s a close-up for the first 100 lags:

We can conclude that the correlation seems significant to about 30 lags, and is decaying pretty linearly.

This means that we can use the previous 30 lags to predict the next level. Of course we don’t want to let 30 parameters vary independently- that would be crazy and would totally overfit the model to the data. Instead, I’ll talk soon about how to place a prior on those 30 parameters which essentially uses them all but doesn’t let them vary freely- so the overall number of independent variables is closer to 4 or 5 (although it’s hard to be precise).

One last thing: the data I have used for this analysis is still pretty dirty, as I described here. I will do this analysis again once I decide how to try to remove crazy or unreliable readings that tend to happen before the blood glucose monitor dies.

Here’s the python code I used to generate these plots:

    #!/usr/bin/env python
    # Python 3; requires matplotlib (matplotlib.pylab re-exports numpy's corrcoef).
    import csv
    import os
    from datetime import datetime

    from matplotlib.pylab import corrcoef, figure, plot, show

    os.chdir('/Users/cathyoneil/python/diabetes/')

    SAMPLE_SECONDS = 60 * 5   # the sensor reports one reading every 5 minutes
    TIME_FORMAT = '%m/%d/%y %H:%M:%S'
    GLUCOSE = "Sensor Glucose (mg/dL)"

    datelist = []   # seconds elapsed since the first sensor row
    datalist = []   # glucose readings (mg/dL), aligned with datelist
    firstdate = None

    with open('Jason_large_dataset.csv', newline='') as f:
        for row in csv.DictReader(f, delimiter=',', quotechar='|'):
            if row["Raw-Type"] != "GlucoseSensorData":
                continue
            thisdate = datetime.strptime(row["Timestamp"], TIME_FORMAT)
            if firstdate is None:
                firstdate = thisdate
            # Skip missing readings so datelist and datalist stay aligned.
            if row[GLUCOSE] == "":
                continue
            diffdate = thisdate - firstdate
            datelist.append(diffdate.seconds + 60 * 60 * 24 * diffdate.days)
            datalist.append(float(row[GLUCOSE]))

    print(min(datalist), max(datalist))

    # Look up a reading by its time offset in seconds.
    readings = dict(zip(datelist, datalist))

    def lagged_correlation(g):
        """Correlation between readings that are g samples (g * 5 minutes) apart."""
        s1 = []
        s2 = []
        for date in datelist:
            later = date + SAMPLE_SECONDS * g
            if later in readings:
                s1.append(readings[date])
                s2.append(readings[later])
        return corrcoef(s1, s2)[1, 0]

    figure()
    plot([lagged_correlation(g) for g in range(1, 900)])
    show()

## Should short selling be banned?

Yesterday it was announced that the short selling ban in France, Italy, and Spain for financial stocks would be continued; there’s also an indefinite short selling ban in Belgium. What is this and does it make sense?

Short selling is mathematically equivalent to buying the negative of a stock. To see the actual mechanics of how it works, please look here.

Typically people at hedge funds use shorts to net out their exposure to the market as a whole: they will go long some bank stock they like and then go short another stock that they are neutral to or don’t like, with the goal of profiting on the difference of movements of the two – if the whole market goes up by some amount like 2%, it will only matter to them how much their long position outperformed their short. People also short stocks for direct negative forecasts on the stock, like when they detect fraud in accounting of the company, or otherwise think the market is overpricing the company. This is certainly a worthy reason to allow short selling: people who take the time to detect fraud should be rewarded, or otherwise said, people should be given an incentive to be skeptical.
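The arithmetic behind that long/short trade can be sketched as follows. The position sizes and returns are hypothetical, and the sketch ignores stock borrowing costs, dividends, and margin, all of which matter in practice.

```python
def long_short_pnl(long_notional, long_return, short_notional, short_return):
    """P&L of a long position paired with a short position over one period."""
    long_pnl = long_notional * long_return
    # A short position earns the negative of the stock's return.
    short_pnl = -short_notional * short_return
    return long_pnl + short_pnl

# Hypothetical numbers: the market is up 2%, the long stock beats it by 1%,
# and the short stock simply moves with the market.
market_move = 0.02
pnl = long_short_pnl(
    long_notional=1_000_000, long_return=market_move + 0.01,
    short_notional=1_000_000, short_return=market_move,
)
print(pnl)

# Same outperformance in a market that is up 10% instead: the P&L is unchanged.
pnl_big_rally = long_short_pnl(1_000_000, 0.10 + 0.01, 1_000_000, 0.10)
print(pnl_big_rally)
```

With equal-sized legs, the market's overall move cancels out and only the 1% of outperformance is left, which is the whole point of netting out exposure.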

If shorting the stock is illegal, then it generally takes longer for “price discovery” to happen; this is sort of like the way the housing market takes a long time to go down. People who bought a house at 400K simply don’t want to sell it for less, so they put it on the market for 400K even when the market has gone down and it is likely to sell for more like 350K. The result is that fewer people buy, and the market stagnates. In the past couple of years we’ve seen this happen in the housing market, although banks who have ownership of houses through foreclosures are much less quixotic about prices, which is why we’ve seen prices drop dramatically more recently.

The idea of banning short-selling is purely political. My favorite quote about it comes from Andrew Lo, an economist at M.I.T., who said, “It’s a bit like suggesting we take heart patients in the emergency room off of the heart monitor because you don’t want to make doctors and nurses anxious about the patient.” Basically, politicians don’t want the market to “panic” about bank stocks so they make it harder to bet against them. This is a way of avoiding knowing the truth. I personally don’t know good examples of the market driving down a bank’s stock when the bank is not in terrible shape, so I think even using the word “panic” is misleading.

When you suddenly introduce a short-selling ban, extra noise gets put into the market temporarily as people “cover their shorts”; overall this has a positive effect on the stocks in question, but it’s only temporary and it’s completely synthetic. There’s really nothing good about having temporary noise overwhelm the market except for the sake of the politicians being given a few extra days to try to solve problems. But that hasn’t happened.

Even though I’m totally against banning short selling, I think it’s a great idea to consider banning some other instruments. I actually go back and forth about the idea of banning credit default swaps (CDS), for example. We all know how much damage they can do (look at AIG), and they have a particularly explosive pay-off system, by design, since they are set up as insurance policies on bonds.

The ongoing crisis in Europe over debt is also partly due to the fact that the regulators don’t really know who owns CDS’s on Greek debt and how much there is out there. There are two ways to go about fixing this. First we could ban owning CDS unless you also own the underlying bond, so you are actually protecting your bond; this would stem the proliferation of CDS’s which hurt AIG so badly and which could also hurt the banks that hold Greek bonds and that wrote Greek CDS protection. Alternatively, we could enforce a much more stringent system of transparency so that any regulator could go to a computer and do a search on where and how much CDS exposure (gross and net) people have in the world. I know people think this is impossibly difficult but it’s really not, and it should be happening already. What’s not acceptable is having a political and psychological stalemate because we don’t know what’s out there.
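To give a sense of how simple the bookkeeping part is, here's a sketch of the kind of query a regulator might run, assuming (and it's a big assumption) that every trade were reported as a buyer, a seller, a reference entity, and a notional. The trade format, the bank names, and the definitions of "gross" (protection bought plus sold) and "net_sold" (protection sold minus bought) are mine, for illustration; the hard part in real life is getting the data reported, not adding it up.

```python
from collections import defaultdict

def cds_exposure(trades):
    """Gross and net protection sold, per (party, reference entity).

    Each trade is (buyer, seller, reference_entity, notional).
    """
    sold = defaultdict(float)
    bought = defaultdict(float)
    for buyer, seller, reference, notional in trades:
        sold[(seller, reference)] += notional
        bought[(buyer, reference)] += notional

    exposure = {}
    for key in set(sold) | set(bought):
        exposure[key] = {
            "gross": sold[key] + bought[key],
            "net_sold": sold[key] - bought[key],
        }
    return exposure

# Entirely hypothetical trades; the bank names are placeholders.
trades = [
    ("Bank A", "Bank B", "Greece", 100.0),
    ("Bank B", "Bank C", "Greece", 80.0),
    ("Bank C", "Bank A", "Greece", 50.0),
]

exposure = cds_exposure(trades)
for (party, reference), numbers in sorted(exposure.items()):
    print(party, reference, numbers)
```

Notice how Bank B has a gross position of 180 but is only net short 20 of protection: both numbers matter, because the net figure only holds up if its own counterparty actually pays.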

There are other instruments that definitely seem worthy of banning: synthetic over-the-counter instruments that seem created out of laziness (since the people who invented them could have approximated whatever hedge they wanted to achieve with standard exchange-traded instruments) and for the purpose of being difficult to price and to assess the risk of. Why not ban them? Why not ban things that don’t add value, that only add complexity to an already ridiculously complex system?

Why are we spending time banning things that make sense and ignoring actual opportunities to add clarity?