## Historical volatility on the S&P index

In a previous post I described how people in finance often compute historical volatility in order to anticipate future moves in a single stock. I’d like to give a couple of big caveats about this method, along with a worked example on daily returns of the S&P index and the accompanying Python (and R) code. I will use these results in a future post I’m planning about error bars and how people abuse and misuse them.

#### Two important characteristics of returns

First, market returns in general have fat-tailed distributions: things can seem “quiet” for long stretches of time (longer than any lookback window), during which the sample volatility is a possibly severe underestimate of the “true” standard deviation of the underlying distribution (if that even makes sense; for the sake of this discussion let’s assume it does). Then, when a fat-tailed event occurs, the sample volatility typically spikes to being an overestimate of the standard deviation of that distribution.
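To make this concrete, here is a small simulation (my own illustration, not part of the original post): daily “returns” drawn from a fat-tailed Student-t distribution, with sample volatility computed over fixed lookback windows. Most windows sit below the true standard deviation, and the occasional window containing a tail event overshoots it badly.

```python
import numpy as np

# Hypothetical returns: Student-t with 3 degrees of freedom, whose true
# standard deviation is sqrt(df / (df - 2)) = sqrt(3).
rng = np.random.default_rng(0)
true_std = np.sqrt(3.0 / (3.0 - 2.0))
returns = rng.standard_t(df=3, size=60_000)

# Sample volatility over 1000 non-overlapping 60-day lookback windows.
windows = returns.reshape(-1, 60)
window_vols = windows.std(axis=1)

# Most "quiet" windows underestimate the true std; a window that catches
# a tail event spikes far above it.
print((window_vols < true_std).mean())  # fraction of windows that underestimate
print(window_vols.max())                # the worst overshoot
```

The exact numbers depend on the seed, but the qualitative picture is robust: the typical window underestimates, and the maximum is several times the true standard deviation.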

Second, in the markets there is clustering of volatility: another way of saying this is that volatility itself is rather autocorrelated, so even if we can’t predict the direction of the return, we can still estimate the *size* of the return. This is particularly true right after a shock, and there are time series models like ARCH and its cousins that model this phenomenon. They in fact allow you to model an overall autocorrelated volatility, which can be thought of as a scaling for returns, and this lets you approximate the *normalized* returns (returns divided by current volatility) as independent, although still not normal (they are still fat-tailed even after removing the clustered-volatility effect). See below for examples of normalized daily S&P returns with various decays.
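Here is a toy ARCH(1) simulation (illustrative parameters of my choosing, not fit to any data) showing what “clustering” means in autocorrelation terms: the returns themselves are essentially uncorrelated, but their magnitudes are not.

```python
import numpy as np

# Toy ARCH(1): sigma_t^2 = a0 + a1 * r_{t-1}^2,  r_t = sigma_t * eps_t
rng = np.random.default_rng(1)
a0, a1 = 0.2, 0.5
r = np.zeros(50_000)
for t in range(1, len(r)):
    sigma = np.sqrt(a0 + a1 * r[t-1]**2)
    r[t] = sigma * rng.standard_normal()

def lag1_autocorr(x):
    x = x - x.mean()
    return (x[1:] * x[:-1]).mean() / x.var()

# Direction is unpredictable, size is not:
print(lag1_autocorr(r))          # near zero
print(lag1_autocorr(np.abs(r)))  # clearly positive
```

This is exactly why dividing by a running volatility estimate, as done below, removes much (but not all) of the dependence between consecutive returns.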

**Example: S&P daily returns**

I got this data from Yahoo Finance, where they let you download daily S&P closes since 1950 as a spreadsheet. I could have used some other instrument class, but the results below would be stronger (especially for things like credit default swaps), not weaker: the S&P, being an index, is already the sum of a bunch of things and tends to be more normal as a result; in other words, the Central Limit Theorem is already taking effect on an intraday basis.

First let’s take a look at the last 3 years of closes, so starting in the summer of 2008:

Next we can look at the log returns for the past 3 years:

Now let’s look at how the historical volatility works out with different decays (decays are numbers less than 1 which you use to downweight old data: see this post for an explanation):
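The code below parameterizes the decay factor as 1 − 1/d for a lookback scale d. A quick way to build intuition for these numbers (my own illustration) is the half-life: the age at which a past squared return’s weight in the exponentially weighted average has fallen by half.

```python
from math import log

# Decay factor lambda = 1 - 1/d, as in get_vol(d) below. An EWMA puts
# weight proportional to lambda**k on a squared return k days old, so
# the half-life solves lambda**h = 0.5.
for d in [10, 30, 100]:
    lam = 1 - 1.0 / d
    half_life = log(0.5) / log(lam)
    print(d, round(lam, 2), round(half_life, 1))
```

So a decay of 0.90 forgets half its information in under 7 trading days, while 0.99 takes about 69 days, which is why the long decay reacts so sluggishly in the plots.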

For each choice of the above decays, we can normalize the log returns to try to remove the “volatility clustering”:

As we see, the long decay doesn’t do a very good job. In fact, here are the histograms, which are far from normal:
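One way to quantify “far from normal” is excess kurtosis, which is zero for a Gaussian and positive for fat tails. A minimal sketch (using simulated draws, not the post’s data, since the spreadsheet isn’t reproduced here):

```python
import numpy as np

def excess_kurtosis(x):
    # Fourth standardized moment minus 3; zero for a Gaussian.
    z = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    return (z**4).mean() - 3.0

rng = np.random.default_rng(2)
g = excess_kurtosis(rng.standard_normal(100_000))       # near 0
t5 = excess_kurtosis(rng.standard_t(df=5, size=100_000))  # clearly positive
print(g, t5)
```

Running the same function on the normalized S&P returns produced by the code below would be a quick numerical check of what the histograms show visually.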

#### Here’s the Python code I used to generate these plots from the data (see also R code below):

```python
#!/usr/bin/env python
import csv
import os

from matplotlib.pylab import *
from numpy import *

os.chdir('/Users/cathyoneil/python/sandp/')

# Yahoo Finance CSV, newest row first
dataReader = csv.DictReader(open('SandP_data.txt', 'rU'),
                            delimiter=',', quotechar='|')

close_list = []
for row in dataReader:
    #print row["Date"], row["Close"]
    close_list.append(float(row["Close"]))
close_list.reverse()  # oldest first

close_array = array(close_list)
close_log_array = log(close_array)
log_rets = diff(close_log_array)
perc_rets = exp(log_rets) - 1

figure()
plot(close_array[-780:-1], label="raw closes")
title("S&P closes for the last 3 years")
legend(loc=2)

#figure()
#plot(log_rets, label="log returns")
#legend()
#figure()
#hist(log_rets, 100, label="log returns")
#legend()
#figure()
#hist(perc_rets, 100, label="percentage returns")
#legend()

def get_vol(d):
    """Exponentially weighted volatility with decay factor 1 - 1/d."""
    var = 0.0
    lam = 0.0
    var_list = []
    for r in log_rets:
        lam = lam * (1.0 - 1.0/d) + 1
        var = (1 - 1.0/lam) * var + (1.0/lam) * r**2
        var_list.append(var)
    return [sqrt(x) for x in var_list]

figure()
for d in [10, 30, 100]:
    plot(get_vol(d)[-780:-1], label="decay factor %.2f" % (1 - 1.0/d))
title("Volatility in the S&P in the past 3 years with different decay factors")
legend()

# normalize each return by the previous day's volatility estimate
for d in [10, 30, 100]:
    figure()
    these_vols = get_vol(d)
    plot([log_rets[i] / these_vols[i-1]
          for i in range(len(log_rets) - 780, len(log_rets) - 1)],
         label="decay %.2f" % (1 - 1.0/d))
    title("Volatility normalized log returns (last three years)")
    legend()

figure()
plot([log_rets[i] for i in range(len(log_rets) - 780, len(log_rets) - 1)],
     label="raw log returns")
title("Raw log returns (last three years)")

for d in [10, 30, 100]:
    figure()
    these_vols = get_vol(d)
    normed_rets = [log_rets[i] / these_vols[i-1]
                   for i in range(len(log_rets) - 780, len(log_rets) - 1)]
    hist(normed_rets, 100, label="decay %.2f" % (1 - 1.0/d))
    title("Histogram of volatility normalized log returns (last three years)")
    legend()

show()
```

#### Here’s the R code Daniel Krasner kindly wrote for the same plots:

```r
setwd("/Users/cathyoneil/R")

dataReader <- read.csv("SandP_data.txt", header=TRUE)
close_list <- rev(as.numeric(dataReader$Close))  # oldest first
close_log_list <- log(close_list)
log_rets <- diff(close_log_list)
perc_rets <- exp(log_rets) - 1

x11()
plot(close_list[(length(close_list)-779):length(close_list)], type='l',
     main="S&P closes for the last 3 years", col='blue')
legend(125, 1300, "raw closes", cex=0.8, col="blue", lty=1)

# exponentially weighted volatility with decay factor 1 - 1/d
get_vol <- function(d){
  var <- 0
  lam <- 0
  var_list <- c()
  for (r in log_rets){
    lam <- lam*(1 - 1/d) + 1
    var <- (1 - 1/lam)*var + (1/lam)*r^2
    var_list <- c(var_list, var)
  }
  return(sqrt(var_list))
}

N <- length(log_rets)

# normalize each return by the previous day's volatility estimate
norm_rets <- function(d){
  log_rets[-1]/get_vol(d)[-N]
}

x11()
plot(get_vol(10)[(N-779):N], type='l',
     main="Volatility in the S&P in the past 3 years with different decay factors",
     col=1)
lines(get_vol(30)[(N-779):N], col=2)
lines(get_vol(100)[(N-779):N], col=3)
legend(550, 0.05, c("decay factor .90", "decay factor .97", "decay factor .99"),
       cex=0.8, col=c(1, 2, 3), lty=1:3)

x11()
par(mfrow=c(3,1))
plot(norm_rets(10)[(N-780):(N-1)], type='l', col=1, lty=1, ylab='')
legend(620, 3, "decay factor .90", cex=0.6, col=1, lty=1)
plot(norm_rets(30)[(N-780):(N-1)], type='l', col=2, lty=2, ylab='')
legend(620, 3, "decay factor .97", cex=0.6, col=2, lty=2)
plot(norm_rets(100)[(N-780):(N-1)], type='l', col=3, lty=3, ylab='')
legend(620, 3, "decay factor .99", cex=0.6, col=3, lty=3)

x11()
plot(log_rets[(N-779):N], type='l', main="Raw log returns", col="blue", ylab='')

x11()
par(mfrow=c(3,1))
hist(norm_rets(10)[(N-780):(N-1)], breaks=200, col=1, ylab='', xlab='', main='')
legend(2, 15, "decay factor .90", cex=0.8, col=1, lty=1)
hist(norm_rets(30)[(N-780):(N-1)], breaks=200, col=2, ylab='', xlab='', main='')
legend(2, 40, "decay factor .97", cex=0.8, col=2, lty=2)
hist(norm_rets(100)[(N-780):(N-1)], breaks=200, col=3, ylab='', xlab='', main='')
legend(3, 50, "decay factor .99", cex=0.8, col=3, lty=3)
```