Home > data science, finance, open source tools > Historical volatility on the S&P index

Historical volatility on the S&P index

July 30, 2011

In a previous post I described the way people in finance often compute historical volatility, in order to try to anticipate future moves in a single stock. I’d like to give a couple of big caveats to this method as well as a worked example, namely on daily returns of the S&P index, with the accompanying python code. I will use these results in a future post I’m planning about errorbars and how people abuse and misuse them.

Two important characteristics of returns

First, market returns in general have fat-tailed distributions; things can seem “quiet” for long stretches of time (longer than any lookback window), during which the sample volatility is a possibly severe underestimate of the “true” standard of deviation of the underlying distribution (if that even makes sense – for the sake of this discussion let’s assume it does). Then when a fat-tailed event occurs, the sample volatility typically spikes to being an overestimate of the standard of deviation for that distribution.

Second, in the markets, there is clustering of volatility- another way of saying this is that volatility itself is rather auto-correlated, so even if we can’t predict the direction of the return, we can still estimate the size of the return. This is particularly true right after a shock, and there are time series models like ARCH and its cousins that model this phenomenon; they in fact allow you to model an overall auto-correlated volatility, which can be thought of as scaling for returns, and allows you to then approximate the normalized returns (returns divided by current volatility) as independent, although still not normal (because they are still fat-tailed even after removing the clustered volatility effect). See below for examples of normalized daily S&P returns with various decays.

Example: S&P daily returns

I got this data from Yahoo Finance, where they let you download daily S&P closes since 1950 to an excel spreadsheet. I could have used some other instrument class, but the below results would be stronger (especially for things like credit default swamps), not weaker- the S&P, being an index, is already the sum of a bunch of things and tends to be more normal as a result; in other words, the Central Limit Theorem is already taking effect on an intraday basis.

First let’s take a look at the last 3 years of closes, so starting in the summer of 2008:

Next we can look at the log returns for the past 3 years:

Now let’s look at how the historical volatility works out with different decays (decays are numbers less than 1 which you use to downweight old data: see this post for an explanation):

For each choice of the above decays, we can normalize the log returns. to try to remove the “volatility clustering”:

As we see, the long decay doesn’t do a very good job. In fact, here are the histograms, which are far from normal:

Here’s the python code I used to generate these plots from the data (see also R code below):

#!/usr/bin/env python

import csv
from matplotlib.pylab import *
from numpy import *
from math import *
import os

dataReader = csv.DictReader(open(‘SandP_data.txt’, ‘rU’), delimiter=’,’, quotechar=’|’)

close_list = []
for row in dataReader:
#print row[“Date”], row[“Close”]
close_array = array(close_list)
close_log_array = array([log(x) for x in close_list])
log_rets = array(diff(close_log_array))
perc_rets = array([exp(x)-1 for x in log_rets])

plot(close_array[-780:-1], label = “raw closes”)
title(“S&P closes for the last 3 years”)
#plot(log_rets, label = “log returns”)
#hist(log_rets, 100, label = “log returns”)
#hist(perc_rets, 100, label = “percentage returns”)

def get_vol(d):
var = 0.0
lam = 0.0
var_list = []
for r in log_rets:
lam = lam*(1.0-1.0/d) + 1
var = (1-1.0/lam)*var + (1.0/lam)*r**2
return [sqrt(x) for x in var_list]

for d in [10, 30, 100]:
plot(get_vol(d)[-780:-1], label = “decay factor %.2f” %(1-1.0/d))
title(“Volatility in the S&P in the past 3 years with different decay factors”)
for d in [10, 30, 100]:
these_vols = get_vol(d)
plot([log_rets[i]/these_vols[i-1] for i in range(len(log_rets) – 780, len(log_rets)-1)], label = “decay %.2f” %(1-1.0/d))
title(“Volatility normalized log returns (last three years)”)
plot([log_rets[i] for i in range(len(log_rets) – 780, len(log_rets)-1)], label = “raw log returns”)
title(“Raw log returns (last three years)”)
for d in [10, 30, 100]:
these_vols = get_vol(d)
normed_rets = [log_rets[i]/these_vols[i-1] for i in range(len(log_rets) – 780, len(log_rets)-1)]
hist(normed_rets, 100,label = “decay %.2f” %(1-1.0/d))
title(“Histogram of volatility normalized log returns (last three years)”)

Here’s the R code Daniel Krasner kindly wrote for the same plots:


dataReader <- read.csv(“SandP_data.txt”, header=T)

close_list <- as.numeric(dataReader$Close)

close_list <- rev(close_list)

close_log_list <- log(close_list)

log_rets <- diff(close_log_list)

perc_rets = exp(log_rets)-1


plot(close_list[(length(close_list)-779):(length(close_list))], type=’l’, main=”S&P closes for the last 3 years”, col=’blue’)

legend(125, 1300, “raw closes”, cex=0.8, col=”blue”, lty=1)

get_vol <- function(d){

var = 0


var_list <- c()

for (r in log_rets){

lam <- lam*(1 – 1/d) + 1

var = (1 – 1/lam)*var + (1/lam)*r^2

var_list <- c(var_list, var)


return (sqrt(var_list))


L <- (length(close_list))


plot(get_vol(10)[(L-779):L], type=’l’, main=”Volatility in the S&P in the past 3 years with different decay factors”, col=1)

lines(get_vol(30)[(L-779):L],  col=2)

lines(get_vol(100)[(L-779):L],  col=3)

legend(550, 0.05, c(“decay factor .90”, “decay factor .97″,”decay factor .99”) , cex=0.8, col=c(1,2,3), lty = 1:3)



plot((log_rets[2:L]/get_vol(10))[(L-779):L], type=’l’,  col=1, lty=1, ylab=”)

legend(620, 3, “decay factor .90”, cex=0.6, col=1, lty = 1)

plot((log_rets[2:L]/get_vol(30))[(L-779):L], type=’l’, col=2, lty =2, ylab=”)

legend(620, 3, “decay factor .97”, cex=0.6, col=2, lty = 2)

plot((log_rets[2:L]/get_vol(100))[(L-779):L], type=’l’, col=3, lty =3, ylab=”)

legend(620, 3, “decay factor .99”, cex=0.6, col=3, lty = 3)


plot(log_rets[(L-779):L], type=’l’, main = “raw log returns”, col=”blue”, ylab=”)


hist((log_rets[2:L]/get_vol(10))[(L-779):L],  breaks=200, col=1, lty=1, ylab=”, xlab=”, main=”)

legend(2, 15, “decay factor .90”, cex=.8, col=1, lty = 1)

hist((log_rets[2:L]/get_vol(30))[(L-779):L],  breaks=200, col=2, lty =2, ylab=”,  xlab=”, main=”)

legend(2, 40, “decay factor .97”, cex=0.8, col=2, lty = 2)

hist((log_rets[2:L]/get_vol(100))[(L-779):L],  breaks=200,  col=3, lty =3, ylab=”,  xlab=”, main=”)

legend(3, 50, “decay factor .99”, cex=0.8, col=3, lty = 3)

  1. No comments yet.
  1. August 4, 2011 at 6:57 am
Comments are closed.
%d bloggers like this: