Causal versus causal
Today I want to talk about the different ways the word “causal” is thrown around by statisticians versus finance quants, because it’s both confusing and really interesting.
But before I do, can I just take a moment to be amazed at how pervasive Gangnam Style has become? When I first posted the video on August 1st, I had no idea how much of a sensation it was destined to become. Here’s the Google trend graph for “Gangnam” versus “Obama”:
It really hit home last night as I was reading a serious Bloomberg article take on the economic implications of Gangnam Style whilst the song was playing in the background at the playoff game between the Cardinals and the Giants.
Back to our regularly scheduled program. I’m first going to talk about how finance quants think about “causal models” and second how statisticians do. This has come out of conversations with Suresh Naidu and Rachel Schutt.
Causal modeling in finance
When I learned how to model causally, it basically meant something very simple: I never used “future information” to make a prediction about the future. I was strictly using information from the past, or that was available and I had access to, to make predictions about the future. In other words, as I trained a model, I always had in mind a timestamp explaining what the “present time” is, and all data I had access to at that moment had timestamps of availability for before that present time so that I could use this information to make a statement about what I think would happen after that present time. If I did this carefully, then my model was termed “causal.” It respected time, and in particular it didn’t have great-looking predictive power just because it was peeking ahead.
Causal modeling in statistics
By contrast, when statisticians talk about a causal model, they mean something very different. Namely, they mean whether the model shows that something caused something else to happen. For example, if we saw certain plants in a certain soil all died but those in a different soil lived, then they’d want to know if the soil caused the death of the plants. Usually to answer this kind of questions, in an ideal situation, statisticians set up randomly chosen experiments where the only difference between the treatments is that one condition (i.e. the type of soil, but not how often you water it or the type of sunlight it gets). When they can’t set it up perfectly (say because it involves people dying instead of plants) they do the best they can.
The differences and commonalities
On the one hand both concepts refer and depend on time. There’s no way X caused Y to happen if X happened after Y. But whereas in finance we only care about time, in statistics there’s more to it.
So for example, if there’s a third underlying thing that causes both X and Y, but X happens before Y, then the finance people are psyched because they have a way of betting on the direction of Y: just keep an eye on X! But the statisticians are not amused, since there’s no way to prove causality in this case unless you get your hands on that third thing.
Although I understand wanting to know the underlying reasons things happen, I have a personal preference for the finance definition, which is just plain easier to understand and test, and usually the best we can do with real world data. In my experience the most interesting questions relate to things that you can’t set up experiments for. So, for example, it’s hard to know whether blue-collar presidents would be impose less elitist policy than millionaires, because we only have millionaires.
Moreover, it usually is interesting to know what you can predict for the future knowing what you know now, even if there’s no proof of causation, and not only because you can maybe make money betting on something (but that’s part of it).