Combining priors and downweighting in linear regression
This is a continuation of yesterday’s post about understanding priors on linear regression as minimizing penalty functions.
Today I want to talk about how we can pair different kinds of priors with exponential downweighting. There are two different kinds of priors, namely persistent priors and kick-off priors (I think I’m making up these terms, so there may be other official terms for these things).
Sometimes you want a prior to exist throughout the life of the model. Most “small coefficients” or “smoothness” priors are like this. In such a situation, you will aggregate today’s data (say), which means creating an $X^T X$ matrix and an $X^T y$ vector for that day, and you will add the prior term (for a “small coefficients” prior, a multiple of the identity such as $\mu I$) to $X^T X$ every single day before downweighting your old covariance term and adding today’s covariance term.
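One way to write this recipe down, as a sketch only: the symbols $A_t$, $b_t$, $\gamma$ and $\mu$ are notation assumed here for illustration. Let $A_t$ be the running covariance term after day $t$, $b_t$ the running $X^T y$ term, $\gamma$ the downweighting constant, and $\mu I$ a persistent “small coefficients” prior. Then each day

$$A_t = \gamma A_{t-1} + (X_t^T X_t + \mu I), \qquad b_t = \gamma b_{t-1} + X_t^T y_t, \qquad \beta_t = A_t^{-1} b_t.$$

Because the prior is re-added every day, its total weight keeps pace with the data instead of decaying away.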
Other times you just want your linear regression to start off kind of “knowing” what the expected answer is. In this case you only add the prior terms to the first day’s $X^T X$ matrix and $X^T y$ vector.
This is confusing so I’m going to work out an example. Let’s say we have a model where we have a prior that 1) the coefficients should look something like $\beta_0$ and also that 2) the coefficients should be small. This latter condition is standard and the former happens sometimes when we have older proxy data we can “pretrain” our model on.
Then on the first day, we find the $X^T X$ matrix and $X^T y$ vector coming from the data, but we add a prior to make the solution closer to $\beta_0$:
$$\beta = (X^T X + \lambda I)^{-1} (X^T y + \lambda \beta_0)$$
How should we choose $\lambda$? Note that if we set $\lambda = 0$ we have no prior, but on the other hand if we make $\lambda$ absolutely huge, then we’d get $\beta \approx (\lambda I)^{-1} \lambda \beta_0 = \beta_0.$ This is perfect, since we are trying to attract the solution towards $\beta_0.$ So we need to tune $\lambda$ to be somewhere in between those two extremes – this will depend on how much you believe $\beta_0$.
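To see the two extremes numerically, here is a minimal NumPy sketch. The helper name `solve_with_kickoff_prior`, the simulated data, and the particular prior guess are all made up for illustration; the formula is the first-day solve with a weight `lam` pulling the coefficients toward `beta0`.

```python
import numpy as np

def solve_with_kickoff_prior(XtX, Xty, beta0, lam):
    """Solve (X^T X + lam*I) beta = X^T y + lam*beta0 for beta.

    Hypothetical helper: lam is the tuning parameter on the prior
    that pulls the coefficients toward beta0.
    """
    p = XtX.shape[0]
    return np.linalg.solve(XtX + lam * np.eye(p), Xty + lam * beta0)

# Simulated first-day data (assumption: 200 rows, 3 coefficients).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=200)

# A made-up prior guess, e.g. from older proxy data.
beta0 = np.array([0.8, -1.5, 0.0])

XtX, Xty = X.T @ X, X.T @ y

beta_no_prior = solve_with_kickoff_prior(XtX, Xty, beta0, lam=0.0)    # plain least squares
beta_middle = solve_with_kickoff_prior(XtX, Xty, beta0, lam=200.0)    # somewhere in between
beta_huge = solve_with_kickoff_prior(XtX, Xty, beta0, lam=1e12)       # essentially beta0

# Reference least-squares fit for comparison.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With `lam=0.0` the result matches ordinary least squares, with a huge `lam` it collapses onto `beta0`, and intermediate values land between the two.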
On the second day, we downweight data from the first day, and thus we also downweight the prior. We probably won’t “remind” the model to be close to $\beta_0$ anymore, since the idea is we’ve started off this model as if it had already been training on data from the past, and we don’t remind ourselves of old data except through downweighting.
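However, we still want to remind the model to make the coefficients small – in other words a separate prior on the size of coefficients. So in fact, on the first day we will have two priors in effect, one as above (with weight $\lambda$ pulling towards $\beta_0$) and the other a simple prior on the covariance term, namely we add $\mu I$ for some other tuning parameter $\mu$. So actually the first day we compute:
$$\beta = (X^T X + \lambda I + \mu I)^{-1} (X^T y + \lambda \beta_0)$$
And just to be really precise, if we denote by $\gamma$ the downweighting constant, and write $X_1, y_1$ and $X_2, y_2$ for the first and second day’s data, on day 2 we will have:
$$\beta = \left(\gamma (X_1^T X_1 + \lambda I + \mu I) + X_2^T X_2 + \mu I\right)^{-1} \left(\gamma (X_1^T y_1 + \lambda \beta_0) + X_2^T y_2\right)$$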
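The same bookkeeping extends to any number of days. Below is a rough NumPy sketch of that loop. The function name `running_fit`, the parameter names (`lam` for the kick-off prior weight, `mu` for the persistent prior weight, `gamma` for the downweighting constant) and the simulated data are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def running_fit(days, beta0, lam, mu, gamma):
    """Downweighted regression with a kick-off and a persistent prior.

    days  : list of (X, y) pairs, one per day (hypothetical input format)
    beta0 : target of the kick-off prior, applied on the first day only
    lam   : weight of the kick-off prior
    mu    : weight of the persistent "small coefficients" prior
    gamma : downweighting constant applied to everything already accumulated
    Returns the list of coefficient vectors, one per day.
    """
    beta0 = np.asarray(beta0, dtype=float)
    p = beta0.shape[0]
    I = np.eye(p)
    A = np.zeros((p, p))   # running covariance term
    b = np.zeros(p)        # running X^T y term
    betas = []
    for t, (X, y) in enumerate(days):
        A_today = X.T @ X + mu * I        # persistent prior is re-added every day
        b_today = X.T @ y
        if t == 0:
            A_today = A_today + lam * I   # kick-off prior enters once
            b_today = b_today + lam * beta0
        A = gamma * A + A_today           # old terms, priors included, get downweighted
        b = gamma * b + b_today
        betas.append(np.linalg.solve(A, b))
    return betas

# Two days of simulated data (assumption: 100 rows per day, 3 coefficients).
rng = np.random.default_rng(1)
true_beta = np.array([1.0, -2.0, 0.5])
days = []
for _ in range(2):
    X = rng.normal(size=(100, 3))
    y = X @ true_beta + 0.1 * rng.normal(size=100)
    days.append((X, y))

beta0 = np.array([0.8, -1.5, 0.0])   # made-up prior guess
lam, mu, gamma = 50.0, 1.0, 0.97     # made-up tuning values
betas = running_fit(days, beta0, lam, mu, gamma)
```

Note that the kick-off prior is never re-added, so its influence fades by a factor of `gamma` each day, while the `mu` term is refreshed daily.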