The basics of quantitative modeling

June 16, 2011 Cathy O'Neil, mathbabe

One exciting goal I have for this blog is to articulate the basic methods of quantitative modeling and then, hopefully, work through collaborative, real-time examples of how the craft plays out in practice. Today I just want to outline the techniques; in later posts I will go into more detail on one or more of these points.

Data cleaning: bad data (corrupt) vs. outliers (actual data which have unusual values)
In-sample vs. out-of-sample data
Predictive variables: choosing and preparing which ones and how many
Exponential down-weighting of “old” data
Remaining causal: predictive vs. descriptive modeling
Regressions: linear and multivariate with exponentially down-weighted data
Bayesian priors and how to implement them
Open source tools
When do you have enough data?
When do you have statistically significant results?
Visualizing everything
General philosophy of avoiding overfitting your model to the data
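
To make a few of these items concrete, here is a minimal sketch that combines three of them: a chronological in-sample/out-of-sample split, exponential down-weighting of old data, and a weighted linear regression. Everything in it is an assumption made for illustration: the synthetic data with a slowly drifting slope, the half-life of 50 observations, and the helper names `exp_weights` and `weighted_linear_fit` are all hypothetical, not a prescribed method.

```python
import numpy as np

def exp_weights(n, half_life):
    """Weights for n observations ordered oldest to newest.
    The newest point gets weight 1; weight halves every `half_life` steps back."""
    ages = np.arange(n - 1, -1, -1)  # age 0 = newest observation
    return 0.5 ** (ages / half_life)

def weighted_linear_fit(x, y, w):
    """Weighted least squares for y ~ a + b*x. Returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]

def mse(a, b, x, y):
    """Mean squared prediction error of the line a + b*x."""
    return float(np.mean((y - (a + b * x)) ** 2))

# Synthetic data (assumption): the true slope drifts from 1.0 to 2.0 over time.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
true_slope = np.linspace(1.0, 2.0, n)
y = true_slope * x + rng.normal(scale=0.1, size=n)

# Chronological split: fit on the past, evaluate on the future only.
split = 400
x_in, y_in = x[:split], y[:split]
x_out, y_out = x[split:], y[split:]

a_eq, b_eq = weighted_linear_fit(x_in, y_in, np.ones(split))
a_w, b_w = weighted_linear_fit(x_in, y_in, exp_weights(split, half_life=50))

mse_equal = mse(a_eq, b_eq, x_out, y_out)
mse_weighted = mse(a_w, b_w, x_out, y_out)

print(f"equal weights:     slope={b_eq:.3f}, out-of-sample MSE={mse_equal:.4f}")
print(f"exp down-weighted: slope={b_w:.3f}, out-of-sample MSE={mse_weighted:.4f}")
```

Because the relationship drifts in this toy setup, the down-weighted fit tracks the recent slope more closely and should do better out of sample. On data where the relationship is stable, down-weighting just throws away information, so the half-life is itself something to choose with care.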

For those of you reading this who know a thing or two about being a quant, please do tell me if I’ve missed something.

I can’t wait!

Comments (3)

Aaron

June 16, 2011 at 9:05 am

By which I mean: you go girl!

Dan

July 12, 2011 at 1:36 am

Cathy,

Just came across your blog and your writing is very entertaining! Would love for you to write more about quantitative topics related to finance in the future.

Brian Dalessandro

October 24, 2011 at 11:05 am

Hi. One issue we come across in our work is, for lack of a better phrase, “data inbreeding.” What this means is that, by the nature of our business, we generate data that is biased because we only want to treat users who meet certain criteria. This in and of itself is not a problem, but we have to learn from this biased data to update models that are then used to make predictions on the general population. This is essentially a sample selection bias problem, and I’m sure we’re not the only ones who deal with this (another example is credit card companies that have to build default models on those who already passed the first default screening).
