
## The basics of quantitative modeling

One exciting goal I have for this blog is to articulate the basic methods of quantitative modeling and, hopefully, to follow up with collaborative real-time examples of how the craft plays out in practice. Today I just want to outline the techniques; in later posts I will go into more detail on one or more of them.

• Data cleaning: bad data (corrupt) vs. outliers (actual data with unusual values)
• In-sample vs. out-of-sample data
• Predictive variables: choosing which ones, how many, and how to prepare them
• Exponential down-weighting of “old” data
• Remaining causal: predictive vs. descriptive modeling
• Regressions: linear and multivariate with exponentially down-weighted data
• Bayesian priors and how to implement them
• Open source tools
• When do you have enough data?
• When do you have statistically significant results?
• Visualizing everything
• General philosophy of avoiding overfitting your model to the data
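As a taste of the down-weighting and regression items above, here is a minimal sketch in Python (the function name, the half-life parameterization, and the toy data are my own illustration, not anything from the post) of a linear regression in which older observations get exponentially smaller weights:

```python
import numpy as np

def ewls_fit(x, y, halflife=30.0):
    """Weighted least squares where observation i (index 0 = oldest)
    is down-weighted by 0.5 ** (age / halflife)."""
    n = len(x)
    age = np.arange(n)[::-1]           # most recent observation has age 0
    w = 0.5 ** (age / halflife)        # exponential down-weighting of "old" data
    X = np.column_stack([np.ones(n), x])
    W = np.diag(w)
    # Solve the weighted normal equations (X' W X) beta = X' W y
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta                        # [intercept, slope]

# Sanity check: on an exact line y = 2x + 1, any positive weighting
# should recover the true coefficients.
x = np.arange(10, dtype=float)
y = 2 * x + 1
print(ewls_fit(x, y))                  # close to [1., 2.]
```

The weights enter only through the normal equations, so the same function does ordinary least squares when all weights are equal; the half-life controls how quickly the model forgets.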
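One concrete reading of the Bayesian-priors item, in the simplest case: a mean-zero Gaussian prior on the regression coefficients turns least squares into ridge regression, with the prior precision acting as the penalty. A sketch under that assumption (the function name and toy inputs are mine):

```python
import numpy as np

def ridge_fit(X, y, prior_precision=1.0):
    """MAP estimate for linear regression with an independent
    N(0, 1/prior_precision) prior on each coefficient:
    solves (X'X + lambda*I) beta = X'y with lambda = prior_precision."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + prior_precision * np.eye(d), X.T @ y)

# With little data the prior dominates and shrinks beta toward zero;
# as prior_precision -> 0 we recover ordinary least squares.
X = np.eye(2)
y = np.array([1.0, 1.0])
print(ridge_fit(X, y, prior_precision=1.0))   # [0.5, 0.5], shrunk from [1, 1]
print(ridge_fit(X, y, prior_precision=1e-9))  # approximately [1, 1] (OLS)
```

The appeal of this formulation is that "how strongly do I believe my coefficients are small?" becomes a single tunable number rather than an ad hoc regularizer.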

For those of you reading this who know a thing or two about being a quant, please do tell me if I’ve missed something.

I can’t wait!

1. June 16, 2011 at 9:05 am

By which I mean: you go girl!

2. July 12, 2011 at 1:36 am

Cathy,

Just came across your blog and your writing is very entertaining! Would love for you to write more about quantitative topics related to finance in the future.

3. October 24, 2011 at 11:05 am

Hi. One issue we come across in our work is, for lack of a better phrase, "data inbreeding." What this means is that, by the nature of our business, we generate data that is biased, because we only want to treat users who meet certain criteria. This in and of itself is not a problem, but we have to learn from this biased data to update models that are then used to make predictions on the general population. This is essentially a sample selection bias problem, and I'm sure we're not the only ones who deal with this (another example is credit card companies that have to build default models on those who already passed the first default screening).
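The selection problem described here can be seen in a toy simulation (entirely my own illustration, not the commenter's actual setup): generate a population where default risk falls with a score, screen on that score, and compare the default rate observed among the approved to the rate in the whole population.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
score = rng.normal(size=n)              # creditworthiness score
p_default = 1 / (1 + np.exp(score))     # higher score -> lower default probability
default = rng.random(n) < p_default

approved = score > 0.0                  # screening: only the high scorers are treated

pop_rate = default.mean()               # what we would like to know
obs_rate = default[approved].mean()     # what our collected data actually shows

print(f"population default rate:       {pop_rate:.3f}")
print(f"default rate among approved:   {obs_rate:.3f}")  # noticeably lower
```

A model refit only on the approved group sees this lower rate, so naively extrapolating it to the general population understates risk exactly as the comment describes.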
