For whatever reason I find myself giving a lot of advice. Actually, it’s probably because I’m an opinionated loudmouth.
The funny thing is, I pretty much always give the same advice, no matter if it’s about whether to quit a crappy job, whether to ask someone out that you have a crush on, or which city to move to. Namely, I say the following three things (in this order):
- Go for it! (this usually is all most people need, especially when talking about the crush type of advice)
- Do what you’d do if you weren’t at all insecure (great for people trying to quit a bad job or deciding between job offers)
- Do what a man would do (I usually reserve this advice for women)
I was reminded of that third piece of advice when I read this article about mothers in Germany and how they all seem to decide to quit their jobs and stay home with their kids, putatively because they don’t trust their babysitter. I say, get a better babysitter!
As an aside, let me say, I really don’t have patience for the maternal guilt thing. Probably it has something to do with the fact that my mom worked hard, and loved her job (computer scientist), and never felt guilty about it: for me that was the best role model a young nerd girl could have. When the PTA asked my mom to bake cookies, she flat out refused, and that’s what I do now. In fact I take it up a notch: when asked to bake cookies for a bake sale fund-raiser at my kids’ school (keeping in mind that this is one of those schools where the kids aren’t even allowed to eat cookies at lunch), I never forget to ask how many fathers they’ve made the cookies request to. I’m never asked a second time by the same person (however I always give them cash for the fund raising, it should be said).
It’s kind of amazing how well these three rules of thumb for advice work. I guess people usually know what they want but need some amount of help to get the nerve up to decide, to make the leap. And people consistently come back to me for advice, probably because the discussion ends up being just as much a pep talk as anything else. I’m like that guy in the corner of the ring at a fight, squirting water into the fighter’s mouth and rubbing his shoulders, saying, “You can do it, champ! Go out and get that guy!”
There may be something else going on, which is that, although I’m super opinionated, I’m also not very judgmental. In fact this guy, the “ex-moralist,” is my new hero. In this article he talks about people using their religious beliefs to guide their ethics, versus people using their moralistic beliefs (i.e. the belief in right and wrong), and how he was firmly in the second camp until one day when he lost faith in that system too – he became amoral. He goes on to say:
One interesting discovery has been that there are fewer practical differences between moralism and amoralism than might have been expected. It seems to me that what could broadly be called desire has been the moving force of humanity, no matter how we might have window-dressed it with moral talk. By desire I do not mean sexual craving, or even only selfish wanting. I use the term generally to refer to whatever motivates us, which ranges from selfishness to altruism and everything in between and at right angles. Mother Theresa was acting as much from desire as was the Marquis de Sade. But the sort of desire that now concerns me most is what we would want if we were absolutely convinced that there is no such thing as moral right and wrong. I think the most likely answer is: pretty much the same as what we want now.
He goes on to say that, when he argues with people, he can no longer rely on common beliefs and actually has to reason with people who disagree with him but are themselves internally consistent. He then adds:
My outlook has therefore become more practical: I desire to influence the world in such a way that my desires have a greater likelihood of being realized. This implies being an active citizen. But there is still plenty of room for the sorts of activities and engagements that characterize the life of a philosophical ethicist. For one thing, I retain my strong preference for honest dialectical dealings in a context of mutual respect. It’s just that I am no longer giving premises in moral arguments; rather, I am offering considerations to help us figure out what to do. I am not attempting to justify anything; I am trying to motivate informed and reflective choices.
I’m really excited by this concept. Am I getting fooled because he’s such a good writer? Or is it possible that he’s hit upon something that actually helps people disagree well? That we should stop assuming that the person we are talking to shares our beliefs? This is something like what I experience when I go to a foreign country- the expectation that I will meet people who agree with me is sufficiently reduced that I end up having many more interesting, puzzling and deep conversations than I do when I’m in my own country.
I’m thinking of starting to keep a list of things that encourage or discourage honest communication- this would go on the side of “encourage,” and Fox News would go on the side of “discourage.”
What about you, readers? Anything to add to my list on either side? Or any advice you need on quitting that job and finding a better one? Oh, and that guy you think is hot? Go for it.
It has been my unspoken goal of this blog to sex up math (okay, now it’s a spoken goal). There are just too many ways math, and mathematical things, are portrayed and conventionally accepted as boring and dry, and I’ve taken on the task of making them titillating to the extent possible. Anybody who has ever personally met me will not be surprised by this.
The reason I mention this is that today I’ve decided to talk about demographics, which may be the toughest topic yet to rebrand in a sexy light – even the word ‘demographics’ is bone dry (although there have been lots of nice colorful pictures coming out from the census). So here goes, my best effort:
Is it just me, or have there been a weird number of articles lately claiming that demographic information explains large-scale economic phenomena? Just yesterday there was this article, which claims that as the baby boomers retire, they will take money out of the stock market at a sufficient rate to depress the market for years to come. There have been quite a few articles lately explaining that the entire housing boom of the 90’s was caused by the boomers growing their families, redefining the amount of space we need (turns out we each need a bunch of rooms to ourselves) and growing the suburbs. The boomers are also expected to cause another problem with housing as they retire.
Of course, it’s not just the boomers doing these things. It’s more like, they have a critical mass of people to influence the culture so that they eventually define the cultural trends of sprawling suburbs and megamansions and redecorating kitchens, which in turn give rise to bizarre stores like ‘Home Depot Expo’. Thanks for that, baby boomers. Or maybe it’s that the marketers figure out how boomers can be manipulated and the marketers define the trends. But wait, aren’t the marketers all baby boomers anyway?
I haven’t read an article about it, but I’m ready to learn that the dot com boom was all about all of the baby boomers having a simultaneous midlife crisis and wanting to get in on the young person’s game, the economic trend equivalent of buying a sports car and dating a 25-year-old.
Then there are countless articles in the Economist lately explaining even larger scale economic trends through demographics. Japan is old: no wonder their economy isn’t growing. Europe is almost as old, no duh, they are screwed. America is getting old but not as fast as Europe, so it’s a battle for growth versus age, depending on how much political power the boomers wield as they retire (they could suck us into Japan type growth).
And here’s my favorite set of demographic forecasts: China is growing fast, but because of the one child policy, they won’t be growing fast for long because they will be too old. And that leaves India as the only superpower in the world in about 40 years, because they have lots of kids.
So there you have it, demographics is sexy. Just in case you missed it, let me go over it once again with the logical steps revealed:
Demographics – baby boomers – Bill Clinton – Monica Lewinsky – blow job under the desk. Got it?
When I woke up this morning the sun was unreasonably bright and the song “Wonderwall” was running in a loop in my head.
It’s not so bad working at a startup.
I’ve been reading lots of machine learning books lately, and let me say, as a relative outsider coming from finance: machine learners sure are spoiled for data.
It’s like, they’ve built these fancy techniques and machines that take a huge amount of data and try to predict an outcome, and they always seem to start with about 50 possible signals and “learn” the right combination of a bunch of them to be better at predicting. It’s like that saying, “It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”
In finance, a quant gets maybe one or two or three time series, hopefully ones that haven’t been widely distributed, so they may still have signal. The effect that this new data has on a quant is key: it’s exciting almost to the point of sexually arousing to get new data. That’s right, I said it, data is sexy! We caress the data, we kiss it and go to bed with it every night (well, the in-sample part of it anyway). In the end we have an intimate relationship with each and every time series in our model. In terms of quantity, however, maybe it’s daily (so business days, 262 days per year about), for maybe 15 years, so altogether 4000 data points. Not a lot to work with but we make do.
In particular, given 50 possible signals in a pile of new data, we would first look at each time series by plotting, to be sure it’s not dirty, we’d plot the (in-sample) returns as a histogram to see what we’re dealing with, we’d regress each against the outcome, to see if anything contained signal. We’d draw lagged correlation graphs of each against the outcome. We’d draw cumulative pnl graphs over time with that univariate regression for that one potential signal at a time.
In other words, we’d explore the data in a careful, loving manner, signal by signal, without taking the data for granted, instead of stuffing the kit and caboodle into a lawnmower. It’s more work but it means we have a sense of what’s going into the model.
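Here’s a rough sketch of that signal-by-signal pass, in Python with nothing but the standard library. Everything in it is an assumption for illustration: the synthetic `signal`, `junk` and `outcome` series, the 0.3 coefficient (far stronger than anything you’d find in a real market), and the helper names. It skips the plotting and only computes the numbers you’d plot.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def ols(x, y):
    """Univariate least squares: returns (slope, intercept) for y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(signal, outcome, lag):
    """Correlation of signal[t] with outcome[t + lag], for lag >= 0."""
    if lag == 0:
        return correlation(signal, outcome)
    return correlation(signal[:-lag], outcome[lag:])

def cumulative_pnl(x, y, slope, intercept):
    """Running total from betting the fitted forecast against the realized outcome."""
    total, path = 0.0, []
    for a, b in zip(x, y):
        total += (intercept + slope * a) * b
        path.append(total)
    return path

# Synthetic in-sample data: roughly 15 years of daily points.
random.seed(0)
n = 4000
signal = [random.gauss(0, 1) for _ in range(n)]
junk = [random.gauss(0, 1) for _ in range(n)]  # a candidate with nothing in it
# Hypothetical setup: today's signal nudges tomorrow's outcome.
outcome = [random.gauss(0, 1)] + [
    0.3 * signal[t - 1] + random.gauss(0, 1) for t in range(1, n)
]

# One candidate at a time: lagged correlations, then a one-day-ahead regression.
for name, series in [("signal", signal), ("junk", junk)]:
    lags = [round(lagged_correlation(series, outcome, k), 3) for k in range(4)]
    slope, intercept = ols(series[:-1], outcome[1:])
    pnl = cumulative_pnl(series[:-1], outcome[1:], slope, intercept)
    print(name, "lagged corr:", lags, "slope:", round(slope, 3),
          "final pnl:", round(pnl[-1], 1))
```

Keep in mind that an in-sample cumulative pnl computed with an in-sample regression can never end up negative, even for junk, so the thing to look at is how big and how steady it is, not whether it’s above zero.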
I’m wondering how powerful it would be to combine the two approaches.
There’s a fascinating article here about “decision fatigue,” which talks about how people lose the ability to make good decisions after they’ve made a bunch of decisions, especially if those decisions required them to exert willpower. A decision can require willpower either by virtue of being a trade-off or compromise between what one wants versus what one can afford, or by virtue of being a virtuous choice, e.g. eating a healthy snack instead of ice cream.
After making lots of decisions, people get exhausted and go for the easiest choice, which is often not the “correct” one for various reasons- it could be unhealthy or too expensive, for example. The article describes how salespeople can take advantage of this human foible by offering so many choices that, after a while, people defer to the salesperson to help them choose, thus ending up with a larger bill. It also explains that eating sugar is a quick restorative for your brain; if you’ve been exhausted by too many willpower exertions, a sugary snack will get you back on track, if only for a short while.
This all makes sense to me, but what I think is most interesting, and was really only touched on in the article, is how much this concept does or could matter in understanding our culture. For example, it talks about how this could explain why poor people eat badly- they go to the grocery store and are forced to exert willpower the entire time, with every purchase, since they constantly have to decide what they can afford; at the end of that arduous process they are exhausted and end up buying a sugary snack to replenish themselves.
I’m wondering how much of our behavior can be explained by willpower as a quantifiable resource. If we imagine that each person has some amount of stored willpower, that gets replenished through food and gets depleted through decisions, would that explain some amount of variance in behavior? Would it explain why crime gets committed at certain times?
This also reminds me of the experiments they did on kids to see which one of them could postpone reward (in the form of marshmallows) the longest. Turns out the kids who could delay gratification were more likely to get Ph.D.’s (no duh!). It is of course not always appropriate to delay gratification (and it’s certainly not in anyone’s best interest that everyone in the population should want to get a Ph.D.); on the other hand being able to plan ahead certainly is a good thing.
Since delaying gratification is a form of willpower, I’ll put it in the same category and ask, how come even at the age of four some kids can do that and others can’t (or won’t)? Is it genetically wired? Or is it practiced as a family value? Or both? Is it like strength, where some people are naturally strong but then again people can work out and make themselves much stronger?
Here’s another question about willpower, which is kind of the dual to the idea of depletion: can you have too much stored willpower? Is it like sexual energy, that needs to get used or kind of boils up on its own? I’m wondering if, when you’ve been trained all your life to exert a certain amount of willpower, and then you suddenly (through becoming extremely well-off or winning the lottery) don’t need nearly as much as you’re used to, do you somehow boil over with willpower? Does that explain why really rich people join Scientology and constantly go to spas for cleansings? Are they inventing challenges in order to exert their unused, pent-up willpower? I certainly think it’s possible.
As an example, I’ve noticed that people with too little money or with too much money are constantly worrying about money. I’m wondering if this “too much money” is coinciding with “unused willpower” and the result ironically looks similar to “not enough money” in combination with “depleted willpower”. Just an idea, but Sunday mornings are for ridiculous theories after all.
One way people’s trust of mathematics is being abused by crappy models is through the Value-Added Model, or VAM, which is actually a congregation of models introduced nationally to attempt to assess teachers and schools and their influence on the students.
I have a lot to say about the context in which we decide to apply a mathematical model to something like this, but today I’m planning to restrict myself to complaints about the actual model. Some of these complaints are general but some of them are specific to the way the one in New York is set up (still a very large example).
The general idea of a VAM is that teachers are rewarded for bringing up their students’ test scores more than expected, given a bunch of context variables (like their poverty and last year’s test scores).
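To make “more than expected” concrete, here’s a toy sketch in Python. To be loud about it: this is not the New York model, which isn’t public. It assumes a single context variable (last year’s score), a straight-line fit, and six made-up students; the names `fit_expected` and `value_added` are mine.

```python
from collections import defaultdict

# Hypothetical records: (teacher, last year's score, this year's score).
students = [
    ("A", 60, 66), ("A", 70, 75), ("A", 80, 86),
    ("B", 60, 60), ("B", 70, 71), ("B", 80, 78),
]

def fit_expected(records):
    """Least-squares line predicting this year's score from last year's."""
    x = [last for _, last, _ in records]
    y = [this for _, _, this in records]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda last_year: intercept + slope * last_year

def value_added(records):
    """Average of (actual - expected) over each teacher's students."""
    expected = fit_expected(records)
    residuals = defaultdict(list)
    for teacher, last, this in records:
        residuals[teacher].append(this - expected(last))
    return {t: sum(r) / len(r) for t, r in residuals.items()}

scores = value_added(students)
print(scores)
```

With these made-up numbers teacher A comes out about 3 points above expectation and teacher B about 3 points below. Everything that follows is about why a number like that is much shakier than it looks.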
The very first question one should ask is, how good is the underlying test the kids are taking? The answer is famously noisy: scores depend on how much sleep and food the kids got that day, and, with respect to the content, the test depends more on memory than on deep knowledge. Another way of saying this is that, if a student does a mediocre job on the test, it could be because they are learning badly at their school, or because they didn’t eat breakfast, or because the teachers they have are focusing more on other things like understanding the reasons for the scientific method and creating college-prepared students by focusing on skills of inquiry rather than memorization.
This brings us to the next problem with VAM, which is a general problem with test-score cultures, namely that it is possible to teach to the test, which is to say it’s possible for teachers to chuck out their curriculums and focus their efforts on the students doing well on the test (which in middle school would mean teaching only math and English). This may be an improvement for some classrooms but in general is not.
People’s misunderstanding of this point gets to the underlying problem of skepticism of our teachers’ abilities and goals- can you imagine if, at your job, you were mistrusted so much that everyone thought it would be better if you were just given a series of purely rote tasks to do instead of using your knowledge of how things should be explained or introduced or how people learn? It’s a fact that teachers and schools that don’t teach to the test are being punished for this under the VAM system. And it’s also a fact that really good, smart teachers who would rather be able to use their pedagogical chops in an environment where they are being respected leave public schools to get away from this culture.
Another problem with the New York VAM is the way tenure is set up. The system of tenure is complex in its own right, and I personally have issues with it (and with the system of tenure in general), but in any case here’s the way it works now. New teachers are technically given three years to create a portfolio for tenure- but the VAM results of the third year don’t come back in time, which means the superintendent looking at a given person’s tenure folder only sees two years of scores, and one of them is the first year, where the person was completely inexperienced.
The reason this matters is that, depending on the population of kids that new teacher was dealing with, more or less of the year could have been spent learning how to manage a classroom. This is an effect that overall could be corrected for by a model but there’s no reason to believe it was. In other words, the overall effect of teaching to kids who are difficult to manage in a classroom could be incorporated into a model but the steep learning curve of someone’s first year would be much harder to incorporate. Indeed I looked at the VAM technical white paper and didn’t see anything like that (although since the paper was written for the goal of obfuscation that doesn’t prove anything).
For a middle school teacher, the fact that they have only two years of test scores (and one year of experienced scores) going into a tenure decision really matters. Technically the breakdown of weights for their overall performance is supposed to be 20% VAM, 20% school-wide assessment, and 60% “subjective” performance evaluation, as in people coming to their classroom and taking notes. However, the superintendent in charge of looking at the folders has about 300 folders to look at in 2 weeks (an estimate), and it’s much easier to look at test scores than to read pages upon pages of written assessment. So the effective weighting scheme is materially different, although hard to quantify.
One other unwritten rule: if the school the teacher is at gets a bad grade, then that teacher’s chances of tenure can be zero, even if their assessment is otherwise good. This is more of a political thing than anything else, in that Bloomberg doesn’t want to say that a “bad” school had a bunch of tenures go through. But it means that the 20/20/60 breakdown is false in a second way, and it also means that the “school grade” isn’t an independent assessment of the teachers’ grades- and the teachers get double punished for teaching at a school that has a bad grade.
That brings me to the way schools are graded. Believe it or not the VAM employs a binning system when they correct for poverty, which is measured in terms of the percentage of the student population that gets free school lunches. The bins are typically small ranges of percentages, say 20-25%, but the highest bin is something like 45% and higher. This means that a school with 90% of kids getting free school lunch is expected to perform on tests similarly to a school with half that many kids with unstable and distracting home lives. This penalizes the schools with the poorest populations, and as we saw above penalizes the teachers at those schools, by punishing them when the school gets a bad grade. It’s my opinion that there should never be binning in a serious model, for reasons just like this. There should always be a continuous function that is fit to the data for the sake of “correcting” for a given issue.
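A small Python sketch of why the open-ended top bin hurts. The bin width and the 45% cutoff are the rough figures quoted above; the seven schools, the straight-line relationship between poverty and scores, and the function names are all made up for illustration.

```python
def poverty_bin(pct, width=5, top=45):
    """Bin a free-lunch percentage; everything at or above `top` shares one bin."""
    if pct >= top:
        return (top, 100)
    low = (pct // width) * width
    return (low, low + width)

def true_line(pct):
    # Hypothetical: scores fall off linearly with the free-lunch percentage.
    return 80 - 0.3 * pct

# Hypothetical schools: (free-lunch %, average score), all exactly on the line.
schools = [(p, true_line(p)) for p in (10, 22, 24, 46, 60, 75, 90)]

def binned_expectation(pct, schools):
    """Expected score = average score of the schools sharing this school's bin."""
    same = [s for p, s in schools if poverty_bin(p) == poverty_bin(pct)]
    return sum(same) / len(same)

def linear_expectation(pct, schools):
    """Expected score from a straight line fit to all schools."""
    x = [p for p, _ in schools]
    y = [s for _, s in schools]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my + slope * (pct - mx)

actual = true_line(90)
print("same bin as a 45% school:", poverty_bin(90) == poverty_bin(45))
print("shortfall vs binned expectation:", round(actual - binned_expectation(90, schools), 3))
print("shortfall vs linear expectation:", round(actual - linear_expectation(90, schools), 3))
```

The 90% school is doing exactly as well as the underlying trend says it should, yet the binned correction scores it several points short, purely because it shares a bin with schools that have half as many kids in poverty.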
Moreover, as a philosophical issue, these are the very schools that the whole testing system was created to help (does anyone remember that testing was originally set up to help identify kids who struggle in order to help them?), but instead we see constant stress on their teachers, failed tenure bids, and the resulting turnover in staff, which is exactly the opposite of helping.
This brings me to a crucial complaint about VAM and the testing culture, namely that the emphasis put on these tests, which we’ve seen are noisy at best, reduces the quality of life for the teachers and the schools and the students to such an extent that there is no value added by the value added model!
If you need more evidence of this please read this article, which describes the rampant cheating on tests in Atlanta, Georgia and which is in my opinion a natural consequence of the stress that tests and VAM put on school systems.
One last thing- a political one. There is anecdotal evidence that near elections, students magically do better on tests so that candidates can talk about how great their schools are. With that kind of extra variance added to the system, how can teachers and schools be expected to reasonably prepare their curriculums?
Next steps: on top of the above complaints, I’d say the worst part of the VAM is actually that nobody really understands it. It’s not open source so nobody can see how the scores are created, and the training data is also not available, so nobody can argue with the robustness of the model either. It’s not even clear what a measurement of success is, and whether anyone is testing the model for success. And yet the scores are given out each year, with politicians adding their final bias, and teachers and schools are expected to live under this nearly random system that nobody comprehends. Things can and should be better than this. I will talk in another blog post about how they should be improved.
As an applied mathematician, I am often asked to provide errorbars with values. The idea is to give the person reading a statistic or a plot some idea of how much the value or values could be expected to vary or be wrongly estimated, or to indicate how much confidence one has in the statistic. It’s a great idea, and it’s always a good exercise to try to provide the level of uncertainty that one is aware of when quoting numbers. The problem is, it’s actually very tricky to get them right or to even know what “right” means.
A really easy way to screw this up is to give the impression that your data is flawless. Here’s a prime example of this.
More recently we’ve seen how much the government growth rate figures can really suffer from lack of error bars- the market reacts to the first estimate but the data can be revised dramatically later on. This is a case where very simple errorbars (say, showing the average size of the difference between first and final estimates of the data) should be provided and could really help us gauge confidence. [By the way, it also brings up another issue which most people think about as a data issue but really is just as much a modeling issue: when you have data that gets revised, it is crucial to save the first estimates, with a date on that datapoint to indicate when it was first known. If we instead just erase the old estimate and pencil in the new, without changing the date (usually leaving the first date), then it gives us a false sense that we knew the "corrected" data way earlier than we did.]
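Here’s a minimal sketch of both ideas in Python: keep every estimate along with the date it became known, and get the simple errorbar from the history of revisions. The periods, dates and values are invented, and `as_of` and `mean_abs_revision` are names I made up.

```python
from datetime import date

# Hypothetical vintages: (period, date the estimate became known, estimate in %).
vintages = [
    ("P1", date(2011, 1, 28), 1.8),
    ("P1", date(2011, 7, 29), 0.4),
    ("P2", date(2011, 4, 28), 1.3),
    ("P2", date(2011, 10, 27), 1.0),
    ("P3", date(2011, 7, 29), 2.5),
    ("P3", date(2012, 1, 27), 2.9),
]

def as_of(vintages, period, when):
    """The latest estimate for `period` that was actually known on `when`."""
    known = [(d, v) for p, d, v in vintages if p == period and d <= when]
    return max(known)[1] if known else None

def mean_abs_revision(vintages):
    """Average |final estimate - first estimate| across periods."""
    by_period = {}
    for p, d, v in vintages:
        by_period.setdefault(p, []).append((d, v))
    diffs = []
    for estimates in by_period.values():
        estimates.sort()  # oldest vintage first
        diffs.append(abs(estimates[-1][1] - estimates[0][1]))
    return sum(diffs) / len(diffs)

print(as_of(vintages, "P1", date(2011, 3, 1)))    # only the first estimate existed
print(as_of(vintages, "P1", date(2011, 12, 31)))  # the revision was known by now
print(round(mean_abs_revision(vintages), 2))
```

A backtest that only ever asks `as_of` for its data can’t accidentally peek at a revision that hadn’t been published yet, which is the whole point of saving the dates.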
However, even if you don’t make stupid mistakes, you can still be incredibly misleading, or misled, by errorbars. For example, say we are trying to estimate risk on a stock or a portfolio of stocks. Then people typically use “volatility error bars” to estimate the expected range of values of the stock tomorrow, given how it’s been changing in the past. As I explained in this post, the concept of historical volatility depends crucially on your choice of how far back you look, which is given by a kind of half-life, or equivalently the decay constant. Anything that is so not robust should surely be taken with a grain of salt.
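To see how much the lookback choice matters, here’s a sketch of one common way to do it, an exponentially weighted volatility. I’m assuming a zero mean return and the convention that the decay per day is 0.5 raised to 1/half-life, which may not match the exact formula in the post I mentioned; the return series is made up (a long quiet stretch followed by ten rough days).

```python
import math

def ew_volatility(returns, half_life):
    """Exponentially weighted volatility, assuming zero mean return.

    A return that is `half_life` days old gets half the weight of today's.
    """
    decay = 0.5 ** (1.0 / half_life)
    weighted_sq, weight_sum, weight = 0.0, 0.0, 1.0
    for r in reversed(returns):  # most recent return first
        weighted_sq += weight * r * r
        weight_sum += weight
        weight *= decay
    return math.sqrt(weighted_sq / weight_sum)

# Hypothetical history: 250 quiet days of +/-1%, then 10 days of +/-3%.
quiet = [0.01 if t % 2 == 0 else -0.01 for t in range(250)]
rough = [0.03 if t % 2 == 0 else -0.03 for t in range(10)]
returns = quiet + rough

for half_life in (5, 20, 100):
    print(half_life, round(ew_volatility(returns, half_life), 4))
```

Same data, three half-lives, three noticeably different “volatilities,” and so three different sets of error bars.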
But in any case, volatility error bars, which are usually designed to be either one or two lengths of the measured historical volatility, contain only as much information as the data in the lookback window. In particular, you can get extremely confused if you assume that the underlying distribution of returns is normal, which is exactly what most people do in fact assume, even when they don’t realize they do.
To demonstrate this phenomenon of human nature, recall that during the credit crisis you’d hear things like “We were seeing things that were 25-standard deviation moves, several days in a row,” from Goldman Sachs; the implication was that this was an incredibly unlikely event, near probability zero in fact, that nobody could have foreseen. Considering what we’ve been seeing in the market in the past couple of weeks, it would be nice to understand this statement.
There were actually two flawed assumptions exposed here. First, if we have a fat-tailed distribution, then things can seem “quiet” for long stretches of time (longer than any lookback window), during which the sample volatility is a possibly severe underestimate of the standard deviation. Then when a fat-tailed event occurs, the sample volatility spikes to being an overestimate of the standard deviation for that distribution.
Second, in the markets, there is clustering of volatility- another way of saying this is that volatility itself is rather auto-correlated, so even if we can’t predict the direction of the return, we can still estimate the size of the return. So once the market dives 5% in one day, you can expect many more days of large moves.
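You can see the clustering effect in a toy simulation. The sketch below uses an ARCH(1)-style process as a stand-in, which is my choice for illustration and not a claim about how real markets are generated; the parameters `omega` and `alpha` are made up.

```python
import math
import random

def autocorrelation(xs, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    m = sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(len(xs) - lag))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# ARCH(1)-style toy: today's variance depends on yesterday's squared return.
random.seed(1)
omega, alpha = 1e-4, 0.5  # hypothetical parameters
returns, prev = [], 0.0
for _ in range(5000):
    sigma = math.sqrt(omega + alpha * prev * prev)
    prev = sigma * random.gauss(0, 1)  # the sign is a coin flip, the size is not
    returns.append(prev)

raw_acf = autocorrelation(returns)
abs_acf = autocorrelation([abs(r) for r in returns])
print("autocorrelation of returns:         ", round(raw_acf, 3))
print("autocorrelation of absolute returns:", round(abs_acf, 3))
```

The returns themselves should show little autocorrelation, while their absolute values are clearly positively autocorrelated: the direction stays unpredictable but big days bunch together.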
In other words, the speaker was measuring the probability that we’d see several returns, 25 standard deviations away from the mean, if the distribution is normal, with a fixed standard deviation, and the returns are independent. This is indeed a very unlikely event. But in fact we are dealing with neither normal distributions nor independent draws.
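Just how unlikely? Here’s a quick Python calculation under exactly those three assumptions (normal, fixed standard deviation, independent days). Taking “several days in a row” to mean three is my assumption, as is the function name.

```python
import math

def two_sided_normal_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

p_one_day = two_sided_normal_tail(25)

# Independence lets us multiply, but p**3 underflows to 0.0 in floating point,
# so work with base-10 logarithms instead.
days = 3
log10_p_run = days * math.log10(p_one_day)

print("2-sigma day: ", round(two_sided_normal_tail(2), 4))
print("25-sigma day:", p_one_day)
print("log10 of P(%d such days in a row):" % days, round(log10_p_run, 1))
```

Under the normal assumption a single 25-standard-deviation day has probability on the order of 10^-138, and three in a row is so small that a floating point number can’t even hold it. When your model tells you that you just witnessed something like that, the reasonable conclusion is that the model is wrong, not that you got astronomically unlucky.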
Another way to work with errorbars is to have confidence errorbars, which rely (explicitly or implicitly) on an actual distributional assumption about your underlying data, and which tell the reader how much you could expect the answer to range given the amount of data you have, with a certain confidence. Unfortunately, there are problems here too- the biggest one being that there’s really never any reason to believe your distributional assumptions beyond the fact that they’re probably convenient, and that so far the data looks good. But if it’s coming from real world stuff, a good level of skepticism is healthy.
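For concreteness, here’s the most common version as a Python sketch: a confidence interval for a mean that leans on a normal approximation. The ten data points are made up, and the normality (plus independence) of the data is precisely the assumption you should be suspicious of.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_confidence_interval(sample, confidence=0.95):
    """Interval for the mean, assuming roughly normal, independent data."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width

# Hypothetical daily returns, in percent.
sample = [0.4, -1.1, 0.7, 0.2, -0.3, 1.5, -0.6, 0.9, 0.1, -0.2]

low95, high95 = normal_confidence_interval(sample, 0.95)
low99, high99 = normal_confidence_interval(sample, 0.99)
print("95%:", round(low95, 3), round(high95, 3))
print("99%:", round(low99, 3), round(high99, 3))
```

The interval shrinks like one over the square root of the number of data points, which is what makes it feel so reassuring. But if the data is fat-tailed or clustered, the formula will happily hand you a tidy interval anyway.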
In another post I’ll talk a bit more about confidence errorbars, otherwise known as confidence intervals, and I’ll compare them to hypothesis testing.