In my third effort to understand the Common Core State Standards (CC) for math, I interviewed an old college friend Kiri Soares, who is the principal and co-founder of the Urban Assembly Institute of Math and Science for Young Women. Here’s a transcript of the interview which took place earlier this month. My words are in italics below.
How are high school math teachers in New York City currently evaluated?
Teachers are now evaluated on 2 things:
- First, measures of teacher practice, based on observations scored against a rubric. Right now it’s the Danielson Rubric. This is a qualitative measure, and in fact it’s essentially an old method with a new name.
- Second, measures of student learning, which are supposed to be “objective”. Overall this is worth 40% of the teacher’s score, split into two 20% parts: teachers choose the methodology for one part and principals choose it for the other. In some cases the city chooses for the principals; any time there is a state test, we have to use it. For the teachers’ part, there are two ways to get evaluated: goals or growth. Goals are based on a given kid: the teacher predicts that the kid will score somewhat lower or higher for whatever reason. Otherwise, it’s a growth-based score. Teachers can also choose from an array of assessments (state tests, performance tests, and third-party exams), and they can choose the cohort (their own kids, the grade, or the school). The city also chose performance tasks in some instances.
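To make the weighting concrete, here is a minimal sketch of how such a composite score could be computed. The 60% weight for the observation-based measure and the common 0-100 scale are assumptions for illustration; the interview only specifies that student learning counts for 40%, split into two 20% parts.

```python
# Illustrative only: a composite teacher score under the weighting
# described above. The 60% observation weight is assumed (the interview
# only states that student learning is worth 40%, split 20%/20%).

def composite_score(observation, teacher_chosen, principal_chosen):
    """All inputs on a 0-100 scale; returns the weighted composite."""
    return (0.60 * observation
            + 0.20 * teacher_chosen
            + 0.20 * principal_chosen)

print(round(composite_score(80, 70, 60), 1))  # 74.0
```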
Can you give me a concrete example of what a teacher would choose as a goal?
At the beginning of the year you give diagnostic tests to students in your subject. Based on what a given kid scored in September, you extrapolate a guess for their performance on the June test. So if a kid has a disrupted home life, you might guess lower. The teacher’s goal setting is based on these guesses.
So in other words, this is really just a measurement of how well teachers guess?
Well, they are given a baseline and teachers set goals relative to that, but yes. And they are expected to make those guesses in November, possibly well before a kid’s home life is disrupted. It definitely makes things more complicated. And things are pretty complicated. Let me say a bit more.
The first three weeks of school are all testing. We test math, social studies, science, and English in every grade, and overall, depending on teacher and principal selections, it can take up to 6 weeks, although not in any single subject. Foreign language and gym teachers also get measured, by the way, based on those other tests. These early tests are diagnostic.
Moreover, these are new types of tests, called performance-based assessments, which are based on writing samples with prompts. They are theoretically better quality because they go deeper; they aren’t just bubble-sheet standardized tests. But of course they had no pre-existing baseline (unlike the state tests) and thus had to be administered as diagnostics. Even so, we are still trying to predict growth based on them, which is confusing since we don’t know how to predict performance on brand-new tests. We also don’t know how to grade such essay-based tests consistently, despite “norming protocols”, which is yet another source of uncertainty.
How many weeks per year is there testing of students?
The last half of June is gone, a week in January, and 2-3 weeks at the beginning of the year in high school, per subject. That’s a minimum of 5 weeks per subject per year, out of a total of 40 weeks. So one eighth of teacher time is spent administering tests. But if you think about it, for the teachers it’s even more: they have to grade these tests too.
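Since we’re being quantitative for a moment, the fraction quoted here is easy to verify (a trivial check, nothing more):

```python
# 5 weeks of testing out of a 40-week school year.
testing_weeks = 5
school_year_weeks = 40
fraction = testing_weeks / school_year_weeks
print(fraction)  # 0.125, i.e. one eighth
```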
I’ve been studying the rhetoric around the CC. So far I’ve listened to Diane Ravitch’s stuff, and to Bill McCallum, the lead writer of the math CC. They have very different views. McCallum distinguishes three things, and once they are separated like that, Ravitch’s argument doesn’t make sense.
Namely, he separates standards, curriculum, and testing. People complain about testing and say that CC standards make testing easier, and we already have too much testing, so CC is a bad thing. But McCallum makes this point: good standards also make good testing easier.
What do you think? Do teachers see those as three different things? Or is it a package deal, where all three are rolled into one in terms of how they’re presented?
It’s much easier to think of those three things as vertices of a triangle. We cannot treat them as completely isolated, because they are interrelated.
So, we cannot make the CC good without curriculum and assessment, since there’s a feedback loop. Similarly, we cannot have aligned curriculum without good standards and assessment, and we cannot have good tests without good standards and curriculum. Standards have existed forever; the Common Core is an attempt to create a set of nationwide standards. For example, without a coherent national curriculum it might seem OK to teach creationism in place of evolution in some states. Should that be OK?
CC is attempting to address this, in our global economy, but it hasn’t even approached science for clear political reasons. Math and English are the least political subjects so they started with those. This is a long time coming, and people often think CC refers to everything but so far it’s really only 40% of a kid’s day. Social studies CC standards are actually out right now, but they are very new.
Next, the massive machine of curriculum comes into play, as does the testing. I have CC standards and the CC-aligned test, but no curriculum.
Next, you throw teacher evaluations aligned to CC tests into the picture. Teachers are freaking out now, thinking: my curriculum hasn’t been CC-aligned for years, what do I do now? By the way, importantly, none of the high school curriculum in NY State is actually CC-aligned yet. DOE recommendations for the middle school happened last year, and DOE people will probably recommend this year for high school, since they went into talks with publishing houses last year to negotiate CC curriculum materials.
The real problem is this: we’ve created these new standards to make things more difficult and more challenging without recognizing where kids are in the present moment. If I’m a former 5th grader, and the old standards expected something from me that I got used to, and it wasn’t very much, and now I’m in 6th grade facing all these raised expectations, no attention is paid to the gap.
Bottom line, everybody is freaking out: teachers, students, and parents.
Last year saw the first CC-aligned ELA and math tests. Everybody failed. They rolled out the tests before any CC curriculum.
From the point of view of NYC teachers, this seems like a terrorizing regime, doesn’t it?
Yes, because the CC roll-out is rigidly tied to the tests, which are in turn rigidly tied to evaluations of teachers. So the teachers are worried they are automatically going to get a “failure” on that vector.
Another way of saying this is that, if teacher evaluations were taken out of the mix, we’d have a very different roll-out environment. But as it is, teachers are hugely anxious about the possibility that their kids might fail both the city and state tests, and that would give the teacher an automatic “failure” no matter how good their teacher observations are.
So if I’m a special ed teacher of a bunch of kids reading at 4th and 5th grade level even though they’re in 7th grade, I’m particularly worried with the introduction of the new and unknown CC-aligned tests.
So is that really what will happen? Will all these teachers get failing evaluation scores?
That’s the big question mark. I doubt there will be massive failure, though. Given that the scores were so clustered in the middle-to-low muddle last year, I think they are going to add a curve and not allow so many students to fail.
So what you’re pointing out is that they can just redefine failure?
Exactly. It doesn’t actually make sense to fail everyone. Probably 75% of the kids got 2’s or 1’s on a 4-point scale. What does failure mean when everyone fails? It just means the test was too hard, or that what the kids were being taught was not relevant to the test.
Let’s dig down into the three topics. As far as you’ve heard from the teachers, what’s good and bad about the CC?
My teachers are used to the CC. We rolled out standards-based grading three years ago, so our math and ELA teachers were well adjusted, and our other subject teachers were familiar with it. The biggest change is that what used to be 9th grade math is now expected of 8th graders. And the biggest complaint I’ve heard is that it’s too much stuff, that nobody can teach all that. But that’s always been true of every set of standards.
Did they get rid of anything?
Not sure, because I don’t know what the elementary level CC standards did. There was lots of shuffling in the middle school, and lots of emphasis on algebra and algebraic thinking. Maybe they moved data and stats to earlier grades.
So I believe that my teachers in particular were more prepared. In other schools, where teachers weren’t explicitly being asked to align themselves to standards, it was a huge shock. For them, it used to be solely about the Regents, and Regents exams are very predictable and consistent, so it had been pretty smooth sailing.
Let’s move on to curriculum. You mentioned there is no CC-aligned curriculum in NY. I also heard NY state has recently come out against the CC, did you hear that?
Well, what I heard is that they previously said this year’s 9th graders (the class of 2017) would be held accountable, but now it’s the class of 2022. So they’ve shifted accountability into the future.
What does accountability mean in this context?
It means graduation requirements. You need to pass 5 Regents exams to graduate, and right now there are two versions of some of those exams: one CC-aligned, one old-school. The question is who has to pass the CC-aligned versions to graduate. Now the current 9th grade could take either the CC-aligned or “regular” Regents in math.
I’m going to ask my 9th grade students to take both so we can gather information, even though it means giving them 3 extra hours of tests. Most of my kids pass 2 Regents in 9th grade, 2 in 10th, and 3 in 11th, and then they’re supposed to be done. They only take those Regents tests in senior year that they didn’t pass earlier.
What are the good and bad things about testing?
What’s bad is how much time is lost, as we’ve already said. And also, it’s incredibly stressful. You and I went to school and we had one big stressful college test, namely the SAT. In terms of finishing high school, that was it. For these kids it’s test, test, test, test. I don’t think it’s actually improved the quality of college students across the country. 20 years ago NY was the only state that had extra tests, aside from the California Achievement Tests, which I guess we sometimes took as well.
Another way to say it is that we did take some tests but it didn’t take 5 weeks.
And it wasn’t high stakes for the teacher!
Let’s go straight there: what are the good/bad things for the teachers with all these tests?
Well it definitely makes the teachers more accountable. Even teachers think this: there is a cadre of protected teachers in the city, and the principals didn’t want to take the time to get rid of them, so they’d excess them out of the schools, and they would stay in the system.
Now with testing it has become much more the principal’s responsibility to get rid of bad teachers. The number of floating teachers is going down.
How did they get rid of the floaters?
A lot of different ways. They made them go into the schools and take interviews, they made their quality of life not great, and a lot of them left or retired or found jobs. Principals took up the mantle as well, and they started to do due diligence.
Sounds like the incentive system for over-worked principals was wrong.
Yes, although the reason it became easier for the principals is that now we have data. So if a teacher comes in rated ineffective, and I also have attendance data, I can add my own observational data (subjective, albeit rubric-based) and do something.
If I may be more skeptical, it sounds like this data gathering was used as a weapon against teachers. There were probably lots of good teachers who had bad numbers attached to them and could get fired if someone wanted them fired.
Correct, except those good teachers generally have principals who protect them.
You could give everyone a bad number and then fire the people you want, right?
Is that the goal?
Under Bloomberg it was.
Is there anything else you want to mention?
I think testing needs to be dialed down but not disappear. Education is a bi-polar pendulum and it never stops in the middle. We’re on an extreme but let’s not get rid of everything. There is a place for testing.
Let’s get our CC standards, curriculum, and testing reasonable and college-aligned and let’s keep it reasonable. Let’s do it with standards across states and let’s make sure it makes sense.
Here’s what bothers me about that. It’s even harder to investigate the experience of the student with adaptive tests.
I’m not sure there’s enough technology to actually do this any time soon anyway. For example, we were given $10,000 for 500 students. That’s not going to go far unless it takes 2 weeks to administer the test. But we are investing in our technology this year. For example, I’m looking forward to buying textbooks and getting updates pushed, instead of having to buy new books every year.
Last question. They are redoing the SAT because rich kids are doing so much better. Are they just trying to get in on the test prep game? Because, here’s the thing, there’s no test that can’t be gamed that’s also easy to grade. It’s gotta depend on the letters and grades. We keep trying to shortcut that.
Listen, this is what I tell the kids. What’s going to matter to you is the letter of recommendation, so don’t be a jerk to your fellow students or to the teachers. Next, are you going to be able to meet the minimum requirements? That’s what the SAT is good for. It defines a lower bound.
Is it a good lower bound though?
Well, I define the lower bound as 1000 in total. My kids can target that. It’s a reasonable low bar.
To what extent do your students – mostly inner-city, black girls interested in math and science – suffer under the wholly gamed SAT system?
It serves to give them a point of self-reference with the rest of the country. You have to understand, they, like most kids in the nation, don’t have a conception of themselves outside of their own experience. The SAT serves that purpose. My kids, like many others, have the dream of Ivy League minus the understanding of where they actually stand.
So you’re saying their estimates of their chances are too high?
Yes, oftentimes. They are the big fish in a well-defined pond. At the very least, the SAT helps give them perspective.
Thanks so much for your time Kiri.
Today I’m planning to modify that talk so I can give a longer and more technical version of it on Friday morning at the Department of Mathematical Sciences at Worcester Polytechnic Institute, where I’ve been invited to speak by Suzy Weekes.
In about a month I’m going to Berkeley for a week to give a so-called MSRI-Evans talk on Monday, February 24th, at 4pm, thanks to the kind invitation of Lauren Williams. I still haven’t decided whether to give a “The World Is Going To Hell” talk, which would be kind of the technical version of my book (and which I gave at Harvard’s IQSS recently), or whether I should give yet another version of the Netflix talk, which is cool and technical but not as doomsday. If you’re planning to attend please voice your opinion!
Finally, I’m hoping to join in a meeting of some manifestation of the Noetherian Ring while I’m at Berkeley. This is a women in math group that was started when I was an undergrad there, back in the middle ages, in something like 1992. It’s where I gave my first and second math talks and there was always free pizza. It really was a great example of how to create a supportive environment for collaborative math.
I seem to be in a mood this week for provocative posts about body image and appearance (maybe this is what happens when I skip an Aunt Pythia column). Apologies to people who came for math talk.
I just wanted to mention something positive about the experience of being fat all my life, but especially as a school kid. Because just to be clear, this isn’t a phase. I’ve been pudgy since I was 2 weeks old. And overall it kind of works for me, and I’ll say why.
Namely, being a fat school kid meant that I was so uncool, so outside of normal social activity with boys and the like, that I was freed up to be as smart and as nerdy as I wanted, with very little stress about how that would “look”. You’re already fat, so why not be smart too? You’re not doing anything else, nobody’s paying attention to you, and there’s nothing to gossip about, so might as well join the math team.
It’s really a testament to both the pressure to be thin and the pressure to conform intellectually, i.e. not be a nerd, when you’re a young girl: they are both intense and super unpleasant. The happy truth is, one can be cover for the other. More than that, really: being fat (or “overweight” for people who are squeamish about the word “fat”) has opened up many doors that I honestly think would have, or at least could have, remained shut had I been more socially acceptable.
Going back to dress code at work for a moment: while people claim that corporate dress codes are meant to keep our minds off of sex, that is clearly a huge lie when it comes to many categories of women’s work clothes. Who are we kidding? The mere fact that many women wear high heels to work kind of says it all. And that’s fine, but let’s freaking acknowledge it.
On the other hand, it’s pretty hard to look sexy in a plus-sized suit (although not impossible), and the idea of high heels at work is just nuts. This ends up being a weirdly good thing for me, though: people take me more seriously because I have taken myself out of the sex game altogether – or at least the traditional sex game.
By the way, I’m not saying all fat women have the same perspective on it. I’m lucky enough to have figured out pretty early on how to separate other people’s projected feelings about my body from my own feelings. I am an observer of fat hatred, in other words. That doesn’t make me entirely insulated but it does give me one critical advantage: I have a lot of time on my hands to do stuff that I might otherwise spend fretting about my body.
It also might help partly explain why some girls get on the math team and others don’t. Being fat is something you don’t have control over (the continuing and damaging myth that each person does have control over it notwithstanding) but joining the math team is something you do have control over. And if you aren’t already excluded for some other reason (being fat is one but by no means the only way this could happen of course), you might not want to start that whole thing intentionally. Just a theory.
This is a guest post by Lillian Pierce, who is currently a faculty member of the Hausdorff Center for Mathematics in Bonn, and will next year join the faculty at Duke University.
I’m a mathematician. I also happen to be a mother. I turned in my Ph.D. thesis one week before the due date of my first child, and defended it five weeks after she was born. Two and a half years into my postdoc years, I had my second child.
Now after a few years of practice, I can pretty much handle daily life as a young academic and a parent, at least most of the time, but it still seems like a startlingly strenuous existence compared to what I remember of life as just a young academic, not a parent.
Last year I was asked by the Association for Women in Mathematics to write a piece for the AWM Newsletter about my impressions of being a young mother and getting a mathematical career off the ground at the same time. I suggested that instead I interview a lot of other mathematical mothers, because it’s risky to present just one view as “the way” to tackle mathematics and motherhood.
Besides, what I really wanted to know was: how is everyone else doing this? I wanted to pick up some pointers.
I met Mathbabe about ten years ago when I was a visiting prospective graduate student and she was a postdoc. She made a deep impression on me at the time, and I am very happy that I now have the chance to interview her for the series Mathematics+Motherhood, and to now share with you our conversation.
LP: Tell me about your current work.
CO: I am a data scientist working at a small start-up. We’re trying to combine consulting engagements with a new vision for data science training and education and possibly some companies to spin off. In the meantime, we’re trying not to be creepy.
LP: That sounds like a good goal. And tell me a bit about your family.
CO: I have three kids. I got pregnant with my first son, who’s 13 now, soon after my PhD. Then I had a second child 2 years later, also while I was a postdoc. I also have a 4 year old, whom I had when I was working in finance.
LP: Did you have any notions or worries in advance about how the growth of your family would intersect with the growth of your career?
CO: I absolutely did worry about it, and I was right to worry about it, but I did not hesitate about whether to have children because it was just not a question to me about how I wanted my life to proceed. And I did not want to wait until I was tenured because I didn’t want to risk being infertile, which is a real risk. So for me it was not an option not to do it as a woman, forget as a mathematician.
LP: What was it like as a postdoc with two very young children?
CO: On the one hand I was hopeful about it, and on the other hand I was incredibly disappointed about it. The hopeful part was that the chair of my department was incredibly open to negotiating a maternity leave for postdocs, and it really was the best maternity policy that I knew about: a semester off of teaching for each baby and in total an extra year of the postdoc, since I had 2 babies. So I ended up with four years of postdoc, which was really quite generous on the one hand, but on the other hand it really didn’t matter at all. Not “not at all”—it mattered somewhat but it simply wasn’t enough to feel like I was actually competing with my contemporaries who didn’t have children. That’s on the one hand completely obvious and natural and it makes sense, because when you have small children you need to pay attention to them because they need you—and at the same time it was incredibly frustrating.
LP: It’s interesting because it’s not that you were saying “I won’t be able to compete with my contemporaries over the course of my life,” but more “I can’t compete right now.”
CO: Exactly, “I can’t compete right now” with postdocs without children. I realize—and this is not a new idea—that mathematics as a culture frontloads entirely into those 3 or 4 years after you get your PhD. Ultimately it’s not my fault, it’s not women’s fault, it’s the fault of the academic system.
LP: What metrics could departments use to be thinking more about future potential?
CO: I actually think it’s hard. It’s not just for women that it should change. It’s for the actual culture of mathematics. Essentially, the system is too rigid. And it’s not only women who get lost. The same thing that winnows the pool down right after getting a PhD—it’s a whittling process, to get rid of people, get rid of people, get rid of people until you only have the elite left—that process is incredibly punishing to women, but it’s also incredibly punishing to everybody. And moreover because of the way you get tenure and then stay in your field for the rest of your life, my feeling is that mathematics actually suffers. The reason I say this is because I work in industry now, which is a very different system, and people can reinvent themselves in a way that simply does not happen in mathematics.
LP: Do you think industry, in terms of the young career phase, gets it closer to “right” than academia currently does?
CO: Much closer to right. It’s a brutal place, don’t get me wrong, it’s brutal. I’m not saying it’s a perfect system by any stretch of the imagination. But the truth is in industry you can have a 3 year stint somewhere that is a mistake. Forget having kids, you can have a 3 year stint that was just a mistake for you. You can say “I had a bad boss and I left that place and I got a new job” and people will say “Ok.” They don’t care. One thing that I like about it is the ability to reinvent yourself. And I don’t think you see that in math. In math, your progress is charted by your publication record at a granular level. And if you’re up for tenure and there’s a 3 year gap where you didn’t publish, even if in the other years you published a lot, you still have to explain that gap. It’s like a moral responsibility to keep publishing all the time.
LP: How are you measured in industry?
CO: In industry it’s the question “what have you done for me,” and “what have you done for me lately.” It’s a shorter-term question, and there are good elements to that. One of the good elements is that as a woman you can have a baby or a couple babies and then you can pick up the slack, work your ass off, and you can be more productive after something happens. If someone gets sick, people lower their expectations for that person for some amount of time until they recover, and then expectations are higher. Mathematics by contrast has frontloaded all of the stress, especially for the elite institutions, into the 3 or 4 years to get the tenure track offer and then the next 6 years to get tenure. And then all the stress is gone. I understand why people with tenure like that. But ultimately I don’t think mathematics gets done better because of it. And certainly when the question arises “why don’t women stay in math,” I can answer that very easily: because it’s not a very good place for women, at least if they want kids.
LP: You mention on your blog that your mother is an unapologetic nerd and computer scientist; the conclusion you drew from that was that it was natural for you not to doubt that your contributions to nerd-dom and science and knowledge would be welcomed. How do you think this experience of having a mother like that inoculated you?
CO: One of the great gifts that my mother gave me as a Mother Nerd was the gift of privacy—in the sense that I did not scrutinize myself. First of all she was role-modeling something for me, so if I had any expectations it would be to be like my mom. But second of all she wasn’t asking me to think about that. I think that was one of the rarest things I had, the most unusual aspect of my upbringing as a girl. Very few of the girls that I know are not scrutinized. My mother was too busy to pay attention to my music or my art or my math. And I was left alone to decide what I wanted to do—it wasn’t about what I was good at or what other people thought of my progress. It was all about answering the question, what did I want to do. Privacy for me is having elbow space to self-define.
LP: Do you think it’s harder for parents to give that space to girls than to boys?
CO: Yes I do, I absolutely do. It’s harder and for some reason it’s not even thought about. My mother also gave me the gift of not feeling at all guilty about putting me into daycare. And that’s one of my strongest lessons, is that I don’t feel at all guilty about sending my kids to daycare. In fact I recently had the daycare providers for my 4-year-old all over for dinner, and I was telling them in all honesty that sometimes I wish I could be there too, that I could just stay there all day, because it’s just a wonderful place to be. I’m jealous of my kids. And that’s the best of all worlds. Instead of saying “oh my kid is in daycare all day, I feel bad about that,” it’s “my kid gets to go to daycare.”
LP: Where did this ability not to scrutinize come from? Where did your mother get this?
CO: I don’t know. My mother has never given me advice, she just doesn’t give advice. And when I ask her to, she says “you know more about your life than I do.”
LP: How do you deal with scrutiny now?
CO: It’s transformed as I’ve gotten older. I’ve gotten a thicker skin, partly from working in finance. I’ve gotten to the point now where I can appreciate good feedback and ignore negative feedback. And that’s a really nice place to be. But it started out, I believe, because I was raised in an environment where I wasn’t scrutinized. And I had that space to self-define.
LP: The idea of pushing back against scrutiny to clear space for self-definition is inspiring for adults as well.
CO: Women in math, especially with kids, give yourself a break. You’re under an immense amount of pressure, of scrutiny. You should think of it as being on the front lines, you’re a warrior! And if you’re exhausted, there’s a reason for it. Please go read Radhika Nagpal’s Scientific American blog post (“The Awesomest 7-Year Postdoc Ever”) for tips on how to deal with the pressure. She’s awesome. And the last thing I want to say is that I never stopped loving math. Cardinal Rule Number 1: Before all else, don’t become bitter. Cardinal Rule Number 2: Remember that math is beautiful.
I like this essay written by Annie Gosfield, a self-described “composeress”, which is her word to mean a female composer. She finds it slightly absurd to be singled out for her femaleness. Her overall take on being a woman in a man’s world is refreshing, and resonates with me as a woman in math and technology.
From her essay:
I’ve never considered myself a “woman composer,” but I suspect that over the years being female has helped more than it’s hurt. Being a woman (and having high hair) has made me easier to recognize, easier to remember and has spared me from fitting into the generic description of a composer: “medium build, dark hair, glasses, beard.” I will admit to liking the invented honorific term “composeress.” (It sounds archaic, grand, and slightly ridiculous, just as a gender-specific title for a composer should.)
So, great for her, and wonderful that from her perspective she feels propelled rather than suffocated by her otherness status. To some extent I agree from my own experience.
But having said that, it doesn’t mean that other women, possibly many other women, haven’t been squeezed out, or selected themselves out, because of their female status. After all, we hear way more from the people who stay and “succeed”, which gives us massive survivorship bias.
Indeed, and to be nerdy and true to form, we can almost think about measuring the extent to which there is a weeding-out effect of women by asking the survivors the extent to which they identify as “women” versus the population at large. I think we’d find that the women who survive in nearly all-male environments have developed, or were born with, coping mechanisms which allow them to ignore their own otherness.
I know that was true of me – when I was in grad school at Harvard, I went through a distinct phase of wanting to wear men’s clothing, or at least gender neutral clothing – so jeans, t-shirts, leather shoes, never dresses – to be externally more consistent with how I felt inside. Not that I was sexually identified with men, but that I didn’t want to be seen as primarily feminine. Instead I wanted to be seen as primarily a mathematician.
Does it make me a freak, to wear men’s clothing and (sometimes) wish I could grow a beard? Possibly, although over time it’s changed, and nowadays I take pride in my femininity, and in fact I think much of my power emanates from it.
But it does give me pause when I hear successful women in men’s fields talking about how great it is to be a woman and how surprising all the attention is. We still seem to be contorting ourselves in an effort to not seem too womanly, and this makes me think it’s entirely un-coincidental, and possibly a crucial part of what allows us to succeed. Besides talent and hard work, of course. And I don’t think it’s undue attention at all – I think it’s just something we train ourselves not to consider because focusing on it too much could be paralyzing.
By the way, I’m not doing justice to Annie Gosfield’s essay, which you should read in its entirety and has nuanced things to say about otherness in the field of composing.
You should really read Nagpal’s guest blogpost from Scientific American (hat tip Ken Ribet) yourself, but here’s just a sneak preview, namely her check list of survival tactics that she describes in more detail later in the piece:
- I decided that this is a 7-year postdoc.
- I stopped taking advice.
- I created a “feelgood” email folder.
- I work fixed hours and in fixed amounts.
- I try to be the best “whole” person I can.
- I found real friends.
- I have fun “now”.
I really love this list, especially the “stop taking advice” part. I can’t tell you how much crap advice you get when you’re a tenure-track woman in a technical field. Nagpal was totally right to decide to ignore it, and I wish I’d taken her advice to ignore people’s advice, even though that sounds like a logical contradiction.
What I like most about her list is her insistence on being a whole person and having fun. I’ve definitely had those rules since forever, without ever making them explicit – I just thought of them as obvious, although maybe that was because my alternative was truly dark.
It’s just amazing how often people are willing to make themselves miserable and delay their lives when they’re going for something ambitious. For some reason, they argue, they’ll get there faster if they’re utterly submissive to the perceived expectations.
What bullshit! Why would anyone be more efficient at learning, at producing, or at creating when they’re sleep-deprived and oppressed? I don’t get it. I know this sounds like a matter of opinion but I’m super sure there’ll be some study coming out describing the cognitive bias which makes people believe this particular piece of baloney.
Here’s some advice: go get laid, people, or whatever it is that you really enjoy, and then have a really good night’s sleep, and you’ll feel much more creative in the morning. Hell, you might even think of something during the night – all my good ideas come to me when I’m asleep.
Even though her description of tenure-track life resonates with me, this problem, of individuals needlessly sacrificing their quality of life, isn’t confined to academia by any means. For example I certainly saw a lot of it at D.E. Shaw as well.
In fact I think it happens anywhere there’s an intense environment of expectation, with some kind of incredibly slow-moving weeding process – academia has tenure, D.E. Shaw has “who gets to be a Managing Director”. People spend months or even years in near-paralysis wondering if their superiors think they’re measuring up. Gross!
Ultimately it happens to someone when they start believing in the system. Conversely the only way to avoid that kind of oppression is to live your life in denial of the system, which is what Nagpal achieved by insisting on thinking of her tenure-track job as having no particular goal.
Which didn’t mean she didn’t work hard and get her personal goals done, and I have tremendous respect for her work ethic and drive. I’m not suggesting that we all get high-powered positions and then start slacking. But we have to retain our humanity above all.
Bottom line: let’s perfect the art of ignoring the system when it’s oppressive. It’s a useful survival tactic, and it also changes the system in a positive way by undermining it. Plus it’s way more fun.
I’m happy to say that the book I’m writing with Rachel Schutt called Doing Data Science is officially out for early review. That means a few chapters which we’ve deemed “ready” have been sent to some prominent people in the field to see what they think. Thanks, prominent and busy people!
It also means that things are (knock on wood) wrapping up on the editing side. I’m cautiously optimistic that this book will be a valuable resource for people interested in what data scientists do, especially people interested in switching fields. The range of topics is broad, which I guess means that the most obvious complaint about the book will be that we didn’t cover things deeply enough, and perhaps that the level of pre-requisite assumptions is uneven. It’s hard to avoid.
Thanks to my awesome editor Courtney Nash over at O’Reilly for all her help!
And by the way, we have an armadillo on our cover, which is just plain cool:
This is a guest post by Kaisa Taipale. Kaisa got a BS at Caltech, a Ph.D. in math at the University of Minnesota, was a post-doc at MSRI, an assistant professor at St. Olaf College 2010-2012, and is currently visiting Cornell, which is where I met her a couple of weeks ago, where she told me about her cool visualizations of math Ph.D. emigration patterns, and where I convinced her to write a guest post. Here’s Kaisa on a bridge:
Math data and viz
I was inspired by this older post on Mathbabe, about visualizing the arXiv postings of various math departments.
It got me thinking about tons of interesting questions I’ve asked myself and could answer with visualizations: Over time, what’s been coolest on the arXiv? Are there any topics that are especially attractive to hiring institutions? There’s tons of work to do!
I had to start somewhere though, and as I’m a total newbie when it comes to data analysis, I decided to learn some skills while focusing on a data set that I have easy non-technical access to and look forward to reading every year. I chose the AMS Annual Survey. I also wanted to stick to questions really close to my thoughts over the last two years, namely the academic job search.
I wanted to learn to use two tools, R and Circos. Why Circos? See the visualizations of college major and career path here - it’s pretty! I’ve messed around with a lot of questions, but in this post I’ll look at two and a half.
Where do graduating PhDs from R1 universities end up, in the short term? I started with graduates of public R1s, as I got my PhD at one.
The PhD-granting institutions are colored green, while academic institutions granting other degrees are in blue. Purple is for business, industry, government, and research institutions. Red is for non-U.S. employment or people not seeking — except for the bright red, which is still seeking. Yellow rounds things out at unknown. Remember, these figures are for immediate plans after graduation rather than permanent employment.
While I was playing with this data (read “learning how to use the reshape and ggplot2 packages”) I noticed that people from private R1s tend to end up at private R1s more often. So I graphed that too.
Does the professoriate in the audience have any idea if this is self-selection or some sort of preference on the part of employers? Also, what happened between 2001 and 2003? I was still in college, and have no idea what historical events are at play here.
Where mathematicians go
For any given year, we can use a circular graph to show us where people go. This is a more clumped version of the above data from 2010 alone, plotted using Circos. (Supplemental table E.4 from the AMS report online.)
The other question – the question current mathematicians secretly care more about, in a gossipy and potentially catty way – is what fields lead to what fate. We all know algebra and number theory are the purest and most virtuous subjects, and applied math is for people who want to make money or want to make a difference in the world.
[On that note, you might notice that I removed statistics PhDs in the visualization below, and I also removed some of the employment sectors that gained only a few people a year. The stats ribbons are huge and the small sectors are very small, so for looks alone I took them out.]
Higher resolution version available here.
I wish I could animate a series of these to show this view over time as well. Let me know if you know how to do that! Another nice thing I could do would be to set up a webpage in which these visualizations could be explored in a bit more depth. (After finals.)
A few caveats:
- I haven’t computed any numbers for you;
- the graphs from R show employment in each field by percentage of graduates instead of total number per category;
- it’s hard to show both data over time and all the data one could explore.

But it’s a start.
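The percentage-of-graduates normalization mentioned in the caveats can be sketched in a few lines of pandas. The column names and counts below are made up for illustration, not actual AMS survey numbers:

```python
import pandas as pd

# Hypothetical counts of new PhDs by year and employment sector
# (illustrative numbers only, not actual AMS survey data).
counts = pd.DataFrame({
    "year":   [2010, 2010, 2010, 2011, 2011, 2011],
    "sector": ["PhD-granting", "Other academic", "Industry"] * 2,
    "n":      [400, 250, 150, 380, 270, 160],
})

# Convert raw counts into the percentage of that year's graduates,
# so years with different class sizes become comparable.
counts["pct"] = 100 * counts["n"] / counts.groupby("year")["n"].transform("sum")

print(counts)
```

The `transform("sum")` call broadcasts each year’s total back onto its rows, which is what lets every sector be divided by its own year’s denominator.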
I should finish with a shout-out to Roger Peng and Jeff Leek, though we’ve never met: I took Peng’s Computing for Data Analysis and much of Leek’s Data Analysis on Coursera (though I’m one of those who didn’t finish the class). Their courses and Stack Overflow taught me almost everything I know about R. As I mentioned above, I’m pretty new to this type of analysis.
What questions would you ask? How can I make the above cooler? Did you learn anything?
Almost a year ago now I wrote this post on being an alpha female. When I wrote it, I had only recently understood that I was an alpha female, and it was still kind of new and weird.
For whatever reason it’s been coming up a lot recently and I wanted to update that post with my observations.
Who’s burning which bridges?
Last week I wrote an outraged post about seeing Ina Drew at Barnard.
Mind you, I had anticipated I’d find the event objectionable. I had even polled my Occupy friends for prepared questions for her. But when I got there I realized pretty quickly that I wouldn’t be able to ask her anything. I was just too disgusted with the tone and conceit of the event to participate in it reasonably. Instead I live tweeted the event and seethed.
I lost sleep that night fuming about Drew-as-role-model, and I was grateful to be able to get some of my frustration out on my blog.
One of the first comments I received was this one, which said:
Boy Cathy, you sure do know how to burn bridges.
This was, for me, kind of a perfect alpha female moment. My immediate reaction was to think to myself,
They burned bridges with me, you mean.
Since that sounded too arrogant, at the moment anyway, I said something else just slightly less obnoxious. Three points to make here:
- Anyone who doesn’t agree with me about whether Ina Drew should be celebrated can go suck it.
- That post got linked to from Reuters, FT.com, and Naked Capitalism. Which doesn’t happen when you’re worrying about burning bridges.
- When I’m in a certain kind of mood, I’m simply not concerned with other people’s judgments. I think that’s just part of being an alpha female, and I’m grateful for it.
Why grateful? Because lots of shitty things happen when people go around worrying about “burning their bridges” instead of speaking up about bullshit or evil-doing. Or, as Felix Salmon tweeted recently:
Best point made on this #waronwhistleblowers panel: failure to leak has cost many more lives than leaking ever has.
— felix salmon (@felixsalmon) April 17, 2013
Taking notes from an uber alpha female
A few months ago I got an email inviting me to speak in a Python in Finance conference. The email was somewhat weird and kind of just came out and said they need women speakers. I was put in a position of being asked to be a token woman, which is a mindset I don’t enjoy.
I thought about it, though. Although I use python and I used to work in finance, I don’t work in finance any more, and I don’t really think about python much – I just use it. So I said to the organizer, no thanks, I don’t have anything to say at that conference.
Fast forward to the week before the conference, when I got wind of the agenda. It turned out my friend Claudia Perlich, Chief Data Scientist at m6d and one of the contributors to my upcoming book with Rachel Schutt, was the keynote speaker. I decided to go to the conference essentially because I wanted to see her.
Well, it turned out Claudia had gotten a similar email, and she had accepted the invitation, even though she doesn’t work in finance and doesn’t even use python (she uses perl).
She gave a great talk about modeling blind spots, which everyone enjoyed. It was quite possibly the best talk of the day, in fact. Plus, she wasn’t at all token - having her on the schedule was what made me come to the conference, and I probably wasn’t the only one. And judging by the crowd at the Meetup I gave last night, I would have drawn my own crowd too, if I had been speaking.
I made an alpha female note to myself that day to accept any invitation to a conference that I’d enjoy, even if my expertise isn’t completely within the realm of the conference. I’m learning from Claudia, a master alpha female. Or is it mistress?
Alpha females and self-image
Chris Wiggins recently sent me this essay entitled “A Rant on Women” by Clay Shirky, a writer and professor who studies the social and economic effects of Internet technologies. Here’s the first paragraph:
So I get email from a good former student, applying for a job and asking for a recommendation. “Sure”, I say, “Tell me what you think I should say.” I then get a draft letter back in which the student has described their work and fitness for the job in terms so superlative it would make an Assistant Brand Manager blush.
Guess what? That student is male.
Shirky goes on to vent about how women don’t oversell themselves enough compared to men and how it’s a problem. An excerpt:
There is no upper limit to the risks men are willing to take in order to succeed, and if there is an upper limit for women, they will succeed less. They will also end up in jail less, but I don’t think we get the rewards without the risks.
This made me think about my experience. First, as a Barnard professor, I certainly saw this effect. I’d have men and women come talk to me about letters of recommendation, and not only would I prepare myself for the difference in posture, I’d try to address it directly, by encouraging women to learn how to brag about their accomplishments. I might have tried to convince men a couple of times to stop bragging quite so much, but quickly found that to be a huge waste of time.
But beyond corroborating that this is typical behavior, the essay made me remember myself as a college student.
When I met my thesis advisor, Barry Mazur, who was on sabbatical at UC Berkeley, I remember telling him a math problem I had worked on and solved. He expressed something about liking the problem and being impressed that I’d explained it so well, and I said back,
“Yeah, I’m awesome”
I remember this because of his reaction. At the time, the word “awesome” was widely used among teenagers, but evidently he hadn’t gotten the teenager memo, and he was taken aback by the way I used it. At least that’s what he said. But now that I think about it, maybe he was taken aback that I’d said it at all.
Alpha females and body image
My friend and guest poster Becky recently sent me this video:
It’s about how women have a biased view on their looks, or at least describe their looks to other people in a consistently negatively biased way.
There’s a great critique of this video here (hat tip Avani Patel), wherein fashion and style guru Jennifer Choy complains that the underlying message of the above video is that, in any case, beauty is about all women have going for them, so they should not underestimate their beauty. Plus, all the women in the video were skinny, young, and white.
Great points, but my take was somewhat different.
My immediate reaction to the video was to say, these women need to spend less time thinking about being fat or ugly, and more time thinking about what they think is sexy and attractive. Why is it always about finding flaws in ourselves? Why don’t we spend more time thinking about what turns us on or what we think is beautiful?
I’ll be honest: I think if I had been interviewed in that setting, I would have said something like, “Gorgeous and sexy as hell” and gone on to list my best features. I am not sure I’d have even been able to describe what I look like in any detail, with any accuracy. Most likely I would have just started bragging about my sexy grey streaks. Even more likely: I wouldn’t have had the time to sit down for this interview at all.
Don’t get me wrong, I’ve dabbled in being insecure in my looks: puberty sucked, as did all three post-natal periods until the baby was weaned*, in addition to any time I was ever on the pill**. I’ve concluded that my inherent arrogance is directly related to my hormones, which in turn makes it undeniably tied to my alpha femaleness.
Suffice it to say, when my hormones are not messed up I have “body eumorphia,” where I ignore or downplay any non-perfect parts of my body. It’s a nice feeling.
It kind of makes me want to develop an alpha female hormone treatment. Business model?
UPDATE: Please watch this new spoof video, it’s perfect (except it should be alpha females and men, not just men):
* It gets better when you know it’s going to go away. By the third kid I was like, “gonna cry every day at 3:00pm for the next six weeks. Must schedule that into my calendar.”
** Note to doctors: you need to tell women that the real reason birth control pills work so well is that you lose interest in sex when you’re on them!
This is a guest post by Julia Evans. Julia is a data scientist & programmer who lives in Montréal. She spends her free time these days playing with data and running events for women who program or want to — she just started a Montréal chapter of pyladies to teach programming, and co-organizes a monthly meetup called Montréal All-Girl Hack Night for women who are developers.
I asked mathbabe a question a few weeks ago saying that I’d recently started a data science job without having too much experience with statistics, and she asked me to write something about how I got the job. Needless to say I’m pretty honoured to be a guest blogger here :) Hopefully this will help someone!
Last March I decided that I wanted a job playing with data, since I’d been playing with datasets in my spare time for a while and I really liked it. I had a BSc in pure math, an MSc in theoretical computer science, and about 6 months of work experience as a programmer developing websites. I’d taken one machine learning class and zero statistics classes.
In October, I left my web development job with some savings and no immediate plans to find a new job. I was thinking about doing freelance web development. Two weeks later, someone posted a job posting to my department mailing list looking for a “Junior Data Scientist”. I wrote back and said basically “I have a really strong math background and am a pretty good programmer”. This email included, embarrassingly, the sentence “I am amazing at math”. They said they’d like to interview me.
The interview was a lunch meeting. I found out that the company (Via Science) was opening a new office in my city, and was looking for people to be the first employees at the new office. They work with clients to make predictions based on their data.
My interviewer (now my manager) asked me about my role at my previous job (a little bit of everything — programming, system administration, etc.), my math background (lots of pure math, but no stats), and my experience with machine learning (one class, and drawing some graphs for fun). I was asked how I’d approach a digit recognition problem and I said “well, I’d see what people do to solve problems like that, and I’d try that”.
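Her “see what people do to solve problems like that, and try that” answer might look like this in practice. Here’s a minimal sketch using scikit-learn’s bundled digits dataset and an off-the-shelf classifier; the choice of model is illustrative only, not what the company actually used:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 8x8 grayscale images of handwritten digits, flattened to 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# An off-the-shelf classifier as a first baseline.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

A simple baseline like this is usually the first thing to try before reaching for anything fancier, since it tells you how hard the problem actually is.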
I also talked about some data visualizations I’d worked on for fun. They were looking for someone who could take on new datasets and be independent and proactive about creating models, figuring out the most useful things to model, and getting more information from clients.
I got a call back about a week after the lunch interview saying that they’d like to hire me. We talked a bit more about the work culture, starting dates, and salary, and then I accepted the offer.
So far I’ve been working here for about four months. I work with a machine learning system developed inside the company (there’s a paper about it here). I’ve spent most of my time working on code to interface with this system and make it easier for us to get results out of it quickly. I alternate between working on this system (using Java) and using Python (with the fabulous IPython Notebook) to quickly draw graphs and make models with scikit-learn to compare our results.
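The quick compare-models workflow she describes might be sketched like this, with scikit-learn baselines on synthetic stand-in data. The models and dataset here are placeholders for illustration, not Via Science’s actual system:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a client dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Compare two quick baselines by cross-validated accuracy.
for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Cross-validation gives a fairer comparison than a single train/test split, which matters when you’re deciding where to spend modeling effort.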
I like that I have real-world data (sometimes, lots of it!) where there’s not always a clear question or direction to go in. I get to spend time figuring out the relevant features of the data or what kinds of things we should be trying to model. I’m beginning to understand what people say about data-wrangling taking up most of their time. I’m learning some statistics, and we have a weekly Friday seminar series where we take turns talking about something we’ve learned in the last few weeks or introducing a piece of math that we want to use.
Overall I’m really happy to have a job where I get data and have to figure out what direction to take it in, and I’m learning a lot.
I’m back! I missed you guys bad.
My experience with Seattle in the last 8 days has convinced me of something I rather suspected, namely I’m a huge New York snob and can’t exist happily anywhere else. I will spare you the details (they have to do with cars, subways, and being an asshole pedestrian) but suffice it to say, glad to be home.
Just a few caveats on complaining about my vacation:
- I enjoyed visiting the University of Washington and giving the math colloquium there as well as a “Math Day” talk where I showed kids the winning strategy for Nim (as well as other impartial two-player games) following my notes from last summer.
- I enjoyed reading Leon and Becky’s guest posts. Thanks guys!
- And then there was the time spent with my darling family. Of course, goes without saying, it’s always magical to get to the point where your kids have invented a whole new language of insults after you’ve outlawed certain words: “Shut your fidoodle, you syncopathic lardle!”
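For the curious, the winning strategy for Nim mentioned above is the classical nim-sum trick: XOR the pile sizes together, and if the result is nonzero, move so that it becomes zero. A minimal sketch (function names are mine, not from my notes):

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """The nim-sum is the XOR of all pile sizes; a position is losing
    for the player to move exactly when the nim-sum is zero."""
    return reduce(xor, piles, 0)

def winning_move(piles):
    """Return (pile_index, new_size) for a winning move, or None if
    the position is already lost (nim-sum zero)."""
    s = nim_sum(piles)
    if s == 0:
        return None
    for i, p in enumerate(piles):
        target = p ^ s  # reducing pile i to this size zeroes the nim-sum
        if target < p:
            return (i, target)

# Piles (3, 4, 5) have nim-sum 3^4^5 = 2, so a winning move exists:
# shrink the first pile from 3 to 1, leaving nim-sum 1^4^5 = 0.
print(winning_move([3, 4, 5]))  # → (0, 1)
```

The same XOR analysis extends, via Sprague-Grundy values, to other impartial two-player games.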
Of all the topics I want to write about today, I’ve decided to go with the most immediate and surprising one: Leila Schneps is now a mystery writer! How cool is that? She’s written a book with her daughter, Math on Trial: How Numbers Get Used and Abused in the Courtroom, currently in stock and available on Amazon. And she wrote an op-ed for the New York Times talking about it (hat tip Chris Wiggins).
I know Leila from having been her grad student assistant at the GWU Summer Program for Women in Math the first year it existed, in 1995. She taught undergrads about Galois cohomology and interpreted elements of H^1 as twists and elements of H^2 as obstructions and then had them do a bunch of examples for homework with me. It was pretty awesome, and I learned a ton. Leila is also a regular and fantastic commenter on mathbabe.
I love the premise of the book she’s written. She finds a bunch of historical examples where mathematics is used in trials to the detriment of justice, and people get unfairly jailed (or, less often, let free). From the op-ed (emphasis mine):
Decades ago, the Harvard law professor Laurence H. Tribe wrote a stinging denunciation of the use of mathematics at trial, saying that the “overbearing impressiveness” of numbers tends to “dwarf” other evidence. But we neither can nor should throw math out of the courtroom. Advances in forensics, which rely on data analysis for everything from gunpowder to DNA, mean that quantitative methods will play an ever more important role in judicial deliberations.
The challenge is to make sure that the math behind the legal reasoning is fundamentally sound. Good math can help reveal the truth. But in inexperienced hands, math can become a weapon that impedes justice and destroys innocent lives.
I’m pretty sure you guys know this already, but I love my regular readers and commenters. It’s a large part of why I blog – I feel like I’m having a super interesting cocktail party every morning in my underwear. I’m investing in the quality of the rest of my day, stealing a moment before my family wakes up so I can articulate one single idea. The payoff is, most of the time, dependably good conversation that lasts all day, or even more than a day, as your comments and emails come in.
Of course, there are sometimes nasty people and comments in addition to thoughtful ones. Not everyone interprets me as trying to figure stuff out, they think I’m being intentionally asinine or manipulative. Or sometimes they just don’t agree with me, and instead of explaining their reasoning they just yell. Or sometimes they are just jerks, getting out their aggression on a stranger.
My first rule is to allow comments that disagree with me, as long as the reasons are articulated and as long as the comment isn’t abusive. Rude is ok, “you are stupid” is not ok.
My second rule is to have a thick skin. I can completely ignore the sentiment of an abusive commenter calling me names, because first of all I’ve heard it all before and second I’m pretty sure it’s not about me.
I’m not saying it doesn’t bother me at all, because obviously it’s a pain to have to go through my email and make sure people are being civil.
For example, whenever I get onto the top 10 of Hacker News, which has been a few times now, I’ve noticed a huge wave of nasty comments. Of course this could be a direct result of how many people I get (thousands per hour), but I don’t think so – the ratio of interesting to abusive comments coming from Hacker News traffic is tiny. It creates nasty work for me, which I feel compelled to do because letting nasty comments stay on my blog makes me feel violated and intentionally misunderstood.
This morning I found this article via Naked Capitalism regarding reader comments, and how nasty ones make subsequent readers evaluate the message differently, and in particular, more negatively. In other words, my intuition was right – it’s super important to curate comments.
My experience with Hacker News has also given me sympathy for Izabella Laba‘s position that she doesn’t accept comments on her blog (read this post for example). She puts herself out there, with strong opinions, and many of her posts are important and thought-provoking. And by the same token people can get pretty threatened by what she has to say. I can well imagine what her experience has been. What if every day was a Hacker News day? What if a majority of comments contained ridiculous and personal attacks? Yuck.
Makes me even more grateful to have you guys.
I’m excited about Rachel Schutt’s talk at Strata Santa Clara tomorrow at 1:30 PST. I don’t think it’s being live-streamed, unfortunately, but maybe we will eventually get our hands on a video.
The topic is next-gen data science and data scientists, which is explained in her abstract:
Data Science is an emerging field in industry, yet not well-defined as an academic discipline (or even in industry for that matter). I proposed the “Introduction to Data Science” course at Columbia in March, 2012. This was the first course at Columbia that had the term “Data Science” in the title. I had three primary motivations:
1) Bringing industry to students: I wanted to give students an education in what it’s like to be a data scientist in industry and give them some of the skills data scientists have. This is based on my experience as a lead analyst on the Google+ Data Science team. But I didn’t want to limit them to only my way of seeing the world, so each week, guest speakers from the NYC tech community came to teach the class.
2) I wanted to think more deeply about the science of data science: Data Science has the potential to be a deep and profound research discipline impacting all aspects of our lives. Columbia University and Mayor Bloomberg announced the Institute for Data Sciences and Engineering in July, 2012. This course created an opportunity to develop the theory of Data Science and to formalize it as a legitimate science.
3) Personal Challenge: I kept hearing from data scientists in industry that you can’t teach data science in a classroom or university setting and I took that on as a challenge. I wanted to test the hypothesis that it was possible to train awesome data scientists in the classroom.
In February 2013, 2 months will have passed since the class ended. I’ll be able to reflect on how the class went, how I thought about the curriculum, how I engaged the NYC tech community to be involved in the class, who the students were, whether I had impact on them, etc.
Rachel wrote a blog for the class and had a great post about being a next-gen data scientist. She has high hopes for the students in the class and wrote an aspirational list for them. It started with the idea of being more focused on integrity than on self-promotion, and it ended with bringing one’s humanity to the job.
When Rachel talks about it, it seems possible that one could use data science to actually make the world a better place, rather than simply adding to the hype and to the predatory nature of the current modeling space (see this article for a perfect example of the predatory modeling side – it doesn’t specifically talk about models, but believe me, they’re there, helping the payday lenders and the banks choose who to trap and who to ignore; I’ve talked to people who worked on earlier generations of those models).
Rachel also gave a TEDx Women’s talk at Barnard on the subject of bringing humanity to modeling. Here’s the video of her talk. And while I make fun of TED talks a lot, mostly because they have overly polished ideas and delivery, one thing I love about Rachel’s is how raw and powerful it is. Go Rachel!
I don’t agree with everything she always says, but I agree with everything Izabella Laba says in this post called Gender Bias 101 For Mathematicians (hat tip Jordan Ellenberg). And I’m kind of jealous she put it together in such a fantastic no-bullshit way.
Namely, she debunks a bunch of myths of gender bias. Here’s my summary, but you should read the whole thing:
- Myth: Sexism in math is perpetrated mainly by a bunch of enormously sexist old guys. Izabella: Nope, it’s everyone, and there’s lots of evidence for that.
- Myth: The way to combat sexism is to find those guys and isolate them. Izabella: Nope, that won’t work, since it’s everyone.
- Myth: If it’s really everyone, it’s too hard to solve. Izabella: Not necessarily, and hey you are still trying to solve the Riemann Hypothesis even though that’s hard (my favorite argument).
- Myth: We should continue to debate about its existence rather than solution. Izabella: We are beyond that, it’s a waste of time, and I’m not going to waste my time anymore.
- Myth: Izabella, you are only writing this to be reassured. Izabella: Don’t patronize me.
Here’s what I’d add. I’ve been arguing for a long time that gender bias against girls in math starts young and starts at the cultural level. It has to do with expectations of oneself just as much as a bunch of nasty old men (by the way, the above is not to say there aren’t nasty old men (and nasty old women!), just that it’s not only about them).
My argument has been that the cultural differences are larger than the talent differences, something Larry Summers strangely dismissed without actually investigating in his famous speech.
And I think I’ve found the smoking gun for my side of this argument, in the form of an interactive New York Times graphic from last week’s Science section which I’ve screenshot here:
What this shows is that 15-year-old girls out-perform 15-year-old boys in certain countries and under-perform them in others. The set of countries where they outperform boys is not random: it has everything to do with cultural expectations and opportunities for girls in those countries, and is explained to some extent by stereotype threat. Go read the article, it’s fascinating.
I’ll say again what I said already at the end of this post: the great news is that it is possible to address stereotype threat directly, which won’t solve everything but will go a long way.
You do it by emphasizing that mathematical talent is not inherent, nor fixed at birth, and that you can cultivate it and grow it over time and through hard work. I make this speech whenever I can to young people. Spread the word!
I’ve been here at the Nebraska conference for undergrad women in math for a couple of days now. There are quite a few grad students and young professors as well and I’m finding myself giving a few pieces of advice over and over again to the new female professors. I thought I’d write them down here too.
Obviously you can take this advice or leave it.
- Ban guilt from your child-rearing experience. The tenure system being what it is, it’s just impossible for you to work enough, including research, and to spend 4 hours a day with an awake baby. Instead think of it this way: it takes a village to raise a child, and this is the time when it’s more village than mom, which is ok. Make sure they are in loving environments, have super nice babysitters, get the best daycare you can, and stop worrying about being a crappy mom. Turns out you’ll have plenty of time to do awesome things with your kids and in the meantime they need you to be a role model, which means pursuing your dreams.
- I’m not suggesting working too much either – having a really set schedule which allows time for work during daycare and then time for family before and after is great, and your students and colleagues will just need to accept that you are available during working hours and not otherwise. Don’t apologize for this, just do your job, and don’t assume people are judging you for it either.
- I met a ton of women who seem to have taken on all of the household duties and are overwhelmed by them, especially when they also have small children. First of all, lower your standards. Houses can be messy; it doesn’t actually kill anyone if you ignore an upturned Lego box because you want to go think about math. Second, budget for a housecleaner – one woman described how she and her husband decided to sell their car but kept their housekeeper, and I fully endorse this trade-off. Third, sit down with your partner and write a list of chores and split them up. It’s not sexy but it works. Finally, be sure your kids help as soon as they can. Turns out kids can make their own school lunches starting when they’re 8 if the ingredients are readily available.
- Personally I never do more volunteering at the kids’ schools than my husband as a matter of principle. And it also turns out my husband never does any. This makes me a bitch but also saves me a ton of time. Consider it.
- Make time for something other than kids and work. Carve it out with a knife if necessary. It will be worth it and will keep you sane and remembering why you made this plan.
- Also don’t forget to have dates with your partner.
- Finally, if you ultimately decide it’s not working, remember you have lots of options with a math Ph.D. – don’t underestimate yourself and your options.
I hope that’s helpful!
Last night I was waiting for a bus to go hang with my Athena Mastermind group, which consists of a bunch of very cool Barnard student entrepreneurs and their would-be role models (I say would-be because, although we role models are also very cool, I often think the students are role modeling for us).
As I was waiting at the bus stop, I overheard two women talking about the new Applied Data Science class that just started at Columbia, which is being taught by Ian Langmore, Daniel Krasner and Chang She. I knew about this class because Ian came to advertise it last semester in Rachel Schutt’s Intro to Data Science class which I blogged. One of the women at the bus stop had been in Rachel’s class and the other is in Ian’s.
Turns out I just love overhearing nerd girls talking data science at the bus stop. Don’t you??
And to top off the nerd girl experience, I’m on my way today to Nebraska to give a talk to a bunch of undergraduate women in math about what they can do with math outside of academia. I’m planning it to be an informative talk, but that’s really just cover for its real goal, which is to give a pep talk.
My experience talking to young women in math, at least when they are grad students, is that they respond viscerally to encouragement, even if it’s vague. I can actually see their egos inflate in the audience as I speak, and that’s a good thing, that’s why I’m there.
As a community, I’ve realized, nerd girls going through grad school are virtually starved for positive feedback, and so my job is pretty clear cut: I’m going to tell them how awesome they are and answer their questions about what it’s like in the “real world” and then go back to telling them how awesome they are.
By the end they sit a bit straighter and smile a bit more, after I’ve told them, or at least reminded them, how much power they have as nerd girls – how many options they have, how they don’t have to be risk-averse, and how they never need to apologize.
Tomorrow my audience is undergraduates, which is a bit trickier, since as an undergrad you still get consistent feedback in the form of grades. So I will tailor my information as well as my encouragement a bit, and try not to make grad school sound too scary, because I do think that getting a Ph.D. is still a huge deal. Comment below if you have suggestions for my talk, please!
This week’s guest lecturer in Rachel Schutt’s Columbia Data Science class was Claudia Perlich. Claudia has been the Chief Scientist at m6d for 3 years. Before that she was in a data analytics group at the IBM center that developed Watson, the computer that won Jeopardy!, although she didn’t work on that project. Claudia got her Ph.D. in information systems at NYU and now teaches a data science class to business students, although mostly she addresses how to assess data science work and how to manage data scientists. Claudia also holds a masters in computer science.
Claudia is a famously successful data mining competition winner. She won the KDD Cup in 2003, 2007, 2008, and 2009, the ILP Challenge in 2005, the INFORMS Challenge in 2008, and the Kaggle HIV competition in 2010.
She’s also been a data mining competition organizer, first for the INFORMS Challenge in 2009 and then for the Heritage Health Prize in 2011. Claudia claims to be retired from competition.
Claudia’s advice to young people: pick your advisor first, then choose the topic. It’s important to have great chemistry with your advisor; don’t underestimate that.
Here’s what Claudia historically does with her time:
- predictive modeling
- data mining competitions
- publications in conferences like KDD and journals
- digging around data (her favorite part)
Claudia likes to understand something about the world by looking directly at the data.
Here’s Claudia’s skill set:
- plenty of experience doing data stuff (15 years)
- data intuition (for which one needs to get to the bottom of the data generating process)
- dedication to the evaluation (one needs to cultivate a good sense of smell)
- model intuition (we use models to diagnose data)
Claudia also addressed being a woman. She says it works well in the data science field, where her intuition is useful and is used. She claims her nose is so well developed by now that she can smell it when something is wrong. This is not the same thing as being able to prove something algorithmically. Also, people typically remember her because she’s a woman, even when she doesn’t remember them. It has worked in her favor, she says, and she’s happy to admit this. But then again, she is where she is because she’s good.
Someone in the class asked whether papers submitted to journals and/or conferences are blind to gender. Claudia responded that it was, for some time, typically double-blind, but now it’s more likely to be single-blind. And anyway there was a cool analysis that showed you can guess who wrote a paper with 80% accuracy just by knowing the citations, so making things blind doesn’t really help. More recently the names are included, and hopefully this doesn’t make things too biased. Claudia admits to being slightly biased towards institutions: certain institutions prepare better work.
Skills and daily life of a Chief Data Scientist
Claudia’s primary skills are as follows:
- Data manipulation: unix (sed, awk, etc), Perl, SQL
- Modeling: various methods (logistic regression, k-nearest neighbors, etc)
- Setting things up
She mentions that the methods don’t matter as much as how you’ve set it up, and how you’ve translated it into something where you can solve a question.
More recently, she’s been told that at work she spends:
- 40% of time as “contributor”: doing stuff directly with data
- 40% of time as “ambassador”: writing stuff, giving talks, mostly external communication to represent m6d, and
- 20% of time in “leadership” of her data group
At IBM it was much more focused in the first category. Even so, she has a flexible schedule at m6d and is treated well.
The goals of the audience
She asked the class, why are you here? Do you want to:
- become a data scientist? (good career choice!)
- work with data scientists?
- work for a data scientist?
- manage a data scientist?
Most people were trying their hands at the first, but we had a few in each category.
She mentioned that it matters because the way she’d talk to people wanting to become a data scientist would be different from the way she’d talk to someone who wants to manage them. Her NYU class is more like how to manage one.
So, for example, you need to be able to evaluate their work. It’s one thing to check a bubble sort algorithm or check whether a SQL server is working, but checking a model which purports to give the probability of people converting is a different kettle of fish.
For example, try to answer this: how much better can that model get if you spend another week on it? Let’s face it, quality control is hard for yourself as a data miner, so it’s definitely hard for other people. There’s no easy answer.
There’s an old joke that comes to mind: what’s the difference between a scientist and a consultant? The scientist asks, “how long does it take to get this right?” whereas the consultant asks, “how right can I get this in a week?”
Insights into data
A student asks, how do you turn a data analysis into insights?
For example, decision trees you interpret, and people like them because they’re easy to interpret, but I’d ask, why does it look like it does? A slightly different data set would give you a different tree and you’d get a different conclusion. This is the illusion of understanding. I tend to be careful with delivering strong insights in that sense.
Data mining competitions
Claudia drew a distinction between different types of data mining competitions.
On the one hand you have the “sterile” kind, where you’re given a clean, prepared data matrix, a standard error measure, and where the features are often anonymized. This is a pure machine learning problem.
Examples of this first kind are: KDD Cup 2009 and 2011 (Netflix). In such competitions, your approach would emphasize algorithms and computation. The winner would probably have heavy machines and huge modeling ensembles.
On the other hand, you have the “real world” kind of data mining competition, where you’re handed raw data, which is often in lots of different tables and not easily joined, where you set up the model yourself and come up with task-specific evaluations. This kind of competition simulates real life more.
Examples of this second kind are: KDD cup 2007, 2008, and 2010. If you’re competing in this kind of competition your approach would involve understanding the domain, analyzing the data, and building the model. The winner might be the person who best understands how to tailor the model to the actual question.
Claudia prefers the second kind, because it’s closer to what you do in real life. In particular, the same things go right or go wrong.
How to be a good modeler
Claudia claims that data and domain understanding is the single most important skill you need as a data scientist. At the same time, this can’t really be taught – it can only be cultivated.
A few lessons learned about data mining competitions that Claudia thinks are overlooked in academics:
- Leakage: the contestant’s best friend and the organizer’s/practitioner’s worst nightmare. There’s always something wrong with the data, and Claudia has made an art form of figuring out how the people preparing the competition got lazy or sloppy with the data.
- Adapting learning to real-life performance measures beyond standard measures like MSE, error rate, or AUC (profit?)
- Feature construction/transformation: real data is rarely flat (i.e. given to you in a beautiful matrix), and finding good, practical solutions to this problem remains a challenge.
Leakage refers to information in your training data that helps you predict the target but that you wouldn’t legitimately have at prediction time; using it isn’t fair. It’s a huge problem in modeling, and not just for competitions. Oftentimes it’s an artifact of reversing cause and effect.
Example 1: There was a competition where you needed to predict whether the S&P would go up or go down. The winning entry had an AUC (area under the ROC curve) of 0.999 out of 1. Since stock markets are pretty close to random, either someone’s very rich or there’s something wrong. There’s something wrong.
In the good old days you could win competitions this way, by finding the leakage.
Example 2: Amazon case study: big spenders. The target of this competition was to predict which customers will spend a lot of money, based on their past purchases. The data consisted of transaction data in different categories. But a winning model identified that “Free Shipping = True” was an excellent predictor.
What happened here? The point is that free shipping is an effect of big spending. But it’s not a good way to model big spending, because in particular it doesn’t work for new customers or for the future. Note: timestamps are weak here. The data that included “Free Shipping = True” was simultaneous with the sale, which is a no-no. We need to only use data from beforehand to predict the future.
Example 3: Again an online retailer, this time the target is predicting customers who buy jewelry. The data consists of transactions for different categories. A very successful model simply noted that if sum(revenue) = 0, the customer was very likely a jewelry buyer.
What happened here? The people preparing this data removed jewelry purchases, but only included people who bought something in the first place. So people who had sum(revenue) = 0 were people who only bought jewelry. The fact that you only got into the dataset if you bought something is weird: in particular, you wouldn’t be able to use this on customers before they finished their purchase. So the model wasn’t being trained on the right data to make the model useful. This is a sampling problem, and it’s common.
Example 4: This happened at IBM. The target was to predict companies who would be willing to buy “websphere” solutions. The data was transaction data + crawled potential company websites. The winning model showed that if the term “websphere” appeared on the company’s website, then they were great candidates for the product.
What happened? You can’t crawl the historical web, just today’s web.
Example 5: You’re trying to study who has breast cancer. The patient ID, which seemed innocent, actually has predictive power. What happened?
In the above image, red means cancerous and green means not, plotted by patient ID. We see three or four distinct buckets of patient identifiers, and how predictive the ID is depends on the bucket. This is probably a consequence of merging multiple databases, some of which came from populations of sicker patients.
A student suggests: for the purposes of the contest they should have renumbered the patients and randomized.
Claudia: would that solve the problem? There could be other things in common as well.
A student remarks: The important issue could be to see the extent to which we can figure out which dataset a given patient came from based on things besides their ID.
Claudia: Think about this: what do we want these models for in the first place? How well can you predict cancer?
Given a new patient, what would you do? If the new patient is in a fifth bin in terms of patient ID, then obviously don’t use the identifier model. But if it’s still in this scheme, then maybe that really is the best approach.
This discussion brings us back to the fundamental problem that we need to know what the purpose of the model is and how is it going to be used in order to decide how to do it and whether it’s working.
During an INFORMS competition on pneumonia predictions in hospital records, where the goal was to predict whether a patient has pneumonia, a logistic regression which included the number of diagnosis codes as a numeric feature (AUC of 0.80) didn’t do as well as one which included it as a categorical feature (AUC of 0.90). What’s going on?
This had to do with how the person prepared the data for the competition:
The diagnosis code for pneumonia was 486, so the preparer removed it (replacing it with “-1”) wherever it showed up in the record (rows are different patients, columns are different diagnoses, there are at most 4 diagnoses, and “-1” means there’s nothing in that entry).
Moreover, to avoid telltale holes in the data, the preparer shifted the other diagnoses to the left where necessary, so that all the “-1”s were on the right.
There are two problems with this:
- If the row has only “-1”s, then you know the patient started out with only pneumonia, and
- If the row has no “-1”s, you know there’s no pneumonia (unless the patient actually had 5 or more diagnoses, but that’s less common).
This was enough information to win the competition.
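To make the trick concrete, here’s a minimal sketch (invented records, hypothetical function name) of how a contestant could exploit the “-1” pattern in the prepared rows:

```python
# Hypothetical reconstruction of the INFORMS pneumonia leak.
# Each row: up to 4 diagnosis codes; -1 marks an empty slot.
# The preparer deleted code 486 (pneumonia) and shifted codes left,
# so the number of -1's leaks information about the removed code.

def leaky_pneumonia_guess(row):
    """Exploit the data-preparation artifact, not any medical signal."""
    if all(c == -1 for c in row):
        return True   # the record held only pneumonia before scrubbing
    if all(c != -1 for c in row):
        return False  # four surviving codes: nothing was removed
                      # (wrong only for the rare 5+-diagnosis patient)
    return None       # 1-3 codes: this trick alone can't tell

print(leaky_pneumonia_guess([-1, -1, -1, -1]))      # True
print(leaky_pneumonia_guess([401, 250, 272, 428]))  # False
print(leaky_pneumonia_guess([401, -1, -1, -1]))     # None
```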
Note: winning a competition on leakage is easier than building good models. But even if you don’t explicitly understand and game the leakage, your model will do it for you. Either way, leakage is a huge problem.
How to avoid leakage
Claudia’s advice to avoid this kind of problem:
- You need a strict temporal cutoff: remove all information just prior to the event of interest (patient admission).
- There has to be a timestamp on every entry, and you need to keep it
- Removing columns asks for trouble
- Removing rows can introduce inconsistencies with other tables, also causing trouble
- The best practice is to start from scratch with clean, raw data after careful consideration
- You need to know how the data was created! I only work with data I pulled and prepared myself (or maybe Ori).
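The temporal-cutoff advice can be sketched in a few lines; the record layout and field names here are my own assumptions, not Claudia’s actual pipeline:

```python
from datetime import datetime

# Sketch of a strict temporal cutoff: keep only feature rows that are
# timestamped strictly before the event of interest (patient admission).

def apply_cutoff(events, cutoff):
    """Drop anything at or after the event we're trying to predict."""
    return [e for e in events if e["ts"] < cutoff]

admission = datetime(2012, 6, 1)
events = [
    {"ts": datetime(2012, 3, 1), "code": 401},  # before admission: safe
    {"ts": datetime(2012, 6, 1), "code": 486},  # simultaneous: leaks
    {"ts": datetime(2012, 6, 5), "code": 480},  # after: definitely leaks
]
safe = apply_cutoff(events, admission)
print([e["code"] for e in safe])  # [401]
```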
How do I know that my model is any good?
With powerful algorithms searching for patterns, there is a serious danger of overfitting. It’s a difficult concept, but the general idea is that “if you look hard enough you’ll find something,” even if it doesn’t generalize beyond the particular training data.
To avoid overfitting, we cross-validate and we cut down on the complexity of the model to begin with. Here’s a standard picture (although keep in mind we generally work in high dimensional space and don’t have a pretty picture to look at):
The picture on the left is underfit, in the middle is good, and on the right is overfit.
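A toy version of that picture, with made-up data: fit polynomials of increasing degree and watch the training error fall monotonically, which is exactly why training error alone can’t warn you about overfitting:

```python
import numpy as np

# Toy version of the underfit / good / overfit picture: fit polynomials of
# increasing degree to noisy quadratic data (all numbers invented).
rng = np.random.RandomState(0)
x = np.linspace(0, 1, 20)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.1, 20)

def train_error(degree):
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Training error can only fall as the model gets more flexible...
errs = [train_error(d) for d in (1, 2, 8)]
assert errs[0] >= errs[1] >= errs[2]
# ...so training error alone can't detect overfitting; you need held-out
# data (cross-validation) to see that the degree-8 fit generalizes worse.
```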
The model you use matters when it concerns overfitting:
So in the above example, unpruned decision trees are the most prone to overfitting. This is a well-known problem with unpruned decision trees, which is why people use pruned ones.
Claudia dismisses accuracy as a bad evaluation method. What’s wrong with accuracy? It’s inappropriate for regression obviously, but even for classification: if the vast majority of binary outcomes are 1, then a stupid model that always guesses “1” can be accurate but not good, while a better model might have lower accuracy.
Probabilities matter, not 0's and 1's.
Nobody makes decisions on binary outcomes. I want to know the probability I have breast cancer, I don’t want to be told yes or no. It’s much more information. I care about probabilities.
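A two-line illustration of the accuracy problem, with invented class proportions:

```python
# A model that always predicts "1" looks 95% accurate on data where
# 95% of outcomes are 1, yet it has learned nothing at all.
labels = [1] * 95 + [0] * 5
predictions = [1] * 100
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.95
```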
How to evaluate a probability model
We separately evaluate the ranking and the calibration. To evaluate the ranking, we use the ROC curve and calculate the area under it, which typically ranges from 0.5 to 1.0. This is independent of scaling and calibration. Here’s an example of how to draw an ROC curve:
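One way to compute the area under the ROC curve without drawing it uses the fact that AUC equals the probability that a randomly chosen positive is ranked above a randomly chosen negative. A sketch, not how any particular library implements it:

```python
# Minimal AUC from scratch: count, over all positive/negative pairs,
# how often the positive outranks the negative (ties count half).

def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # 1.0  (perfect ranking)
print(auc([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2]))  # 0.75 (one pair misordered)
print(auc([1, 0], [0.5, 0.5]))                  # 0.5  (a tie counts half)
```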
Sometimes to measure rankings, people draw the so-called lift curve:
The key here is that the lift is calculated with respect to a baseline. You draw it at a given point, say 10%, by imagining that 10% of people are shown ads, and seeing how many people click versus if you randomly showed 10% of people ads. A lift of 3 means it’s 3 times better.
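A minimal lift-at-10% calculation, with invented data (the function name and the numbers are mine, not from the lecture):

```python
# Lift at a fraction: response rate among the top-scored fraction,
# divided by the overall response rate.

def lift_at(labels, scores, frac):
    n = max(1, int(len(labels) * frac))
    ranked = sorted(zip(scores, labels), reverse=True)
    top_rate = sum(l for _, l in ranked[:n]) / n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# 20 people, 4 clickers; the model pushes 2 clickers into the top 2 slots.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8] + [0.1] * 18
print(lift_at(labels, scores, 0.10))  # 5.0: the top decile converts 5x baseline
```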
How do you measure calibration? Are the probabilities accurate? If the model says probability of 0.57 that I have cancer, how do I know if it’s really 0.57? We can’t measure this directly. We can only bucket those predictions and then aggregately compare those in that prediction bucket (say 0.50-0.55) to the actual results for that bucket.
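Here’s a sketch of the bucketing procedure; the bin edges and the toy numbers are my own, chosen to mimic an overconfident model:

```python
# Calibration check by binning: group predictions into buckets and
# compare each bucket's mean prediction to its observed outcome rate.

def calibration_table(preds, outcomes, n_bins=4):
    bins = {}
    for p, y in zip(preds, outcomes):
        b = min(int(p * n_bins), n_bins - 1)   # bucket index 0..n_bins-1
        bins.setdefault(b, []).append((p, y))
    table = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        obs_rate = sum(y for _, y in pairs) / len(pairs)
        table.append((round(mean_pred, 2), round(obs_rate, 2)))
    return table

# Overconfident model: predicts 0.9 but only 60% of those actually happen.
preds    = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
outcomes = [1,   1,   1,   0,   0,   0,   0,   1,   1,   0]
print(calibration_table(preds, outcomes))  # [(0.1, 0.4), (0.9, 0.6)]
```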
For example, here’s what you get when your model is an unpruned decision tree, where the blue diamonds are buckets:
A good model would show buckets right along the x=y curve, but here we’re seeing that the predictions were much more extreme than the actual probabilities. Why does this pattern happen for decision trees?
Claudia says that this is because a tree optimizes for purity: it seeks out pockets that have only positives or only negatives, so its predictions are more extreme than reality. This is generally true about decision trees: they do not generally perform well with respect to calibration.
Logistic regression looks better when you test calibration, which is typical:
- Accuracy is almost never the right evaluation metric.
- Probabilities, not binary outcomes.
- Separate ranking from calibration.
- Ranking you can measure with nice pictures: ROC, lift
- Calibration is measured indirectly through binning.
- Different models are better than others when it comes to calibration.
- Calibration is sensitive to outliers.
- Measure what you want to be good at.
- Have a good baseline.
Choosing an algorithm
This is not a trivial question and in particular small tests may steer you wrong, because as you increase the sample size the best algorithm might vary: often decision trees perform very well but only if there’s enough data.
In general you need to choose your algorithm depending on the size and nature of your dataset, and you need to choose your evaluation method based partly on your data and partly on what you wish to be good at. Sum of squared error is the maximum likelihood loss function if your data can be assumed to be normal, but if you want to estimate the median, use absolute errors instead. If you want to estimate a quantile, minimize the weighted absolute error.
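The mean/median point can be checked numerically: a crude grid search (data invented) shows that each loss function picks a different “best” constant:

```python
# Minimizing squared error recovers the mean; minimizing absolute error
# recovers the median. A grid search over candidate constants shows it.
data = [1, 2, 3, 100]  # one big outlier

def argmin(loss):
    grid = [g / 10 for g in range(0, 1001)]   # candidates 0.0 .. 100.0
    return min(grid, key=loss)

sq_best  = argmin(lambda c: sum((x - c) ** 2 for x in data))
abs_best = argmin(lambda c: sum(abs(x - c) for x in data))
print(sq_best)   # 26.5, the mean: dragged far out by the outlier 100
print(abs_best)  # lands between 2 and 3, the median region: outlier barely matters
```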
We worked on predicting the number of ratings a movie will get in the next year, and we assumed a Poisson distribution. In this case our evaluation method doesn’t involve minimizing the sum of squared errors, but rather something else we found in the literature specific to the Poisson distribution, which depends on its single parameter $\lambda$.
Charity direct mail campaign
Let’s put some of this together.
Say we want to raise money for a charity. If we send a letter to every person on the mailing list, we raise about $9,000. We’d like to save money and only send letters to people who are likely to give; only about 5% of people generally give. How can we do that?
If we use a (somewhat pruned, as is standard) decision tree, we get $0 profit: it never finds a leaf with majority positives.
If we use a neural network we still make only $7500, even if we only send a letter in the case where we expect the return to be higher than the cost.
This looks unworkable, but if your model is better, it’s not. A person makes two decisions here: first, they decide whether or not to give; then they decide how much to give. Let’s model those two decisions separately, multiplying the probability of giving by the expected donation size.
Note we need the first model to be well-calibrated because we really care about the number, not just the ranking. So we will try logistic regression for the first part. For the second part, we train only on examples where there was a donation.
Altogether this decomposed model makes a profit of $15,000. The decomposition made it easier for the model to pick up the signals. Note that with infinite data, all would have been good, and we wouldn’t have needed to decompose. But you work with what you’ve got.
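The decomposition can be written down in a couple of lines; the mailing cost and the candidate numbers below are hypothetical stand-ins for the outputs of the two fitted models:

```python
# Sketch of the decomposition: expected value = P(donate) * E[amount | donate].
# In the lecture these two factors would come from a calibrated logistic
# regression and a regression trained on donors only; here they're invented.

COST = 0.68  # hypothetical cost of mailing one letter

def expected_profit(p_donate, expected_amount):
    return p_donate * expected_amount - COST

# Mail only when the expected return beats the cost of the letter.
candidates = [(0.01, 20.0), (0.05, 50.0), (0.002, 300.0)]
mail = [expected_profit(p, a) > 0 for p, a in candidates]
print(mail)  # [False, True, False]
```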
Moreover, you are multiplying errors above, which could be a problem if you have a reason to believe that those errors are correlated.
We are not meant to understand data. Data are outside of our sensory systems and there are very few people who have a near-sensory connection to numbers. We are instead meant to understand language.
We are not meant to understand uncertainty: we have all kinds of well-documented biases that prevent this from happening.
Modeling people in the future is intrinsically harder than figuring out how to label things that have already happened.
Even so we do our best, and this is through careful data generation, careful consideration of what our problem is, making sure we model it with data close to how it will be used, making sure we are optimizing to what we actually desire, and doing our homework in learning which algorithms fit which tasks.
I’m very happy to say I just signed a book contract with my co-author, Rachel Schutt, to publish a book with O’Reilly called Doing Data Science.
For those of you who’ve been reading along for free as I’ve been blogging it, there might not be a huge incentive to buy it, but I can promise you more and better math, more explicit usable formulas, some sample code, and an overall better and more thought-out narrative.
It’s supposed to be published in May with a possible early release coming up at the end of February, in time for the O’Reilly Strata Santa Clara conference, where Rachel will be speaking about it and about other stuff curriculum related. Hopefully people will pick it up in time to teach their data science courses in Fall 2013.
Speaking of Rachel, she’s also been selected to give a TedXWomen talk at Barnard on December 1st, which is super exciting. She’s talking about advocating for the social good using data. Unfortunately the event is invitation-only, otherwise I’d encourage you all to go and hear her words of wisdom. Update: word on the street is that it will be video-taped.
Data Science Blog
Today we started with discussing Rachel’s new blog, which is awesome and people should check it out for her words of data science wisdom. The topics she’s riffed on so far include: Why I proposed the course, EDA (exploratory data analysis), Analysis of the data science profiles from last week, and Defining data science as a research discipline.
She wants students and auditors to feel comfortable in contributing to blog discussion, that’s why they’re there. She particularly wants people to understand the importance of getting a feel for the data and the questions before ever worrying about how to present a shiny polished model to others. To illustrate this she threw up some heavy quotes:
“Long before worrying about how to convince others, you first have to understand what’s happening yourself” – Andrew Gelman
“Agreed” – Rachel Schutt
Thought experiment: how would you simulate chaos?
We split into groups and discussed this for a few minutes, then got back into a discussion. Here are some ideas from students:
- A Lorenz water wheel would do the trick, if you know what that is.
- Question: is chaos the same as randomness?
- Many physical systems can exhibit inherent chaos: there are examples with finite-state machines
- Teaching technique of “Simulating chaos to teach order” gives us real world simulation of a disaster area
- In this class we want to see how students would handle a chaotic situation. Most data problems start out with a certain amount of dirty data, ill-defined questions, and urgency. Can we teach a method of creating order from chaos?
- See also “Creating order from chaos in a startup”.
Talking to Doug Perlson, CEO of RealDirect
We got into teams of 4 or 5 to assemble our questions for Doug, the CEO of RealDirect. The students have been assigned as homework the task of suggesting a data strategy for this new company, due next week.
He came in, gave us his background in real-estate law and startups and online advertising, and told us about his desire to use all the data he now knew about to improve the way people sell and buy houses.
First they built an interface for sellers, giving them useful data-driven tips on how to sell their house and using interaction data to give real-time recommendations on what to do next. Doug remarked that normally people sell their homes about once every 7 years, so they’re not pros. The goal of RealDirect is not just to make individuals better at selling, but also to make the pros better at their jobs.
He pointed out that brokers are “free agents”: they operate by themselves, they guard their data, and the really good ones have lots of experience, which is to say they have more data. But very few brokers actually have sufficient experience to do it well.
The idea is to hire a team of licensed real-estate agents who act as data experts. They learn how to use information-collecting tools so RealDirect can gather data, in addition to publicly available information (for example, co-op sales data, which has only recently become available).
One problem with publicly available data is that it’s old news – there’s a 3 month lag. RealDirect is working on real-time feeds on stuff like:
- when people start searching,
- what’s the initial offer,
- the time between offer and close, and
- how people search online.
Ultimately good information helps both the buyer and the seller.
RealDirect makes money in 2 ways. First, there’s a subscription, $395 a month, for sellers to access its tools. Second, sellers can use RealDirect’s agents at a reduced commission (2% of the sale instead of the usual 2.5% or 3%). The data-driven nature of the business allows it to take a smaller commission because it’s more optimized, and therefore it gets more volume.
Doug mentioned that there’s a law in New York that you can’t show all the current housing listings unless it’s behind a registration wall, which is why RealDirect requires registration. This is an obstacle for buyers but he thinks serious buyers are willing to do it. He also doesn’t consider places that don’t require registration, like Zillow, to be true competitors because they’re just showing listings and not providing real service. He points out that you also need to register to use Pinterest.
Doug mentioned that RealDirect is comprised of licensed brokers in various established realtor associations, but even so they have had their share of hate mail from realtors who don’t appreciate their approach to cutting commission costs. In this sense it is somewhat of a guild.
On the other hand, he thinks if a realtor refused to show houses because they are being sold on RealDirect, then the buyers would see the listings elsewhere and complain. So the traditional brokers have little choice but to deal with them. In other words, the listings themselves are sufficiently transparent that traditional brokers can’t get away with keeping their buyers away from these houses.
RealDirect doesn’t presently take seasonality into consideration: it takes the position that a seller is trying to sell today. Doug talked about various issues a buyer would care about: nearby parks, subways, and schools, as well as comparisons of price per square foot for apartments sold in the same building or block. These are the key kinds of data for buyers, to be sure.
In terms of how the site works, it sounds somewhat like a social network for buyers and sellers. There are statuses for each person on the site: active, offer made, offer rejected, showing, in contract, etc. Based on your status, different opportunities are suggested.
Suggestions for Doug?
Example 1. You have points on the plane:
(x, y) = (1, 2), (2, 4), (3, 6), (4, 8).
The relationship is clearly y = 2x. You can do it in your head. Specifically, you’ve figured out:
- There’s a linear pattern.
- The coefficient is 2.
- So far it seems deterministic.
Example 2. You again have points on the plane, but now assume x is the input and y is the output.
(x, y) = (1, 2.1), (2, 3.7), (3, 5.8), (4, 7.9)
Now you notice that more or less y ~ 2x but it’s not a perfect fit. There’s some variation, it’s no longer deterministic.
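You can check that eyeballed fit by machine. Here’s a minimal sketch in Python (the code later in this post is in R; numpy’s `polyfit` stands in for `lm` here), fitting a line to the four noisy points:

```python
import numpy as np

# The noisy points from Example 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.7, 5.8, 7.9])

# Least-squares fit of a degree-1 polynomial, i.e. a line y = b0 + b1*x.
# polyfit returns coefficients highest degree first: (slope, intercept).
b1, b0 = np.polyfit(x, y, deg=1)

print(b1, b0)  # slope comes out close to the 2 we guessed
```

The fitted slope is 1.95 with an intercept of essentially 0 – close to, but not exactly, the y = 2x we eyeballed, which is the whole point of the noise.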
Example 3. Now consider these points:
(x, y) = (2, 1), (6, 7), (2.3, 6), (7.4, 8), (8, 2), (1.2, 2).
Here your brain can’t figure it out, and there’s no obvious linear relationship. But what if it’s your job to find a relationship anyway?
First assume (for now) there actually is a relationship and that it’s linear. It’s the best you can do to start out, i.e. assume

$y = \beta_0 + \beta_1 x + \epsilon$

and now find the best choices for $\beta_0$ and $\beta_1$. Note we include $\epsilon$ because it’s not a perfect relationship. This term is the “noise,” the stuff that isn’t accounted for by the relationship. It’s also called the error.
Before we find the general formula, we want to generalize to three variables now: $x_1, x_2, x_3$, and we will again try to explain $y$ knowing these values. If we wanted to draw it we’d be working in 4-dimensional space, trying to plot points $(x_1, x_2, x_3, y)$. As above, assuming a linear relationship means looking for a solution to:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$
Writing this with matrix notation we get:

$y = X\beta + \epsilon$
How do we calculate $\beta$? Define the “residual sum of squares,” denoted $RSS(\beta)$, to be

$RSS(\beta) = \sum_i (y_i - \beta \cdot x_i)^2,$

where $i$ ranges over the various data points and $\beta \cdot x_i$ is the dot product of the coefficients with the $i$-th row of data. RSS is called a loss function. There are many other versions of it, but this is one of the most basic, partly because it gives us a pretty nice measure of closeness of fit.
To minimize $RSS(\beta)$, we differentiate it with respect to $\beta$, set the derivative equal to zero, and solve for $\beta$. We end up with

$\hat{\beta} = (X^T X)^{-1} X^T y.$
To use this, we go back to our linear form and plug in the values of $\hat{\beta}$ to get a predicted $\hat{y}$.
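The closed-form solution $\hat{\beta} = (X^T X)^{-1} X^T y$ can be checked numerically. A sketch in Python (numpy standing in for the R below), reusing the noisy points from Example 2:

```python
import numpy as np

# Noisy points from Example 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.7, 5.8, 7.9])

# Design matrix X: a column of ones (for the intercept) next to the x values.
X = np.column_stack([np.ones_like(x), x])

# Closed-form least-squares solution: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Plug beta_hat back into the linear form to get predicted y values.
y_hat = X @ beta_hat

print(beta_hat)  # (intercept, slope)
```

In practice you’d use `np.linalg.lstsq` (or just `lm` in R) rather than inverting $X^T X$ by hand, but the explicit formula matches the derivation above.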
But wait, why did we assume a linear relationship? Sometimes maybe it’s a polynomial relationship.
You need to justify why you’re assuming what you want. Answering that kind of question is a key part of being a data scientist and why we need to learn these things carefully.
All this is like one line of R code, where you’ve got a column of y’s and a column of x’s:
model <- lm(y ~ x)
Or if you’re going with the polynomial form we’d have (note the I() wrapper – inside an R formula, ^ has a special meaning, so the powers have to be written as I(x^2) and I(x^3)):
model <- lm(y ~ x + I(x^2) + I(x^3))
Why do we do regression? Mostly for two reasons:
- If we want to predict one variable knowing another.
- If we want to explain or understand the relationship between two things.
Say you have the age, income, and credit rating for a bunch of people and you want to use the age and income to guess at the credit rating. Moreover, say we’ve divided credit ratings into “high” and “low”.
We can plot people as points on the plane and label people with an “x” if they have low credit ratings.
What if a new guy comes in? What’s his likely credit rating label? Let’s use k-nearest neighbors. To do so, you need to answer two questions:
- How many neighbors are you gonna look at? k=3 for example.
- What is a neighbor? We need a concept of distance.
For the sake of our problem, we can use Euclidean distance on the plane, as long as the relative scalings of the variables are approximately correct. Then the algorithm is simple: take the average rating of the people around me, where average means majority in this case – so if there are 2 high credit rating people and 1 low credit rating person among my 3 nearest neighbors, then I would be designated high.
Note we can also consider doing something somewhat more subtle, namely assigning high the value of “1” and low the value of “0” and taking the actual average, which in this case would be 0.667. This would indicate a kind of uncertainty. It depends on what you want from your algorithm. In machine learning algorithms we don’t typically have the concept of confidence levels; we care more about accuracy of prediction. But of course it’s up to us.
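Both versions of that neighbor-averaging step fit in a few lines. A sketch in Python (the R version using the class package appears below); the (age, income) points and labels here are made-up toy data, chosen so the new guy’s 3 nearest neighbors are 2 high and 1 low, matching the example:

```python
import math
from collections import Counter

# Hypothetical training data: (age, income) points with credit labels.
train = [((25, 40), "high"), ((30, 45), "high"), ((28, 30), "low"),
         ((60, 90), "high"), ((55, 20), "low")]

def euclidean(p, q):
    """Euclidean distance on the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def knn_label(query, train, k=3):
    """Majority vote among the k nearest neighbors."""
    neighbors = sorted(train, key=lambda pt: euclidean(pt[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_score(query, train, k=3):
    """The subtler version: average with high = 1, low = 0."""
    neighbors = sorted(train, key=lambda pt: euclidean(pt[0], query))[:k]
    return sum(1 for _, label in neighbors if label == "high") / k

new_guy = (27, 42)
print(knn_label(new_guy, train))  # 2 of the 3 nearest are high, so: high
print(knn_score(new_guy, train))  # the fractional version, 2/3
```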
Generally speaking we have a training phase, during which we create a model and “train it,” and then we have a testing phase where we use new data to test how good the model is.
For k-nearest neighbors, the training phase is stupid: it’s just reading in your data. In testing, you pretend you don’t know the true label and see how good you are at guessing using the above algorithm. This means you save some clean data from the overall data for the testing phase. Usually you want to save randomly selected data, at least 10%.
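Setting aside that randomly selected clean data might look like this – a sketch in Python, where the dataset, the 90/10 split, and the fixed seed are all illustrative choices:

```python
import random

# Hypothetical labeled dataset: 100 (features, label) records.
data = [((i, i % 7), "high" if i % 2 == 0 else "low") for i in range(100)]

random.seed(42)       # fixed seed so the split is reproducible
random.shuffle(data)  # randomize order before splitting

cut = int(0.9 * len(data))
train, test = data[:cut], data[cut:]  # train on 90%, hold out 10% for testing

print(len(train), len(test))  # 90 10
```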
In R: read in the package “class”, and use the function knn().
You perform the algorithm as follows:
knn(train, test, cl, k=3)
The output is the classification label for each test vector, as decided by majority vote among its k nearest (in Euclidean distance) training set vectors.
How do you evaluate if the model did a good job?
This isn’t easy or universal – you may decide you want to penalize certain kinds of misclassification more than others. For example, false positives may be way worse than false negatives.
To start out stupidly, you might want to simply minimize the misclassification rate:
(# incorrect labels) / (# total labels)
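Computed directly, that rate is just a count. A sketch in Python, with made-up true labels and guesses for 10 test points:

```python
# Hypothetical true labels vs. the model's guesses on 10 held-out points.
true_labels = ["high", "high", "low", "low", "high",
               "low", "high", "low", "high", "low"]
predicted = ["high", "low", "low", "low", "high",
             "high", "high", "low", "high", "low"]

# (# incorrect labels) / (# total labels)
wrong = sum(t != p for t, p in zip(true_labels, predicted))
misclassification_rate = wrong / len(true_labels)

print(misclassification_rate)  # 2 of 10 labels are wrong, so 0.2
```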
How do you choose k?
This is also hard. Part of homework next week will address this.
When do you use linear regression vs. k-nearest neighbor?
Thinking about what happens with outliers helps you realize how hard this question is. Sometimes it comes down to a question of what the decision-maker decides they want to believe.
Note definitions of “closeness” vary depending on the context: closeness in social networks could be defined as the number of overlapping friends.
Both linear regression and k-nearest neighbors are examples of “supervised learning”, where you’ve observed both x and y, and you want to know the function that brings x to y.
One of the reasons I chose to call this blog “mathbabe” is that when I searched that term, I found a website, now defunct (woohoo!), where semi-naked women were adorning math.
This pissed me off, because I want math babes to be doing math.
If you get that (what’s not to get?) then you might see why the European Commission’s latest effort to inspire girls to do science is truly repugnant (hat tip Debbie Berebichez, a.k.a. Science Babe).
It’s a commercial where you see a standard male scientist (in a white lab coat no less) being surprised, and, we assume, aroused, when three girly models come in, giggle, dance, and generally adorn the commercial.
At the end they put on lab goggles in the style of an ironic accessory. They’re all wearing high heels and there’s even lipstick in a few shots for some unexplained reason (are we supposed to infer that wearing lipstick makes you more scientific-alicious?).
And although there are a couple of shots of an actual female writing what could be actual formulas on a hyped-up whiteboard, that’s more than balanced by some other shots of the models with unmistakable come-hither looks, gestures and blown kisses.
People. At the European Commission. Do you have no advisors!? Do you have no common sense? Who vetted this garbage video?!?
I’d like to see us get to the point where our slogan is more along the lines of:
Science, it’s for really smart women
And our video consists of cool, funky women giving actual talks and lectures or actually working on experiments. Maybe they’re wearing heels, but for sure they’re not acting like complete fucking idiots. How’s that?
I personally could suggest about 40 people for such a video. Not hard to do.