## Guest post: Kaisa Taipale visualizes mathematics Ph.D.’s emigration patterns

This is a guest post by Kaisa Taipale. Kaisa got a BS at Caltech and a Ph.D. in math at the University of Minnesota, was a post-doc at MSRI and an assistant professor at St. Olaf College from 2010 to 2012, and is currently visiting Cornell, which is where I met her a couple of weeks ago. She told me about her cool visualizations of math Ph.D. emigration patterns, and I convinced her to write a guest post. Here’s Kaisa on a bridge:

Math data and viz

I was inspired by this older post on Mathbabe, about visualizing the arXiv postings of various math departments.

It got me thinking about tons of interesting questions I’ve asked myself and could answer with visualizations: Over time, what’s been coolest on the arXiv? Are there any topics that are especially attractive to hiring institutions? There’s tons of work to do!

I had to start somewhere though, and as I’m a total newbie when it comes to data analysis, I decided to learn some skills while focusing on a data set that I have easy non-technical access to and look forward to reading every year. I chose the AMS Annual Survey. I also wanted to stick to questions really close to my thoughts over the last two years, namely the academic job search.

I wanted to learn to use two tools, R and Circos. Why Circos? See the visualizations of college major and career path here - it’s pretty! I’ve messed around with a lot of questions, but in this post I’ll look at two and a half.

Where do graduating PhDs from R1 universities end up, in the short term? I started with graduates of public R1s, as I got my PhD at one.

The PhD-granting institutions are colored green, while academic institutions granting other degrees are in blue. Purple is for business, industry, government, and research institutions. Red is for non-U.S. employment or people not seeking — except for the bright red, which is still seeking. Yellow rounds things out at unknown. Remember, these figures are for immediate plans after graduation rather than permanent employment.

While I was playing with this data (read “learning how to use the reshape and ggplot2 packages”) I noticed that people from private R1s tend to end up at private R1s more often. So I graphed that too.

Does the professoriate in the audience have any idea if this is self-selection or some sort of preference on the part of employers? Also, what happened between 2001 and 2003? I was still in college, and have no idea what historical events are at play here.

Where mathematicians go

For any given year, we can use a circular graph to show us where people go. This is a more clumped version of the above data from 2010 alone, plotted using Circos. (Supplemental table E.4 from the AMS report online.)

The other question – the question current mathematicians secretly care more about, in a gossipy and potentially catty way – is what fields lead to what fate. We all know algebra and number theory are the purest and most virtuous subjects, and applied math is for people who want to make money or want to make a difference in the world.

[On that note, you might notice that I removed statistics PhDs in the visualization below, and I also removed some of the employment sectors that gained only a few people a year. The stats ribbons are huge and the small sectors are very small, so for looks alone I took them out.]

Higher resolution version available here.

Wish list

I wish I could animate a series of these to show this view over time as well. Let me know if you know how to do that! Another nice thing I could do would be to set up a webpage in which these visualizations could be explored in a bit more depth. (After finals.)

Also:

• I haven’t computed any numbers for you;
• the graphs from R show employment in each field by percentage of graduates instead of total number per category;
• it’s hard to show both data over time and all the data one could explore. But it’s a start.
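The distinction in the second bullet, percentage of graduates versus total number per category, is easy to make concrete. Here is a minimal Python sketch (not the R code used for the graphs above) that turns raw counts per employment category into percentages. The function name `to_percentages`, the category labels, and the counts are all made-up placeholders, not figures from the AMS survey.

```python
def to_percentages(counts):
    """Convert a {category: count} mapping into {category: percent of total}."""
    total = sum(counts.values())
    if total == 0:
        # No graduates recorded: report 0% everywhere rather than divide by zero.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Placeholder numbers for illustration only (NOT real survey data).
example_counts = {
    "PhD-granting institutions": 50,
    "Other academic": 30,
    "Business/industry/government": 20,
}

for category, pct in to_percentages(example_counts).items():
    print(f"{category}: {pct:.1f}%")
```

Percentages make years with different cohort sizes comparable, at the cost of hiding how many people each ribbon or line actually represents.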

I should finish with a shout-out to Roger Peng and Jeff Leek, though we’ve never met: I took Peng’s Computing for Data Analysis and much of Leek’s Data Analysis on Coursera (though I’m one of those who didn’t finish the class). Their courses and Stack Overflow taught me almost everything I know about R. As I mentioned above, I’m pretty new to this type of analysis.

What questions would you ask? How can I make the above cooler? Did you learn anything?

## On being an alpha female, part 2

Almost a year ago now I wrote this post on being an alpha female. I had only recently understood that I was an alpha female when I wrote it, and it was still kind of new and weird.

For whatever reason it’s been coming up a lot recently and I wanted to update that post with my observations.

Who’s burning which bridges?

Last week I wrote an outraged post about seeing Ina Drew at Barnard.

Mind you, I had anticipated I’d find the event objectionable. I had even polled my Occupy friends for prepared questions for her. But when I got there I realized pretty quickly that I wouldn’t be able to ask her anything. I was just too disgusted with the tone and conceit of the event to participate in it reasonably. Instead I live tweeted the event and seethed.

I lost sleep that night fuming about Drew-as-role-model, and I was grateful to be able to get some of my frustration out on my blog.

Boy Cathy, you sure do know how to burn bridges.

This was, for me, kind of a perfect alpha female moment. My immediate reaction was to think to myself,

They burned bridges with me, you mean.

Since that sounded too arrogant, at the moment anyway, I said something else just slightly less obnoxious. Three points to make here:

1. Anyone who doesn’t agree with me about whether Ina Drew should be celebrated can go suck it.
2. That post got linked to from Reuters, FT.com, and Naked Capitalism. Which doesn’t happen when you’re worrying about burning bridges.
3. When I’m in a certain kind of mood, I’m simply not concerned with other people’s judgments. I think that’s just part of being an alpha female, and I’m grateful for it.

Why grateful? Because lots of shitty things happen when people go around worrying about “burning their bridges” instead of speaking up about bullshit or evil-doing. Or, as Felix Salmon tweeted recently:

Taking notes from an uber alpha female

A few months ago I got an email inviting me to speak at a Python in Finance conference. The email was somewhat weird and kind of just came out and said they needed women speakers. I was put in the position of being asked to be a token woman, which is a mindset I don’t enjoy.

I thought about it though. Although I use python and used to work in finance, I don’t work in finance any more, and I don’t really think about python too much; I just use it. So I said to the organizer, no thanks, I don’t have anything to say at that conference.

Fast forward to the week before the conference, when I got wind of the agenda. It turned out my friend Claudia Perlich, Chief Data Scientist at m6d and one of the contributors to my upcoming book with Rachel Schutt, was the keynote speaker. I decided to go to the conference essentially because I wanted to see her.

Well, it turned out Claudia had gotten a similar email, and she had accepted the invitation, even though she doesn’t work in finance and doesn’t even use python (she uses perl).

She gave a great talk about modeling blind spots, which everyone enjoyed. It was quite possibly the best talk of the day, in fact. Plus, she wasn’t at all token - having her on the schedule was what made me come to the conference, and I probably wasn’t the only one. And judging by the crowd at the Meetup I gave last night, I would have drawn my own crowd too, if I had been speaking.

I made an alpha female note to myself that day to accept any invitation to a conference that I’d enjoy, even if my expertise isn’t completely within the realm of the conference. I’m learning from Claudia, a master alpha female. Or is it mistress?

Alpha females and self-image

Chris Wiggins recently sent me this essay entitled “A Rant on Women” by Clay Shirky, a writer and professor who studies the social and economic effects of Internet technologies. Here’s the first paragraph:

So I get email from a good former student, applying for a job and asking for a recommendation. “Sure”, I say, “Tell me what you think I should say.” I then get a draft letter back in which the student has described their work and fitness for the job in terms so superlative it would make an Assistant Brand Manager blush.

Guess what? That student is male.

Shirky goes on to vent about how women don’t oversell themselves enough compared to men and how it’s a problem. An excerpt:

There is no upper limit to the risks men are willing to take in order to succeed, and if there is an upper limit for women, they will succeed less. They will also end up in jail less, but I don’t think we get the rewards without the risks.

This made me think about my experience. First, as a Barnard professor, I certainly saw this effect. I’d have men and women come talk to me about letters of recommendations, and not only would I prepare myself for the difference in posture, I’d try to address it directly, by encouraging women to learn how to brag about their accomplishments. I might have tried to convince men a couple of times to stop bragging quite so much, but quickly found that to be a huge waste of time.

But beyond corroborating that this is typical behavior, the essay made me remember myself as a college student.

When I met my thesis advisor, Barry Mazur, who was on sabbatical at UC Berkeley, I remember telling him a math problem I had worked on and solved. He expressed something about liking the problem and being impressed that I’d explained it so well, and I said back,

“Yeah, I’m awesome”

I remember this because of his reaction. At the time, the word “awesome” was widely used among teenagers, but evidently he hadn’t gotten the teenager memo, and he was taken aback by the way I used it. At least that’s what he said. But now that I think about it, maybe he was taken aback that I’d said it at all.

Alpha females and body image

My friend and guest poster Becky recently sent me this video:

It’s about how women have a biased view on their looks, or at least describe their looks to other people in a consistently negatively biased way.

There’s a great critique of this video here (hat tip Avani Patel), wherein fashion and style guru Jennifer Choy complains that the underlying message to the above video is that, in any case, beauty is about all women have going for them, so they should not underestimate their beauty. Plus that all the women in the video were skinny, young, and white.

Great points, but my take was somewhat different.

My immediate reaction to the video was to say, these women need to spend less time thinking about being fat or ugly, and more time thinking about what they think is sexy and attractive. Why is it always about finding flaws in ourselves? Why don’t we spend more time thinking about what turns us on or what we think is beautiful?

I’ll be honest: I think if I had been interviewed in that setting, I would have said something like, “Gorgeous and sexy as hell” and gone on to list my best features. I am not sure I’d have even been able to describe what I look like in any detail, with any accuracy. Most likely I would have just started bragging about my sexy grey streaks. Even more likely: I wouldn’t have had the time to sit down for this interview at all.

Don’t get me wrong, I’ve dabbled in being insecure in my looks: puberty sucked, as did all three post-natal periods until the baby was weaned*, in addition to any time I was ever on the pill**. I’ve concluded that my inherent arrogance is directly related to my hormones, which in turn makes it undeniably tied to my alpha femaleness.

Suffice it to say, when my hormones are not messed up I have “body eumorphia,” where I ignore or downplay any non-perfect parts of my body. It’s a nice feeling.

It kind of makes me want to develop an alpha female hormone treatment. Business model?

UPDATE: Please watch this new spoof video, it’s perfect (except it should be alpha females and men, not just men):

* It gets better when you know it’s going to go away. By the third kid I was like, “gonna cry every day at 3:00pm for the next six weeks. Must schedule that into my calendar.”

** Note to doctors: you need to tell women that the real reason birth control pills work so well is that you lose interest in sex when you’re on them!

## Guest post by Julia Evans: How I got a data science job

This is a guest post by Julia Evans. Julia is a data scientist & programmer who lives in Montréal. She spends her free time these days playing with data and running events for women who program or want to: she just started a Montréal chapter of pyladies to teach programming, and she co-organizes a monthly meetup called Montréal All-Girl Hack Night for women who are developers.

I asked mathbabe a question a few weeks ago saying that I’d recently started a data science job without having too much experience with statistics, and she asked me to write something about how I got the job. Needless to say I’m pretty honoured to be a guest blogger here. Hopefully this will help someone!

Last March I decided that I wanted a job playing with data, since I’d been playing with datasets in my spare time for a while and I really liked it. I had a BSc in pure math, a MSc in theoretical computer science and about 6 months of work experience as a programmer developing websites. I’d taken one machine learning class and zero statistics classes.

In October, I left my web development job with some savings and no immediate plans to find a new job. I was thinking about doing freelance web development. Two weeks later, someone posted a job posting to my department mailing list looking for a “Junior Data Scientist”. I wrote back and said basically “I have a really strong math background and am a pretty good programmer”. This email included, embarrassingly, the sentence “I am amazing at math”. They said they’d like to interview me.

The interview was a lunch meeting. I found out that the company (Via Science) was opening a new office in my city, and was looking for people to be the first employees at the new office. They work with clients to make predictions based on their data.

My interviewer (now my manager) asked me about my role at my previous job (a little bit of everything — programming, system administration, etc.), my math background (lots of pure math, but no stats), and my experience with machine learning (one class, and drawing some graphs for fun). I was asked how I’d approach a digit recognition problem and I said “well, I’d see what people do to solve problems like that, and I’d try that”.

I also talked about some data visualizations I’d worked on for fun. They were looking for someone who could take on new datasets and be independent and proactive about creating models, figuring out what is the most useful thing to model, and getting more information from clients.

I got a call back about a week after the lunch interview saying that they’d like to hire me. We talked a bit more about the work culture, starting dates, and salary, and then I accepted the offer.

So far I’ve been working here for about four months. I work with a machine learning system developed inside the company (there’s a paper about it here). I’ve spent most of my time working on code to interface with this system and make it easier for us to get results out of it quickly. I alternate between working on this system (using Java) and using Python (with the fabulous IPython Notebook) to quickly draw graphs and make models with scikit-learn to compare our results.

I like that I have real-world data (sometimes, lots of it!) where there’s not always a clear question or direction to go in. I get to spend time figuring out the relevant features of the data or what kinds of things we should be trying to model. I’m beginning to understand what people say about data-wrangling taking up most of their time. I’m learning some statistics, and we have a weekly Friday seminar series where we take turns talking about something we’ve learned in the last few weeks or introducing a piece of math that we want to use.

Overall I’m really happy to have a job where I get data and have to figure out what direction to take it in, and I’m learning a lot.

## Leila Schneps is a mystery writer!

I’m back! I missed you guys bad.

My experience with Seattle in the last 8 days has convinced me of something I rather suspected, namely I’m a huge New York snob and can’t exist happily anywhere else. I will spare you the details (they have to do with cars, subways, and being an asshole pedestrian) but suffice it to say, glad to be home.

Just a few caveats on complaining about my vacation:

1. I enjoyed visiting the University of Washington and giving the math colloquium there as well as a “Math Day” talk where I showed kids the winning strategy for Nim (as well as other impartial two-player games) following my notes from last summer.
2. I enjoyed reading Leon and Becky’s guest posts. Thanks guys!
3. And then there was the time spent with my darling family. Of course, goes without saying, it’s always magical to get to the point where your kids have invented a whole new language of insults after you’ve outlawed certain words: “Shut your fidoodle, you syncopathic lardle!”
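Since item 1 mentions the winning strategy for Nim, here is a minimal sketch of the standard nim-sum strategy in Python. It assumes the normal-play convention (whoever takes the last object wins). The function names `nim_sum` and `winning_move` are illustrative and are not taken from the notes mentioned above.

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """Bitwise XOR of all heap sizes (the 'nim-sum' of the position)."""
    return reduce(xor, heaps, 0)


def winning_move(heaps):
    """Return (heap_index, new_size) for a move that leaves nim-sum zero.

    Returns None when the nim-sum is already zero, meaning the player
    to move loses against perfect play (normal-play convention).
    """
    total = nim_sum(heaps)
    if total == 0:
        return None
    for i, size in enumerate(heaps):
        target = size ^ total
        # A legal move must strictly shrink the heap.
        if target < size:
            return (i, target)
    return None  # Not reached: a nonzero nim-sum always admits such a move.
```

For heaps of sizes 3, 4 and 5, `winning_move([3, 4, 5])` returns `(0, 1)`: shrink the first heap to 1, which leaves the opponent facing a nim-sum of zero.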

Of all the topics I want to write about today, I’ve decided to go with the most immediate and surprising one: Leila Schneps is now a mystery writer! How cool is that? She’s written a book with her daughter, Math on Trial: How Numbers Get Used and Abused in the Courtroom, currently in stock and available on Amazon. And she wrote an op-ed for the New York Times talking about it (hat tip Chris Wiggins).

I know Leila from having been her grad student assistant at the GWU Summer Program for Women in Math the first year it existed, in 1995. She taught undergrads about Galois cohomology and interpreted elements of $H^1$ as twists and elements of $H^2$ as obstructions and then had them do a bunch of examples for homework with me. It was pretty awesome, and I learned a ton. Leila is also a regular and fantastic commenter on mathbabe.

I love the premise of the book she’s written. She finds a bunch of historical examples where mathematics is used in trials to the detriment of justice, and people get unfairly jailed (or, less often, let free). From the op-ed (emphasis mine):

Decades ago, the Harvard law professor Laurence H. Tribe wrote a stinging denunciation of the use of mathematics at trial, saying that the “overbearing impressiveness” of numbers tends to “dwarf” other evidence. But we neither can nor should throw math out of the courtroom. Advances in forensics, which rely on data analysis for everything from gunpowder to DNA, mean that quantitative methods will play an ever more important role in judicial deliberations.

The challenge is to make sure that the math behind the legal reasoning is fundamentally sound. Good math can help reveal the truth. But in inexperienced hands, math can become a weapon that impedes justice and destroys innocent lives.

Go Leila!

I’m pretty sure you guys know this already, but I love my regular readers and commenters. It’s a large part of why I blog – I feel like I’m having a super interesting cocktail party every morning in my underwear. I’m investing in the quality of the rest of my day, stealing a moment before my family wakes up so I can articulate one single idea. The payoff is, most of the time, dependably good conversation that lasts all day, or even more than a day, as your comments and emails come in.

Of course, there are sometimes nasty people and comments in addition to thoughtful ones. Not everyone interprets me as trying to figure stuff out; some think I’m being intentionally asinine or manipulative. Or sometimes they just don’t agree with me, and instead of explaining their reasoning they just yell. Or sometimes they are just jerks, getting out their aggression on a stranger.

My first rule is to allow comments that disagree with me, as long as the reasons are articulated and as long as the comment isn’t abusive. Rude is ok, “you are stupid” is not ok.

My second rule is to have a thick skin. I can completely ignore the sentiment of an abusive commenter calling me names, because first of all I’ve heard it all before and second I’m pretty sure it’s not about me.

I’m not saying it doesn’t bother me at all, because obviously it’s a pain to have to go through my email and make sure people are being civil.

For example, whenever I get onto the top 10 of Hacker News, which has been a few times now, I’ve noticed a huge wave of nasty comments. Of course this could be a direct result of how many people I get (thousands per hour), but I don’t think so – the ratio of interesting to abusive comments coming from Hacker News traffic is tiny. It creates nasty work for me, which I feel compelled to do because letting nasty comments stay on my blog makes me feel violated and intentionally misunderstood.

This morning I found this article via Naked Capitalism regarding reader comments, and how nasty ones make subsequent readers evaluate the message differently, and in particular, more negatively. In other words, my intuition was right – it’s super important to curate comments.

My experience with Hacker News has also given me sympathy for Izabella Laba‘s position that she doesn’t accept comments on her blog (read this post for example). She puts herself out there, with strong opinions, and many of her posts are important and thought-provoking. And by the same token people can get pretty threatened by what she has to say. I can well imagine what her experience has been. What if every day was a Hacker News day? What if a majority of comments contained ridiculous and personal attacks? Yuck.

Makes me even more grateful to have you guys.

## Rachel Schutt speaks at Strata tomorrow about Next-Gen data science

I’m excited about Rachel Schutt’s talk at Strata Santa Clara tomorrow at 1:30 PST. I don’t think it’s being live-streamed, unfortunately, but maybe we will eventually get our hands on a video.

The topic is next-gen data science and data scientists, which is explained in her abstract:

Data Science is an emerging field in industry, yet not well-defined as an academic discipline (or even in industry for that matter). I proposed the “Introduction to Data Science” course at Columbia in March, 2012. This was the first course at Columbia that had the term “Data Science” in the title. I had three primary motivations:

1) Bringing industry to students: I wanted to give students an education in what it’s like to be a data scientist in industry and give them some of the skills data scientists have. This is based on my experience as a lead analyst on the Google+ Data Science team. But I didn’t want to limit them to only my way of seeing the world, so each week, guest speakers from the NYC tech community came to teach the class.

2) I wanted to think more deeply about the science of data science: Data Science has the potential to be a deep and profound research discipline impacting all aspects of our lives. Columbia University and Mayor Bloomberg announced the Institute for Data Sciences and Engineering in July, 2012. This course created an opportunity to develop the theory of Data Science and to formalize it as a legitimate science.

3) Personal Challenge: I kept hearing from data scientists in industry that you can’t teach data science in a classroom or university setting and I took that on as a challenge. I wanted to test the hypothesis that it was possible to train awesome data scientists in the classroom.

In February 2013, 2 months will have passed since the class ended. I’ll be able to reflect on how the class went, how I thought about the curriculum, how I engaged the NYC tech community to be involved in the class, who the students were, whether I had impact on them, etc.

Rachel wrote a blog for the class and had a great post about being a next-gen data scientist. She has high hopes for the students in the class and wrote an aspirational list for them. It started with the idea of being more focused on integrity than on self-promotion, and it ended with bringing one’s humanity to the job.

When Rachel talks about it, it seems possible that one could use data science to actually make the world a better place rather than to simply add to the hype and to the predatory nature of the current modeling space (see this article for a perfect example of the predatory modeling side – it doesn’t specifically talk about models but believe me, they’re there, helping the payday lenders and the banks choose who to trap and who to ignore. I’ve talked to people who worked on earlier generations of those models).

Rachel also gave a TEDx Women’s talk at Barnard on the subject of bringing humanity to modeling. Here’s the video of her talk. And while I make fun of TED talks a lot, mostly because they have overly polished ideas and delivery, one thing I love about Rachel’s is how raw and powerful it is. Go Rachel!

## Gender bias in math

I don’t always agree with everything she says, but I agree with everything Izabella Laba says in this post called Gender Bias 101 For Mathematicians (hat tip Jordan Ellenberg). And I’m kind of jealous she put it together in such a fantastic no-bullshit way.

Namely, she debunks a bunch of myths of gender bias. Here’s my summary, but you should read the whole thing:

1. Myth: Sexism in math is perpetrated mainly by a bunch of enormously sexist old guys. Izabella: Nope, it’s everyone, and there’s lots of evidence for that.
2. Myth: The way to combat sexism is to find those guys and isolate them. Izabella: Nope, that won’t work, since it’s everyone.
3. Myth: If it’s really everyone, it’s too hard to solve. Izabella: Not necessarily, and hey you are still trying to solve the Riemann Hypothesis even though that’s hard (my favorite argument).
4. Myth: We should continue to debate about its existence rather than solution. Izabella: We are beyond that, it’s a waste of time, and I’m not going to waste my time anymore.
5. Myth: Izabella, you are only writing this to be reassured. Izabella: Don’t patronize me.

Here’s what I’d add. I’ve been arguing for a long time that gender bias against girls in math starts young and starts at the cultural level. It has to do with expectations of oneself just as much as a bunch of nasty old men (by the way, the above is not to say there aren’t nasty old men (and nasty old women!), just that it’s not only about them).

My argument has been that the cultural differences are larger than the talent differences, something Larry Summers strangely dismissed without actually investigating in his famous speech.

And I think I’ve found the smoking gun for my side of this argument, in the form of an interactive New York Times graphic from last week’s Science section which I’ve screenshot here:

What this shows is that 15-year-old girls out-perform 15-year-old boys in certain countries and under-perform them in others. Which countries the girls outperform the boys in is not random: it has everything to do with cultural expectations and opportunities for girls in those countries, and is explained to some extent by stereotype threat. Go read the article, it’s fascinating.

I’ll say again what I said already at the end of this post: the great news is that it is possible to address stereotype threat directly, which won’t solve everything but will go a long way.

You do it by emphasizing that mathematical talent is not inherent, nor fixed at birth, and that you can cultivate it and grow it over time and through hard work. I make this speech whenever I can to young people. Spread the word!

## Advice for young women math professors

I’ve been here at the Nebraska conference for undergrad women in math for a couple of days now. There are quite a few grad students and young professors as well and I’m finding myself giving a few pieces of advice over and over again to the new female professors. I thought I’d write them down here too.

Obviously you can take this advice or leave it.

1. Ban guilt from your child-rearing experience. The tenure system being what it is, it’s just impossible for you to work enough, including research, and to spend 4 hours a day with an awake baby. Instead think of it this way: it takes a village to raise a child, and this is the time when it’s more village than mom, which is ok. Make sure they are in loving environments, have super nice babysitters, get the best daycare you can, and stop worrying about being a crappy mom. Turns out you’ll have plenty of time to do awesome things with your kids and in the meantime they need you to be a role model, which means pursuing your dreams.
2. I’m not suggesting working too much either – having a really set schedule which allows time for work during daycare and then time for family before and after is great, and your students and colleagues will just need to accept that you are available during working hours and not otherwise. Don’t apologize for this, just do your job, and don’t assume people are judging you for it either.
3. I met a ton of women who seem to have taken on all of the household duties and are overwhelmed by them, especially when they also have small children. First of all, lower your standards. Houses can be messy, it doesn’t actually kill anyone if you ignore an upturned lego box because you want to go think about math. Second, budget a housecleaner – one woman described how she and her husband decided to sell their car but kept their housekeeper, and I fully endorse this trade-off. Third, sit down with your partner and write a list of chores and split them up. It’s not sexy but it works. Finally, be sure your kids help as soon as they can. Turns out kids can make their own school lunches starting when they’re 8 if the ingredients are readily available.
4. Personally I never do more volunteering at the kids’ schools than my husband as a matter of principle. And it also turns out my husband never does any. This makes me a bitch but also saves me a ton of time. Consider it.
5. Make time for something other than kids and work. Carve it out with a knife if necessary. It will be worth it and will keep you sane and remembering why you made this plan.
6. Also don’t forget to have dates with your partner.
7. Finally, if you ultimately decide it’s not working, remember you have lots of options with a math Ph.D. – don’t underestimate yourself and your options.

## I love me some nerd girls

Last night I was waiting for a bus to go hang with my Athena Mastermind group, which consists of a bunch of very cool Barnard student entrepreneurs and their would-be role models (I say would-be because, although we role models are also very cool, I often think the students are role modeling for us).

As I was waiting at the bus stop, I overheard two women talking about the new Applied Data Science class that just started at Columbia, which is being taught by Ian Langmore, Daniel Krasner and Chang She. I knew about this class because Ian came to advertise it last semester in Rachel Schutt’s Intro to Data Science class, which I blogged about. One of the women at the bus stop had been in Rachel’s class and the other is in Ian’s.

Turns out I just love overhearing nerd girls talking data science at the bus stop. Don’t you??

And to top off the nerd girl experience, I’m on my way today to Nebraska to give a talk to a bunch of undergraduate women in math about what they can do with math outside of academia. I’m planning it to be an informative talk, but that’s really just cover to its real goal, which is to give a pep talk.

My experience talking to young women in math, at least when they are grad students, is that they respond viscerally to encouragement, even if it’s vague. I can actually see their egos inflate in the audience as I speak, and that’s a good thing, that’s why I’m there.

As a community, I’ve realized, nerd girls going through grad school are virtually starved for positive feedback, and so my job is pretty clear cut: I’m going to tell them how awesome they are and answer their questions about what it’s like in the “real world” and then go back to telling them how awesome they are.

By the end they sit a bit straighter and smile a bit more, after I’ve told them, or reminded them at least, how much power they have as nerd girls – how many options they have, and how they don’t have to be risk-averse, and how they never need to apologize.

Tomorrow my audience is undergraduates, which is a bit trickier, since as an undergrad you still get consistent feedback in the form of grades. So I will tailor my information as well as my encouragement a bit, and try not to make grad school sound too scary, because I do think that getting a Ph.D. is still a huge deal. Comment below if you have suggestions for my talk, please!

## Columbia Data Science course, week 12: Predictive modeling, data leakage, model evaluation

This week’s guest lecturer in Rachel Schutt’s Columbia Data Science class was Claudia Perlich. Claudia has been the Chief Scientist at m6d for 3 years. Before that she was in a data analytics group at the IBM center that developed Watson, the computer that won Jeopardy!, although she didn’t work on that project. Claudia got her Ph.D. in information systems at NYU and now teaches a class to business students in data science, although mostly she addresses how to assess data science work and how to manage data scientists. Claudia also holds a masters in Computer Science.

Claudia is a famously successful data mining competition winner. She won the KDD Cup in 2003, 2007, 2008, and 2009, the ILP Challenge in 2005, the INFORMS Challenge in 2008, and the Kaggle HIV competition in 2010.

She’s also been a data mining competition organizer, first for the INFORMS Challenge in 2009 and then for the Heritage Health Prize in 2011. Claudia claims to be retired from competition.

Background

Here’s what Claudia historically does with her time:

• predictive modeling
• data mining competitions
• publications in conferences like KDD and journals
• talks
• patents
• teaching
• digging around data (her favorite part)

Claudia likes to understand something about the world by looking directly at the data.

Here’s Claudia’s skill set:

• plenty of experience doing data stuff (15 years)
• data intuition (for which one needs to get to the bottom of the data generating process)
• dedication to the evaluation (one needs to cultivate a good sense of smell)
• model intuition (we use models to diagnose data)

Claudia also addressed being a woman. She says it works well in the data science field, where her intuition is useful and is used. She claims her nose is so well developed by now that she can smell it when something is wrong. This is not the same thing as being able to prove something algorithmically. Also, people typically remember her because she’s a woman, even when she doesn’t remember them. It has worked in her favor, she says, and she’s happy to admit this. But then again, she is where she is because she’s good.

Someone in the class asked if papers submitted for journals and/or conferences are blind to gender. Claudia responded that it was, for some time, typically double-blind but now it’s more likely to be one-sided. And anyway there was a cool analysis that showed you can guess who wrote a paper with 80% accuracy just by knowing the citations. So making things blind doesn’t really help. More recently the names are included, and hopefully this doesn’t make things too biased. Claudia admits to being slightly biased towards institutions – certain institutions prepare better work.

Skills and daily life of a Chief Data Scientist

Claudia’s primary skills are as follows:

• Data manipulation: unix (sed, awk, etc), Perl, SQL
• Modeling: various methods (logistic regression, nearest neighbors,  k-nearest neighbors, etc)
• Setting things up

She mentions that the methods don’t matter as much as how you’ve set it up, and how you’ve translated it into something where you can solve a question.

More recently, she’s been told that at work she spends:

• 40% of time as “contributor”: doing stuff directly with data
• 40% of time as “ambassador”: writing stuff, giving talks, mostly external communication to represent m6d, and
• 20% of time in “leadership” of her data group

At IBM it was much more focused in the first category. Even so, she has a flexible schedule at m6d and is treated well.

The goals of the audience

She asked the class, why are you here? Do you want to:

• become a data scientist? (good career choice!)
• work with a data scientist?
• work for a data scientist?
• manage a data scientist?

Most people were trying their hands at the first, but we had a few in each category.

She mentioned that it matters because the way she’d talk to people wanting to become a data scientist would be different from the way she’d talk to someone who wants to manage them. Her NYU class is more like how to manage one.

So, for example, you need to be able to evaluate their work. It’s one thing to check a bubble sort algorithm or check whether a SQL server is working, but checking a model which purports to give the probability of people converting is a different kettle of fish.

For example, try to answer this: how much better can that model get if you spend another week on it? Let’s face it, quality control is hard for yourself as a data miner, so it’s definitely hard for other people. There’s no easy answer.

There’s an old joke that comes to mind: What’s the difference between a scientist and a consultant? The scientist asks, how long does it take to get this right? whereas the consultant asks, how right can I get this in a week?

Insights into data

A student asks, how do you turn a data analysis into insights?

Claudia: this is a constant point of contention. My attitude is: I like to understand something, but what I like to understand isn’t what you’d consider an insight. My message may be, hey you’ve replaced every “a” by a “0”, or, you need to change the way you collect your data. In terms of useful insight, Ori’s lecture from last week, when he talked about causality, is as close as you get.

For example, decision trees you interpret, and people like them because they’re easy to interpret, but I’d ask, why does it look like it does? A slightly different data set would give you a different tree and you’d get a different conclusion. This is the illusion of understanding. I tend to be careful with delivering strong insights in that sense.

For more in this vein, Claudia suggests we look at Monica Rogati’s talk “Lies, damn lies, and the data scientist.”

Data mining competitions

Claudia drew a distinction between different types of data mining competitions.

On the one hand you have the “sterile” kind, where you’re given a clean, prepared data matrix, a standard error measure, and where the features are often anonymized. This is a pure machine learning problem.

Examples of this first kind are: KDD Cup 2009 and 2011 (Netflix). In such competitions, your approach would emphasize algorithms and computation. The winner would probably have heavy machines and huge modeling ensembles.

On the other hand, you have the “real world” kind of data mining competition, where you’re handed raw data, which is often in lots of different tables and not easily joined, where you set up the model yourself and come up with task-specific evaluations. This kind of competition simulates real life more.

Examples of this second kind are: KDD Cup 2007, 2008, and 2010. If you’re competing in this kind of competition your approach would involve understanding the domain, analyzing the data, and building the model. The winner might be the person who best understands how to tailor the model to the actual question.

Claudia prefers the second kind, because it’s closer to what you do in real life. In particular, the same things go right or go wrong.

How to be a good modeler

Claudia claims that data and domain understanding is the single most important skill you need as a data scientist. At the same time, this can’t really be taught – it can only be cultivated.

A few lessons learned about data mining competitions that Claudia thinks are overlooked in academics:

• Leakage: the contestant’s best friend and the organizer’s/practitioner’s worst nightmare. There’s always something wrong with the data, and Claudia has made an artform of figuring out how the people preparing the competition got lazy or sloppy with the data.
• Adapting learning to real-life performance measures beyond standard measures like MSE, error rate, or AUC (profit?)
• Feature construction/transformation: real data is rarely flat (i.e. given to you in a beautiful matrix) and good, practical solutions for this problem remain a challenge.

Leakage

Leakage refers to information that helps you predict the target in a way that isn’t fair. It’s a huge problem in modeling, and not just for competitions. Oftentimes it’s an artifact of reversing cause and effect.

Example 1: There was a competition where you needed to predict S&P in terms of whether it would go up or go down. The winning entry had an AUC (area under the ROC curve) of 0.999 out of 1. Since stock markets are pretty close to random, either someone’s very rich or there’s something wrong. There’s something wrong.

In the good old days you could win competitions this way, by finding the leakage.

Example 2: Amazon case study: big spenders. The target of this competition was to predict customers who spend a lot of money among customers using past purchases. The data consisted of transaction data in different categories. But a winning model identified that “Free Shipping = True” was an excellent predictor.

What happened here? The point is that free shipping is an effect of big spending. But it’s not a good way to model big spending, because in particular it doesn’t work for new customers or for the future. Note: timestamps are weak here. The data that included “Free Shipping = True” was simultaneous with the sale, which is a no-no. We need to only use data from beforehand to predict the future.

Example 3: Again an online retailer, this time the target is predicting customers who buy jewelry. The data consists of transactions for different categories. A very successful model simply noted that if sum(revenue) = 0, then it predicts jewelry customers very well.

What happened here? The people preparing this data removed jewelry purchases, but only included people who bought something in the first place. So people who had sum(revenue) = 0 were people who only bought jewelry. The fact that you only got into the dataset if you bought something is weird: in particular, you wouldn’t be able to use this on customers before they finished their purchase. So the model wasn’t being trained on the right data to make the model useful. This is a sampling problem, and it’s common.

Example 4: This happened at IBM. The target was to predict companies who would be willing to buy “websphere” solutions. The data was transaction data + crawled potential company websites. The winning model showed that if the term “websphere” appeared on the company’s website, then they were great candidates for the product.

What happened? You can’t crawl the historical web, just today’s web.

Thought experiment

You’re trying to study who has breast cancer. The patient ID, which seemed innocent, actually has predictive power. What happened?

In the above image, red means cancerous, green means not. It’s plotted by patient ID. We see three or four distinct buckets of patient identifiers. It’s very predictive depending on the bucket. This is probably a consequence of using multiple databases, some of which correspond to sicker patients.

A student suggests: for the purposes of the contest they should have renumbered the patients and randomized.

Claudia: would that solve the problem? There could be other things in common as well.

A student remarks: The important issue could be to see the extent to which we can figure out which dataset a given patient came from based on things besides their ID.

Claudia: Think about this: what do we want these models for in the first place? How well can you predict cancer?

Given a new patient, what would you do? If the new patient is in a fifth bin in terms of patient ID, then obviously don’t use the identifier model. But if it’s still in this scheme, then maybe that really is the best approach.

This discussion brings us back to the fundamental problem that we need to know what the purpose of the model is and how it is going to be used in order to decide how to do it and whether it’s working.

Pneumonia

During an INFORMS competition on pneumonia predictions in hospital records, where the goal was to predict whether a patient has pneumonia, a logistic regression which included the number of diagnosis codes as a numeric feature (AUC of 0.80) didn’t do as well as the one which included it as a categorical feature (0.90). What’s going on?

This had to do with how the person prepared the data for the competition:

The diagnosis code for pneumonia was 486. So the preparer removed that (and replaced it by a “-1”) if it showed up in the record (rows are different patients, columns are different diagnoses, there are max 4 diagnoses, “-1” means there’s nothing for that entry).

Moreover, to avoid telltale holes in the data, the preparer moved the other diagnoses to the left if necessary, so that only “-1”’s were on the right.

There are two problems with this:

1. If the row has only “-1”’s, then you know it started out with only pneumonia, and
2. If the row has no “-1”’s, you know there’s no pneumonia (unless there are actually 5 diagnoses, but that’s less common).

This was enough information to win the competition.
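To make the mechanism concrete, here is a minimal Python sketch of the preparation step as described above, plus the rule it accidentally hands to contestants. The helper names and the non-pneumonia codes are made up for illustration, and it assumes every raw record holds between 1 and 4 diagnosis codes.

```python
PNEUMONIA = 486
EMPTY = -1
MAX_CODES = 4

def prepare_record(codes):
    """Mimic the preparer: drop code 486, shift the rest left, pad with -1."""
    kept = [c for c in codes if c != PNEUMONIA]
    return kept + [EMPTY] * (MAX_CODES - len(kept))

def leak_rule(record):
    """Guess pneumonia from the padding alone: True, False, or None (can't tell)."""
    n_empty = record.count(EMPTY)
    if n_empty == MAX_CODES:
        return True      # everything was removed, so the only code was 486
    if n_empty == 0:
        return False     # nothing was removed from a full record
    return None

# Hypothetical raw records (the other codes are arbitrary placeholders)
for raw in ([486], [250, 401, 272, 530], [486, 250]):
    record = prepare_record(raw)
    print(raw, "->", record, "leak says:", leak_rule(record))
```

A categorical feature on the number of codes lets a model learn exactly these two special cases, which is plausibly why it beat the numeric version.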

Note: winning a competition on leakage is easier than building good models. But even if you don’t explicitly understand and game the leakage, your model will do it for you. Either way, leakage is a huge problem.

How to avoid leakage

Claudia’s advice to avoid this kind of problem:

• You need a strict temporal cutoff: remove all information that wasn’t available just prior to the event of interest (patient admission).
• There has to be a timestamp on every entry and you need to keep it
• Removing columns asks for trouble
• Removing rows can introduce inconsistencies with other tables, also causing trouble
• The best practice is to start from scratch with clean, raw data after careful consideration
• You need to know how the data was created! I only work with data I pulled and prepared myself (or maybe Ori).
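As a rough illustration of the temporal cutoff, here is a small Python sketch that keeps only entries stamped strictly before the moment of prediction. The event log, its field names, and the dates are all invented for the example; it is not how any of the competitions above stored their data.

```python
from datetime import datetime

def features_before(events, cutoff):
    """Keep only events timestamped strictly before the cutoff."""
    return [e for e in events if e["timestamp"] < cutoff]

# Hypothetical event log for one customer; field names are illustrative.
events = [
    {"timestamp": datetime(2012, 11, 1, 9, 0), "field": "category", "value": "books"},
    {"timestamp": datetime(2012, 11, 20, 14, 30), "field": "category", "value": "music"},
    # Recorded at the moment of the sale we are trying to predict: a leak.
    {"timestamp": datetime(2012, 12, 1, 10, 0), "field": "free_shipping", "value": True},
]

prediction_time = datetime(2012, 12, 1, 10, 0)
usable = features_before(events, prediction_time)
print([e["field"] for e in usable])
```

The filter only works if every entry carries its timestamp, which is why dropping that column during preparation is so costly.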

Evaluations

How do I know that my model is any good?

With powerful algorithms searching for patterns of models, there is a serious danger of overfitting. It’s a difficult concept, but the general idea is that “if you look hard enough you’ll find something” even if it does not generalize beyond the particular training data.

To avoid overfitting, we cross-validate and we cut down on the complexity of the model to begin with. Here’s a standard picture (although keep in mind we generally work in high dimensional space and don’t have a pretty picture to look at):

The picture on the left is underfit, in the middle is good, and on the right is overfit.

The model you use matters when it concerns overfitting:

So for the above example, unpruned decision trees are the ones that overfit the most. This is a well-known problem with unpruned decision trees, which is why people use pruned decision trees.

Accuracy: meh

Claudia dismisses accuracy as a bad evaluation method. What’s wrong with accuracy? It’s inappropriate for regression obviously, but even for classification, if the vast majority of binary outcomes are 1, then a stupid model can be accurate but not good (guess it’s always “1”), and a better model might have lower accuracy.
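A tiny Python example shows the effect. The outcome counts and the two sets of predictions are invented purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical outcomes: 95 ones followed by 5 zeros.
y_true = [1] * 95 + [0] * 5

# The "stupid" model: always guess 1.
always_one = [1] * 100

# A model that finds 4 of the 5 zeros, at the price of mislabeling 8 ones.
better_model = [1] * 87 + [0] * 8 + [0] * 4 + [1]

print("always 1:", accuracy(y_true, always_one))
print("finds the zeros:", accuracy(y_true, better_model))
```

The always-1 model scores higher on accuracy even though it never identifies a single 0, which is the whole complaint.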

Probabilities matter, not 0’s and 1’s.

Nobody makes decisions on binary outcomes. I want to know the probability I have breast cancer, I don’t want to be told yes or no. It’s much more information. I care about probabilities.

How to evaluate a probability model

We separately evaluate the ranking and the calibration. To evaluate the ranking, we use the ROC curve and calculate the area under it, which typically ranges from 0.5 to 1.0. This is independent of scaling and calibration. Here’s an example of how to draw an ROC curve:

Sometimes to measure rankings, people draw the so-called lift curve:

The key here is that the lift is calculated with respect to a baseline. You draw it at a given point, say 10%, by imagining that 10% of people are shown ads, and seeing how many people click versus if you randomly showed 10% of people ads.  A lift of 3 means it’s 3 times better.
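Both ranking measures can be sketched in a few lines of Python. This is one way to compute them, not necessarily what Claudia uses: the AUC here relies on the standard fact that the area under the ROC curve equals the probability that a randomly chosen positive is scored above a randomly chosen negative, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift(labels, scores, fraction):
    """Positive rate among the top-scored fraction, divided by the overall rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical clicks (1) and non-clicks (0) with model scores.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]

print("AUC:", auc(labels, scores))
print("lift at 20%:", lift(labels, scores, 0.2))
```

Multiplying every score by 10 leaves the AUC unchanged, since only the ordering matters; that is the sense in which ranking is independent of scaling and calibration.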

How do you measure calibration? Are the probabilities accurate? If the model says probability of 0.57 that I have cancer, how do I know if it’s really 0.57? We can’t measure this directly. We can only bucket those predictions and then aggregately compare those in that prediction bucket (say 0.50-0.55) to the actual results for that bucket.
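Here is a minimal sketch of that bucketing in Python, assuming equal-width buckets (ten by default; the 0.05-wide bucket mentioned above would correspond to twenty). The predictions and outcomes are invented to show an overconfident model.

```python
def calibration_table(labels, probs, n_bins=10):
    """Bucket predictions into equal-width bins and compare them to observed rates.

    Returns rows of (bin_low, bin_high, count, mean_prediction, observed_rate).
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(members), mean_pred, observed))
    return rows

# Hypothetical predictions: the 0.93 group is overconfident (only 6 of 10 are positive).
probs = [0.12] * 10 + [0.93] * 10
labels = [1] + [0] * 9 + [1] * 6 + [0] * 4

rows = calibration_table(labels, probs)
for low, high, count, mean_pred, observed in rows:
    print(f"[{low:.1f}, {high:.1f}) n={count} predicted={mean_pred:.2f} observed={observed:.2f}")
```

Plotting mean prediction against observed rate for each row gives the kind of picture discussed next: well-calibrated buckets sit on the x=y line.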

For example, here’s what you get when your model is an unpruned decision tree, where the blue diamonds are buckets:

A good model would show buckets right along the x=y curve, but here we’re seeing that the predictions were much more extreme than the actual probabilities. Why does this pattern happen for decision trees?

Claudia says that this is because trees optimize purity: they seek out pockets that have only positives or negatives. Therefore their predictions are more extreme than reality. This is generally true about decision trees: they do not generally perform well with respect to calibration.

Logistic regression looks better when you test calibration, which is typical:

Takeaways:

• Accuracy is almost never the right evaluation metric.
• Probabilities, not binary outcomes.
• Separate ranking from calibration.
• Ranking you can measure with nice pictures: ROC, lift
• Calibration is measured indirectly through binning.
• Different models are better than others when it comes to calibration.
• Calibration is sensitive to outliers.
• Measure what you want to be good at.
• Have a good baseline.

Choosing an algorithm

This is not a trivial question and in particular small tests may steer you wrong, because as you increase the sample size the best algorithm might vary: often decision trees perform very well but only if there’s enough data.

In general you need to choose your algorithm depending on the size and nature of your dataset and you need to choose your evaluation method based partly on your data and partly on what you wish to be good at. Sum of squared errors is the maximum likelihood loss function if your data can be assumed to be normal, but if you want to estimate the median, then use absolute errors. If you want to estimate a quantile, then minimize the weighted absolute error.
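To see how the loss function decides what you end up estimating, here is a small Python sketch that searches a grid for the best single constant prediction under each loss. The data and the grid are arbitrary, and the quantile loss is the usual weighted absolute error with weight q on under-predictions and 1 - q on over-predictions.

```python
def squared_loss(c, ys):
    return sum((y - c) ** 2 for y in ys)

def absolute_loss(c, ys):
    return sum(abs(y - c) for y in ys)

def quantile_loss(c, ys, q):
    """Weighted absolute error: under-predictions cost q, over-predictions cost 1 - q."""
    return sum(q * (y - c) if y >= c else (1 - q) * (c - y) for y in ys)

def best_constant(loss, ys, candidates):
    """Return the candidate constant with the smallest total loss."""
    return min(candidates, key=lambda c: loss(c, ys))

ys = [1, 2, 3, 4, 100]                          # made-up data with one outlier
candidates = [i / 10 for i in range(0, 1001)]   # grid from 0.0 to 100.0

mean_like = best_constant(squared_loss, ys, candidates)
median_like = best_constant(absolute_loss, ys, candidates)
q90_like = best_constant(lambda c, data: quantile_loss(c, data, 0.9), ys, candidates)

print(mean_like, median_like, q90_like)
```

Squared error lands on the mean (dragged toward the outlier), absolute error lands on the median, and the 0.9-weighted version lands on the top of the distribution.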

We worked on predicting the number of ratings a movie will get in the next year, and we assumed a Poisson distribution. In this case our evaluation method doesn’t involve minimizing the sum of squared errors, but rather something else which we found in the literature specific to the Poisson distribution, which depends on the single parameter $\lambda$:

Charity direct mail campaign

Let’s put some of this together.

Say we want to raise money for a charity. If we send a letter to every person in the mailing list we raise about \$9000. We’d like to save money and only send letters to people who are likely to give – only about 5% of people generally give. How can we do that? If we use a (somewhat pruned, as is standard) decision tree, we get \$0 profit: it never finds a leaf with majority positives.

If we use a neural network we still make only \$7500, even if we only send a letter in the case where we expect the return to be higher than the cost.

This looks unworkable. But if your model is better, it’s not. A person makes two decisions here. First, they decide whether or not to give, then they decide how much to give. Let’s model those two decisions separately, using:

$E(\$|person) = P(response = 'yes'| person) \cdot E(\$|response = 'yes', person).$

Note we need the first model to be well-calibrated because we really care about the number, not just the ranking. So we will try logistic regression for the first half. For the second part, we train with special examples where there are donations.

Altogether this decomposed model makes a profit of \$15,000. The decomposition made it easier for the model to pick up the signals. Note that with infinite data, all would have been good, and we wouldn’t have needed to decompose. But you work with what you got.

Moreover, you are multiplying errors above, which could be a problem if you have a reason to believe that those errors are correlated.
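The mailing decision from the decomposition can be sketched as follows in Python. The two inputs stand in for the outputs of the response model and the amount model; the people, their scores, and the cost per letter are all hypothetical numbers, not figures from the actual campaign.

```python
def expected_donation(p_respond, amount_if_respond):
    """E($ | person) = P(response = 'yes' | person) * E($ | response = 'yes', person)."""
    return p_respond * amount_if_respond

def should_mail(p_respond, amount_if_respond, cost_per_letter):
    """Send a letter only when the expected return exceeds the cost."""
    return expected_donation(p_respond, amount_if_respond) > cost_per_letter

COST_PER_LETTER = 0.68   # assumed cost, for illustration only

# Hypothetical outputs of the two models for three people.
people = [
    {"id": "a", "p_respond": 0.05, "amount": 10.0},   # expected 0.50: skip
    {"id": "b", "p_respond": 0.02, "amount": 50.0},   # expected 1.00: mail
    {"id": "c", "p_respond": 0.10, "amount": 5.0},    # expected 0.50: skip
]

mail_list = [
    p["id"] for p in people
    if should_mail(p["p_respond"], p["amount"], COST_PER_LETTER)
]
print(mail_list)
```

Person b is the least likely to respond but the only one worth a letter, which is why the response probabilities need to be calibrated numbers and not just a ranking.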

Parting thoughts

We are not meant to understand data. Data are outside of our sensory systems and there are very few people who have a near-sensory connection to numbers. We are instead meant to understand language.

We are not meant to understand uncertainty: we have all kinds of biases that prevent this from happening and are well-documented.

Modeling people in the future is intrinsically harder than figuring out how to label things that have already happened.

Even so we do our best, and this is through careful data generation, careful consideration of what our problem is, making sure we model it with data close to how it will be used, making sure we are optimizing to what we actually desire, and doing our homework in learning which algorithms fit which tasks.

## O’Reilly book deal signed for “Doing Data Science”

I’m very happy to say I just signed a book contract with my co-author, Rachel Schutt, to publish a book with O’Reilly called Doing Data Science.

The book will be based on the class Rachel is giving this semester at Columbia which I’ve been blogging about here.

For those of you who’ve been reading along for free as I’ve been blogging it, there might not be a huge incentive to buy it, but I can promise you more and better math, more explicit usable formulas, some sample code, and an overall better and more thought-out narrative.

It’s supposed to be published in May with a possible early release coming up at the end of February, in time for the O’Reilly Strata Santa Clara conference, where Rachel will be speaking about it and about other stuff curriculum related. Hopefully people will pick it up in time to teach their data science courses in Fall 2013.

Speaking of Rachel, she’s also been selected to give a TedXWomen talk at Barnard on December 1st, which is super exciting. She’s talking about advocating for the social good using data. Unfortunately the event is invitation-only, otherwise I’d encourage you all to go and hear her words of wisdom. Update: word on the street is that it will be video-taped.

## Columbia data science course, week 2: RealDirect, linear regression, k-nearest neighbors

Data Science Blog

Today we started with discussing Rachel’s new blog, which is awesome and people should check it out for her words of data science wisdom. The topics she’s riffed on so far include: Why I proposed the course, EDA (exploratory data analysis), Analysis of the data science profiles from last week, and Defining data science as a research discipline.

She wants students and auditors to feel comfortable in contributing to blog discussion, that’s why they’re there. She particularly wants people to understand the importance of getting a feel for the data and the questions before ever worrying about how to present a shiny polished model to others. To illustrate this she threw up some heavy quotes:

“Long before worrying about how to convince others, you first have to understand what’s happening yourself” – Andrew Gelman

“Agreed” – Rachel Schutt

Thought experiment: how would you simulate chaos?

We split into groups and discussed this for a few minutes, then got back into a discussion. Here are some ideas from students:

Talking to Doug Perlson, CEO of RealDirect

We got into teams of 4 or 5 to assemble our questions for Doug, the CEO of RealDirect. The students have been assigned as homework the task of suggesting a data strategy for this new company, due next week.

He came in, gave us his background in real-estate law and startups and online advertising, and told us about his desire to use all the data he now knew about to improve the way people sell and buy houses.

First they built an interface for sellers, giving them useful data-driven tips on how to sell their house and using interaction data to give real-time recommendations on what to do next. Doug made the remark that normally, people sell their homes about once in 7 years and they’re not pros. The goal of RealDirect is not just to make individuals better but also pros better at their job.

He pointed out that brokers are “free agents” – they operate by themselves. They guard their data, and the really good ones have lots of experience, which is to say they have more data. But very few brokers actually have sufficient experience to do it well.

The idea is to apply a team of licensed real-estate agents to be data experts. They learn how to use information-collecting tools so we can gather data, in addition to publicly available information (for example, co-op sales data now available, which is new).

One problem with publicly available data is that it’s old news – there’s a 3 month lag. RealDirect is working on real-time feeds on stuff like:

• when people start searching,
• what’s the initial offer,
• the time between offer and close, and
• how people search online.

Ultimately good information helps both the buyer and the seller.

RealDirect makes money in 2 ways. First, a subscription, \$395 a month, to access our tools for sellers. Second, we allow you to use our agents at a reduced commission (2% of sale instead of the usual 2.5 or 3%). The data-driven nature of our business allows us to take less commission because we are more optimized, and therefore we get more volume.

Doug mentioned that there’s a law in New York that you can’t show all the current housing listings unless it’s behind a registration wall, which is why RealDirect requires registration. This is an obstacle for buyers but he thinks serious buyers are willing to do it. He also doesn’t consider places that don’t require registration, like Zillow, to be true competitors because they’re just showing listings and not providing real service. He points out that you also need to register to use Pinterest.

Doug mentioned that RealDirect is comprised of licensed brokers in various established realtor associations, but even so they have had their share of hate mail from realtors who don’t appreciate their approach to cutting commission costs. In this sense it is somewhat of a guild.

On the other hand, he thinks if a realtor refused to show houses because they are being sold on RealDirect, then the buyers would see the listings elsewhere and complain. So the traditional brokers have little choice but to deal with them. In other words, the listings themselves are sufficiently transparent so that the traditional brokers can’t get away with keeping their buyers away from these houses.

RealDirect doesn’t take seasonality issues into consideration presently – they take the position that a seller is trying to sell today. Doug talked about various issues that a buyer would care about: nearby parks, subway, and schools, as well as the comparison of prices per square foot of apartments sold in the same building or block. These are the key kinds of data for buyers to be sure.

In terms of how the site works, it sounds like somewhat of a social network for buyers and sellers. There are statuses for each person on site: active – offer made – offer rejected – showing – in contract etc. Based on your status, different opportunities are suggested.

Suggestions for Doug?

Linear Regression

Example 1. You have points on the plane:

(x, y) = (1, 2), (2, 4), (3, 6), (4, 8).

The relationship is clearly y = 2x. You can do it in your head. Specifically, you’ve figured out:

• There’s a linear pattern.
• The coefficient 2
• So far it seems deterministic

Example 2. You again have points on the plane, but now assume x is the input, and y is output.

(x, y) = (1, 2.1), (2, 3.7), (3, 5.8), (4, 7.9)

Now you notice that more or less y ~ 2x but it’s not a perfect fit. There’s some variation, it’s no longer deterministic.

Example 3.

(x, y) = (2, 1), (6, 7), (2.3, 6), (7.4, 8), (8, 2), (1.2, 2).

Here your brain can’t figure it out, and there’s no obvious linear relationship. But what if it’s your job to find a relationship anyway?

First assume (for now) there actually is a relationship and that it’s linear. It’s the best you can do to start out. i.e. assume

$y = \beta_0 + \beta_1 x + \epsilon$

and now find best choices for $\beta_0$ and $\beta_1$. Note we include $\epsilon$ because it’s not a perfect relationship. This term is the “noise,” the stuff that isn’t accounted for by the relationship. It’s also called the error.

Before we find the general formula, we want to generalize with three variables now: $x_1, x_2, x_3$, and we will again try to explain $y$ knowing these values. If we wanted to draw it we’d be working in 4 dimensional space, trying to plot points. As above, assuming a linear relationship means looking for a solution to:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$

Writing this with matrix notation we get:

$y = x \cdot \beta + \epsilon.$

How do we calculate $\beta$? Define the “residual sum of squares”, denoted $RSS(\beta),$ to be

$RSS(\beta) = \sum_i (y_i - x_i \cdot \beta)^2,$

where $i$ ranges over the various data points and $x_i$ is the row of $x$ for the $i$th data point. RSS is called a loss function. There are many other versions of it but this is one of the most basic, partly because it gives us a pretty nice measure of closeness of fit.

To minimize $RSS(\beta) = (y - x \beta)^t (y - x \beta),$ we differentiate it with respect to $\beta$ and set it equal to zero, then solve for $\beta.$ We end up with

$\beta = (x^t x)^{-1} x^t y.$

To use this, we go back to our linear form and plug in the values of $\beta$ to get a predicted $y$.
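As a sanity check on the formula, here is a small Python sketch (standard library only, rather than the R used in class) that solves the normal equations by hand for the one-variable case, where $x$ has a column of 1’s and a column of inputs. It is run on the points from Example 2; the function names are just for this illustration.

```python
def fit_line(xs, ys):
    """Solve (x^t x) beta = x^t y when x has columns [1, x_i]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx              # determinant of the 2x2 matrix x^t x
    beta0 = (sxx * sy - sx * sxy) / det  # first row of (x^t x)^{-1} x^t y
    beta1 = (n * sxy - sx * sy) / det    # second row
    return beta0, beta1

def rss(beta0, beta1, xs, ys):
    """Residual sum of squares for the line y = beta0 + beta1 * x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2.1, 3.7, 5.8, 7.9]

beta0, beta1 = fit_line(xs, ys)
print("intercept:", beta0, "slope:", beta1)
print("RSS at the fit:", rss(beta0, beta1, xs, ys))
print("RSS for the eyeballed y = 2x:", rss(0.0, 2.0, xs, ys))
```

The fitted slope comes out a little under 2, and its RSS is smaller than that of the eyeballed line y = 2x, as it should be for the minimizer.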

But wait, why did we assume a linear relationship? Sometimes maybe it’s a polynomial relationship.

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3.$

You need to justify why you’re assuming what you want. Answering that kind of question is a key part of being a data scientist and why we need to learn these things carefully.

All this is like one line of R code where you’ve got a column of y’s and a column of x’s:

model <- lm(y ~ x)

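Or if you’re going with the polynomial form we’d have (the powers are wrapped in I() so that R treats ^ as arithmetic rather than as a formula operator):

model <- lm(y ~ x + I(x^2) + I(x^3))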

Why do we do regression? Mostly for two reasons:

• If we want to predict one variable from another
• If we want to explain or understand the relationship between two things.

K-nearest neighbors

Say you have the age, income, and credit rating for a bunch of people and you want to use the age and income to guess at the credit rating. Moreover, say we’ve divided credit ratings into “high” and “low”.

We can plot people as points on the plane and label people with an “x” if they have low credit ratings.

What if a new guy comes in? What’s his likely credit rating label? Let’s use k-nearest neighbors. To do so, you need to answer two questions:

1. How many neighbors are you gonna look at? k=3 for example.
2. What is a neighbor? We need a concept of distance.

For the sake of our problem, we can use Euclidean distance on the plane if the relative scalings of the variables are approximately correct. Then the algorithm is simple to take the average rating of the people around me. where average means majority in this case – so if there are 2 high credit rating people and 1 low credit rating person, then I would be designated high.

Note we can also consider doing something somewhat more subtle, namely assigning high the value of “1” and low the value of “0” and taking the actual average, which in this case would be 0.667. This would indicate a kind of uncertainty. It depends on what you want from your algorithm. In machine learning algorithms, we don’t typically have the concept of confidence levels; we care more about accuracy of prediction. But of course it’s up to us.
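Here is a rough sketch of both variants in plain Python (not R), assuming the points are (age, income in thousands) pairs and that those scales are comparable enough for Euclidean distance. The data and the helper names `k_nearest`, `knn_vote`, and `knn_score` are invented for illustration.

```python
import math
from collections import Counter


def k_nearest(train, labels, point, k=3):
    """Return the labels of the k training points closest to `point`."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], point))
    return [labels[i] for i in order[:k]]


def knn_vote(train, labels, point, k=3):
    """Majority label among the k nearest neighbors."""
    return Counter(k_nearest(train, labels, point, k)).most_common(1)[0][0]


def knn_score(train, labels, point, k=3):
    """Average of the neighbors' labels, with high = 1 and low = 0."""
    nearest = k_nearest(train, labels, point, k)
    return sum(1 for lab in nearest if lab == "high") / k


# Hypothetical people: (age, income in thousands) and their credit labels.
train = [(25, 30), (30, 35), (45, 80), (50, 90), (35, 60)]
labels = ["low", "low", "high", "high", "high"]
new_guy = (40, 55)

vote = knn_vote(train, labels, new_guy)    # 2 high vs. 1 low among the 3 nearest
score = knn_score(train, labels, new_guy)  # the "actual average", 2/3
```

With k=3 and two labels there can be no tie; for even k or more labels you would have to decide how to break ties.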

Generally speaking we have a training phase, during which we create a model and “train it,”  and then we have a testing phase where we use new data to test how good the model is.

For k-nearest neighbors, the training phase is stupid: it’s just reading in your data. In testing, you pretend you don’t know the true label and see how good you are at guessing using the above algorithm.  This means you save some clean data from the overall data for the testing phase. Usually you want to save randomly selected data, at least 10%.
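Setting aside the test data could be sketched like this, in plain Python rather than R. The function name `holdout_split`, the fixed seed, and the stand-in data are assumptions for illustration.

```python
import random


def holdout_split(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of `rows` and set aside a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


rows = list(range(50))  # stand-in for 50 labeled data points
train, test = holdout_split(rows, test_fraction=0.1)
```

The test rows are never shown to the model during training; their true labels are only used afterwards, to score the guesses.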

In R: read in the package “class”, and use the function knn().

You perform the algorithm as follows:

knn(train, test, cl, k=3)

For each row of the test set, the function finds the k nearest (in Euclidean distance) training set vectors and returns the classification label as decided by majority vote.

How do you evaluate if the model did a good job?

This isn’t easy or universal – you may decide you want to penalize certain kinds of misclassification more than others. For example, false positives may be way worse than false negatives.

To start out stupidly, you might want to simply minimize the misclassification rate:

(# incorrect labels) / (# total labels)
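As a small sketch in plain Python (the function name and the example labels are made up), the rate can be computed by comparing the guessed labels against the held-back true ones:

```python
def misclassification_rate(predicted, actual):
    """(# incorrect labels) / (# total labels)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Hypothetical guesses on five held-out people, next to their true labels.
predicted = ["high", "low", "high", "high", "low"]
actual = ["high", "low", "low", "high", "high"]
rate = misclassification_rate(predicted, actual)  # 2 of the 5 labels differ
```

This treats every mistake the same; if false positives are worse than false negatives, you would weight the two kinds of error differently.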

How do you choose k?

This is also hard. Part of homework next week will address this.

When do you use linear regression vs. k-nearest neighbor?

Thinking about what happens with outliers helps you realize how hard this question is. Sometimes it comes down to a question of what the decision-maker decides they want to believe.

Note definitions of “closeness” vary depending on the context: closeness in social networks could be defined as the number of overlapping friends.

Both linear regression and k-nearest neighbors are examples of “supervised learning”, where you’ve observed both x and y, and you want to know the function that brings x to y.

## Is science a girl thing?

One of the reasons I chose to call this blog “mathbabe” is that when I searched that term, I found a website, now defunct (woohoo!), where semi-naked women were adorning math.

This pissed me off, because I want math babes to be doing math.

If you get that (what’s not to get?) then you might see why the European Commission’s latest effort to inspire girls to do science is truly repugnant (hat tip Debbie Berebichez, a.k.a. Science Babe).

It’s a commercial where you see a standard male scientist (in a white lab coat no less) being surprised, and, we assume, aroused, when three girly models come in, giggle, dance, and generally adorn the commercial.

At the end they put on lab goggles in the style of an ironic accessory. They’re all wearing high heels and there’s even lipstick in a few shots for some unexplained reason (are we supposed to infer that wearing lipstick makes you more scientific-alicious?).

And although there are a couple of shots of an actual female writing what could be actual formulas on a hyped-up whiteboard, that’s more than balanced by some other shots of the models with unmistakable come-hither looks, gestures and blown kisses.

People. At the European Commission. Do you have no advisors!? Do you have no common sense? Who vetted this garbage video?!?

I’d like to see us get to the point where our slogan is more along the lines of:

Science, it’s for really smart women

And our video consists of cool, funky women giving actual talks and lectures or actually working on experiments. Maybe they’re wearing heels, but for sure they’re not acting like complete fucking idiots. How’s that?

I personally could suggest about 40 people for such a video. Not hard to do.

Categories: rant, women in math

## On being an alpha female

About 8 months ago I found out I’m an alpha female. What happened was, one day at work my boss mentioned that he and everyone else is afraid of me. I looked around and realized he was pretty much right (there are exceptions).

I went home to my husband and mentioned how weird it was that people at work are afraid of me, and he said, “No, it’s not weird at all. Don’t you realize that you’re constantly giving people the impression that you’re about to take away their toy and break it??”. No, I hadn’t realized that – and that sounds pretty awful! Am I really that mean? Then he told me I was an alpha male living in a woman’s body.

If you google “alpha male in a woman’s body,” (without the quotes) which I did, you come upon the phrase “alpha female” pretty quickly.

It came as a surprise to me – I’d always thought I am nice. But it wasn’t a surprise to anyone else; in fact when I mentioned my realization to my close friends, each and every one of them laughed out loud that I hadn’t known this about myself. One of my friends told me it was less that I was about breaking toys and more about how I call out people’s bullshit, which is something I have to admit I relish doing.

Upon further reflection I had to admit to myself that I am nice, but only to people who I think are nice themselves. So I guess that means I’m not just simply nice. And if I enjoy calling people on their bullshit, that’s not exactly nice either.

Over the past 8 months, I’ve been slowly observing my alpha femaleness, and at this point I can honestly say I’m comfortable with it. I own it now. It’s kind of fun to know about it, because of how people react to me, without me intentionally doing anything.

How I now think about my alpha femaleness is that it lends me authority. It’s a kind of portable power. Not always, of course, and sometimes I am in situations where I’m totally incompetent, and sometimes I run into someone who completely ignores my alpha femaleness or is themselves an alpha male and competes. I usually really like them.

I’ve also realized how much my life has been informed by this property; my life has been, for the most part, much easier than it could have been without this property. And I want to acknowledge that because most people aren’t like this and don’t have this advantage.

For example: I interview really well. I speak with perceived confidence even when I don’t feel confident, and that comes across well in interviews.

In fact all my life people have mentioned to me that “things seem easy” to me, even in situations where I felt completely insecure and flustered. I used to lift weights at the gym with my buddies in college, and they would not really spot me on the bench press because they were convinced I didn’t need help. I almost dropped the weights on my neck a couple of times calling my friends over from the other side of the room. So in retrospect maybe it was a sign I’m an alpha female, but at the time I was just baffled.

It’s good and bad. When people perceive you as more confident and more comfortable in a situation than you actually are, it’s about 80% good and 20% bad, and could be the opposite depending on the situation. It’s bad when it’s dangerous and you really don’t know what you’re doing (that happened to me when I was driving an ATV once, and luckily when I turned it over in a mud pit I didn’t actually break my legs, but I could have) and it’s totally convenient when you’re presenting stuff or in an interview.

Why am I mentioning all of this? Because I think it might help people, especially women in math or in tech, to learn to think a bit more like an alpha female, and I want to give some tips on how to do it. It’s like injecting a shot of testosterone at the right time.

These tips can be used in specific situations like an interview or a talk or at a work meeting. Feel free to ignore these tips if you hate everything about the idea, which I would totally understand too. In fact when I first learned about it myself, I was offended by it on a matter of principle, but I’ve come to think of it more like a mysterious part of the human experience, on the same page as pheromones and how women have the same menstrual cycle when they live together.

Tips on how to think and act like an alpha female

• When you’re asked to describe your accomplishments, talk about yourself the way your best friend would describe you. So in other words with pride and enthusiasm for your accomplishments, without being embarrassed. Don’t lie or exaggerate, but don’t underplay anything.
• Let there be silence. If you’ve finished what you’re saying and you’re done, wait for someone else to say something.
• If you want credit, give credit first. Generosity is, in my experience, contagious. So if you want to get credit for contributing something to a project, start out by talking about how awesome your collaborators have been on the project. This gets people thinking about credit in a generous way, and it also gives you authority for bestowing it as the first person who brought it up. Note this is different from what I see lots of people do, namely not mentioning credit themselves and waiting passively for someone else to raise it (and to share it).
• Ignore titles and hierarchy. Those things are silly. You can talk to anyone at any time if you have a good idea.
• If you want feedback, give feedback. This includes to your boss (see previous tip). If you want to find out how you stand with someone, the best thing to do is to tell someone else how they stand with you. People love hearing about themselves. This works best when you can say something nice, but it also works when it’s a difficult conversation.
• Define your narrative. When your standing is in question, put out your version of the story first, for a couple of reasons – one is that you define the scope of the question, and the other is that your narrative is now the standard, and anyone who disagrees has to refute it.
• When you’re in a meeting and want to bring your point across in a room full of alpha males, think about defending or arguing for an idea, rather than for yourself. It helps with gaining confidence in your argument.
• Of course it also helps if your argument is water-tight, so practice making your points in your mind, and write them down beforehand if that helps.
• Develop a thick skin. When you say what you think first, there are plenty of people who might take offense and jump on you and be vicious. Sometimes it’s just a show of power. Keep an observer’s eye on that kind of reaction, and don’t take it personally, because it’s almost never about you really, it’s maybe about their relationship with their mom or something.
• At the same time, what’s cool about putting yourself out there is that people react and often point out how your thinking is flawed or lazy and you get to learn really, really quickly. Learning is the best part!
Categories: rant, women in math

## The fake problem of fake geek girls, and how to be a sexy man nerd

My friend Rachel Schutt recently sent me this Forbes article by Tara Tiger Brown on the so-called problem of too many fake geek girls stealing the thunder and limelight from us true geek girls.

The working definition of geek seems to be someone who is obsessively interested in something (I would argue that you don’t get to be a geek if your obsession is art, for example, I’d like to define it to be an obsession with something technical). She also claims that “true geeks” don’t do something for airtime. From the article:

Girls who genuinely like their hobby or interest and document what they are doing to help others, not garner attention, are true geeks. The ones who think about how to get attention and then work on a project in order to maximize their klout, are exhibitionists.

I kind of like this but I kind of don’t too. I like this because, like you, I have run into many many people (men and women) who loudly claim technical knowledge that they don’t seem to actually have, which is annoying and exhibitionistic. And yes, it’s annoying to see people like that giving TED talks on “big data” when you seriously doubt they know how to program a linear regression. But again, men and women.

At the same time, there’s no reason someone can’t be both a true geek and an exhibitionist, and it seems kind of funny for a Forbes magazine writer to be claiming the authentic rights to the former but not the latter.

If there’s one thing I’d like to avoid, it’s peer pressure that, as a girl geek, I have to have a certain personality. I like the fact that girl geeks are sometimes shy and sometimes outspoken, sometimes humble and sometimes arrogant, sometimes demure and sometimes slutty. It makes it way more interesting during technical chats.

What’s the asymmetry between men and women here? According to Tara Tiger Brown, women think they’ll get attention from men by acting like a geek but my experience is that men don’t think they’ll get attention from women by acting like a geek.

I think this is a mistake that man geeks are making. For me, and for essentially all my female friends, being really fucking good at some thing is extremely sexy. Man geeks are, therefore, very sexy, if they are in fact really fucking good at something and not just posing. Maybe they just need to realize that and own it a bit more.

Next time, instead of apologizing for doing something nerdy, I suggest you (a man geek I’m imagining talking to right now) figure out how to describe what skill you mastered and talk about it as an accomplishment.

No: I’m kind of tired today, sorry. I stayed up all night playing with my computer. Should we reschedule?

Yes: Last night I implemented dynamic logistic regression and managed to get it to converge on 30 terabytes of streaming data in under 3 hours. And it’s all open source, I just checked it into GitHub. That was awesome! But now I need to sleep. Wanna take a nap with me?

## Google’s promotion policy sucks for women

I’m going to start this post with an excerpt from a comment of reader JoanDelilah from a couple of weeks ago, commenting on my post The meritocracy myth:

And at the end of the day, this also assumes that it is right and proper for a structure to be in place which requires you to *grab* tough/interesting work to prove yourself, as opposed to it being given to you. There is competition inherent in the foundational world-view behind that statement. Why so much competition? We are supposed to be on the same team and competing with other businesses, right? What about the woman who is happy to crush any assignment she is given but simply doesn’t want to have to compete for the assignments that will “prove” her abilities? Why must she step so far out of her comfort zone just in order for the company that pays her to make use of the talents they are paying her to use?

This really nails down what I see all the time with respect to women getting promoted or even just getting recognized for their achievements.

To paraphrase it, women tend not to compete for recognition as much as men, for whatever reason. Maybe they’ve been socialized not to, maybe it is a simple question of testosterone. I will go into why I think this happens below. But for now let me just say I get super pissed when a system has been set up to diminish the success of people simply because of this personality issue.

Google is one such system. At Google, one must self-promote. I believe the rule is that, after two quarters or so of getting good reviews, you are eligible to self-promote, but you don’t have to.

And guess what? That policy sucks for women. Women don’t do it as often. I’ll bet this is statistically significant, even though I don’t have the numbers. Hey Google, do the math on this policy! And then change it!

Here’s the first part of my theory of why this happens. Women are not as secure in their accomplishments. By the way, note I am not saying women are insecure and men are secure. I think it’s more like men are over-secure and women are realistic, kind of like those studies that show that depressed people are realists and non-depressed people are optimists. I definitely have seen men who actually think they (individually) accomplished something which clearly took a team effort. Women are less likely to “forget” the help they received in making something happen. See this amazing blog rant on the subject from a professor at NYU.

Here’s the second part. Women tend to choose mentors (i.e. bosses or advisors) that are brilliant, thoughtful, and approachable. Typically this also means that those mentors are not the kind of bullying personalities that are best suited to promote their team. Even when you don’t have a choice in who your boss is, I claim this approach to pairing still happens in a business when that business decides who should be the boss of a woman.

Example in pure math: Yau at Harvard is famously dynasty-building with his students, but he’s probably not someone who has a tissue box in his office (to be fair I haven’t checked). I didn’t even consider taking Yau as my advisor, in part because he was super intimidating and seemed to challenge grad students with a ring of fire.

The reward for being brave in a situation like that is that he is fiercely loyal to his students once he accepts them, and helps them get great jobs. My point is that fewer women choose Yau-like personalities as their advisor (although it has to be said that Yau has had women students, including Columbia’s Melissa Liu). And thus fewer women end up with advisors that will land them jobs and give them good advice on how to get ahead. I just don’t think women are thinking about that aspect of a mentor the way men do (it’s also possible that men don’t think about it either but are less likely to shy away from rings of fire in general due to their “optimistic” egos).

I am not saying this is an easy problem to fix, because it’s not, and the best self-promoters will always do well no matter where they work. But I do think Google can do better than this; maybe they could think of something a bit more double-blind like the orchestra auditions.

## The meritocracy myth

Jack and Larry

Recently a Wall Street Journal article described what I’ll call a “Larry Summers” moment for women in business. Namely, Jack Welch, the former CEO of General Electric, spoke to a bunch of women about how if they work hard enough they’ll be appreciated and get ahead. From the article:

He had this advice for women who want to get ahead: Grab tough assignments to prove yourself, get line experience, and embrace serious performance reviews and the coaching inherent in them.

“Without a rigorous appraisal system, without you knowing where you stand…and how you can improve, none of these ‘help’ programs that were up there are going to be worth much to you,” he said. Mr. Welch said later that the appraisal “is the best way to attack bias” because the facts go into the document, which both parties have to sign.

Just as in the case of Larry Summers’s now-famous 2005 speech about women in science and math, a bunch of women left Welch’s talk in frustration.

There is no such thing as a meritocracy

Having been in academic mathematics and a quant in a hedge fund, I’d guess I’ve experienced what comes closest, in many people’s minds, to a meritocratic system. But my experience is that it’s anything but, even in these highly quantitative settings.

Instead, as it probably is everywhere, the job environment is a huge social game where it matters, a lot, what kind of priorities you demonstrate and what kind of other signals you give off or respond to. We don’t expect people to play golf and smoke cigars in academia but caring about teaching, or worse, getting a teaching award, can be the kiss of death.

I’m not saying that your personal efforts don’t matter at all, because they do, and you do need to produce stuff, and at a certain rate, but even “personal efforts” are first of all received in the context of a social order (i.e. the perceived importance of your efforts at the very least is a social invention), and second of all they’re not really personal – one frames the questions one answers with the help of the community, so it’s important you have a good connection and social acceptance in that community (i.e. access to the experts).

Business in more generality is even less meritocratic- there’s a specific requirement that you must “play well with others,” which is absent from academics (mercifully). This means that instead of being an implicit social game, it’s been made very explicit. This is where people promote their work, take credit for others’ work, learn to say what people want to hear, etc. The performance review is a circle-jerk event for such empty-headed manipulations, which makes it particularly ironic that Welch suggested women take the criticism in an appraisal so seriously.

In my experience, it is unbelievably useful for these social games to have an alpha personality, which just kind of means you assume you’re in charge even when it’s not explicitly a situation where someone’s in charge. People respond to such personalities on a chemical level and there’s really nothing a so-called meritocratic system can do about that.

In other words, I’m not holding my breath for a truly meritocratic system. It’s just not what humans evolved for. Let’s acknowledge that and work on how to make the system responsive to good ideas anyway (whatever the system is).

Successful people want to believe that there is such a thing as meritocracy

This begs the question, why do people like Jack Welch and Larry Summers hold on so tight to the myth of meritocracy? My theory is that it serves a two-fold goal: as advertisement for new people and as a validation of the winners in the system.

People want to feel like they are entering a level playing field, so the best thing you can do is advertise it as a meritocracy, because it’s human nature to think that you’re better than average. So everyone wants to enter such a field, assuming they will rise to the top.

At the same time, the “winners” of the social game want desperately to think they did amazing stuff in order to be so successful. They hold on to the myth of meritocracy as a religious belief, and it is pure dogma by the time they reach upper management. This plays into another part of human nature where we discount luck and the infrastructure that led to our success and take it as a sign of our personal choices. Lots of people in finance in general suffer from this diseased mindset but actually anyone who is high enough up in their respective “meritocratic system” does too.

That’s my simple explanation for why these guys can go in front of a bunch of women and be so unbelievably tone-deaf. They are true believers, because their entire egos are built on this belief, and it doesn’t matter how much counter-evidence is presented to them, even in the form of humans in the room with them.

One last thought. If I saw people leaving a room in disgust when I was giving a talk, I imagine I’d be slightly aghast- I might even pause and ask them what’s wrong. But I guess that’s because I’m not alpha enough.

Categories: finance, math, rant, women in math

## How to teach someone how to prove something

In a couple of my posts (most recently here), I’ve talked about the need for a course early on in undergraduate math classes on proof techniques.

The goals of the class are two-fold: first, teach the students basic skills, and second demystify the concept of proof. The students should come away from the class thinking, no it’s not magic, and I’ve learned how to do this stuff, and there are a few basic techniques which seem to come in handy.

Today I want to go further into what a curriculum for such a course might look like.

And I will, in a moment, but first I want to explain something. It’s actually a really important and dangerous question,  how to teach such a course, because it could go wildly wrong, and sometimes does. From my commenter Jordan:

… “Numbers, Equations, and Proofs,” which I started at Princeton in 2002 and which is still going as well. Though here’s an interview with a dude who was an ace math competition dude and found the course so hard as to drive him out of the math major! So maybe it’s no longer as “for everyone” as I designed it to be….

This struck me, how perverted Jordan’s class became. For that matter, Math 55 at Harvard could have started out as a good idea as well, but by the time I got to Harvard as a grad student it was the reason so few math majors ever stuck at Harvard and why there were especially few women.

I remember Noam Elkies taught it while I was there and was famous for asking questions in class and getting students to compete to answer them quickly. It makes sense that he’d run a class like this, because he’s so fast and clever, and he’s naturally wondering, am I the fastest and clevererest of them all? But rather than a place where proof is demystified and people feel safe asking dumb questions, he’d created the polar opposite, a live quiz show of clever competition. Ew!

In order to combat this downfall and decay, I think the class needs to have a clearly stated mission as well as built-in curriculum requirements that work against ostentatious displays of cleverness, which indeed only serve to further the “I got it but you don’t” stereotype of math skills (but which mathematicians themselves are incentivized to further since that magical aura comes in handy).

For example, when I taught it, I let the students hand in homework again and again until they got a score they liked. Of course, this depended on me having an awesome grader (and a relatively small class), which luckily I had.

Also, I asked each student to give a presentation to the class on some proof they particularly enjoyed, and I sat through a preview of their presentation and gave them extensive advice on board work and eye contact, which took a lot of work but really helped them prepare and also boosted their egos while at the same time increasing their sympathy with each other and with me.

But of course the most important thing was that I clearly stated at the beginning of each class in the first two weeks that proving things in math was a skill like any other that you get good at through practice. And when I left Barnard, Dusa McDuff took over the class and still teaches it, so I know it’s in good hands.

If I hadn’t had Dusa, I’d probably have written a manifesto to be given to each person who would teach the class after me. Of course anyone could have just thrown that away but it’s an idea.

As for content, I taught them really basic proof techniques, so induction, proof by contradiction, the pigeon-hole principle, and some epsilon-delta practice. We covered some basic logic, graph theory, group theory, ordinals, and basic analysis. We constructed the reals two ways and the complex numbers once and talked for a long time about whether “i” is real and what that even means. We used A Transition to Higher Mathematics, which I recommend with a few reservations (please tell me if you’ve found a better text for something like this!).

Everything was done super explicitly and carefully, no rushing. I said things three times in three different ways. I wasn’t expecting people to be fast or clever, because I know intelligence works in different ways and that this stuff was completely new to most of the students. And at least one student in the class, who had been an artist, is now a grad student in math at Berkeley.

Looking over my post I realize I spent way more time talking about the tone of the class than the content, but that’s totally appropriate, since I think of this class as an introduction to the culture of mathematics (or rather the culture I wish we had) just as much as mathematics itself.

After all, there really is no time limit on good ideas, and you do get to do it over if you make a mistake, and going over things slowly gives you more time to ask good questions and find mistakes.

## On the making of a girl nerd

Today I want to discuss the process by which girls become math and cs nerds.

I could be tempted to talk primarily about my own story, since I’m a huge nerd. And I will talk about my story, but my focus is going to be on the girls of my generation who could have become nerds but didn’t. I’m hoping we can learn some lessons so that future generations will have more nerd girls.

Both my parents are nerds. My mother has a Ph.D. in applied math and my father has a Ph.D. in pure math. Moreover, I was on the math team in high school, found out about a math camp, and went to it for two summers, with the full support of my family.

I want to go over these details again, because I want to point out that they gave me an enormous advantage to becoming a successful nerd.

First, my parents being nerds: I have found an amazing correlation between women with math Ph.D.’s and women whose fathers are mathematicians. I don’t think this is random- indeed I think it means two things. First, that girls with mathematician dads have an easy time imagining themselves as mathematicians (and an even easier time if their mom is too). Second, that girls without mathematician dads don’t. Otherwise you wouldn’t be able to explain the statistics I have.

Second, the math camp experience. I went to math camp in spite of it being an extremely uncool summer endeavor, according to my classmates at school. Yet I didn’t care, and went anyway, mostly because I was already a complete outsider, a fat girl on the math team (but a mathbabe when I got there!).

Two things about this. First, most smart girls around me in Lexington High School, and there were a lot of them, would not have been willing to go to math camp and ruin their reputations. Most of them were relatively popular, and wanted to keep it that way. I had nothing to lose in that aspect and knew it. This kind of thinking may seem silly to us as grownups but seemed like life or death choices then.

Second, the advantage having been to math camp gave me when I got to college was phenomenal. I knew how to prove things by induction, by contradiction, and using the pigeon-hole principle. I knew basic group theory, graph theory, and real analysis. This gave me a jump-start in all of my undergrad math major classes. I was an elite, and what I could do seemed like magic to the kids who were math majors who didn’t know that stuff.

The thing about math is that people get into this mindset about being good at it: they think that you either have it or you don’t (see this post for more on the mindset). So the experience for the other kids, boys and girls, going to an algebra class and sitting next to me and a few other kids from math camp backgrounds was understandably intimidating and made them think they couldn’t compete. But I believe that, considering the social constructs and the kind of confidence girls and boys are trained to have (or not have), it was particularly daunting for other girls to see their competition in a small group of elite nerds who already knew all the answers.

I’m not advocating closing math camps. In fact, I am going back to teach at my high school math camp in July for three weeks (woohoo!). What I am advocating is thinking seriously about the selection process for young nerds and how much it weeds out girls. We can do better.

For example, Harvey Mudd is doing better by careful thought and attention to the issue. Namely, they are changing the introduction to programming class to be more appealing for non-math-or-cs-camp nerds. From the New York Times article:

Known as CS 5, the course focused on hard-core programming, appealing to a particular kind of student — young men, already seasoned programmers, who dominated the class. This only reinforced the women’s sense that computer science was for geeky know-it-alls.

“Most of the female students were unwilling to go on in computer science because of the stereotypes they had grown up with,” said Zachary Dodds, a computer scientist at Mudd. “We realized we were helping perpetuate that by teaching such a standard course.”

To reduce the intimidation factor, the course was divided into two sections — “gold,” for those with no prior experience, and “black” for everyone else. Java, a notoriously opaque programming language, was replaced by a more accessible language called Python. And the focus of the course changed to computational approaches to solving problems across science.

This sounds like a brilliant idea, and one that we should all consider (and Python rocks!). It is reminiscent of the “Introduction to Proofs” class which I started with Karen Edwards and Sara Robinson in 1993 at UC Berkeley as an undergrad and which is still going, as well as the class I started in 2006 at Barnard College, which is also still going. The dual goals of such a class are to teach basic proof techniques to people interested in the major (who probably didn’t go to math camp) and to show people that being able to prove things isn’t magic; it just takes practice and knowing techniques.

Let’s get more campuses across the country to think about all the math and cs nerds they are missing out on by teaching the same old math (or cs) major classes every year. This is a curriculum change that is easy, fun to teach, and completely worthwhile.

## Today is Sonia Kovalevsky Day

Sometimes I imagine what my life would have been like if I’d been born way earlier, like in 1850. Knowing how difficult it was back then to be a female mathematician, and not wanting to assume some special property like I was born royalty or otherwise incredibly rich, I usually settle on something like a farmer’s life, with 7 kids and a butter churn, Little-House-on-the-Prairie style. To satisfy my nerdy urges I imagine myself knitting difficult patterns and formally organizing the community’s crop rotations.

I really don’t have much insight into what it must have been like back then, but even a short thought experiment like this helps me appreciate the story of Sofia Kovalevskaya, who was indeed born in Moscow in 1850 and unbelievably contributed majorly to mathematics, even though (hat tip Robert Lipshitz):

1. it was illegal to go to university in Russia at the time so she had a faux marriage in order to get permission from her husband to go abroad to study,
2. got a Ph.D. in Berlin studying under some famous men (Helmholtz, Kirchhoff and Bunsen in Heidelberg, Weierstrass), becoming the first woman in Europe to ever hold the degree,
3. after which time nobody in Germany would let her work so she did various jobs including installing streetlamps,
4. and finally managed to get some kind of weird position in Sweden (here’s a more complete bio).

Did I mention that she eventually had a kid with her husband and then died at the age of 41 from the flu?

I’d really love to go back in time for a day, find Sweden, and buy that amazing woman a drink (and I’d try to arrange to slip some antibiotics into said drink).

Today we are celebrating Sonia at Barnard College (here’s the schedule), where for the nth time (where n is at least 5) we’re having a Sonia Kovalevsky Day. A crowd of young women mathematicians, 9th graders from the Urban Assembly Institute of Math & Science for Young Women, will come and enjoy math talks from Barnard and Columbia professors and then engage in a team competition (with their teachers, which is my favorite part) to see who will win incredibly small prizes, for which they will all scream their heads off for 2 hours. It’s fun!

I started this tradition when I was a Barnard math professor back in 2006 with my friend Kiri Soares who runs the UA Institute, and the fact that it’s still going makes me very happy. Every time I go I try to teach the students how to solve the Rubik’s cube using a few tricks which stem from group theory. It’s fun to do and they all get to take home their cubes, along with other math toys and goodies. Mmmm… math toys.
