## Guest post: Kaisa Taipale visualizes mathematics Ph.D.’s emigration patterns

*This is a guest post by Kaisa Taipale. Kaisa got a BS at Caltech and a Ph.D. in math at the University of Minnesota, was a post-doc at MSRI and an assistant professor at St. Olaf College 2010-2012, and is currently visiting Cornell, which is where I met her a couple of weeks ago. She told me about her cool visualizations of math Ph.D. emigration patterns, and I convinced her to write a guest post. Here’s Kaisa on a bridge:*

**Math data and viz**

I was inspired by this older post on Mathbabe, about visualizing the arXiv postings of various math departments.

It got me thinking about tons of interesting questions I’ve asked myself and could answer with visualizations: Over time, what’s been coolest on the arXiv? Are there any topics that are especially attractive to hiring institutions? There’s tons of work to do!

I had to start somewhere though, and as I’m a total newbie when it comes to data analysis, I decided to learn some skills while focusing on a data set that I have easy non-technical access to and look forward to reading every year. I chose the AMS Annual Survey. I also wanted to stick to questions really close to my thoughts over the last two years, namely the academic job search.

I wanted to learn to use two tools, R and Circos. Why Circos? See the visualizations of college major and career path here – it’s pretty! I’ve messed around with a lot of questions, but in this post I’ll look at two and a half.

**Graduating PhDs**

Where do graduating PhDs from R1 universities end up, in the short term? I started with graduates of public R1s, as I got my PhD at one.

The PhD-granting institutions are colored green, while academic institutions granting other degrees are in blue. Purple is for business, industry, government, and research institutions. Red is for non-U.S. employment or people not seeking employment, except for the bright red, which marks those still seeking. Yellow rounds things out as unknown. Remember, these figures are for immediate plans after graduation rather than permanent employment.

While I was playing with this data (read “learning how to use the reshape and ggplot2 packages”) I noticed that people from private R1s tend to end up at private R1s more often. So I graphed that too.
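For readers who haven’t met the reshape step, here is a minimal sketch of the idea: turn a wide table (one column per employment sector) into a long one (one row per year and sector), then convert counts to percentages of each year’s graduates. It is written in plain Python rather than R, the column names are hypothetical, and the counts are made-up placeholders, not figures from the AMS Annual Survey.

```python
# Placeholder counts for illustration only; NOT taken from the AMS Annual Survey.
# The sector names are hypothetical column labels.
wide = [
    {"year": 2009, "phd_granting": 40, "other_academic": 30, "industry": 20, "unknown": 10},
    {"year": 2010, "phd_granting": 30, "other_academic": 30, "industry": 25, "unknown": 15},
]


def melt(rows, id_key):
    """Wide to long: emit one record per (id, sector) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_key:
                long_rows.append({id_key: row[id_key], "sector": key, "count": value})
    return long_rows


def add_percent(long_rows, id_key):
    """Attach each record's share of its group's total, in percent."""
    totals = {}
    for rec in long_rows:
        totals[rec[id_key]] = totals.get(rec[id_key], 0) + rec["count"]
    return [
        dict(rec, percent=100.0 * rec["count"] / totals[rec[id_key]])
        for rec in long_rows
    ]


long_rows = add_percent(melt(wide, "year"), "year")
for rec in long_rows:
    print(rec["year"], rec["sector"], rec["count"], f"{rec['percent']:.1f}%")
```

The long format is what plotting libraries such as ggplot2 generally expect: one row per observation, with the sector as a variable that can be mapped to color.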

Does the professoriate in the audience have any idea if this is self-selection or some sort of preference on the part of employers? Also, what happened between 2001 and 2003? I was still in college, and have no idea what historical events are at play here.

**Where mathematicians go**

For any given year, we can use a circular graph to show us where people go. This is a more clumped version of the above data from 2010 alone, plotted using Circos. (Supplemental table E.4 from the AMS report online.)

The other question – the question current mathematicians secretly care more about, in a gossipy and potentially catty way – is what fields lead to what fate. We all know algebra and number theory are the purest and most virtuous subjects, and applied math is for people who want to make money or want to make a difference in the world.

[On that note, you might notice that I removed statistics PhDs in the visualization below, and I also removed some of the employment sectors that gained only a few people a year. The stats ribbons are huge and the small sectors are very small, so for looks alone I took them out.]

**Higher resolution version available here.**

**Wish list**

I wish I could animate a series of these to show this view over time as well. Let me know if you know how to do that! Another nice thing I could do would be to set up a webpage in which these visualizations could be explored in a bit more depth. (After finals.)

Also:

- I haven’t computed any numbers for you;
- the graphs from R show employment in each field by percentage of graduates instead of total number per category;
- it’s hard to show both data over time and all the data one could explore. But it’s a start.

I should finish with a shout-out to Roger Peng and Jeff Leek, though we’ve never met: I took Peng’s Computing for Data Analysis and much of Leek’s Data Analysis on Coursera (though I’m one of those who didn’t finish the class). Their courses and Stack Overflow taught me almost everything I know about R. As I mentioned above, I’m pretty new to this type of analysis.

What questions would you ask? How can I make the above cooler? Did you learn anything?

Your wish is not that hard to make true: have you seen the d3 library by Mike Bostock (e.g. this example: http://bost.ocks.org/mike/uberdata/), which produces visualisations in a browser? It may look a bit daunting at first, but his examples are very illuminating. Essentially you bind your data to a visualisation (I’ve found that it is easier to manipulate the data into the correct format in R, not in d3), and the library takes care of the rest.

Proof that Google can’t read my mind and find me everything — this is what I was looking for but could not find! Thanks!

Hi, Kaisa!

I think what happened between 2001 and 2003 is the collapse of the dot-com bubble. It officially crashed at the beginning of that time period, but the effects took a while to trickle through the economy. I applied to grad school in ’03, and every program was reporting record applications due to people fleeing the high-tech sector. (Also, my father was laid off right after my June ’03 graduation. He’s a mechanical engineer, but the problem was that Intel wasn’t building new facilities. My dad is a pretty good economic indicator!)

Hi, Ursula!

I thought about that as an explanation, since I applied the same year and grad committees were telling me about crazy numbers of applicants too. So I’m wondering if the people who would have wanted to go into industry or business would have had a hard time in ’03: the number of unknowns is much bigger than usual. This could also be an artifact in the AMS data that doesn’t correspond to a historical event per se; it could just have to do with one institution not responding and everyone being marked as unknown, for instance.

We have great timing, don’t we?

The dot-com bubble crash was big enough to affect endowments & government revenues, so you might have seen a reduction in available academic jobs, as well as industry positions. Actually, if you check out this chart of US federal & state government revenues, the dips match up pretty well with the dips in your chart:

http://www.usgovernmentrevenue.com/recent_revenue

I feel another viz coming on! The best kind, too, collaborative!

Kaisa, if you want greater fame, the AMS’ new e-mentoring blog might be a good venue for these visualizations:

http://blogs.ams.org/mathmentoringnetwork/

Just eyeballing the last graph, it looks to me like you have a better chance of getting a job in business and industry if you’re in Algebra and Number Theory rather than Differential Equations. Is Hardy turning over in his grave?

I think Hardy is. That virtue and purity seems to go only so far…

Also perhaps messing with our stereotypes is that applied PhD students don’t go immediately into business and industry at the rate I might have thought — in fact, more go to “other academic departments.”

Is it possible that someone with a PhD in Algebra or Number Theory is more likely to come from a top tier department than someone who works on Differential Equations? Such a circumstance would explain the observation.

That is something that the AMS should have data about, but they don’t seem to have a table with that data available. Maybe I could ask.

Another observation is that there are at most twice as many people hired into academic ANT positions as academic DiffEq positions, but there are 2.3 times as many graduates in ANT — which lines up with Ursula’s conjecture below. Perhaps departments are filling up on ANT at a faster rate than they’re filling up DE positions — but those DE classes still need to be taught. How could we analyze this and answer Zathras’ question/test Ursula’s hypothesis?
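As a rough back-of-envelope check using only the two ratios quoted above, write $H$ for the number hired into academic positions and $G$ for the number of graduates in each field (notation introduced here for illustration, not the AMS’s). Then the relative academic placement rate is

$$\frac{H_{\mathrm{ANT}}/G_{\mathrm{ANT}}}{H_{\mathrm{DE}}/G_{\mathrm{DE}}} = \frac{H_{\mathrm{ANT}}/H_{\mathrm{DE}}}{G_{\mathrm{ANT}}/G_{\mathrm{DE}}} \le \frac{2}{2.3} \approx 0.87$$

So if those eyeballed ratios hold, an ANT graduate would land an academic position at no more than about 87% of the rate of a DiffEq graduate. This is only as good as the two ratios read off the graph.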

Is it that an ANT has an easier time getting an industry job, or a harder time finding an academic job?

It looks to me as if roughly equal percentages of people in Algebra & Number Theory and DiffEq go into business and industry, but there are more people in A&NT overall?