Do what I want or do what I really want

I’m on my way out to a picnic in Central Park on this glorious Sunday morning, and I plan to write a much more thorough post in response to this New Yorker article on overparenting that my friend Chris Wiggins sent me, but today I just wanted to impart one idea I’ve developed as a mother of three boys.

Namely, kids don’t ever want to do what you want them to do, especially when they’re tired, and it’s awful to feel helpless to get them to do something without ridiculous, possibly empty threats, or something worse.

What to do?

My solution is pretty simple, and it works great, at least in my experience. Namely, if I’m getting no response to a reasonable request from my, say, 4-year-old, then I form a separate request which is easier for me and less good for him. And then I offer him a choice between doing what I want or doing what I really want.

Example: it’s bedtime (i.e. 7pm, which we will come back to in a further post, which I’m considering calling “In defense of neglectful parenting”) and my kid doesn’t want to stop watching Star Wars Lego movies on Youtube. I’ve asked repeatedly for him to pause the movie so he can brush his teeth, get into his pajamas, and have me read his favorite bedtime story (currently: “Peter and the Shadow Thieves”).

Instead of screaming, picking him up and dragging him to the bathroom, which is increasingly difficult since he’s the size of a 6-year-old, I simply make him an offer:

Either you come brush your teeth right now and I read to you, or you come brush your teeth now and I don’t read to you, and you’ll have to go to bed without a bedtime story. I’m going to count to five and if you don’t come to the bathroom to brush your teeth when I get to “5” then no story.

Here’s the thing. It’s important that he knows I’m serious. I will actually not read to him if he doesn’t hurry up. To be fair, I only had to follow through with this exactly once for him to understand the seriousness of this kind of offer.

What I like about this is the avoidance of drama, empty threats, and physical coercion, or what’s just as annoying, a wasted evening of arguing with an exhausted child about “why there are bedtimes”, which happens so easily without a strategy in place.

Categories: musing

Aunt Pythia’s advice – nose rings, breakups, itchy fingers, and data science

Aunt Pythia is yet again gratified to find a few new questions in her inbox this morning, but as usual, she’s running quite low. After reading and enjoying the column below, please consider making some fabricated, melodramatic dilemma up out of whole cloth and, more importantly:

Please submit your fake question for Aunt Pythia at the bottom of this page!

——

Dear Aunt Pythia,

Can I have a nose piercing and still be taken seriously as an academic in Mathematics?

Math Dyke

Dear Math Dyke,

Actually, I think you can. Mathematicians may be elitist snobs about some things, but it’s not about the way they’re dressed. They tend to be pretty open-minded about physically presented strangeness. Plus they’ll probably just think it’s some kind of cultural signifier that they don’t understand.

Don’t let this fear hold you back from getting your nose pierced if that’s what you wanna do! It’ll look fabulous!

Auntie P

——

Hi Aunt Pythia,

I was recently dating this girl, and thought I had no feelings towards her other than enjoying her company and being attracted to her. Recently, after dating for a month or so, she wanted to have a “talk” and make things serious. I confessed that I did not love her, but told her that I did not expect these feelings at this point. She dumped me. What could I have done? Should I lie? Thanks :(

Adones

Dear Adones,

First of all, I’m sympathetic to your viewpoint. But I’m also sympathetic to hers – and I’m much more like her myself.

People just move at different paces, and yours was too slow for her. I think the conversation you two had was probably the best thing, and I’m glad you didn’t lie.

My guess is that, from her perspective, you guys had been dating for a full six weeks (I’m interpreting your “or so” broadly), that you were pleasant yet tepid, and that she just wanted more from her love life than that. She didn’t get the impression, based on your conversation, that passion was around the corner, so why bother? From her vantage point, she deserves an interesting and exciting love life.

But don’t despair: there are other women who want to move slowly, especially if they’re not interested in having kids any time soon. My advice is to go find someone with a slower pace that matches yours!

Aunt Pythia

——

Dear Aunt Pythia,

I love to twiddle my fingers . . . but I never took up knitting, for example, because I figured you have to have the mind of an accountant to keep track of the pattern. I supposed I could crank out a scarf or two …. Plus, wool is so itchy. (I note that linen is an option?). Should I be discouraged?

ItchyFingers

Dear Itchy,

One possibility is to have the “mind of an accountant” (I put this in quotes because I know a few accountants that may be offended by the assumptions) and count out each stitch as you go. Or you could instead have the mind of an artist, and not worry about imperfections in stitch count, since they add texture and individuality to your project. Or, you could do what I do, and have the mind of a mathematician, and choose or design patterns that allow you to think every now and then, but mostly just happily knit whilst watching Star Trek or something.

The real reason I love knitting is that I love color and I love the touch of yarn. I just can’t get enough of touching it. And most luxury wool yarn is not itchy at all. My suggestion is to go to a yarn store and touch everything in sight. It’s what people do, don’t worry, nobody will be surprised.

Aunt P

p.s. if you live in New York, try Knitty City on 79th near Broadway.

——

Dear Aunt Pythia,

I don’t have a math background. I studied Political Science in college. But I’m fascinated by data science and want to learn more. If I keep chugging along, teaching myself things, do you think this is a viable career? I’m teaching myself programming right now (JavaScript, Ruby), a bit of R, a bit of SAS.

Don’t Always Take Advice

Dear DATA,

I do think you need to understand the math behind the algorithms in order to really be a good data scientist (as I explained in this post). But that doesn’t mean you have to have a math background – you can give yourself a math foreground right now. So yes, if you are willing to really go deep and understand these algorithms from top to bottom, of course you can become a data scientist. There’s no secret property of college learning that makes it somehow better, after all. And there are tons of online resources that you can use for this stuff, as well as the book I’m writing which will be out soon.

One more piece of advice: get yourself a github account and store your code for projects in that, as well as written descriptions of what problems you’ve solved with your code. Since you don’t have a standard background in math and stats and CS, you’ll have to have evidence that you really can do this stuff.

Good luck!

Aunt Pythia

——

Please submit your question to Aunt Pythia!

Categories: Aunt Pythia

What does it mean that our public square is a private place?

I just read this opinion piece written by Jillian York and published by Aljazeera.com. York discusses “How social network policies are changing speech and privacy norms” and she makes the point that there’s a big difference between our legal rights as citizens and the way Facebook has defined its policies, and by extension our “rights” inside Facebook.

So, for example, there’s the question of whether we can show pictures of breastfeeding our children on Facebook. The policy on this has changed – nowadays they say yes, but they used to remove such pictures.

Another example might be more important: whether you can be anonymous. As York points out, Facebook might have an opinion about this, and Zuckerberg seems to – she quotes him as having said “having two identities for yourself is an example of a lack of integrity” – and yet their vested interest in this question is related to making sure they’ve accurately targeted you for advertisements.

I want to make the case that the “real-life” version of anonymity in Facebook is really just privacy in the simplest sense.

If I am even half-aware of the extent of the surveillance and tracking that goes on when I log into Facebook under my real name, which I don’t even think I am, then I’d tend to use a separate browser, with cleared cookies, and an anonymous Facebook account in order to do absolutely anything without it being tracked. In other words, anonymity is what it takes to do anything privately on Facebook.

Now, you might argue that I can just not go to Facebook at all if I want to do private things, and I’m sure that’s Facebook’s position as well. But the truth is, Facebook is the world’s public square. Some enormous fraction of the world visits Facebook at least once a week. Exclusion from this would be a big deal.

In any case, it’s weird that decisions like this, that affect our notions of privacy, are being decided by some dude who’s probably thinking more about ad revenue than anything else, under pressure from shareholders.

Not that it’s a new problem. When I was growing up in Lexington, MA, over the cold winters we’d hang out in the Burlington Mall. It was the public square of its time, and yes it was utterly commercial and private, and of course they excluded anyone who they didn’t like the looks of, with security guards. Even so, they didn’t check IDs at the door.

Categories: musing

The rise of big data, big brother

I recently read an article off the newsstand called The Rise of Big Data.

It was written by Kenneth Neil Cukier and Viktor Mayer-Schoenberger and it was published in the May/June 2013 edition of Foreign Affairs, which is published by the Council on Foreign Relations (CFR). I mention this because CFR is an influential think tank, filled with powerful insiders, including people like Robert Rubin himself, and for that reason I want to take this view on big data very seriously: it might reflect the policy view before long.

And if I think about it, compared to the uber naive view I came across last week when I went to the congressional hearing about big data and analytics, that would be good news. I’ll write more about it soon, but let’s just say it wasn’t everything I was hoping for.

At least Cukier and Mayer-Schoenberger discuss their reservations regarding “big data” in this article. To contrast this with last week, it seemed like the only background material for the hearing, at least for the congressmen, was the McKinsey report talking about how sexy data science is and how we’ll need to train an army of data scientists to stay competitive.

So I’m glad it’s not all rainbows and sunshine when it comes to big data in this article. Unfortunately, whether because they’re tied to successful business interests, or because they just haven’t thought too deeply about the dark side, their concerns seem almost token, and their examples bizarre.

The article is unfortunately behind the pay wall, but I’ll do my best to explain what they’ve said.

Datafication

First they discuss the concept of datafication, and their example is how we quantify friendships with “likes”: it’s the way everything we do, online or otherwise, ends up recorded for later examination in someone’s data storage units. Or maybe multiple storage units, and maybe for sale.

They formally define it later in the article as a process:

… taking all aspects of life and turning them into data. Google’s augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks.

Datafication is an interesting concept, although as far as I can tell they did not coin the word, and it has led me to consider its importance with respect to intentionality of the individual.

Here’s what I mean. We are being datafied, or rather our actions are, and when we “like” someone or something online, we are intending to be datafied, or at least we should expect to be. But when we merely browse the web, we are unintentionally, or at least passively, being datafied through cookies that we might or might not be aware of. And when we walk around in a store, or even on the street, we are being datafied in a completely unintentional way, via sensors or Google glasses.

This spectrum of intentionality ranges from us gleefully taking part in a social media experiment we are proud of to all-out surveillance and stalking. But it’s all datafication. Our intentions may run the gamut but the results don’t.

They follow up their definition in the article, once they get to it, with a line that speaks volumes about their perspective:

Once we datafy things, we can transform their purpose and turn the information into new forms of value

But who is “we” when they write it? What kinds of value do they refer to? As you will see from the examples below, mostly that translates into increased efficiency through automation.

So if at first you assumed they mean we, the American people, you might be forgiven for re-thinking the “we” in that sentence to be the owners of the companies which become more efficient once big data has been introduced, especially if you’ve recently read this article from Jacobin by Gavin Mueller, entitled “The Rise of the Machines” and subtitled “Automation isn’t freeing us from work — it’s keeping us under capitalist control.” From the article (which you should read in its entirety):

In the short term, the new machines benefit capitalists, who can lay off their expensive, unnecessary workers to fend for themselves in the labor market. But, in the longer view, automation also raises the specter of a world without work, or one with a lot less of it, where there isn’t much for human workers to do. If we didn’t have capitalists sucking up surplus value as profit, we could use that surplus on social welfare to meet people’s needs.

The big data revolution and the assumption that N=ALL

According to Cukier and Mayer-Schoenberger, the Big Data revolution consists of three things:

  1. Collecting and using a lot of data rather than small samples.
  2. Accepting messiness in your data.
  3. Giving up on knowing the causes.

They describe these steps in rather grand fashion, by claiming that big data doesn’t need to understand cause because the data is so enormous. It doesn’t need to worry about sampling error because it is literally keeping track of the truth. The way the article frames this is by claiming that the new approach of big data is letting “N = ALL”.

But here’s the thing, it’s never all. And we are almost always missing the very things we should care about most.

So for example, as this InfoWorld post explains, internet surveillance will never really work, because the very clever and tech-savvy criminals that we most want to catch are the very ones we will never be able to catch, since they’re always a step ahead.

Even the example from their own article, election night polls, is itself a great non-example: even if we poll absolutely everyone who leaves the polling stations, we still don’t count people who decided not to vote in the first place. And those might be the very people we’d need to talk to to understand our country’s problems.

Indeed, I’d argue that the assumption we make that N=ALL is one of the biggest problems we face in the age of Big Data. It is, above all, a way of excluding the voices of people who don’t have the time or don’t have the energy or don’t have the access to cast their vote in all sorts of informal, possibly unannounced, elections.

Those people, busy working two jobs and spending time waiting for buses, become invisible when we tally up the votes without them. To you this might just mean that the recommendations you receive on Netflix don’t seem very good because most of the people who bother to rate things on Netflix are young and have different tastes than you, which skews the recommendation engine towards them. But there are plenty of much more insidious consequences stemming from this basic idea.
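To make the Netflix point concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the two groups, their average scores, and the odds that each group bothers to rate are assumptions, not real Netflix figures. It compares the average over everyone with the average over only the people who showed up in the data.

```python
import random


def simulate_ratings(n_people=10_000, seed=0):
    """Return (mean over everyone, mean over raters only) for a toy population."""
    rng = random.Random(seed)
    all_scores = []
    observed_scores = []
    for _ in range(n_people):
        young = rng.random() < 0.5
        # Assumption: the two groups feel differently about the same film.
        score = rng.gauss(4.0 if young else 2.0, 0.5)
        # Assumption: younger viewers are far more likely to bother rating.
        bothers_to_rate = rng.random() < (0.6 if young else 0.1)
        all_scores.append(score)
        if bothers_to_rate:
            observed_scores.append(score)
    true_mean = sum(all_scores) / len(all_scores)
    observed_mean = sum(observed_scores) / len(observed_scores)
    return true_mean, observed_mean


true_mean, observed_mean = simulate_ratings()
print(f"everyone: {true_mean:.2f}, raters only: {observed_mean:.2f}")
```

The "raters only" number is the one a recommendation engine actually sees, and collecting more ratings from the same self-selected crowd does nothing to close the gap. N got bigger, but it is still not ALL.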

Another way in which the assumption that N=ALL can matter is that it often gets translated into the idea that data is objective. Indeed the article urges us to assume exactly that:

… we need to be particularly on guard to prevent our cognitive biases from deluding us; sometimes, we just need to let the data speak.

And later in the article,

In a world where data shape decisions more and more, what purpose will remain for people, or for intuition, or for going against the facts?

This is a bitch of a problem for people like me who work with models, know exactly how they work, and know exactly how wrong it is to believe that “data speaks”.

I wrote about this misunderstanding here, in the context of Bill Gates, but I was recently reminded of it in a terrifying way by this New York Times article on big data and recruiter hiring practices. From the article:

“Let’s put everything in and let the data speak for itself,” Dr. Ming said of the algorithms she is now building for Gild.

If you read the whole article, you’ll learn that this algorithm tries to find “diamond in the rough” types to hire. A worthy effort, but one that you have to think through.

Why? If you, say, decided to compare women and men with the exact same qualifications that have been hired in the past, but then, looking into what happened next you learn that those women have tended to leave more often, get promoted less often, and give more negative feedback on their environments, compared to the men, your model might be tempted to hire the man over the woman next time the two showed up, rather than looking into the possibility that the company doesn’t treat female employees well.

In other words, ignoring causation can be a flaw, rather than a feature. Models that ignore causation can add to historical problems instead of addressing them. And data doesn’t speak for itself, data is just a quantitative, pale echo of the events of our society.
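Here is a toy sketch of that failure mode. The hiring history below is invented, and the "model" is just a per-group retention rate, which is a stand-in I chose for illustration and not anything Gild or the Times article describes. The point is only that a model scoring candidates on past outcomes will repeat whatever produced those outcomes.

```python
from collections import defaultdict

# Hypothetical history: (gender, qualification_score, stayed_two_years).
# Qualifications are identical; only the outcomes differ.
history = [
    ("F", 8, False), ("F", 8, True), ("F", 8, False), ("F", 8, False),
    ("M", 8, True), ("M", 8, True), ("M", 8, False), ("M", 8, True),
]


def fit_retention_rates(records):
    """'Train' by computing the historical retention rate for each group."""
    counts = defaultdict(lambda: [0, 0])  # gender -> [stayed, total]
    for gender, _score, stayed in records:
        counts[gender][0] += int(stayed)
        counts[gender][1] += 1
    return {g: stayed / total for g, (stayed, total) in counts.items()}


def pick_candidate(rates, candidates):
    """Choose the candidate whose group has the best predicted retention."""
    return max(candidates, key=lambda c: rates[c[0]])


rates = fit_retention_rates(history)
choice = pick_candidate(rates, [("F", 8), ("M", 8)])
print(rates, choice)
```

Nothing in the data says why the women left, so the model cannot tell "this company treats women badly" apart from "women are a bad bet", and it quietly acts on the second reading.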

Some cherry-picked examples

One of the most puzzling things about the Cukier and Mayer-Schoenberger article is how they chose their “big data” examples.

One of them, the ability for big data to spot infection in premature babies, I recognized from the congressional hearing last week. Who doesn’t want to save premature babies? Heartwarming! Big data is da bomb!

But if you’re going to talk about medicalized big data, let’s go there for reals. Specifically, take a look at this New York Times article from last week where a woman traces the big data footprints, such as they are, back in time after receiving a pamphlet on living with Multiple Sclerosis. From the article:

Now she wondered whether one of those companies had erroneously profiled her as an M.S. patient and shared that profile with drug-company marketers. She worried about the potential ramifications: Could she, for instance, someday be denied life insurance on the basis of that profile? She wanted to track down the source of the data, correct her profile and, if possible, prevent further dissemination of the information. But she didn’t know which company had collected and shared the data in the first place, so she didn’t know how to have her entry removed from the original marketing list.

Two things about this. First, it happens all the time, to everyone, but especially to people who don’t know better than to search online for diseases they actually have. Second, the article seems particularly spooked by the idea that a woman who does not have a disease might be targeted as being sick and have crazy consequences down the road. But what about a woman who actually is sick? Does that person somehow deserve to have their life insurance denied?

The real worries about the intersection of big data and medical records, at least the ones I have, are completely missing from the article. Although they did mention that “improving and lowering the cost of health care for the world’s poor” inevitably will make it “necessary to automate some tasks that currently require human judgment.” Increased efficiency once again.

To be fair, they also talked about how Google tried to predict the flu in February 2009 but got it wrong. I’m not sure what they were trying to say except that it’s cool what we can try to do with big data.

Also, they discussed a Tokyo research team that collects data on 360 pressure points with sensors in a car seat, “each on a scale of 0 to 256.” I think that last part about the scale was added just so they’d have more numbers in the sentence – so mathematical!

And what do we get in exchange for all these sensor readings? The ability to distinguish drivers, so I guess you’ll never have to share your car, and the ability to sense if a driver slumps, to either “send an alert or automatically apply brakes.” I’d call that a questionable return for my investment of total body surveillance.

Big data, business, and the government

Make no mistake: this article is about how to use big data for your business. It goes ahead and suggests that whoever has the biggest big data has the biggest edge in business.

Of course, if you’re interested in treating your government office like a business, that’s gonna give you an edge too. In their example, Bloomberg’s big data initiative led to efficiency gains (read: we can do more with less, i.e. we can start firing government workers, or at least never hire more).

As for regulation, it is pseudo-dealt with via the discussion of market dominance. We are meant to understand that the only role government can or should have with respect to data is how to make sure the market is working efficiently. The darkest projected future is that of market domination by Google or Facebook:

But how should governments apply antitrust rules to big data, a market that is hard to define and is constantly changing form?

In particular, no discussion of how we might want to protect privacy.

Big data, big brother

I want to be fair to Cukier and Mayer-Schoenberger, because they do at least bring up the idea of big data as big brother. Their topic is serious. But their examples, once again, are incredibly weak.

Should we find likely-to-drop-out boys or likely-to-get-pregnant girls using big data? Should we intervene? Note the intention of this model would be the welfare of poor children. But how many models currently in production are targeting that demographic with that goal? Is this in any way at all a reasonable example?

Here’s another weird one: they talked about the bad metric used by US Secretary of Defense Robert McNamara in the Viet Nam War, namely the number of casualties. By defining this with the current language of statistics, though, it gives us the impression that we could just be super careful about our metrics in the future and: problem solved. As we experts in data know, however, it’s a political decision, not a statistical one, to choose a metric of success. And it’s the guy in charge who makes that decision, not some quant.

Innovation

If you end up reading the Cukier and Mayer-Schoenberger article, please also read Julie Cohen’s draft of a soon-to-be published Harvard Law Review article called “What Privacy is For” where she takes on big data in a much more convincing and skeptical light than Cukier and Mayer-Schoenberger were capable of summoning up for their big data business audience.

I’m actually planning a post soon on Cohen’s article, which contains many nuggets of thoughtfulness, but for now I’ll simply juxtapose two ideas surrounding big data and innovation, giving Cohen the last word. First from the Cukier and Mayer-Schoenberger article:

Big data enables us to experiment faster and explore more leads. These advantages should produce more innovation

Second from Cohen, where she uses the term “modulation” to describe, more or less, the effect of datafication on society:

When the predicate conditions for innovation are described in this way, the problem with characterizing privacy as anti-innovation becomes clear: it is modulation, not privacy, that poses the greater threat to innovative practice. Regimes of pervasively distributed surveillance and modulation seek to mold individual preferences and behavior in ways that reduce the serendipity and the freedom to tinker on which innovation thrives. The suggestion that innovative activity will persist unchilled under conditions of pervasively distributed surveillance is simply silly; it derives rhetorical force from the cultural construct of the liberal subject, who can separate the act of creation from the fact of surveillance. As we have seen, though, that is an unsustainable fiction. The real, socially-constructed subject responds to surveillance quite differently—which is, of course, exactly why government and commercial entities engage in it. Clearing the way for innovation requires clearing the way for innovative practice by real people, by preserving spaces within which critical self-determination and self-differentiation can occur and by opening physical spaces within which the everyday practice of tinkering can thrive.

Why we should break up the megabanks (#OWS)

Today is May Day, and my Occupy group and I are planning to join in the actions all over the city this afternoon. At 2:00 I’m going to be at Cooper Square, where Free University is holding a bunch of teach-ins, and I’m giving one entitled “Why we should break up the megabanks.” I wanted to get my notes for the talk down in writing beforehand here.

The basic reasons to break up the megabanks are these:

  1. They hold too much power.
  2. They cost too much.
  3. They get away with too much.
  4. They make things worse.

Each requires explanation.

Megabanks hold too much power

When Paulson went to Congress to argue for the bailout in 2008, he told them that the consequences of not acting would be a total collapse of the financial system and the economy. He scared Congress and the American people to such an extent that the banks managed to receive $700 billion with no strings attached. Even though half of that enormous pile of money was supposed to go to help homeowners threatened with foreclosures, almost none of it did, because the banks found other things to do with it.

The power of megabanks doesn’t only exert itself through the threat of annihilation, though. It also flows through lobbyists who water down Dodd-Frank (or really any policy that banks don’t like) and through “the revolving door,” the men and women who work for Treasury, the White House, and regulators about half the time and sit in positions of power in the megabanks the other half of their time, gaining influence and money and retiring super rich.

It is unreasonable to expect to compete with this kind of insularity and influence of the megabanks.

They cost too much

The bailout didn’t work and it’s still going on. And we certainly didn’t “make money” on it, compared to what the government could have expected if we had invested differently.

But honestly it’s too narrow to think about money alone, because what we really need to consider is risk. And there we’ve lost a lot: when we bailed them out, we took on the risk of the megabanks, and we have simply done nothing to return it. Ultimately the only way to get rid of that costly risk is to break them up once and for all to a size that they can reasonably and obviously be expected to fail.

Make no mistake about it: risk is valuable. It may not be quantifiable at a moment of time, but over time it becomes quite valuable and quantifiable indeed, in various ways.

One way is to think about borrowing costs and long-term default probabilities, and there the estimates have varied but we’ve seen numbers such as $83 billion per year modeled. Few people dispute that it’s the right order of magnitude.

They get away with too much

There doesn’t seem to be a limit to what the megabanks can get away with, which we’ve seen with HSBC’s money laundering from terrorists and drug cartels, we’ve seen with Jamie Dimon and Ina Drew lying to Congress about fucking with their risk models, we’ve seen with countless fraudulent and racist practices with mortgages and foreclosures and foreclosure reviews, not to mention setting up customers to fail in deals made to go bad, screwing municipalities and people with outrageous fees, shaving money off of retirement savings, and manipulating any and all markets and rates that they can to increase their bonuses.

The idea of a financial sector is to grease the wheels of commerce, to create a machine that allows the economy to work. But in our case we have a machine that’s taken over the economy instead.

They make things worse

Ultimately the best reason to break them up right now, the sooner the better, is that the incentives are bad and getting worse. Now that they live in an officially protected zone, there is even less reason for them than there used to be to rein in risky practices. There is less reason for them to worry about punishments, since the SEC’s habit of letting people off without jailtime, meaningful penalties, or even admitting wrongdoing has codified the lack of repercussions for bad behavior.

If we use recent history as a guide, the best job in finance you can have right now is inside a big bank, protected from the law, rather than working at a hedge fund where you can be nabbed for insider trading and publicly displayed as an example of the SEC’s new “toughness.”

What we need to worry about now is how bad the next crash is going to be. Let’s break up the megabanks now to mitigate that coming disaster.

Categories: #OWS, finance

Mathbabe, the book

Thanks to a certain friendly neighborhood mathbabe reader, I’ve created this mathbabe book, which is essentially all of my posts that I ever wrote (I think. Not sure about that.) bundled together mostly by date and stuck in a huge pdf. It comes to 1,243 pages.

I did it using leanpub.com, which charges $0.99 per person who downloads the pdf. I’m not charging anything over that, because the way I look at it, it’s already free.

Speaking of that, I can see why I’d want a copy of this stuff, since it’s the best way I can think of to have a local version of a bunch of writing I’ve done over the past couple of years, but I don’t actually see why anyone else would. So please don’t think I’m expecting you to go buy this book! Even so, more than one reader has requested this, so here it is.

And one strange thing: I don’t think it required my password on WordPress.com to do it, I just needed the url for the RSS feed. So if you want to avoid paying 99 cents, I’m pretty sure you can go to leanpub or one of its competitors and create another, identical book using that same feed.

And for that matter you can also go build your own book about anything using these tools, which is pretty cool when you think about it. Readers, please tell me if there’s a way to do this that’s open source and free.

Categories: musing

Guest post: Kaisa Taipale visualizes mathematics Ph.D.’s emigration patterns

This is a guest post by Kaisa Taipale. Kaisa got a BS at Caltech, a Ph.D. in math at the University of Minnesota, was a post-doc at MSRI, an assistant professor at St. Olaf College 2010-2012, and is currently visiting Cornell, which is where I met her a couple of weeks ago, and where she told me about her cool visualizations of math Ph.D. emigration patterns and I convinced her to write a guest post. Here’s Kaisa on a bridge:

Kaisa

Math data and viz

I was inspired by this older post on Mathbabe, about visualizing the arXiv postings of various math departments.

It got me thinking about tons of interesting questions I’ve asked myself and could answer with visualizations: over time, what’s been coolest on the arXiv? are there any topics that are especially attractive to hiring institutions? There’s tons of work to do!

I had to start somewhere though, and as I’m a total newbie when it comes to data analysis, I decided to learn some skills while focusing on a data set that I have easy non-technical access to and look forward to reading every year. I chose the AMS Annual Survey. I also wanted to stick to questions really close to my thoughts over the last two years, namely the academic job search.

I wanted to learn to use two tools, R and Circos. Why Circos? See the visualizations of college major and career path here - it’s pretty! I’ve messed around with a lot of questions, but in this post I’ll look at two and a half.

Graduating PhDs

Where do graduating PhDs from R1 universities end up, in the short term? I started with graduates of public R1s, as I got my PhD at one.

EmploymentOfPublicR1Grad

The PhD-granting institutions are colored green, while academic institutions granting other degrees are in blue. Purple is for business, industry, government, and research institutions. Red is for non-U.S. employment or people not seeking — except for the bright red, which is still seeking. Yellow rounds things out at unknown. Remember, these figures are for immediate plans after graduation rather than permanent employment.

While I was playing with this data (read “learning how to use the reshape and ggplot2 packages”) I noticed that people from private R1s tend to end up at private R1s more often. So I graphed that too.

EmploymentOfPrivateR1Grad

Does the professoriate in the audience have any idea if this is self-selection or some sort of preference on the part of employers? Also, what happened between 2001 and 2003? I was still in college, and have no idea what historical events are at play here.

Where mathematicians go

For any given year, we can use a circular graph to show us where people go. This is a more clumped version of the above data from 2010 alone, plotted using Circos. (Supplemental table E.4 from the AMS report online.)

2010RoughByType

The other question – the question current mathematicians secretly care more about, in a gossipy and potentially catty way – is what fields lead to what fate. We all know algebra and number theory are the purest and most virtuous subjects, and applied math is for people who want to make money or want to make a difference in the world.

[On that note, you might notice that I removed statistics PhDs in the visualization below, and I also removed some of the employment sectors that gained only a few people a year. The stats ribbons are huge and the small sectors are very small, so for looks alone I took them out.]

2010BigCircosPic

Higher resolution version available here.

Wish list

I wish I could animate a series of these to show this view over time as well. Let me know if you know how to do that! Another nice thing I could do would be to set up a webpage in which these visualizations could be explored in a bit more depth. (After finals.)

Also:

  • I haven’t computed any numbers for you;
  • the graphs from R show employment in each field by percentage of graduates instead of total number per category;
  • it’s hard to show both data over time and all the data one could explore. But it’s a start.

I should finish with a shout-out to Roger Peng and Jeff Leek, though we’ve never met: I took Peng’s Computing for Data Analysis and much of Leek’s Data Analysis on Coursera (though I’m one of those who didn’t finish the class). Their courses and Stack Overflow taught me almost everything I know about R. As I mentioned above, I’m pretty new to this type of analysis.

What questions would you ask? How can I make the above cooler? Did you learn anything?
