This is a guest post by Lillian Pierce, who is currently a faculty member of the Hausdorff Center for Mathematics in Bonn, and will next year join the faculty at Duke University.
I’m a mathematician. I also happen to be a mother. I turned in my Ph.D. thesis one week before the due date of my first child, and defended it five weeks after she was born. Two and a half years into my postdoc years, I had my second child.
Now after a few years of practice, I can pretty much handle daily life as a young academic and a parent, at least most of the time, but it still seems like a startlingly strenuous existence compared to what I remember of life as just a young academic, not a parent.
Last year I was asked by the Association for Women in Mathematics to write a piece for the AWM Newsletter about my impressions of being a young mother and getting a mathematical career off the ground at the same time. I suggested that instead I interview a lot of other mathematical mothers, because it’s risky to present just one view as “the way” to tackle mathematics and motherhood.
Besides, what I really wanted to know was: how is everyone else doing this? I wanted to pick up some pointers.
I met Mathbabe about ten years ago when I was a visiting prospective graduate student and she was a postdoc. She made a deep impression on me at the time, and I am very happy that I now have the chance to interview her for the series Mathematics+Motherhood, and to now share with you our conversation.
LP: Tell me about your current work.
CO: I am a data scientist working at a small start-up. We’re trying to combine consulting engagements with a new vision for data science training and education and possibly some companies to spin off. In the meantime, we’re trying not to be creepy.
LP: That sounds like a good goal. And tell me a bit about your family.
CO: I have three kids. I got pregnant with my first son, who’s 13 now, soon after my PhD. Then I had a second child 2 years later, also while I was a postdoc. I also have a 4 year old, whom I had when I was working in finance.
LP: Did you have any notions or worries in advance about how the growth of your family would intersect with the growth of your career?
CO: I absolutely did worry about it, and I was right to worry about it, but I did not hesitate about whether to have children because it was just not a question to me about how I wanted my life to proceed. And I did not want to wait until I was tenured because I didn’t want to risk being infertile, which is a real risk. So for me it was not an option not to do it as a woman, forget as a mathematician.
LP: What was it like as a postdoc with two very young children?
CO: On the one hand I was hopeful about it, and on the other hand I was incredibly disappointed about it. The hopeful part was that the chair of my department was incredibly open to negotiating a maternity leave for postdocs, and it really was the best maternity policy that I knew about: a semester off of teaching for each baby and in total an extra year of the postdoc, since I had 2 babies. So I ended up with four years of postdoc, which was really quite generous on the one hand, but on the other hand it really didn’t matter at all. Not “not at all”—it mattered somewhat but it simply wasn’t enough to feel like I was actually competing with my contemporaries who didn’t have children. That’s on the one hand completely obvious and natural and it makes sense, because when you have small children you need to pay attention to them because they need you—and at the same time it was incredibly frustrating.
LP: It’s interesting because it’s not that you were saying “I won’t be able to compete with my contemporaries over the course of my life,” but more “I can’t compete right now.”
CO: Exactly, “I can’t compete right now” with postdocs without children. I realize—and this is not a new idea—that mathematics as a culture frontloads entirely into those 3 or 4 years after you get your PhD. Ultimately it’s not my fault, it’s not women’s fault, it’s the fault of the academic system.
LP: What metrics could departments use to be thinking more about future potential?
CO: I actually think it’s hard. It’s not just for women that it should change. It’s for the actual culture of mathematics. Essentially, the system is too rigid. And it’s not only women who get lost. The same thing that winnows the pool down right after getting a PhD—it’s a whittling process, to get rid of people, get rid of people, get rid of people until you only have the elite left—that process is incredibly punishing to women, but it’s also incredibly punishing to everybody. And moreover because of the way you get tenure and then stay in your field for the rest of your life, my feeling is that mathematics actually suffers. The reason I say this is because I work in industry now, which is a very different system, and people can reinvent themselves in a way that simply does not happen in mathematics.
LP: Do you think industry, in terms of the young career phase, gets it closer to “right” than academia currently does?
CO: Much closer to right. It’s a brutal place, don’t get me wrong, it’s brutal. I’m not saying it’s a perfect system by any stretch of the imagination. But the truth is in industry you can have a 3 year stint somewhere that is a mistake. Forget having kids, you can have a 3 year stint that was just a mistake for you. You can say “I had a bad boss and I left that place and I got a new job” and people will say “Ok.” They don’t care. One thing that I like about it is the ability to reinvent yourself. And I don’t think you see that in math. In math, your progress is charted by your publication record at a granular level. And if you’re up for tenure and there’s a 3 year gap where you didn’t publish, even if in the other years you published a lot, you still have to explain that gap. It’s like a moral responsibility to keep publishing all the time.
LP: How are you measured in industry?
CO: In industry it’s the question “what have you done for me,” and “what have you done for me lately.” It’s a shorter-term question, and there are good elements to that. One of the good elements is that as a woman you can have a baby or a couple babies and then you can pick up the slack, work your ass off, and you can be more productive after something happens. If someone gets sick, people lower their expectations for that person for some amount of time until they recover, and then expectations are higher. Mathematics by contrast has frontloaded all of the stress, especially for the elite institutions, into the 3 or 4 years to get the tenure track offer and then the next 6 years to get tenure. And then all the stress is gone. I understand why people with tenure like that. But ultimately I don’t think mathematics gets done better because of it. And certainly when the question arises “why don’t women stay in math,” I can answer that very easily: because it’s not a very good place for women, at least if they want kids.
LP: You mention on your blog that your mother is an unapologetic nerd and computer scientist; the conclusion you drew from that was that it was natural for you not to doubt that your contributions to nerd-dom and science and knowledge would be welcomed. How do you think this experience of having a mother like that inoculated you?
CO: One of the great gifts that my mother gave me as a Mother Nerd was the gift of privacy—in the sense that I did not scrutinize myself. First of all she was role-modeling something for me, so if I had any expectations it would be to be like my mom. But second of all she wasn’t asking me to think about that. I think that was one of the rarest things I had, the most unusual aspect of my upbringing as a girl. Very few of the girls that I know are not scrutinized. My mother was too busy to pay attention to my music or my art or my math. And I was left alone to decide what I wanted to do—it wasn’t about what I was good at or what other people thought of my progress. It was all about answering the question, what did I want to do. Privacy for me is having elbow space to self-define.
LP: Do you think it’s harder for parents to give that space to girls than to boys?
CO: Yes I do, I absolutely do. It’s harder and for some reason it’s not even thought about. My mother also gave me the gift of not feeling at all guilty about putting me into daycare. And that’s one of my strongest lessons, is that I don’t feel at all guilty about sending my kids to daycare. In fact I recently had the daycare providers for my 4-year-old all over for dinner, and I was telling them in all honesty that sometimes I wish I could be there too, that I could just stay there all day, because it’s just a wonderful place to be. I’m jealous of my kids. And that’s the best of all worlds. Instead of saying “oh my kid is in daycare all day, I feel bad about that,” it’s “my kid gets to go to daycare.”
LP: Where did this ability not to scrutinize come from? Where did your mother get this?
CO: I don’t know. My mother has never given me advice, she just doesn’t give advice. And when I ask her to, she says “you know more about your life than I do.”
LP: How do you deal with scrutiny now?
CO: It’s transformed as I’ve gotten older. I’ve gotten a thicker skin, partly from working in finance. I’ve gotten to the point now where I can appreciate good feedback and ignore negative feedback. And that’s a really nice place to be. But it started out, I believe, because I was raised in an environment where I wasn’t scrutinized. And I had that space to self-define.
LP: The idea of pushing back against scrutiny to clear space for self-definition is inspiring for adults as well.
CO: Women in math, especially with kids, give yourself a break. You’re under an immense amount of pressure, of scrutiny. You should think of it as being on the front lines, you’re a warrior! And if you’re exhausted, there’s a reason for it. Please go read Radhika Nagpal’s Scientific American blog post (“The Awesomest 7-Year Postdoc Ever”) for tips on how to deal with the pressure. She’s awesome. And the last thing I want to say is that I never stopped loving math. Cardinal Rule Number 1: Before all else, don’t become bitter. Cardinal Rule Number 2: Remember that math is beautiful.
I met you this past summer, you may not remember me. I have a question.
I know a lot of people who know much more math than I do and who figure out solutions to problems more quickly than me. Whenever I come up with a solution to a problem that I’m really proud of and that I worked really hard on, they talk about how they’ve seen that problem before and all the stuff they know about it. How do I know if I’m good enough to go into math?
High School Kid
Dear High School Kid,
Great question, and I’m glad I can answer it, because I had almost the same experience when I was in high school and I didn’t have anyone to ask. And if you don’t mind, I’m going to answer it to anyone who reads my blog, just in case there are other young people wondering this, and especially girls, but of course not only girls.
Here’s the thing. There’s always someone faster than you. And it feels bad, especially when you feel slow, and especially when that person cares about being fast, because all of a sudden, in your confusion about all sorts of things, speed seems important. But it’s not a race. Mathematics is patient and doesn’t mind. Think of it, your slowness, or lack of quickness, as a style thing but not as a shortcoming.
Why style? Over the years I’ve found that slow mathematicians have a different thing to offer than fast mathematicians, although there are exceptions (Bjorn Poonen comes to mind, who is fast but thinks things through like a slow mathematician. Love that guy). I totally didn’t define this but I think it’s true, and other mathematicians, weigh in please.
One thing that’s incredibly annoying about this concept of “fastness” when it comes to solving math problems is that, as a high school kid, you’re surrounded by math competitions, which all kind of suck. They make it seem like, to be “good” at math, you have to be fast. That’s really just not true once you grow up and start doing grownup math.
In reality, most of being good at math is really about how much you want to spend your time doing math. And I guess it’s true that if you’re slower you have to want to spend more time doing math, but if you love doing math then that’s totally fine. Plus, thinking about things overnight always helps me. So sleeping about math counts as time spent doing math.
[As an aside, I have figured things out so often in my sleep that it's become my preferred way of working on problems. I often wonder if there's a "math part" of my brain which I don't have normal access to but which furiously works on questions during the night. That is, if I've spent the requisite time during the day trying to figure it out. In any case, when it works, I wake up the next morning just simply knowing the proof and it actually seems obvious. It's just like magic.]
So here’s my advice to you, high school kid. Ignore your surroundings, ignore the math competitions, and especially ignore the annoying kids who care about doing fast math. They will slowly recede as you go to college and as high school algebra gives way to college algebra and then Galois Theory. As the math gets awesomer, the speed gets slower.
And in terms of your identity, let yourself fancy yourself a mathematician, or an astronaut, or an engineer, or whatever, because you don’t have to know exactly what it’ll be yet. But promise me you’ll take some math major courses, some real ones like Galois Theory (take Galois Theory!) and for goodness sakes don’t close off any options because of some false definition of “good at math” or because some dude (or possibly dudette) needs to care about knowing everything quickly. Believe me, as you know more you will realize more and more how little you know.
One last thing. Math is not a competitive sport. It’s one of the only existing truly crowd-sourced projects of society, and that makes it highly collaborative and community-oriented, even if the awards and prizes and media narratives about “precocious geniuses” would have you believing the opposite. And once again, it’s been around a long time and is patient to be added to by you when you have the love and time and will to do so.
One of my biggest regrets when I left academic math and number theory behind in 2007 was that I never finished writing up and publishing some cool results I’d been working on with Manjul Bhargava about what we called “3x3x3 Rubik’s cubes”.
Just a teeny bit of background. Say you have a 3x3x3 matrix filled with numbers, including in the very center. So you have 27 numbers in a special 3-dimensional configuration. Since there are three axes for such a cube, there are three ways of dividing such a cube into three 3×3 matrices $A$, $B$, and $C$. Once you do that you can get a cubic form by computing
$$\det(Ax + By + Cz),$$
which gives you a cubic equation in three variables, or in other words a genus one curve.
Actually you get three different genus one curves, since you can do this along any of the three axes. Turns out there are crazy interesting relationships between those curves, as well as in the space of all 3x3x3 cubes.
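For anyone who wants to poke at this numerically, here is a minimal Python sketch. It assumes the cubic form in question is the determinant of x·A + y·B + z·C, where A, B, C are the three 3×3 slices of the cube along a chosen axis. The function names (`det3`, `slices`, `cubic_form`) are made up for illustration and are not taken from the paper.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def slices(cube, axis):
    """Cut a 3x3x3 cube (nested lists, cube[i][j][k]) into three 3x3
    matrices by fixing the index along the given axis (0, 1 or 2)."""
    if axis == 0:
        return [[[cube[s][j][k] for k in range(3)] for j in range(3)] for s in range(3)]
    if axis == 1:
        return [[[cube[i][s][k] for k in range(3)] for i in range(3)] for s in range(3)]
    if axis == 2:
        return [[[cube[i][j][s] for j in range(3)] for i in range(3)] for s in range(3)]
    raise ValueError("axis must be 0, 1 or 2")


def cubic_form(cube, axis, x, y, z):
    """Evaluate det(x*A + y*B + z*C) at the point (x, y, z), where
    A, B, C are the three slices of the cube along `axis`."""
    A, B, C = slices(cube, axis)
    M = [[A[r][c] * x + B[r][c] * y + C[r][c] * z for c in range(3)]
         for r in range(3)]
    return det3(M)
```

Calling `cubic_form(cube, axis, x, y, z)` with `axis` set to 0, 1 and 2 evaluates the three cubic forms attached to the same cube, which in general are different polynomials.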
Just talking about that stuff gets me excited, because it’s first of all a really natural construction, second of all number theoretic, and third of all it actually makes me think of solving Rubik’s cubes, which I’ve always loved.
Anyhoo, I gave my notes to a grad student Wei Ho when I left math, and she and Manjul recently came out with this preprint entitled “Coregular Spaces and genus one curves”, which is posted on the mathematical arXiv.
First, what’s freaking cool about their paper, to me personally, is that my work with Manjul has been incorporated into the paper in the form of parts of sections 3.2 and 5.1.
But what’s even more incredibly cool, to the mathematical world, is that Wei and Manjul are going to use this paper as background to understand the average size of Selmer groups of elliptic curves, a really fantastic result. Here’s the full abstract of their paper:
A coregular space is a representation of an algebraic group for which the ring of polynomial invariants is free. In this paper, we show that the orbits of many coregular irreducible representations where the number of invariants is at least two, over a (not necessarily algebraically closed) field k, correspond to genus one curves over k together with line bundles, vector bundles, and/or points on their Jacobians. In forthcoming work, we use these orbit parametrizations to determine the average sizes of Selmer groups for various families of elliptic curves.
One last thing. I am lucky enough to be a neighbor of Wei right now, as she finishes up a post-doc at Columbia, and she’s agreed to explain this stuff to me in the coming weeks. Hopefully I will remember enough number theory to understand her!
I’m on my way to D.C. today to give an alleged “distinguished lecture” to a group of mathematics enthusiasts. I misspoke in a previous post where I characterized the audience as consisting of math teachers. In fact, I’ve been told it will consist primarily of people with some mathematical background, with typically a handful of high school teachers, a few interested members of the public, and a number of high school and college students included in the group.
So I’m going to try my best to explain three different ways of approaching recommendation engine building for services such as Netflix. I’ll be giving high-level descriptions of a latent factor model (this movie is violent and we’ve noticed you like violent movies), of the co-visitation model (lots of people who’ve seen stuff you’ve seen also saw this movie) and the latent topic model (we’ve noticed you like movies about the Hungarian 1956 Revolution). Then I’m going to give some indication of the issues in doing these massive-scale calculations and how it can be worked out.
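To make the co-visitation idea concrete, here is a toy Python sketch. It is only an illustration of "people who've seen stuff you've seen also saw this movie", not the approach any real service uses; the viewing histories, the user names and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def covisitation_counts(histories):
    """Count, for every ordered pair of movies (a, b), how many users
    have seen both. `histories` maps a user name to a list of movies."""
    counts = Counter()
    for movies in histories.values():
        for a, b in combinations(sorted(set(movies)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user, histories, counts, top_n=3):
    """Score each unseen movie by how often it was co-viewed with the
    movies this user has already seen, and return the top scorers."""
    seen = set(histories[user])
    scores = Counter()
    for (a, b), c in counts.items():
        if a in seen and b not in seen:
            scores[b] += c
    return [movie for movie, _ in scores.most_common(top_n)]
```

At real scale the pair counts are far too big to hold in one dictionary, which is where the massive-scale calculation issues come in.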
And yes, I double-checked with those guys over at Netflix, I am allowed to use their name as long as I make sure people know there’s no affiliation.
In addition to the actual lecture, the MAA is having me give a 10-minute TED-like talk for their website as well as an interview. I am psyched by how easy it is to prepare my slides for that short version using prezi, since I just removed a bunch of nodes on the path of the material without removing the material itself. I will make that short version available when it comes online, and I also plan to share the longer prezi publicly.
[As an aside, and not to sound like an advertiser for prezi (no affiliation with them either!), but they have a free version and the resulting slides are pretty cool. If you want to be able to keep your prezis private you have to pay, but not as much as you'd need to pay for powerpoint. Of course there's always Open Office.]
Train reading: Wrong Answer: the case against Algebra II, by Nicholson Baker, which was handed to me emphatically by my friend Nick. Apparently I need to read this and have an opinion.
Someone asked me a math question the other day and I had fun figuring it out. I thought it would be nice to write it down.
So here’s the problem. You are getting to see sample data and you have to infer the underlying distribution. In fact you happen to know you’re getting draws – which, because I’m a basically violent person, I like to think of as throws of a dart – from a uniform distribution from 0 to some unknown $d$ and you need to figure out what $d$ is. All you know is your data, so in particular you know how many dart throws you’ve gotten to see so far. Let’s say you’ve seen $n$ draws.
In other words, given $x_1, x_2, \dots, x_n$, what’s your best guess for $d$?
First, in order to simplify, note that all that really matters in terms of the estimate of $d$ is what $\max_i x_i$ is and how big $n$ is.
Next, note you might as well assume that $d = 1$ and you just don’t know it yet.
With this set-up, you’ve rephrased the question like this: if you throw $n$ darts at the interval $[0, 1]$, then where do you expect the right-most dart – the maximum – to land?
It’s obvious from this phrasing that, as $n$ goes to infinity, you can expect a dart to get closer and closer to 1. Moreover, you can look at the simplest case, where $n = 1$, and since the uniform distribution is symmetric, you can see the answer is 1/2. Then you might guess the overall answer, which depends on $n$ and goes to 1 as $n$ goes to infinity, might be $\frac{n}{n+1}$. It makes intuitive sense, but how do you prove that?
Start with a small case where you know the answer. For $n = 1$ we just need to know what the expected value of $\max(x_1)$ is, and since there’s one dart, the max is just $x_1$ itself, which is to say we need to compute a simple integral to find the expected value (note it’s coming in handy here that I’ve normalized the interval from 0 to 1 so I don’t have to divide by the width of the interval):
$$\int_0^1 x_1 \, dx_1 = \frac{1}{2},$$
and we recover what we already know. In the next case, $n = 2$, we need to integrate over two variables (same comment here, don’t have to divide by area of the 1×1 square base):
$$\int_0^1 \int_0^1 \max(x_1, x_2) \, dx_2 \, dx_1.$$
If you think about it, though, $x_1$ and $x_2$ play symmetric parts in this matter, so you can assume without loss of generality that $x_1$ is bigger, as long as we only let $x_2$ range between 0 and $x_1$ and then multiply the end result by 2:
$$2 \int_0^1 \int_0^{x_1} x_1 \, dx_2 \, dx_1.$$
But that simplifies to:
$$2 \int_0^1 x_1^2 \, dx_1 = \frac{2}{3}.$$
Let’s do the general case. It’s an n-fold integral over the maximum of all $n$ darts, and again without loss of generality $x_1$ is the maximum as long as we remember to multiply the whole thing by $n$. We end up computing:
$$n \int_0^1 \int_0^{x_1} \cdots \int_0^{x_1} x_1 \, dx_n \cdots dx_2 \, dx_1.$$
But this collapses to:
$$n \int_0^1 x_1^n \, dx_1 = \frac{n}{n+1}.$$
To finish the original question, take the maximum value in your collection of draws and multiply it by the plumping factor $\frac{n+1}{n}$ to get a best estimate of the parameter $d$.
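If you'd rather trust a simulation than an integral, here is a small Python sketch of the recipe. It writes n for the number of draws and d for the unknown upper end of the interval, so the plumping factor is (n+1)/n. The function names and the seed are arbitrary choices made for illustration.

```python
import random


def estimate_upper_bound(draws):
    """Plump up the sample maximum by the factor (n + 1) / n."""
    n = len(draws)
    return max(draws) * (n + 1) / n


def average_estimate(d, n, trials, seed=0):
    """Simulate `trials` experiments, each with n uniform draws from
    [0, d], and return the average of the plumped-up estimates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = [rng.uniform(0, d) for _ in range(n)]
        total += estimate_upper_bound(draws)
    return total / trials
```

With enough trials, `average_estimate(d, n, trials)` should hover near the true d, which is another way of saying that the expected maximum is n/(n+1) times d.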
My buddy Jordan Ellenberg sent me this link to an article which covered Sir Andrew Wiles’ comments at the opening of the Andrew Wiles Building, a housing complex for math nerds in Oxford. From the article:
Wiles claimed that the abuse of mathematics during the global financial meltdown in 2009, particularly by banks’ manipulation of complex derivatives, had tarnished his chosen subject’s reputation.
He explained that scientists used to worry about the ethical repercussions of their work and that mathematics research, which used to be removed from day-to-day life, has diverged “towards goals that you might not believe in”.
At one point Wiles said the following, which is music to my ears coming from a powerful mathematician:
One has to be aware now that mathematics can be misused and that we have to protect its good name.
First, maybe I should invite Wiles to be on my panel of mathematicians for investigating public math models. I originally thought this should be run under the auspices of a society such as the AMS but after talking to some people I’ve given up on that and just want it to be independent.
Second, the Andrew Wiles Building was evidently paid for primarily by Landon Clay, who also founded the Clay Institute and was the CEO of Eaton Vance, which is an investment management firm that provides its clients with wealth management tools and advice. I’m wondering if that kind of mathematical tool was in Wiles’ mind when he made his speech, and if so, how it went over. Certainly in my experience, wealth management tools are definitely in the “weapons of math destruction” toolbox.
Definitions are basic objects in mathematics. Even so, I’ve never seen the art of definition explicitly taught, and I have rarely seen the need for a definition explicitly discussed.
Have you ever noticed how damn hard it is to make a good definition and yet how utterly useful a good definition can be?
The basic definitions inform the research of any field, and a good definition will lead to better theorems than a bad one. If you get them right, if you really nail down the definition, then everything works out much more cleanly than otherwise.
So for example, it doesn’t make sense to work in algebraic geometry without the concepts of affine and projective space, and varieties, and schemes. They are to algebraic geometry like circles and triangles are to elementary geometry. You define your objects, then you see how they act and how they interact.
I saw first hand how a good definition improves clarity of thought back in grad school. I was lucky enough to talk to John Tate (my mathematical hero) about my thesis, and after listening to me go on for some time with a simple object but complicated proofs, he suggested that I add an extra sentence to my basic object, an assumption with a fixed structure.
This gave me a bit more explaining to do up front – but even there added intuition – and greatly simplified the statement and proofs of my theorems. It also improved my talks about my thesis. I could now go in and spend some time motivating the definition, and then state the resulting theorem very cleanly once people were convinced.
Another example from my husband’s grad seminar this semester: he’s starting out with the concept of triangulated categories coming from Verdier’s thesis. One mysterious part of the definition involves the so-called “octahedral axiom,” which mathematicians have been grappling with ever since it was invented. As far as Johan tells it, people struggle with why it’s necessary but not that it’s necessary, or at least something very much like it. What’s amazing is that Verdier managed to get it right when he was so young.
Why? Because definition building is naturally iterative, and it can take years to get it right. It’s not an obvious process. I have no doubt that many arguments were once fought over the most basic definitions, although I’m no historian. There’s a whole evolutionary struggle that I can imagine could take place as well – people could make the wrong definition, and the community would not be able to prove good stuff about that, so it would eventually give way to stronger, more robust definitions. Better to start out carefully.
Going back to that. I think it’s strange that the building up of definitions is not explicitly taught. I think it’s a result of the way math is taught as if it’s already known, so the mystery of how people came up with the theorems is almost hidden, never mind the original objects and questions about them. For that matter, it’s not often discussed why we care whether a given theorem is important, just whether it’s true. Somehow the “importance” conversations happen in quiet voices over wine at the seminar dinners.
Personally, I got just as much out of Tate’s help with my thesis as anything else about my thesis. The crystalline focus that he helped me achieve with the correct choice of the “basic object of study” has made me want to do that every single time I embark on a project, in data science or elsewhere.
This is a guest post from Jordan Ellenberg, a professor of mathematics at the University of Wisconsin. Jordan’s book, How Not To Be Wrong, comes out in May 2014. It is crossposted from his blog, Quomodocumque, and tweeted about at @JSEllenberg.
Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project. Summary: you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependency. So this graph (hopefully!) doesn’t have any directed cycles. (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle! I wish LaTeX could do that.)
Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v. And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.
Why is this so? Does it suggest some underlying structure? I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.
Because I am trying to get strong at sage I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows: start with N vertices, and a function f on natural numbers k that decays with k, and then connect each vertex n to vertex n-k (if there is such a vertex) with probability f(k). The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the Stacks Project is not a strictly linear thing like a book).
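Here is a rough Python sketch of that model, for readers who want to play along. It is not the Sage code from this morning, just a plain reimplementation under the reading that every vertex v is joined to each earlier vertex v-k independently with probability f(k). The function names and the seed are hypothetical.

```python
import random


def random_dag(n_vertices, f, seed=0):
    """Vertices are 1..n_vertices. Vertex v gets an edge to v - k
    (when v - k >= 1) with probability f(k). Every edge points to a
    smaller number, so the graph cannot contain a directed cycle."""
    rng = random.Random(seed)
    out_edges = {v: [] for v in range(1, n_vertices + 1)}
    for v in range(1, n_vertices + 1):
        for k in range(1, v):
            if rng.random() < f(k):
                out_edges[v].append(v - k)
    return out_edges


def descendant_counts(out_edges, v):
    """Return (#nodes, #edges) of the subgraph G_v made of everything
    reachable from v by a directed path, including v itself."""
    seen = {v}
    stack = [v]
    n_edges = 0
    while stack:
        u = stack.pop()
        for w in out_edges[u]:
            n_edges += 1
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen), n_edges
```

Running `descendant_counts` on every vertex of `random_dag(1000, lambda k: (2/3) ** k)` gives the list of (nodes, edges) pairs that goes into a scatter plot like Cathy's.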
Here’s how Cathy’s plot looks for a graph generated by N= 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 as suggested in Cathy’s post.
Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.
Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:
My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!
Update: Although you know what must be happening here? It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected to all but c of the vertices below it for a suitable constant c.
I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are not measuring the same graph again and again.
It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay. Under those circumstances, you’d have substantially less overlap between the descendant trees; do you still get the approximately linear relationship between edges and nodes?
So yesterday I told you about the cool new visualizations now available on Johan’s Stacks Project.
But how do we use these visualizations to infer something about either mathematics or, at the very least, the way we think about mathematics? Here’s one way we thought of with Pieter.
So, there’s a bunch of results, and each of them has its own subgraph of the entire graph which positions that result as the “base node” and shows all the other results which it logically depends on.
And each of those graphs has structure and attributes, the stupidest two of which are just the counts of the nodes and edges. So for each result, we have an ordered pair (#nodes, #edges). What can we infer about mathematics from these pairs?
Here’s a scatter plot of the nodes-vs-edges for each of the 10,445 results (email me if you want to play with this data yourself):
I also put a best-fit line in, just to illustrate that the scatter plot is super linear but not perfectly linear.
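In case you want to fit your own line to pairs like these, here is a bare-bones least-squares sketch in Python. It is an assumption that an ordinary least-squares fit is the right stand-in for the best-fit line here, and the function name is made up.

```python
def best_fit_line(pairs):
    """Ordinary least-squares line through (nodes, edges) pairs.
    Returns (slope, intercept) for edges = slope * nodes + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The function needs at least two distinct node counts, otherwise the slope is undefined and the division fails.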
So there are a bunch of comments I can make about this, but I’ll limit myself to the following:
- There are a lot of points at (1,0), corresponding to remarks, axioms, beginning lemmas, definitions, and tags for sections.
- As a data person, let me just say that data is never this clean. There’s something going on, some internal structure to these graphs that we should try to understand.
- By “clean” I’m not exactly referring to the fact that things look pretty linear, although that’s weird and we should think about that. What I really mean is that things are so close to the curve that is being approximated. They’re all within a very tight border of this imaginary line. It’s super amazing.
- Let’s pretend it’s just plain straight. Does that make sense, that as graphs get more complex the edges don’t get more dense than some multiple (1.86) of the number of nodes?
- Kind of: remember, we don’t depict all logical dependency edges, just the ones that are directly referred to in the proof of a result. So right off the bat you are less surprised that the edges aren’t growing quadratically in the number of nodes, even though the number of possible edges is of course quadratic in the number of nodes.
- Think about it this way: assume that every result that requires proof (so, that’s not a (1,0) result) refers to exactly 2 other results in its proof. Then those two child results each correspond to some subgraph of the entire graph, and say their subgraphs each have something like twice as many edges as nodes. Then, ignoring overlap, we’d see two graphs with a 2:1 ratio, then we’d see that parent node, plus two edges, one leading to each child result, which is also a 2:1 ratio, and the disjoint union of all those graphs gives us a large graph with a 2:1 ratio.
- Then if you imagine now allowing the overlap, the ratio goes down a bit on average. In this toy model, the discrepancy between 2.0 and the slope we actually see, 1.86, is a measurement of the collapse of the two child graphs, which can be taken as a proxy for how much the two supporting results overlap as notions.
- Of course, not every result has exactly two children.
- Plus it doesn’t really explain how ridiculously consistent the plot above is. What would?
- If you think about it, the only real explanation of the consistency above is my husband’s brain.
- In other words, he’s humming along, thinking about stacks, and at some point, when he thinks things have gotten complicated enough, he says to himself “It’s time to wrap this stuff up and call it a result!” and then he does so. That moment, when he’s decided things are getting complicated enough, is very consistent internally to his brain.
- In other words, if someone else created the stacks project, I’d expect to see another kind of plot, possibly also very consistent, but possibly with a different slope.
- Also it’d be interesting to compare this plot to another kind of citation network graph, like the papers in the arXiv. Has anyone made that?
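The bookkeeping in the toy model above can be checked with a few lines of arithmetic. This is only a sketch: the function name and all the sizes are hypothetical, chosen so that both child subgraphs are exactly 2:1.

```python
def combine(child_a, child_b, shared=(0, 0)):
    """Join two child subgraphs under a new parent that cites both.

    Each argument is a (nodes, edges) pair. `shared` is the part that
    appears in both children and must be counted only once in the union.
    """
    nodes = child_a[0] + child_b[0] - shared[0] + 1  # +1 for the parent node
    edges = child_a[1] + child_b[1] - shared[1] + 2  # +2 for the parent's two references
    return nodes, edges


# Hypothetical sizes, both children exactly 2:1.
a, b = (50, 100), (30, 60)

disjoint = combine(a, b)
overlapping = combine(a, b, shared=(10, 25))  # made-up overlap

for label, (nodes, edges) in [("disjoint", disjoint), ("overlapping", overlapping)]:
    print(label, nodes, edges, round(edges / nodes, 3))
```

With no overlap the ratio stays at exactly 2. In this sketch the ratio drops below 2 because the made-up shared part has more than two edges per node; a shared part sparser than 2:1 would push the ratio the other way, so what the gap measures depends on what kind of material the two children have in common.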
Crossposted on Not Even Wrong.
Here’s a completely biased interview I did with my husband A. Johan de Jong, who has been working with Pieter Belmans on a very cool online math project using d3js. I even made up some of his answers (with his approval).
Q: What is the Stacks Project?
A: It’s an open source textbook and reference for my field, which is algebraic geometry. It builds foundations starting from elementary college algebra and going up to algebraic stacks. It’s a self-contained exposition of all the material there, which makes it different from a research textbook or the experience you’d have reading a bunch of papers.
We were quite neurotic setting it up – everything has a proof, other results are referenced explicitly, and it’s strictly linear, which is to say there’s a strict ordering of the text so that all references are always to earlier results.
Of course the field itself has different directions, some of which are represented in the stacks project, but we had to choose a way of presenting it which allowed for this idea of linearity (of course, any mathematician thinks we can do that for all of mathematics).
Q: How has the Stacks Project website changed?
A: It started out as just a place you could download the pdf and tex files, but then Pieter Belmans came on board and he added features such as full text search, tag look-up, and a commenting system. In this latest version, we’ve added a whole bunch of features, but the most interesting one is the dynamic generation of dependency graphs.
We’ve had some crude visualizations for a while, and we made t-shirts from those pictures. I even had this deal where, if people found mathematical mistakes in the Stacks Project, they’d get a free t-shirt, and I’m happy to report that I just last week gave away my last t-shirt. Here’s an old picture of me with my adorable son (who’s now huge).
Q: Talk a little bit about the new viz.
A: First a word about the tags, which we need to understand the viz.
Every mathematical result in the Stacks Project has a “tag”, which is a four letter code, and which is a permanent reference for that result, even as other results are added before or after that one (by the way, Cathy O’Neil figured this system out).
The graphs show the logical dependencies between these tags, represented by arrows between nodes. You can see this structure in the above picture already.
So for example, if tag ABCD refers to Zariski’s Main Theorem, and tag ADFG refers to Nakayama’s Lemma, then since Zariski depends on Nakayama, there’s a logical dependency, which means the node labeled ABCD points to the node labeled ADFG in the entire graph.
Of course, we don’t really look at the entire graph, we look at the subgraph of results which a given result depends on. And we don’t draw all the arrows either, we only draw the arrows corresponding to direct references in the proofs. Which is to say, in the subgraph for Zariski, there will be a path from node ABCD to node ADFG, but not necessarily a direct link.
Q: Can we see an example?
A: Let’s move to an example for result 01WC, which refers to the proof that “a locally projective morphism is proper”.
First, there are two kinds of heat maps. Here’s one that defines distance as the maximum (directed) distance from the root node. In other words, how far down in the proof is this result needed? In this case the main result 01WC is bright red with a black dotted border, and any result that 01WC depends on is represented as a node. The edges are directed, although the arrows aren’t drawn, but you can figure out the direction by how the color changes. The dark blue colors are the leaf nodes that are farthest away from the root.
Another way of saying this is that the redder results are the results that are closer to it in meaning and sophistication level.
Note if we had defined the distance as the minimum distance from the root node (to come soon hopefully), then we’d have a slightly different and also meaningful way of thinking about “redness” as “relevance” to the root node.
This is a screenshot but feel free to play with it directly here. For all of the graphs, hovering over a result will cause the statement of the result to appear, which is awesome.
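To make the two notions of distance concrete, here is a small sketch of how the maximum and minimum distance from the root could be computed on a dependency graph. It is not the code behind the website: the function names, the dictionary layout, and the toy tags are assumptions for illustration. It relies on the graph having no cycles, which the strict ordering of the text guarantees.

```python
from collections import deque


def min_distances(direct_refs, root):
    """Fewest reference hops from `root` down to each result (breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        tag = queue.popleft()
        for ref in direct_refs.get(tag, ()):
            if ref not in dist:
                dist[ref] = dist[tag] + 1
                queue.append(ref)
    return dist


def max_distances(direct_refs, root):
    """Most reference hops from `root` down to each result (acyclic graphs only)."""
    order, seen = [], set()

    def visit(tag):
        # Depth-first search; a tag is appended after everything it cites.
        if tag in seen:
            return
        seen.add(tag)
        for ref in direct_refs.get(tag, ()):
            visit(ref)
        order.append(tag)

    visit(root)
    dist = {tag: 0 for tag in order}
    # Reversed post-order puts each result before everything it cites.
    for tag in reversed(order):
        for ref in direct_refs.get(tag, ()):
            dist[ref] = max(dist[ref], dist[tag] + 1)
    return dist


# Hypothetical toy graph with made-up tags.
toy_refs = {"ABCD": ["ADFG", "AAAA"], "ADFG": ["AAAA"], "AAAA": []}
print(min_distances(toy_refs, "ABCD"))
print(max_distances(toy_refs, "ABCD"))
```

In the toy graph the leaf is cited both directly by the root and through an intermediate result, so the two metrics disagree on it, which is exactly the kind of node that would change color between the two heat maps.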
Next, let’s look at another kind of heat map where the color is defined as maximum distance from some leaf node in the overall graph. So dark blue nodes are basic results in algebra, sheaves, sites, cohomology, simplicial methods, and other chapters. The link is the same, you can just toggle between the different metrics.
Next we delved further into how results depend on those different topics. Here, again for the same result, we can see the extent to which that result depends on results from the various chapters. If you scroll over the nodes you can see more details. This is just a screenshot but you can play with it yourself here and you can collapse it in various ways corresponding to the internal hierarchy of the project.
Finally, we have a way of looking at the logical dependency graph directly, where each result node is labeled with a tag and colored by “type”: whether it’s a lemma, proposition, theorem, or something else, and it also annotates the results which have separate names. Again a screenshot but play with it here, it rotates!
Check out the whole project here, and feel free to leave comments using the comment feature!
You should really read Nagpal’s guest blogpost from Scientific American (hat tip Ken Ribet) yourself, but here’s just a sneak preview, namely her check list of survival tactics that she describes in more detail later in the piece:
- I decided that this is a 7-year postdoc.
- I stopped taking advice.
- I created a “feelgood” email folder.
- I work fixed hours and in fixed amounts.
- I try to be the best “whole” person I can.
- I found real friends.
- I have fun “now”.
I really love this list, especially the “stop taking advice” part. I can’t tell you how much crap advice you get when you’re a tenure-track woman in a technical field. Nagpal was totally right to decide to ignore it, and I wish I’d taken her advice to ignore people’s advice, even though that sounds like a logical contradiction.
What I like the most about her list was her insistence on being a whole person and having fun – I have definitely had those rules since forever, and I didn’t have to make them explicit, I just thought of them as obvious, although maybe it was for me because my alternative was truly dark.
It’s just amazing how often people are willing to make themselves miserable and delay their lives when they’re going for something ambitious. For some reason, they argue, they’ll get there faster if they’re utterly submissive to the perceived expectations.
What bullshit! Why would anyone be more efficient at learning, at producing, or at creating when they’re sleep-deprived and oppressed? I don’t get it. I know this sounds like a matter of opinion but I’m super sure there’ll be some study coming out describing the cognitive bias which makes people believe this particular piece of baloney.
Here’s some advice: go get laid, people, or whatever it is that you really enjoy, and then have a really good night’s sleep, and you’ll feel much more creative in the morning. Hell, you might even think of something during the night – all my good ideas come to me when I’m asleep.
Even though her description of tenure-track life resonates with me, this problem, of individuals needlessly sacrificing their quality of life, isn’t confined to academia by any means. For example I certainly saw a lot of it at D.E. Shaw as well.
In fact I think it happens anywhere where there’s an intense environment of expectation, with some kind of incredibly slow-moving weeding process – academia has tenure, D.E. Shaw has “who gets to be a Managing Director”. People spend months or even years in near-paralysis wondering if their superiors think they’re measuring up. Gross!
Ultimately it happens to someone when they start believing in the system. Conversely the only way to avoid that kind of oppression is to live your life in denial of the system, which is what Nagpal achieved by insisting on thinking of her tenure-track job as having no particular goal.
Which didn’t mean she didn’t work hard and get her personal goals done, and I have tremendous respect for her work ethic and drive. I’m not suggesting that we all get high-powered positions and then start slacking. But we have to retain our humanity above all.
Bottomline, let’s perfect the art of ignoring the system when it’s oppressive, since it’s a useful survival tactic, and also intrinsically changes the system in a positive way by undermining it. Plus it’s way more fun.
I wrote a post three months ago talking about how we don’t need better models but we need to stop lying with our models. My first example was municipal debt and how various towns and cities are in deep debt partly because their accounting for future pension obligations allows them to be overly optimistic about their investments and underfund their pension pots.
This has never been more true than it is right now, and as this New York Times Dealbook article explains, was a major factor in Detroit’s bankruptcy filing this past week. But don’t make any mistake: even in places where they don’t end up declaring bankruptcy, something is going to shake out because of these broken models, and it isn’t going to be extra money for retired civil servants.
It all comes down to wanting to avoid putting required money away and hiring quants (in this case actuaries) to make that seem like it’s mathematically acceptable. It’s a form of mathematical control fraud. From the article:
When a lender calculates the value of a mortgage, or a trader sets the price of a bond, each looks at the payments scheduled in the future and translates them into today’s dollars, using a commonplace calculation called discounting. By extension, it might seem that an actuary calculating a city’s pension obligations would look at the scheduled future payments to retirees and discount them to today’s dollars.
But that is not what happens. To calculate a city’s pension liabilities, an actuary instead projects all the contributions the city will probably have to make to the pension fund over time. Many assumptions go into this projection, including an assumption that returns on the investments made by the pension fund will cover most of the plan’s costs. The greater the average annual investment returns, the less the city will presumably have to contribute. Pension plan trustees set the rate of return, usually between 7 percent and 8 percent.
In addition, actuaries “smooth” the numbers, to keep big swings in the financial markets from making the pension contributions gyrate year to year. These methods, actuarial watchdogs say, build a strong bias into the numbers. Not only can they make unsustainable pension plans look fine, they say, but they distort the all-important instructions actuaries give their clients every year on how much money to set aside to pay all benefits in the future.
One caveat: if the pensions have actually been making between 7 percent and 8 percent on their investments every year then all is perhaps well. But considering that they typically invest in bonds, not stocks – which is a good thing – we’re likely seeing much smaller returns than that, which means their yearly contributions to the local pension plans are in dire straits.
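To see why the assumed rate matters so much, here is a minimal sketch of plain discounting applied to a fixed stream of future payments. The payment stream and the lower rate are hypothetical numbers chosen for illustration, and this is the straightforward calculation the article describes for mortgages and bonds, not a reconstruction of any actuary’s method.

```python
def present_value(payments, rate):
    """Discount a list of yearly payments (first one due in a year) to today's dollars."""
    return sum(p / (1 + rate) ** t for t, p in enumerate(payments, start=1))


# Hypothetical obligation: $1 million a year for 30 years.
payments = [1_000_000] * 30

pv_optimistic = present_value(payments, 0.075)  # a rate in the 7 to 8 percent range
pv_cautious = present_value(payments, 0.04)     # a made-up lower rate for comparison

print(f"at 7.5%: {pv_optimistic:,.0f}")
print(f"at 4.0%: {pv_cautious:,.0f}")
print(f"ratio:   {pv_cautious / pv_optimistic:.2f}")
```

The same promised payments look considerably cheaper today under the higher rate, which is the lever being pulled when the assumed return is set optimistically.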
What’s super interesting about this article is that it goes into the action on the ground inside the Actuary community, since their reputations are at stake in this battle:
A few years ago, with the debate still raging and cities staggering through the recession, one top professional body, the Society of Actuaries, gathered expert opinion and realized that public pension plans had come to pose the single largest reputational risk to the profession. A Public Plans Reputational Risk Task Force was convened. It held some meetings, but last year, the matter was shifted to a new body, something called the Blue Ribbon Panel, which was composed not of actuaries but public policy figures from a number of disciplines. Panelists include Richard Ravitch, a former lieutenant governor of New York; Bradley Belt, a former executive director of the Pension Benefit Guaranty Corporation; and Robert North, the actuary who shepherds New York City’s five big public pension plans.
I’m not sure what happened here, but it seems like a bunch of people in a profession, the actuaries, got worried that they were being used by politicians, and decided to investigate, but then that initiative got somehow replaced by a bunch of politicians. I’d love to talk to someone on the inside about this.
When I worked as a research mathematician, I was always flabbergasted by the speed at which other people would seem to absorb mathematical theory. I had then, and pretty much have now, this inability to believe anything that I can’t prove from first principles, or at least from stuff I already feel completely comfortable with. For me, it’s essentially mathematically unethical to use a result I can’t prove or at least understand locally.
I only recently realized that not everyone feels this way. Duh. People often just assemble accepted facts about a field quickly just to explore the landscape and get the feel for something – it makes complete sense to me now that one can do this and it doesn’t seem at all weird. And it explains what I saw happening in grad school really well too.
Most people just use stuff they “know to be true,” without having themselves gone through the proof. After all, things like Deligne’s work on Weil Conjectures or Gabber’s recent work on finiteness of etale cohomology for pseudo-excellent schemes are really fucking hard, and it’s much more efficient to take their results and use them than it is to go through all the details personally.
After all, I use a microwave every day without knowing how it works, right?
I’m not sure I know where I got the feeling that this was an ethical issue. Probably it happened without intentional thought, when I was learning what a proof is in math camp, and I’d perhaps state a result and someone would say, how do you know that? and I’d feel like an asshole unless I could prove it on the spot.
Anyway, enough about me and my confused definition of mathematical ethics – what I now realize is that, as mathematics is developed more and more, it will become increasingly difficult for a graduate student to learn enough and then prove an original result without taking things on faith more and more. The amount of mathematical development in the past 50 years is just frighteningly enormous, especially in certain fields, and it’s just crazy to imagine someone learning all this stuff in 2 or 3 years before working on a thesis problem.
What I’m saying, in other words, is that my ethical standards are almost provably unworkable in modern mathematical research. Which is not to say that, over time, a person in a given field shouldn’t eventually work out all the details to all the things they’re relying on, but it can’t be linear like I forced myself to work.
And there’s a risk, too: namely, that as people start getting used to assuming hard things work, fewer mistakes will be discovered. It’s a slippery slope.
There’s really exciting news in the world of number theory, my old field. I heard about it last month but it just hit the mainstream press.
Namely, mathematician Yitang Zhang just proved that there are infinitely many pairs of primes that differ by at most 70,000,000. His proof is available here and, unlike Mochizuki’s claim of a proof of the ABC Conjecture, this has already been understood and confirmed by the mathematical community.
Also, my buddy and mathematical brother Jordan Ellenberg has an absolutely beautiful article in Slate explaining why mathematicians believed this theorem had to be true, due to the extent to which we can consider prime numbers to act as if they are “randomly distributed.” My favorite passage from Jordan’s article:
It’s not hard to compute that, if prime numbers behaved like random numbers, you’d see precisely the behavior that Zhang demonstrated. Even more: You’d expect to see infinitely many pairs of primes that are separated by only 2, as the twin primes conjecture claims.
(The one computation in this article follows. If you’re not onboard, avert your eyes and rejoin the text where it says “And a lot of twin primes …”)
Among the first N numbers, about N/log N of them are primes. If these were distributed randomly, each number n would have a 1/log N chance of being prime. The chance that n and n+2 are both prime should thus be about (1/log N)^2. So how many pairs of primes separated by 2 should we expect to see? There are about N pairs (n, n+2) in the range of interest, and each one has a (1/log N)^2 chance of being a twin prime, so one should expect to find about N/(log N)^2 twin primes in the interval.
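The heuristic in the quoted passage is easy to compare against an actual count. Here is a small sketch that sieves the primes up to N, counts the pairs (n, n+2) that are both prime, and prints the count next to N/(log N)^2. The function names are my own, and the values of N are arbitrary.

```python
import math


def prime_flags(n):
    """Sieve of Eratosthenes: flags[k] is True exactly when k is prime (n >= 1)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            for multiple in range(p * p, n + 1, p):
                flags[multiple] = False
    return flags


def count_twin_primes(n):
    """Number of pairs (k, k + 2) with both entries prime and k + 2 <= n."""
    flags = prime_flags(n)
    return sum(1 for k in range(2, n - 1) if flags[k] and flags[k + 2])


def heuristic_estimate(n):
    """The N / (log N)^2 estimate from the quoted passage (natural log)."""
    return n / math.log(n) ** 2


for n in (10**3, 10**4, 10**5):
    print(n, count_twin_primes(n), round(heuristic_estimate(n)))
```

The estimate is only meant to capture the rate of growth, so expect the two columns to agree in order of magnitude rather than digit for digit.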
I was recently interviewed by Caroline Chen, a graduate student at Columbia’s Journalism School, about the status of Mochizuki’s proof of the ABC Conjecture. I think she found me through my previous post on the subject.
Anyway, her article just came out, and I like it and wanted to share it, even though I don’t like the title (“The Paradox of the Proof”) because I don’t like the word paradox (when someone calls something a paradox, it means they are making an assumption that they don’t want to examine). But that’s just a pet peeve – the article is nice, and it features my buddies Moon and Jordan and my husband Johan.
Read the article here.
This is a guest post by Kaisa Taipale. Kaisa got a BS at Caltech, a Ph.D. in math at the University of Minnesota, was a post-doc at MSRI, an assistant professor at St. Olaf College 2010-2012, and is currently visiting Cornell, which is where I met her a couple of weeks ago, and where she told me about her cool visualizations of math Ph.D. emigration patterns and I convinced her to write a guest post. Here’s Kaisa on a bridge:
Math data and viz
I was inspired by this older post on Mathbabe, about visualizing the arXiv postings of various math departments.
It got me thinking about tons of interesting questions I’ve asked myself and could answer with visualizations: over time, what’s been coolest on the arXiv? are there any topics that are especially attractive to hiring institutions? There’s tons of work to do!
I had to start somewhere though, and as I’m a total newbie when it comes to data analysis, I decided to learn some skills while focusing on a data set that I have easy non-technical access to and look forward to reading every year. I chose the AMS Annual Survey. I also wanted to stick to questions really close to my thoughts over the last two years, namely the academic job search.
I wanted to learn to use two tools, R and Circos. Why Circos? See the visualizations of college major and career path here - it’s pretty! I’ve messed around with a lot of questions, but in this post I’ll look at two and a half.
Where do graduating PhDs from R1 universities end up, in the short term? I started with graduates of public R1s, as I got my PhD at one.
The PhD-granting institutions are colored green, while academic institutions granting other degrees are in blue. Purple is for business, industry, government, and research institutions. Red is for non-U.S. employment or people not seeking — except for the bright red, which is still seeking. Yellow rounds things out at unknown. Remember, these figures are for immediate plans after graduation rather than permanent employment.
While I was playing with this data (read “learning how to use the reshape and ggplot2 packages”) I noticed that people from private R1s tend to end up at private R1s more often. So I graphed that too.
Does the professoriate in the audience have any idea if this is self-selection or some sort of preference on the part of employers? Also, what happened between 2001 and 2003? I was still in college, and have no idea what historical events are at play here.
Where mathematicians go
For any given year, we can use a circular graph to show us where people go. This is a more clumped version of the above data from 2010 alone, plotted using Circos. (Supplemental table E.4 from the AMS report online.)
The other question – the question current mathematicians secretly care more about, in a gossipy and potentially catty way – is what fields lead to what fate. We all know algebra and number theory are the purest and most virtuous subjects, and applied math is for people who want to make money or want to make a difference in the world.
[On that note, you might notice that I removed statistics PhDs in the visualization below, and I also removed some of the employment sectors that gained only a few people a year. The stats ribbons are huge and the small sectors are very small, so for looks alone I took them out.]
Higher resolution version available here.
I wish I could animate a series of these to show this view over time as well. Let me know if you know how to do that! Another nice thing I could do would be to set up a webpage in which these visualizations could be explored in a bit more depth. (After finals.)
- I haven’t computed any numbers for you
- the graphs from R show employment in each field by percentage of graduates instead of total number per category;
- it’s hard to show both data over time and all the data one could explore. But it’s a start.
I should finish with a shout-out to Roger Peng and Jeff Leek, though we’ve never met: I took Peng’s Computing for Data Analysis and much of Leek’s Data Analysis on Coursera (though I’m one of those who didn’t finish the class). Their courses and Stack Overflow taught me almost everything I know about R. As I mentioned above, I’m pretty new to this type of analysis.
What questions would you ask? How can I make the above cooler? Did you learn anything?
I’m returning from two full days of talking to mathematicians and applied mathematicians at Cornell. I was really impressed with the people I met there – thoughtful, informed, and inquisitive – and with the kind reception they gave me.
I gave an “Oliver Talk” which was joint with the applied math colloquium on Thursday afternoon. The goal of my talk was to convince mathematicians that there’s a very bad movement underway whereby models are being used against people, in predatory ways, and in the name of mathematics. I turned some people off, I think, by my vehemence, but then again it’s hard not to get riled up about this stuff, because it’s creepy and I actually think there’s a huge amount at stake.
One thing I did near the end of my talk was bring up (and recruit for) the idea of a panel of mathematicians which defines standards for public-facing models and vets the current crop.
The first goal of such a panel would be to define mathematical models, with a description of “best practices” when modeling people, including things like anticipating impact, gaming, and feedback loops of models, and asking for transparent and ongoing evaluation methods, as well as having minimum standards for accuracy.
The second goal of the panel would be to choose specific models that are in use and measure the extent to which they pass the standards of the above best practices rubric.
So the teacher value-added model, I’d expect, would fail in that it doesn’t have an evaluation method, at least that is made public, nor does it seem to have any accuracy standards, even though it’s widely used and is high impact.
I’ve had some pretty amazing mathematicians already volunteer to be on such a panel, which is encouraging. What’s cool is that I think mathematicians, as a group, are really quite ethical and can probably make their voices heard and trusted if they set their minds to it.
This is a guest post by Julia Evans. Julia is a data scientist & programmer who lives in Montréal. She spends her free time these days playing with data and running events for women who program or want to: she just started a Montréal chapter of pyladies to teach programming, and co-organizes a monthly meetup called Montréal All-Girl Hack Night for women who are developers.
I asked mathbabe a question a few weeks ago saying that I’d recently started a data science job without having too much experience with statistics, and she asked me to write something about how I got the job. Needless to say I’m pretty honoured to be a guest blogger here. Hopefully this will help someone!
Last March I decided that I wanted a job playing with data, since I’d been playing with datasets in my spare time for a while and I really liked it. I had a BSc in pure math, a MSc in theoretical computer science and about 6 months of work experience as a programmer developing websites. I’d taken one machine learning class and zero statistics classes.
In October, I left my web development job with some savings and no immediate plans to find a new job. I was thinking about doing freelance web development. Two weeks later, someone posted a job posting to my department mailing list looking for a “Junior Data Scientist”. I wrote back and said basically “I have a really strong math background and am a pretty good programmer”. This email included, embarrassingly, the sentence “I am amazing at math”. They said they’d like to interview me.
The interview was a lunch meeting. I found out that the company (Via Science) was opening a new office in my city, and was looking for people to be the first employees at the new office. They work with clients to make predictions based on their data.
My interviewer (now my manager) asked me about my role at my previous job (a little bit of everything — programming, system administration, etc.), my math background (lots of pure math, but no stats), and my experience with machine learning (one class, and drawing some graphs for fun). I was asked how I’d approach a digit recognition problem and I said “well, I’d see what people do to solve problems like that, and I’d try that”.
I also talked about some data visualizations I’d worked on for fun. They were looking for someone who could take on new datasets and be independent and proactive about creating models, figuring out what is the most useful thing to model, and getting more information from clients.
I got a call back about a week after the lunch interview saying that they’d like to hire me. We talked a bit more about the work culture, starting dates, and salary, and then I accepted the offer.
So far I’ve been working here for about four months. I work with a machine learning system developed inside the company (there’s a paper about it here). I’ve spent most of my time working on code to interface with this system and make it easier for us to get results out of it quickly. I alternate between working on this system (using Java) and using Python (with the fabulous IPython Notebook) to quickly draw graphs and make models with scikit-learn to compare our results.
I like that I have real-world data (sometimes, lots of it!) where there’s not always a clear question or direction to go in. I get to spend time figuring out the relevant features of the data or what kinds of things we should be trying to model. I’m beginning to understand what people say about data-wrangling taking up most of their time. I’m learning some statistics, and we have a weekly Friday seminar series where we take turns talking about something we’ve learned in the last few weeks or introducing a piece of math that we want to use.
Overall I’m really happy to have a job where I get data and have to figure out what direction to take it in, and I’m learning a lot.
I’m back! I missed you guys bad.
My experience with Seattle in the last 8 days has convinced me of something I rather suspected, namely I’m a huge New York snob and can’t exist happily anywhere else. I will spare you the details (they have to do with cars, subways, and being an asshole pedestrian) but suffice it to say, glad to be home.
Just a few caveats on complaining about my vacation:
- I enjoyed visiting the University of Washington and giving the math colloquium there as well as a “Math Day” talk where I showed kids the winning strategy for Nim (as well as other impartial two-player games) following my notes from last summer.
- I enjoyed reading Leon and Becky’s guest posts. Thanks guys!
- And then there was the time spent with my darling family. Of course, goes without saying, it’s always magical to get to the point where your kids have invented a whole new language of insults after you’ve outlawed certain words: “Shut your fidoodle, you syncopathic lardle!”
Of all the topics I want to write about today, I’ve decided to go with the most immediate and surprising one: Leila Schneps is now a mystery writer! How cool is that? She’s written a book with her daughter, Math on Trial: How Numbers Get Used and Abused in the Courtroom, currently in stock and available on Amazon. And she wrote an op-ed for the New York Times talking about it (hat tip Chris Wiggins).
I know Leila from having been her grad student assistant at the GWU Summer Program for Women in Math the first year it existed, in 1995. She taught undergrads about Galois cohomology and interpreted elements of $H^1$ as twists and elements of $H^2$ as obstructions and then had them do a bunch of examples for homework with me. It was pretty awesome, and I learned a ton. Leila is also a regular and fantastic commenter on mathbabe.
I love the premise of the book she’s written. She finds a bunch of historical examples where mathematics is used in trials to the detriment of justice, and people get unfairly jailed (or, less often, let free). From the op-ed (emphasis mine):
Decades ago, the Harvard law professor Laurence H. Tribe wrote a stinging denunciation of the use of mathematics at trial, saying that the “overbearing impressiveness” of numbers tends to “dwarf” other evidence. But we neither can nor should throw math out of the courtroom. Advances in forensics, which rely on data analysis for everything from gunpowder to DNA, mean that quantitative methods will play an ever more important role in judicial deliberations.
The challenge is to make sure that the math behind the legal reasoning is fundamentally sound. Good math can help reveal the truth. But in inexperienced hands, math can become a weapon that impedes justice and destroys innocent lives.