Two thoughts on math research papers

January 15, 2014

Today I’d like to mention two ideas I’ve been having recently on how to make being a research mathematician (even) more fun.

1) Mathematicians should consider holding public discussions about papers

First, math nerds, did you know that in statistics they have formal discussions about papers? It’s a long-standing tradition of the Royal Statistical Society, whose motto is “Advancing the science and application of statistics, and promoting use and awareness for public benefit,” to choose papers by some criterion and then hold regular public discussions in which a few experts, none of them the author, talk about each paper. Then the author responds to their points and the whole conversation is published for posterity.

I think this is a cool idea for math papers too. One thing that kind of depressed me about math is how rarely you’d find people reading the same papers unless you specifically got a group of people together to do so, which was a lot of work. This way the work is done mostly by other people and more importantly the payoff is much better for them since everyone gets a view into the discussion.

Note I’m sidestepping who would organize this whole thing, and how exactly the papers would be chosen, but I’d expect it would ease the feeling I had of being isolated in a tiny math community, especially if the conversations were meant to be penetrable.

2) There should be a good clustering method for papers around topics

This second idea may already be happening, but I’m going to say it anyway, and it could easily be a thesis for someone in CS.

Namely, the idea is to use NLP and other such techniques to cluster math papers by topic. Right now the most obvious way to find a “nearby” paper is to look at the graph of papers by direct reference, but you’re probably missing out on lots of stuff that way. I think a different and possibly more interesting way would be to use the text in the title, abstract, and introduction to find papers with similar subjects.

This might be especially useful when you want to know the answer to a question like, “has anyone proved that such-and-such?” and you can do a text search for the statement of that theorem.
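To make the text-similarity idea concrete, here is a minimal sketch using TF-IDF weights and cosine similarity, one standard way to compare short texts. It is only an illustration: the three "abstracts" are made up, the function names (tokenize, tfidf_vectors, cosine, most_similar) are hypothetical, and a real system would need better tokenization, stop-word handling, and something sensible to do with formulas.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency: rare terms get more weight.
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, query_index):
    # Index of the document whose text is closest to docs[query_index].
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Made-up titles standing in for title + abstract + introduction text.
papers = [
    "Flabby sheaves and the vanishing of sheaf cohomology on topological spaces",
    "Cohomology of flabby sheaves via injective resolutions",
    "Random walks on expander graphs and mixing times",
]

print(most_similar(papers, 0))
```

Here the first paper is matched to the second because they share distinctive terms like "flabby" and "sheaves", even though neither cites the other. Clustering would then group papers whose vectors are mutually close, rather than just picking a single nearest neighbor.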

The good news here is that mathematicians are in love with terminology, and give weird names to things that make NLP techniques very happy. My favorite recent example which I hear Johan muttering under his breath from time to time is Flabby Sheaves. There’s no way that’s not a distinctive phrase.

The bad news is that such techniques won’t help at all in finding different fields that have come across the same idea but use different names for the relevant objects. But that’s OK, because it means there’s still lots of work for mathematicians.

By the way, back to the question of whether this has already been done. My buddy Max Lieblich has a website called MarXiv which is a wrapper over the math ArXiv and has a “similar” button. I have no idea what that button actually does though. In any case I totally dig the design of the similar button, and what I propose is just to have something like that work with NLP.

Categories: math, musing, open source tools
  1. Guest2
    January 15, 2014 at 6:57 am

    More like the Royal Society (1660): The Society’s early meetings included experiments performed first by Hooke and then by Denis Papin, who was appointed in 1684. These experiments varied in their subject area, and were both important in some cases and trivial in others. http://en.wikipedia.org/wiki/Royal_Society

    The tie-in is with a number of relevant “math destruction” issues — (1) consensus “science”, (2) visual rhetoric of the early air pump experiments, as described in the classic by Steven Shapin and Simon Schaffer, Leviathan and the air-pump. Princeton: Princeton University Press, 1985/2011, just reissued.


  2. January 15, 2014 at 9:33 am

    At one point the Royal Statistical Society discussions were also known for, um, free and frank exchange of views, but they have calmed down a lot.


  3. Bill
    January 15, 2014 at 11:21 am

    My (outsider’s) view is that it would be difficult to do public talks in a way that would be both approachable to the general public and interesting to active researchers. I’m only married to a math Ph.D., but from what I’ve seen, much of math research can be almost impenetrable to researchers in other areas, let alone those who are just intellectually curious. I took a quick look at the abstract for a recent paper presented to the RSS and it was filled with jargon which I had never seen before. While I think that having discussions of important papers/topics might be good for people working in the field, I’m not sure how much the general public would get out of it.


    • January 15, 2014 at 11:22 am

      This isn’t for the general public. I was thinking more like a colloquium style event, for other mathematicians.


      • Bill
        January 15, 2014 at 11:34 am

        Sorry, I misunderstood. When you quoted “for public benefit”, I took that too literally. Given the reduced scope of the audience, as I said already, I think it sounds like a good idea. The easiest way to do it might be at major conferences, where it is much easier to get enough people knowledgeable about the area to be part of the discussion panel. You would also have a ready-made target audience for the discussion.

        Not that the discussion shouldn’t be preserved for posterity. In the modern age, videotaping it would seem the obvious way to preserve it. Bonus points if you live stream it and allow remote viewers to ask questions. (That’s assuming that you use a format that allows for questions.)


  4. JSE
    January 15, 2014 at 1:23 pm

    Re clustering for math papers: Google Scholar does a surprisingly good job of this, in a limited context; it knows what all MY papers are, and it suggests to me new papers I might be interested in reading. It’s not perfect, but it does a better job than I might have expected, and it’s not just giving me papers that cite mine. It’s possible it’s doing nothing more than giving me papers that cite the same things my papers cite. But I don’t think so — after all, it gave me Mochizuki’s ABC paper!


    • January 15, 2014 at 1:26 pm

      My guess is it’s something like “people who were interested in this were also interested in this.”


  5. January 15, 2014 at 4:26 pm

    I’m intrigued by the clustering idea; I’ve done work in NLP, so this might be a project I take on. I wonder if using a syntactic parser to analyze sentences in papers might be a helpful tool, rather than treating a document as a soup of disconnected words, as is typical.

    It’s a thought at least…


    • January 15, 2014 at 4:30 pm

      Hey cool! Tell me how it goes. Happy to have a few guest posts devoted to progress as well – and maybe you can pick up some test users that way too!


  6. theoreticalminimum
    January 15, 2014 at 8:45 pm

    A couple of physicists came up with Paperscape (paperscape.org), which is a tool to visualise the arXiv. For all I know, it also maps the mathematics preprints, but there is unfortunately no MATH.** breakdown. Maybe you can ask them to do something in that direction if you think their way of going about it is interesting enough.


  7. January 15, 2014 at 8:47 pm

    How about Paperscape (paperscape.org) for MATH.**?


  8. January 16, 2014 at 12:46 pm

    In my not so humble opinion, the greatest need for public understanding of “quants,” whether academic, educated citizen, or Jane Q Public, is greater emphasis on the ERROR and built in biases (statistical, cultural, personal, professional, etc) around any reported number or result. Error analysis is probably the most valuable analytical skill we can teach everyone.


    • Bill
      January 20, 2014 at 12:10 pm

      I so agree with you. I’ve suggested that rather than so much emphasis on Calculus in high school that more time should be spent on probability and statistics. That would undoubtedly mean discrete rather than continuous probability, but it would be worth it.

