Diophantus and the math arXiv

December 3, 2012

Last night my 7th-grade son, who is working on a school project about the mathematician Diophantus, walked into the living room with a mopey expression.

He described how Diophantus worked on a series of mathematical texts called Arithmetica, in which he gave solutions to what we now call diophantine equations: polynomial equations with strictly integer coefficients, where the solutions we care about are also restricted to be integers. I care a lot about this stuff because it’s what I studied when I was an academic mathematician, and I still consider this field absolutely beautiful.
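
As a small illustration of what "integer solutions" means here, the sketch below searches for whole-number solutions of one classic diophantine equation, x^2 + y^2 = z^2. The function name `integer_solutions` and the search bound are made up for this example; it is a brute-force toy, not a method taken from Arithmetica.

```python
import math


def integer_solutions(bound):
    """Return all (x, y, z) with 1 <= x <= y <= bound and x^2 + y^2 = z^2.

    Hypothetical helper for illustration: it simply tries every pair (x, y)
    and keeps the ones where x^2 + y^2 is a perfect square.
    """
    solutions = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            z_squared = x * x + y * y
            z = math.isqrt(z_squared)  # integer square root, rounded down
            if z * z == z_squared:     # true only when z_squared is a perfect square
                solutions.append((x, y, z))
    return solutions


if __name__ == "__main__":
    for triple in integer_solutions(12):
        print(triple)
```

Most pairs (x, y) fail the perfect-square test, which is the point: the equation is easy to solve over the real numbers, and the restriction to integers is what makes it interesting.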

What my son was upset about, though, was that of the 13 original books in Arithmetica, only 6 have survived. He described this as “a way of losing progress”. I concur: Diophantus was brilliant, and there may be things we still haven’t recovered from that text.


But it also struck me that my son would be right to worry about this idea of losing progress even today.

We now have things online and often backed up, so you’d think we might never need to worry about this happening again. Moreover, there’s something called the arXiv where mathematicians and physicists put all or mostly all their papers before they’re published in journals (and many of the papers never make it to journals, but that’s another issue).

My question is, who controls this arXiv? There’s something going on here much like Josh Wills mentioned last week in Rachel Schutt’s class (and which Forbes’s Gil Press responded to already).

Namely, it’s not all that valuable to have one unreviewed, unpublished math paper in your possession. But it’s very valuable indeed to have all the math papers written in the past 10 years.

If we lost access to that collection, as a community, we would lose progress in a huge way.

Note: I’m not accusing the people who run arXiv of anything weird. I’m sure they’re very cool, and I appreciate their work in keeping up the arXiv. I just want to acknowledge how much power they have, and how strange it is for an entire field to entrust that power to people they don’t know and didn’t elect in a popular vote.

As I understand it (and I could be wrong, please tell me if I am), the arXiv doesn’t allow crawlers to make back-ups of the documents. I think this is a mistake, as it increases the public reliance on this one resource. It’s fragile in the same way it would be if the U.S. depended entirely for its food supply on a country whose motives are unclear.

Let’s not lose Arithmetica again.

  1. Deane
    December 3, 2012 at 9:20 am | #1

    It seems to me that some answers can be found here: http://arxiv.org/help/support/faq

    • December 3, 2012 at 10:33 am | #2

      Cool. I learned that they have digital backups (which still leaves open the possibility that they could restrict access to them). I also signed up to be on the mailing list for updates on their storage procedure and support.

  2. Tara Holm
    December 3, 2012 at 9:48 am | #3

    I agree that this is a hugely important issue.

    The arXiv was started by Paul Ginsparg in 1991 when he was at Los Alamos National Labs. In 2001, he and the arXiv moved to Cornell, whose Library took over the running and maintenance of the arXiv. In 2008, the budget to run, back up and maintain the arXiv was on the order of $400K, mostly paid for by Cornell, with some grants from NSF (and maybe others).

    In 2008, with the fiscal crisis, Cornell Libraries could no longer afford to be solely paying for this. (Planning may have started sooner than this, but it was when I first became aware of the situation.) They have made some temporary arrangements to have other (academic) libraries contribute to the arXiv (voluntary, and based on the size of the institution; access is still free, of course). They also have had a Simons Foundation planning grant to think about what the right budget model should be.

    I guess the bottom line is that the folks at Cornell who run the arXiv are smart people, they are aware of the importance of the arXiv, and they are thinking about how to sustain it.

  3. December 3, 2012 at 10:35 am | #5


    What are CUL’s preservation strategies?

    Digital preservation refers to a range of managed activities to support the long-term maintenance of bitstreams. These activities ensure that digital objects are usable (intact and readable), retaining all quantities of authenticity, accuracy, and functionality deemed to be essential when articles (and other associated materials) were ingested. Formats accepted by arXiv have been selected based on their archival value (TeX/LaTeX, PDF, HTML) and the ability to process all source files is actively monitored. The underlying bits are protected by standard backup procedures at the Cornell campus. Off-site backup facilities in New York City provide geographic redundancy. The complete content is replicated at arXiv’s mirror sites around the world, and additional managed tape backups are taken at Los Alamos National Laboratory. CUL has an archival repository to support preservation of critical content from institutional resources, including arXiv. We anticipate storing all arXiv documents, both in source and processed form, in this repository. There will be ongoing incremental ingest of new material. We expect that CUL will bear the preservation costs for arXiv, leveraging the archival infrastructure developed for the library system.

    Also, http://en.wikipedia.org/wiki/ArXiv#Access claims that you can search it via Google scholar and Windows Live Academic, so I suspect we’re OK.

  4. mathematrucker
    December 3, 2012 at 10:43 am | #6

    This issue hits very close to home for me right now.

    Lately I have been doing some historical sleuthing re Kuratowski’s 14-set theorem and related phenomena. Some of the best papers were published in the past decade, but there are also many interesting early papers that have almost completely been overlooked by later authors, especially ones written by the celebrated Ukrainian topologist Miron Zarycki (1889-1961).

    It surely didn’t help that Zarycki almost always published in non-English languages. Many of his papers are written in Ukrainian. Thankfully that solid concrete roadblock was recently lifted out of the way by the crane that is Google Translate, but the scarcity roadblock remains.

    Since the purpose of my research is limited mainly to compiling a comprehensive annotated bibliography (of sorts), obtaining copies of all the papers is of paramount importance. During the hunt, I came across an interesting 2003 article about a dangerous predicament faced by Eastern European libraries, one that I had not heard about before:


  5. December 3, 2012 at 12:12 pm | #7

    For arxiv, there’s a math advisory committee, then the SAB (Scientific Advisory Board) and also the MAB, another board; Cornell University Library acts as the executive of arxiv, having ultimate authority. It’s navigable from here: http://arxiv.org/new/math.html#advisory_committee .

    There’s http://www.archive.org with lots of out-of-copyright books and also Gallica at Bibliotheque nationale de France, http://gallica.bnf.fr/ . Many works such as Gauss’ Disquisitiones Arithmeticae are available as downloads (in Latin, in this case). Many European journals allow free digital access to 50+ year old articles. I think sometimes it’s worth studying a book or paper in the language of publication. I’d suggest that learning a language might be easier than mastering World of Warcraft and other forms of entertainment, perhaps more rewarding …

  6. Sugar troll
    December 3, 2012 at 3:08 pm | #8

    Talking about Diophantine equations, do you think Fermat was a troll?

  7. mtredinnick
    December 3, 2012 at 4:17 pm | #9

    If your son is interested in historical mathematics (and recovery), he might like the book ‘The Archimedes Codex’ by Reviel Netz and William Noel. It’s a very accessible look at recovering one of Archimedes’ texts from a palimpsest of writings recovered from a monastery. Netz is a mathematician and Noel is a museum curator, so it’s not overly technical, whilst still taking a fairly deep look at things.

  8. mozibur ullah
    December 4, 2012 at 8:48 am | #11

    I kind of think people lost interest in Diophantus & his equations (hard to stomach, I know :-)), and they moved onto other things; time & attrition did the rest. I know a lot of Greek texts were translated into Arabic during the height of the Islamic empire (probably because, as the prophet said in a hadith, ‘go, look for knowledge, be it in China’, but Greece was a bit more to hand). Who knows what may still lie in some dusty corner of a library in Iran…

  9. M. M.
    December 4, 2012 at 7:37 pm | #12

    I obsessively save to my own system every paper that I come across that seems like it could even be vaguely useful for me, because of paranoia of repositories failing. But this seems to not be the norm. In http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1991753, on page 31 it talks about how most scientists don’t even bother saving the arXiv papers they use to their own systems, they just get them from arXiv again.

