Ideas for two thesis problems in data science
Natural Language Processing on MathOverflow
You know about MathOverflow? It’s a site where grad students in math (or anyone) can pose questions and other people can answer them. There are lots of uninteresting, unanswered questions (like ones that are too easy, which the asker should be able to look up), some really popular ones, and some really dumb ones. Sometimes there are interesting ones.
Here’s a thesis idea: come up with a metric for “interestingness” and try to forecast the interestingness of a question from its language. Might as well also try to forecast its popularity while you’re at it. That way, if you make a good model, some of the more interesting questions will get higher in the queue and people will have a better time on the site.
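To make the idea concrete, here is a minimal sketch of what a first baseline might look like. Everything in it is an assumption for illustration: the `interestingness` formula is a made-up placeholder (choosing a good metric is the actual thesis problem), the training questions are toy data, and the "model" just averages the scores of previously seen words, which is far cruder than anything you would want to end up with.

```python
import math
import re
from collections import defaultdict


def interestingness(votes, answers, views):
    """Hypothetical metric (an assumption, not a standard definition):
    reward votes and answers, damped by how many people saw the question."""
    return (votes + 2 * answers) / math.log(views + 2)


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def fit_word_scores(questions):
    """questions: list of (text, score) pairs.
    Returns (word -> mean score of questions containing it, overall mean)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in questions:
        for word in set(tokenize(text)):
            totals[word] += score
            counts[word] += 1
    global_mean = sum(score for _, score in questions) / len(questions)
    word_scores = {word: totals[word] / counts[word] for word in totals}
    return word_scores, global_mean


def predict(text, word_scores, global_mean):
    """Average the scores of the known words; fall back to the overall mean."""
    known = [word_scores[w] for w in set(tokenize(text)) if w in word_scores]
    return sum(known) / len(known) if known else global_mean


# Toy training data, invented purely for illustration.
train = [
    ("Is there a natural proof of this conjecture", 10.0),
    ("homework help integral", 1.0),
]
word_scores, global_mean = fit_word_scores(train)
```

With this toy data, `predict("a new conjecture", word_scores, global_mean)` comes out higher than `predict("integral homework", word_scores, global_mean)`, which is all a baseline like this can promise. A real attempt would need held-out questions to check whether the forecast beats guessing the overall mean.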
Genealogy graphs in different fields
You know about the Mathematics Genealogy Project? It shows everyone with a Ph.D. in math and considers them to be “descended” from their advisor in a family-tree-like structure. For example, I’m here, and if I go up through my ancestors in 7 steps I get to Jacobi. Actually there are lots of ways to go up since a bunch of people have more than one advisor: I’m also 7 steps away from Poisson, 8 from Lagrange and Laplace, and 9 from Euler. This is probably not because I’m so cool but because there just weren’t many mathematicians back then, so probably most people are descended from Euler. And because we have this cool data set we can see if that’s true!
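Both of those computations (how many steps up to a given ancestor, and what share of people descend from someone) are plain graph searches. Here is a rough sketch, assuming the genealogy has already been loaded into a dictionary mapping each person to a list of advisors. The names and the tiny graph below are invented placeholders, not real genealogy data, and the function names are hypothetical.

```python
from collections import deque


def ancestor_distances(advisors, person):
    """Breadth-first search upward through advisors.
    Returns {ancestor: fewest steps from person to that ancestor}."""
    dist = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for advisor in advisors.get(current, []):
            if advisor not in dist:  # first visit is the shortest path
                dist[advisor] = dist[current] + 1
                queue.append(advisor)
    del dist[person]
    return dist


def fraction_descended_from(advisors, ancestor):
    """Share of all other people in the graph who have `ancestor` above them."""
    people = set(advisors) | {a for advs in advisors.values() for a in advs}
    others = people - {ancestor}
    if not others:
        return 0.0
    hits = sum(1 for p in others if ancestor in ancestor_distances(advisors, p))
    return hits / len(others)


# Toy graph (made up): D has two advisors, F and G are a separate lineage.
toy_advisors = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "E": ["C"],
    "F": ["G"],
}
```

Here `ancestor_distances(toy_advisors, "D")` gives B and C at 1 step and A at 2, and `fraction_descended_from(toy_advisors, "A")` counts 4 of the 6 other people. Running one search per person is wasteful on a large graph; a single downward search from the ancestor would do the same job faster.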
Here’s what I think someone should do, besides visualizing this graph in an awesome way (which by itself would be really cool; has anyone done that?). They should draw the graph for other fields as well and try to see if there are graph properties that characterize mathematics as distinct from other disciplines like Physics or Law or History.
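As a starting point for "graph properties", here is a sketch of a per-field summary one could compute and compare. The three statistics are my guesses at what might differ between fields, not established discriminators, and the two small graphs are invented. It assumes the advisor graph has no cycles.

```python
def graph_profile(advisors):
    """advisors: dict mapping person -> list of advisors (assumed acyclic).
    Returns a few summary statistics for one field's genealogy graph."""
    # Invert the graph: advisor -> list of students.
    students = {}
    for student, advs in advisors.items():
        for advisor in advs:
            students.setdefault(advisor, []).append(student)

    depth_cache = {}

    def depth(person):
        """Length of the longest advisor chain above this person."""
        if person not in depth_cache:
            advs = advisors.get(person, [])
            depth_cache[person] = 1 + max((depth(a) for a in advs), default=-1)
        return depth_cache[person]

    people = set(advisors) | set(students)
    advised = [p for p in advisors if advisors[p]]
    return {
        "multi_advisor_fraction":
            sum(1 for p in advised if len(advisors[p]) > 1) / len(advised),
        "mean_students_per_advisor":
            sum(len(s) for s in students.values()) / len(students),
        "max_lineage_depth": max(depth(p) for p in people),
    }


# Two made-up "fields" for illustration only.
math_toy = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "E": ["C"]}
law_toy = {"Q": ["P"], "R": ["P"]}

math_profile = graph_profile(math_toy)
law_profile = graph_profile(law_toy)
```

Comparing `math_profile` with `law_profile` side by side is the toy version of the thesis question. On real data the fields differ hugely in size and time span, so any statistic would need to be normalized before the comparison means much.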