Plumping up darts
Someone asked me a math question the other day and I had fun figuring it out. I thought it would be nice to write it down.
So here’s the problem. You are getting to see sample data and you have to infer the underlying distribution. In fact you happen to know you’re getting draws – which, because I’m a basically violent person, I like to think of as throws of a dart – from a uniform distribution from 0 to some unknown and you need to figure out what is. All you know is your data, so in particular you know how many dart throws you’ve gotten to see so far. Let’s say you’ve seen draws.
In other words, given what’s your best guess for ?
First, in order to simplify, note that all that really matters in terms of the estimate of is what is and how big is.
Next, note you might as well assume that and you just don’t know it yet.
With this set-up, you’ve rephrased the question like this: if you throw darts at the interval , then where do you expect the right-most dart – the maximum – to land?
It’s obvious from this phrasing that, as goes to infinity, you can expect a dart to get closer and closer to 1. Moreover, you can look at the simplest case, where and since the uniform distribution is symmetric, you can see the answer is 1/2. Then you might guess the overall answer, which depends on and goes to 1 as goes to infinity, might be . It makes intuitive sense, but how do you prove that?
Start with a small case where you know the answer. For we just need to know what the expected value of is, and since there’s one dart, the max is just itself, which is to say we need to compute a simple integral to find the expected value (note it’s coming in handy here that I’ve normalized the interval from 0 to 1 so I don’t have to divide by the width of the interval):
and we recover what we already know. In the next case, we need to integrate over two variables (same comment here, don’t have to divide by area of the 1×1 square base):
If you think about it, though, and play symmetric parts in this matter, so you can assume without loss of generality that is bigger, as long as we only let range between 0 and and then multiply the end result by 2:
But that simplifies to:
Let’s do the general case. It’s an n-fold integral over the maximum of all darts, and again without loss of generality is the maximum as long as we remember to multiply the whole thing by . We end up computing:
But this collapses to:
To finish the original question, take the maximum value in your collection of draws and multiply it by the plumping factor to get a best estimate of the parameter