NYC Parks datadive update: does pruning prevent future fallen trees?
After introducing ourselves, we split our pruning problem into five sub-problems:
1. mapping tree coordinates to block segments,
2. defining the expected number of fallen-tree events based on the number, size, age, and species of the trees,
3. accounting for weather,
4. designing the model, assuming the sub-models above are in shape, and
5. getting the data in shape to train the model (right now it is scattered across pieces in different formats).
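Sub-problem 1 is at heart a nearest-segment lookup: each tree point goes to the block segment closest to it. The sketch below shows one way to do that. It assumes planar projected coordinates (e.g. feet), not raw latitude/longitude, and that each block segment is a single straight line between two endpoints. The segment IDs and the `assign_to_block` helper are hypothetical illustrations, not part of the team's code.

```python
import math
from typing import Dict, Tuple

# Assumption: (x, y) pairs in a planar, projected coordinate system
# (e.g. feet), NOT raw latitude/longitude.
Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through the segment, then clamp the projection
    # parameter to [0, 1] so the closest point stays on the segment itself.
    s = ((px - ax) * dx + (py - ay) * dy) / length_sq
    s = max(0.0, min(1.0, s))
    cx, cy = ax + s * dx, ay + s * dy
    return math.hypot(px - cx, py - cy)


def assign_to_block(p: Point, segments: Dict[str, Segment]) -> str:
    """Return the (hypothetical) ID of the block segment nearest to p.

    Ties go to whichever segment comes first in the dict's iteration order.
    """
    return min(segments, key=lambda sid: point_segment_distance(p, segments[sid]))


# Two parallel, made-up block segments 50 units apart.
segments = {
    "block_A": ((0.0, 0.0), (100.0, 0.0)),
    "block_B": ((0.0, 50.0), (100.0, 50.0)),
}
print(assign_to_block((30.0, 10.0), segments))  # block_A
```

This brute-force version checks every tree against every segment. For all of Brooklyn, a spatial index such as an R-tree would be the natural next step, and real block segments may be polylines rather than straight lines.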
After a few hours of work we had made real progress on 1 and 5. We had also noticed that we don’t have the age of the trees, only their size, which we can use as a proxy for age. Moreover, the size measurements were taken only once, in 2005, and never updated. Correcting for growth would mean incorporating a model of how fast trees grow, which requires much more domain expertise than we currently have and more time than we have this weekend.
Before lunch we realized we really needed to talk about 4, the design of the model, so we scheduled a pow-wow for after lunch.
After some discussion, we settled on a univariate regression model in which the basic unit is a block of trees in Brooklyn in a given year:
$$y = \alpha x + \beta$$
So for each street block and each year of data, we define:
- $x$ to be a simple function of the number of years since that block was last pruned,
- $y$’s numerator to be a (weighted) count of fallen-tree events (or similar work orders) in the following year, weighted because some work orders are much more expensive than others, and
- $y$’s denominator to be a (weighted) count of the trees on the block, weighted because larger trees should possibly count for more than smaller ones.
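As a minimal sketch of how one (block, year) data point could be assembled, assuming some hypothetical record layouts: each tree carries a trunk diameter from the 2005 survey, used as its size weight, and each work order carries a year and a cost weight. The post only says the weighting is "possibly" by size and by expense, so these weights are illustrative. The sketch also assumes $t$ is counted from the last pruning to the year whose events are counted, so the year right after a pruning gives $t = 1$, and it defaults to $x = 1/t$. All class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tree:
    # Hypothetical field: trunk diameter from the single 2005 survey,
    # used as the size weight in the denominator.
    diameter: float


@dataclass
class WorkOrder:
    year: int
    # Hypothetical field: relative cost of the job, used as the numerator weight.
    weight: float


def block_year_features(
    year: int,
    last_pruned: int,
    trees: List[Tree],
    orders: List[WorkOrder],
    f: Callable[[int], float] = lambda t: 1.0 / t,
) -> Optional[Tuple[float, float]]:
    """Return (x, y) for one block and one data year, or None if undefined."""
    # Weighted tree count: bigger trees count for more.
    denominator = sum(tree.diameter for tree in trees)
    if denominator <= 0 or last_pruned > year:
        return None
    # Weighted count of work orders in the *following* year.
    numerator = sum(o.weight for o in orders if o.year == year + 1)
    # Years from the last pruning to the year whose events we count (t >= 1).
    t = (year + 1) - last_pruned
    return f(t), numerator / denominator


trees = [Tree(10.0), Tree(20.0), Tree(30.0)]
orders = [WorkOrder(2011, 3.0), WorkOrder(2011, 1.5), WorkOrder(2010, 5.0)]
print(block_year_features(2010, 2009, trees, orders))
```

In the usage example only the two 2011 orders count toward the 2010 data point, and $t = 2$, so $x = 0.5$ and $y = 4.5 / 60$.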
Going back to $x$: since we are trying to predict work orders per tree, we expect the effect of pruning on this count to be (negative and) greatest in the year following pruning, and to wear off over time. So $x = f(t)$, where $t$ is the number of years since the block was last pruned, and $f$ is probably something like $1/t$ or $e^{-t}$, a function that tends to zero as $t$ tends to infinity.
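To make the regression concrete, here is a minimal sketch that fits $y = \alpha x + \beta$ by ordinary least squares for each candidate $f$. It then compares the candidates by their sum of squared residuals. A negative $\alpha$ corresponds to pruning reducing work orders per tree, most strongly at $t = 1$. The data below are synthetic, generated from an assumed $y = 0.2 - 0.1/t$ purely to check that the fit recovers known coefficients. They are not NYC Parks data or results. The helper names are hypothetical, and in practice `numpy.polyfit(xs, ys, 1)` would give the same slope and intercept.

```python
import math
from typing import Callable, Dict, List, Tuple

# Candidate decay functions f(t); both tend to zero as t grows.
DECAYS: Dict[str, Callable[[float], float]] = {
    "inverse": lambda t: 1.0 / t,
    "exponential": lambda t: math.exp(-t),
}


def fit_univariate(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = alpha * x + beta; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def sse(xs: List[float], ys: List[float], alpha: float, beta: float) -> float:
    """Sum of squared residuals, used to compare candidate choices of f."""
    return sum((y - (alpha * x + beta)) ** 2 for x, y in zip(xs, ys))


def fit_with_decay(
    ts: List[int], ys: List[float], f: Callable[[float], float]
) -> Tuple[float, float, float]:
    """Transform years-since-pruning t into x = f(t), then fit and score."""
    xs = [f(t) for t in ts]
    alpha, beta = fit_univariate(xs, ys)
    return alpha, beta, sse(xs, ys, alpha, beta)


# Synthetic illustration only: y generated exactly from y = 0.2 - 0.1 / t.
ts = [1, 2, 3, 4, 5, 6]
ys = [0.2 - 0.1 / t for t in ts]
for name, f in DECAYS.items():
    a, b, err = fit_with_decay(ts, ys, f)
    print(f"{name}: alpha={a:.4f}, beta={b:.4f}, sse={err:.2e}")
```

On this synthetic data the inverse choice fits essentially exactly, as it must, since the data were generated from it. With real block-year data, comparing residuals is one simple way to choose between the candidate forms of $f$.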
We ended up deciding that we can’t really account for weather in our model, since we won’t have any idea how many storms will pass through Brooklyn next year.
I left last night before we’d gotten all the data in shape, so I’m eager to go back to the presentation event this morning and see whether we have any hard results. Even if we don’t, I think we have a reasonable model and a very good start on it, and that we will have helped the NYC Parks department with its question. I’ll post an update with the final results soon.