Datadive: NYC Park data

September 8, 2012 Cathy O'Neil, mathbabe

I’m excited to be a data ambassador this weekend for DataKind’s NYC Parks datadive. The event is sadly sold out, but you can follow along to some extent through the wiki and through this blog.

This weekend I’m in charge of herding people who are interested in the pruning project; for that reason I’ve dubbed myself the Prune Queen, which is nice and gross-sounding, so I love it.

The Parks department is a New York City agency that’s in charge of our urban forest here in New York. They deal with planting trees, keeping track of what trees exist, how many trees exist, and the health of all the trees in the five boroughs. When there’s a storm, and a tree falls, they get a “request order” coming from 311 calls (or occasionally other means) and if and when they decide to go deal with the problem, a “work order” is created and a team of people is sent out to fix the problem.

A fallen tree is an expensive proposition, although unavoidable considering how many trees there are in the city. The question we are trying to address this weekend is: can we mitigate the “fallen trees” problem by pruning beforehand?

In fact, there’s been lots of tree pruning already, so we can use our data science magic to see whether or not we think pruning helps. Namely:

We’ve had budgets of various sizes, which resulted in various levels of pruning activity over the past decade.
When the Parks department prunes, it prunes an entire block, from one corner to the next. They describe these as “block segments.”
Our data tells us which block segments were pruned and when, at the year level. That is to say, we’ll be able to see if a given block segment was pruned in 2003, but we won’t know which month during 2003 it was pruned.

The first iteration of the model is this: does a block segment have fewer (than expected) “fallen tree” events right after being pruned?

We’d expect the answer to be yes, and we’d also expect the effect to decay over time. Maybe a block segment is protected from fallen tree events for a couple years after pruning, for example, but after about 7 or 8 years the effect has worn off. Something like that.
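Here is a minimal sketch of how that first iteration might be tabulated: count fallen tree events per segment-year, grouped by how many years have passed since the segment was last pruned. The segment ids, years, and variable names are all made up for illustration; the real data may be laid out quite differently.

```python
from collections import defaultdict

# Hypothetical toy data; the real datadive files may be laid out differently.
# prune_years: block segment id -> years in which that segment was pruned
prune_years = {"seg_a": [2003], "seg_b": [2005], "seg_c": []}
# fallen_events: (block segment id, year of a fallen tree event)
fallen_events = [("seg_a", 2006), ("seg_b", 2005), ("seg_c", 2004), ("seg_c", 2006)]
FIRST_YEAR, LAST_YEAR = 2003, 2006  # assumed observation window


def prune_age(segment, year):
    """Years since the latest pruning on or before `year`; None if not pruned yet."""
    earlier = [p for p in prune_years.get(segment, []) if p <= year]
    return year - max(earlier) if earlier else None


# Exposure: how many segment-years were spent at each prune age.
exposure = defaultdict(int)
for segment in prune_years:
    for year in range(FIRST_YEAR, LAST_YEAR + 1):
        exposure[prune_age(segment, year)] += 1

# Events: how many fallen tree events happened at each prune age.
events = defaultdict(int)
for segment, year in fallen_events:
    events[prune_age(segment, year)] += 1

# Fallen tree events per segment-year, by prune age.
rates = {age: events[age] / n for age, n in exposure.items()}

for age, rate in rates.items():
    print(age, round(rate, 2))
```

If pruning helps and the effect decays, the rate should be lowest at small prune ages and drift back up toward the never-pruned rate (the `None` group). Because pruning is only known at the year level, age 0 is ambiguous: an event in the pruning year may have happened before the pruning did.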

But then, if you think about it, the “expected” number of fallen tree events is actually kind of tricky.

If there are only 2 trees on a block, then even if there’s no pruning on that block, there are not likely to be lots of fallen tree events compared to another block that has 100 trees. So it would be great to have a sense of the density of trees on a given block segment.

Luckily, we do: we have a tree census, which is to say we know more or less where all the trees are in the five boroughs. This is a pretty crazy awesome data set when you think about it. This will allow us to define the tree density per block segment (once we establish a map between existing trees and block segments) and will therefore also allow us to have a first stab at what the “expected” rate of fallen tree events should be on a block-by-block basis.
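As a rough sketch of that first stab, one could assume every tree is equally likely to fall, compute a single citywide per-tree rate, and scale it by each segment's tree count. The counts below are invented, and the equal-risk assumption is a simplification of mine, not something the Parks data establishes.

```python
# Hypothetical toy counts; real values would come from the tree census
# and the fallen tree work orders, once trees are mapped to block segments.
tree_counts = {"seg_a": 2, "seg_b": 100, "seg_c": 18}
fallen_counts = {"seg_a": 1, "seg_b": 4, "seg_c": 1}


def expected_events(tree_counts, fallen_counts):
    """Expected events per segment if every tree were equally likely to fall."""
    total_trees = sum(tree_counts.values())
    total_events = sum(fallen_counts.get(seg, 0) for seg in tree_counts)
    per_tree_rate = total_events / total_trees
    return {seg: per_tree_rate * n for seg, n in tree_counts.items()}


def observed_to_expected(tree_counts, fallen_counts):
    """Ratio of observed to expected events; segments with no trees are skipped."""
    expected = expected_events(tree_counts, fallen_counts)
    return {
        seg: fallen_counts.get(seg, 0) / exp
        for seg, exp in expected.items()
        if exp > 0
    }


expected = expected_events(tree_counts, fallen_counts)
ratios = observed_to_expected(tree_counts, fallen_counts)
for seg in tree_counts:
    print(seg, round(expected[seg], 2), round(ratios[seg], 2))
```

A ratio above 1 means a segment had more fallen tree events than its tree count alone would predict, and a ratio below 1 means fewer. With only 2 trees, a single event already pushes the ratio far above 1, so small segments will be noisy.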

Are there other things we should normalize for besides number of trees per block segment? Well, there have also been a number of severe storms, and even tornadoes, that have gone through Brooklyn in the last decade (and for some reason even more in the past few years). We also might want to account for a block which was directly in the path of a tornado, because we shouldn’t blame pruning or lack of pruning for an asston of fallen tree events if it was actually caused by a natural disaster.
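One crude way to do this, sketched below, is to set aside any fallen tree event whose date falls inside a known storm window and model only the remaining events. This assumes we have event dates and a list of storm dates, which may not be true of the data, and it ignores the storm's actual path. The dates here are placeholders, not real storms.

```python
from datetime import date

# Placeholder storm windows (start, end), NOT real storm dates.
storm_windows = [(date(2010, 1, 10), date(2010, 1, 12))]

# Hypothetical fallen tree events: (block segment id, date of the event).
events = [
    ("seg_a", date(2010, 1, 11)),
    ("seg_a", date(2010, 6, 1)),
    ("seg_b", date(2010, 1, 13)),
]


def is_storm_related(day, storm_windows):
    """True if `day` falls inside any storm window (both ends inclusive)."""
    return any(start <= day <= end for start, end in storm_windows)


def split_events(events, storm_windows):
    """Separate events that coincide with a storm from ordinary ones."""
    storm, ordinary = [], []
    for segment, day in events:
        bucket = storm if is_storm_related(day, storm_windows) else ordinary
        bucket.append((segment, day))
    return storm, ordinary


storm, ordinary = split_events(events, storm_windows)
print(len(storm), "storm-related,", len(ordinary), "ordinary")
```

A better version would use location as well as date, since only the blocks directly in a tornado's path should be excused.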

Finally, we recently found out that a student at SIPA worked on a similar but different project: namely, whether pruning blocks mitigates future pruning requests. In other words, the same pruning (x) but a different effect (y). They actually had the dollar costs in mind and figured out how cost-effective pruning is. But then again, they didn’t account for the more expensive fallen tree events, so the project this weekend could change the results (I don’t actually know what their findings were; so far I’ve only heard about this third hand).

Categories: data science

Comments (1)

Tom

September 8, 2012 at 5:04 pm

Please post your findings – I have heard that pruning actually increases the chance of trees falling in spite of conventional wisdom. It would be nice to have some data one way or the other.
