NYC Parks datadive update: does pruning prevent future fallen trees?


September 9, 2012 Cathy O'Neil, mathbabe

After introducing ourselves, we subdivided our pruning problem into 5 problems:

1. mapping tree coordinates to block segments,
2. defining the expected number of fallen tree events based on number of trees, size and age of trees, and species,
3. accounting for weather,
4. designing the model assuming the above sub-models are in shape, and
5. getting the data in shape to train the model (right now the data is in pieces with different formats).

After a few hours of work, there was real progress on 1 and 5, and we’d noticed that we don’t have the age of trees, but only the size, which we can use as a proxy. Moreover, the size measurements were taken once, in 2005, and never updated. Incorporating a model of how fast trees grow would require much more domain expertise than we had, and we don’t have time to build one this weekend.

Before lunch we realized we really needed to talk about 4, namely the design of the model, so we scheduled a pow-wow for after lunch.

After some discussion, we settled on a univariate regression model where the basic unit is a block of trees in Brooklyn for a given year:

$y = \alpha x + \epsilon.$

So for each street block and for each year of data, we define:

$x$ to be a simple function of the number of years since that block was last pruned,
$y$’s numerator to be a (weighted) count of the number of fallen tree events (or similar) the following year; the weighting reflects the fact that some work orders are much more expensive than others, and
$y$’s denominator to be a (weighted) count of the number of trees on the block; the weighting reflects the fact that larger trees should possibly get counted more than smaller trees.
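To make the definition of $y$ concrete, here is a minimal sketch of how one block-year's response could be computed. The names `WorkOrder`, `Tree`, `cost_weight`, `diameter`, and `tree_weight` are hypothetical, and weighting trees linearly by size is only one possible choice; we had not settled on actual weights.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkOrder:
    # Hypothetical field: relative expense of this fallen-tree work order.
    cost_weight: float


@dataclass
class Tree:
    # Hypothetical field: the recorded size of the tree (our proxy for age).
    diameter: float


def tree_weight(tree: Tree) -> float:
    # Assumption: larger trees count proportionally more.
    # Any increasing function of size could be swapped in here.
    return tree.diameter


def response(work_orders_next_year: List[WorkOrder],
             trees_on_block: List[Tree]) -> Optional[float]:
    """Weighted fallen-tree events the following year, per weighted tree."""
    numerator = sum(w.cost_weight for w in work_orders_next_year)
    denominator = sum(tree_weight(t) for t in trees_on_block)
    if denominator == 0:
        # A block with no (weighted) trees has no defined response.
        return None
    return numerator / denominator
```

Each (block, year) pair would contribute one such $y$ value to the regression.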

Going back to $x$: since we are trying to predict work orders per tree, we expect the effect of pruning on this count to be negative and greatest the year following pruning, and we expect the effect to wear off over time. So the actual function is probably $f(n) = 1/n$ or $f(n) = 1/\sqrt{n},$ or something like that, which tends to zero as $n$ tends to infinity.
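As a rough illustration of how $\alpha$ could be estimated once the data is in shape, here is a sketch of a least-squares fit of $y = \alpha x + \epsilon$ with no intercept, where $x = f(n)$. The function names are hypothetical, and ordinary least squares is an assumption on my part; we had not fixed a fitting procedure. It also assumes $n \geq 1$ for every block-year.

```python
import math
from typing import Callable, Sequence


def decay_inverse(n: float) -> float:
    # f(n) = 1/n
    return 1.0 / n


def decay_inverse_sqrt(n: float) -> float:
    # f(n) = 1/sqrt(n)
    return 1.0 / math.sqrt(n)


def fit_alpha(years_since_pruning: Sequence[float],
              y: Sequence[float],
              f: Callable[[float], float] = decay_inverse) -> float:
    """Least-squares estimate of alpha in y = alpha * f(n) + noise.

    With no intercept term, the estimate is sum(x * y) / sum(x * x).
    Assumes every n is at least 1 (pruned at least a year ago).
    """
    xs = [f(n) for n in years_since_pruning]
    sxx = sum(x * x for x in xs)
    sxy = sum(x * yi for x, yi in zip(xs, y))
    return sxy / sxx
```

Fitting once with `decay_inverse` and once with `decay_inverse_sqrt` and comparing the residuals would be one simple way to choose between the two candidate forms of $f$. If pruning helps, we would expect the fitted $\alpha$ to be negative.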

We ended up deciding that we can’t really account for weather in our model, since we won’t have any idea how many storms will pass through Brooklyn next year.

I left last night before we’d gotten all the data in shape so I’m eager to go back this morning to the presentation event and see if we have any hard results. Even if we don’t, I think we have a reasonable model and a very good start on it, and I think we will have helped the NYC Parks department with the question. I’ll update soon with the final results.


Comments (3)

tagouti

September 9, 2012 at 9:43 am

needs to be interactive; you are trying to mastermind a central control for a complex continually changing problem. why not involve the people on the block who walk by those trees every day? let them enter issues seen. every spring some branches are found not to have survived the winter or lack of water from the previous summer, so they die, but are still part of the tree, until a storm renders them weaker, and they break. You cannot track or model the effects of the weather so completely, so use the eyes of the people, too.

somedude

September 9, 2012 at 4:19 pm

My own experience with trees leads me to assume that the number of falling trees is highly dependent on species. Some fast growing species can easily lose branches. Oaks are good, willows not so much.

jmacclure

September 9, 2012 at 10:43 pm

I’m an actuary, we use mathematical models of survival to value liabilities for insurance companies and pension funds. To use your example, actuaries:
1. map survival probabilities to characteristics of the population: age, gender, etc
2. define the expected cash flows under a pension scheme or insurance policy
3. account for the economic environment
4. tweak the assumptions in our model as experience of the specific population unfolds over time
5. here is where it gets interesting, and where i’d like to be able to apply some of this “data science” to actuarial science, where the models would essentially “train themselves” to extract trends and information from the data
