Working in the NYC Mayor’s Office

September 10, 2013 Cathy O'Neil, mathbabe

I recently took a job in the NYC Mayor’s Office as an unpaid consultant. It’s an interesting time to be working for the Mayor, to be sure – everyone’s waiting to see what happens this week with the election, and all sorts of things are up in the air. Planning essentially stops at December 31st.

Note the expiration date.

I’m working in a data group which deals with social service agency data. That means Child Services, Homeless Services, and the like: any agency where there is direct contact with lots of people and their data. The idea is for me to help them out with a project that, if successful, I might be able to take to another city as a product. I’m still working full-time at my existing job.

Specifically, my goal is to figure out a way to use data to help the people involved – the homeless, for example – get connected to better services. As a side effect I think this should make the agency more efficient. Far too many data studies only care about efficiency – how to make do with fewer police or fewer ambulances – with no thought or care about how the people receiving the services are affected. I want to start with the people, and hope for efficiency gains, which I believe will come.

One thing that has already amazed me about this job, which I’ve just started, is the conversations people have about the ethics of data privacy.

It is a well-known fact that, as you link more and more data about people together, you can predict their behavior better. So for example, you could theoretically link all the different agency data for a given person into a profile, including crime data, health data, education and the like.

This might help you profile that person, and that might help you offer them better services. But it also might not be what that person wants you to do, especially if you start adding social media information. There’s a tension between the best model and reasonable limits of privacy and decency, even when the model is intended to be used in a primarily helpful manner. It’s more obvious when you’re attempting something insidious like predictive policing, of course.

Now, it shouldn’t shock me to have such conversations, because after all we are talking about some of the most vulnerable populations here. But even so, it does.

In all my time as a predictive modeler, I’ve never been in that kind of conversation, about the malicious things people could do with such-and-such profile information, or with this or that model, unless I started it myself.

When you work as a quant in finance, the data you work with is utterly sanitized to the point where, although it eventually trickles down to humans, you are asked to think of it as generated by some kind of machine, which we call “the market.”

Similarly, when you work in ad tech or other internet modeling, you think of users as the targets of your predatory goals: click on this, user, or buy that, user! They are prey, and the more we know about them the better our aim will be. If we can buy their profiles from Acxiom, all the better for our purposes.

This is the opposite of all of that. It’s super interesting, and I’m glad to be given this opportunity.

Categories: data science, modeling

Comments (7)

Zathras

September 10, 2013 at 10:45 am

This work sounds very exciting! I look forward to seeing how it goes. There is such a shortage of data scientists who want to do the really messy problems outside of the stylized world of finance, etc.

Michael L.

September 10, 2013 at 11:37 am

I’m a social worker and sociologist by training and use quantitative methods in my research. I don’t know if the people you’re in these conversations with are social workers or have similar backgrounds, but coming from such a background myself, I wonder if at least some of them do. Maybe something about the ethos of social work and similar disciplines, as well as having to get research proposals through human subjects review boards, makes ethical issues around data use more salient.

- Cathy O'Neil, mathbabe
  
  September 10, 2013 at 11:43 am
  
  Yes quite a few of them do have such training. It’s a very cool and a very rich modeling environment.
  
medicalquackblog

September 10, 2013 at 2:47 pm

Good stuff and observations as always, and yeah, the fear we all have is such data being used “out of context” against us, and add in some flawed data and the fear grows :) What I keep saying is there’s a big danger when you combine “credible” data with “non-credible” data and think you have accuracy… not :) And what do they do with this data? Sell it. I know I’m a broken record on that topic, but it’s true.

Like you, I’m a big fan of trending data, as it does open our eyes, but when they take some of this and bring it down to applying a formula to “score” individuals, you end up with what used to be called “grading on the curve,” if you will. Folks like me are old enough to remember that fiasco from school; it didn’t work :) You want your own scores and data, not meshed with group findings, for sure :) I just did a post today on a recall of electronic medical records software that dropped the MD notes on patient meds from the ER records. Well, it kind of is a big deal, as you might find some helpful information in there from the docs, like patient is allergic to . MDs get graded on pay for performance too. The information left out about the patient’s condition is the #1 issue here, of course. Part 2 is what the MDs worry about: data taken out of context due to algorithmic errors being used against them when pay for performance rolls around… just one little example :)

http://ducknetweb.blogspot.com/2013/09/picis-software-used-in-emergency-rooms.html

Those who control the code end up controlling the world in some fashion or another…

- Guest
  
  September 11, 2013 at 8:18 am
  
  “… and yeah the fear we all have is such data being used “out of context” against us and add in some flawed data and the fear grows …”
  
  A worse problem is the hidden rhetorical (i.e., persuasive) uses to which data is put, foremost among them the unspoken legitimation of the particular task, buttressed with walls of impenetrable data. No one questions it. No one looks at the assumptions. What’s missing is the ever-present sense of irony: that all mediating description is at the same time denial, and a form of cognitive violence (Friedrich Adolf Kittler).
  
  Florence Nightingale, the first female member of the Royal Statistical Society, produced her rose diagram, in support of changes that she proposed.
  http://en.wikipedia.org/wiki/Rose_diagram#Polar_area_diagram
  
cgutierrez777

September 11, 2013 at 9:52 am

Sounds amazing. Mad props, my lady. I hope something really good comes out of it that you can pass on to other cities.

John W Rodat (@johnrodat)

October 3, 2013 at 5:24 pm

Even though you don’t have a lot of time to work on this project, you’ve already gotten some new perspectives out of it. Your differentiating between the data you’re working with on this project and the “sanitized” data in the world of finance is an important start.

Not too long ago, I worked in an upstate NY county. Every county agency (e.g., social services, child services, aging, probation, mental health, health) had its own data silo, and several had more than one system. None connected to the others. We knew that we could better serve families as well as individuals if we could do even some minimal connecting. And yes, we probably could have saved a fair amount of money. Intuitively and sometimes anecdotally, we knew we had some “million dollar families,” for whom some organized interventions would likely have been useful.

Alas, that would have required, and still will require, the sort of culture change that Mayor Bloomberg has engendered: one that values data translated into useful information and connected with other information.

Such opportunities are increasing, but we still have a long way to go. Cherish yours.
