Ignore data, focus on power

May 20, 2014

I get asked pretty often whether I “believe” in open data. I tend to murmur a response along the lines of “it depends,” which doesn’t seem too satisfying to me or to the person I’m talking to. But this morning, I’m happy to say, I’ve finally come up with a rule of thumb, though not a universal one. It focuses on power.

Namely, I like data that shines light on powerful people. For example, the Sunlight Foundation tracks money and politicians, and that’s good. But I tend to want to protect powerless people, like people who are being surveilled through sensors and their phones. And the thing is, most open data focuses on the latter: how people ride the subway, how they use the public park, or where they shop.

Something in the middle is crime data, where you have a compilation of people being stopped by the police (powerless) and the police themselves (powerful). But here as well you’ll notice an asymmetry in identifying information. Looking at Stop and Frisk data, for example, there’s a precinct to identify the police officer, but no badge number, whereas a bunch of identifying information is recorded about the person being stopped.

A lot of the time you won’t even find data about powerful people. Worker bees get scored, but the managers are somehow above scoring. Bloomberg never scored his lieutenants or himself, even when he insisted that teachers should be scored. I like to keep an eye on who gets data collected about them. The power is where the data isn’t.

I guess my point is this. Data and data modeling are not magical tools. They are in fact crude tools, and so focusing on them is misleading and distracts from the real show, which is always about power (and/or money). It’s a boondoggle to think about data when we should be thinking about when and how a model is being wielded and who gets to decide.

One of the biggest problems we face is that all this data is being collected and saved now, and the models haven’t even been invented yet. That’s why there’s so much urgency in getting reasonable laws in place to protect the powerless.

  1. May 20, 2014 at 12:02 pm

    Amen! I used to work for a software company that specialized in aggregating enterprise budget and forecasting data. Their main sales point was simple: the people who knew the most about the details of a particular operation should be involved in budgeting and forecasting. The idea was that if you collect accurate information from people who are directly involved in company operations, you will produce better forecasts and budgets. Unfortunately, this is not how budgeting and forecasting works in most organizations. What typically happens is that someone with power decides what the numbers should be, and then all the minions are sent scurrying to make it so. The aggregation process is mostly a waste of time and is done to make people feel their little voices matter. Our political process is similar. How many pointless dog-and-pony-show meetings are held to solicit input on issues that have already been decided?


  2. daviddlewis
    May 20, 2014 at 12:15 pm

    “The power is where the data isn’t.” is a marvelously succinct heuristic. The optimist would add “Yet.” after that.


  3. May 20, 2014 at 6:08 pm

    Good post, and it’s why I think ALL data sellers and distributors should first be required to buy a license. How in the world do you regulate without identifying the group? We need an index to begin with, for sure, and that would also give law enforcement a few teeth. As a simple example, they could shut down anyone selling without a license.

    In addition, all of them should be required to keep an updated page on a federal site describing what kind of data they sell and to whom. I’ve been all over the FTC about this for a couple of years now. It’s step one, and then we can get into models and the laws to apply. Unless someone has a better idea, I think it’s a start, because how do you even begin to think about regulating a group that has not been identified?

    I had a great day today picking apart the Wall Street Journal on a related topic: doctor ratings sites like Healthgrades and Vitals. They came out and said “oh boy, MDs are seeing the importance here of updating their profiles”…well, three years ago the AMA and I had a nice conversation about the “flawed data” that’s out there too. I got mad when I found my former MD, who had been dead for 8 years, still shown as taking new patients. And it’s just as bad today: as of a few months ago, you could still book an appointment with Michael Jackson’s bankrupt doctor. They update little or nothing.

    I want good data too, so it was fun to poke holes in the WSJ article today on how they got duped, and duped again, by these sites, talking about how important they are. It’s just like us consumers constantly having to go fix errors in what’s reported about us: the free labor bit. The MD sites are basically clickbait to generate money from ad exposure, so they don’t care at all what’s on there; they just want you to click around and maybe click on an ad while you are there.

    You are correct on how people can and will use data out of context against you, especially to make or save money.


  4. EJD
    May 21, 2014 at 3:06 pm

    You are a veritable 21st-century Woody Guthrie.
    You go!


  5. Mark
    May 23, 2014 at 11:11 am

    Good post and a good rule. I have said something similar over and over about the powerful always seeking to create an information differential. When I get cold calls on the phone, they always start by asking as many questions as they can about who I am. I refuse to utter a thing until I know very clearly who they are. This is a microcosm of the larger dynamic.

