
Who stays off the data radar?

June 25, 2013

Last night’s Data Skeptics Meetup talk by Suresh Naidu was great, as I suspected it would be. I’m not going to be able to cover everything he talked about (a discussion is forming here as well) but I’ll touch on a few things related to my chosen topic for the day, namely who stays off the data radar.

In his talk Suresh discussed the history of governments tracking people with data, which more or less until recently was the history of the census. The issue of trust or lack thereof that people have in being classified and tracked has been central since the get-go, and with it the understanding by the data collectors that people respond differently to data collection when they anticipate it being used against them.

Among other examples, he mentioned the efforts of the U.S. Census Bureau to stay independent (specifically, away from any kind of tax decisions) in order to be trusted, but then turning around during wartime and using census tracts to put Japanese Americans into internment camps.

It made me wonder, who distrusts data collection so much that they manage to stay off the data radar?

Suresh gave quite a few examples of people who did this out of fear of persecution or what have you, and because, at least in the example of the Domesday Book, once land ownership was written down it was somehow “more official and objective” than anything else, which of course resulted in some people getting screwed out of their land.

It’s not just a historical problem, of course: it’s still true that certain populations, especially illegal immigrant populations, are afraid of how the census will be used and go undercounted. Who can say when the census might start being used to deport illegal immigrants?

As a kind of anti-example, he mentioned that the census was essentially canceled in 1920 because the South knew that so many ex-slaves were moving north that their representation in government was growing weak. I say anti-example because in this case it wasn’t done out of distrust, to avoid detection; it was a savvy political move, to remain looking large.

What about the modern version of government tracking? In this case, of course, it’s not just census data, but anything else the NSA happens to collect about us. I’m no expert (tell me if you know data on this) but I will hazard a guess on who avoids being tracked:

  1. Old people who don’t have computers and never have,
  2. Members of hacking group Anonymous who know how it works and how to bypass the system, and
  3. People who have worked or are now working at the NSA.

Of course there are a few other rare people that just happen to care enough about privacy to educate themselves on how to avoid being tracked. But it’s hard to do, obviously.

Let me soften the requirements a bit – instead of staying off the radar completely, who makes it really hard to find them?

If you’re talking about individuals, I’d start with this answer: politicians. In my work with Peter Darche and Lee Drutman from the Sunlight Foundation (blog post coming soon!) trying to follow money in politics, it’s amazed me time and time again how difficult it’s been to put together the political events for a given politician – events that are individually publicly recorded but are seemingly intentionally siloed so it will be extremely difficult to put together a narrative. Thanks to Peter’s recent efforts, and the Sunlight Foundation’s long-term efforts, we are getting to the point where we can do this, but it’s been a data munging problem from hell.

If you’re generalizing to entities and corporations, then the “making data collection hard” award should probably go to the corporations with hundreds of subsidiaries all over the world which now don’t even need to be reported on tax forms.

Funny how the very people who know the most about how data can be used are paranoid about being tracked.

Categories: data science
  1. tqft9999
    June 25, 2013 at 7:26 am

    Dear data nerds looking for a challenge: a consortium of journalists have compiled a downloadable database here http://offshoreleaks.icij.org/ of offshore companies used for a variety of purposes. Happy mining.


  2. June 25, 2013 at 8:53 am

    During the long 2008 election season, people in some circles were surprised by just how hard it was to get a lot of data on Barack Obama’s life, work, and beliefs. I wonder if, among millennials today who intend to become politicians in the next two decades, there is much savvy about how to keep a low data profile, or whether they have given up and are happily sharing everything on Facebook.


    • June 25, 2013 at 8:55 am

      Great example! And also note super connected people don’t bother with LinkedIn profiles.


  3. June 25, 2013 at 9:19 am

    As I think I remember reading in Daniel Okrent’s “Last Call,” the story of Prohibition, the 1920 census did take place, but the House reapportionment was canceled (unconstitutionally). He suggested that the reason was that those in favor of Prohibition knew that between 1910 and 1920 many people had moved from the country to the cities and that those in the cities were opposed to Prohibition. It was an example of power unconstitutionally trying to hold on to its power — specifically to thwart what they believed was the will of the people. Big surprise. It is interesting, indeed, to hear of another reason that may have played a role in that unconstitutional act.

    Also, the use of the census for Japanese internment ought to be a great example for all those who say complete surveillance is fine because they have “nothing to hide.”


    • June 25, 2013 at 9:22 am

      Interesting take on the 1920 census; I’m not a historian.

      And great point about who is comfortable with surveillance – it’s really more about how secure one feels that they live within a privileged group than whether they honestly believe data is safe.


    • Glen S. McGhee
      June 25, 2013 at 1:41 pm

      Ex-slaves in 1920 census? That war ended almost sixty years earlier, and migrants tend to be a younger generation.

      But the migration was massive — to the urban centers of the North, where the jobs were.

      Part of this was driven by the advances following WWI — the move toward mechanization, roads and autos, urban electrification, the standardization movement, the efficiency craze (to produce and move war material) — higher standard of living, yeah, but the tilt toward consumer culture, mass marketing, bureaucratization of modern life — what a price to pay!

      I wasn’t aware that House reapportionment was halted …. hm!


  4. June 25, 2013 at 9:50 am

    Old people without computers are still tracked because they love catalogs. So they still end up in databases; it’s just younger fingers pushing the keys. And let’s not forget the medical databases they are all popping up in, especially those of the home health care suppliers.


  5. badmax
    June 25, 2013 at 10:04 am

    I always thought of government surveillance units as being like an army of the undead. Suppose you cause enough trouble to get on their radar. With all their funding, it is more than likely they will catch you eventually. At that point, you get two choices: 25 years in jail (where they make it quite clear what happens to spindly data nerds on the inside) or join us. Their talent pool just increased.

    Also, to answer your question, Lyle. Watch this if you haven’t already.



  6. Dave
    June 25, 2013 at 3:22 pm

    AIDS testing in the 1980s: All involved counseled multiple layers of stealth. Even the most indifferent student of history imagined the changes an epidemic might bring, if it threatened the survival of the species.


  7. somedude
    June 26, 2013 at 3:13 pm

    If Theodor Kaczynski could be found, then I doubt that anyone commenting here can evade detection.


    • Bobito
      June 28, 2013 at 12:00 pm

      Kaczynski was found because he didn’t protect himself – he sent a long manuscript to the New York Times – it was published at the behest of the FBI, as someone realized that whoever wrote it was quite intelligent and had a distinctive voice. His brother recognized the writing, and turned him in. If he hadn’t had the psychological need to be recognized, he would have been very hard to find, as he had been for many years. After all, this is a murderer who was not found for two decades.


  8. June 26, 2013 at 7:08 pm

    Thank you very much for the “disappearing subsidiary” link. I mostly focus on healthcare, but for a few years I have tracked subsidiaries of insurance companies, and I put two words in my titles, “subsidiary watch,” when I think it’s something to pay attention to. Just last year HHS gave the contract to build the Federal Data Hub to a company called QSSI, and two weeks after the announcement from HHS, a subsidiary of United Healthcare bought it.

    HHS gets all upset, and for good reason, as they probably were “played” on this, and this was just before the election, and begins to insist on a firewall between QSSI and United. She also told United, as I read in the news, that it needed to be reported to the SEC…well, not so, and Sebelius waited until after the election to report it herself. So this is an example of how things get gray with some of this as well. People have no clue who the ultimate conglomerate making the buck is, if you will. United was busy working DOD for more money and contracts (and they got them), so now they are not for the most part going to participate in any state insurance exchanges. Now, it seems like we have maybe another company with NSA security clearances in new areas up and coming? So, I’ll just keep up my campaign on having a law licensing all data sellers. We license stock brokers, real estate sales people, doctors…the next step is licensing data sellers (and I also say excise tax the rich ones, i.e. banks, on their billions made in easy data profits) to give some avenue of regulation here. Your point is well made: how do we know who they are and what they sell?

    While all the corporate modeling and data mining goes on, in the meantime we have what I call the sideshows of digitally illiterate lawmakers, with their “default” fallback topic of abortion, which works to keep the focus away from what we really need to be addressing. But the illits seem to think they know it all already and, from what I read in the news, sadly show little interest in learning. Most of those, too, you will find off the digital map other than what they “have” to show, and staff takes care of that.


  9. June 28, 2013 at 7:42 am

    FYI, interesting thoughts on how to profit from mining the NSA’s data set:


