Inside the Podesta Report: Civil Rights Principles of Big Data

I finished reading Podesta’s Big Data Report to Obama yesterday, and I have to say I was pretty impressed. I credit some special people who got involved with the research for the report, like Danah Boyd, Kate Crawford, and Frank Pasquale, for supplying thoughtful examples and research that the authors were unable to ignore. I also want to thank whoever got the authors together with the civil rights groups that created the Civil Rights Principles for the Era of Big Data:

  1. Stop High-Tech Profiling. New surveillance tools and data gathering techniques that can assemble detailed information about any person or group create a heightened risk of profiling and discrimination. Clear limitations and robust audit mechanisms are necessary to make sure that if these tools are used it is in a responsible and equitable way.
  2. Ensure Fairness in Automated Decisions. Computerized decisionmaking in areas such as employment, health, education, and lending must be judged by its impact on real people, must operate fairly for all communities, and in particular must protect the interests of those that are disadvantaged or that have historically been the subject of discrimination. Systems that are blind to the preexisting disparities faced by such communities can easily reach decisions that reinforce existing inequities. Independent review and other remedies may be necessary to assure that a system works fairly.
  3. Preserve Constitutional Principles. Search warrants and other independent oversight of law enforcement are particularly important for communities of color and for religious and ethnic minorities, who often face disproportionate scrutiny. Government databases must not be allowed to undermine core legal protections, including those of privacy and freedom of association.
  4. Enhance Individual Control of Personal Information. Personal information that is known to a corporation — such as the moment-to-moment record of a person’s movements or communications — can easily be used by companies and the government against vulnerable populations, including women, the formerly incarcerated, immigrants, religious minorities, the LGBT community, and young people. Individuals should have meaningful, flexible control over how a corporation gathers data from them, and how it uses and shares that data. Non-public information should not be disclosed to the government without judicial process.
  5. Protect People from Inaccurate Data. Government and corporate databases must allow everyone — including the urban and rural poor, people with disabilities, seniors, and people who lack access to the Internet — to appropriately ensure the accuracy of personal information that is used to make important decisions about them. This requires disclosure of the underlying data, and the right to correct it when inaccurate.

This was signed off on by multiple civil rights groups listed here, and it’s a great start.

One thing I was not impressed by: the only time the report mentioned finance was to say that, in finance, they are using big data to combat fraud. In other words, finance was kind of seen as an industry standing apart from big data, and using big data frugally. This is not my interpretation.

In fact, I see finance as having given birth to big data. Many of the mistakes we are making as modelers in the big data era, which make the Civil Rights Principles above necessary, were made first in finance. Those modeling errors – and, when they were not errors, politically intentional odious models – were created first in finance, and were a huge reason we first had mortgage-backed securities with AAA ratings and then the ensuing financial crisis.

In fact, finance should have appeared in the report as a worst-case scenario.

One last thing. The recommendations coming out of the Podesta report are lukewarm and are even contradicted by the contents of the report, as I complained about here. That’s interesting, and it shows that politics played a large part in what the authors could include as acceptable recommendations to the Obama administration.

Categories: data science, modeling
  1. May 7, 2014 at 9:11 am

    Do you have a post (or some other self-contained writing) where you detail how finance gave birth to big data? I realize that you demonstrate parts of it throughout your blog, but I would be very interested in reading a self-contained version of such a historic overview. A focus on mistakes would also be instructive, but hopefully it can be balanced by some successes.


    • May 7, 2014 at 9:27 am

      It’s going to be in my book 🙂


      • Zathras
        May 7, 2014 at 11:17 am

        A lot of Big Data methods though originated in Physics. Then Physicists brought them to Finance.


  2. May 7, 2014 at 11:44 am

    Well, I still bomb the FTC with my idea of passing a law under which every bank, company, etc. should be required to buy a license. First, it would bring out of hiding many that consumers don’t know are out there. Second, a requirement would be a federal website where those who distribute and sell data each have a page where we could look up what kind of data they sell and to whom.
    Part of a license would be to keep a public page updated so we could look these things up. I’m not saying list every detail, but again, what kind of data and who they sell to. I keep saying that when you want to regulate a group, you need an index of who they all are, and a license is a very well accepted method we use for everything, so no big learning curve there. It would bring folks like e-Scoring out of hiding, for one.
    Again unless we have an index of who everyone is in the game, the models and functions can be written in every different way to allow code to run and be “verbiage” compliant. My opinion here too is that sure they discussed a lot but can we get to the root of what’s causing a lot of privacy issues and do something there? I keep saying lawyers work on law verbiage, while “code runs hog ass wild”…yeah I created that to get attention for sure.
    You have Walgreens out there making one to two billion a year just selling data, so think of how much money there is, and retailers like them would have to register too. They can sell prescription data as long as it is not obtained from a HIPAA covered entity. It’s like one of those loopholes that nobody wants to talk about.


    • Zathras
      May 7, 2014 at 12:18 pm

      “Walgreens out there making one to two billion a year just selling data”

      I need to see a source for that. I am sure they are selling the data; I’m just a little hesitant about the order of magnitude.

