Is open data a good thing?
As much as I like the idea of data being open and free, it’s not an open and shut case. As it were.
I’m first going to argue against open data with three examples.
The first is the commonly discussed concern of privacy. Simply put, there is no such thing as anonymized data, and people who say there is are either lying or being naive. The amount of information you’d need to remove to really anonymize data is not known to be different from the amount of data you have in the first place. So if you did a good job of anonymizing a data set, you’d probably remove all the interesting information anyway. Of course, you could think this is only important with respect to individual data.
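To make the anonymization point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the records are made up, and the field names and the `unique_fraction` helper are hypothetical. It counts how many "anonymized" rows are still pinned down uniquely once a few ordinary attributes are combined.

```python
from collections import Counter

# Hypothetical toy records with names already stripped ("anonymized").
# Every value here is invented for illustration only.
records = [
    {"zip": "10027", "birth_year": 1972, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10027", "birth_year": 1972, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10027", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
    {"zip": "10025", "birth_year": 1972, "sex": "F", "diagnosis": "migraine"},
    {"zip": "10025", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"zip": "10025", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
]


def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if counts[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


# Add one attribute at a time and watch more rows become identifiable.
for keys in [("zip",), ("zip", "birth_year"), ("zip", "birth_year", "sex")]:
    print(keys, round(unique_fraction(records, keys), 2))
```

In this toy set nobody is unique by zip code alone, but a third of the rows are unique by zip code plus birth year, and two thirds are once sex is added. Anyone who knows those three facts about a person can read off the diagnosis, and coarsening the fields enough to prevent that is exactly what strips out the interesting information.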
But my next example comes from land data, specifically Tamil Nadu in Southern India. There’s an interesting Crooked Timber blogpost here (hat tip Suresh Naidu) explaining how “open data” has screwed a local population, the Dalits. Although you could (and I would) argue that the way the data is collected and disseminated, and the fact that the courts go along with this process, is itself politically motivated and disenfranchising, there are some important points made in this post:
Open data undermines the power of those who benefit from “the idiosyncracies and complexities of communities… Local residents [who] understand the complexity of their community due to prolonged exposure.” The Bhoomi land records program is an example of this: it explicitly devalues informal knowledge of particular places and histories, making it legally irrelevant; in the brave new world of open data such knowledge is trumped by the ability to make effective queries of the “open” land records. The valuing of technological facility over idiosyncratic and informal knowledge is baked right in to open data efforts.
The Crooked Timber blog post specifically called out Tim O’Reilly and his “Government as Platform” project as troublesome:
The faith in markets sometimes goes further among open data advocates. It’s not just that open data can create new markets, there is a substantial portion of the push for open data that is explicitly seeking to create new markets as an alternative to providing government services.
The issue is how data is used. If the wealthy can manipulate legislators to wipe out generations of records and folk knowledge as “inaccurate,” then there’s a problem. A group like DataKind could go in and figure out a way to codify that older generation of knowledge. Then at least, if that isn’t acceptable to the government, it would be clear that the problem lies in political manipulation, not in the data itself. And note that a government could wipe out generations of “inaccurate records” without any requirement that the new records be open. In years past the monied classes would have just taken what they wanted, with the government’s support. The availability of open data gives a plausible pretext, but it’s certainly not a prerequisite (nor should it be blamed) for manipulation by the 0.1%.
[Speaking of DataKind (formerly Data Without Borders): it's also a problem when the government claims to be open but withholds essential data such as crime reports, as I discovered as a data ambassador working with the NYCLU on Stop, Question and Frisk data.]
My final example comes from finance. On the one hand I want total transparency of the markets, because it sickens me to think about how nobody knows the actual price of bonds, or the correct interest rate, or the current default assumption of the market, how all of that stuff is being kept secret by Wall Street insiders so they can each skim off their little cut and the dumb money players get constantly screwed.
But on the other hand, if I imagine a world where everything really is transparent, then even in the best of all database situations, that’s just asstons of data which only the very very richest and most technologically savvy high finance types could ever munge through.
So who would benefit? I’d say, for some time, the average dumb money customer would benefit very slightly, by not paying extra fees, but that the edgy techno finance firms would benefit fantastically. Then, I imagine, new ways would be invented for the dumb money customers to lose that small amount of benefit altogether, probably by just inundating them with so much data they can’t absorb it.
In other words, open data is great for the people who have the tools to use it for their benefit, usually to exploit other people and opportunities. It’s not clearly great for people who don’t have those tools.
But before I conclude that data shouldn’t be open, let me strike an optimistic (for me) tone.
The tools for the rest of us are being built right now. I’m not saying that the non-exploiters will ever catch up with the Goldman Sachses and credit card companies of the world, because they probably won’t.
But there will be real tools (there already are, like Python and R, and they’re getting better every day), built out of the open software movement, that will help specific people analyze and understand specific things, and there are platforms like WordPress and Twitter that will allow those things to be broadcast, which will have real impact when the truth gets out. An example is the Crooked Timber blog post above.
So yes, open data is not an unalloyed good. It needs to be a war waged by people with common sense and decency against those who would only use it for profit and exploitation. I can’t think of a better thing to do with my free time.