AAPOR Big Data Report
I was recently part of a task force convened by the American Association for Public Opinion Research (AAPOR), an organization that promotes good standards for studying public opinion, to examine the practices of "big data."
So, for example, AAPOR has a code of ethics for tracking public opinion and a set of accepted methodologies for using surveys correctly. Last year they publicly criticized the New York Times and CBS for releasing the results of a nationwide poll on Senate races, both because its opt-in survey method had "little grounding in theory" and because of a lack of transparency.
But here's the thing: the biggest problem facing the world of public opinion research isn't online opt-in polls, but rather the temptation to trawl Twitter to "see what people are thinking." And that's exactly what's happening, in large part because it's cheaper. Hence the AAPOR Big Data Report that I helped with.
I think we did a decent job of describing some of the intrinsic difficulties with using big data, specifically around quality control issues, and for that reason I recommend this report to anyone entering the field, or even people already in the field who haven’t thought through this stuff. If you don’t have time to read the full report, here are our recommendations:
1. Surveys and Big Data are complementary data sources, not competing ones.
There are differences between the approaches, but this should be seen as an advantage rather than a disadvantage. Research is about answering questions, and one way to answer questions is to use all available information. The availability of Big Data offers a new way to approach old questions, as well as the ability to address some new questions that were previously out of reach. However, findings based on Big Data inevitably raise further questions, and some of those are best addressed by traditional survey research methods.
2. AAPOR should develop standards for the use of Big Data in survey research when more knowledge has been accumulated.
Using Big Data in statistically valid ways is a challenge. One common misconception is the belief that the volume of data can compensate for any other deficiency in the data; a very large but unrepresentative data set still produces biased estimates. AAPOR should develop standards of disclosure and transparency for the use of Big Data in survey research. AAPOR's Transparency Initiative is a good model and should be extended to data sources other than surveys.
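To make the point about volume concrete, here is a small Python simulation. It is a sketch under made-up assumptions, not anything taken from the report: a synthetic population split 50/50 on some question, in which supporters are assumed to post online three times as often as opponents (posting probabilities of 0.3 and 0.1, chosen purely for illustration). In expectation, 0.5 × 0.3 / (0.5 × 0.3 + 0.5 × 0.1) = 0.75 of posters are supporters, so the opt-in estimate should settle near 0.75 no matter how many posts are collected, while a simple random sample of only 1,000 people should land near the true value of 0.5. The function names are hypothetical.

```python
import random


def opt_in_estimate(population_size, p_support=0.5,
                    p_post_if_support=0.3, p_post_if_oppose=0.1, seed=0):
    """Estimate the share of supporters using only the people who choose to post.

    Illustrative assumption: supporters post more often than opponents,
    so the posters are a self-selected, non-random subset of the population.
    Returns (estimated share of supporters among posters, number of posters).
    """
    rng = random.Random(seed)
    posters = 0
    supporting_posters = 0
    for _ in range(population_size):
        supports = rng.random() < p_support
        p_post = p_post_if_support if supports else p_post_if_oppose
        if rng.random() < p_post:  # self-selection: only some people post
            posters += 1
            supporting_posters += supports  # True counts as 1
    return supporting_posters / posters, posters


def random_sample_estimate(sample_size, p_support=0.5, seed=0):
    """Estimate the same share from a small simple random sample."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p_support for _ in range(sample_size))
    return hits / sample_size


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        est, posters = opt_in_estimate(n)
        print(f"population {n:>9,}: {posters:>7,} posters, "
              f"estimate {est:.3f} (truth 0.500)")
    print(f"random sample of 1,000: estimate {random_sample_estimate(1_000):.3f}")
```

Growing the data set only shrinks the random noise around the wrong answer; the selection bias of roughly 25 points stays put, which is exactly the kind of deficiency that volume cannot fix.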
3. AAPOR should start working with the private sector and other professional organizations to educate its members on Big Data.
The pace of Big Data development is itself a challenge. It is very difficult to keep up with research and development in the Big Data area, and research on new technology tends to become outdated very quickly. The AAPOR community currently lacks sufficient capacity in this area. AAPOR should tap other professional associations, such as the American Statistical Association and the Association for Computing Machinery, to help understand these issues and to provide training for AAPOR members and non-members alike.
4. AAPOR should inform the public of the risks and benefits of Big Data.
Most users of digital services are unaware that data generated by their digital behavior may be reused for other purposes, for both public and private good. AAPOR should be active in public debates and provide training for journalists to improve data-driven journalism. AAPOR should also update its Code of Professional Ethics and Practice to cover the collection of digital data outside of surveys, and it should work with Institutional Review Boards to facilitate ethical research use of such data.
5. AAPOR should help remove the barriers created by differing uses of terminology.
Effective use of Big Data usually requires a multidisciplinary team consisting of, e.g., a domain expert, a researcher, a computer scientist, and a system administrator. Because of this interdisciplinary nature, many concepts and terms are defined differently by people with different backgrounds. AAPOR should help remove this barrier by informing its community about the different uses of terminology. Short courses and webinars are successful instruments that AAPOR can use to accomplish this task.
6. AAPOR should take a leading role in working with federal agencies in developing a necessary infrastructure for the use of Big Data in survey research.
Data ownership is not well defined, and there is no clear legal framework for the collection and subsequent use of Big Data. There is a need for public-private partnerships to ensure data access and reproducibility. The Office of Management and Budget (OMB) is closely involved in federal surveys: it develops guidelines for them, and government-funded research is expected to follow those guidelines. It is important that AAPOR work together with federal statistical agencies on Big Data issues and build capacity in this field. AAPOR's involvement could include the creation or propagation of shared cloud computing resources.