Privacy, Anonymity, and Big Data in the Social Sciences

Quality social science research and the privacy of human subjects require trust.

JON P. DARIES, JUSTIN REICH, JIM WALDO, ELISE M. YOUNG, JONATHAN WHITTINGHILL, DANIEL THOMAS SEATON, ANDREW DEAN HO, ISAAC CHUANG

Open data has tremendous potential for science, but in human subjects research there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures require changes to data sets that threaten the replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest protecting privacy without anonymizing data, by instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want both high-quality social science research and protection of human subjects' privacy, we must eventually trust researchers. Otherwise, we will always face the strict tradeoff between anonymity and science illustrated here.
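The claim that de-identification alters a data set enough to threaten replication can be illustrated with a small sketch. It assumes one common de-identification approach, k-anonymity enforced by suppression: any record whose combination of quasi-identifiers (here, country and year of birth) is shared by fewer than k records is dropped. The records, the choice of quasi-identifiers, the helper name `suppress_to_k_anonymity`, and k = 3 are all hypothetical, and the sketch is not presented as the procedure applied to the MITx/HarvardX release.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records: (country, year_of_birth, grade). Not real edX data.
RECORDS = [
    ("US", 1990, 0.9), ("US", 1990, 0.7), ("US", 1990, 0.8),
    ("US", 1985, 0.6), ("US", 1985, 0.5), ("US", 1985, 0.7),
    ("IN", 1992, 0.2), ("IN", 1992, 0.4),
    ("NZ", 1970, 1.0),
]


def suppress_to_k_anonymity(records, quasi_ids, k):
    """Hypothetical helper: drop every record whose quasi-identifier
    combination appears fewer than k times in the data set."""
    def key(record):
        return tuple(record[i] for i in quasi_ids)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


# Quasi-identifiers are columns 0 (country) and 1 (year of birth); k = 3 is arbitrary.
released = suppress_to_k_anonymity(RECORDS, quasi_ids=(0, 1), k=3)

original_mean = mean(r[2] for r in RECORDS)
released_mean = mean(r[2] for r in released)

print(f"kept {len(released)} of {len(RECORDS)} records")
print(f"mean grade: {original_mean:.3f} -> {released_mean:.3f}")
```

Here suppression removes the two smallest groups (the IN and NZ records), so the released mean grade rises from about 0.644 to 0.700 and any per-country comparison becomes impossible. A baseline analysis rerun on the released data would therefore not reproduce the result computed on the original records.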

Related:
Four Billion Little Brothers?: Privacy, mobile phones, and ubiquitous data collection
Communications Surveillance: Privacy and Security at Risk
Modeling People and Places with Internet Photo Collections

Hazy: Making it Easier to Build and Maintain Big-data Analytics

Racing to unleash the full potential of big data with the latest statistical and machine-learning techniques.

ARUN KUMAR, FENG NIU, AND CHRISTOPHER RÉ, DEPARTMENT OF COMPUTER SCIENCES, UNIVERSITY OF WISCONSIN-MADISON

The rise of big data presents both big opportunities and big challenges in domains ranging from enterprise to science. The opportunities include better-informed business decisions, more efficient supply-chain management and resource allocation, more effective targeting of products and advertisements, better ways to “organize the world’s information,” and faster turnaround of scientific discoveries.

Related:
The Pathologies of Big Data
Condos and Clouds
How Will Astronomy Archives Survive the Data Tsunami?