Quality social science research and the privacy of human subjects require trust.
JON P. DARIES, JUSTIN REICH, JIM WALDO, ELISE M. YOUNG, JONATHAN WHITTINGHILL, DANIEL THOMAS SEATON, ANDREW DEAN HO, ISAAC CHUANG
Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures necessitate changes to data sets that threaten replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest protecting privacy without anonymizing data, instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want to have high-quality social science research and also protect the privacy of human subjects, we must eventually have trust in researchers. Otherwise, we’ll always have the strict tradeoff between anonymity and science illustrated here.
Four Billion Little Brothers?: Privacy, mobile phones, and ubiquitous data collection
Communications Surveillance: Privacy and Security at Risk
Modeling People and Places with Internet Photo Collections