What are you measuring?

The gross national product does not allow for the health of our children, the quality of their education or the joy of their play. It does not include the beauty of our poetry or the strength of our marriages, the intelligence of our public debate or the integrity of our public officials.

It measures neither our wit nor our courage, neither our wisdom nor our learning, neither our compassion nor our devotion to our country; it measures everything, in short, except that which makes life worthwhile.

—Robert F. Kennedy

Data Science, Ethics, and Academics in Industry

I’ve been fielding more questions about research ethics and protecting individuals in the context of data science and big data. The topic warrants a much deeper discussion than this blog post can provide, but I’ve noticed one trend worth pointing out: academics are leaving research universities, either temporarily or permanently, for tech companies and industry.

Academic researchers are almost always required to submit their research proposals to their organization’s Institutional Review Board (IRB), an interdisciplinary group of researchers charged with protecting human subjects, as outlined in the 1979 Belmont Report, and with overseeing research ethics training at most universities and research organizations. Private companies are under no such obligation, as the controversial Facebook study (PDF) of emotional contagion demonstrated. Instead, these companies rely on the permissions users grant when they consent to Terms of Service agreements before signing up for the service.

For me, it remains an open question whether researchers in private industry are adhering to a “do no harm” maxim. The obvious tension is that profit-motivated entities like startups and publicly traded tech companies are interested in maximizing investor or shareholder value and are not subject to the same research ethics requirements as publicly funded research universities.

I’m encouraged that some academic researchers like Jessica Vitak are taking on these issues and looking for ways to increase transparency in big data use; Vitak’s Privacy + Security Internet Research Lab is focused on exactly these questions. I had the opportunity to hear Vitak speak at the recent Human-Computer Interaction Laboratory annual symposium at the University of Maryland, College Park.

One potential solution Vitak suggests is that the peer review process for academic publications and conferences should fill the gaps left by insufficient IRB expertise in some areas of data science. This won’t necessarily change what private companies do with individual data, but it’s certainly a start. The controversial Facebook study now carries an “Editorial Expression of Concern,” added after publication. Had the editor and peer reviewers at PNAS been more attuned to research ethics and human subjects protection during the review process, the authors might have been pushed to address the ethical implications of their research far more thoroughly.

Of course, this raises a thornier question about rejecting research that does not adhere to accepted human subjects protections: in that case, we don’t reward the authors for conducting research unethically, but we also keep what was learned out of the public record. I don’t have a good answer to this issue.

I don’t specifically intend to pick on the tech companies here. Plenty of other industries have, in the name of profit-driven research, done harm. But tech companies are also particularly desirable places to do research. Traditionally, researchers, especially in the social sciences, had to painstakingly collect their own experimental or correlational data. This was both time-consuming and expensive, and perhaps too often resulted in non-significant findings because the research sample was too small. Tech companies, on the other hand, are awash in data that represents a potential intellectual gold mine for social scientists.

My hope is that those who leave academia for the bountiful data available at tech companies remember and abide by their research ethics training, even when they aren’t required to. I also hope that tech companies are engaging with experts in research ethics and taking any objections by those experts seriously.

A recent NPR Hidden Brain podcast episode, “This is Your Brain on Uber,” featured an interview with Keith Chen, who appears to be both Head of Economic Research at Uber and a tenured professor at Yale. If he indeed holds dual roles, it raises important ethical questions about the research he is conducting for Uber. Does Chen conform to the same human subjects protection protocols at Uber that he must follow when working “at” Yale? Or is there an artificial separation because Uber isn’t Yale and isn’t subject to the same requirements?

During the episode, Shankar Vedantam at one point asks Chen about the privacy implications for individual users of research projects built on their data. Chen seemed concerned about the implications Vedantam raised, but also somewhat dismissive, noting simply that Uber has a Privacy Officer, a position created only after a user outcry when it was discovered that an Uber executive may have inappropriately used his access to track the movements of a reporter. Chen said he didn’t usually worry about tech companies using his own behavioral data, but that Vedantam’s question was making him think more about it.

I am encouraged that reporters are challenging researchers and industry on their data and research practices, and I certainly don’t believe we should throw the proverbial baby out with the bathwater here. There is much to be gained from these unprecedented datasets of human behavior, which will add to what we know and understand about humans and social behavior.

It’s also the case that with great power comes great responsibility. Greater transparency, the involvement of research ethicists, and truly informed participants should be required not just of academic researchers, but also of researchers working in industry.

Look for a future post on the role of psychologists in the ethical conduct of research, and why I believe that a professional code of ethics is a vital component of protecting individuals.
