The Kollected Kode Vicious

Kode Vicious - @kode_vicious


Repeat, Reproduce, Replicate

The pressure to publish versus the will to defend scientific claims

Dear KV,

Even though I'm not in academia, I do try to keep up with the research in computing science, mostly by reading the abstracts of papers in the SIGs I was a member of in college. From time to time, I find an interesting paper that contains something I can apply in my day job. Sometimes I'm even lucky enough to find that the researchers have posted their code to their web page or a service like GitHub. But, more often than not, whenever I try the code, I find it has many problems. Sometimes it doesn't build at all and appears to be abandonware. There are even times when it will build but then doesn't operate in the same way the paper indicated. Occasionally, I've emailed the researchers only to find they're graduate students who have moved on to other work and either don't reply, or—when they do—it's only to shrug me off and wish me luck.

It seems that if researchers have gone to the trouble of posting code, it ought to actually work, right?

Irreproducible

 

Dear Irreproducible,

As I'm sure you know from previous installments of the KV show, I too try to follow the research in my area, which broadly covers "systems"—also known as those unsexy bits of software that enable applications to use computing hardware and the occasional network. Down here in the sub-sub-basement of computing science, we try to improve the world by applying the scientific method. So, I'm always happy for the occasional missive that floats down from the ivory towers of those who have managed to convince program committees that their work has merit.

It may shock you to know that most conferences and publishing venues do not require researchers to submit their experimental data or systems in order to be allowed to publish their results. I'm told this is now changing. In fact, ACM introduced a badging system for software artifacts back in 2020 (https://www.acm.org/publications/policies/artifact-review-and-badging-current).

While the badging system is a step in the right direction (albeit one built on an annoyingly silly set of three R's—repeatability, reproducibility, and replicability—that are hard enough for native English speakers to differentiate, never mind for those of our colleagues who didn't start out life speaking English), it is not a requirement for publication, and herein lies one of the problems.

A hallmark of the scientific method over the past several hundred years—and the thing that differentiates science from belief or faith—is that other people must be able to independently replicate the result of an experiment. In computing science, we do not take this seriously enough. In fact, if you talk to some researchers about this, they'll chuckle and point out that what gets published might be based on a graduate student finally getting their code to run once and produce a graph.

To say that these are shifting sands on which to build up a body of scientific knowledge is an understatement. In a world that depends, day in and day out, on the results of experiments in computing science, it qualifies as a dangerous outrage. Do you want the algorithm that determines when and how hard to apply your car brakes to be one that was embraced on the basis of one lucky run of test code?

There are several reasons for this disconnect between research and the rest of us, and they include concerns outside the realm of computing science—issues related to economics and politics, for example. But in the end, it all comes down to a fundamental misalignment of incentives. In the academic world, the incentives revolve largely around "publish or perish"—a well-worn phrase that even those outside the academic world surely know. The people who produce the research are graded not so much on how well their ideas work—although some ideas that win a following do end up propelling careers—but instead on how many of their papers have been accepted into prestigious journals and conferences.

This pressure to publish, along with the fact that the field of computing is now one of the most lucrative in the world, has twisted things such that people are publishing at any cost. This, in fact, has led to a huge amount of academic chicanery, such as paper mills, where prior research is mixed and matched to yield seemingly new results that might get published somewhere, even if not in the top-tier journals. In some fields, like medicine, this pressure has become so intense that great reams of research have been torn up after being found to be based on faulty or even fraudulent data.

Another challenge confronting reproducible results in computing science is the very speed with which the field changes. Finding a computer that's largely comparable to the one used just five years earlier to produce a result can prove challenging. And finding a system with the exact same configuration of memory, disk, bus, and CPU is sure to be even harder. KV doubts that conferences will anytime soon require researchers to hand over their hardware as well as their software in order to submit a paper, but I have to admit this is an amusing thought I occasionally enjoy entertaining over a double of anything at my local bar. Usually, though, by the second double, I've stopped laughing and have started madly scribbling new submission rules on gin-soaked napkins I can never seem to find the next morning.
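None of this means researchers are helpless in the meantime. At a minimum, an artifact can ship with a record of the machine and software that produced the published numbers, so anyone attempting a rerun knows how far their hardware has drifted from the original. What follows is only a minimal sketch of the idea, in Python and using nothing outside the standard library; the file name and the particular fields captured are my own invention, not anything the badging rules prescribe.

import json
import os
import platform
import sys

def capture_environment(path="environment.json"):
    """Record the hardware and software configuration used for an experiment run."""
    env = {
        "machine": platform.machine(),      # e.g., x86_64 or arm64
        "processor": platform.processor(),  # CPU description, if the OS reports one
        "cpu_count": os.cpu_count(),        # logical CPUs visible to the process
        "system": platform.system(),        # operating system name
        "release": platform.release(),      # kernel / OS release
        "python": sys.version,              # interpreter version used for the run
    }
    # Write the record next to the results so it travels with the artifact.
    with open(path, "w") as f:
        json.dump(env, f, indent=2)
    return env

if __name__ == "__main__":
    print(json.dumps(capture_environment(), indent=2))

It won't make a flaky result any less flaky, but it does turn "works on my machine" into a claim that someone else can at least begin to check.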

KV likes to look at physics as a gold standard in the sciences. I'm sure some angry physicists will now send me missives to tell me I'm dead wrong to believe this (and some of those folks make bombs, so I really should watch what I say). Still, I asked a physicist of my acquaintance—one with a long career and many published papers and books—what he thought of a recent statistic indicating that, in my area, systems, only 24 percent of the research a group attempted to reproduce proved to be reproducible (http://reproducibility.cs.arizona.edu/tr.pdf). Once he'd stopped laughing and set his drink down, my friend mentioned two names, Fleischmann and Pons, which I then had to look up. These are the guys who claimed to have achieved "cold fusion" and now are infamous for having gotten that all wrong. This would have made for a sobering conversation had we not already ordered our sixth round. Luckily, the napkins on that table were less soaked with gin than usual or I'd have lost the names.

All of which is to say that if computing science wants to really be a science, and not just in name, we are going to have to pause, take stock, and think about how we encourage (or, on my angrier days, FORCE) people to defend their scientific claims with reproducible results. Since most researchers have a cadre of graduate students working for them, it might be good training to have the first-year students reproduce results from recent, award-winning papers, both as a learning experience and, if they find errors, as a way to have their own early papers to publish. While the problems of hardware and software moving quickly are undeniable, the falling cost of computing hardware actually argues in favor of reproducing results.

Unless a result relies on a specific hardware trick, such as a proprietary accelerator or a modified instruction set, it should be possible for one group to reproduce the results of another. Unlike the physicists, we don't have to build a second Large Hadron Collider to verify the results of the first. We have millions of similar, and sometimes identical, devices on which to reproduce our results. All that is required is the will to do so.

 

KV

George V. Neville-Neil works on networking and operating-system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are computer security, operating systems, networking, time protocols, and the care and feeding of large codebases. He is the author of The Kollected Kode Vicious and co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. For nearly 20 years, he has been the columnist better known as Kode Vicious. Since 2014, he has been an industrial visitor at the University of Cambridge, where he is involved in several projects relating to computer security. He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. His software not only runs on Earth, but also has been deployed as part of VxWorks in NASA's missions to Mars. He is an avid bicyclist and traveler who currently lives in New York City.

Copyright © 2024 held by owner/author. Publication rights licensed to ACM.

acmqueue

Originally published in Queue vol. 22, no. 3




