The Four Horsemen of an Ailing Software Project

Don't let the pale rider catch you with an exception.

Dear KV,

Are there any reliable measurements one can use to judge the health of a software project? I've seen many things written about the quality of software but not very much about the quality of a project itself. I ask this because I worry that I'm stuck on a failing project, but it's hard to know if it is really failing. The company I work for alternately feeds and starves the project of resources, while also saying that completing the next release on time is the key to our success. If we're the key to success, why would they periodically starve the project? I keep wondering if I'm a frog in a slow-boiling pot of water and that I'll only know I should have left once it's too late. If there are measures for software quality, there must surely be measures for project quality?

Heating Slowly

Dear Heating,

Software teams, unlike software projects, are made up of people, and interactions with people are messy, which is why some of us went into this field in the first place: to avoid the messy humans and to work with the wonderfully logical and exact machines. Unfortunately, it's difficult to build anything interesting with one person, so you wind up working with a team, and teams are made of people, and as Sartre once said, "Hell is other people."

There are plenty of books and articles written about how software projects live or die, the most famous of which, The Mythical Man-Month by Fred Brooks, I recommended in these pages long ago, and I stand by that recommendation. Brooks's work continues to be relevant because, unlike the technology we work on, people do not change very quickly, and some, including KV, would argue that people rarely learn anything from their experiences. If you doubt my cynicism, pick up a newspaper and read the front page. I'll wait.

Without delving deeply into specific cases of how software teams fail, we can talk about the four harbingers of the ignominious end to a software project. The harbingers bear a strong resemblance to the mythological Four Horsemen of the Apocalypse: War, Famine, Pestilence, and Death.

When a team starts to fail, one of the first harbingers to appear is War. Functioning teams can get along—at least in the work environment—and share tasks, hand them off when one member is overburdened, and generally work in a congenial manner. As a team starts to fail, team members become increasingly paranoid because they don't want to be blamed for the failure.

This paranoia often exhibits itself as extreme defensiveness, the idea being, "It's not my fault we're failing. My code works!" In a large and complex project, once enough of the team has hunkered down in this paranoid state, they will lash out at anything or anyone who might be seen to be impugning the quality of their work. The lashing out leads to arguments, which look a lot like war, although one carried out with code commits, snarky reviews, and nasty email threads. Hardly the stuff of immortal legend, but enough of a drain on the team to make it fall into a downward spiral of failure.

As teams fail and projects get delayed, management may decide that it's time to focus effort elsewhere and to move developers off the team and into other areas of work. Removing developers starves the project of resources and leads to Famine. At this point, it would probably make sense to kill the project and completely reconstitute the teams in some more productive fashion, but managers—like developers—can often be too hopeful of a miracle save and, therefore, continue a project long after the team that is developing it should have been disbanded. Dying of famine, like death by a thousand cuts, is long and painful. If you are on a project that is constantly being deprived of resources, it's time to find something else to work on or somewhere else to work. Once famine starts, recovery is difficult and it's best to seek sustenance elsewhere.

KV has talked about various measures of software quality in past columns, but failing software quality, in the form of increasing bug counts, is perhaps one of the most objective signs that a team is failing. This Pestilence, brought about by the low morale engendered in the team by War and Famine, is a clear sign that something is wrong. In the real world, a diseased animal can be culled so that disease does not spread and become a pestilence over the land. Increasing bug counts, especially in the absence of increased functionality (that is, when code fixes introduce more bugs than they resolve), are a sure sign of a coming project apocalypse.
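For readers who want to watch for Pestilence in their own bug tracker, here is a minimal sketch of that signal: open bugs rising from one period to the next while no additional functionality ships. Everything in it is an assumption made for illustration, not something KV prescribes: the Period record, its field names, the sample numbers, and the threshold of three consecutive periods are all hypothetical, and a real project would need to decide what counts as a "feature" and how long a streak is worth worrying about.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Period:
    """One reporting period (release or sprint); fields are hypothetical."""
    label: str
    open_bugs: int         # open bug count at the end of the period
    features_shipped: int  # features delivered during the period


def pestilence_streak(periods: List[Period]) -> int:
    """Length of the run, ending at the latest period, in which open bugs
    rose while no more features shipped than in the period before."""
    streak = 0
    for prev, cur in zip(periods, periods[1:]):
        bugs_rising = cur.open_bugs > prev.open_bugs
        no_new_functionality = cur.features_shipped <= prev.features_shipped
        # Any period that breaks the pattern resets the count.
        streak = streak + 1 if (bugs_rising and no_new_functionality) else 0
    return streak


def looks_like_pestilence(periods: List[Period], threshold: int = 3) -> bool:
    """True once the streak reaches the (assumed) threshold."""
    return pestilence_streak(periods) >= threshold


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    history = [
        Period("r1", open_bugs=40, features_shipped=5),
        Period("r2", open_bugs=46, features_shipped=5),
        Period("r3", open_bugs=55, features_shipped=3),
        Period("r4", open_bugs=71, features_shipped=2),
    ]
    print(pestilence_streak(history), looks_like_pestilence(history))
```

A streak like this is only a prompt to look closer. A rising bug count alongside rising feature output may simply mean the team is shipping more, which is why the sketch resets the count whenever functionality grows.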

The final horseman is not a harbinger of Death, but Death itself. Eventually, either management or the VCs will be forced to see the failure for what it is, kill off the project, and disband the team. In the most extreme cases, this will also destroy the company itself. It's a moment that those of us who have worked in the industry for any length of time have seen—often firsthand—and it's never pretty. When you see War, Famine, and Pestilence on a team, if you are not able to fix the problem—and few of us are—then it's time to move along to somewhere or something else, lest the pale rider catch you with an exception when you're deep inside a complex function from which you will fail to return.

George V. Neville-Neil works on networking and operating-system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are computer security, operating systems, networking, time protocols, and the care and feeding of large code bases. He is the author of The Kollected Kode Vicious and co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. For nearly 20 years, he has been the columnist better known as Kode Vicious. Since 2014, he has been an Industrial Visitor at the University of Cambridge, where he is involved in several projects relating to computer security. He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. His software not only runs on Earth, but also has been deployed as part of VxWorks in NASA's missions to Mars. He is an avid bicyclist and traveler who currently lives in New York City.

Originally published in Queue vol. 20, no. 4