The Kollected Kode Vicious

Kode Vicious - @kode_vicious


The Four Horsemen of an Ailing Software Project

Don't let the pale rider catch you with an exception.

Dear KV,

Are there any reliable measurements one can use to judge the health of a software project? I've seen many things written about the quality of software but not very much about the quality of a project itself. I ask this because I worry that I'm stuck on a failing project, but it's hard to know if it is really failing. The company I work for alternately feeds and starves the project of resources, while also saying that completing the next release on time is the key to our success. If we're the key to success, why would they periodically starve the project? I keep wondering if I'm a frog in a slow-boiling pot of water and that I'll only know I should have left once it's too late. If there are measures for software quality, there must surely be measures for project quality?

Heating Slowly

 

Dear Heating,

Software teams, unlike software projects, are made up of people, and interactions with people are messy, which is why some of us went into this field in the first place: to avoid the messy humans and to work with the wonderfully logical and exact machines. Unfortunately, it's difficult to build anything interesting with one person, so you wind up working with a team, and teams are made of people, and as Sartre once said, "Hell is other people."

There are plenty of books and articles written about how software projects live or die, the most famous of which, The Mythical Man-Month by Fred Brooks, I recommended in these pages long ago, and I stand by that recommendation. Brooks's work continues to be relevant because, unlike the technology we work on, people do not change very quickly, and some, including KV, would argue that people rarely learn anything from their experiences. If you doubt my cynicism, pick up a newspaper and read the front page. I'll wait.

Without delving deeply into specific cases of how software teams fail, we can talk about the four harbingers of the ignominious end to a software project. The harbingers bear a strong resemblance to the mythological Four Horsemen of the Apocalypse: War, Famine, Pestilence, and Death.

When a team starts to fail, one of the first harbingers to appear is War. Functioning teams can get along—at least in the work environment—and share tasks, hand them off when one member is overburdened, and generally work in a congenial manner. As a team starts to fail, team members become increasingly paranoid because they don't want to be blamed for the failure.

This paranoia often exhibits itself as extreme defensiveness, the idea being, "It's not my fault we're failing. My code works!" In a large and complex project, once enough of the team has hunkered down in this paranoid state, they will lash out at anything or anyone who might be seen to be impugning the quality of their work. The lashing out leads to arguments, which look a lot like war, although one carried out with code commits, snarky reviews, and nasty email threads. Hardly the stuff of immortal legend, but enough of a drain on the team to make it fall into a downward spiral of failure.

As teams fail and projects get delayed, management may decide that it's time to focus effort elsewhere and to move developers off the team and into other areas of work. Removing developers starves the project of resources and leads to Famine. At this point, it would probably make sense to kill the project and completely reconstitute the teams in some more productive fashion, but managers—like developers—can often be too hopeful of a miracle save and, therefore, continue a project long after the team that is developing it should have been disbanded. Dying of famine, like death by a thousand cuts, is long and painful. If you are on a project that is constantly being deprived of resources, it's time to find something else to work on or somewhere else to work. Once famine starts, recovery is difficult and it's best to seek sustenance elsewhere.

KV has talked about various measures of software quality in past columns, but perhaps failing software quality, in the form of increasing bug counts, is one of the most objective measures that a team is failing. This Pestilence, brought about by the low morale engendered in the team by War and Famine, is a clear sign that something is wrong. In the real world, a diseased animal can be culled so that disease does not spread and become a pestilence over the land. Increasing bug counts, especially in the absence of increased functionality (that is, when supposed fixes introduce more bugs than they resolve), are a sure sign of a coming project apocalypse.
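For those who would rather see the symptom than take KV's word for it, here is a minimal sketch of how one might watch for Pestilence in release data. Everything in it is an assumption made for illustration: the Release record, the pestilence_streak function, and the sample numbers are hypothetical and not drawn from any real project or tracker. It counts the longest run of consecutive releases in which the open bug count rose while the feature count did not.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical per-release snapshot; field names are illustrative."""
    name: str
    open_bugs: int   # open bug count at release time
    features: int    # cumulative features shipped so far


def pestilence_streak(history):
    """Longest run of consecutive releases where open bugs rose
    while the feature count stayed flat or fell."""
    longest = current = 0
    for prev, cur in zip(history, history[1:]):
        if cur.open_bugs > prev.open_bugs and cur.features <= prev.features:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Made-up sample data: bugs climb across 1.2 and 1.3 with no new features.
history = [
    Release("1.0", open_bugs=40, features=10),
    Release("1.1", open_bugs=45, features=12),
    Release("1.2", open_bugs=52, features=12),
    Release("1.3", open_bugs=61, features=12),
    Release("1.4", open_bugs=58, features=12),
]

print(pestilence_streak(history))
```

A streak of one release is probably noise. How long a streak has to be before it counts as a warning is a judgment call that depends on the project's release cadence, and raw bug counts are only as trustworthy as the team's habit of actually filing bugs.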

The final horseman is not a harbinger of Death, but Death itself. Eventually, either management or the VCs will be forced to see the failure for what it is, kill off the project, and disband the team. In the most extreme cases, this will also destroy the company itself. It's a moment that those of us who have worked in the industry for any length of time have seen—often firsthand—and it's never pretty. When you see War, Famine, and Pestilence on a team, if you are not able to fix the problem—and few of us are—then it's time to move along to somewhere or something else, lest the pale rider catch you with an exception when you're deep inside a complex function from which you will fail to return.

 

George V. Neville-Neil works on networking and operating-system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are computer security, operating systems, networking, time protocols, and the care and feeding of large code bases. He is the author of The Kollected Kode Vicious and co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. For nearly 20 years, he has been the columnist better known as Kode Vicious. Since 2014, he has been an Industrial Visitor at the University of Cambridge, where he is involved in several projects relating to computer security. He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. His software not only runs on Earth, but also has been deployed as part of VxWorks in NASA's missions to Mars. He is an avid bicyclist and traveler who currently lives in New York City.

Copyright © 2022 held by owner/author. Publication rights licensed to ACM.

acmqueue

Originally published in Queue vol. 20, no. 4









© ACM, Inc. All Rights Reserved.