The Unholy Trinity of Software Development

Tests, documentation, and code

Dear KV,

I've read your column for a few years and you have written a few times about testing and documentation, but you have not written about the relationship between the two. Many large projects, including the one I'm working with now, keep their tests and documentation separate from the code and from each other. I have argued with my team about putting both of these closer to, or even embedded in, the source code, but there are always a few engineers who refuse to allow anything but code and comments in the source files. Their argument is about maximizing the screen space for code context, but that seems like a more reasonable argument for a 25-line terminal than for a modern development environment. What arguments would you make to get this software team to adopt a more holistic approach to the system we're working on?

System Integrationist

Dear SI,

Questions like this bring to mind the fact that source code, documentation, and testing are the unholy trinity of software development, although many organizations like to see them as separate entities. It is interesting that while many groups pay lip service to "test-driven development," they do not include documentation in TDD.

I think you have found one of the reasons many developers fight against the integration of tests and documentation into source code, either directly into the source files or even close by in the same directory. The arguments against such integration include the one you mentioned about vertical space for code. Let's treat that one first.

Software developers like new toys. Of course they do: they work on computers and computers are toys to us, and everyone likes things that are shiny. If you visit a modern software company, what do you see besides a sea of Aeron chairs? Lots and lots of monitors, and many of those are of the 4-K variety, meaning that a text editor, even with a large font, will give you more than 100 lines of code to look at—a 400 percent increase over the 80x25 monitors used to write code since the 1970s.

There is a school of thought that says a function or method of 100 lines is too long and should be broken down into smaller, more easily digestible chunks. If you work in such an environment, then there is no reason for the developers to argue against documentation or tests being integrated within the code itself, as a 25-line function can have 10 lines of documentation above it, and a reasonable documentation system, such as Doxygen, can assemble this set of "documentlettes" into a greater whole. That greater whole, by the way, needs to be reviewed by someone who can turn it into language suitable for others to read. The worst code documentation is the kind that is not kept in sync with the code itself. The second worst type of such documentation is where the engineer writes something obvious, foo() function returns bar, which is easily seen from reading the code.

The best type of this documentation explains what the function does, what its possible error conditions are, and what conditions are required when the code is called. Multithreaded applications really need to have their locking requirements described in such documentation blocks. For some products, such as end-user-facing systems, these document blocks will not generally find their way into the final manual, but they will be very useful to the person who is responsible for writing that manual. Libraries and other systems that are consumed by other programmers absolutely have to have this style of documentation for the preservation of the sanity of all involved.

On the integration of tests into the source code, well, KV may be a bit old-fashioned, but I do see that this is a tad harder and probably requires the source code to have special features embedded in it, or that the source-code editing system have special features, or both. Even with more than 100 lines of vertical space in which to code and document, adding any significant number of conformance tests will definitely dwarf the code and make each source file quite large and unwieldy.

Code folding, a common feature of many text editors, may help if you really are hell-bent on keeping the unholy trinity together in one file. The top of each source file would include the overarching documentation, a large block that describes the module and its purpose within the system. The source code would then be placed between the class/method/function documentation, and the conformance tests for the code would come last.

The main complaint that you will encounter with the folding method is that it requires special characters and a smart code editor, although even Vim has code folding at this point. Since folding in the source file usually requires special tags, it would make sense to standardize these for the whole project so that there is one tag each for documentation, tests, and source code.

One advantage of combining all of these things is that the tools you use—including compilers, editors, test frameworks, and documentation extractors—can all point at the same directory hierarchy. Keeping tests, documentation, and code separate complicates the search paths for the various tools and leads to errors, such as pointing at the wrong place and getting the wrong tests, or similar problems.

Bringing together these three components so that they are easily developed in concert has a lot of advantages, but you're still going to have to help people past the mindset of the terminal. These arguments often bring to mind the following scene from Neal Stephenson's Snow Crash, in which he describes how the main character, annoyingly named Hiro Protagonist, actually writes code:

"...where everything that you see in the Metaverse, no matter how lifelike and beautiful and three-dimensional, reduces to a simple text file: a series of letters on an electronic page. It is a throwback to the days when people programmed computers through primitive teletypes and IBM punch cards."

Next time your teammates complain about vertical space wasted on tests or documentation, hand them a punch card.

Dear KV,

In the past 10 years I've noticed that the number of CPU cores available to my software has been increasing, but that the frequency of those cores isn't much more than it was when I left school. Multicore software was a big topic when the trend first began, but it doesn't seem to be discussed as much now that systems often have six or more cores. Most programmers seem to ignore the number of cores and write their code as they did when systems had only a single CPU. Is that just my impression, or does this mean that maybe I picked the wrong startup this year?

Core Curious

Dear Core,

The chief contribution of multicore hardware to software design has been to turn every system into a truly concurrent system. A recently released digital watch has two cores in it, and people still "think digital watches are a pretty neat idea" (as in Douglas Adams's The Hitchhiker's Guide to the Galaxy). When the current crop of computer languages was written, the only truly concurrent systems were rare and expensive beasts that were used in government research labs and other similarly rarefied venues. Now, any clown can buy a concurrent system off the Internet, install it in a data center, and push some code to it. In fact, such clowns can get such systems in the cloud at the push of a button. Would that software for such systems were as easily gotten!

Leaving aside the fact that most applications are now distributed systems implemented on many-core communicating servers, what can we say about the concurrent nature of modern software and hardware? The short answer is, "It's all crap," but that's not helpful or constructive, and KV is all about being helpful and constructive.

From our formative computer science years, we all know that in a concurrent system two different pieces of code can be executing simultaneously, and on a modern server, that number can easily be 32 or 64 rather than just two. As concurrency increases, so does complexity. Software is written to be executed as a set of linear steps, but depending on how the software is written, it may be broken down into many small parts that might all be running at the same time. As long as the software doesn't share any state between the concurrent code, everything is fine—well, as fine as any other nonconcurrent software. The purpose of software is to process data and therefore to take things and change them or to mutate state. The number of significant software systems that do not wind up sharing state between concurrent parts is very, very small.

Software that is written specifically without concurrency is, of course, easier to manage and debug, but it also wastes most of the processing power of the system on which it runs, and so more and more software is being converted from nonconcurrent into concurrent or being written for concurrency from scratch. For any significant system, it is probably easier to rewrite the software in a newer, concurrency-aware language than to try to retrofit older software with traditional concurrency primitives.

Now, I'm sure you've read code that looks nonconcurrent—that is, it does not use threads in its process—and you might think that was fine, but, alas, nothing is ever really fine. Taking a collection of programs and having them share data through, for example, the file system or shared memory, a common early way of having some level of concurrency, does not protect the system from the evils of deadlock or other concurrency bugs. It's just as possible to deadlock software by passing a set of messages between two concurrent processes as it is to do the same sort of thing with Posix threads and mutexes. The problems all come down to the same things: idempotent updates of data and data structures, the avoidance of deadlocks, and the avoidance of starvation.

These topics are covered in books about operating systems, mostly because it was operating systems that first had these challenges. If, after this description, you're still curious, I recommend picking up one such book so that you at least understand the risks of concurrent systems and the land mines you are likely to step on as you build and debug, and debug, and debug such systems.

Kode Vicious, known to mere mortals as George V. Neville-Neil, works on networking and operating-system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. Neville-Neil is the co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System (second edition). He is an avid bicyclist and traveler who currently lives in New York City.

Putting It All Together
- Rolf Ernst
Embedded projects are built out of lots of pieces. Are you sure that what you've got at the end is what you wanted when you started? Component integration is one of the tough challenges in embedded system design. Designers search for conservative design styles and reliable techniques for interfacing and verification.
http://queue.acm.org/detail.cfm?id=644268

Describing the Elephant: The Different Faces of IT as Service
- Ian Foster and Steven Tuecke
Terms such as grid, on-demand, and service-oriented architecture are mired in confusion, but there is an overarching trend behind them all.
http://queue.acm.org/detail.cfm?id=1080874

Kode Vicious Cycles On
I want to thank you both for writing to KV here in the New World where we enjoy plenty of sunshine and benevolent employers. I was actually just enjoying a rubdown from my private masseur here at the office when your letters arrived, but I sent Jacques away so that I could concentrate fully on my answer to you.
http://queue.acm.org/detail.cfm?id=1080870

Originally published in Queue vol. 14, no. 5—
Comment on this article in the ACM Digital Library

Kode Vicious - @kode_vicious

The Unholy Trinity of Software Development

Tests, documentation, and code

Related Articles