A recent conversation about development methodologies turned to the relative value of various artifacts produced during the development process, and the person I was talking with said: the code has "always been the only artifact that matters. It's just that we're only now coming to recognize that." My reaction to this, not expressed at that time, was twofold. First, I got quite a sense of déjà vu, since it harkened back to my time as an undergraduate and memories of many heated discussions about whether code was self-documenting. Second, I thought of several instances from recent experience in which the code alone simply was not enough to understand why the system was architected in a particular way.
Contrary to the point the speaker was making, the notion that code is all that matters has a long history in software development. In my experience, really good programmers have always tended towards the idea that code is self-sufficient. In fact, the ability to look at a piece of code and perceive its overall structure and purpose without recourse to comments or design documents has often been the informal way to separate the best from the rest. To paraphrase a theme from my undergraduate days: "If you don't understand my code, it doesn't mean that it needs comments, it means you need to learn more about programming."
And it is true that the best programmers I have worked with do have an amazing ability to "think in code." Six months after they had cranked out 10,000 lines of code, you could go back, point to any one of those lines, and ask them why it was there. And without hesitation, they would tell you. The trouble is that people with that level of ability are rare. I've encountered only a few in my roughly 30 years of software development experience. The reality of software development is that there is a much larger class of programmers who are good, but not that good. And unless you have had the immense good fortune to have a development team composed of nothing but programming ninjas, then your software development processes have to be geared to that broader class of software developers.
Two recent experiences really brought this into focus for me. The first relates to a relatively simple piece of code. The code itself was simple to understand, but what the code could not communicate was why the code existed at all. The second instance involved the value of working with formalisms other than code: in this particular case, finite state machines.
The first example came to mind because I was recently looking over the code base for my company's products. I came across a fairly simple piece of code that was part of our base implementation of the factory pattern. Whenever a factory creates an object, it stores it in a list. There is an entry point into the factory that then traverses this list and calls into our underlying storage system for each object in the list. The code is self-documenting in the sense that it is completely obvious what it is doing. On the other hand, it is completely unclear why it is doing it. I knew that the storage sub-system handles object creation automatically, so simply instantiating the object in the factory should have been sufficient.
I went and asked the developer who had written the code about the extra traversal of the list. At first, even he had trouble recalling why we had added this bit of code. Eventually, between the two of us, we managed to recall that we had originally written the factory in the straightforward manner, and it hadn't worked. Although the storage sub-system did automatically handle the creation of the objects, we had run into a problem with database constraints being violated because of the way our factories work: you create the object first and then initialize various properties. Unfortunately, the storage sub-system was inserting the objects into the DB immediately upon creation. Since the properties of the object had not been set at that point, certain types of constraints were violated by that insertion, typically constraints that required certain properties to be non-null.
So the additional traversal of the list of created objects in the factory was part of a mechanism to delay inserting the objects into the DB until after the properties had been set. What this experience brought home for me is that if you have well-written code, you can easily understand what the code is doing. However, even the best-written code can't reveal why it is doing it. That's because the question of "why" is not centered on the code itself, but on the context it operates in and the design decisions made during the development of the system. The best way to communicate those ideas is not code, but comments and design documents. To me this clearly demonstrated that it is not just code that has value.
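The mechanism described above might look roughly like the following sketch. This is my own Python reconstruction, not the article's actual code; the names (`Factory`, `flush_created`, `Widget`) and the stand-in storage class are all illustrative assumptions.

```python
class FakeStorage:
    """Stand-in for the storage sub-system described in the text.

    In the real system, inserting an object before its properties were
    set violated non-null database constraints; here we just check.
    """
    def insert(self, obj):
        if obj.name is None:
            raise ValueError("constraint violated: name must be non-null")


class Widget:
    def __init__(self):
        self.name = None  # properties are set *after* creation


class Factory:
    """Creates objects but defers storage until properties are set."""
    def __init__(self, storage):
        self._storage = storage
        self._created = []  # the list the factory code traverses

    def create(self):
        obj = Widget()
        self._created.append(obj)  # remember it; do NOT insert yet
        return obj

    def flush_created(self):
        """Entry point that traverses the list and inserts each object.

        Called only after the caller has initialized the properties,
        so the DB constraints are satisfied at insertion time.
        """
        for obj in self._created:
            self._storage.insert(obj)
        self._created.clear()


# Usage: create, initialize, then flush -- insertion happens last.
factory = Factory(FakeStorage())
w = factory.create()
w.name = "gadget"        # satisfy the non-null constraint first
factory.flush_created()  # now the insert succeeds
```

Note that the code is perfectly clear about *what* it does, and yet the docstring on `flush_created` is the only place the *why* survives, which is exactly the point.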
I won't go into the second example in as much detail. Suffice it to say that we had embarked on doing some event-based programming. I knew from previous experience that finite state machines (FSMs) work well in these types of circumstances, and we created FSMs for various components. While working with the developers, it became immediately clear that the best way to discuss the FSMs with them was not to sit and look at the code implementing them, but to grab a sheet of paper and draw them. While it is true that the code implementing an FSM completely defines how that FSM behaves, looking at the code doesn't really give you an intuitive sense of what the FSM does. A diagram does. The graphic formalism adds value in this case that the code cannot.
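To see the gap between the two representations, here is a minimal event-driven FSM sketched in Python as a transition table. The states and events (a toy connection lifecycle) are my own illustration, not taken from the system described above.

```python
# A minimal event-driven FSM expressed as a transition table.
# States and events are illustrative, not from the article's system.
TRANSITIONS = {
    ("idle", "connect"): "connecting",
    ("connecting", "success"): "connected",
    ("connecting", "failure"): "idle",
    ("connected", "disconnect"): "idle",
}


class StateMachine:
    def __init__(self, start="idle"):
        self.state = start

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.state = TRANSITIONS[key]


# The code fully defines the behavior, yet a hand-drawn diagram of
# these four arrows conveys it far more directly.
fsm = StateMachine()
fsm.handle("connect")
fsm.handle("success")
print(fsm.state)  # -> connected
```

Four dictionary entries define the machine completely, but most readers will still reach for pencil and paper to see the cycle between `idle` and `connected`.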
I've been through enough experiences like this that I simply don't believe the "only code has value" proposition. Clearly code is the core value product of the software development process. But it's not the only thing of value, and if we want to maintain and extend our code, we need to gear our software development processes accordingly.
Originally published in ACM Queue vol. 5, no. 6.
The unreasonable persistence of the self-documenting code (SDC) fallacy has long puzzled me too, because it is so clearly and simply refuted. Undergraduate beliefs and pretensions should not be taken seriously, but when experienced professional developers express a belief in the fallacy, the foremost question in my mind is: what are they thinking?
For example, when needing to use a library function or system call they are unfamiliar with, do they read its code to determine its arguments, return values, exceptions thrown, thread-safety, and semantics? Of course not! If they are poor programmers, they will try to get by using examples and trial-and-error, while better programmers will read the documentation. (For that matter, how did they decide that the function in question was the one they needed?) Perhaps SDC believers are all applications programmers who think of the operating system and language-supplied standard libraries as some sort of deus ex machina that is not the result of programming as they understand it, and so do not recognize the contradiction between their beliefs and actions. I doubt that such a person could be much of a programmer, as libraries and other modular abstractions have an important part to play in well-designed applications as well as systems software.
In addition to wondering what they are thinking when writing code, I wonder if SDC believers ever think about the process of programming, and in particular, their own programming. Such introspection is an important part of improving one's skills, but if a programmer thinks his code is good enough that it has no need for documentation, perhaps he thinks his skills have no need of improvement. The Dunning-Kruger effect is rampant in programming; Linus Torvalds remarked that 95% of programmers think they are in the top 5%, and the rest think they are above average.
There is a school of thought, born of bitter experience, that says all comments and other documentation should be ignored, as it is rarely updated when the code is changed. Superficially, this is an argument for all code being self-documenting, but it does not follow that it can be. Furthermore, this argument fails in view of the fact that identifiers can also become misleading if there are changes in the semantics of the variables and functions they denote. For software to remain intrinsically self-documenting under maintenance, therefore, it would have to be self-documenting under arbitrary identifier renaming, which is (I hope) obviously infeasible.
I have a challenge for SDC believers: write a self-documenting implementation of the quicksort algorithm in the language of your choice that is superior in any aspect to a combination of code and comments. The self-documented features of such an implementation should include a proof of correctness and of its O(n log n) average-case running time. To anyone who says that these two issues are known properties of the algorithm, I reply: in what documentation did you read that?
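For concreteness, here is my own sketch of what the code-plus-comments combination the challenge refers to might look like, in Python. The proof obligations live in the docstring, which is precisely the information the self-documenting view claims the code can carry on its own.

```python
def quicksort(xs):
    """Return a sorted copy of xs.

    Correctness (sketch): the recursion terminates because each call
    operates on a strictly shorter list, and the result is sorted
    because every element of `lesser` <= pivot <= every element of
    `greater`, and both halves are sorted by induction.

    Complexity: O(n log n) on average; O(n^2) in the worst case
    (e.g., already-sorted input with this first-element pivot).
    Neither fact is visible from the code alone.
    """
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    lesser = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    return quicksort(lesser) + [pivot] + quicksort(greater)


print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # -> [1, 1, 2, 3, 4, 5, 6, 9]
```

Strip the docstring and the code still tells you exactly what happens to each element; it tells you nothing about termination, ordering invariants, or expected running time.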