
Kode Vicious



Hickory Dickory Doc

On null encryption and automated documentation

George Neville-Neil

Dear KV,

While reviewing some encryption code in our product, I came across an option that allowed for null encryption. This means the encryption could be turned on, but the data would never be encrypted or decrypted. It would always be stored "in the clear." I removed the option from our latest source tree because I figured we didn't want an unsuspecting user to turn on encryption but still have data stored in the clear. One of the other programmers on my team reviewed the potential change and blocked me from committing it, saying that the null code could be used for testing. I disagreed with her, since I think that the risk of accidentally using the code is more important than a simple test. Which of us is right?

NULL for Naught

Dear NULL,

I hope you're not surprised to hear me say that she who blocked your commit is right. I've written quite a bit about the importance of testing and I believe that crypto systems are critical enough to require extra attention. In fact, there is an important role that a null encryption option can play in testing a crypto system.

Most systems that work with cryptography are not single programs, but are actually frameworks into which different cryptographic algorithms can be placed, either at build or run time. Cryptographic algorithms are also well known for requiring a great deal of processor resources, so much so that specialized chips and CPU instructions have been produced to increase the speed of cryptographic operations. If you have a crypto framework and it doesn't have a null operation, one that takes little or no time to complete, how do you measure the overhead introduced by the framework itself? I understand that establishing a baseline measurement is not common practice in performance analysis, an understanding I have come to while banging my fist on my desk and screaming obscenities. I often think that programmers shouldn't just be given offices instead of cubicles, but padded cells. Think of how much the company would save on medical bills if everyone had a cushioned wall to bang their heads against, instead of those cheap, pressboard desks that crack so easily.

Having a set of null crypto methods allows you and your team to test two parts of your system in near isolation. Make a change to the framework and you can determine whether it has sped up or slowed down the framework overall. Add in a real set of cryptographic operations, and you will then be able to measure the effect the change has on the end user. You may be surprised to find that your change to the framework did not speed up the system overall, as it may be that the overhead induced by the framework is quite small. But you cannot find this out if you remove the null crypto algorithm.
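To make the baseline idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Cipher interface, the frame_and_encrypt function standing in for "the framework," and ToyXorCipher, which is not a real cryptographic algorithm and exists only to burn some CPU per byte. The point is the shape of the measurement, not the code being measured.

```python
import time
from typing import Protocol


class Cipher(Protocol):
    def encrypt(self, data: bytes) -> bytes: ...
    def decrypt(self, data: bytes) -> bytes: ...


class NullCipher:
    """Does no work: data passes through unchanged. For tests and baselines only."""

    def encrypt(self, data: bytes) -> bytes:
        return data

    def decrypt(self, data: bytes) -> bytes:
        return data


class ToyXorCipher:
    """Stand-in for a real algorithm. NOT secure; it only costs CPU per byte."""

    def __init__(self, key: int = 0x5A) -> None:
        self.key = key

    def encrypt(self, data: bytes) -> bytes:
        return bytes(b ^ self.key for b in data)

    decrypt = encrypt  # XOR with a fixed key is its own inverse


def frame_and_encrypt(cipher: Cipher, records: list[bytes]) -> list[bytes]:
    """Hypothetical framework path: encrypt each record, prepend a length header."""
    out = []
    for record in records:
        body = cipher.encrypt(record)
        out.append(len(body).to_bytes(4, "big") + body)
    return out


def time_run(cipher: Cipher, records: list[bytes], repeats: int = 3) -> float:
    """Best-of-N wall-clock time, in seconds, for one pass over the records."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        frame_and_encrypt(cipher, records)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    records = [bytes(1024)] * 500
    baseline = time_run(NullCipher(), records)  # framework overhead alone
    total = time_run(ToyXorCipher(), records)   # framework plus algorithm
    print(f"framework only:     {baseline:.4f}s")
    print(f"framework + cipher: {total:.4f}s")
    print(f"cipher share:       {total - baseline:.4f}s")
```

Change frame_and_encrypt and rerun with NullCipher to see what the change did to the framework; rerun with the real algorithm to see whether anyone downstream would notice.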

More broadly, any framework needs to be tested as much as it can be in the absence of the operations that are embedded within it. Comparing the performance of network sockets on a dedicated loopback interface, which removes all of the vagaries of hardware, can help establish a baseline showing the overhead of the network protocol code itself. A null disk can show the overhead present in file-system code. Replacing database calls with simple functions to throw away data and return static answers to queries will show you how much overhead there is in your web and database framework.
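The database case can be sketched the same way. The Database interface, the NullDatabase class, and the handle_request function below are all invented for illustration and assume the application talks to its database through a narrow interface that can be swapped out.

```python
from typing import Any, Protocol


class Database(Protocol):
    def insert(self, table: str, row: dict[str, Any]) -> None: ...
    def query(self, sql: str, params: tuple = ()) -> list[dict[str, Any]]: ...


class NullDatabase:
    """Discards every write and answers every query with the same canned rows."""

    def __init__(self, canned_rows: list[dict[str, Any]]) -> None:
        self.canned_rows = canned_rows

    def insert(self, table: str, row: dict[str, Any]) -> None:
        pass  # throw the data away

    def query(self, sql: str, params: tuple = ()) -> list[dict[str, Any]]:
        return self.canned_rows  # static answer, whatever the question


def handle_request(db: Database, user_id: int) -> str:
    """Hypothetical request handler: record the visit, look up the user, render."""
    db.insert("visits", {"user_id": user_id})
    rows = db.query("SELECT name FROM users WHERE id = ?", (user_id,))
    return f"Hello, {rows[0]['name']}!"


if __name__ == "__main__":
    db = NullDatabase([{"name": "KV"}])
    print(handle_request(db, 42))
```

Timing handle_request against NullDatabase gives the cost of everything except the database; the difference from a run against the real thing is the database's share.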

Far too often we try to optimize systems without sufficiently breaking them down or separating out the parts. Complex systems give rise to complex measurements, and if you cannot reason about the constituent parts, you definitely cannot reason about the whole, and anyone who claims they can is bullshitting you.


Dear KV,

What do you think of systems such as Doxygen that generate documentation from code? Can they replace handwritten documentation in a project?

Dickering with Docs

Dear Dickering,

I'm not quite sure what you mean by "handwritten" documentation. Unless you have some sort of fancy mental interface to your computer that I have not yet heard of, any documentation, whether in code or elsewhere, is handwritten or at least typed by hand. If you're using anything else to type on your keyboard, please, keep it to yourself.

I believe what you're actually asking is if systems that can parse code and extract documentation are helpful, to which my answer is, "Yes, but..."

Any sort of documentation extraction system has to have something to work with to start. If you believe that extracting all of the function calls and parameters from a piece of code is sufficient to be called documentation, then you are dead wrong, but, unfortunately, you would not be alone in your beliefs. Alas, having beliefs in common with others does not make those beliefs right. What you will get from Doxygen on the typical, uncommented code base is not even worth the term "API guide"; it is actually the equivalent of running a fancy grep over the code and piping that to a text formatting system such as TeX or troff.

For code to be considered documented there must be some set of expository words associated with it. Function and variable names, descriptive as they might be, rarely explain the important concepts hiding in the code, such as, "What does this damnable thing actually do?" Many programmers claim their code is self-documenting, but, in point of fact, self-documented code is so rare that I am more hopeful of seeing a unicorn giving a ride to a manticore on the way to a bar. In fact, if I ever do see this I will be both less surprised and quite happy, because it will mean that I'm in an excellent frame of mind. The claim of self-documenting code is simply a cover-up for laziness. At this point, most programmers have nice keyboards and should be able to type at 40-60 words per minute, and some of those words can easily be spared for actual documentation. It's not like we're typing on ancient line-printing terminals.

The advantage you get from a system like Doxygen is that it provides a consistent framework in which to write the documentation. Setting off the expository text from the code is simple and easy, and this helps in encouraging people to comment their code. The next step is to convince people to make sure that their code matches the comments. Stale comments are sometimes worse than none at all because they can misdirect you when looking for a bug in the code. "But it says it does X!" is not what you want to hear yourself screaming after hours of staring at a piece of code and its concomitant comment.
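As a rough illustration of what "setting off the expository text" looks like, here is a small Python function carrying a Doxygen-style comment block. The module name framing.py and the function frame are made up for this sketch. The ## comment form and the @file, @brief, @param, @return, and @exception commands are ones Doxygen recognizes, though how they render depends on your Doxygen configuration.

```python
## @file framing.py
#  @brief Length-prefixed record framing (hypothetical example module).


## @brief Prepend a 4-byte big-endian length header to a record.
#
#  The header lets a reader find record boundaries in a byte stream
#  without scanning the payload for delimiters.
#
#  @param payload  The record body; may be empty.
#  @return The 4-byte header followed by the unchanged payload.
#  @exception ValueError If the payload is too long for a 4-byte length.
def frame(payload: bytes) -> bytes:
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload too large for a 4-byte length header")
    return len(payload).to_bytes(4, "big") + payload
```

The signature alone says "bytes in, bytes out." Only the comment says why the header exists, and no extraction tool can supply that.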

Even with a semi-automatic documentation extraction system, you still need to write documentation, because an API guide is not a manual, even for the lowest level of software. How the API’s documentation comes together to form a total system and how it should and should not be used are two important features in good documentation and are the things that are lacking in the poorer kind. Once upon a time I worked for a company whose product was relatively low level and technical. We had automatic documentation extraction, which is a wonderful first step, but we also had an excellent documentation team. That team took the raw material extracted from the code and then extracted, sometimes gently and sometimes not so gently, the requisite information from the company's developers so that they could not only edit the API guide, but then write the relevant higher-level documentation that made the product actually useable for those who had not written it.

Yes, automatic documentation extraction is a benefit, but it is not the entire solution to the problem. Good documentation requires tools and processes that are followed rigorously in order to produce something of value both to those who produced it and to those who have to consume it.




Kode Vicious, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. George is the co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. He is an avid bicyclist and traveler who currently lives in New York City.

© 2015 ACM 1542-7730/15/0500 $10.00


Originally published in Queue vol. 13, no. 6


