
Kode Vicious


Hickory Dickory Doc

On null encryption and automated documentation


George Neville-Neil

Dear KV,

While reviewing some encryption code in our product, I came across an option that allowed for null encryption. This means the encryption could be turned on, but the data would never be encrypted or decrypted; it would always be stored "in the clear." I removed the option from our latest source tree because I figured we didn't want an unsuspecting user to turn on encryption but still have data stored in the clear. One of the other programmers on my team reviewed the potential change and blocked me from committing it, saying that the null code could be used for testing. I disagreed with her, since I think the risk of someone accidentally using the null option outweighs its value in testing. Which of us is right?

NULL for Naught


Dear NULL,

I hope you're not surprised to hear me say that she who blocked your commit is right. I've written quite a bit about the importance of testing and I believe that crypto systems are critical enough to require extra attention. In fact, there is an important role that a null encryption option can play in testing a crypto system.

Most systems that work with cryptography are not single programs, but are actually frameworks into which different cryptographic algorithms can be placed, either at build or run time. Cryptographic algorithms are also well known for requiring a great deal of processor resources, so much so that specialized chips and CPU instructions have been produced to increase the speed of cryptographic operations. If you have a crypto framework and it doesn't have a null operation, one that takes little or no time to complete, how do you measure the overhead introduced by the framework itself? I understand that establishing a baseline measurement is not common practice in performance analysis, an understanding I have come to while banging my fist on my desk and screaming obscenities. I often think that programmers should be given not offices or cubicles, but padded cells. Think of how much the company would save on medical bills if everyone had a cushioned wall to bang their heads against, instead of those cheap, pressboard desks that crack so easily.

Having a set of null crypto methods allows you and your team to test two parts of your system in near isolation. Make a change to the framework and you can determine if that has sped up or slowed down the framework overall. Add in a real set of cryptographic operations, and you will then be able to measure the effect the change has on the end user. You may be surprised to find that your change to the framework did not speed up the system overall, as it may be that the overhead induced by the framework is quite small. But you cannot find this out if you remove the null crypto algorithm.
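The argument above is easy to make concrete. Below is a minimal sketch, in Python, of a pluggable crypto framework; every name in it (CryptoFramework, NullCipher, and the deliberately toy XorCipher) is invented for illustration, and the XOR "cipher" is of course not a real algorithm. Timing the framework with the null cipher plugged in isolates the framework's own overhead; swapping in a real cipher then shows the end-to-end cost.

```python
import time

class NullCipher:
    """Identity transform: measures framework overhead; never ship it enabled."""
    def encrypt(self, data: bytes) -> bytes:
        return data
    def decrypt(self, data: bytes) -> bytes:
        return data

class XorCipher:
    """Toy stand-in for a real algorithm (NOT secure; illustration only)."""
    def __init__(self, key: int):
        self.key = key
    def encrypt(self, data: bytes) -> bytes:
        return bytes(b ^ self.key for b in data)
    decrypt = encrypt  # XOR is its own inverse

class CryptoFramework:
    """The pluggable layer whose own cost we want to isolate."""
    def __init__(self, cipher):
        self.cipher = cipher
    def process(self, data: bytes) -> bytes:
        # Framework bookkeeping (validation, dispatch) happens here.
        if not isinstance(data, bytes):
            raise TypeError("expected bytes")
        return self.cipher.encrypt(data)

def overhead(framework: CryptoFramework, payload: bytes, rounds: int = 1_000) -> float:
    """Wall-clock time for `rounds` trips through the framework."""
    start = time.perf_counter()
    for _ in range(rounds):
        framework.process(payload)
    return time.perf_counter() - start

# Baseline with the null cipher vs. the same framework with a "real" one:
baseline = overhead(CryptoFramework(NullCipher()), b"x" * 256)
with_real = overhead(CryptoFramework(XorCipher(0x5A)), b"x" * 256)
```

If a change to CryptoFramework.process barely moves the baseline number, the framework's overhead was already small, which is precisely the kind of thing you cannot learn once the null algorithm is gone.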

More broadly, any framework needs to be tested as much as it can be in the absence of the operations that are embedded within it. Comparing the performance of network sockets on a dedicated loopback interface, which removes all of the vagaries of hardware, can help establish a baseline showing the overhead of the network protocol code itself. A null disk can show the overhead present in file-system code. Replacing database calls with simple functions to throw away data and return static answers to queries will show you how much overhead there is in your web and database framework.
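The same baseline trick applied to the database case might look like the following sketch; real_query, null_query, and handle_request are hypothetical stand-ins, with a sleep playing the part of a real database round trip.

```python
import time

def real_query(sql: str) -> list:
    """Placeholder for an actual database round trip (hypothetical)."""
    time.sleep(0.001)  # stand-in for network and query latency
    return [("alice",), ("bob",)]

def null_query(sql: str) -> list:
    """Throws away the query and returns a canned answer instantly."""
    return [("alice",), ("bob",)]

def handle_request(query_fn) -> str:
    """The web/database 'framework' layer we want to measure in isolation."""
    rows = query_fn("SELECT name FROM users")
    return ",".join(name for (name,) in rows)

def timed(fn, *args, rounds: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(rounds):
        fn(*args)
    return time.perf_counter() - start

framework_only = timed(handle_request, null_query)  # framework overhead alone
end_to_end = timed(handle_request, real_query)      # includes the "database"
```

The gap between the two numbers is the cost of the database itself; the first number alone is the cost of everything wrapped around it.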

Far too often we try to optimize systems without sufficiently breaking them down or separating out the parts. Complex systems give rise to complex measurements, and if you cannot reason about the constituent parts, you definitely cannot reason about the whole, and anyone who claims they can is bullshitting you.

KV



Dear KV,

What do you think of systems such as Doxygen that generate documentation from code? Can they replace handwritten documentation in a project?

Dickering with Docs


Dear Dickering,

I'm not quite sure what you mean by "handwritten" documentation. Unless you have some sort of fancy mental interface to your computer that I have not yet heard of, any documentation, whether in code or elsewhere, is handwritten or at least typed by hand. If you're using anything else to type on your keyboard, please, keep it to yourself.

I believe what you're actually asking is whether systems that can parse code and extract documentation are helpful, to which my answer is, "Yes, but..."

Any sort of documentation extraction system has to have something to work with to start. If you believe that extracting all of the function calls and parameters from a piece of code is sufficient to be called documentation, then you are dead wrong, but, unfortunately, you would not be alone in your beliefs. Alas, having beliefs in common with others does not make those beliefs right. What you will get from Doxygen on a typical, uncommented code base is not even worth the term "API guide"; it is actually the equivalent of running a fancy grep over the code and piping the result to a text-formatting system such as TeX or troff.

For code to be considered documented there must be some set of expository words associated with it. Function and variable names, descriptive as they might be, rarely explain the important concepts hiding in the code, such as "What does this damnable thing actually do?" Many programmers claim their code is self-documenting, but, in point of fact, self-documented code is so rare that I am more hopeful of seeing a unicorn giving a ride to a manticore on the way to a bar. In fact, if I ever do see this I will be both less surprised and quite happy, because it will mean that I'm in an excellent frame of mind. The claim of self-documenting code is simply a cover-up for laziness. At this point, most programmers have nice keyboards and should be able to type at 40 to 60 words per minute, and some of those words can easily be spared for actual documentation. It's not like we're typing on ancient line-printing terminals.

The advantage you get from a system like Doxygen is that it provides a consistent framework in which to write the documentation. Setting off the expository text from the code is simple and easy, and this helps in encouraging people to comment their code. The next step is to convince people to make sure that their code matches the comments. Stale comments are sometimes worse than none at all because they can misdirect you when looking for a bug in the code. "But it says it does X!" is not what you want to hear yourself screaming after hours of staring at a piece of code and its concomitant comment.
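For the curious, such a set-off comment block might look like the following; the function itself is invented for illustration, and the syntax shown is Doxygen's convention for Python, where comment blocks beginning with ## are extracted (in C or C++ the equivalent is a /** ... */ block with the same @param and @return commands).

```python
## @brief Compute the retry delay for a failed request.
#
#  Exponential backoff with a fixed cap, so repeated failures
#  do not hammer the server. This expository prose is what the
#  extractor pulls out, and what a bare parameter listing can
#  never replace.
#
#  @param attempt  Zero-based count of failures so far.
#  @param base     Initial delay in seconds.
#  @param cap      Upper bound on the delay in seconds.
#  @return         Delay in seconds before the next attempt.
def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    return min(cap, base * (2 ** attempt))
```

Note that the comment explains *why* the function exists and how it behaves at the limits, not merely what its parameters are named; that is the difference between documentation and a fancy grep.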

Even with a semi-automatic documentation extraction system, you still need to write documentation, because an API guide is not a manual, even for the lowest level of software. How the documented APIs come together to form a complete system, and how they should and should not be used, are two hallmarks of good documentation, and they are exactly what is lacking in the poorer kind. Once upon a time I worked for a company whose product was relatively low level and technical. We had automatic documentation extraction, which is a wonderful first step, but we also had an excellent documentation team. That team took the raw material extracted from the code and then extracted, sometimes gently and sometimes not so gently, the requisite information from the company's developers so that they could not only edit the API guide but also write the relevant higher-level documentation that made the product actually usable for those who had not written it.

Yes, automatic documentation extraction is a benefit, but it is not the entire solution to the problem. Good documentation requires tools and processes that are followed rigorously in order to produce something of value both to those who produced it and to those who have to consume it.


KV


LOVE IT, HATE IT? LET US KNOW

feedback@queue.acm.org


Kode Vicious, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. George is the co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. He is an avid bicyclist and traveler who currently lives in New York City.

© 2015 ACM 1542-7730/15/0500 $10.00


Originally published in Queue vol. 13, no. 6




Have a question for Kode Vicious? E-mail him at kv@acmqueue.com. If your question appears in his column, we'll send you a rare piece of authentic Queue memorabilia. We edit e-mails for style, length, and clarity.


Related:

Ivar Jacobson, Ian Spence, Ed Seidewitz - Industrial Scale Agile - from Craft to Engineering
Essence is instrumental in moving software development toward a true engineering discipline.


Andre Medeiros - Dynamics of Change: Why Reactivity Matters
Tame the dynamics of change by centralizing each concern in its own module.


Brendan Gregg - The Flame Graph
This visualization of software execution is a new necessity for performance profiling and debugging.


Ivar Jacobson, Ian Spence, Brian Kerr - Use-Case 2.0
The Hub of Software Development



Comments

(newest first)

ARaybould | Sun, 28 Jun 2015 13:26:09 UTC

It is a false dichotomy to conclude that because being able to test with null encryption is important, we should accept that it is OK to ship code that could have null encryption enabled in the field. While your reply is narrowly correct, I would like to see the several paragraphs that are devoted to explaining the usefulness of null testing support being balanced with a discussion about how this could be done while protecting against it being released. Off the top of my head, I would probably go with conditional compilation (assuming that's an option here; I am guessing it is) that also puts a secret token in the binary so that release builds can be scanned for its accidental use.

I am aware that 'test what you ship' is a wise principle, but I do not think it needs to encompass all testing, and particularly not the testing of or using features that are not intended for field use.








