
Kode Vicious


Hickory Dickory Doc

On null encryption and automated documentation


George Neville-Neil

Dear KV,

While reviewing some encryption code in our product, I came across an option that allowed for null encryption. This means the encryption could be turned on, but the data would never be encrypted or decrypted; it would always be stored "in the clear." I removed the option from our latest source tree because I figured we didn't want an unsuspecting user to turn on encryption but still have data stored in the clear. One of the other programmers on my team reviewed the potential change and blocked me from committing it, saying that the null code could be used for testing. I disagreed with her, since I think the risk of someone accidentally using the null option outweighs its value in testing. Which of us is right?

NULL for Naught


Dear NULL,

I hope you're not surprised to hear me say that she who blocked your commit is right. I've written quite a bit about the importance of testing and I believe that crypto systems are critical enough to require extra attention. In fact, there is an important role that a null encryption option can play in testing a crypto system.

Most systems that work with cryptography are not single programs, but are actually frameworks into which different cryptographic algorithms can be placed, either at build or run time. Cryptographic algorithms are also well known for requiring a great deal of processor resources, so much so that specialized chips and CPU instructions have been produced to increase the speed of cryptographic operations. If you have a crypto framework and it doesn't have a null operation, one that takes little or no time to complete, how do you measure the overhead introduced by the framework itself? I understand that establishing a baseline measurement is not common practice in performance analysis, an understanding I have come to while banging my fist on my desk and screaming obscenities. I often think that programmers should be given not offices or cubicles, but padded cells. Think of how much the company would save on medical bills if everyone had a cushioned wall to bang their heads against, instead of those cheap, pressboard desks that crack so easily.

Having a set of null crypto methods allows you and your team to test two parts of your system in near isolation. Make a change to the framework and you can determine if that has sped up or slowed down the framework overall. Add in a real set of cryptographic operations, and you will then be able to measure the effect the change has on the end user. You may be surprised to find that your change to the framework did not speed up the system overall, as it may be that the overhead induced by the framework is quite small. But you cannot find this out if you remove the null crypto algorithm.
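The argument above is easy to make concrete. Below is a minimal sketch, in Python, of a pluggable crypto framework; every name in it (CryptoFramework, NullCipher, and the deliberately toy XorCipher) is invented for illustration, and the XOR "cipher" is of course not a real algorithm. Timing the framework with the null cipher plugged in isolates the framework's own overhead; swapping in a real cipher then shows the end-to-end cost.

```python
import time

class NullCipher:
    """Identity transform: measures framework overhead; never ship it enabled."""
    def encrypt(self, data: bytes) -> bytes:
        return data
    def decrypt(self, data: bytes) -> bytes:
        return data

class XorCipher:
    """Toy stand-in for a real algorithm (NOT secure; illustration only)."""
    def __init__(self, key: int):
        self.key = key
    def encrypt(self, data: bytes) -> bytes:
        return bytes(b ^ self.key for b in data)
    decrypt = encrypt  # XOR is its own inverse

class CryptoFramework:
    """The pluggable layer whose own cost we want to isolate."""
    def __init__(self, cipher):
        self.cipher = cipher
    def process(self, data: bytes) -> bytes:
        # Framework bookkeeping (validation, dispatch) happens here.
        if not isinstance(data, bytes):
            raise TypeError("expected bytes")
        return self.cipher.encrypt(data)

def overhead(framework: CryptoFramework, payload: bytes, rounds: int = 1_000) -> float:
    """Wall-clock time for `rounds` trips through the framework."""
    start = time.perf_counter()
    for _ in range(rounds):
        framework.process(payload)
    return time.perf_counter() - start

# Baseline with the null cipher vs. the same framework with a "real" one:
baseline = overhead(CryptoFramework(NullCipher()), b"x" * 256)
with_real = overhead(CryptoFramework(XorCipher(0x5A)), b"x" * 256)
```

If a change to CryptoFramework.process barely moves the baseline number, the framework's overhead was already small, which is precisely the kind of thing you cannot learn once the null algorithm is gone.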

More broadly, any framework needs to be tested as much as it can be in the absence of the operations that are embedded within it. Comparing the performance of network sockets on a dedicated loopback interface, which removes all of the vagaries of hardware, can help establish a baseline showing the overhead of the network protocol code itself. A null disk can show the overhead present in file-system code. Replacing database calls with simple functions to throw away data and return static answers to queries will show you how much overhead there is in your web and database framework.
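The same baseline trick applied to the database case might look like the following sketch; real_query, null_query, and handle_request are hypothetical stand-ins, with a sleep playing the part of a real database round trip.

```python
import time

def real_query(sql: str) -> list:
    """Placeholder for an actual database round trip (hypothetical)."""
    time.sleep(0.001)  # stand-in for network and query latency
    return [("alice",), ("bob",)]

def null_query(sql: str) -> list:
    """Throws away the query and returns a canned answer instantly."""
    return [("alice",), ("bob",)]

def handle_request(query_fn) -> str:
    """The web/database 'framework' layer we want to measure in isolation."""
    rows = query_fn("SELECT name FROM users")
    return ",".join(name for (name,) in rows)

def timed(fn, *args, rounds: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(rounds):
        fn(*args)
    return time.perf_counter() - start

framework_only = timed(handle_request, null_query)  # framework overhead alone
end_to_end = timed(handle_request, real_query)      # includes the "database"
```

The gap between the two numbers is the cost of the database itself; the first number alone is the cost of everything wrapped around it.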

Far too often we try to optimize systems without sufficiently breaking them down or separating out the parts. Complex systems give rise to complex measurements, and if you cannot reason about the constituent parts, you definitely cannot reason about the whole, and anyone who claims they can is bullshitting you.

KV



Dear KV,

What do you think of systems such as Doxygen that generate documentation from code? Can they replace handwritten documentation in a project?

Dickering with Docs


Dear Dickering,

I'm not quite sure what you mean by "handwritten" documentation. Unless you have some sort of fancy mental interface to your computer that I have not yet heard of, any documentation, whether in code or elsewhere, is handwritten or at least typed by hand. If you're using anything else to type on your keyboard, please, keep it to yourself.

I believe what you're actually asking is whether systems that can parse code and extract documentation are helpful, to which my answer is, "Yes, but..."

Any sort of documentation extraction system has to have something to work with to start. If you believe that extracting all of the function calls and parameters from a piece of code is sufficient to be called documentation, then you are dead wrong, but, unfortunately, you would not be alone in your beliefs. Alas, having beliefs in common with others does not make those beliefs right. What you will get from Doxygen on a typical, uncommented code base is not even worth the term "API guide"; it is actually the equivalent of running a fancy grep over the code and piping the result to a text-formatting system such as TeX or troff.

For code to be considered documented there must be some set of expository words associated with it. Function and variable names, descriptive as they might be, rarely explain the important concepts hiding in the code, such as "What does this damnable thing actually do?" Many programmers claim their code is self-documenting, but, in point of fact, self-documented code is so rare that I am more hopeful of seeing a unicorn giving a ride to a manticore on the way to a bar. In fact, if I ever do see this I will be both less surprised and quite happy, because it will mean that I'm in an excellent frame of mind. The claim of self-documenting code is simply a cover-up for laziness. At this point, most programmers have nice keyboards and should be able to type at 40 to 60 words per minute, and some of those words can easily be spared for actual documentation. It's not like we're typing on ancient line-printing terminals.

The advantage you get from a system like Doxygen is that it provides a consistent framework in which to write the documentation. Setting off the expository text from the code is simple and easy, and this helps in encouraging people to comment their code. The next step is to convince people to make sure that their code matches the comments. Stale comments are sometimes worse than none at all because they can misdirect you when looking for a bug in the code. "But it says it does X!" is not what you want to hear yourself screaming after hours of staring at a piece of code and its concomitant comment.
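For the curious, such a set-off comment block might look like the following; the function itself is invented for illustration, and the syntax shown is Doxygen's convention for Python, where comment blocks beginning with ## are extracted (in C or C++ the equivalent is a /** ... */ block with the same @param and @return commands).

```python
## @brief Compute the retry delay for a failed request.
#
#  Exponential backoff with a fixed cap, so repeated failures
#  do not hammer the server. This expository prose is what the
#  extractor pulls out, and what a bare parameter listing can
#  never replace.
#
#  @param attempt  Zero-based count of failures so far.
#  @param base     Initial delay in seconds.
#  @param cap      Upper bound on the delay in seconds.
#  @return         Delay in seconds before the next attempt.
def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    return min(cap, base * (2 ** attempt))
```

Note that the comment explains *why* the function exists and how it behaves at the limits, not merely what its parameters are named; that is the difference between documentation and a fancy grep.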

Even with a semi-automatic documentation extraction system, you still need to write documentation, because an API guide is not a manual, even for the lowest level of software. How the documented APIs come together to form a complete system, and how they should and should not be used, are two hallmarks of good documentation, and they are exactly what is lacking in the poorer kind. Once upon a time I worked for a company whose product was relatively low level and technical. We had automatic documentation extraction, which is a wonderful first step, but we also had an excellent documentation team. That team took the raw material extracted from the code and then extracted, sometimes gently and sometimes not so gently, the requisite information from the company's developers so that they could not only edit the API guide but also write the relevant higher-level documentation that made the product actually usable for those who had not written it.

Yes, automatic documentation extraction is a benefit, but it is not the entire solution to the problem. Good documentation requires tools and processes that are followed rigorously in order to produce something of value both to those who produced it and to those who have to consume it.


KV


LOVE IT, HATE IT? LET US KNOW

feedback@queue.acm.org


Kode Vicious, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. George is the co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. He is an avid bicyclist and traveler who currently lives in New York City.

© 2015 ACM 1542-7730/15/0500 $10.00


Originally published in Queue vol. 13, no. 6




Have a question for Kode Vicious? E-mail him at kv@acmqueue.com. If your question appears in his column, we'll send you a rare piece of authentic Queue memorabilia. We edit e-mails for style, length, and clarity.


Related:

Ivar Jacobson, Ian Spence, Ed Seidewitz - Industrial Scale Agile - from Craft to Engineering
Essence is instrumental in moving software development toward a true engineering discipline.


Andre Medeiros - Dynamics of Change: Why Reactivity Matters
Tame the dynamics of change by centralizing each concern in its own module.


Brendan Gregg - The Flame Graph
This visualization of software execution is a new necessity for performance profiling and debugging.


Ivar Jacobson, Ian Spence, Brian Kerr - Use-Case 2.0
The Hub of Software Development



Comments

(newest first)

ARaybould | Sun, 28 Jun 2015 13:26:09 UTC

It is a false dichotomy to conclude that because being able to test with null encryption is important, we should accept that it is OK to ship code that could have null encryption enabled in the field. While your reply is narrowly correct, I would like to see the several paragraphs that are devoted to explaining the usefulness of null testing support being balanced with a discussion about how this could be done while protecting against it being released. Off the top of my head, I would probably go with conditional compilation (assuming that's an option here; I am guessing it is) that also puts a secret token in the binary so that release builds can be scanned for its accidental use.

I am aware that 'test what you ship' is a wise principle, but I do not think it needs to encompass all testing, and particularly not the testing of or using features that are not intended for field use.








