Finding More Than One Worm in the Apple

If you see something, say something.

Mike Bland

In February 2014 Apple revealed and fixed an SSL (Secure Sockets Layer) vulnerability that had gone undiscovered since the release of iOS 6.0 in September 2012. It left users vulnerable to man-in-the-middle attacks thanks to a short circuit in the SSL/TLS (Transport Layer Security) handshake algorithm introduced by the duplication of a goto statement. Since the discovery of this very serious bug, many people have written about potential causes. A close inspection of the code, however, reveals not only how a unit test could have been written to catch the bug, but also how to refactor the existing code to make the algorithm testable—as well as more clues to the nature of the error and the environment that produced it.

This article addresses five big questions about the SSL vulnerability: What was the bug (and why was it bad)? How did it happen (and how didn't it)? How could a test have caught it? Why didn't a test catch it? How can we fix the root cause?

What was the bug (and why was it bad)?

The Apple SSL vulnerability, formally labeled CVE-2014-1266, was produced by the inclusion of a spurious, unconditional goto statement that bypassed the final step in the SSL/TLS handshake algorithm. According to the National Vulnerability Database and the CVE (Common Vulnerabilities and Exposures) Standard Vulnerability Entry, the bug existed in the versions of iOS, OS X, and the Apple TV operating system shown in table 1.

These formal reports describe the bug as follows: "The SSLVerifySignedServerKeyExchange function in libsecurity_ssl/lib/sslKeyExchange.c in the Secure Transport feature in the Data Security component...does not check the signature in a TLS Server Key Exchange message, which allows man-in-the-middle attackers to spoof SSL servers by (1) using an arbitrary private key for the signing step or (2) omitting the signing step." The error is visible by searching for the function name within Apple's published open-source code and looking for this series of statements:

if ((err = SSLHashSHA1.update(
   &hashCtx, &signedParams)) != 0)
   goto fail;
   goto fail;

Those familiar with the C programming language will recognize that the first goto fail is bound to the if statement immediately preceding it; the second is executed unconditionally. This is because whitespace, used to nest conditional statements for human readability, is ignored by the C compiler; curly braces must enclose all statements bound to an if statement when more than one statement applies.

The other goto fail statements appearing throughout the algorithm are a common idiom in C for releasing resources when a function has encountered a fatal error prior to completion. In the flawed code, a successful update() call will result in an unconditional jump to the end of the function, before the final step of the algorithm; and the return value will indicate the handshake was successful. In essence, the algorithm gets short-circuited.

For users of Apple's Safari and other Secure Transport-based applications on the affected platforms, "secure" connections were vulnerable to man-in-the-middle attacks, whereby an attacker in a position to relay messages from a client to a "secure" server across the Internet can impersonate the server and intercept all communications after the bogus handshake. (Users of products incorporating their own SSL/TLS implementations, such as Google Chrome and Mozilla Firefox, were not affected.) Though it is unknown whether this vulnerability was ever exploited, it rendered hundreds of millions of devices (and users) vulnerable over the course of 17 months.

Apple was criticized for patching the vulnerability for iOS devices and Apple TV on Friday, February 21, 2014, making knowledge of the vulnerability widespread, but delaying the patch for OS X Mavericks until the following Tuesday. This four-day window left users who weren't aware of the iOS patch vulnerable to a now very public exploit.

Table 1: Schedule of affected systems and security updates

How did it happen (and how didn't it)?

Many have noted apparently missing factors that could have caught the bug. Coding standards—specifically those enforcing the use of indentation and curly braces—combined with automated style-checking tools and code reviews, might have drawn attention to the repeated statement. An automated merge may have produced the offending extra line, and the developer may have lacked sufficient experience to detect it. Had coverage data been collected, it would have highlighted unreachable code. Compiler and static-analysis warnings also could have detected the unreachable code, though false warnings might have drowned out the signal if such tools weren't already being used regularly.

Others noted that the code appears to lack unit tests, which likely would have caught the bug. While many of the other tools and practices might have been sufficient to prevent this specific vulnerability, a deeper problem, which ultimately produced the repeated goto statement, would have been prevented by proper unit-testing discipline.

Some question whether adequate system testing took place, while others argue that because system testing can't find every bug, this was merely an unfortunate case of one that happened to slip through. Others claim use of the goto statement and/or deliberate sabotage is to blame. None of these claims stands up to scrutiny.

Goto Not "Considered Harmful"

Since it's one of the more popular theories, let's dispatch the argument that the use of goto is to blame for this vulnerability. Many have referred to the popular notion that goto is "considered harmful," based on Edsger Dijkstra's letter published in the March 1968 Communications of the ACM. This is what Dijkstra actually said in "A Case against the GO TO Statement": "I do not claim that the clauses mentioned are exhaustive in the sense that they will satisfy all needs; but whatever clauses are suggested (e.g., abortion clauses) they should satisfy the requirement that a programmer-independent coordinate system can be maintained to describe the process in a helpful and manageable way."9 In other words, "abortion clauses" to release a function's resources may still rely on goto, absent other direct language support.

This C language "abortion clause" idiom is legitimate and well understood, and is directly supported by other languages. For example, in C++, automatic destructors implement the RAII (Resource Acquisition Is Initialization) idiom; Java employs try/catch/finally blocks; Go provides the defer(), panic(), and recover() mechanisms; and Python has try/except/finally blocks and the with statement, which is used to implement RAII. Absent these mechanisms, in C this remains a legitimate application of the goto statement, lest the code become bloated with repetitive statements or the control structure become nested so deeply as to hinder readability and maintainability.

In fact, a misplaced return statement could have produced the same effect. Imagine a macro such as the following had been defined:

#define ERROR_EXIT {\
 return err; }

Then the bug might have appeared in this incarnation:

if ((err = SSLHashSHA1.update(
   &hashCtx, &signedParams)) != 0)
   ERROR_EXIT
   ERROR_EXIT

Even enforcing the use of curly braces might not have prevented the error, as they could be mismatched:

if ((err = SSLHashSHA1.update(
   &hashCtx, &signedParams)) != 0) {
   goto fail;
}
   goto fail;
if ((err = SSLHashSHA1.final(
   &hashCtx, &hashOut)) != 0)
   goto fail;

The blame for this vulnerability does not lie with the goto statement. A proper unit test would have caught the error regardless of how it was written.

Code Duplication

The handshake algorithm in which the extra goto statement appears is duplicated six times throughout the code. Figure 1 shows the algorithm containing the repeated goto fail line as it appears in the SSLVerifySignedServerKeyExchange() function. Figure 2 shows the block immediately preceding this algorithm. This duplication is the critical factor leading to the manifestation of the vulnerability, and it can be traced directly to a lack of unit testing discipline—because of the absence of the craftsmanship and design sense that testable code requires. Someone writing code with unit testing in mind would have ensured only one copy of the algorithm existed—not only because it's theoretically more proper, but because it would have been easier to test.

Figure 1: The handshake algorithm containing the goto fail bug
Figure 2: The duplicate handshake algorithm appearing immediately before the buggy block

The coder could not "smell" the duplicate code as he or she was writing it—or copying it for the second, third, fourth, fifth, or sixth time! This indicates a pattern of poor habits over time, not a single careless mistake. Ultimately, this speaks to a cultural issue, not a moment of individual error.

How could a test have caught it?

Landon Fuller published a proof-of-concept unit test implemented in Objective-C,10 using the Xcode Testing Framework.1 Fuller notes that "there's no reason or excuse for [the SSLVerifySignedServerKeyExchange() function] not being fully tested for" all of the potential error conditions. This proof of concept, however, misses the opportunity to look deeper into the code and provide full test coverage of this particularly critical algorithm—so critical that it appears six times in the same file.

Step one in properly testing the algorithm is to extract it into a separate function—which in itself might have prevented the duplicate goto fail that caused the bug, since a single block of code is less susceptible to editing or automated merge errors than six blocks of nearly identical code (figure 3).

Figure 3: The handshake algorithm extracted into its own function

The two earlier blocks of code from SSLVerifySignedServerKeyExchange() now appear as follows:

if(isRsa) {
 /* ... */
 if ((err = HashHandshake(
      &SSLHashMD5, &clientRandom,
      &serverRandom, &signedParams,
      &hashOut)) != 0)
   goto fail;
} else {...}
if ((err =  HashHandshake(
    &SSLHashSHA1, &clientRandom,
    &serverRandom, &signedParams,
    &hashOut)) != 0)
 goto fail;

This works because the HashReference is a "jump table" structure, and SSLHashMD5 and SSLHashSHA1 are instances of HashReference, which point to specific hash algorithm implementations. The HashReference interface makes it straightforward to write a small test exercising every path through the isolated HashHandshake() algorithm using a HashReference stub, and to verify that it would have caught this particular error:

+ build/
TestHandshakeFinalFailure failed:
 expected FINAL_FAILURE,
 received SUCCESS
1 test failed

The code for tls_digest_test.c is viewable online and contains all of my proof-of-concept changes; a companion script automates downloading the code, applying the patch, and building and running the test with a single command. The test and the patch are very quick efforts but work as a stand-alone demonstration without requiring the full set of dependencies needed to build the entire library. The demonstration admittedly doesn't address further duplication or other issues present in the code.

The point of all this is, if an ex-programmer who has been out of the industry for two and a half years can successfully refactor and test this code within a couple of hours, never having seen it before, why didn't the engineer or team responsible for the code properly test it 17 months earlier?

Why didn't a test catch it?

Several articles have attempted to explain why the Apple SSL vulnerability made it past whatever tests, tools, and processes Apple may have had in place, but these explanations are not sound, especially given the above demonstration to the contrary in working code. The ultimate responsibility for the failure to detect this vulnerability prior to release lies not with any individual programmer but with the culture in which the code was produced. Let's review a sample of the most prominent explanations and specify why they fall short.

Adam Langley's oft-quoted blog post13 discusses the exact technical ramifications of the bug but pulls back on asserting that automated testing would have caught it: "A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes."

This "too hard to test" resignation complements the "I don't have time to test" excuse Google's Test Mercenaries, of which I was a member, often heard (though, by the time we disbanded, testing was well ingrained at Google, and the excuse was rarely heard anymore).11 As already demonstrated, however, a unit test absolutely would have caught this, without an excess of difficulty. Effectively testing the algorithm does not require "a completely separate TLS stack"; a well-crafted test exercising well-crafted code would have caught the error—indeed, the thought of testing likely would have prevented it altogether.

Unfortunately, some adopted Langley's stance without considering that the infeasibility of testing everything at the system level is why the small, medium, and large test size schema exists (as shown in figure 4, that's unit, integration, and system to most of the world outside Google).5 Automated tests of different sizes running under a continuous integration system (e.g., Google's TAP, Solano CI) are becoming standard practice throughout the industry. One would expect this to be a core feature of a major software-development operation such as Apple's, especially as it pertains to the security-critical components of its products.

Writing for Slate, David Auerbach breaks down the flaw for nonprogrammers and hypothesizes that the bug might have been caused by a merge error (based on a published diff; look for green line 631), but then concludes: "I think the code is reasonably good by today's standards. Apple wouldn't have released the code as open source if it weren't good, and even if they had, there would have been quite an outcry from the open-source community if they'd looked it over and found it to be garbage."3

Auerbach's conclusion assumes that everything Apple releases is high quality by definition, that it has every reasonable control in place to assure such high quality, and that all open-source code receives the focused scrutiny of large numbers of programmers (thanks to Stephen Vance for pointing this out specifically in his comments on my earlier presentation, which inspired this article)—at least, programmers motivated to report security flaws. The actual code, however, suggests a lack of automated testing discipline and the craftsmanship that accompanies it, as well as the absence of other quality controls, not the fallibility of the existing discipline that Auerbach imagines Apple already applies.

Security guru Bruce Schneier notes, "The flaw is subtle, and hard to spot while scanning the code. It's easy to imagine how this could have happened by error.... Was this done on purpose? I have no idea. But if I wanted to do something like this on purpose, this is exactly how I would do it."15 Schneier's focus is security, not code quality, so his perspective is understandable; but the evidence tips heavily in favor of programmer error and a lack of quality controls.

Delft University computer science professor Arie van Deursen notes many industry-standard tools and practices that could have caught the bug; but despite self-identifying as a strong unit-testing advocate, he demurs from asserting that the practice should have been applied: "In the current code, functions are long, and they cover many cases in different conditional branches. This makes it hard to invoke specific behavior.... Thus, given the current code structure, unit testing will be difficult."16 As already demonstrated, however, this one particular, critical algorithm was easy to extract and test. Software structure can be changed to facilitate many purposes, including improved testability. Promoting such changes was the job of the Test Mercenaries at Google.

My former Test-Mercenary colleague C. Keith Ray noted both in his comments on van Deursen's post and in his own blog: "Most developers who try to use TDD [test-driven development] in a badly designed, not-unit-tested project will find TDD is hard to do in this environment, and will give up. If they try to do 'test-after' (the opposite of TDD's test-first practice), they will also find it hard to do in this environment and give up. And this creates a vicious cycle: untested bad code encourages more untested bad code."14

I largely agree with Ray's statement but had hoped he might seize the opportunity to mention the obvious duplicate code smell and how to eliminate it. Again, that was our stock-in-trade as Test Mercenaries. Absence of TDD in the past doesn't preclude making code more testable now, and we have a responsibility to demonstrate how to do so.

Columbia University computer science professor Steven M. Bellovin provides another thorough explanation of the bug and its ramifications, but when he asks "why they didn't catch the bug in the first place," his focus remains on the infeasibility of exhaustive system-level testing: "No matter how much you test, you can't possibly test for all possible combinations of inputs that can result to try to find a failure; it's combinatorially impossible."4

As demonstrated, this vulnerability wasn't a result of insufficient system testing; it was because of insufficient unit testing. Keith Ray himself wrote a "Testing on the Toilet"8 article, "Too Many Tests,"11 explaining how to break complex logic into small, testable functions to avoid a combinatorial explosion of inputs and still achieve coverage of critical corner cases ("equivalence class partitioning"). Given the complexity of the TLS algorithm, unit testing should be the first line of defense, not system testing. When six copies of the same algorithm exist, system testers are primed for failure.

Such evidence of a lack of developer testing discipline, especially for security-critical code, speaks to a failure of engineering and/or corporate culture to recognize the importance and impact of unit testing and code quality, and the real-world costs of easily preventable failures—and to incentivize well-tested code over untested code. Comments by an anonymous ex-Apple employee quoted by Charles Arthur in The Guardian2 support this claim:

"Why didn't Apple spot the bug sooner?

"The former programmer there says, 'Apple does not have a strong culture of testing or test-driven development. Apple relies overly on dogfooding [using its own products] for quality processes, which in security situations is not appropriate....

"What lessons are there from this?

"But the former staffer at Apple says that unless the company introduces better testing regimes—static code analysis, unit testing, regression testing—'I'm not surprised by this... it will only be a matter of time until another bomb like this hits.' The only—minimal—comfort: 'I doubt it is malicious.'"

Reviewer Antoine Picard, commenting on the similarity between this security vulnerability and reported problems with Apple's MacBook power cords, noted: "When all that matters is the design, everything else suffers."12

How can we fix the root cause?

Those with unit-testing experience understand its productivity benefits above and beyond risk prevention; but when the inexperienced remain stubbornly unconvinced, high-visibility bugs such as this can demonstrate the concrete value of unit testing—in working code.

Seize the teachable moments! Write articles, blog posts, flyers, give talks, start conversations; contribute working unit tests when possible; and hold developers, teams, and companies responsible for code quality.

Over time, through incremental effort, culture can change. The Apple flaw, and the Heartbleed bug discovered in OpenSSL in April 2014—after this article was originally drafted—could have been prevented by the same unit-testing approach that my Testing Grouplet, Test Certified,6 Testing on the Toilet, and Test Mercenary partners in crime worked so hard to demonstrate to Google engineering over the course of several years. By the time we finished, thorough unit testing had become the expected cultural norm. (My commentary on Heartbleed, with working code, is available online.)

Culture change isn't easy, but it's possible. If like-minded developers band together across teams, across companies, even across the industry—such as is beginning to happen with the Automated Testing Boston Meetup, its sister groups in New York, San Francisco, and Philadelphia, and the AutoTest Central community blog—and engage in creative pursuits to raise awareness of such issues and their solutions, change will come over time.

The goal is that this and upcoming articles (including my "Goto Fail, Heartbleed, and Unit-testing Culture" article published by Martin Fowler) will drive discussion around the Apple SSL and Heartbleed bugs, spreading awareness and improving the quality of discourse—not just around these specific bugs, but around the topics of unit testing and code quality in general. These bugs are a perfect storm of factors that make them ideal for such a discussion:

• The actual flaw is very obvious in the case of the Apple bug, and the Heartbleed flaw requires only a small amount of technical explanation.

• The unit-testing approaches that could have prevented them are straightforward.

• User awareness of the flaws and their severity is even broader than for other well-known software defects, generating popular as well as technical press.

• The existing explanations that either dismiss the ability of unit testing to find such bugs or otherwise excuse the flaw are demonstrably unsound.

If we don't seize these opportunities to make a strong case for the importance and impact of automated testing, code quality, and engineering culture, and hold companies and colleagues accountable for avoidable flaws, how many more preventable, massively widespread vulnerabilities and failures will occur? What fate awaits us if we don't take appropriate corrective measures in the wake of goto fail and Heartbleed? How long will the excuses last, and what will they ultimately buy us?

And what good is the oft-quoted bedrock principle of open-source software, Linus's Law—"Given enough eyeballs, all bugs are shallow"—if people refuse to address the real issues that lead to easily preventable, catastrophic defects?

I have worked to produce artifacts of sound reasoning based on years of experience and hard evidence—working code in the form of the Apple patch-and-test tarball and heartbleed_test.c—to back up my rather straightforward claim: a unit-testing culture most likely could have prevented the catastrophic goto fail and Heartbleed security vulnerabilities.

High-profile failures such as the Apple SSL/TLS vulnerability and the Heartbleed bug are prime opportunities to show the benefits of automated testing in concrete terms; to demonstrate technical approaches people can apply to existing code; and to illustrate the larger, often cultural, root causes that produce poor habits and bugs. Given the extent to which modern society has come to depend on software, the community of software practitioners must hold its members accountable, however informally, for failing to adhere to fundamental best practices designed to reduce the occurrence of preventable defects—and must step forward not to punish mistakes but to help address root causes leading to such defects. If you see something, say something!

Attribution/Further Reading

This article is based on my presentation, "Finding More than One of the Same Worm in the Apple," and the corresponding one-page Testing-on-the-Toilet-inspired treatment. These were based on my blog entry, "Test Mercenary (Slight Return)," and my AutoTest Central article, "Finding the Worm Before the Apple Has Shipped." Excerpts from my blog post, "The Official Apple SSL Bug Testing on the Toilet Episode," were also used in the concluding section. All were published under a Creative Commons Attribution 4.0 International License.

The small, medium, and large test pyramid image shown in figure 4 is by Catherine Laplace, based on the author's sketch of an image from the Testing Grouplet/EngEDU Noogler Unit Testing lecture slides for new Google engineers.

Figure 4: The small/medium/large test strategy

Partners In Crime

My deepest gratitude extends to my former Google colleagues, new associates from the Automated Testing Boston Meetup, and generous acquaintances whom I've met only online: David Plass, Isaac Truett, Stephen Vance, RT Carpenter, Gleb Bahmutov, Col Willis, Chris Lopez, and Antoine Picard. They provided tremendous input into the slides and one-page treatment, producing the structure and focus evident in those works and this article.

I'd like to thank Sarah Foster of the Automated Testing Boston Meetup and the AutoTest Central blog for providing a forum to discuss this issue and the opportunity to connect with other like-minded developers.

Finally, I don't know how I'll ever repay Guido van Rossum of Python and Dropbox for advocating on my behalf that this article be published in ACM Queue, and Martin Fowler of ThoughtWorks for engaging me to write the "Goto Fail, Heartbleed, and Unit Testing Culture" article.


1. Apple Inc. 2014. Xcode overview.

2. Arthur, C. 2014. Apple's SSL iPhone vulnerability: how did it happen, and what next? The Guardian (February 25).

3. Auerbach, D. 2014. An extraordinary kind of stupid. Slate (February 25).

4. Bellovin, S. M. 2014. Goto Fail. SMBlog (February 23).

5. Bland, M. 2014. AutoTest Central.

6. Bland, M. 2011. Test Certified.

7. Bland, M. 2012. Test Mercenaries.

8. Bland, M. 2011. Testing on the Toilet.

9. Dijkstra, E. 1968. A case against the GO TO statement. Communications of the ACM 11(3): 147-148.

10. Fuller, L. 2014. TestableSecurity: demonstrating that SSLVerifySignedServerKeyExchange() is trivially testable.

11. Google Inc. 2008. Too many tests. Google Testing Blog (February 21).

12. Greenfield, R. 2012. Why Apple's power cords keep breaking. The Wire (July 30).

13. Langley, A. 2014. Apple's SSL/TLS bug. ImperialViolet (February 22).

14. Ray, C. K. 2014. TDD and signed SSLVerifySignedServerKeyExchange. Exploring Agile Solutions: Software Development with Agile Practices (February 23).

15. Schneier, B. 2014. Was the iOS SSL flaw deliberate? Schneier on Security (February 27).

16. van Deursen, A. 2014. Learning from Apple's #gotofail security bug. Arie van Deursen: Software Engineering in Theory and Practice (February 22).



Mike Bland was a software engineer at Google from 2005 to 2011. Prior to working on Web-search infrastructure, he led the Testing and Fixit Grouplets; was a member of the Test Mercenaries, Testing Tech, and Build Tools teams; and was instrumental in bringing about the engineering culture changes that made thorough developer testing the accepted cultural norm. He does not represent Google in any capacity and is a student at Berklee College of Music.

© 2014 ACM 1542-7730/14/0500 $10.00


Originally published in Queue vol. 12, no. 5





Comments (newest first)

Mike Bland | Tue, 20 May 2014 14:41:43 UTC

@Andrew Raybould: Regarding unit testing (not necessarily pure TDD), I was dismayed that reactions to these bugs did not contain a stronger argument to support the claim that unit testing could've found them, and that some actively dismissed unit testing because the code was "too hard to test". I guess in other parts of the world, there's the opposite problem, with some taking TDD to the extreme and testing things to death. Between this and the Fowler article, I hope to present an emphatic, yet non-extremist view that unit testing, done well, can go a long way to preventing low-level errors such as these and should be a part of every developer's arsenal (along with static analysis et al.)--which is as much a cultural argument as a technical one.

@Bob Binder: I think, in this one specific case of "abortion clauses"/resource cleanup, goto provides the least cumbersome solution in C. As I mention in the article, you could structure the code with nested ifs or redundant check/release/return statements bound to every step of the algorithm, but I think that potentially harms readability/maintainability more than applying goto to this one particular, well-understood case. But I was always a C++ programmer in my day, and never had an occasion to use goto in my career as best I can recall.

As far as "the Apple flaw" vs "the Heartbleed bug", I was avoiding the use of "bug" back-to-back in the same sentence. I could've used "defect" or "vulnerability" interchangeably as well.

Bob Binder | Mon, 19 May 2014 21:44:32 UTC


The rubric "Go To Considered Harmful" was the creation of Robert Ashenhurst, long time editor of the CACM letters to the editor section, Professor of Computing at the University of Chicago, and a great friend. While it's true Dijkstra didn't write that, I think Ashenhurst succinctly stated the essence of his point.

It is always possible to structure code correctly without gotos - even in C. There's no point in rehashing that war, but using go to seems like pissing in the wind just to prove you can. I wonder if that mind set excluded code reviews and static analyzers as well.

As to whether unit testing should have revealed this bug, I cannot imagine how any minimally adequate test suite could not have resulted in execution of the second go to, which presumably would have triggered an observable failure. By minimally adequate, I mean that each line of code has been executed at least once in test. Also known as statement or line coverage, this criterion is minimally adequate because if you do not execute a statement with a bug, you have no chance of revealing that bug. Of course, this is a very weak criterion that will not reveal many, many kinds of bugs, but it is better than no chance at all.

A question - why is it "The Apple flaw" and the "Heartbleed bug"?

Andrew Raybould | Sun, 18 May 2014 16:31:34 UTC

@Mike Bland: I take your point that you can only address one issue at a time.

I was surprised by your feeling that unit testing is under-appreciated - are you referring specifically to commentary on this bug? At least since the appearance of the phrase 'test-driven development', I have had the impression that testing, and unit testing in particular, has become the latest of the One True Ways. Perhaps the problem is that while everyone is talking about it, few are actually doing it.

It would be a good thing if increased unit testing resulted in a reduction in code duplication, but I wonder if developers will apply cut-and-paste to test generation with the same enthusiasm that they have applied it to application code generation.

I see a pervasive design problem that seems, in some ways, closely related to code duplication: the proliferation of unnecessary classes and methods that do little to advance the computation or provide any other benefit. Instead, they fragment algorithms, responsibilities and information, and scatter their representation across wide swaths of source code, making it difficult to figure out (and therefore analyze) what is going on (the Gertrude Stein problem: "there is no 'there' there.") Unit testing might be more effective in rooting out this practice than simple duplication, because cut-and-paste would not be as (mis-)applicable.

I agree that the continuing difficulties with software quality have a deep-rooted cultural component. A colleague once told me that in the medical equipment company where he once worked, testing for buffer overflows and invalid pointer use was not performed because if problems were found, they would have to fix them. If that is not a crime, it should be.

Mike Bland | Sun, 18 May 2014 15:44:16 UTC

@Steve Mattingly: No worries; I was able to keep the two "yous" separate. Regarding your tone, you should've seen the drafts of this and the Fowler article before I got folks to review them! ;-)

Also, I misspoke in my earlier comment: The next installment of the Fowler article will cover the costs and benefits of unit testing; the following installment will cover complementary tools and practices (which I've already updated to cover the "false warnings" issue).

Steve M | Sat, 17 May 2014 22:28:58 UTC


Thanks for the feedback; I think we're on the same page.

I apologize if, in my haste, my tone was too strident. (In particular, I hope that addressing "you" Mike Bland in the early part of the post did not give the impression that I was telling "you" Mike Bland to check your ego in the latter part.)

Steve Mattingly

Mike Bland | Sat, 17 May 2014 14:24:28 UTC

@Steve M: Forgot to mention: I actually sympathize with your objection to "false warning" cultures. In retrospect, I regret not being clearer about that; in my haste, I neglected to give that false notion its due. I'll add something to my Fowler article to challenge that very notion.

Mike Bland | Sat, 17 May 2014 14:18:43 UTC

@Steve M and Andrew Raybould: I agree static analysis and compiler warnings should be applied, but they've gotten a lot more publicity so far. I had nothing to add to that topic, and my objective here was to break the silence around unit testing. In my article on Martin Fowler's site (linked from this article), the next installment, to appear next week, will be devoted to brief highlights of other tools, and will remark on how such tools and unit testing are complementary; I was up against a word limit here.

I'm all for complementary tools and redundant safeguards, but I also assert that the tolerance of code duplication is evidence of a deeper cultural issue, and that unit testing, alone among the proposed solutions (except maybe code review), would exert the design pressure to avoid such duplication. Braces and analysis would've contained the potential damage in this instance, but would not have gone far toward solving the deeper cultural problem that tolerates the low-quality code that enables bugs like this to hide.

Also, the culpability isn't just with the team that produced the bug, or even the company: I'd like to see greater accountability across the industry. Every tool should be brought to bear to prevent such defects, and so far unit testing hasn't had very strong representation, and has on occasion been actively dismissed. I aim to fix that.

@Steve M: I agree the bug never should've made it to a test. The programmers never should've allowed the duplication that sheltered it, and unit testing would've (should've) encouraged only one copy of the algorithm. I feel reasonably sure the first copy didn't contain the bug, otherwise it would've appeared in the code more than once, since the presence of six copies strongly suggests cut-and-paste. That, and they should've had compiler warnings turned on full.

We are of the same mind regarding your comment on professional ethics. I'm avoiding using legally-charged words, but I'm hoping folks will put two and two together for themselves before the lawyers do.

Thanks for your thoughtful comments. They are truly a cut above. :-)

Richard Tallent | Sat, 17 May 2014 07:42:56 UTC

I would argue that one of the key root causes is the overuse of branching to compute a result for "err" and act on it. Consider this alternative:

err = SSLFreeBuffer(&hashCtx) ||
      ReadyHash(&SSLHashSHA1, &hashCtx) ||
      SSLHashSHA1.update(&hashCtx, &clientRandom) ||
      SSLHashSHA1.update(&hashCtx, &serverRandom) ||
      SSLHashSHA1.update(&hashCtx, &signedParams) ||, &hashOut);
if (err != 0)
    goto fail;

IMHO, this performs just as well (thanks to the short-circuit evaluation of ||), expresses the logic more clearly, and avoids relying on duplicated branching code and side-effecting assignments buried inside conditions.

Andrew Raybould | Sat, 17 May 2014 03:56:36 UTC

In his enthusiasm to promote unit testing, which is a laudable goal, the author seems to me to be giving short shrift to static analysis. It is dismissed with the comment that its results might have been hidden in false positives, while on the other hand, it is assumed that unit testing would have found the bug.

In this particular case, that might be true, but we are not merely attempting to avoid a repeat of this particular bug, and the practical limitations of testing are well-known. In practice, analysis and testing tend to be complementary, as neither are perfect, and any serious attempt to improve code quality should take advantage of both.

The author states that the code could be refactored to make it more testable, but it could also be refactored to make its analysis more tractable. You may recall that this was Dijkstra's primary argument for deprecating the go-to.

Steve M | Fri, 16 May 2014 20:56:14 UTC

I think your emphasis on testing obscures the most important point here. This defect should have never made it to any sort of test.

Instead, it should have been identified and eliminated at compile time. You were very close to the target when you wrote "[c]ompiler and static-analysis warnings also could have detected the unreachable code...". (I would say "should have" rather than "could have".)

But the rest of that sentence illustrates some cultural issues that quickly snuff out this glimmer of insight: "... though false warnings might have drowned out the signal if such tools weren't already being used regularly."

I question the concept of "false warnings". If the compiler, lint tool, or similar is generating warnings that it should not (i.e., the alleged issue does not exist in the code), then you need to change tools. I have never seen this happen. But I have seen many developers and teams perpetuate "false warning" cultures because they don't (or can't be bothered to) understand what the warning is telling them about the code. As a result, the warnings are ignored or (more likely) hidden through compiler flags. I have seen this too many times to count, and this is not the first time I have seen it mask defects in systems with large implications for public welfare.

I advocate a culture that holds these truths to be self-evident: that static analysis tools should be used to the greatest extent possible, with the strictest settings, and that every warning message can and should be resolved appropriately. "Appropriately" here means, in descending order of preference:

1. Change the code to eliminate the issue identified by the warning. If this seems unnecessary, check your ego. If your expertise and judgment actually exceed those of all the people who produced the compiler or analysis tool, consider changing tools.

2. Change the code to suppress the warning for a specific occurrence. This can usually be done with some sort of annotation in the code, and should include a comment with the rationale for this step. The rationale should be a group consensus, not an individual decision.

3. Suppress the warning for all occurrences, using the analysis tool's configuration options. This also should be by group consensus, and should be a rare step.

This is low-hanging fruit. Professional ethics demand that we advance toward a culture where ignorance of these practices is considered incompetence, and willful failure to apply them is considered malpractice.

© 2018 ACM, Inc. All Rights Reserved.