Open vs. Closed

Which source is more secure?

Richard Ford, Florida Institute of Technology

There is no better way to start an argument among a group of developers than proclaiming Operating System A to be “more secure” than Operating System B. I know this from first-hand experience, as previous papers I have published on this topic have led to reams of heated e-mails directed at me—including some that were, quite literally, physically threatening. Despite the heat (not light!) generated from attempting to investigate the relative security of different software projects, investigate we must.

Understanding why products are (and are not) secure is a critical stepping stone toward building better software.

Before wading into these dangerous waters, we should clarify the question. All too often when comparing open and closed source approaches, the question is unconsciously interpreted as Windows versus Linux. While that’s a fantastic question to knock around, doing so is a very narrow way of looking at the world, as it ignores many other projects in both the open and closed source worlds. Although it’s foolish to ignore the data points the Windows/Linux world provides, they are simply examples of the process. So, let us first strip away the misconception that the question is about these particular platforms and recognize its real breadth.

With this in mind, our answer requires three crucial definitions in order to have meaning: “What is open source?”; “What is closed source?”; and, surprisingly, “What is security?” The first two we can deal with quickly; the third is a lot subtler, however, so we shall tackle it first.

What is Security?

Traditionally, we tend to think of security as maintaining the CIA (confidentiality, integrity, and availability) of information. This is a useful taxonomy of security, and because of this, it’s pervasive. One limitation of the CIA approach is that it isn’t very helpful when we consider how to measure security. What does it mean to say that one product is more secure than another product? Is C more important than A, and is A more important than I? How does one rank these different aspects of security?

A literature review quickly shows that measuring security is a tricky problem, which, as yet, we haven’t gotten our arms around very well. That’s a pity, because if we had, it would be tempting to run the simple experiment of measuring the security of various open and closed source projects and see if one methodology is consistently more secure than the other. If closed source, for example, were measurably better from a security perspective, we would have the answer to our question.

There are two obvious ways to measure security:

  1. Measure how often deployed systems are actually compromised in the field.
  2. Count the vulnerabilities present in the code itself.

Let’s take a look at both of these approaches.

The problem is that the former is a combination of the quality of the software under test, the number and type of attackers targeting that software, and how the box is configured, administered, and used. Thus, if “more secure” simply means measuring the probability of compromise, it might be possible to conclude that MS-DOS with a TCP/IP stack is more secure than a fully patched Windows XP box, simply because the number of attackers looking for MS-DOS machines is now vanishingly small. While the measure is pragmatic, it tells us as much about the ubiquity of the system and the talent and number of its attackers as it does about the security of the software itself.

Discarding this approach leaves us with the latter of our two approaches: counting vulnerabilities in the code. Even here, it’s not obvious how to proceed, as we don’t have direct measures of actual vulnerability counts; we have information only about the number of vulnerabilities that are publicly disclosed. Thus, like the first approach, this one doesn’t provide an objective measure of security; it, too, is influenced by external factors (such as attacker profile).

A variation of this approach is known as “days of risk”: counting the elapsed time between the disclosure of a vulnerability and its remediation. Defining remediation is itself difficult. Does turning off a noncritical service count as temporarily “fixing” the problem, or does only a “sanctioned” patch supported by the vendor constitute remediation? The answer depends on the service provided and the needs of the user. Even if we can agree on remediation, the number of attackers plays a critical role in determining the total days of risk. Despite this, the approach is attractive because it takes into account the fact that actually exploiting a vulnerability is relatively rare until the vulnerability is publicly known.1 It remains a pragmatic measure, however: it speaks to real-world exposure rather than directly to inherent security properties.

Note here that days of risk are traditionally counted from the date the vulnerability is made public, not the date an exploit is known. Although one can argue that knowledge of the vulnerability is meaningless in the absence of an exploit, it is often difficult to determine when an exploit became “public,” as many members of the black-hat community keep such information under close guard. Thus, the vulnerability date is the most objective—and therefore repeatable—measure (even if it is less desirable than the exploit date).
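
To make the bookkeeping concrete, here is a minimal sketch (in Python, with invented vulnerability identifiers and dates) of how days of risk might be tallied once disclosure and remediation dates have been agreed upon:

```python
from datetime import date

# Hypothetical advisories: each records when the vulnerability became public
# and when a sanctioned fix became available. The IDs and dates are invented.
advisories = [
    {"id": "VULN-001", "disclosed": date(2006, 3, 1), "patched": date(2006, 3, 18)},
    {"id": "VULN-002", "disclosed": date(2006, 5, 7), "patched": date(2006, 5, 9)},
    {"id": "VULN-003", "disclosed": date(2006, 8, 21), "patched": date(2006, 9, 30)},
]

def days_of_risk(entries):
    """Return per-vulnerability and total days between disclosure and remediation."""
    per_vuln = {e["id"]: (e["patched"] - e["disclosed"]).days for e in entries}
    return per_vuln, sum(per_vuln.values())

per_vuln, total = days_of_risk(advisories)
print(per_vuln)  # {'VULN-001': 17, 'VULN-002': 2, 'VULN-003': 40}
print(total)     # 59
```

The toy version also exposes the definitional problem: the totals change entirely depending on whether “patched” means a vendor-sanctioned fix or merely a workaround.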

Even based on this short discussion, it’s clear that accurately measuring security will mean different things to different people. Thus, for the purposes of this article, it’s reasonable to accept that we can’t (yet) measure the inherent security outcomes of open/closed source processes in an ordinal way. This means that our “experimental” approach to determining which approach leads to better security is off the table: until the science matures, we will have to examine the pros and cons of each approach independently and try to balance them ourselves.

Open Source, Closed Source

Put simply, the open source process can be thought of as an approach where the source code to products/executables is provided. In contrast, closed source approaches restrict source-code access to just the developers of the product and other chosen individuals (usually under the constraints of a nondisclosure agreement). In both worlds, many finer distinctions can be made. For example, some open source projects restrict development to a small cadre of programmers; others allow anyone to contribute. Source-code access, however, is the key distinction between the approaches. Note also that neither approach requires the software to be free or to be sold for a fee—though the open source world is generally friendlier in terms of licensing.

Perhaps appropriately for the open source community, a more precise definition of open source varies from person to person. At its simplest, open source refers to the practice of providing the source code for programs. Furthermore, most proponents of the open source approach would agree that the distributed source code should be legally modifiable and redistributable (with some license restrictions). Thus, users have the ability to inspect and modify programs they use. (A far more complete definition is provided at http://www.opensource.org.)

In contrast, the closed source approach keeps the program code sealed, and derivative works are usually legally prohibited. Proponents of both camps may object to the simplicity of my definitions: they do capture the essence of both approaches, but they fail to capture the culture that surrounds them.

Culturally, closed source represents traditional corporate software developers. When we think of open source, however, we tend to think of volunteers working as a collective, free software, and community projects. Open source structures are fluid; closed ones rigid. While this is something of a caricature, like all good sketches, it does catch some of the “feel” of the movement.

Inherent Security Properties

Armed now with an understanding of the question, it is time to examine the relative merits of the two approaches from a security perspective. Clearly, others have undertaken this process (for a slightly different perspective, for example, see Ross Anderson2); however, there are many issues that are not addressed completely. As such, we begin by considering the most basic difference between the development methodologies: one can examine the source code of an open source project. Pragmatically, this is of use to both the attacker and the defender.

From the attacker’s point of view, code availability means that there is complete disclosure on how a particular feature is implemented. Furthermore, it means that discussion of weaknesses and design decisions often happens in the open (see the “Disclosure Models” section later in this article). Thus, open source products allow the attacker a white-box view of the product and, potentially, associated problems. When a security patch is made available, it is trivial for the attacker to determine exactly what was fixed.
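
As a toy illustration of how little effort this takes, the following sketch (the file name, routine, and “fix” are all invented for the example) diffs a pre-patch and post-patch version of a function; the changed lines point the attacker straight at the flaw that was repaired:

```python
import difflib

# Source of a routine before and after a hypothetical security patch.
before = """def check_login(user, password, db):
    row = db.query("SELECT hash FROM users WHERE name = '%s'" % user)
    return verify(password, row.hash)
""".splitlines(keepends=True)

after = """def check_login(user, password, db):
    row = db.query("SELECT hash FROM users WHERE name = ?", (user,))
    return verify(password, row.hash)
""".splitlines(keepends=True)

# The unified diff exposes exactly one change: the string-formatted SQL query
# was replaced with a parameterized one, i.e., an SQL injection hole was closed.
for line in difflib.unified_diff(before, after, "auth.py (v1.0)", "auth.py (v1.0.1)"):
    print(line, end="")
```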

From the perspective of the defender, open source also has advantages. Perhaps most importantly, it allows for code inspection. Thus, if the defender really wants to know that a particular feature is secure, he or she can simply examine the code—provided, of course, that the defender has the necessary security knowledge to spot a problem. Second, there is a sense that because many people can review the code, the code is inherently higher quality—as framed by Eric S. Raymond in his now-famous quote, “Given enough eyeballs, all bugs are shallow.”3 Finally, features that are problematic in a particular environment can be turned off by a sufficiently skilled programmer. Thus, when a vulnerability is found, the user doesn’t have to wait for a sanctioned patch: anyone can make the requisite changes to the code base.

From an attacker’s perspective, closed source means that only a small part of a given community has access to the code. Thus, to understand the internals, the attacker must reverse-engineer the binary; such a process is time consuming and, in the case of software that has been protected from such reverse engineering, nontrivial. Furthermore, design mistakes may be harder to spot, as grasping the entire form of a large application is quite difficult when working only with compiled code.

Things are equally double-edged for the defender. When using a closed source product, the user is left entirely at the mercy of the code developer in terms of functionality changes or security patches. Thus, when a vulnerability is announced, the options for the defender are limited. Once again, differences in disclosure models help mitigate this somewhat, but ultimately, the user is left trusting the vendor. Self-help is not a practical option; code cannot be screened internally for structures that are worrisome in particular environments. Of course, these issues are compounded if the code to a closed source product is leaked; the attacker then gains many of the benefits of code access, while the defender enjoys none of them.

These fundamental properties are painted with a fairly broad brush, but in essence they encapsulate the systematic differences between the techniques in terms of attacker and defender. Space precludes a thorough examination of these differences, so we will turn our attention to the two that seem to have the most impact: vulnerability disclosure models and trust/validation.

Disclosure Models

One key difference between open and closed source processes is the vulnerability disclosure model each typically follows. Because openness is the very nature of open source, when vulnerabilities are repaired it is trivial for an attacker to see exactly what was repaired, work back to the vulnerability, and (probably) construct a working exploit. In the closed source world, it might not even be clear that a vulnerability existed or was fixed.

Because of this, open source tends to do badly from the perspective of “days of risk,” where one counts the time between the disclosure of a vulnerability and an “approved” fix. Some may find this unfair, but history shows that the window between the public availability of a vulnerability/exploit and its patch is a difficult and dangerous time. While it is entirely possible (and practiced in several open source communities) to embargo security bug disclosure until a patch is available, the practice of nondisclosure is still rarer in the open source community than in the closed source community. The problem is compounded by the many different Linux distributions that contain open source components: if a component is updated by its creators, it is impractical to wait until all distributions that use it are ready to issue a validated patch.

The difference in disclosure models is a difficult problem for open source processes to solve. While one can argue that users can fix problems as they arise (thus, as soon as the problem is disclosed, the user writes a patch for his or her own use), this is a little far-fetched. Most users aren’t programmers, and those who are usually aren’t security experts. Thus, closed source benefits from its “closed” nature in this aspect—its worldview centers on keeping certain “secrets” secret.

Conversely, the open source world is built around information exchange. Changing that worldview with respect to security disclosure is really the crux of the solution, but it runs somewhat counter to the culture. Despite the solid progress several open source projects are making in this area (bugs are increasingly discussed in private rather than in public forums), as soon as a patch is released it is trivial to determine exactly what it changes, which makes developing an exploit for the previous version much simpler.

Trusting Trust

Ken Thompson’s paper “Reflections on Trusting Trust” is as important today as it was when first penned in 1984.4 Thompson illustrated the trust assumptions we make when deciding on security-related issues. Ultimately, he argues, we’re trusting far more than we might realize. The same argument holds when considering open/closed source security.

Classically, security people tend to think of the attacker as either a malicious insider or a third party. It’s also possible, however, to think of the software vendor—in its entirety—as untrustworthy (because one suspects the vendor is either malicious or incompetent). What then?

This change of focus in terms of trust can be a little startling, but isn’t entirely far-fetched. It doesn’t even require malfeasance on the part of the vendor. Consider a well-meaning (but foolish) vendor who, during an install, disables a critical piece of security software, with the intent of restoring it at the end of the install. Such a vendor could be unwittingly placing the user at risk. Incidents such as the Sony rootkit, used for DRM (digital rights management) purposes, also emphasize the sometimes misplaced trust placed in vendors. In each case, the closed source nature of the project put the user in jeopardy because there was no way—aside from reverse engineering—to determine the real functionality of the software.

There is also the issue of unethical vendors deliberately sneaking adware onto your computers under the guise of a “utility.” Vendors aren’t inherently trustworthy, and anyone who blindly makes the assumption that they are is either in denial or naïve.

In the case of an untrustworthy vendor, open source provides at least a mechanism by which a concerned entity can verify (within reason—remember the implications of Thompson’s paper) that all is well. Going to the trouble of auditing the entire code base for a project isn’t justified in many cases, but I can provide an example that is difficult to refute: voting software.

The idea of trusting a single vendor with the legitimacy of elections is, frankly, terrifying. With so much at stake, voting software must be verified by source inspection—who would trust a black-box approach to voting? Clearly, in the case of such software, an open source approach provides at least a mechanism by which the software’s veracity can be verified. Does one vote entered tally up with one vote counted in all scenarios? Although the process is nontrivial in an open source world, it’s really very challenging in a closed source scenario where one must resort to reverse-engineering the system. Thus, in some cases, it seems the open source approach clearly has the edge.
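
A deliberately tiny sketch (the tallying routine and ballots here are hypothetical, not drawn from any real voting system) shows the kind of invariant that source inspection makes checkable: every accepted ballot increments exactly one candidate’s count, and the totals always reconcile.

```python
from collections import Counter

def tally(ballots, candidates):
    """Count one vote per well-formed ballot; reject anything else."""
    counts = Counter({c: 0 for c in candidates})
    accepted = 0
    for ballot in ballots:
        if ballot in candidates:      # malformed ballots count nothing
            counts[ballot] += 1
            accepted += 1
    # The property an auditor wants to verify by reading (or testing) the source:
    # one vote entered is one vote counted, never more, never fewer.
    assert sum(counts.values()) == accepted
    return counts

print(tally(["alice", "bob", "alice", "mallory?"], {"alice", "bob"}))
# Counter({'alice': 2, 'bob': 1})
```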

An interesting counterpoint can be found in security software. Consider antivirus software. While much antivirus software is signature-based, many incarnations of generic virus protection exist that apply a variety of techniques to stop new viruses. Such software is important, as it provides a first line of defense against rapid worms, which can become pandemic minutes after their initial release. Generally, such software is not theoretically secure—it is heuristic in nature and can be bypassed by an attacker with sufficient knowledge. This being the case, an open source approach is probably less attractive than a closed source one. Let’s at least make the life of the attacker a bit harder. If that sounds like security through obscurity, hold on to your seat for a moment: it is.

Security Through Obscurity?

The idea of “security through obscurity” has a horrible reputation among software engineers. I can still remember mentors through the years drumming into my head the idea that security by obscurity is no security at all (I expect that some of those fine scientists will contact me as they read this article to see where they went wrong in my education), but my belief is that the entire argument is highly contextual. Passwords, for instance, are the perfect example of “acceptable” security through obscurity: they are useful only if the attacker doesn’t know them.

Again, let me illustrate my position by using an example: DRM software. Any time one is attempting to protect software from unauthorized copying, one runs into the idea of security through obscurity. Essentially, if the computer can run the software, it’s almost certainly going to be possible to copy it. Similarly, with a copy-protected document, if all else fails, I can always take a picture of my screen. Almost all DRM software is, at some level, security through obscurity: the bar is set only so high. The trick is making sure it is high enough to deter most attackers. Similarly, the protection offered by Microsoft Windows Vista’s much-discussed Kernel Patch Protection would be of far less value if the source code were available, as that would allow attackers to chart the fastest route around it.

A counterpoint once again highlights the context I’m talking about: encryption. As computer scientists, we can make encryption arbitrarily difficult to break given currently known technology. If breaking the code involves factoring a very large number, I can make good predictions of how much effort an attacker needs to spend, and that time doesn’t really depend on the attacker’s knowledge of my software or algorithm. For such software, the best route to security is to publish the algorithm and let it be independently verified. So, what’s the difference?
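
A back-of-the-envelope calculation illustrates the point. The sketch below assumes an idealized 128-bit symmetric key (exhaustive key search rather than factoring, purely to keep the arithmetic simple) and an arbitrary, invented guess rate; even with the algorithm fully public, the attacker’s expected effort is astronomical:

```python
# With a sound cipher, publishing the algorithm changes nothing: the attacker's
# expected work is still on the order of half the key space.
KEY_BITS = 128
GUESSES_PER_SECOND = 1e12              # assumed attacker speed (invented figure)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

expected_tries = 2 ** (KEY_BITS - 1)   # on average, half the keys must be tried
years = expected_tries / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print(f"expected brute-force time: {years:.1e} years")  # roughly 5.4e18 years
```

Contrast this with a heuristic detector, whose effectiveness genuinely can be eroded by an attacker who reads its rules.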

The difference between these cases is simple: determinism. In the case of the encryption software, the outcome is deterministic. Knowing everything about the mechanism doesn’t compromise the security of the outcome. In contrast, for antivirus software the system is heuristic. As such, some things benefit from disclosure, and some things don’t. In these two cases, it’s obvious. Unfortunately, that’s the exception, not the rule. The problem is that many systems contain aspects that are heuristic and aspects that are deterministic.

For a word processor, the question is different. You might like your word processor to work reliably, but the truth is that it contains bugs, and, potentially, security vulnerabilities. The closed source approach makes it expensive for anyone other than the developer to find those bugs. The open source approach means it’s easy for anyone trained in secure coding practices to find weaknesses. Both of these properties are double-edged, and it’s not clear which provides the best long-term outcome.

Conclusion

Part of the reason this topic is interesting is that it is difficult: there are compelling arguments on both sides. As we understand the nuances of the question better, different aspects begin to come into focus. Both development methodologies have intrinsic properties; which set of properties best fits a particular application is contextual.

Unfortunately, the cases where one is clearly better than the other are few and far between. Most software sits somewhat uncomfortably between the two. In such cases, the makeup, philosophy, and training of the team behind the software are far more important than whether the project is open or closed source. Both methods can be done well, and both can be done badly.

Understanding where each method is strong and where it is weak is the first step toward process improvement. Instead of focusing on either/or decisions, perhaps it is ultimately more fruitful to follow both, using each where appropriate. Software engineering is a young discipline; time will answer the question, provided we approach it with full knowledge of our assumptions and shortcomings.

References

  1. Arbaugh, W. A., Fithen, W. L., McHugh, J. 2000. Windows of vulnerability: A case study analysis. IEEE Computer 33(12): 52-59.
  2. Anderson, R. J. 2002. Security in open versus closed systems—the Dance of Boltzmann, Coase and Moore. Presented at Open Source Software Economics.
  3. Raymond, E. S. 1999. The Cathedral and the Bazaar. Sebastopol, CA: O’Reilly.
  4. Thompson, K. 1984. Reflections on trusting trust. Communications of the ACM 27(8): 761-763.

RICHARD FORD graduated from the University of Oxford in 1992 with a D.Phil. in quantum physics. Since that time, he has worked extensively in the area of computer security and malicious mobile code prevention. Previous projects include work on the Computer Virus Immune System at IBM Research and development of the world’s largest Web hosting system while director of engineering for Verio. Ford is an associate professor at Florida Institute of Technology, where he is the director of the Center for Security Sciences. His research interests include malicious mobile code, behavioral worm prevention, security metrics, and computer forensics. Ford is executive editor of Reed-Elsevier’s Computers and Security and Virus Bulletin, and co-editor of a column in IEEE Security and Privacy.

Originally published in Queue vol. 5, no. 1