A Conversation with Peter Tippett and Steven Hofmeyr
There have always been similarities and overlap between the worlds of biology and computer science. Nowhere is this more evident than in computer security, where the basic terminology of viruses and infection is borrowed from biomedicine.
The two participants in this month’s conversation, Peter Tippett and Steven Hofmeyr, both come from backgrounds in the life sciences that led them to become leaders in the field of computer security.
Tippett, who refers to himself as “one of the graybeards” of the field, has both an M.D. and a Ph.D. in biochemistry from Case Western Reserve. He created “a little software company” and built the first anti-virus product, which evolved into Norton Anti-Virus. His company, Certus International Corporation, merged with Symantec in 1992, and Tippett was made director of security and enterprise products at the Peter Norton Group of Symantec. Tippett advised the Joint Chiefs of Staff on cyberwarfare during Desert Storm. The national media often turns to him as their expert during news stories about computer security. He is now chief technology officer of Cybertrust, a $160 million company created in 2004 through the merger of Betrusted and TruSecure. Based in Herndon, Virginia, Cybertrust provides information security technologies and services to companies and governments worldwide.
Hofmeyr is newer to the field, earning his Ph.D. from the University of New Mexico in 1999. His research investigated the crossover between biology and computation, and his studies also took him to the Artificial Intelligence Lab at MIT. Using his research as a base, he founded Sana Security four years ago and now serves as its chief scientist. Sana, based in San Mateo, California, makes host-based intrusion prevention software. In 2003, MIT’s Technology Review named Hofmeyr one of its top 100 innovators under the age of 35.
QUEUE Both of you have a background in the life sciences. To what degree has biological modeling influenced your approach to computer security?
STEVEN HOFMEYR I founded Sana Security, which makes host-based intrusion prevention software, largely on the basis of research I’d done while working on a Ph.D. in biology at the University of New Mexico. My research involved investigating the crossover between biology and computation.
At the time I first started that research, I wasn’t particularly knowledgeable about or interested in computer security. But I was fascinated by the way the human immune system works and the model it offers for distributed detection and response. So largely for that reason, I ended up building an abstract model along much the same lines and applying it to computer security. It just seemed like a fairly obvious thing to do. The results have proved to be pretty good.
PETER TIPPETT I’ve been at this security game a lot longer, but there are otherwise some similarities to our stories. I have both an M.D. and a Ph.D. in biochemistry, and during my internship and residency in internal medicine, I ended up starting a little security software company to deal with the problem of computer viruses.
The tech world at the time didn’t really grasp the way that viruses work. But there’s not really much of a mystery to it for pretty much anyone in the biomedical field. In fact, the models for how viruses work are fairly trivial, so I just ended up applying that understanding to the way that computer viruses work and concluded that everyone was going to eventually need an anti-virus product.
Q What are the limitations of biological modeling? That is, does it always work? Can it be fooled?
SH The real advantage of using ideas from biology to improve artificial systems is that they can serve as a useful way of stimulating thought. The ideas might not always be applicable, but they can still give you a lot of power. For example, if you look at immunology and how it might apply to computer security, you start to see how much it has to teach us, because the body that the immune system is trying to protect is a highly complex, highly distributed environment. It’s also one that offers no centralized control. Yet the immune system is still largely effective because it manages to take advantage of some very clever little mechanisms.
The danger, though, lies in trying to use analogies like this where they don’t really fit. For example, the immune system has never had to evolve to the point where it could ensure the confidentiality of your genes, whereas in computer security we need to be very concerned about maintaining the confidentiality of systems. That requirement tends to change the whole nature of the game.
My philosophy has always been to use these biological metaphors and analogies wherever they seem as though they might help to solve a problem—and yet also know to abandon them whenever they don’t seem to offer anything particularly useful. That has been a very powerful approach for me.
PT I like that line of thinking a lot. But the thing is, most of the thinking about computer security is pretty poor. You end up asking yourself why. After all, these are digital machines we’re talking about. They’re very precise, and the people who think about them also tend to be very precise in their thinking. So how could we be so consistently off in our thinking? If you look at the data, you’ll see that, averaged over time, the number of successful attacks by worms and viruses has nearly doubled every year for the last seven years. Hacking shows the same trend, only steeper: successful hacking attacks have been tripling every year. Insider attacks have been growing at the rate of 15 to 20 percent a year. That amounts to a huge compound interest rate.
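The “compound interest” framing is easy to check with a little arithmetic. The sketch below simply compounds the annual rates quoted above; applying the seven-year window to all three categories is an assumption made here for comparison, since the seven-year span was stated only for worms and viruses.

```python
def growth_multiple(annual_factor: float, years: int) -> float:
    """Cumulative growth after compounding an annual multiplier for `years` years."""
    return annual_factor ** years

# Annual multipliers from the rates quoted in the conversation.
# Assumption: a seven-year window for every category, for comparison only.
rates = {
    "worms and viruses (doubling)": 2.0,
    "hacking (tripling)": 3.0,
    "insider attacks (+15%/yr)": 1.15,
    "insider attacks (+20%/yr)": 1.20,
}

for label, factor in rates.items():
    print(f"{label:30s} x{growth_multiple(factor, 7):,.1f} over 7 years")
```

Even the slowest rate, 15 percent a year, multiplies the problem by more than two and a half over seven years; doubling gives 128 times and tripling more than 2,000 times.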
Bear in mind that this is the success rate I’m talking about here, not the attack rate. The attack rate is growing even faster. So here we are, spending more money in the expectation it will serve to make things better, but instead they just keep getting worse. It’s almost the definitive negative feedback loop. We’re obviously doing something wrong. My sense is that the errors we’re making are very similar to the errors that were made in the study of biology back when we were locked into certain dogmatic ways of thinking.
So maybe we haven’t really thought out the whole problem or come up with parameters for the problem that really make sense. One thing that I see, for example, is that we tend to think in terms of micro-vulnerabilities—the vulnerability of one particular application or module or code base—instead of the vulnerabilities that apply to the organization as a whole. Another issue is that we tend to think in terms of vulnerabilities instead of risks. We tend to engage in binary thinking instead of analog thinking. And we tend to put our focus on individual computers rather than on the behavior of whole communities of computers.
This is like the difference between treating cholera one patient at a time or instead choosing to do what Costa Rica did, which was to make sure the sewer lines didn’t run down the same side of the road as the drinking water. Those are two incredibly different countermeasures—one of which focuses entirely on the individual, while the other one concerns itself more with the health of the overall community.
Q Can you cite a parallel example to that having to do with computer security?
PT Consider the difference between patching and simply blocking things at your router. It’s incredible how much energy we spend patching things, given that patching has absolutely zero value more than half the time. That is, more than half the time you’re either never going to face the threat you’re patching for, or it’s already too late because you’ve got an institutional vulnerability to that threat. The other possibility, of course, is that the threat will never get to you because your organization has set up filters or firewalls or some sort of segmenting scheme that can effectively shield you.
This really points out the difference between taking an individual approach to computer security and more of an organizational approach. If it’s your own system at home that you’re concerned about, patching is the second most useful thing you can do to reduce your vulnerability to hacking and mal-code. But within a corporation, patching is about number 10 on the list of useful things you can do to reduce those same two risks. The organization is going to derive a lot more value from the use of filters at the perimeters, filters in its routers, and the implementation of sensible policies and rules of engagement.
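To make the organizational point concrete, here is a minimal sketch of a first-match perimeter filter. The `Rule` class and `evaluate` function are hypothetical illustrations, not any router’s actual configuration syntax; the one real-world detail is that SQL Slammer spread over UDP port 1434. The point is that one deny rule at the edge shields every host behind it, whether or not each host has been patched.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Rule:
    action: str               # "deny" or "allow"
    protocol: str             # "tcp" or "udp"
    dst_port: int
    src_net: str = "0.0.0.0/0"  # which sources the rule applies to (IPv4 here)

def evaluate(rules, protocol, src_ip, dst_port, default="allow"):
    """Return the action of the first matching rule, else the default.

    The default is "allow" only to isolate the effect of a single deny rule;
    many real perimeters default to deny instead.
    """
    for rule in rules:
        if (rule.protocol == protocol
                and rule.dst_port == dst_port
                and ip_address(src_ip) in ip_network(rule.src_net)):
            return rule.action
    return default

# Hypothetical perimeter policy.
perimeter = [
    Rule("deny", "udp", 1434),  # SQL Server Resolution Service, the port Slammer used
    Rule("deny", "tcp", 1433),  # SQL Server default port; an illustrative policy choice
]

print(evaluate(perimeter, "udp", "203.0.113.7", 1434))
print(evaluate(perimeter, "tcp", "203.0.113.7", 443))
```

The first call is denied at the edge, so no internal SQL Server ever sees the packet; the second, ordinary HTTPS traffic, is allowed through.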
SH That’s a very interesting perspective, particularly in reference to our earlier discussion of biological modeling, because we know that dense concentrations of people have implications where the spread of disease is concerned—smallpox epidemics, for example. We have a parallel in the technological world as a result of the massive amount of communication that passes between our devices. As connectivity continues to increase, so does our vulnerability to disease and attack.
PT That’s exactly right. When Code Red came along, for example, people ran around like crazy patching all their Web servers. Despite all those efforts, 38 percent of corporations still ended up suffering some sort of major or moderate disaster. What happened? Well, in all likelihood somebody had a Web server running on a development machine and wasn’t even aware of it. Chances are they were connected directly to the Internet by way of some home connection or hotel connection or whatever and so came to be infected. Then, when they resumed working on the company network, they effectively ended up bringing the infection across the barrier with them. From there, the infection probably spread like wildfire to Web servers throughout the network, and pretty soon the whole corporate network was down.
So all these new communication modes—VPNs and the like—actually turn out to be extremely effective means for exposing vulnerable systems on the inside of an organization to all sorts of diseases from the outside world that they have no immunity against.
Q Are there comparable security ramifications as a consequence of having full operating systems embedded in so many different things? That is, once my car has Wi-Fi, will I need to worry about a virus potentially bringing down the processors that control some of the car’s key subsystems? Aren’t we essentially just providing an increased surface area for attacks?
SH Yes, I see the implications here as being potentially enormous. A lot of the concerns our customers are raising these days have to do with mobile computing devices—laptops mostly for now, but PDAs and embedded devices probably won’t be far behind. For the most part, the problems don’t have so much to do with self-replicating viruses and worms but rather with code generally termed malware—“malicious software” customized for some specific nefarious task. Dealing with malware can be especially problematic since the old signature-based approaches for isolating the source of trouble often don’t apply all that well. That’s because a lot of this malware, especially the stuff that’s driven by organized crime, often tends to be customized to attack just one particular corporation or one particular system. As a consequence, you end up seeing a piece of malware only once or twice. By the time you’ve developed a signature for the thing, it’s useless because you’re never going to see it again.
What makes this all the more daunting is that it’s very, very easy to take some piece of malware and modify it for further havoc. Anyone can download tools straight off the Web to ensure their little program can evade almost any anti-virus system.
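The simplest possible form of signature, an exact hash of a known sample, shows why a trivially modified variant slips past. Real anti-virus signatures are usually byte patterns rather than whole-file hashes, so this is a deliberately extreme sketch; the sample bytes and function names are hypothetical.

```python
import hashlib

def signature(sample: bytes) -> str:
    """Exact-match 'signature': the SHA-256 digest of the whole sample."""
    return hashlib.sha256(sample).hexdigest()

# Hypothetical bytes standing in for a piece of malware seen before.
known_sample = b"MZ hypothetical payload observed in the wild"
signature_db = {signature(known_sample)}

def is_known_bad(sample: bytes) -> bool:
    """Flag a sample only if its digest is already in the database."""
    return signature(sample) in signature_db

# Appending one byte leaves the behavior unchanged but changes the digest.
variant = known_sample + b"\x00"

print(is_known_bad(known_sample))
print(is_known_bad(variant))
```

The original sample is caught and the variant is not, which is the one-off, customized-malware problem in miniature.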
As we come to be surrounded by all these devices that we utterly depend upon, every last one of them a potential target for some piece of malware, we ought to be very concerned about the potential for harm.
PT Of course, embedded devices don’t necessarily have to communicate with other devices. All the same, it is cheaper and easier to use TCP/IP than to use some other protocol. It’s also cheaper and easier to use wireless technology than it is to run wires. So, just according to economics, the tendency is for manufacturers to stick with standards and to use what’s already out there. People are going to throw Linux on a chip and use it for every car subsystem they can, simply because it’s cheaper and easier to do things that way than to invent something new.
Having said all that, though, it wouldn’t be all that hard to turn off all the stuff that an intruder might use to cause trouble. Most attacks, of course, come—by definition—through an interface. If you’re using a computer to run any sort of sensitive device—whether it be a medical device, an automobile, an airplane, or whatever—you might want to take a long hard look at restricting, at the interface level, who that thing can communicate with, and what they’re allowed to do and what they’re not allowed to do.
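Restricting “who that thing can communicate with, and what they’re allowed to do” amounts to a default-deny allowlist at the interface. A minimal sketch follows; the peer names and operations are hypothetical and stand in for whatever a real embedded controller exposes.

```python
from typing import Dict, FrozenSet

# Hypothetical allowlist for an embedded controller: which peers may talk to it,
# and which operations each peer may invoke. Anything not listed is refused.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "diagnostic-port": frozenset({"read_status"}),
    "engine-ecu": frozenset({"read_status", "set_throttle"}),
}

def authorize(peer: str, operation: str) -> bool:
    """Default deny: permit only (peer, operation) pairs on the allowlist."""
    return operation in ALLOWED.get(peer, frozenset())

print(authorize("engine-ecu", "set_throttle"))
print(authorize("diagnostic-port", "set_throttle"))
print(authorize("infotainment", "read_status"))
```

Only the first request is permitted. An unknown peer gets an empty permission set, so adding a new interface never silently grants it access.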
I also think your comment about signatures becoming less and less useful is absolutely central. That’s especially interesting to me since I invented the first anti-virus system that was not signature-based. I basically catalogued all the software on any given system and then hashed it to detect any changes. The idea was to prevent the changes from happening, or at least prevent the changed code from executing. Because of that, an infected system would have no way to propagate the disease. Over time, the system got better and better, to the point that it could effectively undo what had been changed. By the time I got all that working pretty well, however, back in the late ’80s, John McAfee had come along and made a fortune on his signature-based system.
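The catalog-and-hash idea can be sketched in a few lines: record a digest for every file in a known-good baseline, then report anything whose digest has changed or that has disappeared. This is a generic integrity-checking sketch, not a reconstruction of Tippett’s product. It covers detection only; blocking execution and undoing changes are not shown, and the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List

def build_catalog(paths) -> Dict[str, str]:
    """Record a SHA-256 digest for every file in the known-good baseline."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def changed_files(catalog: Dict[str, str]) -> List[str]:
    """Return files whose current digest differs from the baseline, or that vanished."""
    changed = []
    for path, digest in catalog.items():
        p = Path(path)
        if not p.exists() or hashlib.sha256(p.read_bytes()).hexdigest() != digest:
            changed.append(path)
    return changed

# Usage: catalog two files, tamper with one, and check again.
with tempfile.TemporaryDirectory() as d:
    app = Path(d, "app.exe")
    lib = Path(d, "lib.dll")
    app.write_bytes(b"original code")
    lib.write_bytes(b"library code")
    baseline = build_catalog([app, lib])
    app.write_bytes(b"original code + injected virus")
    detected = changed_files(baseline)

print(detected)
```

No knowledge of the specific virus is needed: any modification to a catalogued file is reported, which is exactly what makes the approach independent of signatures.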
I’d learned my lesson. It was fairly obvious that people prefer signature-based stuff. It’s just easier to understand, so the market buys it—even though it has all kinds of problems. For one thing, it’s got a zero-day problem, which is to say that if there’s a massive attack that strikes quickly, the community can’t possibly update signatures fast enough to save even half the community. The SQL Slammer attack, for example, infected 90 percent of all the computers that were ever going to get infected within the first 10 minutes. Even then, after the signatures finally came out, the half-life of getting them deployed proved to be something like a day. It was all over by the time the signatures got out there.
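The zero-day race can be illustrated with a simple logistic epidemic model, a textbook simplification swapped in here rather than anything from the conversation. The starting share of infected hosts and the 8.5-second early doubling time are assumptions (the latter in line with published analyses of Slammer); the one-day deployment half-life is the figure quoted above. The sketch generously assumes the signature is released at the same moment as the worm.

```python
import math

def time_to_fraction(target: float, initial: float, doubling_s: float) -> float:
    """Seconds for a logistic epidemic to grow from `initial` to `target` share
    of the susceptible population, given its early doubling time."""
    r = math.log(2) / doubling_s
    return math.log((1 / initial - 1) / (1 / target - 1)) / r

def protected_fraction(t_s: float, half_life_s: float) -> float:
    """Share of hosts that have deployed a signature t_s seconds after release,
    assuming the unprotected share halves every half_life_s seconds."""
    return 1 - 0.5 ** (t_s / half_life_s)

# Assumptions: 1 in 100,000 susceptible hosts infected at t = 0, an 8.5 s
# doubling time, and a signature released at t = 0 with a one-day half-life.
t90 = time_to_fraction(0.9, 1e-5, 8.5)
print(f"90% of susceptible hosts infected after about {t90 / 60:.1f} minutes")
print(f"signature deployed on {protected_fraction(t90, 86_400):.2%} of hosts by then")
```

Under these assumptions the epidemic is essentially over within minutes, while well under 1 percent of hosts have the signature, which is the gap Tippett describes.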
Still, to this day, the community continues to dislike and distrust nonsignature types of technologies. The big defense of signature-based systems—and there’s some truth to this—is that they don’t have many false positives. In all other sciences, we’ve come to accept that there are going to be some false positives and some false negatives associated with any test. But in computing, we’re so accustomed to the notion of precision that we have a hard time getting comfortable with any tests that aren’t absolutely perfect.
Signature-based tools tend to have very low false-positive rates because they don’t interfere. They don’t cry wolf unless they’re sure they’ve found something wrong, whereas the more generic heuristic systems used in biological research tend to achieve much better false-negative rates. The signature-based tools are very poor at finding things they don’t already know about, so they have a huge false-negative rate, whereas the generic systems are much better at detecting new, previously unfamiliar sorts of attacks and preventing them. Still, one of the undeniable downsides here is that these systems also end up occasionally stopping things that really shouldn’t be stopped.
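The trade-off described here is usually stated in terms of two rates: the false-positive rate (benign events flagged) and the false-negative rate (malicious events missed). The sketch below computes both for two tools; all of the counts are hypothetical, chosen only to mirror the qualitative contrast drawn above.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    true_pos: int   # malicious and flagged
    false_neg: int  # malicious but missed
    false_pos: int  # benign but flagged
    true_neg: int   # benign and passed

    @property
    def fp_rate(self) -> float:
        """Share of benign events that were wrongly flagged."""
        return self.false_pos / (self.false_pos + self.true_neg)

    @property
    def fn_rate(self) -> float:
        """Share of malicious events that were missed."""
        return self.false_neg / (self.false_neg + self.true_pos)

# Hypothetical counts: 100 malicious samples (half of them previously unseen)
# and 10,000 benign ones.
signature_tool = Outcome(true_pos=50, false_neg=50, false_pos=1, true_neg=9_999)
heuristic_tool = Outcome(true_pos=90, false_neg=10, false_pos=100, true_neg=9_900)

for name, tool in [("signature", signature_tool), ("heuristic", heuristic_tool)]:
    print(f"{name:9s} FP rate {tool.fp_rate:.2%}  FN rate {tool.fn_rate:.0%}")
```

The signature tool almost never cries wolf but misses every unseen sample; the heuristic tool catches most of them at the cost of more false alarms.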
SH Actually, a lot of the intrusion-detection systems that are based on signatures are notorious for having high false-positive rates.
Here’s another thought on the whole matter of false positives versus false negatives. If you look at the biological systems at work in nature, you’ll see that they’ve all evolved to tolerate at least some degree of inaccuracy. The human body, for example, can cope with a certain level of false positives. So, from that perspective, I’d have to say that our computer systems aren’t really all that evolved as yet. We haven’t built them to tolerate error to the same degree. I find that particularly interesting with regard to false positives, since, generally, it’s a little easier to predict the rate of false positives because they’re something you can measure in normal behavior. For that same reason, it’s also possible to predict their impact with reasonable accuracy.
Given that, you’d think it would be reasonably easy to design systems to accommodate false positives. It’s much harder, on the other hand, to measure the impact of false negatives as well as to predict whether they’re going to happen or not, because the occurrence of even just one false negative can be catastrophic.
One other comment: it’s interesting to see how the industry has managed to convince customers that if their systems get compromised while they’re using a signature-based system, it must be because they failed to update it. Strange, isn’t it?
PT It’s also important to remember that signature systems and nonsignature systems tend to behave differently with false positives and false negatives, so people need to use these systems in different ways.
If something has any chance of flagging a false positive, then people should avoid putting it into any sort of serial implementation. By contrast, if all you’re doing is putting in an intrusion detection system—sniffing a wire—you can tolerate a lot of false positives since all you’re really doing is writing a log somewhere. The traffic continues unimpeded because you’re not acting on the alarm in a direct and immediate way. But if you put in a system that stops any traffic that appears to be potentially malicious, you might be needlessly causing real harm to the business.
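The difference between a passive sensor and an inline (serial) device can be shown with one detector run in two modes. The detector rule and the traffic below are hypothetical, and the rule is deliberately over-eager so that it also fires on a legitimate request.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ids")

def process(packets: List[bytes],
            looks_malicious: Callable[[bytes], bool],
            inline: bool) -> List[bytes]:
    """Pass packets through a detector and return what gets forwarded.

    Passive (inline=False): alerts are only logged and all traffic is forwarded,
    so a false positive costs one log line.
    Inline (inline=True): flagged packets are dropped, so a false positive
    blocks legitimate traffic.
    """
    forwarded = []
    for pkt in packets:
        if looks_malicious(pkt):
            log.info("alert: %r", pkt[:32])
            if inline:
                continue  # drop the packet
        forwarded.append(pkt)
    return forwarded

def detector(pkt: bytes) -> bool:
    """Hypothetical over-eager rule: flag anything mentioning 'cmd'."""
    return b"cmd" in pkt

traffic = [b"GET /index.html", b"POST /cmd.exe?attack", b"GET /docs/cmd-reference"]
print(len(process(traffic, detector, inline=False)))
print(len(process(traffic, detector, inline=True)))
```

In passive mode all three requests go through and two alerts are logged; inline, the attack is stopped but so is the legitimate documentation request.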
Both signature and nonsignature systems have their places. Signature-based systems are exceedingly good at stopping things you already know about, and you can refine the signature to weed out the false positives. That’s exactly what’s been done with all the anti-virus products, which certainly manage to substantially reduce the risks that companies would otherwise face since all the viruses that ever were out in the wild still are out there to one degree or another. Thus, if you were to do away with all of your signature-based protections, you’d surely be harmed at some level.
You also want to have nonsignature protections for all the bad things out there that you don’t know about yet. In general, these need to be built and deployed in such a way that they’ll have a very low negative impact when they’re used, understanding that they won’t be perfect. If you can manage to achieve 70, 80, or 90 percent effectiveness in terms of combating a given category of attack using countermeasures that aren’t signature-based, I think that’s absolutely fabulous. Let’s say you can get 80 percent effectiveness out of one particular router rule; you back that up with an IPS that itself is 80 percent effective; then by configuring your computer in a certain way, you add another layer of protection that’s 80 percent effective. Altogether you end up with an aggregate protection value of more than 99 percent. That sort of approach works exceedingly well. At Cybertrust, we call it “synergistic security,” and we think of it as one of the three major tenets of making corporations work.
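The “more than 99 percent” figure follows if the layers fail independently: an attack gets through only if every layer misses it, so the miss rates multiply. Independence is an assumption of this sketch, not a given.

```python
from math import prod

def aggregate_effectiveness(layers):
    """Share of attacks stopped by at least one layer, assuming the layers fail
    independently: 1 minus the product of the individual miss rates."""
    return 1 - prod(1 - e for e in layers)

# Router rule, IPS, and host configuration, each 80 percent effective.
print(f"{aggregate_effectiveness([0.8, 0.8, 0.8]):.1%}")
```

Three 80 percent layers leave 0.2 × 0.2 × 0.2 = 0.008 of attacks unstopped, or 99.2 percent aggregate protection. If the layers share a blind spot, the real figure will be lower.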
SH I’d call it more of a “layered approach,” with reference once again to the biological inspiration. The human immune system, for example, employs multiple layers in much the same way. I think something that bears mention, though, is that when you have multiple levels of security, those layers need to be operating independently or you won’t be able to achieve maximum protection.
Sometimes organizations are so eager to implement multiple layers in order to achieve the synergistic effect you talked about that they forget that the more security layers there are, the greater the impact is likely to be on normal operations.
PT That’s why it’s so critical to configure them properly. The most common error is that people want near perfection in each layer. A more realistic goal would be for each layer to be about as effective as a seatbelt, which is 45 percent likely to prevent death in a car and about 50 percent effective in terms of reducing major injury. That’s great, particularly in concert with airbags, which on their own are about 30 percent effective in terms of preventing fatalities and about 40 percent effective in terms of reducing major injuries.
Together, those two measures add up to a tremendous level of protection. But if someone were to offer you a computer security product or configuration or policy or practice or procedure or architecture that offered you only a 40 or 50 percent level of effectiveness, you’d probably say, “Wow, this is horrible!” But if you change your attitude, you’ll find that there are all sorts of non-infringing, low-cost, low-maintenance things you can do to achieve between 50 and 80 percent effectiveness in protecting against certain risks. These things can be easily layered in such a way as to greatly increase their aggregate effectiveness. For example, you can mix policy measures with physical protections in much the same way that the human immune system comes into play only after the protective layer that your skin provides has been penetrated. Both of these protections work independently to guard you from infections. There are other layers as well: the oil on your skin, your body’s histamine response, the way that blood circulates throughout your body—each of these systems has its own role to play. And together, they’re much more effective than any one of them would be independently.
With computer security, it’s much the same. If we design our systems such that we accept multiple layers of countermeasures, each of which is within some reasonable range of effectiveness, and we design them around user productivity so as to avoid adverse impact on operations, then we wind up with an overall system that’s relatively inexpensive and yet still manages to provide very, very good organizational security.
SH Much of the typical literature on intrusion-detection systems tends to represent performance as a receiver operating-characteristics curve where you’re able to measure false positives versus false negatives—the idea being that by adjusting parameters in the system, you’ll move along the curve, producing more false positives and fewer false negatives, or vice versa. It has always struck me when looking at those charts that if you were to layer independent systems, you should be able to detune them such that the false negatives increase and the false positives drop dramatically. Thus, so long as you manage to keep the different systems independent, you should be able to get a huge benefit from layering them.
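The detuning argument can be checked numerically. Pick two operating points on one detector’s ROC curve, an aggressive setting and a detuned one, and combine three independent detuned copies in series: an attack gets through only if all layers miss it, while a benign event is disrupted if any layer flags it. The operating points are hypothetical, and independence is assumed, as Hofmeyr stresses.

```python
from math import prod

def layered(points):
    """Combine independent detectors given as (fp_rate, fn_rate) pairs.

    Misses multiply, because every layer must miss for an attack to get through.
    False alarms compound, because any single layer can flag a benign event.
    """
    fn_total = prod(miss for _, miss in points)
    fp_total = 1 - prod(1 - fa for fa, _ in points)
    return fp_total, fn_total

# Hypothetical (false-positive rate, false-negative rate) operating points.
tuned_single = (0.05, 0.10)  # aggressive setting, used alone
detuned = (0.001, 0.40)      # far fewer false alarms, many more misses

fp, fn = layered([detuned] * 3)
print(f"one tuned layer:      FP={tuned_single[0]:.2%}  FN={tuned_single[1]:.2%}")
print(f"three detuned layers: FP={fp:.2%}  FN={fn:.2%}")
```

With these numbers the layered setup comes out at roughly 0.3 percent false positives and 6.4 percent false negatives, better on both axes than the single aggressively tuned detector.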
PT That’s exactly how it ought to be done, but people don’t understand that. A big part of that is that we continue to be obsessed with security at the single-computer level. So we concentrate on the micro-vulnerabilities to the exclusion of the much more significant organizational vulnerabilities.
SH I was thinking that very thing when we were discussing the possibility of a virus being communicated to all the various subsystems in your car. I think the interesting point you made there was that by limiting communications between those different subsystems, you can do a great deal to limit the amount of damage that might be inflicted. But as I look at the way that computer technology continues to evolve, it seems that if there’s any possibility whatsoever of getting a couple of devices to communicate with each other, then you can be absolutely certain that somebody will see some benefit in making that happen.
PT I think there actually are a couple of different drivers for this. One is that it’s generally easier for people just to grab something that’s already available—the standard protocols or anything else that can be taken right off the shelf. Generally speaking, people are going to take the whole thing instead of just the piece they actually need because, I mean, who cares how much hard-drive space the thing occupies. So they take a whole operating system, for example, and they just jam it in—full TCP/IP stacks, full wireless connectivity, lock, stock, and all. It fits well enough. And, of course, it’s certainly easy enough. And then on they blithely go to their next task, because who wants to spend a bunch of time whittling out the bits you’re not going to use?
The other driver here is the issue you suggested: that connectivity trumps security every time. It’s just so powerful and seductive to offer all this connectivity. That virtually ensures that almost everything gets connected. It’s just a fact of life, except in the case of very carefully designed systems. In an autopilot system for an airplane, for example, a lot of the connectivity is purposely designed out because you don’t want to create potential security holes in something that hundreds of lives are going to end up depending on.
SH The really interesting thing about the autopilot example is the fact that you know you’re developing a high-risk system—and that you’d better get things right because hundreds of lives lie in the balance. One of the major design challenges is that because of the way systems are connected, you can end up with a domino effect, where a problem with one system can cascade into other problems that cause major havoc—with none of that being all that apparent initially from an examination of the overall system design.
PT Exactly. The SQL Slammer worm, for example, took out a fifth of the ATM systems in America. If you needed to get money out of your ATM machine that day, you were just out of luck. That cost the banks many tens of millions of dollars. Also, Continental Airlines couldn’t fly during 12 hours of that day, and there were plenty of other organizations that had plenty of problems as well. But it turns out that the mainframes that handle the back end on all those ATM transactions were not themselves vulnerable to that particular attack. They were just unable to communicate with certain of the ATM systems because some of the network segments were overwhelmed with traffic. That’s how attacks work: the attackers look for vulnerabilities in the system and that’s where they strike. Other parts of the system just get pulled down in the course of the resulting domino effect.
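The domino effect can be modeled as failure propagating through a dependency graph: once a component fails, everything that depends on it stops working, even if it was never itself vulnerable. The graph below is a hypothetical simplification loosely shaped like the ATM example, not a description of any bank’s architecture.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: each component lists what it needs to work.
DEPENDS_ON: Dict[str, List[str]] = {
    "core-segment": [],
    "branch-network": ["core-segment"],
    "mainframe-link": ["core-segment"],
    "mainframe": [],  # not itself vulnerable to the worm
    "atm": ["branch-network"],
    "atm-service": ["atm", "mainframe-link", "mainframe"],
}

def cascade(initial_failures: Set[str]) -> Set[str]:
    """Return every component that stops working once `initial_failures` fail,
    propagating failure breadth-first to anything that depends on a failed part."""
    dependents: Dict[str, List[str]] = {node: [] for node in DEPENDS_ON}
    for node, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(node)
    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        for user in dependents[queue.popleft()]:
            if user not in failed:
                failed.add(user)
                queue.append(user)
    return failed

# The worm only saturates the core network segment; nothing else is infected.
down = cascade({"core-segment"})
print(sorted(down))
print("mainframe" in down)
```

The mainframe itself stays up, yet the ATM service it supports goes down because the links it relies on are overwhelmed.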
Q Aren’t many of the attacks we’re seeing these days getting to be increasingly malicious in character? What does the future hold? Are things going to get grimmer still?
SH That’s hard to say, but we’re clearly in an arms race. We’re like the Red Queen in Through the Looking-Glass, running hard just to keep in the same place. There’s no way of predicting when the defenders might get the edge or when the attackers are likely to pull ahead. One of the things that makes it impossible to predict is that technology just keeps on evolving. We keep building new systems that we think are going to make our lives easier. Of course, that sometimes just ends up making things easier for the bad guys. So everything that we do to add power and functionality also increases the potential for abuse.
One of the changes that we are witnessing now is that there definitely seems to be a surge in organized crime involvement in computer attacks. You hear increasingly about organizations being blackmailed or extorted by people who threaten to bring down the company’s Web site unless they pay some sort of “protection money.”
Conversely, we’ve seen fewer big worms of late. There are those who claim that we’re not seeing as many worms simply because the people who used to write worms are now involved in far more lucrative schemes that put them in league with organized crime. Certainly, it’s not as though we’re any less vulnerable to worms than we used to be. For example, someone could write a worm to exploit a recent vulnerability in Microsoft SNMP to hit Microsoft desktops and servers around the world. So I think there still is huge potential for worms. And I don’t think there’s any question that we’re going to continue to see a surge in organized crime involvement, especially in terms of targeting the weakest points, as we’ve already seen with the phishing-style attacks, where they just bypass all the heavyweight security mechanisms to go straight for the vulnerable client computers behind the cable modem.
PT Major worm attacks tend to come in spurts. We had Code Red in July 2001, and then Nimda a few months later. Then, we skipped about a year and a half to the SQL Slammer attack in January 2003. There were three major worms in a single month during August 2003. The most recent wave included Mydoom in January 2004, followed by Sasser three months later. So I don’t think we can be at all smug about being past the major worm attacks. A year to a year and a half seems to be the common interval. But a recent ICSA Labs study of 300 corporations shows that the impact of the day-in, day-out worms over the end of 2004 and beginning of 2005 was still higher than for each of the last 10 years, even though it was a period without a major worm event.
One of our offerings at Cybertrust is e-mail filtering. Right now, we’re throwing away about 80 to 90 percent of all the traffic before we pass it along to our clients. Just a year ago, we were tossing away only about 10 percent. So we’re facing an entirely different problem than before, where we’re effectively dealing with the pressure of attacks trying to come in through every pore. The thing to remember is that under these kinds of conditions, new pathways might open up that you would have never otherwise recognized as pathways.
In terms of predicting where all this leads, I have another observation that I think is worth making. Aviation was 1,000 times less safe 60 years ago than it is today. That’s to say that the likelihood of a commercial passenger dying was 1,000 times greater 60 years ago than it is today per passenger-seat-mile. If you analyze what has made that sort of progress possible, you’ll find that the security improvements made in the aviation field aren’t all that different from what we’re proposing for computers now. The aviation industry has benefited from security improvements made in the equipment itself and in the procedures they follow. You’ll notice that we’re pushing for many of the same things for computers.
While the 737s and 777s of today offer some obvious advantages over the old DC-3s, the changes in technology have been responsible for only a tiny proportion of the thousandfold reduction in risk—maybe just 1 percent of it. The remaining 99 percent improvement has come about as a result of two different developments. One has been the emergence of better standard operating procedures. At Cybertrust, we call them “essential practices.” That’s to distinguish them from best practices because essential practices are things that can be implemented by anybody using the people and products and budget they’ve already got.
The other important development in aviation has had to do with the emergence of crucial third-party systems: air traffic control, the FAA, and so on. It’s not the regulations that have actually been so important, but rather that there’s an official group working to understand the overall risk and issuing directives to industry about when to check and replace certain key parts and when to upgrade certain component subsystems. Having an air route traffic control system also helps.
My concern is that, in the computing field, we continue to bark up the technology tree—when there’s so much more to be gained by implementing appropriate essential practices and putting suitable third-party programs in place that ensure the proper implementation and maintenance of things.
SH I think your airplane safety analogy might be a little misleading because, to really be comparable to the computer security challenges we’re facing today, the analogy would need to include a lot of terrorists who keep firing anti-aircraft rockets at the planes that are flying overhead. That is, I think one really important reason the aviation industry has realized that thousandfold improvement in safety is that it hasn’t had to contend, the whole while, with adversaries dedicated to breaking the system.
So maybe a better analogy would have to do with improving the safety and security of fighter airplanes during a time of war. Because of dedicated adversaries, wartime is when technology tends to evolve very rapidly. Consequently, you end up with attackers who come up with inventive new anti-aircraft missiles that you then have to find a way to counter. You can institute all the process improvements and practices you want, but if you can’t evade those new anti-aircraft missiles, you will lose. Warfare has a rich history of victory through technology. In the wartime aviation scenario, and in computer security, technological solutions actually do make a huge difference.
PT Well, to my way of thinking, when the attackers do get around to doing something new, you can generally predict it. Just by monitoring what they say to each other and watching some other key indicators, you can typically get a pretty good sense for what’s coming down months—or even years—in advance. Then, just by adding a few more protections to guard against the threats that look to be coming down the pike, we’ve found that a company can achieve dramatic risk reductions, an average of about fortyfold in our measurements.
That does require a lot of instrumentation and knowledge of what’s going on in the world around you. And, yes, there may still be some directed attacks that manage to get around your defenses. But again, the idea here is not to reduce your risk level to zero. That’s just the wrong way to look at this. This is a negative problem we’re dealing with here, and you can never be certain that you’ve managed to solve a negative problem. The only thing you can do is to design scenarios and then work on materially improving your ability to deal with those scenarios. We want to better understand how it is that networks really fail or how corporations can be successfully attacked. Then you just work to put the dynamic systems in place that will allow you to measure, monitor, and keep track of what’s changing so that you can keep on top of those scenarios.
Originally published in Queue vol. 3, no. 5.