In their quest to solve the next big computing problem or develop the next disruptive technology, software engineers rarely take the time to look back at the history of their profession. What’s changed? What hasn’t changed? In an effort to shed light on these questions, we invited three members of ACM Queue’s editorial advisory board to sit down and offer their perspectives on the continuously evolving practice of software engineering. We framed the discussion around the bread and butter of every developer’s life, tools and technologies, and how the process of software development has changed (or not changed) with the rise of certain popular development methodologies such as Agile and open source. This is part one of their conversation, with part two to follow in a subsequent issue of Queue.
Steve Bourne is chairman of Queue’s editorial advisory board. Bourne’s name will be familiar to anyone who uses Unix (or its descendants), as he developed the legendary Bourne shell command-line interface. While a member of the Seventh Edition Unix team at Bell Labs in the 1970s, Bourne also developed the Unix debugger adb. Beyond his contributions to Unix and its associated tooling, Bourne has held senior engineering and management positions at Cisco, Sun Microsystems, Digital Equipment, and Silicon Graphics. He is also past president of ACM, where he continues to be active in various advisory roles. In his current position as CTO of Eldorado Ventures, he evaluates new technologies and advises the firm on technology investments.
Our second panelist is Eric Allman, developer of the sendmail MTA (mail transfer agent), currently used to deliver more than 60 percent of the world’s e-mail. Allman is chief science officer of Sendmail Inc., the company he cofounded in 1998 as the commercial wing of the open source sendmail project. While at UC Berkeley in the 1970s and 1980s, Allman led the development of the well-known INGRES database management system and wrote a number of system utilities for the BSD Unix (now FreeBSD) operating system. Allman rounds out his professional and commercial endeavors with volunteer positions at ACM and Usenix and frequently speaks at conferences around the world.
Bryan Cantrill from Sun Microsystems has already made significant and lasting contributions to software engineering. Along with Sun’s Mike Shapiro and Adam Leventhal, Cantrill designed and implemented DTrace, a tool for dynamic instrumentation of production systems that helps companies identify and fix performance bottlenecks. DTrace, which has proven itself indispensable to many companies, won the Wall Street Journal’s top Technology Innovation Award in 2006. In 2005, Cantrill was named by MIT’s Technology Review as one of the top 35 technologists under the age of 35, and by InfoWorld as one of its Innovators of the Year. This year he and his colleagues were awarded the Usenix STUG (Software Tools User Group) award for DTrace.
STEVE BOURNE Let’s begin by discussing tools. Eric, what’s your current set of tools and what do you like or dislike about them?
ERIC ALLMAN I have to admit to being kind of old-fashioned. It’s just like in my kitchen, where I’d rather have a few really sharp knives that I know how to use than a whole bunch of appliances that don’t do anything particularly well. Part of this is because of bad experiences in the past. I remember spending close to a week with a symbolic debugger, which I believe was not adb, trying to find a bug in a program, only to finally realize it was a bug in the debugger. That soured me on symbolic debuggers for several years, so I went back to using binary debuggers. That was on a PDP-11 where you could actually read the assembly language.
There are certainly tools today, such as symbolic editors, that I haven’t really gotten used to, either, in part because of the simple GUI issues. Moving your hand over to a mouse and then back to the keyboard when you’re typing text is to me not a good optimization, so I still use vi. Emacs is fine, except I don’t have that many fingers, so it doesn’t work for me.
I guess I have finally gotten over some of my hang-ups. I do use symbolic debuggers now; they seem to be reliable enough.
BRYAN CANTRILL Yes, welcome to the 1970s. But I’m similar in that I very much adhere to the Unix philosophy of having a small number of tools that do their jobs very well. For me, the holy trinity would be vim—which is basically vi, but with improvements—to edit; the compiler to compile; and mdb to debug. Mdb is an adb derivative with very powerful extensions that allow for powerful debugging methodologies to be developed. And then DTrace to debug in situ for performance problems.
Beyond that I use cscope a lot for source-code navigation, and TeamWare, which is one of the best things no one has ever heard about. It’s a Sun program that’s really the granddaddy of the bring-over/modify/merge model of development. BitKeeper and TeamWare are very much in the same vein, in part because Larry McVoy is behind both of them.
I don’t use Eclipse, not necessarily because I have anything against it or NetBeans or these other kinds of higher-level IDEs. It’s more that they are not appropriate for the problems that we’re solving. So I’m still old school in that regard.
SB Can you elaborate on that?
BC I think that each of these IDEs was designed with a particular problem in mind and that the success of the IDE really depends on the ability to outgrow that single problem to other problem spaces. If you look at Eclipse and NetBeans, what they both have in common is their pluggable architectures, which are not mated to any one language or operating-system environment. They have both done an admirable job. What they both have in common in terms of their limitations is that they’re designed around developing a single program or entity of a program that they have defined—a servlet or what have you—that fits within their model.
When you start developing a larger system, they break down a little bit. Certainly for C they break down because a lot of their added value involves things like code completion. In Java, code completion is not only feasible to implement, it’s really a requirement. But in C, code completion is much trickier to do dynamically, so it’s of more dubious utility.
I don’t want to be curmudgeonly, but one of the things that concerns me about using code completion is it’s the kind of tool that can become a crutch insofar as it becomes a replacement for understanding the system. I no longer have to understand the system because I can auto-complete all this stuff. I’m able to quickly stagger my way to code that compiles, but if that code reflects less understanding of the system, it makes me nervous.
One of the things I love about the Unix philosophy is that I don’t have to read the man (manual) pages. I don’t need to read the man page for open, for read, for write. I don’t need to read the man page for Awk or for the shell because these things have simple abstractions that are powerful but simple enough that I’m able to retain them. The thing I worry about with code completion is that it allows for this kind of metastasis of complexity.
Steve, you designed some of these basic essential tools like adb and the shell. Are they simple by design, or are they simple because they had to be, because of the limitations of the 16-bit address space?
SB A little bit of both, but I think the environment then was not as complicated as it is today. For example, one of the things I liked about the Unix development environment back in those days was that you could find things. If you said “man x,” then it either said it was there or it wasn’t there. If it wasn’t there, it wasn’t there, and if it was there, you found it. The problem today is finding methods and libraries that do things you want to do. Maybe it’s easier to do if you’re in a development shop that has already been set up.
Just to give you a little background on our environment, when we were developing Unix we actually had a couple of rooms in the attic at Bell Labs with Model 33 Teletypes and a couple of Tektronix 4014s, which actually were very nice for software development because they had big screens and they refreshed fast. We had no Ethernet and no network except UUCP. What was important from the software development point of view was that you released your software into that environment. Presumably you tested it, and if you didn’t, guess who yelled at you? It was all the guys in the room who were using it. You got instant feedback on whether there were bugs in it and whether people liked it or not. If it didn’t work, guess what? It was your reputation at stake and that was pretty instant peer-review feedback, which I think is extremely important in software development. Some of that has gotten lost in some of these big projects, where the engineers essentially have no personal or any other kind of accountability for the quality of the code they deliver into the environment.
EA One of the Extreme Programming (XP) rules is that you have customers who either work with the programmers full-time or are always available, depending on how extreme you want to get. In Steve’s example his peers were his own customers.
BC The best software is always developed by those who actually need it themselves. That was certainly the motivation for us in developing mdb and DTrace. It was the software that we needed, so we designed it for what we wanted, and when you can have those two in the same brain, you can develop the best possible software.
Obviously, we’re not developing a banking system for ourselves; and for developers out there in the world, I think Agile and XP attempt to rectify that by having that customer voice as close as possible to the developer.
I think the trick, in any environment, is how do you assure that you are your own consumer? Maybe it’s by having the customer sitting over your shoulder. Maybe it’s what Steve did in the Unix environment at Bell Labs, and what we do here, which is to be sure that those innovations are pumped back into the common environment. Everyone is using a common environment, so even if you are not the most immediate consumer, the person next to you is, and that feedback helps you become your own customer.
SB I would just like to underline one thing. The customer has two roles here. One is: what does the customer want? The other involves peer review. There’s a responsibility and a reputation that you have in a group, which is really important and doesn’t involve the external customers so much as it involves the engineers whom you’re working with. You release to your own group and they’re using the stuff that you’re using, and that provides instant feedback on the work that you do. I believe it actually helps improve the quality of the overall group output, as well as the quality of the products that the group is producing.
BC That peer pressure is incredibly powerful. We’ve got a very public model at Sun, where failures are very public. That can be too powerful at times, because it can punish failure more than it rewards success. You have to balance the two, and you need both the positive reinforcement and the negative reinforcement. I do wonder, though, when you’ve got development efforts online, with collaborators who are spread across multiple geographic areas, whether some of that face-to-face positive and negative reinforcement disappears.
EA It depends on the group. Some distributed companies out there make an effort at least once, preferably twice, a year to get everyone together into one room so that instead of just being an e-mail address, they actually have faces. It’s a lot easier to say something inappropriate to somebody who is just an e-mail address than to somebody whom you have actually gone out drinking with. I think that kind of cohesion really helps. For a lot of open source development, which is what I’ve been doing pretty much all of my life, you don’t have travel budgets most of the time so that’s not feasible.
SB I was surprised that the open source model and distributed development actually appear to work quite well. Eric, maybe you could say a little bit about why it works, and how, for example, integration control and quality control are done.
EA First of all—and I may be committing heresy here—but what we see are the successful projects; we don’t see all the ones that have failed, so it looks like open source development is this great solution to all these problems. In fact, it can be done very badly. There’s an old maxim that you can write spaghetti code in any language, and I think that’s definitely true with development methodologies as well. The ones that do it well use a couple of different models. One is the benevolent dictator model, in which one person ultimately does all of the integration and so forth. This is pretty much how Linux works at this point, where Linus [Torvalds] is the dictator and he has lieutenants who help out, but it’s a very tree-structured approach.
BC Surprisingly so, I might add. Linux is more hierarchical than any other project I have encountered in proprietary software products.
EA Yes, that is kind of surprising. In contrast, FreeBSD has a much more spread-out network. There is no one who is absolutely in control. A core team has to make some of the big decisions, but that team consists of around 30 people. That seems to work really well, and they’ve done a lot of work to structure that so that the communication paths work. One of the other reasons that it works is that it’s a big project and there are a lot of folks who are working on just their little pieces of the system, so the integration doesn’t have to be done continuously. It’s not like you change the kernel and everyone who is on the system crashes. That was kind of the way we did it at Berkeley when we were developing the VM/Unix stuff. We ate our own dog food, and that meant there were crashes a lot of the time, but it also meant that we fixed them very quickly.
BC That’s a very important general principle in terms of using your own software. We call that “avoiding the quality death spiral.” Solaris went through a very interesting transition. Prior to Solaris 2.5, there was much more of a, for lack of a better word, waterfall model in terms of the way new releases were dispersed to people. As a result, people would not run the latest bits on their desktop or on the server; they would develop their own little bits and integrate them into a whole that they never saw. Solaris was in the quality death spiral: once people refused to use the latest stuff because it was known to be broken, the latest stuff got used less and less, and so it got more and more broken.
To break the quality death spiral, you’ve got to force people to use the latest stuff. I think it’s much more important when you’re in a distributed environment where you don’t necessarily have the kind of immediate peer pressure to do that.
EA Probably, although I don’t like the concept of forcing people to use things because they won’t. You can’t really force them to do stuff. You’ve got to get them to want to use it. I think if you’re providing good enough quality, then most people will use it, particularly if they feel like they’re part of the development effort.
The traditional models in which you hold the product away from the users until you’ve done all the debugging, and then throw it over the wall to them, produce a lower quality than having the users always running the latest test version so they’re giving constant feedback.
SB I was involved in the Solaris development early on and one of the things we tried to do there was bring the test capability to the desktop of the engineers so that they had the tools they needed. I would be interested in how that has progressed, because I’ve been out of the corporate engineering business for a while. Can the engineers effectively test their code in the environment that it’s being shipped in?
EA One of the things I learned about sendmail a long time ago is that it’s really hard to write a simulator for the Internet that’s not bigger than the Internet itself. To a certain extent you can’t, but we certainly do have test labs. We have special programs designed to create load artificially. We have basic sources and sinks, and we will go in and intentionally introduce errors. There’s actually some code in sendmail to force timeouts and things, to make sure that that kind of thing is working.
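An illustrative sketch of the kind of deliberate fault injection described above. This is not code from sendmail; the names FaultInjector and deliver_with_retry are hypothetical, and the retry policy is an assumption made only so there is something to test. The wrapper forces a timeout on chosen calls so that the error-handling path gets exercised on purpose.

```python
class FaultInjector:
    """Wrap a callable and raise TimeoutError on selected call numbers."""

    def __init__(self, func, fail_on_calls):
        self.func = func
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            # The injected fault: pretend the peer never answered.
            raise TimeoutError(f"injected timeout on call {self.calls}")
        return self.func(*args, **kwargs)


def deliver_with_retry(send, message, attempts=3):
    """Call send(message) up to `attempts` times; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            send(message)
            return attempt
        except TimeoutError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure


# Hypothetical usage: the "sink" is just a list, and the first two sends time out.
delivered = []
flaky_send = FaultInjector(delivered.append, fail_on_calls={1, 2})
attempt_used = deliver_with_retry(flaky_send, "hello")
```

A wrapper like this only shows that the retry path runs; it cannot tell you whether a real network timeout would look the same to the code under test.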
BC In our group we’re really focused on having easy-to-run test suites. I feel the mistake that we made in DTrace development was not starting the test suite soon enough. On this [new] project, we developed the test suite moments after the first line of code was written, so we have a complete test suite that we try to run. There are problems when you do that. The test suite right now takes a long time to run. It takes several hours now where it used to take seconds and then minutes, so engineers are running only those portions of the test suite that they know affect their code. In general, that’s the right decision to make, but you do end up with tradeoffs when you have a tightly integrated test suite.
EA You can automate running them so that everything gets run every night.
BC You can do that, but someone has to watch the results. We’ve got one engineer here who is very diligent about watching the results, and he got frustrated because he was the only one doing that. The test suite would be broken for days on end, and no one else would be investigating it.
I think the other problem that we run into is that certain aspects of the system are very hard to test. How do you test creating a link aggregation? How do you test the networking configurations? How do you test things that are going to change the storage configuration of a box?
EA It’s hard to automate pulling an Ethernet jack out of the back of a machine.
BC Exactly. We’ve done things similar to what you’ve done, Eric, in terms of developing little frameworks for injections of various faults, simulating various faults, and so on.
The problem is, if you see a failure associated with the injection of a fault, is it the failure that you would see if the fault were true, or is it an artifact of the way the fault was injected? So, we end up with tests that fail, even though that failure is effectively a failure of the way we’re testing.
What you end up doing is sleeping, effectively. You end up doing something and then waiting for it to happen, then making sure that it happened. When you have these kinds of large, asynchronous things, and you’re injecting a fault, how do you synchronously know whether the system has behaved in the right way or not? Sometimes you can’t, and if you can’t know synchronously, you basically have to wait. How long do you wait? If it doesn’t do it in five seconds, does that constitute failure? Ten? Fifteen?
This is why the tests take so damn long to run. Someone says, “I figured out why the testing has taken so long to run: we’re sleeping all the time.” Then you try to remove some of the sleeps and your tests start failing. It’s a first-class engineering endeavor to develop those tests because it is so challenging.
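One common way to soften the sleeping problem described above is to poll for the expected state up to a deadline rather than sleeping for a fixed interval. The sketch below is a minimal illustration under stated assumptions: wait_until and FakeLink are hypothetical names, and the timeout values are arbitrary.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeLink:
    """Stand-in for an asynchronous resource that comes up after a few polls."""

    def __init__(self, polls_until_up):
        self.remaining = polls_until_up

    def is_up(self):
        self.remaining -= 1
        return self.remaining <= 0


link = FakeLink(polls_until_up=3)
came_up = wait_until(link.is_up, timeout=1.0, interval=0.01)
never_up = wait_until(lambda: False, timeout=0.05, interval=0.01)
```

Polling returns as soon as the condition holds, so a passing test no longer pays for the full wait. It does not answer the question of how long is long enough: the timeout is still a judgment call, and a failing test still costs the whole timeout.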
Part of the reason I’m interested in virtualization is as a development methodology. It has not delivered on this, but one of the things that I ask is can I use virtualization to automate someone pulling the Ethernet cable out of the jack? I can get a lot closer to simulating it if you let me create a toy virtual machine than I can running on the live machine.
EA You brought up some things that have bothered me for a long time. There are two things, actually, which sound like they’re the same but are subtly different. One is how do you test the tests? The second one is how do you verify the results of tests? For things that are completely algorithmic, it’s easy. You diff the output, basically. It should be the same. But for all the interesting stuff, there’s a randomness thrown in so you see the output and it’s different from what it was before. Is it different in an innocuous way or in some serious way? Once again, how do you know that when the test says, “Yes, it’s fine,” that it really is fine? I’ve never seen a solution to that other than getting better coders to write the tests than the coders who wrote what the tests are testing.
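One partial answer to the innocuous-versus-serious question is to mask the fields that are expected to vary before diffing. The sketch below assumes a made-up log format; the patterns, placeholders, and sample lines are all hypothetical.

```python
import difflib
import re

# Hypothetical patterns for fields that legitimately differ between runs.
VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"\bpid=\d+"), "pid=<PID>"),
]


def normalize(text):
    """Replace run-to-run noise with fixed placeholders."""
    for pattern, placeholder in VOLATILE:
        text = pattern.sub(placeholder, text)
    return text


def meaningful_diff(expected, actual):
    """Return unified-diff lines for differences that survive normalization."""
    exp = normalize(expected).splitlines()
    act = normalize(actual).splitlines()
    return list(difflib.unified_diff(exp, act, "expected", "actual", lineterm=""))


golden = "2008-05-01T10:00:00 pid=101 delivered to=alice\n"
same_run = "2008-05-02T11:30:12 pid=977 delivered to=alice\n"
bad_run = "2008-05-02T11:30:12 pid=977 deferred to=alice\n"

innocuous = meaningful_diff(golden, same_run)  # only timestamp and pid differ
serious = meaningful_diff(golden, bad_run)     # the delivery status changed
```

This only moves the judgment into the list of patterns. Someone still has to decide which differences are noise, so it does not settle how you test the tests.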
BC Isn’t this a halting problem of sorts for testing? If you could test the tests, how would you test the test that tests the tests? Ultimately, you have to have a human in the loop to verify that the system is doing the right thing at some level, and you try to minimize what the human has to do.
But to your other point about the test breaking and not knowing whether that’s a failure or not, I had a test recently that needed some stable form of input, so I picked /usr/dict/words because /usr/dict/words on Solaris had not changed for four years. Clearly this is not a file that’s changing, so I can go ahead and depend on /usr/dict/words.
EA I think I know where this is going.
BC You know exactly where this is going. Moments after I integrated it, the first change in four years to /usr/dict/words happened. Solaris already has an anemic /usr/dict/words, so I don’t know why that was the time to add just one word to it. But it obviously broke the test. I ended up spending too much time looking at my code, thinking that /usr/dict/words obviously hadn’t changed, before I ultimately backtracked and realized that it actually had.
EA That’s my experience in debugging the debugger.
SB Just an anecdote here. A debugger is the only program I’ve ever written where three machines are playing: the machine you’re debugging, the machine you’re running on, and the machine the debugger was compiled on.
DTrace seems to be a real leap forward in debugging. Bryan, can you tell us a little bit about why you did it and what you wish you had done differently?
BC The reason we did it is the same reason you guys did your things: we needed it. We were trying to debug incredibly complicated systems, and Sun was building larger and larger systems with SMP. Our systems got dramatically larger and more complicated in a very short period of time. We had an SMP kernel in Solaris, and we were struggling to understand the system when it failed fatally.
That’s why Mike [Shapiro] developed mdb, and I helped him by developing some of the intelligent modules we can plug into mdb. Once we had actually diagnosed the fatal failures of the system, then we had this problem of the transient failures. Why does the system suck?
BC Yes, the software is up, it’s functioning correctly, but it’s sucking at some level. I was working in the Solaris performance group—that’s why I originally came to Sun, to work with [Sun Fellow and CTO for storage] Jeff Bonwick—and we were grasping at straws using these tools that would give you only the happy/sad state of the system. The tools would tell us, “Here’s the number of operations you’re doing, here’s your percent utilization,” and so on, and then we would try to back-calculate where we were in the software stack. The problem is that you’re looking at the lowest layer of the software stack, trying to draw inferences about the highest layer of the software stack.
What we didn’t realize when we set out to do DTrace is how acute this problem was, and the problem was so much worse for people that were developing Java or PHP or Python or Ruby, because they’re at an even higher level of abstraction. They’re inducing more unintended work out of the system. They’ve got systems that suck even more than ours.
That’s the reason we developed DTrace. Historically, we have had two branches of our code. We have had the branch that is debuggable, with all this ifdef debug junk in it, and then we’ve had the branch that we ship.
It’s kind of absurd that where the bugs are most critical—in those production environments—we’ve got the least amount of infrastructure to understand what is going on.
EA I’ve argued that you just ship it with the debugging in it for two reasons: one is you don’t want to ship something different from what you tested; and second, you always need to test stuff.
BC There’s a certain level of debugging infrastructure that you should ship, but the problem is that when you are looking at the debugging infrastructure in the very bowels of the system, that debugging infrastructure has costs associated with it. Even if it’s as simple as loading a flag to indicate that something is not enabled, that’s a load, a compare, and a branch. That costs. And when you do a load, a compare, and a branch when you are scheduling a thread, you will have a system that is too slow to ship. You’ll have the Linux guys laughing at you because your scheduler is slow. It’s a little hard to make the argument, “Our scheduler is slow because we need to debug it when it’s broken.”
That’s not something a user of that scheduler wants to hear. We realized we needed to change that model, and that’s what DTrace does. My final observation on DTrace is that there should be no probe effect when the instrumentation is disabled. If I’m not asking the question, then my app runs just as fast as if it weren’t there at all.
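The difference between a flag that is checked on every call and instrumentation that is absent until someone asks for it can be shown with a loose analogy in Python. This says nothing about how DTrace itself is implemented; Scheduler, pick_next, and enable_probe are hypothetical names invented for the sketch.

```python
import functools


class Scheduler:
    def pick_next(self, queue):
        # Hot path. Note what is absent: no "if DEBUG:" test, so the
        # disabled case adds no load, compare, or branch of its own.
        return min(queue)


def enable_probe(cls, name, log):
    """Swap cls.<name> for a recording wrapper; return a function that undoes it."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        log.append((name, args, result))
        return result

    setattr(cls, name, traced)

    def disable():
        # Put the untouched original back; nothing of the probe remains.
        setattr(cls, name, original)

    return disable


events = []
sched = Scheduler()
before = sched.pick_next([3, 1, 2])   # probe disabled: nothing recorded
disable = enable_probe(Scheduler, "pick_next", events)
during = sched.pick_next([5, 4])      # probe enabled: call is recorded
disable()
after = sched.pick_next([9, 8])       # probe disabled again
```

While the probe is off, the class holds the original function and nothing else, which is the property being described: if the question is not being asked, the code runs as if the instrumentation did not exist.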
EA Right, and that’s pretty profound.
SB I’d like to ask Eric a similar question. Debugging code that deals with network events as opposed to system events is a different game. I’d be interested in knowing what the challenges have been in debugging sendmail over the years.
EA The obvious one is you’re dealing with multiple machines running a protocol at the same time. It’s sometimes not clear which end of the connection you’re debugging. You need to make sure that the output gets someplace usable, which is actually harder than it looks. A lot of people don’t realize I wrote syslog, which is a standard tool now, in the process of writing sendmail, precisely because I needed to have this place where it would go, which was not stdout, because stdout wasn’t connected to anything. At that point, there were log files scattered all over the system, so I used MPX files. That’s what I originally built syslog on.
Other things are a little subtler: timing issues, for example. You’re sometimes dealing with TCP/IP. TCP/IP implementations vary far more than anyone really wants to admit, and certainly for SMTP a lot of these little things can turn into great big things.
BC This is an interesting common theme among the three of us. Eric developed syslog because it was a problem he needed to understand and debug sendmail; we developed DTrace because it was a tool that we needed to understand Solaris; and Steve, you developed adb because it was a technology that you needed to understand your programs.
SB Exactly. It’s rather similar to Eric’s experience in that the debuggers that existed at the time interpreted the information a little more than I was comfortable with. That was why I wrote adb—just so I knew that what I saw was what I got.
BC Can you elaborate on that?
SB First of all, I couldn’t find the answers to my questions, because only a limited window of information was coming out of the debugger at the time. DB was the early debugger in Unix, and that was fairly simple but had a bunch of things missing. Also, we were moving to the separate I&D space on the PDP, so there were changes in the system environment that needed to be reflected in the debugger, and capabilities in the a.out files that were being changed that weren’t being reflected in the debugger.
It wasn’t so much that the debugger was broken when it was written; it’s that it wasn’t keeping track of the changes in the system environment that we were debugging.
BC Sounds like bit rot.
SB Yes, it was bit rot. Actually, at the time, I had written an Algol 68 compiler that I was porting to the PDP-11 and I wanted to have stuff in there to interpret the Algol 68 stack trace.
BC This is the famous “$a”?
SB This is the $a, exactly.
BC I discovered $a when I attempted to write $q while using adb at Sun in 1996. I typed $a by accident and Steve Bourne, like an apparition from the past, whispered to me, “No Algol 68 here,” which is the message you get out of adb when you type $a.
I was flabbergasted! Suddenly, the deepest heart of the system was speaking directly to me. No Algol 68 here? There hadn’t been Algol 68 in Solaris ever, and so needless to say, $a has become something of legend. In mdb we have implemented $a. In adb compatibility mode it gives you the message, “No Algol 68 here.” Mdb gives you its own smarmy message on $a; I believe Steve knows what it is, but I’ll let folks discover it for themselves.
One more question: What was the moment when you knew that the investment in the tools you developed paid off?
EA Probably almost instantly. Syslog was one of those things where it was just so obvious from the start that it was going to be valuable. It wasn’t something where you go, “Well, I’ll just build this little tool,” and then three months later you say, “Oh, wow, I’m glad I wrote that tool.” I was trying to debug things that were basically undebuggable.
SB I don’t remember a particular event, but I knew fairly early on that you couldn’t use the other debuggers to do what we were doing, so it was almost like there wasn’t another choice at the time. It wasn’t until many years later that an ACM Fellow, whose name I’ll omit, said, “Adb is really cool. It just does what it does, and that’s it.”
EA I have another example, which is slightly different. I wrote a front end for the SCCS (source code control system) at Berkeley. I had written a source management system while I was in high school that did tape-to-tape stuff on the IBM 1401, so I was actually an early adopter of that sort of stuff. Back then people weren’t keeping change sets. You couldn’t do deltas, you couldn’t see your history. So I wrote this front end primarily for myself, but I was able to convince Bill Joy et al. that they should use it on Berkeley Unix. When we finally got to the point where we were saying, “OK, everything in the system is actually going to be under source management,” that was a real “Finally... and I had something to do with that” kind of moment.
Originally published in Queue vol. 6, no. 4.