Facing the Strain

Kode Vicious - @kode_vicious

September 15, 2006
Volume 4, issue 7


A koder with attitude, KV answers your questions. Miss Manners he ain’t.

APIs can change. Even the ones you’ve come to depend on over the years, the ones you thought were set in stone, indelible, immutable, pure. But fear not, because this month Kode Vicious offers his take on dealing with this most loathsome form of change. Encountered an equally annoying programming challenge? Write to Kode Vicious at [email protected] and vent to your heart’s content.

Dear KV,
I’ve been working on a software team that produces an end-user application on several different operating system platforms. I started out as the build engineer, setting up the build system, then the nightly test scripts, and now I work on several of the components themselves, as well as maintaining the build system. The biggest problem I’ve seen in building software is the lack of API stability. It’s OK when new APIs are added—you can ignore those if you like—and when APIs are removed I know, because the build breaks. The biggest problem is when someone changes an API, as this isn’t discovered until some test script—or worse, a user—executes the code and it blows up. How do you deal with constantly changing APIs?
Changes

Dear Changes,
The best way to deal with change is to bury your head in the sand and ignore it. After all, we can all learn from the great management traditions of the past, and engineers are no exception to this. Hmm, perhaps not.

What you point out is one of the biggest challenges in building large and complex systems. Software is amazingly malleable, and that makes it possible (and, unfortunately, quite probable) that someone will make a change, often one that will break your system. What many engineers and programmers don’t realize is that when they’re building a library, or really any component that others are supposed to depend on, the API becomes the contract between their code and everyone who uses it.

As you point out, there are really three ways in which these contracts change. The first, adding an API, won’t affect your system because with no one to call it, the new API can’t really cause much damage. The second case, removing an API, results in an immediate error when your program is linked, either at compilation or runtime, so at least you notice this before trying to use the code. The last case is the one that will give you fits and nightmares because there are very few automated ways of finding an API that looks the same, but isn’t. At one place I worked we dubbed this “changative change” for want of a better phrase, or, it would seem, a decent technical writer.
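A minimal sketch of what a "changative change" can look like, and why the build never notices it. Everything here is hypothetical: the two functions stand in for the same library call in two releases, and the switch from seconds to milliseconds is invented for illustration.

```python
# Hypothetical library function shown in two releases. In a real library
# both would carry the same name; they are suffixed here so the two
# versions can sit side by side.

def timeout_deadline_v1(now: float, timeout: float) -> float:
    """Release 1: `timeout` is in seconds."""
    return now + timeout


def timeout_deadline_v2(now: float, timeout: float) -> float:
    """Release 2: identical signature, but `timeout` is now in milliseconds."""
    return now + timeout / 1000.0


def contract_holds(deadline_fn) -> bool:
    """Caller-side contract check: a timeout of 5 starting at t=100 must end at t=105."""
    return deadline_fn(100.0, 5.0) == 105.0
```

Both versions import and call cleanly, so nothing fails at build time. Only a check on behavior, such as `contract_holds`, tells them apart, which is why a small set of caller-side contract tests run against every new drop of a dependency is one of the few automated defenses available.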

On one particular system about 80 percent of our problems were related to trying to reintegrate different subsystems. The problem, as you can imagine, grows quite quickly with the number of components involved. Two subsystems that depend on each other have at least one dependency, whereas four subsystems have six dependencies, and eight subsystems have 28, and so on. Testing all these possible combinations was referred to as the “matrix of pain.” Building up any sort of coherent system from a set of modules, all of which are changing, turns out to be very hard, but there are some solutions.
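The numbers above follow from counting distinct pairs: n subsystems have n(n-1)/2 possible pairwise dependencies. A short sketch that reproduces them (the function name is just for illustration):

```python
from math import comb


def pairwise_dependencies(subsystems: int) -> int:
    """Number of distinct pairs among `subsystems` components: n * (n - 1) / 2."""
    return comb(subsystems, 2)


for n in (2, 4, 8, 16):
    print(n, pairwise_dependencies(n))
```

Doubling the number of subsystems roughly quadruples the size of the matrix, and this counts only pairs, not every combination of versions of each component.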

Operating systems people have long known about this problem, so APIs that programs depend upon tend to change slowly or not at all. The basic open(), close(), read(), write() system calls in Unix and Unix-like operating systems have taken the same arguments and returned the same types of values for 20-plus years. When subsystems are added, such as networking, new function calls are added as needed; hence, to open a network connection you don’t call open(), because that would require changing its arguments and therefore all the code that already used it. Instead, you have the socket() system call that takes different arguments but returns a value that is usable by read() and write(). Systems programmers also tend to narrowly define the set of functions they will provide because they know the nightmare of maintaining an arbitrarily wide set of APIs. FreeBSD, for example, has about 450 available system calls, that is, APIs that user programs call to get the OS to do something, such as read a file, open a socket, or find out the time. Although that number is not small, it is trackable and maintainable, whereas the number of APIs in the full set of POSIX libraries or Microsoft Foundation Classes is far larger.
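The point about socket() can be seen in a few lines. This is a sketch that assumes a Unix-like system; it uses socketpair() rather than socket() plus connect() only so that it runs without a network peer. The helper name is hypothetical.

```python
import os
import socket


def echo_over_socket(message: bytes) -> bytes:
    """Send `message` through a connected socket pair using plain write()/read()."""
    # socketpair() returns two already-connected sockets.
    left, right = socket.socketpair()
    try:
        # fileno() exposes the underlying descriptor; os.write() and os.read()
        # are thin wrappers over the write() and read() system calls, the same
        # ones used for ordinary files.
        os.write(left.fileno(), message)
        return os.read(right.fileno(), len(message))
    finally:
        left.close()
        right.close()
```

Networking arrived through a new call for creating the descriptor, while read() and write() kept their original arguments, so code written against them did not have to change.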

Another trick that can be adopted from the systems programming world is ioctl(), or I/O control. Device driver writers can do most of the work using the simple open(), close(), read(), and write() semantics, because what most people want from a device is to open, or use it; read data from and write data to it; and then put it away, or close it. Unfortunately, it is often necessary to have device-specific controls that can be easily exported upward to the operating system—for example, to set a network device into promiscuous listening mode or to set its various address parameters. These special cases are where ioctl() is used. The ioctl() call has been used, and abused, over the years, but the basic design principle is a sound one. Always leave yourself an escape route. With an ioctl() interface you can add nearly any extra command to your subsystem without breaking backwards compatibility or even adding a new function call.
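The same escape route can be built into an ordinary library. What follows is a sketch of the design idea, not of any real driver interface: the `NetDevice` class, its command numbers, and its return codes are all invented for illustration.

```python
# Hypothetical in-memory "device"; command numbers and names are invented.
SET_PROMISCUOUS = 1
SET_ADDRESS = 2


class NetDevice:
    def __init__(self) -> None:
        self.promiscuous = False
        self.address = "00:00:00:00:00:00"
        self._buffer = bytearray()
        # The dispatch table is the escape route: new commands are added
        # here without touching the list of public methods.
        self._commands = {
            SET_PROMISCUOUS: self._set_promiscuous,
            SET_ADDRESS: self._set_address,
        }

    def write(self, data: bytes) -> int:
        """Append data to the device buffer and report how many bytes were taken."""
        self._buffer += data
        return len(data)

    def read(self, size: int) -> bytes:
        """Remove and return up to `size` bytes from the front of the buffer."""
        data = bytes(self._buffer[:size])
        del self._buffer[:size]
        return data

    def ioctl(self, command: int, arg) -> int:
        """Generic control entry point: 0 on success, -1 for an unknown command."""
        handler = self._commands.get(command)
        if handler is None:
            return -1
        handler(arg)
        return 0

    def _set_promiscuous(self, enabled) -> None:
        self.promiscuous = bool(enabled)

    def _set_address(self, address) -> None:
        self.address = str(address)
```

A caller would write `dev.ioctl(SET_PROMISCUOUS, True)`. Adding a third command later means one new constant and one new table entry, and existing callers keep working because an unknown command fails cleanly rather than changing what the known ones do. The cost, as the abuse of ioctl() over the years shows, is that the commands escape whatever type checking the regular function calls enjoy.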

Lastly, there is discipline, which some people very much enjoy, but this is not that kind of magazine. What I actually mean is that there has to be a decision made about how changes are introduced into a system. Changing things fast seems to be in vogue at the moment; the so-called extreme programming methodology is an example of this. Fast changes work when no one but you or your team has your code, but eventually other people will be using it and that’s when the trouble starts. Many engineers simply decide that at some point an API is set in stone and has too many callers to change, and so any changes require new APIs.
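In code, "set in stone" usually looks something like the sketch below: the old entry point keeps its exact signature and return type, and new behavior arrives under a new name. Both function names and the placeholder body are hypothetical.

```python
# Hypothetical library functions; names are invented for illustration.

def fetch_with_options(url: str, retries: int) -> dict:
    """New API, added alongside the old one instead of changing `fetch`."""
    # Placeholder body; a real implementation would perform the request.
    return {"body": f"contents of {url}", "retries": retries}


def fetch(url: str) -> str:
    """Frozen API: existing callers depend on this exact signature and return type."""
    return fetch_with_options(url, retries=0)["body"]
```

The old function becomes a thin wrapper over the new one, so there is still a single implementation to maintain, and nobody who calls `fetch` ever has to find out that anything changed.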

Unfortunately, I doubt I’ve solved your real problem, because unless you and your team write everything from scratch, you will be at the mercy of people who can, and will, cause you headaches. My only other advice is that your team use the smallest number of external APIs possible and not too many new or advanced features, as those are the ones most likely to change.
KV

KODE VICIOUS, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. He is an avid bicyclist and traveler who has made San Francisco his home since 1990.

Originally published in Queue vol. 4, no. 7