KV is back on duty and ready to treat another koding illness: bad APIs. This
is one of the most widespread pathologies affecting, and sometimes infecting,
us all. But whether we write APIs or simply use APIs (or both), we would all
do well to read on and heed the vicious one’s advice. And as always,
your ongoing kode-related questions are welcomed and appreciated: firstname.lastname@example.org.
Dear Kode Vicious,
I’ve been reading your rants for a few months now and was hoping you could read one of mine. It’s a pretty simple rant, actually: it’s just that I’m tired of hearing about buffer overflows and don’t understand why anyone in his or her right mind still uses strcpy(). Why does such an unsafe routine continue to exist at all? Why not just remove the thing from the library and force people to migrate their code? Another thing I wonder is, how did such an API come to exist in the first place?
Yours for Better APIs
Yes, it’s true, some APIs just seem to be obtuse or written to trip you up. Usually this is not due to evil intent on the part of the koder. As my grandmother used to say, “Never attribute to malice that which can be adequately explained by stupidity.” Oh, wait, no, my grandmother said, “If you can’t say something nice about someone, don’t say anything at all.” I have given only brief attention to both of these pieces of advice throughout my life, but my grandmother was a wise woman. The fact is that you can’t even blame stupidity most of the time; you most often have to blame the inability of people to be omniscient.
You see, way back in the mists of time, computers weren’t networked and were programmed by a small group of dedicated professionals using a well-constructed set of tools and libraries. These professionals understood their tools intimately and didn’t really think about people attacking their computer programs because many of them worked in research labs, and because most of their programs didn’t handle money. Certainly some of these people thought about security, but not in the way one would have to think about it after hundreds of millions of people gained access to computers and the Internet. Before we hooked everything to the Internet, life was good—programmers laughed and played all day, while dreaming of larger disk drives and dynamic RAM. At least, that’s the story as I’ve heard it. So, at the time that strcpy() was written, most programmers thought only about their own mistakes, as opposed to someone trying to take over their computers via the network and a buffer overflow attack.
As you said, though, the buffer overflow attack has been discussed to death, and perhaps we ought to think about what makes strcpy() such a problematic API instead of hammering on buffer overflows. After all, people are still building APIs that are insecure and poorly thought out, and perhaps we should shove them, if not into the sea, then in the right direction.
Part of the problem comes from the definition of the string itself. A string is just a pointer to a NULL-terminated set of bytes. Let’s think about some things we would need to know before passing this hunk of memory around to other APIs. One important question is, “How big is it?” Yes, a bit off-color, but in this case, size actually does matter. If you are on the receiving end of a string, and you don’t know how long it is, there really is no way to handle it safely. You have to scan the entire thing until you find the terminating NULL, and even when you do, it might be the wrong one.
A second problem with strings really has to do with how memory is allocated and controlled in programs. Pointers to memory tell you only where the memory starts, not how much you’re really supposed to use. Since it is more efficient to manage memory in terms of groups of bytes, which the operating system calls pages, your program is not going to get a clear signal if it accidentally writes past the space you thought was allocated to it. There is no way for the program to know, without the use of special tools and libraries, when it has gone too far. Of course, the special tools and libraries slow your program down so you can’t use them all the time, and even when you do, you have to design sufficient tests to see if your code has any holes in it.
And so now we come to strcpy(), which for those who may not have ever seen this routine, looks like this:
char *strcpy(char *destination, char *source)
and which is supposed to copy bytes from source to destination, including the terminating NULL byte, so that when the routine returns, you have a copy of source pointed to by destination. This API has several problems:
So, let’s abstract the bad qualities a bit and try to state them more clearly. First of all, there is no way for the API to validate its arguments. Not validating arguments leads to errors. Errors are bad. Secondly, there is no clear way to communicate an error status. Errors happen; they should be checked for and returned. The programmer who does not check for errors is a bad person and will suffer eternal debugging sessions forever after, amen. Thirdly, the arguments and return values are confusing. Why return something you already passed into the routine when you don’t need to? Lastly, the documentation does not warn us about any of the possible boundary conditions and what we might expect if they occur.
I could, of course, go on and on and find horrific APIs that make strcpy() look like a walk in the park with Aunt Rose, but I’m limited to 1,200 or so words. So, what do good APIs look like or what should they look like? Well in KV’s highly biased opinion, a good API has several attributes:
KODE VICIOUS, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. He is an avid bicyclist and traveler who has made San Francisco his home since 1990.
Originally published in Queue vol. 3, no. 9—
see this item in the ACM Digital Library
Follow Kode Vicious on Twitter
Have a question for Kode Vicious? E-mail him at email@example.com. If your question appears in his column, we'll send you a rare piece of authentic Queue memorabilia. We edit e-mails for style, length, and clarity.
Ivar Jacobson, Ian Spence, Ed Seidewitz - Industrial Scale Agile - from Craft to Engineering
Essence is instrumental in moving software development toward a true engineering discipline.
Andre Medeiros - Dynamics of Change: Why Reactivity Matters
Tame the dynamics of change by centralizing each concern in its own module.
Brendan Gregg - The Flame Graph
This visualization of software execution is a new necessity for performance profiling and debugging.
Ivar Jacobson, Ian Spence, Brian Kerr - Use-Case 2.0
The Hub of Software Development