A Plea to Software Vendors from Sysadmins—10 Do's and Don'ts

What can software vendors do to make the lives of sysadmins a little easier?

Thomas A. Limoncelli, Google

A friend of mine is a grease monkey: the kind of auto enthusiast who rebuilds engines for fun on a Saturday night. He explained to me that certain brands of automobiles were designed in ways to make the mechanic's job easier. Others, however, were designed as if the company had a pact with the aspirin industry to make sure there are plenty of mechanics with headaches. He said those car companies hate mechanics. I understood completely because, as a system administrator, I can tell when software vendors hate me. It shows in their products.

A panel discussion at CHIMIT (Computer-Human Interaction for Management of Information Technology) 2009 discussed a number of do's and don'ts for software vendors looking to make software that is easy to install, maintain, and upgrade. This article highlights some of the issues uncovered. CHIMIT is a conference that focuses on computer-human interaction for IT workers—the opposite of most CHI research, which is about the users of the systems that IT workers maintain. This panel turned the microscope around and gave system administrators a forum to share how they felt about the speakers who were analyzing them.

Here are some highlights:

1. DO have a "silent install" option. One panelist recounted automating the installation of a software package on 2,000 desktop PCs, except for one point in the installation when a window popped up and the user had to click OK. All other interactions could be programmatically eliminated through a "defaults file." Linux/Unix tools such as Puppet and Cfengine should be able to automate not just installation, but also configuration. Deinstallation procedures should not delete configuration data, but there should be a "leave no trace" option that removes everything except user data.

2. DON'T make the administrative interface a GUI. System administrators need a command-line tool for constructing repeatable processes. Procedures are best documented by providing commands that we can copy and paste from the procedure document to the command line. We cannot achieve the same repeatability when the instructions are: "Checkmark the 3rd and 5th options, but not the 2nd option, then click OK." Sysadmins do not want a GUI that requires 25 clicks for each new user. We want to craft the commands to be executed in a text editor or generate them via Perl, Python, or PowerShell.

3. DO create an API so that the system can be remotely administered. An API gives us the ability to do things with your product you didn't think of. That's a good thing. System administrators strive to automate, and automate to thrive. The right API lets me provision a service automatically as part of the new employee account creation system. The right API lets me write a chat bot that hangs out in a chat room to make hourly announcements of system performance. The right API lets me integrate your product with a USB-controlled toy missile launcher. Your other customers may be satisfied with a "beep" to get their attention; I like my way better (http://www.kleargear.com/5004.html).

4. DO have a configuration file that is an ASCII file, not a binary blob. This way the files can be checked into a source-code control system. When the system is misconfigured it becomes important to be able to "diff" against previous versions. If the file can't be uploaded back into the system to re-create the same configuration, then we can't trust that you're giving us all the data. This prevents us from cloning configurations for mass deployment or disaster recovery. If the file can be edited and uploaded back into the system, then we can automate the creation of configurations. Archives of configuration backups make for interesting historical analysis.¹

5. DO include a clearly defined method to restore all user data, a single user's data, and individual items (for example, one e-mail message). The method to make backups is a prerequisite, obviously, but we care primarily about the restore procedures.

6. DO instrument the system so that we can monitor more than just, "Is it up or down?" We need to be able to determine latency, capacity, and utilization, and we need to be able to collect this data. Don't graph it yourself. Let us collect and analyze the raw data so we can make the "pretty picture" graphs that our nontechnical management will understand. If you aren't sure what to instrument, imagine the system being completely overloaded and slow: what parameters would we need to be able to find and fix the problem?

7. DO tell us about security issues. Announce them publicly. Put them in an RSS feed. Tell us even if you don't have a fix yet; we need to manage risk. Your PR department doesn't understand this, and that's OK. It is your job to tell them to go away.

8. DO use the built-in system logging mechanism (Unix syslog or Windows Event Logs). This allows us to leverage preexisting tools that collect, centralize, and search the logs. Similarly, use the operating system's built-in authentication system and standard I/O systems.

9. DON'T scribble all over the disk. Put binaries in one place, configuration files in another, data someplace else. That's it. Don't hide a configuration file in /etc and another one in /var. Don't hide things in \Windows. If possible, let me choose the path prefix at install time.

10. DO publish documentation electronically on your Web site. It should be available, linkable, and findable on the Web. If someone blogs about a solution to a problem, they should be able to link directly to the relevant documentation. Providing a PDF is painfully counterproductive. Keep all old versions online. The disaster recovery procedure for a 5-year-old, unsupported, pathetically outdated installation might hinge on being able to find the manual for that version on the Web.

Software is not just bits to us. It has a complicated life cycle: procurement, installation, use, maintenance, upgrades, deinstallation. Often vendors think only about the use (and some seem to think only about the procurement). Features that make software more installable, maintainable, and upgradable are usually afterthoughts. To be done right, these things need to be part of the design from the beginning, not bolted on later.

Be good to the sysadmins of the world. As one panelist said, "The inability to rapidly deploy your product affects my ability to rapidly purchase your products."

I should point that that this topic wasn't the main point of the CHIMIT panel. It was a very productive tangent. When I suggested that each panelist name his or her single biggest "don't," I noticed that the entire audience literally leaned forward in anticipation. I was pleasantly surprised to see software developers and product managers alike take an interest. Maybe there's hope, after all.
Q

References

1. Plonka, D., Tack, A. J. 2009. An analysis of network configuration artifacts. In Proceedings of the 23rd Large Installation System Administration Conference (November): 79-91.

Acknowledgments

I would like to thank the members of the panel: Daniel Boyd, Google; Æleen Frisch, Exponential Consulting and author; Joseph Kern, Delaware Department of Education; and David Blank-Edelman, Northeastern University and author. I was the panel organizer and moderator. I would also like to thank readers of my blog, www.EverythingSysadmin.com, for contributing their suggestions.

LOVE IT, HATE IT? LET US KNOW

[email protected]

Thomas A. Limoncelli is an author, speaker, and system administrator. His books include The Practice of System and Network Administration (Addison-Wesley) and Time Management for System Administrators (O'Reilly). He works at Google in New York City and blogs at http://EverythingSysadmin.com.

Originally published in Queue vol. 8, no. 12—
Comment on this article in the ACM Digital Library

More related articles:

Adam Oliner, Archana Ganapathi, Wei Xu - Advances and Challenges in Log Analysis
Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format of logs can vary widely from one system to another and even among components within a system. A printer driver might generate messages indicating that it had trouble communicating with the printer, while a Web server might record which pages were requested and when.

Mark Burgess - Testable System Administration
The methods of system administration have changed little in the past 20 years. While core IT technologies have improved in a multitude of ways, for many if not most organizations system administration is still based on production-line build logistics (aka provisioning) and reactive incident handling. As we progress into an information age, humans will need to work less like the machines they use and embrace knowledge-based approaches. That means exploiting simple (hands-free) automation that leaves us unencumbered to discover patterns and make decisions.

Christina Lear - System Administration Soft Skills
System administration can be both stressful and rewarding. Stress generally comes from outside factors such as conflict between SAs (system administrators) and their colleagues, a lack of resources, a high-interrupt environment, conflicting priorities, and SAs being held responsible for failures outside their control. What can SAs and their managers do to alleviate the stress? There are some well-known interpersonal and time-management techniques that can help, but these can be forgotten in times of crisis or just through force of habit.

Eben M. Haber, Eser Kandogan, Paul Maglio - Collaboration in System Administration
George was in trouble. A seemingly simple deployment was taking all morning, and there seemed no end in sight. His manager kept coming in to check on his progress, as the customer was anxious to have the deployment done. He was supposed to be leaving for a goodbye lunch for a departing co-worker, adding to the stress. He had called in all kinds of help, including colleagues, an application architect, technical support, and even one of the system developers. He used e-mail, instant messaging, face-to-face contacts, his phone, and even his office mate’s phone to communicate with everyone. And George was no novice.