System Failures

Vol. 2 No. 8 – November 2004

System Failures

Articles

Oops! Coping with Human Error in IT Systems

Human operator error is one of the most insidious sources of failure and data loss in today's IT environments. In early 2001, Microsoft suffered a nearly 24-hour outage in its Web properties as a result of a human error made while configuring a name resolution system. Later that year, an hour of trading on the Nasdaq stock exchange was disrupted because of a technicians mistake while testing a development system. More recently, human error has been blamed for outages in instant messaging networks, for security and privacy breaches, and for banking system failures.

by Aaron B. Brown

Automating Software Failure Reporting

There are many ways to measure quality before and after software is released. For commercial and internal-use-only products, the most important measurement is the user's perception of product quality. Unfortunately, perception is difficult to measure, so companies attempt to quantify it through customer satisfaction surveys and failure/behavioral data collected from its customer base. This article focuses on the problems of capturing failure data from customer sites. To explore the pertinent issues I rely on experience gained from collecting failure data from Windows XP systems, but the problems you are likely to face when developing internal (noncommercial) software should not be dissimilar.

by Brendan Murphy

Error Messages: What's the Problem?

Computer users spend a lot of time chasing down errors - following the trail of clues that starts with an error message and that sometimes leads to a solution and sometimes to frustration. Problems with error messages are particularly acute for system administrators (sysadmins) - those who configure, install, manage, and maintain the computational infrastructure of the modern world - as they spend a lot of effort to keep computers running amid errors and failures.

by Paul P. Maglio, Eser Kandogan

Outsourcing: Devising a Game Plan

Your CIO just summoned you to duty by handing off the decision-making power about whether to outsource next years big development project to rewrite the internal billing system. That's quite a daunting task! How can you possibly begin to decide if outsourcing is the right option for your company? There are a few strategies that you can follow to help you avoid the pitfalls of outsourcing and make informed decisions. Outsourcing is not exclusively a technical issue, but it is a decision that architects or development managers are often best qualified to make because they are in the best position to know what technologies make sense to keep in-house. Deciding what should and should not be outsourced is key to a successful game plan.

by Adam Kolawa

Lack of Priority Queuing Considered Harmful

Most modern routers consist of several line cards that perform packet lookup and forwarding, all controlled by a control plane that acts as the brain of the router, performing essential tasks such as management functions, error reporting, control functions including route calculations, and adjacency maintenance. This control plane has many names; in this article it is the route processor, or RP. The route processor calculates the forwarding table and downloads it to the line cards using a control-plane bus. The line cards perform the actual packet lookup and forwarding. Although individual vendors or models may differ slightly in implementation, the salient points remain the same.

by Vijay Gill

Interviews

A Conversation with Bruce Lindsay

If you were looking for an expert in designing database management systems, you couldn't find many more qualified than IBM Fellow Bruce Lindsay. He has been involved in the architecture of RDBMS (relational database management systems) practically since before there were such systems. In 1978, fresh out of graduate school at the University of California at Berkeley with a Ph.D. in computer science, he joined IBM's San Jose Research Laboratory, where researchers were then working on what would become the foundation for IBM's SQL and DB2 database products. Lindsay has had a guiding hand in the evolution of RDBMS ever since.

Curmudgeon

Programming in Franglais

When I was studying French in high school, we students often spoke "Franglais": French grammar and words where we knew them, English inserted where our command of French failed us. It was pretty awful, and the teacher did not think highly of it. But we could communicate haltingly because we all had about the same levels of knowledge of the respective languages.

by Rodney Bates

Kode Vicious

Kode Vicious Strikes Again

Dear Kode Vicious, I have this problem. I can never seem to find bits of code I know I wrote. This isn't so much work code--that's on our source server--but you know, those bits of test code I wrote last month, I can never find them. How do you deal with this?

by George Neville-Neil