Failure and Recovery

Vol. 8 No. 8 – August 2010

A Paucity of Ports

Debugging an ephemeral problem

Dear KV, 
I've been debugging a network problem in what should be a simple piece of network code. We have a small server process that listens for commands from all the other systems in our data center and then farms the commands out to other servers to be run. For each command issued, the client sets up a new TCP connection, sends the command, and then closes the connection after our server acknowledges the command.

In our original data center the controller had no problems, but then we moved to a larger data center with more client machines. Now we frequently cannot make connections when trying to execute commands, slowing down the overall system. It's such a simple design that I have no clue what could be going wrong. The controller itself is only one page of code! I suspect the network gear in the new data center is to blame, and that it cannot handle the load of incoming connections.

by George Neville-Neil
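The design in the letter is simple: the client opens a new TCP connection for each command, sends it, waits for the controller's acknowledgment, and then closes. The minimal Python sketch below shows that pattern. Everything in it is an assumption made for illustration. The newline-delimited wire format, the `ACK` reply, the sample commands, and the names `run_controller`, `send_command`, and `demo` are all hypothetical. The stand-in controller also just records each command instead of farming it out. Each call to `send_command` is a separate, short-lived connection, and every such connection uses its own local ephemeral port on the connecting side.

```python
import socket
import threading


def run_controller(listener, expected, received):
    # Hypothetical stand-in for the controller: accept `expected` connections,
    # read one newline-terminated command from each, record it, and acknowledge it.
    for _ in range(expected):
        conn, _addr = listener.accept()
        with conn, conn.makefile("rb") as reader:
            command = reader.readline().decode().rstrip("\n")
            received.append(command)  # a real controller would farm this out
            conn.sendall(b"ACK\n")


def send_command(host, port, command):
    # One fresh TCP connection per command, as the letter describes.
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("rb") as reader:
            sock.sendall(command.encode() + b"\n")
            ack = reader.readline().decode().rstrip("\n")
    # Leaving the `with` blocks closes the connection after the ack arrives.
    return ack


def demo(commands):
    # Run the hypothetical controller on a loopback port chosen by the OS (port 0)
    # and send each command over its own connection.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        port = listener.getsockname()[1]
        received = []
        server = threading.Thread(
            target=run_controller, args=(listener, len(commands), received)
        )
        server.start()
        acks = [send_command("127.0.0.1", port, c) for c in commands]
        server.join()
    return acks, received
```

Calling `demo(["uptime", "df -h"])` opens two separate connections and returns one `"ACK"` per command, together with the commands the stand-in controller recorded.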

Articles

Computers in Patient Care: The Promise and the Challenge

Information technology has the potential to radically transform health care. Why has progress been so slow?

Stephen V. Cantrill, MD, FACEP

A 29-year-old female from New York City comes in at 3 a.m. to an ED (emergency department) in California, complaining of severe acute abdominal pain that woke her up. She reports that she is in California attending a wedding and that she has suffered from similar abdominal pain in the recent past, most recently resulting in an appendectomy. The emergency physician performs an abdominal CAT scan and sees what he believes to be an artifact from the appendectomy in her abdominal cavity. He has no information about the patient's past history other than what she is able to tell him; he has no access to any images taken before or after the appendectomy, nor does he have any other vital information about the surgical operative note or follow-up. The physician is left with nothing more than what he can see in front of him. The woman is held overnight for observation and released the following morning symptomatically improved, but essentially undiagnosed.

A vital opportunity has been lost, and it will take several months and several more physicians and diagnostic studies (and quite a bit more abdominal pain) before an exploratory laparotomy will reveal that the woman suffered from a rare (but highly curable) condition, a Meckel's diverticulum. This might well have been discovered that night in California had the physician had access to complete historical information.

by Stephen V. Cantrill

Injecting Errors for Fun and Profit

Error-detection and correction features are only as good as our ability to test them.

Steve Chessin, Oracle

"That which isn't tested is broken." —Author unknown

"Well, everything breaks, don't it, Colonel." —Monty Python's Flying Circus

by Steve Chessin