Sort By:

A Paucity of Ports:
Debugging an ephemeral problem

I’ve been debugging a network problem in what should be a simple piece of network code. We have a small server process that listens for commands from all the other systems in our data center and then farms the commands out to other servers to be run. For each command issued, the client sets up a new TCP connection, sends the command, and then closes the connection after our server acknowledges the command.

by George Neville-Neil | August 24, 2010


Another Day, Another Bug:
We asked our readers which tools they use to squash bugs. Here’s what they said.

As part of this issue on programmer tools, we at Queue decided to conduct an informal Web poll on the topic of debugging. We asked you to tell us about the tools that you use and how you use them. We also collected stories about those hard-to-track-down bugs that sometimes make us think of taking up another profession.

by Queue Readers | October 2, 2003


Debugging Devices:
What is the proper way to debug malfunctioning hardware?

I suggest taking a very sharp knife and cutting the board traces at random until the thing either works, or smells funny! I gather you’re not asking the same question that led me to use the word changeineer in another column. I figure you have an actually malfunctioning piece of hardware and that you’ve already sent three previous versions back to the manufacturer, complete with nasty letters containing veiled references to legal action should they continue to send you broken products.

by George Neville-Neil | January 8, 2009


Debugging Incidents in Google’s Distributed Systems:
How experts debug production issues in complex distributed systems

This article covers the outcomes of research performed in 2019 on how engineers at Google debug production issues, including the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to debug effectively. It examines the research approach used to capture data, summarizing the common engineering journeys for production investigations and sharing examples of how experts debug complex distributed systems. Finally, the article extends the Google specifics of this research to provide some practical strategies that you can apply in your organization.

by Charisma Chan, Beth Cooper | June 6, 2020


Debugging on Live Systems:
It’s more of a social than a technical problem.

I’ve been trying to debug a problem on a system at work, but the control freaks who run our production systems don’t want to give me access to the systems on which the bug always occurs. I haven’t been able to reproduce the problem in the test environment on my desktop, but every day the bug happens on several production systems.

by George Neville-Neil | September 13, 2011


Divide and Conquer:
The use and limits of bisection

Bisection is of no use if you have a heisenbug that fails only from time to time. These subtle bugs are the hardest to fix and the ones that cause us to think critically about what we are doing. Timing bugs, bugs in distributed systems, and all the difficult problems we face in building increasingly complex software systems can't yet be addressed by simple bisection. It's often the case that it would take longer to write a usable bisection test for a complex problem than it would to analyze the problem whilst at the tip of the tree.

by George V. Neville-Neil | July 26, 2021


Enhanced Debugging with Traces:
An essential technique used in emulator development is a useful addition to any programmer’s toolbox.

Creating an emulator to run old programs is a difficult task. You need a thorough understanding of the target hardware and the correct functioning of the original programs that the emulator is to execute. In addition to being functionally correct, the emulator must hit a performance target of running the programs at their original realtime speed. Reaching these goals inevitably requires a considerable amount of debugging. The bugs are often subtle errors in the emulator itself but could also be a misunderstanding of the target hardware or an actual known bug in the original program. (It is also possible the binary data for the original program has become subtly corrupted or is not the version expected.)

by Peter Phillips | March 31, 2010


Getting Off the Mad Path:
Debuggers and assertions

KV continues to grind his teeth as he sees code loaded with debugging statements that would be totally unnecessary if the programmers who wrote the code could be both confident in and proficient with their debuggers. If one is lucky enough to have access to a good debugger, one should give extreme thanks to whatever they normally give thanks to and use the damn thing!

by George V. Neville-Neil | January 31, 2022


Outsourcing Responsibility:
What do you do when your debugger fails you?

Dear KV, I’ve been assigned to help with a new project and have been looking over the admittedly skimpy documentation the team has placed on the internal wiki. I spent a day or so staring at what seemed to be a long list of open-source projects that they intend to integrate into the system they have been building, but I couldn’t find where their original work was described.

by George Neville-Neil | July 1, 2014


Research for Practice: Tracing and Debugging Distributed Systems; Programming by Examples:
Expert-curated Guides to the Best of CS Research

This installment of Research for Practice covers two exciting topics in distributed systems and programming methodology. First, Peter Alvaro takes us on a tour of recent techniques for debugging some of the largest and most complex systems in the world: modern distributed systems and service-oriented architectures. The techniques Peter surveys can shed light on order amid the chaos of distributed call graphs. Second, Sumit Gulwani illustrates how to program without explicitly writing programs, instead synthesizing programs from examples! The techniques Sumit presents allow systems to "learn" a program representation from illustrative examples, allowing nonprogrammer users to create increasingly nontrivial functions such as spreadsheet macros.

by Peter Alvaro, Sumit Galwani | March 29, 2017


Stone Knives and Bear Skins

If you look at the software tooling landscape, you see that the majority of developers work with either open-source tools; or tools from the recently reformed home of proprietary software, Microsoft, which has figured out that its Visual Studio Code system is a good way to sucker people into working with its platforms; or finally Apple, whose tools are meant only for its platform. In specialized markets, such as deeply embedded, military, and aerospace, there are proprietary tools that are often far worse than their open-source cousins, because the market for such tools is small but lucrative.

by George V. Neville-Neil | July 18, 2023


The Debugging Mindset:
Understanding the psychology of learning strategies leads to effective problem-solving skills.

Software developers spend 35-50 percent of their time validating and debugging software. The cost of debugging, testing, and verification is estimated to account for 50-75 percent of the total budget of software development projects, amounting to more than $100 billion annually. While tools, languages, and environments have reduced the time spent on individual debugging tasks, they have not significantly reduced the total time spent debugging, nor the cost of doing so. Therefore, a hyperfocus on elimination of bugs during development is counterproductive; programmers should instead embrace debugging as an exercise in problem solving.

by Devon H. O'Dell | March 22, 2017


To Catch a Failure: The Record-and-Replay Approach to Debugging:
A discussion with Robert O’Callahan, Kyle Huey, Devon O’Dell, and Terry Coatta

When work began at Mozilla on the record-and-replay debugging tool called rr, the goal was to produce a practical, cost-effective, resource-efficient means for capturing low-frequency nondeterministic test failures in the Firefox browser. Much of the engineering effort that followed was invested in making sure the tool could actually deliver on this promise with a minimum of overhead. What was not anticipated, though, was that rr would come to be widely used outside of Mozilla?and not just for sleuthing out elusive failures, but also for regular debugging.

by Robert O'Callahan, Kyle Huey, Devon O'Dell, Terry Coatta | March 28, 2020


Wanton Acts of Debuggery:
Keep your debug messages clear, useful, and not annoying.

Dear KV, Why is it that people who add logging to their programs lack the creativity to differentiate their log messages? If they all say the same thing—for example, DEBUG—it’s hard to tell what is going on, or even why the previous programmer added these statements in the first place.

by George Neville-Neil | October 24, 2011