Cloud Debugging

Vol. 14 No. 2 – March-April 2016

The Flame Graph:
This visualization of software execution is a new necessity for performance profiling and debugging.

An everyday problem in our industry is understanding how software is consuming resources, particularly CPUs. What exactly is consuming how much, and how did this change since the last software version? These questions can be answered using software profilers, tools that help direct developers to optimize their code and operators to tune their environment. The output of profilers can be verbose, however, making it laborious to study and comprehend. The flame graph provides a new visualization for profiler output and can make for much faster comprehension, reducing the time for root cause analysis.

by Brendan Gregg
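
As a rough illustration of why a flame graph is easier to read than raw profiler output, the sketch below collapses repeated stack samples into one line per unique stack with a sample count. This "folded" form (frames joined by semicolons, followed by a count) is a common input to flame graph renderers. The function name `fold_stacks` and the sample data are hypothetical and are not taken from the article.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples into 'frame;frame;... count' lines.

    Each sample is a list of frame names ordered root-first
    (an assumption made for this sketch).
    """
    counts = Counter(";".join(stack) for stack in samples)
    # Sort by stack string so identical prefixes end up adjacent.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: three snapshots of a call stack.
samples = [
    ["main", "parse"],
    ["main", "parse"],
    ["main", "render"],
]
for line in fold_stacks(samples):
    print(line)
```

Thousands of verbose samples reduce to a handful of counted stacks, which a renderer can then draw as boxes whose widths are proportional to the counts.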

Debugging Distributed Systems:
Challenges and options for validation and debugging

Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult. A standard approach to gaining insight into system activity is to analyze system logs. Unfortunately, this can be a tedious and complex process. This article looks at several key features and debugging challenges that differentiate distributed systems from other kinds of software. The article presents several promising tools and ongoing research to help resolve these challenges.

by Ivan Beschastnikh, Patty Wang, Yuriy Brun, Michael D. Ernst

Standing on Distributed Shoulders of Giants:
Farsighted Physicists of Yore Were Danged Smart!

If you squint hard enough, many of the challenges of distributed computing appear similar to the work done by the great physicists. Dang, those fellows were smart! Here, we examine some of the most important physics breakthroughs and draw some whimsical parallels to phenomena in the world of computing... just for fun.

by Pat Helland

Should You Upload or Ship Big Data to the Cloud?:
The accepted wisdom does not always hold true.

It is accepted wisdom that when the data you wish to move into the cloud is at terabyte scale and beyond, you are better off shipping it to the cloud provider, rather than uploading it. This article takes an analytical look at how shipping and uploading strategies compare, the various factors on which they depend, and under what circumstances you are better off shipping rather than uploading data, and vice versa. Such an analytical determination is important to make, given the increasing availability of gigabit-speed Internet connections, along with the explosive growth in data-transfer speeds supported by newer editions of drive interfaces such as SAS and PCI Express. As this article reveals, the aforementioned "accepted wisdom" does not always hold true, and there are well-reasoned, practical recommendations for uploading versus shipping data to the cloud.

by Sachin Date
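
The kind of comparison the abstract describes can be sketched with a back-of-the-envelope model. Everything here is an assumption for illustration: the function names, the efficiency factor, the idea that shipped data is copied once onto a drive and once off it at the same rate, and all the numbers in the usage example. The article's own analysis considers more factors.

```python
def upload_hours(data_tb, link_gbps, efficiency=0.8):
    """Hours to upload data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal link speed
    that is actually achieved end to end.
    """
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

def ship_hours(data_tb, copy_mb_per_s, transit_hours):
    """Hours to ship data_tb terabytes on physical drives.

    Assumes the data is copied once onto the drive and once off it,
    both at copy_mb_per_s megabytes per second, plus courier transit.
    """
    copy_seconds = data_tb * 1e6 / copy_mb_per_s
    return 2 * copy_seconds / 3600 + transit_hours

# Hypothetical scenario: 50 TB, 1 Gbit/s link, 200 MB/s drives, 48 h courier.
up = upload_hours(50, 1.0)
ship = ship_hours(50, 200, 48)
print("upload" if up < ship else "ship", round(up, 1), round(ship, 1))
```

Varying the link speed, drive throughput, and transit time in a model like this shows where the crossover between the two strategies falls for a given data size.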

Introducing Research for Practice:
Expert-curated guides to the best of CS research

Reading a great research paper is a joy. A team of experts deftly guides you, the reader, through the often complicated research landscape, noting the prior art, the current trends, the pressing issues at hand--and then, sometimes artfully, sometimes through seeming sheer force of will, expands the body of knowledge in a fell swoop of 12 or so pages of prose. A great paper contains a puzzle and a solution; these can be useful, enlightening, or both. A great paper is a small, structured quantum of human ingenuity, creativity, and labor, in service of a growing understanding of our world and the future worlds we may inhabit.

by Peter Bailis, Justine Sherry, Simon Peter

What Are You Trying to Pull?:
A single cache miss is more expensive than many instructions.

Saving instructions - how very 1990s of him. It’s always nice when people pay attention to details, but sometimes they simply don’t pay attention to the right ones. While KV would never encourage developers to waste instructions, given the state of modern software, it does seem like someone already has. KV would, as you did, come out on the side of legibility over the saving of a few instructions.

by George Neville-Neil

The Small Batches Principle:
Reducing waste, encouraging experimentation, and making everyone happy

The small batches principle is part of the DevOps methodology. It comes from the lean manufacturing movement, which is often called just-in-time manufacturing. It can be applied to just about any kind of process. It also enables the MVP (minimum viable product) methodology, which involves launching a small version of a service to get early feedback that informs the decisions made later in the project.

by Thomas A. Limoncelli

Nine Things I Didn’t Know I Would Learn Being an Engineer Manager:
Many of the skills aren’t technical at all.

When I moved from being an engineer to being a dev lead, I knew I had a lot to learn. My initial thinking was that I had to be able to do thorough code reviews, design and architect websites, see problems before they happened, and ask insightful technical questions. To me that meant learning the technology and becoming a better engineer. When I actually got into the role (and after doing it almost 15 years), the things I have learned--and that have mattered the most--weren’t those technical details. In fact, many of the skills I have built that made me a good engineer manager weren’t technical at all and, while unexpected lessons, have helped me in many other areas of my life.

by Kate Matsudaira