Failure

Vol. 15 No. 1 – January-February 2017

Failure

Making Money Using Math:
Modern applications are increasingly using probabilistic machine-learned models.

A big difference between human-written code and learned models is that the latter are usually not represented by text and hence are not understandable by human developers or manipulable by existing tools. The consequence is that none of the traditional software engineering techniques for conventional programs (such as code reviews, source control, and debugging) are applicable anymore. Since incomprehensibility is not unique to learned code, these aspects are not of concern here.

by Erik Meijer

Does Anybody Listen to You?:
How do you step up from mere contributor to real change-maker?

An idea on its own is not worth much. Just because you think you know a better way to do something, even if you’re right, no one is required to care. Making great things happen at work is about more than just being smart. Good ideas succeed or fail depending on your ability to communicate them correctly to the people who have the power to make them happen. When you are navigating an organization, it pays to know whom to talk to and how to reach them. Here is a simple guide to sending your ideas up the chain and actually making them stick. It takes three elements: the right people, the right time, and the right way.

by Kate Matsudaira

MongoDB’s JavaScript Fuzzer:
The fuzzer is for those edge cases that your testing didn’t catch.

As MongoDB becomes more feature-rich and complex with time, the need to develop more sophisticated methods for finding bugs grows as well. Three years ago, MongDB added a home-grown JavaScript fuzzer to its toolkit, and it is now our most prolific bug-finding tool, responsible for detecting almost 200 bugs over the course of two release cycles. These bugs span a range of MongoDB components from sharding to the storage engine, with symptoms ranging from deadlocks to data inconsistency. The fuzzer runs as part of the CI (continuous integration) system, where it frequently catches bugs in newly committed code.

by Robert Guo

The Debugging Mindset:
Understanding the psychology of learning strategies leads to effective problem-solving skills.

Software developers spend 35-50 percent of their time validating and debugging software. The cost of debugging, testing, and verification is estimated to account for 50-75 percent of the total budget of software development projects, amounting to more than $100 billion annually. While tools, languages, and environments have reduced the time spent on individual debugging tasks, they have not significantly reduced the total time spent debugging, nor the cost of doing so. Therefore, a hyperfocus on elimination of bugs during development is counterproductive; programmers should instead embrace debugging as an exercise in problem solving.

by Devon H. O'Dell

Research for Practice: Tracing and Debugging Distributed Systems; Programming by Examples:
Expert-curated Guides to the Best of CS Research

This installment of Research for Practice covers two exciting topics in distributed systems and programming methodology. First, Peter Alvaro takes us on a tour of recent techniques for debugging some of the largest and most complex systems in the world: modern distributed systems and service-oriented architectures. The techniques Peter surveys can shed light on order amid the chaos of distributed call graphs. Second, Sumit Gulwani illustrates how to program without explicitly writing programs, instead synthesizing programs from examples! The techniques Sumit presents allow systems to "learn" a program representation from illustrative examples, allowing nonprogrammer users to create increasingly nontrivial functions such as spreadsheet macros. Both of these selections are well in line with RfP’s goal of accessible, practical research; in fact, both contributors have successfully transferred their own research in each area to production, at Netflix and as part of Microsoft Excel. Readers may also find a use case!

by Peter Alvaro, Sumit Galwani

Too Big NOT to Fail:
Embrace failure so it doesn’t embrace you.

Web-scale infrastructure implies LOTS of servers working together, often tens or hundreds of thousands of servers all working toward the same goal. How can the complexity of these environments be managed? How can commonality and simplicity be introduced?

by Pat Helland, Simon Weaver, Ed Harris

Forced Exception-Handling:
You can never discount the human element in programming.

Yes, KV also reads "The Morning Paper," although he has to admit that he does not read everything that arrives in his inbox from that list. Of course, the paper you mention piqued my interest, and one of the things you don’t point out is that it’s actually a study of distributed systems failures. Now, how can we make programming harder? I know! Let’s take a problem on a single system and distribute it. Someday I would like to see a paper that tells us if problems in distributed systems increase along with the number of nodes, or the number of interconnections. Being an optimist, I can only imagine that it’s N(N + 1) / 2, or worse.

by George Neville-Neil