Weathering the Unexpected
Failures happen, and resilience drills help organizations prepare for them.
Resilience Engineering: Learning to Embrace Failure
A discussion with Jesse Robbins, Kripa Krishnan, John Allspaw, and Tom Limoncelli
Fault Injection in Production
Making the case for resilience testing
Verification of Safety-critical Software
Avionics software safety certification is achieved through objective-based standards.
Microsoft's Protocol Documentation Program: Interoperability Testing at Scale
A Discussion with Nico Kicillof, Wolfgang Grieskamp and Bob Binder
The Meaning of Maintenance
Software maintenance is more than just bug fixes.
Take a Freaking Measurement!
Have you ever worked with someone who is a complete jerk about measuring everything?
Traipsing Through the QA Tools Desert
The Jeremiahs of the software world are out there lamenting, "Software is buggy and insecure!" Like the biblical prophet who bemoaned the wickedness of his people, these malcontents tell us we must repent and change our ways. But as someone involved in building commercial software, I'm thinking to myself, "I don't need to repent. I do care about software quality." Even so, I know that I have transgressed. I have shipped software that has bugs in it. Why did I do it? Why can't I ship perfect software all the time?
Orchestrating an Automated Test Lab
Networking and the Internet are encouraging increasing levels of interaction and collaboration between people and their software. Whether users are playing games or composing legal documents, their applications need to manage the complex interleaving of actions from multiple machines over potentially unreliable connections. As an example, Silicon Chalk is a distributed application designed to enhance the in-class experience of instructors and students. Its distributed nature requires that we test with multiple machines. Manual testing is too tedious, expensive, and inconsistent to be effective. While automating our testing, however, we have found it very labor intensive to maintain a set of scripts describing each machine's portion of a given test.
Sifting Through the Software Sandbox: SCM Meets QA
Thanks to modern SCM (software configuration management) systems, when developers work on a codeline they leave behind a trail of clues that can reveal what parts of the code have been modified, when, how, and by whom. From the perspective of QA (quality assurance) and test engineers, is this all just "data," or is there useful information that can improve the test coverage and overall quality of a product?
Too Darned Big to Test
The increasing size and complexity of software, coupled with concurrency and distributed systems, have made apparent the ineffectiveness of using only handcrafted tests. The misuse of code coverage and the avoidance of random testing have exacerbated the problem. We must start again, beginning with good design (including dependency analysis), good static checking (including model property checking), and good unit testing (including good input selection). Code coverage can help select and prioritize tests to make you more efficient, as can the all-pairs technique for controlling the number of configurations.
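As a rough illustration of the all-pairs idea mentioned above, the following sketch greedily selects configurations until every pair of values from two different parameters appears together at least once. The function name `all_pairs`, the parameter names, and the greedy strategy are assumptions made for this example; they are not taken from the article, and dedicated tools use more refined algorithms.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedily choose configurations that together cover every pair of
    values drawn from two different parameters (hypothetical helper)."""
    names = list(parameters)

    def pairs_of(config):
        # All cross-parameter value pairs that a single configuration covers.
        return {((a, config[a]), (b, config[b])) for a, b in combinations(names, 2)}

    # Enumerate the full Cartesian product as the candidate pool.
    # This is fine for small examples but does not scale to large spaces.
    candidates = [dict(zip(names, values))
                  for values in product(*(parameters[n] for n in names))]

    # Every pair that still needs to appear in some chosen configuration.
    uncovered = set()
    for config in candidates:
        uncovered |= pairs_of(config)

    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    # Illustrative parameter values only.
    example = {
        "os": ["linux", "windows", "mac"],
        "browser": ["firefox", "chrome"],
        "db": ["postgres", "mysql"],
    }
    for config in all_pairs(example):
        print(config)
```

For the example parameters, the full product has 12 configurations; the selected subset is never larger than that and is usually smaller, while still exercising every two-way combination of values.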
Quality Assurance: Much More than Testing
Quality assurance isn't just testing, or analysis, or wishful thinking. Although it can be boring, difficult, and tedious, QA is nonetheless essential.
Black Box Debugging
Modern software development practices build applications as a collection of collaborating components. Unlike older practices that linked compiled components into a single monolithic application, modern executables are made up of any number of executable components that exist as separate binary files.
Uprooting Software Defects at the Source
Source code analysis is an emerging technology in the software industry that allows critical source code defects to be detected before a program runs. Although the concept of detecting programming errors at compile time is not new, the technology to build effective tools that can process millions of lines of code and report substantive defects with only a small amount of noise has long eluded the market. At the same time, a different type of solution is needed to combat current trends in the software industry that are steadily diminishing the effectiveness of conventional software testing and quality assurance.
Code Spelunking: Exploring Cavernous Code Bases
Try to remember your first day at your first software job. Do you recall what you were asked to do, after the human resources people were done with you? Were you asked to write a piece of fresh code? Probably not. It is far more likely that you were asked to fix a bug, or several, and to try to understand a large, poorly documented collection of source code.
Debugging in an Asynchronous World
Pagers, cellular phones, smart appliances, and Web services - these products and services are almost omnipresent in our world, and are stimulating the creation of a new breed of software: applications that must deal with inputs from a variety of sources, provide real-time responses, deliver strong security - and do all this while providing a positive user experience. In response, a new style of application programming is taking hold, one that is based on multiple threads of control and the asynchronous exchange of data, and results in fundamentally more complex applications.
