Human Factors

Cognitive Work of Hypothesis Exploration During Anomaly Response:
A look at how we respond to the unexpected

Four incidents from web-based software companies reveal important aspects of anomaly response processes when incidents arise in web operations, two of which are discussed in this article. One particular cognitive function examined in detail is hypothesis generation and exploration, given the impact of obscure automation on engineers’ development of coherent models of the systems they manage. Each case was analyzed using the techniques and concepts of cognitive systems engineering. The set of cases provides a window into the cognitive work "above the line" in incident management of complex web-operation systems.

by Marisa R. Grayson

Beyond the Fix-it Treadmill:
The Use of Post-Incident Artifacts in High-Performing Organizations

Given that humanity’s study of the sociological factors in safety is almost a century old, the technology industry’s post-incident analysis practices and how we create and use the artifacts those practices produce are all still in their infancy. So don’t be surprised that many of these practices are so similar, that the cognitive and social models used to parse apart and understand incidents and outages are few and cemented in the operational ethos, and that the byproducts sought from post-incident analyses are far-and-away focused on remediation items and prevention.

by J. Paul Reed

Above the Line, Below the Line:
The resilience of Internet-facing systems relies on what is above the line of representation.

Knowledge and understanding of below-the-line structure and function are continuously in flux. Near-constant effort is required to calibrate and refresh the understanding of the workings, dependencies, limitations, and capabilities of what is present there. In this dynamic situation no individual or group can ever know the system state. Instead, individuals and groups must be content with partial, fragmented mental models that require more or less constant updating and adjustment if they are to be useful.

by Richard I. Cook

Master of Tickets:
Valuing the quality, not the quantity, of work

Many silly metrics have been created to measure work, including the rate at which tickets are closed, the number of lines of code a programmer writes in a day, and the number of words an author can compose in an hour. All of these measures have one thing in common: They fail to take into account the quality of the output. If Alice writes 1,000 lines of impossible-to-read, buggy code in a day and Carol writes 100 lines of well-crafted, easy-to-use code in the same time, then who should be rewarded?

by George V. Neville-Neil

Managing the Hidden Costs of Coordination:
Controlling coordination costs when multiple, distributed perspectives are essential

Some initial considerations to control cognitive costs for incident responders include: (1) assessing coordination strategies relative to the cognitive demands of the incident; (2) recognizing when adaptations represent a tension between multiple competing demands (coordination and cognitive work) and seeking to understand them better rather than unilaterally eliminating them; (3) widening the lens to study the joint cognition system (integration of human-machine capabilities) as the unit of analysis; and (4) viewing joint activity as an opportunity for enabling reciprocity across inter- and intra-organizational boundaries.

by Laura M.D. Maguire

Securing the Boot Process:
The hardware root of trust

The goal of a hardware root of trust is to verify that the software installed in every component of the hardware is the software that was intended. This way you can verify and know without a doubt whether a machine’s hardware or software has been hacked or overwritten by an adversary. In a world of modchips, supply chain attacks, evil maid attacks, cloud provider vulnerabilities in hardware components, and other attack vectors it has become more and more necessary to ensure hardware and software integrity.

by Jessie Frazelle

The Way We Think About Data:
Human inspection of black-box ML models; reclaiming ownership of data

The two papers I’ve chosen for this issue of acmqueue both challenge the way we think about and use data, though in very different ways. In "Stop Explaining Black-box Machine-learning Models for High-stakes Decisions and Use Interpretable Models Instead," Cynthia Rudin makes the case for models that can be inspected and interpreted by human experts. The second paper, "Local-first Software: You Own Your Data, in Spite of the Cloud," describes how to retain sovereignty over your data.

by Adrian Colyer

Revealing the Critical Role of Human Performance in Software:
It’s time to revise our appreciation of the human side of Internet-facing software systems.

Understanding, supporting, and sustaining the capabilities above the line of representation require all stakeholders to be able to continuously update and revise their models of how the system is messy and yet usually manages to work. This kind of openness to continually reexamine how the system really works requires expanding the efforts to learn from incidents.

by David D. Woods, John Allspaw

Sign up for QueueNews

acmqueue app

Join ACM