



The Morning Paper


How Do Committees Invent? and Ironies of Automation

The formulation of Conway's law and the counterintuitive consequences of increasing levels of automation

Adrian Colyer

The Lindy effect tells us that if a paper has been highly relevant for a long time, it's likely to continue being so for a long time to come as well. For this issue's selections I am going back to a couple of papers that have most definitely stood the test of time. The lessons they contain could continue to bear fruit for many more years.

My first choice, from 1968, is entitled "How Do Committees Invent?" This is the paper that gave us Conway's law, and while we all know that law today, author Melvin E. Conway provides a lot of great material that led up to the formulation of the law that bears his name. It's one of those wonderful papers that tends to give you fresh takeaways every time you come to it. It should be good for another 52 years, according to the Lindy effect ;).

For my second choice we go forward in time to 1983, with Lisanne Bainbridge's "Ironies of Automation." It's a classic treatise on the counterintuitive consequences of increasing levels of automation, and something oh-so-relevant to this forthcoming decade. Where will the experts be when we need them?

 

How Do Committees Invent?

Conway, M. E. 1968. How do committees invent? Datamation 14(4), 28-31.

With thanks to Chris Frost for recommending this paper—a great example of a case where we all know the law (Conway's law in this case), but many of us have not actually read the original ideas behind it.

 

We're back in 1968, a time when it was taken for granted that before building a system, it was necessary to design it. The systems under discussion are not restricted to computer systems either, by the way: one example is a public transport network. Designs are produced by people, and the set of people working on a design is part of a design organization.

The definition of design itself is quite interesting:

 

That kind of intellectual activity which creates a whole from its diverse parts may be called the design of a system.

 

When I think about design, I more naturally think about it the other way around: how to decompose the whole into a set of parts that will work together to accomplish the system goals. Of course, Conway is right that those parts do have to fit together to produce the intended whole again.

Two bits of knowledge are needed at the beginning of the design process:

• An (initial) understanding of the system boundaries (and any boundaries on the design and development process too)—what's in scope and what's out of scope.

• A preliminary notion of how the system will be organized. Without this, you can't begin to break down the design work.

With a preliminary understanding of the system, it's possible to begin organizing the design team. Decisions made at this early stage, with limited information, can have long-lasting consequences:

 

... the very act of organizing a design team means that certain design decisions have already been made, explicitly or otherwise. Given any design team organization, there is a class of design alternatives which cannot be effectively pursued by such an organization because the necessary communication paths do not exist. Therefore, there is no such thing as a design group which is both organized and unbiased.

 

These days it's less common to have a dedicated design team; even the seemingly obvious statement that it makes sense to (at least partially) design something before building it can feel controversial. But, of course, people undertake design activities all the time, perhaps informally and implicitly, sometimes more explicitly. They've just learned to take smaller steps with each design increment before getting feedback. In the software context, then, it makes sense to mentally substitute "design and development" every time Conway talks about "design and the design organization."

Once the initial organization of the design (and development) team is done, delegation of tasks can begin. Each delegation represents a narrowing of someone's scope, with a commensurate narrowing of the class of design alternatives that can be considered. Along with the narrowing of individual scopes, there is also a coordination problem:

 

Coordination among task groups, although it appears to lower the productivity of the individual in the small group, provides the only possibility that the separate task groups will be able to consolidate their efforts into a unified system design.

 

It's a rare team that reorganizes in the light of newly discovered information, even though it might suggest a better alternative.

 

This point of view has produced the observation that there's never enough time to do something right, but there's always enough time to do it over.

 

The two most fundamental tools in a designer's toolbox are decomposition and composition. The system as a whole is decomposed into smaller subsystems that are interconnected (composed). Each of these subsystems may in turn be further decomposed into, and then composed out of, parts. Eventually, a level is reached that is simple enough to be understood without further subdivision. Therefore, the most important decisions a designer can make involve the criteria for decomposing a system into modules, but that's another story.

The different subsystems talk to each other through interfaces (a newly emerging term in 1968). Now, if you think about systems composed of subsystems interacting via interfaces, you will find a parallel in the organization by making the following substitutions:

• Replace system with committee.

• Replace subsystem with subcommittee.

• Replace interface with coordinator.

To put that in more modern terms, I think you can also:

• Replace system with group.

• Replace subsystem with team.

• Replace interface with team leader.

 

We are now in a position to address the fundamental question of this article. Is there any predictable relationship between the graph structure of a design organization and the graph structure of the system it designs? The answer is: Yes, the relationship is so simple that in some cases it is an identity... This kind of structure-preserving relationship between two sets of things is called a homomorphism.
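One way to make the homomorphism concrete is sketched below. This is an illustration, not code from the paper. It treats the system as a graph of subsystems joined by interfaces and assigns each subsystem to the team that designs it. It then checks that every interface either stays within one team or maps onto a communication path between two teams. The subsystem and team names, and the function names, are hypothetical.

```python
def link(a, b):
    """An undirected edge between two nodes, so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def unsupported_interfaces(interfaces, designed_by, channels):
    """Return the interfaces that the design organization has no path to support.

    interfaces:  list of (subsystem, subsystem) pairs that must work together
    designed_by: dict mapping each subsystem to the team that designs it
    channels:    set of link(team, team) edges along which teams communicate
    """
    missing = []
    for a, b in interfaces:
        team_a, team_b = designed_by[a], designed_by[b]
        # An interface inside a single team needs no cross-team channel.
        if team_a != team_b and link(team_a, team_b) not in channels:
            missing.append((a, b))
    return missing


# Hypothetical example: three subsystems designed by three teams.
interfaces = [("ui", "api"), ("api", "db"), ("ui", "db")]
designed_by = {"ui": "frontend", "api": "backend", "db": "data"}
channels = {link("frontend", "backend"), link("backend", "data")}

print(unsupported_interfaces(interfaces, designed_by, channels))  # [('ui', 'db')]
```

The ui-db interface has no matching channel between the frontend and data teams. A design that depends on it is the kind of alternative Conway says cannot be effectively pursued until that communication path exists.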

 

By far my favorite part of the paper is the second half, where the implications of this homomorphism are unpacked. It was Fred Brooks who actually coined the term Conway's law in The Mythical Man-Month when referring to this paper. The mythical thing about the man (person) month, of course, is the illusion that person-months are fungible commodities: a tempting idea from the management perspective, but utterly wrong. Conway shows why. The resource-units viewpoint would say that two people working for a year or 100 people working for a week are of equal value...

 

Assuming that two men and one hundred men cannot work in the same organizational structure (this is intuitively evident and will be discussed below) our homomorphism says that they will not design similar systems; therefore the value of their efforts may not even be comparable. From experience we know that the two men, if they are well chosen and survive the experience, will give us a better system. Assumptions which may be adequate for peeling potatoes and erecting brick walls fail for designing systems.

 

Everyone understands this at some level, but it's easy to forget. Plus, there are organizational forces that work against you:

1. You come to the early realization that the system will be large, with the implication that it's going to take more time than you'd like to design with the current team size. Organizational pressures then kick in to "make irresistible the temptation to assign too many people to a design effort."

2. As you add people, and apply conventional management structures to their organization, the organizational communication structure begins to disintegrate. (Conway refers to the military-style organizational structure of each individual having, at most, one superior and, at most, approximately seven subordinates—pretty much the rule of thumb still used today.)

3. The homomorphism then ensures that the structure of the system will reflect the disintegration that has occurred in the design organization.
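Some rough arithmetic shows why two people and a hundred people can't share one organizational structure. This is an illustration, not from the paper. A group of n people has n(n-1)/2 possible pairwise communication paths. A strict hierarchy of the kind Conway describes (one superior each) formalizes only n-1 of them. With a span of control of about seven, the number of management levels also grows with the group. The function names are hypothetical, and the span of seven is the rule of thumb from point 2.

```python
def pairwise_paths(n):
    """Distinct person-to-person communication paths in a group of n people."""
    return n * (n - 1) // 2


def reporting_lines(n):
    """Edges in a strict hierarchy: everyone except the top has one superior."""
    return max(n - 1, 0)


def levels_needed(n, span=7):
    """Levels in the shallowest hierarchy where nobody has more than `span` reports."""
    levels, capacity, layer_size = 1, 1, 1
    while capacity < n:
        layer_size *= span          # each new level can be `span` times wider
        capacity += layer_size
        levels += 1
    return levels


for n in (2, 100):
    print(n, pairwise_paths(n), reporting_lines(n), levels_needed(n))
# 2 1 1 2
# 100 4950 99 4
```

For 100 people, the formal structure covers 99 of 4,950 possible paths (2 percent) across four levels. Most pairs can only communicate by going through the hierarchy.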

The critical moment comes when the complexity has not yet been tamed, and the skills of the initial designer are being tested to the maximum:

 

It is a natural temptation of the initial designer—the one whose preliminary design concepts influence the organization of the design effort—to delegate tasks when the apparent complexity of the system approaches his limits of comprehension. This is the turning point in the course of the design. Either he struggles to reduce the system to comprehensibility and wins, or else he loses control of it.

 

Once an organization has been staffed and built, it's going to be used. Organizations have an incredible propensity for self-preservation.

 

Probably the greatest single common factor behind many poorly designed systems now in existence has been the availability of a design organization in need of work.

 

I've always had a preference for smaller teams consisting of highly skilled people over larger groups. Revisiting Conway's law while putting this write-up together, I am struck most forcibly by the often-overlooked observation that the design organization structure doesn't just direct the design, but actually constrains the set of designs that can be contemplated. Every person you add reduces your design options.

Perhaps the most important concept, therefore, is to "keep design organizations lean and flexible." Flexibility of organization is important to effective design, because the design you have now is rarely the best possible for all time.

Finally, in the third-to-last paragraph, is the formulation that has come to be known as Conway's law:

 

The basic formulation of this article is that organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.

 

Ironies of Automation

Bainbridge, L. 1983. Ironies of automation. Automatica 19(6), 775-779.

https://www.ise.ncsu.edu/wp-content/uploads/2017/02/Bainbridge_1983_Automatica.pdf.

With thanks to Thomas Depierre for the paper recommendation.

 

Making predictions is a dangerous game, but as we look forward to the next decade a few things seem certain: increasing automation, increasing system complexity, faster processing, more interconnectivity, and an even greater human and societal dependence on technology. What could possibly go wrong? Automation is supposed to make our lives easier, but when it goes wrong it can put us in a very tight spot indeed. "Ironies of Automation" explores these issues. Originally published in 1983, the paper puts forth lessons that are just as relevant today as they were then.

The central irony (combination of circumstances, the result of which is the direct opposite of what might be expected) referred to in this paper is that the more we automate, and the more sophisticated we make that automation, the more we become dependent on a highly skilled human operator.

 

Automated Systems Need Highly Skilled Operators

Why do we automate?

 

The designer's view of the human operator may be that the operator is unreliable and inefficient, so should be eliminated from the system.

 

An automated system doesn't make mistakes in the same way that a human operator might, and it can operate at greater speeds and/or lower costs than a human operator. The paper assumes a world in which every automated task was previously undertaken by humans (the context is industrial control systems), but of course many systems today were born automated. One example I found myself thinking about while reading through the paper does have a human precedent, though: self-driving cars.

In an automated system, two roles are left to humans: monitoring that the automated system is operating correctly, and taking control if it isn't. An operator who doesn't routinely operate the system will have atrophied skills if ever called on to take over.

 

Unfortunately, physical skills deteriorate when they are not used, particularly the refinements of gain and timing. This means that a formerly experienced operator who has been monitoring an automated process may now be an inexperienced one.

 

Not only are the operator's skills declining, but the situations in which the operator will be called upon are, by their very nature, the most demanding ones, where something is deemed to be going wrong. Thus, what is really needed in such a situation is a more skilled operator, not a less skilled one. To generate successful strategies for unusual situations, an operator also needs a good understanding of the process under control and the current state of the system. The former understanding develops most effectively through use and feedback (which the operator may no longer get regular opportunities for); the latter takes some time to assimilate.

We've seen that taking over control is problematic, but there are also issues with the monitoring that leads up to a decision to take over control. For example, here's something to consider before relying on a human driver to take over the controls of a self-driving car in an emergency:

 

We know from many 'vigilance' studies (Mackworth, 1950) that it is impossible for even a highly motivated human being to maintain effective visual attention towards a source of information on which very little happens, for more than about half an hour. This means that it is humanly impossible to carry out the basic function of monitoring for unlikely abnormalities, which therefore has to be done by an automatic alarm system connected to sound signals...

 

But who notices when the alarm system is not working properly? You might need alarms on alarms. Section 2.1 of the paper has a nice discussion of the challenges of what we would now call gray failure:

 

Unfortunately automatic control can 'camouflage' system failure by controlling against the variable changes, so that trends do not become apparent until they are beyond control. This implies that the automatics should also monitor unusual variable movement. 'Graceful degradation' of performance is quoted in "Fitts's list" of man-computer qualities as an advantage of man over machine. This is not an aspect of human performance to be aimed for in computers, as it can raise problems with monitoring for failure; automatic systems should fail obviously.
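The camouflage effect can be made concrete with a toy simulation. This is an illustration, not from the paper. A hypothetical tank level is held at a setpoint by a PI controller while a leak slowly grows. The level barely moves until the inflow valve saturates. Meanwhile, the controller's effort (the kind of "unusual variable movement" Bainbridge suggests monitoring) trends upward from the start. The gains, leak rate, valve limit, and alarm thresholds are all arbitrary assumptions.

```python
def simulate(steps=150, setpoint=10.0, kp=0.5, ki=0.1, max_inflow=3.0):
    """Hold a tank level with a PI controller while a leak slowly worsens.

    Returns a list of (step, level, inflow) tuples. The inflow is the
    controller's effort; it is clamped to what the valve can deliver.
    """
    level, integral = setpoint, 1.0      # start in balance with a leak of 1.0
    trace = []
    for t in range(steps):
        leak = 1.0 + 0.02 * t            # hypothetical slowly growing fault
        error = setpoint - level
        integral += ki * error
        inflow = min(max(kp * error + integral, 0.0), max_inflow)
        level += inflow - leak
        trace.append((t, level, inflow))
    return trace


def first_alarm(trace, condition):
    """Return the first step at which condition(level, inflow) holds, or None."""
    for t, level, inflow in trace:
        if condition(level, inflow):
            return t
    return None


trace = simulate()
# Conventional alarm: the controlled variable leaves its tolerance band.
level_alarm = first_alarm(trace, lambda level, inflow: abs(level - 10.0) > 1.0)
# Alarm on the control effort itself: inflow 50% above its initial value.
effort_alarm = first_alarm(trace, lambda level, inflow: inflow > 1.5)
print("effort alarm at step", effort_alarm, "| level alarm at step", level_alarm)
```

With these made-up parameters, the level settles about 0.2 below the setpoint and stays there while the controller quietly compensates. The level alarm does not fire until after step 100, once the valve is saturated and control is already lost. The effort alarm fires at around step 25.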

 

A straightforward solution when feasible is to shut down automatically. But many systems, "because of complexity, cost, or other factors," must be stabilized rather than shut down. If very fast failures are possible, with no warning from prior changes so that an operator's working memory is not up to speed, then reliable automatic response is necessary; and if this is not possible, then the process should not be built if the costs of failure are unacceptable.
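A common way to get "alarms on alarms" and to make failure obvious rather than silent is a watchdog, or dead-man's switch. The monitored component must keep signaling that it is alive, and an independent checker trips when the signal stops. The minimal sketch below uses hypothetical class and method names. `on_trip` stands in for whatever safe response the process allows: an automatic shutdown where that is feasible, or a hand-off to a stabilizing routine plus an operator alert where it isn't.

```python
class Watchdog:
    """A dead-man's switch: trip if the watched component goes quiet.

    Time is passed in explicitly (for example, seconds from a monotonic clock)
    so the behavior is easy to test and reason about.
    """

    def __init__(self, timeout, on_trip, now=0.0):
        self.timeout = timeout
        self.on_trip = on_trip      # e.g. a safe-shutdown routine plus a loud alert
        self.last_beat = now
        self.tripped = False

    def heartbeat(self, now):
        """Called by the monitored component (here, the alarm system) while healthy."""
        self.last_beat = now

    def check(self, now):
        """Called periodically by something independent of the monitored component."""
        if not self.tripped and now - self.last_beat > self.timeout:
            self.tripped = True
            self.on_trip(now)       # fail obviously, and only once
        return self.tripped


# The alarm system beats at t = 0, 1, 2 and then silently stops.
trips = []
dog = Watchdog(timeout=2, on_trip=trips.append)
for t in range(7):
    if t <= 2:
        dog.heartbeat(t)
    dog.check(t)
print(trips)  # [5]
```

The switch latches: a late heartbeat does not clear it. The failure stays visible until someone deliberately investigates and resets it.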

 

What Can We Do About It?

One possibility is to allow the operator to use hands-on control for a short period in each shift. If this suggestion is laughable then simulator practice must be provided.

 

Chaos experiments and game days are some of the techniques used today to give operators experience with the system under various scenarios. Simulators can help teach basic skills but are always going to be limited: "Unknown faults cannot be simulated, and system behavior may not be known for faults which can be predicted but have not been experienced."
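In miniature, a chaos experiment amounts to deliberately injecting failures so that the people and code handling them get practice. Real chaos-engineering tools work at the infrastructure level. The sketch below only wraps a Python callable, and the class, function, and data are hypothetical.

```python
import random


class FaultInjector:
    """Wrap a callable so that a chosen fraction of calls fail, for drills."""

    def __init__(self, func, failure_rate, seed=None):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)   # seeded so a drill can be replayed
        self.injected = 0

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise ConnectionError("injected fault (game-day drill)")
        return self.func(*args, **kwargs)


def lookup_price(item):
    """Hypothetical dependency under test."""
    return {"widget": 3, "gadget": 5}[item]


flaky_lookup = FaultInjector(lookup_price, failure_rate=0.3, seed=7)
served, fallbacks = 0, 0
for _ in range(20):
    try:
        flaky_lookup("widget")
        served += 1
    except ConnectionError:
        fallbacks += 1      # the team practices handling this path
print(served, fallbacks, flaky_lookup.injected)
```

This rehearses only the faults someone thought to inject, which is exactly Bainbridge's caveat about simulators.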

 

No one can be taught about unknown properties of the system, but they can be taught to practice solving problems with the known information.

 

At the time this paper was written, one new possibility was using "soft displays on VDUs [visual display units]" to design task-specific displays. But changing displays bring their own challenges. Bainbridge offers three suggestions:

• There should be at least one source of information permanently available for each type of information that cannot be mapped simply onto others.

• Operators should not have to page between displays to obtain information about abnormal states in parts of the process other than the one they are currently thinking about, or between displays giving information needed within one decision process.

• Research on sophisticated displays should concentrate on the problems of ensuring compatibility between them, rather than finding which independent display is best for one particular function without considering its relation to information for other functions.

In many cases, a computer is quite likely to end up controlling some aspects of a system and the human operator others. The key here is that the human being must always know which tasks the computer is dealing with and how.
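On the bookkeeping side, one way to support this is to make the allocation of tasks explicit and queryable. A display can then always show what the automation is handling, and every handover leaves a record. The sketch below uses hypothetical task names and classes. A data structure does not by itself give the operator the understanding Bainbridge is asking for, but it removes one source of ambiguity.

```python
from enum import Enum


class Agent(Enum):
    AUTOMATION = "automation"
    OPERATOR = "operator"


class Allocation:
    """An explicit record of which agent handles each task, and how."""

    def __init__(self):
        self._owners = {}   # task -> (Agent, short description of how)
        self.history = []   # (task, previous agent or None, new agent, how)

    def assign(self, task, agent, how):
        previous = self._owners.get(task)
        self._owners[task] = (agent, how)
        self.history.append((task, previous[0] if previous else None, agent, how))

    def owner(self, task):
        return self._owners[task][0]

    def automated(self):
        """What the computer is currently doing, in a form that can be displayed."""
        return {task: how for task, (agent, how) in self._owners.items()
                if agent is Agent.AUTOMATION}


plant = Allocation()
plant.assign("feed_rate", Agent.AUTOMATION, "PI loop holding 10 units/min")
plant.assign("temperature", Agent.AUTOMATION, "thermostat at 80 C")
# Sensor fault: temperature is handed back to the operator, and that is recorded.
plant.assign("temperature", Agent.OPERATOR, "manual control after sensor fault")
print(plant.automated())  # {'feed_rate': 'PI loop holding 10 units/min'}
```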

 

Perhaps the final irony is that it is the most successful automated systems, with rare need for manual intervention, which may need the greatest investment in human operator training... I hope this paper has made clear both the irony that one is not by automating necessarily removing the difficulties, and also the possibility that resolving them will require even greater technological ingenuity than does classic automation.

 

This puts me in mind of Kernighan's law ("Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."). If you push yourself to the limits of your technological abilities in automating a system, how then are you going to be able to manage it?

 

Adrian Colyer is a venture partner with Accel in London, where his job is to help find and build great technology companies across Europe and Israel. (If you're working on an interesting technology-related business, he would love to hear from you.) Prior to joining Accel, he spent more than 20 years in technical roles, including CTO at Pivotal, VMware, and SpringSource.

Copyright © 2020 held by owner/author. Publication rights licensed to ACM.

Reprinted with permission from https://blog.acolyer.org


Originally published in Queue vol. 18, no. 1

© 2020 ACM, Inc. All Rights Reserved.