*Originally published in Queue vol. 13, no. 5*—


Related:

Theo Schlossnagle - **Monitoring in a DevOps World**

Perfect should never be the enemy of better.

Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, Hannes Payer - **Idle-Time Garbage-Collection Scheduling**

Taking advantage of idleness to reduce dropped frames and memory consumption

Robert Sproull, Jim Waldo - **The API Performance Contract**

How can the expected interactions between caller and implementation be guaranteed?

Patrick Meenan - **How Fast is Your Web Site?**

Web site performance data has never been more readily available.


Overbrief summary: the superlinear effects are explained by the baseline platform suffering errors (making it slower than it should have been). Lesson: when measuring scale-up, make apples-to-apples comparisons in terms of QoI.

0. Dr Gunther has seen, solved, and explained an extremely important result: "super-linear" scaling, which had not been documented in the 50+ years of multi-processing until recently.

1. The analysis is wholly in terms of throughput, with the absolute value normalised relative to T1 (eqn 1) as a 'speedup', Sp. Until super-linear performance was found, no additional equations were needed.
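As a concrete illustration, the normalised speedup and the USL form can be sketched as follows. The function name and the sigma/kappa values here are illustrative choices, not values from the article or fitted to any real system:

```python
# Sketch of the USL speedup, Sp = Tp / T1 (eqn 1), using the USL form
# (eqn 2). sigma (contention) and kappa (coherency) are illustrative
# parameters only.

def usl_speedup(p, sigma=0.05, kappa=0.001):
    """Sp = p / (1 + sigma*(p - 1) + kappa*p*(p - 1))."""
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

# By construction the speedup is normalised to one processor: Sp(1) = 1.
print(usl_speedup(1))             # 1.0
print(round(usl_speedup(16), 2))  # 8.04
```

With these parameters, 16 processors deliver only about half of linear speedup, showing how contention and coherency costs accumulate.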

Solving the USL (eqn 2) for maximum Sp (d/dp Sp = 0) provides a "Never Exceed" bound for a system. Attempting to process higher demands leads to lower throughput, which is not what we want.
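Setting d/dp Sp = 0 for the USL form gives the bound in closed form: p* = sqrt((1 - sigma) / kappa). A sketch with the same illustrative parameters as above:

```python
import math

def usl_speedup(p, sigma=0.05, kappa=0.001):
    # Sketch of the USL (eqn 2) with illustrative sigma/kappa values.
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

def never_exceed(sigma=0.05, kappa=0.001):
    """Peak of Sp: solving d/dp Sp = 0 gives p* = sqrt((1 - sigma) / kappa)."""
    return math.sqrt((1 - sigma) / kappa)

p_star = never_exceed()  # about 30.8 processors with these parameters
# Throughput falls off on the far side of p*: adding processors past
# the bound lowers the speedup instead of raising it.
assert usl_speedup(31) > usl_speedup(62)
```

The derivative of Sp has numerator 1 - sigma - kappa*p^2, which is why the peak depends only on the two loss parameters, not on the single-processor throughput.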

2. There are many physical processes that follow exactly these curves; a well-known one is the power output of petrol engines.

Two curves, plotted against RPM (i.e., p), are always provided to describe engine performance:
- the brake horsepower produced (equivalent to Tp, or the normalised value, Sp), and
- the specific, or normalised, output per revolution: the torque (equivalent to Sp/p).

3. Peak torque marks the maximum economic speed (RPM) of the engine. For minimum fuel use, or maximum range, vehicles and machinery need to operate at this speed.
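The relationship underlying the analogy is simply power = torque x rotational speed, the per-revolution quantity times the rate. A small sketch in imperial units, where the conversion constant works out to 5252:

```python
def horsepower(torque_lbft, rpm):
    # power = torque * angular speed; in imperial units
    # hp = torque (lb-ft) * rpm / 5252, which is why dynamometer
    # hp and torque curves always cross at 5252 RPM.
    return torque_lbft * rpm / 5252.0

# An engine making 400 lb-ft at exactly 5252 RPM is producing 400 hp.
print(horsepower(400, 5252))  # 400.0
```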

Similarly, the point where the specific performance (Sp/p) peaks [d/dp (Sp/p) = 0] is the maximum economic area of operation of a system. System designers should question routinely exceeding the point of maximum specific processor performance.

4. There is a third point of interest on the USL curve, where p is large and Sp <= 1 and continues falling. Here the system is slower than a single processor, so the additional processors aren't just busy doing nothing, they are creating non-useful work.

The system is spending more processor effort on its internal operations than on productive work.

Beyond peak throughput this is always true, but below unity speedup the system needs to radically triage its operations and revert to a single processor.
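For the USL form, the unity-speedup crossing also has a closed form: setting Sp = 1 with p > 1 reduces to 1 = sigma + kappa*p, so p = (1 - sigma)/kappa. A sketch with the same illustrative parameters:

```python
def usl_speedup(p, sigma=0.05, kappa=0.001):
    # Sketch of the USL (eqn 2) with illustrative sigma/kappa values.
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

def unity_crossing(sigma=0.05, kappa=0.001):
    """Solve Sp = 1 for p > 1: the (p - 1) factor cancels,
    leaving 1 = sigma + kappa*p, i.e. p = (1 - sigma) / kappa."""
    return (1 - sigma) / kappa

p_unity = unity_crossing()  # about 950 processors with these parameters
assert abs(usl_speedup(p_unity) - 1.0) < 1e-9
# Beyond this point the whole cluster is slower than one processor.
assert usl_speedup(p_unity + 100) < 1.0
```

Note that with these formulas the unity crossing sits at the square of the Never Exceed point, so the two bounds are far apart on realistic systems.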

This is very close to the boundary definition of virtual-memory thrashing, and the system response, load-shedding, is very similar.

5. The application of Dr Gunther's USL is similar to Peter Denning et al.'s "Working Sets" in automatically controlling virtual-memory thrashing. It defines:
- a well-specified, single performance metric, and what, if any, hardware support is needed to capture the data,
- the "Never Exceed" bounds of the metric, and
- the action for the system to take on nearing or exceeding the bound.
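The third element, the action to take near the bound, might be sketched as a simple admission-control check. Everything here (the function names, the margin, the sigma/kappa values) is a hypothetical illustration, not part of the USL or the Working Set theory:

```python
import math

def never_exceed(sigma, kappa):
    # Peak of the USL speedup: p* = sqrt((1 - sigma) / kappa).
    return math.sqrt((1 - sigma) / kappa)

def admit_more_work(p_current, sigma=0.05, kappa=0.001, margin=0.9):
    """Hypothetical load-shedding rule: refuse new work once the
    concurrency level nears the Never Exceed bound."""
    return p_current < margin * never_exceed(sigma, kappa)

print(admit_more_work(20))  # True: well below the bound (~30.8)
print(admit_more_work(30))  # False: inside the safety margin
```

This mirrors the working-set approach: measure one metric, know its bound, and shed load before crossing it rather than after.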

It took Denning et al. a decade to complete the full theory after proving that the Working Set theory explained, and enabled avoidance of, thrashing. Potentially much of that theory can be reused, or tested, in this new application.