Check out Pat's
Scattered Thoughts on Distributed Systems

pathelland.substack.com

Escaping the Singularity

  Download PDF version of this article PDF

I'm Probably Less Deterministic Than I Used to Be

Embracing randomness is necessary in cloud environments.

Pat Helland

In my youth, I thought the universe was ruled by cause and effect like a big clock. In this light, computing made sense. Now I see that both life and computing can be a crapshoot, and that has given me a new peace.

Escaping the Singularity: I

I've Always Been Determined

In high school, I learned about classic Newtonian physics. Every day, I am grateful I studied physics before getting my driver's license as a teenager. As I looked around at the world, read history, and observed things, everything seemed to have a cause. Philosophically, I was a believer in determinism.

As I started programming in the early 1970s, bugs in my software caused momentary lapses in my belief in determinism as I swore at the computer. This wavering dissipated as I found the underlying bug was caused by my screw-up. Everything in the complex system had a cause, and it was possible to construct the causal reason for its correct or incorrect behavior.

Names and identities for things in the system were always derived from some overarching taxonomy. For many years, this was within the centralized single server. Over time, I started working on a small gaggle of cooperating servers. This led to directory services within enterprises and eventually to Internetwide names based on DNS (Domain Name System). Everything lived in its own place in the big ticking clock. I could reason about the origin of each name based on a global preordained hierarchy.

As I started working on distributed systems, the bounded local networks were tightly controlled. Dedicated hardware ensured messages seemed to be synchronously delivered. While they were not always delivered on time, we pretended they were and built solutions based on timeout and restart. Pseudosynchronous messaging behaved well in this bounded context.

The world could be mapped into predictable behaviors, causes, consequences, and composed relationships. I was truly a database person, albeit one with an interest in distributed systems. I was at peace with the universe.

 

An Existential Midlife Crisis

I started working at Microsoft in 1994 to help the company move from desktop PCs to running enterprises. My experience in the 1980s at Tandem Computers had left me with a deep interest in providing the perfectly correct transactional answer for enterprise customers. This required super-high availability even when things broke. Microsoft was aggressive and bold, and so was I.

By the time I turned 40 in 1996, I was middle-aged, and my middle had also aged. My view of the world was Newtonian in that everything was modeled on actions and reactions within a closed universe. This abstract "perfectly correct transactional answer" was built on a notion of identity based on some centralized authority within the customers' companies. Names and identities were hierarchical and local to the customer. Talking to outsiders and their world was an "application problem," not part of our system.

In contrast, Microsoft lived in a world where software was created at many independent sources and came together in an ecosystem. At first, this entailed rampantly sharing floppy disks. By the 1990s, it meant squirting bits around the nascent Internet.

It was then that I was introduced to UUIDs (universally unique identifiers). At Microsoft, these are called GUIDs (globally unique identifiers). UUIDs are independently allocated without a hierarchical namespace and are assumed to be unique. My first exposure to UUIDs involved identifying software interfaces coded at independent companies, yet remaining unique when installed anywhere in the world.

How can these be unique and independently allocated? I immediately and viscerally rejected such nonsense and went home with my head in a spin.

Version 4 UUIDs are 128-bit identifiers with 122 bits allocated to hold a unique value; 2 to the 122nd power is, indeed, a Big NumberTM! While I don't consider myself to be a math expert, my colleagues at Microsoft convinced me that the probability of a UUID collision was an example of the birthday problem with 2^122 possible birthdays. The probability of collision in 122 random bits is driven by the number of unique identifiers in the pool. According to the math at the Wiki for universally unique identifiers, if you used 1.6 petabytes to store 103 trillion 16-byte UUIDs in a single place, you'd have a one-in-a-billion chance of one duplicate.

A UUID is a Big NumberTM only if you assume it's random. To understand randomness for this article, I read a white paper called "The Intel Random Number Generator." Most of it made sense, even though I flunked out of college math because programming was more fun. It explains the notion of entropy or the randomness of a random bit. Quoting this white paper:

 

In the case of a random number generator that produces a k-bit binary result, Pi is the probability that an output will equal i, where 0 ≤ i < 2k

Thus, for a perfect random number generator, Pi = 2-k and the entropy of the output is equal to k bits. This means that all possible outcomes are equally (un)likely, and on average the information present in the output cannot be represented in a sequence shorter than k bits.

 

The paper explores the use of TRNGs (true random number generators) built on a nondeterministic source (such as multiple heat-sensitive oscillators), as well as PRNGs (pseudo-random number generators). Intel exposes these in hardware as the RDRAND and RDSEED instructions. It is common practice to take two or more sources for random bits or bytes and XOR them together to increase the entropy of the result. For example, consider taking two 128-bit inputs from two different sources. By XORing these two (with a total of 256 bits), you can produce a 128-bit output with better entropy. Care must be taken to avoid undermining these established techniques.

While Intel's solutions are the ones described here, I believe similar results are available from all other major chip vendors. My shallow investigation into randomness leads me to believe that UUIDs are unique as far as I'm concerned.

As I said before, I consider myself to be aggressive and bold. I can tolerate this minuscule risk. Emotionally, probabilities became part of my life. My midlife existential crisis launched me toward a more Zenlike acceptance of things being less than perfect, some of the time. Karma, dogma, and determinism were only a subset of life.

 

Around Here, Looks Like Newton Had It Right

By the early 2000s, I'd spent about 25 years hanging out with people who build databases. They are my peeps! Still, I couldn't help feeling that they saw the world through a Newtonian prism. It struck me that there was data outside of a database and it had different properties than database data. One of the major reasons was that time (as perceived by the outside data) was disconnected from the transactional now inside the database. This led to my 2005 CIDR (Conference on Innovative Data Systems Research) paper, "Data on the Outside Versus Data on the Inside."

As what I wrote in "The Singular Success of SQL," I had truly renounced SQL as the only model of the universe. Like Newtonian physics, it's very practical in small domains. SQL depends on transactions to suspend time for correctness. It only works in the now. Now, if it exists, is a local frame of reference.

Personally, I have no trouble both living in the frame of reference needed to support transactional correctness and understanding that there's a bigger world of distributed systems. Like particles of light, databases exist in their own world, oblivious to interactions across their boundaries. Like waves, the world of a database sometimes interacts with the outside with surprising results.

 

The Joy of Sects

The dictionary defines a sect as a group of individuals with somewhat different religious beliefs from those of the larger group to which they belong. That sounds like computer scientists and engineers to me. Let's discuss just three of these sectarian groups:

Networking folks. Most of these people have personalities combining the best aspects of hippies and bikers. When you talk to them, they want to both ride into town like a storm and talk existentially about the randomness of life. I've never met a networking engineer who thinks anything is deterministic. Everything happens with a probability and a cumulative distribution function over its latency. Yet, they are as happy as can be, ready to jump into the next fracas, and continuously show a zest for life.

Distributed systems folks. These people vacillate between philosophers and lawyers. No one else can talk so fluently about total order without discussing the scope of totality. Availability is always couched by assuming a loss of, at most, F servers, while never veering into what happens when an operator logs into the system. Integral to their psyche is the belief in deterministic outcomes in the face of nondeterministic environments. Sometimes these discussions get seriously F'd up!

Database systems folks. This community is my family of origin. They are my people! These professionals bring out the best qualities of bankers, architects, and builders. Everything must be a combination of business-critical, provably correct, and overengineered for reliability. Database folks assume a preexisting deterministic world and build the coolest complex systems on top of that deterministic foundation. This works great while their assumptions hold true. Arm in arm with them, I'll gleefully design an outhouse expected to last for 100 years, despite the poor hygienics.

 

Nothing is more fun than sliding between these groups of folks and messing with their brains.

 

Zen and the Art of System Design

In Robert Pirsig's famous philosophy book Zen and the Art of Motorcycle Maintenance, friends ride motorcycles on a 17-day journey from Minnesota to California. During the ride, it emerges that there are a couple of different philosophies about how to maintain their motorcycles. This contrasts the "romantic" approach to life, where John chooses to avoid maintaining his expensive new bike, with the narrator's "classical" approach of applying methodical care and diligence maintaining every part.

It emerges that John just wants to enjoy the gestalt and live in the moment. The narrator seeks to understand every detail and impose a rational analysis on all things.

Similarly, one perspective on computer systems focuses on the causality of every component's behavior and attempts to control and manage the overarching system with the precision of a fine Swiss watch. Another perspective recognizes that a commute home on the freeway is frequently, but not always, fast.

 

Climbing into a Fast-Moving Tube of Aluminum

My first trip on an airplane was as a senior in high school. While I had a high-level understanding of how it stayed in the air, it really seemed counterintuitive to trust this aluminum tube. I was completely assuaged in my fears, though, because I had read about the extremely low chance of problems when flying in a commercial airplane. Apparently, it's much riskier to be on the freeway than in the air.

Today, everything is more and more interconnected. At the same time, we're tossing our stuff into massively shared cloud deployments. Each of these two trends undermines the closed system needed for predictable and deterministic behavior. Computing has become a nondeterministic Wild West. This has caused me to relax and become zen with probabilities. No longer can I completely explain our systems only on causal dependencies.

Within this cloudy Wild West, I work side by side with folks who squawk about collisions of UUIDs and look to find perfect confidence in unique identifiers. This happens while blithely ignoring the deep dependencies that systems have on cryptography for security and its foundational dependence on the probabilities of random key allocation. They continue to expect servers to fail cleanly and fast even though, as I once wrote, "Fail-Fast Is Failing... Fast!" I know they fly on airplanes too.

Now I can ride in an airplane with confidence, be a tiny bit curious about avionics, discuss probabilistic SLAs (service-level agreements) for responses in the cloud, and work to define the perfectly correct database-recovery algorithm with complete comfort. Maybe it's because I like hippies, bikers, philosophers, bankers, architects, and builders. I especially like provoking debates when they're all in the room at the same time.

 

Pat Helland has been implementing transaction systems, databases, application platforms, distributed systems, fault-tolerant systems, and messaging systems since 1978. For recreation, he occasionally writes technical papers. He works at Salesforce. His blog is at pathelland.substack.com and he tweets with the handle @pathelland.

Copyright © 2022 held by owner/author. Publication rights licensed to ACM.

acmqueue

Originally published in Queue vol. 20, no. 3
see this item in the ACM Digital Library


Tweet


Related:

Martin Kleppmann, Alastair R. Beresford, Boerge Svingen - Online Event Processing
Support for distributed transactions across heterogeneous storage technologies is either nonexistent or suffers from poor operational and performance characteristics. In contrast, OLEP is increasingly used to provide good performance and strong consistency guarantees in such settings. In data systems it is very common for logs to be used as internal implementation details. The OLEP approach is different: it uses event logs, rather than transactions, as the primary application programming model for data management. Traditional databases are still used, but their writes come from a log rather than directly from the application. The use of OLEP is not simply pragmatism on the part of developers, but rather it offers a number of advantages.


Andrew Leung, Andrew Spyker, Tim Bozarth - Titus: Introducing Containers to the Netflix Cloud
We believe our approach has enabled Netflix to quickly adopt and benefit from containers. Though the details may be Netflix-specific, the approach of providing low-friction container adoption by integrating with existing infrastructure and working with the right early adopters can be a successful strategy for any organization looking to adopt containers.


Marius Eriksen - Functional at Scale
Modern server software is demanding to develop and operate: it must be available at all times and in all locations; it must reply within milliseconds to user requests; it must respond quickly to capacity demands; it must process a lot of data and even more traffic; it must adapt quickly to changing product needs; and in many cases it must accommodate a large engineering organization, its many engineers the proverbial cooks in a big, messy kitchen.


Caitie McCaffrey - The Verification of a Distributed System
Leslie Lamport, known for his seminal work in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable." Given this bleak outlook and the large set of possible failures, how do you even begin to verify and validate that the distributed systems you build are doing the right thing?





© ACM, Inc. All Rights Reserved.