ACID: My Personal:
How could I miss such a simple thing?
I had a chance recently to chat with my old friend, Andreas Reuter, the inventor of ACID. He and his Ph.D. advisor, Theo Härder, coined the term in their famous 1983 paper, Principles of Transaction-Oriented Database Recovery. I had blinders on after almost four decades of seeing C based on my assumptions. One big lesson for me is to work hard to ALWAYS question your assumptions. Try hard to surround yourself with curious and passionate people, both young and old, who will challenge you and try to dislodge your blinders.
Baleen Analytics:
Large-scale filtering of data provides serendipitous surprises.
Data analytics hoovers up anything it can find and we are finding patterns and insights that weren't available before, with implications for both data analytics and for messaging between services and microservices. It seems that a pretty good understanding among many different sources allows more flexibility and interconnectivity. Increasingly, flexibility dominates perfection.
Consistently Eventual:
For many data items, the work never settles on a value.
Applications are no longer islands. Not only do they frequently run distributed and replicated over many cloud-based computers, but they also run over many hand-held computers. This makes it challenging to talk about a single truth at a single place or time. In addition, most modern applications interact with other applications. These interactions settle out to impact understanding. Over time, a shared opinion emerges just as new interactions add increasing uncertainty. Many business, personal, and computational "facts" are, in fact, uncertain. As some changes settle, others meander from place to place. With all the regular, irregular, and uncleared checks, my understanding of our personal joint checking account is a bit hazy.
Don't Get Stuck in the "Con" Game:
Consistency, convergence, and confluence are not the same! Eventual consistency and eventual convergence aren't the same as confluence, either.
"Eventual consistency" is a popular phrase with a fuzzy definition. People are even inconsistent in their use of consistency. But two other terms, "convergence" and "confluence", have crisper definitions and are more easily understood.
Extract, Shoehorn, and Load:
Data doesn’t always fit nicely into a new home.
It turns out that the business value of ill-fitting data is extremely high. The process of taking the input data, discarding what doesn’t fit, adding default or null values for missing stuff, and generally shoehorning it to the prescribed shape is important. The prescribed shape is usually one that is amenable to analysis for deeper meaning.
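The shoehorning steps described above (discard what doesn't fit, fill in defaults or nulls for what is missing) can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, types, and defaults in TARGET_SHAPE and the function name shoehorn are hypothetical and do not come from the column.

```python
from typing import Any

# Hypothetical prescribed shape: field name -> (type to coerce to, default if missing).
TARGET_SHAPE = {
    "customer_id": (int, None),     # null when missing or unparseable
    "country": (str, "unknown"),    # default value when missing
    "amount": (float, 0.0),
}

def shoehorn(record: dict[str, Any]) -> dict[str, Any]:
    """Force an arbitrary input record into the prescribed shape."""
    shaped = {}
    for field, (ftype, default) in TARGET_SHAPE.items():
        value = record.get(field)
        try:
            # Coerce values that are present; fall back to the default otherwise.
            shaped[field] = ftype(value) if value is not None else default
        except (TypeError, ValueError):
            # A value that cannot be coerced is treated like a missing one.
            shaped[field] = default
    # Fields not named in TARGET_SHAPE are silently discarded.
    return shaped

raw = {"customer_id": "42", "amount": "19.90", "favorite_color": "teal"}
print(shoehorn(raw))
```

Here the extra favorite_color field is dropped and the missing country is filled with a default, so the result is lossy but fits the shape that downstream analysis expects.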
Fail-fast Is Failing... Fast!:
Changes in compute environments are placing pressure on tried-and-true distributed-systems solutions.
For more than 40 years, fail-fast has been the dominant way of achieving fault tolerance. In this approach, some mechanism is responsible for ensuring that each component is up, functioning, and responding to work. As the industry moves to leverage cloud computing, this is getting more challenging. The way we create robust solutions is under pressure because individual components don't fail fast but instead start running slow, which is far worse. The slow component may be healthy enough to say, "I'm still here!" but slow enough to clog up all the work. This makes fail-fast schemes vulnerable.
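To make the gap concrete, here is a minimal sketch contrasting a liveness-only check with one that also looks at latency. The Probe type, the function names, and the 200 ms threshold are assumptions made for illustration; the column does not prescribe a particular detection scheme.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """Result of one health probe against a component (hypothetical shape)."""
    responded: bool
    latency_ms: float

def liveness_only(probe: Probe) -> str:
    # Classic fail-fast view: anything that answers is considered fine.
    return "healthy" if probe.responded else "failed"

def liveness_and_latency(probe: Probe, slow_threshold_ms: float = 200.0) -> str:
    # Assumed threshold: a responder slower than this is flagged separately.
    if not probe.responded:
        return "failed"
    if probe.latency_ms > slow_threshold_ms:
        return "slow"
    return "healthy"

limping = Probe(responded=True, latency_ms=4500.0)
print(liveness_only(limping))         # the limping component still passes
print(liveness_and_latency(limping))  # the latency-aware check flags it
```

The limping component passes the first check because it can still say "I'm still here!", which is exactly the case that puts fail-fast schemes under pressure.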
I'm Probably Less Deterministic Than I Used to Be:
Embracing randomness is necessary in cloud environments.
In my youth, I thought the universe was ruled by cause and effect like a big clock. In this light, computing made sense. Now I see that both life and computing can be a crapshoot, and that has given me a new peace.
Side Effects, Front and Center!:
One System’s Side Effect is Another’s Meat and Potatoes.
We think of computation in terms of its consequences. The big MapReduce job returns a large result. Web interactions display information. Enterprise applications update the database and return an answer. These are the reasons we do our work. What we rarely discuss are the side effects of doing the work we intend. Side effects may be unwanted, or they may actually cause desired behavior at different layers of the system. This column points out some fun patterns to keep in mind as we build and use our systems.
Space Time Discontinuum:
Combining data from many sources may cause painful delays.
Back when you had only one database for an application to worry about, you didn’t have to think about partial results. You also didn’t have to think about data arriving after some other data. It was all simply there. Now, you can do so much more with big distributed systems, but you have to be more sophisticated in the tradeoff between timely answers and complete answers.
Standing on Distributed Shoulders of Giants:
Farsighted Physicists of Yore Were Danged Smart!
If you squint hard enough, many of the challenges of distributed computing appear similar to the work done by the great physicists. Dang, those fellows were smart! Here, we examine some of the most important physics breakthroughs and draw some whimsical parallels to phenomena in the world of computing... just for fun.
The Best Place to Build a Subway:
Building projects despite (and because of) existing complex systems
Many engineering projects are big and complex. They require integrating into the existing environment to tie into stuff that precedes the new, big, complex thing. It is common to bemoan the challenges of dealing with the preexisting stuff. Many times, engineers don’t realize that their projects (and their paychecks) exist only because of the preexisting and complex systems that impose constraints on the new work. This column looks at some sophisticated urban redevelopment projects that are very much part of daily life in San Francisco and compares them with the challenges inherent in building software.
The Power of Babble:
Expect to be constantly and pleasantly befuddled
Metadata defines the shape, the form, and how to understand our data. It is following the trend taken by natural languages in our increasingly interconnected world. While many concepts can be communicated using shared metadata, no one can keep up with the number of disparate new concepts needed to have a common understanding.
The Singular Success of SQL:
SQL has a brilliant future as a major figure in the pantheon of data representations.
SQL has a brilliant past and a brilliant future. That future is not as the singular and ubiquitous holder of data but rather as a major figure in the pantheon of data representations. What the heck happens when data is not kept in SQL?
Write Amplification Versus Read Perspiration:
The tradeoffs between write and read
In computing, there’s an interesting trend where writing creates a need to do more work. You need to reorganize, merge, reindex, and more to make the stuff you wrote more useful. If you don’t, you must search or do other work to support future reads.
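One way to picture the tradeoff is with two toy stores: one does extra work on every write to keep its data ordered, and the other writes cheaply and pays with a scan on every read. The class names below are hypothetical and the sketch is intentionally simplified to an in-memory list.

```python
import bisect

class AppendStore:
    """Cheap writes; every read has to search everything (read perspiration)."""
    def __init__(self) -> None:
        self.items: list[int] = []

    def write(self, key: int) -> None:
        self.items.append(key)       # constant-time append, no reorganization

    def contains(self, key: int) -> bool:
        return key in self.items     # linear scan on each read

class SortedStore:
    """Extra work on each write keeps reads cheap (write amplification)."""
    def __init__(self) -> None:
        self.items: list[int] = []

    def write(self, key: int) -> None:
        bisect.insort(self.items, key)   # shifts elements to keep order

    def contains(self, key: int) -> bool:
        i = bisect.bisect_left(self.items, key)   # binary search
        return i < len(self.items) and self.items[i] == key

append_store, sorted_store = AppendStore(), SortedStore()
for k in [7, 3, 9, 1]:
    append_store.write(k)
    sorted_store.write(k)
print(append_store.contains(9), sorted_store.contains(9))
```

Both stores answer the same questions; they differ only in whether the organizing work happens at write time or at read time.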
XML and JSON Are Like Cardboard:
Cardboard surrounds and protects stuff as it crosses boundaries.
In cardboard, the safety and care for stuff is the important reason for its existence. Similarly, in XML and JSON the safety and care of the data, both in transit and in storage, are why we bother.