Provenance

Vol. 19 No. 3 – May-June 2021

Provenance

What Went Wrong?:
Why we need an IT accident investigation board

Governments should create IT accident investigation boards for the exact same reasons they have done so for ships, railroads, planes, and in many cases, automobiles. Denmark got its Railroad Accident Investigation Board because too many people were maimed and killed by steam trains. The UK's Air Accidents Investigation Branch was created for pretty much the same reasons, but, specifically, because when the airlines investigated themselves, nobody was any the wiser. Does that sound slightly familiar in any way?

by Poul-Henning Kamp

Digging into Big Provenance (with SPADE):
A user interface for querying provenance

Several interfaces exist for querying provenance. Many are not flexible in allowing users to select a database type of their choice. Some provide query functionality in a data model that is different from the graph-oriented one that is natural for provenance. Others have intuitive constructs for finding results but have limited support for efficiently chaining responses, as needed for faceted search. This article presents a user interface for querying provenance that addresses these concerns and is agnostic to the underlying database being used.

by Ashish Gehani, Raza Ahmad, Hassan Irshad, Jianqiao Zhu, Jignesh Patel

When Curation Becomes Creation:
Algorithms, microcontent, and the vanishing distinction between platforms and creators

Media platforms today benefit from: (1) discretion to organize content, (2) algorithms for curating user-posted content, and (3) absolution from liability. This favorable regulatory environment results from the current legal framework, which distinguishes between intermediaries and content providers. This distinction is ill-adapted to the modern social media landscape, where platforms deploy powerful data-driven algorithms to play an increasingly active role in shaping what people see, and where users supply disconnected bits of raw content as fodder. Today's platforms have license to monetize whatever content they like, moderate if and when it aligns with their corporate objectives, and curate their content however they wish.

by Liu Leqi, Dylan Hadfield-Menell, Zachary C. Lipton

Divide and Conquer:
The use and limits of bisection

Bisection is of no use if you have a heisenbug that fails only from time to time. These subtle bugs are the hardest to fix and the ones that cause us to think critically about what we are doing. Timing bugs, bugs in distributed systems, and all the difficult problems we face in building increasingly complex software systems can't yet be addressed by simple bisection. It's often the case that it would take longer to write a usable bisection test for a complex problem than it would to analyze the problem whilst at the tip of the tree.

by George V. Neville-Neil

Real-world String Comparison:
How to handle Unicode sequences correctly

In many languages a string comparison is a pitfall for beginners. With any Unicode string as input, a comparison often causes problems even for advanced users. The semantic equivalence of different characters in Unicode requires a normalization of the strings before comparing them. This article shows how to handle Unicode sequences correctly. The comparison of two strings for equality often raises questions concerning the difference between comparison by value, comparison of object references, strict equality, and loose equality. The most important aspect is semantic equivalence.

by Torsten Ullrich

Declarative Machine Learning Systems:
The future of machine learning will depend on it being in the hands of the rest of us.

The people training and using ML models now are typically experienced developers with years of study working within large organizations, but the next wave of ML systems should allow a substantially larger number of people, potentially without any coding skills, to perform the same tasks. These new ML systems will not require users to fully understand all the details of how models are trained and used for obtaining predictions, but will provide them a more abstract interface that is less demanding and more familiar. Declarative interfaces are well-suited for this goal, by hiding complexity and favoring separation of interest, and ultimately leading to increased productivity.

by Piero Molino, Christopher Ré

Don't Get Stuck in the "Con" Game:
Consistency, convergence, and confluence are not the same! Eventual consistency and eventual convergence aren't the same as confluence, either.

"Eventual consistency" is a popular phrase with a fuzzy definition. People are even inconsistent in their use of consistency. But two other terms, "convergence" and "confluence", that have crisper definitions and are more easily understood.

by Pat Helland