The Morning Paper



The Way We Think About Data

Human inspection of black-box ML models; reclaiming ownership of data

Adrian Colyer

The two papers I've chosen for this issue of acmqueue both challenge the way we think about and use data, though in very different ways.

In "Stop Explaining Black-box Machine-learning Models for High-stakes Decisions and Use Interpretable Models Instead," Cynthia Rudin makes the case for models that can be inspected and interpreted by human experts. There are many occasions when understanding what a model is doing and how it reaches its decisions is sufficiently important that interpretability has to be taken into account as a key design objective for a system. The common wisdom is that turning away from deep models inevitably means sacrificing accuracy. But Rudin points out that's not necessarily so.

What if you never deployed a black-box model when an interpretable model exists with the same level of performance? That line of thinking leads to the follow-on question, how can you know the optimal performance an interpretable model can achieve? If this topic interests you, you can read more about that in The Morning Paper write-ups on CORELS (certifiably optimal rule lists; https://blog.acolyer.org/2019/10/30/corels/) and RiskSLIM (Risk-calibrated Supersparse Linear Integer Model; https://blog.acolyer.org/2019/11/01/optimized-risk-scores/).

The second paper, "Local-first Software: You Own Your Data, in Spite of the Cloud," describes how to retain sovereignty over your data. Can you combine the freedom and sense of ownership you used to have with applications writing open file formats to the local file system, with the ease of multidevice access and multiuser collaboration that comes from cloud services? Kleppmann et al. think so and set forth a compelling call for us to start building local-first applications. The only question is, how?


Stop Explaining Black-box Machine-learning Models for High-stakes Decisions and Use Interpretable Models Instead

Rudin, C., arXiv 2019; https://arxiv.org/abs/1811.10154

(With thanks to Glyn Normington for pointing out this paper to me.)


It's pretty clear from the title alone what Cynthia Rudin would like us to do. Her paper is a mix of technical and philosophical arguments and comes with two main takeaways: first, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and second, some great pointers to techniques for creating truly interpretable models.


There has been an increasing trend in healthcare and criminal justice to leverage machine learning (ML) for high-stakes prediction applications that deeply impact human lives... The lack of transparency and accountability of predictive models can have (and has already had) severe consequences...


Defining terms

A model can be a black box for one of two reasons: (1) The function that the model computes is far too complicated for any human to comprehend; or (2) the model may in actual fact be simple, but its details are proprietary and not available for inspection.

In explainable ML you make predictions using a complicated black-box model (for example, a deep neural network, or DNN) and then use a second (posthoc) model, created to explain what the first model is doing. A classic example is the LIME algorithm, which explores a local area of a complex model to uncover decision boundaries.
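To make the idea of a posthoc local explanation concrete, here is a minimal sketch in the spirit of LIME. It is not the LIME library or its exact algorithm: the black_box function, the Gaussian sampling, and the kernel width are all assumptions chosen for illustration. It fits a proximity-weighted straight line around one input and then measures how well that line agrees with the black box nearby and far away.

```python
import math
import random

def black_box(x):
    """Stand-in for an opaque model: a nonlinear score over one feature."""
    return math.sin(3.0 * x) + 0.5 * x

def local_linear_surrogate(f, x0, width=0.1, n_samples=200, seed=0):
    """Fit a line to f by proximity-weighted least squares around x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0.0, width) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    # Gaussian kernel: samples closer to x0 get more weight.
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

x0 = 0.2
surrogate = local_linear_surrogate(black_box, x0)

# The surrogate tracks the black box close to x0 and drifts away from it elsewhere.
near_error = abs(surrogate(x0 + 0.05) - black_box(x0 + 0.05))
far_error = abs(surrogate(x0 + 1.5) - black_box(x0 + 1.5))
```

The gap between near_error and far_error is the point: the line is an approximation that holds only in a small neighborhood, and it says nothing about what the original function actually computes.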

An interpretable model is used for predictions and can itself be directly inspected and interpreted by human experts.


Interpretability is a domain-specific notion, so there cannot be an all-purpose definition. Usually, however, an interpretable machine learning model is constrained in model form so that it is either useful to someone, or obeys structural knowledge of the domain, such as monotonicity, or physical constraints that come from domain knowledge.


Explanations don't really explain

There has been a lot of research aimed at producing explanations for the outputs of black-box models. Rudin thinks this approach is fundamentally flawed. At the root of her argument is the observation that posthoc explanations are only really "guessing" (my choice of word) at what the black-box model is doing:


Explanations must be wrong. They cannot have perfect fidelity with respect to the original model. If the explanation was completely faithful to what the original model computes, the explanation would equal the original model, and one would not need the original model in the first place, only the explanation.


Even the word explanation is problematic, because you're not really describing what the original model actually does. The example of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) brings this distinction to life. A linear explanation model for COMPAS created by ProPublica, and dependent on race, was used to accuse COMPAS (which is a black box) of depending on race. But we don't know whether or not COMPAS has race as a feature (though it may well have correlated variables).


Let us stop calling approximations to black box model predictions explanations. For a model that does not use race explicitly, an automated explanation "This model predicts you will be arrested because you are black" is not a model of what the model is actually doing, and would be confusing to a judge, lawyer or defendant.


In the image space, saliency maps show where the network is looking, but even they don't say what it is truly looking at. Saliency maps for many different classes can be very similar. In the paper's example (figure not reproduced here), the saliency-based explanations for why the model thinks the image is a husky, and why it thinks it is a flute, look very similar.


Since explanations aren't really explaining, identifying and troubleshooting issues with black-box models can be very difficult.


Arguments against interpretable models

Given the issues with black-box models and explanations, why are black-box models so in vogue? It's hard to argue against the tremendous recent successes of deep-learning models, but we shouldn't conclude from this that more complex models are always better.


There is a widespread belief that more complex models are more accurate, meaning that a complicated black box is necessary for top predictive performance. However, this is often not true, particularly when the data is structured, with a good representation in terms of naturally meaningful features.


As a consequence of the belief that complex is good, it's also a commonly held myth that if you want good performance you have to sacrifice interpretability:


The belief that there is always a trade-off between accuracy and interpretability has led many researchers to forgo the attempt to produce an interpretable model. This problem is compounded by the fact that researchers are now trained in deep learning, but not in interpretable machine learning


The Rashomon set argument says that you are likely to find an interpretable model if you try: given that the data permits a large set of reasonably accurate predictive models to exist (the Rashomon set), that set often contains at least one model that is interpretable.

This suggests an interesting approach of first doing the comparatively quicker task of trying a deep-learning method without any feature engineering, etc. If that produces reasonable results, you know that the data permits the existence of reasonably accurate predictive models, and you can invest the time in trying to find an interpretable one.
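The search this implies can be sketched in a few lines. Everything below is hypothetical: the candidate names, the accuracy figures, and the use of a single integer as a complexity measure are invented for illustration, and a real search would train and validate the candidates rather than list them by hand.

```python
from typing import NamedTuple, Optional

class Candidate(NamedTuple):
    name: str
    accuracy: float   # held-out accuracy (invented numbers)
    complexity: int   # e.g. number of rules or nonzero terms; larger is harder to read

def rashomon_set(candidates, epsilon):
    """All candidates within epsilon of the best accuracy observed."""
    best = max(c.accuracy for c in candidates)
    return [c for c in candidates if c.accuracy >= best - epsilon]

def simplest_good_model(candidates, epsilon, max_complexity) -> Optional[Candidate]:
    """Simplest member of the Rashomon set that a human could still inspect."""
    readable = [c for c in rashomon_set(candidates, epsilon)
                if c.complexity <= max_complexity]
    return min(readable, key=lambda c: c.complexity) if readable else None

candidates = [
    Candidate("deep_net", 0.91, 100_000),
    Candidate("boosted_trees", 0.905, 5_000),
    Candidate("rule_list", 0.90, 4),
    Candidate("single_rule", 0.80, 1),
]

good_enough = rashomon_set(candidates, epsilon=0.02)
chosen = simplest_good_model(candidates, epsilon=0.02, max_complexity=10)
```

With these made-up numbers the deep network sets the accuracy bar, and the four-rule model clears it within the tolerance, so it is the one selected.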


For data that are unconfounded, complete, and clean, it is much easier to use a black box machine learning method than to troubleshoot and solve computationally hard problems. However, for high-stakes decisions, analyst time and computational time are less expensive than the cost of having a flawed or overly complicated model.


Creating interpretable models

Section 5 in Rudin's paper discusses three common challenges that often arise in the search for interpretable machine-learning models: constructing optimal logical models, constructing optimal (sparse) scoring systems, and defining what interpretability might mean in specific domains.


Logical models

A logical model is just a bunch of IF-THEN-ELSE statements. These have been crafted by hand for a long time. The ideal logical model would have the smallest number of branches possible for a given level of accuracy. CORELS is a machine-learning system designed to find such optimal logical models. Here's an example output model that has similar accuracy to the black-box COMPAS model on data from Broward County, Florida:

Note that the figure caption calls it a machine-learning model. That terminology doesn't seem right to me. It's a machine-learned model, and CORELS is a machine-learning model that produces it, but the IF-THEN-ELSE statement is not itself a machine-learning model. Nevertheless, CORELS looks very interesting, and we're going to take a deeper look at it in the next edition of The Morning Paper.
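To show what a logical model looks like once it is written down as code, here is a small rule list. The conditions are illustrative placeholders in the style of such a model, not the output of CORELS and not a fitted model; the field names (age, sex, priors) are assumptions too.

```python
# Each rule: (human-readable description, condition, prediction if it fires).
# Rules are checked in order; the first one that matches decides the outcome.
RULES = [
    ("age 18-20 and male",
     lambda p: 18 <= p["age"] <= 20 and p["sex"] == "male", True),
    ("age 21-23 and 2-3 prior offenses",
     lambda p: 21 <= p["age"] <= 23 and 2 <= p["priors"] <= 3, True),
    ("more than 3 prior offenses",
     lambda p: p["priors"] > 3, True),
]
DEFAULT_PREDICTION = False

def predict(person):
    """Return (prediction, reason); the reason is the rule that actually fired."""
    for description, condition, outcome in RULES:
        if condition(person):
            return outcome, description
    return DEFAULT_PREDICTION, "no rule matched (default)"

first = predict({"age": 19, "sex": "male", "priors": 0})
second = predict({"age": 40, "sex": "female", "priors": 1})
third = predict({"age": 35, "sex": "male", "priors": 5})
```

The reason returned with each prediction is not an approximation of the model. It is the branch the model took, which is the distinction between interpretation and posthoc explanation drawn earlier.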


Scoring systems

Scoring systems are used pervasively throughout medicine. We're interested in optimal scoring systems that are the outputs of machine-learning models but look like they could have been produced by a human. For example:

This model was, in fact, produced by the RiskSLIM algorithm (which we'll also look at in more depth later).

For both the CORELS and RiskSLIM models, the key thing to remember is that although they look simple and highly interpretable, they give results with highly competitive accuracy. It's not easy getting things to look this simple. I certainly know which models I'd rather deploy and troubleshoot, given the option.
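A scoring system of this kind is easy to express as code. The sketch below assumes the common form in which small integer points are summed and passed through a logistic function to give a risk estimate. The feature names, point values, and offset are invented for illustration and are not taken from any RiskSLIM model.

```python
import math

# Hypothetical integer points per yes/no risk factor.
POINTS = {
    "factor_a": 2,
    "factor_b": 1,
    "factor_c": 2,
    "factor_d": -1,   # a protective factor subtracts points
}
OFFSET = -3           # hypothetical intercept

def score(case):
    """Add up the points for every factor that is present."""
    return sum(points for factor, points in POINTS.items() if case.get(factor))

def risk(case):
    """Map the integer score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-(OFFSET + score(case))))

low = {"factor_d": True}
high = {"factor_a": True, "factor_c": True}

low_score, high_score = score(low), score(high)
low_risk, high_risk = risk(low), risk(high)
```

Because the points are small integers, the whole model fits on an index card and the score can be totaled by hand, which is what makes this form attractive in clinical settings.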


Designing for interpretability in specific domains

... even for classic domains of machine learning, where latent representations of data need to be constructed, there could exist interpretable models that are as accurate as black box models.


The key is to consider interpretability in the model design itself. For example, if an expert were to explain to you why they classified an image in a certain way, they would probably point out different parts of the image that were important in their reasoning process (a bit like saliency), and explain why. Bringing this idea to network design, Chen, Li, et al. ("This Looks Like That: Deep Learning for Interpretable Image Recognition"; https://arxiv.org/abs/1806.10574) built a model that during training learns parts of images that act as prototypes for a class, and then during testing finds parts of the test image similar to the prototypes it has learned.


These explanations are the actual computations of the model, and these are not posthoc explanations. The network is called "This looks like that" because its reasoning process considers whether "this" part of the image looks like "that" prototype.



Explanation, interpretation, and policy

Section 4 of the paper discusses potential policy changes to encourage interpretable models to be preferred (or even required in high-stakes situations).


Let us consider a possible mandate that, for certain high-stakes decisions, no black box should be deployed when there exists an interpretable model with the same level of performance.


That sounds like a worthy goal, but as worded it would be very tough to prove that an interpretable model doesn't exist. So, perhaps companies would have to be required to produce evidence of having searched for an interpretable model with an appropriate level of diligence.


Consider a second proposal, which is weaker than the one provided above, but which might have a similar effect. Let us consider the possibility that organizations that introduce black box models would be mandated to report the accuracy of interpretable modeling methods.


If this process is followed, you're likely to see a lot fewer black-box machine-learning models deployed in the wild if the author's experience is anything to go by:


It could be possible that there are application domains where a complete black box is required for a high stakes decision. As of yet, I have not encountered such an application, despite having worked on numerous applications in healthcare and criminal justice, energy reliability, and financial risk assessment.


The last word

If this commentary can shift the focus even slightly from the basic assumption underlying most work in Explainable ML, which is that a black box is necessary for accurate predictions, we will have considered this document a success. If we do not succeed [in making policy makers aware of the current challenges in interpretable machine learning], it is possible that black box models will continue to be permitted when it is not safe to use them.


Local-first Software: You Own Your Data, in Spite of the Cloud

Kleppmann, M., et al., Onward! 2019; https://martin.kleppmann.com/papers/local-first.pdf


Watch out! If you start reading this paper, you could be lost for hours following all the interesting links and ideas, and end up even more dissatisfied than you already are with the state of software today. You might also be inspired to help work toward a better future. I'm all in :).


The rock or the hard place?

On the one hand, there are cloud apps that make it easy to access work from multiple devices and to collaborate online with others (e.g., Google Docs, Trello). On the other hand, there are good old-fashioned native apps that you install on your operating system (a dying breed? See, for example, Kubernetes co-founder Brendan Burns's recent tweet; https://twitter.com/brendandburns/status/1194820433142374400?s=21). Somewhere in the middle, but not quite perfect, are online (browser-based) apps with offline support.

The primary issue with cloud apps—the SaaS (software as a service) model—is ownership of the data.


Unfortunately, cloud apps are problematic in this regard. Although they let you access your data anywhere, all data access must go via the server, and you can only do the things that the server will let you do. In a sense, you don't have full ownership of that data; the cloud provider does.


Services do get shut down, or pricing may change to your disadvantage, or the features evolve in a way you don't like, and there's no way to keep using an older version. This link to "Our Incredible Journey" (https://reclaimthenet.org/wordpress-automattic-buys-tumblr/) handily provides a good example—it will take you first to a page announcing that Tumblr has been acquired by Automattic, on which you can agree to the new terms of service, should you wish.

With a traditional operating-system app you have much more control over the data (the files on your file system at least, which if you're lucky might even be in an open format). But you have other problems, such as easy access across all of your devices and the ability to collaborate with others. (Note this is not the new breed of operating-system apps that are really just wrapped browsers over an online service.)


Local-first software ideals

The authors coin the phrase local-first to describe software that retains the ownership properties of old-fashioned applications, with the sharing and collaboration properties of cloud applications.


In local-first applications we treat the copy of the data on your local device (your laptop, tablet, or phone) as the primary copy. Servers still exist, but they hold secondary copies of your data in order to assist with access from multiple devices. As we shall see, this change in perspective has profound implications...


Great local-first software should have seven key properties:

1. It should be fast. You don't want to make round trips to a server to interact with the application. Operations can be handled by reading and writing to the local file system, with data synchronization happening in the background.

2. It should work across multiple devices. Local-first apps keep their data in local storage on each device, but the data is also synchronized across all the devices on which a user works.

3. It should work without a network. This follows from reading and writing to the local file system, with data synchronization happening in the background when a connection is available. That connection could be peer-to-peer across devices and doesn't have to be over the Internet.

4. It should support collaboration. "In local-first apps, our ideal is to support realtime collaboration that is on par with the best cloud apps today, or better. Achieving this goal is one of the biggest challenges in realizing local-first software, but we believe it is possible."

5. It should support data access for all time. On one level you get this if you retain a copy of the original application (and an environment capable of executing it). Even better is if the local app uses open/long-lasting file formats. See, for example, the Library of Congress recommended archival formats (https://www.loc.gov/preservation/resources/rfs/TOC.html).

6. It should be secure and private by default. "Local-first apps can use end-to-end encryption so that any servers that store a copy of your files hold only encrypted data they cannot read."

7. It should give the user full ownership and control of their data. "... we mean ownership in the sense of user agency, autonomy, and control over data. You should be able to copy and modify data in any way, write down any thought, and no company should restrict what you are allowed to do."
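The first and third properties (fast, and usable without a network) both come from the same write path: apply the change to the local copy immediately and synchronize in the background. Here is a minimal in-memory sketch of that path. The class name, the pending queue, and the send callback are all hypothetical and stand in for real local storage and a real transport.

```python
class LocalFirstStore:
    """Local copy is primary; remote copies are updated when a link is available."""

    def __init__(self):
        self.data = {}      # primary copy, held on the device
        self.pending = []   # changes not yet accepted by a remote copy

    def write(self, key, value):
        # Applied locally with no network round trip, then queued for sync.
        self.data[key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def sync(self, send):
        """Push queued changes in order; `send(change)` returns True on success.

        Stops at the first failure so nothing is lost or reordered.
        Returns the number of changes pushed.
        """
        pushed = 0
        while self.pending:
            if not send(self.pending[0]):
                break
            self.pending.pop(0)
            pushed += 1
        return pushed

store = LocalFirstStore()
store.write("title", "Trip notes")
store.write("status", "draft")

# Offline: the send attempt fails, but reads and writes keep working locally.
pushed_offline = store.sync(lambda change: False)

# Back online: a list stands in for the remote copy.
remote = []
pushed_online = store.sync(lambda change: remote.append(change) is None)
```

This covers only one device pushing its own changes. Merging changes that arrive from other devices or collaborators is the hard part, and it is where the CRDT work discussed later comes in.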


How close can we get today?

Section 3 in the paper shows how a variety of different apps/technologies stack up against the local-first ideals.

The combination of Git and GitHub comes closest, but nothing meets the bar across the board.

... we speculate that web apps will never be able to provide all the local-first properties we are looking for, due to the fundamental thin-client nature of the platform. By choosing to build a web app, you are choosing the path of data belonging to you and your company, not to your users.


Mobile apps that use local storage combined with a back-end service such as Google Firebase and its Cloud Firestore come closer to the local-first ideal, depending on the way the local data is treated by the application. CouchDB also gets an honorable mention, let down only by the difficulty of getting application-level conflict resolution right.


CRDTs to the rescue?

We have found some technologies that appear to be promising foundations for local-first ideals. Most notably the family of distributed systems algorithms called Conflict-free Replicated Data Types (CRDTs)... the special thing about them is that they are multi-user from the ground up... CRDTs have some similarity to version control systems like Git, except that they operate on richer data types than text files.


While most industrial usage of CRDTs has been in server-centric computing, the Ink & Switch research lab has been exploring how to build collaborative local-first client applications on top of CRDTs. One of the fruits of this work is an open-source JavaScript CRDT implementation called Automerge, which brings CRDT-style merge operations to JSON documents. Used in conjunction with the dat:// networking stack, the result is Hypermerge.
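For readers who have not met CRDTs before, the sketch below shows one of the simplest: a last-writer-wins map, where each key carries a (clock, replica) tag and merging keeps the entry with the larger tag. This is not Automerge's algorithm, which tracks changes at a much finer grain. The function names and the tagging scheme are assumptions chosen to keep the example short, and the sketch assumes each write carries a unique (clock, replica) pair.

```python
def lww_set(doc, key, value, clock, replica):
    """Return a copy of doc with key set to value, tagged (clock, replica)."""
    updated = dict(doc)
    updated[key] = (clock, replica, value)
    return updated

def merge(a, b):
    """Merge two replicas: for each key keep the entry with the larger tag.

    Ties on clock are broken by replica name, so every replica picks the
    same winner regardless of the order in which it sees the updates.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged

base = lww_set({}, "title", "Plan", clock=1, replica="laptop")

# Two devices edit different keys while disconnected from each other.
laptop = lww_set(base, "title", "Plan v2", clock=2, replica="laptop")
phone = lww_set(base, "status", "done", clock=2, replica="phone")
merged = merge(laptop, phone)

# Two devices edit the same key concurrently; the tie-break decides.
phone_conflict = lww_set(base, "title", "Roadmap", clock=2, replica="phone")
resolved = merge(laptop, phone_conflict)
```

The merge is commutative and idempotent, so replicas converge no matter how often or in what order they exchange state. The cost of last-writer-wins is visible in the second case: one of the concurrent titles is silently discarded, which is the kind of loss richer CRDTs aim to avoid.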


Just as packet switching was an enabling technology for the Internet and the web, or as capacitive touchscreens were an enabling technology for smart phones, so we think CRDTs may be the foundation for collaborative software that gives users full ownership of their data.


The brave new world

The authors built three (fairly advanced) prototypes using this CRDT stack: a Trello clone called Trellis, a collaborative drawing program, and a mixed-media workspace called PushPin (Evernote meets Pinterest).

If you have 2 minutes and 10 seconds available, it's well worth watching a short video showing Trellis in action (https://www.youtube.com/watch?v=L9fdyDlhByM). It really brings the vision to life.

The authors share what they learned from building these systems:

• CRDT technology works—the Automerge library did a great job and was easy to use.

• The user experience with offline work is splendid.

• CRDTs combine well with reactive programming to provide a good developer experience: "The result of [this combination] was that all of our prototypes realized real-time collaboration and full offline capability with little effort from the application developer."

• In practice, conflicts are not as significant a problem as feared. They are mitigated on two levels: (1) Automerge tracks changes at a fine-grained level; (2) "Users have an intuitive sense of human collaboration and avoid creating conflicts with their collaborators."

• Visualizing document history is important (see the Trellis video).

• URLs are a good mechanism for sharing.

• Cloud servers still have their place for discovery, backup, and burst compute.


Some challenges

• It can be hard to reason about how data moves between peers.

• CRDTs accumulate a large change history, which creates performance problems. (This is an issue with state-based CRDTs, as opposed to operation-based CRDTs.)


Performance and memory/disk usage quickly became a problem because CRDTs store all history, including character-by-character text edits. These pile up, but can't be easily truncated because it's impossible to know when someone might reconnect to your shared document after six months away and need to merge changes from that point forward.


It feels like some kind of log compaction with a history watermark (e.g., after n months you may no longer be able to merge in old changes and will have to do a full resync to the latest state) could possibly help here.

• P2P technologies aren't production ready yet (but "feel like magic" when they do work).
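The log-compaction idea floated above, with a history watermark, can be sketched as follows. This is a deliberate simplification and not something the paper or Automerge provides: it models a single linear log of key/value changes, not a full CRDT history, and the class and method names are hypothetical.

```python
class OpLog:
    """Change log that can fold old entries into a snapshot."""

    def __init__(self):
        self.snapshot = {}   # state as of the watermark
        self.watermark = 0   # changes with seq <= watermark live only in the snapshot
        self.next_seq = 1
        self.ops = []        # (seq, key, value) for changes after the watermark

    def append(self, key, value):
        seq = self.next_seq
        self.next_seq += 1
        self.ops.append((seq, key, value))
        return seq

    def compact(self, upto):
        """Fold every change with seq <= upto into the snapshot and drop it."""
        upto = min(upto, self.next_seq - 1)
        for seq, key, value in self.ops:
            if seq <= upto:
                self.snapshot[key] = value
        self.ops = [op for op in self.ops if op[0] > upto]
        self.watermark = max(self.watermark, upto)

    def changes_since(self, peer_seq):
        """Changes a peer still needs, or None if it must do a full resync."""
        if peer_seq < self.watermark:
            return None   # the history the peer needs has been compacted away
        return [op for op in self.ops if op[0] > peer_seq]

    def state(self):
        current = dict(self.snapshot)
        for _, key, value in self.ops:
            current[key] = value
        return current

log = OpLog()
log.append("title", "Draft")
log.append("title", "Final")
log.append("owner", "ana")
log.compact(upto=2)

recent_peer = log.changes_since(2)   # caught up to the watermark: gets the tail
stale_peer = log.changes_since(1)    # behind the watermark: full resync needed
```

The trade-off is the one described in the text: storage stays bounded, but a collaborator who has been away longer than the watermark allows can no longer merge incrementally and has to take the latest state wholesale.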


What can you do today?

You can take incremental steps toward a local-first future by following these guidelines:

• Use aggressive caching to improve responsiveness.

• Use syncing infrastructure to enable multidevice access.

• Embrace offline web application features (Progressive Web Apps).

• Consider operational transformation as the more mature alternative to CRDTs for collaborative editing.

• Support data export to standard formats.

• Make it clear what data is stored on a device and what is transmitted to the server.

• Enable users to back up, duplicate, and delete some or all of their documents (outside of your application).

I'll leave you with a quote from Section 4.3.4:


If you are an entrepreneur interested in building developer infrastructure, all of the above suggests an interesting market opportunity: "Firebase for CRDTs."


Adrian Colyer is a venture partner with Accel in London, where it's his job to help find and build great technology companies across Europe and Israel. (If you're working on an interesting technology-related business he would love to hear from you: you can reach him at [email protected].) Prior to joining Accel, he spent more than 20 years in technical roles, including CTO at Pivotal, VMware, and SpringSource.

Copyright © 2019 held by owner/author. Publication rights licensed to ACM.

Reprinted with permission from https://blog.acolyer.org


Originally published in Queue vol. 17, no. 6