Machine Learning

Vol. 16 No. 3 – May-June 2018

Mind Your State for Your State of Mind

The interactions between storage and applications can be complex and subtle.

Applications have had an interesting evolution as they have moved into the distributed and scalable world. Similarly, storage and its cousin, databases, have changed side by side with applications. Many times, the semantics, performance, and failure models of storage and applications do a subtle dance as they change in support of evolving business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This article looks at some of these issues and their impact on systems.

by Pat Helland

The Secret Formula for Choosing the Right Next Role

The best careers are not defined by titles or resume bullet points.

When you are looking at the options for your next role, there are smarter choices you can make: focus on factors that will increase your career capital and make you a more valuable hire in your next role, and the one after that, and the one after that.

by Kate Matsudaira

GitOps: A Path to More Self-service IT

IaC + PR = GitOps

GitOps lowers the bar for creating self-service versions of common IT processes, making it easier for the return to justify the cost in the ROI calculation. It also encourages desired behaviors in IT systems: better testing, a reduced bus factor, shorter wait times, more infrastructure logic handled programmatically as IaC, and time redirected from manual toil toward creating and maintaining automation.
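
As a rough sketch of the IaC + PR pattern (an illustration added here, not taken from the article), the Python below models the GitOps control loop: desired infrastructure state lives in a Git repository and changes only through merged pull requests, while an agent repeatedly pulls the repository and reconciles the live system toward whatever it declares. The repository path, state-file name, and apply_change hook are hypothetical placeholders, not part of any particular GitOps tool.

    import json
    import subprocess
    import time

    REPO_DIR = "/srv/infra-repo"        # hypothetical clone of the IaC repository
    STATE_FILE = "desired_state.json"   # hypothetical declarative resource description

    def read_desired_state():
        """Pull the latest commit (merged via PR) and load the declared state."""
        subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"], check=True)
        with open(f"{REPO_DIR}/{STATE_FILE}") as f:
            return json.load(f)

    def read_actual_state():
        """Placeholder: query the live system for its current configuration."""
        return {}

    def apply_change(name, spec):
        """Placeholder: create or update one resource to match its declared spec."""
        print(f"reconciling {name} -> {spec}")

    def reconcile_once():
        desired = read_desired_state()
        actual = read_actual_state()
        for name, spec in desired.items():
            if actual.get(name) != spec:
                apply_change(name, spec)

    if __name__ == "__main__":
        while True:                     # the loop, not a person, does the routine work
            reconcile_once()
            time.sleep(60)

In this framing, the pull request is the self-service interface: review and merge replace the ticket queue, and the automation does the rest.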

by Thomas A. Limoncelli

Knowledge Base Construction in the Machine-learning Era

Three critical design points: joint learning, weak supervision, and new representations

More information is accessible today than at any other time in human history. From a software perspective, however, the vast majority of this data is unusable, as it is locked away in unstructured formats such as text, PDFs, web pages, and images that are hard to parse. The goal of knowledge base construction is to extract structured information automatically from this "dark data," so that it can be used in downstream applications for search, question answering, link prediction, visualization, modeling, and much more. Today, knowledge bases are the central components of systems that help fight human trafficking, accelerate biomedical discovery, and, increasingly, power web search and question-answering technologies.
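
As a toy illustration of the weak-supervision design point named in the subtitle (a sketch added here, not the authors' system), the Python below writes a few noisy labeling functions over candidate sentences and combines their votes into training labels; real systems learn the accuracies of these functions and resolve their conflicts rather than taking a simple majority.

    import re
    from collections import Counter

    ABSTAIN, NEGATIVE, POSITIVE = None, 0, 1

    # Noisy heuristics for a hypothetical "drug treats disease" relation.
    def lf_treatment_verb(sentence):
        return POSITIVE if re.search(r"\b(treats?|cures?)\b", sentence, re.I) else ABSTAIN

    def lf_negation(sentence):
        return NEGATIVE if re.search(r"\bno evidence\b|\bdoes not\b", sentence, re.I) else ABSTAIN

    def lf_question(sentence):
        return NEGATIVE if sentence.strip().endswith("?") else ABSTAIN

    LFS = [lf_treatment_verb, lf_negation, lf_question]

    def weak_label(sentence):
        """Combine labeling-function votes by simple majority, ignoring abstentions."""
        votes = [v for v in (lf(sentence) for lf in LFS) if v is not ABSTAIN]
        return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

    for s in ["Aspirin treats headaches in most adults.",
              "There is no evidence that the compound is effective.",
              "Was the compound ever tested in humans?"]:
        print(weak_label(s), "|", s)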

by Alex Ratner, Christopher Ré

Corp to Cloud: Google's Virtual Desktops

How Google moved its virtual desktops to the cloud

Over one-fourth of Googlers use internal, data-center-hosted virtual desktops. This on-premises offering sits in the corporate network and allows users to develop code, access internal resources, and use GUI tools remotely from anywhere in the world. Among its most notable features, a virtual desktop instance can be sized according to the task at hand, has persistent user storage, and can be moved between corporate data centers to follow traveling Googlers. Until recently, our virtual desktops were hosted on commercially available hardware on Google's corporate network using Ganeti, a homegrown, open-source virtual-machine cluster-management system. Today, this substantial and Google-critical workload runs on GCP (Google Cloud Platform). This article discusses the reasons for the move to GCP and how the migration was accomplished.

by Matt Fata, Philippe-Joseph Arida, Patrick Hahn, Betsy Beyer

The Obscene Coupling Known as Spaghetti Code

Teach your junior programmers how to read code

Since you are both working on the same code base, you also have ample opportunity to lead by showing this person how you code. You must do this carefully or the junior programmer will think you're pulling rank, but with a bit of gentle show and tell, you can get your Padawan to see what you're driving at. This human interaction is often difficult for those of us who prefer to spend our days with seemingly logical machines. Mentorship is the ultimate test of leadership and compassion, and I really hope you don't wind up sliced in half on the deck of a planet-smashing space station.

by George Neville-Neil

The Mythos of Model Interpretability

In machine learning, the concept of interpretability is both important and slippery.

Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? Models should be not only good, but also interpretable, yet the task of interpretation appears underspecified. The academic literature has provided diverse and sometimes non-overlapping motivations for interpretability and has offered myriad techniques for rendering models interpretable. Problematically, it is not clear what common properties unite these techniques, yet many authors proclaim their models to be interpretable axiomatically, absent further argument. This article seeks to refine the discourse on interpretability. First, it examines the objectives of previous papers addressing interpretability, finding them to be diverse and occasionally discordant. Then, it explores model properties and techniques thought to confer interpretability, identifying transparency to humans and post hoc explanations as competing concepts. Throughout, the feasibility and desirability of different notions of interpretability are discussed. The article questions the oft-made assertions that linear models are interpretable and that deep neural networks are not.
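
To make the distinction between transparency and post hoc explanation concrete, here is a small numpy-only sketch (an illustration added here, not drawn from the article): the linear model is "transparent" in that its fitted coefficients can be read directly, while the black-box predictor is explained after the fact by permuting one feature at a time and measuring how much its output moves. Whether either inspection deserves to be called interpretation is precisely the question the article raises.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

    # "Transparent" model: ordinary least squares; the weights themselves are the explanation.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("linear coefficients:", np.round(w, 2))

    # "Black box": assume we can only call predict(), not inspect its internals.
    def black_box_predict(X):
        return np.tanh(2.0 * X[:, 0]) - X[:, 1] ** 2

    # Post hoc explanation: permutation-style importance, one feature at a time.
    baseline = black_box_predict(X)
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's information
        change = np.mean(np.abs(black_box_predict(Xp) - baseline))
        print(f"feature {j}: mean output change {change:.3f}")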

by Zachary C. Lipton