Research for Practice


Research for Practice: Private Online Communication; Highlights in Systems Verification

Expert-curated Guides to the Best of CS Research

This installment of Research for Practice covers two exciting topics in modern computer systems: private communication systems, and verified systems programming.

First, Albert Kwon provides an overview of recent systems for secure and private communication. While messaging protocols such as Signal provide privacy guarantees, Albert's selected research papers illustrate what is possible at the cutting edge: more transparent endpoint authentication, better protection of communication metadata, and anonymous broadcasting. These papers marry state-of-the-art cryptography with practical, privacy-preserving protocols, providing a glimpse of what we might expect from tomorrow's secure messaging systems.

Second, James Wilcox takes us on a tour of recent advances in verified systems design. It's now possible to build end-to-end verified compilers, operating systems, and distributed systems that are provably correct with respect to well-defined specifications, providing high assurance of well-defined, well-behaved code. Because these system components interact with low-level hardware like the instruction set architecture and external networks, each paper introduces new techniques to balance the tension between formal correctness and practical applicability. As programming language techniques advance and more of the modern computing stack continues to crystallize, expect these advances to make their way into production systems.

As always, our goal in this column is to allow our readers to become experts in the latest topics in computer science research in a weekend afternoon's worth of reading. To facilitate this process, we have provided open access to the ACM Digital Library for the relevant citations from these selections so you can enjoy these research results in full. Please enjoy! —Peter Bailis

Private Online Communication

The importance of private communication will continue to grow.

Albert Kwon

When we communicate online, we expect the same levels of privacy and anonymity as we do offline. Recent leaks, however, suggest that this is not the case, as large-scale surveillance threatens the privacy of our daily communication (see "NSA files decoded;" "NSA slide shows surveillance of undersea cables;" and "NSA spying").

Perhaps as a result, there have also been significant efforts by both researchers and the open-source community to protect privacy: Tor helps millions of users stay anonymous online, and the Signal protocol (used by the Signal messaging app and WhatsApp) brings end-to-end encrypted chats to more than a billion users.

Still, many important challenges remain to ensure privacy of online communication. The set of papers presented here highlights recent work that addresses some of these challenges. The first set of papers, on CONIKS and CT (Certificate Transparency), tackles mechanisms for the secure distribution of public keys for end-to-end encryption. The next paper presents Vuvuzela, showing how two mutually trusting persons can communicate online without revealing anything about the content of the conversation or the metadata (e.g., with whom and when one talks). The last paper discusses Riposte, a system in which users can send messages anonymously, meaning no one (not even the recipients of messages) can learn the sender of any message.

Public key infrastructures

Melara, M. S., et al. 2015. CONIKS: Bringing key transparency to end users. 24th Usenix Security Symposium.

Laurie, B., et al. 2013. Certificate Transparency (IETF RFC 6962).

End-to-end encryption is already prevalent in today's Internet (e.g., https/TLS), but an important bootstrapping problem remains: how can you be sure that you are encrypting for the right end point? Traditionally, we trust a small number of entities such as CAs (certificate authorities) or PGP (Pretty Good Privacy) key servers to maintain a valid list of public keys. Users can then query them to acquire the keys and start an encrypted communication channel. Unfortunately, as we have seen many times in practice (see "Iranian man-in-the-middle attack against Google demonstrates dangerous weakness of certificate authorities" and "How secure is HTTPS today? How often is it attacked?"), these entities can be compromised and thus provide incorrect keys to the users.

CONIKS and Certificate Transparency aim to remove the single point of trust and add transparency to the public-key infrastructures for end-user keys and TLS (Transport Layer Security) certificates, respectively. Though the details are different, the high-level ideas are similar: they both use transparency logs, which are the sets of public keys stored as Merkle trees. When a third party (e.g., users, dedicated monitors, etc.) requests a public key from a CA or a key server, the response comes with a proof that can be verified efficiently to ensure the key is correct. Both systems are practical, requiring only a few extra kilobytes of data to verify a key; in particular, CT has been deployed by many CAs, and many popular browsers such as Chrome and Firefox already have built-in support.
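The Merkle-tree mechanics behind both systems can be sketched in a few lines. The following is a simplified illustration, not CONIKS's or CT's actual tree layout (both differ in leaf structure and hashing details): a log server commits to a set of keys with a single root hash, and an inclusion proof is just the sibling hashes along the path from a leaf to the root, so verifying a key costs only a logarithmic number of hash operations.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a Merkle tree over the leaves; returns the levels, leaves first."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last node if odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def audit_path(levels, index):
    """Sibling hashes from a leaf up to the root: the inclusion proof."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[index ^ 1])       # sibling of the current node
        index //= 2
    return path

def verify(leaf, index, path, root):
    """Recompute the root from the leaf and its proof; O(log n) hashes."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# A tiny transparency log over four (user, key) bindings.
keys = [b"alice:pk1", b"bob:pk2", b"carol:pk3", b"dave:pk4"]
levels = build_tree(keys)
root = levels[-1][0]
proof = audit_path(levels, 1)               # prove bob's key is in the log
assert verify(b"bob:pk2", 1, proof, root)
```

Anyone holding the published root hash can check such a proof without trusting the log server, which is the transparency property both systems rely on.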

Private point-to-point communication

Lazar, D., et al. 2015. Vuvuzela: Scalable private messaging resistant to traffic analysis. Proceedings of the 25th Symposium on Operating Systems Principles.

Encryption can hide the content of the messages, but it does not hide potentially important metadata such as with whom and when one is talking. Vuvuzela is a recent work that hides as much metadata as possible by adding noise to the network to obfuscate users' actions.

The system consists of a handful of Vuvuzela servers that act collectively as a privacy provider. Vuvuzela users send messages to other users in the system through the Vuvuzela servers. As each server routes the messages, it also adds many dummy messages (messages indistinguishable from those sent by the real users) such that no adversary can learn if two users are communicating with each other, as long as one of the servers remains honest; a key insight of the paper is using differential privacy to determine the quantity of dummy messages required to provide provable strong privacy guarantees. Vuvuzela can support millions of users with commercially available machines for SMS-style messaging, where the users can tolerate some amount of latency. To my knowledge, this paper was one of the first uses of differential privacy for private communication, which is exciting in its own right.
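The noise mechanism can be sketched as follows. In this simplified illustration (the parameter values are made up, not Vuvuzela's), each honest server adds a Laplace-distributed number of dummy messages per round, so that the traffic counts an observer sees are differentially private with respect to any one user's actions.

```python
import random

def dummy_count(mu: float, b: float) -> int:
    """Sample the number of dummy messages one server adds this round.

    Draws from Laplace(mu, b), constructed as a scaled difference of
    two exponentials. mu is an offset chosen large enough that the
    sample is almost never negative; b is the noise scale -- a larger
    b gives a stronger per-round differential-privacy guarantee at
    the cost of more cover traffic. Values here are illustrative.
    """
    noise = b * (random.expovariate(1.0) - random.expovariate(1.0))
    return max(0, round(mu + noise))

# One round as seen by a network observer: real traffic plus noise.
# The observer cannot tell whether the real count was 3 or 4, because
# the noise masks a difference of one message.
random.seed(7)
real_messages = 3
observed = real_messages + dummy_count(mu=300.0, b=10.0)
```

The cost of this obfuscation is bandwidth: every honest server pays for hundreds of dummy messages per round regardless of real load, which is why Vuvuzela targets latency-tolerant, SMS-style messaging.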

Anonymous communication

Corrigan-Gibbs, H., et al. 2015. Riposte: An anonymous messaging system handling millions of users. Proceedings of the 2015 IEEE Symposium on Security and Privacy.

Sometimes, one might also want to hide his or her identity from the recipient of a message. A whistleblower, for example, might wish to send a message, either to a large audience or to a particular recipient, without revealing the sender's identity. Riposte is an anonymous broadcasting system (think anonymous Twitter) that enables exactly that for millions of users.

Similar to Vuvuzela, Riposte uses a small number of servers, one of which needs to be honest to guarantee anonymity. To send a message, a user splits his or her message into multiple shares, each of which is given to one of the servers. Each server then stores each share in a database. After a large number of users submit their messages, the servers come together to reveal all messages simultaneously without revealing the senders of the messages to anyone. The system can support millions of Tweet-length messages per day and is a great example of how theory meets practice: the system has a formal proof of security, a prototype implementation, and an evaluation.
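The share-splitting idea can be sketched with a naive two-server XOR scheme. (Riposte's actual protocol compresses these shares with distributed point functions so communication is sublinear in the database size; the sketch below is O(n) and ignores collisions between users who pick the same row.)

```python
import secrets

NUM_ROWS = 8
MSG_LEN = 16  # bytes per database row

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_shares(row: int, message: bytes):
    """Split a write request into two shares, one per server.

    Each share alone is a uniformly random table and reveals nothing
    about the row or the message; XORed together, the shares are zero
    everywhere except `row`, where they equal `message`.
    """
    share_a = [secrets.token_bytes(MSG_LEN) for _ in range(NUM_ROWS)]
    share_b = list(share_a)
    share_b[row] = xor(share_a[row], message)
    return share_a, share_b

class Server:
    def __init__(self):
        self.db = [bytes(MSG_LEN)] * NUM_ROWS

    def apply(self, share):
        # XOR the share into the running database, row by row.
        self.db = [xor(cur, s) for cur, s in zip(self.db, share)]

# Two users write anonymously; at the end of the round the servers
# combine their databases, revealing the messages but not the writers.
server_a, server_b = Server(), Server()
for row, msg in [(2, b"meet at noon"), (5, b"leak doc 42")]:
    sa, sb = make_shares(row, msg.ljust(MSG_LEN))
    server_a.apply(sa)
    server_b.apply(sb)
board = [xor(x, y) for x, y in zip(server_a.db, server_b.db)]
```

As long as one server withholds its database until the round ends and does not collude, neither server can link a row of the revealed board to the user who wrote it.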

Final thoughts

As the world becomes more connected, the importance of private communication will continue to grow. The papers presented here are only a few examples of recent works on the topic. Many other interesting papers have been written about private communication. Pung, for example, is another private point-to-point communication system that provides privacy under an even stronger threat model at the cost of latency. Other anonymity networks such as Dissent and Riffle provide anonymity guarantees similar to Riposte but with different tradeoffs.

Many important challenges remain, however, to realize private communication for everyone. To list just a few: how can we scale private communication to billions of users? How can we hold users accountable without sacrificing their privacy and anonymity? How do we make privacy user friendly? Without a doubt, many more interesting works will come in the near future.

Highlights in Systems Verification

We need techniques to build larger verified systems from verified components.

James R. Wilcox

Humanity now relies on software in all aspects of life, including safety-critical applications. Programmers use a spectrum of techniques to ferret out bugs, most commonly testing or static analysis. At the most rigorous end of this spectrum is formal verification, which for decades has sought to guarantee the absence of bugs using mathematical proof.

In recent decades, the research community has developed techniques that allow one to verify important properties of real systems. When reviewing this work, it is important to consider not only the guarantees each system makes, but also their assumptions; these assumptions are known as the trusted computing base, or TCB.

The rest of this column highlights three applications of verification techniques to pervasive systems infrastructure: compilers, operating systems, and distributed systems. These projects point to a future where practical systems can be built from existing verified components, eliminating entire classes of bugs—from the hardware up to the application logic.

Verified compilers: CompCert

Leroy, X. 2009. Formal verification of a realistic compiler. Communications of the ACM 52(7): 107-115.

Compiler bugs are infectious: a buggy compiler can make otherwise correct source programs misbehave at runtime. This is concerning for any programmer, but especially so if the programmer wants to reason at the source level or use source-level program analysis. Any analysis results are at risk of being invalidated by the compiler's disease.

CompCert is a C compiler that has been formally proven never to miscompile source programs. More precisely, CompCert is guaranteed to produce assembly code that is equivalent to the C source program. CompCert is programmed and proved using the Coq proof assistant.

Like all verified systems, CompCert trusts certain other pieces of software to be correct. The TCB of a verified system generally includes tools used to carry out the verification, the specification, and the shim, or glue code, used to connect the system to the rest of the world. CompCert's TCB contains tools such as Coq itself, the OCaml compiler and runtime, and the operating system; the specification, including the semantics of both C and the target assembly language; and its shim, which is an unverified OCaml program responsible for reading files from disk and so on.

Verified operating systems: seL4

Klein, G., et al. 2010. seL4: Formal verification of an operating-system kernel. Communications of the ACM 53(6): 107-115.

Operating-systems bugs, like compiler bugs, may cause a correct program to misbehave. Even worse, the OS strain of contagious bugs can result in unintended interaction among processes. Such interaction can lead to security holes, such as leaks of sensitive data across process boundaries.

seL4 is an operating-system kernel that is verified for full functional correctness. More precisely, seL4 is shown to refine an abstract specification of its behavior. This refinement guarantees, among other things, that no system calls ever panic unexpectedly, loop infinitely, or return wrong results. These guarantees are sufficient to establish security properties such as access control and process isolation.

The refinement proof, done in the Isabelle/HOL proof assistant, first shows that the C implementation refines an executable specification written in Haskell; the Haskell specification is then shown to refine the abstract specification.
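The refinement idea can be illustrated with a toy example. The sketch below checks, by exhaustive enumeration, that a concrete implementation simulates an abstract specification under an abstraction function; seL4 establishes the analogous property over the full C semantics by deduction in Isabelle/HOL, not by testing, and the example's objects and names are invented for illustration.

```python
from itertools import product

LIMIT = 3

def abs_step(state, op):
    """Abstract spec of a bounded counter: the set of allowed
    (result, next_state) pairs for each operation."""
    if op == "inc":
        return {("ok", state + 1)} if state < LIMIT else {("full", state)}
    if op == "get":
        return {(state, state)}

def impl_step(tokens, op):
    """Concrete implementation: the count is stored as a list of tokens."""
    if op == "inc":
        if len(tokens) < LIMIT:
            return "ok", tokens + ["x"]
        return "full", tokens
    if op == "get":
        return len(tokens), tokens

def abstraction(tokens):
    """Maps a concrete state to the abstract state it represents."""
    return len(tokens)

def check_refinement():
    """Every concrete step must correspond, under the abstraction
    function, to an allowed abstract step -- a simulation argument."""
    for n, op in product(range(LIMIT + 1), ["inc", "get"]):
        tokens = ["x"] * n
        result, tokens2 = impl_step(tokens, op)
        assert (result, abstraction(tokens2)) in abs_step(abstraction(tokens), op)
    return True

check_refinement()
```

Because every implementation behavior is matched by a specification behavior, any property proved of the specification (e.g., "inc never returns a wrong result") carries over to the implementation for free.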

seL4's TCB includes Isabelle/HOL itself, the C compiler, the hardware, the abstract specification, and the shim, which consists of several hundred lines of hand-written assembly.

Verified distributed systems: Verdi

Wilcox, J. R., et al. 2015. Verdi: A framework for implementing and formally verifying distributed systems. Proceedings of the Conference on Programming Language Design and Implementation.

Compiler and operating-system bugs are at least localized to a particular node. In contrast, avoiding distributed-systems bugs requires reasoning about the interactions among nodes. Furthermore, distributed systems must tolerate failures of the underlying hardware.

Verdi supports reasoning about distributed systems by modeling the network using network semantics, which formally capture the potential faults the nodes might experience, including dropped messages, crashing machines, etc. Verdi employs VSTs (verified systems transformers), which can, for example, add fault tolerance to existing systems. Verdi has been used to verify the Raft consensus protocol as a VST from a single node to a replicated system.
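A network semantics can be sketched as a step relation over the nodes' local state plus a set of in-flight packets, where each step may deliver, duplicate, or drop a packet. The names and structure below are illustrative, not Verdi's actual Coq definitions, and a dropped packet is modeled as being retransmitted by its sender so the sketch terminates.

```python
import random

def handle(node_state, msg):
    """Node handler: an at-most-once counter that ignores duplicate
    deliveries by message id and acknowledges new messages."""
    seen, count = node_state
    if msg["id"] in seen:                       # duplicate: ignore
        return node_state, []
    return (seen | {msg["id"]}, count + 1), [{"ack": msg["id"]}]

def run(packets, seed=0):
    """Execute one random trace of the faulty-network step relation."""
    rng = random.Random(seed)
    state, inflight = (frozenset(), 0), list(packets)
    while inflight:
        pkt = inflight.pop(rng.randrange(len(inflight)))   # reordering
        fault = rng.choice(["deliver", "drop", "duplicate"])
        if fault == "drop":
            inflight.append(pkt)                # lost, then retransmitted
            continue
        if fault == "duplicate":
            inflight.append(pkt)                # a second copy stays in flight
        state, _acks = handle(state, pkt)
    return state

# Despite drops, duplicates, and reordering, the counter ends up
# counting each distinct message exactly once.
seen, count = run([{"id": i} for i in range(5)], seed=1)
```

Verifying a system in Verdi means proving that the desired property holds for every trace the step relation permits, rather than for the random sample a test explores.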

Verdi trusts Coq itself, the OCaml compiler and runtime, the fact that the network obeys the fault model, and its shim, which is an unverified OCaml program responsible for low-level network and disk access.

Future directions

The research community has now verified many pieces of common infrastructure. Going forward, how do we connect these pieces to build larger verified applications?

As a simple example, imagine combining verified systems from the domains highlighted here to build a verified replicated key-value store. Such a system would use Raft for replication; be compiled with a verified compiler; and run on a verified operating system. Today, it is not clear how to execute such a plan for several reasons.

First, the systems are written in different proof assistants—CompCert and Verdi in Coq, seL4 in Isabelle/HOL—so it is not directly possible to reason about their composition.

Next, it is likely that the systems make subtly incompatible assumptions about each other or their shared environments (e.g., seL4 may use a feature of C that CompCert does not support). In a similar vein, the correctness theorems of the systems are not designed to work together logically; for example, the assumptions Verdi makes about the operating system are unlikely to be exactly what seL4 proves.

Finally, many techniques require reimplementing the system from scratch in a way that supports verification, but this is impractical in a world of large legacy systems.

We need techniques to build larger verified systems from verified components. One bright spot on this horizon is the recent DeepSpec project, which seeks to connect verification projects across many abstraction layers. Future work should seek to integrate verified systems into a library of reliable components that can be snapped together to build bug-free applications. The existence of such a library will also lower the barrier to entry for verifying systems, eventually leading to a world where it is no more expensive to verify a system than to test it thoroughly.

Albert Kwon is a Ph.D. candidate in the EECS department at MIT. He is broadly interested in applied cryptography, and he likes to design and implement systems that can enhance users' privacy. During his time at MIT, he has worked on oblivious RAMs, public key infrastructure, cryptocurrencies, anonymous reputation, and anonymous communication networks. His dissertation focuses on scaling anonymous communication to Internet scale while providing strong anonymity guarantees.

James Wilcox is a Ph.D. student at the University of Washington in the Programming Languages and Software Engineering lab. His research develops program verification techniques for distributed and multithreaded systems.

Copyright © 2017 held by owner/author. Publication rights licensed to ACM.


Originally published in Queue vol. 15, no. 4