Cloud computing has been pioneering the business of renting computing resources in large data centers to multiple (and possibly competing) tenants. The basic enabling technology for the cloud is
A key driver of the growth of cloud computing in the early days was server consolidation. Existing applications were often installed on physical hosts that were individually underutilized, and virtualization made it feasible to pack them onto fewer hosts without requiring any modifications or code recompilation. VMs are also managed via software APIs rather than physical actions.
They can be centrally backed up and migrated across different physical hosts without interrupting service. Today commercial providers such as Amazon and Rackspace maintain vast data centers that host millions of VMs. These cloud providers relieve their customers of the burden of managing data centers and achieve economies of scale, thereby lowering costs.
While
This problem has received a lot of thought at the University of Cambridge, both at the Computer Laboratory (where the Xen hypervisor originated in 2003) and within the Xen Project (custodian of the hypervisor that now powers the public cloud via companies such as Amazon and Rackspace). The
The goal of MirageOS is to restructure entire
A typical VM running on the cloud today contains a full
Despite containing many flexible layers of software, most deployed VMs ultimately perform a single function such as acting as a database or Web server. The shift toward single-purpose VMs is a reflection of just how easy it has become to deploy a new virtual computer on demand. Even a decade ago, it would have taken more time and money to deploy a single (physical) machine instance, so the single machine would need to run multiple end-user applications and therefore be carefully configured to isolate the constituent services and users from each other.
The software layers that form a VM haven't yet caught up to this trend, and this represents a real opportunity for optimization—not only in terms of performance by adapting the appliance to its task, but also for improving security by eliminating redundant functionality and reducing the attack surface of services running on the public cloud. Doing so statically is a challenge, however, because of the structure of existing operating systems.
The modern hypervisor provides a resource abstraction that can be scaled dynamically— both vertically by adding memory and cores, and horizontally by spawning more VMs. Many applications and operating systems can't fully utilize this capability since they were designed before modern hypervisors came about (and the physical analogs such as memory hotplug were never ubiquitous in commodity hardware). Often, external application-level load balancers are added to traditional applications running in VMs in order to make the service respond elastically by spawning new VMs when load increases. Traditional systems, however, are not optimized for size or boot time (Windows might apply a number of patches at boot time, for example), so the load balancer must compensate by keeping idle VMs around to deal with load spikes, wasting resources and money.
Why couldn't these problems with operating systems simply be fixed? Modern operating systems are intended to remain resolutely general-purpose to solve problems for a wide audience. For example, Linux runs on an incredibly diverse set of platforms, from low-power mobile devices to high-end servers powering vast data centers. Compromising this flexibility simply to help one class of users improve application performance would not be acceptable.
On the other hand, a specialized server appliance no longer requires an OS to act as a resource multiplexer since the hypervisor can do this at a lower level. One obvious problem with this approach is that most existing code presumes the existence of large but rather calcified interfaces such as POSIX or the Win32 API. Another potential problem is that conventional operating systems provide services such as a TCP/IP stack for communication and a file-system interface for storing persistent data: in our brave new world, where would these come from?
The MirageOS
This is not the first time people have asked these existential questions about operating systems. Several research groups have proposed
The libOS architecture has several advantages over more conventional designs. For applications where
The libOS architecture has two big drawbacks. First, running multiple applications side by side with strong resource isolation is tricky (although Nemesis did an admirable job of minimizing crosstalk between interactive applications). Second, device drivers must be rewritten to fit the new model. The
Happily, OS virtualization overcomes these drawbacks on commodity hardware. A modern hypervisor provides VMs with CPU time and strongly isolated virtual devices for networking, block storage, USB, and PCI bridges. A libOS running as a VM needs to implement only drivers for these virtual hardware devices and can depend on the hypervisor to drive the real physical hardware. Isolation between libOS applications can be achieved at low cost simply by using the hypervisor to spawn a fresh VM for each distinct application, leaving each VM free to be extremely specialized to its particular purpose. The hypervisor layer imposes a much simpler, less
Although OS virtualization has made the libOS possible without needing an army of device- driver writers, protocol libraries are still needed to replace the services of a traditional operating system. Modern kernels are all written in C, which excels at
Figure 2 shows the logical workflow in MirageOS. Precise dependency tracking from source code (both local and global libraries) and configuration files lets the full provenance of the deployed kernel binaries be recorded in immutable data stores, sufficient to precisely recompile it on demand.
MirageOS aims to unify these diverse
• Static type checking. Compilers can classify program variables and functions into types and reject code where a variable of one type is operated on as if it were a different type. Static type checking catches these errors at compile time rather than runtime and provides a flexible way for a systems programmer to protect different parts of a program from each other without depending solely on hardware mechanisms such as virtual memory paging. The most obvious benefit of type checking is the resulting lack of memory errors such as buffer or integer overflows, which are still prevalent in the CERT (Computer Emergency Readiness Team) vulnerability database. A more advanced use is
• Automatic memory management. Runtime systems relieve programmers of the burden of allocating and freeing memory, while still permitting manual management of buffers (e.g., for ef- ficient I/O). Modern garbage collectors are also designed to minimize application interruptions via incremental and generational collection, thus permitting their use in
• Modules. When the code base grows, modules partition it into logical components with well-defined interfaces gluing them together. Modules help software development scale as internal imple- mentation details can be abstracted and the scope of a single
• Metaprogramming. If the runtime configuration of a system is partially understood at compile time, then a compiler can optimize the program much more than it would normally be able to. Without knowledge of the runtime configuration, the compiler's hands are tied, as the output pro- gram must remain completely generic, just in case. The goal here is to unify configuration and code at compilation time and eliminate waste before deploying to the public cloud.
Together, these features significantly simplify the construction of
We started building the MirageOS prototype in 2008 with the intention of understanding how far we could unify the programming models underlying library operating systems and
The challenge was to identify the correct modular abstractions to support the expression of an entire operating system and application software stack in a single manageable structure. MirageOS has since grown into a mature set of almost 100
Figure 2 illustrates MirageOS's design. It grants the compiler a much broader view of
• All source-code dependencies of the input application are explicitly tracked, including all the libraries required to implement kernel functionality. MirageOS includes a build system that internally uses a SAT solver (using the OPAM package manager, with solvers from the Mancoosi project) to search for compatible module implementations from a published online package set. Any mismatches in interfaces are caught at compile time because of OCaml's static type checking.
• The compiler can then output a full stand-alone kernel instead of just a Unix executable. These unikernels are single-purpose libOS VMs that perform only the task defined in their application source and configuration files, and they depend on the hypervisor to provide resource multiplexing and isolation. Even the bootloader, which has to set up the virtual memory page tables and initialize the language runtime, is written as a simple library. Each application links to the specific set of libraries it needs and can glue them together in application-specific ways.
• The specialized unikernels are deployed on the public cloud. They have a significantly smaller attack surface than the conventional virtualized equivalents and are more
OCaml is the sole base language for MirageOS for a few key reasons. It is a
OCaml supports the definition of module signatures (a collection of
Consider a simple example. Figure 3 shows a partial module graph for a static Web server. Libraries are module graphs that abstract over
When the programmers are satisfied that their HTTP logic is working, they can recompile to switch away from using Unix sockets to the OCaml TCP/IP stack shown by MirTCP in figure 3. This still requires a Unix kernel but only as a shell to deliver Ethernet frames to the
Libraries in MirageOS are designed in as functional a style as possible: they are reentrant with explicit state handles, which are in turn serializable so that they can be reconstructed explicitly. An application consists of a set of libraries plus some configuration code, all linked together. The configuration is structured as a tree roughly like a file system, with subdirectories being parsed by each library to initialize their own values (reminiscent of the Plan 9 operating system). All of this is connected by
The metaprogramming extends into storage as well. If an application uses a small set of files (which would normally require all the baggage of block devices and a file system), MirageOS can convert it into a static OCaml module that satisfies the
One (deliberate) consequence of metaprogramming is that large blocks of functionality may be entirely missing from the output binary. This makes dynamic reconfiguration of the most specialized targets impossible, and a configuration change requires the unikernel to be relinked. The lines of active (i.e.,
In a conventional OS, application source code is first compiled into object files via a
In MirageOS, the OCaml compiler receives the source code for an entire kernel's worth of code and links it into a
The Xen unikernel compilation derives its performance benefit from the fact that the running kernel has a single virtual address space, designed to run only the OCaml runtime. The virtual address space of the MirageOS Xen unikernel target is shown in figure 4. Since all configuration information is explicitly part of the compilation, there is no longer a need for the usual dynamic linking support that requires executable mappings to be added after the VM has booted.13
Consider the life cycle of a traditional application. First the source code is compiled to a binary. Later, the binary is loaded into memory and an OS process is created to execute it. The first thing the running process will do is read its configuration file and specialize itself to the environment it finds itself in. Many different applications will run exactly the same binary, obtained from the same binary package, but with different configuration files. These configuration files are effectively additional program code, except they're normally written in
Configuration is a considerable overhead in managing the deployment of a large
In MirageOS, rather than treating the database, web server, etc. as independent applications that must be connected by configuration files, they are treated as libraries within a single application, allowing the application developer to configure them using either simple library calls for dynamic parameters or metaprogramming tools for static parameters. This has the useful effect of making configuration decisions explicit and programmable in a host language rather than manipulating many
One downside to a unikernel is the burden it places on the cloud orchestration layers because of the need to schedule many more VMs with greater churn (since every reconfiguration requires the VM to be redeployed). The popular orchestration implementations have grown rather organically in recent years and consist of many distributed components that are not only difficult to manage, but also relatively high in latency and resource consumption.
One of the first production uses for MirageOS is to fix the
VMs easy: they are first built and tested as regular Unix applications before flipping a switch and relinking against the Xen kernel libraries
The cloud is an environment where all resource usage is metered and rented. At the same time, multitenant services suffer from variability in load that encourages rapid scaling of deployments— both up to meet current demand and down to avoid wasting money. In MirageOS, features that are not used in a particular build are not included, and
The small binary size of the unikernels (on the order of hundreds of kilobytes in many cases) makes deployment to remote data centers across the Internet much smoother. Boot time is also easily less than a second, making it feasible to boot a unikernel in response to incoming network packets.
Figure 5 compares the boot time of a service in MirageOS and a Linux/Apache distribution. The boot time of a
The MLton20 compiler pioneered WPO (whole program optimization), where an application and all of its libraries are optimized together. In the libOS world, a whole program is actually a whole operating system: this technique can now optimize all the way from
An interesting recent trend is a move toward
The structure of MirageOS libraries shown in figure 3 explicitly encodes what the library needs from its execution environment. While this has conventionally meant a
For example,
Other
MirageOS is certainly not the only unikernel that has emerged in the past few years, although it is perhaps the most extreme in terms of exploring the
On the other end of the spectrum, OSv2 and rump kernels9 provide a compatibility layer for existing applications, and deemphasize the programming model improvements and type safety that guide MirageOS. The Drawbridge project16 converts Windows into a libOS with just a reported
Ultimately, the public cloud should support all these emerging projects as
The MirageOS effort has been a large one and would not be possible without the intellectual and financial support of several sources. The core team of Richard Mortier, Thomas Gazagnaire, Jonatham Ludlam, Haris Rotsos, Balraj Singh, and Vincent Bernardoff have toiled to help us build the
This work was primarily supported by Horizon Digital Economy Research, RCUK grant EP/ G065802/1. A portion was sponsored by DARPA (Defense Advanced Research Projects Agency) and AFRL (Air Force Research Laboratory), under contract
MirageOS is available freely at http://openmirage.org. We welcome feedback, patches, and improbable stunts using it.
1. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A. 2003. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles:
2. Cloudius Systems. OSv; https://github.com/cloudius-systems/osv.
3. Colp, P., Nanavati, M., Zhu, J., Aiello, W., Coker, G., Deegan, T., Loscocco, P., Warfield, A. 2011. Breaking up is hard to do: security and functionality in a commodity hypervisor. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP):
4. Crowcroft, J., Madhavapeddy, A., Schwarzkopf, M., Hong, T., Mortier, R. Unclouded vision. In Proceedings of the International Conference on Distributed Computing and Networking (ICDCN) 29-40.
5. Eisenstadt, M. My hairiest bug war stories. 1997. Communications of the ACM 40(4):
6. Engler, D. R., Kaashoek, M. F., O'Toole, Jr., J. 1995. Exokernel: an operating system architecture for
7. Eriksen, M. 2013. Your server as a function. In Proceedings of the Seventh Workshop on Programming Languages and Operating Systems (PLOS): 5:1-5:7.
8. Galois Inc. The Haskell Lightweight Virtual Machine (HaLVM) source archive; https://github.com/GaloisInc/HaLVM.
9. Kantee, A. 2012. Flexible operating system internals: the design and implementation of the anykernel and rump kernels. Ph.D. thesis, Aalto University, Espoo, Finland.
10. Leslie, I. M., McAuley, D., Black, R., Roscoe, T., Barham, P. T., Evers, D., Fairbairns, R., Hyden, E. 1996. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal of Selected Areas in Communications 14(7):
11. Madhavapeddy, A., Ho, A., Deegan, T., Scott, D., Sohan, R. 2007. Melange: creating a "functional" Internet. SIGOPS Operating Systems Review 41(3):
12. Madhavapeddy, A., Mortier, R., Crowcroft, J., S. Hand, S. 2010. Multiscale not multicore: efficient heterogeneous cloud computing. In Proceedings of ACM-BCS Visions of Computer Science. Electronic Workshops in Computing, Edinburgh, UK.
13. Madhavapeddy, A., Mortier, R., Rotsos, C., Scott, D., Singh, B., Gazagnaire, T., Smith, S., Hand, S., Crowcroft, J. 2013. Unikernels: library operating systems for the cloud. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS):
14. Minsky, Y. 2011. OCaml for the masses. Communications of the ACM 54(11):
15. Mortier, R., Madhavapeddy, A., Hong, T., Murray, D., Schwarzkopf, M. 2010. Using dust clouds to enhance anonymous communication. In Proceedings of the 18th International Workshop on Security Protocols (IWSP).
16. Porter, D. E.,
17. Scott, D., Sharp, R., Gazagnaire, T., Madhavapeddy, A. 2010. Using functional programming within an industrial product group: perspectives and perceptions. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP):
18. Vinge, V. 1992. A Fire Upon the Deep. New York, NY: Tor Books.
19. Watson, R. N. M. 2013. A decade of OS
20. Weeks, S. 2006.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
LOVE IT, HATE IT? LET US KNOW [email protected]
Anil Madhavapeddy is a Senior Research Fellow at the University of Cambridge, based in the Systems Research Group. He was on the original team that developed the Xen hypervisor and the
Dave Scott is a Principal Architect at Citrix Systems where he works on the XenServer virtualization platform. Dave's main focus is on improving XenServer reliability and performance through exploiting advances in
© 2013 ACM
Originally published in Queue vol. 11, no. 11—
Comment on this article in the ACM Digital Library
Matt Fata, Philippe-Joseph Arida, Patrick Hahn, Betsy Beyer - Corp to Cloud: Google’s Virtual Desktops
Over one-fourth of Googlers use internal, data-center-hosted virtual desktops. This on-premises offering sits in the corporate network and allows users to develop code, access internal resources, and use GUI tools remotely from anywhere in the world. Among its most notable features, a virtual desktop instance can be sized according to the task at hand, has persistent user storage, and can be moved between corporate data centers to follow traveling Googlers. Until recently, our virtual desktops were hosted on commercially available hardware on Google’s corporate network using a homegrown open-source virtual cluster-management system called Ganeti. Today, this substantial and Google-critical workload runs on GCP (Google Compute Platform).
Pat Helland - Life Beyond Distributed Transactions
This article explores and names some of the practical approaches used in the implementation of large-scale mission-critical applications in a world that rejects distributed transactions. Topics include the management of fine-grained pieces of application data that may be repartitioned over time as the application grows. Design patterns support sending messages between these repartitionable pieces of data.
Ivan Beschastnikh, Patty Wang, Yuriy Brun, Michael D, Ernst - Debugging Distributed Systems
Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult. A standard approach to gaining insight into system activity is to analyze system logs. Unfortunately, this can be a tedious and complex process. This article looks at several key features and debugging challenges that differentiate distributed systems from other kinds of software. The article presents several promising tools and ongoing research to help resolve these challenges.
Sachin Date - Should You Upload or Ship Big Data to the Cloud?
It is accepted wisdom that when the data you wish to move into the cloud is at terabyte scale and beyond, you are better off shipping it to the cloud provider, rather than uploading it. This article takes an analytical look at how shipping and uploading strategies compare, the various factors on which they depend, and under what circumstances you are better off shipping rather than uploading data, and vice versa. Such an analytical determination is important to make, given the increasing availability of gigabit-speed Internet connections, along with the explosive growth in data-transfer speeds supported by newer editions of drive interfaces such as SAS and PCI Express.