In 2017 the web platform added its fourth official language, WebAssembly. Following HTML, CSS, and JavaScript as the standard markup, styling, and scripting languages, WebAssembly, or Wasm for short, added a powerful new capability: a low-level binary format for compiled code that runs at near-native speed. Addressing use cases that prior solutions such as asm.js and NaCl (Native Client) sought to solve, Wasm was designed in an open, collaborative process that involved all browser vendors and included experts in programming languages, verification, and virtual machines.
Since its launch, Wasm has brought new performance and capabilities to the platform, powering game engines, desktop applications such as Photoshop [1] and AutoCAD, audio processing, simulations, scientific computing, and machine learning, as well as a growing list of new and existing programming languages.
Yet even as Wasm was designed as a solution to specific problems emerging for the web, it was purposefully designed with more general uses in mind. The core specification lays out a format, verification, and execution rules independent of how Wasm is embedded into a host platform. Here the focus is on the design of the bytecode itself, demystifying its core features.
Prior to Wasm, JavaScript was the only truly native programming language for the web, and compiling to JavaScript was the only choice for other programming languages to target the platform. While JavaScript was initially slow to execute, engines improved in performance by leaps and bounds, unlocking performance that allowed the explosion of Web 2.0. Yet sophisticated applications such as game engines and rich applications rivaling desktop power pushed scalability to its limits, and the drawbacks of JavaScript as a compile target became apparent.
One issue was program size. Even with minification and compression, JavaScript source code can be large, and parsing can be a bottleneck for the often huge artifacts that resulted from compiling even moderately sized C/C++ applications to JavaScript. Another issue was the mismatch between JavaScript semantics and the low-level nature of these applications, which led to the design of asm.js, a subset of JavaScript with numerical type annotations in the form of dynamic coercions that worked almost entirely by accident. (Though JavaScript has only 64-bit floating-point numbers, its choice of semantics for bitwise operators means that an expression such as ((a|0) + (b|0))|0 computes 32-bit two's complement integer arithmetic.)
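The accidental integer semantics can be seen in plain JavaScript: the |0 coercion truncates any number to a signed 32-bit integer, so an annotated addition wraps around exactly like 32-bit two's complement hardware.

```javascript
// asm.js-style integer arithmetic in plain JavaScript.
// The |0 coercion (ToInt32) truncates any number to a signed 32-bit integer,
// so annotating every subexpression yields exact two's complement semantics.
function addI32(a, b) {
  return ((a | 0) + (b | 0)) | 0;
}

console.log(addI32(3, 4));          // 7
console.log(addI32(0x7fffffff, 1)); // -2147483648 (32-bit wraparound)
console.log(addI32(4000000000, 0)); // -294967296 (ToInt32 truncation)
```

An asm.js-aware engine could recognize this pattern and compile it to a single native 32-bit add, with no floating-point arithmetic at all.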
By 2013, JavaScript engines were experimenting with completely separate compilation pipelines to make asm.js efficient, complete with custom parsers, validators, code generators—and bugs, which often resulted in security vulnerabilities. By 2015, engineers from different browser vendors had already recognized that the long-term trajectory would be best served by purposefully, rather than accidentally, designing a bytecode.
Security was a critical design criterion for the bytecode. Untrusted code is the norm on the web, making malicious code a pervasive threat. Experience has shown that the complexity of an input language (especially a Turing-complete programming language), multiplied by the complexity of the software that processes and runs that language, compounds the risk of bugs and security vulnerabilities. Browser vendors know this all too well, as the intricate optimizations needed to accelerate JavaScript carry a long tail of bugs. Thus, in designing Wasm, a healthy paranoia led to the highest levels of specification rigor to ensure that its definition was clear, unambiguous, internally consistent, and sound, bolstering confidence that the promised safety properties do hold.
Wasm code is organized into modules, which are akin to an executable file, or part of one. A Wasm module can be as small as a few dozen bytes for a single function, or as large as an entire application, stretching into hundreds of megabytes. The binary format makes extensive use of variable-length integers, ensuring that small modules are small without limiting how large modules can eventually become. (Despite unlimited-size integers in the binary format, Wasm engines enforce reasonable, standard limits on how big or complicated a module can be, such as a maximum of 1 million functions and a total size of 1GiB.) A module is divided into sections that declare functions, memories, tables, global variables, and static data. A key difference from native executables is that Wasm bytecode instructions are grouped into functions with statically typed parameters, results, and local variables, rather than an unstructured executable region of instructions that can be jumped and called into without restriction.
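The variable-length integers in the binary format are LEB128-encoded; a minimal sketch of an unsigned decoder (the helper name and return shape are mine, not from the specification) shows why small values cost a single byte:

```javascript
// Decode an unsigned LEB128 integer from a byte array, as used throughout
// the Wasm binary format. Returns the value and the number of bytes consumed.
function decodeU32(bytes, offset = 0) {
  let result = 0, shift = 0, pos = offset;
  while (true) {
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift; // low 7 bits carry the payload
    if ((byte & 0x80) === 0) break;   // high bit clear: this is the last byte
    shift += 7;
  }
  return { value: result >>> 0, length: pos - offset };
}

console.log(decodeU32([0x08]));             // { value: 8, length: 1 } -- small stays small
console.log(decodeU32([0xe5, 0x8e, 0x26])); // { value: 624485, length: 3 }
```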
Paramount to Wasm's isolation properties is that all operations in core Wasm can access only a module's own internal state. Modules must import functions (and memories, tables, etc.) in order to access state outside the module or platform capabilities. Imports may be provided by the host environment, such as JavaScript and the web, or from other modules. That means that Wasm modules are always self-contained, with imports and exports describing the interface to the outside world.
On the web, clients must verify completely untrusted code because no central authority exists to vet or sign code. The verification process is of critical importance, and bugs in specifications have caused severe security vulnerabilities in past code-verification systems. While modern programming language formalizations such as type systems help in defining precise and sound specifications of how to typecheck code, implementations are concrete and algorithmic. As a critical security measure, this validation algorithm must be straightforward to implement, well tested, and efficient, as it is on the critical path for application loading.
To minimize latency, Wasm is thus designed so modules can be verified in a single forward pass, as shown in figure 1. Wasm modules are organized into sections that declare types, imports, functions, memories, and exports (tables and other segments not shown). Careful ordering of the sections ensures that the information needed at each step precedes it, so the module, as well as the code in each function, can be validated in a single pass, with an efficient single-pass abstract interpreter inferring the types of the operand stack. Decoding and validation can therefore happen in a streaming fashion, where sections, declarations, and individual function bodies are validated as the module's bytes arrive over the network, before the entire module has even been received.
Another critical factor is verifying control-flow integrity, which establishes that the program cannot jump to arbitrary memory addresses, even under adversarial conditions such as out-of-bounds memory accesses or stack smashing. A key choice in designing Wasm was to make control-flow verification fast and simple, which motivated three important decisions: (1) code is organized into functions, with only calls and returns between them; (2) the execution stack is not addressable; and (3) local control flow uses blocks and loops with structured branches.
To run, a module must be instantiated, supplying bindings for its imports. At instantiation time, a Wasm engine creates the state (tables, globals, and memories) declared by the module, with the result being called an instance. An instance can export its own functions, memories, tables, etc. to other modules or the host environment. The primary dynamic storage of a Wasm program is typically one or more large, bounds-checked, byte-addressable memories, while global variables and tables of opaque host references can also be used.
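Instantiation from JavaScript looks like the following: a complete, hand-encoded module exporting a single add function is compiled and instantiated (this module takes no imports, so the import object is empty).

```javascript
// A complete, valid Wasm module, hand-encoded:
// (func (export "add") (param i32 i32) (result i32)
//   local.get 0  local.get 1  i32.add)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                                // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00,                                // version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,  // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                // function section: 1 func, type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,  // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                          // code section: 1 body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                    // local.get 0/1, i32.add, end
]);

// Instantiate with an empty import object and call the exported function.
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), {});
console.log(instance.exports.add(2, 3)); // 5
```

Note that the result comes back to JavaScript as a signed 32-bit value: add(0x7fffffff, 1) returns -2147483648, matching Wasm's i32 semantics.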
On the web platform, Wasm modules can import and export their memories as WebAssembly.Memory object instances, from which a typed array can be created. Thus, through the use of imports, Wasm memory can be passed to any web API that uses typed arrays. Since Wasm first appeared in browsers, it has added first-class function references and garbage-collected objects. These, too, are forms of local state and must be shared explicitly with other instances.
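Concretely, a Wasm memory is a resizable buffer that JavaScript can view and mutate through typed arrays, and those views are what get handed to web APIs:

```javascript
// A Wasm memory is an ArrayBuffer under the hood; JavaScript views and
// mutates it through typed arrays, which can be passed to any web API.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64KiB
const view = new Uint8Array(memory.buffer);

view[0] = 0xab;                    // a write by the host...
console.log(view.length);          // 65536
console.log(view[0].toString(16)); // "ab" ...visible to any instance sharing this memory

memory.grow(1);                    // add one more 64KiB page
// Growing detaches the old buffer, so a fresh view is needed.
console.log(new Uint8Array(memory.buffer).length); // 131072
```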
Wasm is often described as a low-level bytecode. This is because Wasm's abstractions are close to those provided by typical hardware. The large, page-sized but byte-addressable memories map directly onto memory provided by the underlying hardware and OS, and only simple bounds checks on the memory are necessary for safety. Unlike some attempts at making a safer C, type safety for the language compiled to Wasm is not enforced by the Wasm engine. Wasm memory is untyped, allowing unrestricted aliasing and byte-oriented access. The execution model provides only sandboxing, which prevents one Wasm module from accessing any state not explicitly created by it or granted to it.
In Wasm 1.0, memories are limited to 4GiB in size and are indexed with 32-bit integers. On 64-bit machines, this allows a more efficient bounds-checking strategy by reserving a large enough virtual address range and protecting all out-of-bounds pages so the hardware MMU (memory management unit) performs bounds checks via normal virtual address translation, as seen in figure 2. Execution stacks, tables, globals, and the (new) GC (garbage collection) heap can store references and managed data that is separated from the byte-addressable memories. A rich instruction set allows access to these storage mechanisms as well as offering an extensive set of operations on integers, floats, and vector types.
Instructions and types are similarly low level. Wasm has the standard primitive data types that are available on all modern CPUs: 32- and 64-bit integers, 32- and 64-bit floating-point numbers, as well as a 128-bit vector type. A large set of standard integer and floating-point arithmetic instructions are available, which typically map one to one with native machine instructions and have bit-exact specifications for their outputs. This makes Wasm programs portable and deterministic. (Apart from limited nondeterminism in some floating-point operations involving NaNs [not a number], Wasm has no undefined behavior.) Wasm allows multiple modules, memories, and instances to occupy a single host process, fully sandboxed from each other; 32-bit Wasm memory access can be implemented by simply adding the base address and relying on 64-bit virtual memory hardware without explicit bounds checks (see figure 3).
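Where the virtual-memory trick is unavailable, an engine compiles each access to an explicit check. The semantics can be sketched as follows (the helper name is mine; a real engine emits this as inline machine code, not a function call):

```javascript
// Explicit bounds checking, as a Wasm engine without virtual-memory tricks
// would compile an i32.load: trap if any accessed byte falls outside memory.
function i32Load(buffer, addr, offset = 0) {
  const effective = addr + offset;          // effective address = dynamic addr + static offset
  if (effective + 4 > buffer.byteLength) {
    throw new RangeError("out of bounds memory access"); // the Wasm trap
  }
  return new DataView(buffer).getInt32(effective, true); // Wasm memory is little-endian
}

const mem = new ArrayBuffer(65536); // one 64KiB page
new DataView(mem).setInt32(16, 0x12345678, true);
console.log(i32Load(mem, 16).toString(16)); // "12345678"
```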
In one of Wasm's more unusual design choices, control flow within a function is represented in a structured way, with blocks, loops, and if/else instructions that must be properly nested. Branches to the end of a block or the beginning of a loop are allowed only from within their respective scopes. This was an intentional choice to improve code density and efficiency of validation, as the metadata needed per block is minimized to reusable control-stack entries. Structured control also implies reducible control-flow graphs, which simplify many compiler algorithms, such as loop analysis, transformations like peeling and unrolling, and liveness analysis, thus simplifying already-complicated engines. Producers whose main intermediate representation of code is a general CFG (control-flow graph) need to restructure the basic blocks, which can be done with several effective algorithms [3].
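Wasm's structured branches map naturally onto labeled statements in high-level languages: a br that targets an enclosing block exits it, and a br that targets a loop restarts it. The analogy can be shown in JavaScript:

```javascript
// Wasm's (block (loop ... br 1 ...)) corresponds to a labeled break:
// branching to a block exits it; branching to a loop restarts it.
function firstNegative(values) {
  let found = -1;
  outer: {                                    // like Wasm's "block"
    for (let i = 0; i < values.length; i++) { // like "loop" with a back-branch
      if (values[i] < 0) {
        found = i;
        break outer;                          // like "br 1": exit the enclosing block
      }
    }
    // control falls through here only when no negative value was found
  }
  return found;
}

console.log(firstNegative([3, 7, -2, 5])); // 2
console.log(firstNegative([1, 2, 3]));     // -1
```

Because every branch names a lexically enclosing construct, the validator never has to chase arbitrary jump targets.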
Wasm code was intentionally designed to be similar to native instructions. Source compilers are expected to do most of the work to lower language constructs and implement data structures at the byte and word level. Having few high-level constructs means that compiling Wasm to machine code is relatively straightforward for the Wasm engine. With all modern browsers supporting sophisticated optimizing compilers with good instruction selection and register allocation, Wasm bytecode can be parsed into the compiler's intermediate representation close to the back end, and after some optimizations are applied, efficient native machine code is generated.
The compilation step from Wasm to machine code inevitably consumes time and memory that could delay application startup. Much effort has gone into making this step efficient. In particular, compilers face a long-known inherent compile-time/code-quality tradeoff. Empirically, large applications tend to have many functions that are rarely or never executed, and spending time optimizing them doesn't pay off. Though early browser execution strategies precompiled all Wasm code with an optimizing compiler before execution, today's web engines all employ multiple compilers for Wasm, where a quick single-pass compiler (often called a baseline compiler [5]) is used first, and an optimizing compiler provides faster code for important functions later. Caching machine code for Wasm modules that have been previously loaded is common in browsers, as is done for other large resources from websites. In many use cases outside the web, Wasm engines compile modules to machine code ahead of time, allowing for optimization while making startup nearly instantaneous.
Wasm is a little unusual in that it was designed after mature optimizing compilers were already present in the systems that intended to use it. Historically, most bytecode designs first consider efficient interpretation, and only later are optimizing compilers built. Because of the ready availability of optimizing compilers in web engines, Wasm was pointedly not designed to be interpreted efficiently. One example is structured control flow: while it can be a benefit to validation and fast compilation, it makes an interpreter challenging to implement.
Despite this, several interpreters for Wasm soon appeared, such as the first tier of the JavaScriptCore engine powering Safari, and Wasm3, a low-memory engine designed primarily for embedded applications. Both of these interpreters take the approach of rewriting bytecode into an internal form with more traditional offset-based branches. (One might argue that an interpreter that requires a rewrite step is, in fact, a compiler.) The rewriting step takes time and memory, lessening some of the startup advantages of interpreters. Yet in 2022 it was shown [4] that Wasm could indeed be interpreted efficiently in place, without rewriting, with the help of a side-table data structure that provides bytecode deltas for branches. Today, at least two engines have in-place interpreters that use side tables: the Wizard Research Engine (standard, default) and JavaScriptCore (experimental).
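The side-table idea can be sketched in miniature: a cheap pre-pass records, for each branch, how far ahead its target lies, so the main loop executes the original bytes in place and consults the table only when branching. The opcode numbering here is illustrative, not the real Wasm encoding.

```javascript
// Toy in-place interpreter with a side table for branches.
// Opcodes (illustrative, not real Wasm encodings):
//   0: push next byte   1: skip-ahead branch (immediate = bytes to skip)   2: halt
const PUSH = 0, BR = 1, HALT = 2;

// Pre-pass: for each BR at pc, record the delta to its branch target.
function buildSideTable(code) {
  const table = new Map();
  for (let pc = 0; pc < code.length; pc++) {
    if (code[pc] === PUSH) pc++;                               // skip immediate
    else if (code[pc] === BR) { table.set(pc, code[pc + 1] + 2); pc++; }
  }
  return table;
}

function interpret(code) {
  const table = buildSideTable(code);
  const stack = [];
  let pc = 0;
  while (true) {
    switch (code[pc]) {
      case PUSH: stack.push(code[pc + 1]); pc += 2; break;
      case BR:   pc += table.get(pc); break; // branch via the side-table delta
      case HALT: return stack;
    }
  }
}

// Push 1, branch over "push 99", push 2, halt.
console.log(interpret([PUSH, 1, BR, 2, PUSH, 99, PUSH, 2, HALT])); // [ 1, 2 ]
```

In real Wasm the pre-pass resolves structured block/end nesting into deltas, so branches cost a table lookup rather than a scan for the matching end.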
Since first appearing in browsers, Wasm has continued to add features that expose more hardware performance and improve ergonomics for more languages. After the MVP (minimum viable product) release in 2017, early additions to the standard were reference types, bulk memory operations, multiple value returns (and values within blocks), and additional primitive type conversions. A key performance feature was the addition of a 128-bit vector type and associated instructions in the Wasm SIMD (single instruction, multiple data) extension, which was ratified into the standard in 2021. This feature adds a new primitive type and more than 200 instructions that perform 2-, 4-, 8-, and 16-lane integer and floating-point operations and memory accesses, making Wasm the only bytecode to date to offer portable, deterministic vector operations with close-to-native performance.
More recently completed features include exception handling, 64-bit memories, tail calls, and atomics. Exception handling adds the ability to throw and catch exception packages, a feature long demanded for large C++ codebases targeting the web, which previously relied on JavaScript exception handling. Exception handling also serves languages such as Kotlin, OCaml, and Java. Larger 64-bit-addressable memories also increase Wasm's ability to handle memory-intensive applications. While tail calls had a bumpy history in the JavaScript language, they constituted an addition of just two bytecodes to Wasm. Atomics add the ability to share Wasm memories among multiple web workers and provide a memory model that allows multithreaded programs to work efficiently and portably. Though core Wasm does not have a native mechanism to launch threads, host environments can provide thread-creation mechanisms appropriate to the use case (e.g., WASI [WebAssembly System Interface] threads).
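On the web, a shared Wasm memory is backed by a SharedArrayBuffer, so JavaScript's Atomics operations apply directly to it, with semantics matching Wasm's own atomic instructions:

```javascript
// A shared Wasm memory is backed by a SharedArrayBuffer, which multiple
// workers (or Wasm instances) can import; Atomics gives race-free updates.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const words = new Int32Array(memory.buffer);

Atomics.store(words, 0, 10);
Atomics.add(words, 0, 5);            // corresponds to Wasm's i32.atomic.rmw.add
console.log(Atomics.load(words, 0)); // 15
console.log(memory.buffer instanceof SharedArrayBuffer); // true
```

In a real application, the memory (or its buffer) would be posted to web workers, each instantiating the same module against it.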
Another feature with a long design phase that is now completed is garbage collection for Wasm programs. While Wasm is primarily a low-level language close to hardware, many of today's most popular languages rely on GC. Prior to Wasm, some of these languages targeted the web by compiling to JavaScript, inheriting its relatively inefficient object model and performance unpredictability. While some languages have invested considerable effort in optimizing their JavaScript output, the unpredictability of JavaScript engine optimizations has been a stumbling block.
Another approach is to simply include the GC implementation directly into applications as Wasm bytecode, with the collector operating on the byte-level representation of language objects. A key design problem that arises is root finding, where a collector identifies roots in execution stacks and updates them when moving objects. Yet Wasm's execution model does not allow addressing the stack at all, so indirectly accessing the contents of execution frames is not possible. To work around this, GC implementations targeting Wasm can use a shadow stack, a separate region within a Wasm memory that stores references (rather than storing them directly in execution frames), allowing runtime root-finding routines to operate with no special support from the engine.
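A shadow stack can be sketched as a region of linear memory holding root pointers: the compiler spills references there around calls and allocations, so a runtime root-scan routine needs no engine support. All names and the layout in this miniature version are illustrative, not part of any Wasm ABI.

```javascript
// A miniature shadow stack: references (here, object-table indices) are
// spilled into a region of "linear memory" so a GC can scan them as roots.
// The names, layout, and sizes here are illustrative, not a real ABI.
const SHADOW_BASE = 0, SHADOW_SLOTS = 64;
const memory = new Int32Array(1024); // stands in for a Wasm linear memory
let shadowTop = SHADOW_BASE;         // stack pointer into the shadow region

function pushRoot(objIndex) {        // called where a function holds a live reference
  if (shadowTop >= SHADOW_SLOTS) throw new Error("shadow stack overflow");
  memory[shadowTop++] = objIndex;
}
function popRoots(n) { shadowTop -= n; } // called on function exit
function scanRoots() {               // what the collector's root scan sees
  return Array.from(memory.subarray(SHADOW_BASE, shadowTop));
}

pushRoot(17); pushRoot(42);          // a callee spills two live references
console.log(scanRoots());            // [ 17, 42 ]
popRoots(1);
console.log(scanRoots());            // [ 17 ]
```

The indirection is exactly the cost the article describes: every live reference takes an extra store and load compared with keeping it in an execution frame.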
The shadow stack comes with significant drawbacks, however, as it introduces a level of indirection and can complicate cross-module linking, even for modules written in the same language and compiled separately. A worse problem is that references to objects in the embedder language (i.e., JavaScript) cannot be written into Wasm memory, complicating cross-language interoperability with additional indirection through tables. Without strict discipline, this can lead to memory leaks. But perhaps the most critical drawback is that an application-specific GC with its own root finding makes advanced GC algorithms that employ concurrency, parallelism, and incrementalism nearly impossible, as root finding becomes a joint venture [2] between two completely different collectors.
So, in order to better serve garbage-collected languages and to attract other managed languages to Wasm and the web, the recently completed Wasm GC feature introduces fixed-size structs and arrays with automatic heap management. This proposal was codesigned with the function-references proposal (which adds statically typed first-class function references) and offers a low-level object model from which more complex object models can be built. Unlike bytecodes for Java, .NET, or Python, it doesn't provide classes or method dispatch, but a simpler model: fixed-size, statically typed structs and arrays. Allocated on an engine's internal heap, these objects require minimal metadata and store fields and array elements in efficient, unboxed representations. The design allows engines to employ their existing sophisticated garbage collectors and leverage a unified heap for both host (e.g., JavaScript) and Wasm objects. Now available in all browsers (and starting to appear in non-web engines), the Wasm GC object model provides languages such as Java, Kotlin, OCaml, and Scala a powerful new target.
Wasm is still growing with new features to address performance gaps as well as recurring pain points for both languages and embedders. The process for proposing new features is governed by the W3C (World Wide Web Consortium), and the CG (Community Group) is open to all. As the W3C's largest community group, Wasm has a vibrant and open community, with biweekly meetings of the main group and subgroups dedicated to important proposals. In addition to developing and maintaining a detailed, open specification, the community has built dozens of open source tools and repositories maintained by a diverse set of developers from browser vendors, cloud/edge providers, web developers, representatives from large software and hardware vendors, and active volunteers from around the world.
Wasm has a wide set of use cases outside of the web, with applications from cloud/edge computing to embedded and cyber-physical systems, databases, application plug-in systems, and more. Yet the core specification is so cleanly separated from the host environment that most design work can focus on making the best bytecode design possible. With a completely open and rigorous specification, Wasm has unlocked a plethora of exciting new systems that bring programmability to settings large and small. Those use cases have attracted more than 40 programming languages with official support. With many languages and many targets, Wasm could one day become the universal execution format for compiled applications.
1. Al-Shamma, N., Nattestad, T. 2021. Photoshop's journey to the web. web.dev; https://web.dev/articles/ps-on-the-web.
2. Degenbaev, U., Lippautz, M., Payer, H. 2019. Garbage collection as a joint venture. Communications of the ACM 62(6), 36–41; https://dl.acm.org/doi/10.1145/3316772.
3. Ramsey, N. 2022. Beyond Relooper: recursive translation of unstructured control flow to structured control flow (functional pearl). Proceedings of the ACM on Programming Languages 6 (ICFP), Article No. 90, 1–22; https://dl.acm.org/doi/abs/10.1145/3547621.
4. Titzer, B.L. 2022. A fast in-place interpreter for WebAssembly. Proceedings of the ACM on Programming Languages 6 (OOPSLA2), Article No. 148, 646–672; https://dl.acm.org/doi/abs/10.1145/3563311.
5. Titzer, B.L. 2024. Whose baseline compiler is it anyway? Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 207–220; https://dl.acm.org/doi/10.1109/CGO57630.2024.10444855.
Ben L. Titzer is a principal researcher at Carnegie Mellon University. A former member of the V8 team at Google, he cofounded the WebAssembly project, led the team that built the implementation in V8, and led the initial design of V8's TurboFan optimizing compiler. Prior to that he was a researcher at Sun Labs and contributed to the Maxine Java-in-Java VM. He is now working on a new Wasm research engine called Wizard and several Wasm-related research projects. He is the designer and main implementer of the Virgil programming language.
Copyright © 2025 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 23, no. 3.