The Bike Shed

  Download PDF version of this article PDF

Linear Address Spaces

Unsafe at any speed

Poul-Henning Kamp

I recently bought an Apple computer with the new M1 CPU to supplement the beastiarium known as Varnish Cache's continuous integration cluster. I am a bit impressed that it goes head to head with the s390x virtual machine borrowed from IBM, while never drawing more than 25 watts, but other than that: Meh...

This is one disadvantage of being a systems programmer: You see up close how each successive generation in an architecture has been inflicted with yet another "extension," "accelerator," "cache," "look-aside buffer," or some other kind of "marchitecture," to the point where the once-nice and orthogonal architecture is almost obscured by the "improvements" that followed. It seems almost like a law of nature: Any successful computer architecture, under immense pressure to "improve" while "remaining 100 percent compatible," will become a complicated mess.

Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today.

Or how about: The first rule of ARM is, "We don't talk about Thumb-2."

A very special instance of this law happened when AMD created the x86-64 instruction set to keep the x86 architecture alive after Intel, the nominal owner of that architecture, had all but abandoned it and gone full Second Systems Syndrome with the ill-fated Itanium architecture.

Fundamentally, we now have both kinds of CPUs—ARM and x64—and they both suffer from the same architectural problems. Take, for example, the translation from linear virtual to linear physical addresses. Not only have page table trees grown to a full handful of levels, but there are also multiple page sizes, each of which comes with its own handful of footnotes limiting usability and generality.

Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory. There are still linear virtual addresses that are backwards-compatible with computers that were backwards-compatible with computers that were...

Apart from the smallest microcontrollers, nobody sane uses linear address spaces anymore, neither physical nor virtual. The very first thing any realtime nucleus or operating system kernel does is implement an abstract object store on top of the linear space. If the machine has virtual memory support, it then tries to map from virtual to physical as best it can, given five levels of page tables and all that drags in with it.

Translating from linear virtual addresses to linear physical addresses is slow and complicated, because 64-bit can address a lot of memory.

Having a single linear map would be prohibitively expensive in terms of memory for the map itself, so translations use a truncated tree structure, but that adds a whole slew of new possible exceptions: What if the page entry for the page directory entry for the page entry for the exception handler for missing page entries is itself empty?

This is the land where "double fault exceptions" and "F00F workarounds" originate. And with five levels of page tables, in the ultimate worst case, it takes five memory accesses before the CPU even knows where the data is.

It doesn't have to be that way.

One of my volunteer activities for is writing a software emulation of a unique and obscure computer: the Rational R1000/s400. (It's OK; you can look it up. I'll wait, because until one stood on our doorstep, I had never heard of it either.)

The R1000 has many interesting aspects, not the least of which is that it was created by some very brilliant and experienced people who were not afraid to boldly go. And they sure went: The instruction set is Ada primitives, it operates on bit fields of any alignment, the data bus is 128 bits wide: 64-bit for the data and 64-bit for data's type. They also made it a four-CPU system, with all CPUs operating in the same 64-bit global address space. It also needed a good 1,000 amperes at 5 volts delivered to the backplane through a dozen welding cables.

The global 64-bit address space is not linear; it is an object cache addressed with an (object + offset) tuple, and if that page of the object is not cached, a microcode trap will bring it in from disk.

In marketing materials, the object cache was sold as "memory boards," but in hardware, they contained a four-way associative cache, which brilliantly hid the tag-RAM lookup during the RAS (row address strobe) part of the DRAM memory cycle, so that it is precisely as fast as a normal linear DRAM memory would have been.

State-of-the-art CPUs today can still address only approximately 57 bits of address space, using five levels of page tables, each level successively and slowly sorting out another 10 bits of the address.

The R1000 addresses 64 bits of address space instantly in every single memory access. And before you tell me this is impossible: The computer is in the next room, built with 74xx-TTL (transistor-transistor logic) chips in the late 1980s. It worked back then, and it still works today.

The R1000 was a solid commercial success, or I guess I should say military success, because most customers were in the defense industry, and the price was an eye-watering, $1 million a pop. It also came with the first fully semantic IDE, but that is a story for another day.

Given Ada's reputation for strong typing, handling the type information in hardware rather than in the compiler may sound strange, but there is a catch: Ada's variant data structures can make it a pretty daunting task to figure out how big an object is and where to find things in it. Handling data+type as a unit in hardware makes that fast.

Not that type-checking in hardware is a bad idea. Quite the contrary: The recent announcement by ARM that it has prototyped silicon with its Morello implementation of Cambridge University's CHERI architecture gives me great hope for better software quality in the future.

Cut to the bone, CHERI makes pointers a different data type than integers in hardware and prevents conversion between the two types. Under CHERI, new valid pointers can be created only by derivation from existing valid pointers, either by restricting the permissible range or by restricting the permissions. If you try to create or modify a pointer by any other means, it will no longer be a pointer, because the hardware will have cleared the special bit in memory that marked it as a valid pointer.

According to Microsoft Research, CHERI would have deterministically detected and prevented a full 43 percent of the security problems reported to the company in 2019. To put that number in perspective: The National Highway Traffic Safety Administration reports that 47 percent of the people killed in traffic accidents were not wearing seat belts.

Like mandatory seat belts, some people argue that there would be no need for CHERI if everyone "just used type-safe languages," or they will claim that the extra bits for "capabilities" CHERI pointers carry make their programs look fat.

I'm not having any of it.

The linear address space as a concept is unsafe at any speed, and it badly needs mandatory CHERI seat belts. But even better would be to get rid of linear address spaces entirely and go back to the future, as successfully implemented in the Rational R1000 computer 30-plus years ago.


Poul-Henning Kamp ([email protected]) spent more than a decade as one of the primary developers of the FreeBSD operating system before creating the Varnish HTTP Cache software, which around a fifth of all web traffic goes through at some point. He lives in his native Denmark, where he makes a living as an independent contractor, specializing in making computers do weird stuff. One of his most recent projects was a supercomputer cluster to stop the stars twinkling in the mirrors of ESO's (European Southern Observatory's) new ELT (extremely large telescope).

Copyright © 2022 held by owner/author. Publication rights licensed to ACM.


Originally published in Queue vol. 20, no. 2
see this item in the ACM Digital Library



James P. Hughes, Whitfield Diffie - The Challenges of IoT, TLS, and Random Number Generators in the Real World
Many in the cryptographic community scoff at the mistakes made in implementing RNGs. Many cryptographers and members of the IETF resist the call to make TLS more resilient to this class of failures. This article discusses the history, current state, and fragility of the TLS protocol, and it closes with an example of how to improve the protocol. The goal is not to suggest a solution but to start a dialog to make TLS more resilient by proving that the security of TLS without the assumption of perfect random numbers is possible.

Benoit Baudry, Tim Toady, Martin Monperrus - Long Live Software Easter Eggs!
It's a period of unrest. Rebel developers, striking from continuous deployment servers, have won their first victory. During the battle, rebel spies managed to push an epic commit in the HTML code of Pursued by sinister agents, the rebels are hiding in commits, buttons, tooltips, API, HTTP headers, and configuration screens.

Alexandros Gazis, Eleftheria Katsiri - Middleware 101
Whether segregating a sophisticated software component into smaller services, transferring data between computers, or creating a general gateway for seamless communication, you can rely on middleware to achieve communication between different devices, applications, and software layers. Following the increasing agile movement, the tech industry has adopted the use of fast waterfall models to create stacks of layers for each structural need, including integration, communication, data, and security. Given this scope, emphasis must now be on endpoint connection and agile development. This means that middleware should not serve solely as an object-oriented solution to execute simple request-response commands.

Alvaro Videla - Meaning and Context in Computer Programs
When you look at a function program's source code, how do you know what it means? Is the meaning found in the return values of the function, or is it located inside the function body? What about the function name? Answering these questions is important to understanding how to share domain knowledge among programmers using the source code as the medium. The program is the medium of communication among programmers to share their solutions.

© ACM, Inc. All Rights Reserved.