You're a diligent programmer. Where safety or security is at stake, you specify requirements precisely and implement code carefully, keeping things as simple as possible and inviting expert peer review at every step of the way. In the end, your software isn't merely trusted, it's trustworthy.
Your own code, however, isn't the only software in the applications for which you are responsible. Much of the code is in large, complex, opaque off-the-shelf libraries whose top priorities are features and speed, not security. Such libraries present a dilemma. Finding and fixing their flaws would be prohibitively expensive, but linking them into your application carries risk: If crafted malicious input exploits a vulnerability in a library, the attacker can hijack the enclosing process. Thanks to the ambient authority that processes inherit from users on mainstream operating systems, hijacked processes can wreak havoc on everything within reach, stealing secrets, vandalizing data, and holding files for ransom. Far from paranoia, your library anxiety is vindicated by a long and sorry history. Bugs in libraries linked by sshd, for example, have made sshd vulnerable to remote root exploit.3,14
Sandboxing protects your code from other people's bugs. By running library code in a suitable sandbox, your application can enjoy a library's features while preventing mayhem. This episode of Drill Bits presents a simple yet powerful sandboxing mechanism, showing how it provides strong confinement for unmodified library code—and how it can be defeated if proper precautions aren't taken. Our example code tarball sandboxes a widely used production library, and the "Drills" (exercises) section sketches enhancements that ambitious coders can implement.
We call our sandboxes filter sandboxes. Filter sandboxes are suitable for software that maps explicit inputs to explicit outputs but does not maintain persistent state or rely on indirect influences. For example, filter sandboxes are a good match for most compression, encryption, and mathematical libraries.
Filter sandboxes employ the Linux seccomp system call in its simplest mode of use.11 A process that calls seccomp in this mode can subsequently make no syscalls whatsoever except read, write, and exit. This restriction is so simple that the OS kernel implementation is likely correct, so Draconian that sandboxes hijacked from within have few opportunities for mischief, and so easy to impose that we expect to knock the sandboxing problem out of the park blindfolded.
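To make the mechanism concrete, here is a minimal sketch (ours, not code from the example tarball) of entering strict mode via prctl; the seccomp(2) man page documents both this prctl form and the newer seccomp() syscall:

    /* strict_demo.c: toy illustration of seccomp strict mode (a sketch, not
       the column's code).  After the prctl() call, only read, write, _exit,
       and sigreturn are permitted; any other syscall gets the process killed. */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;                        /* kernel refused; we are not sandboxed */
        const char msg[] = "hello from inside the sandbox\n";
        write(1, msg, strlen(msg));          /* allowed */
        /* open("/etc/passwd", O_RDONLY) here would draw SIGKILL */
        syscall(SYS_exit, 0);                /* _exit is allowed; exit_group is not */
        return 0;                            /* not reached */
    }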
FIGURE 1A: Filter sandbox
A filter sandbox is a dedicated process that runs library code under seccomp confinement, as shown in figure 1a. Driver software calls seccomp when the only open file descriptors are stdin, stdout, and stderr; subsequent syscalls to obtain new file descriptors (e.g., by opening a file) would cause the kernel to kill the process. The driver reads inputs from stdin, passes them as arguments to library functions, and writes library return values to stdout. The driver can gripe via stderr. In Unix-speak, our sandboxes are filters.10,15 Classic Unix filters include shell-pipeline favorites such as grep, tr, compress, and crypt. Unlike ordinary filters, however, our filter sandboxes are constrained to tread only the straight and narrow One True Path of Filtration.
"We should have some ways of coupling programs like garden hose—screw in another segment when it becomes necessary to massage data in another way."
—Doug McIlroy's pipe dream, 1964.10
Library code in a filter sandbox should interact with the outside world only via condoned "in-band" channels (i.e., stdin, stdout, and stderr). Any "out-of-band" data flow or influence is an unauthorized leak, shown in red in figure 1a. How can leaks happen without syscalls? We'll return to that question repeatedly.
We assume that crafted malicious input can give an attacker control of the entire filter sandbox. Trusted code that uses the sandboxed library therefore runs in a separate process that communicates with the filter sandbox over pipes. Figure 1b shows the relationship: Trusted code forks a child coprocess, the child execs a filter sandbox executable, and pipes connect parent and child. In this arrangement the filter sandbox restricts only the child, not the parent, which may do whatever its permissions allow.
FIGURE 1B: Coprocesses
A full-blown rampage will be difficult for a hijacked filter sandbox child process, but some shenanigans are easy. A wayward child can hog the CPU by spinning in a tight loop, or sow confusion by spewing malarkey at the parent. A hijacked compression library, for example, can replace the input "attack at dawn" with "surrender now" prior to compression. Given this disinformation threat, a major reason to block leaks into filter sandboxes is to prevent more insidious targeted "spear phooling": If a hijacked compression library learns via leaks that the $USER is kelly and today is 18th March, it can replace the input "Top o' the mornin'!" with "My head is KILLING me!" prior to compression.
We'll try to prevent inbound leaks by keeping environment variables out of the filter sandbox and by banning nearly all syscalls, including those that obtain the date/time. The worst threat is that a hijacked child process can return crafted malicious data designed to hijack the trusted parent. A prudent parent regards all data from the child with suspicion and handles it with caution.
Figure 2 lists a header file that makes it easy to create filter sandboxes and plug two kinds of leaks. Lines 7–10 define macros for reporting errors from within sandboxes. These macros avoid elaborate facilities such as fprintf, which make banned syscalls under the hood. The CHK macro is like standard assert but can't be disabled; it streamlines error checks. DUP2 (lines 12–13) helps with coprocess plumbing.
FIGURE 2: sandbox.h: sandbox creation, preliminaries, and diagnostics
The two macros on lines 15–21 should be called before a sandbox is created. The first prevents environment variables from leaking into a sandbox. Ideally this macro should use C23's memset_explicit function, which resists being optimized away by the compiler but which is not yet universally available. The second macro tries to prevent leaks from a sandbox via core dumps. We'll say more about core dumps later.
SANDBOX_CREATE_STRICT is the main event (lines 23–24). It calls seccomp in its simplest and most restrictive form, which bans all system calls except read, write, and exit. The kernel will kill a process that makes any other syscall after this seccomp call. Linux man pages document all syscalls in our code.11
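We don't reproduce figure 2 here, but the following sketch conveys the flavor of such a header. The names CHK, DUP2, and SANDBOX_CREATE_STRICT follow the column's description; the other names and all of the macro bodies are our own guesses:

    /* sandbox.h-flavored sketch (names from the column, bodies are our guesses). */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/resource.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <unistd.h>

    /* report errors without fprintf(), which may make banned syscalls */
    #define GRIPE(msg) \
        do { const char m_[] = "sandbox: " msg "\n"; \
             write(2, m_, sizeof m_ - 1); } while (0)

    /* like assert(), but cannot be compiled away */
    #define CHK(cond) \
        do { if (!(cond)) { GRIPE("CHK failed: " #cond); \
                            syscall(SYS_exit, 1); } } while (0)

    /* coprocess plumbing helper */
    #define DUP2(oldfd, newfd) CHK(dup2((oldfd), (newfd)) == (newfd))

    /* call before sandbox creation: scrub environment strings
       (memset_explicit() is preferable where C23 is available) */
    #define SCRUB_ENV(envp) \
        do { for (char **e_ = (envp); *e_ != NULL; e_++) \
                 memset(*e_, 0, strlen(*e_)); } while (0)

    /* call before sandbox creation: forbid ordinary core dumps */
    #define NO_CORE_DUMPS() \
        do { struct rlimit r_ = { 0, 0 }; \
             CHK(setrlimit(RLIMIT_CORE, &r_) == 0); } while (0)

    /* the main event: ban everything except read, write, and exit */
    #define SANDBOX_CREATE_STRICT() \
        CHK(prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) == 0)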
We'll walk through detailed examples of two common sandboxing patterns. In the first pattern, trusted code interacts with a filter-sandboxed library in simple call-return fashion. Call-return is adequate for many libraries, particularly mathematical libraries. In the second pattern, trusted code pumps a stream of data through a filter sandbox. The stream pattern covers libraries that compress, encrypt, or otherwise transform arbitrary byte sequences. In both patterns, our approach requires no changes to library code.
Figure 3 presents a toy "library" function that we'll use to illustrate the call-return pattern. Function sum adds up a given array of integers. The sum library neither knows nor cares about sandboxing.
FIGURE 3: sum_lib.c: toy "library"
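The figure itself isn't reproduced here; a plausible shape for the function (the exact signature is our guess) is simply:

    /* toy "library": add up n integers (our sketch of figure 3's shape) */
    int sum(const int *a, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }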
It doesn't take much code to confine sum to a filter sandbox in the manner of figure 1a. The driver in figure 4 creates a sandbox (lines 4–6) using the macros of figure 2, reads an input array from stdin (line 7), calls sum (line 9), and writes the return value to stdout (line 10). Limiting data transfers on pipes to PIPE_BUF bytes (lines 2–3) ensures atomicity (absent signals), sparing us the trouble of dealing with partial reads and writes.
FIGURE 4: sum_sandbox.c: driver that calls sum within filter sandbox
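A driver along these lines, sketched here against our hypothetical sandbox.h macros rather than the column's actual figure 4 code, might look like:

    /* sum_sandbox-style driver sketch: read ints from stdin, sum them inside
       the sandbox, write the result to stdout.  Uses the hypothetical macros
       sketched earlier, not the column's actual sandbox.h. */
    #include <limits.h>                   /* PIPE_BUF */
    #include <unistd.h>
    #include "sandbox.h"

    int sum(const int *a, int n);         /* the "library" */

    int main(int argc, char *argv[], char *envp[])
    {
        static int buf[PIPE_BUF / sizeof(int)];
        (void)argc; (void)argv;
        SCRUB_ENV(envp);                  /* plug an inbound leak */
        NO_CORE_DUMPS();                  /* plug an outbound leak */
        SANDBOX_CREATE_STRICT();          /* nothing but read/write/exit from here on */
        ssize_t n = read(0, buf, sizeof buf);
        CHK(n >= 0 && (size_t)n % sizeof(int) == 0);
        int result = sum(buf, (int)(n / (ssize_t)sizeof(int)));
        CHK(write(1, &result, sizeof result) == (ssize_t)sizeof result);
        syscall(SYS_exit, 0);
    }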
It's possible to use our sum filter sandbox in a shell pipeline, but it's really meant to serve trusted code as a coprocess in the manner of figure 1b.
Figure 5 shows a bare-bones "application" that invokes the sum filter sandbox as a coprocess. It forks a child process, which calls execve to execute sum_sandbox. The characteristic plumbing of coprocesses begins when pipe (line 3) creates two pipes, one for parent→child "calls" and the other for child→parent "returns." The child rewires these pipes to its stdin and stdout using DUP2 from figure 2 (lines 18–19). By the time sum_sandbox springs to life, its stdin is the read end of the p2c pipe and its stdout is the write end of the c2p pipe. Stevens and Rago explain the plumbing of coprocesses.16
FIGURE 5: sum_app.c: application that runs sum_sandbox as co-process
The parent process "calls" sum in the filter sandbox coprocess by writing an array of ints to the parent→child pipe (line 8). The parent then obtains the library function's return value by reading the child→parent pipe (line 9). The parent is simple because it relies on knowledge of the child's I/O behavior: The child always ingests all of its input first, and then emits all of its output, so the parent can safely ignore the possibility of tennis-match interactions or I/O deadlock. I/O misbehavior in the child could deadlock the coprocesses; this is another kind of denial-of-service attack, in addition to CPU hogging.
The "application" code of figure 5 takes several precautions worth noting. The child passes an empty set of environment variables when it execs the sandbox (ep on lines 16 and 20). The parent waits for the child to terminate, preventing the exited child from lingering as a "zombie" process, then confirms that the child exited normally by using WIFEXITED to inspect the kernel-set, and therefore trusted, bits of ws (lines 11–12). The parent ignores the child-set, and therefore untrusted, exit status stashed in the low bits of ws. Finally, note that both coprocesses share the same stderr stream. Their gripes may interleave into gibberish unless the shared stderr is a pipe, in which case writes smaller than PIPE_BUF are guaranteed to be atomic. To avert confusion, keep error messages short and run stderr through a pipe. More importantly, keep in mind that the shared stderr includes data from the untrusted child.
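Pulling the plumbing and precautions together, a parent along the lines of figure 5 might read as follows. This is our simplified sketch, reusing the hypothetical CHK and DUP2 macros; the executable path and the input array are made up:

    /* sum_app-style sketch: launch sum_sandbox as a coprocess, "call" it once. */
    #include <limits.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include "sandbox.h"                  /* hypothetical CHK, DUP2 */

    int main(void)
    {
        int p2c[2], c2p[2];                           /* parent->child, child->parent */
        CHK(pipe(p2c) == 0 && pipe(c2p) == 0);

        pid_t pid = fork();
        CHK(pid >= 0);
        if (pid == 0) {                               /* child: become the sandbox */
            char *av[] = { "sum_sandbox", NULL }, *ep[] = { NULL };
            DUP2(p2c[0], 0);                          /* read "calls" on stdin */
            DUP2(c2p[1], 1);                          /* write "returns" on stdout */
            close(p2c[0]); close(p2c[1]); close(c2p[0]); close(c2p[1]);
            execve("./sum_sandbox", av, ep);          /* empty environment */
            CHK(0);                                   /* exec failed */
        }
        close(p2c[0]); close(c2p[1]);                 /* parent keeps the other ends */

        int input[] = { 3, 1, 4, 1, 5, 9 }, result;
        CHK(write(p2c[1], input, sizeof input) == sizeof input);     /* "call" */
        close(p2c[1]);                                               /* EOF for child */
        CHK(read(c2p[0], &result, sizeof result) == sizeof result);  /* "return" */
        printf("sum = %d\n", result);                 /* parent isn't sandboxed */

        int ws;
        CHK(wait(&ws) == pid && WIFEXITED(ws));       /* trust only kernel-set bits */
        return 0;
    }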
We illustrate the stream pattern of coprocess interaction with a library that implements rot13 "encryption," which shifts alphabetic characters 13 positions rightward: "A" becomes "N", "B" becomes "O", etc., as in the equivalent tr command:
$ echo irk | tr A-Za-z N-ZA-Mn-za-m
vex
Figure 6 shows a driver that builds a filter sandbox around rot13, whose trivial implementation is not shown. The driver repeatedly reads a buffer's worth of bytes from stdin, applies rot13 to each byte, and writes the transformed buffer to stdout.
FIGURE 6: rot13_sandbox.c: driver that calls rot13 library function within sandbox
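Such a streaming driver might look roughly like the sketch below. The rot13 interface shown here is an assumption, and the macros come from our hypothetical sandbox.h sketch, not the column's code:

    /* rot13_sandbox-style sketch: pump stdin through rot13() to stdout. */
    #include <limits.h>
    #include <unistd.h>
    #include "sandbox.h"                    /* hypothetical macros sketched earlier */

    void rot13(char *buf, size_t n);        /* assumed library interface */

    int main(int argc, char *argv[], char *envp[])
    {
        static char buf[PIPE_BUF];
        (void)argc; (void)argv;
        SCRUB_ENV(envp);
        NO_CORE_DUMPS();
        SANDBOX_CREATE_STRICT();
        for (;;) {
            ssize_t n = read(0, buf, sizeof buf);
            CHK(n >= 0);
            if (n == 0) break;                        /* EOF: we're done */
            rot13(buf, (size_t)n);
            CHK(write(1, buf, (size_t)n) == n);
        }
        syscall(SYS_exit, 0);
    }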
A parent application process that pumps a stream of data through a child filter sandbox coprocess such as rot13_sandbox must be prepared to handle a wider range of I/O contingencies than in the simple call-return pattern. In general, the child may legitimately read and write arbitrary numbers of bytes in arbitrary order. I/O deadlock between coprocesses can easily happen unless the parent application process takes careful precautions.
Figure 7 presents generic application code that pumps a data stream from its own stdin through a coprocess to its own stdout. The coprocess plumbing is set up the same way as in the call-return application code of figure 5. The child code is also similar, except now the child passes to execve a suffix of its own argv. Instructions on line 12 show the intended mode of use for the run_coproc program of figure 7.
FIGURE 7: run_coproc.c: generic "application" that pumps stream through co-process
The biggest difference between sum_app of figure 5 and run_coproc of figure 7 is that the latter's parent code uses I/O multiplexing. This is necessary because the child may perform blocking reads and writes in arbitrary patterns, so the parent can't know whether its own blocking read or write will create deadlock. The poll syscall on line 21 tells the parent whether the child can accept more input or has written data that the parent may read, enabling the parent to respond appropriately without fear of blocking. Flags xi and xo, respectively, indicate whether additional data remains to be written to the child or read from the child. The parent polls as long as both are true (line 18), then cleans up any remaining data transfers without the need for I/O multiplexing (lines 31–32).
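The heart of such a parent can be sketched as follows. The structure and names here are ours, not the column's run_coproc.c, and the cleanup step is simplified:

    /* poll()-based pump sketch: to_child is the write end of the parent->child
       pipe; from_child is the read end of the child->parent pipe. */
    #include <poll.h>
    #include <unistd.h>
    #include "sandbox.h"                              /* hypothetical CHK */

    static void pump(int to_child, int from_child)
    {
        char buf[4096];
        int xi = 1,                                   /* input may remain for the child   */
            xo = 1;                                   /* output may remain from the child */
        while (xi && xo) {
            struct pollfd p[2] = {
                { .fd = to_child,   .events = POLLOUT },
                { .fd = from_child, .events = POLLIN  }
            };
            CHK(poll(p, 2, -1) > 0);
            if (p[0].revents & POLLERR) {             /* child's stdin is gone */
                close(to_child); xi = 0;
            } else if (p[0].revents & POLLOUT) {      /* child can accept input */
                ssize_t n = read(0, buf, sizeof buf); /* parent's own stdin */
                CHK(n >= 0);
                if (n == 0) { close(to_child); xi = 0; }   /* EOF: signal the child */
                else CHK(write(to_child, buf, (size_t)n) == n);
            }
            if (p[1].revents & (POLLIN | POLLHUP)) {  /* child produced output */
                ssize_t n = read(from_child, buf, sizeof buf);
                CHK(n >= 0);
                if (n == 0) xo = 0;                   /* child closed its stdout */
                else CHK(write(1, buf, (size_t)n) == n);   /* parent's own stdout */
            }
        }
        /* drain any remaining child output without multiplexing; leftover
           input handling is omitted here for brevity */
        for (ssize_t n; xo && (n = read(from_child, buf, sizeof buf)) > 0; )
            CHK(write(1, buf, (size_t)n) == n);
    }

A parent plumbed as in the figure 5 sketch could call pump(p2c[1], c2p[0]) once the fork/exec setup is complete.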
To demonstrate that filter sandboxes are practical for real-world libraries, our example tarball includes code to sandbox the widely used zlib compression library and run the sandboxed zlib as the child coprocess of the placeholder application in figure 7. Compression libraries are excellent candidates for filter sandboxing because security-critical programs such as sshd have inherited vulnerabilities by linking directly with compression libraries.3,14 Furthermore, compression filters torture-test our streaming framework by changing the sizes of input streams and by interleaving reads and writes in challenging ways.
Creating a filter-sandboxed zlib was easy because the zlib website provides driver code that exposes the library as a Unix filter.18 We modified the original zpipe.c driver into zpipe_sandbox.c by adding the sandbox macros of figure 2, replacing fread/fwrite with read/write, and making a few other changes.
Running our filter-sandboxed zlib, however, can be far from easy. The hidden machinery of modern dynamic linking and loading relies on system calls that seccomp bans. Address-space layout randomization employs banned syscalls too. Finally, zlib allocates memory dynamically, and the default malloc makes still more banned syscalls under the hood. A Gordian-knot solution to these problems is to statically link zpipe_sandbox.c with libz.a and with a small custom allocator that doles out statically allocated memory. Dynamically linking with libz.so is left as an exercise. An appropriately built/linked zlib filter sandbox works well as a coprocess for an application that must prevent zlib bugs from compromising the entire application.
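The allocator idea can be sketched as a bump allocator plugged into zlib's documented zalloc/zfree hooks. This is our illustration, not the tarball's allocator, and the arena size is a guess:

    /* Bump-allocator sketch: doles out statically allocated memory so the
       sandboxed zlib needs no brk/mmap syscalls. */
    #include <stddef.h>
    #include "zlib.h"

    static unsigned char arena[1 << 20];               /* fixed 1 MiB pool */
    static size_t used;

    voidpf bump_alloc(voidpf opaque, uInt items, uInt size)
    {
        (void)opaque;
        size_t need = (size_t)items * size;
        size_t aligned = (used + 15) & ~(size_t)15;    /* one-step 16-byte alignment */
        if (aligned > sizeof arena || need > sizeof arena - aligned)
            return Z_NULL;                             /* zlib treats NULL as out of memory */
        used = aligned + need;
        return arena + aligned;
    }

    void bump_free(voidpf opaque, voidpf address)
    {
        (void)opaque;
        (void)address;                                 /* a bump allocator never frees */
    }

    /* usage: strm.zalloc = bump_alloc; strm.zfree = bump_free; strm.opaque = Z_NULL; */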
More generally, our experience confirms that filter-sandbox coprocesses are versatile enough to handle two important library usage patterns: call-return and pass-through streaming.
If it sounds too good to be true, it is.
Versatility is nice, but are filter sandboxes secure? Not against libraries that are born evil. To see how easily a malicious library can circumvent sandboxing, compile our sum_lib.c library with the -DEVIL option. That yields the malevolent variant shown in figure 8, which uses a constructor to dig a tunnel beneath the sandbox before the sandbox is created. Constructors run before the driver's main function, and therefore before seccomp is called. When the sum library function is eventually called, it exfiltrates data out of the sandbox via the tunnel to a file waiting in /tmp/. Of course, constructors need not restrict themselves to theft; vandalism and other mayhem are possible too. Indeed, constructors that run before main cause plenty of trouble even in the absence of malice.4
FIGURE 8: Evil version of sum_lib.c: constructor preemptively undermines sandbox
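The gist of the trick, in a form we reconstructed (the file name and other details are ours, not the actual figure 8 code), is roughly:

    /* Evil sum_lib sketch: a constructor runs before main() and therefore
       before SANDBOX_CREATE_STRICT, so it can open an escape tunnel that
       remains usable inside the sandbox. */
    #include <fcntl.h>
    #include <unistd.h>

    static int tunnel = -1;

    __attribute__((constructor))
    static void dig_tunnel(void)
    {
        /* runs before the driver's main(), hence before seccomp */
        tunnel = open("/tmp/loot", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }

    int sum(const int *a, int n)
    {
        if (tunnel >= 0)                               /* exfiltrate: write() is allowed */
            write(tunnel, a, (size_t)n * sizeof *a);
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }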
Identifying and addressing all of the ways a malicious library can arrange for its own code to run before seccomp is left as an exercise for the reader. Rather than explore how malicious library code and trusted code can fight over the steering wheel before a sandbox is created, we restrict attention to library code that is not inherently malevolent but is vulnerable to being hijacked by crafted inputs while running inside a sandbox.
It's relatively easy for crafted input to trigger SIGSEGV in library code. Should we worry that core dumps might leak the crown jewels from a filter sandbox? A core file would be generated by the kernel, not by syscalls that seccomp blocks. Before any segfaulty library code can run, however, filter sandbox drivers use a macro from figure 2 to set RLIMIT_CORE to zero. Which means we can stop worrying about core dumps, right?
Wrong. On modern Linux systems, /proc/sys/kernel/core_pattern governs the disposition of core files. If this pseudo-file contains a pattern like "|/path/program", a core file is generated and piped into the specified program regardless of RLIMIT_CORE. Some Linux distributions install core-handling programs by default. Determining exactly what these programs do is left as an exercise; check for handlers explicitly designed to whisk core files away to a distant corporate mothership. Most importantly, stay in control. Configure your system to ensure that core files cannot abscond with sensitive information.
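One possible precaution, our suggestion rather than anything in the column's code, is for the trusted parent to inspect core_pattern before setting up any sandbox and refuse to proceed if cores are piped to a handler:

    /* Our suggested check (not from the tarball): run before sandbox creation,
       while open() is still permitted. */
    #include <fcntl.h>
    #include <unistd.h>
    #include "sandbox.h"                  /* hypothetical CHK/GRIPE */

    static void check_core_pattern(void)
    {
        char c = 0;
        int fd = open("/proc/sys/kernel/core_pattern", O_RDONLY);
        CHK(fd >= 0 && read(fd, &c, 1) >= 0);
        close(fd);
        if (c == '|') {                   /* "|/path/program": cores go to a handler */
            GRIPE("core_pattern pipes cores to a handler program");
            _exit(1);
        }
    }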
What about leaks into a filter sandbox? Earlier, we saw that seccomp bans the syscalls that access date/time. On many Linux systems, however, these syscalls are helpfully replaced, in the name of efficiency, by vDSO (virtual dynamic shared object) equivalents that work despite seccomp. There's still hope for spear phooling the day after St. Patrick's.
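The tarball includes a vDSO demo; the sketch below (ours, not the tarball's, and dependent on the platform's vDSO) illustrates the point: gettimeofday() can succeed inside a strict-seccomp sandbox because the vDSO answers it without entering the kernel.

    /* vdso_leak.c: the current time leaking into a strict sandbox on systems
       where gettimeofday() is vDSO-backed. */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timeval tv;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;
        if (gettimeofday(&tv, NULL) == 0)              /* no syscall if the vDSO handles it */
            write(1, &tv.tv_sec, sizeof tv.tv_sec);    /* raw seconds since the epoch */
        syscall(SYS_exit, 0);
        return 0;                                      /* not reached */
    }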
If we somehow prevent the date/time from leaking into a filter sandbox, would that constrain library code to map stdin to stdout deterministically? No, non-deterministic behavior remains easy thanks to unprivileged CPU instructions such as RDSEED, which offers gold-standard true-random numbers.9 An entropy-proof sandbox would be a tall order on today's computers, and seccomp provides no such thing.
An exhaustive security analysis of filter sandboxing is beyond the scope of this column. For now, we can summarize the evidence on hand: Accessing filter sandboxes as coprocesses is fairly easy, and this approach probably increases the cost of successful attacks on trusted applications via vulnerable libraries more than it increases the defender's costs. Filter sandboxes are likely better at hindering vandalism than completely blocking out-of-band data leakage. In-band disinformation is easy and some targeted deceptions are possible.
Our study of sandboxing highlights several broader principles. To design and implement robust defenses, defenders must adopt the hacker's whole-system perspective: Computers are "weird machines" brimming with emergent behaviors and surprising interactions that their designers never intended.2,5 Why? Because strong, clean isolation mechanisms such as seccomp are continually eroded and negated by frills conceived with little concern for security. Genuine security requires keeping the weird machine's weirdness in check via simplicity and logical coherence,6 virtues often shortchanged in favor of bells and whistles, time to market, development cost, runtime efficiency, convenience, portability, popularity, and profit. Thus, Americans "secure" their houses with garage door openers containing 50 million lines of code.7 Foolishness knows no bounds.
"There are no complex systems that are secure."
—Bruce Schneier
The most important security conflict is not between attacker and defender but between the defenders' priorities. Choose your priorities thoughtfully, recognizing that wherever you permit complexity you preclude security. Libraries written by strangers threaten to infuse your applications with staggering complexity. Filter sandboxes keep that complexity safely away from your own trusted, and trustworthy, code.
In one form or another, sandboxing has been a goal since the earliest days of timesharing. An Internet search for sandboxing and related terms such as fault isolation will turn up many mechanisms that have been tried over the decades and that are too numerous to mention here. That old ones keep getting discarded as new ones keep getting invented suggests that we still haven't got it right.
The out-of-band data leaks that we consider are the goofy kid brothers of covert channels, such as those involving electromagnetic emissions, which we do not consider and which have been studied for decades.13 One approach to preventing careless, casual ambient information flows is object-capability security in safe programming languages.12
"OpenSSL must die, for it will never get any better."
—Poul-Henning Kamp
Kamp describes how security-critical software can become a dilapidated dumping ground beyond hope of auditing.8 Ken Thompson devoted his Turing Award lecture17 to the limitations of source-code auditing and to a memorable rule about trusting anything designed and built by strangers: Don't.
Black reveals the patient ingenuity of the criminal mind in his account of a long career as a burglar, safe cracker, and jailbreaker.1
Grab the example code tarball at https://queue.acm.org/downloads/2025/Drill_Bits_15_example_code.tar.gz. You get all of the code discussed in this column, scripts to compile and run it, plus a vDSO demo, answers to some of the Drills below, and an alternative version of run_coproc.c contributed by a reviewer.
Drills

- $ echo 'int main(void){}' | gcc -x c - ; strace ./a.out
strace complex deterministic programs. Is getrandom() called? Why?
- The tarball's custom allocator, bpalloc.c, uses a kludgy loop to align pointers. Replace this with a more elegant one-step alignment.
- Before waiting for the child coprocess to exit, should the application process (figures 5 and 7) kill the child?
- Read kernel/seccomp.c and amend our sandboxes accordingly.
- Investigate prctl(PR_SET_DUMPABLE), madvise(MADV_DONTDUMP), and /proc/self/coredump_filter.
- What sandboxing mechanisms are available besides seccomp? What are the drawbacks and limitations of traditional mechanisms such as setuid and chroot? What are the best modern alternatives?
- sort is often used as a filter. Would a filter-sandboxed sort work as well as the original? (Hint: Consider external sorting.)
- […] sum_app.
- […] zlib to defeat this countermeasure by collusion between compression and decompression routines.
- Compare the performance of zpipe_sandbox stand-alone versus run via run_coproc.
- $ cat bigfile | ./run_coproc ... ./run_coproc /bin/cat > /dev/null
Would it pay to use splice for more efficient pipe-to-pipe copying in the copy function of run_coproc.c (lines 3–4 of figure 7)?
- What should be done with the stderr stream of the coprocesses of figures 5 and 7?
- Explore seccomp's more sophisticated filter mode. What syscalls can safely be permitted within a sandbox?
- Link zpipe_sandbox dynamically with zlib.so and run it with run_coproc. To prevent dynamic linking from making banned syscalls after seccomp, check out the LD_BIND_NOW environment variable and the "-z now" option of GNU ld.
- […] run_coproc runs it as a coprocess. Bonus points: Do the same for the alternative run_coproc_SP.
Zi Fan Tan suggested that Drill Bits explore seccomp-based sandboxes and showed us how to subvert them with constructors. Jacob Bachmeyer, John Dilley, Paul Lawrence, and Sergey Poznyakoff reviewed our example code meticulously and repeatedly, fixing several bugs and suggesting numerous improvements. Poznyakoff contributed a variant of run_coproc.c. Jon Bentley, Bachmeyer, Dilley, Lawrence, and Poznyakoff reviewed drafts of this column, again fixing bugs and suggesting numerous improvements. Hans Boehm provided insights and pointers related to constructors. Boehm serves on the C++ Standard Committee; Lawrence, from the Google Android team, wrote the initial seccomp filter for applications.
A zlib bug created a remote root exploit vulnerability in sshd more than 20 years before the recent ssh/xz affair.
Terence Kelly ([email protected]) and Edison Fuh aren't merely trusted, they're trustworthy.
Copyright © 2025 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 23, no. 2—