You're a diligent programmer. Where safety or security is at stake, you specify requirements precisely and implement code carefully, keeping things as simple as possible and inviting expert peer review at every step of the way. In the end, your software isn't merely trusted, it's trustworthy.
Your own code, however, isn't the only software in the applications for which you are responsible. Much of the code is in large, complex, opaque off-the-shelf libraries whose top priorities are features and speed, not security. Such libraries present a dilemma. Finding and fixing their flaws would be prohibitively expensive, but linking them into your application carries risk: If crafted malicious input exploits a vulnerability in a library, the attacker can hijack the enclosing process. Thanks to the ambient authority that processes inherit from users on mainstream operating systems, hijacked processes can wreak havoc on everything within reach, stealing secrets, vandalizing data, and holding files for ransom. Far from paranoia, your library anxiety is vindicated by a long and sorry history. Bugs in libraries linked by sshd, for example, have made sshd vulnerable to remote root exploit.3,14
Sandboxing protects your code from other people's bugs. By running library code in a suitable sandbox, your application can enjoy a library's features while preventing mayhem. This episode of Drill Bits presents a simple yet powerful sandboxing mechanism, showing how it provides strong confinement for unmodified library code—and how it can be defeated if proper precautions aren't taken. Our example code tarball sandboxes a widely used production library, and the "Drills" (exercises) section sketches enhancements that ambitious coders can implement.
We call our sandboxes filter sandboxes. Filter sandboxes are suitable for software that maps explicit inputs to explicit outputs but does not maintain persistent state or rely on indirect influences. For example, filter sandboxes are a good match for most compression, encryption, and mathematical libraries.
Filter sandboxes employ the Linux seccomp system call in its simplest mode of use.11 A process that calls seccomp in this mode can subsequently make no syscalls whatsoever except read, write, and exit. This restriction is so simple that the OS kernel implementation is likely correct, so Draconian that sandboxes hijacked from within have few opportunities for mischief, and so easy to impose that we expect to knock the sandboxing problem out of the park blindfolded.
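To make the mechanism concrete, here is a minimal sketch (ours, not code from the example tarball) of entering strict mode via prctl; the seccomp(2) man page documents both this prctl form and the newer seccomp() syscall:

    /* strict_demo.c: toy illustration of seccomp strict mode (a sketch, not
       the column's code).  After the prctl() call, only read, write, _exit,
       and sigreturn are permitted; any other syscall gets the process killed. */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;                        /* kernel refused; we are not sandboxed */
        const char msg[] = "hello from inside the sandbox\n";
        write(1, msg, strlen(msg));          /* allowed */
        /* open("/etc/passwd", O_RDONLY) here would draw SIGKILL */
        syscall(SYS_exit, 0);                /* _exit is allowed; exit_group is not */
        return 0;                            /* not reached */
    }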
FIGURE 1A: Filter sandbox
A filter sandbox is a dedicated process that runs library code under seccomp confinement, as shown in figure 1a. Driver software calls seccomp when the only open file descriptors are stdin, stdout, and stderr; subsequent syscalls to obtain new file descriptors (e.g., by opening a file) would cause the kernel to kill the process. The driver reads inputs from stdin, passes them as arguments to library functions, and writes library return values to stdout. The driver can gripe via stderr. In Unix-speak, our sandboxes are filters.10,15 Classic Unix filters include shell-pipeline favorites such as grep, tr, compress, and crypt. Unlike ordinary filters, however, our filter sandboxes are constrained to tread only the straight and narrow One True Path of Filtration.
"We should have some ways of coupling programs like garden hose—screw in another segment when it becomes necessary to massage data in another way."
—Doug McIlroy's pipe dream, 1964.10
Library code in a filter sandbox should interact with the outside world only via condoned "in-band" channels (i.e., stdin, stdout, and stderr). Any "out-of-band" data flow or influence is an unauthorized leak, shown in red in figure 1a. How can leaks happen without syscalls? We'll return to that question repeatedly.
We assume that crafted malicious input can give an attacker control of the entire filter sandbox. Trusted code that uses the sandboxed library therefore runs in a separate process that communicates with the filter sandbox over pipes. Figure 1b shows the relationship: Trusted code forks a child coprocess, the child execs a filter sandbox executable, and pipes connect parent and child. In this arrangement the filter sandbox restricts only the child, not the parent, which may do whatever its permissions allow.
FIGURE 1B: Coprocesses
A full-blown rampage will be difficult for a hijacked filter sandbox child process, but some shenanigans are easy. A wayward child can hog the CPU by spinning in a tight loop, or sow confusion by spewing malarkey at the parent. A hijacked compression library, for example, can replace the input "attack at dawn" with "surrender now" prior to compression. Given this disinformation threat, a major reason to block leaks into filter sandboxes is to prevent more insidious targeted "spear phooling": If a hijacked compression library learns via leaks that the $USER is kelly and today is 18th March, it can replace the input "Top o' the mornin'!" with "My head is KILLING me!" prior to compression.
We'll try to prevent inbound leaks by keeping environment variables out of the filter sandbox and by banning nearly all syscalls, including those that obtain the date/time. The worst threat is that a hijacked child process can return crafted malicious data designed to hijack the trusted parent. A prudent parent regards all data from the child with suspicion and handles it with caution.
Figure 2 lists a header file that makes it easy to create filter sandboxes and plug two kinds of leaks. Lines 7–10 define macros for reporting errors from within sandboxes. These macros avoid elaborate facilities such as fprintf, which make banned syscalls under the hood. The CHK macro is like standard assert but can't be disabled; it streamlines error checks. DUP2 (lines 12–13) helps with coprocess plumbing.
FIGURE 2: sandbox.h: sandbox creation, preliminaries, and diagnostics
The two macros on lines 15–21 should be called before a sandbox is created. The first prevents environment variables from leaking into a sandbox. Ideally this macro should use C23's memset_explicit function, which resists being optimized away by the compiler but which is not yet universally available. The second macro tries to prevent leaks from a sandbox via core dumps. We'll say more about core dumps later.
SANDBOX_CREATE_STRICT is the main event (lines 23–24). It calls seccomp in its simplest and most restrictive form, which bans all system calls except read, write, and exit. The kernel will kill a process that makes any other syscall after this seccomp call. Linux man pages document all syscalls in our code.11
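We don't reproduce figure 2 here, but the following sketch conveys the flavor of such a header. The names CHK, DUP2, and SANDBOX_CREATE_STRICT follow the column's description; the other names and all of the macro bodies are our own guesses:

    /* sandbox.h-flavored sketch (names from the column, bodies are our guesses). */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/resource.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <unistd.h>

    /* report errors without fprintf(), which may make banned syscalls */
    #define GRIPE(msg) \
        do { const char m_[] = "sandbox: " msg "\n"; \
             write(2, m_, sizeof m_ - 1); } while (0)

    /* like assert(), but cannot be compiled away */
    #define CHK(cond) \
        do { if (!(cond)) { GRIPE("CHK failed: " #cond); \
                            syscall(SYS_exit, 1); } } while (0)

    /* coprocess plumbing helper */
    #define DUP2(oldfd, newfd) CHK(dup2((oldfd), (newfd)) == (newfd))

    /* call before sandbox creation: scrub environment strings
       (memset_explicit() is preferable where C23 is available) */
    #define SCRUB_ENV(envp) \
        do { for (char **e_ = (envp); *e_ != NULL; e_++) \
                 memset(*e_, 0, strlen(*e_)); } while (0)

    /* call before sandbox creation: forbid ordinary core dumps */
    #define NO_CORE_DUMPS() \
        do { struct rlimit r_ = { 0, 0 }; \
             CHK(setrlimit(RLIMIT_CORE, &r_) == 0); } while (0)

    /* the main event: ban everything except read, write, and exit */
    #define SANDBOX_CREATE_STRICT() \
        CHK(prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) == 0)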
We'll walk through detailed examples of two common sandboxing patterns. In the first pattern, trusted code interacts with a filter-sandboxed library in simple call-return fashion. Call-return is adequate for many libraries, particularly mathematical libraries. In the second pattern, trusted code pumps a stream of data through a filter sandbox. The stream pattern covers libraries that compress, encrypt, or otherwise transform arbitrary byte sequences. In both patterns, our approach requires no changes to library code.
Figure 3 presents a toy "library" function that we'll use to illustrate the call-return pattern. Function sum adds up a given array of integers. The sum library neither knows nor cares about sandboxing.
FIGURE 3: sum_lib.c: toy "library"
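The figure itself isn't reproduced here; a plausible shape for the function (the exact signature is our guess) is simply:

    /* toy "library": add up n integers (our sketch of figure 3's shape) */
    int sum(const int *a, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }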
It doesn't take much code to confine sum to a filter sandbox in the manner of figure 1a. The driver in figure 4 creates a sandbox (lines 4–6) using the macros of figure 2, reads an input array from stdin (line 7), calls sum (line 9), and writes the return value to stdout (line 10). Limiting data transfers on pipes to PIPE_BUF bytes (lines 2–3) ensures atomicity (absent signals), sparing us the trouble of dealing with partial reads and writes.
FIGURE 4: sum_sandbox.c: driver that calls sum within filter sandbox
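A driver along these lines, sketched here against our hypothetical sandbox.h macros rather than the column's actual figure 4 code, might look like:

    /* sum_sandbox-style driver sketch: read ints from stdin, sum them inside
       the sandbox, write the result to stdout.  Uses the hypothetical macros
       sketched earlier, not the column's actual sandbox.h. */
    #include <limits.h>                   /* PIPE_BUF */
    #include <unistd.h>
    #include "sandbox.h"

    int sum(const int *a, int n);         /* the "library" */

    int main(int argc, char *argv[], char *envp[])
    {
        static int buf[PIPE_BUF / sizeof(int)];
        (void)argc; (void)argv;
        SCRUB_ENV(envp);                  /* plug an inbound leak */
        NO_CORE_DUMPS();                  /* plug an outbound leak */
        SANDBOX_CREATE_STRICT();          /* nothing but read/write/exit from here on */
        ssize_t n = read(0, buf, sizeof buf);
        CHK(n >= 0 && (size_t)n % sizeof(int) == 0);
        int result = sum(buf, (int)(n / (ssize_t)sizeof(int)));
        CHK(write(1, &result, sizeof result) == (ssize_t)sizeof result);
        syscall(SYS_exit, 0);
    }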
It's possible to use our sum filter sandbox in a shell pipeline, but it's really meant to serve trusted code as a coprocess in the manner of figure 1b.
Figure 5 shows a bare-bones "application" that invokes the sum filter sandbox as a coprocess. It forks a child process, which calls execve to execute sum_sandbox. The characteristic plumbing of coprocesses begins when pipe (line 3) creates two pipes, one for parent→child "calls" and the other for child→parent "returns." The child rewires these pipes to its stdin and stdout using DUP2 from figure 2 (lines 18–19). By the time sum_sandbox springs to life, its stdin is the read end of the p2c pipe and its stdout is the write end of the c2p pipe. Stevens and Rago explain the plumbing of coprocesses.16
FIGURE 5: sum_app.c: application that runs sum_sandbox as co-process
The parent process "calls" sum in the filter sandbox coprocess by writing an array of ints to the parent→child pipe (line 8). The parent then obtains the library function's return value by reading the child→parent pipe (line 9). The parent is simple because it relies on knowledge of the child's I/O behavior: The child always ingests all of its input first, and then emits all of its output, so the parent can safely ignore the possibility of tennis-match interactions or I/O deadlock. I/O misbehavior in the child could deadlock the coprocesses; this is another kind of denial-of-service attack, in addition to CPU hogging.
The "application" code of figure 5 takes several precautions worth noting. The child passes an empty set of environment variables when it execs the sandbox (ep on lines 16 and 20). The parent waits for the child to terminate, preventing the exited child from lingering as a "zombie" process, then confirms that the child exited normally by using WIFEXITED to inspect the kernel-set, and therefore trusted, bits of ws (lines 11–12). The parent ignores the child-set, and therefore untrusted, exit status stashed in the low bits of ws. Finally, note that both coprocesses share the same stderr stream. Their gripes may interleave into gibberish unless the shared stderr is a pipe, in which case writes smaller than PIPE_BUF are guaranteed to be atomic. To avert confusion, keep error messages short and run stderr through a pipe. More importantly, keep in mind that the shared stderr includes data from the untrusted child.
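Pulling the plumbing and precautions together, a parent along the lines of figure 5 might read as follows. This is our simplified sketch, reusing the hypothetical CHK and DUP2 macros; the executable path and the input array are made up:

    /* sum_app-style sketch: launch sum_sandbox as a coprocess, "call" it once. */
    #include <limits.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include "sandbox.h"                  /* hypothetical CHK, DUP2 */

    int main(void)
    {
        int p2c[2], c2p[2];                           /* parent->child, child->parent */
        CHK(pipe(p2c) == 0 && pipe(c2p) == 0);

        pid_t pid = fork();
        CHK(pid >= 0);
        if (pid == 0) {                               /* child: become the sandbox */
            char *av[] = { "sum_sandbox", NULL }, *ep[] = { NULL };
            DUP2(p2c[0], 0);                          /* read "calls" on stdin */
            DUP2(c2p[1], 1);                          /* write "returns" on stdout */
            close(p2c[0]); close(p2c[1]); close(c2p[0]); close(c2p[1]);
            execve("./sum_sandbox", av, ep);          /* empty environment */
            CHK(0);                                   /* exec failed */
        }
        close(p2c[0]); close(c2p[1]);                 /* parent keeps the other ends */

        int input[] = { 3, 1, 4, 1, 5, 9 }, result;
        CHK(write(p2c[1], input, sizeof input) == sizeof input);     /* "call" */
        close(p2c[1]);                                               /* EOF for child */
        CHK(read(c2p[0], &result, sizeof result) == sizeof result);  /* "return" */
        printf("sum = %d\n", result);                 /* parent isn't sandboxed */

        int ws;
        CHK(wait(&ws) == pid && WIFEXITED(ws));       /* trust only kernel-set bits */
        return 0;
    }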
We illustrate the stream pattern of coprocess interaction with a library that implements rot13 "encryption," which shifts alphabetic characters 13 positions rightward: "A" becomes "N", "B" becomes "O", etc., as in the equivalent tr command:
$ echo irk | tr A-Za-z N-ZA-Mn-za-m
vex
Figure 6 shows a driver that builds a filter sandbox around rot13, whose trivial implementation is not shown. The driver repeatedly reads a buffer's worth of bytes from stdin, applies rot13 to each byte, and writes the transformed buffer to stdout.
FIGURE 6: rot13_sandbox.c: driver that calls rot13 library function within sandbox
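Such a streaming driver might look roughly like the sketch below. The rot13 interface shown here is an assumption, and the macros come from our hypothetical sandbox.h sketch, not the column's code:

    /* rot13_sandbox-style sketch: pump stdin through rot13() to stdout. */
    #include <limits.h>
    #include <unistd.h>
    #include "sandbox.h"                    /* hypothetical macros sketched earlier */

    void rot13(char *buf, size_t n);        /* assumed library interface */

    int main(int argc, char *argv[], char *envp[])
    {
        static char buf[PIPE_BUF];
        (void)argc; (void)argv;
        SCRUB_ENV(envp);
        NO_CORE_DUMPS();
        SANDBOX_CREATE_STRICT();
        for (;;) {
            ssize_t n = read(0, buf, sizeof buf);
            CHK(n >= 0);
            if (n == 0) break;                        /* EOF: we're done */
            rot13(buf, (size_t)n);
            CHK(write(1, buf, (size_t)n) == n);
        }
        syscall(SYS_exit, 0);
    }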
A parent application process that pumps a stream of data through a child filter sandbox coprocess such as rot13_sandbox must be prepared to handle a wider range of I/O contingencies than in the simple call-return pattern. In general, the child may legitimately read and write arbitrary numbers of bytes in arbitrary order. I/O deadlock between coprocesses can easily happen unless the parent application process takes careful precautions.
Figure 7 presents generic application code that pumps a data stream from its own stdin through a coprocess to its own stdout. The coprocess plumbing is set up the same way as in the call-return application code of figure 5. The child code is also similar, except now the child passes to execve a suffix of its own argv. Instructions on line 12 show the intended mode of use for the run_coproc program of figure 7.
FIGURE 7: run_coproc.c: generic "application" that pumps stream through co-process
The biggest difference between sum_app of figure 5 and run_coproc of figure 7 is that the latter's parent code uses I/O multiplexing. This is necessary because the child may perform blocking reads and writes in arbitrary patterns, so the parent can't know whether its own blocking read or write will create deadlock. The poll syscall on line 21 tells the parent whether the child can accept more input or has written data that the parent may read, enabling the parent to respond appropriately without fear of blocking. Flags xi and xo, respectively, indicate whether additional data remains to be written to the child or read from the child. The parent polls as long as both are true (line 18), then cleans up any remaining data transfers without the need for I/O multiplexing (lines 31–32).
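The heart of such a parent can be sketched as follows. The structure and names here are ours, not the column's run_coproc.c, and the cleanup step is simplified:

    /* poll()-based pump sketch: to_child is the write end of the parent->child
       pipe; from_child is the read end of the child->parent pipe. */
    #include <poll.h>
    #include <unistd.h>
    #include "sandbox.h"                              /* hypothetical CHK */

    static void pump(int to_child, int from_child)
    {
        char buf[4096];
        int xi = 1,                                   /* input may remain for the child   */
            xo = 1;                                   /* output may remain from the child */
        while (xi && xo) {
            struct pollfd p[2] = {
                { .fd = to_child,   .events = POLLOUT },
                { .fd = from_child, .events = POLLIN  }
            };
            CHK(poll(p, 2, -1) > 0);
            if (p[0].revents & POLLERR) {             /* child's stdin is gone */
                close(to_child); xi = 0;
            } else if (p[0].revents & POLLOUT) {      /* child can accept input */
                ssize_t n = read(0, buf, sizeof buf); /* parent's own stdin */
                CHK(n >= 0);
                if (n == 0) { close(to_child); xi = 0; }   /* EOF: signal the child */
                else CHK(write(to_child, buf, (size_t)n) == n);
            }
            if (p[1].revents & (POLLIN | POLLHUP)) {  /* child produced output */
                ssize_t n = read(from_child, buf, sizeof buf);
                CHK(n >= 0);
                if (n == 0) xo = 0;                   /* child closed its stdout */
                else CHK(write(1, buf, (size_t)n) == n);   /* parent's own stdout */
            }
        }
        /* drain any remaining child output without multiplexing; leftover
           input handling is omitted here for brevity */
        for (ssize_t n; xo && (n = read(from_child, buf, sizeof buf)) > 0; )
            CHK(write(1, buf, (size_t)n) == n);
    }

A parent plumbed as in the figure 5 sketch could call pump(p2c[1], c2p[0]) once the fork/exec setup is complete.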
To demonstrate that filter sandboxes are practical for real-world libraries, our example tarball includes code to sandbox the widely used zlib compression library and run the sandboxed zlib as the child coprocess of the placeholder application in figure 7. Compression libraries are excellent candidates for filter sandboxing because security-critical programs such as sshd have inherited vulnerabilities by linking directly with compression libraries.3,14 Furthermore, compression filters torture-test our streaming framework by changing the sizes of input streams and by interleaving reads and writes in challenging ways.
Creating a filter-sandboxed zlib was easy because the zlib website provides driver code that exposes the library as a Unix filter.18 We modified the original zpipe.c driver into zpipe_sandbox.c by adding the sandbox macros of figure 2, replacing fread/fwrite with read/write, and making a few other changes.
Running our filter-sandboxed zlib, however, can be far from easy. The hidden machinery of modern dynamic linking and loading relies on system calls that seccomp bans. Address-space layout randomization employs banned syscalls too. Finally, zlib allocates memory dynamically, and the default malloc makes still more banned syscalls under the hood. A Gordian-knot solution to these problems is to statically link zpipe_sandbox.c with libz.a and with a small custom allocator that doles out statically allocated memory. Dynamically linking with libz.so is left as an exercise. An appropriately built/linked zlib filter sandbox works well as a coprocess for an application that must prevent zlib bugs from compromising the entire application.
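The allocator idea can be sketched as a bump allocator plugged into zlib's documented zalloc/zfree hooks. This is our illustration, not the tarball's allocator, and the arena size is a guess:

    /* Bump-allocator sketch: doles out statically allocated memory so the
       sandboxed zlib needs no brk/mmap syscalls. */
    #include <stddef.h>
    #include "zlib.h"

    static unsigned char arena[1 << 20];               /* fixed 1 MiB pool */
    static size_t used;

    voidpf bump_alloc(voidpf opaque, uInt items, uInt size)
    {
        (void)opaque;
        size_t need = (size_t)items * size;
        size_t aligned = (used + 15) & ~(size_t)15;    /* one-step 16-byte alignment */
        if (aligned > sizeof arena || need > sizeof arena - aligned)
            return Z_NULL;                             /* zlib treats NULL as out of memory */
        used = aligned + need;
        return arena + aligned;
    }

    void bump_free(voidpf opaque, voidpf address)
    {
        (void)opaque;
        (void)address;                                 /* a bump allocator never frees */
    }

    /* usage: strm.zalloc = bump_alloc; strm.zfree = bump_free; strm.opaque = Z_NULL; */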
More generally, our experience confirms that filter-sandbox coprocesses are versatile enough to handle two important library usage patterns: call-return and pass-through streaming.
If it sounds too good to be true, it is.
Versatility is nice, but are filter sandboxes secure? Not against libraries that are born evil. To see how easily a malicious library can circumvent sandboxing, compile our sum_lib.c library with the -DEVIL option. That yields the malevolent variant shown in figure 8, which uses a constructor to dig a tunnel beneath the sandbox before the sandbox is created. Constructors run before the driver's main function, and therefore before seccomp is called. When the sum library function is eventually called, it exfiltrates data out of the sandbox via the tunnel to a file waiting in /tmp/. Of course, constructors need not restrict themselves to theft; vandalism and other mayhem are possible too. Indeed, constructors that run before main cause plenty of trouble even in the absence of malice.4
FIGURE 8: Evil version of sum_lib.c: constructor preemptively undermines sandbox
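The gist of the trick, in a form we reconstructed (the file name and other details are ours, not the actual figure 8 code), is roughly:

    /* Evil sum_lib sketch: a constructor runs before main() and therefore
       before SANDBOX_CREATE_STRICT, so it can open an escape tunnel that
       remains usable inside the sandbox. */
    #include <fcntl.h>
    #include <unistd.h>

    static int tunnel = -1;

    __attribute__((constructor))
    static void dig_tunnel(void)
    {
        /* runs before the driver's main(), hence before seccomp */
        tunnel = open("/tmp/loot", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }

    int sum(const int *a, int n)
    {
        if (tunnel >= 0)                               /* exfiltrate: write() is allowed */
            write(tunnel, a, (size_t)n * sizeof *a);
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }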
Identifying and addressing all of the ways a malicious library can arrange for its own code to run before seccomp is left as an exercise for the reader. Rather than explore how malicious library code and trusted code can fight over the steering wheel before a sandbox is created, we restrict attention to library code that is not inherently malevolent but is vulnerable to being hijacked by crafted inputs while running inside a sandbox.
It's relatively easy for crafted input to trigger SIGSEGV in library code. Should we worry that core dumps might leak the crown jewels from a filter sandbox? A core file would be generated by the kernel, not by syscalls that seccomp blocks. Before any segfaulty library code can run, however, filter sandbox drivers use a macro from figure 2 to set RLIMIT_CORE to zero. Which means we can stop worrying about core dumps, right?
Wrong. On modern Linux systems, /proc/sys/kernel/core_pattern governs the disposition of core files. If this pseudo-file contains a pattern like "|/path/program", a core file is generated and piped into the specified program regardless of RLIMIT_CORE. Some Linux distributions install core-handling programs by default. Determining exactly what these programs do is left as an exercise; check for handlers explicitly designed to whisk core files away to a distant corporate mothership. Most importantly, stay in control. Configure your system to ensure that core files cannot abscond with sensitive information.
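One possible precaution, our suggestion rather than anything in the column's code, is for the trusted parent to inspect core_pattern before setting up any sandbox and refuse to proceed if cores are piped to a handler:

    /* Our suggested check (not from the tarball): run before sandbox creation,
       while open() is still permitted. */
    #include <fcntl.h>
    #include <unistd.h>
    #include "sandbox.h"                  /* hypothetical CHK/GRIPE */

    static void check_core_pattern(void)
    {
        char c = 0;
        int fd = open("/proc/sys/kernel/core_pattern", O_RDONLY);
        CHK(fd >= 0 && read(fd, &c, 1) >= 0);
        close(fd);
        if (c == '|') {                   /* "|/path/program": cores go to a handler */
            GRIPE("core_pattern pipes cores to a handler program");
            _exit(1);
        }
    }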
What about leaks into a filter sandbox? Earlier, we saw that seccomp bans the syscalls that access date/time. On many Linux systems, however, these syscalls are helpfully replaced, in the name of efficiency, by vDSO (virtual dynamic shared object) equivalents that work despite seccomp. There's still hope for spear phooling the day after St. Patrick's.
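The tarball includes a vDSO demo; the sketch below (ours, not the tarball's, and dependent on the platform's vDSO) illustrates the point: gettimeofday() can succeed inside a strict-seccomp sandbox because the vDSO answers it without entering the kernel.

    /* vdso_leak.c: the current time leaking into a strict sandbox on systems
       where gettimeofday() is vDSO-backed. */
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timeval tv;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;
        if (gettimeofday(&tv, NULL) == 0)              /* no syscall if the vDSO handles it */
            write(1, &tv.tv_sec, sizeof tv.tv_sec);    /* raw seconds since the epoch */
        syscall(SYS_exit, 0);
        return 0;                                      /* not reached */
    }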
If we somehow prevent the date/time from leaking into a filter sandbox, would that constrain library code to map stdin to stdout deterministically? No, non-deterministic behavior remains easy thanks to unprivileged CPU instructions such as RDSEED, which offers gold-standard true-random numbers.9 An entropy-proof sandbox would be a tall order on today's computers, and seccomp provides no such thing.
An exhaustive security analysis of filter sandboxing is beyond the scope of this column. For now, we can summarize the evidence on hand: Accessing filter sandboxes as coprocesses is fairly easy, and this approach probably increases the cost of successful attacks on trusted applications via vulnerable libraries more than it increases the defender's costs. Filter sandboxes are likely better at hindering vandalism than completely blocking out-of-band data leakage. In-band disinformation is easy and some targeted deceptions are possible.
Our study of sandboxing highlights several broader principles. To design and implement robust defenses, defenders must adopt the hacker's whole-system perspective: Computers are "weird machines" brimming with emergent behaviors and surprising interactions that their designers never intended.2,5 Why? Because strong, clean isolation mechanisms such as seccomp are continually eroded and negated by frills conceived with little concern for security. Genuine security requires keeping the weird machine's weirdness in check via simplicity and logical coherence,6 virtues often shortchanged in favor of bells and whistles, time to market, development cost, runtime efficiency, convenience, portability, popularity, and profit. Thus, Americans "secure" their houses with garage door openers containing 50 million lines of code.7 Foolishness knows no bounds.
"There are no complex systems that are secure."
—Bruce Schneier
The most important security conflict is not between attacker and defender but between the defenders' priorities. Choose your priorities thoughtfully, recognizing that wherever you permit complexity you preclude security. Libraries written by strangers threaten to infuse your applications with staggering complexity. Filter sandboxes keep that complexity safely away from your own trusted, and trustworthy, code.
In one form or another, sandboxing has been a goal since the earliest days of timesharing. An Internet search for sandboxing and related terms such as fault isolation will turn up many mechanisms that have been tried over the decades and that are too numerous to mention here. That old ones keep getting discarded as new ones keep getting invented suggests that we still haven't got it right.
The out-of-band data leaks that we consider are the goofy kid brothers of covert channels, such as those involving electromagnetic emissions, which we do not consider and which have been studied for decades.13 One approach to preventing careless, casual ambient information flows is object-capability security in safe programming languages.12
"OpenSSL must die, for it will never get any better."
—Poul-Henning Kamp
Kamp describes how security-critical software can become a dilapidated dumping ground beyond hope of auditing.8 Ken Thompson devoted his Turing Award lecture17 to the limitations of source-code auditing and to a memorable rule about trusting anything designed and built by strangers: Don't.
Black reveals the patient ingenuity of the criminal mind in his account of a long career as a burglar, safe cracker, and jailbreaker.1
Grab the example code tarball at https://queue.acm.org/downloads/2025/Drill_Bits_15_example_code.tar.gz. You get all of the code discussed in this column, scripts to compile and run it, plus a vDSO demo, answers to some of the Drills below, and an alternative version of run_coproc.c contributed by a reviewer.
Drills

- $ echo 'int main(void){}' | gcc -x c - ; strace ./a.out
strace complex deterministic programs. Is getrandom() called? Why?
- The tarball's custom allocator, bpalloc.c, uses a kludgy loop to align pointers. Replace this with a more elegant one-step alignment.
- Before waiting for the child coprocess to exit, should the application process (figures 5 and 7) kill the child?
- Read kernel/seccomp.c and amend our sandboxes accordingly.
- Investigate prctl(PR_SET_DUMPABLE), madvise(MADV_DONTDUMP), and /proc/self/coredump_filter.
- What sandboxing mechanisms are available besides seccomp? What are the drawbacks and limitations of traditional mechanisms such as setuid and chroot? What are the best modern alternatives?
- sort is often used as a filter. Would a filter-sandboxed sort work as well as the original? (Hint: Consider external sorting.)
- […] sum_app.
- […] zlib to defeat this countermeasure by collusion between compression and decompression routines.
- Compare the performance of zpipe_sandbox stand-alone versus run via run_coproc.
- $ cat bigfile | ./run_coproc ... ./run_coproc /bin/cat > /dev/null
Would it pay to use splice for more efficient pipe-to-pipe copying in the copy function of run_coproc.c (lines 3–4 of figure 7)?
- What should be done with the stderr stream of the coprocesses of figures 5 and 7?
- Explore seccomp's more sophisticated filter mode. What syscalls can safely be permitted within a sandbox?
- Link zpipe_sandbox dynamically with zlib.so and run it with run_coproc. To prevent dynamic linking from making banned syscalls after seccomp, check out the LD_BIND_NOW environment variable and the "-z now" option of GNU ld.
- […] run_coproc runs it as a coprocess. Bonus points: Do the same for the alternative run_coproc_SP.
Zi Fan Tan suggested that Drill Bits explore seccomp-based sandboxes and showed us how to subvert them with constructors. Jacob Bachmeyer, John Dilley, Paul Lawrence, and Sergey Poznyakoff reviewed our example code meticulously and repeatedly, fixing several bugs and suggesting numerous improvements. Poznyakoff contributed a variant of run_coproc.c. Jon Bentley, Bachmeyer, Dilley, Lawrence, and Poznyakoff reviewed drafts of this column, again fixing bugs and suggesting numerous improvements. Hans Boehm provided insights and pointers related to constructors. Boehm serves on the C++ Standard Committee; Lawrence, from the Google Android team, wrote the initial seccomp filter for applications.
A zlib bug created a remote root exploit vulnerability in sshd more than 20 years before the recent ssh/xz affair.
Terence Kelly ([email protected]) and Edison Fuh aren't merely trusted, they're trustworthy.
Copyright © 2025 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 23, no. 2—