udp: drain IP_RECVERR error queue to fix 100% CPU busy-loop #29473
Open: robobun wants to merge 3 commits into main from farm/e52dd978/fix-udp-errqueue-busy-loop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| // https://github.com/oven-sh/bun/issues/29436 | ||
| // | ||
| // Sending a UDP datagram to a port with no listener on Linux generates an | ||
| // ICMP "port unreachable". With IP_RECVERR enabled the kernel queues this on | ||
| // the socket's error queue and raises EPOLLERR. The error queue must be read | ||
| // with recvmsg(MSG_ERRQUEUE) — plain recvmsg reports the pending error once | ||
| // but does not dequeue it, so EPOLLERR stays level-triggered and epoll_wait | ||
| // busy-loops at 100% CPU forever. | ||
| | ||
| import { expect, test } from "bun:test"; | ||
| import { bunEnv, bunExe, isLinux } from "harness"; | ||
| | ||
| // Port 1 (tcpmux) is privileged (< 1024) so the kernel never auto-assigns it | ||
| // and no userspace process binds it in CI — guarantees ICMP port-unreachable | ||
| // without a bind→close→send TOCTOU race on an ephemeral port. | ||
| const deadPort = 1; | ||
| | ||
| // Each test spawns a subprocess that sleeps up to ~3s; debug/ASAN builds add | ||
| // several seconds of startup, so budget well above the 5s default. | ||
| const timeout = 20_000; | ||
| | ||
| async function run(script: string) { | ||
| await using proc = Bun.spawn({ | ||
| cmd: [bunExe(), "-e", script], | ||
| env: bunEnv, | ||
| stdout: "pipe", | ||
| stderr: "inherit", | ||
| }); | ||
| const [stdout, exitCode] = await Promise.all([proc.stdout.text(), proc.exited]); | ||
| | ||
| const result = JSON.parse(stdout.trim()); | ||
| // The error handler should fire exactly once per ICMP error, not zero | ||
| // (event swallowed) and not unbounded (re-fired every loop tick). | ||
| expect(result.errorCount).toBe(1); | ||
| expect(result.errorCode).toBe("ECONNREFUSED"); | ||
| // The socket must remain open and usable after a transient ICMP error — | ||
| // a "fix" that closes it on error would also stop the busy-loop. | ||
| expect(result.closed).toBe(false); | ||
| // The buggy build burns ~100% CPU (cpuMs ≈ wallMs). A fixed build idles; | ||
| // even under debug/ASAN it stays well below 75% of wall time. | ||
| expect(result.cpuMs).toBeLessThan(result.wallMs * 0.75); | ||
| expect(exitCode).toBe(0); | ||
| } | ||
| | ||
| // IP_RECVERR is Linux-only; on other platforms the send either silently | ||
| // succeeds (no ICMP surfaced on unconnected sockets) or errors synchronously. | ||
| test.skipIf(!isLinux)( | ||
| "Bun.udpSocket: ICMP error does not busy-loop the event loop", | ||
| () => | ||
| run(/* js */ ` | ||
| let errorCount = 0; | ||
| let errorCode; | ||
| const { promise: gotError, resolve } = Promise.withResolvers(); | ||
| const socket = await Bun.udpSocket({ | ||
| socket: { | ||
| error(err) { | ||
| errorCount++; | ||
| errorCode ??= err?.code; | ||
| resolve(); | ||
| }, | ||
| }, | ||
| }); | ||
| socket.send("x", ${deadPort}, "127.0.0.1"); | ||
| await Promise.race([gotError, Bun.sleep(2000)]); | ||
| | ||
| // Measure CPU time consumed while the process should be idle. With the | ||
| // bug, the event loop spins and CPU time ~= wall time. | ||
| const wallMs = 1000; | ||
| const before = process.cpuUsage(); | ||
| await Bun.sleep(wallMs); | ||
| const after = process.cpuUsage(before); | ||
| const cpuMs = (after.user + after.system) / 1000; | ||
| | ||
| const closed = socket.closed; | ||
| socket.close(); | ||
| console.log(JSON.stringify({ errorCount, errorCode, closed, cpuMs, wallMs })); | ||
| `), | ||
| timeout, | ||
| ); | ||
| | ||
| // Connected UDP: the kernel's udp_err() sets sk->sk_err AND enqueues to the | ||
| // error queue. Draining the error queue via MSG_ERRQUEUE clears sk_err (in | ||
| // sock_dequeue_err_skb) for the last ICMP entry; a follow-up SO_ERROR read | ||
| // consumes any residual sk_err so EPOLLERR deasserts. | ||
| test.skipIf(!isLinux)( | ||
| "Bun.udpSocket (connected): ICMP error does not busy-loop the event loop", | ||
| () => | ||
| run(/* js */ ` | ||
| let errorCount = 0; | ||
| let errorCode; | ||
| const { promise: gotError, resolve } = Promise.withResolvers(); | ||
| | ||
| const socket = await Bun.udpSocket({ | ||
| connect: { hostname: "127.0.0.1", port: ${deadPort} }, | ||
| socket: { | ||
| error(err) { | ||
| errorCount++; | ||
| errorCode ??= err?.code; | ||
| resolve(); | ||
| }, | ||
| }, | ||
| }); | ||
| socket.send("x"); | ||
| await Promise.race([gotError, Bun.sleep(2000)]); | ||
| | ||
| const wallMs = 1000; | ||
| const before = process.cpuUsage(); | ||
| await Bun.sleep(wallMs); | ||
| const after = process.cpuUsage(before); | ||
| const cpuMs = (after.user + after.system) / 1000; | ||
| | ||
| const closed = socket.closed; | ||
| socket.close(); | ||
| console.log(JSON.stringify({ errorCount, errorCode, closed, cpuMs, wallMs })); | ||
| `), | ||
| timeout, | ||
| ); | ||
| | ||
| test.skipIf(!isLinux)( | ||
| "node:dgram: ICMP error does not busy-loop the event loop", | ||
| () => | ||
| run(/* js */ ` | ||
| const dgram = require("node:dgram"); | ||
| let errorCount = 0; | ||
| let errorCode; | ||
| const { promise: gotError, resolve } = Promise.withResolvers(); | ||
| const sock = dgram.createSocket("udp4"); | ||
| sock.on("error", err => { | ||
| errorCount++; | ||
| errorCode ??= err?.code; | ||
| resolve(); | ||
| }); | ||
| sock.send("x", ${deadPort}, "127.0.0.1"); | ||
| await Promise.race([gotError, Bun.sleep(2000)]); | ||
| | ||
| const wallMs = 1000; | ||
| const before = process.cpuUsage(); | ||
| await Bun.sleep(wallMs); | ||
| const after = process.cpuUsage(before); | ||
| const cpuMs = (after.user + after.system) / 1000; | ||
| | ||
| // Still bound and usable — address() throws ERR_SOCKET_DGRAM_NOT_RUNNING | ||
| // if the socket was torn down. | ||
| let closed; | ||
| try { sock.address(); closed = false; } catch { closed = true; } | ||
| sock.close(); | ||
| console.log(JSON.stringify({ errorCount, errorCode, closed, cpuMs, wallMs })); | ||
| `), | ||
| timeout, | ||
| ); | ||
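
All three tests in the diff rely on the same idle check: sample `process.cpuUsage()` around a sleep, then compare the CPU time consumed to the wall-clock window. The following is a minimal sketch of that check in isolation, not code from this PR. The names `CpuUsage`, `cpuMillis`, `looksIdle`, and `measureIdle` are hypothetical, and the 0.75 threshold is copied from the test's assertion. The CPU sampler is passed in as a parameter so the sketch does not assume any particular runtime typings.

```typescript
// Shape of the value returned by process.cpuUsage(): CPU time in microseconds.
interface CpuUsage {
  user: number;
  system: number;
}

// Convert a cpuUsage delta (microseconds) into milliseconds of CPU time.
function cpuMillis(delta: CpuUsage): number {
  return (delta.user + delta.system) / 1000;
}

// True when CPU time stayed strictly below `maxFraction` of the wall-clock window.
// A spinning event loop is expected to burn roughly wallMs of CPU, so it fails this.
function looksIdle(delta: CpuUsage, wallMs: number, maxFraction = 0.75): boolean {
  return cpuMillis(delta) < wallMs * maxFraction;
}

// Hypothetical helper: sample CPU usage across a sleep of `wallMs` milliseconds.
// `cpuUsage` is injected; under Node or Bun one could pass
// `prev => process.cpuUsage(prev)`.
async function measureIdle(
  cpuUsage: (previous?: CpuUsage) => CpuUsage,
  wallMs: number,
): Promise<{ cpuMs: number; wallMs: number; idle: boolean }> {
  const before = cpuUsage();
  await new Promise<void>(resolve => setTimeout(resolve, wallMs));
  const delta = cpuUsage(before); // delta relative to `before`
  return { cpuMs: cpuMillis(delta), wallMs, idle: looksIdle(delta, wallMs) };
}
```

Splitting the arithmetic (`cpuMillis`, `looksIdle`) from the sampling keeps the threshold logic checkable without waiting on a timer. The tests in the diff inline the same computation inside the spawned script instead.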