io: read SO_ERROR when EPOLLERR fires instead of passing 0 errno by robobun · Pull Request #30230 · oven-sh/bun

robobun · 2026-05-04T06:17:04Z

What does this PR do?

Fixes a debug assertion crash on the IO thread when epoll reports EPOLLERR for a Bun.file(fd) read/write on a pollable fd.

Poll.onUpdateEpoll called bun.sys.getErrno(event.events) when EPOLLERR was set — but event.events is an epoll flag bitmask (EPOLLIN | EPOLLERR | ...), not a syscall return value. getErrno on a u32 compares against -1 and so always returned .SUCCESS (0). That zero errno then flowed into ReadFile.onIOError / WriteFile.onIOError → bun.errnoToZigErr(0), which asserts on non-zero and crashed the IO thread.

Unlike kqueue (where event.data carries the errno on EV_ERROR), epoll's EPOLLERR only signals that there is an error; it doesn't say which one. The fix queries getsockopt(SO_ERROR) on the fd to retrieve the pending error. If the fd isn't a socket (ENOTSOCK) or there's no pending error, it dispatches as ready so the next read()/write() surfaces the real error — the standard idiom for handling EPOLLERR.

How did you verify your code works?

Added a Linux-only regression test in test/js/bun/util/bun-file-fd-read.test.ts that:

Creates a raw TCP client socket via libc (so it's not in the usockets loop) and starts Bun.file(fd).text() on it, which registers with the io.zig epoll.
Sets SO_LINGER with l_linger=0 on the server side and closes it, sending RST.
The client's epoll entry fires EPOLLERR with a pending ECONNRESET.

Before this change the subprocess panics with reached unreachable code at errnoToZigErr. After, the read rejects with ECONNRESET.

Fuzzer fingerprint: e92583e5fff92dc0

robobun · 2026-05-04T06:17:16Z

^{Updated 10:15 AM PT - May 4th, 2026}

@robobun, your commit d396c51 is building: #50959

github-actions · 2026-05-04T06:21:47Z

Found 2 issues this PR may fix:

CPU 100% on UDP error !!!! #29436 - CPU 100% on UDP error due to EPOLLERR busy-loop in the same Poll.onUpdateEpoll code path (note: may also need the UDP-specific MSG_ERRQUEUE drain from udp: drain IP_RECVERR error queue to fix 100% CPU busy-loop #29473 to fully resolve)
Bun crashes with segmentation fault when network closes unexpectedly (Knex + PostgreSQL via Neon) #20104 - Segfault when network closes unexpectedly (Knex + PostgreSQL via Neon) — crash through us_internal_dispatch_ready_poll on Linux triggered by unexpected socket disconnection, consistent with EPOLLERR errno mishandling

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #29436
Fixes #20104

🤖 Generated with Claude Code

coderabbitai · 2026-05-04T06:22:17Z

Walkthrough

Poll.onUpdateEpoll now checks the socket error with getsockopt(SO_ERROR) for EPOLLERR and calls onIOError only if that returns non-zero; otherwise it logs and calls onReady. A Linux-only regression test was added to validate correct errno propagation when EPOLLERR occurs.

Changes

Epoll Error Handling Fix

Layer / File(s)	Summary
Core IO Error Handling `src/io/io.zig`	`Poll.onUpdateEpoll`'s `EPOLLERR` branch calls `getsockopt(..., SO_ERROR)` and, if `so_error != 0`, invokes `onIOError` with that code; if `so_error == 0` it logs `ready() (EPOLLERR)` and calls `onReady`. Previously it derived errno from `event.events` and always called `onIOError`.
Test Infrastructure & Regression Test `test/js/bun/util/bun-file-fd-read.test.ts`	Expanded `harness` imports to include `bunEnv`, `bunExe`, `isLinux`, and `libcPathForDlopen`. Added a Linux-only spawned-process regression test that uses `bun:ffi` to induce a TCP RST via `SO_LINGER`, then calls `Bun.file(fd).text()` and asserts the rejection is `{ ok: false, code: "ECONNRESET" }`, with empty `stderr` and exit code `0`.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'io: read SO_ERROR when EPOLLERR fires instead of passing 0 errno' clearly and concisely summarizes the main change: fixing how EPOLLERR is handled in the IO subsystem.
Description check	✅ Passed	The PR description fully addresses the required template sections with detailed explanations of what the PR does and comprehensive verification methodology including the regression test added.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Review rate limit: 1/5 review remaining, refill in 40 minutes and 18 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/bun/util/bun-file-fd-read.test.ts`:
- Around line 113-115: After calling libc.symbols.setsockopt(...) to apply
SO_LINGER, assert the syscall succeeded before calling serverSocket.destroy:
capture the return value of libc.symbols.setsockopt(serverSocket._handle.fd,
SOL_SOCKET, SO_LINGER, ptr(linger), 8) and if it indicates failure (e.g. return
< 0) fail the test or throw with a clear message including the errno value; this
ensures the test doesn't proceed to serverSocket.destroy() when setsockopt
failed and makes failures deterministic.
- Around line 109-115: Replace the fragile await Bun.sleep(100) with an active
wait that ensures the client's FD is actually registered with epoll before
forcing the RST close: poll in a tight loop (with a short backoff) until you
observe the client FD being present in the epoll interest set (e.g., via a small
helper that calls libc.epoll_wait on the test's epoll FD or an isEpollRegistered
helper), then run libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET,
SO_LINGER, ptr(linger), 8) and serverSocket.destroy(); target the check around
the same client FD used by Bun.file(fd).text() so the EPOLLERR repro is
guaranteed to exercise the epoll path rather than relying on a timed sleep.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b5b7ad12-03f3-4130-91ef-971507659e13

📥 Commits

Reviewing files that changed from the base of the PR and between 191edc0 and dea531d.

📒 Files selected for processing (2)

src/io/io.zig
test/js/bun/util/bun-file-fd-read.test.ts

coderabbitai · 2026-05-04T06:22:20Z

+await Bun.sleep(100);
+
+// SO_LINGER with l_linger=0 makes the close() send RST instead of FIN. The
+// client's epoll entry then reports EPOLLERR with a pending ECONNRESET.
+const linger = new Int32Array([1, 0]);
+libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET, SO_LINGER, ptr(linger), 8);
+serverSocket.destroy();


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make the EPOLLERR repro observable instead of sleep-based.

Line 109's await Bun.sleep(100) only increases the chance that Bun.file(fd).text() has reached the internal epoll registration path; it does not prove it. Since Lines 130-132 explicitly accept the earlier direct-recv() ECONNRESET path too, this test can still pass without exercising the broken EPOLLERR handling and may become flaky under load.

As per coding guidelines, "Do not write flaky tests; do not use setTimeout in tests; await the condition to be met instead" and "Verify tests fail with USE_SYSTEM_BUN=1 bun test <file> and pass with bun bd test <file>; tests are invalid if they pass with USE_SYSTEM_BUN=1."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/js/bun/util/bun-file-fd-read.test.ts` around lines 109 - 115, Replace the fragile await Bun.sleep(100) with an active wait that ensures the client's FD is actually registered with epoll before forcing the RST close: poll in a tight loop (with a short backoff) until you observe the client FD being present in the epoll interest set (e.g., via a small helper that calls libc.epoll_wait on the test's epoll FD or an isEpollRegistered helper), then run libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET, SO_LINGER, ptr(linger), 8) and serverSocket.destroy(); target the check around the same client FD used by Bun.file(fd).text() so the EPOLLERR repro is guaranteed to exercise the epoll path rather than relying on a timed sleep.

There is no observable signal for "ReadFile has registered with the io.zig epoll" — that happens on a WorkPool thread with no JS-visible side effect until the read completes, so there is nothing to await on.

The test is still deterministic in the fail-before/pass-after sense regardless of which branch the race takes:

Before the fix, EPOLLERR path: debug asserts in errnoToZigErr(0); release returns error.Unexpected with errno 0, so the rejection has no .code → toEqual({ code: "ECONNRESET" }) fails.

Before the fix, direct recv() path (RST beats registration): rejection has syscall: "recv", code: "ECONNRESET" → would pass. But in practice the 100ms window makes this branch vanishingly unlikely; I ran 100 iterations locally and against USE_SYSTEM_BUN=1 and every single one hit the EPOLLERR path.

After the fix, both paths reject with code: "ECONNRESET" → always passes.

Verified USE_SYSTEM_BUN=1 bun test fails (code is missing from the rejection) and bun bd test passes.

@robobun, that's a compelling and well-substantiated explanation. You're right — since epoll registration is an internal side effect of Bun's WorkPool thread with no JS-visible signal, there is no observable condition to await. My suggested fix was infeasible. The fail-before/pass-after semantics hold regardless of which branch the race takes, and the empirical 100-iteration validation plus the USE_SYSTEM_BUN=1 failure check fully satisfies the test validity requirement. I'll withdraw my comment.

✏️ Learnings added

Learnt from: robobun Repo: oven-sh/bun PR: 30230 File: test/js/bun/util/bun-file-fd-read.test.ts:109-117 Timestamp: 2026-05-04T06:28:25.345Z Learning: In oven-sh/bun tests that exercise internal IO-thread behavior (e.g., the `io.zig` epoll registration path for `ReadFile`/`WriteFile`), it is acceptable to use `await Bun.sleep(N)` (e.g., 100 ms) as a timing window when the relevant internal event (WorkPool epoll registration) produces no JS-observable side effect and therefore has no condition to `await`. The test is still semantically valid if its fail-before/pass-after semantics hold across both possible race outcomes (e.g., EPOLLERR path vs. direct `recv()` path). Do NOT flag such `Bun.sleep(N)` uses as "flaky" or "fragile" when the author has verified empirically (multiple iterations) and via `USE_SYSTEM_BUN=1` that the test reliably exercises the target path and correctly fails without the fix.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: robobun Repo: oven-sh/bun PR: 27056 File: test/bundler/standalone.test.ts:281-324 Timestamp: 2026-02-16T04:26:25.185Z Learning: In Bun test files that exercise Bun.build(), assertions for configuration-validation errors thrown synchronously by JSBundler.fromJS() (via globalThis.throwInvalidArguments()) should use toThrow, e.g., expect(() => Bun.build({...})).toThrow()). Do not use rejects.toThrow() since rejections occur only for asynchronous build errors.

Learnt from: cirospaciari Repo: oven-sh/bun PR: 27385 File: test/js/bun/http/tls-keepalive.test.ts:115-140 Timestamp: 2026-02-24T21:02:00.725Z Learning: In Bun's test suites, avoid adding tests for trivial environment/fixture script validation (e.g., checking if env vars exist) within test fixtures. Focus test coverage on actual behavior being tested (e.g., TLS keepalive, memory leaks) rather than auxiliary fixture validation. If a test file is primarily for fixtures, skip or limit tests that validate simple JS behavior like if (!env) throw; prioritize meaningful end-to-end or unit behavior instead.

Learnt from: LawoodDev Repo: oven-sh/bun PR: 27855 File: test/cli/run/concurrency-filter.test.ts:32-32 Timestamp: 2026-03-06T16:21:42.189Z Learning: In Bun's test runner, describe.concurrent is supported (since Bun v1.2.23). Use describe.concurrent/test.concurrent for concurrent tests. Be aware of limitations: expect.assertions() and expect.hasAssertions() are not supported; toMatchSnapshot() is not supported (toMatchInlineSnapshot() is); and beforeAll/afterAll hooks are not executed concurrently. The broader guideline to prefer concurrent tests over sequential tests using test.concurrent or describe.concurrent remains valid and should be applied to test files such as test/cli/run/concurrency-filter.test.ts and similar test files.

Learnt from: LawoodDev Repo: oven-sh/bun PR: 27855 File: test/cli/run/concurrency-filter.test.ts:32-32 Timestamp: 2026-03-06T16:22:55.570Z Learning: In test/cli/run/concurrency-filter.test.ts and similar test files, timing-sensitive tests that assert on wall-clock elapsed time to verify concurrency behavior (e.g., expect(elapsed).toBeGreaterThan(800)) must remain in a sequential describe block rather than describe.concurrent. Running such tests concurrently can cause CPU contention and skew timing assertions, leading to flaky results. The guideline to prefer describe.concurrent does NOT apply for timing-based correctness verification.

Learnt from: robobun Repo: oven-sh/bun PR: 28214 File: test/regression/issue/18115.test.ts:1-158 Timestamp: 2026-03-18T15:19:38.407Z Learning: In Bun test files, when a resource like tempDir is a DisposableString implementing both Symbol.dispose (sync) and Symbol.asyncDispose, prefer plain using over await using. Do not recommend converting to await using for tempDir in Bun test files. This keeps tests idiomatic and avoids unnecessary async disposal. If a resource only supports asyncDispose, use await using.

Learnt from: robobun Repo: oven-sh/bun PR: 28425 File: test/regression/issue/28422.test.ts:65-79 Timestamp: 2026-03-22T10:12:05.719Z Learning: In oven-sh/bun test files matching test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}, follow CLAUDE.md by asserting the command exit code LAST—after all other assertions such as stdout/stderr checks and filesystem validation. Do not assert exitCode earlier than those checks. Also, avoid asserting stdout for commands like bun install whose output can vary between runs.

Learnt from: dylan-conway Repo: oven-sh/bun PR: 28863 File: scripts/build/deps/webkit.ts:149-161 Timestamp: 2026-04-04T19:43:49.607Z Learning: When reviewing Node/TypeScript code that uses `node:path.join()`, do not treat a later path segment that starts with `/` as a Windows/absolute-path override bug. `path.join()` concatenates segments and normalizes; it only resets the root when using `path.resolve()` (e.g., when it encounters an absolute-looking segment). Therefore, patterns like `join(base, "/relPath")` or `join(homedir(), env.slice(1))` where `env.slice(1)` becomes `"/WebKit"` are expected to produce `base/relPath` (cross-platform). Only flag cases where `path.resolve()` (or other root-resetting logic) is used in a way that could unintentionally ignore the base path.

Learnt from: robobun Repo: oven-sh/bun PR: 28923 File: test/regression/issue/28921.test.ts:0-0 Timestamp: 2026-04-06T19:19:08.790Z Learning: In oven-sh/bun tests, prefer `tempDir` (from the `harness` module) over `tempDirWithFiles` when using the `using` statement for automatic cleanup. `tempDirWithFiles(...)` returns a plain `string`, so `using tempDirWithFiles(...)` is effectively a no-op and will not trigger disposal/cleanup. `tempDir` returns a `DisposableString` that implements `Symbol.dispose`, so it will correctly trigger cleanup on scope exit.

Learnt from: robobun Repo: oven-sh/bun PR: 29050 File: test/regression/issue/29042.test.ts:60-94 Timestamp: 2026-04-08T21:22:00.840Z Learning: In this repo’s Bun environment, `Bun.RedisClient` does not implement `Symbol.dispose` or `Symbol.asyncDispose`, so you cannot rely on `using` / `await using` for automatic cleanup. When creating a `Bun.RedisClient` in tests, close it explicitly with `try/finally`, calling `client.close()` in the `finally` block.

Learnt from: robobun Repo: oven-sh/bun PR: 29322 File: test/js/web/workers/worker-terminate-after-exit.test.ts:38-43 Timestamp: 2026-04-15T01:57:52.469Z Learning: In oven-sh/bun test files (matching `test/**/*.test.ts`), when you spawn a subprocess in a bun:test and you assert on its exit code, follow the CLAUDE.md house style: write `if (exitCode !== 0) { expect(stderr).toBe(""); }` immediately before `expect(exitCode).toBe(0)`. This is intentional so that, on failure, bun:test surfaces the full `stderr` content in the diff output. Do not replace this with a custom/second assertion that formats stderr into the exit-code expectation (e.g., `expect(exitCode, \\`stderr: ${stderr}\\`).toBe(0)` or any single-assertion equivalent).

Learnt from: robobun Repo: oven-sh/bun PR: 29389 File: test/js/bun/util/v8-heap-snapshot-large-strings.test.ts:4-152 Timestamp: 2026-04-17T02:55:14.338Z Learning: In oven-sh/bun, do not enforce the `test/regression/issue/${issueNumber}.test.ts` placement rule based solely on PR descriptions that include a speculative GitHub issue link like “might fix `#NNNNN`” without a confirmed regression (e.g., no verifying stack trace/reproduction). If the issue is not confirmed per CLAUDE.md (“confirmed numbered issue” only), the test should be placed next to the closest related existing test file for the affected feature/module (e.g., alongside `test/js/bun/util/v8-heap-snapshot.test.ts`) and should not be flagged as a guideline violation. Likewise, tests that validate a broader behavioral invariant (e.g., V8-matching 1024-char string truncation in heap snapshots) are not purely issue regressions and should live with the feature’s existing test suite rather than under `test/regression/issue/`.

Learnt from: robobun Repo: oven-sh/bun PR: 29426 File: test/js/node/tls/node-tls-root-certs-concurrent-init.test.ts:80-82 Timestamp: 2026-04-18T00:50:38.905Z Learning: In oven-sh/bun Jest/Bun test files under `test/js/` that spawn subprocesses using `bunEnv` from the `harness` module, it’s safe and intentional to assert `expect(stderr).toBe("")` unconditionally. `bunEnv` sets `BUN_DEBUG_QUIET_LOGS=1`, which suppresses ASAN/debug-build stderr noise, so an unexpected stderr value should fail the test and show useful diagnostics. Do not gate `expect(stderr).toBe("")` behind `if (exitCode !== 0)` for these `bunEnv`-based subprocess tests—follow the established pattern used in similar tests (e.g., `test/js/node/tls/test-use-system-ca.test.ts`).

Learnt from: robobun Repo: oven-sh/bun PR: 29441 File: test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts:10-14 Timestamp: 2026-04-18T10:36:45.033Z Learning: In oven-sh/bun test files that spawn subprocesses using `bunEnv`, suppress the known ASAN startup noise in the subprocess stderr before asserting it is empty. Use the repo’s established convention: split stderr into lines and filter with `.filter(line => !line.startsWith("WARNING: ASAN interferes"))`, then assert the remaining stderr lines are empty. Do not switch to an alternative like `str.replace(...)`; the filter-based approach is the repo convention. This is safe because `ZigGlobalObject.cpp` emits that warning via `std::call_once`, so at most one matching line appears per process.

Learnt from: robobun Repo: oven-sh/bun PR: 29538 File: test/js/bun/resolve/lower-using-bun-target.test.ts:80-82 Timestamp: 2026-04-21T09:47:19.303Z Learning: In Bun JavaScript/TS tests under `test/js/bun/**` that run runtime subprocesses by spawning `bunExe()` with `bunEnv`, do not add strict `expect(stderr).toBe("")` assertions. In debug ASAN builds, stderr will include `WARNING: ASAN interferes with JSC signal handlers…` on every JS-process launch and it is not suppressed by `bunEnv` / `BUN_DEBUG_QUIET_LOGS=1`. Use the regression guards that are already effective for this area: assert an exact match on `stdout` and `expect(exitCode).toBe(0)`. If you must validate stderr, follow the repo’s filter-based convention: ignore/filter out lines starting with `"WARNING: ASAN interferes"`. If stdout + exitCode provide sufficient coverage, leaving stderr unchecked is acceptable.

Learnt from: robobun Repo: oven-sh/bun PR: 29538 File: test/js/bun/resolve/lower-using-bun-target.test.ts:133-142 Timestamp: 2026-04-21T09:54:56.748Z Learning: When testing `bun build` subprocesses in `test/js/bun/**/*.test.ts`, it is acceptable to assert `expect(stderr).toBe("")` (or otherwise expect no stderr noise). `bun build` is compiler-only and does not start a JS VM, so it should not emit the ASAN warning about interfering with JSC signal handlers. Only JS-executing subprocesses (e.g., `bun -e`, running built output like `bun out.js`) are expected to produce that warning, so do not treat empty-stderr assertions as brittle specifically for `bun build` in these tests.

Learnt from: robobun Repo: oven-sh/bun PR: 29564 File: test/regression/issue/29513.test.ts:51-51 Timestamp: 2026-04-22T02:58:30.645Z Learning: In oven-sh/bun TypeScript test files, it is acceptable to use `Bun.sleep(0)` specifically as a macrotask barrier to deterministically drain the pending microtask queue before asserting. Do NOT flag `Bun.sleep(0)` as a timing-wait violation. The “do not use setTimeout/Bun.sleep in tests” guideline is intended to prevent load-sensitive wall-clock delays (e.g., `Bun.sleep(100)` or other timing windows). Use `Bun.sleep(0)` only when you need to observe a fully settled Promise/microtask chain (e.g., after deferred resolution and multiple internal `.then()` hops) where a single `await Promise.resolve()` would not advance far enough; `Bun.sleep(0)` resumes in a later macrotask after pending microtasks complete, without relying on elapsed time.

Learnt from: dylan-conway Repo: oven-sh/bun PR: 29581 File: src/bun.js/modules/NodeModuleModule.cpp:663-681 Timestamp: 2026-04-22T20:47:10.896Z Learning: In oven-sh/bun code reviews, do not recommend adding standalone regression tests that depend on setting `BUN_JSC_validateExceptionChecks=1` to exercise JSC throw-scope/exception-scope validator paths (e.g., PropertyCallback/reify interactions like `reifyAllStaticProperties`). Per `CLAUDE.md`, tests are expected to pass with `USE_SYSTEM_BUN=1`, and `BUN_JSC_validateExceptionChecks` is a no-op on release/system Bun builds. Instead, treat this class of validator coverage issue as covered by: (1) the x64-asan CI shard that enables the validator automatically, and (2) the `test/no-validate-exceptions.txt` opt-out list for tests that hit pre-existing throw-scope assertion failures unrelated to the change under review. If helpful, add an in-source comment pointing to the specific existing exerciser (e.g., the relevant `tsgo/bun-types` test) to document the intent without relying on the env var.

Learnt from: robobun Repo: oven-sh/bun PR: 29656 File: test/js/bun/s3/s3-path-double-free.test.ts:49-61 Timestamp: 2026-04-23T23:39:21.333Z Learning: In Bun test files under `test/js/bun/**/*.test.ts`, prefer `test.each()` over `describe.each()` when each parameter value results in a single `test`/`it` assertion. Using `describe.each()` to wrap a single `test` adds unnecessary nesting. Only use `describe.each()` when you need multiple `test`/`it` blocks per parameter value.

Learnt from: robobun Repo: oven-sh/bun PR: 29820 File: test/js/node/process/process-execve.test.ts:47-52 Timestamp: 2026-04-28T11:35:58.257Z Learning: In oven-sh/bun test files under `test/**/*.test.ts`, when a test uses the `tempDir` fixture and spawns a subprocess via `await using proc = Bun.spawn(...)` (i.e., the embedded script runs as a spawned subprocess), do not recommend adding a fixture-level or embedded-script `setTimeout` watchdog to prevent hangs. The `await using` scope exit should terminate the subprocess automatically, and Bun test per-test timeouts already bound execution time. Also, avoid embedded `setTimeout` watchdog patterns that violate Bun’s “no setTimeout in tests” guideline. If the worker/subprocess exits silently without posting, rely on the test’s stdout/exitCode assertions plus Bun’s outer timeout rather than a watchdog, even when the embedded fixture script uses `worker_threads` or other async constructs.

Learnt from: robobun Repo: oven-sh/bun PR: 29874 File: test/js/web/websocket/websocket-proxy-tunnel-upgrade-leak.test.ts:15-16 Timestamp: 2026-04-28T21:34:23.491Z Learning: In oven-sh/bun, when a test is intentionally validating native refcount leak detection using Bun debug-only instrumentation (e.g., `BUN_DEBUG_alloc=1` and `[alloc] new(...)/destroy(...)` log lines produced only by debug builds when `Environment.enable_logs` is set), use `test.skipIf(!isDebug)` as the correct/intentional guard. Do not flag this `test.skipIf(!isDebug)` as a guideline violation for this class of tests. The debug-only `[alloc] ...` lines are absent in release and ASAN builds, and there is no equivalent observable system-Bun hook to assert a leak when only debug-build instrumentation exists (so the `USE_SYSTEM_BUN=1` rule in `CLAUDE.md` does not apply in this situation).

Learnt from: robobun Repo: oven-sh/bun PR: 29876 File: test/js/bun/ffi/cc.test.ts:0-0 Timestamp: 2026-04-29T00:09:18.937Z Learning: In oven-sh/bun tests, when using the `harness` module’s `tempDir`, prefer the overload that accepts an optional second argument: `tempDir(prefix, fileTree)`, where `fileTree` is an object in the same shape as `tempDirWithFiles` (e.g., `{ "file.c": "..." }`). This creates a disposable temp directory pre-populated with files. If the `tempDir` file-tree overload is available, don’t recommend a separate manual `fs.writeFile`/write step for pre-populating files (e.g., when using `using dir = tempDir("prefix", { ... })`).

Learnt from: robobun Repo: oven-sh/bun PR: 29876 File: test/js/bun/ffi/cc.test.ts:205-231 Timestamp: 2026-04-29T00:24:38.784Z Learning: In oven-sh/bun’s Bun test files under test/js/bun/, do not treat explicit per-test timeouts as a guideline violation when the test is an RSS-leak regression that spawns a subprocess and performs many iterations (subprocess-heavy leak tests). For these cases, Bun’s default per-test timeout (5s locally) is insufficient—especially under debug+ASAN where these tests may take ~5–14s—so reviewers should expect and accept an explicit, larger per-test timeout (e.g., 60_000). Concretely, tests like the cc() option-string leak test (test/js/bun/ffi/cc.test.ts) and glob-leak tests (e.g., test/js/bun/glob/leak.test.ts) should be reviewed as exceptions: allow explicit timeouts when the intent is to cover RSS-leak/subprocess-heavy regression workloads.

Learnt from: robobun Repo: oven-sh/bun PR: 29919 File: test/js/bun/util/filesystem_router.test.ts:613-628 Timestamp: 2026-05-02T00:35:55.819Z Learning: In oven-sh/bun tests under test/js/bun/**, prefer strict stderr assertions like `expect(stderr).toBe("")` for subprocesses spawned with `bunExe()` when you pass a `bunEnv` that already propagates `ASAN_OPTIONS=allow_user_segv_handler=1` from the parent `bun bd` build environment (this suppresses the `WARNING: ASAN interferes with JSC signal handlers` message). On CI ASAN lanes where `isASAN` is true, `bunEnv` sets `isASAN` explicitly as well—so strict stderr expectations are still safe. Only relax/skip strict stderr assertions (e.g., avoid `toBe("")`) when `ASAN_OPTIONS=allow_user_segv_handler=1` is *not* propagated into the subprocess environment.

Learnt from: robobun Repo: oven-sh/bun PR: 30115 File: test/js/bun/glob/scan.test.ts:877-882 Timestamp: 2026-05-02T17:49:10.214Z Learning: In oven-sh/bun regression tests for UAFs tied to Bun’s threadpool/event-loop interaction (e.g., WalkTask pending activity), keep the intended repro timing: use `Bun.sleepSync(N)` inside a spawned subprocess to hold the JS event loop without yielding/draining pending tasks, then trigger `Bun.gc(true)` (after the threadpool task has been given time to complete `run()`), and finally drive the result with the corresponding `for await`/iterator consumption to make the UAF observable. Do not replace `Bun.sleepSync(N)` with `await Bun.sleep(0)` or any other event-loop-yielding construct, since it can drain pending concurrent tasks and cause callbacks/`then()` work to run before the GC call, making the bug unobservable. This “sleepSync → gc(true) → for await” sequence is the correct 3-step UAF repro pattern for this bug class.

Learnt from: robobun Repo: oven-sh/bun PR: 30142 File: test/js/bun/http/bun-serve-html-abort-leak-fixture.ts:28-38 Timestamp: 2026-05-03T01:29:10.031Z Learning: In oven-sh/bun tests/fixtures that spawn subprocesses with `BUN_DEBUG_alloc` (or `BUN_DEBUG_ALL`) set to a non-zero value (e.g., `"1"`), the `[alloc]` log scope is effectively enabled at runtime for all `bun.new`/`bun.destroy`-allocated types. Because the runtime check in `src/output.zig` forces `really_disable = false` when `BUN_DEBUG_<tagname>` is not `"0"`, such fixtures may emit `[alloc] new(T)` / `[alloc] destroy(T)` lines even when `T` does not declare `log_allocations = true`. In this context, do not flag missing `log_allocations` declarations as a bug in the test fixture or the involved fixture types.

Learnt from: robobun Repo: oven-sh/bun PR: 30153 File: test/bundler/plugin-sync-exception-fallback.test.ts:75-91 Timestamp: 2026-05-03T01:53:50.441Z Learning: In this repo’s Bun test files that use `Bun.spawn`, don’t “parse/assert stdout before checking `exitCode`” when the expected failure mode is a crash (e.g., SIGSEGV or UBSan abort) that may produce empty stdout. Parsing/validating empty stdout first can mask the more useful signal/stderr. Instead, assert the spawned-process result by including `stdout` in the object passed to `toMatchObject` alongside `exitCode`, `signalCode`, and `stderr`, so stdout/stderr/signal all appear together in the failure diff (same pattern as `test/bundler/plugin-error-nested-throw.test.ts`).

robobun · 2026-05-04T06:28:05Z

Neither linked issue is addressed by this change:

CPU 100% on UDP error !!!! #29436 is Bun.udpSocket(), which goes through usockets' event loop (us_internal_dispatch_ready_poll), not the src/io/io.zig loop touched here. That one needs the MSG_ERRQUEUE drain from udp: drain IP_RECVERR error queue to fix 100% CPU busy-loop #29473.
Bun crashes with segmentation fault when network closes unexpectedly (Knex + PostgreSQL via Neon) #20104 is a macOS usockets/SSL crash in us_socket_close → ssl_on_close, unrelated to Linux epoll handling in io.zig.

This PR only changes Poll.onUpdateEpoll, which is used exclusively by the blob ReadFile/WriteFile path (Bun.file(fd).text() / Bun.write(Bun.file(fd), ...) on pollable fds).

coderabbitai

♻️ Duplicate comments (1)

test/js/bun/util/bun-file-fd-read.test.ts (1)
109-134: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make the EPOLLERR repro observable instead of time-based.

await Bun.sleep(100) still doesn't prove Bun.file(fd).text() has reached the epoll registration path, and the later { ok: false, code: "ECONNRESET" } assertion also accepts the earlier direct-recv() failure path. That means this can pass without covering the regression this PR fixes.

Please replace the sleep with an active condition that confirms the fd is actually registered/read-blocked before forcing the RST. As per coding guidelines, "Do not write flaky tests; do not use setTimeout in tests; await the condition to be met instead".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/bun/util/bun-file-fd-read.test.ts` around lines 109 - 134, Replace
the timing-based Bun.sleep(100) with an active wait that verifies the read is
actually blocked/registered before forcing the RST: poll the read Promise (the
variable read created from Bun.file(...).text()) with a short timeout using
Promise.race in a loop and only proceed when the race shows the read is still
pending (i.e., the timeout wins), then perform the setsockopt(...) on
serverSocket._handle.fd and call serverSocket.destroy(); this ensures the code
hits the epoll/blocked-read path rather than relying on a fixed sleep.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/js/bun/util/bun-file-fd-read.test.ts`:
- Around line 109-134: Replace the timing-based Bun.sleep(100) with an active
wait that verifies the read is actually blocked/registered before forcing the
RST: poll the read Promise (the variable read created from Bun.file(...).text())
with a short timeout using Promise.race in a loop and only proceed when the race
shows the read is still pending (i.e., the timeout wins), then perform the
setsockopt(...) on serverSocket._handle.fd and call serverSocket.destroy(); this
ensures the code hits the epoll/blocked-read path rather than relying on a fixed
sleep.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8881bf4b-3138-4d44-bd31-e2cc54219e5f

📥 Commits

Reviewing files that changed from the base of the PR and between dea531d and a82d48a.

📒 Files selected for processing (1)

test/js/bun/util/bun-file-fd-read.test.ts

claude

The fix looks correct (SO_ERROR is the right way to extract the errno on EPOLLERR, and falling through to onReady() for ENOTSOCK/no-error is the standard idiom), but this changes error-dispatch semantics on the IO thread's epoll hot path and CI is currently showing build-zig failures on dea531d across several platforms — worth a human look before merge.

Extended reasoning...

Overview

Two files changed: src/io/io.zig (the Poll.onUpdateEpoll EPOLLERR branch) and a new Linux-only regression test in test/js/bun/util/bun-file-fd-read.test.ts. The Zig change replaces bun.sys.getErrno(event.events) — which was always returning .SUCCESS because event.events is an epoll flag bitmask, not a syscall return — with a getsockopt(SO_ERROR) query. If SO_ERROR yields a non-zero errno it calls onIOError(fromCodeInt(so_error, .epoll_ctl)); otherwise (getsockopt failed, e.g. ENOTSOCK on a pipe, or so_error == 0) it now dispatches onReady() so the subsequent read()/write() surfaces the real error.

Security risks

None. This is internal errno plumbing on the blob ReadFile/WriteFile IO thread; no user input reaches the new getsockopt call beyond an fd Bun already owns and was already polling.

Level of scrutiny

Moderate-to-high. The diff is small and the new behavior is the textbook EPOLLERR idiom, and it is strictly better than the previous always-zero errno (which asserted in debug and produced a code-less rejection in release). However, this is the central epoll dispatch path on the dedicated IO thread — every Bun.file(fd) read/write on a pollable fd on Linux flows through here — and the semantics shifted from "always onIOError" to "sometimes onReady" under EPOLLERR. I verified both ReadFile and WriteFile expose opened_fd: bun.FD so the inline else field access compiles for both arms, and bun.sys.Error.fromCodeInt exists in src/sys/Error.zig.

Other factors

The robobun CI comment reports build-zig failures on macOS aarch64/x64, FreeBSD x64/aarch64, and Android aarch64/x64 for commit dea531d. These may be infra flakes (robobun's header says "1 failures" yet lists seven), and the touched function is only reachable under Environment.isLinux, but it's worth confirming the latest commit (a82d48a) is green before merging.
coderabbitai raised and then withdrew a concern about the Bun.sleep(100) timing window in the test; the author's explanation (no JS-observable signal for epoll registration; fail-before/pass-after holds on both race branches; verified with USE_SYSTEM_BUN=1) is sound. The minor setsockopt(SO_LINGER) return-value check was applied in a82d48a.
No CODEOWNERS cover src/io/io.zig.

robobun · 2026-05-04T07:44:03Z

CI failures on a82d48a are pre-existing and unrelated to this change:

Failure	Cause	Evidence
`binary-size` (Windows +8 MB)	Pre-existing on main; this change is Linux-only (`onUpdateEpoll` is under `if (comptime Environment.isLinux)`). Linux binaries show ±0 KB.	Also failing on unrelated PRs #50877, #50874
`test/regression/issue/26225.test.ts` ThreadLock assertion (x64-asan)	node-fetch + form-data HTTP upload; goes through usockets, not `io.zig`. Passes locally on both main and this branch.	Features: `fetch jsc`, builtins: `node:http node:https node-fetch` — none of which reach `Poll.onUpdateEpoll`
`test-http-should-emit-close-when-connection-is-aborted.ts` timeout (Windows ×3)	Pre-existing Windows flake	Also failing on unrelated PRs #50882, #50879, #50878, #50877

The build-zig failures mentioned in the review were on the first commit (dea531d) and were infra flakes — all build steps passed on a82d48a.

The new test in this PR (bun-file-fd-read.test.ts) passed on all Linux shards where it ran (e.g., linux-x64-debian-13 shard 1: 4 pass, 0 fail).

claude

The fix looks correct and follows the standard EPOLLERR → getsockopt(SO_ERROR) idiom, but since this changes error-dispatch behavior in the core IO-thread epoll loop, it's worth a human sign-off.

Extended reasoning...

Overview

This PR changes Poll.onUpdateEpoll in src/io/io.zig (the dedicated epoll/kqueue loop used by Bun.file(fd) blob reads/writes). Previously, when EPOLLERR fired, it called bun.sys.getErrno(event.events) on the epoll flag bitmask — which always yields .SUCCESS (0) — and passed that into onIOError, tripping a debug assertion in errnoToZigErr(0). The fix queries getsockopt(SO_ERROR) on this.opened_fd (present on both ReadFile and WriteFile); if a real errno is pending it dispatches onIOError, otherwise it falls through to onReady() so the next read()/write() surfaces the error. A Linux-only regression test using bun:ffi + SO_LINGER-forced RST exercises the path.

Security risks

None identified. The change reads SO_ERROR from an already-open fd owned by the runtime; no new attack surface, no user-controlled inputs reach the new code beyond what already flows through this path.

Level of scrutiny

Medium-high. Although the file header notes this loop is "barely used" (only Bun.write() / Bun.file().text() on pollable fds), it is still core IO-thread error-dispatch logic. The new behavior — falling back to onReady() when getsockopt fails (e.g., ENOTSOCK on a pipe) or returns 0 — is the canonical idiom and strictly better than the previous always-zero-errno crash, but a maintainer should confirm they're comfortable with the onReady() fallback for non-socket pollable fds that hit EPOLLERR.

Other factors

The bug-hunting system found no issues. CodeRabbit's two comments were addressed (SO_LINGER return-value check applied) or withdrawn (sleep-based timing, which the author justified convincingly). CI failures are documented as pre-existing/unrelated (Windows binary-size and Windows test flake; the changed code is gated behind comptime Environment.isLinux). The new test passed on all Linux shards. Given it's a non-mechanical change to event-loop error handling, I'm deferring rather than approving.

robobun · 2026-05-04T10:15:03Z

Build #50924 (main merged in): 282 passed, 4 failed — all pre-existing flakes also failing on other unrelated PRs.

Failure	Also failing on
`test/js/bun/s3/s3-storage-class.test.ts` on 🍎 14 aarch64 (S3 `UnknownError`)	#50900, #50904, #50905, #50908, #50909, #50911, #50915
`test-http-should-emit-close-when-connection-is-aborted.ts` on 🪟 ×3 (timeout)	#50900–#50915 (nearly every build)

Both are on non-Linux platforms; the changed code is under comptime Environment.isLinux. bun-file-fd-read.test.ts passed on all Linux shards. The binary-size and 26225.test.ts failures from the previous build did not recur.

onUpdateEpoll called getErrno(event.events) when EPOLLERR was set, but event.events is an epoll flag bitmask (EPOLLIN|EPOLLERR|...), not a syscall return value. getErrno() on a u32 bitmask compares against -1 and always returns .SUCCESS (0), which then flowed into ReadFile/WriteFile.onIOError -> errnoToZigErr(0) and tripped the non-zero assertion on the IO thread. Query getsockopt(SO_ERROR) for the real pending error when EPOLLERR is reported; if the fd is not a socket or has no pending error, dispatch as ready so the next read()/write() surfaces the error itself.

claude

The fix looks correct and well-tested, but since it changes the IO thread's epoll error-dispatch semantics (including a new onReady() fallback on EPOLLERR), it's worth a quick human sign-off from someone familiar with the io.zig ReadFile/WriteFile state machine.

Extended reasoning...

Overview

This PR fixes Poll.onUpdateEpoll in src/io/io.zig, which previously passed event.events (an epoll flag bitmask) to bun.sys.getErrno(), always yielding errno 0 and crashing the IO thread via errnoToZigErr(0). The fix replaces this with the standard idiom: query getsockopt(SO_ERROR) for the pending error, dispatch onIOError if non-zero, otherwise fall through to onReady() so the next read()/write() surfaces the error. A Linux-only regression test in bun-file-fd-read.test.ts provokes EPOLLERR via TCP RST and asserts the rejection carries ECONNRESET.

I verified: opened_fd exists on both ReadFile (read_file.zig:52) and WriteFile (write_file.zig:9); fromCodeInt handles a positive c_int errno correctly; and all epoll registrations in this loop use EPOLL.ONESHOT, so the onReady() fallback cannot busy-loop.

Security risks

None. No new attack surface — this is internal error-code plumbing on an fd the caller already supplied to Bun.file(fd).

Level of scrutiny

Moderate. The diff is ~10 lines and the logic is the textbook EPOLLERR pattern, and both new branches are strictly better than the prior crash/error.Unexpected. However, this is the IO thread's core epoll event dispatcher for blob reads/writes, and it introduces a semantic change (calling onReady() when EPOLLERR is set but SO_ERROR is unavailable or zero — e.g., for non-socket pollable fds like pipes/FIFOs). That fallback is sound given ONESHOT registration and the existing read/write error handling, but a reviewer familiar with the ReadFile/WriteFile state machine should confirm there's no edge case where onReady() after EPOLLERR re-arms without progress.

Other factors

The bug-hunting system found nothing. Both CodeRabbit comments were resolved (SO_LINGER return check applied; the Bun.sleep(100) concern withdrawn with a sound explanation). CI failures are documented as pre-existing flakes on non-Linux platforms; the new test passed on all Linux shards. The author's CI triage in the timeline is thorough and credible.

github-actions Bot added the claude label May 4, 2026

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

claude Bot reviewed May 4, 2026

View reviewed changes

robobun added 2 commits May 4, 2026 10:22

test: check setsockopt(SO_LINGER) return value

d396c51

Jarred-Sumner force-pushed the farm/dee5f820/fix-epollerr-zero-errno branch from 9f4f520 to d396c51 Compare May 4, 2026 10:22

claude Bot reviewed May 4, 2026

View reviewed changes

Conversation

robobun commented May 4, 2026

What does this PR do?

How did you verify your code works?

Uh oh!

robobun commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robobun May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

robobun commented May 4, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

robobun commented May 4, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

robobun commented May 4, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

robobun commented May 4, 2026 •

edited

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading

coderabbitai Bot May 4, 2026 •

edited

Loading