io: read SO_ERROR when EPOLLERR fires instead of passing 0 errno#30230
io: read SO_ERROR when EPOLLERR fires instead of passing 0 errno#30230robobun wants to merge 2 commits into
Conversation
|
Found 2 issues this PR may fix:
🤖 Generated with Claude Code |
Walkthrough
ChangesEpoll Error Handling Fix
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 1/5 review remaining, refill in 40 minutes and 18 seconds. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/js/bun/util/bun-file-fd-read.test.ts`:
- Around line 113-115: After calling libc.symbols.setsockopt(...) to apply
SO_LINGER, assert the syscall succeeded before calling serverSocket.destroy:
capture the return value of libc.symbols.setsockopt(serverSocket._handle.fd,
SOL_SOCKET, SO_LINGER, ptr(linger), 8) and if it indicates failure (e.g. return
< 0) fail the test or throw with a clear message including the errno value; this
ensures the test doesn't proceed to serverSocket.destroy() when setsockopt
failed and makes failures deterministic.
- Around line 109-115: Replace the fragile await Bun.sleep(100) with an active
wait that ensures the client's FD is actually registered with epoll before
forcing the RST close: poll in a tight loop (with a short backoff) until you
observe the client FD being present in the epoll interest set (e.g., via a small
helper that calls libc.epoll_wait on the test's epoll FD or an isEpollRegistered
helper), then run libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET,
SO_LINGER, ptr(linger), 8) and serverSocket.destroy(); target the check around
the same client FD used by Bun.file(fd).text() so the EPOLLERR repro is
guaranteed to exercise the epoll path rather than relying on a timed sleep.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b5b7ad12-03f3-4130-91ef-971507659e13
📒 Files selected for processing (2)
src/io/io.zigtest/js/bun/util/bun-file-fd-read.test.ts
| await Bun.sleep(100); | ||
|
|
||
| // SO_LINGER with l_linger=0 makes the close() send RST instead of FIN. The | ||
| // client's epoll entry then reports EPOLLERR with a pending ECONNRESET. | ||
| const linger = new Int32Array([1, 0]); | ||
| libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET, SO_LINGER, ptr(linger), 8); | ||
| serverSocket.destroy(); |
There was a problem hiding this comment.
Make the EPOLLERR repro observable instead of sleep-based.
Line 109's await Bun.sleep(100) only increases the chance that Bun.file(fd).text() has reached the internal epoll registration path; it does not prove it. Since Lines 130-132 explicitly accept the earlier direct-recv() ECONNRESET path too, this test can still pass without exercising the broken EPOLLERR handling and may become flaky under load.
As per coding guidelines, "Do not write flaky tests; do not use setTimeout in tests; await the condition to be met instead" and "Verify tests fail with USE_SYSTEM_BUN=1 bun test <file> and pass with bun bd test <file>; tests are invalid if they pass with USE_SYSTEM_BUN=1."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/js/bun/util/bun-file-fd-read.test.ts` around lines 109 - 115, Replace
the fragile await Bun.sleep(100) with an active wait that ensures the client's
FD is actually registered with epoll before forcing the RST close: poll in a
tight loop (with a short backoff) until you observe the client FD being present
in the epoll interest set (e.g., via a small helper that calls libc.epoll_wait
on the test's epoll FD or an isEpollRegistered helper), then run
libc.symbols.setsockopt(serverSocket._handle.fd, SOL_SOCKET, SO_LINGER,
ptr(linger), 8) and serverSocket.destroy(); target the check around the same
client FD used by Bun.file(fd).text() so the EPOLLERR repro is guaranteed to
exercise the epoll path rather than relying on a timed sleep.
There was a problem hiding this comment.
There is no observable signal for "ReadFile has registered with the io.zig epoll" — that happens on a WorkPool thread with no JS-visible side effect until the read completes, so there is nothing to await on.
The test is still deterministic in the fail-before/pass-after sense regardless of which branch the race takes:
- Before the fix, EPOLLERR path: debug asserts in
errnoToZigErr(0); release returnserror.Unexpectedwith errno 0, so the rejection has no.code→toEqual({ code: "ECONNRESET" })fails. - Before the fix, direct recv() path (RST beats registration): rejection has
syscall: "recv",code: "ECONNRESET"→ would pass. But in practice the 100ms window makes this branch vanishingly unlikely; I ran 100 iterations locally and againstUSE_SYSTEM_BUN=1and every single one hit the EPOLLERR path. - After the fix, both paths reject with
code: "ECONNRESET"→ always passes.
Verified USE_SYSTEM_BUN=1 bun test fails (code is missing from the rejection) and bun bd test passes.
There was a problem hiding this comment.
@robobun, that's a compelling and well-substantiated explanation. You're right — since epoll registration is an internal side effect of Bun's WorkPool thread with no JS-visible signal, there is no observable condition to await. My suggested fix was infeasible. The fail-before/pass-after semantics hold regardless of which branch the race takes, and the empirical 100-iteration validation plus the USE_SYSTEM_BUN=1 failure check fully satisfies the test validity requirement. I'll withdraw my comment.
✏️ Learnings added
Learnt from: robobun
Repo: oven-sh/bun PR: 30230
File: test/js/bun/util/bun-file-fd-read.test.ts:109-117
Timestamp: 2026-05-04T06:28:25.345Z
Learning: In oven-sh/bun tests that exercise internal IO-thread behavior (e.g., the `io.zig` epoll registration path for `ReadFile`/`WriteFile`), it is acceptable to use `await Bun.sleep(N)` (e.g., 100 ms) as a timing window when the relevant internal event (WorkPool epoll registration) produces no JS-observable side effect and therefore has no condition to `await`. The test is still semantically valid if its fail-before/pass-after semantics hold across both possible race outcomes (e.g., EPOLLERR path vs. direct `recv()` path). Do NOT flag such `Bun.sleep(N)` uses as "flaky" or "fragile" when the author has verified empirically (multiple iterations) and via `USE_SYSTEM_BUN=1` that the test reliably exercises the target path and correctly fails without the fix.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: robobun
Repo: oven-sh/bun PR: 27056
File: test/bundler/standalone.test.ts:281-324
Timestamp: 2026-02-16T04:26:25.185Z
Learning: In Bun test files that exercise Bun.build(), assertions for configuration-validation errors thrown synchronously by JSBundler.fromJS() (via globalThis.throwInvalidArguments()) should use toThrow, e.g., expect(() => Bun.build({...})).toThrow()). Do not use rejects.toThrow() since rejections occur only for asynchronous build errors.
Learnt from: cirospaciari
Repo: oven-sh/bun PR: 27385
File: test/js/bun/http/tls-keepalive.test.ts:115-140
Timestamp: 2026-02-24T21:02:00.725Z
Learning: In Bun's test suites, avoid adding tests for trivial environment/fixture script validation (e.g., checking if env vars exist) within test fixtures. Focus test coverage on actual behavior being tested (e.g., TLS keepalive, memory leaks) rather than auxiliary fixture validation. If a test file is primarily for fixtures, skip or limit tests that validate simple JS behavior like if (!env) throw; prioritize meaningful end-to-end or unit behavior instead.
Learnt from: LawoodDev
Repo: oven-sh/bun PR: 27855
File: test/cli/run/concurrency-filter.test.ts:32-32
Timestamp: 2026-03-06T16:21:42.189Z
Learning: In Bun's test runner, describe.concurrent is supported (since Bun v1.2.23). Use describe.concurrent/test.concurrent for concurrent tests. Be aware of limitations: expect.assertions() and expect.hasAssertions() are not supported; toMatchSnapshot() is not supported (toMatchInlineSnapshot() is); and beforeAll/afterAll hooks are not executed concurrently. The broader guideline to prefer concurrent tests over sequential tests using test.concurrent or describe.concurrent remains valid and should be applied to test files such as test/cli/run/concurrency-filter.test.ts and similar test files.
Learnt from: LawoodDev
Repo: oven-sh/bun PR: 27855
File: test/cli/run/concurrency-filter.test.ts:32-32
Timestamp: 2026-03-06T16:22:55.570Z
Learning: In test/cli/run/concurrency-filter.test.ts and similar test files, timing-sensitive tests that assert on wall-clock elapsed time to verify concurrency behavior (e.g., expect(elapsed).toBeGreaterThan(800)) must remain in a sequential describe block rather than describe.concurrent. Running such tests concurrently can cause CPU contention and skew timing assertions, leading to flaky results. The guideline to prefer describe.concurrent does NOT apply for timing-based correctness verification.
Learnt from: robobun
Repo: oven-sh/bun PR: 28214
File: test/regression/issue/18115.test.ts:1-158
Timestamp: 2026-03-18T15:19:38.407Z
Learning: In Bun test files, when a resource like tempDir is a DisposableString implementing both Symbol.dispose (sync) and Symbol.asyncDispose, prefer plain using over await using. Do not recommend converting to await using for tempDir in Bun test files. This keeps tests idiomatic and avoids unnecessary async disposal. If a resource only supports asyncDispose, use await using.
Learnt from: robobun
Repo: oven-sh/bun PR: 28425
File: test/regression/issue/28422.test.ts:65-79
Timestamp: 2026-03-22T10:12:05.719Z
Learning: In oven-sh/bun test files matching test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}, follow CLAUDE.md by asserting the command exit code LAST—after all other assertions such as stdout/stderr checks and filesystem validation. Do not assert exitCode earlier than those checks. Also, avoid asserting stdout for commands like bun install whose output can vary between runs.
Learnt from: dylan-conway
Repo: oven-sh/bun PR: 28863
File: scripts/build/deps/webkit.ts:149-161
Timestamp: 2026-04-04T19:43:49.607Z
Learning: When reviewing Node/TypeScript code that uses `node:path.join()`, do not treat a later path segment that starts with `/` as a Windows/absolute-path override bug. `path.join()` concatenates segments and normalizes; it only resets the root when using `path.resolve()` (e.g., when it encounters an absolute-looking segment). Therefore, patterns like `join(base, "/relPath")` or `join(homedir(), env.slice(1))` where `env.slice(1)` becomes `"/WebKit"` are expected to produce `base/relPath` (cross-platform). Only flag cases where `path.resolve()` (or other root-resetting logic) is used in a way that could unintentionally ignore the base path.
Learnt from: robobun
Repo: oven-sh/bun PR: 28923
File: test/regression/issue/28921.test.ts:0-0
Timestamp: 2026-04-06T19:19:08.790Z
Learning: In oven-sh/bun tests, prefer `tempDir` (from the `harness` module) over `tempDirWithFiles` when using the `using` statement for automatic cleanup. `tempDirWithFiles(...)` returns a plain `string`, so `using tempDirWithFiles(...)` is effectively a no-op and will not trigger disposal/cleanup. `tempDir` returns a `DisposableString` that implements `Symbol.dispose`, so it will correctly trigger cleanup on scope exit.
Learnt from: robobun
Repo: oven-sh/bun PR: 29050
File: test/regression/issue/29042.test.ts:60-94
Timestamp: 2026-04-08T21:22:00.840Z
Learning: In this repo’s Bun environment, `Bun.RedisClient` does not implement `Symbol.dispose` or `Symbol.asyncDispose`, so you cannot rely on `using` / `await using` for automatic cleanup. When creating a `Bun.RedisClient` in tests, close it explicitly with `try/finally`, calling `client.close()` in the `finally` block.
Learnt from: robobun
Repo: oven-sh/bun PR: 29322
File: test/js/web/workers/worker-terminate-after-exit.test.ts:38-43
Timestamp: 2026-04-15T01:57:52.469Z
Learning: In oven-sh/bun test files (matching `test/**/*.test.ts`), when you spawn a subprocess in a bun:test and you assert on its exit code, follow the CLAUDE.md house style: write `if (exitCode !== 0) { expect(stderr).toBe(""); }` immediately before `expect(exitCode).toBe(0)`. This is intentional so that, on failure, bun:test surfaces the full `stderr` content in the diff output. Do not replace this with a custom/second assertion that formats stderr into the exit-code expectation (e.g., `expect(exitCode, \\`stderr: ${stderr}\\`).toBe(0)` or any single-assertion equivalent).
Learnt from: robobun
Repo: oven-sh/bun PR: 29389
File: test/js/bun/util/v8-heap-snapshot-large-strings.test.ts:4-152
Timestamp: 2026-04-17T02:55:14.338Z
Learning: In oven-sh/bun, do not enforce the `test/regression/issue/${issueNumber}.test.ts` placement rule based solely on PR descriptions that include a speculative GitHub issue link like “might fix `#NNNNN`” without a confirmed regression (e.g., no verifying stack trace/reproduction). If the issue is not confirmed per CLAUDE.md (“confirmed numbered issue” only), the test should be placed next to the closest related existing test file for the affected feature/module (e.g., alongside `test/js/bun/util/v8-heap-snapshot.test.ts`) and should not be flagged as a guideline violation. Likewise, tests that validate a broader behavioral invariant (e.g., V8-matching 1024-char string truncation in heap snapshots) are not purely issue regressions and should live with the feature’s existing test suite rather than under `test/regression/issue/`.
Learnt from: robobun
Repo: oven-sh/bun PR: 29426
File: test/js/node/tls/node-tls-root-certs-concurrent-init.test.ts:80-82
Timestamp: 2026-04-18T00:50:38.905Z
Learning: In oven-sh/bun Jest/Bun test files under `test/js/` that spawn subprocesses using `bunEnv` from the `harness` module, it’s safe and intentional to assert `expect(stderr).toBe("")` unconditionally. `bunEnv` sets `BUN_DEBUG_QUIET_LOGS=1`, which suppresses ASAN/debug-build stderr noise, so an unexpected stderr value should fail the test and show useful diagnostics. Do not gate `expect(stderr).toBe("")` behind `if (exitCode !== 0)` for these `bunEnv`-based subprocess tests—follow the established pattern used in similar tests (e.g., `test/js/node/tls/test-use-system-ca.test.ts`).
Learnt from: robobun
Repo: oven-sh/bun PR: 29441
File: test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts:10-14
Timestamp: 2026-04-18T10:36:45.033Z
Learning: In oven-sh/bun test files that spawn subprocesses using `bunEnv`, suppress the known ASAN startup noise in the subprocess stderr before asserting it is empty. Use the repo’s established convention: split stderr into lines and filter with `.filter(line => !line.startsWith("WARNING: ASAN interferes"))`, then assert the remaining stderr lines are empty. Do not switch to an alternative like `str.replace(...)`; the filter-based approach is the repo convention. This is safe because `ZigGlobalObject.cpp` emits that warning via `std::call_once`, so at most one matching line appears per process.
Learnt from: robobun
Repo: oven-sh/bun PR: 29538
File: test/js/bun/resolve/lower-using-bun-target.test.ts:80-82
Timestamp: 2026-04-21T09:47:19.303Z
Learning: In Bun JavaScript/TS tests under `test/js/bun/**` that run runtime subprocesses by spawning `bunExe()` with `bunEnv`, do not add strict `expect(stderr).toBe("")` assertions. In debug ASAN builds, stderr will include `WARNING: ASAN interferes with JSC signal handlers…` on every JS-process launch and it is not suppressed by `bunEnv` / `BUN_DEBUG_QUIET_LOGS=1`. Use the regression guards that are already effective for this area: assert an exact match on `stdout` and `expect(exitCode).toBe(0)`. If you must validate stderr, follow the repo’s filter-based convention: ignore/filter out lines starting with `"WARNING: ASAN interferes"`. If stdout + exitCode provide sufficient coverage, leaving stderr unchecked is acceptable.
Learnt from: robobun
Repo: oven-sh/bun PR: 29538
File: test/js/bun/resolve/lower-using-bun-target.test.ts:133-142
Timestamp: 2026-04-21T09:54:56.748Z
Learning: When testing `bun build` subprocesses in `test/js/bun/**/*.test.ts`, it is acceptable to assert `expect(stderr).toBe("")` (or otherwise expect no stderr noise). `bun build` is compiler-only and does not start a JS VM, so it should not emit the ASAN warning about interfering with JSC signal handlers. Only JS-executing subprocesses (e.g., `bun -e`, running built output like `bun out.js`) are expected to produce that warning, so do not treat empty-stderr assertions as brittle specifically for `bun build` in these tests.
Learnt from: robobun
Repo: oven-sh/bun PR: 29564
File: test/regression/issue/29513.test.ts:51-51
Timestamp: 2026-04-22T02:58:30.645Z
Learning: In oven-sh/bun TypeScript test files, it is acceptable to use `Bun.sleep(0)` specifically as a macrotask barrier to deterministically drain the pending microtask queue before asserting. Do NOT flag `Bun.sleep(0)` as a timing-wait violation. The “do not use setTimeout/Bun.sleep in tests” guideline is intended to prevent load-sensitive wall-clock delays (e.g., `Bun.sleep(100)` or other timing windows). Use `Bun.sleep(0)` only when you need to observe a fully settled Promise/microtask chain (e.g., after deferred resolution and multiple internal `.then()` hops) where a single `await Promise.resolve()` would not advance far enough; `Bun.sleep(0)` resumes in a later macrotask after pending microtasks complete, without relying on elapsed time.
Learnt from: dylan-conway
Repo: oven-sh/bun PR: 29581
File: src/bun.js/modules/NodeModuleModule.cpp:663-681
Timestamp: 2026-04-22T20:47:10.896Z
Learning: In oven-sh/bun code reviews, do not recommend adding standalone regression tests that depend on setting `BUN_JSC_validateExceptionChecks=1` to exercise JSC throw-scope/exception-scope validator paths (e.g., PropertyCallback/reify interactions like `reifyAllStaticProperties`). Per `CLAUDE.md`, tests are expected to pass with `USE_SYSTEM_BUN=1`, and `BUN_JSC_validateExceptionChecks` is a no-op on release/system Bun builds. Instead, treat this class of validator coverage issue as covered by: (1) the x64-asan CI shard that enables the validator automatically, and (2) the `test/no-validate-exceptions.txt` opt-out list for tests that hit pre-existing throw-scope assertion failures unrelated to the change under review. If helpful, add an in-source comment pointing to the specific existing exerciser (e.g., the relevant `tsgo/bun-types` test) to document the intent without relying on the env var.
Learnt from: robobun
Repo: oven-sh/bun PR: 29656
File: test/js/bun/s3/s3-path-double-free.test.ts:49-61
Timestamp: 2026-04-23T23:39:21.333Z
Learning: In Bun test files under `test/js/bun/**/*.test.ts`, prefer `test.each()` over `describe.each()` when each parameter value results in a single `test`/`it` assertion. Using `describe.each()` to wrap a single `test` adds unnecessary nesting. Only use `describe.each()` when you need multiple `test`/`it` blocks per parameter value.
Learnt from: robobun
Repo: oven-sh/bun PR: 29820
File: test/js/node/process/process-execve.test.ts:47-52
Timestamp: 2026-04-28T11:35:58.257Z
Learning: In oven-sh/bun test files under `test/**/*.test.ts`, when a test uses the `tempDir` fixture and spawns a subprocess via `await using proc = Bun.spawn(...)` (i.e., the embedded script runs as a spawned subprocess), do not recommend adding a fixture-level or embedded-script `setTimeout` watchdog to prevent hangs. The `await using` scope exit should terminate the subprocess automatically, and Bun test per-test timeouts already bound execution time. Also, avoid embedded `setTimeout` watchdog patterns that violate Bun’s “no setTimeout in tests” guideline. If the worker/subprocess exits silently without posting, rely on the test’s stdout/exitCode assertions plus Bun’s outer timeout rather than a watchdog, even when the embedded fixture script uses `worker_threads` or other async constructs.
Learnt from: robobun
Repo: oven-sh/bun PR: 29874
File: test/js/web/websocket/websocket-proxy-tunnel-upgrade-leak.test.ts:15-16
Timestamp: 2026-04-28T21:34:23.491Z
Learning: In oven-sh/bun, when a test is intentionally validating native refcount leak detection using Bun debug-only instrumentation (e.g., `BUN_DEBUG_alloc=1` and `[alloc] new(...)/destroy(...)` log lines produced only by debug builds when `Environment.enable_logs` is set), use `test.skipIf(!isDebug)` as the correct/intentional guard. Do not flag this `test.skipIf(!isDebug)` as a guideline violation for this class of tests. The debug-only `[alloc] ...` lines are absent in release and ASAN builds, and there is no equivalent observable system-Bun hook to assert a leak when only debug-build instrumentation exists (so the `USE_SYSTEM_BUN=1` rule in `CLAUDE.md` does not apply in this situation).
Learnt from: robobun
Repo: oven-sh/bun PR: 29876
File: test/js/bun/ffi/cc.test.ts:0-0
Timestamp: 2026-04-29T00:09:18.937Z
Learning: In oven-sh/bun tests, when using the `harness` module’s `tempDir`, prefer the overload that accepts an optional second argument: `tempDir(prefix, fileTree)`, where `fileTree` is an object in the same shape as `tempDirWithFiles` (e.g., `{ "file.c": "..." }`). This creates a disposable temp directory pre-populated with files. If the `tempDir` file-tree overload is available, don’t recommend a separate manual `fs.writeFile`/write step for pre-populating files (e.g., when using `using dir = tempDir("prefix", { ... })`).
Learnt from: robobun
Repo: oven-sh/bun PR: 29876
File: test/js/bun/ffi/cc.test.ts:205-231
Timestamp: 2026-04-29T00:24:38.784Z
Learning: In oven-sh/bun’s Bun test files under test/js/bun/, do not treat explicit per-test timeouts as a guideline violation when the test is an RSS-leak regression that spawns a subprocess and performs many iterations (subprocess-heavy leak tests). For these cases, Bun’s default per-test timeout (5s locally) is insufficient—especially under debug+ASAN where these tests may take ~5–14s—so reviewers should expect and accept an explicit, larger per-test timeout (e.g., 60_000). Concretely, tests like the cc() option-string leak test (test/js/bun/ffi/cc.test.ts) and glob-leak tests (e.g., test/js/bun/glob/leak.test.ts) should be reviewed as exceptions: allow explicit timeouts when the intent is to cover RSS-leak/subprocess-heavy regression workloads.
Learnt from: robobun
Repo: oven-sh/bun PR: 29919
File: test/js/bun/util/filesystem_router.test.ts:613-628
Timestamp: 2026-05-02T00:35:55.819Z
Learning: In oven-sh/bun tests under test/js/bun/**, prefer strict stderr assertions like `expect(stderr).toBe("")` for subprocesses spawned with `bunExe()` when you pass a `bunEnv` that already propagates `ASAN_OPTIONS=allow_user_segv_handler=1` from the parent `bun bd` build environment (this suppresses the `WARNING: ASAN interferes with JSC signal handlers` message). On CI ASAN lanes where `isASAN` is true, `bunEnv` sets `isASAN` explicitly as well—so strict stderr expectations are still safe. Only relax/skip strict stderr assertions (e.g., avoid `toBe("")`) when `ASAN_OPTIONS=allow_user_segv_handler=1` is *not* propagated into the subprocess environment.
Learnt from: robobun
Repo: oven-sh/bun PR: 30115
File: test/js/bun/glob/scan.test.ts:877-882
Timestamp: 2026-05-02T17:49:10.214Z
Learning: In oven-sh/bun regression tests for UAFs tied to Bun’s threadpool/event-loop interaction (e.g., WalkTask pending activity), keep the intended repro timing: use `Bun.sleepSync(N)` inside a spawned subprocess to hold the JS event loop without yielding/draining pending tasks, then trigger `Bun.gc(true)` (after the threadpool task has been given time to complete `run()`), and finally drive the result with the corresponding `for await`/iterator consumption to make the UAF observable. Do not replace `Bun.sleepSync(N)` with `await Bun.sleep(0)` or any other event-loop-yielding construct, since it can drain pending concurrent tasks and cause callbacks/`then()` work to run before the GC call, making the bug unobservable. This “sleepSync → gc(true) → for await” sequence is the correct 3-step UAF repro pattern for this bug class.
Learnt from: robobun
Repo: oven-sh/bun PR: 30142
File: test/js/bun/http/bun-serve-html-abort-leak-fixture.ts:28-38
Timestamp: 2026-05-03T01:29:10.031Z
Learning: In oven-sh/bun tests/fixtures that spawn subprocesses with `BUN_DEBUG_alloc` (or `BUN_DEBUG_ALL`) set to a non-zero value (e.g., `"1"`), the `[alloc]` log scope is effectively enabled at runtime for all `bun.new`/`bun.destroy`-allocated types. Because the runtime check in `src/output.zig` forces `really_disable = false` when `BUN_DEBUG_<tagname>` is not `"0"`, such fixtures may emit `[alloc] new(T)` / `[alloc] destroy(T)` lines even when `T` does not declare `log_allocations = true`. In this context, do not flag missing `log_allocations` declarations as a bug in the test fixture or the involved fixture types.
Learnt from: robobun
Repo: oven-sh/bun PR: 30153
File: test/bundler/plugin-sync-exception-fallback.test.ts:75-91
Timestamp: 2026-05-03T01:53:50.441Z
Learning: In this repo’s Bun test files that use `Bun.spawn`, don’t “parse/assert stdout before checking `exitCode`” when the expected failure mode is a crash (e.g., SIGSEGV or UBSan abort) that may produce empty stdout. Parsing/validating empty stdout first can mask the more useful signal/stderr. Instead, assert the spawned-process result by including `stdout` in the object passed to `toMatchObject` alongside `exitCode`, `signalCode`, and `stderr`, so stdout/stderr/signal all appear together in the failure diff (same pattern as `test/bundler/plugin-error-nested-throw.test.ts`).
|
Neither linked issue is addressed by this change:
This PR only changes |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/js/bun/util/bun-file-fd-read.test.ts (1)
109-134:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake the EPOLLERR repro observable instead of time-based.
await Bun.sleep(100)still doesn't proveBun.file(fd).text()has reached the epoll registration path, and the later{ ok: false, code: "ECONNRESET" }assertion also accepts the earlier direct-recv()failure path. That means this can pass without covering the regression this PR fixes.Please replace the sleep with an active condition that confirms the fd is actually registered/read-blocked before forcing the RST. As per coding guidelines, "Do not write flaky tests; do not use
setTimeoutin tests; await the condition to be met instead".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/js/bun/util/bun-file-fd-read.test.ts` around lines 109 - 134, Replace the timing-based Bun.sleep(100) with an active wait that verifies the read is actually blocked/registered before forcing the RST: poll the read Promise (the variable read created from Bun.file(...).text()) with a short timeout using Promise.race in a loop and only proceed when the race shows the read is still pending (i.e., the timeout wins), then perform the setsockopt(...) on serverSocket._handle.fd and call serverSocket.destroy(); this ensures the code hits the epoll/blocked-read path rather than relying on a fixed sleep.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/js/bun/util/bun-file-fd-read.test.ts`:
- Around line 109-134: Replace the timing-based Bun.sleep(100) with an active
wait that verifies the read is actually blocked/registered before forcing the
RST: poll the read Promise (the variable read created from Bun.file(...).text())
with a short timeout using Promise.race in a loop and only proceed when the race
shows the read is still pending (i.e., the timeout wins), then perform the
setsockopt(...) on serverSocket._handle.fd and call serverSocket.destroy(); this
ensures the code hits the epoll/blocked-read path rather than relying on a fixed
sleep.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8881bf4b-3138-4d44-bd31-e2cc54219e5f
📒 Files selected for processing (1)
test/js/bun/util/bun-file-fd-read.test.ts
There was a problem hiding this comment.
The fix looks correct (SO_ERROR is the right way to extract the errno on EPOLLERR, and falling through to onReady() for ENOTSOCK/no-error is the standard idiom), but this changes error-dispatch semantics on the IO thread's epoll hot path and CI is currently showing build-zig failures on dea531d across several platforms — worth a human look before merge.
Extended reasoning...
Overview
Two files changed: src/io/io.zig (the Poll.onUpdateEpoll EPOLLERR branch) and a new Linux-only regression test in test/js/bun/util/bun-file-fd-read.test.ts. The Zig change replaces bun.sys.getErrno(event.events) — which was always returning .SUCCESS because event.events is an epoll flag bitmask, not a syscall return — with a getsockopt(SO_ERROR) query. If SO_ERROR yields a non-zero errno it calls onIOError(fromCodeInt(so_error, .epoll_ctl)); otherwise (getsockopt failed, e.g. ENOTSOCK on a pipe, or so_error == 0) it now dispatches onReady() so the subsequent read()/write() surfaces the real error.
Security risks
None. This is internal errno plumbing on the blob ReadFile/WriteFile IO thread; no user input reaches the new getsockopt call beyond an fd Bun already owns and was already polling.
Level of scrutiny
Moderate-to-high. The diff is small and the new behavior is the textbook EPOLLERR idiom, and it is strictly better than the previous always-zero errno (which asserted in debug and produced a code-less rejection in release). However, this is the central epoll dispatch path on the dedicated IO thread — every Bun.file(fd) read/write on a pollable fd on Linux flows through here — and the semantics shifted from "always onIOError" to "sometimes onReady" under EPOLLERR. I verified both ReadFile and WriteFile expose opened_fd: bun.FD so the inline else field access compiles for both arms, and bun.sys.Error.fromCodeInt exists in src/sys/Error.zig.
Other factors
- The robobun CI comment reports build-zig failures on macOS aarch64/x64, FreeBSD x64/aarch64, and Android aarch64/x64 for commit
dea531d. These may be infra flakes (robobun's header says "1 failures" yet lists seven), and the touched function is only reachable underEnvironment.isLinux, but it's worth confirming the latest commit (a82d48a) is green before merging. - coderabbitai raised and then withdrew a concern about the
Bun.sleep(100)timing window in the test; the author's explanation (no JS-observable signal for epoll registration; fail-before/pass-after holds on both race branches; verified withUSE_SYSTEM_BUN=1) is sound. The minorsetsockopt(SO_LINGER)return-value check was applied ina82d48a. - No CODEOWNERS cover
src/io/io.zig.
|
CI failures on a82d48a are pre-existing and unrelated to this change:
The build-zig failures mentioned in the review were on the first commit (dea531d) and were infra flakes — all build steps passed on a82d48a. The new test in this PR ( |
There was a problem hiding this comment.
The fix looks correct and follows the standard EPOLLERR → getsockopt(SO_ERROR) idiom, but since this changes error-dispatch behavior in the core IO-thread epoll loop, it's worth a human sign-off.
Extended reasoning...
Overview
This PR changes Poll.onUpdateEpoll in src/io/io.zig (the dedicated epoll/kqueue loop used by Bun.file(fd) blob reads/writes). Previously, when EPOLLERR fired, it called bun.sys.getErrno(event.events) on the epoll flag bitmask — which always yields .SUCCESS (0) — and passed that into onIOError, tripping a debug assertion in errnoToZigErr(0). The fix queries getsockopt(SO_ERROR) on this.opened_fd (present on both ReadFile and WriteFile); if a real errno is pending it dispatches onIOError, otherwise it falls through to onReady() so the next read()/write() surfaces the error. A Linux-only regression test using bun:ffi + SO_LINGER-forced RST exercises the path.
Security risks
None identified. The change reads SO_ERROR from an already-open fd owned by the runtime; no new attack surface, no user-controlled inputs reach the new code beyond what already flows through this path.
Level of scrutiny
Medium-high. Although the file header notes this loop is "barely used" (only Bun.write() / Bun.file().text() on pollable fds), it is still core IO-thread error-dispatch logic. The new behavior — falling back to onReady() when getsockopt fails (e.g., ENOTSOCK on a pipe) or returns 0 — is the canonical idiom and strictly better than the previous always-zero-errno crash, but a maintainer should confirm they're comfortable with the onReady() fallback for non-socket pollable fds that hit EPOLLERR.
Other factors
The bug-hunting system found no issues. CodeRabbit's two comments were addressed (SO_LINGER return-value check applied) or withdrawn (sleep-based timing, which the author justified convincingly). CI failures are documented as pre-existing/unrelated (Windows binary-size and Windows test flake; the changed code is gated behind comptime Environment.isLinux). The new test passed on all Linux shards. Given it's a non-mechanical change to event-loop error handling, I'm deferring rather than approving.
|
Build #50924 (main merged in): 282 passed, 4 failed — all pre-existing flakes also failing on other unrelated PRs.
Both are on non-Linux platforms; the changed code is under |
onUpdateEpoll called getErrno(event.events) when EPOLLERR was set, but event.events is an epoll flag bitmask (EPOLLIN|EPOLLERR|...), not a syscall return value. getErrno() on a u32 bitmask compares against -1 and always returns .SUCCESS (0), which then flowed into ReadFile/WriteFile.onIOError -> errnoToZigErr(0) and tripped the non-zero assertion on the IO thread. Query getsockopt(SO_ERROR) for the real pending error when EPOLLERR is reported; if the fd is not a socket or has no pending error, dispatch as ready so the next read()/write() surfaces the error itself.
9f4f520 to
d396c51
Compare
There was a problem hiding this comment.
The fix looks correct and well-tested, but since it changes the IO thread's epoll error-dispatch semantics (including a new onReady() fallback on EPOLLERR), it's worth a quick human sign-off from someone familiar with the io.zig ReadFile/WriteFile state machine.
Extended reasoning...
Overview
This PR fixes Poll.onUpdateEpoll in src/io/io.zig, which previously passed event.events (an epoll flag bitmask) to bun.sys.getErrno(), always yielding errno 0 and crashing the IO thread via errnoToZigErr(0). The fix replaces this with the standard idiom: query getsockopt(SO_ERROR) for the pending error, dispatch onIOError if non-zero, otherwise fall through to onReady() so the next read()/write() surfaces the error. A Linux-only regression test in bun-file-fd-read.test.ts provokes EPOLLERR via TCP RST and asserts the rejection carries ECONNRESET.
I verified: opened_fd exists on both ReadFile (read_file.zig:52) and WriteFile (write_file.zig:9); fromCodeInt handles a positive c_int errno correctly; and all epoll registrations in this loop use EPOLL.ONESHOT, so the onReady() fallback cannot busy-loop.
Security risks
None. No new attack surface — this is internal error-code plumbing on an fd the caller already supplied to Bun.file(fd).
Level of scrutiny
Moderate. The diff is ~10 lines and the logic is the textbook EPOLLERR pattern, and both new branches are strictly better than the prior crash/error.Unexpected. However, this is the IO thread's core epoll event dispatcher for blob reads/writes, and it introduces a semantic change (calling onReady() when EPOLLERR is set but SO_ERROR is unavailable or zero — e.g., for non-socket pollable fds like pipes/FIFOs). That fallback is sound given ONESHOT registration and the existing read/write error handling, but a reviewer familiar with the ReadFile/WriteFile state machine should confirm there's no edge case where onReady() after EPOLLERR re-arms without progress.
Other factors
The bug-hunting system found nothing. Both CodeRabbit comments were resolved (SO_LINGER return check applied; the Bun.sleep(100) concern withdrawn with a sound explanation). CI failures are documented as pre-existing flakes on non-Linux platforms; the new test passed on all Linux shards. The author's CI triage in the timeline is thorough and credible.
What does this PR do?
Fixes a debug assertion crash on the IO thread when epoll reports
EPOLLERRfor aBun.file(fd)read/write on a pollable fd.Poll.onUpdateEpollcalledbun.sys.getErrno(event.events)whenEPOLLERRwas set — butevent.eventsis an epoll flag bitmask (EPOLLIN | EPOLLERR | ...), not a syscall return value.getErrnoon au32compares against-1and so always returned.SUCCESS(0). That zero errno then flowed intoReadFile.onIOError/WriteFile.onIOError→bun.errnoToZigErr(0), which asserts on non-zero and crashed the IO thread.Unlike kqueue (where
event.datacarries the errno onEV_ERROR), epoll'sEPOLLERRonly signals that there is an error; it doesn't say which one. The fix queriesgetsockopt(SO_ERROR)on the fd to retrieve the pending error. If the fd isn't a socket (ENOTSOCK) or there's no pending error, it dispatches as ready so the nextread()/write()surfaces the real error — the standard idiom for handlingEPOLLERR.How did you verify your code works?
Added a Linux-only regression test in
test/js/bun/util/bun-file-fd-read.test.tsthat:Bun.file(fd).text()on it, which registers with theio.zigepoll.SO_LINGERwithl_linger=0on the server side and closes it, sending RST.EPOLLERRwith a pendingECONNRESET.Before this change the subprocess panics with
reached unreachable codeaterrnoToZigErr. After, the read rejects withECONNRESET.Fuzzer fingerprint:
e92583e5fff92dc0