test(fetch-http2-client): deflake 'client RSTs on local error' via same-session barrier #29954
…me-session barrier The subprocess exits immediately after catching the gzip-decode error, and on Windows process termination abortively closes the TCP socket (RST) with any data still in the TLS/kernel send buffer dropped — so the 13-byte RST_STREAM(CANCEL) frame never reaches the raw server and state.rst is []. Add a second fetch on the same pooled session after the error. RST_STREAM(1) is queued ahead of HEADERS(3) on the one socket, so the 204 arriving back proves the RST reached the server before the subprocess exits. Assert connections=1 so the ordering argument actually holds. Seen on 25+ recent Windows (2019 x64 / 11 aarch64) CI runs.
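The 13-byte size follows from HTTP/2 framing: every frame has a 9-byte header, and RST_STREAM carries a 4-byte error code as its payload. The sketch below lays out such a frame byte by byte. `encodeRstStream` is a hypothetical helper written for illustration; it is not a function from Bun or from the test file.

```typescript
// HTTP/2 frame layout: 9-byte header followed by the payload.
// RST_STREAM is frame type 0x3; its payload is a 4-byte error code.
// CANCEL is error code 0x8 (the `code: 8` the test expects).
const FRAME_HEADER_LEN = 9;
const TYPE_RST_STREAM = 0x3;
const ERR_CANCEL = 0x8;

// Hypothetical helper, for illustration only.
function encodeRstStream(streamId: number, errorCode: number): Uint8Array {
  const frame = new Uint8Array(FRAME_HEADER_LEN + 4);
  const view = new DataView(frame.buffer);
  view.setUint8(0, 0);                      // 24-bit payload length, high byte
  view.setUint16(1, 4);                     // 24-bit payload length, low two bytes (big-endian)
  view.setUint8(3, TYPE_RST_STREAM);        // frame type
  view.setUint8(4, 0);                      // flags: none are defined for RST_STREAM
  view.setUint32(5, streamId & 0x7fffffff); // 31-bit stream id, reserved bit cleared
  view.setUint32(9, errorCode);             // payload: error code
  return frame;
}

const rst = encodeRstStream(1, ERR_CANCEL);
console.log(rst.length); // 13
```

A frame this small fits in a single TLS record, so whether it reaches the peer depends entirely on whether that record leaves the send buffer before the socket is torn down.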
Updated 6:19 PM PT - Apr 29th, 2026
❌ @robobun, your commit e39607f has 2 failures in
🧪 To try this PR locally: bunx bun-pr 29954
That installs a local version of the PR into your bun-29954 --bun
Walkthrough
The test file introduces a secondary stream "barrier" that responds with server-side …
LGTM — test-only deflake with a sound barrier pattern already used by the neighbouring 303-redirect test.
Overview
This PR modifies a single flaky test case in test/js/web/fetch/fetch-http2-client.test.ts (client RSTs the stream when it abandons on a local error). The fix adds a second fetch on the same pooled h2 session after the error, acting as a delivery barrier so the RST_STREAM(1) frame is provably on the wire before the subprocess exits. The server handler branches on stream id (1 → bad gzip, else → 204), and assertions are extended to verify the barrier's 204 arrived and that state.connections === 1 (proving the barrier rode the same socket so FIFO ordering applies).
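As a rough illustration of the stream-id branch described above, the sketch below returns a response plan per stream. The review only states that stream 1 gets the bad gzip body and every other stream gets a 204; the name `planResponse`, the 200 status on stream 1, and the specific payload bytes are assumptions made for this example, not code from the test file.

```typescript
interface ResponsePlan {
  status: number;
  headers: Record<string, string>;
  body: Uint8Array;
}

// Hypothetical helper, for illustration only.
function planResponse(streamId: number): ResponsePlan {
  if (streamId === 1) {
    // First request: claim gzip but send bytes that are not a gzip stream
    // (a real gzip stream starts with 0x1f 0x8b), so the client's decoder fails.
    return {
      status: 200, // assumed; the source does not state the status used
      headers: { "content-encoding": "gzip" },
      body: new Uint8Array([0xde, 0xad, 0xbe, 0xef]),
    };
  }
  // Any later stream is the barrier: an empty 204 is enough to prove a round trip.
  return { status: 204, headers: {}, body: new Uint8Array(0) };
}

console.log(planResponse(1).headers["content-encoding"], planResponse(3).status);
```

Keeping the barrier response bodiless means the only thing the subprocess waits on is the round trip itself.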
Security risks
None. This is a test-only change with no production code touched. The subprocess script is constructed by the test itself with a localhost URL; no new attack surface.
Level of scrutiny
Low. Test-only deflake, well-explained root cause (Windows abortive close drops buffered TLS writes on process exit), and the barrier technique mirrors the existing 303 redirect on a streaming-body POST test directly above it. The added connections === 1 assertion is a nice self-check that the ordering argument actually holds rather than masking a regression.
Other factors
No bugs found by the bug-hunting system. The PR description includes verification (58/58 pass ×3, 20/20 in isolation). The change is small, self-contained, and strictly tightens the test (it now also asserts the session was reused). No outstanding reviewer comments.
…known flake; MERGED #29954 deflakes)
Flaky test failures on x64-asan / windows-x64 unrelated to the Transpiler sourcemap diff:
- fetch-http2-client: WebKit AtomString wasRemoved assertion (h2 client under ASAN has recent deflaking history: #29954, #29809)
- test-worker-nested-uncaught: panic "EventLoop.enqueueTaskConcurrent: VM has terminated" (worker VM teardown race)
- test-http-should-emit-close-when-connection-is-aborted: same failure landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks / install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
…me-session barrier (oven-sh#29954)

## Flake

`fetch() over HTTP/2 > raw frame server > client RSTs the stream when it abandons on a local error` fails on Windows (2019 x64, 2019 x64-baseline, 11 aarch64; occasionally macOS ASAN) with:

```
expect(state.rst).toEqual([{ id: 1, code: 8 }])
- Expected: [{ code: 8, id: 1 }]
+ Received: []
```

Seen in 25+ of the last 65 completed BuildKite runs across branches (builds 48962-49247).

## Cause

The subprocess script is:

```js
try {
  await (await fetch(url, { tls })).arrayBuffer();
  console.log("ok");
} catch (e) {
  console.log("rejected", e.code ?? e.message);
}
```

After `handleResponseBody` throws on the invalid gzip payload, the h2 client writes a 13-byte `RST_STREAM(CANCEL)` frame to the session's write buffer and flushes it to the TLS socket. But "flushed to the TLS socket" only means accepted into the BoringSSL/uSockets layer, not that it's on the wire. The catch block then prints and the script ends, so the event loop drains and the process exits.

On Windows, process termination does an abortive close (TCP RST) on open sockets, dropping anything still in the TLS write buffer / kernel send queue. The raw server's `socket.on('data')` never sees the RST_STREAM frame; `state.allClosed()` resolves on the abrupt close; `state.rst` is `[]`.

## Fix

Add a second fetch on the same pooled session after the error. The session survives a stream-level body-parse failure (only the stream is RST'd and removed; `maybeRelease()` pools the socket), so the follow-up request reuses it as stream 3. `RST_STREAM(1)` is FIFO-queued ahead of `HEADERS(3)` on that one socket, so the 204 response arriving back at the subprocess proves the RST reached the server before the process exits. The test also asserts `state.connections === 1` so the ordering argument actually holds (barrier rode the same socket).
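The FIFO argument can be sketched by serializing the two frames onto one byte stream and parsing them back in arrival order. This is an illustrative model only: `encodeFrame` and `parseFrames` are hypothetical helpers, the HEADERS frame carries an empty header block (not a valid request), and the real raw server in the test file may be structured differently.

```typescript
interface Frame {
  type: number;
  flags: number;
  streamId: number;
  payload: Uint8Array;
}

const TYPE_HEADERS = 0x1;
const TYPE_RST_STREAM = 0x3;

// Hypothetical helper: serialize one frame (9-byte header + payload).
function encodeFrame(type: number, flags: number, streamId: number, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(9 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, (payload.length >>> 16) & 0xff); // 24-bit length, high byte
  view.setUint16(1, payload.length & 0xffff);       // 24-bit length, low two bytes
  view.setUint8(3, type);
  view.setUint8(4, flags);
  view.setUint32(5, streamId & 0x7fffffff);
  out.set(payload, 9);
  return out;
}

// Hypothetical helper: split received bytes into complete frames, in arrival order.
function parseFrames(bytes: Uint8Array): Frame[] {
  const frames: Frame[] = [];
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let off = 0;
  while (off + 9 <= bytes.length) {
    const length = (view.getUint8(off) << 16) | view.getUint16(off + 1);
    if (off + 9 + length > bytes.length) break; // incomplete trailing frame
    frames.push({
      type: view.getUint8(off + 3),
      flags: view.getUint8(off + 4),
      streamId: view.getUint32(off + 5) & 0x7fffffff,
      payload: bytes.subarray(off + 9, off + 9 + length),
    });
    off += 9 + length;
  }
  return frames;
}

// What the client queues on the one pooled socket: RST_STREAM(1, CANCEL), then HEADERS(3).
const rstFrame = encodeFrame(TYPE_RST_STREAM, 0, 1, new Uint8Array([0, 0, 0, 8]));
const headersFrame = encodeFrame(TYPE_HEADERS, 0x5, 3, new Uint8Array(0)); // END_STREAM | END_HEADERS
const wire = new Uint8Array(rstFrame.length + headersFrame.length);
wire.set(rstFrame, 0);
wire.set(headersFrame, rstFrame.length);

// Server-side bookkeeping in the spirit of the test's `state.rst`.
const rst: { id: number; code: number }[] = [];
let rstSeenWhenBarrierArrived = false;
for (const frame of parseFrames(wire)) {
  if (frame.type === TYPE_RST_STREAM) {
    const code = new DataView(frame.payload.buffer, frame.payload.byteOffset, 4).getUint32(0);
    rst.push({ id: frame.streamId, code });
  } else if (frame.type === TYPE_HEADERS && frame.streamId === 3) {
    // The server can only answer the barrier after everything queued before it has arrived.
    rstSeenWhenBarrierArrived = rst.length === 1;
  }
}

// An abortive close before any bytes arrive leaves nothing to parse.
const afterAbort = parseFrames(new Uint8Array(0));
```

The model depends on both frames sharing one byte stream, which is why the `state.connections === 1` assertion matters: on a second connection the frames would travel independently and the 204 would prove nothing about the RST.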
This is the same pattern the neighbouring `303 redirect on a streaming-body POST RSTs the half-open upload stream` test already relies on; there the RST for stream 1 is naturally followed by the redirect's follow-up GET on stream 3 before the subprocess exits.

## Verification

- Full file: 58 pass / 0 fail, ×3 runs
- Fixed case in isolation: 20/20 pass

Co-authored-by: robobun <robobun@users.noreply.github.com>