Skip to content

test(fetch-http2-client): deflake 'client RSTs on local error' via same-session barrier#29954

Merged
Jarred-Sumner merged 1 commit into
mainfrom
farm/9c5297fb/deflake-h2-client-rst-test
Apr 29, 2026
Merged

test(fetch-http2-client): deflake 'client RSTs on local error' via same-session barrier#29954
Jarred-Sumner merged 1 commit into
mainfrom
farm/9c5297fb/deflake-h2-client-rst-test

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Flake

fetch() over HTTP/2 > raw frame server > client RSTs the stream when it abandons on a local error fails on Windows (2019 x64, 2019 x64-baseline, 11 aarch64 — occasionally macOS ASAN) with:

expect(state.rst).toEqual([{ id: 1, code: 8 }])
- Expected: [{ code: 8, id: 1 }]
+ Received: []

Seen in 25+ of the last 65 completed BuildKite runs across branches (builds 48962-49247).

Cause

The subprocess script is:

try { await (await fetch(url, { tls })).arrayBuffer(); console.log("ok"); }
catch (e) { console.log("rejected", e.code ?? e.message); }

After handleResponseBody throws on the invalid gzip payload, the h2 client writes a 13-byte RST_STREAM(CANCEL) frame to the session's write buffer and flushes it to the TLS socket. But "flushed to the TLS socket" only means accepted into the BoringSSL/uSockets layer — not that it's on the wire. The catch block then prints and the script ends, so the event loop drains and the process exits.

On Windows, process termination does an abortive close (TCP RST) on open sockets, dropping anything still in the TLS write buffer / kernel send queue. The raw server's socket.on('data') never sees the RST_STREAM frame; state.allClosed() resolves on the abrupt close; state.rst is [].

Fix

Add a second fetch on the same pooled session after the error. The session survives a stream-level body-parse failure (only the stream is RST'd and removed; maybeRelease() pools the socket), so the follow-up request reuses it as stream 3.

RST_STREAM(1) is FIFO-queued ahead of HEADERS(3) on that one socket, so the 204 response arriving back at the subprocess proves the RST reached the server before the process exits. The test also asserts state.connections === 1 so the ordering argument actually holds (barrier rode the same socket).

This is the same pattern the neighbouring 303 redirect on a streaming-body POST RSTs the half-open upload stream test already relies on — there the RST for stream 1 is naturally followed by the redirect's follow-up GET on stream 3 before the subprocess exits.

Verification

  • Full file: 58 pass / 0 fail, ×3 runs
  • Fixed case in isolation: 20/20 pass

…me-session barrier

The subprocess exits immediately after catching the gzip-decode error, and
on Windows process termination abortively closes the TCP socket (RST) with
any data still in the TLS/kernel send buffer dropped — so the 13-byte
RST_STREAM(CANCEL) frame never reaches the raw server and state.rst is [].

Add a second fetch on the same pooled session after the error. RST_STREAM(1)
is queued ahead of HEADERS(3) on the one socket, so the 204 arriving back
proves the RST reached the server before the subprocess exits. Assert
connections=1 so the ordering argument actually holds.

Seen on 25+ recent Windows (2019 x64 / 11 aarch64) CI runs.
@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 6:19 PM PT - Apr 29th, 2026

@robobun, your commit e39607f has 2 failures in Build #49257 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29954

That installs a local version of the PR into your bun-29954 executable, so you can run:

bun-29954 --bun

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 247fe57f-3ae6-4c6c-af2d-e7228ee0ba2b

📥 Commits

Reviewing files that changed from the base of the PR and between e257f2d and e39607f.

📒 Files selected for processing (1)
  • test/js/web/fetch/fetch-http2-client.test.ts

Walkthrough

The test file introduces a secondary stream "barrier" that responds with server-side HEADERS(204) for non-id===1 requests, while preserving the original gzip decode failure behavior for id===1. Subprocess logic is refactored to reuse a single TLS object and validate ordered stdout assertions across pooled HTTP/2 session requests.

Changes

Cohort / File(s) Summary
HTTP/2 Raw-Frame Test
test/js/web/fetch/fetch-http2-client.test.ts
Introduces secondary stream barrier with server-side HEADERS(204) response. Refactors subprocess logic to reuse TLS object and issue sequential requests. Changes assertions to validate ordered stdout sequence (rejected then 204), single connection reuse, and RST_STREAM(CANCEL) capture for stream 1.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: deflaking a test by introducing a same-session barrier pattern to ensure HTTP/2 RST frames reach the server.
Description check ✅ Passed The description comprehensively addresses both required sections: it clearly explains what the PR does (fixes a flaky test with detailed root cause analysis) and how it was verified (58/0 pass across 3 runs, 20/20 in isolation).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — test-only deflake with a sound barrier pattern already used by the neighbouring 303-redirect test.

Extended reasoning...

Overview

This PR modifies a single flaky test case in test/js/web/fetch/fetch-http2-client.test.ts (client RSTs the stream when it abandons on a local error). The fix adds a second fetch on the same pooled h2 session after the error, acting as a delivery barrier so the RST_STREAM(1) frame is provably on the wire before the subprocess exits. The server handler branches on stream id (1 → bad gzip, else → 204), and assertions are extended to verify the barrier's 204 arrived and that state.connections === 1 (proving the barrier rode the same socket so FIFO ordering applies).

Security risks

None. This is a test-only change with no production code touched. The subprocess script is constructed by the test itself with a localhost URL; no new attack surface.

Level of scrutiny

Low. Test-only deflake, well-explained root cause (Windows abortive close drops buffered TLS writes on process exit), and the barrier technique mirrors the existing 303 redirect on a streaming-body POST test directly above it. The added connections === 1 assertion is a nice self-check that the ordering argument actually holds rather than masking a regression.

Other factors

No bugs found by the bug-hunting system. The PR description includes verification (58/58 pass ×3, 20/20 in isolation). The change is small, self-contained, and strictly tightens the test (it now also asserts the session was reused). No outstanding reviewer comments.

@Jarred-Sumner Jarred-Sumner enabled auto-merge (squash) April 29, 2026 23:55
@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

CI failure on windows-2019-x64-test-bun is unrelated to this change — it's test/bake/dev-and-prod.test.ts (DEV:dev-and-prod-12: hmr handles rapid consecutive edits timing out after 15s ×3), which appears in the flaky annotation of 22 of the last 40 builds across all branches.

fetch-http2-client.test.ts itself passed clean on every lane that's run so far, including the two Windows platforms where it was previously flaking:

lane result
windows 2019 x64 58 pass / 0 fail (1.97s)
windows 11 aarch64 58 pass / 0 fail (1.86s)
windows 2019 x64-baseline 58 pass / 0 fail
debian 13 x64-asan 58 pass / 0 fail

Don't have write_builds scope to retry the failed shard from here.

@Jarred-Sumner Jarred-Sumner disabled auto-merge April 29, 2026 23:55
@Jarred-Sumner Jarred-Sumner merged commit 1a0250b into main Apr 29, 2026
73 of 78 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/9c5297fb/deflake-h2-client-rst-test branch April 29, 2026 23:56
robobun added a commit that referenced this pull request May 1, 2026
Build #49698 failed on debian-13-x64-asan-test-bun with
AtomStringImpl::remove() assertion in fetch-http2-client.test.ts —
a pre-existing ASAN flake unrelated to these MySQL changes
(recent deflake PRs #29954, #29905, #29863, #29809 target the same test).

Empty commit to retrigger CI.
robobun pushed a commit that referenced this pull request May 6, 2026
robobun added a commit that referenced this pull request May 12, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…me-session barrier (oven-sh#29954)

## Flake

`fetch() over HTTP/2 > raw frame server > client RSTs the stream when it
abandons on a local error` fails on Windows (2019 x64, 2019
x64-baseline, 11 aarch64 — occasionally macOS ASAN) with:

```
expect(state.rst).toEqual([{ id: 1, code: 8 }])
- Expected: [{ code: 8, id: 1 }]
+ Received: []
```

Seen in 25+ of the last 65 completed BuildKite runs across branches
(builds 48962-49247).

## Cause

The subprocess script is:

```js
try { await (await fetch(url, { tls })).arrayBuffer(); console.log("ok"); }
catch (e) { console.log("rejected", e.code ?? e.message); }
```

After `handleResponseBody` throws on the invalid gzip payload, the h2
client writes a 13-byte `RST_STREAM(CANCEL)` frame to the session's
write buffer and flushes it to the TLS socket. But "flushed to the TLS
socket" only means accepted into the BoringSSL/uSockets layer — not that
it's on the wire. The catch block then prints and the script ends, so
the event loop drains and the process exits.

On Windows, process termination does an abortive close (TCP RST) on open
sockets, dropping anything still in the TLS write buffer / kernel send
queue. The raw server's `socket.on('data')` never sees the RST_STREAM
frame; `state.allClosed()` resolves on the abrupt close; `state.rst` is
`[]`.

## Fix

Add a second fetch on the same pooled session after the error. The
session survives a stream-level body-parse failure (only the stream is
RST'd and removed; `maybeRelease()` pools the socket), so the follow-up
request reuses it as stream 3.

`RST_STREAM(1)` is FIFO-queued ahead of `HEADERS(3)` on that one socket,
so the 204 response arriving back at the subprocess proves the RST
reached the server before the process exits. The test also asserts
`state.connections === 1` so the ordering argument actually holds
(barrier rode the same socket).

This is the same pattern the neighbouring `303 redirect on a
streaming-body POST RSTs the half-open upload stream` test already
relies on — there the RST for stream 1 is naturally followed by the
redirect's follow-up GET on stream 3 before the subprocess exits.

## Verification

- Full file: 58 pass / 0 fail, ×3 runs
- Fixed case in isolation: 20/20 pass

Co-authored-by: robobun <robobun@users.noreply.github.com>
robobun added a commit that referenced this pull request May 23, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
robobun added a commit that referenced this pull request May 23, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
robobun added a commit that referenced this pull request May 25, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
robobun added a commit that referenced this pull request May 25, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
robobun added a commit that referenced this pull request May 27, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
robobun added a commit that referenced this pull request Jun 5, 2026
Flaky test failures on x64-asan / windows-x64 unrelated to the
Transpiler sourcemap diff:

- fetch-http2-client: WebKit AtomString wasRemoved assertion
  (h2 client under ASAN has recent deflaking history — #29954, #29809)
- test-worker-nested-uncaught: panic
  "EventLoop.enqueueTaskConcurrent: VM has terminated" — worker VM
  teardown race
- test-http-should-emit-close-when-connection-is-aborted: same failure
  landed on PR #30540's neighbour build 53635
- AsyncLocalStorage-tracking / bun-install-registry: async_hooks /
  install flakes

None of these touch Bun.Transpiler / js_printer / sourcemap. All
Linux/Darwin/FreeBSD build-zig and build-cpp lanes passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants