Skip to content

http2: reclaim closed-stream entries from the session streams map#30416

Open
robobun wants to merge 19 commits into
mainfrom
farm/7d5b17e1/fix-http2-stream-leak
Open

http2: reclaim closed-stream entries from the session streams map#30416
robobun wants to merge 19 commits into
mainfrom
farm/7d5b17e1/fix-http2-stream-leak

Conversation

@robobun

@robobun robobun commented May 9, 2026

Copy link
Copy Markdown
Collaborator

Fixes #30415

Reproduction

Sustained http2.connect() session reuse (the pattern @smithy/node-http-handler uses in AWS SDK v3, including @aws-sdk/client-kinesis) leaked native RSS at ~380 MB/hour on Bun. Node plateaued flat. I reproduced without AWS by pointing the SDK at a local HTTP/2 mock:

  • 64 concurrent "shards" × ~60 requests/second on one pooled session
  • Bun 1.3.13 / 1.3.14: RSS climbs 173 → 221 MB in 450s (~380 MB/hr)
  • Node 24: pinned at 339–340 MB over 8 minutes / 360k requests
  • Raw fetch() over the same H2 endpoint at double the rate: flat — the fetch-path H2 client isn't the culprit
  • Raw node:http2 client: reproduces — isolating the leak to this module

Bun.gc(true) reclaims most of it at the end, so the memory is retained (JS-reachable) rather than unfreed buffers, but natural GC doesn't keep up during steady-state load.

Cause

H2FrameParser.streams (bun.U32HashMap(*Stream)) only dropped entries in deinit(). Every session.request() called streams.put(id, Stream.new(...)); normal stream closure (END_STREAM, RST_STREAM, AbortSignal, error paths) just set state = .CLOSED and called freeResources — the heap-allocated Stream and its map entry outlived every request until the whole session died. At ~150 bytes per closed stream, a long-lived pooled session bleeds memory proportional to total request count.

The parallel fetch-path client in src/http/h2_client/ClientSession.zig already has the right pattern (removeStream(*Stream) → swapRemove + deinit); this PR brings the same discipline to src/runtime/api/bun/h2_frame_parser.zig.

Fix

  • Add removeStreamByID(id)fetchRemove the map entry, call freeResources (idempotent), bun.destroy the Stream. Call it at every state = .CLOSED transition after the terminal JS dispatch (onStreamEnd/onStreamError/onAborted) completes.
  • For paths that walk the map via StreamResumableIterator (flushStreamQueue, emitAbortToAllStreams, emitErrorToAllStreams, other teardown loops) defer the reclaim to a post-iteration sweep — removeAllClosedStreams collects every slot whose Stream is .CLOSED and removes them. Removing mid-iteration would still be memory-safe (std.HashMap.remove tombstones), but a re-entrant dispatch could bun.destroy this under our feet while flushQueue's defer is still reading this.dataFrameQueue.
  • flushQueue rewrites its defer to re-resolve the stream by id after dispatchWriteCallback — the user's write callback can synchronously reset the stream (stream.close → rstStream → endStream → removeStreamByID), so trusting this across the callback was already a latent UAF that becomes observable once entries are reclaimed.
  • handleReceivedStreamID returns null for an id ≤ lastStreamID that isn't in the map, instead of creating a fresh Stream. Without this, a stale frame arriving after we removed the entry would resurrect a ghost stream with default state. The affected frame handlers (handleDataFrame, handleHeadersFrame, handleRSTStreamFrame, handleContinuationFrame, handlePriorityFrame) only escalate to a connection-level GOAWAY when the frame itself is connection-scoped (streamIdentifier == 0); a late frame on a closed stream is dropped per RFC 7540 §5.1.
  • JS-side destruction schedules setImmediate(rstNextTick) after we've already removed the native stream; parser.rstStream now silently returns .false instead of throwing Invalid stream id. noTrailers, sendTrailers, writeStream, getStreamState, getEndAfterHeaders, and isStreamAborted likewise degrade to no-ops / closed defaults for a removed id — the JS Duplex is mid-destroy by the time these fire.

Verification

test/regression/issue/30415.test.ts makes 50 sequential requests on one http2.connect() session, then reads client.state.streamCount (a new field on getCurrentState's output that exposes parser.streams.count()):

  • Before the fix: streamCount grows to ~50. Test fails.
  • After the fix: streamCount is 0 (or 1 if the last stream's setImmediate(rstNextTick) hasn't run). Test passes.

Passes the gate: git stash push -- src/ && bun bd test test/regression/issue/30415.test.ts fails without the fix, passes with it. All 241 Bun http2 tests in test/js/node/http2/node-http2.test.js still pass (the 4 pre-existing failures — 3 node none Node-ref tests and http2 server with minimal maxSessionMemory handles multiple requests — fail on main too in this container, unrelated to this PR).

The H2FrameParser.streams HashMap only dropped entries on full session
teardown — every session.request() added a *Stream but normal stream
closure (END_STREAM, RST_STREAM, abort) just marked state=CLOSED and
freed per-stream resources, leaving the heap-allocated Stream and map
entry alive until the whole parser deinit'd. Long-lived pooled sessions
(AWS SDK v3 @smithy/node-http-handler keeps one http2.connect() per
origin) leaked ~150 B per request × thousands of requests/minute.

Add removeStreamByID that unlinks the map entry and destroys the Stream
allocation. Call it at every state→.CLOSED transition once the terminal
onStreamEnd/onStreamError/onAborted dispatch has completed. For paths
inside a StreamResumableIterator walk (flushStreamQueue, emitAbort/Error,
and the similar teardown loops) defer the reclaim until the iterator
returns — removeAllClosedStreams sweeps every slot whose Stream is
CLOSED. JS-side RST (Duplex destroy → setImmediate(rstNextTick) →
parser.rstStream) now silently returns when the id is already gone
rather than throwing.

Update handleReceivedStreamID so an id ≤ lastStreamID that isn't in the
map returns null instead of creating a fresh Stream — RFC 7540 §5.1
permits peers to send a few stale frames after a stream closes; the
frame handlers already had orelse paths which now only escalate to a
connection-level GOAWAY when the frame itself is connection-scoped
(streamIdentifier == 0). The side-effect frame handlers (noTrailers,
sendTrailers, writeStream, getStreamState, isStreamAborted,
getEndAfterHeaders) degrade gracefully to no-ops / closed defaults for
a removed id instead of throwing — the JS Duplex is already mid-destroy
by the time these fire.

Fixes #30415
@robobun

robobun commented May 9, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 5:11 AM PT - May 9th, 2026

@robobun, your commit bc4d535 has 1 failures in Build #53016 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30416

That installs a local version of the PR into your bun-30416 executable, so you can run:

bun-30416 --bun

@github-actions

github-actions Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. @fastify/http-proxy with HTTP/2 spuriously emits FST_REPLY_FROM_HTTP2_REQUEST_TIMEOUT after idle on Bun #30307 - Reporter explicitly notes Bun keeps stale streams that never get cancelled or drained, directly matching the closed-stream accumulation this PR fixes with removeStreamByID() and removeAllClosedStreams()

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #30307

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Centralizes safe stream removal, drains late/stale HTTP/2 frames without throwing, makes JS stream APIs permissive for removed streams, reclaims CLOSED streams after JS callbacks, and adds regression tests validating bounded streamCount and an abort-related UAF fix.

Changes

HTTP/2 Stream Removal and Memory Leak Fix

Layer / File(s) Summary
Stream Removal Infrastructure
src/runtime/api/bun/h2_frame_parser.zig
Adds removeStreamByID() helper to safely remove/free streams by ID, removeAllClosedStreams() to batch-reclaim CLOSED streams after iterator walks, and updates Stream.flushQueue() to capture stream ID and re-look up after JS re-entrancy.
Discard / Late-Frame Handling and Frame Handlers
src/runtime/api/bun/h2_frame_parser.zig
Adds discardFramePayload() to drain payloads for missing/stale streams while keeping HPACK state in sync. DATA/HEADERS/CONTINUATION/PRIORITY/RST handlers treat streamIdentifier == 0 as connection PROTOCOL_ERROR and otherwise drain late frames or dispatch terminal events then remove streams.
Window Update & Received-ID Semantics
src/runtime/api/bun/h2_frame_parser.zig
handleWindowUpdateFrame() credits connection window for streamIdentifier==0 and ignores WINDOW_UPDATEs for removed streams; handleReceivedStreamID() returns null for stale ids (<= lastStreamID).
Lifecycle & Error Cleanup
src/runtime/api/bun/h2_frame_parser.zig
abortStream(), endStream(), and sendData() END_STREAM/error paths remove streams via removeStreamByID() after dispatching terminal JS events; request() error branches remove failed/invalid streams post-dispatch; session teardown/broadcast helpers call removeAllClosedStreams().
JS API Robustness
src/runtime/api/bun/h2_frame_parser.zig
getEndAfterHeaders(), isStreamAborted(), rstStream(), writeStream() return benign values; getStreamState() synthesizes CLOSED state; noTrailers() and sendTrailers() return undefined for missing streams. getCurrentState() adds a streamCount field.
Regression Tests
test/regression/issue/30415.test.ts
Adds two tests: one that issues 50 sequential requests over a single persistent Node http2.connect() client and asserts bounded client.state.streamCount and completion via printed ok retained=... lastId=...; another that reproduces an abort-related UAF by aborting requests on finish across multiple POSTs and asserts the subprocess prints ok and exits 0.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: reclaiming closed-stream entries from the session streams map to fix a memory leak.
Description check ✅ Passed The description comprehensively covers the issue, reproduction steps, root cause analysis, the fix, and verification approach matching the template structure.
Linked Issues check ✅ Passed The changes fully address issue #30415: removing Stream entries after closure, preventing unbounded memory growth under sustained pooled session reuse, and adding regression tests validating streamCount remains low.
Out of Scope Changes check ✅ Passed All changes directly address #30415: Stream reclamation in h2_frame_parser.zig, JS-facing API resilience for removed streams, and regression tests validating the fix without extraneous modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

3510-3530: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Don't dereference stream after dispatchWriteCallback().

dispatchWriteCallback(callback) re-enters JS before this close path runs. If user code synchronously destroys or resets the stream there, removeStreamByID() can free stream, and the remaining accesses here (waitForTrailers, state, getIdentifier()) become a use-after-free. flushQueue() already fixed this pattern by re-resolving via stream id; sendData() needs the same treatment.

🛠️ Suggested direction
         defer {
             if (!enqueued) {
                 this.dispatchWriteCallback(callback);
+                const stream_after_callback = this.streams.get(stream_id);
                 var closed = false;
                 if (close) {
-                    if (stream.waitForTrailers) {
-                        this.dispatch(.onWantTrailers, stream.getIdentifier());
-                    } else {
+                    if (stream_after_callback) |stream| {
+                        if (stream.waitForTrailers) {
+                            this.dispatch(.onWantTrailers, stream.getIdentifier());
+                        } else {
                             const identifier = stream.getIdentifier();
                             identifier.ensureStillAlive();
                             if (stream.state == .HALF_CLOSED_REMOTE) {
                                 stream.state = .CLOSED;
                                 stream.freeResources(this, false);
                                 closed = true;
                             } else {
                                 stream.state = .HALF_CLOSED_LOCAL;
                             }
                             this.dispatchWithExtra(.onStreamEnd, identifier, jsc.JSValue.jsNumber(`@intFromEnum`(stream.state)));
-                    }
+                        }
+                    }
                 }
                 if (closed) this.removeStreamByID(stream_id);
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 3510 - 3530, The defer
block in sendData dereferences `stream` after calling
`dispatchWriteCallback(callback)`, which can re-enter JS and cause `stream` to
be freed; instead, after calling `dispatchWriteCallback` re-resolve the stream
by its id (use the same id captured earlier) via the repository method used
elsewhere (e.g., `getStreamByID` or equivalent), verify the stream still exists,
and only then access `waitForTrailers`, `state`, `getIdentifier()`, call
`dispatchWithExtra`, and possibly set `closed`; if the re-resolved stream is
null, skip the post-callback close logic entirely and avoid any use-after-free
before calling `removeStreamByID(stream_id)` when appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 2037-2047: The branch handling stream_ == null currently returns
early without consuming the frame payload or clearing currentFrame, which leaves
remainingLength and currentFrame stale; instead call the discard helper (the
function that wraps handleIncommingPayload()) to consume exactly frame.length
bytes before returning, ensure remainingLength is decremented and currentFrame
cleared once the discard finishes, and for DATA frames also update the
connection-level flow-control accounting the same way handleIncommingPayload()
would; apply the same change to the other null-stream branches found around the
other occurrences (the branches referencing stream_ and frame.streamIdentifier
and the calls to this.sendGoAway).

In `@test/regression/issue/30415.test.ts`:
- Around line 84-87: The test currently asserts stderr === "" which makes it
flaky; update the assertion in the expect(...) call (the object containing
stderr, stdout, exitCode) to stop requiring stderr be empty — either remove the
stderr property from the expected object or relax it (e.g., allow any string)
while keeping the stdout check (expect.stringMatching(/^ok retained=\d+
lastId=\d+$/)) and exitCode === 0 unchanged; locate this in the test where
variables stderr, stdout, and exitCode are assembled and adjust the expect to
omit or relax the stderr assertion.

---

Outside diff comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 3510-3530: The defer block in sendData dereferences `stream` after
calling `dispatchWriteCallback(callback)`, which can re-enter JS and cause
`stream` to be freed; instead, after calling `dispatchWriteCallback` re-resolve
the stream by its id (use the same id captured earlier) via the repository
method used elsewhere (e.g., `getStreamByID` or equivalent), verify the stream
still exists, and only then access `waitForTrailers`, `state`,
`getIdentifier()`, call `dispatchWithExtra`, and possibly set `closed`; if the
re-resolved stream is null, skip the post-callback close logic entirely and
avoid any use-after-free before calling `removeStreamByID(stream_id)` when
appropriate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b204c110-38e7-45c0-aae1-cd70cfad2438

📥 Commits

Reviewing files that changed from the base of the PR and between 6d0d86b and b030246.

📒 Files selected for processing (2)
  • src/runtime/api/bun/h2_frame_parser.zig
  • test/regression/issue/30415.test.ts

Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
Comment thread test/regression/issue/30415.test.ts Outdated
Before: when a frame arrived for a stream we'd just removed (RFC 7540
§5.1 permits late frames during the RST/END_STREAM propagation window)
the null-stream branch returned data.len WITHOUT advancing
remainingLength or clearing currentFrame. onNativeRead would consume
the chunk; the next read would re-enter readBytes with currentFrame
still set, re-dispatch to the same handler, and loop — the parser got
stuck on that frame and every subsequent byte was misparsed.

Add discardFramePayload() that routes through handleIncommingPayload()
so remainingLength/currentFrame advance the same way as a normal
dispatched frame. DATA charges the connection-level receive window
even though the stream is gone, so the peer's view stays consistent.
HEADERS/CONTINUATION walk the hpack decoder over the (de-padded,
de-prioritized) block; skipping decode would desync the shared
dynamic table and corrupt every subsequent request on the session.

Tests: the regression test dropped its stderr==="" check — ASAN lanes
emit benign stderr lines and the test only needs to verify the
streamCount output on stdout.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

3579-3599: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Re-resolve the stream after the immediate write callback.

sendData() still dereferences stream after dispatchWriteCallback(callback). If that callback synchronously closes/resets the same stream, removeStreamByID() can destroy the allocation and the later stream.waitForTrailers / stream.state reads become a use-after-free. Mirror the flushQueue() fix here by re-looking up stream_id after the callback and bailing out if it is already gone.

Suggested direction
         defer {
             if (!enqueued) {
                 this.dispatchWriteCallback(callback);
-                var closed = false;
-                if (close) {
-                    if (stream.waitForTrailers) {
-                        this.dispatch(.onWantTrailers, stream.getIdentifier());
+                var closed = false;
+                if (close) {
+                    if (this.streams.get(stream_id)) |stream_after_callback| {
+                        if (stream_after_callback.waitForTrailers) {
+                            this.dispatch(.onWantTrailers, stream_after_callback.getIdentifier());
+                        } else {
+                            const identifier = stream_after_callback.getIdentifier();
+                            identifier.ensureStillAlive();
+                            if (stream_after_callback.state == .HALF_CLOSED_REMOTE) {
+                                stream_after_callback.state = .CLOSED;
+                                stream_after_callback.freeResources(this, false);
+                                closed = true;
+                            } else {
+                                stream_after_callback.state = .HALF_CLOSED_LOCAL;
+                            }
+                            this.dispatchWithExtra(.onStreamEnd, identifier, jsc.JSValue.jsNumber(`@intFromEnum`(stream_after_callback.state)));
+                        }
-                    } else {
-                        const identifier = stream.getIdentifier();
-                        identifier.ensureStillAlive();
-                        if (stream.state == .HALF_CLOSED_REMOTE) {
-                            stream.state = .CLOSED;
-                            stream.freeResources(this, false);
-                            closed = true;
-                        } else {
-                            stream.state = .HALF_CLOSED_LOCAL;
-                        }
-                        this.dispatchWithExtra(.onStreamEnd, identifier, jsc.JSValue.jsNumber(`@intFromEnum`(stream.state)));
                     }
                 }
                 if (closed) this.removeStreamByID(stream_id);
             }
             this.deref();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 3579 - 3599, In
sendData()’s defer block the code continues to use the local `stream` after
calling `this.dispatchWriteCallback(callback)`, which can cause use-after-free
if the callback synchronously removed the stream; update the block to re-lookup
the stream by `stream_id` (e.g. call the same lookup used elsewhere, like in
`flushQueue()`), and if the lookup returns null/absent bail out early before
touching `stream.waitForTrailers`, `stream.state`, `stream.getIdentifier()`, or
calling `this.removeStreamByID(stream_id)`; keep existing dispatch logic but
operate on the freshly resolved stream reference only.
♻️ Duplicate comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

2101-2105: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Consume invalid connection-stream frames before returning.

These streamIdentifier == 0 branches still sendGoAway(...); return data.len; without advancing remainingLength/currentFrame. A bad DATA, HEADERS, CONTINUATION, PRIORITY, or RST_STREAM frame on stream 0 can leave the parser pinned on that frame until the peer closes the socket. Route these cases through discardFramePayload(frame, data) after sendGoAway(...) as well.

Suggested direction
             if (frame.streamIdentifier == 0) {
                 this.sendGoAway(frame.streamIdentifier, ErrorCode.PROTOCOL_ERROR, "Data frame on connection stream", this.lastStreamID, true);
-                return data.len;
+                return this.discardFramePayload(frame, data);
             }

Apply the same change to the analogous streamIdentifier == 0 branches in the RST_STREAM, PRIORITY, CONTINUATION, and HEADERS handlers.

Also applies to: 2341-2348, 2413-2419, 2456-2465, 2509-2516

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 2101 - 2105, The
handlers that detect invalid connection-stream frames call sendGoAway(...) and
then return data.len without consuming the frame payload, which can pin the
parser; change each branch that checks frame.streamIdentifier == 0 (the DATA,
RST_STREAM, PRIORITY, CONTINUATION, and HEADERS handlers) to call
sendGoAway(frame.streamIdentifier, ErrorCode.PROTOCOL_ERROR, "...",
this.lastStreamID, true) and then return this.discardFramePayload(frame, data)
instead of returning data.len so the payload is consumed and
remainingLength/currentFrame advance.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 3579-3599: In sendData()’s defer block the code continues to use
the local `stream` after calling `this.dispatchWriteCallback(callback)`, which
can cause use-after-free if the callback synchronously removed the stream;
update the block to re-lookup the stream by `stream_id` (e.g. call the same
lookup used elsewhere, like in `flushQueue()`), and if the lookup returns
null/absent bail out early before touching `stream.waitForTrailers`,
`stream.state`, `stream.getIdentifier()`, or calling
`this.removeStreamByID(stream_id)`; keep existing dispatch logic but operate on
the freshly resolved stream reference only.

---

Duplicate comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 2101-2105: The handlers that detect invalid connection-stream
frames call sendGoAway(...) and then return data.len without consuming the frame
payload, which can pin the parser; change each branch that checks
frame.streamIdentifier == 0 (the DATA, RST_STREAM, PRIORITY, CONTINUATION, and
HEADERS handlers) to call sendGoAway(frame.streamIdentifier,
ErrorCode.PROTOCOL_ERROR, "...", this.lastStreamID, true) and then return
this.discardFramePayload(frame, data) instead of returning data.len so the
payload is consumed and remainingLength/currentFrame advance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e632d86-6c06-45bd-b849-fabc559311a5

📥 Commits

Reviewing files that changed from the base of the PR and between 548c443 and 3d11150.

📒 Files selected for processing (2)
  • src/runtime/api/bun/h2_frame_parser.zig
  • test/regression/issue/30415.test.ts

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/runtime/api/bun/h2_frame_parser.zig:2037-2048 — These early returns for stale streams (stream_ == null && id != 0) leave the connection in a broken state: they return data.len without clearing currentFrame/remainingLength, so the next socket read re-dispatches the same stale frame forever and the parser permanently swallows all incoming bytes. They also skip the per-RFC-7540 §5.1 minimal processing — adjustWindowSize() for DATA (connection flow-control window leaks) and HPACK decode for HEADERS/CONTINUATION (dynamic table desyncs, corrupting all subsequent header blocks). The same pattern appears in handleRSTStreamFrame (~2280), handlePriorityFrame (~2350), handleContinuationFrame (~2391), and handleHeadersFrame (~2440); the stale-id branch needs to consume exactly the frame's payload via handleIncommingPayload, account for connection flow control on DATA, and feed header blocks through the HPACK decoder before discarding.

    Extended reasoning...

    What the bug is

    The new "stale stream" path in handleDataFrame / handleHeadersFrame / handleContinuationFrame / handleRSTStreamFrame / handlePriorityFrame returns data.len immediately when stream_ == null and frame.streamIdentifier != 0. Before this PR, that orelse branch was only reachable for streamIdentifier == 0 (because handleReceivedStreamID always created a fresh Stream for any nonzero id not in the map), and it always sent GOAWAY, terminating the connection — so leaving parser/decoder state dirty didn't matter. After this PR, handleReceivedStreamID returns null for any id <= lastStreamID that has been removed from the map, and the connection is kept alive. The early return now leaves three pieces of connection-level state corrupted.

    Code path that triggers it

    In readBytes (h2_frame_parser.zig:2787–2789), a new frame header is parsed and before dispatching to the handler we set:

    this.currentFrame = header;
    this.remainingLength = header.length;
    const stream = this.handleReceivedStreamID(header.streamIdentifier);

    The handler is then called with data = bytes[FrameHeader.byteSize..] — the entire remaining socket buffer, which may contain subsequent frames. The handler is responsible for consuming exactly its payload and clearing currentFrame (normally via handleIncommingPayload at line 1881, or the explicit this.currentFrame = null at line 2134 in handleDataFrame's normal path).

    Why the existing code doesn't prevent it

    The stale-stream branch never reaches handleIncommingPayload, so remainingLength is never decremented and currentFrame is never cleared. The pre-existing return data.len paths all sent GOAWAY first, which tore down the connection; this PR introduces the first return data.len path that leaves the connection alive.

    Step-by-step proof

    1. Client opens stream 5, then aborts it (req.close() / AbortSignalrstStreamendStreamremoveStreamByID(5)). Stream 5 is no longer in this.streams.
    2. Server's in-flight DATA frame for stream 5 (length = 100) arrives in a TCP segment that also contains a HEADERS frame for live stream 7: bytes = [DATA hdr(9) | 100B payload | HEADERS hdr(9) | ...].
    3. readBytes parses the DATA header, sets currentFrame = {DATA, len=100, id=5}, remainingLength = 100, calls handleReceivedStreamID(5)null (5 ≤ lastStreamID, not in map).
    4. handleDataFrame(header, bytes[9..], null) hits the orelse branch. frame.streamIdentifier == 5 != 0, so no GOAWAY. Returns data.len = bytes.len - 9.
    5. The outer loop (onNativeRead, lines 4722–4725) does bytes = bytes[bytes.len..] and exits. The HEADERS frame for stream 7 was silently discarded.
    6. currentFrame is still {DATA, len=100, id=5}, remainingLength is still 100.
    7. Next socket read delivers N new bytes. readBytes at line 2704 sees currentFrame set, calls handleReceivedStreamID(5)null again, handleDataFrame(header, bytes, null) → returns bytes.len again. Repeat forever. Every byte received on the session is consumed as "payload" for a frame that never completes. The session is permanently wedged.

    Additional connection-state corruption (RFC 7540 §5.1)

    Even if the byte-accounting were fixed to consume exactly min(remainingLength, data.len) and clear currentFrame, simply discarding the payload is still non-compliant. RFC 7540 §5.1 (closed state) requires endpoints to minimally process frames received on closed streams:

    • DATA → connection flow control. adjustWindowSize (line 1272) increments this.usedWindowSize; incrementWindowSizeIfNeeded (line 1291, deferred in onNativeRead) sends a connection-level WINDOW_UPDATE once usedWindowSize >= windowSize/2. Skipping it means the peer decremented its connection send window by frame.length and we never credit it back. On a long-lived pooled session (exactly the AWS-SDK use case this PR targets), each cancelled request with in-flight DATA permanently shrinks the peer's view of our connection window until all streams stall.
    • HEADERS / CONTINUATION → HPACK state. RFC 7540 §4.3: "A receiving endpoint reassembles the header block and performs decompression even if the frames are to be discarded." lshpack maintains a connection-wide dynamic table; header blocks contain dynamic-table size updates and indexed insertions. Skipping decode() desynchronizes our table from the peer's encoder, so every subsequent HEADERS on any stream decodes wrong indices or fails with COMPRESSION_ERROR. One aborted request with in-flight response/trailing HEADERS poisons the entire session.

    Impact

    A realistic and common trigger — request cancellation while server data is in flight — permanently hangs the long-lived pooled HTTP/2 session that this PR is specifically trying to make viable. This is a regression: pre-PR, the same orelse branch was unreachable for nonzero ids.

    How to fix

    In each handler's stale-id (id != 0) branch, replace return data.len with logic that:

    1. Consumes exactly min(this.remainingLength, data.len) bytes (e.g. via handleIncommingPayload with a no-op stream, or inline), decrementing remainingLength and clearing currentFrame when it reaches 0.
    2. For DATA: call this.adjustWindowSize(null, @truncate(end)) so connection-level WINDOW_UPDATE accounting stays correct.
    3. For HEADERS/CONTINUATION: feed the assembled header-block fragment through this.decode() to keep the HPACK dynamic table in sync, then discard the result. (Optionally also send RST_STREAM(STREAM_CLOSED) per §5.1, but that does not excuse skipping HPACK decode.)
    4. For RST_STREAM/PRIORITY: just consume the fixed-length payload and clear currentFrame.

    Applies to lines ~2037–2048 (DATA), ~2280–2288 (RST_STREAM), ~2350–2356 (PRIORITY), ~2391–2400 (CONTINUATION), and ~2440–2449 (HEADERS).

  • 🟡 src/runtime/api/bun/h2_frame_parser.zig:3458-3463 — Minor consistency gap: setStreamPriority (h2_frame_parser.zig:3360-3362) still throws Invalid stream id for a removed entry, while the seven sibling lookups updated in this PR now degrade gracefully. Pre-PR, a closed stream lingered in the map and setStreamPriority returned .false via the !canSendData() && !canReceiveData() check at line 3364 — so stream.priority({...}) called in the window between native removeStreamByID and JS _destroy now throws where it previously no-opped. Consider orelse return .false for symmetry; not a blocker.

    Extended reasoning...

    What this is

    setStreamPriority at src/runtime/api/bun/h2_frame_parser.zig:3360-3362 is the one JS-reachable streams.get(id) lookup the PR did not convert to graceful degradation. It still does:

    var stream = this.streams.get(stream_id) orelse {
        return globalObject.throw("Invalid stream id", .{});
    };
    if (!stream.canSendData() and !stream.canReceiveData()) {
        return .false;
    }

    Before this PR, a CLOSED stream stayed in the map until session teardown, so the lookup succeeded and the very next check (!canSendData() && !canReceiveData()) returned .false silently. After this PR, removeStreamByID deletes the entry immediately after the terminal dispatch, so the same call now throws.

    Concrete trigger path

    1. Server sends END_STREAM → handleDataFrame/handleHeadersFrame sets state = .CLOSED, dispatches onStreamEnd, then calls removeStreamByID(id) synchronously.
    2. onStreamEnd (http2.ts) eventually calls stream.destroy(), but _destroy runs asynchronously and only nulls this[bunHTTP2Session] at http2.ts:2191 later.
    3. In the interval (e.g. inside the user's 'end' listener, or any microtask before _destroy runs), stream.destroyed === false and this[bunHTTP2Session] is still set, so stream.priority({weight: 1}) (http2.ts:2090-2097) passes assertSession(session) and calls native.setStreamPriority(id, options).
    4. Native lookup misses → throw "Invalid stream id".

    Pre-PR, step 4 found the lingering CLOSED entry and returned .false. This is a small behavioral regression: silent no-op → exception.

    Why the existing JS guards don't catch it

    priority() only guards on assertSession(session). It has no if (this.destroyed) / if (this.closed) check, and bunHTTP2Session is cleared in _destroy, which hasn't run yet in this window. Compare with rstStream (called from setImmediate(rstNextTick)), writeStream, noTrailers, sendTrailers, getStreamState, getEndAfterHeaders, isStreamAborted — all updated in this PR to return a closed-default for the identical race.

    Scope correction vs. the original report

    The original report also flagged getStreamContext and setStreamContext. Those are not affected:

    • getStreamContext has zero callers in src/js/ — unreachable from user code.
    • setStreamContext is only called from the streamStart handler (http2.ts:2884 server / :3389 client) immediately after streams.put() in handleReceivedStreamID, so the entry is guaranteed present.

    Only setStreamPriority is reachable in the post-reclaim window.

    On the refutation ("throwing is more Node-compatible")

    Node's Http2Stream.prototype.priority throws ERR_HTTP2_INVALID_STREAM only when this.destroyed === true. In the window described above, the Bun JS Duplex is not yet destroyed (_destroy hasn't run), so Node would not throw here — it would send the PRIORITY frame (or no-op). The refutation is right that priority() is a user-action method rather than part of the automatic destroy machinery the PR targets, and that calling it on a just-ended stream is unusual; that's why this is a nit, not a blocker. But it is a behavior change introduced by this PR, and the one-line fix keeps the lookup table consistent.

    Suggested fix

    var stream = this.streams.get(stream_id) orelse return .false;

    This restores the pre-PR semantics (silent no-op on a closed stream) and matches the pattern applied to the other seven entry points in this diff.

  • 🔴 src/runtime/api/bun/h2_frame_parser.zig:3512-3516 — This defer block has the same re-entrancy hazard you fixed in flushQueue: dispatchWriteCallback(callback) re-enters JS, and if the write callback synchronously aborts the stream's attached AbortSignal (SignalRef.abortListenerabortStreamremoveStreamByIDbun.destroy(stream)), the captured stream pointer is freed before lines 3515-3527 read stream.waitForTrailers / stream.getIdentifier() / stream.state. Before this PR the stale pointer was still allocated (only state = .CLOSED); now it's a real UAF. Apply the same fix as flushQueue — re-resolve via this.streams.get(stream_id) after the dispatch and bail if it's gone.

    Extended reasoning...

    What the bug is

    sendData's defer block at src/runtime/api/bun/h2_frame_parser.zig:3510-3531 calls this.dispatchWriteCallback(callback) (line 3512), which synchronously re-enters user JS, and then continues to dereference the captured stream: *Stream pointer at lines 3515 (stream.waitForTrailers), 3516/3518 (stream.getIdentifier()), and 3520/3521/3525/3527 (stream.state). If the user's write callback causes that stream's allocation to be bun.destroy'd before returning, every one of those reads is a use-after-free.

    The PR description explicitly identifies this exact hazard and patches it in flushQueue (lines 942-970: "Dispatching the write callback re-enters JS. User code may synchronously destroy or reset this stream … in which case this is freed … Look up by id after dispatch rather than trusting this") by re-resolving via client.streams.get(stream_id_local) after the dispatch. The same re-lookup was not applied to sendData's defer, which has the identical structure.

    Concrete trigger path (step-by-step)

    1. User creates a request with an abort signal: const ac = new AbortController(); const req = session.request(headers, { signal: ac.signal }). request() reads options.signal and calls stream.attachSignal(signal), wiring SignalRef.abortListener (lines 807-814) to fire on abort.
    2. User calls req.end(data, () => ac.abort()) — i.e. the final write's callback synchronously aborts. writeStreamsendData(stream, payload, close=true, callback).
    3. With no backpressure and outboundQueueSize == 0, the data frame is written directly (enqueued stays false). The function body completes and the defer runs.
    4. Defer line 3512: this.dispatchWriteCallback(callback)handlers.callWriteCallback → synchronous JS call into the user's callback.
    5. Inside the callback, ac.abort() fires. AbortSignal listeners run synchronously, so SignalRef.abortListener executes immediately on the native side: it looks up the stream (still present, state ≠ CLOSED) and calls parser.abortStream(stream, reason).
    6. abortStream (lines 1358-1368, as modified by this PR) sets state = .CLOSED, dispatches onAborted, writes the RST frame, and now ends with this.removeStreamByID(stream_id)fetchRemove + bun.destroy(stream).
    7. Control unwinds back to sendData's defer at line 3513. close is true, so line 3515 reads stream.waitForTrailersfreed memory. Depending on heap reuse this either reads garbage (taking the wrong branch and dispatching onWantTrailers/onStreamEnd with a dangling identifier), corrupts state by writing stream.state = … into a reused allocation, or crashes outright under ASAN/debug allocators.

    A second synchronous path also exists: calling session.destroy() from the write callback reaches emitErrorToAllStreams → marks every stream .CLOSEDremoveAllClosedStreams()bun.destroy on this stream, with the same result. (Note: the JS-side stream.close()/destroy() paths schedule rstStream via setImmediate and are async, so they do not trigger this — but the native AbortSignal listener and session.destroy() are synchronous.)

    Why existing code doesn't prevent it

    The this.ref() / this.deref() pair around the defer keeps the parser alive across re-entrancy, but there is no equivalent protection for the individual *Stream. Before this PR, abortStream only set state = .CLOSED and called freeResources (idempotent) — the Stream allocation itself survived until session deinit, so the stale pointer in sendData's defer still pointed at valid (if stale) memory and the reads were benign. This PR adds removeStreamByIDbun.destroy(stream) to abortStream, turning what was previously a logic glitch into an observable heap UAF.

    Impact

    Heap use-after-free reachable from user JS on the req.end(cb) path whenever a write callback aborts the request's signal (or destroys the session). This is plausible in real cancellation/timeout logic that fires during the final write. In release builds the immediate symptom will likely be garbage reads or silent state corruption (writing .HALF_CLOSED_LOCAL into a freed/reused slot at line 3525); under ASAN it's a hard crash.

    Fix

    Mirror the flushQueue fix: capture stream_id before the dispatch (already done at line 3506), and after dispatchWriteCallback returns, re-resolve the stream via this.streams.get(stream_id) instead of using the captured pointer. If the lookup returns null, the stream was destroyed during the callback — skip the rest of the defer (it's already closed and removed, so there's nothing to dispatch and removeStreamByID is a no-op). Something like:

    defer {
        if (!enqueued) {
            this.dispatchWriteCallback(callback);
            if (this.streams.get(stream_id)) |s| {
                var closed = false;
                if (close) {
                    if (s.waitForTrailers) {
                        this.dispatch(.onWantTrailers, s.getIdentifier());
                    } else {
                        const identifier = s.getIdentifier();
                        identifier.ensureStillAlive();
                        if (s.state == .HALF_CLOSED_REMOTE) {
                            s.state = .CLOSED;
                            s.freeResources(this, false);
                            closed = true;
                        } else if (s.state != .CLOSED) {
                            s.state = .HALF_CLOSED_LOCAL;
                        }
                        this.dispatchWithExtra(.onStreamEnd, identifier, jsc.JSValue.jsNumber(@intFromEnum(s.state)));
                    }
                }
                if (closed) this.removeStreamByID(stream_id);
            }
        }
        this.deref();
    }

- sendData's defer dispatches the write callback before reading
  stream.state / stream.waitForTrailers / stream.getIdentifier().
  dispatchWriteCallback re-enters JS; if user code synchronously
  aborts the attached AbortSignal (SignalRef.abortListener runs
  inline) or destroys the session, abortStream/emitErrorToAllStreams
  → removeStreamByID frees the *Stream. Mirror the flushQueue fix:
  re-resolve via streams.get(stream_id) after dispatch and skip the
  close path if the stream is gone (abortStream already emitted
  onAborted). Prevents a latent UAF.

- setStreamPriority was the one JS-reachable streams.get() lookup
  that still threw 'Invalid stream id' for a removed entry. Pre-PR
  the CLOSED stream lingered and this method no-opped via the
  canSendData/canReceiveData fallthrough — return .false directly
  now so stream.priority({...}) in the window between native
  removeStreamByID and JS _destroy nulling bunHTTP2Session keeps
  matching the pre-PR silent no-op behavior instead of throwing.

Test: added an abort-from-finish-listener regression alongside the
streamCount check. Both pass; all 327 http2 tests still pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

2094-2105: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drain the invalid connection-level DATA frame before returning.

The stale-stream branch now advances parser state via discardFramePayload(), but the streamIdentifier == 0 protocol-error arm still returns right after sendGoAway(). That leaves currentFrame / remainingLength pinned to this frame, so if the session stays alive long enough to read again the parser re-enters the same header forever. Route this path through the same discard helper (or otherwise clear the frame state) before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 2094 - 2105, The
protocol-error branch for connection-level DATA frames returns immediately after
calling sendGoAway, leaving currentFrame/remainingLength unchanged and causing
the parser to re-read the same header; change the branch handling
frame.streamIdentifier == 0 to drain the frame payload the same way as the
stale-stream path by invoking discardFramePayload(frame, data) (or otherwise
advance currentFrame/remainingLength) before returning, keeping
sendGoAway(frame.streamIdentifier, ErrorCode.PROTOCOL_ERROR, ...) intact so the
parser state is always advanced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 1235-1250: The fix: add a parser-level reentrancy guard and
pending-removals queue so removeStreamByID does not call
this.streams.fetchRemove during an active StreamResumableIterator walk;
introduce an integer field (e.g. iterating_streams_depth) that is incremented at
the start and decremented at the end of any StreamResumableIterator walk (update
the entry/exit points used by flushStreamQueue, emitAbortToAllStreams,
emitErrorToAllStreams), and add a pending_removals collection (e.g. a small
set/array) to enqueue stream_id when iterating_streams_depth>0; change
removeStreamByID to enqueue the id when depth>0 (avoiding duplicates) and return
immediately, otherwise perform the current fetchRemove/freeResources/destroy
path; after decrementing iterating_streams_depth to zero, run a single sweep
that drains pending_removals and calls fetchRemove for each id (preserving
idempotency and using the same freeResources/destroy logic).

---

Duplicate comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 2094-2105: The protocol-error branch for connection-level DATA
frames returns immediately after calling sendGoAway, leaving
currentFrame/remainingLength unchanged and causing the parser to re-read the
same header; change the branch handling frame.streamIdentifier == 0 to drain the
frame payload the same way as the stale-stream path by invoking
discardFramePayload(frame, data) (or otherwise advance
currentFrame/remainingLength) before returning, keeping
sendGoAway(frame.streamIdentifier, ErrorCode.PROTOCOL_ERROR, ...) intact so the
parser state is always advanced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b55f8da3-3d94-4e30-a34c-02b13eae18d5

📥 Commits

Reviewing files that changed from the base of the PR and between 3d11150 and 6c2fd1e.

📒 Files selected for processing (2)
  • src/runtime/api/bun/h2_frame_parser.zig
  • test/regression/issue/30415.test.ts

Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread src/runtime/api/bun/h2_frame_parser.zig
…indow

Before this PR, a late stream-scope WINDOW_UPDATE was safe: closed streams
stayed in the map, so handleReceivedStreamID always returned a *Stream for
any nonzero id, and the else-branch only ever fired for streamIdentifier ==
0 (true connection-level updates). With removeStreamByID reclaiming closed
entries, handleReceivedStreamID now returns null for a stale nonzero id
too — and the unchanged else-branch would credit that WINDOW_UPDATE to
this.remoteWindowSize, the connection-level send budget.

Per RFC 7540 §5.1 the peer can legitimately send WINDOW_UPDATE on a stream
it hasn't yet seen us close, so this is routine on the long-lived pooled
sessions this PR is specifically trying to fix. Each stale update leaks
phantom connection credit; Bun then oversends and trips peer
GOAWAY(FLOW_CONTROL_ERROR), killing every concurrent request on the
session.

Gate the connection-level credit on `frame.streamIdentifier == 0` the
same way the other stream-scoped handlers (DATA/HEADERS/CONTINUATION/
PRIORITY/RST) already do; a stale nonzero id silently drops its increment.
handleIncommingPayload already advanced remainingLength/currentFrame above
the branch, so no extra drain is needed here.

Also refresh a stale comment in handleContinuationFrame — its
"HPACK state is not advanced" clause predates discardFramePayload, which
explicitly walks the decoder over header-block fragments to keep the
shared dynamic table in sync.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

2713-2736: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep stale nonzero ids distinguishable from the connection stream.

After this change, handleReceivedStreamID() returns null both for streamIdentifier == 0 and for removed nonzero stream ids. handlePingFrame() and handleGoAwayFrame() still use stream_ == null as the signal for “connection-scoped frame”, so a late PING/GOAWAY on a removed stream now bypasses the RFC-mandated PROTOCOL_ERROR path and gets processed as if it were on stream 0. Either preserve that distinction here, or make those handlers check frame.streamIdentifier != 0 directly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 2713 - 2736,
handleReceivedStreamID currently returns null for both connection-scoped
(streamIdentifier == 0) and stale nonzero stream ids, which makes callers
(handlePingFrame, handleGoAwayFrame) treat late frames on removed streams as
connection-level; fix by changing those handlers to check the frame's numeric id
directly (use frame.streamIdentifier == 0 to detect connection-scoped frames)
instead of relying on stream_ == null, so stale nonzero ids returned as null by
handleReceivedStreamID remain distinguishable from the connection stream; update
handlePingFrame and handleGoAwayFrame to use frame.streamIdentifier (and keep
handleReceivedStreamID behavior unchanged) or alternatively return a distinct
error/sentinel from handleReceivedStreamID (e.g., error.StaleStream) and handle
that in those handlers—prefer the first approach for minimal change.
♻️ Duplicate comments (1)
src/runtime/api/bun/h2_frame_parser.zig (1)

2108-2110: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Drain these protocol-error frames before returning.

These streamIdentifier == 0 branches still send GOAWAY and return without consuming frame.length. That leaves currentFrame/remainingLength stale, so the parser can get stuck re-processing the same invalid frame on the next read instead of advancing past it. Route them through discardFramePayload() after sending GOAWAY, the same way the stale-stream branches do.

Also applies to: 2348-2350, 2420-2422, 2463-2465, 2516-2518

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/api/bun/h2_frame_parser.zig` around lines 2108 - 2110, The
protocol-error branches that detect frame.streamIdentifier == 0 (e.g., the block
calling this.sendGoAway(..., "Data frame on connection stream", ...)) must drain
the invalid frame payload before returning so the parser's
currentFrame/remainingLength advance; after calling this.sendGoAway(...) call
discardFramePayload(frame.length) (same helper used by stale-stream branches)
and then return the consumed length, mirroring the logic used in the other
invalid-stream branches; apply the same change to the other identical branches
at the locations handling Headers/Priority/Push/Ping frames (the blocks noted in
the review).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 2713-2736: handleReceivedStreamID currently returns null for both
connection-scoped (streamIdentifier == 0) and stale nonzero stream ids, which
makes callers (handlePingFrame, handleGoAwayFrame) treat late frames on removed
streams as connection-level; fix by changing those handlers to check the frame's
numeric id directly (use frame.streamIdentifier == 0 to detect connection-scoped
frames) instead of relying on stream_ == null, so stale nonzero ids returned as
null by handleReceivedStreamID remain distinguishable from the connection
stream; update handlePingFrame and handleGoAwayFrame to use
frame.streamIdentifier (and keep handleReceivedStreamID behavior unchanged) or
alternatively return a distinct error/sentinel from handleReceivedStreamID
(e.g., error.StaleStream) and handle that in those handlers—prefer the first
approach for minimal change.

---

Duplicate comments:
In `@src/runtime/api/bun/h2_frame_parser.zig`:
- Around line 2108-2110: The protocol-error branches that detect
frame.streamIdentifier == 0 (e.g., the block calling this.sendGoAway(..., "Data
frame on connection stream", ...)) must drain the invalid frame payload before
returning so the parser's currentFrame/remainingLength advance; after calling
this.sendGoAway(...) call discardFramePayload(frame.length) (same helper used by
stale-stream branches) and then return the consumed length, mirroring the logic
used in the other invalid-stream branches; apply the same change to the other
identical branches at the locations handling Headers/Priority/Push/Ping frames
(the blocks noted in the review).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3215f439-762d-4ac1-b0d2-f32c90490267

📥 Commits

Reviewing files that changed from the base of the PR and between 6c2fd1e and 90fbed2.

📒 Files selected for processing (1)
  • src/runtime/api/bun/h2_frame_parser.zig

robobun added 2 commits May 9, 2026 04:53
…dowSize

Two ASAN use-after-poison paths tripped by CI, both following the same
pattern: we cache a *Stream pointer, dispatch to JS (or call sendGoAway,
which dispatches onError/onEnd), the JS handler destroys the session
and reclaims every stream via emitErrorToAllStreams →
removeAllClosedStreams, and then we touch the freed pointer.

1. handleDataFrame wrote stream.padding = null *before* re-resolving the
   stream id — even though the surrounding block already re-resolved
   when emitted was true. Move the re-resolve to cover the padding
   write so the post-dispatch access is safe. Caught by
   test-http2-compat-aborted.js / test-http2-compat-errors.js /
   test-http2-compat-serverresponse-close.js /
   test-http2-multistream-destroy-on-read-tls.js /
   test-http2-server-errors.js.

2. adjustWindowSize subtracted payloadSize back from s.usedWindowSize
   *after* calling sendGoAway on a flow-control violation. The session
   teardown frees s before the subtract runs. Roll back first, then
   dispatch; bail from the connection-level branch so we never read
   stream. handleDataFrame's caller also re-resolves the stream pointer
   via streams.get(stream_id) after adjustWindowSize and returns early
   if it's gone. Caught by test-http2-misbehaving-flow-control.js /
   test-http2-misbehaving-flow-control-paused.js.

All 8 previously-failing x64-asan tests now pass locally; the 328 Bun
http2 tests still pass.
The streams-rehash test (PR #29765 regression for pointer stability
across hashmap rehash under re-entrant JS) sends 128 inner GET requests
from inside the outer POST's options getter. Because the inner requests
execute synchronously during options processing, their HEADERS frames
end up ahead of the outer POST's HEADERS on the wire — so the server
sees ids 3, 5, 7, ..., N, then id 1.

Pre-PR Bun server tolerated that: handleReceivedStreamID created a
fresh stream for any id not in the map. After this PR's stale-detection
change (id <= lastStreamID -> null), the server rejected the late id 1
as stale and the outer POST never got a response — the test hung.

That behavior matches RFC 7540 §5.1.1 (new stream ids MUST increase),
but #30415's leak is client-side only (AWS SDK pooled http2.connect
sessions). Restrict the stale-id gate to the client side so server-side
behavior stays pre-PR. Server-side stream removal still runs on close
— a late frame for a removed server stream just reallocates one entry,
which the next close reclaims.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/runtime/api/bun/h2_frame_parser.zig:3410-3413 — nit: closed_state_number = 7 duplicates the explicit CLOSED = 7 value from Stream.state's anonymous enum (line ~758) as a magic number, while the live-stream branch right below correctly uses @intFromEnum(stream.state). Consider @intFromEnum(@as(@FieldType(Stream, "state"), .CLOSED)) (or hoisting the enum to a named StreamState type) so the two branches can't drift.

    Extended reasoning...

    What the issue is

    The synthesized closed-state branch in getStreamState hardcodes the numeric value of the CLOSED state:

    const closed_state_number: u8 = 7; // Stream.state enum value for CLOSED
    var closed_state = jsc.JSValue.createEmptyObject(globalObject, 6);
    ...
    closed_state.put(globalObject, jsc.ZigString.static("state"), jsc.JSValue.jsNumber(closed_state_number));

    The live-stream branch immediately below derives the same value from the type system via @intFromEnum(stream.state). So one path is type-safe and the other duplicates a constant in a comment-coupled literal.

    Why it ended up this way

    Stream.state is declared with an anonymous inline enum (state: enum(u8) { IDLE = 1, ..., CLOSED = 7 } = .IDLE), so there's no StreamState.CLOSED to reference directly. The author hardcoded 7 and documented the coupling with a comment. That works today — the enum has explicit values, and these match Node.js / nghttp2's NGHTTP2_STREAM_STATE_CLOSED = 7 constant that JS consumers compare against, so they're effectively ABI-stable.

    Why it's still worth flagging

    Even with explicit assignments, duplicating the value in two places creates a silent coupling that the compiler won't enforce. If anyone ever edits the explicit CLOSED = 7 assignment (unlikely, but the comment is the only thing tying them together), this branch would silently report the wrong state to JS while the live-stream branch would stay correct. Note that mere reordering of enum members wouldn't break it (the values are explicit, not positional) — only a deliberate edit to the assigned value would.

    Step-by-step proof

    1. Stream.state is declared at ~line 758 as state: enum(u8) { ..., CLOSED = 7 } = .IDLE.
    2. getStreamState's removed-stream branch (line 3410) sets closed_state_number = 7 and returns it as state.
    3. The live-stream branch (line ~3423) returns @intFromEnum(stream.state).
    4. Today both yield 7 for a closed stream. If someone edits the enum's explicit value, (3) updates automatically; (2) does not, and there's no compile-time link between them.

    Impact

    None today — purely a maintenance/style nit. The value is correct, the comment documents the coupling, and the underlying constant is effectively pinned by Node compatibility. Hence nit severity.

    Suggested fix

    Either:

    • @intFromEnum(@as(@FieldType(Stream, "state"), .CLOSED)) to derive it from the anonymous enum without naming it, or
    • Extract the enum to a named const StreamState = enum(u8) { ... }; and use @intFromEnum(StreamState.CLOSED) here (and state: StreamState = .IDLE on the field), which also makes the other @intFromEnum(stream.state) call sites a bit more self-documenting.

Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
Style nit. Use @intFromEnum(@FieldType(Stream, "state").CLOSED) so the
removed-stream branch of getStreamState can't drift if the enum's
explicit = 7 assignment is ever edited.
Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
robobun added 2 commits May 9, 2026 06:35
Claude's review caught a real bug:  was running on
both client and server, but 's stale-id guard
only applied to the client. A late frame (WINDOW_UPDATE, DATA, RST)
arriving on the server for a stream we'd already removed would fall
through, hit the new-stream path, and synthesize a phantom Stream +
dispatch onStreamStart again — a duplicate JS  for
an id the application had already finished handling. RFC 7540 §5.1
explicitly permits those late frames, so this is a routine regression
on long-lived server sessions, not an adversarial case.

Gating 's stale-detection on the server is
also tricky: Bun's own client emits child HEADERS ahead of the parent's
HEADERS when a request() options getter re-enters (see
node-http2-streams-rehash), so wire-order ids on the server can
legitimately drop below the high-water mark. A
gate was tried, but by the time the out-of-order outer POST arrives
the inner children have already been responded-to and removed, so the
gate would still drop the outer stream as stale.

Simplest pragmatic fix: disable  on the server. The
#30415 leak is strictly client-side (AWS SDK pooled
sessions); server-side unbounded map growth is the pre-PR status quo
and out of scope here. The matching guard in
stays client-only so a server-side id miss still creates a fresh
stream, matching pre-PR behavior.
Follow-up to the client-side stale-id gate. handlePingFrame and
handleGoAwayFrame used "stream_ != null" as shorthand for
"streamIdentifier != 0" to detect the RFC 7540 §6.7/§6.8 PROTOCOL_ERROR
case. That worked pre-PR because handleReceivedStreamID always returned
non-null for any nonzero id (it created a fresh Stream for unknown
ids). With the new client-side stale-id gate, handleReceivedStreamID
can now return null for nonzero ids that used to be in the map. Those
PING/GOAWAY frames would sneak past the protocol-error check and be
processed as connection-level. Check the wire field directly so the
check no longer relies on handleReceivedStreamID's return value.

Also refresh the stale comment on flushQueue's close path: the
post-iteration sweep via defer removeAllClosedStreams runs once when
flushStreamQueue returns, not per iteration of its outer while loop.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/runtime/api/bun/h2_frame_parser.zig:1415-1421endStream sets stream.state = .CLOSED (L1417) then calls stream.freeResources (L1421), whose cleanQueue invokes dispatchWriteCallback for each queued DATA frame — if that JS callback synchronously calls session.destroy(), emitErrorToAllStreamsremoveAllClosedStreams() (newly added at L4273) finds this already-CLOSED stream still in the map and bun.destroys it, after which freeResources reads this.signal (L1228) and endStream reads @intFromEnum(stream.state) (L1423) on freed memory. This is the same dispatch-then-dereference hazard already fixed in flushQueue/sendData; the simplest fix is to fetchRemove the entry from the map before calling freeResources (so a re-entrant sweep can't find it) and replace L1423's stream.state read with the constant .CLOSED. abortStream shares the freeResources-internal hazard at L1389.

    Extended reasoning...

    What the bug is

    endStream (h2_frame_parser.zig:1395-1430) and abortStream (L1364-1393) both call stream.freeResources(this, false) while the stream is still present in this.streams and already marked .CLOSED. freeResourcescleanQueue (L1204-1223) iterates the stream's pending dataFrameQueue and, for each queued frame, calls client.dispatchWriteCallback(callback_value) at L1218 — which synchronously runs the JS write callback via vm.eventLoop().runCallback. If that callback (or a synchronous listener it triggers) calls session.destroy(), the new removeAllClosedStreams() pre-loop sweep added to emitErrorToAllStreams at L4273 finds this stream (.CLOSED, still mapped) and calls removeStreamByIDbun.destroy(stream). Control then returns to the outer freeResources at L1228 (if (this.signal)) and to endStream at L1423 (@intFromEnum(stream.state)) — both reading a freed *Stream.

    The specific code path that triggers it

    ClientHttp2Session.destroy (src/js/node/http2.ts:3887) is the relevant re-entry point: it calls parser.emitErrorToAllStreams(code) synchronously. The native emitErrorToAllStreams (L4265-4289) now calls this.removeAllClosedStreams() at L4273 before the iterator loop. removeAllClosedStreams (L4296-4308) walks this.streams, collects every .CLOSED id, and calls removeStreamByID(id) for each. removeStreamByID (L1246-1269) is gated to !isServer at L1257, then fetchRemoves the entry, calls freeResources (idempotent — the queue was already swapped to {}), and bun.destroy(stream) at L1267.

    The trigger requires: (a) client-side (removeStreamByID is a no-op on the server), (b) the stream has at least one queued DATA frame when endStream/abortStream runs (i.e. it was backpressure- or flow-control-limited), and (c) the queued frame's write callback synchronously tears down the session. Narrow but legal — e.g. error-handling code that calls session.destroy() from a write callback once it sees the request was reset.

    Why existing code doesn't prevent it

    This is exactly the dispatch-then-dereference hazard the PR already identified and fixed in flushQueue (L944-954, "Look up by id after dispatch rather than trusting this") and sendData (L3634-3658). Both of those re-resolve via client.streams.get(stream_id) after dispatchWriteCallback. But the freeResourcescleanQueue path inside endStream/abortStream was not given the same treatment. Pre-PR this latent pattern was harmless because emitErrorToAllStreams never bun.destroy'd anything mid-session; the new removeAllClosedStreams() calls at L4273/L4288 are what make it observable, so this is a regression introduced by this PR.

    The PR author's own response to the earlier iterator-safety thread states "emitAbortToAllStreams / emitErrorToAllStreams dispatch to JS handlers that call stream.destroy()_destroysetImmediate(rstNextTick) — async". That's true for the iterator body's dispatches, but here the entry into emitErrorToAllStreams is from the opposite direction: native → JS write callback → session.destroy() → native emitErrorToAllStreamsremoveAllClosedStreams. No setImmediate is involved on this path.

    Step-by-step proof

    1. Client stream 7 has one DATA frame queued (flow-control window exhausted). Server sends RST_STREAM(7). handleRSTStreamFrame is unaffected here, but consider the path where JS calls stream.close() → native rstStreamendStream(stream, .NO_ERROR).
    2. endStream L1417: stream.state = .CLOSED. L1418: stream_id = 7. Stream 7 is still in this.streams.
    3. L1421: stream.freeResources(this, false) → L1227 cleanQueue: swaps dataFrameQueue to a local at L1207-1208, dequeues the one frame, L1218 client.dispatchWriteCallback(callback_value) → synchronously runs the user's write callback in JS.
    4. Inside that callback, user code calls session.destroy(). http2.ts:3887 → native emitErrorToAllStreams.
    5. L4273: this.removeAllClosedStreams(). The walk at L4301-4306 finds stream 7 (state == .CLOSED, still in map). L4307: removeStreamByID(7).
    6. removeStreamByID: L1257 !isServer ✓. L1258 fetchRemove(7) succeeds. L1266 inner stream.freeResources (queue already empty, may null signal). L1267 bun.destroy(stream) — heap freed.
    7. Control unwinds back to outer cleanQueue's while loop (operates on local queue/client, safe). Loop ends, returns to outer freeResources L1228: if (this.signal)this is the freed *Stream. First UAF read.
    8. freeResources returns. endStream L1423: @intFromEnum(stream.state) on the freed *Stream. Second UAF read.

    abortStream (L1389) shares step 3-7; its post-freeResources dispatch at L1390 uses captured old_state so the L1390 argument is safe, but the freeResources-internal L1228 read is the same UAF.

    Impact

    Heap-use-after-free on the client-side node:http2 stream-close path. ASAN builds will crash; release builds read (and in freeResources may write, via this.signal = null in the inner call before the outer reaches L1228) freed memory. This is a memory-safety regression introduced by this PR's removeAllClosedStreams() sweeps.

    How to fix

    Two complementary one-line changes:

    1. In endStream, replace L1423's @intFromEnum(stream.state) with the constant @intFromEnum(@as(@FieldType(Stream, "state"), .CLOSED)) (or capture it into a local before freeResources) — the value is always .CLOSED at that point anyway.
    2. In both endStream and abortStream, fetchRemove the entry from this.streams before calling freeResources, so a re-entrant removeAllClosedStreams cannot find it. The trailing removeStreamByID(stream_id) then becomes a defensive no-op (already handled by its fetchRemove-miss path), or can be replaced with the inline bun.destroy(stream) after dispatch.

    Alternatively, mirror the flushQueue/sendData pattern inside cleanQueue itself: capture stream_id before the dispatch loop and re-resolve via client.streams.get(stream_id) before the caller continues — but since cleanQueue's only post-dispatch caller-side read is freeResources L1228, removing the entry from the map first is simpler and covers every caller.

Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
robobun added 2 commits May 9, 2026 07:07
Claude flagged a real UAF path this PR introduced: endStream and
abortStream set state = .CLOSED, then call freeResources. Inside
freeResources, cleanQueue dispatches queued DATA write callbacks to
JS. If user code synchronously calls session.destroy(), that reaches
emitErrorToAllStreams → removeAllClosedStreams, which now finds this
already-.CLOSED stream in the map and bun.destroy()s it. When control
returns to freeResources it reads this.signal on freed memory; the
post-dispatch @intFromEnum(stream.state) in endStream is a second UAF.

Unlink the stream from the map *before* calling freeResources so the
re-entrant sweep cannot find it. Add a detachStreamFromMap helper that
does the fetchRemove half of removeStreamByID, then destroyDetachedStream
to bun.destroy once every post-dispatch access completes. endStream
also swaps the post-freeResources @intFromEnum(stream.state) for the
constant .CLOSED (which it just was) so the read doesn't depend on
`stream` outliving dispatch.
Two nits from review:
- removeAllClosedStreams had no matching server-side early-return, so
  flushStreamQueue on a long-lived server session kept walking the map
  and heap-allocating an O(N) id list per flush that the no-op
  removeStreamByID immediately discarded.
- max_removed_stream_id was added for a stale-id detection approach
  that never landed; since handleReceivedStreamID uses lastStreamID for
  the client-side gate, the field is dead weight. Delete it.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/runtime/api/bun/h2_frame_parser.zig:1393setStreamPriority (and sendTrailers) capture *Stream from the map and then call options.get(globalObject, "weight"/"parent"/"exclusive"/"silent") / item.toJSString(...) — user-controlled JS that can synchronously abort the attached AbortSignalSignalRef.abortListenerabortStreamremoveStreamByIDbun.destroy(stream) — and afterwards write stream.streamDependency/exclusive/weight (L3544-3546) and read stream.id on freed heap. The PR audited the explicit dispatch* re-entry points (sendData, flushQueue, handleDataFrame) but missed these implicit re-entries via property access on user objects; add the same stream = this.streams.get(stream_id) orelse return .false; re-resolve after the last options.get()/header-iteration call.

    Extended reasoning...

    What the bug is

    This PR changes the lifetime invariant of *Stream: previously the doc comment at L2560 said "stable for the lifetime of this H2FrameParser" (now rewritten at L2757-2758 to "stable until the stream closes"), and abortStream/endStream never freed the heap allocation. Now abortStream (L1393) and endStream (L1430) call removeStreamByIDbun.destroy(stream), and SignalRef.abortListener (L811-817) invokes abortStream synchronously when user code calls AbortController.abort() — the PR's own comment at L3645-3650 in sendData and the second regression test in 30415.test.ts both rely on this synchronous path.

    setStreamPriority (L3493-3568) and sendTrailers (L3838-4069) each capture a *Stream from the map, then perform user-controlled JS property access on argument objects, then continue dereferencing — and writing to — the captured pointer. If the user JS aborts the attached signal mid-call, the pointer is freed before those accesses run.

    Code path that triggers it (setStreamPriority)

    src/js/node/http2.ts:2096 passes the user's options object straight through with no copy/sanitization:

    session[bunHTTP2Native]?.setStreamPriority(this.#id, options);

    In h2_frame_parser.zig:

    • L3493: var stream = this.streams.get(stream_id) orelse return .false; — captures the live *Stream.
    • L3497: confirms the stream is open (canSendData/canReceiveData), so abortListener's state != .CLOSED check (L815) will not short-circuit.
    • L3509-3537: four calls to options.get(globalObject, "weight"/"parent"/"exclusive"/"silent"). JSValue.get is documented (src/jsc/JSValue.zig:1521) as "Equivalent to target[property]. Calls userland getters/proxies". Any of these can run a user-defined accessor.
    • Inside that accessor, user calls ac.abort() for the AbortSignal attached at request({signal}) time. signal.listen() (L1195) registered SignalRef.abortListener as a native listener that fires synchronously inside abort(). It checks state != .CLOSED (true) → parser.abortStream(stream, ...)removeStreamByID(stream_id)bun.destroy(stream).
    • Control returns to setStreamPriority. L3539 reads stream.id; L3544-3546 write stream.streamDependency, stream.exclusive, stream.weight; L3550-3561 read stream.exclusive/streamDependency/weight/id. All on freed heap memory.

    sendTrailers has the identical shape: L3838 captures *Stream, then header iteration at L3878-3992 calls item.toJSString(globalObject) (invokes toString()/Symbol.toPrimitive on object header values) and sensitive_arg.getTruthyPropertyValue(...) (Proxy trap). After iteration, L4011/4030/4048/4057-4068 read and write stream.* without re-resolving. (Note: http2.ts:2041 spreads {...headers}, which neutralizes outer-object getters, but header values pass through unchanged so toJSString on an object value still re-enters.)

    Why existing safeguards don't prevent it

    • endStream's if (stream.state == .CLOSED) return and abortListener's if (stream.state != .CLOSED) don't help — the stream is open at the re-entry point (L3497 just confirmed it).
    • The PR carefully added re-resolve-after-dispatch to sendData, flushQueue, and handleDataFrame/adjustWindowSize — but those audits targeted explicit dispatch*/dispatchWriteCallback call sites. options.get and toJSString are implicit JS re-entry via property access on user-supplied objects and were missed.
    • This is client-side only (removeStreamByID early-returns on isServer per commit 0341833), but stream.priority(opts) and stream.sendTrailers(headers) are reachable on client streams via the public Node.js http2 API.

    Step-by-step proof

    const ac = new AbortController();
    const req = client.request({ ':path': '/' }, { signal: ac.signal });
    req.on('response', () => {
      req.priority({
        get weight() { ac.abort(); return 16; },  // ← frees the native *Stream
      });
    });
    1. request({signal}) attaches ac.signal via SignalRef.setsignal.listen(...) (native synchronous listener).
    2. req.priority(opts) → native setStreamPriority(id, opts). L3493: stream = streams.get(id) (open).
    3. L3509: options.get(globalObject, "weight") → invokes the weight getter → ac.abort().
    4. Synchronously: SignalRef.abortListenerstate != .CLOSEDabortStream(stream, ...) → sets state = .CLOSED, dispatches onAborted, then removeStreamByID(id)fetchRemove + bun.destroy(stream).
    5. Getter returns 16. Back in setStreamPriority: L3519/3528/3532 run more options.get (harmless), then L3539 reads stream.id (UAF read), L3544 writes stream.streamDependency = parent_id (UAF write), L3545-3546 more UAF writes, L3550-3561 more UAF reads.

    ASAN crashes; release builds silently corrupt freed heap (writes!) — exploitable-class behavior reachable from standard node:http2 API surface, even if the trigger pattern is unusual.

    Impact

    Heap-use-after-free with writes to freed memory, introduced by this PR (pre-PR the captured pointer was valid for the session lifetime by construction). Requires a getter/toString on a user-supplied options/headers object that aborts the attached signal — unusual but valid, and exactly the kind of pattern fuzzers and exploit code target.

    How to fix

    Mirror the pattern already applied in sendData's defer: after the last user-controlled JS call, re-resolve by id and bail if gone.

    // after the last options.get(...) at L3537:
    stream = this.streams.get(stream_id) orelse return .false;
    if (parent_id == stream.id) { ... }
    stream.streamDependency = parent_id;
    ...

    For sendTrailers, re-resolve after the header-iteration loop and before L4011 (or read all needed stream.* fields into locals before iteration begins).

…ilers

Claude caught another UAF class this PR introduced. setStreamPriority
and sendTrailers hand a user-controlled options / headers object
directly to native code and then read/write stream.* afterwards. The
options.get(globalObject, "weight"/"parent"/...) / item.toJSString() /
sensitive_arg.getTruthyPropertyValue() calls each invoke userland JS
(property getters, Proxy traps, Symbol.toPrimitive). If that JS calls
AbortController.abort() on the stream signal, SignalRef.abortListener
runs inline -> abortStream -> removeStreamByID -> bun.destroy(stream).
The subsequent stream.* writes then land on freed heap.

Reproduced under ASAN: a weight getter that calls ac.abort() crashes
in setStreamPriority at stream.id read. Fix mirrors the
sendData/flushQueue/handleDataFrame pattern: re-resolve
this.streams.get(stream_id) after the last user JS call and bail if
the stream is gone. Added a third regression test that exercises the
priority({weight: getter() { ac.abort() }}) path.
Comment thread src/runtime/api/bun/h2_frame_parser.zig
…HTTP/1 timeout, intermittent ThreadLock panic)
Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
Comment thread src/runtime/api/bun/h2_frame_parser.zig
Comment thread test/regression/issue/30415.test.ts Outdated
robobun added 3 commits May 9, 2026 09:17
…eamFrame

Two follow-ups from review:

1. `sendTrailers`'s two in-loop `encodeHeaderIntoList` catch branches
   used the captured `*Stream` pointer without re-resolving. A trailer
   whose value object has a malicious Symbol.toPrimitive on an earlier
   iteration can abort the attached AbortSignal → abortListener →
   abortStream → bun.destroy(stream), and a later iteration's encode
   failure would then write `stream.state = .CLOSED` to freed heap.
   Re-resolve via `this.streams.get(stream_id) orelse return` at the
   top of each catch, mirroring the post-loop fix.

2. `handleRSTStreamFrame` still did `stream.freeResources` while the
   stream was in the map, then dispatched and called `removeStreamByID`.
   `freeResources` → `cleanQueue` dispatches queued DATA write
   callbacks; a user callback that calls `session.destroy()` reaches
   `emitErrorToAllStreams` → `removeAllClosedStreams` and frees this
   stream before the outer dispatch runs. Mirror the endStream/abortStream
   detach-then-destroy pattern: `detachStreamFromMap` first, then
   `freeResources`, dispatch, `destroyDetachedStream`.
… in request()

Two more UAFs from claude's review that follow the same re-entry patterns
we already fixed elsewhere:

1. `emitErrorToAllStreams` / `emitAbortToAllStreams` had the
   `freeResources`-while-still-in-map hazard that commit b6fe0bc
   fixed for abortStream/endStream. `cleanQueue` in `freeResources`
   synchronously dispatches queued DATA write callbacks; a user
   callback that calls `session.destroy()` reaches the ClientHttp2Session
   guard with `#parser` still set (nulled only after
   `emitErrorToAllStreams` returns), re-enters `emitErrorToAllStreams`,
   and its leading `removeAllClosedStreams()` finds this already-CLOSED
   stream in the map and frees it, leaving the outer
   `freeResources` reading `this.signal` on freed heap. Detach before
   freeResources and destroy after dispatch, mirroring endStream/
   abortStream.

2. `request()` had the options-getter UAF that commit 9bac2c2 fixed
   for setStreamPriority / sendTrailers. For payload methods
   (POST/PUT/etc.), http2.ts passes the user's raw options object to
   native, so any of the `options.get(paddingStrategy/waitForTrailers/
   exclusive/parent/weight/signal)` calls can invoke a getter that
   calls `session.destroy()` → emitErrorToAllStreams → bun.destroy
   (client-side, no isServer gate). Re-resolve `stream` from the
   streams map before each subsequent `stream.*` access and after the
   options block, bailing out with the stream_id if the stream was
   destroyed. The AbortSignal vector doesn't apply here (signal isn't
   attached until the last getter), but session.destroy() does.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/runtime/api/bun/h2_frame_parser.zig:4666-4670 — nit: bad_parent_id holds stream.id (the stream's own id), not the rejected parent value — the name is misleading inside a branch that just validated parent. It's also redundant: stream_id is already in scope and is what the sibling error branches you added in this same diff (L4576/4605/4686/4698/4729/4753) use for the identical removeStreamByID(stream_id); return jsc.JSValue.jsNumber(stream_id); pattern. Suggest dropping the local and using stream_id here for consistency. No runtime effect.

    Extended reasoning...

    What the issue is

    In request()'s invalid-parent error branch (h2_frame_parser.zig:4664-4671), this PR introduces:

    if (parent <= 0 or parent > MAX_STREAM_ID) {
        stream.state = .CLOSED;
        stream.rstCode = @intFromEnum(ErrorCode.INTERNAL_ERROR);
        const bad_parent_id = stream.id;
        this.dispatchWithExtra(.onStreamError, stream.getIdentifier(), jsc.JSValue.jsNumber(stream.rstCode));
        this.removeStreamByID(bad_parent_id);
        return jsc.JSValue.jsNumber(bad_parent_id);
    }

    Two minor problems with the local bad_parent_id:

    1. The name is misleading. Inside a branch guarded by if (parent <= 0 or parent > MAX_STREAM_ID), a reader naturally assumes bad_parent_id holds the rejected parent id (i.e., the value of parent). It's actually stream.id — the id of the stream being created, which is what gets removed and returned. A future reader auditing this branch would be momentarily confused.

    2. It's redundant. stream_id is declared at the top of request() (L4432, const stream_id: u32) and is in scope here. Every other error branch this PR adds to request() — at L4538, L4576, L4605, L4686, L4698, L4729, and L4753 — uses stream_id directly for the identical removeStreamByID(stream_id); return jsc.JSValue.jsNumber(stream_id); pattern. This branch is the only one that introduces a fresh local.

    Step-by-step proof

    1. request() declares const stream_id: u32 = ... at L4432, captured from the JS argument.
    2. handleReceivedStreamID(stream_id) returns the *Stream whose .id == stream_id (L4436+).
    3. At L4667, stream.id == stream_id — they hold the same value.
    4. The sibling error branch immediately below (L4682-4687, the invalid-weight check) uses the existing pattern: this.removeStreamByID(stream_id); return jsc.JSValue.jsNumber(stream_id); — no fresh local.
    5. So bad_parent_id is functionally equivalent to stream_id, just with a name that suggests it holds something else.

    Why this ended up here

    The pre-PR code returned jsc.JSValue.jsNumber(stream.id) at this spot (vs. stream_id in the siblings — a minor pre-existing inconsistency). The PR author needed to capture the id before dispatchWithExtra (defensively, since dispatch can re-enter JS) and presumably reached for a fresh local rather than noticing stream_id was already available. The chosen name reflects the branch context ("bad parent") rather than what the variable holds (the stream's own id).

    Impact

    None at runtime — stream_id and stream.id hold the same value at this point, so behavior is identical. This is purely a naming/consistency tidy: the local breaks the pattern this PR itself establishes across the six sibling error branches, and the name actively misleads. Worth a one-line cleanup if you're touching this function anyway; should not block the merge.

    Suggested fix

    if (parent <= 0 or parent > MAX_STREAM_ID) {
        stream.state = .CLOSED;
        stream.rstCode = @intFromEnum(ErrorCode.INTERNAL_ERROR);
        this.dispatchWithExtra(.onStreamError, stream.getIdentifier(), jsc.JSValue.jsNumber(stream.rstCode));
        this.removeStreamByID(stream_id);
        return jsc.JSValue.jsNumber(stream_id);
    }

`bad_parent_id` held `stream.id` (the stream we're creating), not the
rejected parent value the name suggests. `stream_id` is already in
scope and is what every other error branch in this function uses for
the same `removeStreamByID(stream_id); return jsc.JSValue.jsNumber(stream_id);`
pattern. Drop the misleading local.
Comment thread src/runtime/api/bun/h2_frame_parser.zig Outdated
The prior rationale ("removing in-place during a StreamResumableIterator
walk could disturb the iterator, so terminal paths batch the removal")
was contradicted by 702defa, which made emitAbort/emitErrorToAllStreams
fetchRemove directly inside the iterator loop. mid-iteration removal
is safe — std.HashMap.fetchRemove only tombstones — so the comment now
correctly points at flushStreamQueue's defer as the real reason for
the batching and frames the emitError/Abort calls as belt-and-
suspenders sweeps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native RSS grows linearly under sustained AWS SDK v3 Kinesis GetRecords; identical Node 22 workload is flat

1 participant