Fix GC use-after-free of BuildMessage wrappers when building AggregateErrors #31874

Merged
Jarred-Sumner merged 2 commits into main from farm/bf9d35b4/fix-process-fetch-log-gc-uaf
Jun 5, 2026

robobun (Collaborator) commented Jun 5, 2026

What

Fixes a GC use-after-free found by fuzzing (ASAN fingerprint heap-use-after-free, read of size 1 at offset 144 inside a 152-byte region: that is exactly BuildMessage.logged inside a freed Box<BuildMessage>, confirmed via DWARF layout).

Root cause

JSC roots JSValues through the conservative stack scan. Values that live only in a malloc'd Vec<JSValue> buffer are invisible to the GC.

process_fetch_log (module fetch failure path) created one BuildMessage/ResolveMessage wrapper per log message and pushed each into a heap Vec<JSValue> before handing the slice to create_aggregate_error. Every create call in that loop allocates JS cells, so a collection can run mid-loop; the wrappers created in earlier iterations have no reachable reference at that point, get swept, and their finalizers free the native BuildMessage. Anything that touches the stale cells afterwards (printing the unhandled rejection, reading .message off error.errors[i]) reads freed memory. The original Zig implementation used an on-stack var errors_stack: [256]JSValue specifically so the conservative scan would root these; the port replaced it with a heap Vec.

The fixing lines are the Vec<JSValue> to [JSValue; 256] changes in src/jsc/VirtualMachine.rs (process_fetch_log) and src/jsc/lib.rs (LogJsc::to_js, same pattern, also feeds create_aggregate_error). ast_jsc::log_to_js already used the on-stack array and documents why.
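The shape of the fix can be sketched outside of Bun. The following is a minimal illustration, not the code from the PR: `JSValue`, `create_wrapper`, `create_aggregate_error`, and `aggregate` are hypothetical stand-ins, and nothing here talks to a real garbage collector. It only shows the bookkeeping: a fixed `[JSValue; 256]` array in the function's own frame, a length capped at 256, and a `[..len]` slice handed to the aggregator.

```rust
/// Hypothetical stand-in for JSC's JSValue: a small copyable encoded value.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct JSValue(pub u64);

/// Capacity of the on-stack buffer, matching the 256-entry array named in the PR.
pub const MAX_ERRORS: usize = 256;

/// Hypothetical stand-in for creating a BuildMessage/ResolveMessage wrapper.
/// In the real runtime this allocates a JS cell, so a collection may run here.
fn create_wrapper(index: usize) -> JSValue {
    JSValue(index as u64 + 1)
}

/// Hypothetical stand-in for create_aggregate_error.
/// It only reports how many entries it received.
fn create_aggregate_error(errors: &[JSValue]) -> usize {
    errors.len()
}

/// Collects up to MAX_ERRORS wrappers in a stack array and passes the
/// filled prefix on. Messages beyond the cap are not wrapped at all.
pub fn aggregate(message_count: usize) -> usize {
    // The buffer is a local array, not a Vec, so its storage is part of
    // this frame rather than a separate malloc'd allocation.
    let mut errors_stack = [JSValue::default(); MAX_ERRORS];
    let len = message_count.min(MAX_ERRORS);

    for (i, slot) in errors_stack[..len].iter_mut().enumerate() {
        *slot = create_wrapper(i);
    }

    // Only the initialized prefix is exposed to the consumer.
    create_aggregate_error(&errors_stack[..len])
}
```

With these stand-ins, `aggregate(3)` passes three entries along and `aggregate(1000)` passes 256. Whether a real compiler keeps such an array on the stack in optimized builds is a separate question that the sketch does not address.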

Same bug shape in bake/production.rs: css_chunk_js_strings held freshly created JSStrings in a heap Vec across preload_bundled_module calls (which evaluate JS modules). That one is unbounded in length, so each value is kept protect()ed (RAII ProtectedJSValue) until consumed. An audit of the other Vec<JSValue> sites in the tree found them safe (values rooted elsewhere via MarkedArgumentBuffer, protect(), Strong, or no allocation while held).
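For the unbounded case, the RAII idea can be illustrated with a mock. This is a sketch under stated assumptions: the `ProtectedJSValue` below is a hypothetical re-creation, not Bun's type, and `protect`/`unprotect` merely bump a thread-local counter instead of calling into JSC. It shows a guard that protects on construction, exposes the raw value through `.value()`, and unprotects when dropped.

```rust
use std::cell::Cell;

thread_local! {
    /// Mock of the collector's protect table: counts currently protected values.
    static PROTECTED: Cell<usize> = Cell::new(0);
}

/// Hypothetical stand-in for JSC's JSValue.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct JSValue(pub u64);

impl JSValue {
    /// Mock protect: the real call would register the value as a GC root.
    fn protect(self) {
        PROTECTED.with(|c| c.set(c.get() + 1));
    }
    /// Mock unprotect: the real call would remove that root.
    fn unprotect(self) {
        PROTECTED.with(|c| c.set(c.get() - 1));
    }
}

/// Guard that keeps a value protected for as long as the guard is alive.
pub struct ProtectedJSValue(JSValue);

impl ProtectedJSValue {
    pub fn new(value: JSValue) -> Self {
        value.protect();
        ProtectedJSValue(value)
    }
    /// Borrow the raw value; the guard keeps holding the protection.
    pub fn value(&self) -> JSValue {
        self.0
    }
}

impl Drop for ProtectedJSValue {
    fn drop(&mut self) {
        self.0.unprotect();
    }
}

/// Number of values protected on this thread right now (mock bookkeeping).
pub fn protected_count() -> usize {
    PROTECTED.with(|c| c.get())
}

/// Holds `n` guarded values in a heap Vec, reads them, then drops the Vec.
/// Returns the protected count while held and after the drop.
pub fn hold_and_consume(n: u64) -> (usize, usize) {
    let held: Vec<ProtectedJSValue> =
        (0..n).map(|i| ProtectedJSValue::new(JSValue(i))).collect();
    let during = protected_count();

    // Consumers read through .value() while the guards are still alive.
    let consumed: Vec<JSValue> = held.iter().map(ProtectedJSValue::value).collect();
    assert_eq!(consumed.len(), held.len());

    drop(held); // every guard unprotects its value here
    (during, protected_count())
}
```

The point of the design is that the heap `Vec` no longer needs to be visible to the collector: each element carries its own root, and dropping the `Vec` (including on an early return) releases all of them.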

Reproduction

A module with 257 duplicate const declarations produces 256 log messages. Importing it 16 times with BUN_JSC_slowPathAllocsBetweenGCs=100 crashed the unfixed ASAN debug build 6/6 runs (ASAN UAF or ASSERTION FAILED: isSymbol() from swept-and-reused cells, depending on timing). The fuzzer hit the same window on a Worker thread with its Bun.gc(true) suffix.
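The input for such a reproduction is easy to generate. The helper below is a hypothetical sketch, not the generator used by the PR's test: the binding name `a` and the function names are assumptions, and the "one message per redeclaration" count is inferred from the 257-to-256 figure quoted above rather than from the bundler's source.

```rust
/// Builds module source containing `count` declarations of the same const
/// binding. The name `a` is an arbitrary choice for this sketch.
pub fn duplicate_const_module(count: usize) -> String {
    let mut src = String::new();
    for i in 0..count {
        src.push_str(&format!("const a = {i};\n"));
    }
    src
}

/// Redeclarations in such a module: every declaration after the first.
/// Assumption: one log message per redeclaration, as the PR's numbers suggest.
pub fn expected_duplicate_errors(count: usize) -> usize {
    count.saturating_sub(1)
}
```

Writing `duplicate_const_module(257)` to a file and importing it repeatedly under the GC settings described above would approximate the scenario; the exact harness is left out because the PR text does not show it.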

Tests

test/js/bun/resolve/build-error.test.ts: "import with many build errors keeps AggregateError entries alive across GC". Fails on the unfixed ASAN debug build (child aborts, 6/6), passes on the fixed build (15/15 runs). It passes under a release build either way since the UAF needs ASAN (or unlucky reuse) to become observable; the ASAN CI suites are the enforcing ones.

test/js/bun/transpiler/transpiler-error-gc-uaf.test.ts covers the LogJsc::to_js site directly: Bun.Transpiler().transformSync with 256 parse errors under BUN_JSC_useZombieMode and BUN_JSC_collectContinuously crashes the unfixed ASAN build (zapped StructureID while marking the errors array) and passes on this branch.

Existing plugins.test.ts, transpiler.test.js, dev-and-prod.test.ts (5 PROD variants) and framework-router.test.ts pass.

Linked issue

Fixes #23181

That report is the same scenario: a loop of dynamic imports over files where some fail with build errors, with a try/catch around each. A script mirroring its extract-messages.ts shape (sequential await import() of modules with multiple build errors, catch block touching the error) crashes the unfixed ASAN build 3/3 runs under GC pressure and completes cleanly 3/3 on this branch. The original report was against v1.2.23, whose implementation of this path differed, so the exact 1.2.23 mechanism can't be re-verified, but the reported workload no longer crashes on current Bun with this change.

Supersedes #30671

That earlier PR fixed the same two sites (process_fetch_log, LogJsc::to_js) but is now conflicting with main after the comment rewrites in #31783. This PR carries the same changes rebased on current main, absorbs its transpiler regression test, and additionally fixes the same unrooted-heap-Vec shape in bake's production build.

…gregateErrors

JSC roots JSValues on the native stack via the conservative scan, but a
heap-allocated Vec buffer is invisible to it. process_fetch_log and
LogJsc::to_js accumulated freshly created BuildMessage/ResolveMessage
wrappers in a Vec<JSValue> while subsequent create calls allocated more
cells; a GC triggered mid-loop could sweep the earlier wrappers and run
their finalizers, freeing the native BuildMessage that later reads
(error printing, .message getters) still dereference. Use the on-stack
[JSValue; 256] array for both, matching ast_jsc::log_to_js.

bake's production build had the same shape for css_chunk_js_strings
(fresh JSStrings in a heap Vec, consumed after module evaluation);
protect each value for the duration instead.

Found by fuzzing: ASAN heap-use-after-free, read of size 1 at offset
144 of a 152-byte region (BuildMessage.logged read on a finalized
BuildMessage).

github-actions Bot added the claude label Jun 5, 2026

github-actions Bot (Contributor) commented Jun 5, 2026

Found 1 issue this PR may fix:

  1. #23181 (Processing dynamic imports with errors crashes Bun instead of throwing errors): this PR fixes the GC use-after-free in process_fetch_log, where BuildMessage/ResolveMessage wrappers held in a heap Vec get collected during iteration. That is the code path handling module import failures with multiple build errors.

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #23181

🤖 Generated with Claude Code

coderabbitai Bot (Contributor) commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 400715b4-5ebc-4837-bf69-05307e41d3d1

📥 Commits

Reviewing files that changed from the base of the PR and between 9157137 and 1265614.

📒 Files selected for processing (1)
  • test/js/bun/transpiler/transpiler-error-gc-uaf.test.ts

Walkthrough

This PR hardens garbage collection safety during error and asset aggregation by replacing heap-allocated temporary arrays with fixed stack buffers and protected JS values. Error aggregation in two code paths now uses on-stack [JSValue; 256] arrays, and CSS chunk URLs are explicitly protected during SSG rendering. A new regression test validates stability under allocation pressure.

Changes

GC Safety for Error Aggregation and CSS Chunk URLs

| Layer / File(s) | Summary |
|---|---|
| Multi-error aggregation stack arrays / src/jsc/VirtualMachine.rs, src/jsc/lib.rs | Both process_fetch_log and LogJsc::to_js replace heap-allocated Vec<JSValue> with fixed on-stack [JSValue; 256] arrays, capping error aggregation to 256 entries and passing stack slices to create_aggregate_error. |
| Protected CSS chunk URLs in SSG rendering / src/runtime/bake/production.rs | CSS chunk URL JS values are wrapped in jsc::ProtectedJSValue during accumulation and unwrapped via .value() at the three call sites where they are inserted into route style arrays. |
| GC-stability regression test / test/js/bun/resolve/build-error.test.ts, test/js/bun/transpiler/transpiler-error-gc-uaf.test.ts | Test harness utilities are imported to enable subprocess spawning. New tests spawn Bun subprocesses under forced GC settings and assert that AggregateError construction yields exactly 256 error entries per import/transform and that the process exits cleanly. |

Suggested reviewers

  • Jarred-Sumner

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main fix: replacing heap-allocated Vec with stack arrays to prevent GC use-after-free of BuildMessage wrappers. |
| Description check | ✅ Passed | The PR description thoroughly covers all required sections: what the fix does, root cause analysis, reproduction steps, test verification, and linked issues. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot (Contributor) commented Jun 5, 2026

This PR may be a duplicate of:

  1. Root BuildMessage cells on the stack in Log.to_js #30671 - Also roots BuildMessage/ResolveMessage cells on the stack in process_fetch_log and LogJsc::to_js by replacing Vec<JSValue> with [JSValue; 256] to fix the same GC use-after-free

🤖 Generated with Claude Code

claude Bot (Contributor) left a comment

The fix looks correct and matches the established ast_jsc::log_to_js pattern, but GC-rooting changes in the module-loader error path and bake production build are subtle enough to warrant a human glance.

Extended reasoning...

Overview

This PR fixes a GC use-after-free where freshly created BuildMessage/ResolveMessage wrappers (and CSS-chunk JSStrings in bake) were held only in heap-allocated Vec<JSValue> buffers across further JSC allocations. JSC's conservative GC scans the native stack but not malloc'd buffers, so a mid-loop collection could sweep earlier cells and free their native backing. The fix moves the two bounded (≤256) sites to on-stack [JSValue; 256] arrays and the unbounded bake site to Vec<ProtectedJSValue> (RAII gcProtect/gcUnprotect). A regression test exercises the path with BUN_JSC_slowPathAllocsBetweenGCs=100.

Touched files: src/jsc/VirtualMachine.rs (process_fetch_log), src/jsc/lib.rs (LogJsc::to_js), src/runtime/bake/production.rs (css_chunk_js_strings), plus the new test in test/js/bun/resolve/build-error.test.ts.

Security risks

None introduced. This is a memory-safety hardening change; the prior code had a UAF and the new code closes it. No new external inputs, auth, or crypto surface.

Level of scrutiny

Medium-high. The change is mechanically small and follows the exact precedent already in src/ast_jsc/lib.rs (same on-stack [JSValue; 256] array, same comment). However, it sits squarely in GC-rooting semantics for the module-loader fetch-error path and the bake production build — both production-critical, and conservative-stack-scan reasoning is subtle enough that a reviewer with JSC GC context should confirm the rooting story (e.g., that 2 KiB on-stack is acceptable in these call frames, and that ProtectedJSValue drop ordering in build_with_vm is fine on every early-return path).

Other factors

  • I verified JSValue derives Default (= ZERO), so the stack-array initializer is well-defined, and only [..len] is passed to create_aggregate_error.
  • ProtectedJSValue exists with .value() and RAII unprotect; the Vec drops at end of build_with_vm (or on ? early return), which is after all put_index stores — no leak, no premature unprotect.
  • The {len} substitution for errors.len() in the message format string is semantically identical.
  • The bug-hunting system found no issues, and the PR description is thorough with reproduction and test verification (6/6 crash before, 15/15 pass after on ASAN debug).

Net: I believe the change is correct, but GC-rooting fixes in the runtime core are the kind of change a human reviewer should sign off on rather than auto-approve.

Ported from #30671, which this branch supersedes. transformSync with 256
parse errors under zombie mode and continuous collection crashes the
unfixed ASAN build (zapped StructureID while marking the errors array)
and passes with the stack-array fix.

claude Bot (Contributor) left a comment

I didn't find any bugs, but this is a GC-rooting fix in core JSC bridge code (process_fetch_log, LogJsc::to_js, bake production) whose correctness hinges on conservative stack-scan semantics — worth a human look.

Extended reasoning...

Overview

This PR fixes a fuzzer-found GC use-after-free by changing how freshly-allocated BuildMessage/ResolveMessage/JSString cells are kept reachable while building an AggregateError (or while iterating routes in bake's static-site path). Three sites are touched: process_fetch_log in src/jsc/VirtualMachine.rs, LogJsc::to_js in src/jsc/lib.rs (both swap a heap Vec<JSValue> for an on-stack [JSValue; 256] so the conservative stack scan roots the values), and css_chunk_js_strings in src/runtime/bake/production.rs (unbounded count, so each value is wrapped in ProtectedJSValue instead). Two regression tests are added that exercise these paths under aggressive GC settings.

Security risks

None in the conventional sense — no auth, crypto, parsing of untrusted input, or new attack surface. The change is purely about GC reachability of values that already existed; it strictly removes a UAF.

Level of scrutiny

High. The fix is small and the reasoning in the PR description is excellent, but correctness depends on a subtle invariant of JSC's conservative GC (heap buffers are not scanned; stack frames are). This is core runtime/module-loader code that runs on every failed import and on every Bun.Transpiler parse error, and the bake change affects the production SSG render path. Someone familiar with the JSC rooting contract should confirm that (a) the on-stack array is guaranteed not to be SROA'd / spilled to heap by the optimizer in a way that defeats the scan, (b) ProtectedJSValue's RAII drop ordering is correct relative to the route loop, and (c) the 256-entry cap and 2 KiB stack array are acceptable in these frames.

Other factors

The PR supersedes #30671 (same fix, now rebased) and adds a third site plus tests. The description includes a detailed root-cause analysis with ASAN/DWARF evidence and a deterministic repro. No prior human review comments are outstanding. Given the subtlety of GC-rooting changes and that this is production-critical runtime code rather than a mechanical refactor, I'm deferring rather than approving.

robobun (Collaborator, Author) commented Jun 5, 2026

CI triage for build 60700: the red signals are environmental, not from this diff.

  • windows-x64-baseline-verify-baseline fails with the same exit status on recent main commits (55f6c89, 6038601, 8553428, 91270aa), for example main build 60678.
  • test/cli/install/bunx.test.ts is currently failing on 13+ platforms on every open PR build (31854, 31855, 31856, 31875 all show the same), consistent with the external GitHub tarball and registry endpoints those tests depend on.
  • test/js/bun/http/serve-body-leak.test.ts failed once, only on the x64-asan shard. Locally it times out identically on a pre-change ASAN debug build and passes on a release build, so that failure mode predates this change.
  • The darwin jobs show "Expired", which is agent availability.

The regression tests added here (test/js/bun/resolve/build-error.test.ts, test/js/bun/transpiler/transpiler-error-gc-uaf.test.ts) ran on the test shards, including x64-asan, and are not among the failures.

@Jarred-Sumner Jarred-Sumner merged commit 7ab24e4 into main Jun 5, 2026
69 of 78 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/bf9d35b4/fix-process-fetch-log-gc-uaf branch June 5, 2026 19:59
Development

Successfully merging this pull request may close these issues.

Processing dynamic imports with errors crashes Bun instead of throwing errors

2 participants