Fix GC use-after-free of BuildMessage wrappers when building AggregateErrors#31874
Conversation
…gregateErrors JSC roots JSValues on the native stack via the conservative scan, but a heap-allocated Vec buffer is invisible to it. process_fetch_log and LogJsc::to_js accumulated freshly created BuildMessage/ResolveMessage wrappers in a Vec<JSValue> while subsequent create calls allocated more cells; a GC triggered mid-loop could sweep the earlier wrappers and run their finalizers, freeing the native BuildMessage that later reads (error printing, .message getters) still dereference. Use the on-stack [JSValue; 256] array for both, matching ast_jsc::log_to_js. bake's production build had the same shape for css_chunk_js_strings (fresh JSStrings in a heap Vec, consumed after module evaluation); protect each value for the duration instead. Found by fuzzing: ASAN heap-use-after-free, read of size 1 at offset 144 of a 152-byte region (BuildMessage.logged read on a finalized BuildMessage).
|
Found 1 issue this PR may fix:
🤖 Generated with Claude Code |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR hardens garbage collection safety during error and asset aggregation by replacing heap-allocated temporary arrays with fixed stack buffers and protected JS values. Error aggregation in two code paths now uses on-stack ChangesGC Safety for Error Aggregation and CSS Chunk URLs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
There was a problem hiding this comment.
The fix looks correct and matches the established ast_jsc::log_to_js pattern, but GC-rooting changes in the module-loader error path and bake production build are subtle enough to warrant a human glance.
Extended reasoning...
Overview
This PR fixes a GC use-after-free where freshly created BuildMessage/ResolveMessage wrappers (and CSS-chunk JSStrings in bake) were held only in heap-allocated Vec<JSValue> buffers across further JSC allocations. JSC's conservative GC scans the native stack but not malloc'd buffers, so a mid-loop collection could sweep earlier cells and free their native backing. The fix moves the two bounded (≤256) sites to on-stack [JSValue; 256] arrays and the unbounded bake site to Vec<ProtectedJSValue> (RAII gcProtect/gcUnprotect). A regression test exercises the path with BUN_JSC_slowPathAllocsBetweenGCs=100.
Touched files: src/jsc/VirtualMachine.rs (process_fetch_log), src/jsc/lib.rs (LogJsc::to_js), src/runtime/bake/production.rs (css_chunk_js_strings), plus the new test in test/js/bun/resolve/build-error.test.ts.
Security risks
None introduced. This is a memory-safety hardening change; the prior code had a UAF and the new code closes it. No new external inputs, auth, or crypto surface.
Level of scrutiny
Medium-high. The change is mechanically small and follows the exact precedent already in src/ast_jsc/lib.rs (same on-stack [JSValue; 256] array, same comment). However, it sits squarely in GC-rooting semantics for the module-loader fetch-error path and the bake production build — both production-critical, and conservative-stack-scan reasoning is subtle enough that a reviewer with JSC GC context should confirm the rooting story (e.g., that 2 KiB on-stack is acceptable in these call frames, and that ProtectedJSValue drop ordering in build_with_vm is fine on every early-return path).
Other factors
- I verified
JSValuederivesDefault(=ZERO), so the stack-array initializer is well-defined, and only[..len]is passed tocreate_aggregate_error. ProtectedJSValueexists with.value()and RAII unprotect; the Vec drops at end ofbuild_with_vm(or on?early return), which is after allput_indexstores — no leak, no premature unprotect.- The
{len}substitution forerrors.len()in the message format string is semantically identical. - The bug-hunting system found no issues, and the PR description is thorough with reproduction and test verification (6/6 crash before, 15/15 pass after on ASAN debug).
Net: I believe the change is correct, but GC-rooting fixes in the runtime core are the kind of change a human reviewer should sign off on rather than auto-approve.
Ported from #30671, which this branch supersedes. transformSync with 256 parse errors under zombie mode and continuous collection crashes the unfixed ASAN build (zapped StructureID while marking the errors array) and passes with the stack-array fix.
There was a problem hiding this comment.
I didn't find any bugs, but this is a GC-rooting fix in core JSC bridge code (process_fetch_log, LogJsc::to_js, bake production) whose correctness hinges on conservative stack-scan semantics — worth a human look.
Extended reasoning...
Overview
This PR fixes a fuzzer-found GC use-after-free by changing how freshly-allocated BuildMessage/ResolveMessage/JSString cells are kept reachable while building an AggregateError (or while iterating routes in bake's static-site path). Three sites are touched: process_fetch_log in src/jsc/VirtualMachine.rs, LogJsc::to_js in src/jsc/lib.rs (both swap a heap Vec<JSValue> for an on-stack [JSValue; 256] so the conservative stack scan roots the values), and css_chunk_js_strings in src/runtime/bake/production.rs (unbounded count, so each value is wrapped in ProtectedJSValue instead). Two regression tests are added that exercise these paths under aggressive GC settings.
Security risks
None in the conventional sense — no auth, crypto, parsing of untrusted input, or new attack surface. The change is purely about GC reachability of values that already existed; it strictly removes a UAF.
Level of scrutiny
High. The fix is small and the reasoning in the PR description is excellent, but correctness depends on a subtle invariant of JSC's conservative GC (heap buffers are not scanned; stack frames are). This is core runtime/module-loader code that runs on every failed import and on every Bun.Transpiler parse error, and the bake change affects the production SSG render path. Someone familiar with the JSC rooting contract should confirm that (a) the on-stack array is guaranteed not to be SROA'd / spilled to heap by the optimizer in a way that defeats the scan, (b) ProtectedJSValue's RAII drop ordering is correct relative to the route loop, and (c) the 256-entry cap and 2 KiB stack array are acceptable in these frames.
Other factors
The PR supersedes #30671 (same fix, now rebased) and adds a third site plus tests. The description includes a detailed root-cause analysis with ASAN/DWARF evidence and a deterministic repro. No prior human review comments are outstanding. Given the subtlety of GC-rooting changes and that this is production-critical runtime code rather than a mechanical refactor, I'm deferring rather than approving.
|
CI triage for build 60700: the red signals are environmental, not from this diff.
The regression tests added here ( |
What
Fixes a GC use-after-free found by fuzzing (ASAN fingerprint
heap-use-after-free, read of size 1 at offset 144 inside a 152-byte region: that is exactlyBuildMessage.loggedinside a freedBox<BuildMessage>, confirmed via DWARF layout).Root cause
JSC roots JSValues through the conservative stack scan. Values that live only in a malloc'd
Vec<JSValue>buffer are invisible to the GC.process_fetch_log(module fetch failure path) created oneBuildMessage/ResolveMessagewrapper per log message and pushed each into a heapVec<JSValue>before handing the slice tocreate_aggregate_error. Everycreatecall in that loop allocates JS cells, so a collection can run mid-loop; the wrappers created in earlier iterations have no reachable reference at that point, get swept, and their finalizers free the nativeBuildMessage. Anything that touches the stale cells afterwards (printing the unhandled rejection, reading.messageofferror.errors[i]) reads freed memory. The original Zig implementation used an on-stackvar errors_stack: [256]JSValuespecifically so the conservative scan would root these; the port replaced it with a heapVec.The fixing lines are the
Vec<JSValue>to[JSValue; 256]changes insrc/jsc/VirtualMachine.rs(process_fetch_log) andsrc/jsc/lib.rs(LogJsc::to_js, same pattern, also feedscreate_aggregate_error).ast_jsc::log_to_jsalready used the on-stack array and documents why.Same bug shape in
bake/production.rs:css_chunk_js_stringsheld freshly createdJSStrings in a heapVecacrosspreload_bundled_modulecalls (which evaluate JS modules). That one is unbounded in length, so each value is keptprotect()ed (RAIIProtectedJSValue) until consumed. An audit of the otherVec<JSValue>sites in the tree found them safe (values rooted elsewhere viaMarkedArgumentBuffer,protect(),Strong, or no allocation while held).Reproduction
A module with 257 duplicate
constdeclarations produces 256 log messages. Importing it 16 times withBUN_JSC_slowPathAllocsBetweenGCs=100crashed the unfixed ASAN debug build 6/6 runs (ASAN UAF orASSERTION FAILED: isSymbol()from swept-and-reused cells, depending on timing). The fuzzer hit the same window on a Worker thread with itsBun.gc(true)suffix.Tests
test/js/bun/resolve/build-error.test.ts: "import with many build errors keeps AggregateError entries alive across GC". Fails on the unfixed ASAN debug build (child aborts, 6/6), passes on the fixed build (15/15 runs). It passes under a release build either way since the UAF needs ASAN (or unlucky reuse) to become observable; the ASAN CI suites are the enforcing ones.test/js/bun/transpiler/transpiler-error-gc-uaf.test.tscovers theLogJsc::to_jssite directly:Bun.Transpiler().transformSyncwith 256 parse errors underBUN_JSC_useZombieModeandBUN_JSC_collectContinuouslycrashes the unfixed ASAN build (zapped StructureID while marking the errors array) and passes on this branch. Existingplugins.test.ts,transpiler.test.js,dev-and-prod.test.ts(5 PROD variants) andframework-router.test.tspass.Linked issue
Fixes #23181
That report is the same scenario: a loop of dynamic imports over files where some fail with build errors, with a try/catch around each. A script mirroring its
extract-messages.tsshape (sequentialawait import()of modules with multiple build errors, catch block touching the error) crashes the unfixed ASAN build 3/3 runs under GC pressure and completes cleanly 3/3 on this branch. The original report was against v1.2.23, whose implementation of this path differed, so the exact 1.2.23 mechanism can't be re-verified, but the reported workload no longer crashes on current Bun with this change.Supersedes #30671
That earlier PR fixed the same two sites (
process_fetch_log,LogJsc::to_js) but is now conflicting with main after the comment rewrites in #31783. This PR carries the same changes rebased on current main, absorbs its transpiler regression test, and additionally fixes the same unrooted-heap-Vec shape in bake's production build.