Fix dangling PackageManager.log when JS runs during auto-install sleep_until#31120
Fix dangling PackageManager.log when JS runs during auto-install sleep_until#31120robobun wants to merge 7 commits into
Conversation
…p_until When runtime auto-install waits for a manifest (enqueue_dependency_to_root -> sleep_until), it ticks the JS event loop. Code run there (module transpile, nested resolve, AsyncModule resume) swaps and restores the shared log pointers, but the resolve path saves jsc_vm.log while transpile_source_code saves transpiler.log. Because resolve_maybe_needs_trailing_slash did not swap transpiler.log, a nested swap/restore could leave PackageManager.log pointing at a dead stack Log. The next run_tasks poll then dereferenced it in add_error_fmt, tripping an ASAN stack-buffer-overflow. - Swap/restore transpiler.log alongside the other log pointers in both resolve entry points so the swap sites stay consistent. - Re-assert the caller's log on PackageManager before each run_tasks poll so a stale restore during the tick cannot reach log_mut().
|
Updated 2:37 AM PT - May 20th, 2026
✅ @robobun, your commit 13fbeba513b1e6ceadcbecd4ccee48c0ea01105f passed in 🧪 To try this PR locally: bunx bun-pr 31120That installs a local version of the PR into your bun-31120 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThree code paths now snapshot and restore temporary stack-log swaps to avoid dangling log pointers: PackageManager polling callback, VirtualMachine resolver (including transpiler), and the JSC resolve hook. An integration test exercises the auto-install event-loop interleaving. ChangesLog Pointer Dangling Fix
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts`:
- Around line 119-122: The subprocess test in the resolve autoinstall log
dangling case is using loose negative substring checks on stderr, which does not
match the `bunEnv` convention. Update the assertions in the test body to use a
strict empty-stderr check (`expect(stderr).toBe("")`) before the existing stdout
and exit-code assertions, and remove the `panic`/`AddressSanitizer` containment
checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8fae6c57-e088-4728-833f-77160abe7afd
📒 Files selected for processing (4)
src/install/PackageManager/PackageManagerEnqueue.rssrc/jsc/VirtualMachine.rssrc/runtime/jsc_hooks.rstest/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts
|
Found 1 issue this PR may fix:
🤖 Generated with Claude Code |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts`:
- Around line 117-122: The test currently asserts exitCode too early; move the
exitCode assertion after the hits assertion so more specific diagnostics surface
first: keep the existing checks for stderr and stdout, then assert hits with
expect(hits).toBeGreaterThan(0), and only after that assert
expect(exitCode).toBe(0); update the assertions referencing stdout, stderr,
hits, and exitCode accordingly in resolve-autoinstall-log-dangling.test.ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52c5ca04-9758-4106-8b37-a9b39700d219
📒 Files selected for processing (1)
test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts
6e69267 to
ce46fda
Compare
There was a problem hiding this comment.
Thanks for the force-push — ce46fda restores the actual fix, so my earlier 🔴 about the botched 6e69267 no longer applies. The change now matches the description, but since it's adjusting raw *mut Log lifetime management across reentrant event-loop ticks I'd like a human to confirm the swap/restore reasoning before merge.
Extended reasoning...
Overview
The PR now contains exactly what the description claims: (1) transpiler.log is added to the existing log-pointer swap/restore sets in VirtualMachine::resolve_maybe_needs_trailing_slash's RestoreLog guard and in jsc_hooks::resolve_hook's defer block, and (2) enqueue_dependency_to_root's Closure snapshots the caller's manager.log and re-asserts it before each run_tasks poll inside sleep_until. A new regression test drives nested resolves and module loads via setImmediate during auto-install's event-loop tick against a 404 registry. Net diff is 4 files / +81 lines.
Since my last review, the author force-pushed to replace 6e69267 (which had reverted the fix and pulled in an unrelated ~2500-line refactor) with ce46fda, which is the intended 2-line test reorder. The branch is now clean.
Security risks
None in the conventional sense — no auth, crypto, input parsing, or network surface changes. However this is a memory-safety fix: the bug is a fuzzer-found ASAN stack-buffer-overflow caused by a dangling *mut Log pointer, and the fix manipulates raw-pointer lifetimes in unsafe Rust. An incorrect fix here could mask the original UAF or introduce a new one.
Level of scrutiny
High. The mechanical changes are small and follow the existing pattern (one more pointer added to an existing swap/restore set), but correctness depends on the cross-cutting reasoning in the PR description about how resolve_*, transpile_source_code_inner, and AsyncModule::resume_loading_module each save/restore from different sources during reentrant event-loop ticks. Verifying that old_transpiler_log is the right value to restore (vs. old_log), and that re-asserting manager.log = self.log on every poll can't clobber a legitimately-swapped log from a nested context, requires someone familiar with this subsystem.
Other factors
- The regression test is well-constructed and the CodeRabbit nits (strict stderr assertion, exit-code-last) have been addressed.
- My remaining
test.concurrentnit is non-blocking. - CI on the previous push showed only infra failures (agent creation / macOS build-cpp), not test failures; the new push hasn't reported yet.
- Related prior art: PR #31020 touched the same log-swap guards, suggesting this area has needed iterative fixes — another reason for human sign-off.
…PRL-state-dependent crash
The registry server runs in the child process so its fetch handler executes during auto-install's sleep_until tick. The handler requires fresh modules, which triggers transpile_source_code_inner's log swap/restore. Without the fix, that restore redirects pm.log away from the resolve's scoped log to transpiler.log, and run_tasks' 404 error messages leak into it and get printed to stderr by process_fetch_log. With the fix, pm.log is re-asserted to the resolve's log before each run_tasks poll, so stderr stays clean.
Use the parent-process server on port 0 and queue module loads via setImmediate in the child so they run during sleep_until's event-loop tick. The require() triggers transpile_source_code_inner which restores pm.log from transpiler.log; without the fix the 404 diagnostic then leaks into vm.log and is dumped to stderr at process exit.
There was a problem hiding this comment.
Thanks — 776d442 resolves the port-probe race and the node:net dynamic-import nit, and the diff is back to the intended 4 files. No further issues from me; deferring to a human for the unsafe log-pointer lifetime changes across the re-entrant sleep_until path.
Extended reasoning...
Overview
This PR fixes an ASAN stack-buffer-overflow where PackageManager.log can be left pointing at dead stack memory when the JS event loop runs (and transpiles modules) during auto-install's sleep_until polling. It touches three Rust files — PackageManagerEnqueue.rs (snapshot/re-assert manager.log before each run_tasks poll), VirtualMachine.rs and jsc_hooks.rs (swap/restore transpiler.log alongside the other log pointers in the resolve entry points) — plus a new regression test. Net diff is +82 across 4 files.
Prior feedback now resolved
All of my earlier inline comments have been addressed:
- The botched commit that reverted the fix and pulled in an unrelated ~2500-line refactor was redone;
git diff 46b24045..776d442b --statnow shows only the 4 intended files. - 776d442 moved the 404 registry server into the parent test process with
Bun.serve({ port: 0 })held for the test's lifetime, eliminating the probe-then-close-then-rebind race. The child now triggers the transpile-during-tick viasetImmediate(load)callbacks rather than the server's fetch handler, which still exercises the same code path. - The
await import("node:net")is gone entirely. - The panic/ASAN substring assertions and exit-code ordering were fixed in earlier commits.
- The
test.concurrentsuggestion remains unapplied, but that was an explicit non-blocking nit.
Security risks
None in the conventional sense (no auth/input-handling changes). The change is memory-safety-relevant — it fixes a use-after-scope on a stack Log — but the fix only adds save/restore symmetry and a defensive re-assertion; it doesn't introduce new unsafe dereferences.
Level of scrutiny
This warrants human review. The fix is small and targeted, but it lives entirely in unsafe Rust manipulating raw *mut bun_ast::Log pointers whose validity depends on a subtle invariant across re-entrant event-loop callbacks (sleep_until → tick → transpile/resolve → swap/restore). In particular, the manager.log = self.log re-assertion in Closure::is_done is a belt-and-suspenders defense that compensates for desync elsewhere; a reviewer familiar with the log-swap sites (e.g. transpile_source_code_inner, AsyncModule::resume_loading_module) should confirm this is the right layer to enforce the invariant and that no other run_tasks callers need the same treatment.
Other factors
The regression test now follows the suite conventions (strict stderr, exit code last, port: 0 held by the test). CodeRabbit suggested dylan-conway / RiskyMH as reviewers, which aligns with the install/resolver ownership.
…log-dangling-sleep-until
There was a problem hiding this comment.
Thanks — the earlier issues are resolved (the fix is back in the diff after the botched amend, and the test now binds port: 0 directly with no probe-then-rebind race). Given this is raw-pointer lifetime juggling across reentrant event-loop ticks, I'd like a maintainer to sign off on the pm.log snapshot/re-assert approach.
Extended reasoning...
Overview
This PR fixes an ASAN stack-buffer-overflow where PackageManager.log can be left pointing at dead stack memory when JS runs during auto-install's sleep_until event-loop tick. It touches three Rust files in the runtime/install hot path:
src/install/PackageManager/PackageManagerEnqueue.rs— adds alog: *mut bun_ast::Logsnapshot to thesleep_untilclosure and re-assertsmanager.log = self.logbefore everyrun_taskspoll.src/jsc/VirtualMachine.rs—RestoreLogguard now also snapshots/restorestranspiler.logalongside the existing four log pointers.src/runtime/jsc_hooks.rs—resolve_hookdoes the same.- A new regression test in
test/js/bun/resolve/that usessetImmediate-scheduledrequire()calls to trigger module transpilation duringsleep_until, with a parent-hosted 404 registry onport: 0.
Prior review history
I've reviewed this PR three times already. The author has addressed every concern:
- The botched commit (6e69267) that reverted the fix and pulled in a ~2500-line unrelated refactor was redone (now ce46fda); the fix is present in the current diff and the changed-files list is back to 4.
- The
net.createServer()probe-then-close-then-rebind port race was removed (776d442); the registry server now runs in the parent test process withBun.serve({ port: 0 })and stays bound throughout. - The
panic/AddressSanitizersubstring checks were replaced withexpect(stderr).toBe(""). - The dynamic
await import("node:net")is gone (no longer needed). - The
test.concurrentnit was not applied, but it's a soft preference with no correctness impact.
Security risks
None introduced. This is a memory-safety fix — the change closes a use-after-scope on a stack Log, not opens one. No new attack surface, no input parsing, no auth/crypto.
Level of scrutiny
High. The change is small (~20 lines of Rust) but it's raw *mut pointer-lifetime reasoning across at least five log-swap sites (resolve_maybe_needs_trailing_slash, resolve_hook, transpile_source_code_inner, AsyncModule::resume_loading_module, and now the is_done closure) that interact through reentrant event-loop ticks. The original bug existed precisely because these sites disagreed on which pointer to save/restore from; adding a sixth snapshot point is the right defensive move but a maintainer familiar with this subsystem should confirm there's no remaining path where this.log itself is stale at closure-construction time, and that unconditionally overwriting manager.log on every poll can't clobber a legitimate inner swap.
Other factors
- CI build #56379 shows failures, but they're
HiveArrayFallbackdeprecation warnings and musl build issues in unrelated files — none originate from this PR's changed lines and appear to come from the mergedmain. - CodeRabbit suggested dylan-conway / RiskyMH as reviewers, which seems right for install/resolver code.
- The bug-hunting system found nothing this round.
Fuzzilli found an ASAN stack-buffer-overflow in
run_tasks→Log::add_msg→Vec::pushduring runtime auto-install.What happened
enqueue_dependency_to_rootblocks insleep_until, which ticks the JS event loop betweenis_donepolls. Each poll callsrun_tasks, which readsmanager.logto log manifest 4xx errors.The event loop tick can run module transpilation, nested resolves, or
AsyncModule::resume_loading_module, all of which swap the VM's log pointers (jsc_vm.log,transpiler.log,resolver.log,linker.log,pm.log) and restore them on exit. The problem is they don't all save from the same source:resolve_maybe_needs_trailing_slash/resolve_hooksavejsc_vm.logand swap everything excepttranspiler.log.transpile_source_code_innersavestranspiler.logand restorespm.logto it.AsyncModule::resume_loading_modulesavesjsc_vm.logbut restorestranspiler.logto it.So when a module transpile runs during a resolve's
sleep_untiltick, its restore setspm.log(andresolver.log/linker.log) to the oldtranspiler.log— not the resolve's scoped log. At minimum the 404 diagnostics fromrun_tasksleak into the wrong log; with further interleaving across callspm.logends up pointing at a dead stackLog, and the nextrun_taskspoll dereferences it.Fix
transpiler.logalongside the other log pointers in both resolve entry points so it can't be left desynced fromjsc_vm.logacross a resolve.pm.logwhen enteringenqueue_dependency_to_root'ssleep_untiland re-assert it before eachrun_taskspoll. This guaranteesrun_tasksalways sees the caller's log regardless of what the event loop tick did to it.Test
The regression test runs a 404 registry server in the child process so its
fetchhandler executes duringsleep_until's event-loop tick. The handlerrequire()s fresh modules, triggeringtranspile_source_code_inner's log swap/restore whilesleep_untilis pending. Without the fix,run_tasks' 404 errors land intranspiler.logand leak to stderr viaprocess_fetch_log; with the fix stderr stays clean.