Skip to content

Fix dangling PackageManager.log when JS runs during auto-install sleep_until#31120

Open
robobun wants to merge 7 commits into
mainfrom
farm/78c7a039/fix-pm-log-dangling-sleep-until
Open

Fix dangling PackageManager.log when JS runs during auto-install sleep_until#31120
robobun wants to merge 7 commits into
mainfrom
farm/78c7a039/fix-pm-log-dangling-sleep-until

Conversation

@robobun

@robobun robobun commented May 20, 2026

Copy link
Copy Markdown
Collaborator

Fuzzilli found an ASAN stack-buffer-overflow in run_tasksLog::add_msgVec::push during runtime auto-install.

What happened

enqueue_dependency_to_root blocks in sleep_until, which ticks the JS event loop between is_done polls. Each poll calls run_tasks, which reads manager.log to log manifest 4xx errors.

The event loop tick can run module transpilation, nested resolves, or AsyncModule::resume_loading_module, all of which swap the VM's log pointers (jsc_vm.log, transpiler.log, resolver.log, linker.log, pm.log) and restore them on exit. The problem is they don't all save from the same source:

  • resolve_maybe_needs_trailing_slash / resolve_hook save jsc_vm.log and swap everything except transpiler.log.
  • transpile_source_code_inner saves transpiler.log and restores pm.log to it.
  • AsyncModule::resume_loading_module saves jsc_vm.log but restores transpiler.log to it.

So when a module transpile runs during a resolve's sleep_until tick, its restore sets pm.log (and resolver.log/linker.log) to the old transpiler.log — not the resolve's scoped log. At minimum the 404 diagnostics from run_tasks leak into the wrong log; with further interleaving across calls pm.log ends up pointing at a dead stack Log, and the next run_tasks poll dereferences it.

READ of size 8 at 0x7ffd98a149c0 thread T0
    #0  RawVecInner::capacity
    #1  Log::add_msg (lib.rs:2369)
    #3  Log::add_error_fmt (lib.rs:2032)
    #4  run_tasks (runTasks.rs:494)
    #5  Closure::is_done (PackageManagerEnqueue.rs:534)
    #7  AnyEventLoop::tick_raw (AnyEventLoop.rs:148)
    #8  PackageManager::sleep_until (PackageManager.rs:1063)
    #9  enqueue_dependency_to_root (PackageManagerEnqueue.rs:571)
    ...
    #18 resolve_maybe_needs_trailing_slash (VirtualMachine.rs:4259)

Address is located in stack of thread T0 in frame
    #0  to_js_host_call (host_fn.rs:677)
  [144, 152) 'scope' <== Memory access at offset 160 overflows this variable
  [176, 240) 'scope_storage'

Fix

  • Swap and restore transpiler.log alongside the other log pointers in both resolve entry points so it can't be left desynced from jsc_vm.log across a resolve.
  • Snapshot pm.log when entering enqueue_dependency_to_root's sleep_until and re-assert it before each run_tasks poll. This guarantees run_tasks always sees the caller's log regardless of what the event loop tick did to it.

Test

The regression test runs a 404 registry server in the child process so its fetch handler executes during sleep_until's event-loop tick. The handler require()s fresh modules, triggering transpile_source_code_inner's log swap/restore while sleep_until is pending. Without the fix, run_tasks' 404 errors land in transpiler.log and leak to stderr via process_fetch_log; with the fix stderr stays clean.

…p_until

When runtime auto-install waits for a manifest (enqueue_dependency_to_root
-> sleep_until), it ticks the JS event loop. Code run there (module
transpile, nested resolve, AsyncModule resume) swaps and restores the
shared log pointers, but the resolve path saves jsc_vm.log while
transpile_source_code saves transpiler.log. Because
resolve_maybe_needs_trailing_slash did not swap transpiler.log, a nested
swap/restore could leave PackageManager.log pointing at a dead stack Log.
The next run_tasks poll then dereferenced it in add_error_fmt, tripping an
ASAN stack-buffer-overflow.

- Swap/restore transpiler.log alongside the other log pointers in both
  resolve entry points so the swap sites stay consistent.
- Re-assert the caller's log on PackageManager before each run_tasks poll
  so a stale restore during the tick cannot reach log_mut().
@robobun

robobun commented May 20, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 2:37 AM PT - May 20th, 2026

@robobun, your commit 13fbeba513b1e6ceadcbecd4ccee48c0ea01105f passed in Build #56382! 🎉


🧪   To try this PR locally:

bunx bun-pr 31120

That installs a local version of the PR into your bun-31120 executable, so you can run:

bun-31120 --bun

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Three code paths now snapshot and restore temporary stack-log swaps to avoid dangling log pointers: PackageManager polling callback, VirtualMachine resolver (including transpiler), and the JSC resolve hook. An integration test exercises the auto-install event-loop interleaving.

Changes

Log Pointer Dangling Fix

Layer / File(s) Summary
PackageManager polling callback log restoration
src/install/PackageManager/PackageManagerEnqueue.rs
Closure captures manager.log at construction and restores it in is_done before task execution, preventing polling callbacks from leaving manager.log pointing at a dropped stack Log.
Resolver temporary log guards
src/jsc/VirtualMachine.rs, src/runtime/jsc_hooks.rs
RestoreLog and resolve_hook now snapshot and restore transpiler.log in addition to the main log when temporarily switching to a stack bun_ast::Log for resolver/transpile/error formatting.
Event-loop safety test
test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts
New integration test spawns Bun against a local 404 HTTP server, drives auto-install through nested module resolution, and checks stdout/stderr and exit code to ensure no dangling log-pointer crashes occur.

Possibly related PRs

  • oven-sh/bun#31020: Both PRs address dangling stack log-pointer issues in resolver paths by extending VM/JSC hook log-swap guards to correctly snapshot and restore transpiler and related log pointers.

Suggested reviewers

  • RiskyMH
  • dylan-conway
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title precisely describes the main change: fixing a dangling PackageManager.log pointer during auto-install's sleep_until event loop.
Description check ✅ Passed The description comprehensively covers both required template sections: what the PR does (fixes dangling log pointer from ASAN overflow) and how it was verified (regression test with 404 registry server).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts`:
- Around line 119-122: The subprocess test in the resolve autoinstall log
dangling case is using loose negative substring checks on stderr, which does not
match the `bunEnv` convention. Update the assertions in the test body to use a
strict empty-stderr check (`expect(stderr).toBe("")`) before the existing stdout
and exit-code assertions, and remove the `panic`/`AddressSanitizer` containment
checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8fae6c57-e088-4728-833f-77160abe7afd

📥 Commits

Reviewing files that changed from the base of the PR and between a55a62d and 17c87d9.

📒 Files selected for processing (4)
  • src/install/PackageManager/PackageManagerEnqueue.rs
  • src/jsc/VirtualMachine.rs
  • src/runtime/jsc_hooks.rs
  • test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts

Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Assertion failure when auto-installing packages #7222 - Assertion failure (heap corruption) during auto-install triggered by event loop tick — consistent with the dangling PackageManager.log pointer this PR fixes

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #7222

🤖 Generated with Claude Code

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts`:
- Around line 117-122: The test currently asserts exitCode too early; move the
exitCode assertion after the hits assertion so more specific diagnostics surface
first: keep the existing checks for stderr and stdout, then assert hits with
expect(hits).toBeGreaterThan(0), and only after that assert
expect(exitCode).toBe(0); update the assertions referencing stdout, stderr,
hits, and exitCode accordingly in resolve-autoinstall-log-dangling.test.ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52c5ca04-9758-4106-8b37-a9b39700d219

📥 Commits

Reviewing files that changed from the base of the PR and between 17c87d9 and 324a7b9.

📒 Files selected for processing (1)
  • test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts

Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts
Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated
Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated
@robobun robobun force-pushed the farm/78c7a039/fix-pm-log-dangling-sleep-until branch from 6e69267 to ce46fda Compare May 20, 2026 06:38
Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the force-push — ce46fda restores the actual fix, so my earlier 🔴 about the botched 6e69267 no longer applies. The change now matches the description, but since it's adjusting raw *mut Log lifetime management across reentrant event-loop ticks I'd like a human to confirm the swap/restore reasoning before merge.

Extended reasoning...

Overview

The PR now contains exactly what the description claims: (1) transpiler.log is added to the existing log-pointer swap/restore sets in VirtualMachine::resolve_maybe_needs_trailing_slash's RestoreLog guard and in jsc_hooks::resolve_hook's defer block, and (2) enqueue_dependency_to_root's Closure snapshots the caller's manager.log and re-asserts it before each run_tasks poll inside sleep_until. A new regression test drives nested resolves and module loads via setImmediate during auto-install's event-loop tick against a 404 registry. Net diff is 4 files / +81 lines.

Since my last review, the author force-pushed to replace 6e69267 (which had reverted the fix and pulled in an unrelated ~2500-line refactor) with ce46fda, which is the intended 2-line test reorder. The branch is now clean.

Security risks

None in the conventional sense — no auth, crypto, input parsing, or network surface changes. However this is a memory-safety fix: the bug is a fuzzer-found ASAN stack-buffer-overflow caused by a dangling *mut Log pointer, and the fix manipulates raw-pointer lifetimes in unsafe Rust. An incorrect fix here could mask the original UAF or introduce a new one.

Level of scrutiny

High. The mechanical changes are small and follow the existing pattern (one more pointer added to an existing swap/restore set), but correctness depends on the cross-cutting reasoning in the PR description about how resolve_*, transpile_source_code_inner, and AsyncModule::resume_loading_module each save/restore from different sources during reentrant event-loop ticks. Verifying that old_transpiler_log is the right value to restore (vs. old_log), and that re-asserting manager.log = self.log on every poll can't clobber a legitimately-swapped log from a nested context, requires someone familiar with this subsystem.

Other factors

  • The regression test is well-constructed and the CodeRabbit nits (strict stderr assertion, exit-code-last) have been addressed.
  • My remaining test.concurrent nit is non-blocking.
  • CI on the previous push showed only infra failures (agent creation / macOS build-cpp), not test failures; the new push hasn't reported yet.
  • Related prior art: PR #31020 touched the same log-swap guards, suggesting this area has needed iterative fixes — another reason for human sign-off.

robobun added 2 commits May 20, 2026 07:20
The registry server runs in the child process so its fetch handler
executes during auto-install's sleep_until tick. The handler requires
fresh modules, which triggers transpile_source_code_inner's log
swap/restore. Without the fix, that restore redirects pm.log away from
the resolve's scoped log to transpiler.log, and run_tasks' 404 error
messages leak into it and get printed to stderr by process_fetch_log.
With the fix, pm.log is re-asserted to the resolve's log before each
run_tasks poll, so stderr stays clean.
Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated
Comment thread test/js/bun/resolve/resolve-autoinstall-log-dangling.test.ts Outdated
Use the parent-process server on port 0 and queue module loads via
setImmediate in the child so they run during sleep_until's event-loop
tick. The require() triggers transpile_source_code_inner which restores
pm.log from transpiler.log; without the fix the 404 diagnostic then
leaks into vm.log and is dumped to stderr at process exit.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — 776d442 resolves the port-probe race and the node:net dynamic-import nit, and the diff is back to the intended 4 files. No further issues from me; deferring to a human for the unsafe log-pointer lifetime changes across the re-entrant sleep_until path.

Extended reasoning...

Overview

This PR fixes an ASAN stack-buffer-overflow where PackageManager.log can be left pointing at dead stack memory when the JS event loop runs (and transpiles modules) during auto-install's sleep_until polling. It touches three Rust files — PackageManagerEnqueue.rs (snapshot/re-assert manager.log before each run_tasks poll), VirtualMachine.rs and jsc_hooks.rs (swap/restore transpiler.log alongside the other log pointers in the resolve entry points) — plus a new regression test. Net diff is +82 across 4 files.

Prior feedback now resolved

All of my earlier inline comments have been addressed:

  • The botched commit that reverted the fix and pulled in an unrelated ~2500-line refactor was redone; git diff 46b24045..776d442b --stat now shows only the 4 intended files.
  • 776d442 moved the 404 registry server into the parent test process with Bun.serve({ port: 0 }) held for the test's lifetime, eliminating the probe-then-close-then-rebind race. The child now triggers the transpile-during-tick via setImmediate(load) callbacks rather than the server's fetch handler, which still exercises the same code path.
  • The await import("node:net") is gone entirely.
  • The panic/ASAN substring assertions and exit-code ordering were fixed in earlier commits.
  • The test.concurrent suggestion remains unapplied, but that was an explicit non-blocking nit.

Security risks

None in the conventional sense (no auth/input-handling changes). The change is memory-safety-relevant — it fixes a use-after-scope on a stack Log — but the fix only adds save/restore symmetry and a defensive re-assertion; it doesn't introduce new unsafe dereferences.

Level of scrutiny

This warrants human review. The fix is small and targeted, but it lives entirely in unsafe Rust manipulating raw *mut bun_ast::Log pointers whose validity depends on a subtle invariant across re-entrant event-loop callbacks (sleep_until → tick → transpile/resolve → swap/restore). In particular, the manager.log = self.log re-assertion in Closure::is_done is a belt-and-suspenders defense that compensates for desync elsewhere; a reviewer familiar with the log-swap sites (e.g. transpile_source_code_inner, AsyncModule::resume_loading_module) should confirm this is the right layer to enforce the invariant and that no other run_tasks callers need the same treatment.

Other factors

The regression test now follows the suite conventions (strict stderr, exit code last, port: 0 held by the test). CodeRabbit suggested dylan-conway / RiskyMH as reviewers, which aligns with the install/resolver ownership.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the earlier issues are resolved (the fix is back in the diff after the botched amend, and the test now binds port: 0 directly with no probe-then-rebind race). Given this is raw-pointer lifetime juggling across reentrant event-loop ticks, I'd like a maintainer to sign off on the pm.log snapshot/re-assert approach.

Extended reasoning...

Overview

This PR fixes an ASAN stack-buffer-overflow where PackageManager.log can be left pointing at dead stack memory when JS runs during auto-install's sleep_until event-loop tick. It touches three Rust files in the runtime/install hot path:

  • src/install/PackageManager/PackageManagerEnqueue.rs — adds a log: *mut bun_ast::Log snapshot to the sleep_until closure and re-asserts manager.log = self.log before every run_tasks poll.
  • src/jsc/VirtualMachine.rsRestoreLog guard now also snapshots/restores transpiler.log alongside the existing four log pointers.
  • src/runtime/jsc_hooks.rsresolve_hook does the same.
  • A new regression test in test/js/bun/resolve/ that uses setImmediate-scheduled require() calls to trigger module transpilation during sleep_until, with a parent-hosted 404 registry on port: 0.

Prior review history

I've reviewed this PR three times already. The author has addressed every concern:

  • The botched commit (6e69267) that reverted the fix and pulled in a ~2500-line unrelated refactor was redone (now ce46fda); the fix is present in the current diff and the changed-files list is back to 4.
  • The net.createServer() probe-then-close-then-rebind port race was removed (776d442); the registry server now runs in the parent test process with Bun.serve({ port: 0 }) and stays bound throughout.
  • The panic/AddressSanitizer substring checks were replaced with expect(stderr).toBe("").
  • The dynamic await import("node:net") is gone (no longer needed).
  • The test.concurrent nit was not applied, but it's a soft preference with no correctness impact.

Security risks

None introduced. This is a memory-safety fix — the change closes a use-after-scope on a stack Log, not opens one. No new attack surface, no input parsing, no auth/crypto.

Level of scrutiny

High. The change is small (~20 lines of Rust) but it's raw *mut pointer-lifetime reasoning across at least five log-swap sites (resolve_maybe_needs_trailing_slash, resolve_hook, transpile_source_code_inner, AsyncModule::resume_loading_module, and now the is_done closure) that interact through reentrant event-loop ticks. The original bug existed precisely because these sites disagreed on which pointer to save/restore from; adding a sixth snapshot point is the right defensive move but a maintainer familiar with this subsystem should confirm there's no remaining path where this.log itself is stale at closure-construction time, and that unconditionally overwriting manager.log on every poll can't clobber a legitimate inner swap.

Other factors

  • CI build #56379 shows failures, but they're HiveArrayFallback deprecation warnings and musl build issues in unrelated files — none originate from this PR's changed lines and appear to come from the merged main.
  • CodeRabbit suggested dylan-conway / RiskyMH as reviewers, which seems right for install/resolver code.
  • The bug-hunting system found nothing this round.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant