Skip to content

hot: defer reload while a rejected module is unreported#29740

Merged
Jarred-Sumner merged 2 commits into
mainfrom
claude/happy-pasteur-a77f3a
Apr 26, 2026
Merged

hot: defer reload while a rejected module is unreported#29740
Jarred-Sumner merged 2 commits into
mainfrom
claude/happy-pasteur-a77f3a

Conversation

@dylan-conway

Copy link
Copy Markdown
Member

What

Runtime fix for the --hot sourcemap race that #29735 works around at the test level. Two changes:

  • VirtualMachine.reload() now also defers when pending_internal_promise is .rejected but its error hasn't been printed yet (pending_internal_promise_reported_at != hot_reload_counter), not just when .pending. The deferred reload runs from reportExceptionInHotReloadedModuleIfNeeded() after the error is remapped and printed against its own sourcemap.
  • SavedSourceMap.putMappings() keeps the existing table entry when the incoming InternalSourceMap has zero mappings. A 0-mapping map can never answer a lookup, so dropping it is never worse than installing it; this defends against any other path that re-transpiles a comment-only partial read.

Why

vm.source_mappings is a path-hash → blob table overwritten in place on every transpile. The event-loop tick drains microtasks between tasks, so a watcher event that arrives after a module's eval rejects but before reportExceptionInHotReloadedModuleIfNeeded() prints it can run another reload() — which re-reads the file (possibly mid-rewrite, since a 2MB writeFileSync is truncate + several write()s) and overwrites the table entry. The still-unreported error is then either remapped against the wrong map (transpiled coords leak through, e.g. :1:12 when the source is line 1003) or dropped entirely when the new pending_internal_promise replaces the old one.

This was flaking on aarch64 in should work with sourcemap generation (see #29735) and can also affect users whose editors save non-atomically.

Test

The new test in test/cli/hot/hot.test.ts makes the window deterministic: the hot file truncates itself to a comment-only stub immediately before throwing, guaranteeing a fresh watcher event lands between reject and report.

  • bun bd test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap" — 1 pass, 41 expect() calls (20 iterations)
  • USE_SYSTEM_BUN=1 bun test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap" — fails (the error is dropped and the test times out waiting for it)
  • bun bd test test/cli/hot/hot.test.ts -t sourcemap — all 4 sourcemap tests pass
  • bun run zig:check-all — clean

Note: the self-rewriting hot file in the new test occasionally exposes a separate, pre-existing edge case where an entry that rewrites itself mid-eval and then throws can lose the error (reproduces on released bun too). I observed this as a rare hang on debug builds when running under heavy parallel load; it did not reproduce in serial bun bd test runs. If it surfaces in CI it's worth a follow-up — the root cause is independent of this change.

`vm.source_mappings` is a path-hash → blob table overwritten in place on
every transpile. The hot-reload Task loop drains microtasks between tasks,
so a watcher event that arrives after a module's eval rejects but before
`reportExceptionInHotReloadedModuleIfNeeded()` prints it can run another
`reload()` — which re-reads the file (possibly mid-rewrite) and overwrites
the table entry. The still-unreported error is then either remapped against
the wrong map (transpiled coords leak through, e.g. `:1:12`) or dropped
entirely when the new pending promise replaces the old one.

Two changes:

- `VirtualMachine.reload()` now also defers when `pending_internal_promise`
  is `.rejected` and `pending_internal_promise_reported_at != hot_reload_counter`,
  not just when `.pending`. The deferred reload runs from
  `reportExceptionInHotReloadedModuleIfNeeded()` after the error is printed
  (and remapped against its own sourcemap).

- `SavedSourceMap.putMappings()` keeps the existing entry when the incoming
  blob has zero mappings. A truncate-then-write rewrite can be observed as
  a comment-only prefix that transpiles to nothing; a 0-mapping map can
  never answer a lookup, so dropping it is never worse than installing it.

The new test makes the window deterministic by having the hot file truncate
itself to a comment-only stub immediately before throwing, guaranteeing a
fresh watcher event lands between reject and report.
@robobun

robobun commented Apr 26, 2026

Copy link
Copy Markdown
Collaborator
Updated 1:58 AM PT - Apr 26th, 2026

@dylan-conway, your commit 52d8fbba8f7476ffad5e61dbeac24c96b6fea678 passed in Build #48062! 🎉


🧪   To try this PR locally:

bunx bun-pr 29740

That installs a local version of the PR into your bun-29740 executable, so you can run:

bun-29740 --bun

@coderabbitai

coderabbitai Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d277d8bd-4ee3-484a-9f3d-20d9ed73e3a7

📥 Commits

Reviewing files that changed from the base of the PR and between 30478d2 and fb2528f.

📒 Files selected for processing (3)
  • src/bun.js/SavedSourceMap.zig
  • src/bun.js/VirtualMachine.zig
  • test/cli/hot/hot.test.ts

Walkthrough

Improves hot reload stability by preventing source map overwrites during transient transpile states and deferring reloads when promise rejections haven't been reported yet, with a regression test validating the fix.

Changes

Cohort / File(s) Summary
Hot reload safeguards
src/bun.js/SavedSourceMap.zig, src/bun.js/VirtualMachine.zig
Adds guards against overwriting saved source maps with empty mappings during transient hot reload states. Expands deferral logic to prevent re-transpilation when promise rejections haven't been surfaced yet.
Regression test
test/cli/hot/hot.test.ts
New test deterministically validates sourcemap handling during hot reload cycles with truncated comment-only files, verifying stack trace locations remain accurate across 20 reload iterations.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: deferring reload when a rejected module hasn't been reported yet, which is the primary runtime fix in this PR.
Description check ✅ Passed The description fully covers both required template sections: 'What does this PR do?' is detailed with two specific changes and their rationale, and 'How did you verify your code works?' includes comprehensive test verification steps and results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. PANIC on hot-reload when modifying imported modules (release ,debug) and HUGE memory leak (debug). #8513 - PANIC on hot-reload when modifying imported modules: watcher fires while the editor is mid-rewrite (truncate+write), reading a partial/empty file — the exact race condition this PR defers
  2. bun test #21872 - Panic in sourcemap.zig:lessThan from SavedSourceMap.getWithContent during error remapping — a sourcemap corruption crash that the putMappings zero-mapping guard directly addresses

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #8513
Fixes #21872

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this is a race-condition fix touching the hot-reload state machine in VirtualMachine.zig and sourcemap storage locking, so it's worth a human look — particularly the interaction between the new .rejected defer path and reportExceptionInHotReloadedModuleIfNeeded, and the noted rare hang on debug builds.

Extended reasoning...

Overview

This PR fixes a --hot race where a watcher event landing between a module's eval rejection and its error report would re-transpile and overwrite source_mappings[path], causing the unreported error to remap against the wrong sourcemap (or be dropped). It touches src/bun.js/VirtualMachine.zig (extends the existing hot_reload_deferred mechanism to cover .rejected-but-unreported promises), src/bun.js/SavedSourceMap.zig (skips overwriting an existing entry with a 0-mapping InternalSourceMap), and adds a regression test in test/cli/hot/hot.test.ts.

Security risks

None identified. The changes are confined to dev-time hot-reload error reporting and sourcemap bookkeeping; no auth, crypto, network, or untrusted-input parsing paths are affected.

Level of scrutiny

Moderate-to-high. While the diff is small and follows established patterns (the .rejected branch mirrors the existing pending_internal_promise_reported_at != hot_reload_counter check in reportExceptionInHotReloadedModuleIfNeeded), this is concurrency-sensitive logic in the core runtime: event-loop microtask ordering, promise state transitions, and a mutex-guarded hashtable. The putMappings change introduces a new lock/unlock cycle whose defer scoping is correct but subtle (scoped to the inner if block, so it releases before falling through to putValue). Race-condition fixes in this area benefit from review by someone familiar with the hot-reload state machine.

Other factors

The author explicitly flags that the new self-rewriting test "occasionally exposes a separate, pre-existing edge case" causing rare hangs on debug builds under load — a reviewer should weigh whether that's acceptable to land as-is or needs a skip/follow-up first. The bug-hunting system found no issues, the PR has a targeted regression test that passes on the new build and fails on system bun, and zig:check-all is clean. No CODEOWNERS cover these files.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this touches core hot-reload event-loop ordering and sourcemap lifetime in VirtualMachine.zig, and the author flagged that the new self-rewriting test can occasionally hang under load — worth a human sanity check on the deferral invariants before merging.

Extended reasoning...

Overview

This PR modifies src/bun.js/VirtualMachine.zig (the reload() deferral logic), src/bun.js/SavedSourceMap.zig (putMappings() early-return when the incoming map has zero mappings), and adds a regression test in test/cli/hot/hot.test.ts. The runtime change extends an existing deferral mechanism so that a watcher-triggered reload waits not just for a .pending promise but also for a .rejected-but-not-yet-reported one, ensuring the rejection is remapped against its own sourcemap before reloadEntryPoint() overwrites source_mappings[path].

Security risks

None identified. The change is confined to the --hot dev-mode reload path and the in-process sourcemap table; there is no new I/O surface, no auth/crypto/permissions code, and no user-controlled input beyond what --hot already reads from disk.

Level of scrutiny

This deserves a human look. The logic itself is small and traces correctly against reportExceptionInHotReloadedModuleIfNeeded() (line 718): after the deferred reload is picked up, reported_at == counter so the second reload() call proceeds rather than re-deferring, and hot_reload_counter is incremented afterward — so no defer loop. The putMappings guard reads mappingCount() only after checking len >= header_size (32 bytes) and dupes nothing on early return, so no leak. But this is core runtime event-loop ordering with subtle invariants (generation counters vs. promise pointers, microtask draining between watcher tasks), and regressions here would manifest as dropped errors or stuck reloads in dev mode rather than test failures.

Other factors

  • The PR description explicitly notes the new test "occasionally exposes a separate, pre-existing edge case" that can hang on debug builds under parallel load. Even if independent of this fix, that's a CI-flakiness risk a maintainer should consciously accept.
  • The robobun CI comment shows bun-main-entry-point.test.ts failing across all platforms on the pre-merge commit; main has since landed a fix for that test (ad5d333) and this branch was merged with main, but it's worth confirming the latest build is green.
  • The putMappings zero-mapping guard is a defensive belt-and-suspenders change in addition to the primary deferral fix; the "never worse than installing it" reasoning is sound but it does change behavior for any caller that legitimately transpiles an empty file over a previously non-empty one (the old map lingers). That seems acceptable for the stated use case but is the kind of edge a maintainer should sign off on.

@Jarred-Sumner Jarred-Sumner merged commit d644b47 into main Apr 26, 2026
64 checks passed
@Jarred-Sumner Jarred-Sumner deleted the claude/happy-pasteur-a77f3a branch April 26, 2026 09:34
robobun added a commit that referenced this pull request May 4, 2026
My previous attempt guarded the defer on `hasAnyHandleWork()`, but that
was a no-op on Windows in the --hot path: once `loadEntryPoint` breaks
out, the outer main loop enters `tickPossiblyForever` which installs a
4-minute `forever_timer` that's a uv_timer_t held on the same loop —
which makes `isActive()` report alive forever. The reload's watcher
task therefore always observed a 'live' loop and kept deferring, and
the reload never fired on Windows (Linux was fine because the timing
worked out differently). CI caught this on all three Windows lanes.

Switch to the actual question the defer is asking: 'is the C++ loader
chain still in flight on the microtask queue?' Drain microtasks and
re-check status:

  - If the chain was mid-flight, the drain runs it to completion and
    the promise transitions to `.fulfilled` or `.rejected` — same
    outcome as the previous defer-then-report cycle would have
    produced, one tick earlier.
  - If it's still `.pending` after the drain, nothing else on the
    microtask queue can settle it. Whatever's keeping it pending is
    external (an unsettled TLA, a pending timer, etc.) and the
    deferral would be permanent because
    `reportExceptionInHotReloadedModuleIfNeeded` early-returns on
    `.pending` before consuming `hot_reload_deferred`. Proceed.

The `.rejected` arm (sourcemap race from #29740) is preserved. The
hot tests — including `should not remap against a stale sourcemap
after a partial-file reload` — still pass.
robobun added a commit that referenced this pull request May 5, 2026
The hot.test.ts 'should work with sourcemap generation' flake hit
ERROR bucket on build 51604 instead of WARNING bucket — but this is
a well-documented global flake, not PR-specific:

  - #28645 retriggered 3x ('seen 5+ times, unrelated to PR')
  - #29945, #30018, #30026 all retriggered for the same test
  - #29587 build 51335: 'windows-11-aarch64 shard 6/8: hot.test.ts
    — known hot-reload flake'
  - #29735 open: 'test(hot): use atomic rename for 2MB rewrites in
    sourcemap-generation test' — test-level workaround
  - #29740 merged: 'hot: defer reload while a rejected module is
    unreported' — prior runtime fix attempt

Local reproduction: ~5% flake rate at my branch tip (e86f538), at
5215594 (before drain-in-reload), and at 0ce2096 (before the
forever-timer carve-out). Rate is identical across all three, which
means the flake is inherent to the test (qemu-aarch64 agent
slowness hitting the 10s timeout before 50 hot-reload cycles
complete) and not caused by any change on this PR.

Also same plain-old test-http-should-emit-close flake on 2019 x64
baseline lanes (the one I've been retrying since build 51163).
robobun added a commit that referenced this pull request May 5, 2026
Hot reload 'should work with sourcemap generation' flakes at ~5% on
windows-11-aarch64 under qemu: the 10s per-test timeout fires before
50 hot-reload cycles complete, independent of this PR.

Cross-PR evidence from the last 24h:
  - #28645 retriggered 3 times for the same test
  - #29945, #30018, #30026 all retriggered
  - #29587 build 51335: documented as 'known hot-reload flake'
  - #29735 open: test-level workaround in progress
  - #29740 merged: runtime fix for a related --hot sourcemap race

Local repro: identical ~5% rate at my tip (e86f538), at 5215594
(before the stranded-defer fix), and at 0ce2096 (before the
forever-timer carve-out). Rate unchanged across the three, confirming
the flake is not in this PR's footprint.

Same persistent test-http-should-emit-close flake on 2019 x64-baseline
(seen in 51163, 51423, 51448, 51473, 51503, 51528, 51604).
robobun added a commit that referenced this pull request May 10, 2026
The 'should not remap against a stale sourcemap after a partial-file
reload' test exercises a deterministic reproduction of the race #29740
fixed, but the author of that PR explicitly noted the self-rewriting
fixture occasionally exposes a separate pre-existing race ('reproduces
on released bun too') that causes the test to time out on contended CI
runners. This flake was blocking gate checks on PRs that otherwise
didn't touch the sourcemap path (~20% flake rate on release builds
under load).

Add retry: 3 until the underlying race in #29740 is addressed. The
retried test still asserts all 20 error-reload cycles complete.
robobun added a commit that referenced this pull request May 13, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
## What

Runtime fix for the `--hot` sourcemap race that oven-sh#29735 works around at
the test level. Two changes:

- **`VirtualMachine.reload()`** now also defers when
`pending_internal_promise` is `.rejected` but its error hasn't been
printed yet (`pending_internal_promise_reported_at !=
hot_reload_counter`), not just when `.pending`. The deferred reload runs
from `reportExceptionInHotReloadedModuleIfNeeded()` *after* the error is
remapped and printed against its own sourcemap.
- **`SavedSourceMap.putMappings()`** keeps the existing table entry when
the incoming `InternalSourceMap` has zero mappings. A 0-mapping map can
never answer a lookup, so dropping it is never worse than installing it;
this defends against any other path that re-transpiles a comment-only
partial read.

## Why

`vm.source_mappings` is a path-hash → blob table overwritten in place on
every transpile. The event-loop tick drains microtasks between tasks, so
a watcher event that arrives after a module's eval rejects but before
`reportExceptionInHotReloadedModuleIfNeeded()` prints it can run another
`reload()` — which re-reads the file (possibly mid-rewrite, since a 2MB
`writeFileSync` is truncate + several `write()`s) and overwrites the
table entry. The still-unreported error is then either remapped against
the wrong map (transpiled coords leak through, e.g. `:1:12` when the
source is line 1003) or dropped entirely when the new
`pending_internal_promise` replaces the old one.

This was flaking on aarch64 in `should work with sourcemap generation`
(see oven-sh#29735) and can also affect users whose editors save
non-atomically.

## Test

The new test in `test/cli/hot/hot.test.ts` makes the window
deterministic: the hot file truncates itself to a comment-only stub
immediately before throwing, guaranteeing a fresh watcher event lands
between reject and report.

- [x] `bun bd test test/cli/hot/hot.test.ts -t "should not remap against
a stale sourcemap"` — 1 pass, 41 expect() calls (20 iterations)
- [x] `USE_SYSTEM_BUN=1 bun test test/cli/hot/hot.test.ts -t "should not
remap against a stale sourcemap"` — fails (the error is dropped and the
test times out waiting for it)
- [x] `bun bd test test/cli/hot/hot.test.ts -t sourcemap` — all 4
sourcemap tests pass
- [x] `bun run zig:check-all` — clean

**Note:** the self-rewriting hot file in the new test occasionally
exposes a separate, pre-existing edge case where an entry that rewrites
itself mid-eval and then throws can lose the error (reproduces on
released bun too). I observed this as a rare hang on debug builds when
running under heavy parallel load; it did not reproduce in serial `bun
bd test` runs. If it surfaces in CI it's worth a follow-up — the root
cause is independent of this change.
robobun added a commit that referenced this pull request May 14, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
robobun added a commit that referenced this pull request May 21, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
robobun added a commit that referenced this pull request May 24, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
robobun added a commit that referenced this pull request Jun 3, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
robobun added a commit that referenced this pull request Jun 5, 2026
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants