hot: defer reload while a rejected module is unreported by dylan-conway · Pull Request #29740 · oven-sh/bun

dylan-conway · 2026-04-26T04:47:53Z

What

Runtime fix for the --hot sourcemap race that #29735 works around at the test level. Two changes:

VirtualMachine.reload() now also defers when pending_internal_promise is .rejected but its error hasn't been printed yet (pending_internal_promise_reported_at != hot_reload_counter), not just when .pending. The deferred reload runs from reportExceptionInHotReloadedModuleIfNeeded() after the error is remapped and printed against its own sourcemap.
SavedSourceMap.putMappings() keeps the existing table entry when the incoming InternalSourceMap has zero mappings. A 0-mapping map can never answer a lookup, so dropping it is never worse than installing it; this defends against any other path that re-transpiles a comment-only partial read.

Why

vm.source_mappings is a path-hash → blob table overwritten in place on every transpile. The event-loop tick drains microtasks between tasks, so a watcher event that arrives after a module's eval rejects but before reportExceptionInHotReloadedModuleIfNeeded() prints it can run another reload() — which re-reads the file (possibly mid-rewrite, since a 2MB writeFileSync is truncate + several write()s) and overwrites the table entry. The still-unreported error is then either remapped against the wrong map (transpiled coords leak through, e.g. :1:12 when the source is line 1003) or dropped entirely when the new pending_internal_promise replaces the old one.

This was flaking on aarch64 in should work with sourcemap generation (see #29735) and can also affect users whose editors save non-atomically.

Test

The new test in test/cli/hot/hot.test.ts makes the window deterministic: the hot file truncates itself to a comment-only stub immediately before throwing, guaranteeing a fresh watcher event lands between reject and report.

bun bd test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap" — 1 pass, 41 expect() calls (20 iterations)
USE_SYSTEM_BUN=1 bun test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap" — fails (the error is dropped and the test times out waiting for it)
bun bd test test/cli/hot/hot.test.ts -t sourcemap — all 4 sourcemap tests pass
bun run zig:check-all — clean

Note: the self-rewriting hot file in the new test occasionally exposes a separate, pre-existing edge case where an entry that rewrites itself mid-eval and then throws can lose the error (reproduces on released bun too). I observed this as a rare hang on debug builds when running under heavy parallel load; it did not reproduce in serial bun bd test runs. If it surfaces in CI it's worth a follow-up — the root cause is independent of this change.

`vm.source_mappings` is a path-hash → blob table overwritten in place on every transpile. The hot-reload Task loop drains microtasks between tasks, so a watcher event that arrives after a module's eval rejects but before `reportExceptionInHotReloadedModuleIfNeeded()` prints it can run another `reload()` — which re-reads the file (possibly mid-rewrite) and overwrites the table entry. The still-unreported error is then either remapped against the wrong map (transpiled coords leak through, e.g. `:1:12`) or dropped entirely when the new pending promise replaces the old one. Two changes: - `VirtualMachine.reload()` now also defers when `pending_internal_promise` is `.rejected` and `pending_internal_promise_reported_at != hot_reload_counter`, not just when `.pending`. The deferred reload runs from `reportExceptionInHotReloadedModuleIfNeeded()` after the error is printed (and remapped against its own sourcemap). - `SavedSourceMap.putMappings()` keeps the existing entry when the incoming blob has zero mappings. A truncate-then-write rewrite can be observed as a comment-only prefix that transpiles to nothing; a 0-mapping map can never answer a lookup, so dropping it is never worse than installing it. The new test makes the window deterministic by having the hot file truncate itself to a comment-only stub immediately before throwing, guaranteeing a fresh watcher event lands between reject and report.

robobun · 2026-04-26T04:48:03Z

^{Updated 1:58 AM PT - Apr 26th, 2026}

✅ @dylan-conway, your commit 52d8fbba8f7476ffad5e61dbeac24c96b6fea678 passed in Build #48062! 🎉

🧪 To try this PR locally:

bunx bun-pr 29740

That installs a local version of the PR into your bun-29740 executable, so you can run:

bun-29740 --bun

coderabbitai · 2026-04-26T04:51:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d277d8bd-4ee3-484a-9f3d-20d9ed73e3a7

📥 Commits

Reviewing files that changed from the base of the PR and between 30478d2 and fb2528f.

📒 Files selected for processing (3)

src/bun.js/SavedSourceMap.zig
src/bun.js/VirtualMachine.zig
test/cli/hot/hot.test.ts

Walkthrough

Improves hot reload stability by preventing source map overwrites during transient transpile states and deferring reloads when promise rejections haven't been reported yet, with a regression test validating the fix.

Changes

Cohort / File(s)	Summary
Hot reload safeguards `src/bun.js/SavedSourceMap.zig`, `src/bun.js/VirtualMachine.zig`	Adds guards against overwriting saved source maps with empty mappings during transient hot reload states. Expands deferral logic to prevent re-transpilation when promise rejections haven't been surfaced yet.
Regression test `test/cli/hot/hot.test.ts`	New test deterministically validates sourcemap handling during hot reload cycles with truncated comment-only files, verifying stack trace locations remain accurate across 20 reload iterations.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: deferring reload when a rejected module hasn't been reported yet, which is the primary runtime fix in this PR.
Description check	✅ Passed	The description fully covers both required template sections: 'What does this PR do?' is detailed with two specific changes and their rationale, and 'How did you verify your code works?' includes comprehensive test verification steps and results.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-26T04:52:26Z

Found 2 issues this PR may fix:

PANIC on hot-reload when modifying imported modules (release ,debug) and HUGE memory leak (debug). #8513 - PANIC on hot-reload when modifying imported modules: watcher fires while the editor is mid-rewrite (truncate+write), reading a partial/empty file — the exact race condition this PR defers
bun test #21872 - Panic in sourcemap.zig:lessThan from SavedSourceMap.getWithContent during error remapping — a sourcemap corruption crash that the putMappings zero-mapping guard directly addresses

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #8513
Fixes #21872

🤖 Generated with Claude Code

claude

I didn't find any bugs, but this is a race-condition fix touching the hot-reload state machine in VirtualMachine.zig and sourcemap storage locking, so it's worth a human look — particularly the interaction between the new .rejected defer path and reportExceptionInHotReloadedModuleIfNeeded, and the noted rare hang on debug builds.

Extended reasoning...

Overview

This PR fixes a --hot race where a watcher event landing between a module's eval rejection and its error report would re-transpile and overwrite source_mappings[path], causing the unreported error to remap against the wrong sourcemap (or be dropped). It touches src/bun.js/VirtualMachine.zig (extends the existing hot_reload_deferred mechanism to cover .rejected-but-unreported promises), src/bun.js/SavedSourceMap.zig (skips overwriting an existing entry with a 0-mapping InternalSourceMap), and adds a regression test in test/cli/hot/hot.test.ts.

Security risks

None identified. The changes are confined to dev-time hot-reload error reporting and sourcemap bookkeeping; no auth, crypto, network, or untrusted-input parsing paths are affected.

Level of scrutiny

Moderate-to-high. While the diff is small and follows established patterns (the .rejected branch mirrors the existing pending_internal_promise_reported_at != hot_reload_counter check in reportExceptionInHotReloadedModuleIfNeeded), this is concurrency-sensitive logic in the core runtime: event-loop microtask ordering, promise state transitions, and a mutex-guarded hashtable. The putMappings change introduces a new lock/unlock cycle whose defer scoping is correct but subtle (scoped to the inner if block, so it releases before falling through to putValue). Race-condition fixes in this area benefit from review by someone familiar with the hot-reload state machine.

Other factors

The author explicitly flags that the new self-rewriting test "occasionally exposes a separate, pre-existing edge case" causing rare hangs on debug builds under load — a reviewer should weigh whether that's acceptable to land as-is or needs a skip/follow-up first. The bug-hunting system found no issues, the PR has a targeted regression test that passes on the new build and fails on system bun, and zig:check-all is clean. No CODEOWNERS cover these files.

claude

I didn't find any bugs, but this touches core hot-reload event-loop ordering and sourcemap lifetime in VirtualMachine.zig, and the author flagged that the new self-rewriting test can occasionally hang under load — worth a human sanity check on the deferral invariants before merging.

Extended reasoning...

Overview

This PR modifies src/bun.js/VirtualMachine.zig (the reload() deferral logic), src/bun.js/SavedSourceMap.zig (putMappings() early-return when the incoming map has zero mappings), and adds a regression test in test/cli/hot/hot.test.ts. The runtime change extends an existing deferral mechanism so that a watcher-triggered reload waits not just for a .pending promise but also for a .rejected-but-not-yet-reported one, ensuring the rejection is remapped against its own sourcemap before reloadEntryPoint() overwrites source_mappings[path].

Security risks

None identified. The change is confined to the --hot dev-mode reload path and the in-process sourcemap table; there is no new I/O surface, no auth/crypto/permissions code, and no user-controlled input beyond what --hot already reads from disk.

Level of scrutiny

This deserves a human look. The logic itself is small and traces correctly against reportExceptionInHotReloadedModuleIfNeeded() (line 718): after the deferred reload is picked up, reported_at == counter so the second reload() call proceeds rather than re-deferring, and hot_reload_counter is incremented afterward — so no defer loop. The putMappings guard reads mappingCount() only after checking len >= header_size (32 bytes) and dupes nothing on early return, so no leak. But this is core runtime event-loop ordering with subtle invariants (generation counters vs. promise pointers, microtask draining between watcher tasks), and regressions here would manifest as dropped errors or stuck reloads in dev mode rather than test failures.

Other factors

The PR description explicitly notes the new test "occasionally exposes a separate, pre-existing edge case" that can hang on debug builds under parallel load. Even if independent of this fix, that's a CI-flakiness risk a maintainer should consciously accept.
The robobun CI comment shows bun-main-entry-point.test.ts failing across all platforms on the pre-merge commit; main has since landed a fix for that test (ad5d333) and this branch was merged with main, but it's worth confirming the latest build is green.
The putMappings zero-mapping guard is a defensive belt-and-suspenders change in addition to the primary deferral fix; the "never worse than installing it" reasoning is sound but it does change behavior for any caller that legitimately transpiles an empty file over a previously non-empty one (the old map lingers). That seems acceptable for the stated use case but is the kind of edge a maintainer should sign off on.

My previous attempt guarded the defer on `hasAnyHandleWork()`, but that was a no-op on Windows in the --hot path: once `loadEntryPoint` breaks out, the outer main loop enters `tickPossiblyForever` which installs a 4-minute `forever_timer` that's a uv_timer_t held on the same loop — which makes `isActive()` report alive forever. The reload's watcher task therefore always observed a 'live' loop and kept deferring, and the reload never fired on Windows (Linux was fine because the timing worked out differently). CI caught this on all three Windows lanes. Switch to the actual question the defer is asking: 'is the C++ loader chain still in flight on the microtask queue?' Drain microtasks and re-check status: - If the chain was mid-flight, the drain runs it to completion and the promise transitions to `.fulfilled` or `.rejected` — same outcome as the previous defer-then-report cycle would have produced, one tick earlier. - If it's still `.pending` after the drain, nothing else on the microtask queue can settle it. Whatever's keeping it pending is external (an unsettled TLA, a pending timer, etc.) and the deferral would be permanent because `reportExceptionInHotReloadedModuleIfNeeded` early-returns on `.pending` before consuming `hot_reload_deferred`. Proceed. The `.rejected` arm (sourcemap race from #29740) is preserved. The hot tests — including `should not remap against a stale sourcemap after a partial-file reload` — still pass.

The hot.test.ts 'should work with sourcemap generation' flake hit ERROR bucket on build 51604 instead of WARNING bucket — but this is a well-documented global flake, not PR-specific: - #28645 retriggered 3x ('seen 5+ times, unrelated to PR') - #29945, #30018, #30026 all retriggered for the same test - #29587 build 51335: 'windows-11-aarch64 shard 6/8: hot.test.ts — known hot-reload flake' - #29735 open: 'test(hot): use atomic rename for 2MB rewrites in sourcemap-generation test' — test-level workaround - #29740 merged: 'hot: defer reload while a rejected module is unreported' — prior runtime fix attempt Local reproduction: ~5% flake rate at my branch tip (e86f538), at 5215594 (before drain-in-reload), and at 0ce2096 (before the forever-timer carve-out). Rate is identical across all three, which means the flake is inherent to the test (qemu-aarch64 agent slowness hitting the 10s timeout before 50 hot-reload cycles complete) and not caused by any change on this PR. Also same plain-old test-http-should-emit-close flake on 2019 x64 baseline lanes (the one I've been retrying since build 51163).

Hot reload 'should work with sourcemap generation' flakes at ~5% on windows-11-aarch64 under qemu: the 10s per-test timeout fires before 50 hot-reload cycles complete, independent of this PR. Cross-PR evidence from the last 24h: - #28645 retriggered 3 times for the same test - #29945, #30018, #30026 all retriggered - #29587 build 51335: documented as 'known hot-reload flake' - #29735 open: test-level workaround in progress - #29740 merged: runtime fix for a related --hot sourcemap race Local repro: identical ~5% rate at my tip (e86f538), at 5215594 (before the stranded-defer fix), and at 0ce2096 (before the forever-timer carve-out). Rate unchanged across the three, confirming the flake is not in this PR's footprint. Same persistent test-http-should-emit-close flake on 2019 x64-baseline (seen in 51163, 51423, 51448, 51473, 51503, 51528, 51604).

The 'should not remap against a stale sourcemap after a partial-file reload' test exercises a deterministic reproduction of the race #29740 fixed, but the author of that PR explicitly noted the self-rewriting fixture occasionally exposes a separate pre-existing race ('reproduces on released bun too') that causes the test to time out on contended CI runners. This flake was blocking gate checks on PRs that otherwise didn't touch the sourcemap path (~20% flake rate on release builds under load). Add retry: 3 until the underlying race in #29740 is addressed. The retried test still asserts all 20 error-reload cycles complete.

With a non-zero coalesce interval the drain loop can race the self-rewriting hot file in a way that loses the pending rejection (the pre-existing edge case noted in #29740), which on release builds manifests as a 30 s timeout. A zero interval makes the drain loop non-blocking — it polls once and processes whatever is already buffered — which is the closest analogue to the pre-loop behaviour this test was tuned against, and keeps the self-write event landing in the reject→report window on all build profiles.

## What Runtime fix for the `--hot` sourcemap race that oven-sh#29735 works around at the test level. Two changes: - **`VirtualMachine.reload()`** now also defers when `pending_internal_promise` is `.rejected` but its error hasn't been printed yet (`pending_internal_promise_reported_at != hot_reload_counter`), not just when `.pending`. The deferred reload runs from `reportExceptionInHotReloadedModuleIfNeeded()` *after* the error is remapped and printed against its own sourcemap. - **`SavedSourceMap.putMappings()`** keeps the existing table entry when the incoming `InternalSourceMap` has zero mappings. A 0-mapping map can never answer a lookup, so dropping it is never worse than installing it; this defends against any other path that re-transpiles a comment-only partial read. ## Why `vm.source_mappings` is a path-hash → blob table overwritten in place on every transpile. The event-loop tick drains microtasks between tasks, so a watcher event that arrives after a module's eval rejects but before `reportExceptionInHotReloadedModuleIfNeeded()` prints it can run another `reload()` — which re-reads the file (possibly mid-rewrite, since a 2MB `writeFileSync` is truncate + several `write()`s) and overwrites the table entry. The still-unreported error is then either remapped against the wrong map (transpiled coords leak through, e.g. `:1:12` when the source is line 1003) or dropped entirely when the new `pending_internal_promise` replaces the old one. This was flaking on aarch64 in `should work with sourcemap generation` (see oven-sh#29735) and can also affect users whose editors save non-atomically. ## Test The new test in `test/cli/hot/hot.test.ts` makes the window deterministic: the hot file truncates itself to a comment-only stub immediately before throwing, guaranteeing a fresh watcher event lands between reject and report. - [x] `bun bd test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap"` — 1 pass, 41 expect() calls (20 iterations) - [x] `USE_SYSTEM_BUN=1 bun test test/cli/hot/hot.test.ts -t "should not remap against a stale sourcemap"` — fails (the error is dropped and the test times out waiting for it) - [x] `bun bd test test/cli/hot/hot.test.ts -t sourcemap` — all 4 sourcemap tests pass - [x] `bun run zig:check-all` — clean **Note:** the self-rewriting hot file in the new test occasionally exposes a separate, pre-existing edge case where an entry that rewrites itself mid-eval and then throws can lose the error (reproduces on released bun too). I observed this as a rare hang on debug builds when running under heavy parallel load; it did not reproduce in serial `bun bd test` runs. If it surfaces in CI it's worth a follow-up — the root cause is independent of this change.

With a non-zero coalesce interval the drain loop can race the self-rewriting hot file in a way that loses the pending rejection (the pre-existing edge case noted in #29740), which on release builds manifests as a 30 s timeout. A zero interval makes the drain loop non-blocking — it polls once and processes whatever is already buffered — which is the closest analogue to the pre-loop behaviour this test was tuned against, and keeps the self-write event landing in the reject→report window on all build profiles.

claude Bot reviewed Apr 26, 2026

View reviewed changes

Merge branch 'main' into claude/happy-pasteur-a77f3a

52d8fbb

claude Bot reviewed Apr 26, 2026

View reviewed changes

Jarred-Sumner merged commit d644b47 into main Apr 26, 2026
64 checks passed

Jarred-Sumner deleted the claude/happy-pasteur-a77f3a branch April 26, 2026 09:34

claude Bot mentioned this pull request May 5, 2026

Exit unsettled top-level await instead of hanging / busy-looping #29549

Closed

robobun mentioned this pull request May 13, 2026

watcher: coalesce per-save event bursts into a single --hot reload #30617

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hot: defer reload while a rejected module is unreported#29740

hot: defer reload while a rejected module is unreported#29740
Jarred-Sumner merged 2 commits into
mainfrom
claude/happy-pasteur-a77f3a

dylan-conway commented Apr 26, 2026

Uh oh!

robobun commented Apr 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dylan-conway commented Apr 26, 2026

What

Why

Test

Uh oh!

robobun commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 26, 2026

Walkthrough

Changes

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

robobun commented Apr 26, 2026 •

edited

Loading