inspector: interrupt a busy JS loop so Debugger.pause is serviced#32549
inspector: interrupt a busy JS loop so Debugger.pause is serviced#32549robobun wants to merge 10 commits into
Conversation
Inspector messages from the debugger thread are delivered to the
inspected thread as event-loop tasks. A thread spinning in a tight JS
loop (e.g. `while (true) {}`) never turns its event loop, so a
Debugger.pause sent while JS was running was never dispatched: no
Debugger.paused event arrived and a later Debugger.resume timed out.
When queuing a message for the inspected thread, fire a VM trap
(notifyNeedShellTimeoutCheck) so the thread is interrupted at its next
safepoint and drains the queued messages there. That dispatches the
pause on the JS thread, which enables stepping and deopts the running
code so the pause lands at the next statement. The trap callback is
registered during JSC::initialize, before the first VM construction
freezes g_jscConfig. The trap is skipped while already paused, where
runWhilePaused already services messages.
|
Updated 9:50 PM PT - Jun 20th, 2026
❌ @robobun, your commit 6dc2ef0 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 32549That installs a local version of the PR into your bun-32549 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a VM safepoint interrupt path to the Bun debugger so ChangesDebugger.pause VM interrupt for busy loops
Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
|
The bot is right that there is overlap with #26867. #26867 (@alii) is a larger feature: runtime inspector activation via This PR is a minimal, targeted fix for #32548 only: make Happy to defer if a maintainer would rather land the busy-pause behavior as part of #26867. |
Address review feedback on #32549: - interruptInspectedThreadIfBusy() read JSGlobalObject::debugger() and Debugger::isPaused() (both non-atomic, JS-thread-owned) from the debugger thread, a data race (TSAN) with a narrow UAF window if the Debugger is torn down concurrently. Replace it with an atomic pause-depth counter on BunInspectorConnection, incremented/decremented on the JS thread in runWhilePaused and read on the debugger thread. - Drop the explicit 30000ms per-test timeout (repo convention is to rely on the default); the in-test bounded wait for Debugger.paused already provides a clear failure.
The Debugger.pause-on-busy-loop regression is a timing race between the busy loop monopolizing the JS thread and the queued pause being dispatched. It reproduces deterministically on slow debug/ASAN builds but is flaky on optimized release builds, where the pause often wins the race and fires anyway, so the fail-before was unreliable there. Restrict the test to ASAN (skipIf(!isASAN)), where the gate (bun bd) runs it deterministically; the fix itself is platform independent. Also address review feedback: give the stdout busy-ready watcher a rejection path so an early child exit surfaces the diagnostic error instead of hanging, and reduce the file header to the issue URL.
The Debugger.pause-on-busy-loop regression is a timing race between the busy loop monopolizing the JS thread and the queued pause being dispatched. It reproduces deterministically only on slow debug builds (the gate runs bun bd); optimized release builds routinely win the race and pause anyway, so the fail-before was flaky/passing on release. Switch the skip from isASAN to isDebug so the test runs on the debug gate and skips all release lanes. This also keeps it off the release-asan lane, which aborts on a pre-existing unchecked exception in WebKit's JSJavaScriptCallFrame::scopeChain under validateExceptionChecks (a plain debugger; pause trips the same WebKit path; unrelated to this fix).
skipIf(!isDebug) skipped the test on every CI lane (the buildkite matrix has no debug test profile), leaving the fix with zero CI coverage. Gate on isASAN instead: the test runs on the local debug+ASAN build and the release-asan CI lanes, which execute slowly enough that the busy-loop race reproduces deterministically, and skips optimized non-ASAN release lanes where the pause often wins the race and the fail-before is flaky. Add test/regression/issue/32548.test.ts to no-validate-exceptions.txt so the release-asan lanes do not abort on a pre-existing unchecked exception in WebKit's JSJavaScriptCallFrame::scopeChain (a plain debugger; pause trips the same path; unrelated to this fix).
The runWhilePausedDepth guard (added to avoid SignalSender churn while paused) skips notifyNeedShellTimeoutCheck for messages that arrive while paused, since runWhilePaused services them. But a Debugger.pause arriving after the resume batch is swapped out and before the depth drops to zero saw depth>0, skipped the trap, and the resume path breaks out of the loop without re-draining, leaving the pause queued with no trap armed. Arm the trap on exit from runWhilePaused when anything is still queued so the raced-in message is serviced at the first post-resume safepoint. Also attach no-op rejection sinks to the test's busyPromise/pausedPromise so a secondary failure path does not surface an unhandled rejection on top of the real error.
|
This is ready for review/merge from my side. Diff status
Remaining CI failures are pre-existing flakes unrelated to this change (which is inspector-only,
I have already used my one |
await using proc already kills the child and awaits its exit on scope exit, so the explicit proc.kill() + await proc.exited in the finally was redundant manual cleanup alongside using.
There was a problem hiding this comment.
All my earlier feedback has been addressed and I have nothing further to flag, but this is a cross-thread VM-trap change in the JSC debugger bindings (global g_jscConfig.shellTimeoutCheckCallback, a new no-validate-exceptions.txt entry, and acknowledged overlap with #26867), so it warrants a maintainer's review rather than a bot approval.
Extended reasoning...
Overview
This PR fixes #32548: Debugger.pause sent to a process spinning in a tight while (true) loop was never serviced because inspector messages were delivered as event-loop tasks that the busy loop never drained. The fix fires a VM trap (VM::notifyNeedShellTimeoutCheck) from the debugger thread so the inspected thread is interrupted at its next safepoint, and registers g_jscConfig.shellTimeoutCheckCallback = dispatchPendingMessagesOnVMInterrupt during JSCInitialize() (before the first VM freezes g_jscConfig) to drain queued CDP messages there. An atomic runWhilePausedDepth counter avoids redundant traps while paused, with a scope-exit re-arm to close the resume race. Files touched: src/jsc/bindings/BunDebugger.cpp, src/jsc/bindings/ZigGlobalObject.cpp, test/no-validate-exceptions.txt, and a new regression test.
Security risks
None apparent. The change is confined to the inspector/debugger path, which is already a privileged local-developer surface; no auth, crypto, permission, or untrusted-input handling is touched.
Level of scrutiny
High. This is non-trivial cross-thread C++ in core JSC bindings: it introduces a new global VM-trap callback that runs on the inspected JS thread from inside VMTraps::handleTraps, adds cross-thread atomics read by the debugger thread and written by the JS thread, and changes the lock/trap ordering in runWhilePaused. The threading model went through several iterations during review (initial Debugger::isPaused() race → atomic depth counter → resume-window re-arm), which is exactly the kind of design a maintainer familiar with BunDebugger.cpp and JSC's VMTraps/SignalSender should sign off on.
Other factors
- Prior review cycle: I left ten inline comments across earlier revisions (cross-thread UAF, resume race, test timeout, rejection wiring, skip-gate coverage, manual cleanup alongside
await using, etc.); every one has been addressed and resolved at HEAD (6dc2ef0). The current bug-hunting pass found nothing new. no-validate-exceptions.txtentry: The PR adds a suppression for the new test, justified as a pre-existing unchecked exception in vendored WebKit'sJSJavaScriptCallFrame::scopeChainthat any inspector pause hits. The justification is plausible, but CLAUDE.md generally discourages new suppression entries, so a human should confirm this is the right tradeoff vs. fixing it upstream.- Test coverage: Gated to
skipIf(!isASAN), so it runs only on ASAN lanes. The author and I went back and forth on this; their rationale (release builds spuriously pass without the fix, breaking the fail-before gate) is reasonable but means non-ASAN platforms (Windows/macOS) get no coverage of the platform-sensitiveVMTrapspath. - Overlap with #26867: The author explicitly flagged that a larger PR implements similar busy-loop interruption as part of a broader feature, and offered to defer; a maintainer should decide which lands.
- CI: The remaining failure (
zlib.test.json x64-asan) is documented as a pre-existing flake unrelated to the inspector.
Given the cross-thread complexity, the global g_jscConfig mutation, the suppression-list entry, and the open question of overlap with #26867, this should be reviewed by a human maintainer rather than auto-approved.
Fixes #32548.
Repro
Debugger.pausesent to an inspected process spinning in a tight JS loop never produced a pause:No
Debugger.pausedevent arrived, and a laterDebugger.resumetimed out. A program that periodically returns to its event loop pauses fine; only a loop that never yields was affected.Cause
Inspector/CDP messages from the debugger thread are delivered to the inspected thread as event-loop tasks (
BunInspectorConnection::sendMessageToInspectorFromDebuggerThreadposts viaScriptExecutionContext::postTaskTo). Awhile (true) {}loop never turns its event loop, so the posted task never runs:Debugger.pauseis never dispatched,InspectorDebuggerAgent::pause()/Debugger::schedulePauseAtNextOpportunity()never run, and the loop keeps spinning.Fix
When queuing a message for the inspected thread, also fire a VM trap (
VM::notifyNeedShellTimeoutCheck) on the inspected VM. That interrupts the thread at its next safepoint (a loop back-edge) even when it is spinning in optimized JIT code, and runs a callback on the JS thread that drains the queued inspector messages. Dispatching there callsschedulePauseAtNextOpportunity(), which enables stepping and deopts the running code, so the pause lands at the next statement with usable call frames. This mirrors how V8/Node service inspector messages on a busy isolate (v8::Isolate::RequestInterrupt) and how WebKit'sWebInspectorInterruptDispatcherbreaks into a running page.The trap's JS-thread callback is registered in
JSCInitialize()(duringJSC::initialize), before the first VM construction freezesg_jscConfig. It is skipped while the debugger is already paused, whererunWhilePausedalready services messages (firing the trap there would only make the signal sender poll for a safepoint that is not reached until resume).Verification
New regression test
test/regression/issue/32548.test.tsattaches over the WebSocket inspector, lets awhile (true)loop start, sendsDebugger.pause, and asserts aDebugger.pausedevent with call frames arrives.bun bdwithsrc/stashed (noDebugger.paused, times out).A three-way check (control
debugger;statement, an active-yielding program, and the busy loop) confirms the busy loop now pauses like the other two: