Skip to content

inspector: interrupt a busy JS loop so Debugger.pause is serviced#32549

Open
robobun wants to merge 10 commits into
mainfrom
farm/173a725c/debugger-pause-busy-loop
Open

inspector: interrupt a busy JS loop so Debugger.pause is serviced#32549
robobun wants to merge 10 commits into
mainfrom
farm/173a725c/debugger-pause-busy-loop

Conversation

@robobun

@robobun robobun commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Fixes #32548.

Repro

Debugger.pause sent to an inspected process spinning in a tight JS loop never produced a pause:

// run under --inspect-wait, attach, then send Debugger.pause
let counter = 0;
while (true) {
  counter++;
  if (counter === Number.MAX_SAFE_INTEGER) console.log(counter);
}

No Debugger.paused event arrived, and a later Debugger.resume timed out. A program that periodically returns to its event loop pauses fine; only a loop that never yields was affected.

Cause

Inspector/CDP messages from the debugger thread are delivered to the inspected thread as event-loop tasks (BunInspectorConnection::sendMessageToInspectorFromDebuggerThread posts via ScriptExecutionContext::postTaskTo). A while (true) {} loop never turns its event loop, so the posted task never runs: Debugger.pause is never dispatched, InspectorDebuggerAgent::pause() / Debugger::schedulePauseAtNextOpportunity() never run, and the loop keeps spinning.

Fix

When queuing a message for the inspected thread, also fire a VM trap (VM::notifyNeedShellTimeoutCheck) on the inspected VM. That interrupts the thread at its next safepoint (a loop back-edge) even when it is spinning in optimized JIT code, and runs a callback on the JS thread that drains the queued inspector messages. Dispatching there calls schedulePauseAtNextOpportunity(), which enables stepping and deopts the running code, so the pause lands at the next statement with usable call frames. This mirrors how V8/Node service inspector messages on a busy isolate (v8::Isolate::RequestInterrupt) and how WebKit's WebInspectorInterruptDispatcher breaks into a running page.

The trap's JS-thread callback is registered in JSCInitialize() (during JSC::initialize), before the first VM construction freezes g_jscConfig. It is skipped while the debugger is already paused, where runWhilePaused already services messages (firing the trap there would only make the signal sender poll for a safepoint that is not reached until resume).

Verification

New regression test test/regression/issue/32548.test.ts attaches over the WebSocket inspector, lets a while (true) loop start, sends Debugger.pause, and asserts a Debugger.paused event with call frames arrives.

  • Fails on bun bd with src/ stashed (no Debugger.paused, times out).
  • Passes with the fix (~1.3s), stable across repeated runs.

A three-way check (control debugger; statement, an active-yielding program, and the busy loop) confirms the busy loop now pauses like the other two:

[control-debugger-stmt]    paused=YES  callFrames=1  reason=DebuggerStatement
[async-pause-active-yield] paused=YES  callFrames=1  reason=PauseOnNextStatement
[async-pause-busy-loop]    paused=YES  callFrames=1  reason=PauseOnNextStatement   (was: paused=NO, Debugger.pause timed out)

Inspector messages from the debugger thread are delivered to the
inspected thread as event-loop tasks. A thread spinning in a tight JS
loop (e.g. `while (true) {}`) never turns its event loop, so a
Debugger.pause sent while JS was running was never dispatched: no
Debugger.paused event arrived and a later Debugger.resume timed out.

When queuing a message for the inspected thread, fire a VM trap
(notifyNeedShellTimeoutCheck) so the thread is interrupted at its next
safepoint and drains the queued messages there. That dispatches the
pause on the JS thread, which enables stepping and deopts the running
code so the pause lands at the next statement. The trap callback is
registered during JSC::initialize, before the first VM construction
freezes g_jscConfig. The trap is skipped while already paused, where
runWhilePaused already services messages.
@robobun

robobun commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 9:50 PM PT - Jun 20th, 2026

@robobun, your commit 6dc2ef0 has 2 failures in Build #63717 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32549

That installs a local version of the PR into your bun-32549 executable, so you can run:

bun-32549 --bun

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a VM safepoint interrupt path to the Bun debugger so Debugger.pause commands can interrupt a CPU-bound busy loop without waiting for an event loop tick. The mechanism uses notifyNeedShellTimeoutCheck to fire a shellTimeoutCheckCallback at the next JSC safepoint, which drains queued inspector messages. The callback is registered during JSCInitialize before g_jscConfig is frozen. A regression test for issue #32548 verifies the end-to-end behavior.

Changes

Debugger.pause VM interrupt for busy loops

Layer / File(s) Summary
VM interrupt mechanism in BunDebugger
src/jsc/bindings/BunDebugger.cpp
Adds JSCConfig.h and VM.h includes; introduces runWhilePausedDepth accounting in runWhilePaused() to track paused state across connections; calls interruptInspectedThreadIfBusy() after posting the inspector message task; implements interruptInspectedThreadIfBusy() to call vm().notifyNeedShellTimeoutCheck() when not already paused; adds dispatchPendingMessagesOnVMInterrupt() to drain per-VM queued messages at safepoints; adds Bun__Debugger__initVMInterruptDispatch() to install the dispatcher into g_jscConfig.shellTimeoutCheckCallback.
Register callback during JSC initialization
src/jsc/bindings/ZigGlobalObject.cpp
Declares extern "C" Bun__Debugger__initVMInterruptDispatch() and calls it inside the JSC::initialize lambda in JSCInitialize, before the first VM construction freezes g_jscConfig.
Regression test for issue #32548
test/regression/issue/32548.test.ts
Spawns a child Bun process with an infinite busy loop under --inspect-wait, parses the inspector WebSocket URL from stderr, waits for a stdout ready marker, connects a WebSocket inspector harness, enables debugger domains, sends Debugger.pause, and asserts a Debugger.paused event with valid callFrames arrives within 4 seconds.

Possibly related issues

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'inspector: interrupt a busy JS loop so Debugger.pause is serviced' clearly and concisely describes the main change—enabling Debugger.pause to interrupt non-yielding JavaScript loops.
Description check ✅ Passed The PR description fully covers both required template sections: 'What does this PR do?' (Fixes #32548 with detailed repro, cause, and fix explanation) and 'How did you verify your code works?' (new regression test with verification results).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. feat(inspector): runtime inspector activation via SIGUSR1 + CDP pause during busy execution #26867 - Also implements CDP pause during busy JS execution by using JSC's VM trap mechanism to interrupt tight loops and drain queued inspector messages

🤖 Generated with Claude Code

@robobun

robobun commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

The bot is right that there is overlap with #26867.

#26867 (@alii) is a larger feature: runtime inspector activation via SIGUSR1 / process._debugProcess, which includes busy-loop CDP pause as one piece. It is built on the pre-Rust-rewrite tree (src/bun.js/*.zig, the old BunDebugger.cpp), so it would need a substantial rebase onto the current src/jsc/ tree to land.

This PR is a minimal, targeted fix for #32548 only: make Debugger.pause interrupt a busy while (true) loop on the current codebase. It does not touch SIGUSR1 activation, and uses a smaller mechanism (notifyNeedShellTimeoutCheck to run a dispatch callback at the next safepoint, relying on the default signal-based traps to deopt the loop) rather than VMManager::requestStopAll + polling traps.

Happy to defer if a maintainer would rather land the busy-pause behavior as part of #26867.

Comment thread src/jsc/bindings/BunDebugger.cpp Outdated
Comment thread test/regression/issue/32548.test.ts Outdated
robobun and others added 2 commits June 21, 2026 01:33
Address review feedback on #32549:

- interruptInspectedThreadIfBusy() read JSGlobalObject::debugger() and
  Debugger::isPaused() (both non-atomic, JS-thread-owned) from the
  debugger thread, a data race (TSAN) with a narrow UAF window if the
  Debugger is torn down concurrently. Replace it with an atomic
  pause-depth counter on BunInspectorConnection, incremented/decremented
  on the JS thread in runWhilePaused and read on the debugger thread.
- Drop the explicit 30000ms per-test timeout (repo convention is to rely
  on the default); the in-test bounded wait for Debugger.paused already
  provides a clear failure.
Comment thread test/regression/issue/32548.test.ts Outdated
Comment thread test/regression/issue/32548.test.ts Outdated
The Debugger.pause-on-busy-loop regression is a timing race between the
busy loop monopolizing the JS thread and the queued pause being
dispatched. It reproduces deterministically on slow debug/ASAN builds but
is flaky on optimized release builds, where the pause often wins the race
and fires anyway, so the fail-before was unreliable there. Restrict the
test to ASAN (skipIf(!isASAN)), where the gate (bun bd) runs it
deterministically; the fix itself is platform independent.

Also address review feedback: give the stdout busy-ready watcher a
rejection path so an early child exit surfaces the diagnostic error
instead of hanging, and reduce the file header to the issue URL.
Comment thread src/jsc/bindings/BunDebugger.cpp
The Debugger.pause-on-busy-loop regression is a timing race between the
busy loop monopolizing the JS thread and the queued pause being
dispatched. It reproduces deterministically only on slow debug builds
(the gate runs bun bd); optimized release builds routinely win the race
and pause anyway, so the fail-before was flaky/passing on release.

Switch the skip from isASAN to isDebug so the test runs on the debug gate
and skips all release lanes. This also keeps it off the release-asan lane,
which aborts on a pre-existing unchecked exception in WebKit's
JSJavaScriptCallFrame::scopeChain under validateExceptionChecks (a plain
debugger; pause trips the same WebKit path; unrelated to this fix).
Comment thread test/regression/issue/32548.test.ts
Comment thread test/regression/issue/32548.test.ts Outdated
skipIf(!isDebug) skipped the test on every CI lane (the buildkite matrix
has no debug test profile), leaving the fix with zero CI coverage. Gate on
isASAN instead: the test runs on the local debug+ASAN build and the
release-asan CI lanes, which execute slowly enough that the busy-loop race
reproduces deterministically, and skips optimized non-ASAN release lanes
where the pause often wins the race and the fail-before is flaky.

Add test/regression/issue/32548.test.ts to no-validate-exceptions.txt so
the release-asan lanes do not abort on a pre-existing unchecked exception
in WebKit's JSJavaScriptCallFrame::scopeChain (a plain debugger; pause
trips the same path; unrelated to this fix).
Comment thread src/jsc/bindings/BunDebugger.cpp
Comment thread test/regression/issue/32548.test.ts
The runWhilePausedDepth guard (added to avoid SignalSender churn while
paused) skips notifyNeedShellTimeoutCheck for messages that arrive while
paused, since runWhilePaused services them. But a Debugger.pause arriving
after the resume batch is swapped out and before the depth drops to zero
saw depth>0, skipped the trap, and the resume path breaks out of the loop
without re-draining, leaving the pause queued with no trap armed. Arm the
trap on exit from runWhilePaused when anything is still queued so the
raced-in message is serviced at the first post-resume safepoint.

Also attach no-op rejection sinks to the test's busyPromise/pausedPromise
so a secondary failure path does not surface an unhandled rejection on top
of the real error.
@robobun

robobun commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This is ready for review/merge from my side.

Diff status

  • The new regression test (test/regression/issue/32548.test.ts) passes on the debug+ASAN build and the release-asan CI lanes; it is gated to ASAN builds (skipped on non-ASAN release lanes, where the busy-loop pause is a non-deterministic race and would be an unreliable fail-before).
  • All review threads are resolved.

Remaining CI failures are pre-existing flakes unrelated to this change (which is inspector-only, BunDebugger.cpp / ZigGlobalObject.cpp):

  • test/js/node/zlib/zlib.test.js (x64-asan crash) is flaky on main itself: it failed on main builds #63678 and #63677 but passed on #63692, #63679, and #63665. It does not touch the inspector/debugger path this PR modifies.
  • test/bake/dev-and-prod.test.ts and test/package.js are Windows/bake flakes, also unrelated.

I have already used my one ci: retrigger, so I am not pushing another. Flagging for a maintainer: the change and its test are green on the lanes they touch, and the red lanes are unrelated pre-existing flakes.

Comment thread test/regression/issue/32548.test.ts Outdated
await using proc already kills the child and awaits its exit on scope
exit, so the explicit proc.kill() + await proc.exited in the finally was
redundant manual cleanup alongside using.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my earlier feedback has been addressed and I have nothing further to flag, but this is a cross-thread VM-trap change in the JSC debugger bindings (global g_jscConfig.shellTimeoutCheckCallback, a new no-validate-exceptions.txt entry, and acknowledged overlap with #26867), so it warrants a maintainer's review rather than a bot approval.

Extended reasoning...

Overview

This PR fixes #32548: Debugger.pause sent to a process spinning in a tight while (true) loop was never serviced because inspector messages were delivered as event-loop tasks that the busy loop never drained. The fix fires a VM trap (VM::notifyNeedShellTimeoutCheck) from the debugger thread so the inspected thread is interrupted at its next safepoint, and registers g_jscConfig.shellTimeoutCheckCallback = dispatchPendingMessagesOnVMInterrupt during JSCInitialize() (before the first VM freezes g_jscConfig) to drain queued CDP messages there. An atomic runWhilePausedDepth counter avoids redundant traps while paused, with a scope-exit re-arm to close the resume race. Files touched: src/jsc/bindings/BunDebugger.cpp, src/jsc/bindings/ZigGlobalObject.cpp, test/no-validate-exceptions.txt, and a new regression test.

Security risks

None apparent. The change is confined to the inspector/debugger path, which is already a privileged local-developer surface; no auth, crypto, permission, or untrusted-input handling is touched.

Level of scrutiny

High. This is non-trivial cross-thread C++ in core JSC bindings: it introduces a new global VM-trap callback that runs on the inspected JS thread from inside VMTraps::handleTraps, adds cross-thread atomics read by the debugger thread and written by the JS thread, and changes the lock/trap ordering in runWhilePaused. The threading model went through several iterations during review (initial Debugger::isPaused() race → atomic depth counter → resume-window re-arm), which is exactly the kind of design a maintainer familiar with BunDebugger.cpp and JSC's VMTraps/SignalSender should sign off on.

Other factors

  • Prior review cycle: I left ten inline comments across earlier revisions (cross-thread UAF, resume race, test timeout, rejection wiring, skip-gate coverage, manual cleanup alongside await using, etc.); every one has been addressed and resolved at HEAD (6dc2ef0). The current bug-hunting pass found nothing new.
  • no-validate-exceptions.txt entry: The PR adds a suppression for the new test, justified as a pre-existing unchecked exception in vendored WebKit's JSJavaScriptCallFrame::scopeChain that any inspector pause hits. The justification is plausible, but CLAUDE.md generally discourages new suppression entries, so a human should confirm this is the right tradeoff vs. fixing it upstream.
  • Test coverage: Gated to skipIf(!isASAN), so it runs only on ASAN lanes. The author and I went back and forth on this; their rationale (release builds spuriously pass without the fix, breaking the fail-before gate) is reasonable but means non-ASAN platforms (Windows/macOS) get no coverage of the platform-sensitive VMTraps path.
  • Overlap with #26867: The author explicitly flagged that a larger PR implements similar busy-loop interruption as part of a broader feature, and offered to defer; a maintainer should decide which lands.
  • CI: The remaining failure (zlib.test.js on x64-asan) is documented as a pre-existing flake unrelated to the inspector.

Given the cross-thread complexity, the global g_jscConfig mutation, the suppression-list entry, and the open question of overlap with #26867, this should be reviewed by a human maintainer rather than auto-approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Debugger.pause on busy loop emits no usable paused event or call frames

1 participant