bun test: interrupt synchronous infinite loops on --timeout#30598
bun test: interrupt synchronous infinite loops on --timeout#30598robobun wants to merge 2 commits into
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
WalkthroughThis PR fixes infinite-loop hangs in Bun tests by enabling JSC watchdog timeouts around test callbacks. The watchdog is armed before test execution with remaining time, triggering interruption for synchronous loops. Child-process cleanup is refactored into a shared helper called from both async and sync timeout paths. Integration tests validate timeout detection and reporting for blocked event loops. ChangesPer-test timeout interrupt mechanism
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/cli/test/test-timeout-behavior.test.ts`:
- Line 37: The test currently asserts the absence of the broad string "Unhandled
error" on the combined output (expect(combined).not.toContain("Unhandled
error")), which is a fragile negative check; remove that negative assertion and
instead assert concrete, positive timeout/result indicators on the same output
variable (combined) — for example, assert presence of the specific timeout
message or status your code emits (e.g., the exact "timed out" / "Timeout" text
or an expected status code/event), or assert the function/method that reports
timeouts (the timeout handler used by the test) was called; update the test to
use only those positive, actionable checks rather than the broad negative check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 49b99b57-df6c-4114-a38b-87447e98303a
📒 Files selected for processing (4)
src/jsc/bindings/bindings.cppsrc/test_runner/Execution.zigsrc/test_runner/bun_test.zigtest/cli/test/test-timeout-behavior.test.ts
There was a problem hiding this comment.
The node:vm RELEASE_ASSERT fix in d8bdce5 looks right — checkForTermination and NodeVMModule::evaluate now let an external TerminationException propagate, and the new runInThisContext test covers it. No further issues from me; deferring to a human for the JSC Watchdog / VMTraps interaction and the idle-sentinel design since this arms a VM-global watchdog around every test callback.
Extended reasoning...
Overview
This PR makes bun test --timeout interrupt synchronous infinite loops by arming JSC's Watchdog around each test callback in runTestCallback (src/test_runner/bun_test.zig). Supporting changes: JSC__VM__setExecutionTimeLimit now calls watchdog.enteredVM() when inside an existing VMEntryScope (src/jsc/bindings/bindings.cpp); NodeVMScript::checkForTermination and NodeVMModule::evaluate no longer RELEASE_ASSERT when a TerminationException arrives from outside their own {timeout} option (src/jsc/bindings/NodeVM*.cpp); dangling-process cleanup is hoisted into a shared killDanglingProcesses helper called from both the event-loop-timer path and the synchronous-return path (src/test_runner/Execution.zig); and three new integration tests cover the sync-loop, post-microtask-loop, and node:vm-loop cases (test/cli/test/test-timeout-behavior.test.ts).
Changes since my prior review
My earlier 🔴 about the node:vm RELEASE_ASSERT_NOT_REACHED was addressed in d8bdce5: both sites now leave the termination request/exception in place when it came from neither SIGINT nor the user's {timeout}, so it unwinds to runTestCallback's catch block. A regression test for vm.runInThisContext("while(true);") under bun test --timeout was added. The 🟡 test-convention nits (test.concurrent, drop explicit 30s timeouts) and CodeRabbit's not.toContain("Unhandled error") comment were also addressed. All four inline threads are resolved.
Security risks
None identified. No auth, crypto, network, or untrusted-input handling. The watchdog is an internal VM mechanism; the only new externally-observable behavior is that a previously-hanging bun test now reports a timeout.
Level of scrutiny
High. This is not a mechanical change — it arms a VM-global JSC Watchdog around every test/hook callback with a non-epoch deadline (the default), which interacts with: (1) node:vm's own watchdog usage via setupWatchdog, (2) Worker.terminate()'s TerminationException path, (3) test.concurrent (the watchdog is VM-global, which is why the author scoped it to the synchronous body only), and (4) JSC's VMEntryScope/VMTraps lifecycle. The "relax to maxInt(i32) seconds instead of noTimeLimit" idle-sentinel workaround for the uncancellable dispatchAfter timer is well-commented but is exactly the kind of JSC-internals reasoning that benefits from a reviewer who knows Watchdog::startTimer/shouldTerminate semantics. The if (vm->entryScope) guard in bindings.cpp similarly encodes a non-obvious invariant about VMTraps::handleTraps dereferencing vm.entryScope.
Other factors
- The remaining CI failures (musl build shards, 26225.test.ts ASAN ThreadLock) appear unrelated per the author's retrigger commit message, but a human should confirm before merge.
- The known limitation (macrotask-yield-then-spin still hangs) was acknowledged by the author as a deliberate scoping decision for a follow-up; not a regression.
- Test coverage for the change itself is good (three new subprocess tests plus the existing
processes get killedtest).
3cb4605 to
bdd9fa6
Compare
The per-test timeout was enforced by an event-loop timer, which can only
fire once control returns to the event loop. A test body that never
yields — e.g. `test("x", () => { while (true); })` — blocked the
thread forever and the timeout never fired.
Arm JSC's Watchdog around each test/hook callback in run_test_callback
(src/runtime/test_runner/bun_test.rs). The watchdog schedules on a
separate VMTraps queue thread and raises a TerminationException at the
next JS safepoint, breaking out of the loop; the existing
clear_termination_exception() + evaluate_timeout() path then reports the
test as timed out. The watchdog is a hang detector, not the precise
timer — callbacks that yield are still handled by the event-loop timer.
A one-second grace over the deadline keeps it from racing the event-loop
path for tests with very short timeouts whose synchronous prologue
(spawn, fixture setup) would otherwise be cut off before the first
await. On return the limit is relaxed to a large finite sentinel rather
than noTimeLimit so the un-cancellable dispatchAfter can't drive
shouldTerminate into startTimer(∞) and trip ASSERT(hasTimeLimit()).
Hoist the dangling-process kill into kill_dangling_processes shared by
handle_timeout and the synchronous-return path in step_sequence_one so
children spawned by a watchdog-interrupted callback are still cleaned up
(the event-loop timer never fires in that case).
JSC__VM__setExecutionTimeLimit now calls watchdog.enteredVM() only when
vm->entryScope is non-null, so the timer arms from inside an existing
VMEntryScope without making Watchdog::isActive() lie when armed from
native code (which would make VMTraps::handleTraps dereference the null
vm.entryScope when servicing a stale NeedWatchdogCheck trap from
autoTick).
node:vm's checkForTermination / NodeVMModule::evaluate now let an
external TerminationException propagate instead of
RELEASE_ASSERT_NOT_REACHED when it came from neither SIGINT nor the
user's {timeout} option — the test-runner watchdog is now a third source.
Fixes #21277
…utionTimeLimit
run_test_callback relaxes the JSC watchdog to INT32_MAX seconds between
callbacks instead of noTimeLimit (to keep Watchdog::startTimer's
internal asserts happy across stale dispatchAfter callbacks). That
left JSC__VM__hasExecutionTimeLimit() returning true for the rest of
the process, which permanently opted spawnSync out of its blocking
fast path in top-level / describe() code where auto_killer.enabled is
false. Gate on getTimeLimit() < INT32_MAX so the idle sentinel reads
as 'no limit' while a real node:vm {timeout} still disables the fast
path.
89138ee to
2b594f8
Compare
Fixes #21277
Repro
Cause
The per-test timeout is an
EventLoopTimer(tag.BunTest) drained bydrainTimers()on the JS thread.runTestCallbackarms it viaupdateMinTimeoutand then synchronously calls the test body on the same thread. A callback that never yields —while (true);— never returns control to the event loop, so the timer never fires andbunTestTimeoutCallbackis never reached.Fix
Arm JSC's
Watchdogaround the callback inrunTestCallback(same mechanismnode:vm's{timeout}option uses — seesetupWatchdoginNodeVMScript.cpp). The watchdog schedules on the VMTraps queue thread and firesnotifyNeedWatchdogCheck(), which raises aTerminationExceptionat the next JS safepoint (loop back-edge), breaking out of the spin. The existingclearTerminationException()in the catch block lets subsequent tests run, andevaluateTimeout()instepSequenceOnereports.fail_because_timeout.The watchdog is a hang detector, not the precise timer — tests that yield are still timed by the event-loop timer. A one-second grace over the deadline keeps the watchdog from racing the event-loop path for tests with very short per-test timeouts whose synchronous prologue (child spawn, fixture setup) would otherwise be cut off before the first await.
Also hoisted the dangling-process cleanup into
killDanglingProcessesshared byhandleTimeoutand the synchronous-return path instepSequenceOne, so children spawned by a watchdog-interrupted callback are still killed (the event-loop timer never fires in that case).JSC__VM__setExecutionTimeLimitnow callswatchdog.enteredVM()beforesetTimeLimitso the timer starts even when the watchdog is lazily created from inside an existingVMEntryScope(the scope's ownenteredVMran before the watchdog existed). MatchessetupWatchdog. No other callers.Verification
New tests in
test/cli/test/test-timeout-behavior.test.tsspawnbun test --timeout=500against a fixture withwhile (true);(sync and post-await Promise.resolve()) and assert the spinning test is reported astimed out after 500ms, the next test still runs, and the process exits 1.Without the fix the subprocess never exits and the outer test times out; with the fix the subprocess exits in ~1.5 s. Existing
processes get killedtests in the same file continue to pass.