bun test: interrupt synchronous infinite loops on --timeout#30598

Open
robobun wants to merge 2 commits into main from farm/fd0ad4ff/test-timeout-sync-loop

Conversation

@robobun

@robobun robobun commented May 13, 2026

Collaborator

Fixes #21277

Repro

// foo.test.ts
import { test } from "bun:test";
test("foo", () => {
  while (true);
});

bun test --timeout=1000 foo.test.ts
# hangs forever

Cause

The per-test timeout is an EventLoopTimer (tag .BunTest) drained by drainTimers() on the JS thread. runTestCallback arms it via updateMinTimeout and then synchronously calls the test body on the same thread. A callback that never yields — while (true); — never returns control to the event loop, so the timer never fires and bunTestTimeoutCallback is never reached.
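
The starvation described above can be reproduced in miniature outside the test runner. The sketch below is an analogy only: setTimeout stands in for the per-test EventLoopTimer, and spinFor is a hypothetical helper that bounds the spin so the script finishes. It is not Bun's internal code.

```typescript
// Analogy only: setTimeout stands in for Bun's per-test EventLoopTimer.

// Hypothetical helper: busy-wait on the current thread without yielding.
function spinFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: no await, no callback, nothing returns to the event loop
  }
}

let timerFired = false;
setTimeout(() => {
  timerFired = true;
}, 1); // due after 1 ms

spinFor(50); // hold the thread well past the timer's due time

// The callback is still pending here. It can only run after this synchronous
// code returns control to the event loop.
const firedDuringSpin: boolean = timerFired;
console.log(firedDuringSpin); // false
```

With an unbounded while (true); the spin never ends, so the callback stays pending forever, which is the hang in the repro.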

Fix

Arm JSC's Watchdog around the callback in runTestCallback (same mechanism node:vm's {timeout} option uses — see setupWatchdog in NodeVMScript.cpp). The watchdog schedules on the VMTraps queue thread and fires notifyNeedWatchdogCheck(), which raises a TerminationException at the next JS safepoint (loop back-edge), breaking out of the spin. The existing clearTerminationException() in the catch block lets subsequent tests run, and evaluateTimeout() in stepSequenceOne reports .fail_because_timeout.
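
As a rough mental model of the interruption, the toy sketch below checks a deadline at an explicit safepoint() call and unwinds with a marker exception. Everything in it (runWithWatchdog, Termination, the safepoint callback) is hypothetical; the real watchdog raises the request from another thread and JSC checks it at loop back-edges without any cooperation from the test body.

```typescript
// Toy model only. In JSC the termination request is raised asynchronously and
// checked by the engine; here the "safepoint" is an explicit callback.

// Marker thrown to unwind the callback (stand-in for TerminationException).
class Termination {
  readonly reason = "execution time limit exceeded";
}

type Outcome = "completed" | "terminated";

// Run `body` under a deadline. `safepoint` plays the role of a loop back-edge check.
function runWithWatchdog(body: (safepoint: () => void) => void, limitMs: number): Outcome {
  const deadline = Date.now() + limitMs;
  const safepoint = (): void => {
    if (Date.now() >= deadline) throw new Termination();
  };
  try {
    body(safepoint);
    return "completed";
  } catch (err) {
    // Swallow only the termination marker so the caller can run the next
    // callback; this mirrors clearing the exception in the catch block.
    if (err instanceof Termination) return "terminated";
    throw err;
  }
}

// A spinning body is interrupted at its next safepoint after the deadline.
const spinResult: Outcome = runWithWatchdog((safepoint) => {
  while (true) safepoint();
}, 20);

// A body that returns promptly is unaffected.
const quickResult: Outcome = runWithWatchdog(() => {}, 20);

console.log(spinResult, quickResult); // terminated completed
```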

The watchdog is a hang detector, not the precise timer — tests that yield are still timed by the event-loop timer. A one-second grace over the deadline keeps the watchdog from racing the event-loop path for tests with very short per-test timeouts whose synchronous prologue (child spawn, fixture setup) would otherwise be cut off before the first await.
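
The grace arithmetic can be sketched as follows. watchdogLimitMs and the clamp to zero are assumptions made for illustration; only the one-second grace comes from the description above.

```typescript
// One-second grace over the per-test deadline, as stated in the description.
const WATCHDOG_GRACE_MS = 1000;

// Hypothetical helper: the limit handed to the watchdog, given the time left
// until the per-test deadline. Clamping negative values to 0 is an assumption.
function watchdogLimitMs(remainingMs: number, graceMs: number = WATCHDOG_GRACE_MS): number {
  return Math.max(0, remainingMs) + graceMs;
}

const limitFor500 = watchdogLimitMs(500);
console.log(limitFor500); // 1500
```

Under this model a test run with --timeout=500 that spins synchronously is interrupted about 1.5 s in, while a test that yields is still timed out by the event-loop timer at 500 ms.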

Also hoisted the dangling-process cleanup into killDanglingProcesses shared by handleTimeout and the synchronous-return path in stepSequenceOne, so children spawned by a watchdog-interrupted callback are still killed (the event-loop timer never fires in that case).

JSC__VM__setExecutionTimeLimit now calls watchdog.enteredVM() before setTimeLimit so the timer starts even when the watchdog is lazily created from inside an existing VMEntryScope (the scope's own enteredVM ran before the watchdog existed). Matches setupWatchdog. No other callers.

Verification

New tests in test/cli/test/test-timeout-behavior.test.ts spawn bun test --timeout=500 against a fixture containing while (true); (once in a synchronous body, once after await Promise.resolve()) and assert that the spinning test is reported as timed out after 500ms, that the next test still runs, and that the process exits with code 1.

Without the fix the subprocess never exits and the outer test times out; with the fix the subprocess exits in ~1.5 s. The existing "processes get killed" tests in the same file continue to pass.

@coderabbitai

coderabbitai Bot commented May 13, 2026

Contributor


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a3a36065-76d0-48b8-b9f1-5416638a1c40

📥 Commits

Reviewing files that changed from the base of the PR and between 35d7c19 and 7983feb.

📒 Files selected for processing (4)
  • src/jsc/bindings/NodeVMModule.cpp
  • src/jsc/bindings/NodeVMScript.cpp
  • src/jsc/bindings/bindings.cpp
  • test/cli/test/test-timeout-behavior.test.ts

Walkthrough

This PR fixes infinite-loop hangs in Bun tests by enabling JSC watchdog timeouts around test callbacks. The watchdog is armed before test execution with remaining time, triggering interruption for synchronous loops. Child-process cleanup is refactored into a shared helper called from both async and sync timeout paths. Integration tests validate timeout detection and reporting for blocked event loops.

Changes

Per-test timeout interrupt mechanism

| Layer / File(s) | Summary |
| --- | --- |
| Watchdog infrastructure and test callback timeout arming (src/jsc/bindings/bindings.cpp, src/test_runner/bun_test.zig) | JSC watchdog is initialized to work in nested native code paths via enteredVM(). Test callbacks arm the watchdog with execution time limits before running, then reset to idle state afterward to prevent stale timer issues. |
| Timeout detection and child-process cleanup (src/test_runner/Execution.zig) | A shared killDanglingProcesses helper centralizes child-process termination and logging. The helper is invoked from both the async timeout path in handleTimeout and the synchronous timeout path in stepSequenceOne to ensure consistent cleanup when timeouts fire. |
| Integration tests for timeout behavior (test/cli/test/test-timeout-behavior.test.ts) | Two integration tests validate that per-test timeouts interrupt synchronous infinite loops (one for direct loops, one for loops after await), confirming timeout reporting and non-zero exit code in both cases. |
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly describes the main change: enabling the test runner to interrupt synchronous infinite loops when --timeout is used, which matches the core objective of fixing issue #21277. |
| Description check | ✅ Passed | The description thoroughly covers the issue (repro case), root cause (event-loop timer limitation), solution (JSC Watchdog mechanism), verification (test results), and implementation details. It fully satisfies the template requirements. |
| Linked Issues check | ✅ Passed | The PR directly addresses all coding requirements from issue #21277: it implements watchdog-based interruption of synchronous infinite loops, reports timeout errors, ensures non-zero exit codes, and prevents hangs via test cases validating the fix. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the timeout behavior for synchronous loops: watchdog integration in JSC bindings and test runner, dangling process cleanup refactoring, and integration tests. No unrelated modifications detected. |


@robobun

robobun commented May 13, 2026

Collaborator Author

Comment thread src/test_runner/bun_test.zig Outdated
Comment thread test/cli/test/test-timeout-behavior.test.ts Outdated
Comment thread src/test_runner/bun_test.zig Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/cli/test/test-timeout-behavior.test.ts`:
- Line 37: The test currently asserts the absence of the broad string "Unhandled
error" on the combined output (expect(combined).not.toContain("Unhandled
error")), which is a fragile negative check; remove that negative assertion and
instead assert concrete, positive timeout/result indicators on the same output
variable (combined) — for example, assert presence of the specific timeout
message or status your code emits (e.g., the exact "timed out" / "Timeout" text
or an expected status code/event), or assert the function/method that reports
timeouts (the timeout handler used by the test) was called; update the test to
use only those positive, actionable checks rather than the broad negative check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49b99b57-df6c-4114-a38b-87447e98303a

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf4b33 and 35d7c19.

📒 Files selected for processing (4)
  • src/jsc/bindings/bindings.cpp
  • src/test_runner/Execution.zig
  • src/test_runner/bun_test.zig
  • test/cli/test/test-timeout-behavior.test.ts

Comment thread test/cli/test/test-timeout-behavior.test.ts Outdated

@claude claude Bot left a comment

Contributor

The node:vm RELEASE_ASSERT fix in d8bdce5 looks right — checkForTermination and NodeVMModule::evaluate now let an external TerminationException propagate, and the new runInThisContext test covers it. No further issues from me; deferring to a human for the JSC Watchdog / VMTraps interaction and the idle-sentinel design since this arms a VM-global watchdog around every test callback.

Extended reasoning...

Overview

This PR makes bun test --timeout interrupt synchronous infinite loops by arming JSC's Watchdog around each test callback in runTestCallback (src/test_runner/bun_test.zig). Supporting changes: JSC__VM__setExecutionTimeLimit now calls watchdog.enteredVM() when inside an existing VMEntryScope (src/jsc/bindings/bindings.cpp); NodeVMScript::checkForTermination and NodeVMModule::evaluate no longer RELEASE_ASSERT when a TerminationException arrives from outside their own {timeout} option (src/jsc/bindings/NodeVM*.cpp); dangling-process cleanup is hoisted into a shared killDanglingProcesses helper called from both the event-loop-timer path and the synchronous-return path (src/test_runner/Execution.zig); and three new integration tests cover the sync-loop, post-microtask-loop, and node:vm-loop cases (test/cli/test/test-timeout-behavior.test.ts).

Changes since my prior review

My earlier 🔴 about the node:vm RELEASE_ASSERT_NOT_REACHED was addressed in d8bdce5: both sites now leave the termination request/exception in place when it came from neither SIGINT nor the user's {timeout}, so it unwinds to runTestCallback's catch block. A regression test for vm.runInThisContext("while(true);") under bun test --timeout was added. The 🟡 test-convention nits (test.concurrent, drop explicit 30s timeouts) and CodeRabbit's not.toContain("Unhandled error") comment were also addressed. All four inline threads are resolved.

Security risks

None identified. No auth, crypto, network, or untrusted-input handling. The watchdog is an internal VM mechanism; the only new externally-observable behavior is that a previously-hanging bun test now reports a timeout.

Level of scrutiny

High. This is not a mechanical change — it arms a VM-global JSC Watchdog around every test/hook callback with a non-epoch deadline (the default), which interacts with: (1) node:vm's own watchdog usage via setupWatchdog, (2) Worker.terminate()'s TerminationException path, (3) test.concurrent (the watchdog is VM-global, which is why the author scoped it to the synchronous body only), and (4) JSC's VMEntryScope/VMTraps lifecycle. The "relax to maxInt(i32) seconds instead of noTimeLimit" idle-sentinel workaround for the uncancellable dispatchAfter timer is well-commented but is exactly the kind of JSC-internals reasoning that benefits from a reviewer who knows Watchdog::startTimer/shouldTerminate semantics. The if (vm->entryScope) guard in bindings.cpp similarly encodes a non-obvious invariant about VMTraps::handleTraps dereferencing vm.entryScope.

Other factors

  • The remaining CI failures (musl build shards, 26225.test.ts ASAN ThreadLock) appear unrelated per the author's retrigger commit message, but a human should confirm before merge.
  • The known limitation (macrotask-yield-then-spin still hangs) was acknowledged by the author as a deliberate scoping decision for a follow-up; not a regression.
  • Test coverage for the change itself is good (three new subprocess tests plus the existing "processes get killed" test).

@robobun robobun force-pushed the farm/fd0ad4ff/test-timeout-sync-loop branch 2 times, most recently from 3cb4605 to bdd9fa6 Compare May 21, 2026 23:45
Comment thread src/runtime/test_runner/bun_test.rs
robobun added 2 commits June 5, 2026 21:22
The per-test timeout was enforced by an event-loop timer, which can only
fire once control returns to the event loop. A test body that never
yields — e.g. `test("x", () => { while (true); })` — blocked the
thread forever and the timeout never fired.

Arm JSC's Watchdog around each test/hook callback in run_test_callback
(src/runtime/test_runner/bun_test.rs). The watchdog schedules on a
separate VMTraps queue thread and raises a TerminationException at the
next JS safepoint, breaking out of the loop; the existing
clear_termination_exception() + evaluate_timeout() path then reports the
test as timed out. The watchdog is a hang detector, not the precise
timer — callbacks that yield are still handled by the event-loop timer.
A one-second grace over the deadline keeps it from racing the event-loop
path for tests with very short timeouts whose synchronous prologue
(spawn, fixture setup) would otherwise be cut off before the first
await. On return the limit is relaxed to a large finite sentinel rather
than noTimeLimit so the un-cancellable dispatchAfter can't drive
shouldTerminate into startTimer(∞) and trip ASSERT(hasTimeLimit()).

Hoist the dangling-process kill into kill_dangling_processes shared by
handle_timeout and the synchronous-return path in step_sequence_one so
children spawned by a watchdog-interrupted callback are still cleaned up
(the event-loop timer never fires in that case).

JSC__VM__setExecutionTimeLimit now calls watchdog.enteredVM() only when
vm->entryScope is non-null, so the timer arms from inside an existing
VMEntryScope without making Watchdog::isActive() lie when armed from
native code (which would make VMTraps::handleTraps dereference the null
vm.entryScope when servicing a stale NeedWatchdogCheck trap from
autoTick).

node:vm's checkForTermination / NodeVMModule::evaluate now let an
external TerminationException propagate instead of
RELEASE_ASSERT_NOT_REACHED when it came from neither SIGINT nor the
user's {timeout} option — the test-runner watchdog is now a third source.

Fixes #21277
…utionTimeLimit

run_test_callback relaxes the JSC watchdog to INT32_MAX seconds between
callbacks instead of noTimeLimit (to keep Watchdog::startTimer's
internal asserts happy across stale dispatchAfter callbacks). That
left JSC__VM__hasExecutionTimeLimit() returning true for the rest of
the process, which permanently opted spawnSync out of its blocking
fast path in top-level / describe() code where auto_killer.enabled is
false. Gate on getTimeLimit() < INT32_MAX so the idle sentinel reads
as 'no limit' while a real node:vm {timeout} still disables the fast
path.
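
The gate described in this commit message can be modeled in a few lines. This is a sketch under stated assumptions, not the C++ binding: hasExecutionTimeLimit here is a hypothetical stand-in for JSC__VM__hasExecutionTimeLimit, and limits are plain numbers in seconds.

```typescript
// INT32_MAX seconds is the idle sentinel the runner leaves in place between callbacks.
const INT32_MAX = 2147483647;
const IDLE_SENTINEL_SECONDS = INT32_MAX;

// Hypothetical model of the gate: only a limit strictly below the sentinel
// counts as a real execution time limit.
function hasExecutionTimeLimit(timeLimitSeconds: number): boolean {
  return timeLimitSeconds < INT32_MAX;
}

console.log(hasExecutionTimeLimit(IDLE_SENTINEL_SECONDS)); // false: idle sentinel reads as "no limit"
console.log(hasExecutionTimeLimit(0.5)); // true: a real {timeout} of 500 ms
console.log(hasExecutionTimeLimit(Infinity)); // false: an unlimited value also reads as "no limit"
```

In this model, a caller such as spawnSync would take its blocking fast path only when the function returns false.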
@robobun robobun force-pushed the farm/fd0ad4ff/test-timeout-sync-loop branch from 89138ee to 2b594f8 Compare June 5, 2026 21:25
Comment thread src/jsc/bindings/NodeVMScript.cpp


Development

Successfully merging this pull request may close these issues.

Infinite loop int test causes hang

1 participant