fix(test --isolate): repoint NapiEnv at new global during swap#30214
fix(test --isolate): repoint NapiEnv at new global during swap#30214robobun wants to merge 5 commits into
Conversation
NapiEnv holds a raw Zig::GlobalObject*. When bun test --isolate swaps the global between files, the old global is gcUnprotect()'d and later swept, but NapiFinalizerTasks queued by the previous file (and weak handle callbacks that fire on subsequent sweeps) still hold NapiEnvs pointing at the dead global. Opening a handle scope through that env writes a barrier on a swept cell — debug builds assert isMarked(cell) in Heap::addToRememberedSet, release builds segfault at 0x68/0xD0. Move the old global's NapiEnvs to the new global and repoint m_globalObject during Zig__GlobalObject__createForTestIsolation so every env->globalObject() caller (NapiHandleScope, Finalizer.run, napi_async_work, ThreadSafeFunction) sees a live, gcProtect()'d global. Fixes #30205
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughWhen running tests with ChangesN-API Environment Repointing for Test Isolation
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 0/5 reviews remaining, refill in 50 minutes and 14 seconds. Comment |
|
Updated 8:52 PM PT - May 3rd, 2026
❌ @robobun, your commit 4494b6b has 3 failures in
🧪 To try this PR locally: bunx bun-pr 30214That installs a local version of the PR into your bun-30214 --bun |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/bindings/napi.h`:
- Around line 383-388: Add a guard and restrict visibility for the env
retargeting: make inline void setGlobalObject(Zig::GlobalObject* globalObject) a
non-public mutator (move it out of the public section) and inside it assert that
globalObject->vm() (or equivalent accessor) equals m_vm to enforce the invariant
that m_vm is fixed at construction; document that
GlobalObject::adoptNapiEnvsForTestIsolation() is the only intended caller so
callers must also update ownership bookkeeping when retargeting NapiEnvs.
In `@test/regression/issue/30205.test.ts`:
- Around line 80-90: Replace the current separate assertions that read
stdout/stderr then check exitCode with a single, combined assertion that awaits
stdout, stderr and proc.exited together and asserts the child result object
(including exitCode and signalCode) at once; specifically, change the
Promise.all call to collect proc.stdout.text(), proc.stderr.text(), and
proc.exited into variables (e.g. stdout, stderr, exited) and then assert an
object containing stdout, stderr, exitCode (from exited) and signalCode (from
exited) instead of asserting streams first and exitCode separately so
crashes/segfaults produce actionable diffs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9cf4346a-739c-43a2-9089-42c5523dfba5
📒 Files selected for processing (4)
src/bun.js/bindings/ZigGlobalObject.cppsrc/bun.js/bindings/ZigGlobalObject.hsrc/bun.js/bindings/napi.htest/regression/issue/30205.test.ts
|
Superseded by #30216, which has the same
Happy to fold anything from here into #30216 instead if preferred. |
… setGlobalObject The addon's printf() output is fully-buffered on a pipe and may or may not be flushed before the isolate runner exits depending on build flavour; release-asan CI saw only the banner on stdout. The crash is fully detected by stderr lacking the pass summary + non-zero exit, so assert those together with signalCode for an actionable diff and drop the stdout capture entirely. Also assert &getVM(global) == &m_vm in NapiEnv::setGlobalObject so future callers can't retarget an env across VMs.
CI ASAN sets BUN_JSC_validateExceptionChecks=1 on the outer test; bunEnv spreads process.env so the inner bun test --isolate inherited it and aborted on a pre-existing unchecked-exception scope in NAPI module init (napi_create_function -> napi_set_named_property), unrelated to the global-swap UAF under test. Strip it from the child env like the other NAPI tests in no-validate-exceptions.txt.
…worker panic, never retry (#30216) ## What `bun test --isolate` / `--parallel` crashes when a test file loads a native addon whose deferred napi finalizers outlive the file. The `--parallel` coordinator then silently retries the file once, which masks the panic and lets the run exit 0. Fixes #30205, #30191. Supersedes #30214 (same NapiEnv fix, but without the coordinator change, the `cleanup_hooks` retarget, or a test that actually reproduces on unpatched `main`). ## Reproduction ```sh git clone https://github.com/workglow-dev/libs && cd libs bun i && bun run build:packages bun test --timeout=30000 --parallel=4 packages/test/src/test/{util,task}/*.test.ts ``` On `main` (d484fd6), 3–4 workers crash per run with either ``` ASSERTION FAILED: isMarked(cell) JavaScriptCore/heap/Heap.cpp:1232 : void JSC::Heap::addToRememberedSet(const JSCell *) ``` or (when the slot is already being reallocated) ``` ASSERTION FAILED: m_cellState == CellState::DefinitelyWhite JavaScriptCore/JSCellInlines.h:69 : JSC::JSCell::JSCell(VM &, Structure *) ``` and in release builds the segfaults at `0x68` / `0xD0` reported in #30205. ## Root cause Frame-pointer walk from the assertion: ``` #3 Bun::NapiHandleScope::open(Zig::GlobalObject*, bool) #4 NapiHandleScope__open #6 napi.Finalizer.run #7 napi.NapiFinalizerTask.runOnJSThread #10 event_loop.tick #11 event_loop.waitForPromise #13 VirtualMachine.loadEntryPointForTestRunner ← next test file ``` `NapiEnv::m_globalObject` is a raw `Zig::GlobalObject*`. For non-experimental addons (`nm_version != NAPI_VERSION_EXPERIMENTAL`, which is ~every real-world addon — sharp, better-sqlite3, etc.), `napi_wrap`/`napi_create_external` finalizers are **deferred** to the event loop as `NapiFinalizerTask` rather than run inside GC sweep. Objects rooted on the old global (module graph, `globalThis.*`) only become collectable when `Zig__GlobalObject__createForTestIsolation` runs `gcUnprotect(oldGlobal)`. The `DeferGC` from #29573 ends at that function's `}`, so the next GC runs there, collects those objects, and enqueues their finalizers. Those tasks then run on the very next `eventLoop().tick()` — inside `loadEntryPointForTestRunner`'s `waitForPromise` for file N+1. `Finalizer.run` opens a `NapiHandleScope` via `env->globalObject()`, which reads `NapiHandleScopeImplStructure()` off the dead cell and writes `m_currentNapiHandleScopeImpl` on it → write barrier on an unmarked cell. The `--parallel` coordinator's `reapWorker` then re-queued the file once (`retries[idx] < 1`) into a fresh worker with no stale `NapiEnv`, which passed — so the run reported 0 fail despite multiple Bun panics in the log. ## Fix **NapiEnv retarget** (`ZigGlobalObject.cpp`, `napi.h`): `Zig__GlobalObject__createForTestIsolation` now calls `newGlobal->adoptNapiEnvsForTestIsolation(oldGlobal)` before `gcUnprotect`. Each `NapiEnv::m_globalObject` is repointed at the new global and the `Ref<NapiEnv>`s are moved over, so late finalizers open handle scopes on a live global and the envs stay owned after the old global is swept. `VirtualMachine.swapGlobalForTestIsolation` also repoints `rare_data.cleanup_hooks[*].globalThis` so `CleanupHook.eql()` stays accurate. **No retry, abort on panic** (`Coordinator.zig`): removed the per-file retry. A worker that dies mid-file is counted as one failure. If it died by a fatal signal (SIGILL/SIGTRAP/SIGABRT/SIGBUS/SIGFPE/SIGSEGV/SIGSYS — Bun's own `@trap()`, a JSC/WTF assertion, or native-addon crash), the whole run aborts with `error: a test worker process crashed with <SIG> while running <file>`. `process.exit()` / SIGKILL are still just a per-file failure and the run continues. ## Verification - `test/regression/issue/30205.test.ts` — 4 tests. Adds a tiny non-experimental addon (`isolate_finalizer_addon.c`) and a fixture pattern (`Bun.gc(true)` + module-scope `await 0` + objects rooted on `globalThis`) that crashes **8/8** on unpatched `main` and passes 8/8 with this change. - `workglow-dev/libs` full 201-file unit suite: 3× clean `--parallel=4` runs (was 3–4 crashes/run). - Gate: `git stash -- src/ && bun bd test test/regression/issue/30205.test.ts` → 3/4 fail; with fix → 4/4 pass. - `test/cli/test/isolation.test.ts`, `test/regression/issue/29519.test.ts` → pass (one pre-existing unrelated timeout in isolation.test.ts, same as #29573). - `test/cli/test/parallel.test.ts` → all tests I touched pass; the 3 timing-sensitive scale-up/work-steal tests that fail in this container fail identically on unmodified `main`. --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…worker panic, never retry (oven-sh#30216) ## What `bun test --isolate` / `--parallel` crashes when a test file loads a native addon whose deferred napi finalizers outlive the file. The `--parallel` coordinator then silently retries the file once, which masks the panic and lets the run exit 0. Fixes oven-sh#30205, oven-sh#30191. Supersedes oven-sh#30214 (same NapiEnv fix, but without the coordinator change, the `cleanup_hooks` retarget, or a test that actually reproduces on unpatched `main`). ## Reproduction ```sh git clone https://github.com/workglow-dev/libs && cd libs bun i && bun run build:packages bun test --timeout=30000 --parallel=4 packages/test/src/test/{util,task}/*.test.ts ``` On `main` (d484fd6), 3–4 workers crash per run with either ``` ASSERTION FAILED: isMarked(cell) JavaScriptCore/heap/Heap.cpp:1232 : void JSC::Heap::addToRememberedSet(const JSCell *) ``` or (when the slot is already being reallocated) ``` ASSERTION FAILED: m_cellState == CellState::DefinitelyWhite JavaScriptCore/JSCellInlines.h:69 : JSC::JSCell::JSCell(VM &, Structure *) ``` and in release builds the segfaults at `0x68` / `0xD0` reported in oven-sh#30205. ## Root cause Frame-pointer walk from the assertion: ``` oven-sh#3 Bun::NapiHandleScope::open(Zig::GlobalObject*, bool) oven-sh#4 NapiHandleScope__open oven-sh#6 napi.Finalizer.run oven-sh#7 napi.NapiFinalizerTask.runOnJSThread oven-sh#10 event_loop.tick oven-sh#11 event_loop.waitForPromise oven-sh#13 VirtualMachine.loadEntryPointForTestRunner ← next test file ``` `NapiEnv::m_globalObject` is a raw `Zig::GlobalObject*`. For non-experimental addons (`nm_version != NAPI_VERSION_EXPERIMENTAL`, which is ~every real-world addon — sharp, better-sqlite3, etc.), `napi_wrap`/`napi_create_external` finalizers are **deferred** to the event loop as `NapiFinalizerTask` rather than run inside GC sweep. Objects rooted on the old global (module graph, `globalThis.*`) only become collectable when `Zig__GlobalObject__createForTestIsolation` runs `gcUnprotect(oldGlobal)`. The `DeferGC` from oven-sh#29573 ends at that function's `}`, so the next GC runs there, collects those objects, and enqueues their finalizers. Those tasks then run on the very next `eventLoop().tick()` — inside `loadEntryPointForTestRunner`'s `waitForPromise` for file N+1. `Finalizer.run` opens a `NapiHandleScope` via `env->globalObject()`, which reads `NapiHandleScopeImplStructure()` off the dead cell and writes `m_currentNapiHandleScopeImpl` on it → write barrier on an unmarked cell. The `--parallel` coordinator's `reapWorker` then re-queued the file once (`retries[idx] < 1`) into a fresh worker with no stale `NapiEnv`, which passed — so the run reported 0 fail despite multiple Bun panics in the log. ## Fix **NapiEnv retarget** (`ZigGlobalObject.cpp`, `napi.h`): `Zig__GlobalObject__createForTestIsolation` now calls `newGlobal->adoptNapiEnvsForTestIsolation(oldGlobal)` before `gcUnprotect`. Each `NapiEnv::m_globalObject` is repointed at the new global and the `Ref<NapiEnv>`s are moved over, so late finalizers open handle scopes on a live global and the envs stay owned after the old global is swept. `VirtualMachine.swapGlobalForTestIsolation` also repoints `rare_data.cleanup_hooks[*].globalThis` so `CleanupHook.eql()` stays accurate. **No retry, abort on panic** (`Coordinator.zig`): removed the per-file retry. A worker that dies mid-file is counted as one failure. If it died by a fatal signal (SIGILL/SIGTRAP/SIGABRT/SIGBUS/SIGFPE/SIGSEGV/SIGSYS — Bun's own `@trap()`, a JSC/WTF assertion, or native-addon crash), the whole run aborts with `error: a test worker process crashed with <SIG> while running <file>`. `process.exit()` / SIGKILL are still just a per-file failure and the run continues. ## Verification - `test/regression/issue/30205.test.ts` — 4 tests. Adds a tiny non-experimental addon (`isolate_finalizer_addon.c`) and a fixture pattern (`Bun.gc(true)` + module-scope `await 0` + objects rooted on `globalThis`) that crashes **8/8** on unpatched `main` and passes 8/8 with this change. - `workglow-dev/libs` full 201-file unit suite: 3× clean `--parallel=4` runs (was 3–4 crashes/run). - Gate: `git stash -- src/ && bun bd test test/regression/issue/30205.test.ts` → 3/4 fail; with fix → 4/4 pass. - `test/cli/test/isolation.test.ts`, `test/regression/issue/29519.test.ts` → pass (one pre-existing unrelated timeout in isolation.test.ts, same as oven-sh#29573). - `test/cli/test/parallel.test.ts` → all tests I touched pass; the 3 timing-sensitive scale-up/work-steal tests that fail in this container fail identically on unmodified `main`. --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Fixes #30205 (also the remaining half of #30191).
Repro
Any test suite that loads a NAPI module under
bun test --isolate(or--parallel, which implies--isolateper worker). Minimal:Release builds segfault at
0x68/0xD0/0xE00000020; debug builds assertisMarked(cell)inJSC::Heap::addToRememberedSet. The reporter'sworkglow-dev/libssuite hits it on everybun run test:bun:unit(the panics are masked byretry = 1so the exit code stays 0).Root cause
NapiEnv::m_globalObjectis a rawZig::GlobalObject*.Zig__GlobalObject__createForTestIsolationcreates a new global andgcUnprotect()s the old one, but never touches theNapiEnvs the old global created. Those envs are kept alive byNapiEnv::Refheld in queuedNapiFinalizerTasks (and by weak-handle owners that firenapi_internal_enqueue_finalizerwhen the old graph is swept on a later GC). When the deferred finalizer runs on the event loop:Every
process_dlopenin theFeatures:line of the reporter's panics is the fingerprint.This is distinct from #29519 / PR #29573: that
DeferGCprotects the new global duringfinishCreation; this is the old global being written to long after the swap, via a raw pointer the swap never updated.Fix
During the swap, move
oldGlobal->m_napiEnvsinto the new global and set each env'sm_globalObjectto the new global. This is done inside the existingDeferGCscope, beforegcUnprotect(oldGlobal), so the envs never observe an unprotected pointer. The new global keeps them alive and reports their pending finalizers viahasNapiFinalizers(); everyenv->globalObject()caller (NapiHandleScope,Finalizer.run,napi_async_work::runFromJS,ThreadSafeFunction,toJS(napi_env)) now sees a live,gcProtect()'d global.Verification
New regression test in
test/regression/issue/30205.test.tsspawnsbun test --isolateover 20 files that eachrequirethe existingtest/napi/napi-app/async_finalize_addon(NAPI v8, deferred finalizers) and wrap 1000 objects with Buffer ballast rooted onglobalThis, underBUN_JSC_collectContinuously=1.ASSERTION FAILED: isMarked(cell)— 10/10panic: Segmentation fault at address 0x68— 8/8The reporter's
sharprepro (20 files, nocollectContinuously) also passes 3/3 after the fix and 3/3 under--parallel=4 --retry=1 --max-concurrency=20.