Use visitor.vm() in GlobalObject::visitChildren; make HTTPHeaderIdentifiers eager by robobun · Pull Request #32407 · oven-sh/bun

robobun · 2026-06-16T16:45:55Z

Crash

Sentry BUN-2V1E: segfault inside WTF::TypeCastTraits<JSVMClientData>::isType reached from Zig::GlobalObject::visitChildrenImpl on a concurrent GC helper thread. 695 lifetime events (26 in the last 24h), 100% Windows x64, 1.2.17 through 1.3.14. 31% of events carry both workers_spawned=True and workers_terminated=True vs a ~3% baseline, pointing at worker-termination churn. Also seen intermittently in CI as the broadcast-channel-worker-gc flake (b03f1e6 is a rekick for it).

WTF::ParallelHelperPool::Thread::work
JSC::Heap::runBeginPhase lambda
JSC::SlotVisitor::drainFromShared
JSC::SlotVisitor::drain
JSC::SlotVisitor::visitChildren
JSC::MethodTable::visitChildren
Zig::GlobalObject::visitChildren
Zig::GlobalObject::visitChildrenImpl
WebCore::clientData(JSC::VM&)
WTF::downcast<JSVMClientData>
WTF::is<JSVMClientData>
TypeCastTraits<JSVMClientData>::isType   <-- SEGV

Cause

visitChildrenImpl ran:

WebCore::clientData(thisObject->vm())->httpHeaderIdentifiers().visit<Visitor>(visitor);

Two problems on this line:

1. thisObject->vm() dereferences cell state on the marker thread. JSGlobalObject::vm() returns *m_vm (a raw VM* const stored on the cell); clientData() then does downcast<JSVMClientData>(vm.clientData) whose RELEASE_ASSERT(!source || is<Target>(*source)) calls the virtual isWebCoreJSClientData(). The neighbouring visitGlobalObjectMember(unique_ptr) overload already guards a window where the concurrent marker visits a Zig::GlobalObject picked up via conservative stack scan while its IsoSubspace slot is being recycled; in that same window m_vm can read stale bytes, resolving to a garbage clientData whose vtable load faults. visitor.vm() (= m_heap.vm()) is guaranteed alive for the duration of marking and does not depend on the visited cell at all; this is how JSC's own visitChildren implementations (FunctionExecutable, JSWeakObjectRef, Structure) fetch the VM on the marker thread.

2. httpHeaderIdentifiers() was an unlocked lazy std::optional::emplace() called from both the mutator (NodeHTTP.cpp header assignment) and concurrent GC helper threads. With more than one Zig::GlobalObject in a VM (ShadowRealm, test-isolation swap, bake) distinct parallel marker helpers each visit a different global and all call httpHeaderIdentifiers() on the same JSVMClientData, so two threads can enter emplace() on the same storage. The HTTPHeaderIdentifiers constructor only runs ~90 LazyProperty::initLater() calls (each a single tagged-pointer store), so there is nothing worth deferring.
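
As a rough illustration of the first point, the sketch below contrasts the two ways of reaching the VM. Every type here (`VM`, `Heap`, `Cell`, `Visitor`) and the helper `vmForMarking` is a hypothetical stand-in, not JSC's real class or API; the only thing it is meant to show is that the visitor's route never reads a field of the visited cell.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not JSC's real types.
struct VM { int id; };
struct Heap { VM& vm; };

struct Cell {
    VM* m_vm; // stored on the cell; may hold stale bytes while the slot is recycled
};

struct Visitor {
    Heap& heap;
    // Reaches the VM through the heap, so it never touches a visited cell.
    VM& vm() const { return heap.vm; }
};

// Marker-thread pattern: the cell argument is deliberately ignored.
inline VM& vmForMarking(const Visitor& visitor, const Cell&)
{
    return visitor.vm();
}
```

Under these assumptions, `vmForMarking` returns the live VM even when the cell's own `m_vm` field is garbage, which is the property the fix relies on.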

Fix

ZigGlobalObject.cpp: fetch the VM via visitor.vm() instead of thisObject->vm().
BunClientData.{h,cpp}: m_httpHeaderIdentifiers is now a plain eagerly-constructed member; httpHeaderIdentifiers() is an inline accessor.
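
The shape of the second change can be sketched as follows. `Identifiers`, `LazyClientData`, and `EagerClientData` are hypothetical stand-ins, not the real `HTTPHeaderIdentifiers` or `JSVMClientData`, and the `count` field exists only to give the test something to observe.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the identifiers table.
struct Identifiers { int count = 90; };

// Before (sketch): lazy and unsynchronized. Two threads calling get()
// at the same time can both see an empty optional and both emplace().
struct LazyClientData {
    std::optional<Identifiers> m_ids;
    Identifiers& get()
    {
        if (!m_ids)
            m_ids.emplace();
        return *m_ids;
    }
};

// After (sketch): constructed together with its owner, so get() is a
// plain read with no first-call initialization to race on.
struct EagerClientData {
    Identifiers m_ids;
    Identifiers& get() { return m_ids; }
};
```

The eager form trades a small amount of up-front construction work for an accessor that does no writes, which is why it is safe to call from several threads once the owner is fully constructed.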

Verification

The race window is too narrow to trip deterministically on Linux. An honest probe against the unfixed debug (ASAN) build, with Malloc=1 + BUN_JSC_collectContinuously=1 + BUN_JSC_numberOfGCMarkers=8:

5 iterations of an 8-round × 6-worker BroadcastChannel create/terminate/GC stress: clean.
8 iterations of a 100-round × 8-ShadowRealm (parallel-marker emplace) stress: clean.

So there is no fail-before proof to hand the gate; the crash signature is Windows-specific and timing-dependent. The fix is nonetheless clearly correct on inspection:

visitor.vm() is the JSC convention for the marker thread and cannot read through the visited cell.
An unlocked std::optional::emplace() reachable from two threads is a data race in any memory model.

A new stress test in test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts hammers the exact path (multiple globals per VM via ShadowRealm, worker churn, forced parallel markers, Malloc=1 on non-Windows) so a future regression on Windows CI will show up where the signature has already been observed.

bun bd test test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts   # 4 pass
bun bd test test/js/node/http/node-http.test.ts -t headers                     # 5 pass (HTTPHeaderIdentifiers path)
bun bd test test/js/node/http/numeric-header.test.ts                           # 1 pass

Related

bun crash #20641 (BUN-N2D): same TypeCastTraits<JSVMClientData>::isType frame but reached from bunVMConcurrently on the main event loop during libuv signal processing, not from a GC marker thread. Different code path; this PR does not touch it.
Bun crashes when...? #20786 (BUN-PD8): same isType frame reached from JSC::subspaceFor inside Request__create on the HTTP server request path (main thread). Different code path; this PR does not touch it.
Flaky SIGILL crash in JSC garbage collector during bun test on Linux x64 (1.3.7 and 1.3.9) #27312: SIGILL (not SEGV) in SlotVisitor::drain on Linux during bun test cleanup. Adjacent area but a different fault signature; not claimed.
panic: Segmentation fault at address 0xD — "multiple threads are crashing" under Worker spawn/terminate churn (1.3.14, long-running server) #31880: generic "multiple threads are crashing" under worker churn, no decoded stack. Don't enqueue to a terminated worker's freed event loop from other threads #32071 already declines to claim it for the same reason; not claimed here either.

None are auto-closed by this PR. #20641 and #20786 suggest there may be other callers of clientData() that can see a bad vm.clientData on Windows; those are separate paths and out of scope here.

…TPHeaderIdentifiers

Zig::GlobalObject::visitChildrenImpl runs on concurrent GC helper threads and reached JSVMClientData via clientData(thisObject->vm()). That path dereferences JSGlobalObject::m_vm (a raw VM* const stored on the cell), then does a RELEASE_ASSERT downcast that calls the virtual isWebCoreJSClientData() on vm.clientData. If the cell was picked up by conservative scan while its IsoSubspace slot is mid-recycle (the same window the neighbouring visitGlobalObjectMember(unique_ptr) null-guard covers), the m_vm read can resolve to garbage and the vcall faults inside TypeCastTraits<JSVMClientData>::isType. Production crash reports show this signature on Windows x64 under worker spawn/terminate churn.

Use visitor.vm() instead: AbstractSlotVisitor::vm() returns m_heap.vm(), which is guaranteed alive for the duration of marking and does not depend on the visited cell's state. This matches how JSC's own visitChildren implementations (FunctionExecutable, JSWeakObjectRef, Structure) obtain the VM on the marker thread.

Separately, JSVMClientData::httpHeaderIdentifiers() lazily std::optional::emplace()'d on first call with no synchronization, and was reachable from both the mutator (NodeHTTP header assignment) and parallel GC marker threads. With multiple Zig::GlobalObject cells in one VM (ShadowRealm, test isolation, bake), distinct marker helpers can each visit a different global and enter the emplace concurrently. HTTPHeaderIdentifiers is now a plain eagerly-constructed member; its constructor only runs ~90 LazyProperty::initLater() calls (each a single tagged-pointer store), so there is nothing worth deferring.

robobun · 2026-06-16T16:46:06Z

Updated 12:38 PM PT - Jun 16th, 2026

❌ @robobun, your commit 32d5ce4 has 2 failures in Build #62837 (All Failures):

test/js/node/http/node-http-backpressure-max.test.ts - code 1 on 🍎 14 x64
test/js/bun/terminal/terminal.test.ts - code 1 on 🍎 14 x64

🧪 To try this PR locally:

bunx bun-pr 32407

That installs a local version of the PR into your bun-32407 executable, so you can run:

bun-32407 --bun

github-actions · 2026-06-16T16:51:28Z

Found 4 issues this PR may fix:

bun crash #20641 - Stack trace shows crash at exactly BunClientData.h:141 in WTF::TypeCastTraits<JSVMClientData>::isType via bunVMConcurrently — the exact concurrent GC visitor code path fixed by the thisObject->vm() → visitor.vm() change
Bun crashes when...? #20786 - Stack trace shows the same TypeCastTraits<JSVMClientData>::isType crash at BunClientData.h:141 via JSCell::subspaceFor, triggered by stale m_vm pointer access during GC marking
Flaky SIGILL crash in JSC garbage collector during bun test on Linux x64 (1.3.7 and 1.3.9) #27312 - Flaky SIGILL crash in SlotVisitor::drain on a GC marker thread (ParallelHelperPool::Thread::work), consistent with concurrent GC marker reading stale m_vm bytes via thisObject->vm() in visitChildrenImpl
panic: Segmentation fault at address 0xD — "multiple threads are crashing" under Worker spawn/terminate churn (1.3.14, long-running server) #31880 - Segfault under worker spawn/terminate churn with "multiple threads are crashing" — the exact scenario where recycled IsoSubspace slots cause thisObject->vm() to read stale bytes on the GC marker thread

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #20641
Fixes #20786
Fixes #27312
Fixes #31880

🤖 Generated with Claude Code

coderabbitai · 2026-06-16T16:53:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eb70a9fe-820a-41f6-8b1f-bd4fd075a8aa

📥 Commits

Reviewing files that changed from the base of the PR and between bd0fbbd and 32d5ce4.

📒 Files selected for processing (1)

test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

Walkthrough

Fixes a concurrent GC crash by changing HTTPHeaderIdentifiers storage in JSVMClientData from lazy std::optional to an eagerly-constructed value, inlining the accessor, removing the out-of-line implementation, and updating GlobalObject::visitChildrenImpl to obtain the VM from visitor.vm() rather than thisObject->vm(). A regression stress test is added.

Changes

Concurrent GC Race Fix

Layer / File(s)	Summary
Eager `HTTPHeaderIdentifiers` initialization `src/jsc/bindings/BunClientData.h`, `src/jsc/bindings/BunClientData.cpp`	Changes the backing member from `std::optional<WebCore::HTTPHeaderIdentifiers>` to a direct value, inlines the `httpHeaderIdentifiers()` accessor with comments explaining why eager construction is required, and removes the now-dead lazy-init implementation from the `.cpp` file.
GC visitor uses `visitor.vm()` instead of `thisObject->vm()` `src/jsc/bindings/ZigGlobalObject.cpp`	In `GlobalObject::visitChildrenImpl`, replaces `WebCore::clientData(thisObject->vm())` with `WebCore::clientData(visitor.vm())` to prevent the concurrent GC marker from dereferencing a potentially stale `JSGlobalObject::m_vm` pointer during conservative-scan or mid-recycle windows.
Regression stress test `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`	Adds imports for test configuration and a stress test that spawns worker VMs creating a `BroadcastChannel`, terminates them under parallel marking load (`BUN_JSC_numberOfGCMarkers=8`, extra `ShadowRealm` globals, allocation pressure), and asserts clean exit with no stderr output.

Suggested reviewers

Jarred-Sumner

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the two main changes: using visitor.vm() in GlobalObject::visitChildren and making HTTPHeaderIdentifiers eager.
Description check	✅ Passed	The PR description comprehensively covers the crash cause, the fix implementation, verification steps, and related issues. It exceeds the minimal template requirements with detailed technical context.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`:
- Line 275: Remove the filterStderr function call from the assertion on stderr
in the broadcast-channel-worker-gc.test.ts file. Change
expect(filterStderr(stderr)).toBe("") to expect(stderr).toBe("") to directly
assert that stderr is empty, since the ASAN startup warning that filterStderr
was designed to filter out no longer occurs in subprocess tests after the fix
from PR `#30412`.
- Line 268: The env object in the broadcast-channel-worker-gc.test.ts test is
missing documentation explaining the purpose of the Malloc: "1" setting. Add a
comment above or inline with the env configuration that explains that Malloc:
"1" routes bmalloc through the system allocator to enable AddressSanitizer
(ASAN) detection of memory safety issues in sanitizer-enabled builds. Follow the
same documentation pattern used in similar tests like websocket-server.test.ts
to maintain consistency across the test suite.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd099991-b765-46bf-8330-c9829e25d86d

📥 Commits

Reviewing files that changed from the base of the PR and between 78f0fff and bd0fbbd.

📒 Files selected for processing (4)

src/jsc/bindings/BunClientData.cpp
src/jsc/bindings/BunClientData.h
src/jsc/bindings/ZigGlobalObject.cpp
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

💤 Files with no reviewable changes (1)

src/jsc/bindings/BunClientData.cpp

claude

I didn't find any issues, but this touches concurrent GC marking internals (visitChildrenImpl thread-safety, JSVMClientData member initialization order) with no deterministic fail-before repro — worth a human eye on the JSC reasoning.

Extended reasoning...

Overview

This PR addresses a Sentry crash (BUN-2V1E) in Zig::GlobalObject::visitChildrenImpl on concurrent GC helper threads. Two changes: (1) switch from thisObject->vm() to visitor.vm() when fetching client data during marking, and (2) make JSVMClientData::m_httpHeaderIdentifiers an eagerly-constructed member instead of a lazy std::optional to eliminate an unsynchronized emplace() reachable from both mutator and marker threads. A new stress test is added.

Security risks

None. This is internal GC/runtime correctness with no user-controlled input surface.

Level of scrutiny

High. Concurrent GC marking is one of the most subtle areas in the runtime. The reasoning is well-argued and matches JSC convention (visitor.vm() is indeed how upstream visitChildren implementations fetch the VM), and the unlocked std::optional::emplace() is a clear data race. However:

The author explicitly states there is no fail-before reproduction on Linux/ASAN; correctness rests on inspection of the IsoSubspace recycling / conservative-scan window described in the neighbouring visitGlobalObjectMember(unique_ptr) guard.
Eager construction of HTTPHeaderIdentifiers moves ~90 LazyProperty::initLater calls into the JSVMClientData constructor and changes member declaration position. I checked that the ctor takes no VM/heap dependency (just stores tagged function pointers) and that declaration order keeps it after m_heapData / subspaces, so destruction ordering looks unaffected — but member-ordering in this class is already load-bearing per the existing comments, so a second pair of eyes is warranted.
The added .template visit<Visitor> qualifier is a correct dependent-name fix but is a syntactic change in a hot template path.
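
For readers unfamiliar with the `.template` qualifier mentioned in the last item, here is a minimal sketch of why it is needed. `Identifiers`, `Holder`, `CountingVisitor`, and `visitAll` are hypothetical names made up for this example and do not correspond to Bun's or JSC's real types.

```cpp
#include <cassert>

// Hypothetical type with a member function template, loosely mirroring
// the shape of a visit<Visitor>(visitor) call.
struct Identifiers {
    template<typename V> void visit(V& visitor) { visitor.count++; }
};

// Wrapper whose accessor returns a type that depends on T.
template<typename T> struct Holder {
    T value;
    T& ids() { return value; }
};

struct CountingVisitor { int count = 0; };

template<typename T, typename V>
void visitAll(Holder<T>& holder, V& visitor)
{
    // holder.ids() has a type that depends on T, so the compiler cannot know
    // that visit names a template; without the `template` keyword the `<`
    // would be parsed as a less-than operator.
    holder.ids().template visit<V>(visitor);
}
```

The keyword only affects parsing; the call that is made is the same one a non-dependent `ids().visit<V>(visitor)` would make.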

Other factors

No bugs flagged by the automated review. CI build is still in progress. No prior human reviews on the thread. Given the criticality of concurrent GC code paths and the absence of a deterministic repro, deferring to a human reviewer familiar with JSC GC internals.

claude

I didn't find any issues — the visitor.vm() switch and eager HTTPHeaderIdentifiers construction both look correct on inspection — but this touches concurrent GC marking and IsoSubspace-recycle semantics without a deterministic repro, so it's worth a human pass from someone who knows JSC's marker-thread invariants.

Extended reasoning...

Overview

This PR addresses a Sentry-reported segfault (TypeCastTraits<JSVMClientData>::isType) in Zig::GlobalObject::visitChildrenImpl on concurrent GC helper threads. It makes three changes:

src/jsc/bindings/ZigGlobalObject.cpp: replaces thisObject->vm() with visitor.vm() when fetching clientData during marking, and adds the .template keyword on the dependent visit<Visitor> call.
src/jsc/bindings/BunClientData.{h,cpp}: converts m_httpHeaderIdentifiers from a lazily-emplaced std::optional to an eagerly-constructed plain member, inlining the accessor and removing the out-of-line lazy init.
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts: adds a stress test (ShadowRealm globals + worker churn + numberOfGCMarkers=8 + Malloc=1 on non-Windows) that the author explicitly notes is a guard rather than a deterministic fail-before proof.

Security risks

None. This is a memory-safety/concurrency fix in internal GC bindings; no auth, crypto, input parsing, or external attack surface is touched.

Level of scrutiny

High. visitChildrenImpl runs on parallel GC marker threads and is on the hot path for every collection. The reasoning about conservative-scan-during-IsoSubspace-recycle reading a stale JSGlobalObject::m_vm is plausible and consistent with the existing visitGlobalObjectMember(unique_ptr) guard just above it, and visitor.vm() is indeed the JSC convention for marker-thread VM access. Likewise, the unlocked std::optional::emplace() reachable from both mutator and parallel markers is a textbook data race, and HTTPHeaderIdentifiers() only runs ~90 LazyProperty::initLater pointer stores, so eager construction is cheap. But the PR explicitly states there is no fail-before reproduction — the fix is argued by inspection. Concurrent-GC correctness arguments of this kind benefit from review by someone who owns the JSC/heap integration.

Other factors

The bug-hunting system found no issues.
CodeRabbit's two nits (document Malloc=1, drop filterStderr) were addressed in 1d66826 (the first fully, the second declined — filterStderr is still used, consistent with the other tests in the same file).
The musl build failure in CI is an unrelated LTO/linker infra issue, not caused by this diff.
HTTPHeaderIdentifiers has a no-arg default constructor that only calls initLater, so moving it to eager member construction in JSVMClientData doesn't introduce ordering hazards.

Given the subtlety of concurrent-marker invariants and the absence of a deterministic repro, I'm deferring rather than approving.

robobun · 2026-06-16T19:38:46Z

CI status: the diff itself is green. Both build 62826 and the re-roll 62837 fail only on two darwin-14-x64 timeouts (terminal.test.ts and node-http-backpressure-max.test.ts), neither of which touches GC marking, visitChildren, JSVMClientData, or HTTPHeaderIdentifiers. The identical pair also failed on unrelated PR build 62768, and node-http-backpressure-max on 62763, so this is a persistent lane flake. broadcast-channel-worker-gc.test.ts (the file this PR touches) passed on every lane in both runs.

Ready for maintainer review; the remaining red is unrelated to this change.

github-actions Bot added the claude label Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

claude Bot reviewed Jun 16, 2026

test: gate Malloc=1 to non-Windows and document it

1d66826

bmalloc's SystemHeap is unimplemented on Windows and setting Malloc=1 there hits a RELEASE_BASSERT. Windows has no ASAN lane anyway, so the env var only helps on the POSIX ASAN builds.

claude Bot reviewed Jun 16, 2026

ci: retrigger (darwin-14-x64 terminal.test.ts + node-http-backpressur…

32d5ce4

…e-max.test.ts timeouts, also seen on unrelated builds 62768/62763)

Jarred-Sumner merged commit b0fef2f into main Jun 16, 2026
75 of 77 checks passed

Jarred-Sumner deleted the farm/e4e823db/fix-visitchildren-clientdata-race branch June 16, 2026 20:48

This was referenced Jun 16, 2026

streams: detach ArrayBuffers supplied to byte stream reads and enqueues #32406

Open

Rename Zig-prefixed names in the JSC bindings to Bun equivalents #31822

Open

Jarred-Sumner merged 3 commits into main from farm/e4e823db/fix-visitchildren-clientdata-race