Use visitor.vm() in GlobalObject::visitChildren; make HTTPHeaderIdentifiers eager#32407
Conversation
…TPHeaderIdentifiers Zig::GlobalObject::visitChildrenImpl runs on concurrent GC helper threads and reached JSVMClientData via clientData(thisObject->vm()). That path dereferences JSGlobalObject::m_vm (a raw VM* const stored on the cell), then does a RELEASE_ASSERT downcast that calls the virtual isWebCoreJSClientData() on vm.clientData. If the cell was picked up by conservative scan while its IsoSubspace slot is mid-recycle (the same window the neighbouring visitGlobalObjectMember(unique_ptr) null-guard covers), the m_vm read can resolve to garbage and the vcall faults inside TypeCastTraits<JSVMClientData>::isType. Production crash reports show this signature on Windows x64 under worker spawn/terminate churn. Use visitor.vm() instead: AbstractSlotVisitor::vm() returns m_heap.vm(), which is guaranteed alive for the duration of marking and does not depend on the visited cell's state. This matches how JSC's own visitChildren implementations (FunctionExecutable, JSWeakObjectRef, Structure) obtain the VM on the marker thread. Separately, JSVMClientData::httpHeaderIdentifiers() lazily std::optional::emplace()'d on first call with no synchronization, and was reachable from both the mutator (NodeHTTP header assignment) and parallel GC marker threads. With multiple Zig::GlobalObject cells in one VM (ShadowRealm, test isolation, bake), distinct marker helpers can each visit a different global and enter the emplace concurrently. HTTPHeaderIdentifiers is now a plain eagerly-constructed member; its constructor only runs ~90 LazyProperty::initLater() calls (each a single tagged-pointer store), so there is nothing worth deferring.
|
Updated 12:38 PM PT - Jun 16th, 2026
❌ @robobun, your commit 32d5ce4 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 32407That installs a local version of the PR into your bun-32407 --bun |
|
Found 4 issues this PR may fix:
🤖 Generated with Claude Code |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughFixes a concurrent GC crash by changing ChangesConcurrent GC Race Fix
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`:
- Line 275: Remove the filterStderr function call from the assertion on stderr
in the broadcast-channel-worker-gc.test.ts file. Change
expect(filterStderr(stderr)).toBe("") to expect(stderr).toBe("") to directly
assert that stderr is empty, since the ASAN startup warning that filterStderr
was designed to filter out no longer occurs in subprocess tests after the fix
from PR `#30412`.
- Line 268: The env object in the broadcast-channel-worker-gc.test.ts test is
missing documentation explaining the purpose of the Malloc: "1" setting. Add a
comment above or inline with the env configuration that explains that Malloc:
"1" routes bmalloc through the system allocator to enable AddressSanitizer
(ASAN) detection of memory safety issues in sanitizer-enabled builds. Follow the
same documentation pattern used in similar tests like websocket-server.test.ts
to maintain consistency across the test suite.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fd099991-b765-46bf-8330-c9829e25d86d
📒 Files selected for processing (4)
src/jsc/bindings/BunClientData.cppsrc/jsc/bindings/BunClientData.hsrc/jsc/bindings/ZigGlobalObject.cpptest/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts
💤 Files with no reviewable changes (1)
- src/jsc/bindings/BunClientData.cpp
There was a problem hiding this comment.
I didn't find any issues, but this touches concurrent GC marking internals (visitChildrenImpl thread-safety, JSVMClientData member initialization order) with no deterministic fail-before repro — worth a human eye on the JSC reasoning.
Extended reasoning...
Overview
This PR addresses a Sentry crash (BUN-2V1E) in Zig::GlobalObject::visitChildrenImpl on concurrent GC helper threads. Two changes: (1) switch from thisObject->vm() to visitor.vm() when fetching client data during marking, and (2) make JSVMClientData::m_httpHeaderIdentifiers an eagerly-constructed member instead of a lazy std::optional to eliminate an unsynchronized emplace() reachable from both mutator and marker threads. A new stress test is added.
Security risks
None. This is internal GC/runtime correctness with no user-controlled input surface.
Level of scrutiny
High. Concurrent GC marking is one of the most subtle areas in the runtime. The reasoning is well-argued and matches JSC convention (visitor.vm() is indeed how upstream visitChildren implementations fetch the VM), and the unlocked std::optional::emplace() is a clear data race. However:
- The author explicitly states there is no fail-before reproduction on Linux/ASAN; correctness rests on inspection of the IsoSubspace recycling / conservative-scan window described in the neighbouring
visitGlobalObjectMember(unique_ptr)guard. - Eager construction of
HTTPHeaderIdentifiersmoves ~90LazyProperty::initLatercalls into theJSVMClientDataconstructor and changes member declaration position. I checked that the ctor takes no VM/heap dependency (just stores tagged function pointers) and that declaration order keeps it afterm_heapData/ subspaces, so destruction ordering looks unaffected — but member-ordering in this class is already load-bearing per the existing comments, so a second pair of eyes is warranted. - The added
.template visit<Visitor>qualifier is a correct dependent-name fix but is a syntactic change in a hot template path.
Other factors
No bugs flagged by the automated review. CI build is still in progress. No prior human reviews on the thread. Given the criticality of concurrent GC code paths and the absence of a deterministic repro, deferring to a human reviewer familiar with JSC GC internals.
bmalloc's SystemHeap is unimplemented on Windows and setting Malloc=1 there hits a RELEASE_BASSERT. Windows has no ASAN lane anyway, so the env var only helps on the POSIX ASAN builds.
There was a problem hiding this comment.
I didn't find any issues — the visitor.vm() switch and eager HTTPHeaderIdentifiers construction both look correct on inspection — but this touches concurrent GC marking and IsoSubspace-recycle semantics without a deterministic repro, so it's worth a human pass from someone who knows JSC's marker-thread invariants.
Extended reasoning...
Overview
This PR addresses a Sentry-reported segfault (TypeCastTraits<JSVMClientData>::isType) in Zig::GlobalObject::visitChildrenImpl on concurrent GC helper threads. It makes three changes:
src/jsc/bindings/ZigGlobalObject.cpp: replacesthisObject->vm()withvisitor.vm()when fetchingclientDataduring marking, and adds the.templatekeyword on the dependentvisit<Visitor>call.src/jsc/bindings/BunClientData.{h,cpp}: convertsm_httpHeaderIdentifiersfrom a lazily-emplacedstd::optionalto an eagerly-constructed plain member, inlining the accessor and removing the out-of-line lazy init.test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts: adds a stress test (ShadowRealm globals + worker churn +numberOfGCMarkers=8+Malloc=1on non-Windows) that the author explicitly notes is a guard rather than a deterministic fail-before proof.
Security risks
None. This is a memory-safety/concurrency fix in internal GC bindings; no auth, crypto, input parsing, or external attack surface is touched.
Level of scrutiny
High. visitChildrenImpl runs on parallel GC marker threads and is on the hot path for every collection. The reasoning about conservative-scan-during-IsoSubspace-recycle reading a stale JSGlobalObject::m_vm is plausible and consistent with the existing visitGlobalObjectMember(unique_ptr) guard just above it, and visitor.vm() is indeed the JSC convention for marker-thread VM access. Likewise, the unlocked std::optional::emplace() reachable from both mutator and parallel markers is a textbook data race, and HTTPHeaderIdentifiers() only runs ~90 LazyProperty::initLater pointer stores, so eager construction is cheap. But the PR explicitly states there is no fail-before reproduction — the fix is argued by inspection. Concurrent-GC correctness arguments of this kind benefit from review by someone who owns the JSC/heap integration.
Other factors
- The bug-hunting system found no issues.
- CodeRabbit's two nits (document
Malloc=1, dropfilterStderr) were addressed in 1d66826 (the first fully, the second declined —filterStderris still used, consistent with the other tests in the same file). - The musl build failure in CI is an unrelated LTO/linker infra issue, not caused by this diff.
HTTPHeaderIdentifiershas a no-arg default constructor that only callsinitLater, so moving it to eager member construction inJSVMClientDatadoesn't introduce ordering hazards.
Given the subtlety of concurrent-marker invariants and the absence of a deterministic repro, I'm deferring rather than approving.
…e-max.test.ts timeouts, also seen on unrelated builds 62768/62763)
|
CI status: the diff itself is green. Both build 62826 and the re-roll 62837 fail only on two darwin-14-x64 timeouts ( Ready for maintainer review; the remaining red is unrelated to this change. |
Crash
Sentry BUN-2V1E: segfault inside
WTF::TypeCastTraits<JSVMClientData>::isTypereached fromZig::GlobalObject::visitChildrenImplon a concurrent GC helper thread. 695 lifetime events (26 in the last 24h), 100% Windows x64, 1.2.17 through 1.3.14. 31% of events carry bothworkers_spawned=Trueandworkers_terminated=Truevs a ~3% baseline, pointing at worker-termination churn. Also seen intermittently in CI as thebroadcast-channel-worker-gcflake (b03f1e6 is a rekick for it).Cause
visitChildrenImplran:Two problems on this line:
1.
thisObject->vm()dereferences cell state on the marker thread.JSGlobalObject::vm()returns*m_vm(a rawVM* conststored on the cell);clientData()then doesdowncast<JSVMClientData>(vm.clientData)whoseRELEASE_ASSERT(!source || is<Target>(*source))calls the virtualisWebCoreJSClientData(). The neighbouringvisitGlobalObjectMember(unique_ptr)overload already guards a window where the concurrent marker visits aZig::GlobalObjectpicked up via conservative stack scan while its IsoSubspace slot is being recycled; in that same windowm_vmcan read stale bytes, resolving to a garbageclientDatawhose vtable load faults.visitor.vm()(=m_heap.vm()) is guaranteed alive for the duration of marking and does not depend on the visited cell at all; this is how JSC's ownvisitChildrenimplementations (FunctionExecutable,JSWeakObjectRef,Structure) fetch the VM on the marker thread.2.
httpHeaderIdentifiers()was an unlocked lazystd::optional::emplace()called from both the mutator (NodeHTTP.cppheader assignment) and concurrent GC helper threads. With more than oneZig::GlobalObjectin a VM (ShadowRealm, test-isolation swap, bake) distinct parallel marker helpers each visit a different global and all callhttpHeaderIdentifiers()on the sameJSVMClientData, so two threads can enteremplace()on the same storage. TheHTTPHeaderIdentifiersconstructor only runs ~90LazyProperty::initLater()calls (each a single tagged-pointer store), so there is nothing worth deferring.Fix
ZigGlobalObject.cpp: fetch the VM viavisitor.vm()instead ofthisObject->vm().BunClientData.{h,cpp}:m_httpHeaderIdentifiersis now a plain eagerly-constructed member;httpHeaderIdentifiers()is an inline accessor.Verification
The race window is too narrow to trip deterministically on Linux. An honest probe against the unfixed debug (ASAN) build, with
Malloc=1+BUN_JSC_collectContinuously=1+BUN_JSC_numberOfGCMarkers=8:So there is no fail-before proof to hand the gate; the crash signature is Windows-specific and timing-dependent. The fix is nonetheless clearly correct on inspection:
visitor.vm()is the JSC convention for the marker thread and cannot read through the visited cell.std::optional::emplace()reachable from two threads is a data race in any memory model.A new stress test in
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.tshammers the exact path (multiple globals per VM via ShadowRealm, worker churn, forced parallel markers,Malloc=1on non-Windows) so a future regression on Windows CI will show up where the signature has already been observed.Related
Checked #31990 / #32071 / #32082 (worker event-loop enqueue after terminate, Strong<> releases before VM teardown): none touch
visitChildrenImplorvm.clientDataaccess from the marker thread. No open PR addresses this crash.The issue-matcher suggested four candidates; assessment against the actual stack:
TypeCastTraits<JSVMClientData>::isTypeframe but reached frombunVMConcurrentlyon the main event loop during libuv signal processing, not from a GC marker thread. Different code path; this PR does not touch it.isTypeframe reached fromJSC::subspaceForinsideRequest__createon the HTTP server request path (main thread). Different code path; this PR does not touch it.bun teston Linux x64 (1.3.7 and 1.3.9) #27312: SIGILL (not SEGV) inSlotVisitor::drainon Linux duringbun testcleanup. Adjacent area but a different fault signature; not claimed.None are auto-closed by this PR. #20641 and #20786 suggest there may be other callers of
clientData()that can see a badvm.clientDataon Windows; those are separate paths and out of scope here.