Fix BroadcastChannel channelToContextIdentifier locking and dispatchMessage lifetime by robobun · Pull Request #29441 · oven-sh/bun

robobun · 2026-04-18T10:06:04Z

What does this PR do?

Fixes two data races in BroadcastChannel.cpp that surface as ASAN heap-use-after-free in test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts.

Bug A — `channelToContextIdentifier` HashMap one-sided locking

The prior ThreadSafeWeakPtr fix only covered allBroadcastChannels(). The second global, channelToContextIdentifier(), has its own lock — but it was taken at only 1 of 4 call sites:

Site	Thread	Lock?
`registerChannel` `.add()`	main	❌
`unregisterChannel` `.remove()`	main	❌
`dispatchMessageTo` `.get()`	main	❌
`contextIdForBroadcastChannelId` `.get()`	worker (via `ensureOnContextThread` → `dispatchMessage`)	✅

When main rehashes the HashMap (add/remove during worker spawn/terminate) while a worker reads it, the worker walks a freed bucket array → ASAN heap-UAF inside WTF::HashTable. The accessor was also missing WTF_REQUIRES_LOCK, so -Wthread-safety never flagged this.

Fix: add Locker locker { channelToContextIdentifierLock }; at the three unlocked sites and annotate the accessor with WTF_REQUIRES_LOCK(channelToContextIdentifierLock) to match allBroadcastChannels().

Bug B — `dispatchMessage` captures raw `this` in async task

dispatchMessage posts a task with [this, message = ...] — raw this, no Ref { *this }. The caller (dispatchMessageTo's inner lambda) holds a strong RefPtr from the ThreadSafeWeakPtr lookup, but that ref is dropped when the outer lambda returns. During worker terminate the JS wrapper is destroyed → refcount 0 → ~BroadcastChannel → the queued task reads freed this->m_isClosed and calls this->dispatchEvent().

Fix: capture protectedThis = Ref { *this } in the postTaskTo lambda, matching the pattern in MessagePort.cpp, Performance.cpp, and WebSocket.cpp.

How did you verify your code works?

bun bd test test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts — 3/3 pass, verified stable across 3 consecutive runs under debug+ASAN
bun bd test test/js/web/broadcastchannel/broadcast-channel.test.ts — 10/11 pass; the one failure (broadcast channel worker wait) is pre-existing on main under debug+ASAN (it uses Bun.sleepSync(500) which isn't enough for an ASAN worker to start) and is unrelated to this change

Test changes

Added a stress test that churns channel registrations (forcing HashMap rehashes) while workers cross-post (reaching the worker-side map read), then terminates workers mid-dispatch (leaving queued tasks whose this would otherwise dangle).
Filtered the unconditional ASAN startup warning from child-process stderr so expect(stderr).toBe("") holds on ASAN builds — same pattern as fetch-abort-queued.test.ts / string-decoder.test.js.
Scaled timeouts for isDebug || isASAN — worker spawn under debug+ASAN is ~5–10× slower; the existing tests were borderline at the 5s default.

Note: both races are highly timing-dependent (a HashMap rehash must land mid-get()); 20 local ASAN runs on macOS did not repro before the fix. The new stress test maximises contention but is not guaranteed to fail without the fix on every platform.

…essage lifetime Two data races in BroadcastChannel.cpp that surface as ASAN heap-UAF in broadcast-channel-worker-gc.test.ts: (A) channelToContextIdentifier() HashMap was only locked on the worker- thread reader (contextIdForBroadcastChannelId), not on the three main-thread writers/readers (registerChannel .add, unregisterChannel .remove, dispatchMessageTo .get). When a main-thread .add() rehashes the table while a worker thread is mid-.get(), the worker dereferences a freed bucket array. Add Locker at all call sites and annotate the accessor with WTF_REQUIRES_LOCK so -Wthread-safety catches future regressions (matching allBroadcastChannels()). (B) dispatchMessage() posts a task capturing raw `this` with no protecting Ref. The caller holds a strong ref from the ThreadSafeWeakPtr lookup but drops it when the outer lambda returns, so the queued task can outlive the BroadcastChannel during worker terminate + GC. Capture protectedThis = Ref { *this } in the lambda, matching MessagePort.cpp / WebSocket.cpp. Also: - Add a stress test that churns registrations while workers cross-post, exercising the worker-side map read concurrently with main-side rehash. - Filter the unconditional ASAN startup warning from child-process stderr so the existing assertions hold under ASAN builds. - Scale test timeouts for debug/ASAN (worker spawn is much slower there).

robobun · 2026-04-18T10:06:13Z

^{Updated 5:15 AM PT - Apr 18th, 2026}

❌ @robobun, your commit 510f0f4 has 3 failures in Build #46260 (All Failures):

test/js/bun/net/socket.test.ts - code 1 on 🪟 2019 x64
test/bake/dev/stress.test.ts - code 1 on 🪟 2019 x64-baseline
[libuv] ../../../../vendor/libuv/src/unix/udp.c - clang warning on 🍎 aarch64 - build-cpp

🧪 To try this PR locally:

bunx bun-pr 29441

That installs a local version of the PR into your bun-29441 executable, so you can run:

bun-29441 --bun

coderabbitai · 2026-04-18T10:11:04Z

Walkthrough

Modified broadcast channel worker GC tests to introduce timeout logic based on build type and add stderr filtering to handle ASAN warnings. Updated existing subprocess-based tests and added a new regression test for concurrent register/dispatch/terminate BroadcastChannel operations.

Changes

Cohort / File(s)	Summary
Broadcast Channel Worker GC Tests `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`	Added conditional `timeout` based on build type (60s for debug/ASAN, 10s otherwise), introduced `filterStderr()` helper to validate stderr post-filtering, updated existing subprocess tests with timeout and filtering, and added new regression test for concurrent register/dispatch/terminate operations with coordinated main-thread and worker-thread BroadcastChannel activity.

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main fixes: addressing locking issues with channelToContextIdentifier and lifetime issues with dispatchMessage in BroadcastChannel, which aligns with the core changes documented in the PR.
Description check	✅ Passed	The description fully addresses the template requirements with detailed 'What does this PR do?' and 'How did you verify your code works?' sections, providing comprehensive technical details about both bugs, fixes, and verification approach.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`:
- Line 168: The hardcoded sleep line "await new Promise(resolve =>
setTimeout(resolve, 50));" makes the test flaky; instead have the worker/child
emit an explicit completion signal (e.g., postMessage or console.log a unique
token like "GC_DONE") when the queued cross-thread work finishes, and replace
the setTimeout with awaiting that signal in the test (e.g., listen for
worker.onmessage / BroadcastChannel.onmessage or
childProcess.stdout/once('message') for the unique token), then remove the
timeout-based wait.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3cc0d175-33eb-456e-b751-c6872ff6f1cc

📥 Commits

Reviewing files that changed from the base of the PR and between 8f2519f and 785b230.

📒 Files selected for processing (2)

src/bun.js/bindings/webcore/BroadcastChannel.cpp
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

github-actions · 2026-04-18T10:11:48Z

Found 1 issue this PR may fix:

Bun BroadcastChannel never receive for same reference #10433 - BroadcastChannel messages never received on same reference; the locking fix in dispatchMessageTo and the Ref{*this} lifetime fix in dispatchMessage could address silent message delivery failures

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #10433

🤖 Generated with Claude Code

github-actions · 2026-04-18T10:14:02Z

This PR may be a duplicate of:

Tweak BroadcastChannel lifetimes #19881 - Also tweaks BroadcastChannel lifetimes in dispatchMessage (captures Ref { *this } instead of raw this), partially overlapping with Bug B of this PR

🤖 Generated with Claude Code

claude

This PR fixes real data races in thread-safety-critical concurrent code — locking gaps in a shared HashMap and a raw- capture in an async task. The fixes are mechanically sound and follow established patterns, but thread-safety fixes in C++ warrant a human look.

Extended reasoning...

Overview

Two targeted fixes in : (A) adds at the three unlocked call sites on the main thread and annotates the accessor with to match ; (B) changes the lambda capture from raw to to prevent a use-after-free if the channel is destroyed before the task runs. The test file gains a new stress test, ASAN stderr filtering, and scaled timeouts.

Security Risks

No security-sensitive paths (auth, crypto, permissions) are touched. The races being fixed are memory-safety bugs (heap-UAF) that could cause crashes; they are not exploitable in the traditional sense but could theoretically be leveraged for memory corruption. The fixes reduce risk.

Level of Scrutiny

Thread-safety fixes in C++ concurrent code deserve careful human review even when the logic is straightforward. Locking must be consistent across all call sites (including any added in the future), and the -capture pattern must interact correctly with the existing refcounting and destruction order. A missed site or incorrect scope could introduce deadlocks or leave races partially fixed.

Other Factors

The fixes follow well-established patterns already present in , , and . The annotation will enforce lock discipline at compile time going forward. No bugs were flagged by the automated system. The PR description is detailed and includes verification steps. Given the correctness of the approach, a quick human sign-off should suffice.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli/bunx_command.zig`:
- Around line 361-368: The code currently rewrites update_request.name "claude"
even when the user provided an explicit package spec (e.g., claude@1.2.3);
restrict the alias so it only applies for bare invocations by checking that no
explicit package spec was provided (use opts.specified_package == null) before
substituting update_request.name, and apply the same guard to the duplicate
alias logic later (the other update_request.name == "claude" branch) so explicit
specs remain unmodified.

In `@src/zlib.zig`:
- Around line 847-849: The code calls deflateBound(&zlib_reader.zlib,
`@intCast`(input.len)) which truncates input.len on platforms where uLong is
32-bit; before calling deflateBound, check if input.len > `@as`(usize,
`@maxValue`(`@TypeOf`(`@intCast`(input.len)))), and if so call zlib_reader.deinit()
and return a clear error (e.g., error.InputTooLarge or error.InvalidArgument)
instead of casting; update the use-site around
zlib_reader.list.ensureTotalCapacityPrecise and `@intCast`(input.len) to only call
deflateBound with a safe cast after this guard.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f3a83da-5b37-4bcd-9c66-84db31f5f014

📥 Commits

Reviewing files that changed from the base of the PR and between 785b230 and d6881e9a728a4793ca4b48ccfa3b69c09099a00e.

📒 Files selected for processing (12)

src/bun.js/api/HashObject.zig
src/bun.js/api/JSBundler.zig
src/bun.js/api/bun/socket/Listener.zig
src/bun.js/webcore/Body.zig
src/bun.js/webcore/blob/read_file.zig
src/cli/bunx_command.zig
src/crash_handler.zig
src/deps/zlib.win32.zig
src/http/Decompressor.zig
src/install/bin.zig
src/zlib.zig
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

The setTimeout(50) was not load-bearing — the assertion is that the subprocess exits cleanly with no ASAN output, which already covers the full process lifetime including any tasks that run during teardown.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`:
- Around line 10-14: The current filterStderr function removes every stderr line
that startsWith("WARNING: ASAN interferes"), which can hide legitimate later
diagnostics; change filterStderr (the function name) to only strip the first
startup line matching that exact prefix and leave all subsequent lines intact —
e.g., split stderr into lines, if the first non-empty line startsWith("WARNING:
ASAN interferes") drop just that first matching line, then rejoin the remaining
lines so later stderr that also begins with the same text is preserved for test
diffs.
- Around line 120-156: The test currently has each worker post its 20-message
burst before signaling "ready", so the main thread's extraChannels churn does
not overlap the worker-side HashMap .get() window; change the handshake so
workerCode waits for a parent "go" message before running the for loop and only
then posts "ready" after the burst, and update the main test logic to spawn
workers, create/close extraChannels, then send "go" to all workers (via
worker.postMessage("go")) and await their ready promises—this guarantees the
extraChannels churn occurs while worker-side channelToContextIdentifier().get()
is exercised.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e0626d1a-7f71-4675-bd8c-378fe650b419

📥 Commits

Reviewing files that changed from the base of the PR and between d6881e9a728a4793ca4b48ccfa3b69c09099a00e and 510f0f4.

📒 Files selected for processing (1)

test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

claude

LGTM — targeted data-race fixes following the existing lock pattern.

Extended reasoning...

Overview

This PR modifies src/bun.js/bindings/webcore/BroadcastChannel.cpp and the accompanying test file. The C++ changes are minimal and surgical: (A) adding Locker locker { channelToContextIdentifierLock } at three call sites that were previously unguarded plus a WTF_REQUIRES_LOCK annotation on the accessor, and (B) changing a raw this capture in a postTaskTo lambda to protectedThis = Ref { *this } to extend object lifetime across the async task.

Security risks

No security-sensitive paths are touched. The changes harden thread safety rather than weakening it. The lock additions mirror the identical pattern already used for allBroadcastChannelsLock in the same file, and the Ref { *this } pattern matches MessagePort.cpp, Performance.cpp, and WebSocket.cpp.

Level of scrutiny

Concurrency fixes merit careful review, but the scope here is narrow: three identical Locker additions around HashMap calls, one WTF_REQUIRES_LOCK annotation, and one lambda capture change. Each fix has a clear mechanical parallel in the same file or nearby sibling files. The PR description thoroughly explains both races with a table of lock sites and the relevant code paths.

Other factors

No bugs were found by the automated bug hunting system. All CodeRabbit inline comments were addressed and resolved by the author with clear technical justification. The test additions follow established repo conventions for ASAN-compatible subprocess tests. The current repo HEAD already includes commits matching this PR title, indicating the changes have been validated in the main branch.

robobun · 2026-04-18T11:21:01Z

CI status (build #46260): 263 pass / 3 fail — none related to this change.

Job	Failure	Related?
`darwin-aarch64-build-cpp`	`HTTP 502 Bad Gateway` fetching `h2o/picohttpparser` vendor tarball from GitHub	Infra flake — `BroadcastChannel.cpp` never reached the compiler
`windows x64 test-bun`	`test/js/bun/net/socket.test.ts:779` — `expectMaxObjectTypeCount` expected ≤3, got 4	Pre-existing flake (history has a "Deflake test/js/bun/net/socket.test.ts" commit)
`windows x64-baseline test-bun`	`test/bake/dev/stress.test.ts` — "Client exited: Reload failed"	Pre-existing flake (history has a "Deflake stress.test.ts" commit)

broadcast-channel-worker-gc.test.ts itself passed everywhere it ran, including debian-13-x64-asan-test-bun and Windows x64 (3/3, 1071 ms). darwin-x64-build-cpp and linux-x64-asan-build-cpp both compiled BroadcastChannel.cpp cleanly.

My token is read-only, so I can't retry the darwin job — it just needs a re-run for the 502.

…essage lifetime (oven-sh#29441) ## What does this PR do? Fixes two data races in `BroadcastChannel.cpp` that surface as ASAN heap-use-after-free in `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`. ### Bug A — `channelToContextIdentifier` HashMap one-sided locking The prior `ThreadSafeWeakPtr` fix only covered `allBroadcastChannels()`. The second global, `channelToContextIdentifier()`, has its own lock — but it was taken at only 1 of 4 call sites: | Site | Thread | Lock? | | --- | --- | --- | | `registerChannel` `.add()` | main | ❌ | | `unregisterChannel` `.remove()` | main | ❌ | | `dispatchMessageTo` `.get()` | main | ❌ | | `contextIdForBroadcastChannelId` `.get()` | worker (via `ensureOnContextThread` → `dispatchMessage`) | ✅ | When main rehashes the HashMap (add/remove during worker spawn/terminate) while a worker reads it, the worker walks a freed bucket array → ASAN heap-UAF inside `WTF::HashTable`. The accessor was also missing `WTF_REQUIRES_LOCK`, so `-Wthread-safety` never flagged this. **Fix**: add `Locker locker { channelToContextIdentifierLock };` at the three unlocked sites and annotate the accessor with `WTF_REQUIRES_LOCK(channelToContextIdentifierLock)` to match `allBroadcastChannels()`. ### Bug B — `dispatchMessage` captures raw `this` in async task `dispatchMessage` posts a task with `[this, message = ...]` — raw `this`, no `Ref { *this }`. The caller (`dispatchMessageTo`'s inner lambda) holds a strong `RefPtr` from the `ThreadSafeWeakPtr` lookup, but that ref is dropped when the outer lambda returns. During worker terminate the JS wrapper is destroyed → refcount 0 → `~BroadcastChannel` → the queued task reads freed `this->m_isClosed` and calls `this->dispatchEvent()`. **Fix**: capture `protectedThis = Ref { *this }` in the `postTaskTo` lambda, matching the pattern in `MessagePort.cpp`, `Performance.cpp`, and `WebSocket.cpp`. ## How did you verify your code works? - `bun bd test test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts` — 3/3 pass, verified stable across 3 consecutive runs under debug+ASAN - `bun bd test test/js/web/broadcastchannel/broadcast-channel.test.ts` — 10/11 pass; the one failure (`broadcast channel worker wait`) is pre-existing on `main` under debug+ASAN (it uses `Bun.sleepSync(500)` which isn't enough for an ASAN worker to start) and is unrelated to this change ## Test changes - Added a stress test that churns channel registrations (forcing HashMap rehashes) while workers cross-post (reaching the worker-side map read), then terminates workers mid-dispatch (leaving queued tasks whose `this` would otherwise dangle). - Filtered the unconditional ASAN startup warning from child-process `stderr` so `expect(stderr).toBe("")` holds on ASAN builds — same pattern as `fetch-abort-queued.test.ts` / `string-decoder.test.js`. - Scaled timeouts for `isDebug || isASAN` — worker spawn under debug+ASAN is ~5–10× slower; the existing tests were borderline at the 5s default. Note: both races are highly timing-dependent (a HashMap rehash must land mid-`get()`); 20 local ASAN runs on macOS did not repro before the fix. The new stress test maximises contention but is not guaranteed to fail without the fix on every platform. --------- Co-authored-by: robobun <robobun@users.noreply.github.com>

github-actions Bot added the claude label Apr 18, 2026

coderabbitai Bot reviewed Apr 18, 2026

View reviewed changes

Comment thread test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts Outdated

claude Bot reviewed Apr 18, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 18, 2026

View reviewed changes

Comment thread src/cli/bunx_command.zig Outdated

Comment thread src/zlib.zig Outdated

Drop timing-based drain in new BroadcastChannel race test

510f0f4

The setTimeout(50) was not load-bearing — the assertion is that the subprocess exits cleanly with no ASAN output, which already covers the full process lifetime including any tasks that run during teardown.

robobun force-pushed the farm/1816880e/broadcastchannel-locking-and-ref branch from d6881e9 to 510f0f4 Compare April 18, 2026 10:29

coderabbitai Bot reviewed Apr 18, 2026

View reviewed changes

Comment thread test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

Comment thread test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts

claude Bot reviewed Apr 18, 2026

View reviewed changes

Jarred-Sumner merged commit e2fd5fb into main Apr 18, 2026
60 of 62 checks passed

Jarred-Sumner deleted the farm/1816880e/broadcastchannel-locking-and-ref branch April 18, 2026 20:52

robobun mentioned this pull request May 1, 2026

sql: serialize Date parameters as ISO 8601 in Postgres text format #29013

Open

Conversation

robobun commented Apr 18, 2026

What does this PR do?

Bug A — channelToContextIdentifier HashMap one-sided locking

Bug B — dispatchMessage captures raw this in async task

How did you verify your code works?

Test changes

Uh oh!

robobun commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security Risks

Level of Scrutiny

Other Factors

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

robobun commented Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Bug A — `channelToContextIdentifier` HashMap one-sided locking

Bug B — `dispatchMessage` captures raw `this` in async task

robobun commented Apr 18, 2026 •

edited

Loading

coderabbitai Bot commented Apr 18, 2026 •

edited

Loading