MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) by dylan-conway · Pull Request #29442 · oven-sh/bun

dylan-conway · 2026-04-18T10:10:37Z

What

Replaces the commented-out ASSERT(isMainThread()) guards in MessagePortChannelRegistry with a real WTF::Lock, and makes MessagePortChannel thread-safe-refcounted so the registry can be safely used from worker threads.

Fixes #29458
Fixes #25805
Fixes #22471
Fixes #16186

Why

WebKit's MessagePortChannelRegistry is main-thread-only by design: every method asserts isMainThread(), and worker access is routed through WorkerMessagePortChannelProvider → callOnMainThread. Bun stripped the proxy and commented the assertions out (with the note "// we totally are calling these off the main thread in many cases in Bun, so ........"), so worker threads were directly mutating m_openChannels (an unguarded HashMap) and the per-channel m_pendingMessages Vectors concurrently with the main thread.

The new stress test reproduces this as bmalloc heap corruption (pas panic: deallocation did fail / bitfit allocation error) on every run against current main.

The linked issues are all the same underlying race surfacing at different points in the call path:

| Issue | Crash site | Faulting address | What was torn |
| --- | --- | --- | --- |
| #29458 | `SerializedScriptValue::deserialize` ← `tryTakeMessage` ← `receiveMessageOnPort` | `0x40` | `m_pendingMessages[i].first()` returned a torn slot with a null `RefPtr<SerializedScriptValue>` while another thread reallocated the vector |
| #25805 | MessageChannel transferred between two workers, hammered with `postMessage` | `0x8` | concurrent `m_openChannels` rehash + per-channel state mutation |
| #22471 | `MessagePort::postMessage` | `0x18` | corrupted `m_openChannels.get()` / channel state under `workers_spawned` |
| #16186 | `MessagePort::postMessage` | `0x30` | same as above, 17 workers |

Why a lock instead of WebKit's thread-hop

WebKit bounces every registry call through callOnMainThread. Bun cannot do that directly because Node's synchronous receiveMessageOnPort() (tryTakeMessageForPort) needs to read the queue from the calling thread without round-tripping through the main loop. A lock gives the same single-writer guarantee WebKit's main-thread invariant provides, while keeping receiveMessageOnPort synchronous.
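As a rough illustration of that trade-off, here is a minimal sketch in standard C++ (using `std::mutex` in place of `WTF::Lock`). The `PendingQueue` class and its method names are hypothetical and are not taken from the PR; the point is only that a lock-guarded queue can be read synchronously from whichever thread calls it, with no hop through another thread's event loop.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for one port's pending-message queue.
class PendingQueue {
public:
    // May be called from any thread; the lock serializes writers.
    void post(std::string message) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_pending.push_back(std::move(message));
    }

    // Synchronous read on the calling thread, in the spirit of
    // receiveMessageOnPort(): returns false when nothing is queued.
    bool tryTake(std::string& out) {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_pending.empty())
            return false;
        out = std::move(m_pending.front());
        m_pending.pop_front();
        return true;
    }

private:
    std::mutex m_lock;
    std::deque<std::string> m_pending; // only touched while m_lock is held
};
```

A thread-hop design would instead have to post a task to the owning thread and wait for the reply, which is what a synchronous receive cannot afford.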

Details

MessagePortChannelRegistry gains Lock m_lock; m_openChannels becomes HashMap<…, ThreadSafeWeakPtr<MessagePortChannel>> and is WTF_GUARDED_BY_LOCK.
Every registry entry point takes the lock. The looked-up RefPtr<MessagePortChannel> is hoisted outside the locked scope so the channel destructor (which re-enters via messagePortChannelDestroyed) cannot deadlock.
MessagePortChannel is now ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr so ref/weak ops are atomic across threads.
MessagePortChannel::takeAllMessagesForPort now returns the message vector instead of invoking the callback directly; the registry invokes the callback after releasing the lock (the callback re-enters the registry via MessagePort::entanglePorts).
MessagePortChannel::tryTakeMessageForPort now clears m_pendingMessageProtectors[i] when it drains the queue (pre-existing self-ref leak on the receiveMessageOnPort path).
Removed dead m_messageBatchesInFlight / hasAnyMessagesPendingOrInFlight / existingChannelContainingPort (declared but never used in Bun).
Performance::timing()'s commented isMainThread() guard was a Window-only check in WebKit; in Bun each context has its own Performance/m_timing, so it's deleted rather than restored.
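The "hoist the reference outside the locked scope" rule from the list above can be sketched in standard C++. This is an assumption-laden analogue, not the PR's code: `std::shared_ptr`/`std::weak_ptr` stand in for `RefPtr`/`ThreadSafeWeakPtr`, `std::mutex` stands in for `WTF::Lock`, and `SketchRegistry`, `SketchChannel`, and `isOpen` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

class SketchRegistry;

class SketchChannel {
public:
    SketchChannel(SketchRegistry& registry, int id) : m_registry(registry), m_id(id) {}
    ~SketchChannel(); // re-enters the registry, so it must never run under its lock

private:
    SketchRegistry& m_registry;
    int m_id;
};

class SketchRegistry {
public:
    std::shared_ptr<SketchChannel> create(int id) {
        auto channel = std::make_shared<SketchChannel>(*this, id);
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_openChannels[id] = channel; // the map only holds a weak reference
        }
        return channel;
    }

    bool isOpen(int id) {
        // Declared before the lock scope, so it is destroyed after the lock
        // is released. If another thread dropped its reference in between,
        // this may be the last strong reference and the destructor runs here.
        std::shared_ptr<SketchChannel> channel;
        {
            std::lock_guard<std::mutex> guard(m_lock);
            auto it = m_openChannels.find(id);
            if (it != m_openChannels.end())
                channel = it->second.lock();
        }
        return channel != nullptr;
    }

    // Called from ~SketchChannel; takes the same (non-recursive) lock.
    void channelDestroyed(int id) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_openChannels.erase(id);
    }

    std::size_t openCount() {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_openChannels.size();
    }

private:
    std::mutex m_lock;
    std::unordered_map<int, std::weak_ptr<SketchChannel>> m_openChannels;
};

SketchChannel::~SketchChannel() { m_registry.channelDestroyed(m_id); }
```

Had `channel` been declared inside the locked scope, dropping the last reference there would call `channelDestroyed` while `m_lock` was still held, which with a non-recursive lock is a self-deadlock.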

Testing

New test/js/web/workers/message-channel-concurrent.test.ts spawns 6 workers + main, each creating/transferring/posting/closing 8 channels × 400 iterations in a tight loop with no yields. Against current main this corrupts the heap on every run.

… of disabling isMainThread asserts

WebKit's MessagePortChannelRegistry is main-thread-only — every method has ASSERT(isMainThread()) and worker access is routed through WorkerMessagePortChannelProvider via callOnMainThread. Bun stripped that proxy and commented the assertions out, so worker threads were mutating m_openChannels (an unguarded HashMap) and per-channel m_pendingMessages Vectors concurrently with the main thread. The new stress test reproduces this as a bmalloc bitfit / pas_segregated_page double-free on every run.

Bun cannot adopt WebKit's thread-hop model directly because Node's synchronous receiveMessageOnPort() needs to read the queue from any thread without bouncing through the main loop, so instead the registry now owns a WTF::Lock and every entry point takes it. The channel RefPtr is hoisted outside the locked scope so the destructor (which re-enters the registry via messagePortChannelDestroyed) cannot deadlock, takeAllMessagesForPort returns the message vector and the callback is invoked after the lock is released (it re-enters the registry via entanglePorts), and MessagePortChannel becomes ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr so RefPtr/WeakPtr operations are atomic across threads.

The dead m_messageBatchesInFlight tracking and the unused existingChannelContainingPort/hasAnyMessagesPendingOrInFlight declarations are removed along the way.

Performance::timing()'s commented isMainThread() guard was a different case: in WebKit it gates a Window-only API, but in Bun each context has its own Performance instance with its own m_timing, so the guard is simply inapplicable and is deleted rather than restored.

robobun · 2026-04-18T10:10:49Z

Updated 5:06 PM PT - Apr 18th, 2026

❌ @dylan-conway, your commit de0fd56 has 2 failures in Build #46343 (All Failures):

test/js/bun/http/tls-keepalive.test.ts - code 1 on 🍎 13 aarch64
test/js/bun/net/socket.test.ts - code 1 on 🪟 2019 x64-baseline
test/js/bun/net/socket.test.ts - code 1 on 🪟 2019 x64

🧪 To try this PR locally:

bunx bun-pr 29442

That installs a local version of the PR into your bun-29442 executable, so you can run:

bun-29442 --bun

github-actions · 2026-04-18T10:13:41Z

Found 3 issues this PR may fix:

#25805 - MessageChannel between Workers causes crash/unexpected exit: directly reproduces concurrent MessagePort usage across workers causing segfault
#16186 - Consistent SEGFAULT when using MessagePort: segfault in MessagePort::postMessage with 17 concurrent workers
#22471 - Segmentation fault at address 0x00000018: near-null dereference in MessagePort::postMessage, consistent with use-after-free from unserialized registry access

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #25805
Fixes #16186
Fixes #22471

🤖 Generated with Claude Code

github-actions · 2026-04-18T10:14:57Z

This PR may be a duplicate of:

fix(worker): add thread safety to MessageChannel to avoid crashes #25806 - Adds WTF::Lock to MessagePortChannel and MessagePortChannelRegistry to fix the same thread-safety data race with the same approach
fix(worker_threads): add thread-safety to MessagePortChannelRegistry #26016 - Adds a Lock to MessagePortChannelRegistry to protect m_openChannels from concurrent access, which is the core change in this PR

🤖 Generated with Claude Code

coderabbitai · 2026-04-18T10:17:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

MessagePortChannel and its registry were made thread-safe (thread-safe refcounting, thread-safe weak pointers, and a Lock). The message-draining API changed from an async CompletionHandler to a synchronous return. Message-in-flight telemetry/counters were removed. A concurrent worker stress test and minor provider/header cleanups were added.
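The move from a callback invoked inside the channel to "copy out under the lock, call back after unlocking" can be sketched as follows. This is an illustrative analogue in standard C++, not the PR's implementation: `DrainRegistry`, `pendingCount`, and the use of `std::function` and integer port ids are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class DrainRegistry {
public:
    using Messages = std::vector<std::string>;

    void post(int port, std::string message) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_pending[port].push_back(std::move(message));
    }

    void takeAllMessagesForPort(int port, const std::function<void(Messages)>& callback) {
        Messages messages;
        {
            // Only the queue manipulation happens under the lock.
            std::lock_guard<std::mutex> guard(m_lock);
            auto it = m_pending.find(port);
            if (it != m_pending.end()) {
                messages = std::move(it->second);
                m_pending.erase(it);
            }
        }
        // The lock is released here, so the callback is free to re-enter
        // the registry (for example by posting) without deadlocking.
        callback(std::move(messages));
    }

    std::size_t pendingCount(int port) {
        std::lock_guard<std::mutex> guard(m_lock);
        auto it = m_pending.find(port);
        return it == m_pending.end() ? 0 : it->second.size();
    }

private:
    std::mutex m_lock;
    std::unordered_map<int, Messages> m_pending;
};
```

If the callback ran while `m_lock` was held, any callback that touched the registry again would block on a lock its own thread already owns.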

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| MessagePortChannel core `src/bun.js/bindings/webcore/MessagePortChannel.h`, `src/bun.js/bindings/webcore/MessagePortChannel.cpp` | Switched class to `ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr`. Replaced async `takeAllMessagesForPort(... CompletionHandler<...>)` with synchronous `Vector<MessageWithMessagePorts> takeAllMessagesForPort(const MessagePortIdentifier&)`. Removed `hasAnyMessagesPendingOrInFlight()`, `beingTransferredCount()`, and `m_messageBatchesInFlight`. Removed local keep-alive guards and now clears pending-message protectors directly; also removed `relaxAdoptionRequirement()` call. |
| Registry thread-safety `src/bun.js/bindings/webcore/MessagePortChannelRegistry.h`, `src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp` | Added `WTF::Lock m_lock` and `WTF_GUARDED_BY_LOCK(m_lock)`; changed stored weak refs to `ThreadSafeWeakPtr<MessagePortChannel>`. Serialized access: lock around registry mutations and lookups, copy out channels/messages under lock, then invoke channel methods or callbacks after unlocking. Removed `existingChannelContainingPort()` declaration. |
| Provider plumbing & forwarding `src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp`, `src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp` | Removed a commented-out `ASSERT(isMainThread())`. Simplified `takeAllMessagesForPort` forwarding by renaming the parameter and passing the callback directly to the registry (removed wrapper lambda). |
| Headers / includes `src/bun.js/bindings/webcore/MessagePortChannel.h`, `src/bun.js/bindings/webcore/MessagePortChannelRegistry.h` | Added includes for `ThreadSafeWeakPtr.h` and `wtf/Lock.h`; removed now-unneeded non-thread-safe RefCounted includes. |
| Performance cleanup `src/bun.js/bindings/webcore/Performance.cpp` | Removed previously commented-out scriptExecutionContext/main-thread checks; behavior unchanged. |
| Tests — concurrent stress `test/js/web/workers/message-channel-concurrent-fixture.js`, `test/js/web/workers/message-channel-concurrent.test.ts` | Added a worker-thread stress fixture that concurrently creates/posts/transfers/closes MessagePorts and drains messages; added a test that spawns the fixture, validates `PASS` stdout and exit code 0, and filters sanitizer warnings from stderr. |

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a Lock to serialize MessagePortChannelRegistry to fix a worker_threads data race. |
| Description check | ✅ Passed | PR description covers both required sections: detailed 'What' explaining the fix and comprehensive 'Why' with technical rationale and design decisions. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/web/workers/message-channel-concurrent-fixture.js`:
- Around line 32-61: Add a subprocess watchdog that enforces a timeout when
running in the parent (isMainThread) so the fixture fails fast if a worker never
posts back: create a timer (e.g., watchdog = setTimeout(..., TIMEOUT_MS))
started before spawning/aggregating workers, have the timeout handler log an
error, terminate all Worker instances in the workers array, and call
process.exit(1); ensure the timer is cleared (clearTimeout(watchdog)) inside
finish() so successful completion cancels the watchdog; reference the existing
isMainThread block, WORKERS loop, workers array, finish() function, and worker
termination logic when implementing this.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a210a353-36ad-49f8-8630-5ff5976ce112

📥 Commits

Reviewing files that changed from the base of the PR and between 8f2519f and a3a7721.

📒 Files selected for processing (9)

src/bun.js/bindings/webcore/MessagePortChannel.cpp
src/bun.js/bindings/webcore/MessagePortChannel.h
src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp
src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp
src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp
src/bun.js/bindings/webcore/MessagePortChannelRegistry.h
src/bun.js/bindings/webcore/Performance.cpp
test/js/web/workers/message-channel-concurrent-fixture.js
test/js/web/workers/message-channel-concurrent.test.ts

💤 Files with no reviewable changes (2)

src/bun.js/bindings/webcore/Performance.cpp
src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp

claude

This is a well-motivated and correctly structured threading fix, but the scope (ref-counting model change, lock protocol, callback → return-value refactor) warrants a human eye before merging — especially given the pre-existing protector-leak found in tryTakeMessageForPort.

Extended reasoning...

Overview

This PR replaces the previously unguarded (comment-only) main-thread assertions in MessagePortChannelRegistry with a real WTF::Lock, upgrades MessagePortChannel to ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr, changes m_openChannels to use ThreadSafeWeakPtr, and converts takeAllMessagesForPort from a callback to a return-value interface. It also removes dead code (m_messageBatchesInFlight, existingChannelContainingPort, hasAnyMessagesPendingOrInFlight) and adds a concurrent stress test.

Security risks

No authentication, crypto, or permissions code is involved. The change affects object lifetime and heap safety — the pre-existing issue in tryTakeMessageForPort not clearing m_pendingMessageProtectors[i] after draining the last message produces a temporary self-referential retain cycle. This is a memory-safety concern (delayed deallocation window in long-lived workers) rather than a security vulnerability.

Level of scrutiny

This merits careful human review. The change is not mechanical: it alters the ref-counting model for a class that interacts with the GC/finalizer path, introduces a lock protocol that must be consistently followed across all future callers, and changes the callback contract for takeAllMessagesForPort. Lock discipline bugs in C++ (use-after-free, deadlock) can be hard to catch in code review and catastrophic at runtime.

Other factors

The stress test is thorough and the PR description accurately explains the design trade-offs. The bug found by the hunter (tryTakeMessageForPort leaving a stale self-ref protector) is pre-existing and not introduced here, but the PR makes that path more prominent via receiveMessageOnPort. A trivial one-liner fix in the same file would eliminate the asymmetry cleanly. CI results are also still pending.

…; harden stress fixture

tryTakeMessageForPort (the receiveMessageOnPort path) never cleared m_pendingMessageProtectors[i] after draining the queue, leaving a self-reference that delayed channel destruction until close(). Clear it when the queue empties, matching takeAllMessagesForPort.

The stress fixture is reworked to wait on each worker's exit event and let the main loop drain naturally instead of calling worker.terminate() then process.exit() — the abrupt teardown was racing a separate, pre-existing postTaskConcurrently-after-terminate check on x64-asan and obscuring the result. A 30s subprocess watchdog and a 60s outer timeout are added so a deadlock regression fails fast with a useful message instead of hanging.
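The protector leak described in this commit message can be illustrated with a small standard-C++ analogue. It is a sketch under assumptions: `ProtectedChannel` is a hypothetical class, a single `std::shared_ptr` self-reference stands in for `m_pendingMessageProtectors[i]`, and the destroyed flag exists only so the effect is observable.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

class ProtectedChannel : public std::enable_shared_from_this<ProtectedChannel> {
public:
    explicit ProtectedChannel(bool& destroyedFlag) : m_destroyedFlag(destroyedFlag) {}
    ~ProtectedChannel() { m_destroyedFlag = true; }

    void post(std::string message) {
        m_pending.push_back(std::move(message));
        // Keep the channel alive while messages are queued.
        if (!m_protector)
            m_protector = shared_from_this();
    }

    bool tryTake(std::string& out) {
        if (m_pending.empty())
            return false;
        out = std::move(m_pending.front());
        m_pending.pop_front();
        // The fix in miniature: once the queue is empty, drop the
        // self-reference. Moving it into a local defers any destruction
        // until this function has finished touching members.
        std::shared_ptr<ProtectedChannel> releaseOnReturn;
        if (m_pending.empty())
            releaseOnReturn = std::move(m_protector);
        return true;
    }

private:
    bool& m_destroyedFlag;
    std::deque<std::string> m_pending;
    std::shared_ptr<ProtectedChannel> m_protector; // self-reference while non-empty
};
```

Without the move out of `m_protector`, the object would keep a strong reference to itself after the last message was taken, and dropping every external reference would never destroy it.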

coderabbitai

Actionable comments posted: 2

Inline comments:
In `@test/js/web/workers/message-channel-concurrent-fixture.js`:
- Around line 45-74: The test only checks total !== 0 so a failing worker can be
hidden; compute expectedPerThread = ITERATIONS * CHANNELS_PER_ITERATION * 2,
track each worker's reported count (e.g. collect messages in a workersCounts
array or object from the worker.on("message", n => ...) handler keyed by worker
index/ID), keep the main-thread hammer() result separate, and in finish() verify
the main-thread result equals expectedPerThread and every worker's reported
count equals expectedPerThread (or at least that every worker reported >0)
before printing PASS; if any worker is missing or mismatched, fail the test with
process.exit(1).

In `@test/js/web/workers/message-channel-concurrent.test.ts`:
- Around line 18-20: The test currently asserts stderr is empty directly (using
the variables stderr and expect), but on ASAN builders Bun subprocesses can emit
a known startup warning; update the assertion to filter out any stderr lines
that start with "WARNING: ASAN interferes" (split stderr into lines, remove
lines matching that prefix) and then assert the filtered stderr is empty; apply
this change where stdout, stderr, exitCode are awaited and before the
expect(stderr).toBe("") assertion so the test uses the repo's filtered-stderr
convention.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7e33a01-fb55-464d-9d48-7bc9b46943ea

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf9d4e and 6d5e1ee.

📒 Files selected for processing (3)

src/bun.js/bindings/webcore/MessagePortChannel.cpp
test/js/web/workers/message-channel-concurrent-fixture.js
test/js/web/workers/message-channel-concurrent.test.ts

…rning

The fixture now verifies each hammer() returns exactly ITERATIONS * CHANNELS_PER_ITERATION * 2 messages and that every worker reports before printing PASS, so a worker-only regression cannot be hidden by the main-thread contribution. The test wrapper filters the known 'WARNING: ASAN interferes' startup line before asserting empty stderr, matching the convention in other subprocess tests.

coderabbitai

Actionable comments posted: 1

Inline comments:
In `@test/js/web/workers/message-channel-concurrent-fixture.js`:
- Around line 9-12: The four constants WORKERS, ITERATIONS,
CHANNELS_PER_ITERATION, and EXPECTED_PER_HAMMER use SCREAMING_SNAKE_CASE but the
repo requires camelCase for JS/TS; rename them to camelCase (e.g., workers,
iterations, channelsPerIteration, expectedPerHammer) and update all references
in this file (and other occurrences noted at lines referenced in the review) so
the variable names remain consistent and tests still compute EXPECTED_PER_HAMMER
as iterations * channelsPerIteration * 2.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0e59577-466e-4c06-9afb-7da8d54fd0bb

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5e1ee and 8fb83f9.

📒 Files selected for processing (2)

test/js/web/workers/message-channel-concurrent-fixture.js
test/js/web/workers/message-channel-concurrent.test.ts

CI's runner passes --timeout=90000 (scripts/runner.node.mjs testTimeout/2), so the fixture's own 30s watchdog already fires well within the per-test ceiling and the explicit 60s override is redundant.

coderabbitai

Actionable comments posted: 1

Inline comments:
In `@test/js/web/workers/message-channel-concurrent.test.ts`:
- Around line 18-24: Remove the brittle empty-stderr gating and instead rely on
the subprocess output and exit code as the regression signal: drop the
filteredStderr computation and its expect assertion, and change the assertions
to verify that stdout (the variable `stdout` resolved from `proc.stdout.text()`)
contains the expected "PASS" message and that `exitCode` (from `proc.exited`) is
0; update the Promise.all usage taking `stdout` and `exitCode` into account and
remove references to `stderr`/`filteredStderr` (variables `stderr` and
`filteredStderr`) from the test.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a1ec395-1d87-4614-b019-b9e2aec4574c

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb83f9 and 36248eb.

📒 Files selected for processing (1)

test/js/web/workers/message-channel-concurrent.test.ts

…gnal

The fixture now exports hammer() and self-executes when loaded as a Worker; the test spawns the workers directly and asserts each returns the expected count. On a regression the test process itself crashes, which the runner reports as a failure — no need for the subprocess wrapper.

… bug

receiveMessageOnPort's JS wrapper uses if (!res) so a message value of 0 is treated as no-message, which short-circuited the receive loop on the first channel (j === 0) of every iteration and made the count come up 400 short. Post {j} instead so the payload is always truthy; the wrapper bug is pre-existing and tracked separately.
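The arithmetic behind "400 short" can be checked with a small model. This is a C++ analogue of a JavaScript truthiness bug, so it is only an approximation: `RawResult` and `countReceived` are hypothetical, and the 400 iterations and 8 channels are taken from the PR description's "8 channels × 400 iterations" rather than from the fixture source, which is not shown here.

```cpp
#include <cassert>

// Hypothetical model of what the wrapper sees: whether a message was
// present, and its payload (the channel index j in the fixture).
struct RawResult {
    bool present;
    int payload;
};

// Counts how many messages a receiver would accept. The buggy check
// mirrors `if (!res)` by testing the payload's truthiness; the correct
// check tests presence.
int countReceived(int iterations, int channelsPerIteration, bool useBuggyCheck) {
    int received = 0;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < channelsPerIteration; ++j) {
            RawResult res{true, j};
            bool treatedAsEmpty = useBuggyCheck ? (res.payload == 0) : !res.present;
            if (!treatedAsEmpty)
                ++received;
        }
    }
    return received;
}
```

With these assumed constants, the buggy check drops exactly one message (the `j == 0` payload) per iteration, so the shortfall equals the iteration count.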

claude

LGTM — both previously flagged issues have been addressed: the protector leak is fixed (commit 6d5e1ee) and the spurious test timeout was removed (commit 36248eb). The locking approach is sound.

Extended reasoning...

Overview

This PR replaces the commented-out ASSERT(isMainThread()) guards in MessagePortChannelRegistry with a WTF::Lock, and upgrades MessagePortChannel to ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr. It touches 7 C++ source files and 2 new test files.

Security risks

No security-sensitive code paths (auth, crypto, permissions) are involved. The change narrows a race condition; it does not introduce new attack surface.

Level of scrutiny

This is a concurrency fix in production-critical runtime code, but the approach is straightforward: a single coarse-grained lock around an otherwise-unguarded HashMap. The key correctness details — hoisting RefPtr outside lock scope before the destructor can re-enter via messagePortChannelDestroyed, calling the takeAllMessagesForPort callback after releasing the lock to avoid deadlock on re-entrant registry access — are clearly handled. Both previously raised concerns (protector leak in tryTakeMessageForPort and test timeout convention violation) have been resolved.

Other factors

No outstanding reviewer comments remain unaddressed. The stress test reliably reproduces the pre-fix heap corruption and validates the fix. CI build failures visible in the timeline are on the earlier commit and appear to be unrelated zig build issues, not failures in the C++ changes under review.

…primitive (#29937)

## What

Replaces the WebKit-derived `MessagePortChannelProvider` / `MessagePortChannelProviderImpl` / `MessagePortChannelRegistry` / `MessagePortChannel` stack — plus the `BroadcastChannel::MainThreadBridge` / per-context `BunBroadcastChannelRegistry` — with a small, thread-safe concurrency primitive (`MessagePortPipe`) that the Web-facing classes wrap thinly. Net −1082 lines (+737 / −1819), 10 files deleted.

## Why

The existing stack is Safari's multi-*process* design carried over verbatim: an abstract provider so UIProcess can swap in an IPC impl, a process-global `MessagePortIdentifier → MessagePortChannel` HashMap, `ProcessIdentifier` tracking, self-`RefPtr` "protector" members to keep channels alive across IPC hops, and a main-thread serialization point for `BroadcastChannel`. None of that indirection is needed in Bun's single-process multi-thread model — and worse, the `MessagePortChannelRegistry::m_openChannels` HashMap and `MessagePortChannel::m_pendingMessages` Vector are mutated from worker threads with **no lock** (the WebKit `ASSERT(isMainThread())` guards were just commented out). Concurrent `new MessageChannel()` across workers corrupts the HashMap:

```
HashTable.h:1574:17: runtime error: member access within null pointer of type 'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier, WeakRef<MessagePortChannel>>, ...>'
```

This is the cause of #16186, #22471, #25805 and likely #29458.

## Design

**`MessagePortPipe`** — `ThreadSafeRefCounted`, two sides. Each side has:

- a `Lock` + `Deque<MessageWithMessagePorts>` inbox
- one `std::atomic<uint64_t>` state word: `Closed | DrainScheduled | Attached` in the low byte, queued-count in the high bits

`send()` locks the destination side, enqueues, and if `Attached && !DrainScheduled` flips `DrainScheduled` and posts one task to the receiver's `ScriptExecutionContext`. A burst of N sends schedules at most one cross-thread wakeup.

`takeAll()` clears `DrainScheduled` before swapping the deque, so a racing send reschedules — at most one extra no-op drain, never a stranded message. `attach()`/`detach()` handle transfer: a detached side buffers; attach flushes the backlog with one wakeup. `hasPendingActivity` is two atomic loads (my queued count + peer's `Closed`), safe from the GC thread with no lock.

**`MessagePort`** — holds `RefPtr<MessagePortPipe>` + `uint8_t side`. `postMessage` → `pipe->send`. `start()` → `pipe->attach`. `close()` → `pipe->close`. Transfer → `pipe->detach`, hand `{pipe, side}` through `TransferredMessagePort`, re-create on the other side. No global maps, no identifiers, no per-context port iteration.

The drain task takes the whole inbox in one go (sender-side batching), then posts each message as its own local task so microtasks checkpoint between deliveries — matching the HTML "port message queue is a task source" semantics and Node's observable ordering.

**`BroadcastChannel`** — one process-global `Lock` + `HashMap<String, Vector<{ctxId, weak channel}>>`. `postMessage` snapshots subscribers under the lock, releases, then posts one task per `(message, subscriber)` directly to each subscriber's context — no bounce through the main thread, no `MainThreadBridge`, no second `allBroadcastChannels` map.

Per-channel inbox batching was prototyped and reverted: the spec's same-event-loop `(message-major, creation-minor)` delivery-order test (WPT `broadcast-channel.test.ts::"messages are delivered in port creation order"`) fails if subscribers drain channel-major. A per-`(context, name)` inbox could restore batching without breaking order — left as future work.

## Deleted

`MessagePortChannel.{h,cpp}`, `MessagePortChannelProvider.{h,cpp}`, `MessagePortChannelProviderImpl.{h,cpp}`, `MessagePortChannelRegistry.{h,cpp}`, `MessagePortIdentifier.h`, `BroadcastChannelRegistry.h`, the `allMessagePorts()` / `portToContextIdentifier()` / `allBroadcastChannels()` / `channelToContextIdentifier()` global maps, `ScriptExecutionContext::{m_messagePorts, processMessageWithMessagePortsSoon, dispatchMessagePortEvents, createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}`, and `WorkerGlobalScope::messagePortChannelProvider()`.

## Verification

- `test/js/web/workers/message-channel.test.ts` — 12/12
- `test/js/web/broadcastchannel/broadcast-channel.test.ts` — 11/11
- `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts` — 3/3
- `test/js/web/workers/` suite — 228 pass / 6 fail; all 6 fail identically on `main` (container-slowness timeouts), zero new failures
- `test/js/node/worker_threads/worker_threads.test.ts` — 25 pass / 2 fail; both fail identically on `main`
- 10 node `parallel/test-worker-message-port-*.js` + `test-broadcastchannel-*.js` — all pass
- New `test/js/web/workers/message-port-pipe.test.ts`:
  - `concurrent MessageChannel creation across workers is race-free` — **fails 3/3 on main** with the HashTable UBSan null-deref shown above; **passes 5/5** on this branch
  - microtask ordering, buffered-before-start, FIFO `receiveMessageOnPort`, 1000-message cross-thread burst

## Relationship to other PRs

#29832 / #29442 patch the race by adding a lock to the existing registry. This PR removes the registry instead — there's nothing left to race on. Either approach fixes the crash; this one also sheds the layering.

Fixes #16186
Fixes #22471
Fixes #25805

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
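The coalesced-wakeup behaviour described for `MessagePortPipe` (a lock-guarded inbox plus one atomic state word) might look roughly like the sketch below. It is not the code from #29937: `PipeSide`, the exact bit layout, and the convention that `send()`/`attach()` return `true` when the caller should post a drain task are all assumptions, `std::mutex` and `std::deque` stand in for `Lock` and `Deque`, and `Closed`, `detach()`, and the actual task posting are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class PipeSide {
public:
    // Flags live in the low byte; the queued count lives above it.
    static constexpr std::uint64_t DrainScheduled = 1u << 1;
    static constexpr std::uint64_t Attached = 1u << 2;
    static constexpr std::uint64_t CountShift = 8;

    // Returns true when the caller should post exactly one drain task.
    bool send(std::string message) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_inbox.push_back(std::move(message));
        std::uint64_t before = m_state.fetch_add(std::uint64_t{1} << CountShift);
        if ((before & Attached) && !(before & DrainScheduled)) {
            m_state.fetch_or(DrainScheduled);
            return true;
        }
        return false; // detached (buffering) or a drain is already pending
    }

    // Attaching flushes any backlog with a single wakeup.
    bool attach() {
        std::lock_guard<std::mutex> guard(m_lock);
        std::uint64_t before = m_state.fetch_or(Attached);
        if (!m_inbox.empty() && !(before & DrainScheduled)) {
            m_state.fetch_or(DrainScheduled);
            return true;
        }
        return false;
    }

    std::deque<std::string> takeAll() {
        // Clear the flag before taking the inbox: a send that races with
        // this call schedules another drain instead of being stranded.
        m_state.fetch_and(~DrainScheduled);
        std::deque<std::string> taken;
        {
            std::lock_guard<std::mutex> guard(m_lock);
            taken.swap(m_inbox);
            m_state.fetch_sub(static_cast<std::uint64_t>(taken.size()) << CountShift);
        }
        return taken;
    }

    // A single atomic load, usable without taking the lock.
    std::uint64_t queuedCount() const { return m_state.load() >> CountShift; }

private:
    std::mutex m_lock;
    std::deque<std::string> m_inbox;
    std::atomic<std::uint64_t> m_state{0};
};
```

In this sketch a burst of sends after `attach()` yields one `true` and then `false` until the next `takeAll()`, which is the "at most one cross-thread wakeup" property the design describes.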

…primitive (oven-sh#29937) ## What Replaces the WebKit-derived `MessagePortChannelProvider` / `MessagePortChannelProviderImpl` / `MessagePortChannelRegistry` / `MessagePortChannel` stack — plus the `BroadcastChannel::MainThreadBridge` / per-context `BunBroadcastChannelRegistry` — with a small, thread-safe concurrency primitive (`MessagePortPipe`) that the Web-facing classes wrap thinly. Net −1082 lines (+737 / −1819), 10 files deleted. ## Why The existing stack is Safari's multi-*process* design carried over verbatim: an abstract provider so UIProcess can swap in an IPC impl, a process-global `MessagePortIdentifier → MessagePortChannel` HashMap, `ProcessIdentifier` tracking, self-`RefPtr` "protector" members to keep channels alive across IPC hops, and a main-thread serialization point for `BroadcastChannel`. None of that indirection is needed in Bun's single-process multi-thread model — and worse, the `MessagePortChannelRegistry::m_openChannels` HashMap and `MessagePortChannel::m_pendingMessages` Vector are mutated from worker threads with **no lock** (the WebKit `ASSERT(isMainThread())` guards were just commented out). Concurrent `new MessageChannel()` across workers corrupts the HashMap: ``` HashTable.h:1574:17: runtime error: member access within null pointer of type 'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier, WeakRef<MessagePortChannel>>, ...>' ``` This is the cause of oven-sh#16186, oven-sh#22471, oven-sh#25805 and likely oven-sh#29458. ## Design **`MessagePortPipe`** — `ThreadSafeRefCounted`, two sides. Each side has: - a `Lock` + `Deque<MessageWithMessagePorts>` inbox - one `std::atomic<uint64_t>` state word: `Closed | DrainScheduled | Attached` in the low byte, queued-count in the high bits `send()` locks the destination side, enqueues, and if `Attached && !DrainScheduled` flips `DrainScheduled` and posts one task to the receiver's `ScriptExecutionContext`. A burst of N sends schedules at most one cross-thread wakeup. 
`takeAll()` clears `DrainScheduled` before swapping the deque, so a racing send reschedules — at most one extra no-op drain, never a stranded message. `attach()`/`detach()` handle transfer: a detached side buffers; attach flushes the backlog with one wakeup. `hasPendingActivity` is two atomic loads (my queued count + peer's `Closed`), safe from the GC thread with no lock. **`MessagePort`** — holds `RefPtr<MessagePortPipe>` + `uint8_t side`. `postMessage` → `pipe->send`. `start()` → `pipe->attach`. `close()` → `pipe->close`. Transfer → `pipe->detach`, hand `{pipe, side}` through `TransferredMessagePort`, re-create on the other side. No global maps, no identifiers, no per-context port iteration. The drain task takes the whole inbox in one go (sender-side batching), then posts each message as its own local task so microtasks checkpoint between deliveries — matching the HTML "port message queue is a task source" semantics and Node's observable ordering. **`BroadcastChannel`** — one process-global `Lock` + `HashMap<String, Vector<{ctxId, weak channel}>>`. `postMessage` snapshots subscribers under the lock, releases, then posts one task per `(message, subscriber)` directly to each subscriber's context — no bounce through the main thread, no `MainThreadBridge`, no second `allBroadcastChannels` map. Per-channel inbox batching was prototyped and reverted: the spec's same-event-loop `(message-major, creation-minor)` delivery-order test (WPT `broadcast-channel.test.ts::"messages are delivered in port creation order"`) fails if subscribers drain channel-major. A per-`(context, name)` inbox could restore batching without breaking order — left as future work. 
## Deleted

`MessagePortChannel.{h,cpp}`, `MessagePortChannelProvider.{h,cpp}`, `MessagePortChannelProviderImpl.{h,cpp}`, `MessagePortChannelRegistry.{h,cpp}`, `MessagePortIdentifier.h`, `BroadcastChannelRegistry.h`, the `allMessagePorts()` / `portToContextIdentifier()` / `allBroadcastChannels()` / `channelToContextIdentifier()` global maps, `ScriptExecutionContext::{m_messagePorts, processMessageWithMessagePortsSoon, dispatchMessagePortEvents, createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}`, and `WorkerGlobalScope::messagePortChannelProvider()`.

## Verification

- `test/js/web/workers/message-channel.test.ts`: 12/12
- `test/js/web/broadcastchannel/broadcast-channel.test.ts`: 11/11
- `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts`: 3/3
- `test/js/web/workers/` suite: 228 pass / 6 fail; all 6 fail identically on `main` (container-slowness timeouts), zero new failures
- `test/js/node/worker_threads/worker_threads.test.ts`: 25 pass / 2 fail; both fail identically on `main`
- 10 node `parallel/test-worker-message-port-*.js` + `test-broadcastchannel-*.js`: all pass
- New `test/js/web/workers/message-port-pipe.test.ts`:
  - `concurrent MessageChannel creation across workers is race-free`: **fails 3/3 on main** with the HashTable UBSan null-deref shown above; **passes 5/5** on this branch
  - microtask ordering, buffered-before-start, FIFO `receiveMessageOnPort`, 1000-message cross-thread burst

## Relationship to other PRs

oven-sh#29832 / oven-sh#29442 patch the race by adding a lock to the existing registry. This PR removes the registry instead, so there's nothing left to race on. Either approach fixes the crash; this one also sheds the layering.

Fixes oven-sh#16186
Fixes oven-sh#22471
Fixes oven-sh#25805

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

robobun · 2026-06-19T00:02:27Z

Superseded by #29937, which removed MessagePortChannelRegistry / MessagePortChannel entirely and replaced them with the thread-safe MessagePortPipe primitive. The files this PR locks are now empty stubs on main, so the diff no longer applies.

The regression coverage here is preserved: test/js/web/workers/message-port-pipe.test.ts has an equivalent "concurrent MessageChannel creation across workers is race-free" test plus a cross-thread burst ordering test. The linked issues (#16186, #22471, #25805, #29458) are all closed as fixed by #29937.

Closing as superseded.

[autofix.ci] apply automated fixes

8bf9d4e

coderabbitai Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent-fixture.js Outdated

claude Bot reviewed Apr 18, 2026

Comment thread src/bun.js/bindings/webcore/MessagePortChannel.cpp

claude Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent.test.ts Outdated

coderabbitai Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent-fixture.js Outdated

Comment thread test/js/web/workers/message-channel-concurrent.test.ts Outdated

coderabbitai Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent-fixture.js Outdated

claude Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent.test.ts

test: drop explicit test() timeout

36248eb

CI's runner passes --timeout=90000 (scripts/runner.node.mjs testTimeout/2), so the fixture's own 30s watchdog already fires well within the per-test ceiling and the explicit 60s override is redundant.

coderabbitai Bot reviewed Apr 18, 2026

Comment thread test/js/web/workers/message-channel-concurrent.test.ts Outdated

dylan-conway added 3 commits April 18, 2026 12:05

test: drop empty-stderr assertion; PASS + exit 0 is the regression signal

6091634

claude Bot reviewed Apr 18, 2026

github-actions Bot mentioned this pull request Apr 28, 2026

worker_threads: serialize MessagePortChannelRegistry across threads (1i5ms9) #29832

Closed

robobun mentioned this pull request Apr 29, 2026

Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive #29937

Merged

github-actions Bot mentioned this pull request Apr 30, 2026

Guard MessagePortChannelRegistry::m_openChannels with a Lock #29972

Closed

robobun mentioned this pull request Apr 30, 2026

Guard MessagePortChannel state with a per-channel Lock #29991

Closed

robobun closed this Jun 19, 2026
