Guard MessagePortChannel state with a per-channel Lock by robobun · Pull Request #29991 · oven-sh/bun

robobun · 2026-04-30T08:23:36Z

Problem

MessagePortChannel is shared between the main thread and worker threads in Bun. Upstream WebKit serializes all access to it with ASSERT(isMainThread()), but Bun commented those out and calls directly from worker threads (see the note in MessagePortChannelRegistry.cpp: "we totally are calling these off the main thread in many cases in Bun, so ........").

This leaves two unsynchronized hazards inside MessagePortChannel itself:

m_pendingMessages Vector race — postMessageToRemote() on the sender thread calls m_pendingMessages[i].append(...) while takeAllMessagesForPort() on the receiver thread does result.swap(m_pendingMessages[i]). When append() reallocs the backing buffer mid-swap, ASAN reports container-overflow / SEGV in Vector::appendSlowCase → expandCapacity.
Non-atomic refcount — MessagePortChannel derives from RefCountedAndCanMakeWeakPtr (non-atomic RefCounted). Concurrent RefPtr channel = m_openChannels.get(...) from the registry on two threads can lose an increment → premature destruction → ASSERTION FAILED: m_openChannels.get(channel.port1()) == &channel in messagePortChannelDestroyed().

Repro

Worker floods 20 000 messages through a transferred MessagePort while the main thread's onmessage handler echoes each one back, so both threads are continuously appending to and draining from the same channel. Without the fix, under the debug ASAN build:

ERROR: AddressSanitizer: container-overflow ... thread T9 (Worker)
  #9 WebCore::MessagePortChannel::postMessageToRemote(...) MessagePortChannel.cpp:127
  #8 WTF::Vector<MessageWithMessagePorts>::append(...)
  #7 ...::appendSlowCase → expandCapacity → reserveCapacity

or

ASSERTION FAILED: m_openChannels.get(channel.port1()) == &channel
MessagePortChannelRegistry.cpp(72) : messagePortChannelDestroyed

9/10 runs crash. With the fix: 15/15 pass.

Fix

Add Lock m_lock to MessagePortChannel and take it in every method that touches mutable state (postMessageToRemote, takeAllMessagesForPort, tryTakeMessageForPort, closePort, disentanglePort, entanglePortWithProcess, processForPort). This matches the Locker locker { m_lock } pattern used in EventListenerMap, BroadcastChannel, AbortSignal.
Drop the lock before invoking the takeAllMessagesForPort completion callback (which can re-enter via dispatchMessages → postMessage) and before the last self-protector RefPtr is released (so the destructor never runs while holding its own lock). The formerly-vestigial "guarantee that the lock is unlocked before the last ref is released" comments are now accurate again.
Change the base class to ThreadSafeRefCounted<MessagePortChannel>, CanMakeWeakPtr<MessagePortChannel> so ref()/deref() are atomic. Drop the explicit relaxAdoptionRequirement() since ThreadSafeRefCountedBase already calls it in its constructor.

Verification

# Without fix (src/ stashed)
bun bd test test/js/web/workers/message-channel.test.ts -t 'concurrent MessagePort'
  (fail) ... AddressSanitizer: container-overflow ...

# With fix
bun bd test test/js/web/workers/message-channel.test.ts -t 'concurrent MessagePort'
  (pass) ... [17098ms]

# Full file
bun bd test test/js/web/workers/message-channel.test.ts
  13 pass, 0 fail

Guard MessagePortChannelRegistry::m_openChannels with a Lock #29972 locks the Registry's m_openChannels HashMap — different layer, explicitly notes it does not address this per-channel race. This PR is complementary.
MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442 is a broader refactor that serializes all channel access through a single Registry lock + ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr. This PR takes the narrower per-channel-lock approach so independent channels don't serialize each other; happy to close in favor of MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442 if that's preferred.
fix(worker): add thread safety to MessageChannel to avoid crashes #25806 takes the same per-channel-lock approach but has been stale with merge conflicts since Feb.
Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive #29937 replaces the whole MessagePortChannel/Registry stack with a new MessagePortPipe primitive (full rewrite).
Fixes MessageChannel between Workers causes crash/unexpected exit #25805

Upstream WebKit serializes all MessagePortChannel access to the main thread via ASSERT(isMainThread()). Bun removed those asserts and calls postMessageToRemote() / takeAllMessagesForPort() from both the worker thread and the main thread directly, so m_pendingMessages[i].append() races with result.swap(m_pendingMessages[i]) and the non-atomic RefCounted refcount is corrupted by concurrent ref()/deref() in the registry lookup path. Under ASAN this surfaces as container-overflow / SEGV in Vector::appendSlowCase -> expandCapacity, or as a premature destructor firing messagePortChannelDestroyed with a stale registry entry. Add a per-channel Lock guarding all mutable state (m_pendingMessages, m_pendingMessageProtectors, m_entangledToProcessProtectors, m_processes, m_isClosed, m_pendingMessagePortTransfers, m_messageBatchesInFlight) and switch the base class from RefCountedAndCanMakeWeakPtr to ThreadSafeRefCounted + CanMakeWeakPtr so ref/deref are atomic. The lock is dropped before invoking the takeAllMessagesForPort callback and before the last self-protector RefPtr is released, so there is no re-entrancy deadlock and no self-destruction under the lock.

coderabbitai · 2026-04-30T08:23:43Z

Warning

Rate limit exceeded

@robobun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 18 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5b530003-ef55-4917-8f1f-3f654b7138ca

📥 Commits

Reviewing files that changed from the base of the PR and between 360bbb5 and fcfb9db.

📒 Files selected for processing (3)

src/bun.js/bindings/webcore/MessagePortChannel.cpp
src/bun.js/bindings/webcore/MessagePortChannel.h
test/js/web/workers/message-channel.test.ts

_{Review rate limit: 0/5 reviews remaining, refill in 5 minutes and 18 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

robobun · 2026-04-30T08:23:49Z

^{Updated 4:18 AM PT - Apr 30th, 2026}

❌ @robobun, your commit fcfb9db has 2 failures in Build #49377 (All Failures):

test/js/bun/s3/s3-storage-class.test.ts - code 1 on 🍎 14 aarch64
test/js/node/watch/fs.watch.test.ts - pid 10647 segmentation fault at address 0x0�[0m on 🐧 25.04 x64

🧪 To try this PR locally:

bunx bun-pr 29991

That installs a local version of the PR into your bun-29991 executable, so you can run:

bun-29991 --bun

github-actions · 2026-04-30T08:28:20Z

Found 3 issues this PR may fix:

Consistent SEGFAULT when using MessagePort #16186 - Consistent SEGFAULT when using MessagePort; the crash occurs during heavy concurrent worker MessagePort usage, matching the postMessage/takeAllMessages data race this PR guards with a per-channel lock.
Segmentation fault at address 0x00000018 #22471 - Segmentation fault at address 0x00000018; stack trace points directly to WebCore::MessagePort::postMessage (MessagePort.cpp:186), the exact code path now protected by the new lock.
Worker messages and close events often lost if worker calls process.exit immediately after postMessage #14144 - Worker messages and close events often lost if worker calls process.exit immediately after postMessage; the race between posting a message and channel teardown is the concurrent access scenario this PR's lock guards against.

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #16186
Fixes #22471
Fixes #14144

🤖 Generated with Claude Code

github-actions · 2026-04-30T08:31:06Z

This PR may be a duplicate of:

MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442 - Also fixes MessageChannel between Workers causes crash/unexpected exit #25805 by adding locking to MessagePortChannel (with ThreadSafeRefCounted) and MessagePortChannelRegistry; modifies the same files
fix(worker): add thread safety to MessageChannel to avoid crashes #25806 - Adds a Lock to MessagePortChannel protecting m_pendingMessages and m_processes for thread safety; same root cause and approach
Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive #29937 - Replaces the entire MessagePortChannel/MessagePortChannelRegistry stack with a thread-safe MessagePortPipe primitive; fixes MessageChannel between Workers causes crash/unexpected exit #25805 via rewrite

🤖 Generated with Claude Code

claude

The locking and ThreadSafeRefCounted changes look correct to me, but since this is cross-thread C++ lifetime/locking code and you've explicitly flagged #29442 as an alternative design, I'd like a human to make the call on per-channel vs. registry-wide locking (and to sanity-check that the non-thread-safe CanMakeWeakPtr base is still OK for the registry's WeakRef map now that destruction can race across threads).

Extended reasoning...

Overview

This PR adds a per-instance WTF::Lock to MessagePortChannel and switches its base from RefCountedAndCanMakeWeakPtr to ThreadSafeRefCounted + CanMakeWeakPtr, guarding m_pendingMessages, m_processes, m_isClosed, the self-protector RefPtrs, and m_messageBatchesInFlight in every mutator. It carefully moves protectors and pending containers out to locals before the Locker scope ends so the destructor never runs while holding its own lock, and drops the lock before invoking the takeAllMessagesForPort completion callback to avoid re-entrancy deadlocks. A new ASAN/debug-only stress test floods 20k messages bidirectionally through a transferred port across three subprocess attempts.

Security risks

None in the traditional sense — no parsing, auth, or untrusted-input handling changed. The change removes a memory-safety hazard (Vector realloc race / non-atomic refcount UAF) rather than introducing one. The remaining risk is a subtler concurrency bug (deadlock or a missed lock site) rather than an exploitable surface.

Level of scrutiny

High. This is hand-written cross-thread synchronization in C++ touching object lifetime (ThreadSafeRefCounted, self-referential RefPtr protectors, WeakRef consumers in the registry). Mistakes here manifest as rare crashes that are hard to reproduce. Two specific things I'd want a human to confirm:

The base is now ThreadSafeRefCounted but still the non-thread-safe CanMakeWeakPtr; MessagePortChannelRegistry::m_openChannels holds WeakRef<MessagePortChannel>, and #29442 went with ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr instead. If weak-ref invalidation can now race with m_openChannels.get() on another thread, that may need the thread-safe variant (or rely on #29972's registry lock to serialize it).
Dropping relaxAdoptionRequirement() relies on ThreadSafeRefCountedBase not enforcing the adoption assertion that RefCountedBase does — that's true in current WebKit, but worth a glance since the constructor still creates RefPtr-to-this before adoptRef.

Other factors

The author explicitly offers to close this in favor of #29442's broader single-registry-lock design, which is a maintainer-level architectural choice I shouldn't make. The only inline finding is a style nit about the explicit 120_000 test timeout, which is informational and consistent with many other slow tests in test/js/ — not a blocker. No CODEOWNERS cover this path.

robobun · 2026-04-30T08:59:30Z

CI build #49377 finished with 2 failed jobs, both pre-existing flakes unrelated to MessagePortChannel:

Test	Job	Also failing on
`test/js/node/watch/fs.watch.test.ts` — "closed FSWatcher is collectable > persistent: false", segfault at 0x0	ubuntu 25.04 x64	main #49200, #49300
`test/js/bun/s3/s3-storage-class.test.ts` — `S3Error: UnknownError` in "should work with writer + options on big file"	darwin 14 aarch64	#49300 (unrelated lockfile branch)

All 20 x64-asan test shards — where the new concurrency test in message-channel.test.ts runs — passed. No MessagePort/MessageChannel/worker test failures on any platform.

robobun · 2026-04-30T11:22:26Z

Re the review's CanMakeWeakPtr concern:

CanMakeWeakPtr<T> with the default DefaultWeakPtrImpl already uses ThreadSafeRefCounted for the impl's own refcount (WeakPtrImplBase : public ThreadSafeRefCounted<Derived> in WeakPtrImpl.h), and the factory is eagerly initialized during messagePortChannelCreated() in the constructor before the channel is published to other threads — so there's no lazy-init race.

The remaining hazard — thread A in ~MessagePortChannel() removing the map entry while thread B does m_openChannels.get(port) and upgrades the WeakRef to a RefPtr — is the Registry's unsynchronized m_openChannels HashMap, which is exactly what #29972 guards. That race exists today on main independent of this PR (destruction could already land on any thread once both ports close); switching to atomic refcounting here doesn't introduce it, it just makes the refcount itself stop being one of the corruption sources.

If #29972's Registry lock turns out to be insufficient and the weak→strong upgrade needs ThreadSafeWeakPtr semantics, that would be a Registry-side change to m_openChannels' value type rather than to this class's base — happy to follow up there or defer to #29442's broader approach.

Jarred-Sumner · 2026-04-30T20:57:51Z

Closing in favor of #29937

github-actions Bot added the claude label Apr 30, 2026

claude Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread test/js/web/workers/message-channel.test.ts

Jarred-Sumner closed this Apr 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guard MessagePortChannel state with a per-channel Lock#29991

Guard MessagePortChannel state with a per-channel Lock#29991
robobun wants to merge 1 commit into
mainfrom
farm/d098c4c7/messageportchannel-lock

robobun commented Apr 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 30, 2026

Rate limit exceeded

Uh oh!

robobun commented Apr 30, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

robobun commented Apr 30, 2026 •

edited

Loading

Uh oh!

robobun commented Apr 30, 2026

Uh oh!

Jarred-Sumner commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robobun commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Repro

Fix

Verification

Related

Uh oh!

coderabbitai Bot commented Apr 30, 2026

Rate limit exceeded

Uh oh!

robobun commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

robobun commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robobun commented Apr 30, 2026

Uh oh!

Jarred-Sumner commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robobun commented Apr 30, 2026 •

edited

Loading

robobun commented Apr 30, 2026 •

edited

Loading

robobun commented Apr 30, 2026 •

edited

Loading