Guard MessagePortChannel state with a per-channel Lock #29991
Upstream WebKit serializes all MessagePortChannel access to the main thread via ASSERT(isMainThread()). Bun removed those asserts and calls postMessageToRemote() / takeAllMessagesForPort() from both the worker thread and the main thread directly, so m_pendingMessages[i].append() races with result.swap(m_pendingMessages[i]) and the non-atomic RefCounted refcount is corrupted by concurrent ref()/deref() in the registry lookup path. Under ASAN this surfaces as container-overflow / SEGV in Vector::appendSlowCase -> expandCapacity, or as a premature destructor firing messagePortChannelDestroyed with a stale registry entry. Add a per-channel Lock guarding all mutable state (m_pendingMessages, m_pendingMessageProtectors, m_entangledToProcessProtectors, m_processes, m_isClosed, m_pendingMessagePortTransfers, m_messageBatchesInFlight) and switch the base class from RefCountedAndCanMakeWeakPtr to ThreadSafeRefCounted + CanMakeWeakPtr so ref/deref are atomic. The lock is dropped before invoking the takeAllMessagesForPort callback and before the last self-protector RefPtr is released, so there is no re-entrancy deadlock and no self-destruction under the lock.
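The "drop the lock before the callback" rule described above can be sketched in standalone C++. This is only an illustration, not the PR's code: `Channel`, `post`, `takeAll`, `pendingCount`, and `reentrantEchoWorks` are hypothetical names, and `std::mutex` stands in for `WTF::Lock`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for MessagePortChannel; std::mutex plays the role of WTF::Lock.
class Channel {
public:
    using Messages = std::vector<std::string>;

    // Append a message to the pending queue for port `i` (0 or 1).
    void post(std::size_t i, std::string message)
    {
        assert(i < 2);
        std::lock_guard<std::mutex> locker(m_lock);
        m_pending[i].push_back(std::move(message));
    }

    // Drain port `i` under the lock, then hand the batch to `callback`
    // with the lock already released.
    void takeAll(std::size_t i, const std::function<void(Messages&&)>& callback)
    {
        assert(i < 2);
        Messages result;
        {
            std::lock_guard<std::mutex> locker(m_lock);
            result.swap(m_pending[i]);
        } // lock dropped here
        callback(std::move(result)); // may re-enter post() without deadlocking
    }

    std::size_t pendingCount(std::size_t i)
    {
        assert(i < 2);
        std::lock_guard<std::mutex> locker(m_lock);
        return m_pending[i].size();
    }

private:
    std::mutex m_lock;
    Messages m_pending[2];
};

// Returns true if a callback that re-enters post() completes and its
// echo ends up queued on the other port.
bool reentrantEchoWorks()
{
    Channel channel;
    channel.post(0, "ping");
    std::size_t delivered = 0;
    channel.takeAll(0, [&](Channel::Messages&& batch) {
        delivered = batch.size();
        for (auto& message : batch)
            channel.post(1, "echo:" + message); // re-entrant call
    });
    return delivered == 1 && channel.pendingCount(1) == 1 && channel.pendingCount(0) == 0;
}
```

Because `std::mutex` is not recursive, holding it across `callback` would deadlock on the re-entrant `post()`; swapping into a local first avoids that.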
|
Updated 4:18 AM PT - Apr 30th, 2026
❌ @robobun, your commit fcfb9db has 2 failures in
🧪 To try this PR locally: bunx bun-pr 29991
That installs a local version of the PR into your bun-29991 --bun |
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
The locking and ThreadSafeRefCounted changes look correct to me, but since this is cross-thread C++ lifetime/locking code and you've explicitly flagged #29442 as an alternative design, I'd like a human to make the call on per-channel vs. registry-wide locking (and to sanity-check that the non-thread-safe CanMakeWeakPtr base is still OK for the registry's WeakRef map now that destruction can race across threads).
Extended reasoning...
Overview
This PR adds a per-instance WTF::Lock to MessagePortChannel and switches its base from RefCountedAndCanMakeWeakPtr to ThreadSafeRefCounted + CanMakeWeakPtr, guarding m_pendingMessages, m_processes, m_isClosed, the self-protector RefPtrs, and m_messageBatchesInFlight in every mutator. It carefully moves protectors and pending containers out to locals before the Locker scope ends so the destructor never runs while holding its own lock, and drops the lock before invoking the takeAllMessagesForPort completion callback to avoid re-entrancy deadlocks. A new ASAN/debug-only stress test floods 20k messages bidirectionally through a transferred port across three subprocess attempts.
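The "move the protector out to a local before the Locker scope ends" idea can be modeled outside WebKit roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `std::shared_ptr` stands in for `RefPtr`, `std::mutex` for `WTF::Lock`, and `Channel`, `create`, `close`, and `closeDestroysOutsideLock` are hypothetical names. It also assumes `close()` is called once.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical model: an object kept alive by a reference to itself until closed.
class Channel {
public:
    // `destroyedUnlocked` records whether the lock was free when the destructor ran.
    static std::shared_ptr<Channel> create(bool& destroyedUnlocked)
    {
        std::shared_ptr<Channel> channel(new Channel(destroyedUnlocked));
        channel->m_selfProtector = channel; // self-reference, released in close()
        return channel;
    }

    ~Channel()
    {
        // try_lock succeeds only if nobody holds m_lock at this point.
        m_destroyedUnlocked = m_lock.try_lock();
        if (m_destroyedUnlocked)
            m_lock.unlock();
    }

    void close()
    {
        std::shared_ptr<Channel> protector; // declared first, so destroyed last
        {
            std::lock_guard<std::mutex> locker(m_lock);
            assert(!m_isClosed); // this sketch assumes a single close()
            m_isClosed = true;
            protector = std::move(m_selfProtector); // move out under the lock
        } // lock released here
        // `protector` is destroyed on return; if it held the last reference,
        // ~Channel runs at that point, with m_lock already unlocked.
    }

private:
    explicit Channel(bool& destroyedUnlocked)
        : m_destroyedUnlocked(destroyedUnlocked)
    {
    }

    std::mutex m_lock;
    bool m_isClosed { false };
    std::shared_ptr<Channel> m_selfProtector;
    bool& m_destroyedUnlocked;
};

// Returns true if the object was destroyed by close() with its lock released.
bool closeDestroysOutsideLock()
{
    bool destroyedUnlocked = false;
    Channel* raw = nullptr;
    {
        auto channel = Channel::create(destroyedUnlocked);
        raw = channel.get();
    } // external reference dropped; the self-protector keeps the object alive
    raw->close(); // releases the last reference after unlocking
    return destroyedUnlocked;
}
```

The declaration order inside `close()` is what matters: locals are destroyed in reverse order, so the lock guard in the inner scope is gone before `protector` can trigger the destructor.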
Security risks
None in the traditional sense — no parsing, auth, or untrusted-input handling changed. The change removes a memory-safety hazard (Vector realloc race / non-atomic refcount UAF) rather than introducing one. The remaining risk is a subtler concurrency bug (deadlock or a missed lock site) rather than an exploitable surface.
Level of scrutiny
High. This is hand-written cross-thread synchronization in C++ touching object lifetime (ThreadSafeRefCounted, self-referential RefPtr protectors, WeakRef consumers in the registry). Mistakes here manifest as rare crashes that are hard to reproduce. Two specific things I'd want a human to confirm:
- The base is now ThreadSafeRefCounted but still the non-thread-safe CanMakeWeakPtr; MessagePortChannelRegistry::m_openChannels holds WeakRef<MessagePortChannel>, and #29442 went with ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr instead. If weak-ref invalidation can now race with m_openChannels.get() on another thread, that may need the thread-safe variant (or rely on #29972's registry lock to serialize it).
- Dropping relaxAdoptionRequirement() relies on ThreadSafeRefCountedBase not enforcing the adoption assertion that RefCountedBase does. That's true in current WebKit, but worth a glance since the constructor still creates RefPtr-to-this before adoptRef.
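For reference, the refcount half of that concern can be modeled in standalone C++. This is a minimal sketch that uses `std::atomic` as a stand-in for what ThreadSafeRefCounted provides; `Counted` and `hammerRefCount` are hypothetical names, and the sketch says nothing about weak-pointer invalidation, which is the part flagged above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of an atomically ref-counted object; not WTF's class.
class Counted {
public:
    void ref() { m_refCount.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when this call released the last reference.
    bool deref()
    {
        // acq_rel so writes made before the final deref are visible to
        // whichever thread ends up destroying the object.
        return m_refCount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    unsigned refCount() const { return m_refCount.load(std::memory_order_acquire); }

private:
    std::atomic<unsigned> m_refCount { 1 }; // starts at 1, as if freshly adopted
};

// Two threads each take and drop `iterations` references while the caller
// keeps its own reference. Returns the final count.
unsigned hammerRefCount(unsigned iterations)
{
    Counted object;
    std::atomic<bool> releasedEarly { false };
    auto worker = [&] {
        for (unsigned i = 0; i < iterations; ++i) {
            object.ref();
            if (object.deref())
                releasedEarly = true; // would mean a premature "last deref"
        }
    };
    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    assert(!releasedEarly);
    return object.refCount();
}
```

With a plain `unsigned` counter the same loop would be a data race, and a lost increment is exactly what lets a deref reach zero while another holder still exists.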
Other factors
The author explicitly offers to close this in favor of #29442's broader single-registry-lock design, which is a maintainer-level architectural choice I shouldn't make. The only inline finding is a style nit about the explicit 120_000 test timeout, which is informational and consistent with many other slow tests in test/js/ — not a blocker. No CODEOWNERS cover this path.
|
CI build #49377 finished with 2 failed jobs, both pre-existing flakes unrelated to MessagePortChannel:
All 20 |
|
Re the review's
The remaining hazard — thread A in If #29972's Registry lock turns out to be insufficient and the weak→strong upgrade needs |
|
Closing in favor of #29937 |
Problem
MessagePortChannel is shared between the main thread and worker threads in Bun. Upstream WebKit serializes all access to it with ASSERT(isMainThread()), but Bun commented those out and calls directly from worker threads (see the note in MessagePortChannelRegistry.cpp: "we totally are calling these off the main thread in many cases in Bun, so ........").
This leaves two unsynchronized hazards inside MessagePortChannel itself:
- m_pendingMessages Vector race: postMessageToRemote() on the sender thread calls m_pendingMessages[i].append(...) while takeAllMessagesForPort() on the receiver thread does result.swap(m_pendingMessages[i]). When append() reallocs the backing buffer mid-swap, ASAN reports container-overflow / SEGV in Vector::appendSlowCase → expandCapacity.
- MessagePortChannel derives from RefCountedAndCanMakeWeakPtr (non-atomic RefCounted). Concurrent RefPtr channel = m_openChannels.get(...) from the registry on two threads can lose an increment → premature destruction → ASSERTION FAILED: m_openChannels.get(channel.port1()) == &channel in messagePortChannelDestroyed().
Repro
Worker floods 20 000 messages through a transferred MessagePort while the main thread's onmessage handler echoes each one back, so both threads are continuously appending to and draining from the same channel. Without the fix, under the debug ASAN build, 9/10 runs crash (with either of the two failures above). With the fix: 15/15 pass.
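The access pattern of this repro can be approximated in standalone C++. The sketch below is not the PR's test (that one goes through worker threads and a transferred MessagePort); `Channel` and `floodAndEcho` are hypothetical names, `std::mutex` stands in for the per-channel lock, and the worker side posts everything before draining, which is simpler than the real interleaving.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical two-port queue guarded by one mutex.
class Channel {
public:
    void post(std::size_t port, int value)
    {
        assert(port < 2);
        std::lock_guard<std::mutex> locker(m_lock);
        m_pending[port].push_back(value);
    }

    std::vector<int> takeAll(std::size_t port)
    {
        assert(port < 2);
        std::vector<int> result;
        std::lock_guard<std::mutex> locker(m_lock);
        result.swap(m_pending[port]);
        return result;
    }

private:
    std::mutex m_lock;
    std::vector<int> m_pending[2];
};

// The "worker" floods port 0; the calling thread echoes every message back
// on port 1. Returns how many echoes the worker received.
std::size_t floodAndEcho(int count)
{
    const std::size_t total = static_cast<std::size_t>(count);
    Channel channel;
    std::size_t echoesReceived = 0; // written only by the worker, read after join

    std::thread worker([&] {
        for (int i = 0; i < count; ++i)
            channel.post(0, i);
        while (echoesReceived < total) {
            echoesReceived += channel.takeAll(1).size();
            std::this_thread::yield();
        }
    });

    std::size_t echoed = 0;
    while (echoed < total) {
        for (int value : channel.takeAll(0)) {
            channel.post(1, value); // echo back to the worker
            ++echoed;
        }
        std::this_thread::yield();
    }
    worker.join();
    return echoesReceived;
}
```

Removing the two `lock_guard` lines turns the `push_back` / `swap` pair into the unsynchronized append-versus-swap situation described under Problem.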
Fix
- Add Lock m_lock to MessagePortChannel and take it in every method that touches mutable state (postMessageToRemote, takeAllMessagesForPort, tryTakeMessageForPort, closePort, disentanglePort, entanglePortWithProcess, processForPort). This matches the Locker locker { m_lock } pattern used in EventListenerMap, BroadcastChannel, AbortSignal.
- Drop the lock before invoking the takeAllMessagesForPort completion callback (which can re-enter via dispatchMessages → postMessage) and before the last self-protector RefPtr is released (so the destructor never runs while holding its own lock). The formerly-vestigial "guarantee that the lock is unlocked before the last ref is released" comments are now accurate again.
- Switch the base class to ThreadSafeRefCounted<MessagePortChannel>, CanMakeWeakPtr<MessagePortChannel> so ref()/deref() are atomic. Drop the explicit relaxAdoptionRequirement() since ThreadSafeRefCountedBase already calls it in its constructor.
Verification
Related
- m_openChannels HashMap: different layer, explicitly notes it does not address this per-channel race. This PR is complementary.
- ThreadSafeRefCountedAndCanMakeThreadSafeWeakPtr. This PR takes the narrower per-channel-lock approach so independent channels don't serialize each other; happy to close in favor of MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442 if that's preferred.
- MessageChannel to avoid crashes #25806 takes the same per-channel-lock approach but has been stale with merge conflicts since Feb.
- MessagePortPipe primitive (full rewrite).