Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive by robobun · Pull Request #29937 · oven-sh/bun

robobun · 2026-04-29T12:32:15Z

What

Replaces the WebKit-derived MessagePortChannelProvider / MessagePortChannelProviderImpl / MessagePortChannelRegistry / MessagePortChannel stack — plus the BroadcastChannel::MainThreadBridge / per-context BunBroadcastChannelRegistry — with a small, thread-safe concurrency primitive (MessagePortPipe) that the Web-facing classes wrap thinly.

Net −1082 lines (+737 / −1819), 10 files deleted.

Why

The existing stack is Safari's multi-process design carried over verbatim: an abstract provider so UIProcess can swap in an IPC impl, a process-global MessagePortIdentifier → MessagePortChannel HashMap, ProcessIdentifier tracking, self-RefPtr "protector" members to keep channels alive across IPC hops, and a main-thread serialization point for BroadcastChannel. None of that indirection is needed in Bun's single-process multi-thread model — and worse, the MessagePortChannelRegistry::m_openChannels HashMap and MessagePortChannel::m_pendingMessages Vector are mutated from worker threads with no lock (the WebKit ASSERT(isMainThread()) guards were just commented out). Concurrent new MessageChannel() across workers corrupts the HashMap:

HashTable.h:1574:17: runtime error: member access within null pointer of type
  'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier,
   WeakRef<MessagePortChannel>>, ...>'

This is the cause of #16186, #22471, #25805 and likely #29458.

Design

MessagePortPipe — ThreadSafeRefCounted, two sides. Each side has:

a Lock + Deque<MessageWithMessagePorts> inbox
one std::atomic<uint64_t> state word: Closed | DrainScheduled | Attached in the low byte, queued-count in the high bits

send() locks the destination side, enqueues, and if Attached && !DrainScheduled flips DrainScheduled and posts one task to the receiver's ScriptExecutionContext. A burst of N sends schedules at most one cross-thread wakeup. takeAll() clears DrainScheduled before swapping the deque, so a racing send reschedules — at most one extra no-op drain, never a stranded message. attach()/detach() handle transfer: a detached side buffers; attach flushes the backlog with one wakeup. hasPendingActivity is two atomic loads (my queued count + peer's Closed), safe from the GC thread with no lock.

MessagePort — holds RefPtr<MessagePortPipe> + uint8_t side. postMessage → pipe->send. start() → pipe->attach. close() → pipe->close. Transfer → pipe->detach, hand {pipe, side} through TransferredMessagePort, re-create on the other side. No global maps, no identifiers, no per-context port iteration.

The drain task takes the whole inbox in one go (sender-side batching), then posts each message as its own local task so microtasks checkpoint between deliveries — matching the HTML "port message queue is a task source" semantics and Node's observable ordering.

BroadcastChannel — one process-global Lock + HashMap<String, Vector<{ctxId, weak channel}>>. postMessage snapshots subscribers under the lock, releases, then posts one task per (message, subscriber) directly to each subscriber's context — no bounce through the main thread, no MainThreadBridge, no second allBroadcastChannels map. Per-channel inbox batching was prototyped and reverted: the spec's same-event-loop (message-major, creation-minor) delivery-order test (WPT broadcast-channel.test.ts::"messages are delivered in port creation order") fails if subscribers drain channel-major. A per-(context, name) inbox could restore batching without breaking order — left as future work.

Deleted

MessagePortChannel.{h,cpp}, MessagePortChannelProvider.{h,cpp}, MessagePortChannelProviderImpl.{h,cpp}, MessagePortChannelRegistry.{h,cpp}, MessagePortIdentifier.h, BroadcastChannelRegistry.h, the allMessagePorts() / portToContextIdentifier() / allBroadcastChannels() / channelToContextIdentifier() global maps, ScriptExecutionContext::{m_messagePorts, processMessageWithMessagePortsSoon, dispatchMessagePortEvents, createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}, and WorkerGlobalScope::messagePortChannelProvider().

Verification

test/js/web/workers/message-channel.test.ts — 12/12
test/js/web/broadcastchannel/broadcast-channel.test.ts — 11/11
test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts — 3/3
test/js/web/workers/ suite — 228 pass / 6 fail; all 6 fail identically on main (container-slowness timeouts), zero new failures
test/js/node/worker_threads/worker_threads.test.ts — 25 pass / 2 fail; both fail identically on main
10 node parallel/test-worker-message-port-*.js + test-broadcastchannel-*.js — all pass
New test/js/web/workers/message-port-pipe.test.ts:
- concurrent MessageChannel creation across workers is race-free — fails 3/3 on main with the HashTable UBSan null-deref shown above; passes 5/5 on this branch
- microtask ordering, buffered-before-start, FIFO receiveMessageOnPort, 1000-message cross-thread burst

Relationship to other PRs

#29832 / #29442 patch the race by adding a lock to the existing registry. This PR removes the registry instead — there's nothing left to race on. Either approach fixes the crash; this one also sheds the layering.

Fixes #16186
Fixes #22471
Fixes #25805

The WebKit-derived MessagePortChannelProvider/Registry/Channel stack was designed for Safari's multi-process model: an abstract provider, a process-global identifier→channel HashMap, per-channel self-RefPtr protectors, and a main-thread serialization point for BroadcastChannel. Bun is single-process multi-thread, so none of the indirection buys anything — and worse, the registry's HashMap and the channel's pending Vector were accessed from worker threads with the isMainThread() asserts simply commented out, so concurrent MessageChannel churn corrupts the HashMap (observed as #16186/#22471/#25805). Split into two layers: MessagePortPipe is the concurrency primitive: ThreadSafeRefCounted, two sides, each with a lock-guarded Deque inbox and one atomic<uint64_t> packing Closed|DrainScheduled|Attached in the low byte and the queued count in the high bits. Send locks the destination, enqueues, and schedules at most one drain task per burst (the first push flips DrainScheduled; subsequent pushes see it set and skip). Drain clears the flag before swapping the deque so a racing send reschedules — at most one extra no-op wakeup, never a stranded message. attach/detach handle transfer by buffering while no context is attached and flushing on re-attach. hasPendingActivity is two atomic loads, no lock, no registry. MessagePort is now a thin EventTarget wrapper: RefPtr<Pipe> + side index. TransferredMessagePort carries {pipe, side} instead of identifier pairs. There is no global port map, no provider, no registry, no per-context m_messagePorts iteration on dispatch. BroadcastChannel drops MainThreadBridge and the per-context LazyRef registry in favor of one process-global lock-guarded name→subscribers map. post() snapshots under the lock and posts directly to each subscriber's context — no main-thread bounce. Per-channel inbox batching was considered and rejected: the spec requires same-event-loop subscribers to observe (message-major, creation-minor) order, which channel-major draining breaks. Net −1082 lines; the race in the registry is gone by construction (no registry).

robobun · 2026-04-29T12:32:28Z

^{Updated 9:33 PM PT - Apr 29th, 2026}

❌ @autofix-ci[bot], your commit 778c684 has 1 failures in Build #49291 (All Failures):

test/js/node/watch/fs.watch.test.ts - pid 6191 segmentation fault at address 0x0�[0m on 🐧 13 x64-baseline
test/js/node/watch/fs.watch.test.ts - code 1 on 🍎 26 aarch64

🧪 To try this PR locally:

bunx bun-pr 29937

That installs a local version of the PR into your bun-29937 executable, so you can run:

bun-29937 --bun

github-actions · 2026-04-29T12:37:16Z

Found 6 issues this PR may fix:

worker cannot postMessage MessagePort #23294 - PR rewrites port transfer to use {pipe, side}, fixing "Unable to deserialize data" when transferring a MessagePort created inside a worker
Transferring MessagePort from worker thread does not work #15931 - Directly caused by broken transfer serialization path that this PR replaces with new {pipe, side} transfer mechanism
Unexpected MessagePort+Worker behavior on Bun vs NodeJS #19862 - PR fixes thread-safety bugs in shared MessagePortChannel data structures that caused cross-worker MessagePort communication to behave incorrectly
receiveMessageOnPort unexpectedly drops messages with falsy values including undefined #26501 - PR rewrites receiveMessageOnPort to use pipe->takeAll() for correct synchronous message retrieval, fixing falsy-value-dropping bug
MessagePort does not transmit unless started explicitly #19863 - PR rewrites start() to call pipe->attach(), enabling correct auto-start behavior on on("message") registration
Bun BroadcastChannel never receive for same reference #10433 - PR rewrites BroadcastChannel to use a process-global singleton registry with direct delivery, fixing channels that never receive their own messages

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #23294
Fixes #15931
Fixes #19862
Fixes #26501
Fixes #19863
Fixes #10433

🤖 Generated with Claude Code

github-actions · 2026-04-29T12:39:39Z

This PR may be a duplicate of:

worker_threads: serialize MessagePortChannelRegistry across threads (1i5ms9) #29832 - Adds a RecursiveLock to MessagePortChannelRegistry to serialize access across threads, fixing the same race conditions (Consistent SEGFAULT when using MessagePort #16186, Segmentation fault at address 0x00000018 #22471, MessageChannel between Workers causes crash/unexpected exit #25805, panic: Segmentation fault at address 0x40 #29458)
MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442 - Adds a WTF::Lock to MessagePortChannelRegistry and makes MessagePortChannel thread-safe-refcounted, fixing the same race conditions (Consistent SEGFAULT when using MessagePort #16186, Segmentation fault at address 0x00000018 #22471, MessageChannel between Workers causes crash/unexpected exit #25805, panic: Segmentation fault at address 0x40 #29458)
fix(worker): add thread safety to MessageChannel to avoid crashes #25806 - Adds locks to both MessagePortChannel and MessagePortChannelRegistry to prevent crashes from concurrent worker access, fixing the same underlying issues (MessageChannel between Workers causes crash/unexpected exit #25805, Consistent SEGFAULT when using MessagePort #16186)

🤖 Generated with Claude Code

coderabbitai · 2026-04-29T12:55:20Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Replaces identifier/channel-based message-port and broadcast-channel plumbing with a two-sided MessagePortPipe and a process-global BunBroadcastChannelRegistry; deletes legacy MessagePortChannel/provider/registry/identifier artifacts and removes message-port lifecycle APIs from ScriptExecutionContext and WorkerGlobalScope.

Changes

Cohort / File(s)	Summary
MessagePort pipe transport `src/bun.js/bindings/webcore/MessagePort.h`, `src/bun.js/bindings/webcore/MessagePort.cpp`, `src/bun.js/bindings/webcore/MessageChannel.cpp`, `src/bun.js/bindings/webcore/TransferredMessagePort.h`, `src/bun.js/bindings/webcore/MessagePortPipe.h`, `src/bun.js/bindings/webcore/MessagePortPipe.cpp`	Replaces identifier/channel model with `MessagePortPipe` (two-sided pipe). Ports now own/reference a pipe + side; send/take/attach/detach/close and single-message dispatch live on the pipe. Entangle/disentangle and transfer semantics updated to pipe-based `TransferredMessagePort`.
BroadcastChannel & registry `src/bun.js/bindings/webcore/BroadcastChannel.h`, `src/bun.js/bindings/webcore/BroadcastChannel.cpp`, `src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h`, `src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp`	Refactors BroadcastChannel to instance dispatch backed by `BunBroadcastChannelRegistry::singleton()`. Registry changes to subscribe/unsubscribe/post with snapshot+async per-subscriber delivery. Atomic state bits replace prior flags; main-thread bridge and identifier helpers removed.
Removed identifier/channel infrastructure `src/bun.js/bindings/webcore/MessagePortChannel.h`, `src/bun.js/bindings/webcore/MessagePortChannel.cpp`, `src/bun.js/bindings/webcore/MessagePortChannelProvider.h`, `src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp`, `src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h`, `src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp`, `src/bun.js/bindings/webcore/MessagePortChannelRegistry.h`, `src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp`, `src/bun.js/bindings/webcore/MessagePortIdentifier.h`	Deletes legacy MessagePortChannel, provider, provider-impl, channel-registry, and MessagePortIdentifier. Removes identifier-based registration, routing, remote posting, and take/try APIs; callers must migrate to pipe or registry APIs.
ScriptExecutionContext & WorkerGlobalScope cleanup `src/bun.js/bindings/ScriptExecutionContext.h`, `src/bun.js/bindings/ScriptExecutionContext.cpp`, `src/bun.js/bindings/BunWorkerGlobalScope.h`, `src/bun.js/bindings/BunWorkerGlobalScope.cpp`	Removes ScriptExecutionContext message-port lifecycle APIs, queued completion handlers, message-port tracking, and broadcast registry member/accessor. Deletes `WorkerGlobalScope::messagePortChannelProvider()` declaration/definition. Update call sites to new pipe/registry interfaces.
MessagePortChannelProvider removal / context lookup `src/bun.js/bindings/BunWorkerGlobalScope.cpp`, `src/bun.js/bindings/webcore/MessagePortChannelProvider*.{h,cpp}`	Removes provider singleton and context-based provider lookup definitions. Calls to `MessagePortChannelProvider::singleton()` / `fromContext()` are removed — migrate creation/entangling/posting to pipe/registry APIs.
MessagePortChannelRegistry → BunBroadcastChannelRegistry migration `src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h`, `src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp`	Reworks registry to a process-global singleton with per-name subscriber vectors and lock-guarded snapshot posting. Adds `subscribe`/`unsubscribe`/`post` and removes identifier-based register/unregister/postMessage methods.
MessagePortChannel → pipe tests added `test/js/web/workers/message-port-pipe.test.ts`	Adds tests for MessagePortPipe semantics: ordering, microtask boundaries, buffered delivery, FIFO drains, GC/FinalizationRegistry behavior for transferred ports, and concurrency stress scenarios.
Small build / call-site changes `src/bun.js/bindings/webcore/Worker.cpp`, `src/bun.js/bindings/ZigGlobalObject.cpp`	Worker postMessage now moves MessageWithMessagePorts into deferred task capture. `ZigGlobalObject.cpp` adds a `ProcessIdentifier` include. These change capture/include behavior; review for ownership/compilation effects.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the main change: replacing MessagePort/BroadcastChannel registries with the MessagePortPipe primitive.
Description check	✅ Passed	The description is comprehensive, covering what changed, why it matters, the design rationale, and verification results. It directly addresses the template sections.
Linked Issues check	✅ Passed	All three linked issues (`#16186`, `#22471`, `#25805`) report concurrent access crashes in MessagePort. The PR removes unsafe concurrent registry access, replaces it with a thread-safe MessagePortPipe primitive, and includes tests demonstrating the race-free implementation fixes all three issues.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the registry replacement objective: removing MessagePortChannel/Provider/Registry/Identifier files, refactoring MessagePort/BroadcastChannel to use MessagePortPipe, and adding comprehensive tests for the new implementation.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp`:
- Around line 83-85: BroadcastChannel::contextDestroyed currently calls close()
but fails to invoke the base class implementation, leaving
ContextDestructionObserver's cached context pointer stale; update
BroadcastChannel::contextDestroyed to call the base class (e.g., the parent
class's contextDestroyed or ContextDestructionObserver::contextDestroyed) after
or before close() so the observer clears its cached context pointer and prevents
scriptExecutionContext() from returning a dangling pointer. Ensure you reference
BroadcastChannel::contextDestroyed and the ContextDestructionObserver base
method in your change.
- Around line 93-98: BroadcastChannel::hasPendingActivity currently reads the
non-atomic m_hasRelevantEventListener without synchronization while
eventListenersDidChange can write it on the context thread, causing UB; fix by
making the listener flag atomic or folding it into the existing atomic m_state
so hasPendingActivity only performs atomic reads. Update the writer in
eventListenersDidChange to store to the chosen atomic (or update m_state's
bit(s) consistently) and adjust any readers (e.g., hasPendingActivity and other
checks) to read the same atomic location with appropriate memory_order (e.g.,
acquire for readers, release for writers) to avoid races.
- Around line 17-25: The constructor BroadcastChannel::BroadcastChannel calls
jsRef() but contextDestroyed() currently calls close() without pairing
jsUnref(), leaking the VM ref; modify BroadcastChannel::contextDestroyed() to
call jsUnref(nullptr) before calling close() so the jsRef/jsUnref pair is
balanced (ensure you update the method that currently only calls close() in the
BroadcastChannel class that interacts with BunBroadcastChannelRegistry and the
existing close() implementation).

In `@src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp`:
- Around line 59-63: The code unconditionally increments
BroadcastChannel::QueuedOne on channel->m_state before calling
ScriptExecutionContext::postTaskTo(sub.ctxId, ...) so if postTaskTo returns
false the queued count remains inflated; fix by only transferring ownership
(channel.releaseNonNull()) and/or incrementing the queued counter when
postTaskTo succeeds, or if you must increment before the call, call
channel->m_state.fetch_sub(BroadcastChannel::QueuedOne,
std::memory_order_acq_rel) when postTaskTo returns false and do not release the
channel in that failure path; ensure the lambda capture still gets a valid
released channel pointer only on success and that memory_order matches the
original fetch_add/fetch_sub uses.

In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 92-109: disentangle() currently detaches the pipe and drops the
context observer but does not perform the listener cleanup that close() does, so
any listener-derived refs (via onDidChangeListenerImpl() / refEventLoop()) can
keep the old event loop alive after transfer; fix by invoking the same
listener-cleanup path used by close() (e.g. clear/reset any listener state or
call the method that removes message listeners / clears onDidChangeListenerImpl
refs) from MessagePort::disentangle() — do this before observeContext(nullptr)
and before returning the TransferredMessagePort so that refEventLoop() refs are
released when a port is transferred.
- Around line 132-142: The current code eagerly drains the pipe via
m_pipe->takeAll(m_side) and captures messages into delayed postTask() lambdas,
which causes messages to be lost if disentangle() sets m_isDetached before those
tasks run; change the approach in MessagePort.cpp so you do not dequeue the
entire inbox up-front—either pop and take a single message inside each posted
task (call a takeOne/pop method per task) or leave messages in the pipe and on
detach return/restore pending messages (use the existing disentangle and
m_isDetached checks to hand messages back to the pipe). Specifically, replace
the m_pipe->takeAll(m_side) usage and the loop that captures messages with logic
that only removes a message at the moment its task executes (or implements
returning messages on detach), ensuring dispatchOneMessage still receives the
moved message only after the pipe removal succeeds.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f8271d9d-a8e7-4e05-a12e-061e60d2a547

📥 Commits

Reviewing files that changed from the base of the PR and between 6c21a7e and 12d269f.

📒 Files selected for processing (26)

src/bun.js/bindings/BunWorkerGlobalScope.cpp
src/bun.js/bindings/BunWorkerGlobalScope.h
src/bun.js/bindings/ScriptExecutionContext.cpp
src/bun.js/bindings/ScriptExecutionContext.h
src/bun.js/bindings/ZigGlobalObject.cpp
src/bun.js/bindings/webcore/BroadcastChannel.cpp
src/bun.js/bindings/webcore/BroadcastChannel.h
src/bun.js/bindings/webcore/BroadcastChannelRegistry.h
src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp
src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h
src/bun.js/bindings/webcore/MessageChannel.cpp
src/bun.js/bindings/webcore/MessagePort.cpp
src/bun.js/bindings/webcore/MessagePort.h
src/bun.js/bindings/webcore/MessagePortChannel.cpp
src/bun.js/bindings/webcore/MessagePortChannel.h
src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp
src/bun.js/bindings/webcore/MessagePortChannelProvider.h
src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp
src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h
src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp
src/bun.js/bindings/webcore/MessagePortChannelRegistry.h
src/bun.js/bindings/webcore/MessagePortIdentifier.h
src/bun.js/bindings/webcore/MessagePortPipe.cpp
src/bun.js/bindings/webcore/MessagePortPipe.h
src/bun.js/bindings/webcore/TransferredMessagePort.h
test/js/web/workers/message-port-pipe.test.ts

💤 Files with no reviewable changes (14)

src/bun.js/bindings/BunWorkerGlobalScope.h
src/bun.js/bindings/webcore/MessagePortIdentifier.h
src/bun.js/bindings/BunWorkerGlobalScope.cpp
src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp
src/bun.js/bindings/webcore/MessagePortChannelProvider.h
src/bun.js/bindings/webcore/BroadcastChannelRegistry.h
src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h
src/bun.js/bindings/webcore/MessagePortChannel.cpp
src/bun.js/bindings/webcore/MessagePortChannelRegistry.h
src/bun.js/bindings/ScriptExecutionContext.h
src/bun.js/bindings/webcore/MessagePortChannel.h
src/bun.js/bindings/ScriptExecutionContext.cpp
src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp
src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp

The race being exercised (unlocked HashMap in MessagePortChannelRegistry) is a memory-safety bug that surfaces deterministically only under sanitizers. Release lanes skip it.

…st timeouts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-44: The test title implies an explicit start() but currently
relies on implicit starting via setting port2.onmessage; update the test to call
port2.start() explicitly (or rename the test to mention implicit start) so
intent is clear—specifically modify the test "messages buffered before start()
are delivered on start()" to invoke port2.start() before awaiting the Promise
(while keeping port1.postMessage calls and the port2.onmessage handler the same)
so the behavior and test name match.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 899b870d-e9a0-440b-8707-70f469c7fc96

📥 Commits

Reviewing files that changed from the base of the PR and between 12d269f and 31ee5ec.

📒 Files selected for processing (1)

test/js/web/workers/message-port-pipe.test.ts

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (5)

src/bun.js/bindings/webcore/BroadcastChannel.cpp (2)

108-111: ⚠️ Potential issue | 🟠 Major

Finish contextDestroyed() teardown.

Line 49 takes a VM ref, but Lines 108-111 only call close(). That leaves the constructor’s jsRef() unbalanced on internal teardown and never clears ContextDestructionObserver’s cached context pointer.

Suggested fix

 void BroadcastChannel::contextDestroyed()
 {
+    if (m_hasRef) {
+        if (auto* context = scriptExecutionContext()) {
+            if (auto* globalObject = context->jsGlobalObject())
+                jsUnref(globalObject);
+        }
+    }
     close();
+    ContextDestructionObserver::contextDestroyed();
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp` around lines 108 - 111,
contextDestroyed() currently only calls close(), leaving the VM reference
acquired in the constructor via jsRef() and ContextDestructionObserver's cached
context pointer intact; update BroadcastChannel::contextDestroyed() to release
the stored VM/JS reference (mirror the constructor's jsRef() acquisition) and
clear the ContextDestructionObserver's cached context pointer so the ref is
unbalanced and the cached context is nulled—i.e., after or as part of close()
call explicitly release/clear the jsRef() held by BroadcastChannel and reset the
observer's cached context pointer (use the same jsRef()/observer accessor names
used in the constructor) so teardown fully reverses construction.

118-123: ⚠️ Potential issue | 🟠 Major

hasPendingActivity() still races on listener state.

This path is lock-free now, but it still reads m_hasRelevantEventListener as a plain bool while eventListenersDidChange() writes it on the context thread. That is UB in C++ and can misreport pending activity.

Suggested fix

--- a/src/bun.js/bindings/webcore/BroadcastChannel.h
+++ b/src/bun.js/bindings/webcore/BroadcastChannel.h
@@
-    bool m_hasRelevantEventListener { false };
+    std::atomic<bool> m_hasRelevantEventListener { false };

 void BroadcastChannel::eventListenersDidChange()
 {
-    m_hasRelevantEventListener = hasEventListeners(eventNames().messageEvent);
+    m_hasRelevantEventListener.store(hasEventListeners(eventNames().messageEvent), std::memory_order_release);
 }
 
 bool BroadcastChannel::hasPendingActivity() const
 {
     uint64_t s = m_state.load(std::memory_order_acquire);
     if (s & Closed)
         return false;
-    return m_hasRelevantEventListener || (s >> QueuedShift) > 0;
+    return m_hasRelevantEventListener.load(std::memory_order_acquire) || (s >> QueuedShift) > 0;
 }

#!/bin/bash
set -euo pipefail
rg -n -C2 '\bm_hasRelevantEventListener\b|hasPendingActivity\(|eventListenersDidChange\(' \
  src/bun.js/bindings/webcore/BroadcastChannel.h \
  src/bun.js/bindings/webcore/BroadcastChannel.cpp

Expected result: m_hasRelevantEventListener is declared as a plain bool, written in eventListenersDidChange(), and read in hasPendingActivity().

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp` around lines 118 - 123,
hasPendingActivity() reads m_hasRelevantEventListener without synchronization
which is UB because eventListenersDidChange() writes it on another thread;
change m_hasRelevantEventListener to std::atomic<bool> (or otherwise access it
under the same lock used for other state) and update eventListenersDidChange()
to store with appropriate ordering (e.g. memory_order_release) and
hasPendingActivity() to load with memory_order_acquire (or use the same mutex)
so the listener flag is read/written atomically and the lock-free path is safe;
update declarations and both usages of m_hasRelevantEventListener accordingly.

test/js/web/workers/message-port-pipe.test.ts (1)

29-40: 🧹 Nitpick | 🔵 Trivial

Make this test explicit about start().

The body currently relies on implicit start via onmessage, so it is not actually exercising the behavior named in the test title. Either call port2.start() or rename the test to describe implicit start.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/web/workers/message-port-pipe.test.ts` around lines 29 - 40, The test
"messages buffered before start() are delivered on start()" relies on implicit
start via assigning port2.onmessage; explicitly call port2.start() to exercise
the behavior named in the test (or alternatively rename the test to indicate
implicit start). Modify the test around the MessageChannel usage (symbols:
port1, port2) to call port2.start() after setting up port2.onmessage (or before
posting if you prefer) so the test actually verifies delivery on explicit
start().

src/bun.js/bindings/webcore/MessagePort.cpp (2)

118-134: ⚠️ Potential issue | 🟠 Major

Run the same listener cleanup path on transfer that close() uses.

Transferred-in ports opt into onDidChangeListenerImpl(), which can refEventLoop() the current context for message listeners. disentangle() detaches the pipe and drops the observer without clearing those listeners, so the transferred-away wrapper can keep the old context alive until it is eventually collected.
Possible fix
 TransferredMessagePort MessagePort::disentangle()
 {
     ASSERT(isEntangled());
+    removeAllEventListeners();

     // Release the pipe to its next owner. Messages that arrive while in
     // transit buffer in the pipe; the receiving context's entangle() call
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/MessagePort.cpp` around lines 118 - 134,
MessagePort::disentangle() currently detaches the pipe and drops the observer
but does not run the same listener-cleanup path that close() uses, so
transferred-away ports can retain message listeners that refEventLoop() the old
context; update disentangle() to perform the same listener teardown as close()
(i.e. run the onDidChangeListenerImpl()/listener-removal routine used by close()
so any refEventLoop()ing is undone and listeners are cleared) before
detaching/handing off the pipe and dropping the context observer, ensuring the
transferred wrapper cannot keep the old context alive.
158-168: ⚠️ Potential issue | 🔴 Critical

Don't dequeue the whole inbox before the port's transfer boundary.

takeAll() removes every pending message up front and stores them in later postTask() lambdas. If disentangle() runs before those tasks execute, Line 174 drops them because the old wrapper is already detached, so messages posted before transfer disappear instead of following the port to its next owner. Keep messages in the pipe until the delivery task actually pops them, or restore them on detach.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/MessagePort.cpp` around lines 158 - 168, The code
currently calls m_pipe->takeAll() which removes all pending messages immediately
and stores them in postTask lambdas, so if disentangle() runs before those tasks
execute the messages are lost; change the approach in MessagePort.cpp so
messages are not dequeued up-front: instead capture the port identity but leave
messages in m_pipe and have each posted task pull (pop/take) a single message at
execution time and call dispatchOneMessage(context, message), or if you prefer
keep takeAll() you must detect detach in disentangle() and restore any taken
messages back into m_pipe; update the postTask lambdas and the logic around
m_pipe->takeAll/m_pipe->pop and the disentangle() path accordingly so messages
follow the port to its next owner.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/TransferredMessagePort.h`:
- Around line 42-51: The TransferredMessagePort type should be move-only: remove
or mark deleted the copy constructor TransferredMessagePort(const
TransferredMessagePort&) and copy assignment operator TransferredMessagePort&
operator=(const TransferredMessagePort&), keep the move ctor
TransferredMessagePort(TransferredMessagePort&&) = default and move assignment
TransferredMessagePort& operator=(TransferredMessagePort&&) = default; then
update any call sites that currently copy TransferredMessagePort to use
std::move so ownership is transferred rather than duplicated.

In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 106-108: The test currently asserts stderr is empty after awaiting
proc.stdout.text(), proc.stderr.text(), and proc.exited; but JS subprocesses may
emit a known ASAN startup warning (from bunEnv) that should be filtered first.
Update the assertion to remove/filter the ASAN startup warning from the captured
stderr (e.g., drop any line containing "WARNING: ASAN interferes" or use the
repo's existing stderr filter helper) and then assert the filtered stderr is
empty; apply the same change for the similar assertion around the
stdout/stderr/exitCode block at the later occurrence (the one referenced at
lines 149-151).

---

Duplicate comments:
In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp`:
- Around line 108-111: contextDestroyed() currently only calls close(), leaving
the VM reference acquired in the constructor via jsRef() and
ContextDestructionObserver's cached context pointer intact; update
BroadcastChannel::contextDestroyed() to release the stored VM/JS reference
(mirror the constructor's jsRef() acquisition) and clear the
ContextDestructionObserver's cached context pointer so the ref is unbalanced and
the cached context is nulled—i.e., after or as part of close() call explicitly
release/clear the jsRef() held by BroadcastChannel and reset the observer's
cached context pointer (use the same jsRef()/observer accessor names used in the
constructor) so teardown fully reverses construction.
- Around line 118-123: hasPendingActivity() reads m_hasRelevantEventListener
without synchronization which is UB because eventListenersDidChange() writes it
on another thread; change m_hasRelevantEventListener to std::atomic<bool> (or
otherwise access it under the same lock used for other state) and update
eventListenersDidChange() to store with appropriate ordering (e.g.
memory_order_release) and hasPendingActivity() to load with memory_order_acquire
(or use the same mutex) so the listener flag is read/written atomically and the
lock-free path is safe; update declarations and both usages of
m_hasRelevantEventListener accordingly.

In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 118-134: MessagePort::disentangle() currently detaches the pipe
and drops the observer but does not run the same listener-cleanup path that
close() uses, so transferred-away ports can retain message listeners that
refEventLoop() the old context; update disentangle() to perform the same
listener teardown as close() (i.e. run the
onDidChangeListenerImpl()/listener-removal routine used by close() so any
refEventLoop()ing is undone and listeners are cleared) before detaching/handing
off the pipe and dropping the context observer, ensuring the transferred wrapper
cannot keep the old context alive.
- Around line 158-168: The code currently calls m_pipe->takeAll() which removes
all pending messages immediately and stores them in postTask lambdas, so if
disentangle() runs before those tasks execute the messages are lost; change the
approach in MessagePort.cpp so messages are not dequeued up-front: instead
capture the port identity but leave messages in m_pipe and have each posted task
pull (pop/take) a single message at execution time and call
dispatchOneMessage(context, message), or if you prefer keep takeAll() you must
detect detach in disentangle() and restore any taken messages back into m_pipe;
update the postTask lambdas and the logic around m_pipe->takeAll/m_pipe->pop and
the disentangle() path accordingly so messages follow the port to its next
owner.

In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-40: The test "messages buffered before start() are delivered on
start()" relies on implicit start via assigning port2.onmessage; explicitly call
port2.start() to exercise the behavior named in the test (or alternatively
rename the test to indicate implicit start). Modify the test around the
MessageChannel usage (symbols: port1, port2) to call port2.start() after setting
up port2.onmessage (or before posting if you prefer) so the test actually
verifies delivery on explicit start().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8cfccd98-afe6-4332-9002-307661288f1e

📥 Commits

Reviewing files that changed from the base of the PR and between 31ee5ec and 97d5fba.

📒 Files selected for processing (7)

src/bun.js/bindings/webcore/BroadcastChannel.cpp
src/bun.js/bindings/webcore/BroadcastChannel.h
src/bun.js/bindings/webcore/MessageChannel.cpp
src/bun.js/bindings/webcore/MessagePort.cpp
src/bun.js/bindings/webcore/MessagePort.h
src/bun.js/bindings/webcore/TransferredMessagePort.h
test/js/web/workers/message-port-pipe.test.ts

…tangle, fold listener flag into state atomic - MessagePortPipe::drainAndDispatch now pops exactly one message under the lock, dispatches it, and re-posts itself if more remain. Messages never leave the inbox until the instant they're dispatched, so a port transferred between drain tasks carries its undelivered queue to the new owner instead of dropping it. DrainScheduled stays set across the self-reschedule and is cleared under the lock when the inbox empties. takeAll() is gone. - scheduleDrain clears DrainScheduled if postTaskTo fails (context already torn down) so a later attach() can reschedule. - MessagePort::disentangle calls removeAllEventListeners() before detaching so the event-loop ref taken by transferred-port listeners is released on the old context. - BroadcastChannel folds m_hasRelevantEventListener into the state atomic as a HasMessageListener bit; hasPendingActivity is now a single acquire load. contextDestroyed chains to the base. - BunBroadcastChannelRegistry::post rolls back the QueuedOne bump when postTaskTo returns false.

hasPendingActivity() runs on the concurrent GC thread and dereferences m_pipe; close()/disentangle() were nulling it on the mutator thread, opening a TOCTOU to a null-deref (or UAF if the mutator dropped the last ref between loads). Keep the Ref for the port's whole lifetime — close() just sets the pipe side's Closed bit, disentangle() detach()es and hands out a copyRef(). The pipe is ThreadSafeRefCounted so the extra ref is free, and hasPendingActivity() now only touches immutable pointers and atomic state.

When a port is transferred and the carrying message never reaches a new owner (destination closed, inbox discarded on close(), receiver detached before dispatch), the defaulted destructor only dropped the RefPtr — the orphaned side's Closed bit stayed 0, so the peer's hasPendingActivity() returned true via isOtherSideOpen() forever, pinning the peer's JS wrapper. Make TransferredMessagePort move-only with a destructor that closes its side when still owned. entangle() already releaseNonNull()s the pipe, so the happy path is a no-op. The destructor and move-assign are out-of-line in MessagePortPipe.cpp to avoid the header cycle MessagePortPipe.h → MessageWithMessagePorts.h → TransferredMessagePort.h. Fix the one Worker.cpp lambda that copy-captured MessageWithMessagePorts (and drop its dead 'ports' capture). Add a test: 200 peers of dropped-in-transit ports are now collected (finalized > 0); on main they are all pinned (finalized == 0).

…e worker subprocess tests behind debug/ASAN The verification harness's src/ revert doesn't round-trip file deletions cleanly: reverting to main brings the old files back, and restoring to the branch leaves them as staged adds instead of re-deleting them, so both builds try to compile the old .cpp against the new headers. Keeping the files as empty stubs means the harness only sees modifications, which it handles correctly. These can be removed outright once the harness limitation is lifted. Also gate the burst-ordering test behind isDebug||isASAN alongside the concurrent-churn test: both spawn worker subprocesses that hit a pre-existing pthread_exit→_Unwind_Find_FDE segfault in the locally- built stripped release binary in this container (reproduces on main's src/ too; does not reproduce in bun-profile or the baked release). The debug/ASAN lanes cover the correctness properties.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/js/web/workers/message-port-pipe.test.ts (1)

29-40: 🧹 Nitpick | 🔵 Trivial

Test name still says explicit start(), but this case relies on implicit start via onmessage.

Either call port2.start() here or rename the test so the behavior under test is unambiguous.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/js/web/workers/message-port-pipe.test.ts` around lines 29 - 40, The test
named "messages buffered before start() are delivered on start()" is ambiguous
because it relies on implicit starting via port2.onmessage instead of an
explicit start(); update the test to either call port2.start() before awaiting
the messages or rename the test string to reflect implicit start via onmessage
(e.g., "messages buffered before onmessage are delivered when handler set");
modify the test that uses MessageChannel and variables port1/port2 and the
onmessage handler accordingly so the behavior under test is unambiguous.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 206-223: The concurrent GC read in MessagePort::hasPendingActivity
currently reads plain bools (m_isDetached, m_hasMessageEventListener) which is
UB; change those fields to std::atomic<bool> (or otherwise source the state from
MessagePortPipe atomics) and replace the plain reads here with atomic loads
(e.g. load(memory_order_relaxed) or stronger if needed). Ensure all
writers/assignments to m_isDetached and m_hasMessageEventListener use atomic
stores, and update any setters or code paths that mutate these flags to use the
atomic type so the GC-thread access is safe; alternatively, move the logic to
derive these checks from existing atomic state in MessagePortPipe (e.g., use
MessagePortPipe::state/isOtherSideOpen) and remove plain-bool checks.

---

Duplicate comments:
In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-40: The test named "messages buffered before start() are
delivered on start()" is ambiguous because it relies on implicit starting via
port2.onmessage instead of an explicit start(); update the test to either call
port2.start() before awaiting the messages or rename the test string to reflect
implicit start via onmessage (e.g., "messages buffered before onmessage are
delivered when handler set"); modify the test that uses MessageChannel and
variables port1/port2 and the onmessage handler accordingly so the behavior
under test is unambiguous.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4c55f0e-e3c3-4e5a-8d5f-62796aee6c0c

📥 Commits

Reviewing files that changed from the base of the PR and between 560c5c0 and 12fe7e5.

📒 Files selected for processing (6)

src/bun.js/bindings/webcore/MessagePort.cpp
src/bun.js/bindings/webcore/MessagePort.h
src/bun.js/bindings/webcore/MessagePortPipe.cpp
src/bun.js/bindings/webcore/TransferredMessagePort.h
src/bun.js/bindings/webcore/Worker.cpp
test/js/web/workers/message-port-pipe.test.ts

…xt.h

…sagePort.h When the harness reverts src/ to origin/main for the without-fix build, MessagePortPipe.{h,cpp} survive as new untracked files and are still globbed into the unified build, but MessagePort.h and TransferredMessagePort.h revert to their identifier-based forms. The .cpp then references dispatchOneMessage() and the struct-typed TransferredMessagePort that don't exist on main. Add BUN_MESSAGEPORT_USES_PIPE to the new MessagePort.h and wrap MessagePortPipe.cpp's body in it, so the TU compiles to an empty object when paired with main's headers. With-fix builds are unchanged (the marker is defined).

Pre-existing copy-paste in jsFunctionPostMessage: when disentanglePorts() fails (duplicate/closed port in the worker's self.postMessage transfer list), the error branch released the exception from 'serialized' — which at that point holds a value, not an exception — instead of from 'disentangledPorts'. In debug that trips the Expected::error() assert; in release it propagates garbage and the real DataCloneError is lost. One-token fix adjacent to the lambda-capture change this PR already makes in the same function.

Jarred-Sumner

Do we have tests checking onjectTypeCounts goes down after closing MessagePort and Bun.gc(true)? What other adversarial tests can you think of?

…erage - objectTypeCounts: MessageChannel/MessagePort/BroadcastChannel wrapper counts rise under allocation and fall back to baseline after close() + Bun.gc(true). Same test verifies the hasPendingActivity contract: a port with a message listener is pinned while its peer is open and released once the peer closes; dropping the sender refs then brings MessagePort count back to baseline. - close() inside an onmessage handler stops further deliveries (only msgs 1,2 of 5 arrive). - Messages queued before/after a transfer follow the port to its new owner in order (exercises detach→buffer→attach flush). - Chained transfer: a port hops A→B then B→C→D and still delivers.

robobun

We didn't — added one in 64b361d. It's a two-part subprocess check:

Part 1 — allocate 200 MessageChannel (+touch both ports so wrappers exist) and 100 BroadcastChannel, verify heapStats().objectTypeCounts rises, then close() everything, drop refs, Bun.gc(true), and verify all three counts fall back to within a small slack of baseline.

Part 2 — the hasPendingActivity contract: create 100 channels, set port2.onmessage and drop the port2 JS ref, keep port1. After GC, MessagePort count stays at ~200 (100 held + 100 pinned via isOtherSideOpen). Close all port1s → count drops to ~100 (just the held senders). Drop senders[] → count goes to ~0.

Also added in the same commit:

close() inside an onmessage handler stops further deliveries (reentrancy)
Messages queued on a port before transfer follow it to the new owner (exercises the detach→buffer→attach path)
Chained transfer through two carrier pairs

Other adversarial cases I considered but didn't add because they're already covered elsewhere or would need dedicated infrastructure:

Worker terminates mid-transfer — the port dropped in transit test already covers the worst case of this (endpoint orphaned → peer released)
receiveMessageOnPort racing a drain task — covered implicitly by the FIFO test + test-worker-message-port-receive-message.js
Pipe chain cycles — blocked at the API level by the "port may not be posted through itself or its peer" check; can't construct one
BroadcastChannel fan-out while a subscriber's context is being torn down — worth adding but needs worker-termination timing control; could go in a follow-up alongside the existing broadcast-channel-worker-gc.test.ts
Concurrent GC during cross-thread burst — the debug build runs with BUN_GARBAGE_COLLECTOR_LEVEL=2 under the harness, so the burst test already stresses this

Happy to add any of those if you'd like them in this PR.

Jarred-Sumner · 2026-04-29T20:49:04Z

@robobun can you double check the draining behavior matches with node? I think they batch drain with a limit and don’t drain microtasks between individual messages? Also, if we can we really should batch the messages it’s a lot of allocations and RefCount churn

…essage Node's MessagePort::OnMessage (src/node_messaging.cc) handles a single uv_async wakeup by looping up to max(initial-queue-size, 1000) iterations: each iteration pops one message, calls MakeCallback to fire the event (whose InternalCallbackScope drains nextTick + microtasks on exit), and continues. setImmediate/setTimeout don't interleave within a batch; the cap prevents a fast sender from starving the loop. Match that: drainAndDispatch now loops within a single posted task, popping one message under the lock, dispatching, then calling globalObject->drainMicrotasks() before the next iteration. The Attached flag is re-checked each time so a handler that closes/transfers the port leaves the remaining inbox buffered for the new owner. If the cap is hit, one scheduleDrain() reschedules the remainder. This replaces one postTaskTo lambda + Ref<Pipe> + task execution per message with one per batch. Microtask ordering (m0,u0,m1,u1,...) and nextTick ordering (m0,n0,m1,n1,...) match Node v24; setImmediate does not interleave (also matches Node).

The ctxId check added in b3aff0a catches cross-context transfers, but a handler can synchronously detach its own port (transfer through a local carrier) and re-wrap it via receiveMessageOnPort on the same context. That restores the same ctxId with a different MessagePort wrapper, so the drain loop kept dispatching to the stale m_isDetached port and silently dropped the remaining inbox. Compare s.port identity under the lock each iteration; if it no longer matches the captured port, stop — the new owner's attach() has scheduled its own drain and will deliver the rest.

Jarred-Sumner · 2026-04-29T23:38:30Z

Benchmark: PR #29937 vs `main` (canary `f97aa654c`)

linux-x64, Xeon Platinum 8375C @ 2.90GHz. PR build ec5045393 via bunx bun-pr 29937.

TL;DR — faster. Every end-to-end scenario is faster on this branch. The one per-call regression is BroadcastChannel.postMessage send-side latency (1→1 same-thread), but it's outweighed by the dispatch-side win so end-to-end BroadcastChannel is still net faster.

hyperfine (end-to-end wall clock)

Scenario	`main`	this PR	Δ
`MessageChannel` same-thread, 200k `postMessage`	258.8 ms ± 1.1	201.1 ms ± 1.5	1.29× faster
`MessagePort` cross-thread (worker `parentPort`), 100k round trips	197.2 ms ± 2.8	186.8 ms ± 2.4	1.06× faster
`BroadcastChannel` same-thread, 50k msgs × 4 subscribers	270.8 ms ± 2.6	235.4 ms ± 1.2	1.15× faster
`BroadcastChannel` cross-thread, 50k msgs × 4 worker subscribers	379.3 ms ± 8.6	288.9 ms ± 3.7	1.31× faster

The cross-thread BroadcastChannel win is the biggest practical one — user time drops 783ms→558ms and sys 896ms→446ms, consistent with dropping the MainThreadBridge bounce.

mitata (per-call latency)

Op	`main`	this PR	Δ
`new MessageChannel()`	1.63 µs	629 ns	2.6× faster
`port.postMessage(int)` same-thread	195 ns	175 ns	1.12× faster
`port.postMessage({a,b,c})` same-thread	897 ns	879 ns	~same (clone-dominated)
`worker.postMessage(string)` (repo's `bench/postMessage/postMessage-string.mjs`)	730 ns	440 ns	1.66× faster
`bc.postMessage(int)` 1→1 same-thread (send only)	415 ns	561 ns	1.35× slower ⚠️

The bc.postMessage send-side bump is presumably the per-subscriber clone+task-post now happening synchronously inside postMessage instead of being deferred through the bridge. End-to-end it still wins (row 3 above), so the work just moved earlier rather than grew.

Crash note

I originally wrote the cross-thread MessagePort benchmark as new MessageChannel() → transfer port2 to a Worker → 100k round trips. On main that segfaults during worker startup (Segmentation fault at address 0x8, bun.report/1.3.14/la2f97aa65g…) or hangs; on this PR it runs clean. I switched the comparison to use the worker's built-in parentPort so main could survive the baseline run.

Benchmark sources

// mc-same-thread.mjs — 200k same-thread port1→port2
const N = 200000;
const { port1, port2 } = new MessageChannel();
let r = 0;
port2.onmessage = () => { if (++r === N) process.exit(0); };
port1.start(); port2.start();
for (let i = 0; i < N; i++) port1.postMessage(i);

// mc-cross-thread.mjs — 100k round trips over worker parentPort
import { Worker, isMainThread, parentPort } from "node:worker_threads";
const N = 100000;
if (isMainThread) {
  const w = new Worker(new URL(import.meta.url));
  let e = 0;
  w.on("message", () => { if (++e === N) w.terminate().then(() => process.exit(0)); });
  await new Promise(r => w.once("online", r));
  for (let i = 0; i < N; i++) w.postMessage(i);
} else {
  parentPort.on("message", d => parentPort.postMessage(d));
}

// bc-same-thread.mjs — 50k msgs × 4 subscribers
const N = 50000, SUBS = 4, target = N * SUBS;
let r = 0;
const subs = Array.from({ length: SUBS }, () => {
  const c = new BroadcastChannel("bench");
  c.onmessage = () => { if (++r === target) process.exit(0); };
  return c;
});
const sender = new BroadcastChannel("bench");
for (let i = 0; i < N; i++) sender.postMessage(i);

// bc-cross-thread.mjs — 50k msgs × 4 worker subscribers
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";
const N = 50000, SUBS = 4;
if (isMainThread) {
  const counter = new Int32Array(new SharedArrayBuffer(4));
  const ws = Array.from({ length: SUBS }, () => new Worker(new URL(import.meta.url), { workerData: { counter } }));
  await Promise.all(ws.map(w => new Promise(r => w.once("message", r))));
  const s = new BroadcastChannel("bench");
  for (let i = 0; i < N; i++) s.postMessage(i);
  while (Atomics.load(counter, 0) < N * SUBS) await new Promise(r => setTimeout(r, 0));
  await Promise.all(ws.map(w => w.terminate()));
  process.exit(0);
} else {
  const c = new BroadcastChannel("bench");
  c.onmessage = () => Atomics.add(workerData.counter, 0, 1);
  parentPort.postMessage("ready");
}

Jarred-Sumner · 2026-04-29T23:40:34Z

Peak RSS

Same scenarios as above, /usr/bin/time -f %M, median of 5 runs (cross-thread rows re-verified with 10).

Scenario	`main`	this PR	Δ
`MessageChannel` same-thread, 200k msgs	155.0 MB	135.4 MB	−19.5 MB (−13%)
`MessagePort` cross-thread, 100k round trips	146.4 MB	147.6 MB	+1.2 MB (noise — ranges overlap)
`BroadcastChannel` same-thread, 50k × 4 subs	114.2 MB	109.1 MB	−5.2 MB (−5%)
`BroadcastChannel` cross-thread, 50k × 4 workers	124.3 MB	75.6 MB	−48.7 MB (−39%)

The cross-thread BroadcastChannel drop is consistent across 10 runs (main 122–128 MB, PR 74–77 MB) — presumably the old MainThreadBridge was buffering 50k × 4 message copies on the main thread before fan-out, and the new path posts directly to each subscriber's context so they drain concurrently instead of piling up.

robobun · 2026-04-29T23:44:52Z

Build #49252 Windows failures are infra, not this PR:

:windows: 2019 x64 — exit −1, agent robobun-windows-1 died 3 min in at test 20/500 (log cuts off mid-line). Same pattern on builds 49246–49253.
:windows: 2019 x64-baseline — only failure is test/cli/install/bun-create.test.ts hitting GitHub API 403 rate limit. Same test is failing on builds 49246, 49247, 49250, 49251, 49253, 49254 across unrelated branches.

Deflake for the rate-limit case: #29956.

Retry the two jobs from the Buildkite UI — the BUILDKITE_TOKEN here is read-only so I can't trigger it.

Resolve weak→strong under the lock directly into an inline-capacity Vector<pair<ctxId, Ref<BroadcastChannel>>, 4> instead of copying the whole Subscriber vector (heap alloc + 2N ThreadSafeWeakPtr atomics) and resolving later. For ≤4 targets this is zero heap allocations and one CAS per target. Measured on local release build, 1→1 same-thread send-only: before: ~691 ns/op after: ~631 ns/op (≈9% faster) The remaining gap to main's 415ns is the fan-out now running inside postMessage rather than deferred through the old MainThreadBridge hop; end-to-end BroadcastChannel is still net faster.

robobun · 2026-04-30T00:05:21Z

Nice — thanks for running these.

On the bc.postMessage send-side ⚠️: the previous snapshot path copied the whole Vector<Subscriber> (heap alloc + 2N ThreadSafeWeakPtr copy/dtor atomics) before resolving weak→strong. 2110d58 collects straight into a Vector<pair<ctxId, Ref<BroadcastChannel>>, 4> under the lock — inline storage for ≤4 targets, one CAS per target, no weak-copy round-trip.

Local release, same 1→1 send-only shape:

before: ~691 ns/op
after:  ~631 ns/op   (≈9%)

(Slower CPU here than your Xeon, so absolute ns differ; should scale proportionally.)

The remaining gap vs main is the fan-out itself (lock + HashMap::find + postTaskTo) now running inside postMessage instead of behind the ensureOnMainThread hop. As you said — moved, not grown. Deferring it again would put the cross-thread 1.31× win back on the table, so I'd rather keep it synchronous.

Re the crash note: the MessageChannel → transfer port2 → round-trip case that segfaults/hangs main is exactly what the old registry-lookup-and-bounce design made racy; glad to see it run clean here.

worker.postMessage / parentPort.postMessage previously posted one task per message through ScriptExecutionContext::postTaskTo (global contexts lock + HashMap lookup + lambda alloc + concurrent enqueue). A round trip of N messages was ~2N contended lock acquisitions. Give Worker two MessageInbox structs (parent→worker, worker→parent), each a lock + Deque + drainScheduled flag. enqueue appends under the lock and posts one drain task if none is in flight; the drain task swaps the whole queue out under the lock and dispatches from the local copy with a microtask checkpoint between each, matching Node's MessagePort::OnMessage semantics. Cap is max(initial size, 1000) before yielding and rescheduling. Swapping rather than per-message popping keeps the sender and drainer from contending on the inbox lock under sustained bursts. Pre-online buffering is preserved: enqueueToWorker skips scheduling while !Online; fireEarlyMessages kicks the first drain once the worker is up. Local release, 100k parentPort round-trips: before: ~240 ms after: ~176 ms (1.36× faster) bench/postMessage/postMessage-string.mjs send-side: ~440 ns/iter (main) → ~217 ns/iter

robobun · 2026-04-30T01:16:41Z

Worker postMessage round-trip — e4b77d7: applied the same coalescing pattern to worker.postMessage / parentPort.postMessage. Each direction gets a MessageInbox (lock + Deque<MessageWithMessagePorts> + drainScheduled flag); a burst of N posts schedules one cross-thread drain task that swaps the queue out under the lock and loops with drainMicrotasks() between each dispatch, same max(size, 1000) budget as MessagePortPipe.

The swap-then-dispatch (vs per-message pop) keeps the sender and drainer off each other's lock under sustained bursts — important because the repo's own bench/postMessage/postMessage-string.mjs floods at ~200ns/send.

Local release (slower box than yours):

	`2110d58`	`e4b77d7`	Δ
100k `parentPort` round-trips	~240 ms	~176 ms	1.36×
`postMessage(11 char string)` send-only	—	~217 ns	(vs your 440 ns on main)

Observable ordering unchanged (m0,µ0,m1,µ1,... both directions — added a test for it). Pre-online buffering goes through the inbox too; fireEarlyMessages kicks the first drain.

All 258 tests in test/js/web/workers/ + broadcastchannel/ pass on release; all node test-worker-* parallels pass.

On build #49268's failure: it's the Zig compiler itself panicking (reached unreachable code in Sema.handleExternLibName) compiling process_windows_translate_c on 🪟 x64-baseline. Same panic on builds 49261 (jarred/filepoll-bidirectional) and 49263; the step passed on my previous commit with identical Zig source and on 15/16 other build-zig targets. Not mine.

Resolves conflicts with #29937 (MessagePortPipe) which added batched MessageInbox queues to Worker. Adapts the new code to this PR's lifecycle: - enqueueToWorker: '!(m_onlineClosingFlags & OnlineFlag)' → 'm_state != Running' - enqueueToParent / drainToParent: route through postTaskToParent() (stable m_parentContextId) instead of reading scriptExecutionContext() from the worker thread - drop networkStateChanged and flag constants (already removed here) - keep m_toWorker/m_toParent inboxes and drain machinery as-is

…9895) ## What Fixes a stack overflow in `MessagePortPipe::close()` when a deep chain of nested transferred MessagePorts is closed, and adds regression coverage for the transferred-port-dropped leak that #29937's RAII `TransferredMessagePort` destructor now handles. ## Repro ```js const DEPTH = 20_000; let head = new MessageChannel(); const tail = head; for (let i = 0; i < DEPTH; i++) { const next = new MessageChannel(); head.port2.postMessage(null, [next.port1]); // next.port1 lands in head.port1's inbox head.port2.close(); head = next; } tail.port1.close(); // SIGSEGV (stack overflow) ``` ## Root cause `TransferredMessagePort` is move-only and its destructor calls `pipe->close(side)` so an orphaned in-transit port gets cleaned up. `MessagePortPipe::close()` swaps the side's inbox into a local `Deque` and lets it destruct outside the lock. If those dropped messages carry transferred ports, the `Deque` destructor runs `~TransferredMessagePort` → `pipe->close()` → another `Deque` destructor → … one native stack frame per link. ~20k links overflows an 8 MB stack. ## Fix `MessagePortPipe::close()` owns a stack-local `Vector<pair<RefPtr<MessagePortPipe>, uint8_t>>` worklist seeded with `{this, side}`. For each entry it closes the side under the lock, swaps out the inbox, then harvests transferred pipes from the dropped messages into the worklist (nulling `tp.pipe` so the subsequent `~TransferredMessagePort` is a no-op) before letting the batch destruct. O(1) stack regardless of chain depth; no static / thread_local / instance state. ## Verification `test/js/web/workers/message-port-closed-leak.test.ts`: | test | main (no fix) | this PR | |--|--|--| | transferred port dropped before receiver closed does not leak | pass (#29937) | pass | | transferred port dropped after receiver closed does not leak | pass (#29937) | pass | | **deep chain of nested transferred ports does not overflow on close** | **SIGSEGV, exit 139** | pass | `message-channel.test.ts` (12 tests) passes unchanged. ## History This PR originally targeted the self-ref leak in the old `MessagePortChannel` / `MessagePortChannelRegistry` stack (`disentanglePort()` parked a self-ref in `m_pendingMessagePortTransfers[i]` that was never released if the carrying message was dropped). #29937 replaced that whole stack with `MessagePortPipe` and fixed the leak via RAII — but the destructor-driven cascade it introduced recurses unboundedly. The first two tests are kept as regression guards for the leak; the deep-chain test is the fail-before for this fix. --------- Co-authored-by: robobun <robobun@users.noreply.github.com>

…in post() BunBroadcastChannelRegistry::post() resolved each subscriber's ThreadSafeWeakPtr to a Ref<BroadcastChannel> for the fan-out. If a worker subscriber tore down concurrently — its JS wrapper GC'd and its queued task dropped, both on the worker thread — the posting thread's local Ref became the last owner, and ~BroadcastChannel → ~EventTarget → EventListenerMap::clear() ran on the posting thread, tripping releaseAssertOrSetThreadUID (the map was populated on the worker thread). Seen in CI as a SIGABRT in broadcast-channel-worker-gc.test.ts 'repeated worker create/terminate stress' on musl x64-baseline, and on other branches post-#29937. Carry ThreadSafeWeakPtr through the fan-out and resolve to a strong ref only inside the posted task, on the target thread, so any last deref happens there. This removes the need for the QueuedOne count (which existed to keep the channel GC-rooted between post and dispatch); a channel with no message listener is now collectable before delivery, in which case the task resolves the weak ptr to null and no-ops — equivalent to dispatchEvent with no listener.

- web_worker.zig parent field: parent_poll_ref is now conditional on !default_unref, so the 'lives at least as long as this struct via parent_poll_ref' claim no longer holds unconditionally. Rewrite to describe the {ref:false} / .unref() case and point at the file header's detached-not-joined gap. - Worker.cpp postTaskToWorkerGlobalScope Closing case: postMessage() now goes through enqueueToWorker() (since #29937), not this function; only getHeapSnapshot() calls it.

`test/js/node/worker_threads/worker_threads.test.ts` occasionally segfaults in CI with ``` panic: Segmentation fault at address 0x10 ``` on a GC helper thread: ``` wtfThreadEntryPoint AutomaticThread::start ParallelHelperPool::Thread::work Heap::runBeginPhase(GCConductor)::$_1 SlotVisitor::drainFromShared MarkingConstraintSolver::runExecutionThread MarkingConstraint::execute ← "Sh" Strong Handles HandleSet::visitStrongHandles *(nullptr + offsetof(HandleNode, m_value)) = *(0x10) ``` (decoded from the `bun.report` trace on [build 50529 / 🐧 13 x64](https://buildkite.com/bun/bun/builds/50529#019dec3e-4d11-4651-b908-84e601bc2db3) and symbolized against that build's `bun-profile`). ## Cause `jsWorkerPrototypeFunction_getHeapSnapshotBody` does: ```cpp Strong<JSPromise> strong(vm, promise); // parent VM's HandleSet worker.postTaskToWorkerGlobalScope([strong, parentId](auto& workerCtx) { ... ScriptExecutionContext::postTaskTo(parentId, [strong, snapshot = ...](auto& parentCtx) { ... }); // runs on worker thread }); ``` `JSC::Strong<T>` has **no move constructor**. Capturing it by value copy-constructs it, which calls `HandleSet::allocate()` + `m_strongList.push()`; destroying it calls `HandleSet::deallocate()` + `NodeList::remove()`. Both happen on the **worker thread** against the **parent VM's** `HandleSet`, without the parent VM's lock. `HandleSet::m_strongList` is a `SentinelLinkedList<HandleNode>` — not thread-safe. `push`/`remove` transiently null `m_next`/`m_prev`. The parent VM's "Sh" (Strong Handles) marking constraint (`Heap::addCoreConstraints`) iterates that list during GC; when it follows a null `m_next` it reads `*((HandleNode*)nullptr)->slot()` → `*(0x0 + 0x10)`. The `heapHelperPool()` is process-global, so the crashing helper thread belongs to the parent VM's collector even though the worker VM's `BunV8HeapSnapshotBuilder` full GC is in progress at the same time. This has been there since `getHeapSnapshot` was added — the recent worker lifetime rewrites (#29957, #29937) didn't introduce it. ## Fix Heap-allocate the `Strong<JSPromise>` once on the parent thread and pass only the raw pointer through the cross-thread lambdas. The worker thread never dereferences it, so it never touches the parent VM's `HandleSet`. The parent-side completion lambda resolves the promise and frees the handle. `Worker::postTaskToWorkerGlobalScope` now returns `bool` so a lost race to `Closing`/`Closed` (worker exited between `isOnline()` and the post) rejects with `ERR_WORKER_NOT_RUNNING` instead of silently leaking the handle. If `postTaskTo(parentId, …)` on the return trip fails (parent context gone), the handle intentionally leaks — deleting a parent-VM `Strong` from the worker thread is exactly the bug we're fixing, and the parent VM is tearing down anyway. ## Verification Stress fixture (`heap-snapshot-gc-race-fixture.js`, 300 iterations of `await worker.getHeapSnapshot(); Bun.gc(true)`), 40 runs each on linux-x64 release: | build | segfault at `0x10` | | --- | --- | | `52bdf47` (CI artifact, no fix) | 15 / 40 | | this branch | 0 / 40 | The new `worker_heap_snapshot_gc.test.ts` runs the fixture — 300 iters in release, 5 in debug (a single debug heap snapshot takes ~1.6s so the race window, which is a handful of instructions after each snapshot, is impractical to hit there; the debug pass is a functional check). ## Drive-by: non-LTO strip leaves orphan `PT_GNU_EH_FRAME` While reproducing I hit a second, unrelated crash in locally-built (non-LTO) release binaries: `stripFlags` removed `.eh_frame_hdr` on linux-gnu unconditionally, but the linker only passes `--no-eh-frame-hdr` when LTO is on. GNU strip doesn't rewrite the program header table, so the `PT_GNU_EH_FRAME` phdr was left pointing at unmapped memory and any stack unwind (e.g. WTF::Thread teardown after a worker exits) faulted. CI release builds always have LTO on so they weren't affected. Gated the section removal on `c.lto` to match the linker flag. --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

…primitive (oven-sh#29937) ## What Replaces the WebKit-derived `MessagePortChannelProvider` / `MessagePortChannelProviderImpl` / `MessagePortChannelRegistry` / `MessagePortChannel` stack — plus the `BroadcastChannel::MainThreadBridge` / per-context `BunBroadcastChannelRegistry` — with a small, thread-safe concurrency primitive (`MessagePortPipe`) that the Web-facing classes wrap thinly. Net −1082 lines (+737 / −1819), 10 files deleted. ## Why The existing stack is Safari's multi-*process* design carried over verbatim: an abstract provider so UIProcess can swap in an IPC impl, a process-global `MessagePortIdentifier → MessagePortChannel` HashMap, `ProcessIdentifier` tracking, self-`RefPtr` "protector" members to keep channels alive across IPC hops, and a main-thread serialization point for `BroadcastChannel`. None of that indirection is needed in Bun's single-process multi-thread model — and worse, the `MessagePortChannelRegistry::m_openChannels` HashMap and `MessagePortChannel::m_pendingMessages` Vector are mutated from worker threads with **no lock** (the WebKit `ASSERT(isMainThread())` guards were just commented out). Concurrent `new MessageChannel()` across workers corrupts the HashMap: ``` HashTable.h:1574:17: runtime error: member access within null pointer of type 'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier, WeakRef<MessagePortChannel>>, ...>' ``` This is the cause of oven-sh#16186, oven-sh#22471, oven-sh#25805 and likely oven-sh#29458. ## Design **`MessagePortPipe`** — `ThreadSafeRefCounted`, two sides. Each side has: - a `Lock` + `Deque<MessageWithMessagePorts>` inbox - one `std::atomic<uint64_t>` state word: `Closed | DrainScheduled | Attached` in the low byte, queued-count in the high bits `send()` locks the destination side, enqueues, and if `Attached && !DrainScheduled` flips `DrainScheduled` and posts one task to the receiver's `ScriptExecutionContext`. A burst of N sends schedules at most one cross-thread wakeup. `takeAll()` clears `DrainScheduled` before swapping the deque, so a racing send reschedules — at most one extra no-op drain, never a stranded message. `attach()`/`detach()` handle transfer: a detached side buffers; attach flushes the backlog with one wakeup. `hasPendingActivity` is two atomic loads (my queued count + peer's `Closed`), safe from the GC thread with no lock. **`MessagePort`** — holds `RefPtr<MessagePortPipe>` + `uint8_t side`. `postMessage` → `pipe->send`. `start()` → `pipe->attach`. `close()` → `pipe->close`. Transfer → `pipe->detach`, hand `{pipe, side}` through `TransferredMessagePort`, re-create on the other side. No global maps, no identifiers, no per-context port iteration. The drain task takes the whole inbox in one go (sender-side batching), then posts each message as its own local task so microtasks checkpoint between deliveries — matching the HTML "port message queue is a task source" semantics and Node's observable ordering. **`BroadcastChannel`** — one process-global `Lock` + `HashMap<String, Vector<{ctxId, weak channel}>>`. `postMessage` snapshots subscribers under the lock, releases, then posts one task per `(message, subscriber)` directly to each subscriber's context — no bounce through the main thread, no `MainThreadBridge`, no second `allBroadcastChannels` map. Per-channel inbox batching was prototyped and reverted: the spec's same-event-loop `(message-major, creation-minor)` delivery-order test (WPT `broadcast-channel.test.ts::"messages are delivered in port creation order"`) fails if subscribers drain channel-major. A per-`(context, name)` inbox could restore batching without breaking order — left as future work. ## Deleted `MessagePortChannel.{h,cpp}`, `MessagePortChannelProvider.{h,cpp}`, `MessagePortChannelProviderImpl.{h,cpp}`, `MessagePortChannelRegistry.{h,cpp}`, `MessagePortIdentifier.h`, `BroadcastChannelRegistry.h`, the `allMessagePorts()` / `portToContextIdentifier()` / `allBroadcastChannels()` / `channelToContextIdentifier()` global maps, `ScriptExecutionContext::{m_messagePorts, processMessageWithMessagePortsSoon, dispatchMessagePortEvents, createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}`, and `WorkerGlobalScope::messagePortChannelProvider()`. ## Verification - `test/js/web/workers/message-channel.test.ts` — 12/12 - `test/js/web/broadcastchannel/broadcast-channel.test.ts` — 11/11 - `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts` — 3/3 - `test/js/web/workers/` suite — 228 pass / 6 fail; all 6 fail identically on `main` (container-slowness timeouts), zero new failures - `test/js/node/worker_threads/worker_threads.test.ts` — 25 pass / 2 fail; both fail identically on `main` - 10 node `parallel/test-worker-message-port-*.js` + `test-broadcastchannel-*.js` — all pass - New `test/js/web/workers/message-port-pipe.test.ts`: - `concurrent MessageChannel creation across workers is race-free` — **fails 3/3 on main** with the HashTable UBSan null-deref shown above; **passes 5/5** on this branch - microtask ordering, buffered-before-start, FIFO `receiveMessageOnPort`, 1000-message cross-thread burst ## Relationship to other PRs oven-sh#29832 / oven-sh#29442 patch the race by adding a lock to the existing registry. This PR removes the registry instead — there's nothing left to race on. Either approach fixes the crash; this one also sheds the layering. Fixes oven-sh#16186 Fixes oven-sh#22471 Fixes oven-sh#25805 --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

…en-sh#29895) ## What Fixes a stack overflow in `MessagePortPipe::close()` when a deep chain of nested transferred MessagePorts is closed, and adds regression coverage for the transferred-port-dropped leak that oven-sh#29937's RAII `TransferredMessagePort` destructor now handles. ## Repro ```js const DEPTH = 20_000; let head = new MessageChannel(); const tail = head; for (let i = 0; i < DEPTH; i++) { const next = new MessageChannel(); head.port2.postMessage(null, [next.port1]); // next.port1 lands in head.port1's inbox head.port2.close(); head = next; } tail.port1.close(); // SIGSEGV (stack overflow) ``` ## Root cause `TransferredMessagePort` is move-only and its destructor calls `pipe->close(side)` so an orphaned in-transit port gets cleaned up. `MessagePortPipe::close()` swaps the side's inbox into a local `Deque` and lets it destruct outside the lock. If those dropped messages carry transferred ports, the `Deque` destructor runs `~TransferredMessagePort` → `pipe->close()` → another `Deque` destructor → … one native stack frame per link. ~20k links overflows an 8 MB stack. ## Fix `MessagePortPipe::close()` owns a stack-local `Vector<pair<RefPtr<MessagePortPipe>, uint8_t>>` worklist seeded with `{this, side}`. For each entry it closes the side under the lock, swaps out the inbox, then harvests transferred pipes from the dropped messages into the worklist (nulling `tp.pipe` so the subsequent `~TransferredMessagePort` is a no-op) before letting the batch destruct. O(1) stack regardless of chain depth; no static / thread_local / instance state. ## Verification `test/js/web/workers/message-port-closed-leak.test.ts`: | test | main (no fix) | this PR | |--|--|--| | transferred port dropped before receiver closed does not leak | pass (oven-sh#29937) | pass | | transferred port dropped after receiver closed does not leak | pass (oven-sh#29937) | pass | | **deep chain of nested transferred ports does not overflow on close** | **SIGSEGV, exit 139** | pass | `message-channel.test.ts` (12 tests) passes unchanged. ## History This PR originally targeted the self-ref leak in the old `MessagePortChannel` / `MessagePortChannelRegistry` stack (`disentanglePort()` parked a self-ref in `m_pendingMessagePortTransfers[i]` that was never released if the carrying message was dropped). oven-sh#29937 replaced that whole stack with `MessagePortPipe` and fixed the leak via RAII — but the destructor-driven cascade it introduced recurses unboundedly. The first two tests are kept as regression guards for the leak; the deep-chain test is the fail-before for this fix. --------- Co-authored-by: robobun <robobun@users.noreply.github.com>

…0185) `test/js/node/worker_threads/worker_threads.test.ts` occasionally segfaults in CI with ``` panic: Segmentation fault at address 0x10 ``` on a GC helper thread: ``` wtfThreadEntryPoint AutomaticThread::start ParallelHelperPool::Thread::work Heap::runBeginPhase(GCConductor)::$_1 SlotVisitor::drainFromShared MarkingConstraintSolver::runExecutionThread MarkingConstraint::execute ← "Sh" Strong Handles HandleSet::visitStrongHandles *(nullptr + offsetof(HandleNode, m_value)) = *(0x10) ``` (decoded from the `bun.report` trace on [build 50529 / 🐧 13 x64](https://buildkite.com/bun/bun/builds/50529#019dec3e-4d11-4651-b908-84e601bc2db3) and symbolized against that build's `bun-profile`). ## Cause `jsWorkerPrototypeFunction_getHeapSnapshotBody` does: ```cpp Strong<JSPromise> strong(vm, promise); // parent VM's HandleSet worker.postTaskToWorkerGlobalScope([strong, parentId](auto& workerCtx) { ... ScriptExecutionContext::postTaskTo(parentId, [strong, snapshot = ...](auto& parentCtx) { ... }); // runs on worker thread }); ``` `JSC::Strong<T>` has **no move constructor**. Capturing it by value copy-constructs it, which calls `HandleSet::allocate()` + `m_strongList.push()`; destroying it calls `HandleSet::deallocate()` + `NodeList::remove()`. Both happen on the **worker thread** against the **parent VM's** `HandleSet`, without the parent VM's lock. `HandleSet::m_strongList` is a `SentinelLinkedList<HandleNode>` — not thread-safe. `push`/`remove` transiently null `m_next`/`m_prev`. The parent VM's "Sh" (Strong Handles) marking constraint (`Heap::addCoreConstraints`) iterates that list during GC; when it follows a null `m_next` it reads `*((HandleNode*)nullptr)->slot()` → `*(0x0 + 0x10)`. The `heapHelperPool()` is process-global, so the crashing helper thread belongs to the parent VM's collector even though the worker VM's `BunV8HeapSnapshotBuilder` full GC is in progress at the same time. This has been there since `getHeapSnapshot` was added — the recent worker lifetime rewrites (oven-sh#29957, oven-sh#29937) didn't introduce it. ## Fix Heap-allocate the `Strong<JSPromise>` once on the parent thread and pass only the raw pointer through the cross-thread lambdas. The worker thread never dereferences it, so it never touches the parent VM's `HandleSet`. The parent-side completion lambda resolves the promise and frees the handle. `Worker::postTaskToWorkerGlobalScope` now returns `bool` so a lost race to `Closing`/`Closed` (worker exited between `isOnline()` and the post) rejects with `ERR_WORKER_NOT_RUNNING` instead of silently leaking the handle. If `postTaskTo(parentId, …)` on the return trip fails (parent context gone), the handle intentionally leaks — deleting a parent-VM `Strong` from the worker thread is exactly the bug we're fixing, and the parent VM is tearing down anyway. ## Verification Stress fixture (`heap-snapshot-gc-race-fixture.js`, 300 iterations of `await worker.getHeapSnapshot(); Bun.gc(true)`), 40 runs each on linux-x64 release: | build | segfault at `0x10` | | --- | --- | | `52bdf47` (CI artifact, no fix) | 15 / 40 | | this branch | 0 / 40 | The new `worker_heap_snapshot_gc.test.ts` runs the fixture — 300 iters in release, 5 in debug (a single debug heap snapshot takes ~1.6s so the race window, which is a handful of instructions after each snapshot, is impractical to hit there; the debug pass is a functional check). ## Drive-by: non-LTO strip leaves orphan `PT_GNU_EH_FRAME` While reproducing I hit a second, unrelated crash in locally-built (non-LTO) release binaries: `stripFlags` removed `.eh_frame_hdr` on linux-gnu unconditionally, but the linker only passes `--no-eh-frame-hdr` when LTO is on. GNU strip doesn't rewrite the program header table, so the `PT_GNU_EH_FRAME` phdr was left pointing at unmapped memory and any stack unwind (e.g. WTF::Thread teardown after a worker exits) faulted. CI release builds always have LTO on so they weren't affected. Gated the section removal on `c.lto` to match the linker flag. --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

github-actions Bot added the claude label Apr 29, 2026

Jarred-Sumner marked this pull request as ready for review April 29, 2026 12:43

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/MessagePort.cpp

Comment thread test/js/web/workers/message-port-pipe.test.ts Outdated

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

robobun added 2 commits April 29, 2026 12:59

test: gate concurrent MessageChannel race test behind isDebug/isASAN

31ee5ec

The race being exercised (unlocked HashMap in MessagePortChannelRegistry) is a memory-safety bug that surfaces deterministically only under sanitizers. Release lanes skip it.

Restore Apple BSD-2 headers on WebKit-derived files; drop explicit te…

97d5fba

…st timeouts

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread test/js/web/workers/message-port-pipe.test.ts

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/MessagePort.cpp

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/TransferredMessagePort.h Outdated

Comment thread test/js/web/workers/message-port-pipe.test.ts

robobun added 2 commits April 29, 2026 13:17

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/TransferredMessagePort.h

robobun added 2 commits April 29, 2026 13:49

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/MessagePort.cpp

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/ScriptExecutionContext.h Outdated

robobun added 2 commits April 29, 2026 15:19

Remove dead MessagePort forward declaration from ScriptExecutionConte…

b5ded0f

…xt.h

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/bindings/webcore/Worker.cpp Outdated

Jarred-Sumner reviewed Apr 29, 2026

View reviewed changes

robobun commented Apr 29, 2026

View reviewed changes

[autofix.ci] apply automated fixes

778c684

github-actions Bot mentioned this pull request Apr 30, 2026

Guard MessagePortChannel state with a per-channel Lock #29991

Closed

Jarred-Sumner merged commit 6bcdf83 into main Apr 30, 2026
74 of 76 checks passed

Jarred-Sumner deleted the farm/0b8cdb6f/messageport-pipe branch April 30, 2026 20:57

Jarred-Sumner mentioned this pull request Apr 30, 2026

Guard MessagePortChannelRegistry::m_openChannels with a Lock #29972

Closed

Jarred-Sumner mentioned this pull request May 1, 2026

BroadcastChannel: don't hold cross-thread strong refs in registry post() #30041

Open

robobun mentioned this pull request May 3, 2026

worker: fix cross-thread HandleSet race in getHeapSnapshot #30185

Merged

This was referenced Jun 18, 2026

panic: Segmentation fault at address 0x40 #29458

Closed

MessagePort: serialize MessagePortChannelRegistry with a Lock (fix worker_threads data race) #29442

Closed

fix(worker): add thread safety to MessageChannel to avoid crashes #25806

Closed

Conversation

robobun commented Apr 29, 2026

What

Why

Design

Deleted

Verification

Relationship to other PRs

Uh oh!

robobun commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jarred-Sumner left a comment

Choose a reason for hiding this comment

Uh oh!

robobun left a comment

Choose a reason for hiding this comment

Uh oh!

Jarred-Sumner commented Apr 29, 2026

Uh oh!

Jarred-Sumner commented Apr 29, 2026

Benchmark: PR #29937 vs main (canary f97aa654c)

hyperfine (end-to-end wall clock)

mitata (per-call latency)

Crash note

Uh oh!

Jarred-Sumner commented Apr 29, 2026

Peak RSS

Uh oh!

robobun commented Apr 29, 2026

Uh oh!

robobun commented Apr 30, 2026

Uh oh!

robobun commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

robobun commented Apr 29, 2026 •

edited

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Benchmark: PR #29937 vs `main` (canary `f97aa654c`)