Skip to content

Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive#29937

Merged
Jarred-Sumner merged 17 commits into
mainfrom
farm/0b8cdb6f/messageport-pipe
Apr 30, 2026
Merged

Replace MessagePort/BroadcastChannel registries with MessagePortPipe primitive#29937
Jarred-Sumner merged 17 commits into
mainfrom
farm/0b8cdb6f/messageport-pipe

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

What

Replaces the WebKit-derived MessagePortChannelProvider / MessagePortChannelProviderImpl / MessagePortChannelRegistry / MessagePortChannel stack — plus the BroadcastChannel::MainThreadBridge / per-context BunBroadcastChannelRegistry — with a small, thread-safe concurrency primitive (MessagePortPipe) that the Web-facing classes wrap thinly.

Net −1082 lines (+737 / −1819), 10 files deleted.

Why

The existing stack is Safari's multi-process design carried over verbatim: an abstract provider so UIProcess can swap in an IPC impl, a process-global MessagePortIdentifier → MessagePortChannel HashMap, ProcessIdentifier tracking, self-RefPtr "protector" members to keep channels alive across IPC hops, and a main-thread serialization point for BroadcastChannel. None of that indirection is needed in Bun's single-process multi-thread model — and worse, the MessagePortChannelRegistry::m_openChannels HashMap and MessagePortChannel::m_pendingMessages Vector are mutated from worker threads with no lock (the WebKit ASSERT(isMainThread()) guards were just commented out). Concurrent new MessageChannel() across workers corrupts the HashMap:

HashTable.h:1574:17: runtime error: member access within null pointer of type
  'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier,
   WeakRef<MessagePortChannel>>, ...>'

This is the cause of #16186, #22471, #25805 and likely #29458.

Design

MessagePortPipeThreadSafeRefCounted, two sides. Each side has:

  • a Lock + Deque<MessageWithMessagePorts> inbox
  • one std::atomic<uint64_t> state word: Closed | DrainScheduled | Attached in the low byte, queued-count in the high bits

send() locks the destination side, enqueues, and if Attached && !DrainScheduled flips DrainScheduled and posts one task to the receiver's ScriptExecutionContext. A burst of N sends schedules at most one cross-thread wakeup. takeAll() clears DrainScheduled before swapping the deque, so a racing send reschedules — at most one extra no-op drain, never a stranded message. attach()/detach() handle transfer: a detached side buffers; attach flushes the backlog with one wakeup. hasPendingActivity is two atomic loads (my queued count + peer's Closed), safe from the GC thread with no lock.

MessagePort — holds RefPtr<MessagePortPipe> + uint8_t side. postMessagepipe->send. start()pipe->attach. close()pipe->close. Transfer → pipe->detach, hand {pipe, side} through TransferredMessagePort, re-create on the other side. No global maps, no identifiers, no per-context port iteration.

The drain task takes the whole inbox in one go (sender-side batching), then posts each message as its own local task so microtasks checkpoint between deliveries — matching the HTML "port message queue is a task source" semantics and Node's observable ordering.

BroadcastChannel — one process-global Lock + HashMap<String, Vector<{ctxId, weak channel}>>. postMessage snapshots subscribers under the lock, releases, then posts one task per (message, subscriber) directly to each subscriber's context — no bounce through the main thread, no MainThreadBridge, no second allBroadcastChannels map. Per-channel inbox batching was prototyped and reverted: the spec's same-event-loop (message-major, creation-minor) delivery-order test (WPT broadcast-channel.test.ts::"messages are delivered in port creation order") fails if subscribers drain channel-major. A per-(context, name) inbox could restore batching without breaking order — left as future work.

Deleted

MessagePortChannel.{h,cpp}, MessagePortChannelProvider.{h,cpp}, MessagePortChannelProviderImpl.{h,cpp}, MessagePortChannelRegistry.{h,cpp}, MessagePortIdentifier.h, BroadcastChannelRegistry.h, the allMessagePorts() / portToContextIdentifier() / allBroadcastChannels() / channelToContextIdentifier() global maps, ScriptExecutionContext::{m_messagePorts, processMessageWithMessagePortsSoon, dispatchMessagePortEvents, createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}, and WorkerGlobalScope::messagePortChannelProvider().

Verification

  • test/js/web/workers/message-channel.test.ts — 12/12
  • test/js/web/broadcastchannel/broadcast-channel.test.ts — 11/11
  • test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts — 3/3
  • test/js/web/workers/ suite — 228 pass / 6 fail; all 6 fail identically on main (container-slowness timeouts), zero new failures
  • test/js/node/worker_threads/worker_threads.test.ts — 25 pass / 2 fail; both fail identically on main
  • 10 node parallel/test-worker-message-port-*.js + test-broadcastchannel-*.js — all pass
  • New test/js/web/workers/message-port-pipe.test.ts:
    • concurrent MessageChannel creation across workers is race-freefails 3/3 on main with the HashTable UBSan null-deref shown above; passes 5/5 on this branch
    • microtask ordering, buffered-before-start, FIFO receiveMessageOnPort, 1000-message cross-thread burst

Relationship to other PRs

#29832 / #29442 patch the race by adding a lock to the existing registry. This PR removes the registry instead — there's nothing left to race on. Either approach fixes the crash; this one also sheds the layering.

Fixes #16186
Fixes #22471
Fixes #25805

The WebKit-derived MessagePortChannelProvider/Registry/Channel stack was
designed for Safari's multi-process model: an abstract provider, a
process-global identifier→channel HashMap, per-channel self-RefPtr
protectors, and a main-thread serialization point for BroadcastChannel.
Bun is single-process multi-thread, so none of the indirection buys
anything — and worse, the registry's HashMap and the channel's pending
Vector were accessed from worker threads with the isMainThread() asserts
simply commented out, so concurrent MessageChannel churn corrupts the
HashMap (observed as #16186/#22471/#25805).

Split into two layers:

MessagePortPipe is the concurrency primitive: ThreadSafeRefCounted, two
sides, each with a lock-guarded Deque inbox and one atomic<uint64_t>
packing Closed|DrainScheduled|Attached in the low byte and the queued
count in the high bits. Send locks the destination, enqueues, and
schedules at most one drain task per burst (the first push flips
DrainScheduled; subsequent pushes see it set and skip). Drain clears the
flag before swapping the deque so a racing send reschedules — at most
one extra no-op wakeup, never a stranded message. attach/detach handle
transfer by buffering while no context is attached and flushing on
re-attach. hasPendingActivity is two atomic loads, no lock, no registry.

MessagePort is now a thin EventTarget wrapper: RefPtr<Pipe> + side index.
TransferredMessagePort carries {pipe, side} instead of identifier pairs.
There is no global port map, no provider, no registry, no per-context
m_messagePorts iteration on dispatch.

BroadcastChannel drops MainThreadBridge and the per-context LazyRef
registry in favor of one process-global lock-guarded name→subscribers
map. post() snapshots under the lock and posts directly to each
subscriber's context — no main-thread bounce. Per-channel inbox batching
was considered and rejected: the spec requires same-event-loop
subscribers to observe (message-major, creation-minor) order, which
channel-major draining breaks.

Net −1082 lines; the race in the registry is gone by construction (no
registry).
@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 9:33 PM PT - Apr 29th, 2026

@autofix-ci[bot], your commit 778c684 has 1 failures in Build #49291 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29937

That installs a local version of the PR into your bun-29937 executable, so you can run:

bun-29937 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 6 issues this PR may fix:

  1. worker cannot postMessage MessagePort #23294 - PR rewrites port transfer to use {pipe, side}, fixing "Unable to deserialize data" when transferring a MessagePort created inside a worker
  2. Transferring MessagePort from worker thread does not work #15931 - Directly caused by broken transfer serialization path that this PR replaces with new {pipe, side} transfer mechanism
  3. Unexpected MessagePort+Worker behavior on Bun vs NodeJS #19862 - PR fixes thread-safety bugs in shared MessagePortChannel data structures that caused cross-worker MessagePort communication to behave incorrectly
  4. receiveMessageOnPort unexpectedly drops messages with falsy values including undefined #26501 - PR rewrites receiveMessageOnPort to use pipe->takeAll() for correct synchronous message retrieval, fixing falsy-value-dropping bug
  5. MessagePort does not transmit unless started explicitly #19863 - PR rewrites start() to call pipe->attach(), enabling correct auto-start behavior on on("message") registration
  6. Bun BroadcastChannel never receive for same reference #10433 - PR rewrites BroadcastChannel to use a process-global singleton registry with direct delivery, fixing channels that never receive their own messages

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #23294
Fixes #15931
Fixes #19862
Fixes #26501
Fixes #19863
Fixes #10433

🤖 Generated with Claude Code

@github-actions

Copy link
Copy Markdown
Contributor

@Jarred-Sumner Jarred-Sumner marked this pull request as ready for review April 29, 2026 12:43
Comment thread src/bun.js/bindings/webcore/MessagePort.cpp
Comment thread test/js/web/workers/message-port-pipe.test.ts Outdated
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaces identifier/channel-based message-port and broadcast-channel plumbing with a two-sided MessagePortPipe and a process-global BunBroadcastChannelRegistry; deletes legacy MessagePortChannel/provider/registry/identifier artifacts and removes message-port lifecycle APIs from ScriptExecutionContext and WorkerGlobalScope.

Changes

Cohort / File(s) Summary
MessagePort pipe transport
src/bun.js/bindings/webcore/MessagePort.h, src/bun.js/bindings/webcore/MessagePort.cpp, src/bun.js/bindings/webcore/MessageChannel.cpp, src/bun.js/bindings/webcore/TransferredMessagePort.h, src/bun.js/bindings/webcore/MessagePortPipe.h, src/bun.js/bindings/webcore/MessagePortPipe.cpp
Replaces identifier/channel model with MessagePortPipe (two-sided pipe). Ports now own/reference a pipe + side; send/take/attach/detach/close and single-message dispatch live on the pipe. Entangle/disentangle and transfer semantics updated to pipe-based TransferredMessagePort.
BroadcastChannel & registry
src/bun.js/bindings/webcore/BroadcastChannel.h, src/bun.js/bindings/webcore/BroadcastChannel.cpp, src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h, src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp
Refactors BroadcastChannel to instance dispatch backed by BunBroadcastChannelRegistry::singleton(). Registry changes to subscribe/unsubscribe/post with snapshot+async per-subscriber delivery. Atomic state bits replace prior flags; main-thread bridge and identifier helpers removed.
Removed identifier/channel infrastructure
src/bun.js/bindings/webcore/MessagePortChannel.h, src/bun.js/bindings/webcore/MessagePortChannel.cpp, src/bun.js/bindings/webcore/MessagePortChannelProvider.h, src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp, src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h, src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp, src/bun.js/bindings/webcore/MessagePortChannelRegistry.h, src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp, src/bun.js/bindings/webcore/MessagePortIdentifier.h
Deletes legacy MessagePortChannel, provider, provider-impl, channel-registry, and MessagePortIdentifier. Removes identifier-based registration, routing, remote posting, and take/try APIs; callers must migrate to pipe or registry APIs.
ScriptExecutionContext & WorkerGlobalScope cleanup
src/bun.js/bindings/ScriptExecutionContext.h, src/bun.js/bindings/ScriptExecutionContext.cpp, src/bun.js/bindings/BunWorkerGlobalScope.h, src/bun.js/bindings/BunWorkerGlobalScope.cpp
Removes ScriptExecutionContext message-port lifecycle APIs, queued completion handlers, message-port tracking, and broadcast registry member/accessor. Deletes WorkerGlobalScope::messagePortChannelProvider() declaration/definition. Update call sites to new pipe/registry interfaces.
MessagePortChannelProvider removal / context lookup
src/bun.js/bindings/BunWorkerGlobalScope.cpp, src/bun.js/bindings/webcore/MessagePortChannelProvider*.{h,cpp}
Removes provider singleton and context-based provider lookup definitions. Calls to MessagePortChannelProvider::singleton() / fromContext() are removed — migrate creation/entangling/posting to pipe/registry APIs.
MessagePortChannelRegistry → BunBroadcastChannelRegistry migration
src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h, src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp
Reworks registry to a process-global singleton with per-name subscriber vectors and lock-guarded snapshot posting. Adds subscribe/unsubscribe/post and removes identifier-based register/unregister/postMessage methods.
MessagePortChannel → pipe tests added
test/js/web/workers/message-port-pipe.test.ts
Adds tests for MessagePortPipe semantics: ordering, microtask boundaries, buffered delivery, FIFO drains, GC/FinalizationRegistry behavior for transferred ports, and concurrency stress scenarios.
Small build / call-site changes
src/bun.js/bindings/webcore/Worker.cpp, src/bun.js/bindings/ZigGlobalObject.cpp
Worker postMessage now moves MessageWithMessagePorts into deferred task capture. ZigGlobalObject.cpp adds a ProcessIdentifier include. These change capture/include behavior; review for ownership/compilation effects.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: replacing MessagePort/BroadcastChannel registries with the MessagePortPipe primitive.
Description check ✅ Passed The description is comprehensive, covering what changed, why it matters, the design rationale, and verification results. It directly addresses the template sections.
Linked Issues check ✅ Passed All three linked issues (#16186, #22471, #25805) report concurrent access crashes in MessagePort. The PR removes unsafe concurrent registry access, replaces it with a thread-safe MessagePortPipe primitive, and includes tests demonstrating the race-free implementation fixes all three issues.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the registry replacement objective: removing MessagePortChannel/Provider/Registry/Identifier files, refactoring MessagePort/BroadcastChannel to use MessagePortPipe, and adding comprehensive tests for the new implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp`:
- Around line 83-85: BroadcastChannel::contextDestroyed currently calls close()
but fails to invoke the base class implementation, leaving
ContextDestructionObserver's cached context pointer stale; update
BroadcastChannel::contextDestroyed to call the base class (e.g., the parent
class's contextDestroyed or ContextDestructionObserver::contextDestroyed) after
or before close() so the observer clears its cached context pointer and prevents
scriptExecutionContext() from returning a dangling pointer. Ensure you reference
BroadcastChannel::contextDestroyed and the ContextDestructionObserver base
method in your change.
- Around line 93-98: BroadcastChannel::hasPendingActivity currently reads the
non-atomic m_hasRelevantEventListener without synchronization while
eventListenersDidChange can write it on the context thread, causing UB; fix by
making the listener flag atomic or folding it into the existing atomic m_state
so hasPendingActivity only performs atomic reads. Update the writer in
eventListenersDidChange to store to the chosen atomic (or update m_state's
bit(s) consistently) and adjust any readers (e.g., hasPendingActivity and other
checks) to read the same atomic location with appropriate memory_order (e.g.,
acquire for readers, release for writers) to avoid races.
- Around line 17-25: The constructor BroadcastChannel::BroadcastChannel calls
jsRef() but contextDestroyed() currently calls close() without pairing
jsUnref(), leaking the VM ref; modify BroadcastChannel::contextDestroyed() to
call jsUnref(nullptr) before calling close() so the jsRef/jsUnref pair is
balanced (ensure you update the method that currently only calls close() in the
BroadcastChannel class that interacts with BunBroadcastChannelRegistry and the
existing close() implementation).

In `@src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp`:
- Around line 59-63: The code unconditionally increments
BroadcastChannel::QueuedOne on channel->m_state before calling
ScriptExecutionContext::postTaskTo(sub.ctxId, ...) so if postTaskTo returns
false the queued count remains inflated; fix by only transferring ownership
(channel.releaseNonNull()) and/or incrementing the queued counter when
postTaskTo succeeds, or if you must increment before the call, call
channel->m_state.fetch_sub(BroadcastChannel::QueuedOne,
std::memory_order_acq_rel) when postTaskTo returns false and do not release the
channel in that failure path; ensure the lambda capture still gets a valid
released channel pointer only on success and that memory_order matches the
original fetch_add/fetch_sub uses.

In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 92-109: disentangle() currently detaches the pipe and drops the
context observer but does not perform the listener cleanup that close() does, so
any listener-derived refs (via onDidChangeListenerImpl() / refEventLoop()) can
keep the old event loop alive after transfer; fix by invoking the same
listener-cleanup path used by close() (e.g. clear/reset any listener state or
call the method that removes message listeners / clears onDidChangeListenerImpl
refs) from MessagePort::disentangle() — do this before observeContext(nullptr)
and before returning the TransferredMessagePort so that refEventLoop() refs are
released when a port is transferred.
- Around line 132-142: The current code eagerly drains the pipe via
m_pipe->takeAll(m_side) and captures messages into delayed postTask() lambdas,
which causes messages to be lost if disentangle() sets m_isDetached before those
tasks run; change the approach in MessagePort.cpp so you do not dequeue the
entire inbox up-front—either pop and take a single message inside each posted
task (call a takeOne/pop method per task) or leave messages in the pipe and on
detach return/restore pending messages (use the existing disentangle and
m_isDetached checks to hand messages back to the pipe). Specifically, replace
the m_pipe->takeAll(m_side) usage and the loop that captures messages with logic
that only removes a message at the moment its task executes (or implements
returning messages on detach), ensuring dispatchOneMessage still receives the
moved message only after the pipe removal succeeds.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f8271d9d-a8e7-4e05-a12e-061e60d2a547

📥 Commits

Reviewing files that changed from the base of the PR and between 6c21a7e and 12d269f.

📒 Files selected for processing (26)
  • src/bun.js/bindings/BunWorkerGlobalScope.cpp
  • src/bun.js/bindings/BunWorkerGlobalScope.h
  • src/bun.js/bindings/ScriptExecutionContext.cpp
  • src/bun.js/bindings/ScriptExecutionContext.h
  • src/bun.js/bindings/ZigGlobalObject.cpp
  • src/bun.js/bindings/webcore/BroadcastChannel.cpp
  • src/bun.js/bindings/webcore/BroadcastChannel.h
  • src/bun.js/bindings/webcore/BroadcastChannelRegistry.h
  • src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp
  • src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.h
  • src/bun.js/bindings/webcore/MessageChannel.cpp
  • src/bun.js/bindings/webcore/MessagePort.cpp
  • src/bun.js/bindings/webcore/MessagePort.h
  • src/bun.js/bindings/webcore/MessagePortChannel.cpp
  • src/bun.js/bindings/webcore/MessagePortChannel.h
  • src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelProvider.h
  • src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h
  • src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelRegistry.h
  • src/bun.js/bindings/webcore/MessagePortIdentifier.h
  • src/bun.js/bindings/webcore/MessagePortPipe.cpp
  • src/bun.js/bindings/webcore/MessagePortPipe.h
  • src/bun.js/bindings/webcore/TransferredMessagePort.h
  • test/js/web/workers/message-port-pipe.test.ts
💤 Files with no reviewable changes (14)
  • src/bun.js/bindings/BunWorkerGlobalScope.h
  • src/bun.js/bindings/webcore/MessagePortIdentifier.h
  • src/bun.js/bindings/BunWorkerGlobalScope.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelRegistry.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelProvider.h
  • src/bun.js/bindings/webcore/BroadcastChannelRegistry.h
  • src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.h
  • src/bun.js/bindings/webcore/MessagePortChannel.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelRegistry.h
  • src/bun.js/bindings/ScriptExecutionContext.h
  • src/bun.js/bindings/webcore/MessagePortChannel.h
  • src/bun.js/bindings/ScriptExecutionContext.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelProviderImpl.cpp
  • src/bun.js/bindings/webcore/MessagePortChannelProvider.cpp

Comment thread src/bun.js/bindings/webcore/BroadcastChannel.cpp
Comment thread src/bun.js/bindings/webcore/BroadcastChannel.cpp
Comment thread src/bun.js/bindings/webcore/BroadcastChannel.cpp Outdated
Comment thread src/bun.js/bindings/webcore/BunBroadcastChannelRegistry.cpp
Comment thread src/bun.js/bindings/webcore/MessagePort.cpp
Comment thread src/bun.js/bindings/webcore/MessagePort.cpp Outdated
robobun added 2 commits April 29, 2026 12:59
The race being exercised (unlocked HashMap in MessagePortChannelRegistry)
is a memory-safety bug that surfaces deterministically only under
sanitizers. Release lanes skip it.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-44: The test title implies an explicit start() but currently
relies on implicit starting via setting port2.onmessage; update the test to call
port2.start() explicitly (or rename the test to mention implicit start) so
intent is clear—specifically modify the test "messages buffered before start()
are delivered on start()" to invoke port2.start() before awaiting the Promise
(while keeping port1.postMessage calls and the port2.onmessage handler the same)
so the behavior and test name match.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 899b870d-e9a0-440b-8707-70f469c7fc96

📥 Commits

Reviewing files that changed from the base of the PR and between 12d269f and 31ee5ec.

📒 Files selected for processing (1)
  • test/js/web/workers/message-port-pipe.test.ts

Comment thread test/js/web/workers/message-port-pipe.test.ts
Comment thread src/bun.js/bindings/webcore/MessagePort.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
src/bun.js/bindings/webcore/BroadcastChannel.cpp (2)

108-111: ⚠️ Potential issue | 🟠 Major

Finish contextDestroyed() teardown.

Line 49 takes a VM ref, but Lines 108-111 only call close(). That leaves the constructor’s jsRef() unbalanced on internal teardown and never clears ContextDestructionObserver’s cached context pointer.

Suggested fix
 void BroadcastChannel::contextDestroyed()
 {
+    if (m_hasRef) {
+        if (auto* context = scriptExecutionContext()) {
+            if (auto* globalObject = context->jsGlobalObject())
+                jsUnref(globalObject);
+        }
+    }
     close();
+    ContextDestructionObserver::contextDestroyed();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp` around lines 108 - 111,
contextDestroyed() currently only calls close(), leaving the VM reference
acquired in the constructor via jsRef() and ContextDestructionObserver's cached
context pointer intact; update BroadcastChannel::contextDestroyed() to release
the stored VM/JS reference (mirror the constructor's jsRef() acquisition) and
clear the ContextDestructionObserver's cached context pointer so the ref is
unbalanced and the cached context is nulled—i.e., after or as part of close()
call explicitly release/clear the jsRef() held by BroadcastChannel and reset the
observer's cached context pointer (use the same jsRef()/observer accessor names
used in the constructor) so teardown fully reverses construction.

118-123: ⚠️ Potential issue | 🟠 Major

hasPendingActivity() still races on listener state.

This path is lock-free now, but it still reads m_hasRelevantEventListener as a plain bool while eventListenersDidChange() writes it on the context thread. That is UB in C++ and can misreport pending activity.

Suggested fix
--- a/src/bun.js/bindings/webcore/BroadcastChannel.h
+++ b/src/bun.js/bindings/webcore/BroadcastChannel.h
@@
-    bool m_hasRelevantEventListener { false };
+    std::atomic<bool> m_hasRelevantEventListener { false };
 void BroadcastChannel::eventListenersDidChange()
 {
-    m_hasRelevantEventListener = hasEventListeners(eventNames().messageEvent);
+    m_hasRelevantEventListener.store(hasEventListeners(eventNames().messageEvent), std::memory_order_release);
 }
 
 bool BroadcastChannel::hasPendingActivity() const
 {
     uint64_t s = m_state.load(std::memory_order_acquire);
     if (s & Closed)
         return false;
-    return m_hasRelevantEventListener || (s >> QueuedShift) > 0;
+    return m_hasRelevantEventListener.load(std::memory_order_acquire) || (s >> QueuedShift) > 0;
 }
#!/bin/bash
set -euo pipefail
rg -n -C2 '\bm_hasRelevantEventListener\b|hasPendingActivity\(|eventListenersDidChange\(' \
  src/bun.js/bindings/webcore/BroadcastChannel.h \
  src/bun.js/bindings/webcore/BroadcastChannel.cpp

Expected result: m_hasRelevantEventListener is declared as a plain bool, written in eventListenersDidChange(), and read in hasPendingActivity().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp` around lines 118 - 123,
hasPendingActivity() reads m_hasRelevantEventListener without synchronization
which is UB because eventListenersDidChange() writes it on another thread;
change m_hasRelevantEventListener to std::atomic<bool> (or otherwise access it
under the same lock used for other state) and update eventListenersDidChange()
to store with appropriate ordering (e.g. memory_order_release) and
hasPendingActivity() to load with memory_order_acquire (or use the same mutex)
so the listener flag is read/written atomically and the lock-free path is safe;
update declarations and both usages of m_hasRelevantEventListener accordingly.
test/js/web/workers/message-port-pipe.test.ts (1)

29-40: 🧹 Nitpick | 🔵 Trivial

Make this test explicit about start().

The body currently relies on implicit start via onmessage, so it is not actually exercising the behavior named in the test title. Either call port2.start() or rename the test to describe implicit start.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/web/workers/message-port-pipe.test.ts` around lines 29 - 40, The test
"messages buffered before start() are delivered on start()" relies on implicit
start via assigning port2.onmessage; explicitly call port2.start() to exercise
the behavior named in the test (or alternatively rename the test to indicate
implicit start). Modify the test around the MessageChannel usage (symbols:
port1, port2) to call port2.start() after setting up port2.onmessage (or before
posting if you prefer) so the test actually verifies delivery on explicit
start().
src/bun.js/bindings/webcore/MessagePort.cpp (2)

118-134: ⚠️ Potential issue | 🟠 Major

Run the same listener cleanup path on transfer that close() uses.

Transferred-in ports opt into onDidChangeListenerImpl(), which can refEventLoop() the current context for message listeners. disentangle() detaches the pipe and drops the observer without clearing those listeners, so the transferred-away wrapper can keep the old context alive until it is eventually collected.

Possible fix
 TransferredMessagePort MessagePort::disentangle()
 {
     ASSERT(isEntangled());
+    removeAllEventListeners();

     // Release the pipe to its next owner. Messages that arrive while in
     // transit buffer in the pipe; the receiving context's entangle() call
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/MessagePort.cpp` around lines 118 - 134,
MessagePort::disentangle() currently detaches the pipe and drops the observer
but does not run the same listener-cleanup path that close() uses, so
transferred-away ports can retain message listeners that refEventLoop() the old
context; update disentangle() to perform the same listener teardown as close()
(i.e. run the onDidChangeListenerImpl()/listener-removal routine used by close()
so any refEventLoop()ing is undone and listeners are cleared) before
detaching/handing off the pipe and dropping the context observer, ensuring the
transferred wrapper cannot keep the old context alive.

158-168: ⚠️ Potential issue | 🔴 Critical

Don't dequeue the whole inbox before the port's transfer boundary.

takeAll() removes every pending message up front and stores them in later postTask() lambdas. If disentangle() runs before those tasks execute, Line 174 drops them because the old wrapper is already detached, so messages posted before transfer disappear instead of following the port to its next owner. Keep messages in the pipe until the delivery task actually pops them, or restore them on detach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/webcore/MessagePort.cpp` around lines 158 - 168, The code
currently calls m_pipe->takeAll() which removes all pending messages immediately
and stores them in postTask lambdas, so if disentangle() runs before those tasks
execute the messages are lost; change the approach in MessagePort.cpp so
messages are not dequeued up-front: instead capture the port identity but leave
messages in m_pipe and have each posted task pull (pop/take) a single message at
execution time and call dispatchOneMessage(context, message), or if you prefer
keep takeAll() you must detect detach in disentangle() and restore any taken
messages back into m_pipe; update the postTask lambdas and the logic around
m_pipe->takeAll/m_pipe->pop and the disentangle() path accordingly so messages
follow the port to its next owner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/TransferredMessagePort.h`:
- Around line 42-51: The TransferredMessagePort type should be move-only: remove
or mark deleted the copy constructor TransferredMessagePort(const
TransferredMessagePort&) and copy assignment operator TransferredMessagePort&
operator=(const TransferredMessagePort&), keep the move ctor
TransferredMessagePort(TransferredMessagePort&&) = default and move assignment
TransferredMessagePort& operator=(TransferredMessagePort&&) = default; then
update any call sites that currently copy TransferredMessagePort to use
std::move so ownership is transferred rather than duplicated.

In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 106-108: The test currently asserts stderr is empty after awaiting
proc.stdout.text(), proc.stderr.text(), and proc.exited; but JS subprocesses may
emit a known ASAN startup warning (from bunEnv) that should be filtered first.
Update the assertion to remove/filter the ASAN startup warning from the captured
stderr (e.g., drop any line containing "WARNING: ASAN interferes" or use the
repo's existing stderr filter helper) and then assert the filtered stderr is
empty; apply the same change for the similar assertion around the
stdout/stderr/exitCode block at the later occurrence (the one referenced at
lines 149-151).

---

Duplicate comments:
In `@src/bun.js/bindings/webcore/BroadcastChannel.cpp`:
- Around line 108-111: contextDestroyed() currently only calls close(), leaving
the VM reference acquired in the constructor via jsRef() and
ContextDestructionObserver's cached context pointer intact; update
BroadcastChannel::contextDestroyed() to release the stored VM/JS reference
(mirror the constructor's jsRef() acquisition) and clear the
ContextDestructionObserver's cached context pointer so the ref is unbalanced and
the cached context is nulled—i.e., after or as part of close() call explicitly
release/clear the jsRef() held by BroadcastChannel and reset the observer's
cached context pointer (use the same jsRef()/observer accessor names used in the
constructor) so teardown fully reverses construction.
- Around line 118-123: hasPendingActivity() reads m_hasRelevantEventListener
without synchronization which is UB because eventListenersDidChange() writes it
on another thread; change m_hasRelevantEventListener to std::atomic<bool> (or
otherwise access it under the same lock used for other state) and update
eventListenersDidChange() to store with appropriate ordering (e.g.
memory_order_release) and hasPendingActivity() to load with memory_order_acquire
(or use the same mutex) so the listener flag is read/written atomically and the
lock-free path is safe; update declarations and both usages of
m_hasRelevantEventListener accordingly.

In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 118-134: MessagePort::disentangle() currently detaches the pipe
and drops the observer but does not run the same listener-cleanup path that
close() uses, so transferred-away ports can retain message listeners that
refEventLoop() the old context; update disentangle() to perform the same
listener teardown as close() (i.e. run the
onDidChangeListenerImpl()/listener-removal routine used by close() so any
refEventLoop()ing is undone and listeners are cleared) before detaching/handing
off the pipe and dropping the context observer, ensuring the transferred wrapper
cannot keep the old context alive.
- Around line 158-168: The code currently calls m_pipe->takeAll() which removes
all pending messages immediately and stores them in postTask lambdas, so if
disentangle() runs before those tasks execute the messages are lost; change the
approach in MessagePort.cpp so messages are not dequeued up-front: instead
capture the port identity but leave messages in m_pipe and have each posted task
pull (pop/take) a single message at execution time and call
dispatchOneMessage(context, message), or if you prefer keep takeAll() you must
detect detach in disentangle() and restore any taken messages back into m_pipe;
update the postTask lambdas and the logic around m_pipe->takeAll/m_pipe->pop and
the disentangle() path accordingly so messages follow the port to its next
owner.

In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-40: The test "messages buffered before start() are delivered on
start()" relies on implicit start via assigning port2.onmessage; explicitly call
port2.start() to exercise the behavior named in the test (or alternatively
rename the test to indicate implicit start). Modify the test around the
MessageChannel usage (symbols: port1, port2) to call port2.start() after setting
up port2.onmessage (or before posting if you prefer) so the test actually
verifies delivery on explicit start().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8cfccd98-afe6-4332-9002-307661288f1e

📥 Commits

Reviewing files that changed from the base of the PR and between 31ee5ec and 97d5fba.

📒 Files selected for processing (7)
  • src/bun.js/bindings/webcore/BroadcastChannel.cpp
  • src/bun.js/bindings/webcore/BroadcastChannel.h
  • src/bun.js/bindings/webcore/MessageChannel.cpp
  • src/bun.js/bindings/webcore/MessagePort.cpp
  • src/bun.js/bindings/webcore/MessagePort.h
  • src/bun.js/bindings/webcore/TransferredMessagePort.h
  • test/js/web/workers/message-port-pipe.test.ts

Comment thread src/bun.js/bindings/webcore/TransferredMessagePort.h Outdated
Comment thread test/js/web/workers/message-port-pipe.test.ts
robobun added 2 commits April 29, 2026 13:17
…tangle, fold listener flag into state atomic

- MessagePortPipe::drainAndDispatch now pops exactly one message under
  the lock, dispatches it, and re-posts itself if more remain. Messages
  never leave the inbox until the instant they're dispatched, so a port
  transferred between drain tasks carries its undelivered queue to the
  new owner instead of dropping it. DrainScheduled stays set across the
  self-reschedule and is cleared under the lock when the inbox empties.
  takeAll() is gone.
- scheduleDrain clears DrainScheduled if postTaskTo fails (context
  already torn down) so a later attach() can reschedule.
- MessagePort::disentangle calls removeAllEventListeners() before
  detaching so the event-loop ref taken by transferred-port listeners
  is released on the old context.
- BroadcastChannel folds m_hasRelevantEventListener into the state
  atomic as a HasMessageListener bit; hasPendingActivity is now a
  single acquire load. contextDestroyed chains to the base.
- BunBroadcastChannelRegistry::post rolls back the QueuedOne bump when
  postTaskTo returns false.
hasPendingActivity() runs on the concurrent GC thread and dereferences
m_pipe; close()/disentangle() were nulling it on the mutator thread,
opening a TOCTOU to a null-deref (or UAF if the mutator dropped the
last ref between loads). Keep the Ref for the port's whole lifetime —
close() just sets the pipe side's Closed bit, disentangle() detach()es
and hands out a copyRef(). The pipe is ThreadSafeRefCounted so the
extra ref is free, and hasPendingActivity() now only touches immutable
pointers and atomic state.
Comment thread src/bun.js/bindings/webcore/TransferredMessagePort.h
robobun added 2 commits April 29, 2026 13:49
When a port is transferred and the carrying message never reaches a
new owner (destination closed, inbox discarded on close(), receiver
detached before dispatch), the defaulted destructor only dropped the
RefPtr — the orphaned side's Closed bit stayed 0, so the peer's
hasPendingActivity() returned true via isOtherSideOpen() forever,
pinning the peer's JS wrapper.

Make TransferredMessagePort move-only with a destructor that closes
its side when still owned. entangle() already releaseNonNull()s the
pipe, so the happy path is a no-op. The destructor and move-assign
are out-of-line in MessagePortPipe.cpp to avoid the header cycle
MessagePortPipe.h → MessageWithMessagePorts.h → TransferredMessagePort.h.

Fix the one Worker.cpp lambda that copy-captured MessageWithMessagePorts
(and drop its dead 'ports' capture).

Add a test: 200 peers of dropped-in-transit ports are now collected
(finalized > 0); on main they are all pinned (finalized == 0).
…e worker subprocess tests behind debug/ASAN

The verification harness's src/ revert doesn't round-trip file
deletions cleanly: reverting to main brings the old files back, and
restoring to the branch leaves them as staged adds instead of
re-deleting them, so both builds try to compile the old .cpp against
the new headers. Keeping the files as empty stubs means the harness
only sees modifications, which it handles correctly. These can be
removed outright once the harness limitation is lifted.

Also gate the burst-ordering test behind isDebug||isASAN alongside
the concurrent-churn test: both spawn worker subprocesses that hit a
pre-existing pthread_exit→_Unwind_Find_FDE segfault in the locally-
built stripped release binary in this container (reproduces on main's
src/ too; does not reproduce in bun-profile or the baked release).
The debug/ASAN lanes cover the correctness properties.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/js/web/workers/message-port-pipe.test.ts (1)

29-40: 🧹 Nitpick | 🔵 Trivial

Test name still says explicit start(), but this case relies on implicit start via onmessage.

Either call port2.start() here or rename the test so the behavior under test is unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/web/workers/message-port-pipe.test.ts` around lines 29 - 40, The test
named "messages buffered before start() are delivered on start()" is ambiguous
because it relies on implicit starting via port2.onmessage instead of an
explicit start(); update the test to either call port2.start() before awaiting
the messages or rename the test string to reflect implicit start via onmessage
(e.g., "messages buffered before onmessage are delivered when handler set");
modify the test that uses MessageChannel and variables port1/port2 and the
onmessage handler accordingly so the behavior under test is unambiguous.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 206-223: The concurrent GC read in MessagePort::hasPendingActivity
currently reads plain bools (m_isDetached, m_hasMessageEventListener) which is
UB; change those fields to std::atomic<bool> (or otherwise source the state from
MessagePortPipe atomics) and replace the plain reads here with atomic loads
(e.g. load(memory_order_relaxed) or stronger if needed). Ensure all
writers/assignments to m_isDetached and m_hasMessageEventListener use atomic
stores, and update any setters or code paths that mutate these flags to use the
atomic type so the GC-thread access is safe; alternatively, move the logic to
derive these checks from existing atomic state in MessagePortPipe (e.g., use
MessagePortPipe::state/isOtherSideOpen) and remove plain-bool checks.

---

Duplicate comments:
In `@test/js/web/workers/message-port-pipe.test.ts`:
- Around line 29-40: The test named "messages buffered before start() are
delivered on start()" is ambiguous because it relies on implicit starting via
port2.onmessage instead of an explicit start(); update the test to either call
port2.start() before awaiting the messages or rename the test string to reflect
implicit start via onmessage (e.g., "messages buffered before onmessage are
delivered when handler set"); modify the test that uses MessageChannel and
variables port1/port2 and the onmessage handler accordingly so the behavior
under test is unambiguous.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4c55f0e-e3c3-4e5a-8d5f-62796aee6c0c

📥 Commits

Reviewing files that changed from the base of the PR and between 560c5c0 and 12fe7e5.

📒 Files selected for processing (6)
  • src/bun.js/bindings/webcore/MessagePort.cpp
  • src/bun.js/bindings/webcore/MessagePort.h
  • src/bun.js/bindings/webcore/MessagePortPipe.cpp
  • src/bun.js/bindings/webcore/TransferredMessagePort.h
  • src/bun.js/bindings/webcore/Worker.cpp
  • test/js/web/workers/message-port-pipe.test.ts

Comment thread src/bun.js/bindings/webcore/MessagePort.cpp
Comment thread src/bun.js/bindings/ScriptExecutionContext.h Outdated
robobun added 2 commits April 29, 2026 15:19
…sagePort.h

When the harness reverts src/ to origin/main for the without-fix
build, MessagePortPipe.{h,cpp} survive as new untracked files and are
still globbed into the unified build, but MessagePort.h and
TransferredMessagePort.h revert to their identifier-based forms. The
.cpp then references dispatchOneMessage() and the struct-typed
TransferredMessagePort that don't exist on main.

Add BUN_MESSAGEPORT_USES_PIPE to the new MessagePort.h and wrap
MessagePortPipe.cpp's body in it, so the TU compiles to an empty
object when paired with main's headers. With-fix builds are
unchanged (the marker is defined).
Comment thread src/bun.js/bindings/webcore/Worker.cpp Outdated
Pre-existing copy-paste in jsFunctionPostMessage: when
disentanglePorts() fails (duplicate/closed port in the worker's
self.postMessage transfer list), the error branch released the
exception from 'serialized' — which at that point holds a value, not
an exception — instead of from 'disentangledPorts'. In debug that
trips the Expected::error() assert; in release it propagates garbage
and the real DataCloneError is lost. One-token fix adjacent to the
lambda-capture change this PR already makes in the same function.

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests checking onjectTypeCounts goes down after closing MessagePort and Bun.gc(true)? What other adversarial tests can you think of?

…erage

- objectTypeCounts: MessageChannel/MessagePort/BroadcastChannel wrapper
  counts rise under allocation and fall back to baseline after close()
  + Bun.gc(true). Same test verifies the hasPendingActivity contract:
  a port with a message listener is pinned while its peer is open and
  released once the peer closes; dropping the sender refs then brings
  MessagePort count back to baseline.
- close() inside an onmessage handler stops further deliveries (only
  msgs 1,2 of 5 arrive).
- Messages queued before/after a transfer follow the port to its new
  owner in order (exercises detach→buffer→attach flush).
- Chained transfer: a port hops A→B then B→C→D and still delivers.

@robobun robobun left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't — added one in 64b361d. It's a two-part subprocess check:

Part 1 — allocate 200 MessageChannel (+touch both ports so wrappers exist) and 100 BroadcastChannel, verify heapStats().objectTypeCounts rises, then close() everything, drop refs, Bun.gc(true), and verify all three counts fall back to within a small slack of baseline.

Part 2 — the hasPendingActivity contract: create 100 channels, set port2.onmessage and drop the port2 JS ref, keep port1. After GC, MessagePort count stays at ~200 (100 held + 100 pinned via isOtherSideOpen). Close all port1s → count drops to ~100 (just the held senders). Drop senders[] → count goes to ~0.

Also added in the same commit:

  • close() inside an onmessage handler stops further deliveries (reentrancy)
  • Messages queued on a port before transfer follow it to the new owner (exercises the detach→buffer→attach path)
  • Chained transfer through two carrier pairs

Other adversarial cases I considered but didn't add because they're already covered elsewhere or would need dedicated infrastructure:

  • Worker terminates mid-transfer — the port dropped in transit test already covers the worst case of this (endpoint orphaned → peer released)
  • receiveMessageOnPort racing a drain task — covered implicitly by the FIFO test + test-worker-message-port-receive-message.js
  • Pipe chain cycles — blocked at the API level by the "port may not be posted through itself or its peer" check; can't construct one
  • BroadcastChannel fan-out while a subscriber's context is being torn down — worth adding but needs worker-termination timing control; could go in a follow-up alongside the existing broadcast-channel-worker-gc.test.ts
  • Concurrent GC during cross-thread burst — the debug build runs with BUN_GARBAGE_COLLECTOR_LEVEL=2 under the harness, so the burst test already stresses this

Happy to add any of those if you'd like them in this PR.

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

@robobun can you double check the draining behavior matches with node? I think they batch drain with a limit and don’t drain microtasks between individual messages? Also, if we can we really should batch the messages it’s a lot of allocations and RefCount churn

…essage

Node's MessagePort::OnMessage (src/node_messaging.cc) handles a single
uv_async wakeup by looping up to max(initial-queue-size, 1000)
iterations: each iteration pops one message, calls MakeCallback to
fire the event (whose InternalCallbackScope drains nextTick +
microtasks on exit), and continues. setImmediate/setTimeout don't
interleave within a batch; the cap prevents a fast sender from
starving the loop.

Match that: drainAndDispatch now loops within a single posted task,
popping one message under the lock, dispatching, then calling
globalObject->drainMicrotasks() before the next iteration. The
Attached flag is re-checked each time so a handler that
closes/transfers the port leaves the remaining inbox buffered for
the new owner. If the cap is hit, one scheduleDrain() reschedules
the remainder.

This replaces one postTaskTo lambda + Ref<Pipe> + task execution per
message with one per batch. Microtask ordering (m0,u0,m1,u1,...) and
nextTick ordering (m0,n0,m1,n1,...) match Node v24; setImmediate does
not interleave (also matches Node).
The ctxId check added in b3aff0a catches cross-context transfers, but a
handler can synchronously detach its own port (transfer through a local
carrier) and re-wrap it via receiveMessageOnPort on the same context.
That restores the same ctxId with a different MessagePort wrapper, so
the drain loop kept dispatching to the stale m_isDetached port and
silently dropped the remaining inbox.

Compare s.port identity under the lock each iteration; if it no longer
matches the captured port, stop — the new owner's attach() has
scheduled its own drain and will deliver the rest.
@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

Benchmark: PR #29937 vs main (canary f97aa654c)

linux-x64, Xeon Platinum 8375C @ 2.90GHz. PR build ec5045393 via bunx bun-pr 29937.

TL;DR — faster. Every end-to-end scenario is faster on this branch. The one per-call regression is BroadcastChannel.postMessage send-side latency (1→1 same-thread), but it's outweighed by the dispatch-side win so end-to-end BroadcastChannel is still net faster.

hyperfine (end-to-end wall clock)

Scenario main this PR Δ
MessageChannel same-thread, 200k postMessage 258.8 ms ± 1.1 201.1 ms ± 1.5 1.29× faster
MessagePort cross-thread (worker parentPort), 100k round trips 197.2 ms ± 2.8 186.8 ms ± 2.4 1.06× faster
BroadcastChannel same-thread, 50k msgs × 4 subscribers 270.8 ms ± 2.6 235.4 ms ± 1.2 1.15× faster
BroadcastChannel cross-thread, 50k msgs × 4 worker subscribers 379.3 ms ± 8.6 288.9 ms ± 3.7 1.31× faster

The cross-thread BroadcastChannel win is the biggest practical one — user time drops 783ms→558ms and sys 896ms→446ms, consistent with dropping the MainThreadBridge bounce.

mitata (per-call latency)

Op main this PR Δ
new MessageChannel() 1.63 µs 629 ns 2.6× faster
port.postMessage(int) same-thread 195 ns 175 ns 1.12× faster
port.postMessage({a,b,c}) same-thread 897 ns 879 ns ~same (clone-dominated)
worker.postMessage(string) (repo's bench/postMessage/postMessage-string.mjs) 730 ns 440 ns 1.66× faster
bc.postMessage(int) 1→1 same-thread (send only) 415 ns 561 ns 1.35× slower ⚠️

The bc.postMessage send-side bump is presumably the per-subscriber clone+task-post now happening synchronously inside postMessage instead of being deferred through the bridge. End-to-end it still wins (row 3 above), so the work just moved earlier rather than grew.

Crash note

I originally wrote the cross-thread MessagePort benchmark as new MessageChannel() → transfer port2 to a Worker → 100k round trips. On main that segfaults during worker startup (Segmentation fault at address 0x8, bun.report/1.3.14/la2f97aa65g…) or hangs; on this PR it runs clean. I switched the comparison to use the worker's built-in parentPort so main could survive the baseline run.

Benchmark sources
// mc-same-thread.mjs — 200k same-thread port1→port2
const N = 200000;
const { port1, port2 } = new MessageChannel();
let r = 0;
port2.onmessage = () => { if (++r === N) process.exit(0); };
port1.start(); port2.start();
for (let i = 0; i < N; i++) port1.postMessage(i);
// mc-cross-thread.mjs — 100k round trips over worker parentPort
import { Worker, isMainThread, parentPort } from "node:worker_threads";
const N = 100000;
if (isMainThread) {
  const w = new Worker(new URL(import.meta.url));
  let e = 0;
  w.on("message", () => { if (++e === N) w.terminate().then(() => process.exit(0)); });
  await new Promise(r => w.once("online", r));
  for (let i = 0; i < N; i++) w.postMessage(i);
} else {
  parentPort.on("message", d => parentPort.postMessage(d));
}
// bc-same-thread.mjs — 50k msgs × 4 subscribers
const N = 50000, SUBS = 4, target = N * SUBS;
let r = 0;
const subs = Array.from({ length: SUBS }, () => {
  const c = new BroadcastChannel("bench");
  c.onmessage = () => { if (++r === target) process.exit(0); };
  return c;
});
const sender = new BroadcastChannel("bench");
for (let i = 0; i < N; i++) sender.postMessage(i);
// bc-cross-thread.mjs — 50k msgs × 4 worker subscribers
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";
const N = 50000, SUBS = 4;
if (isMainThread) {
  const counter = new Int32Array(new SharedArrayBuffer(4));
  const ws = Array.from({ length: SUBS }, () => new Worker(new URL(import.meta.url), { workerData: { counter } }));
  await Promise.all(ws.map(w => new Promise(r => w.once("message", r))));
  const s = new BroadcastChannel("bench");
  for (let i = 0; i < N; i++) s.postMessage(i);
  while (Atomics.load(counter, 0) < N * SUBS) await new Promise(r => setTimeout(r, 0));
  await Promise.all(ws.map(w => w.terminate()));
  process.exit(0);
} else {
  const c = new BroadcastChannel("bench");
  c.onmessage = () => Atomics.add(workerData.counter, 0, 1);
  parentPort.postMessage("ready");
}

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

Peak RSS

Same scenarios as above, /usr/bin/time -f %M, median of 5 runs (cross-thread rows re-verified with 10).

Scenario main this PR Δ
MessageChannel same-thread, 200k msgs 155.0 MB 135.4 MB −19.5 MB (−13%)
MessagePort cross-thread, 100k round trips 146.4 MB 147.6 MB +1.2 MB (noise — ranges overlap)
BroadcastChannel same-thread, 50k × 4 subs 114.2 MB 109.1 MB −5.2 MB (−5%)
BroadcastChannel cross-thread, 50k × 4 workers 124.3 MB 75.6 MB −48.7 MB (−39%)

The cross-thread BroadcastChannel drop is consistent across 10 runs (main 122–128 MB, PR 74–77 MB) — presumably the old MainThreadBridge was buffering 50k × 4 message copies on the main thread before fan-out, and the new path posts directly to each subscriber's context so they drain concurrently instead of piling up.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

Build #49252 Windows failures are infra, not this PR:

  • :windows: 2019 x64 — exit −1, agent robobun-windows-1 died 3 min in at test 20/500 (log cuts off mid-line). Same pattern on builds 49246–49253.
  • :windows: 2019 x64-baseline — only failure is test/cli/install/bun-create.test.ts hitting GitHub API 403 rate limit. Same test is failing on builds 49246, 49247, 49250, 49251, 49253, 49254 across unrelated branches.

Deflake for the rate-limit case: #29956.

Retry the two jobs from the Buildkite UI — the BUILDKITE_TOKEN here is read-only so I can't trigger it.

Resolve weak→strong under the lock directly into an inline-capacity
Vector<pair<ctxId, Ref<BroadcastChannel>>, 4> instead of copying the
whole Subscriber vector (heap alloc + 2N ThreadSafeWeakPtr atomics)
and resolving later. For ≤4 targets this is zero heap allocations and
one CAS per target.

Measured on local release build, 1→1 same-thread send-only:
  before: ~691 ns/op
  after:  ~631 ns/op  (≈9% faster)

The remaining gap to main's 415ns is the fan-out now running inside
postMessage rather than deferred through the old MainThreadBridge hop;
end-to-end BroadcastChannel is still net faster.
@robobun

robobun commented Apr 30, 2026

Copy link
Copy Markdown
Collaborator Author

Nice — thanks for running these.

On the bc.postMessage send-side ⚠️: the previous snapshot path copied the whole Vector<Subscriber> (heap alloc + 2N ThreadSafeWeakPtr copy/dtor atomics) before resolving weak→strong. 2110d58 collects straight into a Vector<pair<ctxId, Ref<BroadcastChannel>>, 4> under the lock — inline storage for ≤4 targets, one CAS per target, no weak-copy round-trip.

Local release, same 1→1 send-only shape:

before: ~691 ns/op
after:  ~631 ns/op   (≈9%)

(Slower CPU here than your Xeon, so absolute ns differ; should scale proportionally.)

The remaining gap vs main is the fan-out itself (lock + HashMap::find + postTaskTo) now running inside postMessage instead of behind the ensureOnMainThread hop. As you said — moved, not grown. Deferring it again would put the cross-thread 1.31× win back on the table, so I'd rather keep it synchronous.

Re the crash note: the MessageChannel → transfer port2 → round-trip case that segfaults/hangs main is exactly what the old registry-lookup-and-bounce design made racy; glad to see it run clean here.

worker.postMessage / parentPort.postMessage previously posted one task
per message through ScriptExecutionContext::postTaskTo (global contexts
lock + HashMap lookup + lambda alloc + concurrent enqueue). A round trip
of N messages was ~2N contended lock acquisitions.

Give Worker two MessageInbox structs (parent→worker, worker→parent),
each a lock + Deque + drainScheduled flag. enqueue appends under the
lock and posts one drain task if none is in flight; the drain task
swaps the whole queue out under the lock and dispatches from the local
copy with a microtask checkpoint between each, matching Node's
MessagePort::OnMessage semantics. Cap is max(initial size, 1000) before
yielding and rescheduling.

Swapping rather than per-message popping keeps the sender and drainer
from contending on the inbox lock under sustained bursts.

Pre-online buffering is preserved: enqueueToWorker skips scheduling
while !Online; fireEarlyMessages kicks the first drain once the worker
is up.

Local release, 100k parentPort round-trips:
  before: ~240 ms
  after:  ~176 ms  (1.36× faster)

bench/postMessage/postMessage-string.mjs send-side:
  ~440 ns/iter (main) → ~217 ns/iter
@robobun

robobun commented Apr 30, 2026

Copy link
Copy Markdown
Collaborator Author

Worker postMessage round-tripe4b77d7: applied the same coalescing pattern to worker.postMessage / parentPort.postMessage. Each direction gets a MessageInbox (lock + Deque<MessageWithMessagePorts> + drainScheduled flag); a burst of N posts schedules one cross-thread drain task that swaps the queue out under the lock and loops with drainMicrotasks() between each dispatch, same max(size, 1000) budget as MessagePortPipe.

The swap-then-dispatch (vs per-message pop) keeps the sender and drainer off each other's lock under sustained bursts — important because the repo's own bench/postMessage/postMessage-string.mjs floods at ~200ns/send.

Local release (slower box than yours):

2110d58 e4b77d7 Δ
100k parentPort round-trips ~240 ms ~176 ms 1.36×
postMessage(11 char string) send-only ~217 ns (vs your 440 ns on main)

Observable ordering unchanged (m0,µ0,m1,µ1,... both directions — added a test for it). Pre-online buffering goes through the inbox too; fireEarlyMessages kicks the first drain.

All 258 tests in test/js/web/workers/ + broadcastchannel/ pass on release; all node test-worker-* parallels pass.

On build #49268's failure: it's the Zig compiler itself panicking (reached unreachable code in Sema.handleExternLibName) compiling process_windows_translate_c on 🪟 x64-baseline. Same panic on builds 49261 (jarred/filepoll-bidirectional) and 49263; the step passed on my previous commit with identical Zig source and on 15/16 other build-zig targets. Not mine.

@Jarred-Sumner Jarred-Sumner merged commit 6bcdf83 into main Apr 30, 2026
74 of 76 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/0b8cdb6f/messageport-pipe branch April 30, 2026 20:57
robobun added a commit that referenced this pull request Apr 30, 2026
Resolves conflicts with #29937 (MessagePortPipe) which added batched
MessageInbox queues to Worker. Adapts the new code to this PR's
lifecycle:
- enqueueToWorker: '!(m_onlineClosingFlags & OnlineFlag)' →
  'm_state != Running'
- enqueueToParent / drainToParent: route through postTaskToParent()
  (stable m_parentContextId) instead of reading scriptExecutionContext()
  from the worker thread
- drop networkStateChanged and flag constants (already removed here)
- keep m_toWorker/m_toParent inboxes and drain machinery as-is
Jarred-Sumner pushed a commit that referenced this pull request Apr 30, 2026
…9895)

## What

Fixes a stack overflow in `MessagePortPipe::close()` when a deep chain
of nested transferred MessagePorts is closed, and adds regression
coverage for the transferred-port-dropped leak that #29937's RAII
`TransferredMessagePort` destructor now handles.

## Repro

```js
const DEPTH = 20_000;
let head = new MessageChannel();
const tail = head;
for (let i = 0; i < DEPTH; i++) {
  const next = new MessageChannel();
  head.port2.postMessage(null, [next.port1]); // next.port1 lands in head.port1's inbox
  head.port2.close();
  head = next;
}
tail.port1.close();  // SIGSEGV (stack overflow)
```

## Root cause

`TransferredMessagePort` is move-only and its destructor calls
`pipe->close(side)` so an orphaned in-transit port gets cleaned up.
`MessagePortPipe::close()` swaps the side's inbox into a local `Deque`
and lets it destruct outside the lock. If those dropped messages carry
transferred ports, the `Deque` destructor runs `~TransferredMessagePort`
→ `pipe->close()` → another `Deque` destructor → … one native stack
frame per link. ~20k links overflows an 8 MB stack.

## Fix

`MessagePortPipe::close()` owns a stack-local
`Vector<pair<RefPtr<MessagePortPipe>, uint8_t>>` worklist seeded with
`{this, side}`. For each entry it closes the side under the lock, swaps
out the inbox, then harvests transferred pipes from the dropped messages
into the worklist (nulling `tp.pipe` so the subsequent
`~TransferredMessagePort` is a no-op) before letting the batch destruct.
O(1) stack regardless of chain depth; no static / thread_local /
instance state.

## Verification

`test/js/web/workers/message-port-closed-leak.test.ts`:

| test | main (no fix) | this PR |
|--|--|--|
| transferred port dropped before receiver closed does not leak | pass
(#29937) | pass |
| transferred port dropped after receiver closed does not leak | pass
(#29937) | pass |
| **deep chain of nested transferred ports does not overflow on close**
| **SIGSEGV, exit 139** | pass |

`message-channel.test.ts` (12 tests) passes unchanged.

## History

This PR originally targeted the self-ref leak in the old
`MessagePortChannel` / `MessagePortChannelRegistry` stack
(`disentanglePort()` parked a self-ref in
`m_pendingMessagePortTransfers[i]` that was never released if the
carrying message was dropped). #29937 replaced that whole stack with
`MessagePortPipe` and fixed the leak via RAII — but the
destructor-driven cascade it introduced recurses unboundedly. The first
two tests are kept as regression guards for the leak; the deep-chain
test is the fail-before for this fix.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
robobun added a commit that referenced this pull request May 1, 2026
…in post()

BunBroadcastChannelRegistry::post() resolved each subscriber's
ThreadSafeWeakPtr to a Ref<BroadcastChannel> for the fan-out. If a
worker subscriber tore down concurrently — its JS wrapper GC'd and
its queued task dropped, both on the worker thread — the posting
thread's local Ref became the last owner, and ~BroadcastChannel →
~EventTarget → EventListenerMap::clear() ran on the posting thread,
tripping releaseAssertOrSetThreadUID (the map was populated on the
worker thread). Seen in CI as a SIGABRT in
broadcast-channel-worker-gc.test.ts 'repeated worker create/terminate
stress' on musl x64-baseline, and on other branches post-#29937.

Carry ThreadSafeWeakPtr through the fan-out and resolve to a strong
ref only inside the posted task, on the target thread, so any last
deref happens there. This removes the need for the QueuedOne count
(which existed to keep the channel GC-rooted between post and
dispatch); a channel with no message listener is now collectable
before delivery, in which case the task resolves the weak ptr to
null and no-ops — equivalent to dispatchEvent with no listener.
robobun added a commit that referenced this pull request May 1, 2026
- web_worker.zig parent field: parent_poll_ref is now conditional on
  !default_unref, so the 'lives at least as long as this struct via
  parent_poll_ref' claim no longer holds unconditionally. Rewrite to
  describe the {ref:false} / .unref() case and point at the file
  header's detached-not-joined gap.
- Worker.cpp postTaskToWorkerGlobalScope Closing case: postMessage()
  now goes through enqueueToWorker() (since #29937), not this
  function; only getHeapSnapshot() calls it.
Jarred-Sumner pushed a commit that referenced this pull request May 3, 2026
`test/js/node/worker_threads/worker_threads.test.ts` occasionally
segfaults in CI with

```
panic: Segmentation fault at address 0x10
```

on a GC helper thread:

```
wtfThreadEntryPoint
  AutomaticThread::start
    ParallelHelperPool::Thread::work
      Heap::runBeginPhase(GCConductor)::$_1
        SlotVisitor::drainFromShared
          MarkingConstraintSolver::runExecutionThread
            MarkingConstraint::execute   ← "Sh" Strong Handles
              HandleSet::visitStrongHandles
                *(nullptr + offsetof(HandleNode, m_value))  = *(0x10)
```

(decoded from the `bun.report` trace on [build 50529 / 🐧 13
x64](https://buildkite.com/bun/bun/builds/50529#019dec3e-4d11-4651-b908-84e601bc2db3)
and symbolized against that build's `bun-profile`).

## Cause

`jsWorkerPrototypeFunction_getHeapSnapshotBody` does:

```cpp
Strong<JSPromise> strong(vm, promise);                      // parent VM's HandleSet
worker.postTaskToWorkerGlobalScope([strong, parentId](auto& workerCtx) {
    ...
    ScriptExecutionContext::postTaskTo(parentId,
        [strong, snapshot = ...](auto& parentCtx) { ... }); // runs on worker thread
});
```

`JSC::Strong<T>` has **no move constructor**. Capturing it by value
copy-constructs it, which calls `HandleSet::allocate()` +
`m_strongList.push()`; destroying it calls `HandleSet::deallocate()` +
`NodeList::remove()`. Both happen on the **worker thread** against the
**parent VM's** `HandleSet`, without the parent VM's lock.

`HandleSet::m_strongList` is a `SentinelLinkedList<HandleNode>` — not
thread-safe. `push`/`remove` transiently null `m_next`/`m_prev`. The
parent VM's "Sh" (Strong Handles) marking constraint
(`Heap::addCoreConstraints`) iterates that list during GC; when it
follows a null `m_next` it reads `*((HandleNode*)nullptr)->slot()` →
`*(0x0 + 0x10)`.

The `heapHelperPool()` is process-global, so the crashing helper thread
belongs to the parent VM's collector even though the worker VM's
`BunV8HeapSnapshotBuilder` full GC is in progress at the same time.

This has been there since `getHeapSnapshot` was added — the recent
worker lifetime rewrites (#29957, #29937) didn't introduce it.

## Fix

Heap-allocate the `Strong<JSPromise>` once on the parent thread and pass
only the raw pointer through the cross-thread lambdas. The worker thread
never dereferences it, so it never touches the parent VM's `HandleSet`.
The parent-side completion lambda resolves the promise and frees the
handle.

`Worker::postTaskToWorkerGlobalScope` now returns `bool` so a lost race
to `Closing`/`Closed` (worker exited between `isOnline()` and the post)
rejects with `ERR_WORKER_NOT_RUNNING` instead of silently leaking the
handle. If `postTaskTo(parentId, …)` on the return trip fails (parent
context gone), the handle intentionally leaks — deleting a parent-VM
`Strong` from the worker thread is exactly the bug we're fixing, and the
parent VM is tearing down anyway.

## Verification

Stress fixture (`heap-snapshot-gc-race-fixture.js`, 300 iterations of
`await worker.getHeapSnapshot(); Bun.gc(true)`), 40 runs each on
linux-x64 release:

| build | segfault at `0x10` |
| --- | --- |
| `52bdf47` (CI artifact, no fix) | 15 / 40 |
| this branch | 0 / 40 |

The new `worker_heap_snapshot_gc.test.ts` runs the fixture — 300 iters
in release, 5 in debug (a single debug heap snapshot takes ~1.6s so the
race window, which is a handful of instructions after each snapshot, is
impractical to hit there; the debug pass is a functional check).

## Drive-by: non-LTO strip leaves orphan `PT_GNU_EH_FRAME`

While reproducing I hit a second, unrelated crash in locally-built
(non-LTO) release binaries: `stripFlags` removed `.eh_frame_hdr` on
linux-gnu unconditionally, but the linker only passes
`--no-eh-frame-hdr` when LTO is on. GNU strip doesn't rewrite the
program header table, so the `PT_GNU_EH_FRAME` phdr was left pointing at
unmapped memory and any stack unwind (e.g. WTF::Thread teardown after a
worker exits) faulted. CI release builds always have LTO on so they
weren't affected. Gated the section removal on `c.lto` to match the
linker flag.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…primitive (oven-sh#29937)

## What

Replaces the WebKit-derived `MessagePortChannelProvider` /
`MessagePortChannelProviderImpl` / `MessagePortChannelRegistry` /
`MessagePortChannel` stack — plus the
`BroadcastChannel::MainThreadBridge` / per-context
`BunBroadcastChannelRegistry` — with a small, thread-safe concurrency
primitive (`MessagePortPipe`) that the Web-facing classes wrap thinly.

Net −1082 lines (+737 / −1819), 10 files deleted.

## Why

The existing stack is Safari's multi-*process* design carried over
verbatim: an abstract provider so UIProcess can swap in an IPC impl, a
process-global `MessagePortIdentifier → MessagePortChannel` HashMap,
`ProcessIdentifier` tracking, self-`RefPtr` "protector" members to keep
channels alive across IPC hops, and a main-thread serialization point
for `BroadcastChannel`. None of that indirection is needed in Bun's
single-process multi-thread model — and worse, the
`MessagePortChannelRegistry::m_openChannels` HashMap and
`MessagePortChannel::m_pendingMessages` Vector are mutated from worker
threads with **no lock** (the WebKit `ASSERT(isMainThread())` guards
were just commented out). Concurrent `new MessageChannel()` across
workers corrupts the HashMap:

```
HashTable.h:1574:17: runtime error: member access within null pointer of type
  'const HashTable<MessagePortIdentifier, KeyValuePair<MessagePortIdentifier,
   WeakRef<MessagePortChannel>>, ...>'
```

This is the cause of oven-sh#16186, oven-sh#22471, oven-sh#25805 and likely oven-sh#29458.

## Design

**`MessagePortPipe`** — `ThreadSafeRefCounted`, two sides. Each side
has:
- a `Lock` + `Deque<MessageWithMessagePorts>` inbox
- one `std::atomic<uint64_t>` state word: `Closed | DrainScheduled |
Attached` in the low byte, queued-count in the high bits

`send()` locks the destination side, enqueues, and if `Attached &&
!DrainScheduled` flips `DrainScheduled` and posts one task to the
receiver's `ScriptExecutionContext`. A burst of N sends schedules at
most one cross-thread wakeup. `takeAll()` clears `DrainScheduled` before
swapping the deque, so a racing send reschedules — at most one extra
no-op drain, never a stranded message. `attach()`/`detach()` handle
transfer: a detached side buffers; attach flushes the backlog with one
wakeup. `hasPendingActivity` is two atomic loads (my queued count +
peer's `Closed`), safe from the GC thread with no lock.

**`MessagePort`** — holds `RefPtr<MessagePortPipe>` + `uint8_t side`.
`postMessage` → `pipe->send`. `start()` → `pipe->attach`. `close()` →
`pipe->close`. Transfer → `pipe->detach`, hand `{pipe, side}` through
`TransferredMessagePort`, re-create on the other side. No global maps,
no identifiers, no per-context port iteration.

The drain task takes the whole inbox in one go (sender-side batching),
then posts each message as its own local task so microtasks checkpoint
between deliveries — matching the HTML "port message queue is a task
source" semantics and Node's observable ordering.

**`BroadcastChannel`** — one process-global `Lock` + `HashMap<String,
Vector<{ctxId, weak channel}>>`. `postMessage` snapshots subscribers
under the lock, releases, then posts one task per `(message,
subscriber)` directly to each subscriber's context — no bounce through
the main thread, no `MainThreadBridge`, no second `allBroadcastChannels`
map. Per-channel inbox batching was prototyped and reverted: the spec's
same-event-loop `(message-major, creation-minor)` delivery-order test
(WPT `broadcast-channel.test.ts::"messages are delivered in port
creation order"`) fails if subscribers drain channel-major. A
per-`(context, name)` inbox could restore batching without breaking
order — left as future work.

## Deleted

`MessagePortChannel.{h,cpp}`, `MessagePortChannelProvider.{h,cpp}`,
`MessagePortChannelProviderImpl.{h,cpp}`,
`MessagePortChannelRegistry.{h,cpp}`, `MessagePortIdentifier.h`,
`BroadcastChannelRegistry.h`, the `allMessagePorts()` /
`portToContextIdentifier()` / `allBroadcastChannels()` /
`channelToContextIdentifier()` global maps,
`ScriptExecutionContext::{m_messagePorts,
processMessageWithMessagePortsSoon, dispatchMessagePortEvents,
createdMessagePort, destroyedMessagePort, m_broadcastChannelRegistry}`,
and `WorkerGlobalScope::messagePortChannelProvider()`.

## Verification

- `test/js/web/workers/message-channel.test.ts` — 12/12
- `test/js/web/broadcastchannel/broadcast-channel.test.ts` — 11/11
- `test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts` —
3/3
- `test/js/web/workers/` suite — 228 pass / 6 fail; all 6 fail
identically on `main` (container-slowness timeouts), zero new failures
- `test/js/node/worker_threads/worker_threads.test.ts` — 25 pass / 2
fail; both fail identically on `main`
- 10 node `parallel/test-worker-message-port-*.js` +
`test-broadcastchannel-*.js` — all pass
- New `test/js/web/workers/message-port-pipe.test.ts`:
- `concurrent MessageChannel creation across workers is race-free` —
**fails 3/3 on main** with the HashTable UBSan null-deref shown above;
**passes 5/5** on this branch
- microtask ordering, buffered-before-start, FIFO
`receiveMessageOnPort`, 1000-message cross-thread burst

## Relationship to other PRs

oven-sh#29832 / oven-sh#29442 patch the race by adding a lock to the existing
registry. This PR removes the registry instead — there's nothing left to
race on. Either approach fixes the crash; this one also sheds the
layering.

Fixes oven-sh#16186
Fixes oven-sh#22471
Fixes oven-sh#25805

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…en-sh#29895)

## What

Fixes a stack overflow in `MessagePortPipe::close()` when a deep chain
of nested transferred MessagePorts is closed, and adds regression
coverage for the transferred-port-dropped leak that oven-sh#29937's RAII
`TransferredMessagePort` destructor now handles.

## Repro

```js
const DEPTH = 20_000;
let head = new MessageChannel();
const tail = head;
for (let i = 0; i < DEPTH; i++) {
  const next = new MessageChannel();
  head.port2.postMessage(null, [next.port1]); // next.port1 lands in head.port1's inbox
  head.port2.close();
  head = next;
}
tail.port1.close();  // SIGSEGV (stack overflow)
```

## Root cause

`TransferredMessagePort` is move-only and its destructor calls
`pipe->close(side)` so an orphaned in-transit port gets cleaned up.
`MessagePortPipe::close()` swaps the side's inbox into a local `Deque`
and lets it destruct outside the lock. If those dropped messages carry
transferred ports, the `Deque` destructor runs `~TransferredMessagePort`
→ `pipe->close()` → another `Deque` destructor → … one native stack
frame per link. ~20k links overflows an 8 MB stack.

## Fix

`MessagePortPipe::close()` owns a stack-local
`Vector<pair<RefPtr<MessagePortPipe>, uint8_t>>` worklist seeded with
`{this, side}`. For each entry it closes the side under the lock, swaps
out the inbox, then harvests transferred pipes from the dropped messages
into the worklist (nulling `tp.pipe` so the subsequent
`~TransferredMessagePort` is a no-op) before letting the batch destruct.
O(1) stack regardless of chain depth; no static / thread_local /
instance state.

## Verification

`test/js/web/workers/message-port-closed-leak.test.ts`:

| test | main (no fix) | this PR |
|--|--|--|
| transferred port dropped before receiver closed does not leak | pass
(oven-sh#29937) | pass |
| transferred port dropped after receiver closed does not leak | pass
(oven-sh#29937) | pass |
| **deep chain of nested transferred ports does not overflow on close**
| **SIGSEGV, exit 139** | pass |

`message-channel.test.ts` (12 tests) passes unchanged.

## History

This PR originally targeted the self-ref leak in the old
`MessagePortChannel` / `MessagePortChannelRegistry` stack
(`disentanglePort()` parked a self-ref in
`m_pendingMessagePortTransfers[i]` that was never released if the
carrying message was dropped). oven-sh#29937 replaced that whole stack with
`MessagePortPipe` and fixed the leak via RAII — but the
destructor-driven cascade it introduced recurses unboundedly. The first
two tests are kept as regression guards for the leak; the deep-chain
test is the fail-before for this fix.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…0185)

`test/js/node/worker_threads/worker_threads.test.ts` occasionally
segfaults in CI with

```
panic: Segmentation fault at address 0x10
```

on a GC helper thread:

```
wtfThreadEntryPoint
  AutomaticThread::start
    ParallelHelperPool::Thread::work
      Heap::runBeginPhase(GCConductor)::$_1
        SlotVisitor::drainFromShared
          MarkingConstraintSolver::runExecutionThread
            MarkingConstraint::execute   ← "Sh" Strong Handles
              HandleSet::visitStrongHandles
                *(nullptr + offsetof(HandleNode, m_value))  = *(0x10)
```

(decoded from the `bun.report` trace on [build 50529 / 🐧 13
x64](https://buildkite.com/bun/bun/builds/50529#019dec3e-4d11-4651-b908-84e601bc2db3)
and symbolized against that build's `bun-profile`).

## Cause

`jsWorkerPrototypeFunction_getHeapSnapshotBody` does:

```cpp
Strong<JSPromise> strong(vm, promise);                      // parent VM's HandleSet
worker.postTaskToWorkerGlobalScope([strong, parentId](auto& workerCtx) {
    ...
    ScriptExecutionContext::postTaskTo(parentId,
        [strong, snapshot = ...](auto& parentCtx) { ... }); // runs on worker thread
});
```

`JSC::Strong<T>` has **no move constructor**. Capturing it by value
copy-constructs it, which calls `HandleSet::allocate()` +
`m_strongList.push()`; destroying it calls `HandleSet::deallocate()` +
`NodeList::remove()`. Both happen on the **worker thread** against the
**parent VM's** `HandleSet`, without the parent VM's lock.

`HandleSet::m_strongList` is a `SentinelLinkedList<HandleNode>` — not
thread-safe. `push`/`remove` transiently null `m_next`/`m_prev`. The
parent VM's "Sh" (Strong Handles) marking constraint
(`Heap::addCoreConstraints`) iterates that list during GC; when it
follows a null `m_next` it reads `*((HandleNode*)nullptr)->slot()` →
`*(0x0 + 0x10)`.

The `heapHelperPool()` is process-global, so the crashing helper thread
belongs to the parent VM's collector even though the worker VM's
`BunV8HeapSnapshotBuilder` full GC is in progress at the same time.

This has been there since `getHeapSnapshot` was added — the recent
worker lifetime rewrites (oven-sh#29957, oven-sh#29937) didn't introduce it.

## Fix

Heap-allocate the `Strong<JSPromise>` once on the parent thread and pass
only the raw pointer through the cross-thread lambdas. The worker thread
never dereferences it, so it never touches the parent VM's `HandleSet`.
The parent-side completion lambda resolves the promise and frees the
handle.

`Worker::postTaskToWorkerGlobalScope` now returns `bool` so a lost race
to `Closing`/`Closed` (worker exited between `isOnline()` and the post)
rejects with `ERR_WORKER_NOT_RUNNING` instead of silently leaking the
handle. If `postTaskTo(parentId, …)` on the return trip fails (parent
context gone), the handle intentionally leaks — deleting a parent-VM
`Strong` from the worker thread is exactly the bug we're fixing, and the
parent VM is tearing down anyway.

## Verification

Stress fixture (`heap-snapshot-gc-race-fixture.js`, 300 iterations of
`await worker.getHeapSnapshot(); Bun.gc(true)`), 40 runs each on
linux-x64 release:

| build | segfault at `0x10` |
| --- | --- |
| `52bdf47` (CI artifact, no fix) | 15 / 40 |
| this branch | 0 / 40 |

The new `worker_heap_snapshot_gc.test.ts` runs the fixture — 300 iters
in release, 5 in debug (a single debug heap snapshot takes ~1.6s so the
race window, which is a handful of instructions after each snapshot, is
impractical to hit there; the debug pass is a functional check).

## Drive-by: non-LTO strip leaves orphan `PT_GNU_EH_FRAME`

While reproducing I hit a second, unrelated crash in locally-built
(non-LTO) release binaries: `stripFlags` removed `.eh_frame_hdr` on
linux-gnu unconditionally, but the linker only passes
`--no-eh-frame-hdr` when LTO is on. GNU strip doesn't rewrite the
program header table, so the `PT_GNU_EH_FRAME` phdr was left pointing at
unmapped memory and any stack unwind (e.g. WTF::Thread teardown after a
worker exits) faulted. CI release builds always have LTO on so they
weren't affected. Gated the section removal on `c.lto` to match the
linker flag.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MessageChannel between Workers causes crash/unexpected exit Segmentation fault at address 0x00000018 Consistent SEGFAULT when using MessagePort

2 participants