Skip to content

node:worker_threads: implement postMessageToThread [1bx5ty]#29828

Open
robobun wants to merge 48 commits into
mainfrom
farm/6140e626/worker-postMessageToThread
Open

node:worker_threads: implement postMessageToThread [1bx5ty]#29828
robobun wants to merge 48 commits into
mainfrom
farm/6140e626/worker-postMessageToThread

Conversation

@robobun

@robobun robobun commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Implements worker_threads.postMessageToThread(threadId, value[, transferList][, timeout]) (Node 22+), which sends a message to any thread by its threadId and emits process.on('workerMessage', value, source) on the target.

How it works

Each node:worker_threads Worker now gets a dedicated MessageChannel to the main thread. The worker-side port is plumbed through the existing serialized [workerData, environmentData] array (extended to include the port at index 2) in JSWorker.cppcreateNodeWorkerThreadsBinding in Worker.cpp, and is set up when the worker first loads node:worker_threads.

The main thread keeps a threadId → port registry. postMessageToThread always routes through the main thread: the sender posts a SEND_MESSAGE_TO_WORKER envelope (directly if on main, via its mainThreadPort otherwise), the main thread forwards a RECEIVE_MESSAGE_FROM_WORKER to the destination, and a SharedArrayBuffer + Atomics.waitAsync carries the ack so the sender can await delivery/timeout. Threads created outside of node:worker_threads (e.g. via the Web Worker constructor) act as a local root for their subtree.

New error codes: ERR_WORKER_MESSAGING_ERRORED, ERR_WORKER_MESSAGING_FAILED, ERR_WORKER_MESSAGING_SAME_THREAD, ERR_WORKER_MESSAGING_TIMEOUT.

Fixes uncovered along the way

  • process.emit() always returned true — the native EventEmitter::emitForBindings (used by process) ignored the return value of fireEventListeners(). It now returns false when there are no listeners, matching Node and the JS EventEmitter.

  • MessagePort.prototype.unref() was a no-op on transferred ports — adding a 'message' listener on a transferred port refs the event loop via a counter that unref() never touched, so an unreffed port still kept the process alive. Unified into a single updateEventLoopRef() predicate m_hasRef && (m_messageEventCount > 0 || m_wantsExplicitRef) && !m_isDetached; ref()/unref() toggle m_hasRef, hasRef() reflects whether the port is currently holding an event-loop ref. close() and disentangle() call updateEventLoopRef() after detaching so a ref'd/transferred port doesn't leak its ref on the source thread.

  • Worker::dispatchOnline raced with isOnline() — it posted the open event to the parent before setting OnlineFlag, so a fast parent could call worker.getHeapSnapshot() after receiving 'online' and see isOnline() == falseERR_WORKER_NOT_RUNNING. The flag is now set first.

  • Worker teardown leaked cross-thread tasks queued after the last tick (pre-existing; surfaced by LSAN on test-worker-messaging.js). postTaskTo(workerCtxId, ...) landing between a worker's last tick_concurrent() and its context leaving the global map leaked a ConcurrentTask box plus its EventLoopTask* payload; the worker VM box is raw-dealloc'd so no Drop runs on the queue. WebWorker__teardownJSCVM now eagerly removes the context from the map (idempotent via m_isInContextsMap) and drains the concurrent queue via a new Bun__dropConcurrentCppTasksForWorker shim, mirroring what global_exit() already does for the main thread. The four new test-worker-messaging*.js are also added to no-validate-leaksan.txt alongside the other 58 test-worker-* entries, for the separate per-worker sourcemap/WebWorker-box leaks those tests share with every other worker test.

Merge with #29937 (MessagePortPipe)

#29937 replaced the MessagePortChannelProvider/Registry stack with MessagePortPipe. Conflict resolution:

  • ScriptExecutionContext.cpp: dropped my destructor reorder — m_processMessageWithMessagePortsSoonHandlers no longer exists on main.
  • MessagePort.{cpp,h}: started from main's new design and re-layered the updateEventLoopRef() state machine. Main's close()/disentangle() already removeAllEventListeners() (covers the listener-count path); added updateEventLoopRef() calls to also release any explicit (onmessage/.ref()) ref before the context is nulled.

Tests

Now passing (checked in under test/js/node/test/parallel/):

  • test-worker-messaging.js
  • test-worker-messaging-errors-handler.js
  • test-worker-messaging-errors-invalid.js
  • test-worker-messaging-errors-timeout.js

Plus unit tests in test/js/node/worker_threads/worker_threads.test.ts for postMessageToThread, the process.emit return value, the MessagePort.unref() keep-alive fix, and transferring a ref'd port.

Each node:worker_threads Worker now gets a dedicated MessageChannel to the
main thread, whose other end is plumbed through the existing serialized
[workerData, environmentData] array (extended to include the port at index
2) in JSWorker.cpp/Worker.cpp. The main thread keeps a threadId -> port
registry; postMessageToThread routes through it and uses a
SharedArrayBuffer + Atomics.waitAsync to await delivery / timeout.

New error codes:
  ERR_WORKER_MESSAGING_ERRORED
  ERR_WORKER_MESSAGING_FAILED
  ERR_WORKER_MESSAGING_SAME_THREAD
  ERR_WORKER_MESSAGING_TIMEOUT

Also fixes uncovered while getting the tests to pass:

- Native EventEmitter::emitForBindings (used by process) ignored the return
  value of fireEventListeners() and always returned true; process.emit()
  now returns false when there are no listeners.

- MessagePort.prototype.unref() was a no-op for transferred ports: adding a
  'message' listener called refEventLoop() through a counter unref() never
  touched. Unify into a single event-loop ref that is active when
  (m_hasRef && m_messageEventCount > 0); ref()/unref() toggle m_hasRef
  (default true) and hasRef() reflects whether the port is currently
  keeping the loop alive.

- ~ScriptExecutionContext: drain pending processMessageWithMessagePortsSoon
  handlers before setting m_inScriptExecutionContextDestructor so a
  MessagePort whose last ref is held by such a handler (e.g. a worker
  terminated before its mainThreadPort ever dispatched) can be destroyed
  without tripping the assertion in willDestroyDestructionObserver.

Passes:
  test/js/node/test/parallel/test-worker-messaging.js
  test/js/node/test/parallel/test-worker-messaging-errors-handler.js
  test/js/node/test/parallel/test-worker-messaging-errors-invalid.js
  test/js/node/test/parallel/test-worker-messaging-errors-timeout.js
@robobun robobun requested a review from alii as a code owner April 28, 2026 11:38
@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a main-thread–routed postMessageToThread API with SharedArrayBuffer/Atomics ack, four worker-messaging error intrinsics, MessagePort event-loop ref/unref refactor, Worker bootstrap carrying a main-thread port, emitter/destructor ordering tweaks, and multiple worker-messaging tests.

Changes

Cohort / File(s) Summary
Error Codes & Intrinsics
src/bun.js/bindings/ErrorCode.cpp, src/bun.js/bindings/ErrorCode.ts, src/js/builtins.d.ts
Adds four worker-messaging error codes (ERR_WORKER_MESSAGING_ERRORED, ERR_WORKER_MESSAGING_FAILED, ERR_WORKER_MESSAGING_SAME_THREAD, ERR_WORKER_MESSAGING_TIMEOUT): host-factory cases with hardcoded messages, mapping entries for generation, and TypeScript declarations.
Worker bootstrap / payload
src/bun.js/bindings/webcore/JSWorker.cpp, src/bun.js/bindings/webcore/Worker.cpp
Worker constructor/deserialize payload expanded to include a mainThreadPort (serialized array length 2→3); binding array capacity increased and mainThreadPort propagated to worker threads.
MessagePort ref/unref refactor
src/bun.js/bindings/webcore/MessagePort.h, src/bun.js/bindings/webcore/MessagePort.cpp, src/bun.js/bindings/webcore/JSMessagePort.cpp
Introduces updateEventLoopRef() centralizing ref/unref logic; separates configured intent (m_hasRef, m_wantsExplicitRef) from runtime ref state (m_isRefingEventLoop); jsRef/jsUnref toggle intent only; onmessageerror setter no longer calls jsRef.
Event emission & destructor ordering
src/bun.js/bindings/webcore/EventEmitter.cpp, src/bun.js/bindings/ScriptExecutionContext.cpp
emitForBindings now returns the boolean result from emit; ScriptExecutionContext destructor drains pending message-with-ports handlers before setting m_inScriptExecutionContextDestructor to avoid assertion races.
Worker messaging API (JS)
src/js/node/worker_threads.ts
Adds postMessageToThread(destination, value, transferList, timeout) that routes messages via main-thread MessagePort registry, uses SharedArrayBuffer/Atomics for ack/timeout, validates timeout, and exposes the API on the module.
Tests: worker messaging & errors
test/js/node/test/parallel/test-worker-messaging.js, test/js/node/test/parallel/test-worker-messaging-errors-handler.js, test/js/node/test/parallel/test-worker-messaging-errors-invalid.js, test/js/node/test/parallel/test-worker-messaging-errors-timeout.js, test/js/node/worker_threads/worker_threads.test.ts
Adds multiple tests covering success path, same-thread/unknown-thread/no-listener/handler-thrown/timeout error cases, Atomics-based ack behavior, and related regression coverage; increases some test timeouts and verifies postMessageToThread export.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the primary change: implementing the postMessageToThread API for node:worker_threads, matching the main objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering the feature implementation, implementation details, bug fixes, and test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 10:26 PM PT - Jun 18th, 2026

@robobun, your commit 06a74c00a5b1c7b6e626648264ffb9f2a736adf5 passed in Build #63447! 🎉


🧪   To try this PR locally:

bunx bun-pr 29828

That installs a local version of the PR into your bun-29828 executable, so you can run:

bun-29828 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 3 issues this PR may fix:

  1. Bun doesn't finish the execution when all workers' ports are closed #11760 - The MessagePort.unref() fix addresses why workers don't exit after parentPort.close() — the port kept the event loop alive because unref was a no-op on transferred ports
  2. Bun process does not exit after Tinypool destroy() completes (worker threads remain alive) #29173 - The MessagePort ref-counting fix addresses why the process hangs after Tinypool destroy() — transferred ports that can't be unrefed keep the event loop alive
  3. Consistent SEGFAULT when using MessagePort #16186 - The ScriptExecutionContext destructor ordering fix addresses the class of SEGFAULT during worker teardown with active MessagePorts

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #11760
Fixes #29173
Fixes #16186

🤖 Generated with Claude Code

The eval-source-leak fixture spawns six workers with 100 MiB of source
each and the environmentdata-inherit fixture creates five nested workers;
both exceed the 5s default in debug/ASAN builds.
@robobun robobun force-pushed the farm/6140e626/worker-postMessageToThread branch from c62df23 to c93e868 Compare April 28, 2026 11:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/ErrorCode.cpp`:
- Around line 2397-2399: The user-facing error message for
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD contains a typo ("sent" -> "send");
update the string passed to createError in the
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD case so it reads "Cannot send a
message to the same thread" (refer to the case handling
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD and the call to
createError(globalObject, ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD, ...)).

In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 384-418: The refactor dropped tracking for "messageerror"
listeners so MessagePort no longer holds the event-loop ref when only
onmessageerror/addEventListener("messageerror") is used; update
MessagePort::updateEventLoopRef to consider both the existing
m_messageEventCount and a separate m_messageErrorEventCount (or a combined
predicate) when computing shouldRef, and update
MessagePort::onDidChangeListenerImpl to increment/decrement/clear that
error-listener counter when eventType == eventNames().messageerror (in addition
to the existing messageEvent handling) so the port acquires/releases the ref
consistently with JSMessagePort.cpp's jsRef usage.

In `@src/js/node/worker_threads.ts`:
- Around line 316-320: The loop that reads listeners with
process.listeners("workerMessage") fetches unwrapped callbacks so calling
listeners[i].$call(...) bypasses any .once() wrapper semantics; change the
lookup to use process.rawListeners("workerMessage") and invoke the returned
handlers directly (so the wrapper created by process.once/_onceWrap runs and
removes itself) when iterating, keeping the existing use of $call on the actual
stored handler if present; update the reference in this block around listeners /
response = WORKER_MESSAGING_RESULT_DELIVERED to use rawListeners instead of
listeners().

In `@test/js/node/worker_threads/worker_threads.test.ts`:
- Around line 616-619: Remove the flaky stderr assertion: drop the line that
asserts expect(stderr).toBe("") in the subprocess test (the block using proc,
stdout, stderr, exitCode), leaving the Promise.all capture and the remaining
assertions that check stdout.trim() === "exit 0" and exitCode === 0; this keeps
the regression signal while avoiding ASAN-startup-warning flakiness for runs
that spawn bunExe() with bunEnv.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 860b63e3-c157-45ff-8cf5-d7c692413a73

📥 Commits

Reviewing files that changed from the base of the PR and between 4d615e8 and 731b331.

📒 Files selected for processing (15)
  • src/bun.js/bindings/ErrorCode.cpp
  • src/bun.js/bindings/ErrorCode.ts
  • src/bun.js/bindings/ScriptExecutionContext.cpp
  • src/bun.js/bindings/webcore/EventEmitter.cpp
  • src/bun.js/bindings/webcore/JSWorker.cpp
  • src/bun.js/bindings/webcore/MessagePort.cpp
  • src/bun.js/bindings/webcore/MessagePort.h
  • src/bun.js/bindings/webcore/Worker.cpp
  • src/js/builtins.d.ts
  • src/js/node/worker_threads.ts
  • test/js/node/test/parallel/test-worker-messaging-errors-handler.js
  • test/js/node/test/parallel/test-worker-messaging-errors-invalid.js
  • test/js/node/test/parallel/test-worker-messaging-errors-timeout.js
  • test/js/node/test/parallel/test-worker-messaging.js
  • test/js/node/worker_threads/worker_threads.test.ts

Comment thread src/jsc/bindings/ErrorCode.cpp
Comment thread src/bun.js/bindings/webcore/MessagePort.cpp
Comment thread src/js/node/worker_threads.ts
Comment thread test/js/node/worker_threads/worker_threads.test.ts
Avoids leaving a stale listener on process since the native EventEmitter
handles once-removal inside emit(), which receiveMessageFromWorker bypasses.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/js/node/worker_threads/worker_threads.test.ts (1)

616-619: ⚠️ Potential issue | 🟠 Major

Drop the unconditional empty-stderr assertion in this subprocess test.

Line 617 is brittle for JS-executing subprocesses in ASAN/debug environments; stdout.trim() === "exit 0" + exitCode === 0 already captures the regression signal.

Suggested patch
-  const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);
-  expect(stderr).toBe("");
+  const [stdout, _stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);
   expect(stdout.trim()).toBe("exit 0");
   expect(exitCode).toBe(0);

Based on learnings, runtime subprocess tests that execute JS via bunExe() + bunEnv should not assert empty stderr because known ASAN startup warnings can appear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/node/worker_threads/worker_threads.test.ts` around lines 616 - 619,
Remove the brittle unconditional stderr assertion in the subprocess test by
deleting the expect(stderr).toBe("") check (the assertion referencing the stderr
variable) and keep the existing checks that stdout.trim() equals "exit 0" and
exitCode equals 0; this avoids failures from ASAN/debug startup warnings emitted
to proc.stderr while preserving the true regression signals from stdout and
exitCode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/js/node/worker_threads/worker_threads.test.ts`:
- Around line 616-619: Remove the brittle unconditional stderr assertion in the
subprocess test by deleting the expect(stderr).toBe("") check (the assertion
referencing the stderr variable) and keep the existing checks that stdout.trim()
equals "exit 0" and exitCode equals 0; this avoids failures from ASAN/debug
startup warnings emitted to proc.stderr while preserving the true regression
signals from stdout and exitCode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: afae3bd9-db84-4af9-aa8e-afbec51179f9

📥 Commits

Reviewing files that changed from the base of the PR and between ac00c9e and 76848e4.

📒 Files selected for processing (1)
  • test/js/node/worker_threads/worker_threads.test.ts

robobun and others added 2 commits April 28, 2026 15:00
Setting port.onmessage / calling port.ref() on a non-transferred
MessagePort needs to keep the event loop alive even though
m_messageEventCount is not tracked for such ports (onDidChangeListener
is only wired for transferred ports). Introduce m_wantsExplicitRef so
jsRef() marks the port as wanting a ref independently of the listener
count, and make updateEventLoopRef() consider either source.

Fixes message-channel.test.ts 'message channel created on other thread'
where the worker in create-port-worker.js exited before the main thread
could round-trip through port1.onmessage.
Comment thread src/js/node/worker_threads.ts
Comment thread src/jsc/bindings/webcore/MessagePort.cpp
robobun added 2 commits April 29, 2026 01:51
…t setup

- Drop jsRef() from the onmessageerror setter so a port with only
  onmessageerror set does not keep the event loop alive, matching Node.
  The onmessage setter still refs.
- Document that the worker side of the postMessageToThread channel is
  wired lazily on first require('node:worker_threads'), so sending with
  no timeout to a worker that never imports it stays pending instead of
  rejecting with ERR_WORKER_MESSAGING_FAILED like Node does.
These spawn --smol subprocesses that each create a worker; the first
execArgv test does it twice and sits at ~4.5-5s in debug builds, which
tips over the 5s default under load.
Comment thread src/js/node/worker_threads.ts Outdated
Worker exit closes the port but does not notify already-pending
Atomics.waitAsync waiters, so the promise stays pending indefinitely
without a timeout, not 'until the worker exits'.
Comment thread src/jsc/bindings/webcore/MessagePort.cpp
The JS close() binding already calls jsUnref() before impl.close(), so
this was not observable from JS, but contextDestroyed() and any other
native close() callers bypass that. Add updateEventLoopRef() at the end
of close() so the !m_isDetached predicate is honoured regardless of the
entry point.
Comment thread src/js/node/worker_threads.ts
robobun added 2 commits April 29, 2026 03:59
Terminating an intermediate worker leaves its grandchildren in the
root's threadsPorts map since destroyMainThreadPort runs from the
parent's #onClose. Node's kOnExit is identically parent-driven, so
this is upstream parity; document it alongside the lazy-setup note.
Most tests in this file spawn a subprocess or a worker, each of which
takes 1-3s in debug/ASAN builds; under heavy load several were tipping
past the 5s default. A file-level default is simpler than continuing
to bump individual tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/js/node/worker_threads.ts (1)

325-331: ⚠️ Potential issue | 🟠 Major

process.once("workerMessage") is broken on this delivery path.

Lines 328-330 call the functions from process.listeners("workerMessage") directly, so the native JSEventEmitter never runs its once-removal path. A once listener will keep firing on every later delivery instead of unregistering after the first one.

Based on learnings, Bun's native JSEventEmitter only removes once listeners through the native emit path; direct invocation leaves them registered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/js/node/worker_threads.ts` around lines 325 - 331, Directly invoking
listeners via listeners[i].$call bypasses JSEventEmitter's once-removal logic so
process.once("workerMessage") handlers never unregister; after calling each
listener (listeners[i].$call(process, value, source)), detect if that listener
is a one-time listener (the wrapper exposes the once flag, e.g.
listeners[i].$once or similar) and if so call
process.removeListener("workerMessage", listeners[i]) (or the appropriate remove
API) immediately after invocation; update the loop in the worker messaging
delivery code (the block invoking process.listeners("workerMessage") and using
$call and WORKER_MESSAGING_RESULT_DELIVERED) to remove one-time listeners so
once() behaves correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/js/node/worker_threads.ts`:
- Around line 380-388: postMessageToThread currently only validates timeout; add
upfront validation for destination and transferList using the validators from
internal/validators before any routing/cloning occurs: call the appropriate
validator(s) to ensure destination is the expected numeric/thread-id type and
transferList is an array of transferable objects (validate types and array
contents), and throw the corresponding $ERR_* error codes (e.g.,
$ERR_INVALID_ARG_TYPE / $ERR_INVALID_ARG_VALUE) on failure; keep these checks in
postMessageToThread alongside the existing validateNumber(timeout, "timeout", 0)
usage and import the needed validator functions from internal/validators.
- Around line 360-378: Ensure the main-thread route is wired before user code
runs by initializing mainThreadPort (/_mainThreadPort) from the worker bootstrap
payload and invoking setupMainThreadPort early in bootstrap so the worker-side
message listener exists even if user code never imports node:worker_threads;
update destroyMainThreadPort to not only post the kUnregisterMainThreadPort but
also to fail/drain any in-flight SAB waiters and pending deliveries created by
postMessageToThread (reject their promises/callbacks with an explicit error)
when the port is absent or closing; finally, ensure postMessageToThread checks
for port closure and returns/rejects immediately for outstanding messages (and
when appropriate call handleMessageFromThread for fallback) so sends cannot hang
forever — modify functions postMessageToThread, destroyMainThreadPort, and
setupMainThreadPort accordingly.

---

Duplicate comments:
In `@src/js/node/worker_threads.ts`:
- Around line 325-331: Directly invoking listeners via listeners[i].$call
bypasses JSEventEmitter's once-removal logic so process.once("workerMessage")
handlers never unregister; after calling each listener
(listeners[i].$call(process, value, source)), detect if that listener is a
one-time listener (the wrapper exposes the once flag, e.g. listeners[i].$once or
similar) and if so call process.removeListener("workerMessage", listeners[i])
(or the appropriate remove API) immediately after invocation; update the loop in
the worker messaging delivery code (the block invoking
process.listeners("workerMessage") and using $call and
WORKER_MESSAGING_RESULT_DELIVERED) to remove one-time listeners so once()
behaves correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c1229540-d96c-4904-b7bb-e7d8fd8bc592

📥 Commits

Reviewing files that changed from the base of the PR and between f5c6c08 and a56a2fe.

📒 Files selected for processing (4)
  • src/bun.js/bindings/webcore/JSMessagePort.cpp
  • src/bun.js/bindings/webcore/MessagePort.cpp
  • src/js/node/worker_threads.ts
  • test/js/node/worker_threads/worker_threads.test.ts

Comment thread src/js/node/worker_threads.ts
Comment thread src/js/node/worker_threads.ts
Comment thread src/jsc/bindings/webcore/JSMessagePort.cpp
Node's onmessage setter branches on typeof value === 'function', calling
ref()/start() for functions and unref()/stop() otherwise. Previously Bun
called jsRef() unconditionally, so port.onmessage = fn; port.onmessage =
null left the port refing the event loop.
Comment thread src/jsc/bindings/webcore/JSWorker.cpp
Third teardown path alongside close() and disentangle(): a port with
onmessage/.ref() whose peer has closed becomes GC-collectible via
hasPendingActivity() → PeerClosed, and ~MessagePort() runs with
m_isRefingEventLoop still set. Nothing on that path called
context->unrefEventLoop(), so the process hung with a permanent +1.

Release it in the destructor body; scriptExecutionContext() is still
valid there (ContextDestructionObserver's base destructor runs after).
Comment thread src/jsc/bindings/webcore/MessagePort.cpp
robobun added 2 commits May 4, 2026 03:24
Node's setupPortReferencing installs a 'newListener' hook that calls
this.ref() when the first 'message' listener is added. Mirror that in
onDidChangeListenerImpl's Add case so port.unref(); port.on('message', fn)
re-refs (matching Node), while port.on('message', fn); port.unref()
stays unref'd (also matching Node).

Before this, the updateEventLoopRef() predicate gated on m_hasRef which
stayed false after unref(), so the unref→on ordering left the port
without an event-loop ref — a regression from the refactor since the
old Add case reffed unconditionally.
Comment thread src/jsc/bindings/webcore/MessagePort.cpp
robobun added 2 commits May 4, 2026 04:09
…istener

Symmetric counterpart to 5f762a4: Node's setupPortReferencing installs
a 'removeListener' hook that calls this.unref() when the last 'message'
listener is removed. Mirror that in the Remove/Clear N→0 transition so
port.ref(); port.on(...); port.off(...); port.hasRef() → false,
matching Node. Not a regression (pre-PR behaved the same), but
completes the newListener/removeListener pair.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior review threads are resolved and the bug-hunting pass found nothing new, but this rewrites the MessagePort event-loop ref state machine and adds cross-thread messaging plumbing through C++ worker bootstrap — worth a human pass on the concurrency/lifetime bits before merging.

Extended reasoning...

Overview

This PR implements worker_threads.postMessageToThread (Node 22+) and, in the course of doing so, substantially rewrites the MessagePort event-loop ref/unref state machine. It touches 18 files: C++ bindings (MessagePort.{cpp,h}, MessagePortPipe.{cpp,h}, Worker.cpp, JSWorker.cpp, JSMessagePort.cpp, EventEmitter.cpp), the JS node:worker_threads module (~220 new lines of routing/SAB-ack logic), four new error codes, and ~240 lines of new tests plus four upstream Node parallel tests.

Beyond the headline feature, it also ships several adjacent fixes: process.emit() return value, Worker::dispatchOnline ordering vs isOnline(), a PeerClosed mirror bit in MessagePortPipe for a single-atomic-load hasPendingActivity() snapshot, and ref-release on all three MessagePort teardown paths (close(), disentangle(), ~MessagePort()).

Security risks

None identified. The new API surface is in-process inter-thread messaging; values go through the existing structured-clone path. No auth, crypto, filesystem, or network surface is touched.

Level of scrutiny

High. The MessagePort ref state machine governs whether processes/workers exit, and this PR replaced the prior ad-hoc m_hasRef/listener-counter logic with a centralized updateEventLoopRef() predicate plus three new state booleans. Over the course of review, seven distinct edge cases in that state machine were found and fixed iteratively (onmessageerror, onmessage = null, close(), disentangle(), ~MessagePort(), unref()→on(), on()→off() after .ref()), plus a threadId === -1 cleanup bug and a MessagePortPipe GC-snapshot race. That iteration history is a signal that the surface is subtle. The MessagePortPipe change also introduces a new cross-thread fetch_or(PeerClosed) under the peer's lock, and Worker::dispatchOnline reorders the OnlineFlag store relative to the parent-thread open event.

Other factors

  • The PR has extensive test coverage (4 upstream Node tests + 9 new bun-test cases including regression tests for each ref-leak fix).
  • All 14 review threads (CodeRabbit + claude) are resolved; the author was responsive and each fix is well-reasoned.
  • No human reviewer has looked at it yet — only bot review so far.
  • CI build #50874 is in progress for the latest commit.
  • The PR description's "Merge with #29937" section indicates this was rebased over a separate MessagePortPipe rewrite, which adds merge-resolution risk.

Given the concurrency-sensitive C++ changes, the destructor-time unrefEventLoop() call, and the number of edge cases already found during review, a human pass on MessagePort.cpp/MessagePortPipe.cpp is warranted.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing new from this pass — all earlier inline findings have been addressed. Still deferring to a human given the scope: the MessagePort event-loop ref state machine is fully rewritten, MessagePortPipe gains a new PeerClosed bit that changes hasPendingActivity() / GC-visibility, and the worker bootstrap payload format changes — all concurrency- and lifetime-sensitive enough to want eyes on.

Extended reasoning...

Overview

This PR implements worker_threads.postMessageToThread (Node 22+) and, along the way, rewrites the MessagePort event-loop ref-counting state machine and fixes several adjacent bugs. 18 files touched across:

  • JS runtime: src/js/node/worker_threads.ts — ~220 new lines implementing the main-thread-routed messaging protocol with SharedArrayBuffer + Atomics.waitAsync ack, plus a new per-worker MessageChannel and threadId → port registry. Also caches #threadId so #onClose can unregister after the native getter starts returning -1.
  • C++ MessagePort ref state machine: MessagePort.{h,cpp}, JSMessagePort.cppjsRef/jsUnref/onDidChangeListenerImpl rewritten around a single updateEventLoopRef() predicate (m_hasRef && (m_messageEventCount > 0 || m_wantsExplicitRef) && !m_isDetached). Over the course of review this was extended to release the ref in close(), disentangle(), and ~MessagePort(), and to mirror Node's newListener/removeListener semantics on the 0↔1 listener transitions.
  • MessagePortPipe: adds a PeerClosed mirror bit so hasPendingActivity() reads a single-atomic-load consistent snapshot instead of two separate loads (avoiding a torn {before-send, after-close} observation that could collect a wrapper with a message in flight).
  • Worker bootstrap: JSWorker.cpp / Worker.cpp / WorkerOptions.h — extends the serialized [workerData, environmentData] payload to length 3 with mainThreadPort, and reorders dispatchOnline to set OnlineFlag before posting open to the parent.
  • EventEmitter::emitForBindings: now returns emit()'s actual boolean instead of always true, so process.emit('x') returns false with no listeners.
  • Four new error codes; four upstream Node tests checked in; ~230 lines of new unit tests in worker_threads.test.ts.

Security risks

None identified. Messaging is intra-process between threads the user already controls; no auth, crypto, filesystem, or network surface is touched. The new postMessageToThread arguments are routed through existing MessagePort.postMessage / structured-clone, so transfer-list validation and DataCloneError semantics are inherited.

Level of scrutiny

High. The MessagePort ref state machine directly governs process/worker lifetime (whether the event loop stays alive), and the MessagePortPipe change alters what the GC thread observes via hasPendingActivity() — both are concurrency-sensitive C++ where subtle mistakes manifest as hangs, premature exits, or rare wrapper-collected-with-message-in-flight races. The review history bears this out: ten distinct issues were surfaced and fixed across seven follow-up commits (the #onClose / threadId == -1 leak, three separate teardown paths missing updateEventLoopRef(), onmessage/onmessageerror setter parity, and the Add/Remove 0↔1 transitions). That iteration converged well, but the density of edge cases is itself a signal that a human should sign off on the final state.

Other factors

  • All prior inline comments (mine and CodeRabbit's) are resolved; the bug-hunting system found nothing this round.
  • A documented known-limitation block in worker_threads.ts covers the lazy-setup hang, the no-notify-on-exit case, and orphaned grandchildren — all upstream-parity or deferred-to-native-bootstrap.
  • Test coverage is solid: four upstream Node parallel tests plus targeted Bun unit tests for each ref-leak fix (close, disentangle, GC-after-peer-close, unref-then-on, on-then-unref, exited-worker-id rejection).
  • The emitForBindings return-value change is a behavior change to process.emit() for the no-listener case — correct per Node, but worth a human glance for any internal callers that relied on the old always-true.
  • CI build #51731 is in progress for the latest merge-from-main commit.

robobun and others added 2 commits May 15, 2026 11:48
Comment thread src/jsc/bindings/webcore/MessagePort.h Outdated
robobun added 3 commits May 15, 2026 12:52
…ned with ErrorCode.rs

The Rust rewrite added src/jsc/ErrorCode.rs with hardcoded ordinals
derived from ErrorCode.ts. Inserting the four new codes alphabetically
(between ERR_WORKER_INIT_FAILED and ERR_WORKER_NOT_RUNNING) shifted
every later ordinal in the generated C++ enum but not in ErrorCode.rs,
so Rust callers like NativeZstd::jsc::ErrorCode::ZLIB_INITIALIZATION_FAILED
(= ErrorCode(264)) now pointed at ERR_WORKER_MESSAGING_SAME_THREAD in
the C++ table — test-zlib-zstd.js got the wrong error.code.

Move the four codes to the end of ErrorCode.ts (ordinals 312–315) and
add the matching entries to ErrorCode.rs. Also sync the MessagePort.h
state-machine field comments with the newListener/removeListener
behaviour added in 5f762a4/ca91e0ea.
bake-codegen.ts passed the raw minified CSS (starting with '*{...}')
as the OVERLAY_CSS define value, relying on Bun.build's auto-quote
fallback to turn it into a string literal. That fallback was broken
for a window of main commits (between the define-parser refactor and
314d044's '*'/'?' JSON-lexer fix), so any bootstrap bun in that
range fails bake-codegen with 'Operators are not allowed in JSON'.

JSON.stringify the value explicitly (like 'side' already does) so
codegen works regardless of which bun runs it. OVERLAY_CSS is declared
as 'string' and used as sheet.replace(OVERLAY_CSS), so a string
literal is the intended shape.
@robobun

robobun commented May 15, 2026

Copy link
Copy Markdown
Collaborator Author

Status

PR is green on everything it touches. On the latest build (#63280, 282 passed / 4 failed) the only failures are:

None of these touch anything in this diff. Zero worker/MessagePort/postMessageToThread failures on any platform, including x64-asan where the previous ConcurrentTask leak was caught (fixed in 7a5bc2d).

Four Node parallel tests now passing:

  • test-worker-messaging.js
  • test-worker-messaging-errors-handler.js
  • test-worker-messaging-errors-invalid.js
  • test-worker-messaging-errors-timeout.js

Plus 39 unit tests in worker_threads.test.ts covering postMessageToThread and the MessagePort ref state-machine edge cases found during review.

All review threads resolved. Ready for maintainer review/merge.

robobun and others added 9 commits May 21, 2026 17:52
…postMessageToThread

# Conflicts:
#	src/runtime/cli/run_command.rs
#	src/runtime/jsc_hooks.rs
…postMessageToThread

# Conflicts:
#	src/jsc/ErrorCode.rs
#	src/jsc/bindings/ErrorCode.ts
…postMessageToThread

# Conflicts:
#	src/jsc/ErrorCode.rs
#	src/jsc/bindings/ErrorCode.ts
A worker's EventLoop::concurrent_tasks queue was never drained on
shutdown: EventLoop::deinit() only walks self.tasks, and the VM box is
raw-dealloc'd so no Drop runs. Any postTaskTo(workerCtxId, ...) that
landed between the worker's last tick_concurrent() and its context
leaving the global map leaked a ConcurrentTask box plus its
EventLoopTask* payload. The main-thread exit path drains this queue in
global_exit(); workers had no equivalent.

test-worker-messaging.js makes the window trivially reachable: seven
threads all subscribed to the same BroadcastChannel, each posting as
the others tear down.

Eagerly removeFromContextsMap() in WebWorker__teardownJSCVM (so
postTaskTo returns false once teardown has begun), then drain the
queue via a new Bun__dropConcurrentCppTasksForWorker shim over the
existing EventLoop::drop_concurrent_cpp_tasks. The map lock held by
postTaskTo across lookup+enqueue makes the gate-then-drain race-free.
removeFromContextsMap() is now idempotent via m_isInContextsMap so
~GlobalObject's later call is a no-op.

Also add the four new test-worker-messaging*.js files to
no-validate-leaksan.txt alongside the other 58 test-worker-* entries,
for the pre-existing per-worker ParsedSourceMap/WebWorker-box leaks
those tests share with every other worker test.
Comment thread src/jsc/bindings/webcore/MessagePortPipe.h
isOtherSideOpen was folded into the PeerClosed mirror bit read by
isActivityPending; the test comment still named the removed function.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found — all earlier review threads are resolved and the latest commit is just the comment sync. This touches a lot of concurrency-sensitive native code (the MessagePort ref/unref state machine, MessagePortPipe GC-liveness atomics, worker-teardown task draining across the C++/Rust boundary), so it's worth a maintainer pass before merge.

Extended reasoning...

Overview

Implements worker_threads.postMessageToThread and, along the way, substantially reworks several native subsystems across 25 files: the MessagePort event-loop ref/unref state machine (MessagePort.{cpp,h}, JSMessagePort.cpp), the MessagePortPipe close protocol (new PeerClosed mirror bit + isActivityPending() replacing isOtherSideOpen(), which changes hasPendingActivity() GC-liveness semantics), worker bootstrap payload plumbing (JSWorker.cpp/Worker.cpp/WorkerOptions.h), ScriptExecutionContext map-removal idempotency, a new Rust FFI shim (Bun__dropConcurrentCppTasksForWorker) to drain leaked cross-thread tasks during worker teardown, the native EventEmitter::emitForBindings return value, four new error codes, and ~250 lines of new routing logic in worker_threads.ts.

Security risks

None identified. The new API surface is in-process thread-to-thread messaging with the same trust boundary as existing worker.postMessage; no auth, crypto, filesystem, or network exposure.

Level of scrutiny

High. The MessagePort ref state machine alone went through seven fix commits during review (onmessageerror, onmessage = null, close(), disentangle(), ~MessagePort(), unref();on(), ref();on();off()), plus a 🔴 fix for #onClose reading threadId === -1 that would have leaked every worker's port. The PeerClosed mirror bit changes when MessagePort wrappers become GC-collectible, and the WebWorker__teardownJSCVM change reorders context-map removal relative to GC and adds a cross-language drain hook — both are subtle lifetime/ordering changes in hot threading paths. These are exactly the kinds of changes where a maintainer familiar with Bun's worker/event-loop internals should confirm the invariants.

Other factors

Test coverage is good (4 upstream Node tests + ~10 new unit tests including regression tests for each ref-state-machine fix), CI is green on everything this PR touches, and all 15 prior review threads (mine and CodeRabbit's) are resolved. The remaining open question is design-level confidence in the native concurrency changes, not correctness bugs I can point to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants