node:worker_threads: implement postMessageToThread [1bx5ty]#29828
node:worker_threads: implement postMessageToThread [1bx5ty]#29828robobun wants to merge 48 commits into
Conversation
Each node:worker_threads Worker now gets a dedicated MessageChannel to the main thread, whose other end is plumbed through the existing serialized [workerData, environmentData] array (extended to include the port at index 2) in JSWorker.cpp/Worker.cpp. The main thread keeps a threadId -> port registry; postMessageToThread routes through it and uses a SharedArrayBuffer + Atomics.waitAsync to await delivery / timeout. New error codes: ERR_WORKER_MESSAGING_ERRORED ERR_WORKER_MESSAGING_FAILED ERR_WORKER_MESSAGING_SAME_THREAD ERR_WORKER_MESSAGING_TIMEOUT Also fixes uncovered while getting the tests to pass: - Native EventEmitter::emitForBindings (used by process) ignored the return value of fireEventListeners() and always returned true; process.emit() now returns false when there are no listeners. - MessagePort.prototype.unref() was a no-op for transferred ports: adding a 'message' listener called refEventLoop() through a counter unref() never touched. Unify into a single event-loop ref that is active when (m_hasRef && m_messageEventCount > 0); ref()/unref() toggle m_hasRef (default true) and hasRef() reflects whether the port is currently keeping the loop alive. - ~ScriptExecutionContext: drain pending processMessageWithMessagePortsSoon handlers before setting m_inScriptExecutionContextDestructor so a MessagePort whose last ref is held by such a handler (e.g. a worker terminated before its mainThreadPort ever dispatched) can be destroyed without tripping the assertion in willDestroyDestructionObserver. Passes: test/js/node/test/parallel/test-worker-messaging.js test/js/node/test/parallel/test-worker-messaging-errors-handler.js test/js/node/test/parallel/test-worker-messaging-errors-invalid.js test/js/node/test/parallel/test-worker-messaging-errors-timeout.js
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a main-thread–routed postMessageToThread API with SharedArrayBuffer/Atomics ack, four worker-messaging error intrinsics, MessagePort event-loop ref/unref refactor, Worker bootstrap carrying a main-thread port, emitter/destructor ordering tweaks, and multiple worker-messaging tests. Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Updated 10:26 PM PT - Jun 18th, 2026
✅ @robobun, your commit 06a74c00a5b1c7b6e626648264ffb9f2a736adf5 passed in 🧪 To try this PR locally: bunx bun-pr 29828That installs a local version of the PR into your bun-29828 --bun |
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
The eval-source-leak fixture spawns six workers with 100 MiB of source each and the environmentdata-inherit fixture creates five nested workers; both exceed the 5s default in debug/ASAN builds.
c62df23 to
c93e868
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/bindings/ErrorCode.cpp`:
- Around line 2397-2399: The user-facing error message for
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD contains a typo ("sent" -> "send");
update the string passed to createError in the
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD case so it reads "Cannot send a
message to the same thread" (refer to the case handling
ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD and the call to
createError(globalObject, ErrorCode::ERR_WORKER_MESSAGING_SAME_THREAD, ...)).
In `@src/bun.js/bindings/webcore/MessagePort.cpp`:
- Around line 384-418: The refactor dropped tracking for "messageerror"
listeners so MessagePort no longer holds the event-loop ref when only
onmessageerror/addEventListener("messageerror") is used; update
MessagePort::updateEventLoopRef to consider both the existing
m_messageEventCount and a separate m_messageErrorEventCount (or a combined
predicate) when computing shouldRef, and update
MessagePort::onDidChangeListenerImpl to increment/decrement/clear that
error-listener counter when eventType == eventNames().messageerror (in addition
to the existing messageEvent handling) so the port acquires/releases the ref
consistently with JSMessagePort.cpp's jsRef usage.
In `@src/js/node/worker_threads.ts`:
- Around line 316-320: The loop that reads listeners with
process.listeners("workerMessage") fetches unwrapped callbacks so calling
listeners[i].$call(...) bypasses any .once() wrapper semantics; change the
lookup to use process.rawListeners("workerMessage") and invoke the returned
handlers directly (so the wrapper created by process.once/_onceWrap runs and
removes itself) when iterating, keeping the existing use of $call on the actual
stored handler if present; update the reference in this block around listeners /
response = WORKER_MESSAGING_RESULT_DELIVERED to use rawListeners instead of
listeners().
In `@test/js/node/worker_threads/worker_threads.test.ts`:
- Around line 616-619: Remove the flaky stderr assertion: drop the line that
asserts expect(stderr).toBe("") in the subprocess test (the block using proc,
stdout, stderr, exitCode), leaving the Promise.all capture and the remaining
assertions that check stdout.trim() === "exit 0" and exitCode === 0; this keeps
the regression signal while avoiding ASAN-startup-warning flakiness for runs
that spawn bunExe() with bunEnv.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 860b63e3-c157-45ff-8cf5-d7c692413a73
📒 Files selected for processing (15)
src/bun.js/bindings/ErrorCode.cppsrc/bun.js/bindings/ErrorCode.tssrc/bun.js/bindings/ScriptExecutionContext.cppsrc/bun.js/bindings/webcore/EventEmitter.cppsrc/bun.js/bindings/webcore/JSWorker.cppsrc/bun.js/bindings/webcore/MessagePort.cppsrc/bun.js/bindings/webcore/MessagePort.hsrc/bun.js/bindings/webcore/Worker.cppsrc/js/builtins.d.tssrc/js/node/worker_threads.tstest/js/node/test/parallel/test-worker-messaging-errors-handler.jstest/js/node/test/parallel/test-worker-messaging-errors-invalid.jstest/js/node/test/parallel/test-worker-messaging-errors-timeout.jstest/js/node/test/parallel/test-worker-messaging.jstest/js/node/worker_threads/worker_threads.test.ts
Avoids leaving a stale listener on process since the native EventEmitter handles once-removal inside emit(), which receiveMessageFromWorker bypasses.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/js/node/worker_threads/worker_threads.test.ts (1)
616-619:⚠️ Potential issue | 🟠 MajorDrop the unconditional empty-stderr assertion in this subprocess test.
Line 617 is brittle for JS-executing subprocesses in ASAN/debug environments;
stdout.trim() === "exit 0"+exitCode === 0already captures the regression signal.Suggested patch
- const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); - expect(stderr).toBe(""); + const [stdout, _stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); expect(stdout.trim()).toBe("exit 0"); expect(exitCode).toBe(0);Based on learnings, runtime subprocess tests that execute JS via
bunExe()+bunEnvshould not assert empty stderr because known ASAN startup warnings can appear.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/js/node/worker_threads/worker_threads.test.ts` around lines 616 - 619, Remove the brittle unconditional stderr assertion in the subprocess test by deleting the expect(stderr).toBe("") check (the assertion referencing the stderr variable) and keep the existing checks that stdout.trim() equals "exit 0" and exitCode equals 0; this avoids failures from ASAN/debug startup warnings emitted to proc.stderr while preserving the true regression signals from stdout and exitCode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/js/node/worker_threads/worker_threads.test.ts`:
- Around line 616-619: Remove the brittle unconditional stderr assertion in the
subprocess test by deleting the expect(stderr).toBe("") check (the assertion
referencing the stderr variable) and keep the existing checks that stdout.trim()
equals "exit 0" and exitCode equals 0; this avoids failures from ASAN/debug
startup warnings emitted to proc.stderr while preserving the true regression
signals from stdout and exitCode.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: afae3bd9-db84-4af9-aa8e-afbec51179f9
📒 Files selected for processing (1)
test/js/node/worker_threads/worker_threads.test.ts
Setting port.onmessage / calling port.ref() on a non-transferred MessagePort needs to keep the event loop alive even though m_messageEventCount is not tracked for such ports (onDidChangeListener is only wired for transferred ports). Introduce m_wantsExplicitRef so jsRef() marks the port as wanting a ref independently of the listener count, and make updateEventLoopRef() consider either source. Fixes message-channel.test.ts 'message channel created on other thread' where the worker in create-port-worker.js exited before the main thread could round-trip through port1.onmessage.
…t setup
- Drop jsRef() from the onmessageerror setter so a port with only
onmessageerror set does not keep the event loop alive, matching Node.
The onmessage setter still refs.
- Document that the worker side of the postMessageToThread channel is
wired lazily on first require('node:worker_threads'), so sending with
no timeout to a worker that never imports it stays pending instead of
rejecting with ERR_WORKER_MESSAGING_FAILED like Node does.
These spawn --smol subprocesses that each create a worker; the first execArgv test does it twice and sits at ~4.5-5s in debug builds, which tips over the 5s default under load.
Worker exit closes the port but does not notify already-pending Atomics.waitAsync waiters, so the promise stays pending indefinitely without a timeout, not 'until the worker exits'.
The JS close() binding already calls jsUnref() before impl.close(), so this was not observable from JS, but contextDestroyed() and any other native close() callers bypass that. Add updateEventLoopRef() at the end of close() so the !m_isDetached predicate is honoured regardless of the entry point.
Terminating an intermediate worker leaves its grandchildren in the root's threadsPorts map since destroyMainThreadPort runs from the parent's #onClose. Node's kOnExit is identically parent-driven, so this is upstream parity; document it alongside the lazy-setup note.
Most tests in this file spawn a subprocess or a worker, each of which takes 1-3s in debug/ASAN builds; under heavy load several were tipping past the 5s default. A file-level default is simpler than continuing to bump individual tests.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/js/node/worker_threads.ts (1)
325-331:⚠️ Potential issue | 🟠 Major
process.once("workerMessage")is broken on this delivery path.Lines 328-330 call the functions from
process.listeners("workerMessage")directly, so the nativeJSEventEmitternever runs its once-removal path. Aoncelistener will keep firing on every later delivery instead of unregistering after the first one.Based on learnings, Bun's native
JSEventEmitteronly removesoncelisteners through the native emit path; direct invocation leaves them registered.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/js/node/worker_threads.ts` around lines 325 - 331, Directly invoking listeners via listeners[i].$call bypasses JSEventEmitter's once-removal logic so process.once("workerMessage") handlers never unregister; after calling each listener (listeners[i].$call(process, value, source)), detect if that listener is a one-time listener (the wrapper exposes the once flag, e.g. listeners[i].$once or similar) and if so call process.removeListener("workerMessage", listeners[i]) (or the appropriate remove API) immediately after invocation; update the loop in the worker messaging delivery code (the block invoking process.listeners("workerMessage") and using $call and WORKER_MESSAGING_RESULT_DELIVERED) to remove one-time listeners so once() behaves correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/js/node/worker_threads.ts`:
- Around line 380-388: postMessageToThread currently only validates timeout; add
upfront validation for destination and transferList using the validators from
internal/validators before any routing/cloning occurs: call the appropriate
validator(s) to ensure destination is the expected numeric/thread-id type and
transferList is an array of transferable objects (validate types and array
contents), and throw the corresponding $ERR_* error codes (e.g.,
$ERR_INVALID_ARG_TYPE / $ERR_INVALID_ARG_VALUE) on failure; keep these checks in
postMessageToThread alongside the existing validateNumber(timeout, "timeout", 0)
usage and import the needed validator functions from internal/validators.
- Around line 360-378: Ensure the main-thread route is wired before user code
runs by initializing mainThreadPort (/_mainThreadPort) from the worker bootstrap
payload and invoking setupMainThreadPort early in bootstrap so the worker-side
message listener exists even if user code never imports node:worker_threads;
update destroyMainThreadPort to not only post the kUnregisterMainThreadPort but
also to fail/drain any in-flight SAB waiters and pending deliveries created by
postMessageToThread (reject their promises/callbacks with an explicit error)
when the port is absent or closing; finally, ensure postMessageToThread checks
for port closure and returns/rejects immediately for outstanding messages (and
when appropriate call handleMessageFromThread for fallback) so sends cannot hang
forever — modify functions postMessageToThread, destroyMainThreadPort, and
setupMainThreadPort accordingly.
---
Duplicate comments:
In `@src/js/node/worker_threads.ts`:
- Around line 325-331: Directly invoking listeners via listeners[i].$call
bypasses JSEventEmitter's once-removal logic so process.once("workerMessage")
handlers never unregister; after calling each listener
(listeners[i].$call(process, value, source)), detect if that listener is a
one-time listener (the wrapper exposes the once flag, e.g. listeners[i].$once or
similar) and if so call process.removeListener("workerMessage", listeners[i])
(or the appropriate remove API) immediately after invocation; update the loop in
the worker messaging delivery code (the block invoking
process.listeners("workerMessage") and using $call and
WORKER_MESSAGING_RESULT_DELIVERED) to remove one-time listeners so once()
behaves correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c1229540-d96c-4904-b7bb-e7d8fd8bc592
📒 Files selected for processing (4)
src/bun.js/bindings/webcore/JSMessagePort.cppsrc/bun.js/bindings/webcore/MessagePort.cppsrc/js/node/worker_threads.tstest/js/node/worker_threads/worker_threads.test.ts
Node's onmessage setter branches on typeof value === 'function', calling ref()/start() for functions and unref()/stop() otherwise. Previously Bun called jsRef() unconditionally, so port.onmessage = fn; port.onmessage = null left the port refing the event loop.
Third teardown path alongside close() and disentangle(): a port with onmessage/.ref() whose peer has closed becomes GC-collectible via hasPendingActivity() → PeerClosed, and ~MessagePort() runs with m_isRefingEventLoop still set. Nothing on that path called context->unrefEventLoop(), so the process hung with a permanent +1. Release it in the destructor body; scriptExecutionContext() is still valid there (ContextDestructionObserver's base destructor runs after).
Node's setupPortReferencing installs a 'newListener' hook that calls
this.ref() when the first 'message' listener is added. Mirror that in
onDidChangeListenerImpl's Add case so port.unref(); port.on('message', fn)
re-refs (matching Node), while port.on('message', fn); port.unref()
stays unref'd (also matching Node).
Before this, the updateEventLoopRef() predicate gated on m_hasRef which
stayed false after unref(), so the unref→on ordering left the port
without an event-loop ref — a regression from the refactor since the
old Add case reffed unconditionally.
…postMessageToThread
…istener Symmetric counterpart to 5f762a4: Node's setupPortReferencing installs a 'removeListener' hook that calls this.unref() when the last 'message' listener is removed. Mirror that in the Remove/Clear N→0 transition so port.ref(); port.on(...); port.off(...); port.hasRef() → false, matching Node. Not a regression (pre-PR behaved the same), but completes the newListener/removeListener pair.
There was a problem hiding this comment.
All prior review threads are resolved and the bug-hunting pass found nothing new, but this rewrites the MessagePort event-loop ref state machine and adds cross-thread messaging plumbing through C++ worker bootstrap — worth a human pass on the concurrency/lifetime bits before merging.
Extended reasoning...
Overview
This PR implements worker_threads.postMessageToThread (Node 22+) and, in the course of doing so, substantially rewrites the MessagePort event-loop ref/unref state machine. It touches 18 files: C++ bindings (MessagePort.{cpp,h}, MessagePortPipe.{cpp,h}, Worker.cpp, JSWorker.cpp, JSMessagePort.cpp, EventEmitter.cpp), the JS node:worker_threads module (~220 new lines of routing/SAB-ack logic), four new error codes, and ~240 lines of new tests plus four upstream Node parallel tests.
Beyond the headline feature, it also ships several adjacent fixes: process.emit() return value, Worker::dispatchOnline ordering vs isOnline(), a PeerClosed mirror bit in MessagePortPipe for a single-atomic-load hasPendingActivity() snapshot, and ref-release on all three MessagePort teardown paths (close(), disentangle(), ~MessagePort()).
Security risks
None identified. The new API surface is in-process inter-thread messaging; values go through the existing structured-clone path. No auth, crypto, filesystem, or network surface is touched.
Level of scrutiny
High. The MessagePort ref state machine governs whether processes/workers exit, and this PR replaced the prior ad-hoc m_hasRef/listener-counter logic with a centralized updateEventLoopRef() predicate plus three new state booleans. Over the course of review, seven distinct edge cases in that state machine were found and fixed iteratively (onmessageerror, onmessage = null, close(), disentangle(), ~MessagePort(), unref()→on(), on()→off() after .ref()), plus a threadId === -1 cleanup bug and a MessagePortPipe GC-snapshot race. That iteration history is a signal that the surface is subtle. The MessagePortPipe change also introduces a new cross-thread fetch_or(PeerClosed) under the peer's lock, and Worker::dispatchOnline reorders the OnlineFlag store relative to the parent-thread open event.
Other factors
- The PR has extensive test coverage (4 upstream Node tests + 9 new bun-test cases including regression tests for each ref-leak fix).
- All 14 review threads (CodeRabbit + claude) are resolved; the author was responsive and each fix is well-reasoned.
- No human reviewer has looked at it yet — only bot review so far.
- CI build #50874 is in progress for the latest commit.
- The PR description's "Merge with #29937" section indicates this was rebased over a separate
MessagePortPiperewrite, which adds merge-resolution risk.
Given the concurrency-sensitive C++ changes, the destructor-time unrefEventLoop() call, and the number of edge cases already found during review, a human pass on MessagePort.cpp/MessagePortPipe.cpp is warranted.
…postMessageToThread
There was a problem hiding this comment.
Nothing new from this pass — all earlier inline findings have been addressed. Still deferring to a human given the scope: the MessagePort event-loop ref state machine is fully rewritten, MessagePortPipe gains a new PeerClosed bit that changes hasPendingActivity() / GC-visibility, and the worker bootstrap payload format changes — all concurrency- and lifetime-sensitive enough to want eyes on.
Extended reasoning...
Overview
This PR implements worker_threads.postMessageToThread (Node 22+) and, along the way, rewrites the MessagePort event-loop ref-counting state machine and fixes several adjacent bugs. 18 files touched across:
- JS runtime:
src/js/node/worker_threads.ts— ~220 new lines implementing the main-thread-routed messaging protocol withSharedArrayBuffer+Atomics.waitAsyncack, plus a new per-workerMessageChannelandthreadId → portregistry. Also caches#threadIdso#onClosecan unregister after the native getter starts returning -1. - C++
MessagePortref state machine:MessagePort.{h,cpp},JSMessagePort.cpp—jsRef/jsUnref/onDidChangeListenerImplrewritten around a singleupdateEventLoopRef()predicate (m_hasRef && (m_messageEventCount > 0 || m_wantsExplicitRef) && !m_isDetached). Over the course of review this was extended to release the ref inclose(),disentangle(), and~MessagePort(), and to mirror Node'snewListener/removeListenersemantics on the 0↔1 listener transitions. MessagePortPipe: adds aPeerClosedmirror bit sohasPendingActivity()reads a single-atomic-load consistent snapshot instead of two separate loads (avoiding a torn {before-send, after-close} observation that could collect a wrapper with a message in flight).- Worker bootstrap:
JSWorker.cpp/Worker.cpp/WorkerOptions.h— extends the serialized[workerData, environmentData]payload to length 3 withmainThreadPort, and reordersdispatchOnlineto setOnlineFlagbefore postingopento the parent. EventEmitter::emitForBindings: now returnsemit()'s actual boolean instead of alwaystrue, soprocess.emit('x')returnsfalsewith no listeners.- Four new error codes; four upstream Node tests checked in; ~230 lines of new unit tests in
worker_threads.test.ts.
Security risks
None identified. Messaging is intra-process between threads the user already controls; no auth, crypto, filesystem, or network surface is touched. The new postMessageToThread arguments are routed through existing MessagePort.postMessage / structured-clone, so transfer-list validation and DataCloneError semantics are inherited.
Level of scrutiny
High. The MessagePort ref state machine directly governs process/worker lifetime (whether the event loop stays alive), and the MessagePortPipe change alters what the GC thread observes via hasPendingActivity() — both are concurrency-sensitive C++ where subtle mistakes manifest as hangs, premature exits, or rare wrapper-collected-with-message-in-flight races. The review history bears this out: ten distinct issues were surfaced and fixed across seven follow-up commits (the #onClose / threadId == -1 leak, three separate teardown paths missing updateEventLoopRef(), onmessage/onmessageerror setter parity, and the Add/Remove 0↔1 transitions). That iteration converged well, but the density of edge cases is itself a signal that a human should sign off on the final state.
Other factors
- All prior inline comments (mine and CodeRabbit's) are resolved; the bug-hunting system found nothing this round.
- A documented known-limitation block in
worker_threads.tscovers the lazy-setup hang, the no-notify-on-exit case, and orphaned grandchildren — all upstream-parity or deferred-to-native-bootstrap. - Test coverage is solid: four upstream Node parallel tests plus targeted Bun unit tests for each ref-leak fix (close, disentangle, GC-after-peer-close, unref-then-on, on-then-unref, exited-worker-id rejection).
- The
emitForBindingsreturn-value change is a behavior change toprocess.emit()for the no-listener case — correct per Node, but worth a human glance for any internal callers that relied on the old always-true. - CI build #51731 is in progress for the latest merge-from-main commit.
…postMessageToThread # Conflicts: # src/jsc/bindings/webcore/Worker.cpp
…ned with ErrorCode.rs The Rust rewrite added src/jsc/ErrorCode.rs with hardcoded ordinals derived from ErrorCode.ts. Inserting the four new codes alphabetically (between ERR_WORKER_INIT_FAILED and ERR_WORKER_NOT_RUNNING) shifted every later ordinal in the generated C++ enum but not in ErrorCode.rs, so Rust callers like NativeZstd::jsc::ErrorCode::ZLIB_INITIALIZATION_FAILED (= ErrorCode(264)) now pointed at ERR_WORKER_MESSAGING_SAME_THREAD in the C++ table — test-zlib-zstd.js got the wrong error.code. Move the four codes to the end of ErrorCode.ts (ordinals 312–315) and add the matching entries to ErrorCode.rs. Also sync the MessagePort.h state-machine field comments with the newListener/removeListener behaviour added in 5f762a4/ca91e0ea.
bake-codegen.ts passed the raw minified CSS (starting with '*{...}')
as the OVERLAY_CSS define value, relying on Bun.build's auto-quote
fallback to turn it into a string literal. That fallback was broken
for a window of main commits (between the define-parser refactor and
314d044's '*'/'?' JSON-lexer fix), so any bootstrap bun in that
range fails bake-codegen with 'Operators are not allowed in JSON'.
JSON.stringify the value explicitly (like 'side' already does) so
codegen works regardless of which bun runs it. OVERLAY_CSS is declared
as 'string' and used as sheet.replace(OVERLAY_CSS), so a string
literal is the intended shape.
StatusPR is green on everything it touches. On the latest build (#63280, 282 passed / 4 failed) the only failures are:
None of these touch anything in this diff. Zero worker/MessagePort/postMessageToThread failures on any platform, including x64-asan where the previous ConcurrentTask leak was caught (fixed in 7a5bc2d). Four Node parallel tests now passing:
Plus 39 unit tests in All review threads resolved. Ready for maintainer review/merge. |
…postMessageToThread # Conflicts: # src/runtime/cli/run_command.rs # src/runtime/jsc_hooks.rs
…postMessageToThread # Conflicts: # src/jsc/ErrorCode.rs # src/jsc/bindings/ErrorCode.ts
…postMessageToThread # Conflicts: # src/jsc/ErrorCode.rs # src/jsc/bindings/ErrorCode.ts
…postMessageToThread
…postMessageToThread
A worker's EventLoop::concurrent_tasks queue was never drained on shutdown: EventLoop::deinit() only walks self.tasks, and the VM box is raw-dealloc'd so no Drop runs. Any postTaskTo(workerCtxId, ...) that landed between the worker's last tick_concurrent() and its context leaving the global map leaked a ConcurrentTask box plus its EventLoopTask* payload. The main-thread exit path drains this queue in global_exit(); workers had no equivalent. test-worker-messaging.js makes the window trivially reachable: seven threads all subscribed to the same BroadcastChannel, each posting as the others tear down. Eagerly removeFromContextsMap() in WebWorker__teardownJSCVM (so postTaskTo returns false once teardown has begun), then drain the queue via a new Bun__dropConcurrentCppTasksForWorker shim over the existing EventLoop::drop_concurrent_cpp_tasks. The map lock held by postTaskTo across lookup+enqueue makes the gate-then-drain race-free. removeFromContextsMap() is now idempotent via m_isInContextsMap so ~GlobalObject's later call is a no-op. Also add the four new test-worker-messaging*.js files to no-validate-leaksan.txt alongside the other 58 test-worker-* entries, for the pre-existing per-worker ParsedSourceMap/WebWorker-box leaks those tests share with every other worker test.
…postMessageToThread
…postMessageToThread
isOtherSideOpen was folded into the PeerClosed mirror bit read by isActivityPending; the test comment still named the removed function.
There was a problem hiding this comment.
No new issues found — all earlier review threads are resolved and the latest commit is just the comment sync. This touches a lot of concurrency-sensitive native code (the MessagePort ref/unref state machine, MessagePortPipe GC-liveness atomics, worker-teardown task draining across the C++/Rust boundary), so it's worth a maintainer pass before merge.
Extended reasoning...
Overview
Implements worker_threads.postMessageToThread and, along the way, substantially reworks several native subsystems across 25 files: the MessagePort event-loop ref/unref state machine (MessagePort.{cpp,h}, JSMessagePort.cpp), the MessagePortPipe close protocol (new PeerClosed mirror bit + isActivityPending() replacing isOtherSideOpen(), which changes hasPendingActivity() GC-liveness semantics), worker bootstrap payload plumbing (JSWorker.cpp/Worker.cpp/WorkerOptions.h), ScriptExecutionContext map-removal idempotency, a new Rust FFI shim (Bun__dropConcurrentCppTasksForWorker) to drain leaked cross-thread tasks during worker teardown, the native EventEmitter::emitForBindings return value, four new error codes, and ~250 lines of new routing logic in worker_threads.ts.
Security risks
None identified. The new API surface is in-process thread-to-thread messaging with the same trust boundary as existing worker.postMessage; no auth, crypto, filesystem, or network exposure.
Level of scrutiny
High. The MessagePort ref state machine alone went through seven fix commits during review (onmessageerror, onmessage = null, close(), disentangle(), ~MessagePort(), unref();on(), ref();on();off()), plus a 🔴 fix for #onClose reading threadId === -1 that would have leaked every worker's port. The PeerClosed mirror bit changes when MessagePort wrappers become GC-collectible, and the WebWorker__teardownJSCVM change reorders context-map removal relative to GC and adds a cross-language drain hook — both are subtle lifetime/ordering changes in hot threading paths. These are exactly the kinds of changes where a maintainer familiar with Bun's worker/event-loop internals should confirm the invariants.
Other factors
Test coverage is good (4 upstream Node tests + ~10 new unit tests including regression tests for each ref-state-machine fix), CI is green on everything this PR touches, and all 15 prior review threads (mine and CodeRabbit's) are resolved. The remaining open question is design-level confidence in the native concurrency changes, not correctness bugs I can point to.
Implements
worker_threads.postMessageToThread(threadId, value[, transferList][, timeout])(Node 22+), which sends a message to any thread by itsthreadIdand emitsprocess.on('workerMessage', value, source)on the target.How it works
Each
node:worker_threadsWorkernow gets a dedicatedMessageChannelto the main thread. The worker-side port is plumbed through the existing serialized[workerData, environmentData]array (extended to include the port at index 2) inJSWorker.cpp→createNodeWorkerThreadsBindinginWorker.cpp, and is set up when the worker first loadsnode:worker_threads.The main thread keeps a
threadId → portregistry.postMessageToThreadalways routes through the main thread: the sender posts aSEND_MESSAGE_TO_WORKERenvelope (directly if on main, via itsmainThreadPortotherwise), the main thread forwards aRECEIVE_MESSAGE_FROM_WORKERto the destination, and aSharedArrayBuffer+Atomics.waitAsynccarries the ack so the sender can await delivery/timeout. Threads created outside ofnode:worker_threads(e.g. via the WebWorkerconstructor) act as a local root for their subtree.New error codes:
ERR_WORKER_MESSAGING_ERRORED,ERR_WORKER_MESSAGING_FAILED,ERR_WORKER_MESSAGING_SAME_THREAD,ERR_WORKER_MESSAGING_TIMEOUT.Fixes uncovered along the way
process.emit()always returnedtrue— the nativeEventEmitter::emitForBindings(used byprocess) ignored the return value offireEventListeners(). It now returnsfalsewhen there are no listeners, matching Node and the JSEventEmitter.MessagePort.prototype.unref()was a no-op on transferred ports — adding a'message'listener on a transferred port refs the event loop via a counter thatunref()never touched, so an unreffed port still kept the process alive. Unified into a singleupdateEventLoopRef()predicatem_hasRef && (m_messageEventCount > 0 || m_wantsExplicitRef) && !m_isDetached;ref()/unref()togglem_hasRef,hasRef()reflects whether the port is currently holding an event-loop ref.close()anddisentangle()callupdateEventLoopRef()after detaching so a ref'd/transferred port doesn't leak its ref on the source thread.Worker::dispatchOnlineraced withisOnline()— it posted theopenevent to the parent before settingOnlineFlag, so a fast parent could callworker.getHeapSnapshot()after receiving'online'and seeisOnline() == false→ERR_WORKER_NOT_RUNNING. The flag is now set first.Worker teardown leaked cross-thread tasks queued after the last tick (pre-existing; surfaced by LSAN on
test-worker-messaging.js).postTaskTo(workerCtxId, ...)landing between a worker's lasttick_concurrent()and its context leaving the global map leaked aConcurrentTaskbox plus itsEventLoopTask*payload; the worker VM box is raw-dealloc'd so noDropruns on the queue.WebWorker__teardownJSCVMnow eagerly removes the context from the map (idempotent viam_isInContextsMap) and drains the concurrent queue via a newBun__dropConcurrentCppTasksForWorkershim, mirroring whatglobal_exit()already does for the main thread. The four newtest-worker-messaging*.jsare also added tono-validate-leaksan.txtalongside the other 58test-worker-*entries, for the separate per-worker sourcemap/WebWorker-box leaks those tests share with every other worker test.Merge with #29937 (MessagePortPipe)
#29937replaced theMessagePortChannelProvider/Registrystack withMessagePortPipe. Conflict resolution:ScriptExecutionContext.cpp: dropped my destructor reorder —m_processMessageWithMessagePortsSoonHandlersno longer exists on main.MessagePort.{cpp,h}: started from main's new design and re-layered theupdateEventLoopRef()state machine. Main'sclose()/disentangle()alreadyremoveAllEventListeners()(covers the listener-count path); addedupdateEventLoopRef()calls to also release any explicit (onmessage/.ref()) ref before the context is nulled.Tests
Now passing (checked in under
test/js/node/test/parallel/):test-worker-messaging.jstest-worker-messaging-errors-handler.jstest-worker-messaging-errors-invalid.jstest-worker-messaging-errors-timeout.jsPlus unit tests in
test/js/node/worker_threads/worker_threads.test.tsforpostMessageToThread, theprocess.emitreturn value, theMessagePort.unref()keep-alive fix, and transferring a ref'd port.