net: route throws from socket lifecycle listeners to uncaughtException#29762
net: route throws from socket lifecycle listeners to uncaughtException#29762robobun wants to merge 1 commit into
Conversation
|
Updated 1:04 PM PT - Jun 18th, 2026
❌ @robobun, your commit 9063d7f has 4 failures in
🧪 To try this PR locally: bunx bun-pr 29762That installs a local version of the PR into your bun-29762 --bun |
WalkthroughAdds a Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
LGTM — thanks for addressing the describe.concurrent nit and the onread.callback case with a regression test.
Extended reasoning...
Overview
This PR changes error routing for user-code throws on the net.Socket data path. It adds a small pushDataToSocket() helper in src/js/node/net.ts that wraps self.push(buffer) in try/catch and forwards any caught exception through the global reportError() (→ process.on('uncaughtException')), then swaps the three existing data handlers (SocketHandlers, ServerHandlers, SocketHandlers2) to use it. It also updates the onread.callback handler to route throws through reportError instead of self.emit('error', ...). Four subprocess-based regression tests are added in test/js/node/net/node-net.test.ts.
Security risks
None. This is purely error-routing in the Node-compat layer — no auth, crypto, parsing, or untrusted-input handling is touched. The only behavioral change is where an already-thrown user exception surfaces (uncaughtException vs. socket 'error' / silently swallowed), which is strictly an improvement over the prior swallowing behavior.
Level of scrutiny
Low–medium. The diff is ~20 source lines plus tests. The mechanism is straightforward: Readable.push() synchronously emits 'data' in flowing mode, so a listener throw propagates up through it; catching there and calling reportError matches Node's semantics (where the throw bubbles through native onStreamRead and lands in uncaughtException). Returning true from the catch correctly avoids pausing the socket on a user-code bug. reportError is already the established pattern for this in Bun's JS layer (used in diagnostics_channel, _http_client, webstreams_adapters).
Other factors
I reviewed an earlier revision and left two comments (use describe.concurrent; apply the same fix to onread.callback). Both were addressed in ad030db, including a new regression test for the onread path that the author verified against Node first. No CODEOWNERS cover these files. The robobun CI failures are Windows agent-provisioning failures plus an unrelated worker_threads flake, not caused by this change. Test coverage is solid: server-side throw, client-side throw, no-handler crash, and onread.callback throw — each asserts both that uncaughtException fires and that the socket's own 'error' event does not.
|
Build 48204 completed. 252 jobs finished, 0 test failures. The 6 canceled + 28 blocked jobs are all Windows agent provisioning failures (same |
ad030db to
fbfcd2b
Compare
fbfcd2b to
b10c3ca
Compare
Follow-up to ffe68f0 addressing review on #29762: 1) Short-circuit semantics - Node's synchronous self.emit('connect'); self.emit('ready') pair means a throw from the 'connect' listener unwinds the frame and 'ready' never fires. The previous per-emit emitToSocket helper wrapped each call in its own try/catch, so after reporting the 'connect' throw, 'ready' still fired. Replace emitToSocket with safelyInvokeListeners(fn) which wraps a block: the reportError catch belongs to the whole emit group, not each emit. Regrouped all the call sites accordingly. 2) 'connect' leaves-socket-alive test - replace setTimeout(50) with an event-driven trigger: drive the post-throw write from the uncaughtException handler, have the server respond then end, and exit on the socket's 'end'. Per test/CLAUDE.md: 'never wait for time to pass in tests'. 3) Added a regression test that asserts 'ready' does NOT fire after a 'connect' throw - catches the per-emit mistake if it regresses.
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🟣
src/js/node/net.ts:433-439— 🟣 Pre-existing, same bug class: ffe68f0'ssafelyInvokeListenerssweep wrapped the main paths ofServerHandlers.openandServerHandlers.handshakebut skipped their early-return branches — the twoself.emit('drop', data)calls (blockList / maxConnections) and the ECONNRESET branch'sself.emit('_tlsError', err)/self.server.emit('tlsClientError', err, self). A user listener that throws on those events still propagates back through the Zig catch intoHandlers.callErrorHandlerinstead ofuncaughtException. Low-traffic edges, but worth wrapping for consistency since the same functions now protect their happy paths a few lines below.Extended reasoning...
What the bug is
Commit ffe68f0 in this PR introduced
safelyInvokeListeners()and applied it to the user-facing emits inServerHandlers.open(the'connection'emit) andServerHandlers.handshake(thetlsClientError/secureConnection/secure/secureConnectblock). However, both handlers have early-return branches that emit user-facing events before reaching the wrapped section, and those were left as bareemitcalls:ServerHandlers.open— two early returns each callself.emit('drop', data):- the
self.blockList.check(...)branch (around src/js/node/net.ts:398) - the
self.maxConnectionsbranch (around src/js/node/net.ts:413)
ServerHandlers.handshake— the ECONNRESET early return (src/js/node/net.ts:443-450):if (!success && verifyError?.code === "ECONNRESET") { const err = new ConnResetException("socket hang up"); self.emit("_tlsError", err); self.server.emit("tlsClientError", err, self); self._hadError = true; self.destroy(); return; }
A throw from a user-registered
server.on('drop', ...),server.on('tlsClientError', ...), orsocket.on('_tlsError', ...)listener on these paths still escapes the JS handler frame, is caught by the Zigcatch |err|inonOpen/onHandshake, and is forwarded toHandlers.callErrorHandler→ServerHandlers.error. That misroutes the user's exception to the socket's'error'/destroy path (and foronOpen, also callsmarkInactive()and closes the native socket) instead of surfacing it asprocess.on('uncaughtException').Node's behaviour
In Node,
server.emit('drop', ...)is called fromonconnectioninlib/net.js, andserver.emit('tlsClientError', ...)from_tls_wrap.js, both invoked from native viaMakeCallbackwith no surrounding try/catch. A throw from either listener becomesuncaughtException; the socket-level'error'event never fires for it.Why nothing currently prevents it
safelyInvokeListenersis only applied to the fall-through path of each handler. The early-return branchesreturnbefore reaching the wrapped block, so theiremitcalls run bare inside the Zig-invoked frame. The Zig side wraps the entire JS callback in a singlecallback.call(...) catch |err| { handlers.callErrorHandler(...) }, so any throw that escapes JS is funneled there.Step-by-step proof (
'drop'case)const server = net.createServer(); server.maxConnections = 0; server.on('drop', () => { throw new Error('BOOM') }); process.on('uncaughtException', e => console.log('UNCAUGHT', e.message));- A client connects. Zig
onOpencallsServerHandlers.open. self._connections (0) >= self.maxConnections (0)is true →socket.end(); self.emit('drop', data);.- The
'drop'listener throws. The throw unwinds out ofemitand out ofServerHandlers.open. - Zig's
catch |err|fires and callshandlers.callErrorHandler(...)→ServerHandlers.error, which setsdata._hadError = trueand routes the user'sError('BOOM')through the socket-error path.onOpenadditionally callsmarkInactive()+ closes the socket on throw. - Node: step 4's throw reaches
MakeCallback's boundary and surfaces asuncaughtException; the'drop'listener's exception never appears on any socket'error'event.
The
tlsClientErrorECONNRESET case is identical: a TLS server whose handshake is interrupted by ECONNRESET emits'tlsClientError'from the unwrapped branch, and a throwing listener is misrouted toServerHandlers.errorinstead ofuncaughtException— while the otherserver.emit('tlsClientError', ...)call eight lines below (the verify-error branch) is wrapped, so the same event behaves differently depending on which branch fired it.Impact
Low.
'drop'only fires when a server hasblockListconfigured or hitsmaxConnections; the ECONNRESET-during-handshake path is a TLS edge case; and throwing from these listeners is rare. The practical effect is the same Node-compat divergence this PR fixes elsewhere: the throw is reported on the wrong channel (socket/server error path instead ofuncaughtException), and theonOpenZig path additionally tears down state on throw.Severity / scope
Pre-existing. None of these lines were modified by the PR. Flagging only because ffe68f0 explicitly entered both
ServerHandlers.openandServerHandlers.handshaketo apply this exact fix, so the unwrapped branches are now inconsistent with the wrapped ones a few lines below in the very same functions. There are further unwrapped emit sites elsewhere in the file (kAttach'sconnect/ready,emit('lookup'),emit('connectionAttempt'), etc.) — a single follow-up sweeping all of them is probably more useful than per-site patches.How to fix
Wrap each early-return emit group in
safelyInvokeListeners, e.g.:socket.end(); safelyInvokeListeners(() => { self.emit("drop", data); }); return;
and for the handshake ECONNRESET branch:
if (!success && verifyError?.code === "ECONNRESET") { const err = new ConnResetException("socket hang up"); safelyInvokeListeners(() => { self.emit("_tlsError", err); self.server.emit("tlsClientError", err, self); }); self._hadError = true; self.destroy(); return; }
- the
|
Thanks for the catch — acknowledging and deferring. Two reasons:
Will file a follow-up issue to sweep the remaining sites after this lands. |
c959ae5 to
7f9a1a9
Compare
|
Rebased onto main (was conflicting). Main refactored
All 8 |
081d7f7 to
f10ecea
Compare
|
CI note for maintainers: the only failure on the latest build (63155, sha f10ecea) is the That is a build-infra network timeout during the Android cross-build toolchain install, before any test runs. It is unrelated to this diff (JS-only change to |
f10ecea to
d19960e
Compare
There was a problem hiding this comment.
All my prior feedback has been addressed and I found no remaining issues, but this changes error-routing semantics across several node:net lifecycle events (data/connect/ready/connection/onread) with deliberate scoping carve-outs for TLS and timeout — worth a maintainer's sign-off on those boundaries before merge.
Extended reasoning...
Overview
This PR fixes #29761 by intercepting synchronous throws from user socket-event listeners and re-dispatching them via reportError() so they surface as process.on('uncaughtException') (matching Node), instead of letting them unwind into the native socket layer's call_error_handler where they were either misrouted to the socket's 'error' event or silently swallowed. Two small helpers are added to src/js/node/net.ts (pushDataToSocket, safelyInvokeListeners) plus a module-load capture of reportError, applied at six call sites spanning SocketHandlers/ServerHandlers/SocketHandlers2 data handlers, SocketHandlers.open, onconnection, afterConnect, and the onread.callback path. Eight subprocess tests are added.
Security risks
None. This is purely error-propagation plumbing — no auth, crypto, parsing, or trust-boundary changes. The reportError capture at module load is a hardening (tamper-resistance) improvement, not a new exposure.
Level of scrutiny
High. node:net underlies http/https/tls and essentially every networked Bun program. The change alters when/whether 'error' fires vs. 'uncaughtException', whether 'ready' fires after a throwing 'connect' listener, whether a server socket auto-resumes after a throwing 'connection' listener, and whether the connection survives a throwing 'connect' listener (it now does; previously the native onOpen path tore it down). These are all moves toward Node parity and are well-tested, but they are observable behavior changes in a foundational module.
Other factors
- I reviewed this across ~8 iterations (Apr–Jun); every nit and suggestion was addressed and all threads are resolved. The current bug-hunt pass found nothing.
- Scope grew well past the original issue (
'data'only) at my prompting, then was deliberately frozen: TLS handshake emits stay unwrapped (wrapping them surfaced unrelated pre-existing TLS bugs in CI), and the nativetimeouthandlers were intentionally left unwrapped (dead path; user-facing'timeout'already routes correctly via the JS timer). The author committed to a follow-up sweep for the remaining sites (drop/lookup/connectionAttempt*/kAttach/timeout). A maintainer should confirm they're comfortable with that scoping boundary. onconnectionnow wraps_socket.resume()inside the same try/catch asemit('connection'), so a throwing'connection'listener leaves the accepted socket paused. This matches Node (the throw unwinds before resume there too), but it's a real behavior change for Bun.- Tests follow repo conventions (
describe.concurrent, three-elementPromise.allpipe drain, event-driven rather than time-based). CI is green except an unrelatedlinux aarch64-android build-rusttoolchain-fetch network timeout the author flagged.
Net: I believe the change is correct and well-tested, but it's a semantic change in core networking error handling with explicit scope carve-outs — a human maintainer should give the final nod rather than a bot.
d19960e to
6764763
Compare
|
Rebased onto main (6764763). |
Part of #29761. When a user listener attached to a net.Socket/Server lifecycle event throws synchronously, the throw unwinds back into the native socket callback, which catches it and funnels it into the socket's onError handler (the 'error' event) or, on the open path, tears the connection down. Node instead re-raises the throw as 'uncaughtException' and leaves the socket alive. Add a safelyInvokeListeners helper that wraps a group of consecutive self.emit(...) calls and forwards a caught throw through reportError (which surfaces as 'uncaughtException'). Wrapping a whole emit group (not each emit) preserves Node's short-circuit semantics: a throw from the 'connect' listener aborts the synchronous sequence so 'ready' never fires. reportError is captured at module load so the routing can't be defeated by user code clobbering globalThis.reportError. Applied to the non-TLS lifecycle paths a user listener reaches synchronously from a native socket callback: - open: SocketHandlers 'connect'/'ready'; afterConnect 'connect'/'ready' - server: onconnection 'connection' - onread.callback: reportError instead of self.emit('error') Note: the 'data'-listener case from the original report is intentionally NOT included here. Routing 'data'-emit throws to uncaughtException at the JS layer (wrapping self.push) breaks node:http, which depends on the native layer catching 'data' throws and routing them to the socket 'error' event (Bun's http parser throws parse errors where Node returns them, and http's socketOnData relies on that routing). A correct 'data'-path fix needs a coordinated change in the native socket onData path plus http's parser error handling; that is tracked as a follow-up. Tests cover client 'connect' (incl. post-throw writability), 'ready', server 'connection', onread.callback, and the emit short-circuit.
6764763 to
9063d7f
Compare
|
Heads-up: I narrowed this PR after CI surfaced a real regression, and it now needs a maintainer decision on the remaining piece. What happenedThe previous revision wrapped
All four pass on main and fail with the data-path wrapping. Root cause
There is no way to distinguish a user
That spans native socket code + the http parser error model and also affects the What this PR now containsI reverted the data-path wrapping and kept only the safe, same-bug-class lifecycle fixes that do not touch the http data path:
Verified: the 4 http tests above pass again, a 40-test The original |
|
CI note (build 63365): the red lanes are all unrelated flake/infra, not this diff:
The change here is JS-only ( |
Fixes #29761.
Repro
[uncaught] BOOMand exits 1.got: hiand exits on the internal timeout. The throw is silently swallowed.Cause
Bun's Zig socket layer (
onDatainsrc/bun.js/api/bun/socket.zig) wrapscallback.call(...)in a try/catch and forwards any thrown exception toHandlers.callErrorHandler, which invokes the JS-sideSocketHandlers.error(src/js/node/net.ts). For server-side sockets the handler setsself._hadError = trueand re-delegates to the sharedSocketHandlers.error, whose_hadErrorguard trips and the error disappears on the floor. Even when the guard doesn't trip, routing a user-code throw to the socket's'error'event is wrong — Node re-raises it as'uncaughtException'and leaves the socket alone.Fix
Wrap
self.push(buffer)inSocketHandlers.data,ServerHandlers.data, andSocketHandlers2.datawith try/catch and forward the caught exception throughreportError, which surfaces it as'uncaughtException'. Matches Node's semantics: the socket's own'error'event doesn't fire for user-listener throws, and the socket stays alive.Verification
New tests in
test/js/node/net/node-net.test.tscover server-side, client-side, and no-handler scenarios:'data'throw →uncaughtExceptionfires; socket'error'does not.'data'throw → same.All three fail on main (two timeout, one sees the error on the socket's
'error'event instead ofuncaughtException) and pass on this branch.