net: emit 'close' when a backpressured socket's peer dies#31654
Conversation
|
Updated 7:15 AM PT - Jun 17th, 2026
❌ @robobun, your commit 299f240 has 1 failures in 🧪 To try this PR locally: bunx bun-pr 31654That installs a local version of the PR into your bun-31654 --bun |
|
Reproduced and fixed. A Matches stock Node v24 across clean FIN, RST-with- Regression test: two Bun processes over a UDS, survivor builds backpressure, peer |
|
Found 2 issues this PR may fix:
🤖 Generated with Claude Code |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
WalkthroughThis PR introduces a socket teardown fix to prevent hangs when a peer closes during backpressure on Unix Domain Sockets. A new ChangesSocket teardown fix and regression test
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/node/net/node-net.test.ts`:
- Around line 753-756: Remove the timer-based watchdog: delete the setTimeout
that assigns to watchdog and the subsequent clearTimeout call; rely on awaiting
survivor.exited (and, if needed, adjust the test harness timeout via test
framework config rather than using setTimeout). Ensure references are to the
existing survivor.exited promise and survivor.kill usage remains untouched.
- Line 710: Replace the use of tmpdirSync() when constructing the UDS path for
udsPath and instead use the test harness tempDir fixture (call tempDir() to get
a harness-managed temporary directory) so the line creating udsPath uses
join(tempDir(), `hup-${randomUUID()}.sock`); also ensure the test
imports/receives the tempDir fixture from 'harness' if it's not already
available.
- Around line 712-727: Replace the tmpdirSync-based socket directory with
harness.tempDir by creating udsPath from tempDir(...) (use the harness.tempDir
helper instead of tmpdirSync), remove the ad-hoc setTimeout watchdog logic
(delete the setTimeout-based timer and instead rely on the test framework
timeout or an AbortController-based cancellation), and revert any unnecessary
changes to stdout handling—keep waitForLine as a simple reader that releases the
lock after each call and do not attempt to preserve reader state between calls;
target the udsPath variable, the setTimeout/watchdog code, and the waitForLine
function when making these edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7cf53707-8231-4abe-bd71-f5aa333f875f
📒 Files selected for processing (4)
src/js/node/net.tstest/js/node/net/node-hup-backpressure-peer-fixture.jstest/js/node/net/node-hup-backpressure-survivor-fixture.jstest/js/node/net/node-net.test.ts
There was a problem hiding this comment.
I didn't find any bugs, but this changes socket lifecycle semantics in three close-handler paths (including newly propagating err into destroy() where it was previously dropped), so it's worth a human look at the event-ordering and downstream http/tls implications.
Extended reasoning...
Overview
This PR adds a destroyAfterClose helper to src/js/node/net.ts and wires it into all three native close handlers (SocketHandlers.close, ServerHandlers.close, SocketHandlers2.close). When the native layer reports the connection closed and the writable side still has buffered data (or an error is present), it now force-destroy()s the Duplex so 'close' fires instead of hanging on autoDestroy waiting for a writable half that can never drain. Two new fixture files and a UDS-based regression test are added.
Security risks
None identified — this is stream-lifecycle / event-emission logic with no auth, crypto, or input-parsing surface.
Level of scrutiny
Medium-high. net.Socket close/destroy semantics sit underneath node:http, node:https, node:tls, and node:http2. The fix is small but it's a behavioral change in three code paths simultaneously, and one of those paths (SocketHandlers2.close) previously dropped err entirely (there was a TODO about it) — it now feeds err into destroy(err), which will surface an 'error' event where none was emitted before. That's the intended Node-compat fix per the PR table, but it's the kind of change where a maintainer familiar with the http/tls layers should sanity-check that nothing upstream was relying on the old swallow-the-error behavior, and that the interaction with SocketHandlers.error (which already emit('error', ...)s directly) doesn't double-emit.
Other factors
- The regression test is solid (subprocess-isolated, watchdog-guarded, checks exactly-once semantics).
- The PR description verifies against Node v24 and lists the related test suites that still pass, but CI was still building at review time.
- Stream end/destroy ordering is notoriously subtle; the helper deliberately skips the
writableLength === 0 && !errcase to avoid racing'end', which is the right instinct but also signals this is delicate territory. - No prior reviews from me on this PR.
|
Linked #24808 (directly reproduced + tested here — the write-backpressure variant). Not auto-closing #28396: it's an umbrella for four separate |
|
Updated: broadened the fix to cover both zombie-socket shapes on native close — write-backpressure (the original report) and a paused/unpiped readable whose EOF is never consumed (the Two regression tests (backpressure via SIGKILL over a UDS; paused socket via Consolidation: closed my earlier #28732 (strict subset of this); cross-linked #28350 and #27161 for a maintainer to pick. Waiting on CI. |
|
CI on the previous push caught a real regression (thanks, annotations): propagating the native close's error into Fixed by dropping the error — Re-verified under |
ef80fdc to
9d3e856
Compare
|
Rebased onto main (133 commits, conflict in
All 141 |
|
Diff is green; CI red is unrelated flake. Four consecutive CI runs — each failing on a different test unrelated to this change, with zero
This PR touches only The review bot's final pass on 299f240 found no issues (all three edge cases it flagged earlier are addressed) and defers to a maintainer for sign-off given the overlap with #27161/#28350. Ready for a maintainer to review/merge past the unrelated flakes. |
d6837b7 to
341c164
Compare
A node:net socket whose peer disappears while the socket has a stuck half
(write-backpressure that can never flush, or a paused readable with nothing
buffered that never consumes the queued EOF) never emitted 'close'. The
process hung forever, spinning on the half-closed fd; server.close() never
completed.
The socket is a Duplex with { emitClose:false, autoDestroy:true }, so 'close'
only fires from _destroy, and autoDestroy only runs _destroy once both halves
finish. When a half is stuck it never does: the socket lingers as a zombie
(destroyed=false), server._connections never decrements, and a peer that keeps
the fd hot spins the loop.
Changes:
- SocketEmitEndNT now follows push(null) with read(0) (as SocketHandlers2.close
already does), so a paused stream with nothing buffered still schedules
'end' and can auto-destroy instead of stalling.
- SocketEmitEndNT's pending-write flush now also gates on self[kclosed] so a
clean close (no error) still fails the in-flight write — otherwise a paused
reader with buffered data plus a backpressured write leaves kWriting set and
autoDestroy can never fire even after the reader drains.
- The native close handlers force teardown via destroyAfterClose(): when
there is nothing left to read (readableLength === 0) and the socket is not
already destroyed, schedule destroy() on setImmediate. Deferring past the
nextTick queue lets the pending 'end' (from push(null)+read(0)) and any
"write EBADF" callback _write scheduled for a write that raced the close
fire first (test-net-socket-close-after-end.js,
test-net-socket-write-after-close.js). Readers with data still buffered are
left alone so they can consume it and emit 'end' before 'close', as Node
does. No error is passed: real read errors already surface through the
dedicated error paths in SocketEmitEndNT, and the passive peer close that
lands here is benign.
Two regression tests: (1) two Bun processes over a UDS, survivor builds
backpressure, peer SIGKILLed mid-flight, survivor must get one 'close' and
exit; (2) a paused server socket whose peer ends, 'close' fires and
server.close() completes. Both hang on the baked bun. All 141 test-net-*
Node parallel tests pass under bun bd (ASAN), along with node-net-server
(18/18), the http/tls parallel tests that previous iterations regressed,
and regression/issue/12117.
Rebased onto #31155, which substantially reworked these close handlers
(ECONNRESET-shaped destroy in SocketEmitEndNT, kwriteCallback flush); those
changes are kept and destroyAfterClose sits after them as the stuck-writable
fallback they leave open.
341c164 to
299f240
Compare
There was a problem hiding this comment.
No issues found on the latest revision — all three edge cases I flagged earlier are addressed. Deferring to a maintainer since this touches core node:net socket-teardown ordering across all three close handlers and overlaps with #27161/#28350, which someone with the full picture should reconcile.
Extended reasoning...
Overview
The PR modifies src/js/node/net.ts — specifically the three native close handlers (SocketHandlers.close, ServerHandlers.close, SocketHandlers2.close) and SocketEmitEndNT — to ensure a net.Socket whose peer dies while one Duplex half is stuck (write backpressure, or a paused readable with nothing buffered) still reaches _destroy and emits 'close'. It adds a destroyAfterClose() helper (deferred destroy() via setImmediate, gated on readableLength === 0 and a kclosed re-check), a read(0) kick after push(null), and broadens the kwriteCallback flush gate to include self[kclosed]. Two new regression tests plus two subprocess fixtures cover both stuck-half shapes.
Security risks
None identified. This is socket-lifecycle event ordering; no auth, crypto, parsing of untrusted input, or permission surfaces are touched.
Level of scrutiny
High. This is the teardown path for every node:net socket in Bun's Node compat layer, sitting underneath http/https/tls/undici/grpc/etc. The change hinges on subtle event-loop ordering invariants — setImmediate vs nextTick, 'end'-before-'close', autoDestroy's both-halves-finished requirement, and the detached-handle vs attached-handle close paths. The PR's own history shows how easy this is to get wrong: an earlier iteration that propagated the close error into destroy(err) broke fetch/ws/h2/undici/grpc/valkey across all platforms; the readableLength guard and setImmediate defer were each added to un-break specific Node parallel tests; and two of my own inline comments flagged narrow races (reconnect-from-'close', clean-close + buffered-readable + pending write) that required further amendments. The current revision looks correct and well-guarded, but this is exactly the kind of change where a maintainer who knows the uSockets/native-handler contract should sign off.
Other factors
- The PR overlaps three other PRs targeting the same handlers (#27161, #28350, #28732 — the last closed by the author as a subset of this one), and was rebased onto #31155 which substantially reworked the same code. The author explicitly invites a maintainer to pick/consolidate.
- All prior review threads (CodeRabbit's and mine) are resolved; the bug-hunting system found nothing on the current revision.
- Test coverage is solid: two targeted regression tests that hang on the unfixed binary, plus the author reports all 141
test-net-*Node parallel tests,node-net-server, and the previously-regressed http/tls parallels passing under ASAN. CI failures on the last two builds are unrelated flakes (bake DevServer UAF,bun installlockfile, Windows hot-reload). - Net change to runtime code is small (~30 lines), but the blast radius if the ordering is subtly wrong is large.
Symptom
A
node:netsocket whose peer disappears while the socket still has a stuck half never emits'close'. The process hangs forever (spinning on the half-closed fd when the peer keeps it hot), andserver.close()never completes. Callingsocket.destroy()from JS immediately clears it.Two stuck-half shapes trigger it:
SIGKILLed. A race: most peer kills deliver'close'normally. (The originally reported case,node:netover a Unix domain socket, both accepted and connecting sockets, macOS + Linux.)server.close()/ process-exit hang behind several Express/Supertest/proxy reports.Cause
A
net.Socketis aDuplexconstructed with{ emitClose: false, autoDestroy: true }, so'close'is emitted only from_destroy, and autoDestroy only runs_destroyonce both halves of the Duplex finish.When the native layer reports the connection closed,
SocketEmitEndNTends the readable half withpush(null), but:'end'from that alone (read(0)is what kicks it, andSocketEmitEndNTwasn't calling it), so the readable half never finishes;Either way autoDestroy waits forever,
_destroynever runs,'close'is never emitted, andserver._connectionsnever decrements (soserver.close()hangs).Fix
Two small changes in
src/js/node/net.ts:SocketEmitEndNTfollowspush(null)withread(0)(the same pairSocketHandlers2.closealready does), so a paused stream with nothing buffered schedules'end'and can auto-destroy instead of stalling.destroyAfterClose()in the three native close handlers forces the teardown: when there is nothing left to read (readableLength === 0) and the socket is not already destroyed, scheduledestroy()onsetImmediate. Deferring past the nextTick queue lets the pending'end'and any"write EBADF"callback_writescheduled for a write that raced the close fire first (test-net-socket-close-after-end.js,test-net-socket-write-after-close.js). Readers with data still buffered are left alone so they can consume it and emit'end'before'close', as Node does. No error is passed: real read errors already surface throughSocketEmitEndNT's dedicated error paths (from net,tls: port Node.js net/tls compatibility tests and fix the gaps they surface — half-open/reset/write semantics, server TLSSocket wrap, session/keylog, SNICallback/ALPNCallback, pfx, OpenSSL error shapes, addCACert, local binding (+305 tests) #31155), and the passive peer close that lands here is benign (a post-response RST, a keep-alive idle close).Verification
Two regression tests in
test/js/node/net/node-net.test.ts:SIGKILLed mid-flight, and the survivor must receive'close'and exit;'close'(after'end') and letserver.close()complete.All 141
test-net-*Node parallel tests pass underbun bd(ASAN), includingtest-net-socket-close-after-end.jsandtest-net-socket-write-after-close.jswhich earlier iterations of this PR regressed.node-net-server.test.ts(18/18), the http/tls parallel tests that previous CI runs flagged (test-https-eof-for-eom,test-http-1.0-keep-alive,test-tls-wrap-econnreset-socket,test-tls-reuse-host-from-socket, …), andregression/issue/12117all pass.Rebased onto #31155, which substantially reworked these close handlers (ECONNRESET-shaped destroy in
SocketEmitEndNT,kwriteCallbackflush). Those changes are kept;destroyAfterClosesits after them as the stuck-writable fallback they leave open. This also covers the net-level fix of #28350 / #28732 (server.close() hang on a paused socket), with thereadableLength/setImmediateguard so'end'-before-'close'ordering is preserved. Happy to rebase/defer however a maintainer prefers.Fixes #24808