usockets: don't defer TLS close when close_notify flush fails #30368
ssl_handle_shutdown returned 0 ("wait for the peer") when SSL_shutdown
failed with SSL_ERROR_WANT_WRITE/WANT_READ during a graceful close
(force_fast_shutdown==0). That deferral makes sense for the
SSL_shutdown()==0 case (close_notify flushed, waiting for the peer's
reply) but not here: the alert never went out, SSL_SENT_SHUTDOWN is
already set, and on_writable/on_data short-circuit through
is_shut_down without re-dispatching the queued alert. Nothing will
ever close the fd, and SSL_free is only reached via raw close, so
s->ssl and the BoringSSL write_buffer holding the encoded alert leak
until some other socket event happens to arrive.
Observed as an intermittent LeakSanitizer failure in
test/js/node/http/node-https-checkServerIdentity.test.ts on the Debian
x64-asan lane (the spawned child exits right after server.close(), so
no further event arrives):
Direct leak of 417 byte(s) in 1 object(s) allocated from:
bssl::SSLBuffer::EnsureCap ssl/ssl_buffer.cc:72
bssl::do_tls_write ssl/s3_pkt.cc:194
bssl::tls_dispatch_alert ssl/s3_pkt.cc:373
SSL_shutdown ssl/ssl_lib.cc:1039
ssl_handle_shutdown crypto/openssl.c:821
us_internal_ssl_close crypto/openssl.c:871
us_internal_ssl_on_data
Return 1 instead so us_internal_ssl_close raw-closes immediately,
which detaches and frees the SSL. The unsent close_notify is
best-effort; the abrupt FIN is indistinguishable from a half-close to
the peer.
Updated 11:52 AM PT - May 7th, 2026
❌ @cirospaciari, your commit 8abe664 has 3 failures in
🧪 To try this PR locally: bunx bun-pr 30368
I didn't find any issues and the reasoning in the description is convincing, but this changes graceful-close semantics on a path every TLS socket goes through, so it's worth a human sign-off.
Extended reasoning...
Overview
This PR changes a single branch in ssl_handle_shutdown (packages/bun-usockets/src/crypto/openssl.c): when SSL_shutdown returns -1 with SSL_ERROR_WANT_READ/WANT_WRITE during a graceful close, it now returns 1 (close the raw socket immediately) instead of 0 (defer and wait for the peer). The functional diff is one line — return force_fast_shutdown ? 1 : 0; → return 1; — plus a detailed explanatory comment.
Security risks
None identified. This affects connection-teardown timing only; no auth, crypto negotiation, or data handling is touched. The worst-case effect is an abrupt TCP FIN instead of a flushed close_notify, which TLS 1.2/1.3 peers already tolerate.
Level of scrutiny
High. The diff is tiny, but it sits in the lifecycle path for every TLS socket close in Bun (HTTP(S), Bun.connect, node:tls, etc.). The argument hinges on a subtle invariant — that after SSL_shutdown fails with WANT_WRITE, SSL_SENT_SHUTDOWN is already set so us_internal_ssl_is_shut_down becomes true and on_writable/on_data short-circuit, leaving no retry path for the queued alert. Reading the surrounding code (us_internal_ssl_is_shut_down, us_internal_ssl_on_writable, us_internal_ssl_on_data, us_internal_ssl_close) supports that claim, and the LSan trace corroborates the leak. Still, this is exactly the kind of change where someone who owns the usockets TLS state machine should confirm there's no caller relying on the old deferred behavior under transient WANT_WRITE backpressure.
Other factors
- The fix is for an intermittent ASAN-only flake with no deterministic regression test, so correctness rests on the code-level argument rather than test coverage.
- The force_fast_shutdown == 1 path already returned 1 here, so the change only affects the code == 0 graceful path from end() → closeAndDetach(.normal).
- No prior reviews or outstanding comments on the PR.
What
ssl_handle_shutdown in packages/bun-usockets/src/crypto/openssl.c returned 0 ("wait for the peer") when SSL_shutdown failed with SSL_ERROR_WANT_WRITE/WANT_READ during a graceful close (force_fast_shutdown == 0). Change it to return 1 so the raw socket closes (and SSL_free runs) immediately.
Why
SSL_shutdown allocates BoringSSL's ssl->s3->write_buffer to hold the encoded close_notify alert. If the BIO write fails (kernel buffer full, peer already gone), SSL_shutdown returns -1 with WANT_WRITE. The old code returned 0 from ssl_handle_shutdown, which told us_internal_ssl_close to leave the fd open and wait for the peer.
That deferral is correct for the SSL_shutdown() == 0 case (alert flushed, waiting for the peer's reply; see the comment in us_internal_ssl_close). It's wrong here: the alert never went out, SSL_SENT_SHUTDOWN is already set on the first call, and once us_internal_ssl_is_shut_down is true, on_writable/on_data short-circuit without ever re-dispatching the queued alert. There is no retry path, so the socket stays in limbo holding s->ssl and the write buffer until some other event arrives, which may never happen.
This shows up as an intermittent LSan failure on the Debian x64-asan lane in test/js/node/http/node-https-checkServerIdentity.test.ts: the spawned child calls server.close() and exits right after the request emits error, so the lingering accepted socket never gets another event and SSL_free never runs.
This is a recurring pre-existing flake on main; see retrigger commits 8722a10109, 13a267eabb, ded11f3fb7.
Behavior
- close_notify is best-effort. Closing without it produces an abrupt FIN, which is indistinguishable from a half-close to the peer (TLS 1.3 doesn't require it; TLS 1.2 implementations tolerate it).
- The SSL_shutdown() == 0 case (wait for the peer's close_notify) is unchanged.
- force_fast_shutdown == 1 (forceful close from _destroy() / abort) already returned 1 here, so no change.
Testing
- bun bd test test/js/node/http/node-https-checkServerIdentity.test.ts: 4 pass, 0 fail.
- bun bd test test/js/node/tls/node-tls-connect.test.ts test/js/bun/net/socket.test.ts: 57 pass, 3 skip, 1 fail. The single failure (setSession() should not leak the SSL_SESSION returned by d2i_SSL_SESSION, RSS-growth threshold 40 MiB exceeded at ~293 MiB) reproduces identically on unmodified main in a macOS debug build; it is pre-existing and unrelated to this change.