Skip to content

usockets: don't defer TLS close when close_notify flush fails#30368

Merged
dylan-conway merged 1 commit into
mainfrom
claude/fix-tls-close-notify-shutdown-leak
May 7, 2026
Merged

usockets: don't defer TLS close when close_notify flush fails#30368
dylan-conway merged 1 commit into
mainfrom
claude/fix-tls-close-notify-shutdown-leak

Conversation

@cirospaciari

Copy link
Copy Markdown
Member

What

ssl_handle_shutdown in packages/bun-usockets/src/crypto/openssl.c returned 0 ("wait for the peer") when SSL_shutdown failed with SSL_ERROR_WANT_WRITE / WANT_READ during a graceful close (force_fast_shutdown == 0). Change it to return 1 so the raw socket closes (and SSL_free runs) immediately.

Why

SSL_shutdown allocates BoringSSL's ssl->s3->write_buffer to hold the encoded close_notify alert. If the BIO write fails (kernel buffer full, peer already gone), SSL_shutdown returns -1 with WANT_WRITE. The old code returned 0 from ssl_handle_shutdown, which told us_internal_ssl_close to leave the fd open and wait for the peer.

That deferral is correct for the SSL_shutdown() == 0 case (alert flushed, waiting for the peer's reply — see the comment in us_internal_ssl_close). It's wrong here: the alert never went out, SSL_SENT_SHUTDOWN is already set on the first call, and once us_internal_ssl_is_shut_down is true on_writable/on_data short-circuit without ever re-dispatching the queued alert. There is no retry path, so the socket stays in limbo holding s->ssl and the write buffer until some other event arrives — which may never happen.

This shows up as an intermittent LSan failure on the Debian x64-asan lane in test/js/node/http/node-https-checkServerIdentity.test.ts: the spawned child calls server.close() and exits right after the request emits error, so the lingering accepted socket never gets another event and SSL_free never runs.

Direct leak of 417 byte(s) in 1 object(s) allocated from:
  bssl::SSLBuffer::EnsureCap          vendor/boringssl/ssl/ssl_buffer.cc:72
  bssl::do_tls_write                  vendor/boringssl/ssl/s3_pkt.cc:194
  bssl::tls_dispatch_alert            vendor/boringssl/ssl/s3_pkt.cc:373
  SSL_shutdown                        vendor/boringssl/ssl/ssl_lib.cc:1039
  ssl_handle_shutdown                 packages/bun-usockets/src/crypto/openssl.c:821
  us_internal_ssl_close               packages/bun-usockets/src/crypto/openssl.c:871
  us_internal_ssl_on_data

This is a recurring pre-existing flake on main — see retrigger commits 8722a10109, 13a267eabb, ded11f3fb7.

Behavior

  • The unsent close_notify is best-effort. Closing without it produces an abrupt FIN, which is indistinguishable from a half-close to the peer (TLS 1.3 doesn't require it; TLS 1.2 implementations tolerate it).
  • The deferred-graceful-close path (SSL_shutdown() == 0 → wait for the peer's close_notify) is unchanged.
  • force_fast_shutdown == 1 (forceful close from _destroy() / abort) already returned 1 here, so no change.

Testing

  • bun bd test test/js/node/http/node-https-checkServerIdentity.test.ts — 4 pass, 0 fail.
  • bun bd test test/js/node/tls/node-tls-connect.test.ts test/js/bun/net/socket.test.ts — 57 pass, 3 skip, 1 fail. The single failure (setSession() should not leak the SSL_SESSION returned by d2i_SSL_SESSION, RSS-growth threshold 40 MiB exceeded at ~293 MiB) reproduces identically on unmodified main in a macOS debug build — pre-existing, unrelated to this change.
  • The LSan leak itself is Linux-ASAN-only and timing-dependent; not reproducible on macOS, so there is no deterministic regression test.

ssl_handle_shutdown returned 0 ("wait for the peer") when SSL_shutdown
failed with SSL_ERROR_WANT_WRITE/WANT_READ during a graceful close
(force_fast_shutdown==0). That deferral makes sense for the
SSL_shutdown()==0 case (close_notify flushed, waiting for the peer's
reply) but not here: the alert never went out, SSL_SENT_SHUTDOWN is
already set, and on_writable/on_data short-circuit through
is_shut_down without re-dispatching the queued alert. Nothing will
ever close the fd, and SSL_free is only reached via raw close, so
s->ssl and the BoringSSL write_buffer holding the encoded alert leak
until some other socket event happens to arrive.

Observed as an intermittent LeakSanitizer failure in
test/js/node/http/node-https-checkServerIdentity.test.ts on the Debian
x64-asan lane (the spawned child exits right after server.close(), so
no further event arrives):

  Direct leak of 417 byte(s) in 1 object(s) allocated from:
    bssl::SSLBuffer::EnsureCap          ssl/ssl_buffer.cc:72
    bssl::do_tls_write                  ssl/s3_pkt.cc:194
    bssl::tls_dispatch_alert            ssl/s3_pkt.cc:373
    SSL_shutdown                        ssl/ssl_lib.cc:1039
    ssl_handle_shutdown                 crypto/openssl.c:821
    us_internal_ssl_close               crypto/openssl.c:871
    us_internal_ssl_on_data

Return 1 instead so us_internal_ssl_close raw-closes immediately,
which detaches and frees the SSL. The unsent close_notify is
best-effort; the abrupt FIN is indistinguishable from a half-close to
the peer.
@robobun

robobun commented May 7, 2026

Copy link
Copy Markdown
Collaborator
Updated 11:52 AM PT - May 7th, 2026

@cirospaciari, your commit 8abe664 has 3 failures in Build #52622 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30368

That installs a local version of the PR into your bun-30368 executable, so you can run:

bun-30368 --bun

@coderabbitai

coderabbitai Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 44f1b48f-0193-48b2-89e4-464bdbb26994

📥 Commits

Reviewing files that changed from the base of the PR and between c5a2f8f and 8abe664.

📒 Files selected for processing (1)
  • packages/bun-usockets/src/crypto/openssl.c

Walkthrough

This PR modifies TLS shutdown error handling in the OpenSSL integration. When SSL_shutdown() returns 0 and the close-notify TLS alert cannot be flushed due to a BIO write failure, the function now unconditionally returns 1 instead of conditionally checking force_fast_shutdown. An explanatory comment documents that this prevents indefinite TLS alert queueing.

Changes

TLS Shutdown Error Handling

Layer / File(s) Summary
TLS Shutdown Error Path
packages/bun-usockets/src/crypto/openssl.c
ssl_handle_shutdown error handling for SSL_shutdown return 0 with failed close-notify flush now unconditionally returns 1 instead of conditionally depending on force_fast_shutdown. Expanded comment explains this prevents indefinite TLS alert queueing when SSL_SENT_SHUTDOWN is already set.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: preventing deferred TLS closure when close_notify fails to flush.
Description check ✅ Passed The description is comprehensive and well-structured, covering what changed, why, behavior implications, and testing results—exceeding the template requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Doing a large amount of fetch requests results in a memory leak #20912 - The crash stack trace shows us_internal_ssl_socket_close in the uSockets TLS close path — the exact code modified by this PR. The memory leak under high HTTPS fetch load is consistent with SSL objects not being freed when close_notify flush fails.

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #20912

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any issues and the reasoning in the description is convincing, but this changes graceful-close semantics on a path every TLS socket goes through, so it's worth a human sign-off.

Extended reasoning...

Overview

This PR changes a single branch in ssl_handle_shutdown (packages/bun-usockets/src/crypto/openssl.c): when SSL_shutdown returns -1 with SSL_ERROR_WANT_READ/WANT_WRITE during a graceful close, it now returns 1 (close the raw socket immediately) instead of 0 (defer and wait for the peer). The functional diff is one line — return force_fast_shutdown ? 1 : 0;return 1; — plus a detailed explanatory comment.

Security risks

None identified. This affects connection-teardown timing only; no auth, crypto negotiation, or data handling is touched. The worst-case effect is an abrupt TCP FIN instead of a flushed close_notify, which TLS 1.2/1.3 peers already tolerate.

Level of scrutiny

High. The diff is tiny, but it sits in the lifecycle path for every TLS socket close in Bun (HTTP(S), Bun.connect, node:tls, etc.). The argument hinges on a subtle invariant — that after SSL_shutdown fails with WANT_WRITE, SSL_SENT_SHUTDOWN is already set so us_internal_ssl_is_shut_down becomes true and on_writable/on_data short-circuit, leaving no retry path for the queued alert. Reading the surrounding code (us_internal_ssl_is_shut_down, us_internal_ssl_on_writable, us_internal_ssl_on_data, us_internal_ssl_close) supports that claim, and the LSan trace corroborates the leak. Still, this is exactly the kind of change where someone who owns the usockets TLS state machine should confirm there's no caller relying on the old deferred behavior under transient WANT_WRITE backpressure.

Other factors

  • The fix is for an intermittent ASAN-only flake with no deterministic regression test, so correctness rests on the code-level argument rather than test coverage.
  • The force_fast_shutdown == 1 path already returned 1 here, so the change only affects the code == 0 graceful path from end()closeAndDetach(.normal).
  • No prior reviews or outstanding comments on the PR.

@dylan-conway dylan-conway merged commit d352dfd into main May 7, 2026
76 of 80 checks passed
@dylan-conway dylan-conway deleted the claude/fix-tls-close-notify-shutdown-leak branch May 7, 2026 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants