Skip to content

WebSocket: release pending-activity ref when close()/terminate() called during CONNECTING#29918

Merged
Jarred-Sumner merged 6 commits into
mainfrom
farm/54e22d98/websocket-connecting-close-leak
Apr 29, 2026
Merged

WebSocket: release pending-activity ref when close()/terminate() called during CONNECTING#29918
Jarred-Sumner merged 6 commits into
mainfrom
farm/54e22d98/websocket-connecting-close-leak

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Repro

const net = require('net');
const server = net.createServer(() => {}).listen(0, '127.0.0.1', () => {
  const ws = new WebSocket(`ws://127.0.0.1:${server.address().port}`);
  ws.onclose = () => { server.close(); console.log('closed'); };
  ws.close();
});

On current main this script never prints closed and never exits. ws.readyState stays CLOSING (2) forever and the WebSocket is never garbage-collected — heapStats() shows every such instance pinned indefinitely. Same for terminate().

Cause

connect() takes a pending-activity ref (incPendingActivityCount(), WebSocket.cpp:667) that is balanced only by didConnect / didFailWithErrorCode / didClose.

When close()/terminate() is called while m_state == CONNECTING, the handler sets m_state = CLOSING, calls Bun__WebSocketHTTP{S,}Client__cancel on the upgrade client, and returns. The Zig cancel() (WebSocketUpgradeClient.zig:439) clears outgoing_websocket without calling didAbruptClose — by design, since C++ initiated the cancel. But C++ never finishes the close either, so none of the completion callbacks fire, m_state never reaches CLOSED, m_pendingActivityCount stays ≥ 1, and hasPendingActivity() returns true permanently — the JS wrapper is a GC root forever and the native ref() is never deref()ed.

As a visible side effect, the existing #16995 test leaves ~4096 WebSocket instances in heapStats() on current main.

Fix

Extract the CONNECTING branch of close()/terminate() into failConnectingWebSocket(). After cancelling the upgrade client it posts a task that:

  1. Sets m_state = CLOSED
  2. Fires an error event followed by a close event with code 1006, wasClean = false — per the spec's "fail the WebSocket connection" steps; matches Chrome/Firefox and Node ws
  3. Calls disablePendingActivity() to release the ref taken in connect()

The native-callback path (m_native.onClose) is handled the same way as didClose.

Verification

New tests in test/js/web/websocket/websocket-close-connecting.test.ts for both close() and terminate():

  • fires error + close events and transitions to CLOSED — TCP server that never responds keeps the socket in CONNECTING; after close() an error event fires, then a close event with {code: 1006, wasClean: false}, and readyState === CLOSED.
  • does not leak — creates 64 sockets, closes each during CONNECTING, verifies via heapStats().objectTypeCounts.WebSocket that they are collected.

All four time out on the unpatched build (close event never fires).

…ed during CONNECTING

connect() takes a pending-activity ref that is only released by
didConnect / didFailWithErrorCode / didClose. When close() or
terminate() is called while still CONNECTING, the Zig upgrade
client's cancel() clears its back-pointer without calling any of
those, so the WebSocket stayed in CLOSING with m_pendingActivityCount
> 0 forever — hasPendingActivity() permanently true, never GC'd, and
no close event fired.

failConnectingWebSocket() now cancels the upgrade and queues a task
that transitions to CLOSED, fires the close event (code 1006,
wasClean=false — matching browsers/Node ws), and calls
disablePendingActivity().
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 999baa81-2b7c-42a6-a2bc-be9700595100

📥 Commits

Reviewing files that changed from the base of the PR and between 9079d5b and 81bd2a9.

📒 Files selected for processing (3)
  • src/bun.js/bindings/webcore/WebSocket.cpp
  • src/bun.js/bindings/webcore/WebSocket.h
  • test/js/web/websocket/websocket-close-connecting.test.ts

Walkthrough

Introduces a dedicated failConnectingWebSocket() helper method to centralize handling of WebSocket shutdown during the CONNECTING state. Includes new tests validating event ordering and memory behavior for both close() and terminate() operations on connecting WebSockets.

Changes

Cohort / File(s) Summary
WebSocket Connection Failure Handler
src/bun.js/bindings/webcore/WebSocket.cpp, src/bun.js/bindings/webcore/WebSocket.h
Introduces private failConnectingWebSocket() method to unify duplicate shutdown logic in close() and terminate(). Handles state transitions via scheduled tasks, emits error and close events with code 1006 and wasClean false, and balances pending activity references from initialization.
WebSocket CONNECTING State Tests
test/js/web/websocket/websocket-close-connecting.test.ts
New test suite validating both ws.close() and ws.terminate() during CONNECTING state. Verifies event ordering (error before close), close event properties (readyState, code, wasClean), and absence of memory leaks under forced garbage collection with batch operations.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically describes the main change: releasing the pending-activity reference when close()/terminate() is called during CONNECTING state.
Description check ✅ Passed The description comprehensively covers both required sections: detailed explanation of what the PR does (root cause, fix approach, verification) and how it was verified (new tests for event order and leak detection).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 11:49 PM PT - Apr 28th, 2026

@Jarred-Sumner, your commit 8a5dc8c has 6 failures in Build #48949 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29918

That installs a local version of the PR into your bun-29918 executable, so you can run:

bun-29918 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Random breakage on WebSocket closure #17111 - Random breakage on WebSocket closure: terminate() on a connection that may still be in CONNECTING state causes a crash. This PR adds proper handling for terminate()/close() during CONNECTING state.

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #17111

🤖 Generated with Claude Code

Comment thread src/bun.js/bindings/webcore/WebSocket.cpp
Comment thread test/js/web/websocket/websocket.test.js Outdated
robobun added 3 commits April 29, 2026 02:11
Per the WHATWG WebSocket spec, close() while CONNECTING runs 'fail the
WebSocket connection', which requires an error event before the close
event. Mirrors the wasConnecting && isConnectionError branch in
didReceiveClose(). Matches Chrome/Firefox and npm ws.
websocket.test.js has several pre-existing tests (300× TLS handshakes,
1000× connect+close, external echo host) that run near the 5s default
timeout under debug+ASAN regardless of this change. Keeping the new
tests in a dedicated file — consistent with the other per-topic test
files in this directory — lets them be evaluated without those
unrelated borderline tests.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing both nits — the error→close ordering now matches spec/Chrome/Firefox/ws and the test timeout is gone. The change looks correct to me, but since it alters user-visible event behavior and touches the pending-activity refcounting path (and CI is currently red on build-zig/build-bun), I'd like a human to give it a final look.

Extended reasoning...

Overview

This PR fixes a leak where calling close() or terminate() on a WebSocket while readyState === CONNECTING left the socket permanently in CLOSING with m_pendingActivityCount > 0, pinning the JS wrapper as a GC root and never firing a close event. The fix extracts the duplicated CONNECTING-branch logic from close()/terminate() into a new failConnectingWebSocket() helper (WebSocket.cpp:923-972) that cancels the upgrade client, then posts a task which transitions to CLOSED, dispatches errorclose (per WHATWG "fail the WebSocket connection"), and calls disablePendingActivity() to balance the ref taken in connect(). A new test file covers event ordering, close-event properties, and a 64-socket heap-stats leak check for both close() and terminate().

I previously left two inline nits on an earlier revision — (1) missing error event before close, and (2) an explicit test timeout — both of which the author has addressed and resolved. I verified the Zig-side cancel() (WebSocketUpgradeClient.zig:439-459) clears outgoing_websocket before closing the TCP socket, so no completion callback can race the posted task; the m_state == CLOSED early-return guard in the task is defensive and effectively unreachable.

Security risks

None identified. No new untrusted-input parsing, no auth/crypto/permissions surface. The error message string is a fixed literal. The change is purely internal lifecycle/event sequencing.

Level of scrutiny

Moderate-to-high. This is core C++ in the WebSocket client touching refcounting (disablePendingActivity() performs a deref()) and introducing a user-visible behavior change: code that previously saw no events when closing during CONNECTING will now receive error then close. That's spec-correct and matches browsers/npm ws, but it's the kind of observable change that benefits from a maintainer's sign-off. The postTask + Ref { *this } + disablePendingActivity() pattern mirrors the existing m_upgradeClient == nullptr failure path in connect() and the tail of didFailWithErrorCode(), which gives confidence it's idiomatic for this file.

Other factors

  • CI (Build #48938) is red on build-zig / build-bun across several targets. The C++ looks syntactically consistent with surrounding code (e.g., createErrorEventInit(protectedThis, ...) matches the existing call in the connect() failure path), so the failures may be infra-related, but I can't confirm without the logs.
  • New tests are well-targeted and would have caught the original leak (they time out on the unpatched build per the PR description).
  • The github-actions bot suggests this may also fix #17111 (crash on terminate() during CONNECTING), which raises the value of human review to confirm that linkage.

@Jarred-Sumner Jarred-Sumner merged commit 8d2674a into main Apr 29, 2026
69 of 74 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/54e22d98/websocket-connecting-close-leak branch April 29, 2026 06:00
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…ed during CONNECTING (oven-sh#29918)

## Repro

```js
const net = require('net');
const server = net.createServer(() => {}).listen(0, '127.0.0.1', () => {
  const ws = new WebSocket(`ws://127.0.0.1:${server.address().port}`);
  ws.onclose = () => { server.close(); console.log('closed'); };
  ws.close();
});
```

On current main this script never prints `closed` and never exits.
`ws.readyState` stays `CLOSING` (2) forever and the WebSocket is never
garbage-collected — `heapStats()` shows every such instance pinned
indefinitely. Same for `terminate()`.

## Cause

`connect()` takes a pending-activity ref (`incPendingActivityCount()`,
[WebSocket.cpp:667](https://github.com/oven-sh/bun/blob/main/src/bun.js/bindings/webcore/WebSocket.cpp#L667))
that is balanced only by `didConnect` / `didFailWithErrorCode` /
`didClose`.

When `close()`/`terminate()` is called while `m_state == CONNECTING`,
the handler sets `m_state = CLOSING`, calls
`Bun__WebSocketHTTP{S,}Client__cancel` on the upgrade client, and
returns. The Zig `cancel()`
([WebSocketUpgradeClient.zig:439](https://github.com/oven-sh/bun/blob/main/src/http/websocket_client/WebSocketUpgradeClient.zig#L439))
clears `outgoing_websocket` **without** calling `didAbruptClose` — by
design, since C++ initiated the cancel. But C++ never finishes the close
either, so none of the completion callbacks fire, `m_state` never
reaches `CLOSED`, `m_pendingActivityCount` stays ≥ 1, and
`hasPendingActivity()` returns `true` permanently — the JS wrapper is a
GC root forever and the native `ref()` is never `deref()`ed.

As a visible side effect, the existing `oven-sh#16995` test leaves ~4096
`WebSocket` instances in `heapStats()` on current main.

## Fix

Extract the CONNECTING branch of `close()`/`terminate()` into
`failConnectingWebSocket()`. After cancelling the upgrade client it
posts a task that:
1. Sets `m_state = CLOSED`
2. Fires an `error` event followed by a `close` event with code `1006`,
`wasClean = false` — per the spec's "fail the WebSocket connection"
steps; matches Chrome/Firefox and Node `ws`
3. Calls `disablePendingActivity()` to release the ref taken in
`connect()`

The native-callback path (`m_native.onClose`) is handled the same way as
`didClose`.

## Verification

New tests in `test/js/web/websocket/websocket-close-connecting.test.ts`
for both `close()` and `terminate()`:
- `fires error + close events and transitions to CLOSED` — TCP server
that never responds keeps the socket in CONNECTING; after `close()` an
`error` event fires, then a `close` event with `{code: 1006, wasClean:
false}`, and `readyState === CLOSED`.
- `does not leak` — creates 64 sockets, closes each during CONNECTING,
verifies via `heapStats().objectTypeCounts.WebSocket` that they are
collected.

All four time out on the unpatched build (close event never fires).

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants