
fix(ws): process pending handshakes concurrently #2359

Merged: richard-ramos merged 2 commits into master from fix/wss-socket-leak on May 13, 2026

richard-ramos (Member) commented Apr 28, 2026

Summary

This PR fixes WebSocket accept head-of-line blocking caused by slow or malformed HTTP upgrade requests.

WsTransport now accepts raw HTTP streams separately from WebSocket handshake parsing. A dispatcher owns HttpServer.acceptStream(), while bounded handshake workers parse headers with readHttpRequest() and pass valid requests to WSServer.handleRequest(). Successfully upgraded connections are queued for WsTransport.accept().

This prevents one incomplete WebSocket handshake from blocking later valid WebSocket connections until the header timeout expires.
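
The idea can be illustrated with a minimal sketch, assuming chronos is available. `Conn`, `handshake`, `worker`, and the 200 ms timeout are hypothetical stand-ins rather than the PR's code: `handshake` simulates readHttpRequest() plus WSServer.handleRequest() with a sleep, and the bound on the number of workers is left out for brevity.

```nim
import chronos

type Conn = ref object
  id: int

const HeadersTimeout = 200.milliseconds # illustrative value, not the PR's default

proc handshake(id: int, delay: Duration): Future[Conn] {.async.} =
  ## Hypothetical stand-in for header parsing + upgrade; `delay` simulates a slow peer.
  await sleepAsync(delay)
  return Conn(id: id)

proc worker(id: int, delay: Duration, accepted: AsyncQueue[Conn]) {.async.} =
  ## One handshake worker: time-box the handshake, queue the connection on success.
  try:
    let conn = await handshake(id, delay).wait(HeadersTimeout)
    await accepted.addLast(conn)
  except AsyncTimeoutError:
    discard # slow or incomplete handshake is dropped without blocking the others

proc main(): Future[seq[int]] {.async.} =
  let accepted = newAsyncQueue[Conn](4) # bounded queue of upgraded connections
  # Dispatcher role: start a worker per raw stream instead of awaiting each inline.
  let workers = @[
    worker(1, 5.seconds, accepted),       # never completes its headers in time
    worker(2, 10.milliseconds, accepted), # valid, fast handshake
  ]
  let first = await accepted.popFirst() # returns once the fast handshake is queued
  await allFutures(workers)
  return @[first.id, accepted.len]

let ids = waitFor main()
```

In this sketch the fast handshake is delivered to the consumer while the slow one is still waiting on its timeout, which is the behaviour the summary describes.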

Requires:

Should fix:

Affected Areas

  • Gossipsub
  • Transports
    WebSocket transport accept/handshake path; includes a dependency bump to nim-websock PR 193. Do not merge until that PR has landed.
  • Peer Management / Discovery
  • Protocol Logic
  • Build / Tooling
  • Other

Compatibility & Downstream Validation

Reference PRs / branches / commits demonstrating successful integration:

  • Nimbus:
    N/A

  • Waku:
    N/A

  • Codex:
    N/A

Impact on Library Users

This changes the public WsTransport.new timeout parameter from handshakeTimeout to headersTimeout; handshakeTimeout is deprecated in nim-websock.

Behaviorally, malformed or slow WebSocket handshakes are now handled inside the transport and should no longer cause the switch accept loop to stop or block valid WebSocket accepts behind header timeouts.

The WebSocket transport also gains a configurable concurrentAccepts limit, defaulting to 200, to bound concurrent handshake work and accepted-connection queueing.
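
One way to bound in-flight work is sketched below, assuming chronos. This is not the PR's implementation: it uses a bounded `AsyncQueue` as a pool of slots, and the names `slots`, `handshake`, `dispatcher`, `inFlight`, and `peak`, as well as the limit of 2, are hypothetical and chosen only for the illustration.

```nim
import chronos

const ConcurrentAccepts = 2 # illustrative; the PR's default is 200

var inFlight = 0 # handshakes currently running
var peak = 0     # highest value inFlight ever reached

proc handshake(slots: AsyncQueue[bool]) {.async.} =
  ## Hypothetical handshake that holds one slot for its whole lifetime.
  inc inFlight
  peak = max(peak, inFlight)
  try:
    await sleepAsync(10.milliseconds) # stands in for header parsing
  finally:
    dec inFlight
    discard slots.popFirstNoWait() # release the slot, waking the dispatcher

proc dispatcher(total: int) {.async.} =
  let slots = newAsyncQueue[bool](ConcurrentAccepts)
  var workers: seq[Future[void]]
  for _ in 0 ..< total:
    # Suspends here while ConcurrentAccepts handshakes are already in flight.
    await slots.addLast(true)
    workers.add(handshake(slots))
  await allFutures(workers)

waitFor dispatcher(5)
```

With five simulated handshakes and a limit of two, `peak` never exceeds the limit, so a flood of slow peers can occupy at most that many workers at once.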

Risk Assessment

Risk is moderate because this changes the WebSocket accept path and task lifecycle.

Main risks:

  • Accept ordering is no longer deterministic when multiple WebSocket handshakes are in flight.
  • Downstreams using the old named constructor argument handshakeTimeout must rename it to headersTimeout.
  • Very large malformed-connection floods can still consume bounded accept worker capacity, sockets, TLS resources, or OS backlog, but they should no longer serialize all WebSocket accepts one header timeout at a time.

Mitigations:

  • Handshake concurrency is bounded.
  • Successful accepts are queued through a bounded internal queue.
  • Malformed and slow handshakes are logged and closed inside the transport.
  • WebSocket transport tests include a regression for slow incomplete headers not blocking a valid accept.

References

Additional Notes

Verified locally with:

nim c --path:. tests/libp2p/transports/test_ws.nim
./tests/libp2p/transports/test_ws --output-level=VERBOSE

Copilot AI left a comment


Pull request overview

Updates WsTransport’s inbound accept path to avoid head-of-line blocking from slow/malformed HTTP upgrade requests by splitting raw stream acceptance from WebSocket handshake parsing and bounding handshake concurrency.

Changes:

  • Refactors WebSocket accept flow to use an accept dispatcher + bounded handshake workers and a bounded queue of successfully-upgraded connections.
  • Renames the public constructor argument from handshakeTimeout to headersTimeout and adds concurrentAccepts to bound in-flight handshake work.
  • Adds a regression test for “slow headers don’t block valid accepts” and adjusts stream transport tests to tolerate nondeterministic accept order.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
libp2p/transports/wstransport.nim | Implements concurrent accept/handshake pipeline and new configuration knobs (headersTimeout, concurrentAccepts).
tests/libp2p/transports/test_ws.nim | Adds regression test to ensure slow/incomplete headers don’t block a valid accept.
tests/libp2p/transports/stream_tests.nim | Updates connection-parallelism test to be robust to nondeterministic accept order in concurrent transports.

Comment thread libp2p/transports/wstransport.nim Outdated
Comment thread tests/libp2p/transports/test_ws.nim Outdated
Comment thread libp2p/transports/wstransport.nim Outdated
Comment on lines +192 to +197
  proc(): Future[Connection] {.async: (raises: [CatchableError]).} =
    let req = await readHttpRequest(stream, server.headersTimeout)
    let wstransp = await self.wsserver.handleRequest(req)
    return await self.connHandler(wstransp, server.secure, Direction.In)
)()
.wait(self.headersTimeout)

Is this proc only here so you can call wait(self.headersTimeout)? If so, maybe create a template so you can write:

withTimeout(self.headersTimeout):
  your_code

Comment thread libp2p/transports/wstransport.nim Outdated
Comment thread libp2p/transports/wstransport.nim
Copilot AI review requested due to automatic review settings April 28, 2026 20:49
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread tests/libp2p/transports/stream_tests.nim Outdated
Comment thread libp2p/transports/wstransport.nim
Comment thread libp2p.nimble Outdated
Comment thread .pinned Outdated
Comment thread tests/libp2p/transports/test_ws.nim Outdated

codecov-commenter commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 71.59091% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.81%. Comparing base (c6ac669) to head (6467d4f).

Files with missing lines | Patch % | Lines
libp2p/transports/wstransport.nim | 71.59% | 50 Missing ⚠️

@@            Coverage Diff             @@
##           master    #2359      +/-   ##
==========================================
+ Coverage   73.77%   73.81%   +0.03%     
==========================================
  Files         150      150              
  Lines       19810    19910     +100     
  Branches       19       19              
==========================================
+ Hits        14615    14696      +81     
- Misses       5195     5214      +19     
Files with missing lines | Coverage Δ
libp2p/transports/wstransport.nim | 81.79% <71.59%> (+0.06%) ⬆️

... and 1 file with indirect coverage changes


Copilot AI review requested due to automatic review settings April 28, 2026 22:27
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread libp2p/transports/wstransport.nim Outdated
Comment thread tests/libp2p/transports/test_ws.nim Outdated
Comment on lines +117 to +119
slow.close()
slowClosed = true


Copilot AI Apr 28, 2026


slow.close() is used after the accept/dial succeeds, but it does not wait for the underlying transport to actually close. Since this test is asserting timing behavior and then immediately tears down connections/transports, prefer await slow.closeWait() here (with appropriate error handling) to reduce flakiness from lingering sockets during teardown.

Comment thread libp2p/transports/wstransport.nim Outdated
Comment on lines +216 to +217
if not accepted:
await closeHttpStream(stream)

Does it make sense to do this in a defer? That would avoid repeating the same pattern in the CancelledError branch as well.
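
A minimal sketch of the defer pattern the reviewer is asking about, in plain synchronous Nim. `FakeStream`, `handle`, and this synchronous `closeHttpStream` are hypothetical stand-ins; the real helper in the PR is awaited inside an async proc.

```nim
type FakeStream = ref object
  closed: bool

proc closeHttpStream(s: FakeStream) =
  ## Hypothetical synchronous stand-in for the async helper in the PR.
  s.closed = true

proc handle(s: FakeStream, succeed: bool): bool =
  var accepted = false
  defer:
    # Runs on every exit path: normal return, early return, or exception.
    if not accepted:
      closeHttpStream(s)
  if not succeed:
    raise newException(ValueError, "bad handshake")
  accepted = true
  return true
```

Because the cleanup is written once in the `defer` block, no separate close call is needed in each `except` branch.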

Comment thread libp2p/transports/wstransport.nim Outdated
Comment thread libp2p/transports/wstransport.nim Outdated
Comment on lines +281 to +289
finally:
  for fut in acceptFuts:
    if not fut.finished:
      await noCancel fut.cancelAndWait()
    elif fut.completed():
      try:
        await closeHttpStream(fut.read())
      except CatchableError as exc:
        trace "Error reading completed WS accept stream", description = exc.msg

could this be done in defer? it would avoid this gigantic try block and indentation

Comment on lines +405 to +406
self.connections[Direction.In].mapIt(it.close()) &
self.connections[Direction.Out].mapIt(it.close())

self.connections.values().mapIt(it.close()) ?

@github-project-automation github-project-automation Bot moved this to In Progress in nim-libp2p Apr 29, 2026
@richard-ramos richard-ramos force-pushed the fix/wss-socket-leak branch from 0da86f0 to 262056f Compare May 13, 2026 14:43
Copilot AI review requested due to automatic review settings May 13, 2026 15:45
@richard-ramos richard-ramos force-pushed the fix/wss-socket-leak branch from 262056f to 9cc67a9 Compare May 13, 2026 15:45
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

libp2p/transports/wstransport.nim:389

  • stop() enqueues the nil sentinel via notifyAcceptClosed() before cancelling acceptLoop / handshakeFuts. A handshake worker can still successfully enqueue a real Connection after this sentinel, causing accept() to pop nil and raise TransportClosedError even though there are valid connections still queued behind it (leaving them unaccepted/leaked).

Consider only signalling closure after all handshake workers are stopped (or preventing workers from enqueueing once shutdown begins), so the sentinel cannot be inserted ahead of real connections.

  self.running = false # mark stopped as soon as possible
  self.notifyAcceptClosed()

  try:
    trace "Stopping WS transport"
    await procCall Transport(self).stop() # call base

    var toWait: seq[Future[void]]
    if not isNil(self.acceptLoop) and not self.acceptLoop.finished:
      toWait.add(self.acceptLoop.cancelAndWait())

    for fut in self.handshakeFuts:
      if not fut.finished:
        toWait.add(fut.cancelAndWait())
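
The ordering suggested in the comment above can be sketched as follows, assuming chronos. This is an illustration, not the transport's actual stop(): `Conn`, `worker`, `stopAll`, and the delays are hypothetical. Workers are cancelled and awaited first, and only then is the nil sentinel enqueued, so it always sits behind any real connections.

```nim
import chronos

type Conn = ref object
  id: int

proc worker(id: int, delay: Duration, queue: AsyncQueue[Conn]) {.async.} =
  ## Hypothetical handshake worker: enqueues a connection after `delay`.
  await sleepAsync(delay)
  await queue.addLast(Conn(id: id))

proc stopAll(workers: seq[Future[void]], queue: AsyncQueue[Conn]) {.async.} =
  # 1. Stop every worker first so nothing can be enqueued afterwards.
  for fut in workers:
    if not fut.finished:
      await fut.cancelAndWait()
  # 2. Only then signal closure with the nil sentinel.
  await queue.addLast(nil)

proc main(): Future[seq[bool]] {.async.} =
  let queue = newAsyncQueue[Conn](8)
  let workers = @[
    worker(1, 10.milliseconds, queue), # finishes before shutdown
    worker(2, 5.seconds, queue),       # still in flight at shutdown
  ]
  await sleepAsync(50.milliseconds)
  await stopAll(workers, queue)
  var sawNil: seq[bool]
  while queue.len > 0:
    let item = await queue.popFirst()
    sawNil.add(item.isNil)
  return sawNil

let order = waitFor main()
```

Here the consumer drains the real connection before it reaches the sentinel, so nothing is left queued behind the closure signal.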

Comment thread libp2p/transports/wstransport.nim
Comment thread tests/libp2p/transports/test_ws.nim Outdated
Comment thread libp2p.nimble
@richard-ramos richard-ramos force-pushed the fix/wss-socket-leak branch from 9cc67a9 to f5e4d60 Compare May 13, 2026 16:34
@richard-ramos richard-ramos enabled auto-merge (squash) May 13, 2026 16:35
Copilot AI review requested due to automatic review settings May 13, 2026 17:31
@richard-ramos richard-ramos force-pushed the fix/wss-socket-leak branch from 3ab31d6 to 6467d4f Compare May 13, 2026 17:31
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


const
  DefaultHeadersTimeout = 3.seconds
  DefaultConcurrentAccepts = 200
Comment thread .pinned
@richard-ramos richard-ramos disabled auto-merge May 13, 2026 19:07
@richard-ramos richard-ramos merged commit adff819 into master May 13, 2026
51 of 52 checks passed
@richard-ramos richard-ramos deleted the fix/wss-socket-leak branch May 13, 2026 19:07
@github-project-automation github-project-automation Bot moved this from In Progress to done in nim-libp2p May 13, 2026
