
Conversation

Contributor

@FelixVaughan FelixVaughan commented Nov 1, 2025

Rationale

Addresses #3648

Pool distributes load unevenly across connections when used with load-balanced endpoints (e.g., Kubernetes Services). It always selects the first available connection, causing some backend servers to be overused while others remain idle.

RoundRobinPool solves this by cycling through connections in a round-robin fashion, ensuring even distribution of requests across all backend servers.
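For illustration, a minimal sketch of that selection difference, assuming an array of pooled `Client` objects with an availability flag (the helpers and the `busy` field are illustrative stand-ins, not the PR's actual `kGetDispatcher` implementation):

```js
// Pool-style selection: always take the first free connection, so a light,
// steady request rate keeps hitting the same warm socket.
function pickFirstFree (clients) {
  return clients.find((client) => !client.busy) ?? null
}

// Round-robin selection: remember where the last pick happened and continue
// from the next slot, so consecutive requests spread across all connections.
let rrIndex = 0
function pickRoundRobin (clients) {
  for (let i = 0; i < clients.length; i++) {
    const idx = (rrIndex + i) % clients.length
    if (!clients[idx].busy) {
      rrIndex = (idx + 1) % clients.length
      return clients[idx]
    }
  }
  return null // everything is busy; the pool queues the request instead
}
```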

Changes

Pool.js distribution (chart: pool)

RoundRobinPool.js distribution (chart: rr-pool)

Features

  • New RoundRobinPool class (lib/dispatcher/round-robin-pool.js); see the usage sketch below
  • Documentation (docs/docs/api/RoundRobinPool.md)
  • Tests
  • TS definitions
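A hedged usage sketch, assuming the class mirrors Pool's constructor and options and is re-exported from the package root (both assumptions, not confirmed by this description):

```js
// Assumed root export; the class itself lives at lib/dispatcher/round-robin-pool.js.
import { RoundRobinPool } from 'undici'

// Pool-like constructor shape is assumed: origin plus standard Pool options.
const pool = new RoundRobinPool('http://my-service.default.svc.cluster.local', {
  connections: 4
})

const { statusCode, body } = await pool.request({ method: 'GET', path: '/health' })
console.log(statusCode, await body.text())

await pool.close()
```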

Status

  • I have read and agreed to the Developer's Certificate of Origin
  • Tested (10 comprehensive tests including distribution verification)
  • Benchmarked (optional) - Maintains same performance characteristics as Pool
  • Documented (API docs, examples, and use case guidance)
  • Review ready
  • In review
  • Merge ready

@codecov-commenter

codecov-commenter commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 88.48921% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.84%. Comparing base (55eeeb3) to head (8916027).

Files with missing lines Patch % Lines
lib/dispatcher/round-robin-pool.js 88.32% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4650      +/-   ##
==========================================
- Coverage   92.86%   92.84%   -0.02%     
==========================================
  Files         106      107       +1     
  Lines       33272    33411     +139     
==========================================
+ Hits        30897    31020     +123     
- Misses       2375     2391      +16     

☔ View full report in Codecov by Sentry.

@metcoder95 metcoder95 requested a review from mcollina November 2, 2025 09:37
}
})

test('basic get', async (t) => {
Member

As this implementation relies on the round-robin algorithm it implements, I'd suggest that the tests also account for that

Contributor Author

Added a "verifies round-robin kGetDispatcher cycling algorithm" test to complement the "round-robin wraps around correctly" test
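A rough sketch of the kind of distribution check involved, not the PR's actual test; the require path, the constructor options, and the expectation that sequential requests touch more than one connection are all assumptions here:

```js
const { test } = require('node:test')
const assert = require('node:assert')
const { createServer } = require('node:http')
// Illustrative path taken from the PR description; the module's export shape is assumed.
const RoundRobinPool = require('../lib/dispatcher/round-robin-pool')

test('sequential requests are spread across connections', async (t) => {
  const seen = new Set()
  const server = createServer((req, res) => {
    // Each distinct client socket corresponds to one pooled connection.
    seen.add(`${req.socket.remoteAddress}:${req.socket.remotePort}`)
    res.end('ok')
  })
  await new Promise((resolve) => server.listen(0, resolve))
  t.after(() => server.close())

  const pool = new RoundRobinPool(`http://localhost:${server.address().port}`, {
    connections: 3
  })
  t.after(() => pool.close())

  for (let i = 0; i < 9; i++) {
    const { body } = await pool.request({ path: '/', method: 'GET' })
    await body.text()
  }

  // A plain Pool would typically serve all of these over one connection;
  // a round-robin pool is expected to exercise more than one.
  assert.ok(seen.size > 1)
})
```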

return new Client(origin, opts)
}

class RoundRobinPool extends PoolBase {
Member

I wonder if we can also provide a similar approach to BalancedPool to account for faulty upstreams (Clients) that are misbehaving, and possibly re-balance distribution according to implementer-provided options

Contributor Author

Plus one for this. Would be happy to revisit and see what we can do

@mcollina
Member

mcollina commented Nov 2, 2025

The reason the default is to always reuse the last used connection is to avoid the inherent risk of keep-alive race conditions.

IMHO, this should take into account the keep-alive timeout of the individual connection, and just skip "old" ones that are most likely to cause an ECONNRESET. If a connection is over 80% of its keepalive, it's probably better to leave it alone unless there is a different option. This could also be handled in the client itself as it would benefit all pools tbh. Without that change, users of this would see an increase in ECONNRESET errors.

I'm not sure how you guarantee that each connection actually goes to a different server. This seems unlikely, as it depends on the underlying load balancer's algorithm.
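On the keep-alive point above, a minimal sketch of the skip-near-expiry idea; the `lastUsedAt`, `keepAliveTimeoutMs`, and `busy` fields are hypothetical stand-ins, not undici's internal client state:

```js
// Treat a connection as risky to reuse once it has sat idle for more than
// ~80% of its keep-alive window: the server may already be tearing it down,
// which is what surfaces to callers as ECONNRESET.
function isSafeToReuse (client, now = Date.now()) {
  const idleFor = now - client.lastUsedAt            // hypothetical timestamp
  return idleFor < client.keepAliveTimeoutMs * 0.8   // hypothetical per-connection timeout
}

// Round-robin selection that skips both busy and near-expiry connections.
function pickNextSafe (clients, rrIndex) {
  for (let i = 0; i < clients.length; i++) {
    const idx = (rrIndex + i) % clients.length
    const candidate = clients[idx]
    if (!candidate.busy && isSafeToReuse(candidate)) {
      return { client: candidate, nextIndex: (idx + 1) % clients.length }
    }
  }
  // Nothing safe to reuse: open a fresh connection or queue the request,
  // rather than racing a socket the server is about to close.
  return { client: null, nextIndex: rrIndex }
}
```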

@metcoder95
Member

> This could also be handled in the client itself as it would benefit all pools tbh.

You're referring to the PoolBase, aren't you?
I'm afraid that changing that behavior can break contracts that callers already rely on.

> IMHO, this should take into account the keep-alive timeout of the individual connection, and just skip "old" ones that are most likely to cause an ECONNRESET. If a connection is over 80% of its keepalive, it's probably better to leave it alone unless there is a different option.

What option do you have in mind?

The proposal SGTM 👍

@mcollina
Member

mcollina commented Nov 3, 2025

@metcoder95 nevermind, we already have that logic:

keepAliveTimeout - client[kKeepAliveTimeoutThreshold],

Member

@mcollina mcollina left a comment

LGTM

My question about how this guarantees round-robin on the server stands.

@FelixVaughan
Contributor Author

You're correct: we only guarantee client-side connection distribution.

Backend distribution depends on how the load balancer assigns TCP connections to backends. I initially tested this with a homespun TCP LB that assigned each new connection to a backend in round-robin fashion, but if the LB uses session affinity, for example, there's not a whole lot we can do.

I added documentation explaining the caveat. Also open to other suggestions.

Member

@metcoder95 metcoder95 left a comment

lgtm
