Skip to content

sql: grow connection pool lazily instead of opening max on first use#30636

Draft
robobun wants to merge 3 commits into
mainfrom
farm/b4c5677a/sql-lazy-pool
Draft

sql: grow connection pool lazily instead of opening max on first use#30636
robobun wants to merge 3 commits into
mainfrom
farm/b4c5677a/sql-lazy-pool

Conversation

@robobun

@robobun robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Fixes #30632.

Repro

const sql = new Bun.SQL({
  adapter: "postgres",
  host: "localhost",
  port: 5432,
  username: "bun_sql_test",
  database: "bun_sql_test",
  max: 500,
});

await sql`SELECT 1`;
// `select count(*) from pg_stat_activity` now shows 500 backend connections

One query on a max: 500 pool opens all 500 connections. The user reports
this happens "immediately" — it's actually on the first use, but same net
effect: a pool never scales with load, it jumps straight to the ceiling.

Cause

connect() instantiated every pool slot eagerly on the first call, and
each pooled-connection constructor dials TCP immediately:

// src/js/internal/sql/shared.ts — pre-fix (BaseSQLAdapter.connect)
this.poolStarted = true;
const pollSize = this.connections.length; // = max
const firstConnection = this.createPooledConnection();
this.connections[0] = firstConnection;
// ...
for (let i = 1; i < pollSize; i++) {
  this.connections[i] = this.createPooledConnection();
}

Fix

Replace the eager burst with lazy growth. After the rebase onto main this
lands once in BaseSQLAdapter / BasePooledConnection in
src/js/internal/sql/shared.ts (the pool plumbing was consolidated there
by #32145; the original branch patched postgres.ts and mysql.ts
separately).

  • Pool starts empty; #tryGrowPool() appends one connection at a time,
    capped at maxPoolSize (= options.max).
  • First connect() opens exactly one connection.
  • flushConcurrentQueries() grows the pool when queued queries can't be
    served by ready or pending slots, discounting pending slots already
    earmarked for reservedQueue.
  • maxDistribution() divides by the pool ceiling rather than the current
    size, so it doesn't collapse every queued query onto the first socket.
  • connect()'s retry branch grows in parallel while callers wait behind
    in-flight slots. For non-retryable auth failures it fails fast with the
    cached error — unless password is a function (dynamic credential,
    e.g. IAM token), in which case it grows or force-retries a closed slot
    via the new forceRetry() so a rotated token can take effect even at
    max: 1.

Verification

before: 1 query on max=50 → 50 sockets opened
after:  1 query on max=50 → 1 socket opened
after:  100 concurrent queries on max=20 → 20 sockets, drain in ~5 rounds

Tests

test/js/sql/sql-pool-lazy-growth.test.ts — bare Bun.listen() TCP
sinks (one inert, one speaking just enough Postgres to reject auth), no
Docker/real DB required:

  • a single query opens exactly 1 socket, not max (postgres + mysql)
  • static-password auth failure fast-fails without opening more sockets
  • function-password retries auth per query at max: 1 (credential
    rotation recovery)
  • synchronous password() throw rejects instead of hanging

4 of 5 tests fail on unfixed main; the sync-throw guard passes there
because main now defers createConnection failures via
process.nextTick (it documents the contract regardless).

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 1:40 PM PT - Jun 15th, 2026

@alii, your commit 5143f2c has some failures in Build #62607 (All Failures)


🧪   To try this PR locally:

bunx bun-pr 30636

That installs a local version of the PR into your bun-30636 executable, so you can run:

bun-30636 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. 🚀 Feature Request: Customizable Connection Pool Strategy (eager vs lazy) mysql / postgres #30631 - Feature request for customizable connection pool strategy (eager vs lazy), which this PR directly implements by switching to lazy pool growth
  2. [Bug] MySQL pool size > 1 causes Bun to hang during sql execution #26235 - MySQL pool size > 1 causes Bun to hang during SQL execution; lazy pool growth avoids eagerly opening all connections, which is the condition triggering the hang

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30631
Fixes #26235

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Both MySQL and Postgres adapters now grow connection pools lazily on-demand up to a new maxPoolSize ceiling. Pool capacity tracking, growth helpers, distribution logic, and connect/flush paths were updated. A test asserts a single query opens exactly one TCP connection for each adapter.

Changes

Lazy Connection Pool Growth

Layer / File(s) Summary
Pool capacity tracking and lazy initialization
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Both adapters add maxPoolSize and initialize connections as an empty array instead of pre-allocating to capacity.
Pool growth helper methods
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Introduce private #tryGrowPool() to append new pooled connections when under maxPoolSize and #pendingConnectionsCount() to count in-flight handshakes.
Distribution and flush logic
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
maxDistribution() now targets maxPoolSize and flushConcurrentQueries() grows the pool when unserved waiters remain and pending handshakes are insufficient.
Connection requests when pool is active
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
connect() now attempts #tryGrowPool() when the pool has started but no ready connections exist, enqueuing callers if growth succeeds or failing fast on non-retryable errors.
Pool initialization for first query
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
First-query initialization opens a single connection via #tryGrowPool() (with preReserved when requested) instead of creating a full initial set.
Reserved connection path pool growth
src/js/internal/sql/mysql.ts, src/js/internal/sql/postgres.ts
Reserved-connection branch now enqueues callers and tries #tryGrowPool() when no ready connection can be assigned immediately.
Lazy pool behavior verification
test/js/sql/sql-pool-lazy-growth.test.ts
Add parameterized tests for postgres and mysql that use a TCP sink to count socket opens and assert a single SELECT 1 triggers exactly one TCP connection; add a Postgres handshake-failure test that ensures subsequent attempts fail fast.

Possibly related issues

  • #30632 — Implements lazy pool growth to prevent pre-allocation of all max connections on SQL adapter initialization; this PR tracks maxPoolSize and grows connections on demand.
  • #30631 — Matches the introduced lazy growth helpers and maxPoolSize changes for MySQL/Postgres adapters; functionally related.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: converting connection pool initialization from eager (opening max connections immediately) to lazy (opening connections on-demand).
Linked Issues check ✅ Passed The PR fully addresses issue #30632: it implements lazy pool growth starting from empty connections instead of pre-allocating max slots, preventing immediate connection burst and enabling dynamic scaling based on load.
Out of Scope Changes check ✅ Passed All changes directly support lazy pool growth: MySQL and Postgres adapters add maxPoolSize and tryGrowPool(), flushConcurrentQueries() updated to trigger growth, connect() refactored for lazy initialization, and tests verify the fix without unrelated alterations.
Description check ✅ Passed The PR description thoroughly explains the issue, cause, fix, and verification with code examples and test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-mysql.test.ts`:
- Around line 17-76: Extract the two tests beginning with describe("mysql
connection pool grows lazily (`#30632`)") from test/js/sql/sql-mysql.test.ts into
a new test file named something like test/js/sql/sql-mysql-nondocker.test.ts (or
sql-mysql-mock.test.ts); remove that describe block from the original file so
Docker-gated suite remains clean; keep the tests' exact bodies and using/server
setup and preserve imports (e.g., SQL, Bun.listen) and async semantics, and
ensure the new file is picked up by the test runner (naming convention) so the
non-Docker/mock MySQL regression suite runs separately from sql-mysql.test.ts.

In `@test/js/sql/sql.test.ts`:
- Around line 112-114: The test currently uses a fixed setTimeout to "give
connectors a chance" which is flaky; replace the hardcoded wait with a
deterministic wait that polls the condition until openedSockets === 0 (or times
out); update the snippet around openedSockets so the test awaits a
small-interval retry loop or a test helper (e.g., waitFor) that repeatedly
checks openedSockets and resolves when it becomes 0, failing the test if a
configurable timeout elapses—target the openedSockets variable and the test
block where the setTimeout is used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 652c6181-90bd-4cf5-ad8e-ed0d16dd92da

📥 Commits

Reviewing files that changed from the base of the PR and between b9c757b and 4af886b.

📒 Files selected for processing (4)
  • src/js/internal/sql/mysql.ts
  • src/js/internal/sql/postgres.ts
  • test/js/sql/sql-mysql.test.ts
  • test/js/sql/sql.test.ts

Comment thread test/js/sql/sql-mysql.test.ts Outdated
Comment thread test/js/sql/sql.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-pool-lazy-growth.test.ts`:
- Around line 45-46: The test currently iterates adapters with a for-loop and
calls describe(`${adapter} connection pool grows lazily (`#30632`)`, ...) which
violates the guideline to use parameterized suites; replace the manual loop with
describe.each([...]) so the suite is declared via
describe.each(["postgres","mysql"]) and the suite title uses the adapter
parameter (e.g., describe.each(...)(`%s connection pool grows lazily (`#30632`)`,
(adapter) => { ... }); update the existing describe block and move its inner
tests unchanged into the new describe.each callback so test names and behavior
remain the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24709e75-36f3-45a6-8926-28306b0c3ee4

📥 Commits

Reviewing files that changed from the base of the PR and between 4af886b and 5169650.

📒 Files selected for processing (1)
  • test/js/sql/sql-pool-lazy-growth.test.ts

Comment thread test/js/sql/sql-pool-lazy-growth.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/sql/sql.test.ts:87-115 — This test never issues a query, so on pre-fix Bun the eager burst (which lived inside connect(), not the constructor) is never triggered — openedSockets is 0 on system Bun too. That means the PR description's claim that "all 5 tests fail under USE_SYSTEM_BUN=1" is inaccurate for this case, and per CLAUDE.md a test that passes with USE_SYSTEM_BUN=1 is not a valid regression test. Consider dropping it (the other 4 tests already cover the fix) or correcting the PR description to say 4 of 5; also note CLAUDE.md says not to use setTimeout in tests.

    Extended reasoning...

    What the issue is

    The third Postgres test, "constructing Bun.SQL without running a query opens no connections", asserts that constructing a Bun.SQL instance with max: 10 and never issuing a query results in openedSockets === 0. The PR description states "Verified all 5 tests fail under USE_SYSTEM_BUN=1 (receive 50 sockets instead of 1)", but this particular test cannot fail on the unpatched runtime: it has max: 10 (not 50), it never runs a query, and the pre-fix code never opened sockets at construction time. Root CLAUDE.md line 130 says: "CRITICAL: Verify your test fails with USE_SYSTEM_BUN=1 bun test <file> and passes with bun bd test <file>. Your test is NOT VALID if it passes with USE_SYSTEM_BUN=1." This test does not meet that bar.

    Why pre-fix Bun also opens 0 sockets here

    Before this PR, the PostgresAdapter constructor only did:

    this.connections = new Array(connectionInfo.max);

    That allocates a sparse array of holes — no PooledPostgresConnection instances are constructed, so no TCP dials happen. The eager burst lived inside connect(), behind if (!this.poolStarted):

    this.poolStarted = true;
    const pollSize = this.connections.length;
    const firstConnection = new PooledPostgresConnection(this.connectionInfo, this);
    this.connections[0] = firstConnection;
    for (let i = 1; i < pollSize; i++) {
      this.connections[i] = new PooledPostgresConnection(this.connectionInfo, this);
    }

    connect() is only invoked when a query is run. The PR description itself acknowledges this: "The user reports this happens 'immediately' — it's actually on the first use". Since this test never issues a query, connect() is never called and poolStarted stays false.

    Step-by-step on system Bun

    1. new SQL({ adapter: "postgres", max: 10, ... })PostgresAdapter constructor runs, sets this.connections = new Array(10) (10 empty holes), poolStarted = false. No PooledPostgresConnection is created, so no socket is opened. openedSockets = 0.
    2. await new Promise(r => setTimeout(r, 50)) → nothing happens; there is no pending dial to race. openedSockets is still 0.
    3. expect(openedSockets).toBe(0)passes.
    4. await using disposal calls close()#close(). poolStarted is false, so the connection-iteration loop is skipped; readyConnections.clear() and waitingQueue.length = 0 are no-ops on empty containers.

    So under USE_SYSTEM_BUN=1 this test passes, contradicting the PR description's "all 5 tests fail" claim and violating the CLAUDE.md requirement.

    Why this is only a nit

    This is a test-methodology / PR-description-accuracy issue, not a runtime defect. The other four tests (single query → 1 socket, 20 concurrent on max: 5 → 5 sockets, ×2 for MySQL) genuinely fail on system Bun and pass with the fix, so the regression is properly covered. One could argue the spirit of the CLAUDE.md rule is "the PR must include at least one test that demonstrably catches the regression", which is satisfied — and this third test is a reasonable forward-looking guard against a future regression where the constructor itself becomes eager. But taken literally, the repo's own contributing guide says a test that passes under USE_SYSTEM_BUN=1 "is NOT VALID", and the PR body's verification claim is factually wrong for this case.

    Separately, the same test uses await new Promise(r => setTimeout(r, 50)), which CLAUDE.md line 129 also flags: "Do not use setTimeout in tests."

    Suggested fix

    Either:

    • Drop this test (tests 1, 2, 4, 5 already cover the fix and satisfy the USE_SYSTEM_BUN=1 rule), or
    • Keep it as a guard but update the PR description from "all 5 tests fail" to "tests 1, 2, 4, 5 fail under USE_SYSTEM_BUN=1; test 3 is a guard for already-correct constructor behavior", and replace the setTimeout(50) with something that doesn't rely on wall-clock time (e.g. await Bun.sleep(0) / a microtask drain, or just assert immediately since construction is synchronous).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/js/sql/sql-pool-lazy-growth.test.ts (1)

35-36: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace manual adapter loop with describe.each()

Use parameterized suite declaration instead of the for loop to match the repo’s test style rules.

♻️ Proposed change
-for (const adapter of ["postgres", "mysql"] as Adapter[]) {
-  describe(`${adapter} connection pool grows lazily (`#30632`)`, () => {
+describe.each(["postgres", "mysql"] as Adapter[])("%s connection pool grows lazily (`#30632`)", adapter => {
     test("a single query only opens one TCP connection, not `max`", async () => {
       using sink = makeSink();
       await using sql = new SQL({
@@
       await sql`SELECT 1`.catch(() => {});
       expect(sink.opened).toBe(1);
     });
-  });
-}
+  });
+});

As per coding guidelines, test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} should "Use describe.each() for parameterized tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/js/sql/sql-pool-lazy-growth.test.ts` around lines 35 - 36, Replace the
manual for-loop creating per-adapter suites with a parameterized suite using
describe.each([...]) to match repo test style: convert the `for (const adapter
of ["postgres", "mysql"] as Adapter[]) { describe(\`${adapter} connection pool
grows lazily (`#30632`)\`, () => { ... }) }` pattern into `describe.each([
"postgres", "mysql" ])("... %s ...", (adapter) => { ... })`, ensuring you pass
the adapter into the inner tests (references: the outer describe block that
currently interpolates `${adapter}` and any uses of the `adapter` variable
inside tests) so behavior and test names remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/js/sql/sql-pool-lazy-growth.test.ts`:
- Around line 35-36: Replace the manual for-loop creating per-adapter suites
with a parameterized suite using describe.each([...]) to match repo test style:
convert the `for (const adapter of ["postgres", "mysql"] as Adapter[]) {
describe(\`${adapter} connection pool grows lazily (`#30632`)\`, () => { ... }) }`
pattern into `describe.each([ "postgres", "mysql" ])("... %s ...", (adapter) =>
{ ... })`, ensuring you pass the adapter into the inner tests (references: the
outer describe block that currently interpolates `${adapter}` and any uses of
the `adapter` variable inside tests) so behavior and test names remain
identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46c2416a-0c8d-4721-8047-ad487409fc60

📥 Commits

Reviewing files that changed from the base of the PR and between 5169650 and 3308293.

📒 Files selected for processing (1)
  • test/js/sql/sql-pool-lazy-growth.test.ts

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

CI red on #54075, but none of the failures touch this diff — the SQL pool change only touches src/js/internal/sql/{postgres,mysql}.ts. What failed on x64-asan:

  • test/js/web/fetch/fetch-http2-client.test.ts — JSC panic ThreadLock is locked by thread 6422, not thread 6433 in an HTTP/2 POST+ReadableStream test (unrelated fetch code path)
  • Flaky-retry warnings for fetch.tls, update_interactive_install, and hot/hot.test.ts — all in fetch/CLI, not SQL

Already spent my one ci: retrigger on the earlier gate-infra flake (Permission denied during release rebuild), so this needs a maintainer retry.

Comment thread src/js/internal/sql/postgres.ts Outdated
@RiskyMH

RiskyMH commented May 13, 2026

Copy link
Copy Markdown
Collaborator

@robobun now i wonder is sqlite truly safe from this :)

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

@RiskyMH sqlite's safe — no pool. SQLiteAdapter has a single db handle (one new Database(filename) in the constructor, no max option, no pool slots to grow eagerly or otherwise).

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Skip pool growth when non-retryable auth error is set

@RiskyMH

RiskyMH commented May 13, 2026

Copy link
Copy Markdown
Collaborator

@robobun respond to @Lillious's review plz

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Skip pool growth when non-retryable auth error is set

@claude Tell @robobun to make these changes

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Done in d0ade91 — the non-retryable branch now skips #tryGrowPool() unless password is supplied as a function. Added a regression test using a minimal fake server that answers StartupMessage with AuthenticationRequest code 9 (SSPI) — Bun rejects as ERR_POSTGRES_UNSUPPORTED_AUTHENTICATION_METHOD. 5 sequential failed queries now open 1 socket (was 5).

@Lillious

Copy link
Copy Markdown
Contributor

Done in d0ade91 — the non-retryable branch now skips #tryGrowPool() unless password is supplied as a function. Added a regression test using a minimal fake server that answers StartupMessage with AuthenticationRequest code 9 (SSPI) — Bun rejects as ERR_POSTGRES_UNSUPPORTED_AUTHENTICATION_METHOD. 5 sequential failed queries now open 1 socket (was 5).

@robobun What if the password is supplied as a template literal?

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

A template literal is just a plain string by the time it hits the constructor — ${token} runs the interpolation before new SQL() ever sees the object. Bun can't re-evaluate it; it's identical to password: "foo".

The dynamic-creds escape hatch is specifically for the password: () => fetchFreshToken() shape — a function Bun re-invokes every time it opens a new socket (PooledPostgresConnection.createConnection does password = password() on each attempt). That's the only case where a retry can legitimately pick up a new secret.

If you want to refresh credentials per retry, wrap it:

new Bun.SQL({ password: async () => (await vault.fetch("db-pw")).value })

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Could there be an edge case scenario where database password is updated and the static password used to connect becomes stale and then the database password is reverted? How can we handle this the with fail fast implementation?

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Real concern — let me walk through the lifecycle.

Fail-fast only kicks in when every current pool slot has a cached non-retryable auth error. The pool doesn't know the server reverted the password, so cached errors keep getting returned until something clears them. Three things clear them today:

  1. .end() + new new SQL(...) — the app observes the auth errors and rebuilds the pool. Standard pattern.
  2. maxLifetime — if set, previously-connected slots close on expiry. Doesn't help for slots that never got past auth (canBeConnected was never set).
  3. password: () => latestPassword — the function shape is re-invoked per TCP attempt, so a fresh retry picks up whatever the current value is. This is the documented escape hatch for any rotating secret (IAM, Vault, or your rotate/revert scenario).

For the specific "static string, rotated, then reverted" flow: pre-fix Bun had the same end-state — it returned the cached auth error too — it just burned an extra TCP+auth RTT per query doing so. The reporter-visible difference is zero; the fix just stops the wasted handshakes.

So the recommendation for anyone whose password can change under them is the function shape. If there's demand for a sql.reset() / explicit "forget cached errors and retry" API, that's worth a separate issue — but it's additive to this fix, not a blocker.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please give me credit in the release notes when this releases.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Release notes are written by the maintainers at release time — I don't control them. You're credited as the reporter in #30632 and the PR body links back, so it's there for whoever drafts the notes.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Second CI run: same story as before — no failures touch this diff. Failed tests are all in fetch/hot-reload/jsc-stress/http2/node-http lanes:

  • fetch.tls — TimeoutError/UnreachableError mismatch
  • jsonwebtoken/async_sign — 90s timeout
  • hot.test.ts — sourcemap reload (was flaky on the previous CI too)
  • jsc-stressmprotect failed: 487 on Windows x64-baseline (OS-level)
  • fetch-http2-client — HTTP/2 ReadableStream test (same failure as previous CI)
  • 2x parallel node:http timeout on Windows 2019

Already used my one ci: retrigger earlier — needs a maintainer to re-run CI. Diff is green.

Comment thread src/js/internal/sql/postgres.ts Outdated
@Lillious

Copy link
Copy Markdown
Contributor

@robobun Review the recent comment made by @claude

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

@Lillious reviewed — valid finding. Fixed in bad3076. The growth heuristic was counting all pending connections as capacity for waitingQueue, but pending connections opened for reserve() callers get claimed by reservedQueue first (via release()) and never feed the waiting queue. Now subtracting up to reservedQueue.length from pending before comparing. Mixed reserve() + plain-query workloads no longer under-provision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/js/internal/sql/mysql.ts`:
- Around line 948-967: The code treats auth failures as non-retryable even when
connectionInfo.password is a function, and currently only calls `#tryGrowPool`()
which returns null at maxPoolSize, causing permanent failure; change the
fallback so that when canDynamicallyAuth (connectionInfo.password is a function)
is true and `#tryGrowPool`() returns null you reuse one of the closed slots
instead of immediately failing — e.g. take a closed slot (the same resource pool
mechanism used when a slot is released), push onConnected into the appropriate
queue (reservedQueue or waitingQueue) and trigger the code path that
opens/initializes a connection for that closed slot (the same path used by
connect()/the slot-reuse logic) so a fresh auth attempt with the dynamic
password is made; keep using storedError/connectionClosedError only when no
dynamic auth is available and no slot can be reused.

In `@src/js/internal/sql/postgres.ts`:
- Around line 1176-1195: The dynamic-credential recovery fails because we only
call `#tryGrowPool`() and add a new slot after checking capacity, which returns
null when connections.length === maxPoolSize and causes callers to replay a
cached auth error; also createConnection() can synchronously call onClose() so
release() may run before we enqueue the waiter. Fix by enqueuing the waiter
(push into reservedQueue or waitingQueue) before attempting any retry/grow, and
when connectionInfo.password is a function attempt to reopen an existing closed
slot (call into the same code path used by `#tryGrowPool`() but targeting a closed
connection) rather than only creating a new slot; ensure the reopen path handles
synchronous onClose() by registering onClose/onConnected callbacks before
invoking PooledPostgresConnection.createConnection() so release()/onClose()
cannot race the enqueue.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c3fe454-2b26-422b-932a-51e06bdb9166

📥 Commits

Reviewing files that changed from the base of the PR and between 9b8adfa and bad3076.

📒 Files selected for processing (2)
  • src/js/internal/sql/mysql.ts
  • src/js/internal/sql/postgres.ts

Comment thread src/js/internal/sql/mysql.ts Outdated
Comment thread src/js/internal/sql/postgres.ts Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Third CI run, same pattern — no failures touch the SQL diff:

  • bun-upgradeETXTBSY when posix_spawn'ing a bun binary that's still being copied (fs race)
  • fetch-http2-client — WebKit AtomStringImpl::remove: wasRemoved assertion (WTF/Source WebKit bug, unrelated to anything JS-side; same HTTP/2 file that's been flaky on every run)
  • 3x test-http-should-emit-close-when-connection-is-aborted timeout — Windows 2019 / Windows aarch64 / x64

Already used my one ci: retrigger on the gate-infra flake earlier. Needs a maintainer to re-run CI. The SQL diff is green — sql-pool-lazy-growth.test.ts passes locally on debug (ASAN) and hasn't been flagged once.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please review coderabbit reported issues

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed both findings in 7a274c3:

1. max: 1 + function password couldn'''t recover. Once the pool hit max, #tryGrowPool() returned null and subsequent queries replayed the cached auth error. Now: after a failed grow we walk existing closed slots and call a new forceRetry() helper that bypasses retry()'''s auth-code gate, forcing a fresh handshake — which re-invokes password() and can succeed if the token rotated.

2. Synchronous onClose() race. Postgres' createConnection calls onClose synchronously when password() throws, so the old code's connection.retry() could drain the queue BEFORE the waiter-push landed. The queue push now happens at the top of the poolStarted branch, before any call that can trigger sync onClose. If we end up in the fast-fail path (static creds, no retry possible), we splice the waiter back out before calling onConnected(storedError) directly.

Two new tests:

  • function password retries auth on each new query (rotatable credentials)max: 1 + password: () => ... now opens a fresh socket per query (3 attempts → 3 sockets); was stuck at 1.
  • synchronous password() throw does not hang subsequent queries — 2 sequential queries with password: () => { throw ... } both reject with the thrown error in ~30ms; was timing out at 5s.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Fourth CI run, nothing new — still all unrelated to the SQL diff:

  • bun-install-registry — text lockfile hoisting snapshot mismatch on darwin aarch64 (install/lockfile logic)
  • jsc-stressmprotect failed: 487 on Windows 2019 x64 (OS-level)
  • fetch-http2-client — WebKit WTF AtomStringImpl::remove: wasRemoved ASAN assertion + HTTP/2 Content-Length rejection (WebKit/HTTP2, same bucket as every previous run)

Diff is src/js/internal/sql/{postgres,mysql}.ts + one test file — completely orthogonal to any of those lanes. Retrigger budget already spent. Needs a maintainer to re-run.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/sql/sql-pool-lazy-growth.test.ts:176-178 — This test passes 5000 as the third argument to test(), but test/CLAUDE.md says "CRITICAL: Do not set a timeout on tests. Bun already has timeouts." Drop the explicit 5000 and rely on Bun's default per-test timeout — the hang you're guarding against will still fail the test.

    Extended reasoning...

    What the guideline says

    test/CLAUDE.md (line ~120) is explicit:

    No timeouts

    CRITICAL: Do not set a timeout on tests. Bun already has timeouts.

    The repo convention is that the test runner already supplies a per-test timeout, so individual tests should never pass the third timeout argument to test().

    Where the violation is

    test/js/sql/sql-pool-lazy-growth.test.ts:145–178:

    test(
      "synchronous \`password()\` throw does not hang subsequent queries",
      async () => {
        // …
      },
      5000,   // ← explicit per-test timeout
    );

    The in-test comment even says "The bun:test per-test timeout fails the test if anything hangs, which is the failure mode we're guarding" — that rationale applies equally to Bun's default timeout. Hardcoding 5000 doesn't add coverage, it just diverges from the repo convention (and is shorter than the default, so it can only make the test more likely to flake under load, not less).

    Step-by-step

    1. bun:test parses the third positional arg to test() as a millisecond timeout override.
    2. Bun's runner already enforces a default timeout on every test (this is what the CLAUDE.md rule is referring to).
    3. If the regression this test guards against ever recurs (waiter enqueued after release() drains the queue → query promise never settles), the await sql\SELECT ${i}`` on the second iteration never resolves.
    4. With or without the explicit 5000, the runner's timeout fires and the test fails — the explicit value buys nothing functionally.

    Why nothing else justifies it

    None of the other tests in this file set a timeout, and there's nothing about this case (no Docker, no real DB, the failing path resolves synchronously via process.nextTick-equivalent paths) that would warrant an exception to the rule.

    Fix

    Remove the trailing 5000 argument:

    test("synchronous \`password()\` throw does not hang subsequent queries", async () => {
      // …
    });

    This is a style/convention nit, not a functional bug.

Comment thread src/js/internal/sql/mysql.ts Outdated
Comment thread test/js/sql/sql-pool-lazy-growth.test.ts Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Fifth CI run, still the same cast of unrelated flakes — none touch the SQL diff:

  • fetch-abort-slow-connect — ASAN timing sensitivity (36ms observed vs <10ms expected)
  • bun-install-registry — text lockfile hoisting snapshot
  • fetch-http2-client — WebKit WTF AtomStringImpl::remove: wasRemoved ASAN assertion (every run)
  • 3x test-http-should-emit-close-when-connection-is-aborted — Windows 2019 timeouts

Diff is src/js/internal/sql/{postgres,mysql}.ts + one test file. Retrigger budget spent. Needs a maintainer to re-run CI.

@robobun robobun force-pushed the farm/b4c5677a/sql-lazy-pool branch from f32efc8 to 8fd13e5 Compare May 14, 2026 17:30
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Sixth CI run — still all unrelated to the SQL diff:

Diff is still just src/js/internal/sql/{postgres,mysql}.ts + the one test file. Retrigger budget spent. Needs a maintainer to re-run.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun This branch is now out of sync with main. Bun was ported to Rust. Merge main into this branch and resolve merge conflicts and re-write in rust.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Please rebase this for the new rust version of bun

1 similar comment
@RiskyMH

RiskyMH commented May 16, 2026

Copy link
Copy Markdown
Collaborator

@robobun Please rebase this for the new rust version of bun

@robobun robobun force-pushed the farm/b4c5677a/sql-lazy-pool branch from 8fd13e5 to a93cc67 Compare June 12, 2026 20:32
Comment thread src/js/internal/sql/shared.ts Outdated
Comment thread src/js/internal/sql/shared.ts Outdated
Comment thread test/js/sql/sql-pool-lazy-growth.test.ts Outdated
Comment thread src/js/internal/sql/shared.ts Outdated
Comment thread src/js/internal/sql/shared.ts
@robobun

robobun commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Status on 80d204ae: added an in-flight #growing counter to the lazy pool growth so a re-entrant function-valued password() can no longer recurse past max (addresses the review finding; regression test added, fails before / passes after).

Diff is src/js/internal/sql/shared.ts plus three SQL test files. All SQL suites pass locally on the ASAN debug build, and no SQL test appears in CI failures.

Build 62312 is red only on unrelated lanes:

  • bun-create (x64-asan) — git "Author identity unknown" (agent git user not configured)
  • bun-install / bun-install-lifecycle-scripts — install tests
  • serve-body-leak (x64-asan) — HTTP JSON-body memory threshold, flagged as a warning with a retry

None touch the SQL pool code. Ready for a maintainer to review or merge.

Comment thread src/js/internal/sql/shared.ts
@Lillious

Copy link
Copy Markdown
Contributor

@cirospaciari Could you please review this PR?

Rebuilt on main (pool plumbing consolidated into BaseSQLAdapter /
BasePooledConnection in shared.ts). Previously connect() instantiated
every pool slot eagerly on the first query, and each pooled-connection
constructor dials TCP immediately, so a single SELECT 1 on a max: 500
pool burst 500 connections.

- Pool starts empty; #tryGrowPool() appends one connection at a time
  (push, so the array is dense — no unassigned holes), capped at
  maxPoolSize (= options.max).
- First connect() opens exactly one connection.
- flushConcurrentQueries() grows the pool when queued queries can't be
  served by ready or pending slots, discounting pending slots already
  earmarked for reservedQueue.
- maxDistribution() divides by the pool ceiling, not the current size.
- connect()'s retry scan revives one closed slot per waiter (break
  after the first), so a query arriving after the pool idled out
  doesn't redial every slot.
- Non-retryable auth failures fail fast with the cached error, unless
  password is a function (dynamic credential), in which case a closed
  slot is force-retried so a rotated token can take effect at max: 1.

Because the array is dense, the unassigned-hole guards added in #32201
are unnecessary here — a slot being created is pushed only after
createPooledConnection returns, so re-entrant scans during a
synchronous password() see an empty (not holey) array.

Tests: test/js/sql/sql-pool-lazy-growth.test.ts (single query opens 1
socket for both adapters; static-password auth failure fast-fails;
function password retries per query at max: 1; synchronous password
throw rejects without hanging). Updated sql-close-pending-connection
and sql-onconnect-onclose-throw, which asserted the old per-slot eager
password/onclose counts at max: 2 — one slot opens per query now.
@robobun robobun force-pushed the farm/b4c5677a/sql-lazy-pool branch from 0ff6392 to 53a46a2 Compare June 14, 2026 20:41
Comment thread src/js/internal/sql/shared.ts
createPooledConnection() runs a function-valued password() synchronously
before the new slot is appended to connections[], so a password() that
re-enters the pool while an earlier slot is still mid-handshake read a
stale connections.length in #tryGrowPool()'s cap check and kept opening
slots past max (unbounded recursion in the worst case). Track slots that
are mid-creation in a #growing counter and include it in the cap check so
the limit holds under synchronous re-entrancy.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit (80d204a) addresses my re-entrancy note with the #growing counter and a regression test, and the bug hunter found nothing new on this revision — but this rewrites the core Bun.SQL pool's connect() / retry / queue-distribution control flow for every postgres/mysql user, so I'd like a maintainer to sign off rather than auto-approving.

Extended reasoning...

Overview

This PR changes Bun.SQL's connection pool from eager (open all max connections on first use) to lazy (grow one slot at a time as demand requires). The substantive change is in src/js/internal/sql/shared.ts (BaseSQLAdapter / BasePooledConnection): a new maxPoolSize field, #tryGrowPool() / #pendingConnectionsCount() helpers, a #growing re-entrancy counter, a forceRetry() escape hatch for function-valued passwords, and a rewrite of connect()'s scan/retry/queue branches plus the flushConcurrentQueries() growth heuristic. Two existing tests are updated for the new one-slot-per-query expectation and a new sql-pool-lazy-growth.test.ts adds five non-Docker regression tests using bare TCP sinks.

Security risks

None identified. No auth, crypto, or permission surfaces are touched; the change governs when sockets are opened, not how credentials are handled. The function-password path already existed and is only re-plumbed.

Level of scrutiny

High. This is production-critical control flow shared by every postgres/mysql/mariadb Bun.SQL user, with subtle interactions between retry classification, reserved vs. waiting queues, idle-timeout revival, and synchronous re-entry from user-supplied password() callbacks. The PR went through roughly ten review rounds, each surfacing a real edge case (auth fast-fail, reserved-queue under-provisioning, max:1 dynamic-credential recovery, dead-slot stranding, post-idle burst, re-entrant growth past max). The current revision looks correct to me and the bug hunter found nothing, but the iteration count itself is evidence that this code path is delicate enough to merit human sign-off.

Other factors

The most recent commit (80d204a) implements the #growing counter I suggested for the re-entrant password() case and adds a dedicated test, so I'm not blocking on anything outstanding. A maintainer review was already requested on the thread; deferring here rather than approving so that request stands.

@Lillious

Copy link
Copy Markdown
Contributor

@robobun Main branch was merged into this branch. Please verify there are no broken changes.

@robobun

robobun commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

@Lillious verified the main merge (`5143f2c4`) locally, no broken changes:

  • Clean merge: no conflict markers, and the lazy-pool fix (including the `#growing` re-entrancy counter) is intact.
  • Debug build succeeds on the merged tree.
  • SQL suites pass: the lazy-growth + re-entrant-password regression tests, sql-helpers-validation, sql-connect-error-reporting, and all 231 sqlite-sql tests (454 SQL tests total).

One note so it is not mistaken for a regression: two heavy sqlite-sql stress tests (prepared-statement finalization and the query-normalization fuzzer) exceed the 5s default timeout only under the local ASAN debug build (~11-13s there); they pass with a 60s timeout and pass on the CI release lanes. They do not touch the connection pool (SQLite uses its own adapter, not BaseSQLAdapter).

CI on the merge (build 62607) is green apart from the long-standing flaky serve-body-leak HTTP memory-threshold test (warning level), which is unrelated to this PR.

@Lillious

Lillious commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This was tested with my project: https://github.com/Lillious-Networks/Frostfire-Forge and it works. It uses the least number of connections instead of using the max value immediately. Compared before and after and saw the changed expected behavior. Dropped my memory usage down by 80MB

@alii alii marked this pull request as draft June 17, 2026 19:50
@robobun

robobun commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator Author

Fair question, it is a real tradeoff.

max is documented as "Maximum number of connections in the pool," and Bun.SQL's API follows postgres.js, which grows the pool on demand up to max rather than opening all of them the moment the pool is first used. So max: 500 reads as a burst ceiling rather than a pre-open target, and the eager fill was a deviation from that (node-postgres, Go's database/sql, and pgbouncer all treat max as a ceiling as well).

The per-query latency benefit from pre-opening is real, but it only applies during warm-up, and only if the workload actually reaches that concurrency. Once the pool has grown to its working set the connections are warm and reused, so steady-state acquisition is identical to the eager path, and with idleTimeout defaulting to 0 they are never reaped. The eager cost, by contrast, is paid unconditionally: max: 500 opens 500 backend connections the first time you query (~80MB here, per @Lillious) and can exhaust the server's max_connections for capacity that is never used.

If warming the pool is the goal, an explicit pre-warm / minimum-idle option (similar to HikariCP's minimumIdle) would target that directly without making max itself eager. Happy to go whichever way you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: SQL Implementation creating un-used sql connections

5 participants