Skip to content

postgres: release the speculative query ref on every do_run error exit#32426

Closed
alii wants to merge 2 commits into
mainfrom
ali/postgres-dorun-ref-leaks
Closed

postgres: release the speculative query ref on every do_run error exit#32426
alii wants to merge 2 commits into
mainfrom
ali/postgres-dorun-ref-leaks

Conversation

@alii

@alii alii commented Jun 16, 2026

Copy link
Copy Markdown
Member

What this does

PostgresSQLQuery::do_run takes a speculative this.ref_() before dispatching. The simple-query execute_query failure path correctly released it (release_statement + deref) before throwing, but three other error exits did not:

  • statements.get_or_put failure (allocator)
  • writer.write(SYNC) failure
  • final requests.write_item failure (extended-protocol path)

Each leaked the query ref (and the latter two also leaked the statement ref). The omissions were inherited from the Zig original, which cleans up the simple-query writeItem failure but not these. Same error is thrown at the same point; the query/statement just no longer leak.

Extracted the cleanup into two closures (release_query_ref, throw_write_error) applied at all 8 error-return sites for consistency. Net −29 lines. From #31664 (closed).

Verification

  • cargo check -p bun_sql_jsc and cargo clippy --no-deps: clean.
  • 20/20 postgres mock-server tests pass (postgres-multi-statement-fields, sql-connect-error-reporting).

These are allocator-failure / write-failure paths only — not JS-observable on the success path, so there's no input that fails before and passes after; the verification is that the success path is unchanged.

do_run takes a speculative this.ref_() before dispatching. The
simple-query execute_query failure path correctly released it
(release_statement + deref) before throwing, but three other error
exits did not:

- statements.get_or_put failure (allocator)
- writer.write(SYNC) failure
- final requests.write_item failure (extended-protocol)

Each leaked the query ref (and the latter two also leaked the statement
ref). The omissions were inherited from the Zig original. Same error is
thrown at the same point; the query/statement just no longer leak.

Extracted the cleanup into two closures applied at all 8 error-return
sites for consistency. Net -29 lines.

These are allocator-failure / write-failure paths only — not
JS-observable on the success path. From #31664 (closed).
@robobun

robobun commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator
Updated 11:49 AM PT - Jun 17th, 2026

@alii, your commit d466a24 has 1 failures in Build #63161 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32426

That installs a local version of the PR into your bun-32426 executable, so you can run:

bun-32426 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. Bun.SQL: PostgresSQLConnection JS-side wrapper leak — ~9k objects/h linear, RSS +340MB/h #30010 - Query ref-count leaks on error paths in do_run keep query objects (and transitively their connection wrappers) pinned in the JSC heap, contributing to the reported linear accumulation of PostgresSQLConnection objects and RSS growth
  2. SQL: connection pool stalls under concurrent begin() + parameterized queries (idle connections never handed to pending queries) #32004 - A leaked query ref on the requests.write_item failure path (extended-protocol) can leave the connection in a "busy" state from the pool's perspective while the server has moved on, contributing to the observed pool stalls under concurrent begin() + parameterized queries

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30010
Fixes #32004

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e27f7464-1c73-4e34-b23d-55d32c0c2135

📥 Commits

Reviewing files that changed from the base of the PR and between 66cd2fd and d950c24.

📒 Files selected for processing (1)
  • src/sql_jsc/postgres/PostgresSQLQuery.rs

Walkthrough

In PostgresSQLQuery::do_run, two shared closures (release_query_ref and throw_write_error) are introduced to centralize statement cleanup, speculative-ref undoing, and conditional JS error throwing. All protocol-level error paths (execute, bind+execute, prepare+query, write query, SYNC flush, and request enqueue) are updated to use these closures instead of repeating inline logic.

Changes

PostgreSQL Query Error Handling Refactor

Layer / File(s) Summary
Shared release_query_ref and throw_write_error closures
src/sql_jsc/postgres/PostgresSQLQuery.rs
Two closures added inside do_run: release_query_ref centralizes release_statement + deref(this_ptr), and throw_write_error converts AnyPostgresError to a JS exception only when no exception is already pending.
Error path call sites updated
src/sql_jsc/postgres/PostgresSQLQuery.rs
All protocol execution failure branches (execute_query, enqueue write-item, prepared-statement allocation, bind_and_execute, prepare_and_query_with_signature, write_query, writer.write(SYNC), and final connection.requests.write_item) replaced with release_query_ref() + throw_write_error(...) calls, removing ~29 lines of duplicated inline cleanup and conditional exception checks.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically summarizes the main change: releasing query references on error exits in the do_run method.
Description check ✅ Passed The description covers both required sections (What does this do and How did you verify), with comprehensive details about the fix, resource leaks, and verification steps.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. sql(postgres): ref poll_ref only after request is enqueued in do_run #32273 - Fixes the same three missing Self::deref(this_ptr) calls on error-return paths in PostgresSQLQuery::do_run (as a secondary fix alongside the poll_ref keepalive change)

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I traced all 8 error-exit sites and the refactor looks correct (closures are pure consolidation; the 3 newly-covered paths were genuine leaks), but since this is manual unsafe refcounting in the postgres driver it's worth a quick human pass — particularly the final write_item failure where release_statement() now derefs a stmt that may also be held by connection.statements.

Extended reasoning...

Overview

Single-file change to src/sql_jsc/postgres/PostgresSQLQuery.rs, net −29 lines. Extracts two closures (release_query_ref, throw_write_error) inside do_run and applies them at every error-return after the speculative this.ref_(). Four sites are pure de-duplication (behavior identical to before); three sites (statements.get_or_put allocator failure, writer.write(SYNC) failure, and the final extended-protocol requests.write_item failure) gain the release_statement() + deref(this_ptr) cleanup they were previously missing. The Signature::generate failure path is intentionally left as a bare deref (no statement exists yet).

Security risks

None. No input parsing, auth, or protocol-surface changes. The only behavioral delta is on allocator/socket-write failure paths, where a refcount that previously leaked is now released. The closures capture this/this_ptr/global_object immutably and are invoked at most once per execution, so no new aliasing or double-free is introduced on the consolidated paths.

Level of scrutiny

Medium-high. The diff is small and mechanical, but it lives in manual intrusive-refcount logic with unsafe { Self::deref(...) } inside a production DB driver. The interesting case is the final write_item failure: this.statement is now non-None (set just above in the 'enqueue block), so release_statement() will deref it — for named prepared statements that's 2→1 (map still owns it), for unnamed it's 1→0 (frees). That reads as correct, but it's exactly the kind of edge a maintainer who owns this code should glance at.

Other factors

cargo check / clippy clean and 20/20 postgres mock-server tests pass per the description; success path is byte-identical. No CODEOWNERS entry for this path. No prior reviewer comments to address. The bug-hunting system found nothing. Given the unsafe refcount surface, deferring rather than auto-approving.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any issues — the closure extraction is behavior-preserving and the three new release_query_ref() calls correctly undo the speculative ref_() (and statement ref where present) — but since this is unsafe intrusive-refcount cleanup in the postgres driver and overlaps with #32273, it's worth a human sign-off.

Extended reasoning...

Overview

This PR touches a single function, PostgresSQLQuery::do_run in src/sql_jsc/postgres/PostgresSQLQuery.rs. It (1) extracts the repeated release_statement() + Self::deref(this_ptr) cleanup and the has_exception()-guarded throw into two local closures, and (2) adds the cleanup to three error-return sites that previously leaked the speculative this.ref_() (and, on the final requests.write_item path, the statement ref): statements.get_or_put failure, writer.write(SYNC) failure, and the extended-protocol requests.write_item failure.

I traced each modified site:

  • The 5 refactored sites (execute_query, simple write_item, bind_and_execute, prepare_and_query_with_signature, write_query) are byte-for-byte equivalent to the inlined code they replace.
  • At the 3 newly-cleaned sites, release_statement() is either a no-op (statement is None at get_or_put/SYNC) or correctly drops the just-set statement ref (final write_item), and deref(this_ptr) brings the count from ≥2 back to ≥1 — never frees, matching the existing SAFETY comment.
  • The closures capture only shared borrows (this, global_object) plus the Copy this_ptr, consistent with the file's R-2 &self + Cell design; the untouched Signature::generate error path correctly keeps its bespoke handling (different error type, no statement yet).

Security risks

None. No untrusted input handling, parsing, or auth is touched — this is purely error-path resource cleanup. The unsafe blocks are consolidations of existing unsafe { Self::deref(this_ptr) } calls, not new unsafe surface.

Level of scrutiny

Medium-high. Although the diff is small (net −29 lines) and mechanically simple, it manipulates intrusive refcounts via unsafe in a production driver. A miscount here would manifest as a use-after-free or a leak under rare allocator/write-failure conditions that tests don't exercise directly. That puts it above my auto-approve bar for "simple, mechanical, or obvious" changes.

Other factors

  • The bug-hunting system found nothing; coderabbit had no actionable comments.
  • github-actions flagged this as overlapping with #32273, which reportedly fixes the same three missing deref calls as part of a larger change — a human should decide which lands / how to reconcile.
  • These are allocator-failure / socket-write-failure paths, so there's no direct regression test; verification was cargo check/clippy + the existing 20 mock-server tests passing, which confirms the success path is unchanged but doesn't exercise the fixed paths.

@alii

alii commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Superseded by #32464 (combined with #32273's poll_ref fix + test).

@alii alii closed this Jun 17, 2026
alii added a commit that referenced this pull request Jun 17, 2026
…t refs) (#32464)

Supersedes #32426 and #32273 — combines both, plus the
`release_statement()` at the SYNC-failure site that #32273 missed.

## Repro (the user-visible part)

```js
import { SQL } from "bun";
const sql = new SQL({ url: "postgres://...", max: 1, idleTimeout: 0, maxLifetime: 0 });
await sql.connect();
await new Promise(r => setImmediate(r));
await sql`SELECT ${new Boolean(true)}`.catch(e => e);  // ERR_INVALID_ARG_TYPE
// process hangs here instead of exiting
```

## Cause

`PostgresSQLQuery::do_run` did two pieces of speculative setup before
validating its arguments:

1. `connection.poll_ref.ref_()`. `KeepAlive` is a two-state flag
(`src/io/keep_alive.rs`), not a refcount, so when this query is the only
in-flight work the call flips Inactive → Active. Every synchronous error
return after that point left it stuck Active; nothing else on an idle
connection touches `poll_ref` until the next server message, which never
comes because nothing was written. The hang is masked when `do_run` runs
inside the connection's `on_data` microtask drain (whose epilogue
re-derives `poll_ref` from the queue), so it only shows up for queries
issued on a later turn — the normal case for any query after the first
on a pooled connection.
2. `this.ref_()`. The simple-query `execute_query` failure path
correctly released it, but three other error exits did not:
`statements.get_or_put` failure, `writer.write(SYNC)` failure, and the
final `requests.write_item` failure. The latter two also leaked the
just-allocated statement ref.

## Fix

- Move `poll_ref.ref_()` to after
`connection.requests.write_item(this_ptr)` succeeds, on both the
simple-query and prepared-statement branches. On every error path the
keepalive is now untouched; on the success path the request is enqueued
so the ref is balanced by `on_data`/`update_ref()` when the server
responds. Matches `JSMySQLQuery::do_run`, which never pre-refs.
- Extract the per-exit cleanup into two closures (`release_query_ref`,
`throw_write_error`) and apply them at all eight error-return sites —
the three previously-leaking sites now release what they took, and the
five existing sites lose their copy-pasted blocks. Net −74 in
`PostgresSQLQuery.rs`.

## Verification

Regression test added to `sql-onconnect-onclose-throw.test.ts`'s
`describeWithContainer("postgres", ...)` block: a fixture connects to
real Postgres, waits a tick so `do_run` runs outside the `on_data`
drain, issues the boxed-Boolean query, prints the rejection and falls
through. Without the fix the child hangs and the test times out; with
the fix it prints `rejected:ERR_INVALID_ARG_TYPE` and exits 0.

`cargo check -p bun_sql_jsc` and `cargo clippy --no-deps` clean.
`postgres-multi-statement-fields.test.ts` and
`sql-connect-error-reporting.test.ts` (20 tests, both query-protocol
paths) still pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants