
fix(auth): unwedge PGlite sign-up by routing single-user guard through the transaction adapter#952

Merged
buremba merged 4 commits into main from feat/fix-pglite-signup-deadlock on May 20, 2026

@buremba (Member) commented May 20, 2026

Summary

  • user.create.before and account.create.before called getDb() for a fresh pool connection while Better Auth's sign-up handler (and OAuth-user registration) already held the only one via runWithTransaction. In PGlite mode the pool is sized 1 (LOBU_DISABLE_PREPARE=1 → poolMax=1), so the hook deadlocked the request: no log line, no response, curl headers-timeout.
  • Routes the count + lookup through ctx.context.internalAdapter (countTotalUsers / findUserById), which reuses the in-flight transaction connection.
  • Declares principalKind as a Better Auth additional field (input/returned: false, fieldName: 'principal_kind') so the where-clause field name resolves instead of Better Auth (BA) throwing "Field principal_kind not found in model user". The column already has NOT NULL DEFAULT 'human' in the migration, so input: false lets the DB default fill in on signup.

Closes #947.

Codex consult

The initial investigation and recommendation from codex exec traced the deadlock to the postgres.js + Kysely transaction: true path, which holds the reserved connection while the hook asks for a second one. A diff-review pass flagged two follow-ups, both applied: fail-closed via ctx!, and a corrected account.create.before comment (only new OAuth user registration is transaction-wrapped).

Reproducer (red → green)

Manual run against the pre-built start-local.bundle.mjs (PGlite, ephemeral data dir, default LOBU_SINGLE_USER=1):

RED (before fix): POST /api/auth/sign-up/email timed out after 12s with 0 bytes; server log had zero entries for the request.

GREEN (with fix): sign-up #1 returned 200; sign-up #2 returned 403 with SIGN_UP_DISABLED_IN_SINGLE_USER_MODE.

Integration test (both backends, same code path)

packages/server/src/__tests__/integration/auth/single-user-signup.test.ts — backend-agnostic, runs unchanged against external Postgres and PGlite. Asserts first-signup admitted + sign-in-ready, second-signup refused, and install_operator/bootstrap-user rows excluded from the human count.

  • PGlite (LOBU_TEST_BACKEND=pglite): 3/3 pass.
  • Real Postgres (local throwaway DB): 3/3 pass; full auth/ integration folder 17/17 pass.
  • Regression guard verified: temporarily reverting the hook to the old getDb() query makes all three tests time out at 15s under PGlite, so the test actually catches #947 (PGlite mode: /api/auth/sign-up/email hangs) rather than passing vacuously.

Test plan

  • bun run typecheck clean
  • make build-packages clean
  • PGlite reproducer red → green
  • Sign-up + sign-in round-trip (PGlite)
  • Integration test passes on PGlite and real Postgres
  • Regression guard confirmed (old code times out under PGlite)
  • OAuth account.create.before not directly E2E'd — same ctx!.context.internalAdapter pattern as the verified count path; needs a configured OAuth provider to exercise the new-OAuth-user transactional path

Summary by CodeRabbit

  • Bug Fixes

    • Enforced single-user mode to prevent additional signups; system placeholder accounts are ignored and won’t block the initial human signup.
    • Blocked account linking to restricted/system account types.
  • Tests

    • Added integration tests validating the single-user signup guard, credential creation and password verification, deterministic seeding, and auth-cache reset between cases.


…h the transaction adapter

The `user.create.before` and `account.create.before` hooks called
`getDb()` for a fresh pool connection while Better Auth's
`/api/auth/sign-up/email` (and OAuth registration) endpoints already
held the only one via `runWithTransaction`. In PGlite mode the pool is
sized 1 (LOBU_DISABLE_PREPARE → poolMax=1), so the hook deadlocked the
whole request — no log line, no response, curl headers-timeout.
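To make the failure mode concrete, here is a toy TypeScript model of a one-connection pool. It is a sketch under stated assumptions, not the postgres.js/Kysely code: TinyPool, withHeldConnection (standing in for runWithTransaction), and both hook functions are hypothetical, and acquire() is assumed to wait indefinitely rather than time out.

```typescript
// Toy model of the #947 hang, NOT the real postgres.js/Kysely pool.
// Assumption: acquire() waits indefinitely for a free connection.
class TinyPool {
  private free: number;
  private waiters: Array<() => void> = [];

  constructor(size: number) {
    this.free = size;
  }

  acquire(): Promise<void> {
    if (this.free > 0) {
      this.free--;
      return Promise.resolve();
    }
    // No connection free: park until someone calls release().
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the connection straight to the next waiter
    else this.free++;
  }
}

// Hypothetical handle bound to one connection; query() just returns 0.
type Conn = { query: (sql: string) => Promise<number> };
const conn: Conn = { query: async () => 0 };

// Stand-in for runWithTransaction: holds one pooled connection while fn runs.
async function withHeldConnection(
  pool: TinyPool,
  fn: (tx: Conn) => Promise<number>,
): Promise<number> {
  await pool.acquire();
  try {
    return await fn(conn);
  } finally {
    pool.release();
  }
}

// Old hook shape: asks the pool for a second connection (like getDb()).
async function hookFreshConnection(pool: TinyPool): Promise<number> {
  await pool.acquire(); // never resolves when size=1 and the caller holds it
  try {
    return await conn.query('SELECT count(*) FROM "user"');
  } finally {
    pool.release();
  }
}

// New hook shape: reuses the caller's handle (like ctx.context.internalAdapter).
function hookSameConnection(tx: Conn): Promise<number> {
  return tx.query('SELECT count(*) FROM "user"');
}

// Races a promise against a timer so a hang becomes an observable result.
function withTimeout(p: Promise<number>, ms: number): Promise<number | "timeout"> {
  return Promise.race([
    p,
    new Promise<"timeout">((resolve) => setTimeout(() => resolve("timeout"), ms)),
  ]);
}
```

Under these assumptions the getDb()-style hook times out with one connection but completes with two, which is consistent with the hang showing up only in PGlite mode (poolMax=1).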

Routes the count + lookup through `ctx.context.internalAdapter`
(`countTotalUsers` / `findUserById`), which reuses the in-flight
transaction connection. Declares `principalKind` as a Better Auth
additional field (input/returned: false, fieldName 'principal_kind')
so the where-clause field name resolves without BA throwing
"Field principal_kind not found in model user". Fail-closed if BA ever
invokes the hook without an auth context.
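The schema half of the fix can be pictured with a small resolver. This is a hypothetical, simplified model rather than Better Auth's code: it assumes a where clause may name a field by its logical key or by its declared column name, and that an undeclared name raises an error of the form "Field <name> not found in model user".

```typescript
// Hypothetical, simplified model of where-clause field resolution.
// NOT Better Auth's implementation; it only illustrates why declaring the
// field lets a where clause reference it instead of erroring.
type FieldDef = {
  type: "string";
  fieldName?: string; // physical column, if it differs from the key
  input?: boolean; // unused here; mirrors input: false in the PR
  returned?: boolean; // unused here; mirrors returned: false in the PR
};
type ModelSchema = Record<string, FieldDef>;

function resolveColumn(model: string, schema: ModelSchema, field: string): string {
  // Assumption: a field may be referenced by its logical key...
  const byKey = schema[field];
  if (byKey) return byKey.fieldName ?? field;
  // ...or by its declared column name.
  if (Object.values(schema).some((f) => f.fieldName === field)) return field;
  throw new Error(`Field ${field} not found in model ${model}`);
}

// Before the fix: principal_kind exists in the DB but not in the model.
const userBefore: ModelSchema = {
  id: { type: "string" },
  email: { type: "string" },
};

// After the fix: declared as a non-input, non-returned additional field.
const userAfter: ModelSchema = {
  ...userBefore,
  principalKind: { type: "string", fieldName: "principal_kind", input: false, returned: false },
};
```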

Closes #947.
@coderabbitai

coderabbitai Bot commented May 20, 2026

📝 Walkthrough


Adds a non-input/non-returned principalKind user field, updates signup and account-linking hooks to accept auth context and use ctx.context.internalAdapter for user counting and lookup, exports a test cache-clear helper, and adds integration tests for single-user sign-up behavior.

Changes

Principal kind schema and auth hook refactoring

  • Auth cache test utility (packages/server/src/auth/index.tsx): Exports clearAuthCacheForTests() to clear the in-memory Better Auth instances used by tests.
  • Principal kind schema configuration (packages/server/src/auth/index.tsx): Extends the Better Auth user configuration with additionalFields.principalKind, mapped to the principal_kind column and configured as non-input/non-returned so the database default applies on signup.
  • Signup guard context integration (packages/server/src/auth/index.tsx): The user.create.before hook now accepts (user, ctx); single-user mode enforcement uses ctx!.context.internalAdapter.countTotalUsers(...), excluding install_operator and bootstrap-user; a missing ctx fails closed.
  • Account linking validation (packages/server/src/auth/index.tsx): The account.create.before hook now accepts (account, ctx); the linked user is loaded via ctx!.context.internalAdapter.findUserById(account.userId), and linking is blocked (APIError) when that user's principalKind is install_operator.
  • Single-user signup integration tests (packages/server/src/__tests__/integration/auth/single-user-signup.test.ts): Adds tests that set LOBU_SINGLE_USER=1 and a deterministic BETTER_AUTH_SECRET, use DB seeding and cache resets, and verify that the first human signup succeeds, a subsequent human signup is rejected (403 + SIGN_UP_DISABLED_IN_SINGLE_USER_MODE), and non-human principals are ignored in the human count.
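The guard described in the walkthrough boils down to "count the existing humans, refuse if any exist". A sketch under stated assumptions: an in-memory row list, a synchronous countTotalUsers that only understands ne conditions (the real adapter call is awaited), a boolean in place of reading LOBU_SINGLE_USER, and a hypothetical SignUpError in place of Better Auth's APIError.

```typescript
// Minimal sketch of the single-user sign-up guard; all names are
// illustrative stand-ins, not the project's or Better Auth's API.
type UserRow = { id: string; principalKind: string };
type Where = { field: keyof UserRow; operator: "ne"; value: string };

class SignUpError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string) {
    super(code);
    this.status = status;
    this.code = code;
  }
}

function makeInternalAdapter(rows: UserRow[]) {
  return {
    // Counts rows that satisfy every "ne" condition.
    countTotalUsers(where: Where[]): number {
      return rows.filter((row) => where.every((w) => row[w.field] !== w.value)).length;
    },
  };
}

type HookCtx = { context: { internalAdapter: ReturnType<typeof makeInternalAdapter> } };

// singleUser stands in for LOBU_SINGLE_USER=1, which the real hook reads from env.
function userCreateBefore(ctx: HookCtx | undefined, singleUser: boolean): void {
  if (!singleUser) return;
  // Non-human principals and the legacy bootstrap row do not count as "the" user.
  const existingHumans = ctx!.context.internalAdapter.countTotalUsers([
    { field: "principalKind", operator: "ne", value: "install_operator" },
    { field: "id", operator: "ne", value: "bootstrap-user" },
  ]);
  if (existingHumans > 0) {
    throw new SignUpError(403, "SIGN_UP_DISABLED_IN_SINGLE_USER_MODE");
  }
}
```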

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AuthServer
  participant InternalAdapter
  participant Database

  Client->>AuthServer: POST /api/auth/sign-up/email (signup payload)
  AuthServer->>InternalAdapter: countTotalUsers([principalKind ne "install_operator", id ne "bootstrap-user"])
  InternalAdapter->>Database: SELECT COUNT(...) WHERE principal_kind != 'install_operator' AND id != 'bootstrap-user'
  Database-->>InternalAdapter: count
  InternalAdapter-->>AuthServer: count result
  AuthServer->>AuthServer: allow or reject signup based on count

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • lobu-ai/lobu#902: This PR's single-user-mode signup enforcement (counting existing users via countTotalUsers while excluding the legacy bootstrap-user/install_operator rows, with a fail-closed ctx!) builds directly on #902's changes to the same user.create.before logic.
  • lobu-ai/lobu#898: Both PRs modify the Better Auth single-user enforcement in packages/server/src/auth/index.tsx (the databaseHooks.user.create.before user-creation guard using LOBU_SINGLE_USER and the SIGN_UP_DISABLED_IN_SINGLE_USER_MODE 403 logic).

Poem

🐰 I hopped through hooks and cleared the cache,
I counted humans and ignored the bash.
A hidden kind tucked out of sight,
Install ops blocked from linking tight.
Hooray — a rabbit's tiny signup bash.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change: routing the single-user guard through the transaction adapter to fix the PGlite sign-up deadlock.
  • Description check (✅ Passed): The description provides a comprehensive summary of the problem, solution, and verification, including manual reproducer results and integration test outcomes. The test plan section is completed with checkmarks.
  • Linked Issues check (✅ Passed): The PR fully addresses #947 by fixing the deadlock (routing through ctx.context.internalAdapter instead of getDb()), adding the principalKind field, and providing verification via integration tests on both PGlite and Postgres.
  • Out of Scope Changes check (✅ Passed): All changes are scoped to the single-user sign-up deadlock fix: the auth handler hooks, principalKind field declaration, test cache helper, and integration test directly support resolving #947.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint


ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/server/src/auth/index.tsx`:
- Around line 766-773: The principalKind guard for OAuth account linking uses
optional chaining on ctx which silently allows linking when ctx is missing;
update the account-linking flow (the block using
ctx?.context.internalAdapter.findUserById, linkedUser, and the principalKind ===
"install_operator" check) to fail closed like the user.create.before hook:
explicitly require ctx (throw the same APIError when ctx is undefined or missing
context), then call ctx.context.internalAdapter.findUserById and perform the
principalKind check, preventing account linking if ctx is absent or the user is
an install_operator.
- Around line 649-661: The filter passed to
ctx.context.internalAdapter.countTotalUsers uses the additionalField key
"principalKind" but the API expects the actual DB column name; update the first
filter object in the countTotalUsers call inside
packages/server/src/auth/index.tsx to use field: "principal_kind" (leave
operator and value unchanged) so the call to countTotalUsers([{ field:
"principal_kind", operator: "ne", value: "install_operator" }, { field: "id",
operator: "ne", value: "bootstrap-user" }]) matches the expected format.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5c8d25a6-f7c8-4f8e-a2df-616583769ce3

📥 Commits

Reviewing files that changed from the base of the PR and between 8695c57 and e8770d1.

📒 Files selected for processing (1)
  • packages/server/src/auth/index.tsx

Comment thread packages/server/src/auth/index.tsx
Comment on lines +766 to +773
const linkedUser =
  await ctx?.context.internalAdapter.findUserById(
    account.userId,
  );
const principalKind = (
  linkedUser as { principalKind?: string } | null
)?.principalKind;
if (principalKind === "install_operator") {

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inconsistent fail-closed behavior compared to signup hook.

The user.create.before hook (line 635) throws an APIError when ctx is missing, enforcing fail-closed semantics. Here, optional chaining silently proceeds if ctx is undefined, allowing account linking to succeed even when the principalKind check couldn't be performed.

If ctx is unexpectedly missing during an OAuth account link onto an install_operator user, the guard is bypassed. Consider aligning with the signup hook's fail-closed approach:

Proposed fix for fail-closed consistency
 					if (account.providerId !== "credential") {
+						if (!ctx) {
+							throw new APIError("INTERNAL_SERVER_ERROR", {
+								code: "ACCOUNT_LINK_NO_AUTH_CONTEXT",
+								message:
+									"Account linking rejected: missing auth context for install_operator guard.",
+							});
+						}
 						const linkedUser =
-							await ctx?.context.internalAdapter.findUserById(
+							await ctx.context.internalAdapter.findUserById(
 								account.userId,
 							);

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


buremba added 3 commits May 20, 2026 02:36
…dant fail-closed branch

Trims the diff added by the previous commit:

- Folds the two adjacent comment blocks in user.create.before into one
  paragraph, keeps the "why ctx.internalAdapter" explanation.
- Drops the explicit `if (!ctx) throw …` — uses ctx! instead. A null
  ctx would throw a TypeError at the property access, which BA
  catches and surfaces as FAILED_TO_CREATE_USER (422); fail-closed
  for free, fewer lines.
- Matches the same pattern in account.create.before.
- Compacts the principalKind additionalField comment.

Reproducer still green (sign-up #1 200, sign-up #2 403 with
SIGN_UP_DISABLED_IN_SINGLE_USER_MODE).
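The difference between the optional-chaining guard CodeRabbit flagged and the ctx! form adopted in this commit only shows up when the hook context is missing. The hypothetical sketch below models that case; per the commit message, Better Auth surfaces the resulting TypeError on sign-up as FAILED_TO_CREATE_USER (422), whereas here the runner simply reports any thrown error as "rejected". The adapter, error class, and runner are illustrative stand-ins, not Better Auth's API.

```typescript
// Hypothetical sketch: ctx?. vs ctx! when the hook context is missing.
type LinkedUser = { principalKind: string } | null;
type LinkCtx =
  | { context: { internalAdapter: { findUserById(id: string): LinkedUser } } }
  | undefined;

class GuardError extends Error {}

// Optional chaining: a missing ctx yields undefined, so the check is skipped (fails open).
function guardOptional(userId: string, ctx: LinkCtx): void {
  const linkedUser = ctx?.context.internalAdapter.findUserById(userId);
  if (linkedUser?.principalKind === "install_operator") throw new GuardError("blocked");
}

// Non-null assertion: a missing ctx throws a TypeError at the property access (fails closed).
function guardAsserted(userId: string, ctx: LinkCtx): void {
  const linkedUser = ctx!.context.internalAdapter.findUserById(userId);
  if (linkedUser?.principalKind === "install_operator") throw new GuardError("blocked");
}

// Stand-in for the framework: any error thrown by the hook aborts the write.
function runHook(
  guard: (userId: string, ctx: LinkCtx) => void,
  userId: string,
  ctx: LinkCtx,
): "admitted" | "rejected" {
  try {
    guard(userId, ctx);
    return "admitted";
  } catch {
    return "rejected";
  }
}
```

Both guards behave the same when ctx is present; the non-null assertion just removes the silent bypass without an extra branch, which is the trade-off the commit describes.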
 deadlock

Backend-agnostic integration test that runs unchanged against external
Postgres (default) and PGlite (LOBU_TEST_BACKEND=pglite). Asserts:

- first human signup is admitted and the row is sign-in-ready
  (principal_kind defaults to 'human' via the DB default, credential
  hash verifies against the submitted password);
- the second signup is refused with SIGN_UP_DISABLED_IN_SINGLE_USER_MODE;
- seeded install_operator and bootstrap-user rows don't count as the
  existing human.

Under the PGlite backend this reproduces #947: reverting the hook to a
fresh getDb() query hangs the request and the test fails on timeout
(verified — all three time out at 15s with the old code).
…ministic in the full suite

createAuth() memoizes betterAuth instances in a per-org TtlCache. Under
the integration suite's shared module graph (isolate:false), an earlier
test file builds the "__system__" instance while LOBU_SINGLE_USER is
unset, so its user.create.before closure has the guard disabled. My test
then reused that stale instance and the second signup was admitted —
order-dependent flake (passed alone, failed late in the suite).

- Add clearAuthCacheForTests() to auth/index.tsx (mirrors the existing
  clearXForTests helpers); production never needs it since env is stable
  per-process.
- Clear the cache in beforeEach (protect against upstream pollution) and
  afterEach (don't leak our LOBU_SINGLE_USER=1 instance to later files —
  this was the likely cause of the member-privacy flake in CI too).
- Make the "refuses" case seed a committed human via SQL instead of
  chaining two signups, removing the cross-request visibility dependency.

Verified against the full integration suite locally on real Postgres
(CI's singleFork/isolate:false config): the 3 single-user tests pass;
the only remaining local failures are the isolated-vm sandbox tests,
which are environment-specific and pass in CI.
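The order-dependent flake above can be reproduced in miniature. This is a hypothetical model: env replaces process.env, a plain Map replaces the per-org TtlCache, and createAuth/clearAuthCacheForTests only mimic the roles of the real functions.

```typescript
// Hypothetical sketch of the stale-instance flake; not the project's code.
const env: Record<string, string | undefined> = {};

type AuthInstance = { allowsSecondSignup: () => boolean };

const authCache = new Map<string, AuthInstance>();

function createAuth(orgId: string): AuthInstance {
  const cached = authCache.get(orgId);
  if (cached) return cached;
  // The flag is read once, when the instance (and its hook closure) is built.
  const singleUser = env.LOBU_SINGLE_USER === "1";
  const instance: AuthInstance = { allowsSecondSignup: () => !singleUser };
  authCache.set(orgId, instance);
  return instance;
}

// Mirrors the idea of clearAuthCacheForTests(): drop every memoized instance.
function clearAuthCacheForTests(): void {
  authCache.clear();
}
```

Clearing in both beforeEach and afterEach, as the commit does, keeps an instance built under one flag value from being served to a test that expects another.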

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/server/src/auth/index.tsx (1)

646-653: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the correct database column name in the countTotalUsers filter.

The field parameter in where clauses must use the actual database column name ("principal_kind"), not the additionalFields key ("principalKind"). Better Auth does not auto-map additionalFields keys in where clauses. Line 648 must be updated:

Fix required at line 648
field: "principal_kind"  // not "principalKind"

Without this change, the filter fails silently and does not exclude install_operator rows.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/auth/index.tsx` around lines 646 - 653, The where filter
passed to ctx!.context.internalAdapter.countTotalUsers uses the wrong field
name; update the filter object in the countTotalUsers call so the field key for
the principal kind uses the database column name "principal_kind" instead of
"principalKind" (leave the other condition for id/"bootstrap-user" intact) so
the install_operator rows are correctly excluded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7c4e5635-b173-4f4a-860e-961b61c6d1f8

📥 Commits

Reviewing files that changed from the base of the PR and between c30113d and 17f3010.

📒 Files selected for processing (2)
  • packages/server/src/__tests__/integration/auth/single-user-signup.test.ts
  • packages/server/src/auth/index.tsx



Development

Successfully merging this pull request may close these issues.

PGlite mode: /api/auth/sign-up/email hangs (headers timeout)
