fix(lbug): keep serve stable when sidecars are missing by ChamHerry · Pull Request #1747 · abhigyanpatwari/GitNexus

ChamHerry · 2026-05-21T07:12:12Z

Summary

Fixes the Express 5 private-network preflight wildcard route so gitnexus serve can start reliably.
Adds shared LadybugDB missing-shadow WAL sidecar preflight/quarantine/finalization helpers.
Reuses the helper from both direct and pooled LadybugDB adapters with targeted regression coverage.

Why

Local serving could surface repeated lbug.shadow missing-sidecar warnings after interrupted writes or checkpoint edge cases. The recovery now happens before read-only open for safe tiny orphan WALs, while larger WALs are preserved for explicit recovery.

Validation

cd gitnexus && npx vitest run test/unit/sidecar-recovery.test.ts test/unit/lbug-adapter-wal-schema.test.ts test/unit/pool-wal-recovery.test.ts test/unit/web-ui-serving.test.ts && npx tsc --noEmit
Pre-commit hook: eslint/prettier + gitnexus typecheck

Follow-up

The i18n changes are intentionally kept on a separate stacked branch (ChamHerry:codex/i18n-20260521). They should be submitted to upstream after this PR lands or after an upstream base branch is available, so reviewers do not see this stability diff duplicated in the i18n PR.

Shared missing-shadow WAL recovery prevents repeated read-only open warnings when LadybugDB sidecars are absent, while the Express preflight fix keeps `gitnexus serve` compatible with Express 5 route parsing. Constraint: LadybugDB read-only replay can require a `.shadow` sidecar that may be absent after interrupted writes or checkpoint edge cases. Rejected: keep reactive WARN-only quarantine in each adapter | it leaves repeated user-visible warnings and duplicate recovery behavior. Confidence: high Scope-risk: broad Directive: Do not silently delete large orphan WALs; only quarantine tiny orphan WALs before open and keep large WALs for explicit recovery. Tested: cd gitnexus && npx vitest run test/unit/sidecar-recovery.test.ts test/unit/lbug-adapter-wal-schema.test.ts test/unit/pool-wal-recovery.test.ts test/unit/web-ui-serving.test.ts && npx tsc --noEmit Not-tested: full npm test in this split branch; full unit suite passed on the source branch before PR split. Co-authored-by: OmX <omx@oh-my-codex.dev>

vercel · 2026-05-21T07:12:16Z

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

github-actions · 2026-05-21T07:41:28Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
9410	9409	0	1	479s

✅ All 9409 tests passed

1 test(s) skipped — expand for details

buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	78.11%	30902/39558	N/A%	—	🟢 ███████████████░░░░░
Branches	66.65%	19635/29459	N/A%	—	🟢 █████████████░░░░░░░
Functions	82.28%	3139/3815	N/A%	—	🟢 ████████████████░░░░
Lines	81.47%	27866/34201	N/A%	—	🟢 ████████████████░░░░

_{📋 View full run · Generated by CI}

github-actions · 2026-05-21T07:46:17Z

Claude finished @magyargergo's task in 8m 23s —— View job

Production-Readiness Review: PR #1747 `fix(lbug): keep serve stable when sidecars are missing`

1. Review bar for this PR

This PR touches persistent local database state, sidecar recovery/quarantine logic, cross-process initialization, read-only vs writable semantics, pooled connections, and HTTP preflight routing. The bar is strict: confirmed defects in any of those lanes block merge independently. CI passing is necessary but not sufficient.

2. Problem being solved

Problem A: gitnexus serve crashes at startup under Express 5 because app.options('*', ...) is not valid Express 5 syntax (Express 5 rejects '*' as a glob path and throws SyntaxError).

Problem B: After interrupted writes or WAL checkpoint edge cases, lbug.wal can exist without lbug.shadow. A subsequent read-only open (serve, MCP) emits repeated native errors and may fail. Larger WALs should be preserved for explicit recovery; tiny orphan WALs should be silently quarantined (renamed, not deleted) before open.

Problem statement gaps verified:

No upstream GitNexus issue linked as the primary bug report for fix(lbug): keep serve stable when sidecars are missing #1747. Problem B's user reproduction path is not fully documented.
The boundary "tiny orphan WAL" is TINY_ORPHAN_WAL_BYTES = 4096 bytes (sidecar-recovery.ts:17). Rationale is implied (below one WAL page = no committed data) but not explicitly documented in comments.
PR fix(lbug): keep serve stable when sidecars are missing #1747 fixes both read-only serve/pool startup AND writable analyze paths.

3. Current PR state

State: Open. Head commit 23da8b1 on ChamHerry:codex/lbug-sidecar-recovery-20260521.
Changed files: 8 TypeScript files — gitnexus/src/core/lbug/{lbug-adapter.ts,pool-adapter.ts,sidecar-recovery.ts}, gitnexus/src/server/api.ts, and 4 test files. No i18n, workflow, Docker, dependency, parser, or generated-artifact changes visible. Branch hygiene is clean.
CI: GitHub Actions CI report (last visible run) shows 9,370 tests passed, typecheck success, E2E success. 1 test skipped (unrelated Ruby closure inference). 2 PR Description Check annotations marked cancelled due to higher-priority re-runs — visible state does not confirm whether these are stale superseded runs or current required-check blockers. Verify live Checks tab before merge.
Vercel: Bot requests team-member deployment authorization. Appears non-blocking for merge based on typical repo policy, but confirm against branch protection rules.

4. Merge status and mergeability

Visibility is incomplete. The cancelled PR Description Check annotations should be verified as stale (superseded runs) vs current blockers. Mergeability cannot be confirmed without verifying the live branch protection state. The code review below surfaces blocking defects that prevent merge regardless of CI state.

5. Repository history considered

PR fix(lbug): Recover gitnexus analyze from orphan LadybugDB sidecars when main DB file is missing #1622 / Issue analyze permanently fails after mid-run crash leaves orphan lbug.shadow without main DB #1618 ("Recover gitnexus analyze from orphan LadybugDB sidecars when main DB file is missing"): Prior art for sidecar recovery. fix(lbug): Recover gitnexus analyze from orphan LadybugDB sidecars when main DB file is missing #1622 handled absent main DB + orphan sidecars; fix(lbug): keep serve stable when sidecars are missing #1747 handles present DB + WAL present + shadow missing. The doInitLbug orphan-sidecar cleanup at lbug-adapter.ts:713 carries forward fix(lbug): Recover gitnexus analyze from orphan LadybugDB sidecars when main DB file is missing #1622's pattern, and the new preflightLbugSidecars is additive. fix(lbug): Recover gitnexus analyze from orphan LadybugDB sidecars when main DB file is missing #1622's data-loss lessons (ENOENT vs other FS errors, platform compat) are partially incorporated — but see Finding 1 below.
Fork PR [codex] Stabilize LadybugDB sidecar recovery ChamHerry/GitNexus#1 Codex review: Flagged that quarantineWalForMissingShadow may propagate ENOENT from fs.rename when a concurrent process already handled the WAL. This comment is NOT addressed in PR fix(lbug): keep serve stable when sidecars are missing #1747. (Confirmed below in Finding 1.)
PR feat(i18n): make web and CLI language-aware #1748 (i18n): Stacked on fix(lbug): keep serve stable when sidecars are missing #1747. Visible diff in fix(lbug): keep serve stable when sidecars are missing #1747 contains no i18n content — the PR body correctly says i18n is kept separate. No contamination detected.
Issues LadybugDB FTS/VECTOR extension load hard-exits Node on Windows #1199, Windows: gitnexus analyze segfaults during FTS extension load (empty repo repro) #1217 (Windows SIGSEGV on FTS install): The pool adapter's hasLocalWinFtsExtension probe at pool-adapter.ts:53 and the load-only policy at pool-adapter.ts:592–596 directly address these. History alignment is confirmed.
Issues LadybugDB (Kuzu) Native Library Crash on Windows - Query Commands ExitCode 0xC0000005 #1154, 1.6.4-rc.22 still fails @ 60% | Loading into LadybugDB... #1195, SIGSEGV crash in lbug::fts_extension::tableFunc on macOS 26 beta (arm64) + FTS read-only database error in MCP server #1204, Native crash (silent exit 1) on CALL CREATE_VECTOR_INDEX for ~8k vectors on macOS arm64 #1206, analyze fails with "Invalid argument" on Python files, hangs at 60% "Loading into LadybugDB..." #1229, Custom keyGenerator appears to use request IP without calling the ipKeyGenerator helper function for IPv6 addresses #1360: Not directly verifiable from local checkout. The lbug-config.ts commentary references LadybugDB FTS/VECTOR extension load hard-exits Node on Windows #1199 and Windows: gitnexus analyze segfaults during FTS extension load (empty repo repro) #1217 inline. Treat as background context only.

6. Branch hygiene assessment

Clean. One commit. 8 TypeScript files scoped to gitnexus/src/core/lbug/, gitnexus/src/server/api.ts, and targeted unit tests. No i18n, workflow, Docker, dependency, parser, or generated artifact churn. git diff --name-status HEAD~1...HEAD confirms exactly the 8 files listed in the PR. Unicode bidi/control character scan over all 8 changed files: no hidden characters found.

7. Understanding of the change

sidecar-recovery.ts (new file, 254 lines): Shared helper module. Provides: inspectLbugSidecars (classifies DB+WAL+shadow state), preflightLbugSidecars (quarantines tiny orphan WALs before open, warns on large ones), quarantineWalForMissingShadow (renames .wal to .wal.missing-shadow.{ts}-{rand} — preserves forensics, never deletes), finalizeLbugSidecarsAfterClose (post-close sweep), isMissingShadowSidecarError / shadowSidecarRecoveryMessage, and test-reset export.

lbug-adapter.ts (+191/-40): Adds READ_ONLY_SHADOW_REPLAY_PROBE, reopenReadOnlyAfterMissingShadow, reopenWritableAfterMissingShadow, ensureReadOnlyConnectionUsable, and calls preflightLbugSidecars + finalizeLbugSidecarsAfterClose from doInitLbug / safeClose. Read-only open now runs the probe query before returning the connection. Writable open detects missing-shadow error during schema creation and retries after quarantine.

pool-adapter.ts (+112/-0): Adds openReadOnlyDatabase (preflight + probe + quarantine-and-retry on missing shadow), replayShadowPagesWithWritableOpen (temporary writable open to replay dirty pages), and integrates them into doInitLbug. Uses the same sidecar-recovery.ts helpers as the direct adapter.

api.ts (+2/-1): Replaces app.options('*', ...) (Express 4 glob, throws in Express 5) with app.options(PNA_PREFLIGHT_PATH_REGEX, ...) where PNA_PREFLIGHT_PATH_REGEX = /^\/.*$/. The handler calls next() — the global middleware at line 713 already sets Access-Control-Allow-Private-Network: true on all responses.

8. Findings

FINDING 1 — TOCTOU ENOENT in `quarantineWalForMissingShadow` pool-adapter callers (confirmed defect, fork PR Codex comment unfixed)

Severity: HIGH

Risk: When two pool-adapter processes (e.g. two concurrent gitnexus serve instances, or an MCP server restarting while serve is initializing) both detect a tiny-orphan-WAL for the same dbPath, the first quarantines successfully and the second gets ENOENT from fs.rename. This propagates as a fatal error, aborting serve initialization even though the condition was already resolved.

Evidence found:

sidecar-recovery.ts:107:

await fs.rename(walPath, quarantinePath);  // throws ENOENT if WAL already gone

No ENOENT guard here. The function doesn't distinguish "WAL vanished because a peer already handled it" from "WAL path is wrong."

Callers and their ENOENT handling:

Call site	ENOENT handling
`finalizeLbugSidecarsAfterClose` (`sidecar-recovery.ts:199`)	✅ `catch (err) { if (!missing(err)) { warnOnce(...) } }`
`preflightLbugSidecars` (`sidecar-recovery.ts:145`)	❌ no try-catch → propagates to `doInitLbug`
`reopenReadOnlyAfterMissingShadow` (`lbug-adapter.ts:459`)	❌ `catch { throw new Error(shadowSidecarRecoveryMessage) }` → ENOENT becomes "rebuild index" error
`reopenWritableAfterMissingShadow` (`lbug-adapter.ts:486`)	❌ same pattern
`openReadOnlyDatabase` in pool (`pool-adapter.ts:388`)	❌ `catch { throw new Error(shadowSidecarRecoveryMessage) }`
`replayShadowPagesWithWritableOpen` in pool (`pool-adapter.ts:351`)	❌ same pattern

Key distinction: The direct adapter (lbug-adapter.ts) wraps preflightLbugSidecars inside acquireInitLock (lbug-adapter.ts:703), a cross-process file lock (lbug-adapter.ts:310). This serializes concurrent direct-adapter opens for the same dbPath, making the TOCTOU window effectively zero there.

The pool adapter (pool-adapter.ts) uses only initPromises (in-process deduplication, line 471). No acquireInitLock call exists anywhere in pool-adapter.ts. Two pool-adapter processes opening the same dbPath concurrently are unprotected. The openReadOnlyDatabase and replayShadowPagesWithWritableOpen paths are directly exposed to the TOCTOU race.

This is the exact defect the fork PR's Codex review identified, and it remains unaddressed.

Recommended fix: Either (a) guard quarantineWalForMissingShadow internally — catch ENOENT from fs.rename, re-inspect, and return success if the WAL is already gone; or (b) wrap each call site's catch block to distinguish ENOENT (non-fatal, WAL already handled) from EACCES/EPERM (still fatal, permission problem). Option (a) is more defensive. Keep ENOENT non-fatal only when re-inspection confirms the WAL is gone.

Blocks merge: YES — pool adapter lacks cross-process init lock, and ENOENT from quarantine rename is an unhandled fatal error reachable from concurrent pool opens.

Fix this →

FINDING 2 — Large WAL silently quarantined via writable schema-creation path, bypassing preflight policy

Severity: MEDIUM

Risk: preflightLbugSidecars intentionally warns but does NOT quarantine large orphan WALs (kind: 'orphan-wal', size > 4 KB) — the code at sidecar-recovery.ts:153-160 is explicit about this. However, after open, runSchemaCreationQueries (lbug-adapter.ts:555) can catch isMissingShadowSidecarError and return it to doInitLbug:759, which then calls reopenWritableAfterMissingShadow (lbug-adapter.ts:481), which calls quarantineWalForMissingShadow unconditionally — no size check. A large orphan WAL that triggers a missing-shadow error during the first DDL write will be quarantined silently without the user seeing the preflight warning.

Evidence found: lbug-adapter.ts:759-773:

const missingShadowError = await runSchemaCreationQueries(dbPath);
if (missingShadowError) {
  await safeClose();
  resetOpenConnectionState();
  const reopened = await reopenWritableAfterMissingShadow(dbPath, missingShadowError);
  // ...
}

reopenWritableAfterMissingShadow (lbug-adapter.ts:481-496) quarantines without checking WAL size.

Mitigating factor: The quarantined WAL is renamed to .wal.missing-shadow.{ts}-{rand}, not deleted. listQuarantinedMissingShadowWals allows recovery. So no silent permanent data loss occurs — the WAL is findable.

Recommended fix: In reopenWritableAfterMissingShadow, check WAL size before quarantining. If the WAL is large, fail with an actionable message ("large WAL found without shadow sidecar; manual recovery required") instead of quarantining. Alternatively, emit a WARN-level message before the rename explaining that a large WAL was quarantined during write recovery, so the user is not surprised.

Blocks merge: MAYBE — the WAL is preserved (not deleted), so there is no silent permanent data loss. However, the stated DoD criterion "large WAL files are preserved for explicit recovery and never silently deleted/quarantined in a way that hides user data loss" is inconsistent with the current writable recovery path. Whether the rename-to-quarantine-dir counts as "preserved for explicit recovery" is a judgment call. Treat as a required follow-up at minimum.

Fix this →

FINDING 3 — No TOCTOU test for quarantine rename ENOENT in adapter-level callers

Severity: MEDIUM

Risk: No test verifies behavior when fs.rename in quarantineWalForMissingShadow throws ENOENT from a pool-adapter path (openReadOnlyDatabase, replayShadowPagesWithWritableOpen). The existing test at pool-wal-recovery.test.ts:247 covers ENOENT during the corrupt WAL quarantine path (.wal.corrupt.), not the missing-shadow quarantine path (.wal.missing-shadow.).

Evidence found: pool-wal-recovery.test.ts:247-258 tests ENOENT in tryQuarantineAndReopen (corrupt WAL path). No equivalent test exists for openReadOnlyDatabase's quarantineWalForMissingShadow call at pool-adapter.ts:388, nor for replayShadowPagesWithWritableOpen at pool-adapter.ts:351.

Recommended fix: Add a test where connectionQueryMock rejects with the missing-shadow error AND fs.rename rejects with ENOENT. Assert that initLbug either succeeds (ideal) or fails with an informative message (not a raw ENOENT stack trace).

Blocks merge: NO independently, but this gap is what allowed Finding 1 to ship undetected.

FINDING 4 — Error string matching in `isMissingShadowSidecarError` and `isReadOnlyShadowReplayError` is brittle

Severity: LOW

Risk: isMissingShadowSidecarError matches /Cannot open file .*\.shadow: No such file or directory/i (sidecar-recovery.ts:64). isReadOnlyShadowReplayError matches /replay shadow pages under read-only mode/i (pool-adapter.ts:317, lbug-adapter.ts:451). These are hardcoded native LadybugDB message fragments. A LadybugDB version bump that rewrites these strings (e.g., changes "Cannot open file" to "Failed to open file", or localizes error messages) would silently break recovery — the missing-shadow errors would no longer be caught, and serve would emit the native error raw.

Evidence found: Tests use realistic strings matching the current LadybugDB output (e.g., pool-wal-recovery.test.ts:175: "IO exception: Cannot open file ${dbPath}.shadow: No such file or directory"). No LadybugDB version pinned to a specific string contract is visible. The @ladybugdb/core version in gitnexus/package.json is the source of truth (not read in this review — verify directly).

Recommended fix: Add a comment citing the LadybugDB version(s) and source file where these messages originate. If LadybugDB provides structured error codes, prefer those. At minimum, widen the shadow-missing regex to tolerate minor variations (e.g., /.shadow.*[Nn]o such file|[Cc]annot open.*\.shadow/).

Blocks merge: NO — tests use realistic strings; immediate risk is low. Mark as a follow-up before the next LadybugDB major version bump.

FINDING 5 — Express preflight test does not verify PNA header end-to-end

Severity: LOW

Risk: web-ui-serving.test.ts:328-333 verifies that app.options(PNA_PREFLIGHT_PATH_REGEX, ...) does not throw at registration time, and that a real OPTIONS request returns 404 (because no CORS/PNA middleware is set up in the test app). This confirms the route registration fix but does not verify that Access-Control-Allow-Private-Network: true is returned in production, or that an unauthorized origin does not receive the private-network grant.

Evidence found: The global middleware setting Access-Control-Allow-Private-Network: true (api.ts:713-716) runs on all requests regardless of origin, meaning even non-allowlisted origins receive the PNA header. The CORS origin callback only controls Access-Control-Allow-Origin. A browser receiving the PNA header without Access-Control-Allow-Origin will still block the request — so there is no exploitable CORS bypass. The security boundary is intact.

Recommended fix: Add a test that instantiates the full CORS + PNA middleware stack and verifies Access-Control-Allow-Private-Network: true is returned on OPTIONS requests. Not blocking, but would complete the regression coverage claimed in the PR.

Blocks merge: NO — security boundary is maintained by existing CORS middleware; this is a coverage gap, not a defect.

FINDING 6 — `warnedKeys` module-level global survives across serve restarts within a process

Severity: INFO

Risk: warnedKeys at sidecar-recovery.ts:19 is a Set<string> in module scope. A missing-shadow warning for a given dbPath will fire at most once per process lifetime. If serve recovers and the condition recurs, subsequent warnings are silently downgraded to debug. This is intentional anti-spam behavior, but if conditions fluctuate (e.g., repeated checkpoint failures), users may miss persistent problems.

_resetSidecarRecoveryWarningsForTest is exported for test cleanup. Production has no reset path.

Blocks merge: NO — intentional design decision, test reset provided. Worth a follow-up if serve becomes long-lived and users report persistent sidecar issues being silent.

9. PR-specific assessment sections

LadybugDB/lbug sidecar expert lane

State machine reconstructed:

State	Preflight action	Open action	Writable schema action	Post-close action
clean (no wal, no shadow)	return `clean`	open normally	run schema queries	no-op
wal + shadow (healthy WAL)	return `wal-with-shadow`	LadybugDB replays normally	run schema queries	no-op
tiny orphan wal (≤ 4KB, no shadow)	quarantine WAL (rename) → re-inspect → `clean`	open normally	run schema queries	quarantine if still present
large orphan wal (> 4KB, no shadow)	warn, return `orphan-wal`, do NOT quarantine	LadybugDB replay attempt	quarantine if schema query triggers missing-shadow ← inconsistency	warn, preserve
missing shadow after open (probe error)	N/A	probe → `isMissingShadowSidecarError` → quarantine + reopen	—	no-op
shadow replay needed (read-only)	N/A	probe → `isReadOnlyShadowReplayError` → writable open → re-read-only	—	no-op

The state machine is largely sound. The inconsistency in the "large orphan wal → writable schema" cell is Finding 2. All other transitions are correct.

Quarantine file naming: ${dbPath}.wal.missing-shadow.${Date.now()}-${Math.random().toString(36).slice(2)} — timestamp + random suffix. Deterministic enough for forensic debugging (timestamp identifies when quarantine occurred). Collision-safe: the random 5-7 char suffix prevents collisions on rapid concurrent calls. listQuarantinedMissingShadowWals correctly enumerates these files.

EACCES/EPERM handling: finalizeLbugSidecarsAfterClose correctly re-raises non-ENOENT errors via warnOnce. The adapter-level callers swallow all errors in the quarantine catch block (converting them to "rebuild index" messages), which may hide EACCES/EPERM. This is acceptable for the recovery path but could make permission problems harder to diagnose.

Windows handle-release timing: safeClose in lbug-adapter.ts:1542 explicitly probes Windows handle release before calling finalizeLbugSidecarsAfterClose. The pool adapter closeOne uses conn.close().catch(() => {}) (fire-and-forget) with no Windows probe — this is a pre-existing pattern, not introduced by this PR.

Direct adapter lane

preflightLbugSidecars is called at doInitLbug:741 inside the acquireInitLock critical section (doInitLbug:703). This cross-process lock serializes concurrent direct-adapter opens for the same dbPath, effectively preventing the TOCTOU race in this path. ✓

ensureReadOnlyConnectionUsable (lbug-adapter.ts:498) is also inside the lock. Its connection handle lifecycle is correct:

Initial handle closed before writable open (closeLbugConnection(handle) at 507, 511, 516) ✓
Writable handle closed in finally block at 528-530 ✓
Reopened handle closed on probe failure at 540 ✓

runSchemaCreationQueries (lbug-adapter.ts:555) correctly distinguishes:

isMissingShadowSidecarError → returns error for caller to handle
isWalCorruptionError → calls safeClose() + resetOpenConnectionState() + throws with actionable message ✓
Other errors: logs WARN if not "already exists" / busy / read-only ✓

resetOpenConnectionState (lbug-adapter.ts:548) clears currentDbPath, ftsLoaded, vectorExtensionLoaded, ensuredFTSIndexes — state reset is consistent on error paths. ✓

FTS: read-only open uses policy: 'load-only' at lbug-adapter.ts:780. Schema DDL is skipped for read-only opens (if (!readOnly) { ... runSchemaCreationQueries ... } at line 758). Read-only path does not trigger network INSTALL. ✓

safeClose (lbug-adapter.ts:1514): calls flushWAL then closes conn then db, then Windows handle probe, then finalizeLbugSidecarsAfterClose. Sequence is correct; no resources left open on the happy path. ✓

queryAndDrain resource management: lbug-adapter.ts:442 always drains (await drainQueryResult(queryResult)) before returning. Not verified from the implementation of drainQueryResult in this review, but callers depend on it consistently. No obvious leak found.

Pool adapter lane

The pool adapter does not use acquireInitLock. In-process deduplication via initPromises (pool-adapter.ts:454) is correct for single-process use. Cross-process protection is absent — this is the source of Finding 1.

openReadOnlyDatabase (pool-adapter.ts:367): preflight + read-only open + probe + quarantine-on-shadow-missing + second preflight + second open + second probe. Connection handle lifecycle in the missing-shadow path (pool-adapter.ts:384-407):

db.close() called when probe fails at 385 ✓
db set to undefined at 386 ✓
On final throw at 423, if (db) await db.close() ✓

replayShadowPagesWithWritableOpen (pool-adapter.ts:342): opens a writable DB, probes, closes in finally. Correct handle management. ✓

Pool integrity check at pool-adapter.ts:698-705: throws if totalConns < MAX_CONNS_PER_REPO, catching connection leaks during pool warmup. ✓

A single bad WAL sidecar does NOT poison the pool — doInitLbug throws before the pool entry is registered (pool entry registered last, at line 602). Concurrent callers waiting on initPromises will all see the same error and each retry independently. ✓

Express 5 serve startup lane

The fix: api.ts:133: export const PNA_PREFLIGHT_PATH_REGEX = /^\/.*$/; used in place of the Express 4 '*' glob. Express 5 accepts regex routes. The regex matches all paths starting with / — correct scope for a global OPTIONS handler.

Behavior preserved: The handler at api.ts:722-724 calls next(). The Access-Control-Allow-Private-Network: true header is already set by global middleware at api.ts:713. This is correct — the OPTIONS handler does not need to set headers itself; it just lets the CORS middleware (which runs on all methods including OPTIONS) handle the response.

CORS security boundary: isAllowedOrigin (api.ts:60) returns false for non-allowlisted origins. The cors() middleware at api.ts:701 uses this. The PNA header (Access-Control-Allow-Private-Network: true) is sent to all origins by the global middleware, but browsers only act on it when Access-Control-Allow-Origin is also present — and that is controlled by the CORS callback. No security regression introduced. ✓

Test at web-ui-serving.test.ts:328-333: Verifies startup no longer crashes. The test makes a real OPTIONS request and gets 404 (no routes registered on the minimal test app). This confirms the route registration fix without false positives. Gap noted in Finding 5.

Test coverage lane

sidecar-recovery.test.ts (real FS, no mocks):

inspectLbugSidecars: clean state ✓, wal+shadow ✓
Preflight: tiny orphan WAL quarantined without WARN ✓, large orphan WAL not quarantined + WARN ✓
Finalize: tiny orphan WAL quarantined ✓
Disable via env var ✓
listQuarantinedMissingShadowWals filter ✓
Gap: No TOCTOU test (ENOENT from quarantine rename). No concurrent open simulation.

pool-wal-recovery.test.ts:

Corrupt WAL quarantine + retry ✓, shadow replay ✓, missing shadow quarantine + reopen ✓, lock error no-quarantine ✓, retry failure error message ✓, no poisoned state after failure ✓
Line 247 test ("handles quarantine gracefully when .wal file does not exist"): Tests ENOENT during the corrupt WAL quarantine path (tryQuarantineAndReopen). This path does handle ENOENT (it throws, and the test asserts rejects.toThrow(/gitnexus analyze/)). This test does NOT cover the missing-shadow quarantine ENOENT path in openReadOnlyDatabase.

lbug-adapter-wal-schema.test.ts:

Writable missing-shadow quarantine + retry schema creation ✓
Read-only open skips DDL, uses load-only FTS ✓
Shadow replay with temp writable open ✓
Read-only missing-shadow quarantine + reopen ✓
WAL corruption mid-schema → safeClose called ✓
Gap: No test for ENOENT from quarantine rename in reopenReadOnlyAfterMissingShadow or reopenWritableAfterMissingShadow.

web-ui-serving.test.ts:

Express 5 route registration no-throw ✓
Real OPTIONS request handled without crash ✓
Gap: No end-to-end test for Access-Control-Allow-Private-Network header presence.

Overall: Tests prove the happy-path recovery and the large-WAL non-quarantine behavior. The TOCTOU error path (Finding 1) is not tested, which is why the defect shipped.

Cross-domain lane

Both changes (Express 5 route, lbug sidecar recovery) are framed as gitnexus serve startup stability. They are causally connected at the serve-startup level. The Express route fix is a 2-line change with no interaction with the lbug changes. Combining them in one PR is acceptable given the scope, but the lbug sidecar changes are substantially larger and independently risky. If the Express fix is urgently needed, it could be fast-tracked separately. As-is, the combined PR is structurally acceptable — but the blocking lbug defect (Finding 1) prevents the whole PR from merging.

10. Back-and-forth avoided by verifying

Init lock scope verified: acquireInitLock is a cross-process file lock at lbug-adapter.ts:310. It protects the direct-adapter init path including preflightLbugSidecars. This eliminates the TOCTOU risk in the direct-adapter path but does NOT cover the pool adapter.
Large WAL not deleted verified: quarantineWalForMissingShadow uses fs.rename, not fs.unlink. Quarantined files are findable via listQuarantinedMissingShadowWals. cleanQuarantinedMissingShadowWals is the only deletion path and is not auto-invoked anywhere in this PR.
FTS policy verified: Read-only open uses policy: 'load-only' at both direct adapter (lbug-adapter.ts:780) and pool adapter (pool-adapter.ts:595). No DDL or network INSTALL in read-only paths.
State reset verified: resetOpenConnectionState clears all 4 module-level fields (currentDbPath, ftsLoaded, vectorExtensionLoaded, ensuredFTSIndexes). Called on all error paths in doInitLbug.
Probe query safety verified: MATCH (n) RETURN n LIMIT 1 is valid Cypher on empty/no-node graphs — returns empty result set, not an error. No schema or extension requirement. No DDL side effect. ✓
Branch hygiene verified: git diff --name-status HEAD~1...HEAD confirms exactly 8 files. No i18n pollution from PR feat(i18n): make web and CLI language-aware #1748. Unicode scan clean.

11. Open questions (unavoidable)

Live CI state: The 2 cancelled PR Description Check annotations must be confirmed as stale superseded runs before merge. Cannot verify from local checkout.
LadybugDB version and string contract: The @ladybugdb/core version in gitnexus/package.json was not read. Verify the missing-shadow and shadow-replay error strings are stable across the pinned version range before merge.
Pool adapter cross-process init lock: Was the absence of acquireInitLock in pool-adapter.ts intentional (e.g., "pool is MCP-only and single-process")? If the pool is always single-process, Finding 1's TOCTOU risk is substantially reduced to a theoretical concern — but the ENOENT handling in the recovery callers remains incorrect regardless.

12. Final verdict

not production-ready

Justification: Finding 1 is a confirmed defect: quarantineWalForMissingShadow propagates ENOENT from fs.rename as a fatal error in all pool-adapter callers (openReadOnlyDatabase, replayShadowPagesWithWritableOpen), while finalizeLbugSidecarsAfterClose correctly treats ENOENT as non-fatal. The fork PR's Codex review explicitly flagged this, and it remains unaddressed. Two concurrent pool-adapter processes opening the same DB path can race on the WAL quarantine, causing serve initialization to abort with a spurious "rebuild the index" error even though the WAL was already handled. Finding 2 (large WAL quarantined during writable schema creation despite preflight deliberately warning-only for large WALs) is a secondary concern — the WAL is preserved via rename, but the behavior is inconsistent with the stated design goal. Both findings require small, targeted fixes. The Express 5 route fix and the overall sidecar recovery architecture are sound and well-tested, and both fixes should be unblocked once ENOENT handling is added to the quarantine rename call sites in the pool adapter paths.

magyargergo · 2026-05-21T07:56:55Z

@ChamHerry Could you please look into Claude's findings and address them? 🙏

magyargergo · 2026-05-21T09:11:09Z

I'm working on it now! This is an important fix and thank you for raising it! 🙏

…aware errors (PR abhigyanpatwari#1747 review) Addresses the production-readiness review of PR abhigyanpatwari#1747 (Findings 1, 2, 3 of 6). Findings 4, 5, 6 are deferred to follow-ups per the plan. 1. ENOENT-tolerance scoped to pool-adapter callers only - `quarantineWalForMissingShadow` stays strict in `sidecar-recovery.ts`. The direct adapter calls it inside `acquireInitLock` (cross-process file lock) — ENOENT there means the file vanished under lock and remains a real bug to surface. - New `tryQuarantineForMissingShadow` local helper in `pool-adapter.ts` returns a discriminated union { kind: 'quarantined', path } | { kind: 'peer-handled' }. Catches ENOENT, re-verifies via statIfExists, and converts to 'peer-handled' only when WAL really is gone. Defensive: if ENOENT but WAL still present, throws as classified error rather than silently returning success. 2. Symmetric WAL-size gate on both recovery paths - `refuseLargeWalQuarantine` applied in both `reopenReadOnlyAfterMissingShadow` and `reopenWritableAfterMissingShadow`. Closes the read-only data-loss vector (large orphan WAL silently discarded would never be replayed by a later writable open). 3. Permission-aware error classifier - New `renameFailureMessage` and `isPermissionRenameError` in `sidecar-recovery.ts`. EACCES / EPERM / EBUSY now surface a permission-specific message pointing at ACLs, AV exclusions, and file-locks. Other codes (ENOSPC, EROFS, EIO, ENOENT) fall through to `shadowSidecarRecoveryMessage`. - Used at both pool-adapter and direct-adapter caller catches around `quarantineWalForMissingShadow`. - `doInitLbug`'s pass-through classifier extended to include the new permission message. The lock-retry substring match tightened so "file-lock error" in the permission message is not mistaken for a LadybugDB lock-retry trigger. Tests - sidecar-recovery.test.ts: 7 new tests for `renameFailureMessage` and `isPermissionRenameError`. - pool-wal-recovery.test.ts: 6 new tests covering ENOENT race, EACCES/EPERM/EBUSY classification, ENOSPC fallthrough, and the defensive "WAL still present after ENOENT" branch. - lbug-adapter-wal-schema.test.ts: 5 new tests covering the symmetric size gate on both recovery paths, including the boundary at exactly TINY_ORPHAN_WAL_BYTES (4096) and the off-by-one at 4097. Deferred (tracked as follow-up work) - Brittle LadybugDB error-string matching (Finding 4). - PNA header end-to-end coverage gap (Finding 5). - warnedKeys module-global persistence (Finding 6). - Cross-process init lock for pool-adapter.

# Conflicts: # gitnexus/src/server/api.ts

…am (PR abhigyanpatwari#1747 review, Findings 4 & 6) Smallest viable response to the two remaining non-blocking findings from the production-readiness review of PR abhigyanpatwari#1747. An earlier-revision plan proposed regex widening + a near-miss detector + per-dbPath warn scoping; an adversarial doc-review found those defended against hypothetical strings LadybugDB does not produce, added observability theater with no recovery behavior change, and did not actually fix the long-running gitnexus serve case for hot dbPaths (where finalizeLbugSidecarsAfterClose rarely fires). Scope shrunk to dedup + counter-based — strictly behavior-changing and fully testable. Finding 4 — dedup + version-coupling markers - `isReadOnlyShadowReplayError` was inlined in both `lbug-adapter.ts:451` and `pool-adapter.ts:317`. Centralized as an export from `sidecar-recovery.ts`. The two local copies are removed; both adapters now import from the shared module. - Both LadybugDB-coupled predicates (`isMissingShadowSidecarError` and `isReadOnlyShadowReplayError`) gain a `// LADYBUGDB-CONTRACT:` marker comment citing `@ladybugdb/core ^0.16.1`. When bumping LadybugDB, `git grep "LADYBUGDB-CONTRACT"` enumerates every version-coupled spot. - Strict matcher unchanged — when LadybugDB actually changes the error format, the failure mode stays loud (raw native error propagates) and the markers make every affected predicate trivially greppable. Finding 6 — counter-based warn anti-spam - `warnedKeys: Set<string>` → `warnedKeyCounts: Map<string, number>`. `warnOnce` keeps its signature `(logger, key, message)` and keying convention unchanged — the swap is internal. - `WARN_MILESTONES = [1, 10, 100, 1000, 10000]`. Logarithmic spacing gives O(log N) warns for a condition that fires N times. Past the first occurrence the warn message is suffixed with "(Nth occurrence of this condition)" so persistence is visible in the log line itself. - Solves the long-running serve case: a hot dbPath hitting the same condition 100 times now fires 3 warns (occurrences 1, 10, 100) instead of 1 warn + 99 silent debug lines. Tests (10 new in sidecar-recovery.test.ts, all green) - Centralized isReadOnlyShadowReplayError: positive match, false-positive guard, structural assertion that the duplicate regex is gone from both adapter files, LADYBUGDB-CONTRACT marker count. - Counter-based warnOnce: milestone-at-10 with suffix, milestone-at-100, key isolation across dbPaths, reset zeroes the counter, first-occurrence message does NOT carry the suffix. Deferred (tracked separately) - Finding 5 — PNA header end-to-end coverage gap (CORS boundary is sound). - LadybugDB structured error codes (if/when the library exposes them). - Per-call milestone configurability — re-open if tuning is needed.

github-actions · 2026-05-21T10:49:27Z

✨ PR Autofix

Found fixable formatting / unused-import issues across 96 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":1747,"changed_lines":96,"head_sha":"3d70660b522ca7fb6cdec344d2cac5fa64a29294","run_id":"26221379665","apply_command":"/autofix"}

magyargergo · 2026-05-21T10:58:48Z

/autofix

github-actions · 2026-05-21T10:59:07Z

✅ Applied autofix and pushed a commit. (apply run)

Add shared CLI translation resources and web i18next/react-i18next wiring so GitNexus can present core command and UI text in English or Chinese while preserving existing defaults. Constraint: PR abhigyanpatwari#1748 was previously stacked on obsolete abhigyanpatwari#1747 work, so this branch is rebuilt from upstream/main and contains only i18n changes.\nRejected: Keeping the old stacked branch | it carried obsolete lbug sidecar commits and stayed conflicting after abhigyanpatwari#1747 merged.\nConfidence: medium\nScope-risk: broad\nDirective: Keep future i18n follow-ups separated from native-worker and lbug recovery fixes; update both locale trees together when adding translatable UI text.\nTested: GitNexus impact analysis for App, Header, useAppState, cleanCommand, doctorCommand; GitNexus detect_changes on staged linked worktree; CLI i18n vitest targets; web i18n vitest targets; gitnexus npx tsc --noEmit; gitnexus-web npx tsc -b --noEmit; gitnexus-web npm test; gitnexus-web npm run build; gitnexus npm run build.\nNot-tested: Browser visual/manual language switching; production Vercel deployment; full gitnexus npm test ended with Vitest unhandled simulated worker-pool startup crash after all 355 files and 9180 assertions passed.

Add shared CLI translation resources and web i18next/react-i18next wiring so GitNexus can present core command and UI text in English or Chinese while preserving existing defaults. Keep the heartbeat reconnect banner on a dedicated translation key so disconnect recovery remains distinguishable from initial server connection failures. Constraint: PR abhigyanpatwari#1748 was previously stacked on obsolete abhigyanpatwari#1747 work, so this branch is rebuilt from upstream/main and contains only i18n changes. Rejected: Keeping the old stacked branch | it carried obsolete lbug sidecar commits and stayed conflicting after abhigyanpatwari#1747 merged. Confidence: medium Scope-risk: broad Directive: Keep future i18n follow-ups separated from native-worker and lbug recovery fixes; update both locale trees together when adding translatable UI text. Tested: GitNexus impact analysis for App, Header, useAppState, cleanCommand, doctorCommand, and AppContent; GitNexus detect_changes on the linked worktree; CLI i18n vitest targets; web i18n vitest targets; gitnexus npx tsc --noEmit; gitnexus-web npx tsc -b --noEmit; gitnexus-web npm test; gitnexus-web npm run build; gitnexus npm run build; local Playwright heartbeat-reconnect e2e spec. Not-tested: Full browser visual/manual language switching; production Vercel deployment; full gitnexus npm test ended with Vitest unhandled simulated worker-pool startup crash after all 355 files and 9180 assertions passed.

ChamHerry changed the title ~~Stabilize LadybugDB sidecar recovery~~ fix(lbug): keep serve stable when sidecars are missing May 21, 2026

This was referenced May 21, 2026

feat(i18n): make web and CLI language-aware #1748

Merged

[codex] Stabilize LadybugDB sidecar recovery ChamHerry/GitNexus#1

Closed

Merge branch 'main' into codex/lbug-sidecar-recovery-20260521

8f8de49

magyargergo self-assigned this May 21, 2026

magyargergo added 3 commits May 21, 2026 11:13

Merge remote-tracking branch 'origin/main' into pr-1747-review

7d97baa

# Conflicts: # gitnexus/src/server/api.ts

chore(autofix): apply prettier + eslint fixes via /autofix command

b24436e

magyargergo added 2 commits May 21, 2026 12:06

ci: trigger CI rebuild

9e09470

Merge branch 'main' into codex/lbug-sidecar-recovery-20260521

3cb1804

magyargergo merged commit a9fef2c into abhigyanpatwari:main May 21, 2026
30 of 31 checks passed

ChamHerry deleted the codex/lbug-sidecar-recovery-20260521 branch May 21, 2026 15:47

magyargergo mentioned this pull request May 22, 2026

fix(lbug): skip init lock and filesystem mutations for read-only opens (#1783) #1784

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(lbug): keep serve stable when sidecars are missing#1747

fix(lbug): keep serve stable when sidecars are missing#1747
magyargergo merged 8 commits into
abhigyanpatwari:mainfrom
ChamHerry:codex/lbug-sidecar-recovery-20260521

ChamHerry commented May 21, 2026

Uh oh!

vercel Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 21, 2026 •

edited

Loading

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChamHerry commented May 21, 2026

Summary

Why

Validation

Follow-up

Uh oh!

vercel Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Report

Pipeline Status

Test Results

Code Coverage

Tests

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Production-Readiness Review: PR #1747 fix(lbug): keep serve stable when sidecars are missing

1. Review bar for this PR

2. Problem being solved

3. Current PR state

4. Merge status and mergeability

5. Repository history considered

6. Branch hygiene assessment

7. Understanding of the change

8. Findings

FINDING 1 — TOCTOU ENOENT in quarantineWalForMissingShadow pool-adapter callers (confirmed defect, fork PR Codex comment unfixed)

FINDING 2 — Large WAL silently quarantined via writable schema-creation path, bypassing preflight policy

FINDING 3 — No TOCTOU test for quarantine rename ENOENT in adapter-level callers

FINDING 4 — Error string matching in isMissingShadowSidecarError and isReadOnlyShadowReplayError is brittle

FINDING 5 — Express preflight test does not verify PNA header end-to-end

FINDING 6 — warnedKeys module-level global survives across serve restarts within a process

9. PR-specific assessment sections

LadybugDB/lbug sidecar expert lane

Direct adapter lane

Pool adapter lane

Express 5 serve startup lane

Test coverage lane

Cross-domain lane

10. Back-and-forth avoided by verifying

11. Open questions (unavoidable)

12. Final verdict

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

✨ PR Autofix

Uh oh!

magyargergo commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

Production-Readiness Review: PR #1747 `fix(lbug): keep serve stable when sidecars are missing`

FINDING 1 — TOCTOU ENOENT in `quarantineWalForMissingShadow` pool-adapter callers (confirmed defect, fork PR Codex comment unfixed)

FINDING 4 — Error string matching in `isMissingShadowSidecarError` and `isReadOnlyShadowReplayError` is brittle

FINDING 6 — `warnedKeys` module-level global survives across serve restarts within a process