Skip to content

fix(fts): Don't cache failed FTS index ensure; invalidate on pool teardown#1006

Merged
magyargergo merged 2 commits into
mainfrom
copilot/fix-bm25-index-creation-error
Apr 21, 2026
Merged

fix(fts): Don't cache failed FTS index ensure; invalidate on pool teardown#1006
magyargergo merged 2 commits into
mainfrom
copilot/fix-bm25-index-creation-error

Conversation

Copilot AI commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

ensureFTSIndexViaExecutor recorded the (repoId, table) ensure-key on every code path, including the catch-all branch. A single transient CREATE_FTS_INDEX failure (e.g. lock contention on first query after MCP server start) therefore disabled BM25 for that repo for the lifetime of the process — silently, since the query path turns FTS errors into [].

Changes

  • gitnexus/src/core/search/bm25-index.ts — Move ensuredPoolFTS.add(key) into the two success branches: CREATE_FTS_INDEX succeeded, or it failed with 'already exists'. All other errors leave the key unset and emit a console.warn with repoId / table / index / message so the failure is no longer invisible.
  • gitnexus/src/core/lbug/pool-adapter.ts — Add an addPoolCloseListener(listener) → disposer registry and fire registered listeners from closeOne after pool.delete(repoId). Listener exceptions are isolated so teardown always completes.
  • gitnexus/src/core/search/bm25-index.ts — Lazily register a pool-close listener (idempotent, on first MCP-pool-backed query) that calls a new exported invalidateEnsuredFTSForRepo(repoId) so LRU eviction / idle close / explicit closeLbug drop the matching ensure-state instead of letting it outlive the pool that produced it.
  • Testsgitnexus/test/unit/bm25-search.test.ts adds four regression cases: transient CREATE failure → second call retries (5 CREATEs again); 'already exists' is cached (0 CREATEs on retry); explicit invalidateEnsuredFTSForRepo clears entries; pool-close listener invalidates only the closed repo, leaving siblings intact.

Before / after

// before — key cached on EVERY path, including transient failures
} catch (e: any) {
  const msg = String(e?.message ?? '');
  if (!msg.includes('already exists')) {
    // swallow
  }
}
ensuredPoolFTS.add(key); // ← poisons the cache on failure

// after — cache only on confirmed success
try {
  await executor(`CALL CREATE_FTS_INDEX(...)`);
  ensuredPoolFTS.add(key);
} catch (e: any) {
  const msg = String(e?.message ?? '');
  if (msg.includes('already exists')) {
    ensuredPoolFTS.add(key);
  } else {
    console.warn(`[gitnexus] FTS index ensure failed for repo "${repoId}" ...`);
    // key left unset → next query retries
  }
}

Notes

  • CLI/pipeline FTS path (lbug-adapter.ts) is untouched; this only affects the MCP-pool path.
  • No retry/backoff added — first-retry-works is sufficient per the issue; repeated failures still fall through to [] and re-warn.
  • No schema, MCP contract, or persistence changes.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • extension.ladybugdb.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit/bm25-search.test.ts (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit/stdout-silence.test.ts test/unit/calltool-dispatch.test.ts test/unit/impact-batching-grouping.test.ts Release/obj.target/tree_sitter_dart_binding/bindings/node/bindin/home/REDACTED/.cache/node-gyp/24./home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/tsc xus/gitnexus/vendor/tree-sitter-proto/node_modules/.bin/as e --global (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit modu�� st line and append a colon to the remaining line-I l/openssl/include modules/@npmcli/run-script/lib/node-gyp-bin/sh lude --global (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@vercel

vercel Bot commented Apr 21, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 21, 2026 7:38am

Request Review

Copilot AI changed the title [WIP] Fix BM25 index creation failure handling Don't cache failed FTS index ensure; invalidate on pool teardown Apr 21, 2026
Copilot AI requested a review from magyargergo April 21, 2026 07:39
@github-actions

github-actions Bot commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 32s —— View job


PR Review — Don't cache failed FTS index ensure; invalidate on pool teardown

Tasks:

  • Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Read changed files (bm25-index.ts, pool-adapter.ts, bm25-search.test.ts)
  • Define production-ready bar
  • Review diff against bar
  • Post final review

Production-ready bar for this PR

Based on repo docs and the affected area (MCP pool path → BM25/FTS search):

  • Correctness: The ensure-cache must only be set on confirmed success; transient failures must leave the key unset and retry transparently.
  • Operational safety: Pool teardown (LRU eviction, idle close, explicit closeLbug) must invalidate per-repo FTS ensure-state so a fresh pool re-creates indexes rather than trusting stale state from a previous pool lifetime.
  • Listener isolation: A throwing close listener must not abort teardown for other listeners or break the overall pool lifecycle.
  • LadybugDB single-writer invariant: No writes introduced to the read-only pool path beyond what was already there; no new lock-contention surface.
  • Test coverage: Unit tests in gitnexus/test/unit/ covering (a) transient failure → retry, (b) "already exists" → cached, (c) explicit invalidation, (d) pool-close listener scoping.
  • Contract stability: No changes to MCP tool schemas, resource URIs, or exported public API shapes beyond the new addPoolCloseListener export.
  • PR title: Must follow <type>[(scope)]: <subject> per CONTRIBUTING.md.
  • No secrets, no unscoped renames, no drive-by refactors per GUARDRAILS.md.

Findings

1. Correctness & functional completeness

The core fix is verified correct. In the original code, ensuredPoolFTS.add(key) appeared unconditionally after the try/catch block, so any error path — including transient lock contention — would poison the cache. The diff moves the add into exactly two branches: CREATE succeeded (bm25-index.ts:101) and CREATE threw "already exists" (bm25-index.ts:111). All other error paths leave the key unset and emit a console.warn (bm25-index.ts:113–117). This precisely matches the stated intent.

The pool-close listener machinery is wired correctly:

  • closeOne fires listeners after pool.delete(repoId) (pool-adapter.ts:185–195), so isLbugReady(repoId) === false is already true when invalidateEnsuredFTSForRepo runs — no TOCTOU window.
  • registerPoolCloseListenerOnce (bm25-index.ts:78–84) is idempotent via a module-level flag, preventing duplicate listener registration across repeated MCP-pool queries.
  • invalidateEnsuredFTSForRepo iterates the Set by prefix (repoId:), correctly scoping invalidation to one repo without touching siblings.

No missing wiring, no dead branches, no MCP contract changes.


2. Code clarity & clean code

## NIT — Parameter shadow: addPoolCloseListener reused as parameter name

  • Evidence: bm25-index.ts:79
  • Why it matters: The function registerPoolCloseListenerOnce names its parameter addPoolCloseListener, which is the same identifier as the export from pool-adapter.ts. Inside the function body the parameter is used, not the module import — this is correct but creates a subtle naming collision that makes grep and impact-analysis harder.
  • Recommendation: Rename the parameter to register or registerListener.
  • Confidence: High

## NIT — Disposer silently discarded

  • Evidence: bm25-index.ts:83addPoolCloseListener((repoId) => ...) returns a () => void disposer that is never stored.
  • Why it matters: The listener can never be removed. Intentional for process lifetime but undocumented; a future contributor may copy this pattern and expect cleanup to work.
  • Recommendation: Add an inline comment: // Disposer intentionally discarded — listener must fire for the process lifetime.
  • Confidence: High

No drive-by refactors, no barrel regressions, no // removed comments, no unused re-exports.


3. Test coverage & change safety

Four new cases in gitnexus/test/unit/bm25-search.test.ts (lines 204–312):

Case Behavior tested Assertion quality
Transient failure → retry 5 CREATEs on call 1; 5 CREATEs again on call 2 Meaningful — directly guards the regression
"already exists" → cached 0 CREATEs on call 2 Meaningful
invalidateEnsuredFTSForRepo 0 CREATEs before, 5 CREATEs after Meaningful
Pool-close listener scoping REPO re-creates; OTHER does not Meaningful

## NON-BLOCKING — poolCloseListenerRegistered not reset between tests; test relies on call ordering

  • Evidence: bm25-index.ts:77 (module-level let poolCloseListenerRegistered = false); bm25-search.test.ts:288 (expect(poolCloseListeners.length).toBeGreaterThanOrEqual(1)).
  • Why it matters: The assertion >= 1 is only true because an earlier test within the same file already triggered registerPoolCloseListenerOnce. If the pool-close listener test ever runs in isolation (e.g. vitest run --filter "pool-close") with module re-evaluation, poolCloseListeners could be empty and the assertion would still pass (>= 1 would fail — actually would break the test, which is the right outcome). More importantly, if Vitest module isolation resets bm25-index.ts between test files but the mock module is shared, the listener count and poolCloseListenerRegistered could desync. Currently this works because the tests run in the same module evaluation.
  • Recommendation: The test could be made self-contained by calling searchFTSFromLbug once before the assertion, or by asserting >= 1 only after the first pooled call within that it block. Low urgency — the existing tests pass and this is a test-isolation concern, not a production concern.
  • Confidence: Medium

## NON-BLOCKING — OTHER repo state not cleaned up in beforeEach

  • Evidence: bm25-search.test.ts:311invalidateEnsuredFTSForRepo(OTHER) is called at the end of the pool-close test, not in beforeEach/afterEach.
  • Why it matters: If the test fails before line 311, OTHER state bleeds into subsequent test reruns within the same process. beforeEach only resets REPO (line 199).
  • Recommendation: Move cleanup for OTHER into an afterEach or reset all known repos at the start of each test.
  • Confidence: High

TypeScript compile (tsc --noEmit) and vitest run test/unit cannot be verified from the diff alone — CI results in the PR checks tab are the authoritative source.


4. Performance

  • invalidateEnsuredFTSForRepo iterates the full ensuredPoolFTS Set on every pool close (bm25-index.ts:67–69). With 5 FTS indexes × N repos, Set size is O(5N). At MAX_POOL_SIZE = 5 repos, max size is 25 entries. Iteration cost is negligible.
  • poolCloseListeners is a Set<PoolCloseListener> (pool-adapter.ts:49). The close path iterates it synchronously after pool.delete (pool-adapter.ts:189–195). With one registered listener in practice, this is O(1).
  • No new hot-path overhead on the query path. Ensure-check is a Set.has O(1) guard (bm25-index.ts:94).

No performance concerns.


5. Operational risk

## NON-BLOCKING — CREATE_FTS_INDEX dispatched through the read-only pool

  • Evidence: bm25-index.ts:97–99; pool-adapter.ts:308–315 (readOnly: true); pool-adapter.ts:552–554 (isWriteQuery check).
  • Why it matters: The DB is opened read-only. CALL CREATE_FTS_INDEX(...) passes isWriteQuery (the regex \bCREATE\b does not match CREATE_FTS_INDEX because _ is a word character — no word boundary after E). Whether the FTS call actually succeeds against a read-only DB depends on LadybugDB internals. In practice the happy path is "already exists" (analyze created the index during ingest), so this never surfaces. A fresh DB with no prior analyze would fail with something other than "already exists"; with this PR that failure is correctly not cached, and the query falls back to []. This is pre-existing behavior — not introduced by this PR — but the comment at pool-adapter.ts:638–641 should arguably acknowledge CALL CREATE_FTS_INDEX alongside CALL QUERY_FTS_INDEX to prevent future confusion.
  • Recommendation: Add CREATE_FTS_INDEX to the inline comment on CYPHER_WRITE_RE explaining why it's not blocked. Fix this →
  • Confidence: High (pre-existing gap; not introduced here)

Rollback safety: Reverting this PR restores the poisoned-cache behavior but does not require re-analyze. The ensuredPoolFTS Set is process-scoped and ephemeral; no on-disk state changes. Safe to revert.

MCP contracts: No changes to tool schemas, resource URIs, or response shapes. addPoolCloseListener is a new export on an internal module; no MCP surface change.

LadybugDB single-writer invariant: Preserved. No new write paths introduced.

Embeddings, staleness, Windows paths: Untouched by this PR.


6. Maintainability

The close-listener pattern is clean and discoverable: addPoolCloseListener is exported from pool-adapter.ts with a JSDoc and the listener registration in bm25-index.ts has an inline comment explaining why. The pipeline phase DAG is not touched. No cross-phase coupling introduced.


Non-negotiables verification

Check Status
No secrets / tokens / machine paths ✅ Verified
No blind find-and-replace renames ✅ No renames
No HIGH/CRITICAL impact left unaddressed ✅ N/A — no shared symbol renames
PolyForm Noncommercial license untouched ✅ Verified
PR title follows conventional commit format ❌ Title is Don't cache failed FTS index ensure; invalidate on pool teardown — missing <type> prefix. CONTRIBUTING.md states this MUST follow the format (e.g., fix(search): don't cache failed FTS index ensure; invalidate on pool teardown). pr-labeler.yml enforces this in CI.

What I could not fully verify

  • Whether tsc --noEmit and vitest run test/unit pass — CI is the authoritative source (the PR's firewall note shows tests were attempted; results need confirmation in the Checks tab).
  • Whether LadybugDB's CREATE_FTS_INDEX procedure actually errors on a read-only connection vs. succeeds silently — this affects which catch branch fires in practice, but both branches are now handled correctly regardless.
  • The pre-existing CLI path (lbug-adapter.ts ensureFTSIndex) behavior is unchanged by this PR and was not reviewed.

Final verdict

Ready to merge — no blocking issues.

The core bug fix is correct and directly addresses the root cause. The pool-close listener wiring is sound. The four regression tests are meaningful and guard the critical behaviors. Non-blocking items (test isolation hygiene, disposer comment, pre-existing read-only/CREATE gap) do not affect production correctness. The PR title format violation is flagged for cleanup before or after merge.

@magyargergo magyargergo changed the title Don't cache failed FTS index ensure; invalidate on pool teardown fix(fts): Don't cache failed FTS index ensure; invalidate on pool teardown Apr 21, 2026
@magyargergo magyargergo marked this pull request as ready for review April 21, 2026 07:47
@magyargergo magyargergo merged commit f14068e into main Apr 21, 2026
28 of 29 checks passed
@magyargergo magyargergo deleted the copilot/fix-bm25-index-creation-error branch April 21, 2026 07:50
caork added a commit to caork/GitNexus that referenced this pull request Apr 22, 2026
Key upstream changes merged:
- feat(cli): gitnexus remove <target> command (abhigyanpatwari#1003)
- feat(python): scope-based call resolution pipeline RFC abhigyanpatwari#909 Ring 3 (abhigyanpatwari#980)
- feat(cli): repo fingerprinting via remote URL for sibling-clone detection (abhigyanpatwari#982)
- fix(group): friendly error on missing group name (abhigyanpatwari#989)
- fix(group): bubble local-impact errors in groupImpact (abhigyanpatwari#1007)
- fix(fts): don't cache failed FTS index (abhigyanpatwari#1006)
- fix(ci): docker alpine → debian (abhigyanpatwari#1014)
- deps: graphology 0.26.0, uuid 14, @types/node 25, @types/uuid 11

Conflict resolutions:
- AGENTS.md: kept our NEVER-format rules, added upstream language-specific
  hook rule, Tools Quick Reference table, Impact Risk Levels, gRPC guide link
- package.json: kept @types/multer, took upstream node/uuid type bumps
- package-lock.json: regenerated via npm install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…rdown (abhigyanpatwari#1006)

* Initial plan

* Don't cache failed FTS index ensure; invalidate on pool teardown

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/425d41bd-2cc1-49f6-8cc5-57368f0f238e

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FTS bootstrap caches failed index creation, silently disables BM25 for the rest of the process

2 participants