fix(augment): add CONTAINS fallback when FTS indexes unavailable #1476
When the MCP server holds the KuzuDB write lock, the augment CLI opens the DB read-only. FTS indexes cannot be created in read-only mode, so searchFTSFromLbug returns ftsAvailable=false and an empty results array. The existing early-return path silently produced no enrichment. Add a Cypher name CONTAINS fallback that fires only when ftsAvailable is false and BM25 produced no symbol matches. This covers the read-only DB case (concurrent MCP server) and the first-run case (indexes not yet built). The fallback is wrapped in .catch(() => []) and cannot throw. When FTS indexes exist, this branch is never reached — behaviour is unchanged for users without a concurrent MCP server.
|
@Antheurus is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report: ✅ All checks passed
Test Results: ✅ All 8690 tests passed, 1 test(s) skipped
Generated by CI |
|
Claude finished @magyargergo's task in 4m 34s
PR #1476 Adversarial Review - fix(augment): add CONTAINS fallback when FTS indexes unavailable
Tasks:
Review bar for this PR
Current PR state
Branch hygiene assessment
Branch hygiene verdict: clean feature/fix PR. One commit, one file changed, no unrelated churn, no lockfile/workflow/package/Docker/web-UI pollution.
Understanding of the change
KuzuDB (LadybugDB) is opened in read-only mode by the pool adapter (confirmed in The old code would then map an empty This PR adds a
Findings
[BLOCKER]
|
| Scenario | Covered? |
|---|---|
| FTS available, matches → enrichment | ✅ integration test line 65 |
| FTS available, no matches → empty | ✅ integration test line 73 |
| FTS available, whitespace pattern → empty | ✅ integration test line 90 |
| FTS unavailable, graph has matches → fallback produces enrichment | ❌ no test |
| FTS unavailable, whitespace pattern → CONTAINS '' guard | ❌ no test |
| FTS unavailable, no matching nodes → empty | ❌ no test |
| Fallback query throws → graceful empty | ❌ no test |
| Quote in pattern does not inject | ❌ no test |
| FTS available, no matches → fallback NOT reached | ❌ no test |
The existing whitespace test passes only because FTS is available; it would still pass even if the CONTAINS '' bug were introduced. No test would fail before this PR on the new fallback path.
CI / release-readiness assessment
CI is green on the latest commit dfc7bc8. Typecheck passed. 8507/8508 tests pass. Security checks (CodeQL, Gitleaks, Dependency Review) are all green. No stale checks. The passing CI does not validate the new fallback code path — it validates that the existing augmentation tests (with FTS available) still pass, which they would regardless of the fallback's correctness.
Hidden Unicode / hygiene assessment
engine.ts contains two non-ASCII characters:
- Line 14: → (U+2192, RIGHTWARDS ARROW), in a JSDoc comment, visible typographic character
- Line 317: — (U+2014, EM DASH), in an inline comment, visible typographic character
Neither is a bidirectional override control (U+202A–202E, U+2066–2069). Neither appears in executable code, query strings, regex patterns, or escaping logic. The GitHub warning is a false positive. No action required.
Elegance / maintainability assessment
The change is minimal and local — exactly right for a degradation fallback. The comment on lines 144–145 accurately describes the condition. The .catch(() => []) idiom is already used elsewhere in the engine for individual symbol queries. The dual sym.id || sym[0] field access pattern is consistent with the existing code. The approach does not over-engineer — it is intentionally simple.
The main readability concern is the absent trim() before splitting (which would also make the comment on line 144 more clearly correct), and the missing minimum-length guard. Both are one-line fixes.
Final verdict
❌ not production-ready
Two blockers must be resolved before merge:
1. CONTAINS '' (Blocker 1): A whitespace-only or whitespace-prefixed pattern of length ≥ 3 causes firstWord = "" and executes WHERE n.name CONTAINS '', which matches every node in the graph and injects arbitrary context into LLM hooks. This is a correctness and security failure on the primary use case (hook context poisoning). The fix is a one-line pattern.trim() before the split, plus a !firstWord || firstWord.length < 2 guard.
2. Missing tests (Blocker 2): The entire new code path — the only thing this PR changes in behaviour — has zero test coverage. The existing whitespace-only test passes because FTS is available in that run; it does not exercise the fallback. Per DoD §2.7 and §6, this is an explicit "Not Done" signal. The fallback can silently break, the CONTAINS '' scenario can silently regress, and no test would detect it.
The underlying approach (FTS-availability semantics, control-flow guard, read-only DB compatibility, error handling) is architecturally sound. Once the empty-firstWord guard is added and the FTS-unavailable path is tested, this PR should be close to mergeable, with the label-filter and parameterized-query items addressable as follow-up.
|
@Antheurus please look into Claude's findings 🙏 |
Blocker 1 — CONTAINS '' on whitespace-leading patterns:
pattern.split(/\s+/)[0] returns "" when the input has leading whitespace
(e.g. " ".split(/\s+/) → ["", ""]). In Kuzu, CONTAINS '' matches every
node with a name property, injecting arbitrary graph nodes into LLM context.
Fix: trim() before split, then guard on !firstWord || firstWord.length < 2.
No behaviour change for normal non-empty patterns.
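The split behaviour behind Blocker 1 can be reproduced in isolation. The following is a minimal sketch, not the code in engine.ts; the helper names firstWordNaive and firstWordGuarded are hypothetical and exist only for illustration.

```typescript
// Naive extraction, as described in Blocker 1: no trim before the split.
// Leading whitespace makes the first array element an empty string.
function firstWordNaive(pattern: string): string {
  return pattern.split(/\s+/)[0] ?? "";
}

// Guarded extraction: trim first, then reject empty or one-character words.
// Returns null when the pattern is unusable, so a caller can return '' early
// instead of issuing a CONTAINS '' query.
function firstWordGuarded(pattern: string): string | null {
  const firstWord = pattern.trim().split(/\s+/)[0];
  if (!firstWord || firstWord.length < 2) return null;
  return firstWord;
}

const threeSpaces = " ".repeat(3);
const fourSpaces = " ".repeat(4);
const padded = " ".repeat(2) + "login";

console.log(JSON.stringify(threeSpaces.split(/\s+/))); // ["",""]
console.log(JSON.stringify(firstWordNaive(padded)));   // ""
console.log(firstWordGuarded(padded));                 // login
console.log(firstWordGuarded(fourSpaces));             // null
```

With the naive version, a whitespace-prefixed pattern yields an empty first word, which is what turns the query into CONTAINS ''. The guarded version never hands an empty or one-character string to the query builder.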
Blocker 2 — zero test coverage on the FTS-unavailable code path:
The new CONTAINS fallback block (engine.ts lines 146-166) was exercised by
no existing test — all existing tests run with FTS indexes built. A second
withTestLbugDB fixture is added with no ftsIndexes, forcing searchFTSFromLbug
to return ftsAvailable: false, and asserts:
1. augment('login', ...) returns non-empty enrichment (fallback works)
2. augment(' ', ...) returns '' (CONTAINS '' guard holds)
3. augment('nxyz_notfound', ...) returns '' (no matching nodes)
4. executeQuery throwing returns '' (.catch(() => []) path)
|
Thanks for the review @magyargergo, both blockers addressed in the new commit (
Blocker 1: Added
const firstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0];
if (!firstWord || firstWord.length < 2) return '';
Blocker 2: zero test coverage on the fallback path
Added a second
The medium findings (label filter,
Also, I'm fairly new to contributing to open source. I really appreciate the thorough review and the detailed findings, and learned a lot from the way it was structured. Thanks for the guidance! |
The same split(/\s+/)[0] bug existed at line 125 (BM25 symbol filter, FTS-available path) — a leading-whitespace pattern produced CONTAINS '' there too, matching every node in BM25-matched files. Fix: hoist patternFirstWord computation with trim() and the length guard to the top of augment(), before any DB interaction. Both CONTAINS sites (BM25 symbol filter and CONTAINS fallback) now use the single pre-validated value. No behaviour change for normal patterns; the guard fires once for all callers instead of being duplicated. Also tighten the whitespace test in the no-FTS suite from 3 spaces to 4 spaces so it unambiguously exercises the patternFirstWord guard rather than straddling the outer pattern.length < 3 boundary.
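To make the interaction between the outer pattern.length < 3 check and the hoisted patternFirstWord guard concrete, here is a small sketch. It assumes the outer length check runs before the hoisted guard; that ordering, the validatePattern name, and the Validation type are assumptions for illustration and are not taken from engine.ts.

```typescript
type Validation =
  | { ok: true; firstWord: string }
  | { ok: false; rejectedBy: "outer-length" | "first-word" };

// Hypothetical stand-in for the checks at the top of augment().
function validatePattern(pattern: string): Validation {
  // Outer guard: raw patterns shorter than 3 characters are rejected.
  if (pattern.length < 3) return { ok: false, rejectedBy: "outer-length" };

  // Hoisted guard: trim, double single quotes, take the first word.
  const firstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0];
  if (!firstWord || firstWord.length < 2) {
    return { ok: false, rejectedBy: "first-word" };
  }
  return { ok: true, firstWord };
}

// Whitespace-only inputs of increasing length show which guard rejects them.
for (const n of [2, 3, 4]) {
  console.log(n, JSON.stringify(validatePattern(" ".repeat(n))));
}
console.log(JSON.stringify(validatePattern("  login handler")));
```

Under these assumptions, two spaces are stopped by the outer length check, while three and four spaces pass it and are stopped by the first-word guard. Three spaces sit exactly on the length boundary, which is why a four-space input is the less ambiguous test case.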
|
Follow-up fix (commit Consolidated both CONTAINS sites into a single |
|
Claude finished @magyargergo's task in 6m 34s
PR #1476 Adversarial Review - fix(augment): add CONTAINS fallback when FTS indexes unavailable
Tasks:
Review bar for this PR
Applied DoD.md v2.0.0 (all sections), with the PR-specific bar from the trigger:
Current PR state
Branch hygiene assessment
Verdict: merge-from-main commits present but harmless and merge-safe. Evidence: The three PR-author commits (
Understanding of the change
KuzuDB (LadybugDB) is opened read-only by the pool adapter when the MCP server holds the write lock. In read-only mode, FTS indexes cannot be created. This PR removes that early return, restructures to always run the BM25→symbol mapping loop, then adds a fallback A critical guard is added at the top of
Findings
[RESOLVED]
|
| Scenario | Covered? | Verdict |
|---|---|---|
| FTS unavailable + matching node → enrichment | ✅ augment('login', ...) in no-FTS suite | Assertion: result.length > 0 + toContain('[GitNexus]'); adequate given seed data specificity |
| FTS unavailable + whitespace-only pattern → '' | ✅ augment('    ', ...): 4 spaces passes outer length < 3 check, caught by patternFirstWord guard | ✓ |
| FTS unavailable + no matching nodes → '' | ✅ augment('nxyz_notfound', ...) | ✓ |
| FTS unavailable + fallback throws → '' | ✅ vi.spyOn(poolAdapter, 'executeQuery').mockRejectedValue(...) | The spy also fails FTS queries inside searchFTSFromLbug, preserving ftsAvailable: false. .catch(() => []) produces []. ✓ |
| FTS available + no BM25 results → fallback NOT reached | ✅ (implicit) augment('nonexistent_xyz', ...) in FTS suite: ftsAvailable: true → !ftsAvailable = false → fallback skipped | Implicit only; no explicit comment binding it to the non-fallback behavior |
| Normal FTS-available path unchanged | ✅ all original integration tests pass | ✓ |
| patternFirstWord guard in BM25 path (FTS-available + whitespace) | ✅ existing augment(' ', ...) test in FTS suite, now blocked at patternFirstWord guard instead of outer guard | ✓ |
| Apostrophe in pattern, no injection | ❌ no explicit test | Low risk given correct escaping, but an augment("O'Reilly", ...) test would pin the escaping contract |
CI/checks assessment:
The last CI bot comment at 2026-05-10T11:57Z reported ✅ all checks for the run at that time. The final merge commit 7f6e554 (current head) adds main commits 6a89472 (server repo-name sanitization) and e412d29 (C scope resolution), both of which have their own passing CI histories on main. The augmentation tests in gitnexus/ are run by the standard npm test CI step (not the e2e/web job). I cannot directly confirm the current head's check status from the available data. The risk is low given the merge only brought in passing main commits and the PR-author commits already had a full green CI run.
Hidden Unicode / hygiene:
The engine.ts file contains → (U+2192) at line 14 in a JSDoc comment and — (U+2014) at line 318 in an inline comment. Neither is a bidi control character (bidi range: U+202A–U+202E, U+2066–U+2069). Neither appears in executable code, query strings, regex patterns, or escaping logic. No action required.
Final verdict
✅ production-ready with minor follow-ups
Both prior blockers are definitively resolved: the CONTAINS '' attack vector is blocked by the pattern.trim() + !patternFirstWord || patternFirstWord.length < 2 guard at the top of augment() before any DB interaction, and the FTS-unavailable code path is genuinely exercised by the second withTestLbugDB fixture with no ftsIndexes. The control flow is correct — the fallback fires only when ftsAvailable === false && symbolMatches.length === 0, preserving the normal FTS path entirely. Query escaping is correct for Kuzu Cypher. Branch hygiene is acceptable — merge-from-main commits are harmless and the PR file diff is cleanly scoped to the two augmentation files.
The remaining items are Medium/Low and should be tracked as follow-ups: add a node-label filter (IN ['Function', 'Method', 'Class', ...]) to the fallback query to avoid File node saturation; consider executeParameterized if Kuzu supports CONTAINS parameterization; add ORDER BY size(n.name), n.name for determinism; add a debug-gated log in .catch(). An apostrophe-in-pattern test would pin the escaping contract. None of these block merge.
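The follow-up suggestions (label filter, deterministic ordering, quote escaping) could be combined in a query builder along these lines. This is only a sketch under stated assumptions: buildFallbackQuery, escapeCypherString, the SYMBOL_LABELS list, the returned columns, and the use of label(n) are all illustrative, the review elides the full label list, and the generated Cypher has not been verified against Kuzu here.

```typescript
// Illustrative label set; the review's list is elided ("..."), so this is
// an assumption, not the project's actual set of node labels.
const SYMBOL_LABELS = ["Function", "Method", "Class"];

// Doubles single quotes for use inside a single-quoted string literal,
// mirroring the replace(/'/g, "''") approach quoted in the PR discussion.
function escapeCypherString(value: string): string {
  return value.replace(/'/g, "''");
}

// Builds a hypothetical fallback query with a label filter and a stable order.
// firstWord is expected to be already trimmed and length-checked by the caller.
function buildFallbackQuery(firstWord: string, limit = 10): string {
  const labels = SYMBOL_LABELS.map((l) => `'${l}'`).join(", ");
  return [
    "MATCH (n)",
    `WHERE label(n) IN [${labels}] AND n.name CONTAINS '${escapeCypherString(firstWord)}'`,
    "RETURN n.id AS id, n.name AS name",
    "ORDER BY size(n.name), n.name",
    `LIMIT ${Math.max(1, Math.floor(limit))}`,
  ].join("\n");
}

console.log(buildFallbackQuery("O'Reilly", 5));
```

Ordering by name length and then by name keeps results stable across runs and favours the closest matches. If the database driver supports parameterized CONTAINS, passing the value as a parameter would be preferable to string escaping.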
|
@Antheurus Could you please look into claude's findings? 🙏 |
Asserts the CONTAINS fallback does NOT fire when FTS is available but BM25 returns zero results. Pins the safety property promised by the PR description: behavior is unchanged for users without the read-only-DB condition. If anyone later loosens the gate to `symbolMatches.length === 0` alone, this test fails.
Bring in upstream fixes including:
- fix(search): create FTS indexes during analyze (abhigyanpatwari#1107): ROOT CAUSE of query() returning 0 results (FTS indexes were never created because lazy creation failed on read-only MCP pool connection)
- fix(search): load FTS during core DB init (abhigyanpatwari#1123)
- fix(search): surface warning when FTS indexes missing (abhigyanpatwari#1418)
- fix(augment): add CONTAINS fallback when FTS unavailable (abhigyanpatwari#1476)
- fix(search): guard against undefined bm25Results (abhigyanpatwari#1489)
- feat(cpp): C++ ADL V2 overload resolution improvements
- feat(detect-changes): support git worktrees (abhigyanpatwari#1654)
- feat(cpp): parameter type class sidecar, SFINAE filter
- Various CI, security, and infrastructure improvements

AscendC provider updated to match upstream naming: sourcePreprocessor → preprocessSource

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows the LadybugDB FTS extension is not loaded (see pool-adapter win32 SIGSEGV guard) so `searchFTSFromLbug` reports `ftsAvailable: false` and the query tool returned an empty payload with an `analyze --force` warning that could not actually fix the underlying capability gap.

Mirror the `augment` CONTAINS fallback from abhigyanpatwari#1476: when FTS is reported unavailable, run a per-label CONTAINS scan against Method / Function / Class / Interface / File so `query` still returns ranked matches.

- New `containsKeywordFallback` private method on LocalBackend
- Wired into all three FTS-unavailable code paths in `bm25Search`
- Plumbs a `containsFallback` flag back to `query` so the warning text reflects the actual condition (extension missing vs index missing)

Less accurate than true BM25 (token-presence count, not TF-IDF / BM25 scoring) but functional: it restores end-to-end usability of the query tool on Windows. Tested locally against a 23k-symbol C# repo: queries that previously returned `{ processes: [], definitions: [] }` now return correct process-grouped results with the matching gotcha symbols at the top.
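A token-presence ranking of the kind this commit message describes might look like the sketch below. It is not the `containsKeywordFallback` implementation: the function names, the Candidate shape, the case-insensitive substring matching, and the tie-break by name are all assumptions made for illustration.

```typescript
interface Candidate {
  name: string;
  label: string;
}

// Counts how many distinct query tokens occur in the name as substrings
// (case-insensitive). There is no TF-IDF or BM25 weighting.
function tokenPresenceScore(query: string, name: string): number {
  const tokens = query
    .toLowerCase()
    .split(/\s+/)
    .filter((t, i, all) => t.length > 0 && all.indexOf(t) === i);
  const haystack = name.toLowerCase();
  return tokens.filter((t) => haystack.includes(t)).length;
}

// Keeps candidates that match at least one token, highest score first,
// with ties broken alphabetically so the output order is deterministic.
function rankByTokenPresence(query: string, candidates: Candidate[]): Candidate[] {
  return candidates
    .map((c) => ({ c, score: tokenPresenceScore(query, c.name) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score || a.c.name.localeCompare(b.c.name))
    .map((x) => x.c);
}

const ranked = rankByTokenPresence("login handler", [
  { name: "logout", label: "Function" },
  { name: "handleLogin", label: "Method" },
  { name: "LoginHandler", label: "Class" },
]);
console.log(ranked.map((c) => c.name).join(", ")); // LoginHandler, handleLogin
```

Because every matching token counts equally, a rare token and a very common one contribute the same score. That is the accuracy trade-off relative to BM25 that the commit message acknowledges.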
Problem
When the GitNexus MCP server is running alongside the `gitnexus augment` CLI (e.g. in Claude Code hooks), the MCP server holds the KuzuDB write lock. The augment CLI opens the DB via `pool-adapter` in read-only mode. FTS indexes cannot be created in read-only mode, so `searchFTSFromLbug` returns `ftsAvailable: false` with an empty results array.

The existing code path hits `if (bm25Results.length === 0) return ''` immediately after, producing no enrichment at all: silently, with no indication that the graph was reachable but FTS was unavailable.

This also affects first-run scenarios where `gitnexus analyze` has completed but FTS indexes have not been built yet for a session.

Fix
Add a Cypher `n.name CONTAINS` fallback in `augment()` that fires only when:
- `ftsAvailable === false` (indexes genuinely unavailable, not just no matches), AND
- `symbolMatches` is empty after the BM25 mapping loop

The fallback does a direct graph name search, which is slower than FTS but always available when the graph is indexed. It is wrapped in `.catch(() => [])` and cannot throw.

Safety
- `.catch(() => [])` means any graph error still results in `return ''`

Changed file
`gitnexus/src/core/augmentation/engine.ts`: 25 insertions, 3 deletions
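The gating described in this PR can be sketched with injected dependencies so it runs without a database. This is an illustration of the control flow only, not the contents of engine.ts: augmentSketch, the Deps interface, the row shape, the query text, and the output format are hypothetical.

```typescript
interface FtsResult {
  ftsAvailable: boolean;
  results: string[];
}
type Row = { id: string; name: string };

// Hypothetical dependencies standing in for the FTS search and the pool adapter.
interface Deps {
  searchFTS(pattern: string): Promise<FtsResult>;
  executeQuery(cypher: string): Promise<Row[]>;
}

async function augmentSketch(pattern: string, deps: Deps): Promise<string> {
  // Validate once, before any DB interaction.
  const firstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0];
  if (!firstWord || firstWord.length < 2) return "";

  const { ftsAvailable, results } = await deps.searchFTS(pattern);
  // Stand-in for the BM25 -> symbol mapping loop.
  let symbolMatches: Row[] = results.map((name) => ({ id: name, name }));

  // Fallback fires only when FTS is unavailable AND nothing was mapped.
  if (!ftsAvailable && symbolMatches.length === 0) {
    symbolMatches = await deps
      .executeQuery(`MATCH (n) WHERE n.name CONTAINS '${firstWord}' RETURN n.id, n.name LIMIT 5`)
      .catch(() => []); // a rejected query degrades to "no matches"
  }

  if (symbolMatches.length === 0) return "";
  return `[GitNexus] ${symbolMatches.map((s) => s.name).join(", ")}`;
}

// Simulates the read-only DB case: FTS unavailable, graph still queryable.
const readOnlyDb: Deps = {
  searchFTS: async () => ({ ftsAvailable: false, results: [] }),
  executeQuery: async () => [{ id: "fn:login", name: "login" }],
};

augmentSketch("login", readOnlyDb).then((out) => console.log(out)); // [GitNexus] login
```

Gating on both conditions is what keeps the FTS-available path unchanged: when FTS reports zero matches but is available, the fallback query is never issued.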