Skip to content

fix(augment): add CONTAINS fallback when FTS indexes unavailable#1476

Merged
magyargergo merged 9 commits into
abhigyanpatwari:mainfrom
Antheurus:fix/augment-readonly-fts-fallback
May 11, 2026
Merged

fix(augment): add CONTAINS fallback when FTS indexes unavailable#1476
magyargergo merged 9 commits into
abhigyanpatwari:mainfrom
Antheurus:fix/augment-readonly-fts-fallback

Conversation

@Antheurus

Copy link
Copy Markdown
Contributor

Problem

When the GitNexus MCP server is running alongside the gitnexus augment CLI (e.g. in Claude Code hooks), the MCP server holds the KuzuDB write lock. The augment CLI opens the DB via pool-adapter in read-only mode. FTS indexes cannot be created in read-only mode, so searchFTSFromLbug returns ftsAvailable: false with an empty results array.

The existing code path hits if (bm25Results.length === 0) return '' immediately after, producing no enrichment at all — silently, with no indication that the graph was reachable but FTS was unavailable.

This also affects first-run scenarios where gitnexus analyze has completed but FTS indexes have not been built yet for a session.

Fix

Add a Cypher n.name CONTAINS fallback in augment() that fires only when:

  1. ftsAvailable === false (indexes genuinely unavailable, not just no matches), AND
  2. symbolMatches is empty after the BM25 mapping loop

The fallback does a direct graph name search — slower than FTS but always available when the graph is indexed. It is wrapped in .catch(() => []) and cannot throw.

if (symbolMatches.length === 0 && !ftsAvailable) {
  const firstWord = pattern.replace(/'/g, "''").split(/\s+/)[0];
  const fallbackRows = await executeQuery(repoId, `
    MATCH (n)
    WHERE n.name CONTAINS '${firstWord}'
    RETURN n.id AS id, n.name AS name, labels(n)[0] AS type, n.filePath AS filePath
    LIMIT 5
  `).catch(() => []);
  // push to symbolMatches...
}

Safety

  • No behaviour change when FTS indexes exist — the fallback branch is never reached
  • No new dependencies
  • Graceful: .catch(() => []) means any graph error still results in return ''
  • Does not depend on any specific tooling setup (Claude Code, context-mode, etc.)

Changed file

  • gitnexus/src/core/augmentation/engine.ts — 25 insertions, 3 deletions

When the MCP server holds the KuzuDB write lock, the augment CLI opens
the DB read-only. FTS indexes cannot be created in read-only mode, so
searchFTSFromLbug returns ftsAvailable=false and an empty results array.
The existing early-return path silently produced no enrichment.

Add a Cypher name CONTAINS fallback that fires only when ftsAvailable is
false and BM25 produced no symbol matches. This covers the read-only DB
case (concurrent MCP server) and the first-run case (indexes not yet
built). The fallback is wrapped in .catch(() => []) and cannot throw.

When FTS indexes exist, this branch is never reached — behaviour is
unchanged for users without a concurrent MCP server.
@vercel

vercel Bot commented May 10, 2026

Copy link
Copy Markdown

@Antheurus is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
8691 8690 0 1 386s

✅ All 8690 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 78% 25814/33093 N/A% 🟢 ███████████████░░░░░
Branches 66.54% 16324/24532 N/A% 🟢 █████████████░░░░░░░
Functions 83.19% 2599/3124 N/A% 🟢 ████████████████░░░░
Lines 81.06% 23338/28790 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@github-actions

github-actions Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 34s —— View job


PR #1476 Adversarial Review — fix(augment): add CONTAINS fallback when FTS indexes unavailable

Tasks:

  • Post initial comment
  • Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, DoD.md)
  • Read engine.ts (changed file)
  • Read related augmentation/search/lbug files (bm25-index.ts, pool-adapter.ts)
  • Read related tests (augmentation.test.ts)
  • Check hidden Unicode in engine.ts
  • Assess branch hygiene (git log, diff)
  • Analyze control flow, query safety, retrieval quality, performance, read-only DB, error handling, tests
  • Post full review

Review bar for this PR

  • augment() must use BM25/FTS as primary fast path; fallback only when ftsAvailable === false and symbolMatches is empty.
  • Fallback cannot produce broad/noisy enrichment for empty or generic patterns; CONTAINS '' must not run.
  • Fallback query must be safe against Cypher injection; parameterized or correctly escaped.
  • Fallback cannot throw out of augment().
  • Tests must cover the FTS-unavailable code path — the changed behavior is not exercised by any existing test.
  • No behaviour change on the normal FTS-available path.
  • Hidden Unicode must not affect executable code, query strings, or escaping logic.

Current PR state

Item Status
Latest head SHA dfc7bc8
Open / draft Open, not draft
Commits 1
Changed files 1 (gitnexus/src/core/augmentation/engine.ts)
Net diff +25 / -3
CI checks ✅ All passed (8507/8508 tests; 1 pre-existing skip)
Typecheck tsc --noEmit passed
Augmentation integration tests ✅ run — but FTS-unavailable path not covered
CodeQL / GHAS / Gitleaks / Dep Review ✅ green per CI report
Hidden Unicode Lines 14, 317: (U+2192) and (U+2014) in comments only — visible typographic punctuation, not bidi overrides; GitHub warning is a false positive
Vercel preview Irrelevant to this review
Code matches PR body Yes

Branch hygiene assessment

Branch hygiene verdict: clean feature/fix PR

One commit, one file changed, no unrelated churn, no lockfile/workflow/package/Docker/web-UI pollution.


Understanding of the change

KuzuDB (LadybugDB) is opened in read-only mode by the pool adapter (confirmed in pool-adapter.ts:270–282). In read-only mode, FTS extension indexes cannot be created; loadFTSExtension is called with policy: 'load-only' and silently degrades if unavailable. Consequently searchFTSFromLbug() can return {results: [], ftsAvailable: false}.

The old code would then map an empty bm25Results array to an empty symbolMatches, and the subsequent if (symbolMatches.length === 0) return '' would produce no augmentation — even though the graph is fully reachable and indexed.

This PR adds a MATCH (n) WHERE n.name CONTAINS '${firstWord}' LIMIT 5 fallback that fires only when symbolMatches.length === 0 && !ftsAvailable. The ftsAvailable flag correctly distinguishes "FTS unavailable" from "FTS available but no matches" (bm25-index.ts:107: ftsAvailable = queriesSucceeded > 0). The fix correctly restores enrichment in the MCP-lock and first-run scenarios. The architecture rationale is sound.


Findings

[BLOCKER] CONTAINS '' runs on whitespace-leading or whitespace-only patterns

  • Category: Query safety / correctness

  • File: gitnexus/src/core/augmentation/engine.ts:147

  • Issue:

    const firstWord = pattern.replace(/'/g, "''").split(/\s+/)[0];

    JavaScript's /\s+/ split on a string with leading whitespace returns "" as the first element:

    • " ".split(/\s+/)["", ""]firstWord = ""
    • " loginFunc".split(/\s+/)["", "loginFunc"]firstWord = ""

    The outer guard if (!pattern || pattern.length < 3) return '' at line 87 only checks total length. A whitespace-only pattern " " (3 chars) passes that guard, reaches the fallback when FTS is unavailable, and executes:

    WHERE n.name CONTAINS ''

    In Kuzu Cypher, CONTAINS '' matches every node with a name property. This returns arbitrary context nodes (Community, Process, File, Function, …) and injects them as enrichment — exactly the noisy output this fallback must prevent.

    Repro:

    • Run gitnexus augment " " /some/repo while MCP holds the DB write lock (FTS unavailable).
    • Fallback fires, CONTAINS '' returns up to 5 arbitrary nodes, and bogus enrichment is emitted to the hook.
  • Why it matters: Hooks consume augment() output and embed it directly into LLM context. Injecting arbitrary graph nodes silently poisons agent context without any error signal.

  • Recommended fix:

    const firstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0];
    if (!firstWord || firstWord.length < 2) return '';

    The trim() before splitting ensures leading/trailing whitespace is stripped. The minimum-length guard prevents single-character tokens from causing broad scans. Fix this →

  • Blocks merge: yes


[BLOCKER] No tests cover the FTS-unavailable fallback code path

  • Category: Tests

  • File: gitnexus/test/integration/augmentation.test.ts

  • Issue: The existing integration tests (lines 65–108) always run with FTS indexes fully built (AUGMENT_FTS_INDEXES on line 46). The new fallback block at engine.ts lines 146–166 has zero test coverage:

    • No test for ftsAvailable: false + graph nodes exist → fallback produces enrichment
    • No test for ftsAvailable: false + whitespace pattern → CONTAINS '' scenario
    • No test that the fallback is NOT reached when ftsAvailable: true and FTS returns empty
    • No test for fallback query failure → graceful empty

    The existing test 'returns empty string for whitespace-only pattern' passes because FTS is available in that run; the fallback branch is never entered. This test would continue to pass even if the CONTAINS '' bug were introduced.

    DoD.md §2.7: "Tests cover the real changed path — they would fail if behavior, wiring, or contracts were broken." — Not met. DoD.md §6: "The runtime path is not actually exercised by the tests." — Not done signal applies.

  • Why it matters: Any future edit to the fallback (escaping, condition, field mapping) can silently break it with no test failure. The exact scenario the PR claims to fix (FTS unavailable → enrichment restored) is never asserted.

  • Recommended fix: Add a mocked unit test or a second withTestLbugDB fixture that skips FTS index creation, then asserts:

    1. augment('login', ...) produces non-empty output (fallback works)
    2. augment(' ', ...) produces empty output (CONTAINS '' guard works)
    3. augment('nxyz', ...) with no matching graph nodes produces empty output
    4. A query-throwing mock produces empty output (.catch path)

    Fix this →

  • Blocks merge: yes (per DoD §2.7 and §6)


[MEDIUM] executeParameterized exists but unused — string interpolation used instead

  • Category: Security / code quality
  • File: gitnexus/src/core/augmentation/engine.ts:148–156, gitnexus/src/core/lbug/pool-adapter.ts:613
  • Issue: pool-adapter.ts exports executeParameterized (lines 613–643), which uses conn.prepare(cypher) + conn.execute(stmt, params) for proper parameterized query execution. The fallback uses executeQuery with inline string interpolation instead. While single-quote doubling is Kuzu Cypher's correct string-literal escaping, parameterized queries are the safe API and their existence in this codebase makes the inline approach unnecessary.
  • Why it matters: DoD §2.5 requires no new injection surfaces on paths consuming untrusted input. Hook patterns are untrusted (from Claude/Cursor agent inputs). Centralizing escaping avoids future mistakes if the pattern-extraction logic is copied or reused. Note: executeParameterized is not imported in engine.ts and would require adding to the destructured import.
  • Recommended fix: Use executeParameterized if it supports the $param / {param} syntax for CONTAINS queries in Kuzu. If not (Kuzu's CONTAINS may not support parameters for string predicates), add an explicit comment explaining why string escaping is used and add a shared escapeCypherString helper with tests.
  • Blocks merge: no (given correct escaping is in place, but should be addressed before the pattern spreads)

[MEDIUM] No node-label filter — fallback returns non-symbol nodes

  • Category: Retrieval quality
  • File: gitnexus/src/core/augmentation/engine.ts:151–155
  • Issue:
    MATCH (n)
    WHERE n.name CONTAINS '${firstWord}'
    RETURN n.id AS id, n.name AS name, labels(n)[0] AS type, n.filePath AS filePath
    LIMIT 5
    MATCH (n) over all nodes returns any labeled node with a name property, including Community, Process, File, Repo, or any other graph metadata. Community nodes have name (the label string), Process nodes have name, etc. These have no meaningful filePath and produce low-quality augmentation. The downstream callers/callees queries use WHERE n.id IN [...] — Community and Process nodes will yield zero caller/callee matches, wasting the batch query budget and polluting the output header count.
  • Why it matters: A repo with many named Community/Process nodes could saturate the LIMIT 5 with metadata nodes, leaving no actual symbol candidates.
  • Recommended fix:
    MATCH (n)
    WHERE labels(n)[0] IN ['Function', 'Method', 'Class', 'Interface', 'Struct', 'Enum', 'Field']
    AND n.name CONTAINS '${firstWord}'
    AND n.filePath IS NOT NULL
    RETURN n.id AS id, n.name AS name, labels(n)[0] AS type, n.filePath AS filePath
    ORDER BY size(n.name), n.name
    LIMIT 5
  • Blocks merge: no (worsens quality but doesn't corrupt the system; score: 1.0 and graceful downstream handling mean it degrades without crashing)

[LOW] No deterministic ordering in fallback query

  • Category: Retrieval quality / stability
  • Issue: LIMIT 5 without ORDER BY returns results in Kuzu's internal storage order, which may vary across DB versions, compaction, or node insertion order. This makes augmentation output non-deterministic across runs.
  • Recommended fix: Add ORDER BY size(n.name), n.name to prefer shorter (more specific) names and ensure stable output.
  • Blocks merge: no

[LOW] Silent .catch(() => []) swallows all diagnostic information

  • Category: Observability
  • File: gitnexus/src/core/augmentation/engine.ts:148–156
  • Issue: If executeQuery throws (pool not initialized, connection timeout, write-guard rejection, etc.), the error is swallowed entirely. DoD §2.8: "no silent catches that swallow diagnostics." Hook users have no way to know whether the fallback succeeded, failed quietly, or was never reached.
  • Recommended fix: Add a debug-gated log: if (process.env.GITNEXUS_DEBUG) { process.stderr.write(\[GitNexus] fallback CONTAINS query failed: ${err}\n`) }` inside the catch. This preserves the no-throw contract while providing a debug escape hatch.
  • Blocks merge: no

Augmentation control-flow assessment

Correct. ftsAvailable semantics are reliable: bm25-index.ts:107 sets ftsAvailable = queriesSucceeded > 0, so true = at least one FTS index query succeeded (even with zero results), false = all queries failed (indexes absent). The fallback guard symbolMatches.length === 0 && !ftsAvailable is exact:

  • FTS available + no matches → !ftsAvailable === false → fallback skipped ✓
  • FTS unavailable + no matches → fallback runs ✓
  • FTS unavailable + BM25 symbol matches populated → symbolMatches.length > 0 → fallback skipped ✓

No duplicate bm25Results declaration in the final file. Old early return on bm25Results.length === 0 was removed and replaced by the post-fallback if (symbolMatches.length === 0) return ''. Downstream callers/callees/processes/cohesion batching is unchanged. Normal FTS path is not affected.


Query safety assessment

Single-quote escaping via replace(/'/g, "''") is Kuzu Cypher's correct string-literal escaping mechanism — confirmed by the pre-existing identical pattern at engine.ts:119 and bm25-index.ts:37. Backslashes are not special in Kuzu string literals, so no backslash injection vector exists. Newlines and Unicode within firstWord cannot break the string literal in Kuzu.

However, the CONTAINS '' bug (Blocker 1) is a direct consequence of computing firstWord before trimming. A quote-injecting pattern like "'; DROP ..." would escape correctly ("''; DROP ..." → firstWord "''"CONTAINS ''''" which is a syntactically valid escaped empty string — not an injection). The escaping logic is correct for non-empty firstWord. The problem is not injection; it is empty-string matching.

executeParameterized exists and is the preferred API, but the current escaping is technically correct for Kuzu, and the actual attack surface is the empty-firstWord issue (not quote injection).


Retrieval quality assessment

Weak. MATCH (n) without label restriction returns any node with name, including non-symbol metadata. score: 1.0 is a fixed stub — it does not mislead ranking (all fallback results tie, cohesion sort runs after), but it differs semantically from BM25 scores. CONTAINS is case-sensitive; most symbol names are camelCase, so "login" would match login, loginService, loginForm, but not Login or LOGIN. This is generally acceptable for code symbol lookup but worth noting. No deterministic ordering. Missing label filter is a quality concern (Medium finding above).


Performance / scalability assessment

Acceptable for a degradation path, but only with the CONTAINS '' guard in place. With the guard missing (current state), a whitespace-only or whitespace-leading pattern causes a full node scan on every hook invocation — a DoS-equivalent on large monorepos. With the guard fixed, CONTAINS 'loginService' on a corpus of 100k nodes is a linear scan bounded by LIMIT 5. This is slower than FTS but acceptable in the degradation path, which is uncommon (MCP write lock / first run only). There is no retry loop, so no thundering-herd risk.


Read-only DB / FTS degradation assessment

Sound. pool-adapter.ts:362 opens with readOnly: true. executeQuery enforces isWriteQuery() (blocks CREATE, DROP, etc.) and the fallback query (MATCH ... WHERE ... RETURN ... LIMIT) passes that check. No FTS extension, VECTOR index, or write capability is needed. executeQuery can run while the MCP server holds the write lock because both use the same read-only pool. The first-run scenario (analyze done, FTS indexes not built) is covered by the same code path.


Error handling / observability assessment

.catch(() => []) correctly prevents hook/CLI crashes and ensures augment() never throws from the fallback. No unhandled promise rejection. The outer try/catch at line 316 additionally catches any leak. However, repeated silent failures leave no diagnostic trail — the debug-log recommendation (Low finding) applies.


Test assessment

Scenario Covered?
FTS available, matches → enrichment ✅ integration test line 65
FTS available, no matches → empty ✅ integration test line 73
FTS available, whitespace pattern → empty ✅ integration test line 90
FTS unavailable, graph has matches → fallback produces enrichment ❌ no test
FTS unavailable, whitespace pattern → CONTAINS '' guard ❌ no test
FTS unavailable, no matching nodes → empty ❌ no test
Fallback query throws → graceful empty ❌ no test
Quote in pattern does not inject ❌ no test
FTS available, no matches → fallback NOT reached ❌ no test

The existing whitespace test passes only because FTS is available; it would still pass even if the CONTAINS '' bug were introduced. No test would fail before this PR on the new fallback path.


CI / release-readiness assessment

CI is green on the latest commit dfc7bc8. Typecheck passed. 8507/8508 tests pass. Security checks (CodeQL, Gitleaks, Dependency Review) are all green. No stale checks. The passing CI does not validate the new fallback code path — it validates that the existing augmentation tests (with FTS available) still pass, which they would regardless of the fallback's correctness.


Hidden Unicode / hygiene assessment

engine.ts contains two non-ASCII characters:

  • Line 14: (U+2192, RIGHTWARDS ARROW) — in a JSDoc comment, visible typographic character
  • Line 317: (U+2014, EM DASH) — in an inline comment, visible typographic character

Neither is a bidirectional override control (U+202A–202E, U+2066–2069). Neither appears in executable code, query strings, regex patterns, or escaping logic. The GitHub warning is a false positive. No action required.


Elegance / maintainability assessment

The change is minimal and local — exactly right for a degradation fallback. The comment on lines 144–145 accurately describes the condition. The .catch(() => []) idiom is already used elsewhere in the engine for individual symbol queries. The dual sym.id || sym[0] field access pattern is consistent with the existing code. The approach does not over-engineer — it is intentionally simple.

The main readability concern is the absent trim() before splitting (which would also make the comment on line 144 more clearly correct), and the missing minimum-length guard. Both are one-line fixes.


Final verdict

❌ not production-ready

Two blockers must be resolved before merge:

1. CONTAINS '' (Blocker 1): A whitespace-only or whitespace-prefixed pattern of length ≥ 3 causes firstWord = "" and executes WHERE n.name CONTAINS '', which matches every node in the graph and injects arbitrary context into LLM hooks. This is a correctness and security failure on the primary use case (hook context poisoning). The fix is a one-line pattern.trim() before the split, plus a !firstWord || firstWord.length < 2 guard.

2. Missing tests (Blocker 2): The entire new code path — the only thing this PR changes in behaviour — has zero test coverage. The existing whitespace-only test passes because FTS is available in that run; it does not exercise the fallback. Per DoD §2.7 and §6, this is an explicit "Not Done" signal. The fallback can silently break, the CONTAINS '' scenario can silently regress, and no test would detect it.

The underlying approach (FTS-availability semantics, control-flow guard, read-only DB compatibility, error handling) is architecturally sound. Once the empty-firstWord guard is added and the FTS-unavailable path is tested, this PR should be close to mergeable, with the label-filter and parameterized-query items addressable as follow-up.

@magyargergo

Copy link
Copy Markdown
Collaborator

@Antheurus please look into Claude's findings 🙏

@magyargergo magyargergo self-assigned this May 10, 2026
magyargergo and others added 2 commits May 10, 2026 16:01
Blocker 1 — CONTAINS '' on whitespace-leading patterns:
pattern.split(/\s+/)[0] returns "" when the input has leading whitespace
(e.g. "   ".split(/\s+/) → ["", ""]). In Kuzu, CONTAINS '' matches every
node with a name property, injecting arbitrary graph nodes into LLM context.

Fix: trim() before split, then guard on !firstWord || firstWord.length < 2.
No behaviour change for normal non-empty patterns.

Blocker 2 — zero test coverage on the FTS-unavailable code path:
The new CONTAINS fallback block (engine.ts lines 146-166) was exercised by
no existing test — all existing tests run with FTS indexes built. A second
withTestLbugDB fixture is added with no ftsIndexes, forcing searchFTSFromLbug
to return ftsAvailable: false, and asserts:
1. augment('login', ...) returns non-empty enrichment (fallback works)
2. augment('   ', ...) returns '' (CONTAINS '' guard holds)
3. augment('nxyz_notfound', ...) returns '' (no matching nodes)
4. executeQuery throwing returns '' (.catch(() => []) path)
@Antheurus

Antheurus commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review @magyargergo — both blockers addressed in the new commit (2afda29).

Blocker 1 — CONTAINS '' (engine.ts:147)

Added pattern.trim() before the split and a !firstWord || firstWord.length < 2 early return:

const firstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0];
if (!firstWord || firstWord.length < 2) return '';

" ".trim()"" → guard fires → return '' before the query runs. No behaviour change for normal patterns.

Blocker 2 — zero test coverage on the fallback path

Added a second withTestLbugDB fixture in augmentation.test.ts with no ftsIndexes, forcing searchFTSFromLbug to return ftsAvailable: false. Covers all four scenarios from the review:

  1. augment('login', ...) → non-empty with [GitNexus] (fallback produces enrichment)
  2. augment(' ', ...)'' (CONTAINS '' guard holds)
  3. augment('nxyz_notfound', ...)'' (no matching nodes)
  4. executeQuery mock-throwing → '' (.catch(() => []) path verified via vi.spyOn)

The medium findings (label filter, executeParameterized, ordering) noted — happy to address those as a follow-up if preferred, or I can add them to this PR.


Also, I'm fairly new to contributing to open source — really appreciate the thorough review and the detailed findings. Learned a lot from the way it was structured. Thanks for the guidance!

The same split(/\s+/)[0] bug existed at line 125 (BM25 symbol filter,
FTS-available path) — a leading-whitespace pattern produced CONTAINS ''
there too, matching every node in BM25-matched files.

Fix: hoist patternFirstWord computation with trim() and the length guard
to the top of augment(), before any DB interaction. Both CONTAINS sites
(BM25 symbol filter and CONTAINS fallback) now use the single pre-validated
value. No behaviour change for normal patterns; the guard fires once for
all callers instead of being duplicated.

Also tighten the whitespace test in the no-FTS suite from 3 spaces to
4 spaces so it unambiguously exercises the patternFirstWord guard rather
than straddling the outer pattern.length < 3 boundary.
@Antheurus

Antheurus commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up fix (commit fe38acdb): the same split(/\s+/)[0] bug existed at line 125 — the BM25 symbol-filter query in the FTS-available path had the same inline pattern that produces CONTAINS '' on leading-whitespace input.

Consolidated both CONTAINS sites into a single patternFirstWord computed at the top of augment() with trim() and the length guard applied once, before any DB interaction. Both paths now use the pre-validated value — no duplication of the guard logic.

@github-actions

github-actions Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 6m 34s —— View job


PR #1476 Adversarial Review — fix(augment): add CONTAINS fallback when FTS indexes unavailable

Tasks:

  • Post initial comment
  • Read repo docs (DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Check branch hygiene (git log, diff --name-status, diff --stat)
  • Read engine.ts and augmentation.test.ts (changed files)
  • Read related files (bm25-index.ts, pool-adapter.ts, fixture helpers, schema.ts)
  • Search for key terms (patternFirstWord, ftsAvailable, executeQuery, withTestLbugDB, etc.)
  • Run hidden Unicode / non-ASCII scans
  • Verify CI status for current head
  • Post full adversarial review

Review bar for this PR

Applied DoD.md v2.0.0 (all sections), with the PR-specific bar from the trigger:

  • augment() must return empty for invalid/whitespace/too-short/non-matching patterns in all FTS states.
  • Fallback fires only when ftsAvailable === false && symbolMatches.length === 0.
  • Fallback cannot emit noisy or arbitrary enrichment for empty/generic input; CONTAINS '' must not run.
  • Fallback cannot throw out of augment().
  • Tests must exercise the actual FTS-unavailable runtime branch and fail on regressions.
  • Diff must remain scoped to the two augmentation files.
  • DoD §2.5 (no new injection surfaces), §2.7 (tests cover real changed path), §2.8 (no silent diagnostic swallows blocking debug), §6 (not-done signals).

Current PR state

Item Value
Current head SHA 7f6e554 (Merge branch 'main' into fix/augment-readonly-fts-fallback)
Open / draft Open, not draft
PR-author commits dfc7bc8, 2afda29, fe38acd (3 substantive commits)
Merge-from-main commits ee7bdef, a137330, 7f6e554 (3 merge commits)
Changed files (PR file diff) gitnexus/src/core/augmentation/engine.ts (+28/−4), gitnexus/test/integration/augmentation.test.ts (+63/−0)
Typecheck ✅ (per prior CI comment — tsc --noEmit)
Tests ✅ (8507/8508 per prior CI comment)
CI verification Cannot confirm current head 7f6e554 directly — prior CI comment at 2026-05-10T11:57Z was for dfc7bc8. The final merge commit (7f6e554) incorporates main commits 6a89472 + e412d29 which have their own CI history. If those passed independently, the merged head should be green, but this cannot be confirmed from the available data.
Hidden Unicode Non-ASCII at engine.ts:14 ( U+2192) and engine.ts:317 ( U+2014) — both in comments only, not in executable code, queries, regexes, or security logic. Not bidi controls. No action required.

Branch hygiene assessment

Verdict: merge-from-main commits present but harmless and merge-safe.

Evidence: The three PR-author commits (dfc7bc8, 2afda29, fe38acd) each touch only engine.ts and/or augmentation.test.ts. The git show --stat confirms this. The merge commits (ee7bdef, a137330, 7f6e554) pull in main changes (Cursor 2.4 hook upgrades, Windows FTS SIGSEGV fix, server repo-name sanitization, C scope resolution), but those appear in the merge base and are not part of the PR's file diff. git diff dfc7bc8~1 7f6e554 -- engine.ts augmentation.test.ts shows only the intended changes. No unrelated files were modified by the PR author. The diff is clean and correctly scoped.


Understanding of the change

KuzuDB (LadybugDB) is opened read-only by the pool adapter when the MCP server holds the write lock. In read-only mode, FTS indexes cannot be created. searchFTSFromLbug() queries each index in FTS_INDEXES via QUERY_FTS_INDEX(); when no indexes exist, every query throws and is caught, queriesSucceeded stays 0, and ftsAvailable = false. The prior code path hit if (bm25Results.length === 0) return '' immediately, silently producing no enrichment even though the graph was fully reachable.

This PR removes that early return, restructures to always run the BM25→symbol mapping loop, then adds a fallback MATCH (n) WHERE n.name CONTAINS '${patternFirstWord}' LIMIT 5 that fires only when symbolMatches.length === 0 && !ftsAvailable. The fallback is wrapped in .catch(() => []) and cannot throw.

A critical guard is added at the top of augment(): patternFirstWord = pattern.trim().replace(/'/g, "''").split(/\s+/)[0], with an early return if !patternFirstWord || patternFirstWord.length < 2. This prevents CONTAINS '' from running on whitespace-only input and consolidates escaping for both the BM25 symbol-filter query (FTS-available path) and the new fallback query.


Findings

[RESOLVED] CONTAINS '' on whitespace-leading patterns

Prior blocker. Fixed by pattern.trim() before split and the !patternFirstWord || patternFirstWord.length < 2 guard at engine.ts:90. " ".trim()""!patternFirstWord is true → early return before any DB interaction. Verified against the current diff.

[RESOLVED] Zero test coverage on FTS-unavailable fallback

Prior blocker. Fixed by the second withTestLbugDB('augment-no-fts', ...) fixture in augmentation.test.ts (lines 141–196). Omitting ftsIndexes means no FTS indexes are created; QUERY_FTS_INDEX calls throw; queriesSucceeded = 0; ftsAvailable = false. This matches the production read-only/first-run failure mode exactly. The fallback code block at engine.ts:149–168 is genuinely exercised.


[MEDIUM] No label filter in fallback query — non-symbol nodes can match

  • File: engine.ts:152–156
  • Evidence: MATCH (n) WHERE n.name CONTAINS '${patternFirstWord}' matches any node with a name property. Schema check (schema.ts:111–138) confirms Community and Process nodes do not have a name column — in Kuzu, accessing a nonexistent column returns null, so CONTAINS is false and they are filtered out implicitly. However, File nodes do have name (schema.ts:34). A file named login.ts or loginService.ts would match CONTAINS 'login' and be pushed into symbolMatches as a File node. Its downstream callers/callees lookup would yield nothing meaningful (no CALLS edges on File nodes), and the output would be login.ts (src/login.ts) with no relationship info — correct enough to not corrupt context, but not a code symbol.
  • Risk: In large production graphs, LIMIT 5 could be saturated by File nodes before Function/Method nodes appear, reducing fallback quality. No context poisoning risk — callers/callees of File nodes will be empty.
  • Recommended fix: Add AND labels(n)[0] IN ['Function', 'Method', 'Class', 'Interface', 'Struct', 'Enum', 'Field'] to the fallback WHERE clause and AND n.filePath IS NOT NULL.
  • Blocks merge: no — Mitigated by schema (Community/Process lack name). File matches are benign, just lower quality. Acceptable as a follow-up.

[MEDIUM] executeParameterized exists and is unused

  • File: engine.ts:150–157, pool-adapter.ts:628
  • Evidence: pool-adapter.ts:628 exports executeParameterized using conn.prepare(cypher) + conn.execute(stmt, params) — the safe API. The fallback uses string interpolation with manual escaping instead. The escaping (replace(/'/g, "''")) is the correct Kuzu Cypher string-literal escaping mechanism (confirmed by pre-existing identical pattern at engine.ts:122). The question is whether Kuzu's CONTAINS supports parameterized string arguments (WHERE n.name CONTAINS $token). If it does, executeParameterized is strictly preferred per DoD §2.5.
  • Risk: Current escaping is technically correct for non-empty patternFirstWord (the CONTAINS '' attack vector is now blocked at the guard level). The risk is future edits copying the pattern without the guard.
  • Recommended fix: Try executeParameterized with Kuzu's CONTAINS syntax. If unsupported, add a comment explaining why and extract a shared escapeCypherString(s: string): string helper with a unit test.
  • Blocks merge: no — Correct escaping is demonstrably in place.

[LOW] No deterministic ordering in fallback query

  • File: engine.ts:152
  • Issue: LIMIT 5 without ORDER BY returns results in Kuzu's internal storage order, which varies across DB versions, compaction, and node insertion order. Makes fallback output non-deterministic across runs.
  • Recommended fix: Add ORDER BY size(n.name), n.name to prefer shorter (more specific) names and ensure stable output.
  • Blocks merge: no

[LOW] Silent .catch(() => []) swallows all diagnostics

  • File: engine.ts:158
  • Issue: If executeQuery fails for any reason (pool exhausted, connection timeout, write-guard rejection, catalog error), the failure is invisible. DoD §2.8: no silent catches that swallow diagnostics.
  • Recommended fix: }.catch((err) => { if (process.env.GITNEXUS_DEBUG) process.stderr.write('[GitNexus] fallback CONTAINS query failed: ' + err + '\n'); return []; }). Preserves the no-throw contract while providing a debug escape hatch.
  • Blocks merge: no

PR-specific assessment sections

Runtime control-flow assessment:

patternFirstWord is computed at engine.ts:89–90 before any DB interaction. The outer guard (pattern.length < 3) and the inner guard (!patternFirstWord || patternFirstWord.length < 2) are correctly ordered — a pattern of exactly 3 spaces (" ") passes the first guard (length=3 ≥ 3) and is caught by the second (trim → empty → early return). The early return if (bm25Results.length === 0) return '' has been removed and replaced by a post-fallback check at line 170. ftsAvailable is correctly destructured from searchFTSFromLbug. Fallback fires only after the BM25→symbol mapping loop completes. Normal FTS path is unaffected — if ftsAvailable = true, !ftsAvailable is false, fallback is never entered regardless of symbolMatches.length. ✓

FTS-unavailable fallback assessment:

The withTestLbugDB fixture with no ftsIndexes option does NOT call adapter.createFTSIndex(). When searchFTSFromLbug runs, it calls QUERY_FTS_INDEX('Function', 'function_fts', ...) — this throws because the index doesn't exist in the catalog, caught by queryFTSViaExecutor's try/catch returning null. All 5 FTS_INDEXES entries fail → queriesSucceeded = 0ftsAvailable = false. This matches the production code path (MCP write lock prevents FTS index creation; first-run has no indexes yet). The fixture uses initLbugWithDb (not a true read-only connection) but this doesn't matter: ftsLoaded = true only means the FTS extension binary is loaded, not that any indexes exist. The fallback fires through the real code path, not a mock of searchFTSFromLbug. ✓

Query safety assessment:

The escaping chain: pattern.trim().replace(/'/g, "''").split(/\s+/)[0]. The replace happens before split, so a pattern like "foo' bar" becomes "foo'' bar" before splitting, giving patternFirstWord = "foo''". The fallback query becomes WHERE n.name CONTAINS 'foo''' — syntactically valid Kuzu Cypher representing the string foo'. Backslashes are not special in Kuzu string literals. Newlines within patternFirstWord cannot occur because split(/\s+/) strips them. The CONTAINS '' scenario is blocked by the !patternFirstWord || patternFirstWord.length < 2 guard: an empty patternFirstWord never reaches the fallback query. ✓ The escaping is correct for the current Kuzu dialect. Parameterized queries (executeParameterized) would be preferred but escaping is demonstrably correct.

Relevance/noise assessment:

patternFirstWord.length < 2 allows tokens like "to", "if", "id", "do" (2 chars). Consider augment("to process the request", cwd) in a no-FTS session: patternFirstWord = "to" (2 chars, passes), fallback runs CONTAINS 'to', matching any node name containing "to" — toString, toJSON, authorize, etc. With LIMIT 5 and no ordering, up to 5 potentially arbitrary nodes would be returned. For typical code search hook inputs (agent grepping for a symbol name), this degrades quality but is unlikely to poison context badly. For natural-language inputs ("to process"), this is legitimately noisy. A minimum length of 3 and/or a stop-word list would help, but this is a quality concern, not a correctness or safety one (the output is labeled [GitNexus] and does not misrepresent itself). Medium, acceptable as follow-up.

Result-shape assessment:

The sym.id || sym[0] fallback pattern is consistent with all other symbol-fetching in engine.ts (line 135–139). sym.id and sym.name are strings; they would not be 0 or false. Community and Process nodes lack name in the schema, so n.name returns null in Kuzu and they are filtered by CONTAINS. File nodes have name but no filePath issue — File nodes DO have filePath in the schema. If a File node matches, sym.filePath is populated. Downstream callers/callees for File nodes would return empty maps, producing a valid (if low-information) output block. No malformed enrichment risk. ✓

Test assessment:

Scenario Covered? Verdict
FTS unavailable + matching node → enrichment augment('login', ...) in no-FTS suite Assertion: result.length > 0 + toContain('[GitNexus]') — adequate given seed data specificity
FTS unavailable + whitespace-only pattern → '' augment(' ', ...) — 4 spaces passes outer length < 3 check, caught by patternFirstWord guard
FTS unavailable + no matching nodes → '' augment('nxyz_notfound', ...)
FTS unavailable + fallback throws → '' vi.spyOn(poolAdapter, 'executeQuery').mockRejectedValue(...) The spy also fails FTS queries inside searchFTSFromLbug, preserving ftsAvailable: false. .catch(() => []) produces []. ✓
FTS available + no BM25 results → fallback NOT reached ✅ (implicit) augment('nonexistent_xyz', ...) in FTS suite — ftsAvailable: true!ftsAvailable = false → fallback skipped Implicit only; no explicit comment binding it to the non-fallback behavior
Normal FTS-available path unchanged ✅ all original integration tests pass
patternFirstWord guard in BM25 path (FTS-available + whitespace) ✅ existing augment(' ', ...) test in FTS suite, now blocked at patternFirstWord guard instead of outer guard
Apostrophe in pattern — no injection ❌ no explicit test Low risk given correct escaping, but a augment("O'Reilly", ...) test would pin the escaping contract

CI/checks assessment:

The last CI bot comment at 2026-05-10T11:57Z reported ✅ all checks for the run at that time. The final merge commit 7f6e554 (current head) adds main commits 6a89472 (server repo-name sanitization) and e412d29 (C scope resolution), both of which have their own passing CI histories on main. The augmentation tests in gitnexus/ are run by the standard npm test CI step (not the e2e/web job). I cannot directly confirm the current head's check status from the available data. The risk is low given the merge only brought in passing main commits and the PR-author commits already had a full green CI run.

Hidden Unicode / hygiene:

The engine.ts file contains (U+2192) at line 14 in a JSDoc comment and (U+2014) at line 318 in an inline comment. Neither is a bidi control character (bidi range: U+202A–U+202E, U+2066–U+2069). Neither appears in executable code, query strings, regex patterns, or escaping logic. No action required.


Final verdict

✅ production-ready with minor follow-ups

Both prior blockers are definitively resolved: the CONTAINS '' attack vector is blocked by the pattern.trim() + !patternFirstWord || patternFirstWord.length < 2 guard at the top of augment() before any DB interaction, and the FTS-unavailable code path is genuinely exercised by the second withTestLbugDB fixture with no ftsIndexes. The control flow is correct — the fallback fires only when ftsAvailable === false && symbolMatches.length === 0, preserving the normal FTS path entirely. Query escaping is correct for Kuzu Cypher. Branch hygiene is acceptable — merge-from-main commits are harmless and the PR file diff is cleanly scoped to the two augmentation files.

The remaining items are Medium/Low and should be tracked as follow-ups: add a node-label filter (IN ['Function', 'Method', 'Class', ...]) to the fallback query to avoid File node saturation; consider executeParameterized if Kuzu supports CONTAINS parameterization; add ORDER BY size(n.name), n.name for determinism; add a debug-gated log in .catch(). An apostrophe-in-pattern test would pin the escaping contract. None of these block merge.
· Branch

@magyargergo

Copy link
Copy Markdown
Collaborator

@Antheurus Could you please look into claude's findings? 🙏

magyargergo and others added 3 commits May 11, 2026 13:39
Asserts the CONTAINS fallback does NOT fire when FTS is available but
BM25 returns zero results. Pins the safety property promised by the PR
description: behavior is unchanged for users without the read-only-DB
condition.

If anyone later loosens the gate to `symbolMatches.length === 0` alone,
this test fails.
@magyargergo magyargergo merged commit fcab1e2 into abhigyanpatwari:main May 11, 2026
27 of 28 checks passed
caork added a commit to caork/GitNexus that referenced this pull request May 18, 2026
Bring in upstream fixes including:
- fix(search): create FTS indexes during analyze (abhigyanpatwari#1107) — ROOT CAUSE of
  query() returning 0 results (FTS indexes were never created because
  lazy creation failed on read-only MCP pool connection)
- fix(search): load FTS during core DB init (abhigyanpatwari#1123)
- fix(search): surface warning when FTS indexes missing (abhigyanpatwari#1418)
- fix(augment): add CONTAINS fallback when FTS unavailable (abhigyanpatwari#1476)
- fix(search): guard against undefined bm25Results (abhigyanpatwari#1489)
- feat(cpp): C++ ADL V2 overload resolution improvements
- feat(detect-changes): support git worktrees (abhigyanpatwari#1654)
- feat(cpp): parameter type class sidecar, SFINAE filter
- Various CI, security, and infrastructure improvements

AscendC provider updated to match upstream naming:
  sourcePreprocessor → preprocessSource

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
powersoftvnai added a commit to powersoftvnai/GitNexus that referenced this pull request May 27, 2026
On Windows the LadybugDB FTS extension is not loaded (see pool-adapter
win32 SIGSEGV guard) so `searchFTSFromLbug` reports `ftsAvailable: false`
and the query tool returned an empty payload with an `analyze --force`
warning that could not actually fix the underlying capability gap.

Mirror the `augment` CONTAINS fallback from abhigyanpatwari#1476: when FTS is reported
unavailable, run a per-label CONTAINS scan against Method / Function /
Class / Interface / File so `query` still returns ranked matches.

- New `containsKeywordFallback` private method on LocalBackend
- Wired into all three FTS-unavailable code paths in `bm25Search`
- Plumbs a `containsFallback` flag back to `query` so the warning text
  reflects the actual condition (extension missing vs index missing)

Less accurate than true BM25 (token-presence count, not TF-IDF /
BM25 scoring) but functional — restores end-to-end usability of the
query tool on Windows.

Tested locally against a 23k-symbol C# repo: queries that previously
returned `{ processes: [], definitions: [] }` now return correct
process-grouped results with the matching gotcha symbols at the top.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants