fix(search): create FTS indexes during analyze #1107

Merged
magyargergo merged 1 commit into main from fix/query-fts-index-readonly on Apr 27, 2026
Conversation

@magyargergo

Collaborator

Summary

  • Materializes the configured BM25 FTS indexes during analyze, while LadybugDB is already open for writes.
  • Keeps query-time BM25 search on the read-only pool path by removing CREATE_FTS_INDEX calls from searchFTSFromLbug.
  • Centralizes the FTS index schema and tightens tests around read-only query behavior and process-grouped query results.

Fixes #1090

Test plan

  • npx tsc --noEmit
  • npx prettier --check src/core/search/bm25-index.ts src/core/search/fts-schema.ts src/core/search/fts-indexes.ts src/core/run-analyze.ts src/core/lbug/lbug-adapter.ts test/unit/bm25-search.test.ts test/integration/local-backend-calltool.test.ts
  • npx vitest run test/unit/bm25-search.test.ts --pool=threads
  • npx vitest run test/integration/search-pool.test.ts --pool=threads (tests pass; Windows native process exits with -1073741819 after reporting success)
  • npx vitest run test/integration/local-backend-calltool.test.ts --pool=threads (tests pass; Windows native process exits with -1073741819 after reporting success)
  • npm test -- --pool=threads (not green: unrelated resolver suites fail in C#, Java, Swift, and TypeScript large-file coverage; 7204 tests passed before the existing failures)

Keep query-time LadybugDB access read-only by materializing BM25 indexes in the writable analyze phase.
@vercel

vercel Bot commented Apr 27, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 27, 2026 9:11am

@github-actions

Contributor

CI Report

All checks passed

Pipeline Status

Stage | Status | Details
✅ Typecheck | success | tsc --noEmit
✅ Tests | success | unit tests, 3 platforms
✅ E2E | success | gitnexus-web changes only

Test Results

Tests | Passed | Failed | Skipped | Duration
7573 | 7476 | 0 | 97 | 301s

✅ All 7476 tests passed

97 test(s) skipped:
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric | Coverage | Covered | Base | Delta | Status
Statements | 75.53% | 21920/29020 | 75.6% | 📉 -0.1 | 🔴 ███████████████░░░░░
Branches | 64.52% | 14057/21787 | 64.52% | = 0.0 | 🟢 ████████████░░░░░░░░
Functions | 80.67% | 2117/2624 | 80.82% | 📉 -0.1 | 🔴 ████████████████░░░░
Lines | 78.33% | 19780/25249 | 78.4% | 📉 -0.1 | 🔴 ███████████████░░░░░

📋 View full run · Generated by CI

@github-actions

github-actions Bot commented Apr 27, 2026

Contributor

Claude finished @magyargergo's task in 3m 27s


PR #1107 Review — fix(search): create FTS indexes during analyze

Tasks:

  • Initialize review
  • Read governance docs (DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md)
  • Read changed files
  • Read supporting context (search-pool tests, lbug adapter, analyze entry points)
  • Analyze findings
  • Post final review

Review bar for this PR

  • Configured BM25 FTS indexes are materialized inside runFullAnalysis() while LadybugDB is open for writes, after graph data is loaded and before the DB is closed or served.
  • Query-time BM25 search (MCP pool path and CLI path) issues only QUERY_FTS_INDEX calls — no CREATE_FTS_INDEX or any write operation.
  • FTS_INDEXES in fts-schema.ts is the single source of truth consumed by both creation and query code — no residual hardcoded lists.
  • Existing search behavior remains correct for all five node types (File, Function, Class, Method, Interface).
  • Existing repos without FTS indexes either degrade acceptably or have a documented migration path.
  • Tests prove that searchFTSFromLbug does not issue CREATE_FTS_INDEX and that createSearchFTSIndexes creates the correct five indexes — and would fail if behavior or wiring changed.
  • Performance trade-off (eager analyze-time creation vs. lazy query-time creation) is acceptable and re-analyze is idempotent.
  • Validation covers TypeScript build, targeted prettier, and the three directly affected test suites; CI is green on all three platforms.
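
The read-only requirement in the second bullet can be checked with a recording executor. The sketch below is illustrative only: `searchFTSSketch`, the `Executor` type, the index names, and the exact Cypher string are assumptions made for this example, not the real `searchFTSFromLbug`.

```typescript
// Rows returned by one FTS query; the shape is assumed for illustration.
interface FtsRow {
  id: string;
  score: number;
}

// An executor runs one Cypher statement and returns its rows.
type Executor = (cypher: string) => FtsRow[];

// Hypothetical index names; the real list lives in FTS_INDEXES (fts-schema.ts).
const SKETCH_INDEXES = [
  { table: "File", indexName: "file_fts" },
  { table: "Function", indexName: "function_fts" },
  { table: "Class", indexName: "class_fts" },
  { table: "Method", indexName: "method_fts" },
  { table: "Interface", indexName: "interface_fts" },
];

// Query every configured index and never create one. No query escaping is
// done here because this is only a sketch.
function searchFTSSketch(query: string, exec: Executor): FtsRow[] {
  const rows: FtsRow[] = [];
  for (const { table, indexName } of SKETCH_INDEXES) {
    const cypher = `CALL QUERY_FTS_INDEX('${table}', '${indexName}', '${query}') RETURN node.id AS id, score`;
    for (const row of exec(cypher)) rows.push(row);
  }
  return rows;
}

// Recording executor: remembers every statement it is asked to run.
const issued: string[] = [];
const recordingExec: Executor = (cypher) => {
  issued.push(cypher);
  return [];
};

searchFTSSketch("login", recordingExec);
const issuedWrite = issued.some((c) => c.includes("CREATE_FTS_INDEX"));
```

A test built this way fails as soon as any statement other than `QUERY_FTS_INDEX` reaches the executor, which is the property the review bar asks for.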

Understanding of the change

Issue #1090 was that searchFTSFromLbug called ensureFTSIndex (which internally calls CREATE_FTS_INDEX) on the MCP connection-pool path. LadybugDB FTS index creation requires a write-capable connection; the pool path is read-only by design. This caused failures or left BM25 search broken for MCP-routed queries.

Why query-time CREATE_FTS_INDEX was unsafe: The MCP pool adapter uses a connection that does not support write operations (isWriteQuery guard is enforced at the query-validation layer, and the native DB opened for pool queries is opened in read-only mode in production). Issuing CREATE_FTS_INDEX from that path either fails at the native layer or violates the read-only contract.

Lifecycle boundary changed: FTS index creation moved from "lazy, at first query" to "eager, during analyze". The boundary is now: graph loaded → FTS indexes created → DB closed → DB served to read-only query paths.
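
A minimal sketch of why the creation step has to sit inside the write window. `SketchDb`, `createIndexSketch`, `analyzeSketch`, and the index names are hypothetical stand-ins, not the real LadybugDB adapter; the error text mirrors the message quoted in issue #1090.

```typescript
// Hypothetical stand-in for a LadybugDB handle; the real adapter API differs.
interface SketchDb {
  writable: boolean;
  ftsIndexes: string[];
}

// Index creation is a write, so it fails on a read-only handle.
function createIndexSketch(db: SketchDb, indexName: string): void {
  if (!db.writable) {
    throw new Error("Cannot execute write operations in a read-only database");
  }
  db.ftsIndexes.push(indexName);
}

// Analyze phase: graph loaded -> FTS indexes created -> DB closed.
function analyzeSketch(db: SketchDb): void {
  // (graph loading would happen here)
  createIndexSketch(db, "file_fts"); // hypothetical index name
  db.writable = false; // closing ends the write window
}

// Serving phase: any late attempt to create an index is rejected.
function tryLateCreate(db: SketchDb): string {
  try {
    createIndexSketch(db, "late_fts");
    return "created";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const sketchDb: SketchDb = { writable: true, ftsIndexes: [] };
analyzeSketch(sketchDb);
const lateResult = tryLateCreate(sketchDb);
```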

Runtime paths affected:

  • runFullAnalysis() (called from both cli/analyze.ts and server/analyze-worker.ts) — now creates FTS indexes
  • searchFTSFromLbug() — no longer creates anything; only issues QUERY_FTS_INDEX calls
  • MCP pool path and CLI direct-connection path are both read-only at query time

What must remain unchanged: BM25 search results remain ranked, merged across all five index tables, and limited correctly. Missing indexes still degrade gracefully (empty results, no crash). Re-analyze is idempotent.
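
One way to picture "ranked, merged, and limited" is the sketch below. The aggregation rule (summing scores per file path) and the `Hit` shape are assumptions for illustration; this PR page does not show how the real code combines scores.

```typescript
interface Hit {
  filePath: string;
  score: number;
}

// Merge per-table hit lists, sum scores for the same file (assumed rule),
// sort by descending score, and keep at most `limit` entries.
function mergeHits(perTable: Hit[][], limit: number): Hit[] {
  const totals = new Map<string, number>();
  for (const hits of perTable) {
    for (const hit of hits) {
      totals.set(hit.filePath, (totals.get(hit.filePath) ?? 0) + hit.score);
    }
  }
  return Array.from(totals, ([filePath, score]) => ({ filePath, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const merged = mergeHits(
  [
    [
      { filePath: "src/auth.ts", score: 1 },
      { filePath: "src/db.ts", score: 2 },
    ],
    [{ filePath: "src/auth.ts", score: 1.5 }],
    [], // a missing index contributes no hits
  ],
  1,
);
```

An empty list for a table whose index is missing leaves the merge unaffected, which matches the graceful-degradation behavior described above.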


Findings

[high] No test verifies that runFullAnalysis calls createSearchFTSIndexes

  • Category: Test coverage / runtime wiring
  • Files: gitnexus/src/core/run-analyze.ts, gitnexus/test/unit/bm25-search.test.ts
  • Issue: The unit test (createSearchFTSIndexes creates the configured indexes on the writable analysis path) mocks createFTSIndex and verifies the five calls are made when createSearchFTSIndexes() is called directly. This proves the function's own behavior but does not prove it is wired into runFullAnalysis(). If the call at run-analyze.ts:285 were accidentally removed, every test in the suite would continue to pass.
  • Why it matters here: The PR's central behavioral claim is "FTS indexes are materialized during analyze." DoD §2.7 requires tests that "would fail if behavior, wiring, or contracts were broken." A unit test of the helper function in isolation is not sufficient for a lifecycle-boundary change. There is no integration test that runs runFullAnalysis() end-to-end and then asserts that FTS indexes exist in the resulting DB.
  • Recommended fix: Add a focused integration test that calls runFullAnalysis() on a minimal fixture repo (a few files) and then queries the resulting DB with searchFTSFromLbug(), asserting at least one result is returned. Alternatively, add a spy-based assertion to the existing run-analyze unit tests that createSearchFTSIndexes is called once during a full run. The smallest correct approach: add vi.spyOn on createSearchFTSIndexes in a test that calls runFullAnalysis() and assert it was called.
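
The spy-based option can be sketched without a test framework. Everything here is hypothetical: `AnalyzeSteps`, the two pipeline functions, and the injected-steps structure are illustration only and are not how run-analyze.ts is organized.

```typescript
// Steps that an analyze run depends on, injected so a test can observe them.
interface AnalyzeSteps {
  loadGraph: () => void;
  createSearchFTSIndexes: () => void;
  closeDb: () => void;
}

function runAnalysisWired(steps: AnalyzeSteps): void {
  steps.loadGraph();
  steps.createSearchFTSIndexes();
  steps.closeDb();
}

// Same pipeline with the FTS call accidentally removed.
function runAnalysisUnwired(steps: AnalyzeSteps): void {
  steps.loadGraph();
  steps.closeDb();
}

// Minimal hand-rolled spy: records the order in which steps are called.
function makeSpySteps(): { steps: AnalyzeSteps; calls: string[] } {
  const calls: string[] = [];
  const steps: AnalyzeSteps = {
    loadGraph: () => { calls.push("loadGraph"); },
    createSearchFTSIndexes: () => { calls.push("createSearchFTSIndexes"); },
    closeDb: () => { calls.push("closeDb"); },
  };
  return { steps, calls };
}

// The wiring assertion: called exactly once, after load and before close.
function ftsWiringIsCorrect(calls: string[]): boolean {
  const load = calls.indexOf("loadGraph");
  const at = calls.indexOf("createSearchFTSIndexes");
  const count = calls.filter((c) => c === "createSearchFTSIndexes").length;
  return count === 1 && load >= 0 && load < at && at < calls.indexOf("closeDb");
}

const wired = makeSpySteps();
runAnalysisWired(wired.steps);
const unwired = makeSpySteps();
runAnalysisUnwired(unwired.steps);
```

The assertion passes for the wired pipeline and fails for the unwired one. In the real suite the same idea would use `vi.spyOn`, as the recommendation says.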

[medium] Test fixture FTS columns diverge from the centralized schema

  • Category: Test fidelity / schema drift

  • Files: gitnexus/test/fixtures/local-backend-seed.ts, gitnexus/test/fixtures/search-seed.ts, gitnexus/src/core/search/fts-schema.ts

  • Issue: The centralized production schema (FTS_INDEXES) indexes only ['name', 'content'] for all five tables. But both test fixtures define their own column lists:

    • SEARCH_FTS_INDEXES uses ['name', 'content', 'description'] for Function, Class, Method, Interface
    • LOCAL_BACKEND_FTS_INDEXES uses ['name', 'content', 'description'] for Function, Class, Method — and omits Interface entirely

    This means integration tests search over a description column that production does not index, and the local-backend-calltool test does not exercise the Interface FTS index at all. Tests may find BM25 results (e.g. "User login" → description: 'User login') that the real production search would miss.

  • Why it matters here: The PR centralizes the FTS schema as a stated goal. The test fixtures immediately re-diverge from it, creating the exact drift risk the centralization was meant to prevent. DoD §2.7: "Ensure the centralized schema is not duplicated in test expectations in a way that can drift unnoticed."

  • Recommended fix: Update test fixtures to consume FTS_INDEXES from fts-schema.ts directly, mapping properties → columns. SEARCH_FTS_INDEXES and LOCAL_BACKEND_FTS_INDEXES should be derived from the canonical source, not redefined. If description is intentionally indexed in production, update fts-schema.ts to include it. If it is not, remove it from fixtures.
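
A sketch of deriving a fixture from the canonical list. The entry shape (`table`, `indexName`, `properties`) and the index names are assumptions, and `FTS_INDEXES` is inlined here only to keep the example self-contained; in the repo it would be imported from fts-schema.ts.

```typescript
// Assumed shape of an FTS_INDEXES entry; the real field names may differ.
interface FtsIndexDef {
  table: string;
  indexName: string;
  properties: string[];
}

// Inlined stand-in for the canonical schema (hypothetical index names).
const FTS_INDEXES: readonly FtsIndexDef[] = [
  { table: "File", indexName: "file_fts", properties: ["name", "content"] },
  { table: "Function", indexName: "function_fts", properties: ["name", "content"] },
  { table: "Class", indexName: "class_fts", properties: ["name", "content"] },
  { table: "Method", indexName: "method_fts", properties: ["name", "content"] },
  { table: "Interface", indexName: "interface_fts", properties: ["name", "content"] },
];

// Fixture entries use `columns` where the schema uses `properties`.
interface FixtureIndex {
  table: string;
  indexName: string;
  columns: string[];
}

// Derived, not redefined: any schema change flows into the fixture.
const SEARCH_FTS_INDEXES: FixtureIndex[] = FTS_INDEXES.map(
  ({ table, indexName, properties }) => ({
    table,
    indexName,
    columns: properties.slice(),
  }),
);
```

With this shape, the extra `description` column and the missing Interface entry cannot reappear in the fixture unless the schema itself changes.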


[medium] Existing repos without FTS indexes silently return empty BM25 results — no migration guidance

  • Category: Compatibility / operability

  • Files: gitnexus/src/core/search/bm25-index.ts, PR description

  • Issue: After this change, if a user has a repo indexed by a prior version of GitNexus:

    • FTS indexes may not exist (prior code created them lazily at first query; if they were never queried on the broken path, they don't exist)
    • queryFTS swallows "does not exist" errors and returns []
    • queryFTSViaExecutor catches all errors and returns []
    • BM25 search returns empty results with no diagnostic message

    The PR includes no release notes, no documentation, and no user-visible warning when FTS indexes are missing. There is also no in-code warning when queryFTS falls back to empty due to a missing index.

  • Why it matters here: From the user's perspective, upgrading GitNexus without re-running analyze causes BM25 search to silently stop working. The fix for #1090 (query command: BM25 FTS index always fails with "Cannot execute write operations in a read-only database") was motivated by a broken query path, so existing repos likely had broken FTS anyway, but the PR should make this explicit. DoD §2.4: "Persisted data changes are backward-compatible or accompanied by a documented migration / reindex path." DoD §2.8: "Errors surfaced to users or callers are actionable."

  • Recommended fix: At minimum, add a note to the PR description (or a CHANGELOG entry if the project maintains one at release time) stating: "If you have existing indexed repos, run gitnexus analyze to materialize FTS indexes for BM25 search." Optionally, emit a debug log from queryFTS's "does not exist" catch branch, e.g. log('BM25: FTS index missing — re-run analyze to enable keyword search').
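
The optional debug log could look roughly like this. `queryFTSSketch`, its injected `run` and `log` parameters, and the sample error text are hypothetical; rethrowing unrelated errors is also an assumption of the sketch, not a statement about the real `queryFTS`.

```typescript
interface FtsHit {
  id: string;
  score: number;
}

// Run one FTS query; on a missing index, log a hint and return no hits.
function queryFTSSketch(
  run: () => FtsHit[],
  log: (message: string) => void,
): FtsHit[] {
  try {
    return run();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (message.includes("does not exist")) {
      log("BM25: FTS index missing; re-run analyze to enable keyword search");
      return [];
    }
    throw err; // assumption: unrelated errors are not swallowed here
  }
}

const logged: string[] = [];
const missingIndexHits = queryFTSSketch(
  (): FtsHit[] => {
    throw new Error("Index file_fts does not exist"); // hypothetical message
  },
  (message) => {
    logged.push(message);
  },
);
```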


Definition of Done check

Item | Status
Configured BM25 FTS indexes materialized during real analyze write path before DB is served | satisfied: createSearchFTSIndexes called at run-analyze.ts:285 after loadGraphToLbug, inside the write-open window, before closeLbug
Query-time BM25 search does not issue CREATE_FTS_INDEX or any write operation | satisfied: searchFTSFromLbug contains only QUERY_FTS_INDEX cypher; verified in unit test
Centralized FTS schema is the single source of truth for creation and querying | partially satisfied: fts-schema.ts is correctly consumed by production code; test fixtures re-define their own column lists and miss Interface
Existing search behavior correct for File, Function, Class, Method, Interface | satisfied: all five tables present in schema and queried via FTS_INDEXES iteration
Existing repos without FTS indexes fail/degrade in a clearly acceptable way | partially satisfied: degrades gracefully (empty results) but silently; no diagnostic or migration guidance
Tests prove read-only query behavior and analyze-time index creation through meaningful paths | partially satisfied: read-only query behavior well tested; analyze-time wiring is not tested end-to-end
Performance impact of eager FTS creation acceptable; re-analyze idempotent | satisfied: createFTSIndex swallows "already exists"; creation runs once at 85-90% phase
Validation sufficient for a storage/search lifecycle change | partially satisfied: CI green, targeted tests pass; missing: test that wires runFullAnalysis → createSearchFTSIndexes
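
The idempotency noted in the table (creation that tolerates "already exists") can be sketched as follows. `FakeIndexCatalog`, `createFTSIndexSketch`, and the error wording are hypothetical stand-ins, not the real `createFTSIndex`.

```typescript
// Hypothetical in-memory stand-in for the database's index catalog.
class FakeIndexCatalog {
  private readonly names = new Set<string>();

  create(indexName: string): void {
    if (this.names.has(indexName)) {
      throw new Error(`Index ${indexName} already exists`);
    }
    this.names.add(indexName);
  }

  count(): number {
    return this.names.size;
  }
}

// Returns true when the index was created, false when it was already there.
function createFTSIndexSketch(catalog: FakeIndexCatalog, indexName: string): boolean {
  try {
    catalog.create(indexName);
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (message.includes("already exists")) return false;
    throw err; // anything else is a real failure
  }
}

const catalog = new FakeIndexCatalog();
const firstRun = createFTSIndexSketch(catalog, "file_fts");
const secondRun = createFTSIndexSketch(catalog, "file_fts"); // simulated re-analyze
```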

Validation assessment

Tests added / updated:

  • bm25-search.test.ts: replaced large set of legacy tests with focused coverage of createSearchFTSIndexes, MCP pool read-only behavior, score aggregation, and schema coverage
  • local-backend-calltool.test.ts: minor cleanup; test uses LOCAL_BACKEND_FTS_INDEXES fixture (unchanged substance)

Validation claimed by the PR:

  • npx tsc --noEmit
  • Prettier check on changed files
  • vitest run test/unit/bm25-search.test.ts
  • vitest run test/integration/search-pool.test.ts
  • vitest run test/integration/local-backend-calltool.test.ts
  • npm test -- --pool=threads not green; claimed unrelated resolver failures after 7204 passing

Verified from CI:

  • CI report shows 7476 passed, 0 failed across all three platforms ✓
  • TypeCheck: success ✓
  • 97 skipped tests are Swift-related (pre-existing) ✓
  • Coverage deltas are marginal (≤0.1%) and consistent with deleting legacy test code ✓

Windows -1073741819 exits: Access violation after test success. The withTestLbugDB helper already documents this: "KuzuDB's C++ destructor can hang on Windows during native resource cleanup." CI shows 0 failures on Windows. This is a pre-existing known issue, not introduced by this PR.

Missing / not verifiable:

  • No test exercises the runFullAnalysis → createSearchFTSIndexes wiring. This is the highest-value missing test for a lifecycle-boundary change.
  • PR claimed npm test -- --pool=threads had failures, but CI shows 7476 / 0 — the claimed resolver failures appear to be local-only or were fixed before merge. Not a blocker.

Does missing validation block production readiness? The wiring gap is a meaningful test quality gap per DoD §2.7, but it does not prevent the code from being functionally correct — the implementation is clearly right when reading the source. Combined with the fixture schema drift, this is below the bar for the "Tests prove the changed behavior and catch broken wiring" DoD item.


Final verdict

production-ready with minor follow-ups

The core change is architecturally correct and complete: FTS creation is properly moved to analyze-time within the write window, query paths are cleanly read-only, the schema is centralized, and existing code handles "already exists" gracefully for re-analysis. CI is green on all three platforms, including the directly affected integration tests.

Two gaps prevent full "production-ready" status. First, the PR's central behavioral claim — "FTS indexes are created during analyze" — is not defended by a test that would fail if the wiring were removed. Second, test fixtures diverge from the centralized schema (extra column, missing table), meaning integration tests probe production behavior that may not match the real index. Neither gap is a runtime bug, but both fall short of the DoD §2.7 test bar for a storage lifecycle change.

The compatibility concern (old repos silently losing BM25 search) is the lowest-risk finding: since #1090 meant BM25 was already broken on the MCP path for these repos, it is not a regression in practice — but it still warrants a release note or user-facing log entry.

Recommend merging after addressing the wiring test (high priority) and fixture schema drift (medium priority). The compatibility note can be handled as a follow-up release note.

@magyargergo magyargergo merged commit 5c434ff into main Apr 27, 2026
25 checks passed
@magyargergo magyargergo deleted the fix/query-fts-index-readonly branch April 27, 2026 10:00
caork added a commit to caork/GitNexus that referenced this pull request May 18, 2026
Bring in upstream fixes including:
- fix(search): create FTS indexes during analyze (abhigyanpatwari#1107) — ROOT CAUSE of
  query() returning 0 results (FTS indexes were never created because
  lazy creation failed on read-only MCP pool connection)
- fix(search): load FTS during core DB init (abhigyanpatwari#1123)
- fix(search): surface warning when FTS indexes missing (abhigyanpatwari#1418)
- fix(augment): add CONTAINS fallback when FTS unavailable (abhigyanpatwari#1476)
- fix(search): guard against undefined bm25Results (abhigyanpatwari#1489)
- feat(cpp): C++ ADL V2 overload resolution improvements
- feat(detect-changes): support git worktrees (abhigyanpatwari#1654)
- feat(cpp): parameter type class sidecar, SFINAE filter
- Various CI, security, and infrastructure improvements

AscendC provider updated to match upstream naming:
  sourcePreprocessor → preprocessSource

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

query command: BM25 FTS index always fails with "Cannot execute write operations in a read-only database"

1 participant