fix(search): create FTS indexes during analyze#1107
Conversation
Keep query-time LadybugDB access read-only by materializing BM25 indexes in the writable analyze phase.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 7476 tests passed 97 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
|
Claude finished @magyargergo's task in 3m 27s —— View job PR #1107 Review — fix(search): create FTS indexes during analyzeTasks:
Review bar for this PR
Understanding of the changeIssue #1090 was that Why query-time Lifecycle boundary changed: FTS index creation moved from "lazy, at first query" to "eager, during Runtime paths affected:
What must remain unchanged: BM25 search results remain ranked, merged across all five index tables, and limited correctly. Missing indexes still degrade gracefully (empty results, no crash). Re-analyze is idempotent. Findings[high] No test verifies that
|
| Item | Status |
|---|---|
| Configured BM25 FTS indexes materialized during real analyze write path before DB is served | satisfied — createSearchFTSIndexes called at run-analyze.ts:285 after loadGraphToLbug, inside the write-open window, before closeLbug |
Query-time BM25 search does not issue CREATE_FTS_INDEX or any write operation |
satisfied — searchFTSFromLbug contains only QUERY_FTS_INDEX cypher; verified in unit test |
| Centralized FTS schema is the single source of truth for creation and querying | partially satisfied — fts-schema.ts is correctly consumed by production code; test fixtures re-define their own column lists and miss Interface |
| Existing search behavior correct for File, Function, Class, Method, Interface | satisfied — all five tables present in schema and queried via FTS_INDEXES iteration |
| Existing repos without FTS indexes fail/degrade in a clearly acceptable way | partially satisfied — degrades gracefully (empty results) but silently; no diagnostic or migration guidance |
| Tests prove read-only query behavior and analyze-time index creation through meaningful paths | partially satisfied — read-only query behavior well tested; analyze-time wiring is not tested end-to-end |
| Performance impact of eager FTS creation acceptable; re-analyze idempotent | satisfied — createFTSIndex swallows "already exists"; creation runs once at 85-90% phase |
| Validation sufficient for a storage/search lifecycle change | partially satisfied — CI green, targeted tests pass; missing: test that wires runFullAnalysis → createSearchFTSIndexes |
Validation assessment
Tests added / updated:
bm25-search.test.ts: replaced large set of legacy tests with focused coverage ofcreateSearchFTSIndexes, MCP pool read-only behavior, score aggregation, and schema coveragelocal-backend-calltool.test.ts: minor cleanup; test usesLOCAL_BACKEND_FTS_INDEXESfixture (unchanged substance)
Validation claimed by the PR:
npx tsc --noEmit- Prettier check on changed files
vitest run test/unit/bm25-search.test.tsvitest run test/integration/search-pool.test.tsvitest run test/integration/local-backend-calltool.test.tsnpm test -- --pool=threadsnot green; claimed unrelated resolver failures after 7204 passing
Verified from CI:
- CI report shows 7476 passed, 0 failed across all three platforms ✓
- TypeCheck: success ✓
- 97 skipped tests are Swift-related (pre-existing) ✓
- Coverage deltas are marginal (≤0.1%) and consistent with deleting legacy test code ✓
Windows -1073741819 exits: Access violation after test success. The withTestLbugDB helper already documents this: "KuzuDB's C++ destructor can hang on Windows during native resource cleanup." CI shows 0 failures on Windows. This is a pre-existing known issue, not introduced by this PR.
Missing / not verifiable:
- No test exercises the
runFullAnalysis→createSearchFTSIndexeswiring. This is the highest-value missing test for a lifecycle-boundary change. - PR claimed
npm test -- --pool=threadshad failures, but CI shows 7476 / 0 — the claimed resolver failures appear to be local-only or were fixed before merge. Not a blocker.
Does missing validation block production readiness? The wiring gap is a meaningful test quality gap per DoD §2.7, but it does not prevent the code from being functionally correct — the implementation is clearly right when reading the source. Combined with the fixture schema drift, this is below the bar for the "Tests prove the changed behavior and catch broken wiring" DoD item.
Final verdict
production-ready with minor follow-ups
The core change is architecturally correct and complete: FTS creation is properly moved to analyze-time within the write window, query paths are cleanly read-only, the schema is centralized, and existing code handles "already exists" gracefully for re-analysis. CI is green on all three platforms, including the directly affected integration tests.
Two gaps prevent full "production-ready" status. First, the PR's central behavioral claim — "FTS indexes are created during analyze" — is not defended by a test that would fail if the wiring were removed. Second, test fixtures diverge from the centralized schema (extra column, missing table), meaning integration tests probe production behavior that may not match the real index. Neither gap is a runtime bug, but both fall short of the DoD §2.7 test bar for a storage lifecycle change.
The compatibility concern (old repos silently losing BM25 search) is the lowest-risk finding: since #1090 meant BM25 was already broken on the MCP path for these repos, it is not a regression in practice — but it still warrants a release note or user-facing log entry.
Recommend merging after addressing the wiring test (high priority) and fixture schema drift (medium priority). The compatibility note can be handled as a follow-up release note.
Bring in upstream fixes including: - fix(search): create FTS indexes during analyze (abhigyanpatwari#1107) — ROOT CAUSE of query() returning 0 results (FTS indexes were never created because lazy creation failed on read-only MCP pool connection) - fix(search): load FTS during core DB init (abhigyanpatwari#1123) - fix(search): surface warning when FTS indexes missing (abhigyanpatwari#1418) - fix(augment): add CONTAINS fallback when FTS unavailable (abhigyanpatwari#1476) - fix(search): guard against undefined bm25Results (abhigyanpatwari#1489) - feat(cpp): C++ ADL V2 overload resolution improvements - feat(detect-changes): support git worktrees (abhigyanpatwari#1654) - feat(cpp): parameter type class sidecar, SFINAE filter - Various CI, security, and infrastructure improvements AscendC provider updated to match upstream naming: sourcePreprocessor → preprocessSource Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
analyze, while LadybugDB is already open for writes.CREATE_FTS_INDEXcalls fromsearchFTSFromLbug.Fixes #1090
Test plan
npx tsc --noEmitnpx prettier --check src/core/search/bm25-index.ts src/core/search/fts-schema.ts src/core/search/fts-indexes.ts src/core/run-analyze.ts src/core/lbug/lbug-adapter.ts test/unit/bm25-search.test.ts test/integration/local-backend-calltool.test.tsnpx vitest run test/unit/bm25-search.test.ts --pool=threadsnpx vitest run test/integration/search-pool.test.ts --pool=threads(tests pass; Windows native process exits with-1073741819after reporting success)npx vitest run test/integration/local-backend-calltool.test.ts --pool=threads(tests pass; Windows native process exits with-1073741819after reporting success)npm test -- --pool=threads(not green: unrelated resolver suites fail in C#, Java, Swift, and TypeScript large-file coverage; 7204 tests passed before the existing failures)