fix(embeddings): create VECTOR index via conn.query, not the prepared path (#2114)#2133
Conversation
|
@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
✨ PR AutofixFound fixable formatting / unused-import issues across 17 changed lines. Comment |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10854 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
magyargergo
left a comment
There was a problem hiding this comment.
Tri-review of #2133
Methods: GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) + Codex.
Engine breakdown: 5 Claude lanes + Codex (the only independent engine — live this run). Two of the three methods are Claude under different personas, so their agreement is "consistent," not independent confirmation.
Verdict — the fix is correct and well-scoped ✅
Codex found nothing across all six areas it probed; ce-correctness found nothing; risk lane = LOW / production-ready. The change mirrors the established createFTSIndex (conn.query) pattern, both runEmbeddingPipeline callers use the same singleton writable connection, and the vectorIndexReady/semanticMode contract is preserved — on the zero-new-embeddings (totalNodes===0) branch it now succeeds instead of always failing (code-read).
Empirically refuted 🔬 (validation is a feature)
ce-adversarial raised a P2 that incremental analyze would serve a stale HNSW snapshot (CREATE returns already exists → no rebuild). I tested it against @ladybugdb/core@0.17.1: after building the index I inserted a new row → QUERY_VECTOR_INDEX returned it at distance ~0, and a deleted indexed row disappeared. LadybugDB HNSW auto-maintains on insert/delete, so the incremental path stays fresh — ce-correctness reached the same conclusion independently. Refuted. Also refuted across lanes: the already exists match swallowing a real failure (it's the codebase-wide idiom — createFTSIndex, schema bootstrap, extension-loader all use it), policy divergence on the double extension-load gate (the cached vectorExtensionLoaded flag makes the 2nd call a no-op), and concurrency / read-only / warn-noise.
⛔ Must-fix before merge (CI)
quality / format is failing. Root prettier wants embedding-pipeline.ts line 39–43 collapsed to a single import line (a side effect of dropping CREATE_VECTOR_INDEX_QUERY from that import) — prettier --write fixes it. All other gates pass: lint, typecheck, typecheck-web, CodeQL (js+py), macOS platform-sensitive. (ubuntu-coverage + windows still running; Vercel ❌ = deploy-auth, not code.)
Inline (1)
Tightening the regression test's bare rejects.toThrow() to anchor the specific #2114 error — see inline comment.
Test-quality polish (non-blocking, P2/P3)
- Unit exact-scan-fallback test (
embedding-pipeline.test.ts:530, unchanged by this PR) omitscreateVectorIndexfrom its adapter mock. Harmless today — the extension-unavailable path short-circuits before the adapter call — but a latentTypeErrortrap and inconsistent with the 3 other mock sites this PR updated. [test-ci + ce-testing] - The pipeline catch branch has no test. The
logger.warn({ err: error }, …)+ return-false on an adapter throw is the secondary fix (it's what made the original failure visible —{ error }serialized an Error to the reported{"error":{}}). Add a unit test that mocks the adaptercreateVectorIndexto reject and assertsvectorIndexReady===false/semanticMode==='exact-scan'. [ce-testing] expect(vectorIndexMock).toHaveBeenCalled()→ addtoHaveBeenCalledTimes(1)to lock the zero-nodes branch. [ce-testing]- Optional parity:
createVectorIndexlacks the in-process idempotency cachecreateFTSIndexhas (ensuredFTSIndexes);createVectorIndex as createVectorIndexOnDbshadows the localcreateVectorIndex(consider renaming the wrapper). [maintainability, P3]
Coverage note
The server (api.ts) embedding path isn't exercised end-to-end, but runEmbeddingPipeline's signature is unchanged, so risk is low.
Automated multi-tool digest (Codex + 5 Claude reviewer lanes), human-curated — the one independent engine and ce-correctness were clean; the only scary finding was empirically refuted. Verify before acting.
1703a37 to
a93af0e
Compare
… path (abhigyanpatwari#2114) `gitnexus analyze` generated embeddings but silently failed to create the LadybugDB VECTOR/HNSW index, degrading semantic search to exact-scan and recording `vectorSearch.status: exact-scan` in meta.json — even where the VECTOR extension was available and the identical `CALL CREATE_VECTOR_INDEX(...)` succeeded when run manually via `conn.query()`. Root cause: `CALL CREATE_VECTOR_INDEX(...)` compiles to multiple statements, so LadybugDB cannot run it through `conn.prepare()` ("We do not support prepare multiple statements"). The embedding pipeline's `createVectorIndex` ran it through the injected `executeQuery`, which routes to `executePrepared` -> `conn.prepare()`. The throw was swallowed (dev-only `logger.warn({ error }, ...)` — and an Error logged under the non-`err` key serializes to `{}`, the reporter's mysterious `{"error":{}}`), so analyze fell back to exact-scan. FTS index creation survived because `createFTSIndex` uses `conn.query()`; the singleton `executeQuery` was switched to the prepared path in abhigyanpatwari#1655, breaking VECTOR but not FTS. The read path (`CALL QUERY_VECTOR_INDEX`) prepares fine, so semantic search itself was unaffected — only index creation. Fix: add an adapter-owned `createVectorIndex()` that runs the procedure via `conn.query()` (mirroring `createFTSIndex`), idempotent on "already exists" so incremental re-runs don't spuriously downgrade. The pipeline keeps its extension-install-policy gate, delegates creation to the adapter, and now logs failures via `{ err: error }` without the dev gate so a future degrade is visible. Both `runEmbeddingPipeline` callers use the same singleton writable connection, so the single adapter change fixes both analyze and the server path. Tests: real-`@ladybugdb/core` regression test in lbug-vector-extension.test.ts (asserts SHOW_INDEXES reports code_embedding_idx/HNSW, idempotency, and that the prepared executeQuery path rejects); updated embedding-pipeline.test.ts unit mocks to the new contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…abhigyanpatwari#2114) The prepared-path-rejects assertion shared a connection with the index-creation tests, so by the time it ran the index already existed and conn.prepare() failed with "index already exists" — not the multi-statement rejection the test name claims. Move it to its own fresh (index-free) withTestLbugDB suite and anchor it to /prepare multiple statements/i so it can only pass for the real abhigyanpatwari#2114 reason. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2114) The exact-scan-fallback test's vi.doMock of lbug-adapter omitted createVectorIndex (the pipeline now imports it). Harmless today — the extension-unavailable path short-circuits before the adapter call — but it left a latent TypeError trap and was inconsistent with the three other adapter mock sites. Add the mock for parity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…igyanpatwari#2114) When the adapter's createVectorIndex throws (e.g. a DB error during HNSW build), the pipeline wrapper must swallow it, log via { err }, and degrade to exact-scan rather than failing the whole analyze run. This branch — the secondary abhigyanpatwari#2114 visibility fix — had no coverage. Asserts the pipeline does not throw and returns vectorIndexReady=false / semanticMode='exact-scan' with embeddings still persisted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…branch (abhigyanpatwari#2114) Tighten the totalNodes===0 routing test from toHaveBeenCalled() to toHaveBeenCalledTimes(1) so an accidental double-creation on the early-return branch is caught. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
abhigyanpatwari#2114) createVectorIndex re-issued CALL CREATE_VECTOR_INDEX on every call, relying on the 'already exists' error string for idempotency. Add a module-scoped vectorIndexEnsured guard (mirrors ensuredFTSIndexes): early-return true when set, set on success and on 'already exists', and reset it everywhere vectorExtensionLoaded resets so it can never go stale against a swapped or closed connection. The integration idempotency test now also asserts SHOW_INDEXES has no duplicate code_embedding_idx. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alias (abhigyanpatwari#2114) The pipeline-local wrapper shared the name createVectorIndex with the adapter export, forcing an `as createVectorIndexOnDb` import alias. Rename the wrapper to buildVectorIndex and import the adapter export under its real name. Internal rename only — no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…atwari#2114) Note at the call site why the pipeline pre-check and the adapter's own loadVectorExtension are not redundant: the pre-check applies the embedding-specific install policy, and the adapter's second load is a no-op via the cached vectorExtensionLoaded flag (no double install). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🎉 |
Summary
gitnexus analyzegenerated and persisted embeddings but silently failed to create the LadybugDB VECTOR/HNSW index, degrading semantic search toexact-scanand recordingvectorSearch.status: exact-scaninmeta.json— even when the VECTOR extension was available and the identicalCALL CREATE_VECTOR_INDEX(...)succeeded when run manually against the same.gitnexus/lbugdatabase.Fixes #2114.
Root cause
CALL CREATE_VECTOR_INDEX(...)compiles to multiple statements, so LadybugDB cannot run it throughconn.prepare()— it throwsConnection Exception: We do not support prepare multiple statements.The embedding pipeline's
createVectorIndexran the query through the injectedexecuteQuery, which routes toexecutePrepared→conn.prepare(). The throw was swallowed (dev-onlylogger.warn({ error }, ...), and anErrorlogged under the non-errkey serializes to{}— the reporter's mysterious{"error":{}}), so analyze fell back to exact-scan.createFTSIndexusesconn.query()— that asymmetry was the smoking gun.executeQueryfromconn.query()to the prepared path but left FTS untouched.CALL QUERY_VECTOR_INDEX) prepares fine, so semantic search was never broken — only index creation.Verified empirically against
@ladybugdb/core@0.17.1(the reporter's version):CREATE_VECTOR_INDEXviapreparefails, viaconn.query()succeeds;QUERY_VECTOR_INDEXand plainMATCHprepare fine;JSON.stringify({error: new Error('x')})→{"error":{}}.Fix
lbug-adapter.ts— newcreateVectorIndex()export that runs the procedure viaconn.query()(mirrorscreateFTSIndex). Idempotent onalready existsso incremental re-runs don't spuriously downgrade; returnsfalsefor an unavailable extension or read-only DB.embedding-pipeline.ts— keeps the embedding-specific extension-install-policy gate, delegates index creation to the adapter, and logs failures via{ err: error }without theisDevgate so a future silent degrade is visible.runEmbeddingPipelinecallers (run-analyze.ts,server/api.ts) use the same singleton writable connection, so this single adapter change fixes both the CLI and server paths.CREATE_VECTOR_INDEXwas the only non-preparable procedure on the prepared path — no sibling sites to fix.Testing
tsc --noEmit— clean.test/integration/lbug-vector-extension.test.ts, real@ladybugdb/core) — new suite assertscreateVectorIndex()createscode_embedding_idx/HNSW (verified viaCALL SHOW_INDEXES()), is idempotent, and that the preparedexecuteQuerypath rejects (locks the root cause). Skips where the VECTOR extension is unavailable, mirroring the FTS-skip convention. 7/7 passed.test/unit/embedding-pipeline.test.ts) — updated adapter mocks to the new contract; the zero-nodes-to-embed test now asserts the adapter'screateVectorIndexis invoked (and thatCREATE_VECTOR_INDEXdoes not flow throughexecuteQuery). 27/27 passed.🤖 Generated with Claude Code