fix(embeddings): create VECTOR index via conn.query, not the prepared path (#2114) by magyargergo · Pull Request #2133 · abhigyanpatwari/GitNexus

magyargergo · 2026-06-10T05:59:49Z

Summary

gitnexus analyze generated and persisted embeddings but silently failed to create the LadybugDB VECTOR/HNSW index, degrading semantic search to exact-scan and recording vectorSearch.status: exact-scan in meta.json — even when the VECTOR extension was available and the identical CALL CREATE_VECTOR_INDEX(...) succeeded when run manually against the same .gitnexus/lbug database.

Fixes #2114.

Root cause

CALL CREATE_VECTOR_INDEX(...) compiles to multiple statements, so LadybugDB cannot run it through conn.prepare() — it throws Connection Exception: We do not support prepare multiple statements.

The embedding pipeline's createVectorIndex ran the query through the injected executeQuery, which routes to executePrepared → conn.prepare(). The throw was then swallowed. It was logged only in dev, via logger.warn({ error }, ...), and an Error logged under a key other than err serializes to {} because its message and stack are non-enumerable. That is the reporter's mysterious {"error":{}}. With no index, analyze fell back to exact-scan.
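A small TypeScript illustration of the serialization half of the bug: an Error's message and stack are own but non-enumerable properties, so JSON.stringify drops them and a { error } payload logs as {}. serializeErr below is a hypothetical stand-in for the kind of err-key serializer that loggers such as pino apply. It is not GitNexus's logger code.

```typescript
// Error#message and Error#stack are non-enumerable own properties,
// so JSON.stringify skips them and the Error becomes {}.
const boom = new Error("We do not support prepare multiple statements");

const naive = JSON.stringify({ error: boom });
console.log(naive); // {"error":{}}

// Hypothetical stand-in for a logger's `err` serializer:
// copy the useful fields into a plain, enumerable object.
interface SerializedErr {
  type: string;
  message: string;
  stack?: string;
}

function serializeErr(e: Error): SerializedErr {
  return { type: e.name, message: e.message, stack: e.stack };
}

const visible = JSON.stringify({ err: serializeErr(boom) });
console.log(visible.includes("prepare multiple statements")); // true
```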

FTS index creation survived in the same run because createFTSIndex uses conn.query() — that asymmetry was the smoking gun.
The regression was introduced in #1655 (fix: Use Ladybug native read-only enforcement and prepared statement execution for Cypher query paths), which switched the singleton executeQuery from conn.query() to the prepared path but left FTS untouched.
The read path (CALL QUERY_VECTOR_INDEX) prepares fine, so semantic search was never broken — only index creation.

Verified empirically against @ladybugdb/core@0.17.1 (the reporter's version): CREATE_VECTOR_INDEX via prepare fails, via conn.query() succeeds; QUERY_VECTOR_INDEX and plain MATCH prepare fine; JSON.stringify({error: new Error('x')}) → {"error":{}}.

Fix

lbug-adapter.ts: new createVectorIndex() export that runs the procedure via conn.query() (mirroring createFTSIndex). It treats an "already exists" error as success, so incremental re-runs don't spuriously downgrade, and it returns false when the VECTOR extension is unavailable or the DB is read-only.
embedding-pipeline.ts: keeps the embedding-specific extension-install-policy gate, delegates index creation to the adapter, and logs failures via { err: error } without the isDev gate, so a future silent degrade is visible.
Both runEmbeddingPipeline callers (run-analyze.ts, server/api.ts) use the same singleton writable connection, so this single adapter change fixes both the CLI and server paths. CREATE_VECTOR_INDEX was the only non-preparable procedure on the prepared path — no sibling sites to fix.
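A minimal sketch of the adapter-side createVectorIndex() described above, assuming a connection with an async query(text) method. LbugConnection, VectorIndexContext, the readOnly flag, the injected loadVectorExtension callback, and the table/column arguments in CREATE_VECTOR_INDEX_QUERY are hypothetical. Only the index name code_embedding_idx and the query()-not-prepare() routing come from this PR.

```typescript
// Hypothetical minimal view of a LadybugDB connection; the real
// @ladybugdb/core Connection type may differ.
interface LbugConnection {
  query(text: string): Promise<unknown>;
}

// Hypothetical bundle of what the adapter knows about its database.
interface VectorIndexContext {
  conn: LbugConnection;
  readOnly: boolean;
  loadVectorExtension: () => Promise<boolean>;
}

// Table and column arguments are placeholders; only the index name
// code_embedding_idx comes from the PR.
const CREATE_VECTOR_INDEX_QUERY =
  "CALL CREATE_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', 'embedding')";

function isAlreadyExists(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /already exists/i.test(message);
}

async function createVectorIndex(ctx: VectorIndexContext): Promise<boolean> {
  // A read-only DB cannot build an index: report "not ready" instead of throwing.
  if (ctx.readOnly) return false;
  // VECTOR extension unavailable: the caller degrades to exact-scan.
  if (!(await ctx.loadVectorExtension())) return false;
  try {
    // query(), not prepare(): CALL CREATE_VECTOR_INDEX compiles to multiple
    // statements, which the prepared path rejects.
    await ctx.conn.query(CREATE_VECTOR_INDEX_QUERY);
    return true;
  } catch (err) {
    // Incremental re-runs find the index already built; treat that as success.
    if (isAlreadyExists(err)) return true;
    // Any other failure propagates; the pipeline decides how to degrade.
    throw err;
  }
}
```

In this sketch, only "already exists" is absorbed by the adapter; every other error is left for the pipeline wrapper, which is where the { err: error } logging lives.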

Testing

tsc --noEmit — clean.
Integration (test/integration/lbug-vector-extension.test.ts, real @ladybugdb/core) — new suite asserts createVectorIndex() creates code_embedding_idx/HNSW (verified via CALL SHOW_INDEXES()), is idempotent, and that the prepared executeQuery path rejects (locks the root cause). Skips where the VECTOR extension is unavailable, mirroring the FTS-skip convention. 7/7 passed.
Unit (test/unit/embedding-pipeline.test.ts) — updated adapter mocks to the new contract; the zero-nodes-to-embed test now asserts the adapter's createVectorIndex is invoked (and that CREATE_VECTOR_INDEX does not flow through executeQuery). 27/27 passed.

🤖 Generated with Claude Code

vercel · 2026-06-10T05:59:53Z

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

github-actions · 2026-06-10T06:04:13Z

✨ PR Autofix

Found fixable formatting / unused-import issues across 17 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2133,"changed_lines":17,"head_sha":"1703a377c43c2c6ab54a62520e3d1a9a669f23a8","run_id":"27256702896","apply_command":"/autofix"}

github-actions · 2026-06-10T06:14:47Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
10870	10854	0	16	536s

✅ All 10854 tests passed

16 test(s) skipped:

COBOL pipeline benchmark > scales with file count
C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
C++ pipeline benchmark > scales with file count
C# pipeline benchmark > scales with file count — namespaces spread across the solution
C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
Go pipeline benchmark > scales with file count (workers enabled)
Go pipeline benchmark — worker pool (issue #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
Go structural interface detection benchmark > scales linearly with interface × struct count
Go structural interface detection split-phase benchmark > separates index-build and detection time
PHP pipeline benchmark > scales with file count (workers enabled)
Ruby pipeline benchmark > scales with file count (workers enabled)
Rust pipeline benchmark > scales with file count (workers enabled)
Vue pipeline benchmark > scales with component count
run.cjs direct-exec entrypoint (#1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	75.08%	35498/47276	N/A	N/A	🟢 ███████████████░░░░░
Branches	62.84%	21938/34909	N/A	N/A	🟢 ████████████░░░░░░░░
Functions	80.83%	3834/4743	N/A	N/A	🟢 ████████████████░░░░
Lines	78.88%	32098/40691	N/A	N/A	🟢 ███████████████░░░░░

📋 View full run · Generated by CI

magyargergo

Tri-review of #2133

Methods: GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) + Codex.
Engine breakdown: 5 Claude lanes + Codex (the only independent engine — live this run). Two of the three methods are Claude under different personas, so their agreement is "consistent," not independent confirmation.

Verdict — the fix is correct and well-scoped ✅

Codex found nothing across all six areas it probed; ce-correctness found nothing; risk lane = LOW / production-ready. The change mirrors the established createFTSIndex (conn.query) pattern, both runEmbeddingPipeline callers use the same singleton writable connection, and the vectorIndexReady/semanticMode contract is preserved — on the zero-new-embeddings (totalNodes===0) branch it now succeeds instead of always failing (code-read).

Empirically refuted 🔬 (validation is a feature)

ce-adversarial raised a P2 that incremental analyze would serve a stale HNSW snapshot (CREATE returns already exists → no rebuild). I tested it against @ladybugdb/core@0.17.1: after building the index I inserted a new row → QUERY_VECTOR_INDEX returned it at distance ~0, and a deleted indexed row disappeared. LadybugDB HNSW auto-maintains on insert/delete, so the incremental path stays fresh — ce-correctness reached the same conclusion independently. Refuted. Also refuted across lanes: the already exists match swallowing a real failure (it's the codebase-wide idiom — createFTSIndex, schema bootstrap, extension-loader all use it), policy divergence on the double extension-load gate (the cached vectorExtensionLoaded flag makes the 2nd call a no-op), and concurrency / read-only / warn-noise.

⛔ Must-fix before merge (CI)

quality / format is failing. Root prettier wants embedding-pipeline.ts lines 39–43 collapsed to a single import line (a side effect of dropping CREATE_VECTOR_INDEX_QUERY from that import); prettier --write fixes it. All other gates pass: lint, typecheck, typecheck-web, CodeQL (js+py), macOS platform-sensitive. (ubuntu-coverage and windows are still running; the Vercel ❌ is deploy authorization, not code.)

Inline (1)

Tighten the regression test's bare rejects.toThrow() so it asserts the specific #2114 error; see the inline comment.

Test-quality polish (non-blocking, P2/P3)

Unit exact-scan-fallback test (embedding-pipeline.test.ts:530, unchanged by this PR) omits createVectorIndex from its adapter mock. Harmless today — the extension-unavailable path short-circuits before the adapter call — but a latent TypeError trap and inconsistent with the 3 other mock sites this PR updated. [test-ci + ce-testing]
The pipeline catch branch has no test. The logger.warn({ err: error }, …) + return-false on an adapter throw is the secondary fix (it's what made the original failure visible — { error } serialized an Error to the reported {"error":{}}). Add a unit test that mocks the adapter createVectorIndex to reject and asserts vectorIndexReady===false / semanticMode==='exact-scan'. [ce-testing]
expect(vectorIndexMock).toHaveBeenCalled() → add toHaveBeenCalledTimes(1) to lock the zero-nodes branch. [ce-testing]
Optional parity: createVectorIndex lacks the in-process idempotency cache that createFTSIndex has (ensuredFTSIndexes). Also, the adapter's createVectorIndex is imported as createVectorIndexOnDb only because the pipeline's local wrapper has the same name (consider renaming the wrapper). [maintainability, P3]

Coverage note

The server (api.ts) embedding path isn't exercised end-to-end, but runEmbeddingPipeline's signature is unchanged, so risk is low.

Automated multi-tool digest (Codex + 5 Claude reviewer lanes), human-curated — the one independent engine and ce-correctness were clean; the only scary finding was empirically refuted. Verify before acting.

… path (abhigyanpatwari#2114)

`gitnexus analyze` generated embeddings but silently failed to create the LadybugDB VECTOR/HNSW index, degrading semantic search to exact-scan and recording `vectorSearch.status: exact-scan` in meta.json, even where the VECTOR extension was available and the identical `CALL CREATE_VECTOR_INDEX(...)` succeeded when run manually via `conn.query()`.

Root cause: `CALL CREATE_VECTOR_INDEX(...)` compiles to multiple statements, so LadybugDB cannot run it through `conn.prepare()` ("We do not support prepare multiple statements"). The embedding pipeline's `createVectorIndex` ran it through the injected `executeQuery`, which routes to `executePrepared` -> `conn.prepare()`. The throw was swallowed (a dev-only `logger.warn({ error }, ...)`, and an Error logged under the non-`err` key serializes to `{}`, hence the reporter's mysterious `{"error":{}}`), so analyze fell back to exact-scan. FTS index creation survived because `createFTSIndex` uses `conn.query()`; the singleton `executeQuery` was switched to the prepared path in abhigyanpatwari#1655, breaking VECTOR but not FTS. The read path (`CALL QUERY_VECTOR_INDEX`) prepares fine, so semantic search itself was unaffected; only index creation failed.

Fix: add an adapter-owned `createVectorIndex()` that runs the procedure via `conn.query()` (mirroring `createFTSIndex`), idempotent on "already exists" so incremental re-runs don't spuriously downgrade. The pipeline keeps its extension-install-policy gate, delegates creation to the adapter, and now logs failures via `{ err: error }` without the dev gate so a future degrade is visible. Both `runEmbeddingPipeline` callers use the same singleton writable connection, so the single adapter change fixes both analyze and the server path.

Tests: real-`@ladybugdb/core` regression test in lbug-vector-extension.test.ts (asserts SHOW_INDEXES reports code_embedding_idx/HNSW, idempotency, and that the prepared executeQuery path rejects); updated embedding-pipeline.test.ts unit mocks to the new contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…abhigyanpatwari#2114) The prepared-path-rejects assertion shared a connection with the index-creation tests, so by the time it ran the index already existed and conn.prepare() failed with "index already exists" — not the multi-statement rejection the test name claims. Move it to its own fresh (index-free) withTestLbugDB suite and anchor it to /prepare multiple statements/i so it can only pass for the real abhigyanpatwari#2114 reason. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
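An illustration of why the anchored assertion matters, using a hypothetical fakePrepare in place of a real LadybugDB connection. Its "already exists" message is illustrative; the multi-statement message is the one quoted in this PR. A bare rejection check is satisfied by either failure, while /prepare multiple statements/i matches only the #2114 root cause.

```typescript
// Hypothetical fake of conn.prepare() with the two failure modes described in
// the commit: a shared DB where the index already exists, and a fresh DB
// where the multi-statement CALL itself is rejected.
type PrepareFn = (text: string) => Promise<void>;

function fakePrepare(indexAlreadyExists: boolean): PrepareFn {
  return async (text: string) => {
    // Ordinary single-statement queries prepare fine.
    if (!text.startsWith("CALL CREATE_VECTOR_INDEX")) return;
    if (indexAlreadyExists) {
      throw new Error("index code_embedding_idx already exists");
    }
    throw new Error("Connection Exception: We do not support prepare multiple statements.");
  };
}

// Resolves to the rejection message, or null if the promise fulfilled.
async function rejectionMessage(p: Promise<unknown>): Promise<string | null> {
  try {
    await p;
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}

// The anchor: only the #2114 root cause matches.
const ROOT_CAUSE = /prepare multiple statements/i;
```

Both the shared and the fresh connection reject, so an unanchored check passes either way; only the fresh one produces a message matching ROOT_CAUSE. In Vitest the anchored form reads await expect(promise).rejects.toThrow(/prepare multiple statements/i).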

…bhigyanpatwari#2114) The exact-scan-fallback test's vi.doMock of lbug-adapter omitted createVectorIndex (the pipeline now imports it). Harmless today — the extension-unavailable path short-circuits before the adapter call — but it left a latent TypeError trap and was inconsistent with the three other adapter mock sites. Add the mock for parity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…igyanpatwari#2114) When the adapter's createVectorIndex throws (e.g. a DB error during HNSW build), the pipeline wrapper must swallow it, log via { err }, and degrade to exact-scan rather than failing the whole analyze run. This branch — the secondary abhigyanpatwari#2114 visibility fix — had no coverage. Asserts the pipeline does not throw and returns vectorIndexReady=false / semanticMode='exact-scan' with embeddings still persisted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
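The contract this commit tests can be sketched as a small wrapper, assuming the adapter call is injected as a function and the logger takes pino-style (object, message) arguments. buildVectorIndex is the wrapper name the PR settles on, but its signature here, the Logger and VectorIndexOutcome types, and the "hnsw" mode name are assumptions for illustration.

```typescript
// "hnsw" is a placeholder for whatever non-fallback mode name the pipeline uses.
type SemanticMode = "hnsw" | "exact-scan";

// Hypothetical pino-style logger surface: (mergeObject, message).
interface Logger {
  warn(obj: Record<string, unknown>, msg: string): void;
}

interface VectorIndexOutcome {
  vectorIndexReady: boolean;
  semanticMode: SemanticMode;
}

async function buildVectorIndex(
  extensionAllowed: boolean, // result of the embedding-specific install policy
  createIndex: () => Promise<boolean>, // injected adapter call
  logger: Logger,
): Promise<VectorIndexOutcome> {
  const exactScan: VectorIndexOutcome = { vectorIndexReady: false, semanticMode: "exact-scan" };
  if (!extensionAllowed) return exactScan;
  try {
    const ready = await createIndex();
    return ready ? { vectorIndexReady: true, semanticMode: "hnsw" } : exactScan;
  } catch (error) {
    // Never fail the whole analyze run. Log under `err`, with no dev-only
    // gate, so the real message is serialized and the degrade is visible.
    logger.warn({ err: error }, "vector index creation failed; using exact-scan");
    return exactScan;
  }
}
```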

…branch (abhigyanpatwari#2114) Tighten the totalNodes===0 routing test from toHaveBeenCalled() to toHaveBeenCalledTimes(1) so an accidental double-creation on the early-return branch is caught. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

abhigyanpatwari#2114) createVectorIndex re-issued CALL CREATE_VECTOR_INDEX on every call, relying on the 'already exists' error string for idempotency. Add a module-scoped vectorIndexEnsured guard (mirrors ensuredFTSIndexes): early-return true when set, set on success and on 'already exists', and reset it everywhere vectorExtensionLoaded resets so it can never go stale against a swapped or closed connection. The integration idempotency test now also asserts SHOW_INDEXES has no duplicate code_embedding_idx. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
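A sketch of the guard pattern this commit describes, assuming module-level state like the adapter's. vectorExtensionLoaded and vectorIndexEnsured are the flag names from the commit messages; resetVectorState, ensureVectorIndex, and the injected install/runCreate callbacks are hypothetical, and the real adapter folds the guard into createVectorIndex itself. Both flags are cleared together, so neither can outlive the connection it describes, and a second extension load is a no-op.

```typescript
// Module-scoped caches, in the spirit of ensuredFTSIndexes.
let vectorExtensionLoaded = false;
let vectorIndexEnsured = false;

// Hypothetical: call wherever the connection is swapped or closed, so the
// guards never describe a database that is no longer attached.
function resetVectorState(): void {
  vectorExtensionLoaded = false;
  vectorIndexEnsured = false;
}

async function loadVectorExtension(install: () => Promise<void>): Promise<boolean> {
  if (vectorExtensionLoaded) return true; // cached: no double install
  try {
    await install();
  } catch {
    return false; // extension unavailable: caller degrades to exact-scan
  }
  vectorExtensionLoaded = true;
  return true;
}

async function ensureVectorIndex(runCreate: () => Promise<void>): Promise<boolean> {
  if (vectorIndexEnsured) return true; // skip re-issuing CALL CREATE_VECTOR_INDEX
  try {
    await runCreate();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!/already exists/i.test(message)) throw err;
  }
  vectorIndexEnsured = true; // set on success and on "already exists"
  return true;
}
```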

…alias (abhigyanpatwari#2114) The pipeline-local wrapper shared the name createVectorIndex with the adapter export, forcing an `as createVectorIndexOnDb` import alias. Rename the wrapper to buildVectorIndex and import the adapter export under its real name. Internal rename only — no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…atwari#2114) Note at the call site why the pipeline pre-check and the adapter's own loadVectorExtension are not redundant: the pre-check applies the embedding-specific install policy, and the adapter's second load is a no-op via the cached vectorExtensionLoaded flag (no double install). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rwb-truelime · 2026-06-11T20:18:16Z

🎉

magyargergo commented Jun 10, 2026 on gitnexus/test/integration/lbug-vector-extension.test.ts (outdated)

magyargergo force-pushed the fix/2114-vector-index-creation branch from 1703a37 to a93af0e June 10, 2026 06:18

magyargergo and others added 8 commits June 10, 2026 06:18

magyargergo merged commit e46651e into abhigyanpatwari:main Jun 10, 2026
27 of 28 checks passed

magyargergo deleted the fix/2114-vector-index-creation branch June 10, 2026 07:00

magyargergo merged 8 commits into abhigyanpatwari:main from magyargergo:fix/2114-vector-index-creation
