Skip to content

fix(embeddings): create VECTOR index via conn.query, not the prepared path (#2114)#2133

Merged
magyargergo merged 8 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/2114-vector-index-creation
Jun 10, 2026
Merged

fix(embeddings): create VECTOR index via conn.query, not the prepared path (#2114)#2133
magyargergo merged 8 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/2114-vector-index-creation

Conversation

@magyargergo

Copy link
Copy Markdown
Collaborator

Summary

gitnexus analyze generated and persisted embeddings but silently failed to create the LadybugDB VECTOR/HNSW index, degrading semantic search to exact-scan and recording vectorSearch.status: exact-scan in meta.json — even when the VECTOR extension was available and the identical CALL CREATE_VECTOR_INDEX(...) succeeded when run manually against the same .gitnexus/lbug database.

Fixes #2114.

Root cause

CALL CREATE_VECTOR_INDEX(...) compiles to multiple statements, so LadybugDB cannot run it through conn.prepare() — it throws Connection Exception: We do not support prepare multiple statements.

The embedding pipeline's createVectorIndex ran the query through the injected executeQuery, which routes to executePreparedconn.prepare(). The throw was swallowed (dev-only logger.warn({ error }, ...), and an Error logged under the non-err key serializes to {} — the reporter's mysterious {"error":{}}), so analyze fell back to exact-scan.

Verified empirically against @ladybugdb/core@0.17.1 (the reporter's version): CREATE_VECTOR_INDEX via prepare fails, via conn.query() succeeds; QUERY_VECTOR_INDEX and plain MATCH prepare fine; JSON.stringify({error: new Error('x')}){"error":{}}.

Fix

  • lbug-adapter.ts — new createVectorIndex() export that runs the procedure via conn.query() (mirrors createFTSIndex). Idempotent on already exists so incremental re-runs don't spuriously downgrade; returns false for an unavailable extension or read-only DB.
  • embedding-pipeline.ts — keeps the embedding-specific extension-install-policy gate, delegates index creation to the adapter, and logs failures via { err: error } without the isDev gate so a future silent degrade is visible.
  • Both runEmbeddingPipeline callers (run-analyze.ts, server/api.ts) use the same singleton writable connection, so this single adapter change fixes both the CLI and server paths. CREATE_VECTOR_INDEX was the only non-preparable procedure on the prepared path — no sibling sites to fix.

Testing

  • tsc --noEmit — clean.
  • Integration (test/integration/lbug-vector-extension.test.ts, real @ladybugdb/core) — new suite asserts createVectorIndex() creates code_embedding_idx/HNSW (verified via CALL SHOW_INDEXES()), is idempotent, and that the prepared executeQuery path rejects (locks the root cause). Skips where the VECTOR extension is unavailable, mirroring the FTS-skip convention. 7/7 passed.
  • Unit (test/unit/embedding-pipeline.test.ts) — updated adapter mocks to the new contract; the zero-nodes-to-embed test now asserts the adapter's createVectorIndex is invoked (and that CREATE_VECTOR_INDEX does not flow through executeQuery). 27/27 passed.

🤖 Generated with Claude Code

@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 17 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2133,"changed_lines":17,"head_sha":"1703a377c43c2c6ab54a62520e3d1a9a669f23a8","run_id":"27256702896","apply_command":"/autofix"}

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10870 10854 0 16 536s

✅ All 10854 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.08% 35498/47276 N/A% 🟢 ███████████████░░░░░
Branches 62.84% 21938/34909 N/A% 🟢 ████████████░░░░░░░░
Functions 80.83% 3834/4743 N/A% 🟢 ████████████████░░░░
Lines 78.88% 32098/40691 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review of #2133

Methods: GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) + Codex.
Engine breakdown: 5 Claude lanes + Codex (the only independent engine — live this run). Two of the three methods are Claude under different personas, so their agreement is "consistent," not independent confirmation.

Verdict — the fix is correct and well-scoped ✅

Codex found nothing across all six areas it probed; ce-correctness found nothing; risk lane = LOW / production-ready. The change mirrors the established createFTSIndex (conn.query) pattern, both runEmbeddingPipeline callers use the same singleton writable connection, and the vectorIndexReady/semanticMode contract is preserved — on the zero-new-embeddings (totalNodes===0) branch it now succeeds instead of always failing (code-read).

Empirically refuted 🔬 (validation is a feature)

ce-adversarial raised a P2 that incremental analyze would serve a stale HNSW snapshot (CREATE returns already exists → no rebuild). I tested it against @ladybugdb/core@0.17.1: after building the index I inserted a new row → QUERY_VECTOR_INDEX returned it at distance ~0, and a deleted indexed row disappeared. LadybugDB HNSW auto-maintains on insert/delete, so the incremental path stays fresh — ce-correctness reached the same conclusion independently. Refuted. Also refuted across lanes: the already exists match swallowing a real failure (it's the codebase-wide idiom — createFTSIndex, schema bootstrap, extension-loader all use it), policy divergence on the double extension-load gate (the cached vectorExtensionLoaded flag makes the 2nd call a no-op), and concurrency / read-only / warn-noise.

⛔ Must-fix before merge (CI)

quality / format is failing. Root prettier wants embedding-pipeline.ts line 39–43 collapsed to a single import line (a side effect of dropping CREATE_VECTOR_INDEX_QUERY from that import) — prettier --write fixes it. All other gates pass: lint, typecheck, typecheck-web, CodeQL (js+py), macOS platform-sensitive. (ubuntu-coverage + windows still running; Vercel ❌ = deploy-auth, not code.)

Inline (1)

Tightening the regression test's bare rejects.toThrow() to anchor the specific #2114 error — see inline comment.

Test-quality polish (non-blocking, P2/P3)

  • Unit exact-scan-fallback test (embedding-pipeline.test.ts:530, unchanged by this PR) omits createVectorIndex from its adapter mock. Harmless today — the extension-unavailable path short-circuits before the adapter call — but a latent TypeError trap and inconsistent with the 3 other mock sites this PR updated. [test-ci + ce-testing]
  • The pipeline catch branch has no test. The logger.warn({ err: error }, …) + return-false on an adapter throw is the secondary fix (it's what made the original failure visible — { error } serialized an Error to the reported {"error":{}}). Add a unit test that mocks the adapter createVectorIndex to reject and asserts vectorIndexReady===false / semanticMode==='exact-scan'. [ce-testing]
  • expect(vectorIndexMock).toHaveBeenCalled() → add toHaveBeenCalledTimes(1) to lock the zero-nodes branch. [ce-testing]
  • Optional parity: createVectorIndex lacks the in-process idempotency cache createFTSIndex has (ensuredFTSIndexes); createVectorIndex as createVectorIndexOnDb shadows the local createVectorIndex (consider renaming the wrapper). [maintainability, P3]

Coverage note

The server (api.ts) embedding path isn't exercised end-to-end, but runEmbeddingPipeline's signature is unchanged, so risk is low.


Automated multi-tool digest (Codex + 5 Claude reviewer lanes), human-curated — the one independent engine and ce-correctness were clean; the only scary finding was empirically refuted. Verify before acting.

Comment thread gitnexus/test/integration/lbug-vector-extension.test.ts Outdated
@magyargergo magyargergo force-pushed the fix/2114-vector-index-creation branch from 1703a37 to a93af0e Compare June 10, 2026 06:18
magyargergo and others added 8 commits June 10, 2026 06:18
… path (abhigyanpatwari#2114)

`gitnexus analyze` generated embeddings but silently failed to create the
LadybugDB VECTOR/HNSW index, degrading semantic search to exact-scan and
recording `vectorSearch.status: exact-scan` in meta.json — even where the
VECTOR extension was available and the identical `CALL CREATE_VECTOR_INDEX(...)`
succeeded when run manually via `conn.query()`.

Root cause: `CALL CREATE_VECTOR_INDEX(...)` compiles to multiple statements, so
LadybugDB cannot run it through `conn.prepare()` ("We do not support prepare
multiple statements"). The embedding pipeline's `createVectorIndex` ran it
through the injected `executeQuery`, which routes to `executePrepared` ->
`conn.prepare()`. The throw was swallowed (dev-only `logger.warn({ error }, ...)`
— and an Error logged under the non-`err` key serializes to `{}`, the reporter's
mysterious `{"error":{}}`), so analyze fell back to exact-scan. FTS index
creation survived because `createFTSIndex` uses `conn.query()`; the singleton
`executeQuery` was switched to the prepared path in abhigyanpatwari#1655, breaking VECTOR but
not FTS. The read path (`CALL QUERY_VECTOR_INDEX`) prepares fine, so semantic
search itself was unaffected — only index creation.

Fix: add an adapter-owned `createVectorIndex()` that runs the procedure via
`conn.query()` (mirroring `createFTSIndex`), idempotent on "already exists" so
incremental re-runs don't spuriously downgrade. The pipeline keeps its
extension-install-policy gate, delegates creation to the adapter, and now logs
failures via `{ err: error }` without the dev gate so a future degrade is
visible. Both `runEmbeddingPipeline` callers use the same singleton writable
connection, so the single adapter change fixes both analyze and the server path.

Tests: real-`@ladybugdb/core` regression test in lbug-vector-extension.test.ts
(asserts SHOW_INDEXES reports code_embedding_idx/HNSW, idempotency, and that the
prepared executeQuery path rejects); updated embedding-pipeline.test.ts unit
mocks to the new contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…abhigyanpatwari#2114)

The prepared-path-rejects assertion shared a connection with the
index-creation tests, so by the time it ran the index already existed and
conn.prepare() failed with "index already exists" — not the multi-statement
rejection the test name claims. Move it to its own fresh (index-free)
withTestLbugDB suite and anchor it to /prepare multiple statements/i so it can
only pass for the real abhigyanpatwari#2114 reason.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2114)

The exact-scan-fallback test's vi.doMock of lbug-adapter omitted createVectorIndex
(the pipeline now imports it). Harmless today — the extension-unavailable path
short-circuits before the adapter call — but it left a latent TypeError trap and
was inconsistent with the three other adapter mock sites. Add the mock for parity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…igyanpatwari#2114)

When the adapter's createVectorIndex throws (e.g. a DB error during HNSW
build), the pipeline wrapper must swallow it, log via { err }, and degrade to
exact-scan rather than failing the whole analyze run. This branch — the
secondary abhigyanpatwari#2114 visibility fix — had no coverage. Asserts the pipeline does not
throw and returns vectorIndexReady=false / semanticMode='exact-scan' with
embeddings still persisted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…branch (abhigyanpatwari#2114)

Tighten the totalNodes===0 routing test from toHaveBeenCalled() to
toHaveBeenCalledTimes(1) so an accidental double-creation on the early-return
branch is caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
abhigyanpatwari#2114)

createVectorIndex re-issued CALL CREATE_VECTOR_INDEX on every call, relying on
the 'already exists' error string for idempotency. Add a module-scoped
vectorIndexEnsured guard (mirrors ensuredFTSIndexes): early-return true when set,
set on success and on 'already exists', and reset it everywhere
vectorExtensionLoaded resets so it can never go stale against a swapped or closed
connection. The integration idempotency test now also asserts SHOW_INDEXES has no
duplicate code_embedding_idx.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alias (abhigyanpatwari#2114)

The pipeline-local wrapper shared the name createVectorIndex with the adapter
export, forcing an `as createVectorIndexOnDb` import alias. Rename the wrapper to
buildVectorIndex and import the adapter export under its real name. Internal
rename only — no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…atwari#2114)

Note at the call site why the pipeline pre-check and the adapter's own
loadVectorExtension are not redundant: the pre-check applies the
embedding-specific install policy, and the adapter's second load is a no-op via
the cached vectorExtensionLoaded flag (no double install).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo merged commit e46651e into abhigyanpatwari:main Jun 10, 2026
27 of 28 checks passed
@magyargergo magyargergo deleted the fix/2114-vector-index-creation branch June 10, 2026 07:00
@rwb-truelime

Copy link
Copy Markdown

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VECTOR index not created during analyze despite VECTOR extension being available

2 participants