feat(analyze): preserve existing embeddings by default; --force regenerates them; add --drop-embeddings opt-out (CLI + HTTP API)#1055
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
…eddings opt-out Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/da1da041-afcd-4d38-8a2f-39ca52a462ff Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
|
@copilot if there are embeddings and user is using the |
…e+top-up) Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e2759765-b8f6-453a-8c28-595439d23cb4 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Done in bafd3df. |
|
Claude finished @magyargergo's task in 3m 39s —— View job Production Readiness ReviewTasks:
Review bar for this changeDerived from
Understanding of the changeWhat changed:
What should remain unchanged: All existing Runtime paths affected:
Contracts at risk: Findings[high]
|
| DoD Item | Status |
|---|---|
| Behavior implemented end-to-end on real runtime path | partially satisfied — CLI path is complete; HTTP API path preserves-by-default correctly but dropEmbeddings escape hatch is not wired |
| Edge cases handled or documented as out of scope | partially satisfied — cache-load failure is silently swallowed with no observable signal |
| Contracts preserved or updated in all consumers | partially satisfied — AnalyzeOptions updated in CLI and core; api.ts HTTP schema not updated |
| Docs updated where user-visible behavior changed | partially satisfied — AGENTS.md updated; GUARDRAILS.md contradicts new behavior |
| Tests cover the real changed path | not satisfied — type-assertion only; no behavioral coverage |
| Logging is proportionate — no silent catches that swallow diagnostics | not satisfied — cache-load failure produces no output |
| Destructive operations are opt-in or guarded | satisfied — --drop-embeddings is explicit opt-in |
| Scope matches — no unrelated churn | satisfied |
Validation assessment
- Evidence of tests: One new test in
run-analyze.test.ts— type assertion only; no behavioral coverage of the new flag logic. - Evidence of typecheck/build: PR notes that
tsc --noEmitand vitest could not be run locally (native tree-sitter build constraint). CI is the only validation gate. - Missing or unverifiable:
- No evidence that
--force+ embeddings case was integration-tested end-to-end. - No evidence that
dropEmbeddings: trueactually suppresses cache loading vs. merely being passed through. GUARDRAILS.mdnot changed;api.tsnot in the changed-files list — gap was not caught before submission.
- No evidence that
Final verdict
not production-ready
The primary behavior fix (preserve-by-default on the default analyze path) is correctly implemented in run-analyze.ts and works end-to-end for both CLI and HTTP paths. However, the --drop-embeddings escape hatch is dead code on the HTTP API path because api.ts does not extract or forward it from the request body — the PR description's claim that "the HTTP-bridge path picks it up with no further wiring" is demonstrably incorrect. GUARDRAILS.md non-negotiable #5 and the "Embeddings vanished after analyze" Sign now actively contradict the new behavior, which creates an agent correctness risk in a source-of-truth document that DoD.md requires to be kept accurate. Additionally, a silent catch {} on the cache-load re-introduces unlogged embedding loss through a new failure mode, and the single new test proves only that a type field exists, not that any of the new logic branches are correct. The two fixes needed are small and targeted; this should be blocked until they land.
…extract pure deriveEmbeddingMode + behavioral tests; sync GUARDRAILS.md Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/7d88e595-cbd8-47b2-ba4f-fb5b9a60cda4 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
... All four findings addressed in ba36230:
|
gitnexus analyze(with or without--force) silently wiped every embedding from the index whenever--embeddingswas omitted, throwing away minutes of work on every routine re-analyze (post-commit hooks, agent integrations, CI). The information needed to make this safe (stats.embeddingsinmeta.json) was already written but never read.This PR implements proposal (1) from the issue: preserve on default, with
--forceupgraded to regenerate embeddings when the repo was already embedded, and an explicit--drop-embeddingsopt-out wired through both the CLI and the HTTP API.Behavior change
analyze(no flag) on a repo with embeddingsanalyze --forceon a repo with embeddingsanalyze --forceon a repo with no embeddingsanalyze --embeddingsanalyze --drop-embeddingsChanges
gitnexus/src/core/embedding-mode.ts(new) — pure helperderiveEmbeddingMode(options, existingEmbeddingCount)that returns{ shouldGenerateEmbeddings, preserveExistingEmbeddings, forceRegenerateEmbeddings, shouldLoadCache }. Lives in its own module (no native imports) so the branching contract can be unit-tested without LadybugDB / tree-sitter.gitnexus/src/core/run-analyze.ts— addeddropEmbeddingstoAnalyzeOptionsand delegates flag derivation toderiveEmbeddingMode. The resolvedshouldGenerateEmbeddingsflag gates the Phase-4 generation pass — so a forced re-index of an embedded repo loads the cache, restores it, and regenerates for new/changed nodes instead of quietly downgrading to "preserve only". Cache-load fires whenever the resolved load flag is set, regardless of--force(previously gated by!options.force, which is why the issue's--forcerepro also wiped). Logs an explicit line for each of the preserve / force-regenerate / explicit-drop branches, and the cache-loadcatchnow emitsWarning: could not load cached embeddings (<reason>). Embeddings will not be preserved on this run.so corrupt-DB / schema-mismatch failures are no longer indistinguishable from the original silent-data-loss bug.gitnexus/src/cli/analyze.ts+cli/index.ts— new--drop-embeddingsflag, plumbed throughAnalyzeOptions. Worker IPC (analyze-worker.ts) forwardsAnalyzeOptionsas a plain object.gitnexus/src/server/api.ts—POST /api/analyzenow destructuresdropEmbeddingsfrom the request body and forwards it to the analyze worker asoptions.dropEmbeddings. Without this, the new escape hatch was dead code on the HTTP/web-UI path.AGENTS.md"Keeping the Index Fresh" block (the old text wrongly said plainanalyze"deletes existing vectors") and the threegitnexus-cliSKILL tables.GUARDRAILS.md— non-negotiable Embeddings pipeline #5 rewritten to "plainanalyzenow preserves;--drop-embeddingsis the explicit wipe", and the "Embeddings vanished after analyze" Sign updated to point at--drop-embeddingsand the new cache-load warning as the only paths to zero.gitnexus/test/unit/run-analyze.test.tsnow contains 9 behavioral cases overderiveEmbeddingModecovering every (force×embeddings×dropEmbeddings×existingCount) combination, including: default-preserve,--forceregenerate,--forceno-op when no embeddings exist,--drop-embeddingssuppressing cache load (also when combined with--forceor--embeddings).Notes for reviewers
!options.forceguard from the cache-load deliberately.--forceretains its original meaning of "re-run the pipeline"; on top of that, when the repo was already embedded it now also re-runs the embedding pass (per maintainer feedback) instead of either wiping or silently preserving. Repos that never had embeddings see no change in--forcebehavior.analyzeafter a model change leaves the repo with zero embeddings until--embeddings(or--force) is passed, which matches the prior behavior in that edge case.npx tsc --noEmitshows no new errors on the changed files (pre-existinggitnexus-sharedworkspace-resolution errors are unrelated sandbox noise). vitest itself can't boot in the sandbox because the global setup requires the LadybugDB N-API binding and the sandboxnpm installcan't complete the nativepostinstall; the new purederiveEmbeddingModecases were verified locally withtsx(9 passed, 0 failed). Relying on CI for the full suite.