feat: add optional limit arg to --embeddings flag (closes #382)#1375
Conversation
Replaces the hard-coded 50,000-node embedding cap with an optional argument on `--embeddings`: --embeddings keeps the default 50K-node safety cap --embeddings 0 disables the cap entirely --embeddings <n> sets a custom cap When the cap is hit, the skip message now points users at the override syntax. When the cap is disabled on a >50K-node repo, an OOM advisory is logged. Reimplements stale PR abhigyanpatwari#389; merges the previously separate `--ignore-embeddings-node-limit` flag into the existing `--embeddings` flag per maintainer review feedback. Closes abhigyanpatwari#382
|
Someone is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
|
Claude finished @magyargergo's task in 4m 25s —— View job PR #1375 Review —
|
| Item | Value |
|---|---|
| Head SHA | f43d5d8d3aa91f18ebd343900b9db47e425de618 |
| Status | Open, non-draft |
| Commits | 1 |
| Changed files | 3 (cli/index.ts, cli/analyze.ts, core/run-analyze.ts) |
| CI — main CI | In progress (started 2026-05-06T11:13:25Z) |
| CI — CodeQL | ✅ Passed |
| CI — Dependency Review | ✅ Passed |
| CI — Gitleaks | ✅ Passed |
| CI — PR Description Check | ✅ Passed |
| CI — Docker Build | In progress |
| Issue #382 | Linked, closes line in PR body ✅ |
| Stale PR #389 | Superseded by design (folded into one flag) ✅ |
| Hidden Unicode (bidi/invisible controls) | None found — only em dashes, box-drawing chars, and arrows in comments/progress bar (all visible, safe) |
| Vercel | Irrelevant — CLI/core change only |
| Latest code matches PR body | Yes |
Understanding of the change
The 50k-node cap in runFullAnalysis exists because generating embeddings for very large graphs can exhaust heap memory. Users on large monorepos had no in-CLI way to opt out of the skip. This PR folds the override into --embeddings [limit] rather than adding a second flag. Commander's optional-value syntax means the flag is undefined when omitted, true when bare, and a string when given a value. The CLI converts the string to a typed number | undefined and passes embeddingsNodeLimit to runFullAnalysis; the pipeline uses options.embeddingsNodeLimit ?? DEFAULT_EMBEDDING_NODE_LIMIT. 0 is the conventional "no cap" sentinel.
Findings
[Medium] Missing tests for new parsing and cap logic
- Category: Test coverage (DoD §2.7)
- Files:
gitnexus/src/cli/analyze.ts(lines 343–354),gitnexus/src/core/run-analyze.ts(lines 342–363) - Issue: The PR ships no automated tests for:
--embeddings abc(fail fast),--embeddings -1(fail fast),--embeddings 1.5(fail fast),--embeddings 0(cap disabled),--embeddings 100000(custom cap), the OOM advisory log, the updated skip message, or therunFullAnalysiscap-decision logic. The existinganalyze-worker-timeout.test.tsprovides an exact pattern: mockrunFullAnalysis, callanalyzeCommand, assertprocess.exitCode === 1andrunFullAnalysisMocknot called. TherunFullAnalysiscap logic can be unit-tested by injecting a mockstats.nodesvalue and asserting on whetherrunEmbeddingPipelinewas called and what was logged. - Why it matters: DoD §2.7 requires tests to "fail if behavior, wiring, or contracts were broken." Today, the cap logic could regress entirely — reverting to a hard-coded guard — and no test would catch it. The PR body's test plan is manual-only.
- Recommended fix: Add a
gitnexus/test/unit/analyze-embeddings-limit.test.tsusing theanalyze-worker-timeout.test.tsmock-and-call pattern to cover CLI validation cases, and extendrun-analyze.test.tswith cap-decision tests (or extractnodeLimitlogic intoembedding-mode.tswhere it can be unit-tested cheaply). - Blocks merge: No — the feature is correct — but this gap must be closed in a follow-up before the PR is considered fully Done against the project's own DoD bar. Fix this →
[Minor] Validation fires after progress bar start — terminal left dirty on error
- Category: Observability/operability (DoD §2.8), UX correctness
- Files:
gitnexus/src/cli/analyze.tslines 286 vs 343–356 - Issue: The progress bar starts at line 286 (
bar.start(...)) andconsole.erroris redirected tobarLogat line 313. The--embeddingsvalidation block at lines 343–354 usesconsole.error(...)thenprocess.exit(1). Sincebar.stop()is never called, the terminal exits with the cursor hidden (hideCursor: true) and the progress bar not cleaned up. Every other validation in this function that setsprocess.exitCode = 1; return(lines 167–210) runs before the bar starts. The embeddings validation is the only one placed after bar start and usingprocess.exit(1)directly. - Why it matters: On most terminals this means the cursor vanishes after
--embeddings abc. Minor UX degradation, not a data-safety issue, but inconsistent with the existing pattern. - Recommended fix: Move the embeddings validation block to before
bar.start()(e.g., immediately after theworkerTimeoutblock at line 175), usingprocess.exitCode = 1; returnto match the existing pattern. Fix this → - Blocks merge: No.
[Minor/Informational] Commander optional arg consumes positional repo path
- Category: CLI usability, Commander edge case
- Files:
gitnexus/src/cli/index.tsline 27,gitnexus/src/cli/analyze.tslines 343–354 - Issue: With
--embeddings [limit], Commander consumes the next non-flag token as the optional value. Sogitnexus analyze --embeddings .treats.as the limit (not the repo path), fails validation withgot ".", and leavesinputPathunset. The more naturalgitnexus analyze . --embeddingsandgitnexus analyze --embeddings 0 .both work correctly. This is inherent to Commander's optional-value behavior and documented in Commander.js docs, but can catch users who put the path before the flag. - Why it matters: Not a safety issue — the validation does catch it — but the error message "got
." is confusing.--embeddings --drop-embeddingsis safe (Commander recognizes--tokens as flags, not values).--embeddings -1is also safe (Commander treats-1as a flag token, soembeddings = trueand-1causes a Commander unknown-option error before reaching the validator — the negative-value validation in analyze.ts is thus unreachable in practice). - Recommended fix: Add a note in the help text:
(place before the path argument, e.g. --embeddings 0 <path>), or add a test documenting Commander's behavior. - Blocks merge: No.
CLI option assessment
index.ts:27 — option declared as --embeddings [limit] ✅. Help text at lines 28–30 explains the 50k cap, 0 to disable, and custom <n> ✅. Commander behavior is correctly handled by the analyze.ts code: undefined (omitted), true (bare flag), string (with value). The --embeddings --drop-embeddings combination safely gives true for both. The option does not have an adjacent .option that conflicts. No positional-path greedy issue under normal usage patterns (gitnexus analyze . --embeddings 0), but see finding above for --embeddings . form. Help text is clear and practical.
Analyze parsing assessment
analyze.ts:96–147 — AnalyzeOptions.embeddings?: boolean | string is declared exactly once ✅. No duplicate. The comment at lines 98–104 explains Commander's three-valued behavior clearly.
Validation (lines 343–354):
Number(options.embeddings)with!Number.isInteger(parsed) || parsed < 0correctly rejects:"abc"(NaN, not integer),"1.5"(not integer),"-1"(< 0),"NaN"(NaN, not integer),"Infinity"(not integer) ✅"0"→ 0, integer, ≥ 0 → accepted ✅"001"→ 1, accepted (reasonable) ✅""— Commander will not produce""for an optional-value flag ✅
embeddingsEnabled = !!options?.embeddings correctly handles all cases: undefined→false, true→true, "0"→true, "100000"→true ✅.
Call at lines 358–384: no stale embeddings: options?.embeddings line. Only embeddings: embeddingsEnabled and embeddingsNodeLimit passed — clean ✅.
The one issue is placement after bar.start() (see finding above).
runFullAnalysis cap assessment
run-analyze.ts:118 — DEFAULT_EMBEDDING_NODE_LIMIT = 50_000 ✅. Named clearly.
Cap logic (lines 342–363):
const nodeLimit = options.embeddingsNodeLimit ?? DEFAULT_EMBEDDING_NODE_LIMIT;
const capDisabled = nodeLimit === 0;
if (capDisabled || stats.nodes <= nodeLimit) { embeddingSkipped = false; ... }No old hardcoded if (stats.nodes <= EMBEDDING_NODE_LIMIT) outer guard exists ✅. The feature is not broken by a surviving legacy guard. 0 → capDisabled = true → always generates ✅. Custom cap → stats.nodes <= nodeLimit check ✅. Default (undefined) → 50k cap ✅.
OOM advisory fires only when capDisabled && stats.nodes > DEFAULT_EMBEDDING_NODE_LIMIT — small repos are not noisy ✅. Skip message uses nodeLimit.toLocaleString() (active cap, not hardcoded 50k) ✅. Skip message includes --embeddings 0 and --embeddings <n> guidance ✅.
embeddingSkipped starts true and is set false only when generating — preserving behavior for the preserve-existing and no-op paths ✅.
OOM/resource-safety assessment
Default path is safe: embeddingsNodeLimit is undefined → 50k cap applied. Disabling requires explicit --embeddings 0 at the CLI, which passes embeddingsNodeLimit: 0 to runFullAnalysis. No environment variable or config path can accidentally set embeddingsNodeLimit: 0 — it's only ever set from the parsed CLI arg. Programmatic callers via analyze-worker.ts pass msg.options (type AnalyzeOptions from run-analyze.ts) where embeddingsNodeLimit defaults to undefined unless the server explicitly sets it; the server currently sends no such field. No other guardrail (embedding batching, ONNX thread limits, heap size) is touched. Warning text is proportionate — one log line, not alarming.
The one gap: runFullAnalysis itself does not re-validate embeddingsNodeLimit (e.g., reject negative values from programmatic callers). The CLI validates, but the API is public. A negative value would behave like a positive cap (stats.nodes <= negative_number → always skip), so the failure mode is safe by accident but not by design. Worth documenting or adding a cheap guard in runFullAnalysis.
Test assessment
Existing coverage (unaffected by this PR):
deriveEmbeddingMode— 8 tests cover the boolean-level embedding mode derivation ✅runFullAnalysisexports and fast-path — 2 tests ✅embedding-config.test.ts— env var overrides and invalid values ✅analyze-worker-timeout.test.ts— validation pattern foranalyzeCommand✅
New coverage (required by this PR, absent):
- CLI parsing:
--embeddings abc,--embeddings -1,--embeddings 1.5,--embeddings 0,--embeddings 100000, bare--embeddings runFullAnalysiscap decisions: skip when nodes > default, generate when nodes <= custom, generate when cap disabled, advisory log only when cap disabled and nodes > default- Skip message content (includes
--embeddings 0guidance)
The analyze-worker-timeout.test.ts template maps directly to what's needed. This is the most actionable gap in the PR.
Hidden Unicode / hygiene assessment
Bidirectional Unicode check (U+202A–202E, U+2066–2069): clean — no bidi controls found in any of the three changed files.
Non-ASCII characters present: em dashes (U+2014) in comments and log strings, box-drawing characters (U+2500, U+2550) in section header comments, block characters (U+2588, U+2591) in the progress bar format string, and right arrows (U+2192) in comments. All are visible characters, pre-existing in the codebase, and none appear in executable code paths that could affect parsing or behavior. No invisible or control-character injection risk.
No stale duplicate fields, no commented-out code, no unrelated file churn. Diff is clean and scoped to the three intended files.
Elegance / maintainability assessment
The design is minimal and correct. One flag, three states (default/custom/disabled), validated at the CLI boundary. The runFullAnalysis API is extended with one optional field (embeddingsNodeLimit?: number) with clear JSDoc semantics. DEFAULT_EMBEDDING_NODE_LIMIT is named clearly. The comment at analyze.ts:340–342 explains Commander's three-valued behavior without clutter. No second flag introduced, no over-abstraction. The two AnalyzeOptions interfaces (CLI vs. core) having the same name across modules could cause confusion for future contributors navigating imports, but both are clearly scoped and the different embeddings types (CLI: boolean | string, core: boolean) reflect their distinct responsibilities.
Final verdict
Production-ready with minor follow-ups.
The core logic is correct end-to-end: --embeddings 0 does bypass the 50k cap (the old guard is gone), bare --embeddings preserves old behavior exactly, invalid inputs are rejected before the pipeline starts, the OOM guardrail holds by default, and no other option is regressed. The TypeScript types are consistent. No hidden Unicode affects executable code.
The two follow-ups to address are: (1) add automated tests for the new parsing validation and cap-decision logic — the existing analyze-worker-timeout.test.ts provides an exact template and this is the clearest gap against the project's own DoD; (2) move the validation block before bar.start() or clean up the bar on exit to avoid leaving the terminal with a hidden cursor when --embeddings abc is passed. Neither blocks the feature from working correctly, but both should be resolved before the PR is considered fully closed against the project's DoD bar.
| Branch
|
@JorrinKievit Can you please look at these findings above? 🙏 |
Addresses PR abhigyanpatwari#1375 review feedback: - Move --embeddings validation before bar.start() so an invalid value can't leave the terminal with the progress bar's hidden cursor still active. Switches process.exit(1) to the sibling pattern (process.exitCode = 1; return). - Extract the node-count cap decision into deriveEmbeddingCap() in embedding-mode.ts so the branching can be unit-tested without spinning up LadybugDB or the pipeline. - Add analyze-embeddings-limit.test.ts: rejects abc/-1/1.5/NaN/Infinity before the pipeline starts; bare flag forwards undefined limit; --embeddings 0 forwards 0; --embeddings <n> forwards n; omitted flag keeps embeddings off. - Extend run-analyze.test.ts with deriveEmbeddingCap cases covering the default cap, boundary equality, custom cap (above + below default), and the cap-disabled sentinel.
done! |
|
Claude finished @magyargergo's task in 4m 15s —— View job PR #1375 Review —
|
| Item | Value |
|---|---|
| Head SHA | 730f9f9 (2 commits on branch since main) |
| Status | Open, non-draft |
| Commits | 2 (feat: add optional limit arg + test+chore: cover --embeddings [limit]) |
| Changed files | 6 — cli/index.ts, cli/analyze.ts, core/embedding-mode.ts, core/run-analyze.ts, test/unit/analyze-embeddings-limit.test.ts, test/unit/run-analyze.test.ts |
| CI — Dependency Review | ✅ Passed |
| CI — Gitleaks | ✅ Passed |
| CI — CodeQL | ✅ Passed |
| CI — Main CI | ⏳ In progress |
| CI — Docker Build | ⏳ In progress |
| Issue #382 | Linked via "Closes #382" ✅ |
| Stale PR #389 | Intentionally superseded — one flag, three states ✅ |
| Hidden Unicode (bidi/invisible) | None found in any changed file — em dashes, box-drawing, bullets in comments only (all visible, pre-existing) ✅ |
| Vercel | Irrelevant — CLI/core only ✅ |
| Latest code vs PR body | Exceeds PR body — embedding-mode.ts extraction and full test coverage added in commit 2, not mentioned in original PR description |
Note for reviewer: The PR now has 6 changed files, not 3. The second commit (
test+chore) addressed both gaps flagged in the prior review: (1) missing automated tests and (2) validation firing afterbar.start(). Both are resolved.
Understanding of the change
The 50k-node cap exists because generating embeddings for very large graphs can exhaust heap memory — the ONNX runtime loads full model weights and processes every node's text. Users on large monorepos had no in-CLI way to opt out of the automatic skip. This PR folds the override into the existing --embeddings [limit] flag rather than adding a second flag.
Commander's optional-value syntax produces three states: undefined (omitted), true (bare flag), string (with value). The CLI's AnalyzeOptions.embeddings?: boolean | string captures this; embeddingsEnabled = !!options?.embeddings converts to boolean for the core; embeddingsNodeLimit carries the parsed numeric cap as a clean typed value. The validation block now runs before bar.start(), matching all sibling validations.
Critically, the cap logic has been extracted from inline code in run-analyze.ts into a pure deriveEmbeddingCap(nodeCount, embeddingsNodeLimit) function in embedding-mode.ts — the same file as deriveEmbeddingMode. This makes both branch contracts unit-testable without spinning up LadybugDB, and the new tests take full advantage of this.
Findings
[Informational] Commander tokenizer makes --embeddings -1 unreachable from real CLI
- Category: CLI usability, Commander edge case
- Files:
gitnexus/test/unit/analyze-embeddings-limit.test.ts:44 - Issue: The test at line 44 lists
'-1'(string) in the invalid-values array and passes it asembeddings: '-1'directly toanalyzeCommand(). This correctly tests the programmatic path. Via the actual CLI however,--embeddings -1causes Commander to tokenize-1as a flag-like token, not as the optional value — sooptions.embeddingsbecomestrue(bare flag) and Commander emits an "unknown option" error. Theparsed < 0guard atanalyze.ts:184is therefore unreachable from the real CLI for the specific case of-1. For--embeddings=-1(equals form), Commander does parse it as the string"-1"and the guard fires correctly. - Why it matters: The test documents programmatic safety but the test label says "rejects invalid --embeddings value -1". A future contributor may not realize the actual CLI behavior differs from what the test implies.
- Recommended fix: Add a comment in the test explaining the CLI vs programmatic distinction for
-1, or document it in the help text: "Note:--embeddings -1is intercepted by Commander as an unknown option; use--embeddings=-1to test the validator." Not blocking — behavior is safe either way. - Blocks merge: No.
[Informational] OOM advisory and skip message text are not directly tested
- Category: Test coverage (DoD §2.7)
- Files:
gitnexus/src/core/run-analyze.ts:351–364,gitnexus/test/unit/run-analyze.test.ts - Issue:
deriveEmbeddingCapis fully tested (6 cases covering all branches). But the log strings assembled inrun-analyze.tslines 352–364 (OOM advisory text and skip message with--embeddings 0/--embeddings <n>guidance) are not asserted. If someone changes the message text or removes the advisory branch, no test fails. - Why it matters: Messages are user-visible and part of the UX promise made in the PR description. Minor — behavior correctness is covered; only text content is untested.
- Recommended fix: Optionally add a spy on
callbacks.onLogin a future test ofrunFullAnalysisto assert message content for the skip and advisory paths. Not required for merge. - Blocks merge: No.
[Informational] runFullAnalysis does not re-validate embeddingsNodeLimit for programmatic callers
- Category: API robustness, OOM guardrail (DoD §2.1)
- Files:
gitnexus/src/core/run-analyze.ts:69,gitnexus/src/core/embedding-mode.ts:55–63 - Issue: The CLI validates
embeddingsNodeLimit(non-negative integer, no decimals). A programmatic caller that bypasses the CLI and passesembeddingsNodeLimit: -5would havederiveEmbeddingCap(nodeCount, -5)→nodeCount > -5→ always true → always skip. The failure mode is safe by accident (always skips for negative), but the API contract is silent on this. - Why it matters: Low practical risk — no current caller passes negative values and the CLI guards against it. But worth a JSDoc note on
embeddingsNodeLimitin the coreAnalyzeOptionsinterface. - Recommended fix: Add to the JSDoc at
run-analyze.ts:69: "Negative values are not validated here — use the CLI for user-facing input validation. A negative value will behave as an always-skip cap." One line, no code change needed. - Blocks merge: No.
CLI parsing assessment
index.ts:27 — --embeddings [limit] ✅. Help text (lines 28–30) is clear, concise, and actionable: mentions off-by-default, 50k cap, 0 to disable, custom <n>. ✅
analyze.ts:177–193 — Validation block sits before bar.start() at line 305. This resolves the prior review's finding about hidden cursor on error exit. The block uses process.exitCode = 1; return to match every other early-exit validation in the function. ✅
Validation coverage:
"abc"→Number("abc")= NaN,Number.isInteger(NaN)= false → rejected ✅"1.5"→Number.isInteger(1.5)= false → rejected ✅"NaN"→ NaN path → rejected ✅"Infinity"→Number.isInteger(Infinity)= false → rejected ✅"-1"(string, programmatic) →parsed < 0→ rejected ✅"0"→ parsed=0, integer, ≥0 → accepted,embeddingsNodeLimit=0✅"100000"→ integer, ≥0 → accepted,embeddingsNodeLimit=100000✅true(bare flag) → string branch skipped,embeddingsNodeLimit=undefined✅undefined(omitted) → string branch skipped,embeddingsEnabled=false✅
No duplicate embeddings: property — only embeddings: embeddingsEnabled at line 367. ✅
--embeddings --drop-embeddings: Commander sees --drop-embeddings as a flag token, so options.embeddings = true and options.dropEmbeddings = true. Both are handled correctly downstream by deriveEmbeddingMode. ✅
Core embedding cap assessment
embedding-mode.ts:55–63 — deriveEmbeddingCap is a pure function extracting the cap-decision logic. This is a net improvement over the previous inline implementation noted in the prior review.
const nodeLimit = embeddingsNodeLimit ?? DEFAULT_EMBEDDING_NODE_LIMIT;
const capDisabled = nodeLimit === 0;
const skipForCap = !capDisabled && nodeCount > nodeLimit;Truth table:
embeddingsNodeLimit |
nodeCount |
capDisabled |
skipForCap |
|---|---|---|---|
undefined |
≤50k | false | false → generate ✅ |
undefined |
>50k | false | true → skip ✅ |
0 |
any | true | false → generate ✅ |
100000 |
≤100k | false | false → generate ✅ |
100000 |
>100k | false | true → skip ✅ |
No surviving legacy if (stats.nodes <= EMBEDDING_NODE_LIMIT) outer guard. ✅
run-analyze.ts:343–365:
embeddingSkippedstartstrue, set tofalseonly when generating ✅- OOM advisory fires only when
capDisabled && stats.nodes > DEFAULT_EMBEDDING_NODE_LIMIT— small repos are not noisy ✅ - Skip message uses
nodeLimit.toLocaleString()(active cap, not hardcoded 50k) ✅ semanticModeset in all branches ('vector-index'or'exact-scan'), falls back viaruntimeCapabilities✅
OOM guardrail assessment
The default path is safe: embeddingsNodeLimit is undefined in AnalyzeOptions and the CLI only sets it when the user passes a string value. No environment variable or config path can accidentally disable the cap — the only assignment in the codebase is embeddingsNodeLimit = parsed at analyze.ts:192, which requires an explicit string argument.
false, null, and empty string cannot disable the cap: !!options?.embeddings with false or null sets embeddingsEnabled=false, short-circuiting the entire cap decision. Empty string cannot reach the validator from Commander.
Custom caps above 50k do not trigger the OOM advisory — only capDisabled && stats.nodes > DEFAULT_EMBEDDING_NODE_LIMIT does. A user setting --embeddings 75000 on a 60k-node repo proceeds silently; the guardrail is the user's custom choice. This is correct behavior.
The error handler in analyze.ts:541–555 already guides users to increase NODE_OPTIONS=--max-old-space-size if they hit OOM during analysis. The OOM advisory for --embeddings 0 does not duplicate heap guidance — it references the cap rationale only. Appropriate.
Existing embedding behavior assessment
deriveEmbeddingMode is unchanged. It receives options.embeddings as boolean (after embeddingsEnabled = !!options?.embeddings), which is identical to the previous behavior. All 8 existing tests pass. ✅
--drop-embeddings alone: embeddingsEnabled=false → shouldGenerateEmbeddings=false, shouldLoadCache=false — unchanged. ✅
--embeddings 0 --drop-embeddings: embeddingsEnabled=true, embeddingsNodeLimit=0. deriveEmbeddingMode({embeddings:true, dropEmbeddings:true}, n) → shouldGenerateEmbeddings=true, shouldLoadCache=false. Then deriveEmbeddingCap(nodes, 0) → generate with no cache load. Embeddings are generated fresh from scratch. Coherent and intentional. ✅ (Covered by test at run-analyze.test.ts:135.)
Already-up-to-date fast path: returns before reaching any cap logic. embeddingsNodeLimit is not read. repoName derivation is unchanged. ✅
Embedding server mapping and backend selection: unchanged — runEmbeddingPipeline signature and call site at run-analyze.ts:401–418 are unmodified. ✅
Test assessment
analyze-embeddings-limit.test.ts (new, 101 lines):
- Uses the same mock-and-call pattern as
analyze-worker-timeout.test.ts✅ - Mocks:
run-analyze,lbug-adapter,repo-manager,git,max-file-size— correct minimal set ✅ NODE_OPTIONSinjection atbeforeEach:41bypassesensureHeap()re-exec ✅vi.resetModules()+ dynamicimport('../../src/cli/analyze.js')per test ensures clean module state ✅- Invalid values:
abc,-1,1.5,NaN,Infinity→ assertsprocess.exitCode===1,runFullAnalysisnot called, error message contains expected text ✅ - Bare
--embeddings→opts.embeddingsNodeLimitisundefined✅ --embeddings 0→opts.embeddingsNodeLimit === 0✅--embeddings 100000→opts.embeddingsNodeLimit === 100000✅- Omitted
--embeddings→opts.embeddings === false✅
run-analyze.test.ts (extended, +41 lines):
- Imports
deriveEmbeddingCapandDEFAULT_EMBEDDING_NODE_LIMITfromembedding-mode.tsdirectly ✅ - 6
deriveEmbeddingCaptests: default cap, exceeds default, boundary (equals cap = no skip),limit=0disables, custom above/below, custom below default ✅ - Tests exercise the production function, not copied logic ✅
- Tests would fail before this PR (the function didn't exist) ✅
Coverage gaps (minor, non-blocking):
- OOM advisory and skip message text not asserted
--embeddings -1test tests programmatic path; CLI path differs (Commander tokenizer)
Hidden Unicode / hygiene assessment
Bidi/invisible controls (U+202A–202E, U+2066–2069): none found in any of the 6 changed files. ✅
Non-ASCII characters present: em dashes (U+2014) in comments, en dashes (U+2013) in ranges, box-drawing chars (U+2500/U+2550) in section headers, block chars (U+2588/U+2591) in progress bar format, bullet (U+2022) in error messages, right arrow (U+2192) in comments. All visible, pre-existing in the codebase, none in executable code paths. ✅
No stale or duplicate properties: only embeddings: embeddingsEnabled at analyze.ts:367 — the old embeddings: options?.embeddings that could have leaked Commander string types into core is absent. ✅
No unrelated file churn. No package-lock changes. Diff is precisely scoped to the 4 source files + 2 test files for this feature. ✅
Elegance / maintainability assessment
The extraction of deriveEmbeddingCap into embedding-mode.ts is the standout design decision. Both pure derivation functions now live in the same module, separated from all native dependencies, and can be tested in isolation. This goes beyond what was asked and materially improves the codebase's testability surface.
CLI concerns stay in the CLI layer: boolean | string parsing, process.exitCode management, progress bar lifecycle. Core receives embeddings: boolean + embeddingsNodeLimit?: number — no Commander semantics leak. ✅
DEFAULT_EMBEDDING_NODE_LIMIT is defined once in embedding-mode.ts and re-exported from run-analyze.ts for backward compatibility with external callers. ✅
The comment at analyze.ts:98–104 explaining Commander's three-valued behavior is clear and appropriately placed. ✅
The implementation is the smallest correct solution: no speculative abstraction, no second flag, no broad embeddings refactor. Easy to extend if a future config-based cap is added (just another embeddingsNodeLimit source, same core path). ✅
Final verdict
Production-ready.
All three blocking signals from the prior review are resolved: automated tests now cover CLI parsing validation and cap-decision logic; the validation block was moved before bar.start() so terminal state is clean on error exit; the cap logic is centralized in a pure deriveEmbeddingCap function with no surviving legacy guard. The core API boundary is clean — runFullAnalysis receives typed boolean + number | undefined, never a Commander string. No hidden Unicode affects executable code. The deriveEmbeddingCap extraction is a concrete improvement to the codebase's testability beyond the stated scope of the PR.
The three remaining informational items (Commander tokenizer behavior for --embeddings -1, untested advisory message text, and the absence of embeddingsNodeLimit validation for programmatic callers) are documented above but none constitute a correctness risk or production hazard. The OOM guardrail holds by default, explicit opt-out works correctly, and invalid CLI inputs are rejected before the pipeline starts.
| Branch
Summary
Reimplements stale PR #389 with the design @magyargergo suggested in review — collapse the proposed
--ignore-embeddings-node-limitflag into an optional argument on the existing--embeddingsflag so the API stays small.Closes #382.
Why
The 50K-node embedding cap is a real OOM guardrail, but it's a hard blocker on large mono-repos. Users hitting it had no in-CLI way to opt in to running anyway. The original PR added a second flag; this version folds the override into the existing flag instead — same capability, smaller API surface.
Behavior changes
--embeddings 0is used on a repo larger than the default cap, a one-line warning logs that the cap exists for a reason.Implementation
cli/index.ts—--embeddingsbecomes--embeddings [limit]cli/analyze.ts— parses + validates the optional argument, forwardsembeddingsNodeLimittorunFullAnalysiscore/run-analyze.ts— replaces the hard-codedEMBEDDING_NODE_LIMITconstant with the per-call override;0is treated as "no cap"Test Plan
npx gitnexus analyze --embeddingson a >50K-node repo → embeddings skipped, message mentions--embeddings 0npx gitnexus analyze --embeddings 0on the same repo → embeddings generated, OOM advisory loggednpx gitnexus analyze --embeddings 100000→ custom cap honorednpx gitnexus analyze --embeddings abc→ fails fast with validation errornpx gitnexus analyze --help→ shows the new[limit]syntaxderiveEmbeddingModeunit tests still pass (boolean coercion preserved)