feat(cli): analyze --all to re-index all registered repos (#253)#1010
feat(cli): analyze --all to re-index all registered repos (#253)#1010azizur100389 wants to merge 3 commits into
Conversation
…twari#253) Mirrors `clean --all` — iterates ~/.gitnexus/registry.json and spawns a child `gitnexus analyze <path>` for each entry. Rationale for child-process isolation: each repo gets fresh LadybugDB handles, progress bar, heap, and monkey-patched console — a crash in one repo cannot corrupt siblings, and the existing analyze flow (SIGINT, ensureHeap, progress-bar setup) is reused verbatim without in-process state-reset hazards. - New `--all` flag on the analyze command. Mutually exclusive with [path], --name, --allow-duplicate-name — rejection happens BEFORE any filesystem or registry access. - Flags forwarded to each child: --force, --embeddings, --skills, --skip-agents-md, --no-stats, --skip-git, -v. - process.execArgv is propagated to the child spawn so tsx/tsnode dev setups and vitest-spawned integration tests work (otherwise children die with ERR_UNKNOWN_FILE_EXTENSION on .ts sources). - Missing-path entries (repo deleted on disk) are skipped with a warning; one failure does not halt the batch. Matches clean --all per-repo error-tolerance semantics. - Summary: N succeeded, M skipped, K failed. Exit 1 if K > 0, else 0 — surfaces partial-failure to cron / CI. - 5 integration tests cover: happy path (indexedAt advances for both registered repos), empty registry, two mutual-exclusion branches ([path], --name), missing-path tolerance (one deleted repo gets skipped, sibling still re-indexes). No destructive operations added. analyze --all only reads the registry; children write entries via their own registerRepo calls (existing, tested path). Zero new fs.rm sites, zero new safety surface.
|
@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 7093 tests passed 97 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
|
Claude finished @magyargergo's task in 4m 0s —— View job Production-ready bar for this PRThis is a new
Findings1. Correctness & Functional CompletenessNo blocking correctness issues. Key invariants verified:
NON-BLOCKING: Missing integration test for
|
| Check | Result |
|---|---|
| No secrets / tokens committed | ✅ Verified |
| No blind find-and-replace renames | ✅ Verified |
| Conventional commit PR title | ✅ feat(cli): analyze --all … |
| No HIGH/CRITICAL impact unaddressed | ✅ New flag, no shared symbol edits |
| PolyForm license metadata untouched | ✅ Verified |
| Docs updated for public CLI contract change | ✅ In-CLI help text updated in index.ts; --all not in internal AGENTS.md but the block is auto-regenerated by analyze itself |
What I could not fully verify
listRegisteredRepos()error handling for a malformedregistry.json. If the JSON is corrupted, theawait listRegisteredRepos()call inanalyzeAllBranchwould throw. This would surface as an unhandled rejection in the async function, which Node.js would turn into a process exit with a non-zero code. The behavior is survivable but not explicitly tested. (Pre-existing concern, not introduced by this PR.)- Windows behavior of
process.argv[1]for globally-installed npm packages. On Windows, global npm binaries are typically invoked via a.cmdwrapper, andprocess.argv[1]may or may not point to the actual.jsfile depending on the Node/npm version. Cannot verify from diff alone. - Whether the 5-minute and 7-minute integration test timeout budgets are reliably met in CI. The CI run completed in 304s total (all tests), which suggests these are acceptable — but individual test timing is not visible in the CI report.
Final verdict
✅ Ready to merge — no blocking issues.
The implementation is correct, well-isolated, and consistent with the clean --all pattern already in the repo. All three safety invariants (LadybugDB single-writer, no new fs.rm surfaces, mutual-exclusion checks before I/O) are preserved. CI is green on all platforms. The one NON-BLOCKING gap (missing --allow-duplicate-name test) is real but the code path itself is wired and verified by inspection — acceptable to merge and add the test as a follow-up.
| * (missing-path entries). 1 when at least one entry's child exited | ||
| * non-zero — surfaces batch partial-failure to cron / CI. | ||
| */ | ||
| const analyzeAllBranch = async (options: AnalyzeOptions | undefined): Promise<void> => { |
There was a problem hiding this comment.
We need to be careful with this! Imagine having 100s of cloned repositories and somebody runs GitNexus with --all option and there's a possibility it may crash the computer because of CPU exhaustion. I'd implement a queue and depending on resource availability i'd start processing them. Also don't forget the pipeline itself also spawns subproceses. Maybe it would be faster to not to call the whole gitnexus but to call the ingestion pipeline with the necessary arguments.
There was a problem hiding this comment.
Thanks for the review — good catch, and I want to make sure I implement the right queue shape rather than guess and iterate.
Current state (for calibration)
The current implementation is strictly sequential — spawnSync blocks the for-loop until each child exits (analyze.ts:190–198), so only ONE analyze runs at any moment. Pipeline-internal parse workers are already CPU-bounded inside each child. That said, your concern about sustained load over 100s of repos is still valid — even sequential, that's hours of 100% CPU with no graceful pause.
Which queue shape do you have in mind?
A few options with tradeoffs; happy to implement whichever you prefer:
(a) Throttled sequential with resource check. Stay 1-at-a-time, but check os.loadavg() between children and pause if load > threshold. Downside: os.loadavg() returns [0, 0, 0] on Windows, so the gate is Unix-only unless we proxy through something else.
(b) --concurrency <n> with default 1. Opt-in parallelism for users who know their machine can handle it. Default 1 = current sequential behaviour. Downside: contradicts the CPU-exhaustion framing — parallel makes CPU pressure worse, not better — so I don't think this is what you meant, flagging for disambiguation.
(c) Confirmation gate + estimated duration. Print "About to re-index N repos sequentially. Estimated time: ~{N × avg}s. Pass --yes to confirm." and require -y, --yes for non-interactive runs. Same pattern as clean --all --force. Makes the resource cost visible before the user commits to it.
(d) Something else — happy to build it if you have a specific shape in mind.
The pipeline-direct suggestion
On "call the ingestion pipeline with the necessary arguments" — I deliberately went child-process-per-repo to keep LadybugDB handles, progress bar, SIGINT, and the monkey-patched console cleanly isolated per repo. Calling runFullAnalysis in-process would save ~1–2s process-startup per repo but means (i) a crash in one repo aborts the batch, (ii) state-reset hazards between iterations we'd need to prove idempotent. Happy to revisit if the queue design ends up needing in-process for some other reason — but I'd rather not bundle it in unless there's a concrete motivation.
Will hold on code changes until you point at the shape you want. 🙏
There was a problem hiding this comment.
This is very primitive but could be a good starting point
import si from 'systeminformation';
const CPU_THRESHOLD = 80;
const MEM_THRESHOLD = 85;
setInterval(async () => {
const [load, mem] = await Promise.all([
si.currentLoad(),
si.mem(),
]);
const cpu = load.currentLoad; // %
const memUsed = (mem.used / mem.total) * 100; // %
const shouldThrottle = cpu > CPU_THRESHOLD || memUsed > MEM_THRESHOLD;
console.log({
cpu: cpu.toFixed(1) + '%',
mem: memUsed.toFixed(1) + '%',
shouldThrottle,
});
}, 1000);More information about the npm package: https://www.npmjs.com/package/systeminformation
There was a problem hiding this comment.
Thanks for the concrete pointer — implemented in `1fb6bc36` using `systeminformation` exactly as you sketched.
What shipped
Defaults match your sketch 1:1: 80% CPU / 85% memory thresholds, 1-second poll interval, simple `cpu > CPU_THRESHOLD || memUsed > MEM_THRESHOLD` predicate.
Shape: between each `--all` child spawn, the batch polls CPU + memory. If either metric is above threshold, it prints `⏸ Throttling — CPU X.X% / mem Y.Y% (thresholds 80% / 85%) — waiting before [i/N]...` and re-polls every second. When both drop below threshold, it prints `▶ Resuming — CPU X.X% / mem Y.Y%` and spawns the next child.
No max-wait timeout by design. A host pinned above threshold is exactly the case you asked us to protect, so bailing after some arbitrary deadline would silently defeat the safety. Ctrl-C is the user's graceful exit.
Escape hatches:
- `--no-throttle` CLI flag for dedicated CI / build agents that accept the full resource cost
- `GITNEXUS_THROTTLE_CPU` / `GITNEXUS_THROTTLE_MEM` env vars for advanced tuning without code changes
On broken metrics provider (e.g. `systeminformation` platform issue): the batch falls through WITHOUT throttling rather than hanging indefinitely. Unit-tested.
Pipeline-direct suggestion
Kept child-process-per-repo. The isolation story (each repo gets fresh LadybugDB handles, progress bar, monkey-patched console, SIGINT handler) is what lets one repo crash without corrupting the batch. Happy to revisit if you want to swap in-process for a specific reason later — but since your main concern (resource pressure) is now addressed at the orchestration layer, I'd rather not bundle that bigger change in here.
Tests
- 15 new unit tests on `resource-throttle.ts` covering the predicate (both-under / CPU-only / mem-only / both-over / boundary), polling trajectory (idle-on-first-poll, multi-poll-then-clear), provider-throw tolerance, and env-var parsing (valid / NaN / 0 / negative / >100 / empty / fractional).
- 1 new integration test locks in `--no-throttle` bypass: sets thresholds to 1% (which would hang indefinitely without bypass) and asserts the batch completes AND the Throttling messaging is absent from output.
Local verification on the final commit:
- `tsc --noEmit` clean
- `resource-throttle.test.ts` 15/15
- Full unit suite 4216 pass (2 pre-existing git-utils env failures unchanged, unrelated — exist on pristine upstream main)
- `cli-e2e.test.ts -t "analyze --all|remove|clean --all|analyze --name|no-throttle"` 11/11 in 178s
CI running.
|
@azizur100389 Please take into consideration the commend I left about CPU exhaustion. 🙏 |
…1010 review) Address @magyargergo's CPU-exhaustion concern on PR abhigyanpatwari#1010. Per his review sketch, pause between --all iterations when CPU > 80% or memory > 85% so the batch paces itself to the host's real available headroom rather than running flat-out across 100s of repos. - New src/cli/resource-throttle.ts with: - Pure shouldThrottle(cpu, mem, thresholds) predicate - waitForResourceAvailability polling loop (no max-wait by design: the reviewer wants the protection to actually work, and Ctrl-C is the user's exit) - readThresholdsFromEnv honouring GITNEXUS_THROTTLE_CPU and GITNEXUS_THROTTLE_MEM for advanced tuning - Lazy import of systeminformation so CLI startup cost is unchanged when --all isn't used - Metrics-provider contract as a test seam — a broken provider (throw / platform issue) falls through WITHOUT hanging the batch (regression test locks this in) - analyze --all now calls waitForResourceAvailability before each child spawn; prints Throttling and Resuming messages with the current percentages at the right moments - --no-throttle CLI flag disables the check entirely for dedicated CI / build agents that accept the full resource cost - Commander 14 stores --no-throttle as throttle: false; AnalyzeOptions field name follows that convention - Uses systeminformation (already suggested by @magyargergo) — added as a runtime dependency - 15 new unit tests cover the predicate, polling trajectory, provider- throw tolerance, and env parsing (including fallback on NaN / zero / negative / over-100 / empty) - 1 new integration test proves --no-throttle actually bypasses the loop by setting the threshold to 1% (which would hang indefinitely without --no-throttle) Defaults (80% / 85%) match the reviewer's sketch exactly. No destructive operations added; throttling is additional caution, not reduced caution.
|
Claude finished @magyargergo's task in 2m 53s —— View job Production-ready bar for this PRThis PR adds
Findings1. Correctness & Functional CompletenessNo blocking correctness issues.
NON-BLOCKING: No
|
| Check | Result |
|---|---|
| No secrets / tokens committed | ✅ Verified |
| No blind find-and-replace renames | ✅ Verified |
| No HIGH/CRITICAL impact left unaddressed | ✅ New CLI flag, no shared symbol edits |
| PR titles follow conventional-commit format | ✅ feat(cli): ... and fix(cli): ... |
| PolyForm license metadata untouched | ✅ Verified (package.json:license unchanged) |
New dependency auditable + in dependencies (not devDependencies) |
✅ systeminformation@^5.31.5 — runtime dep, lazy-loaded, CI green on 3 platforms |
| Docs updated for public CLI contract change | ✅ --no-throttle, GITNEXUS_THROTTLE_CPU, GITNEXUS_THROTTLE_MEM documented in help text (index.ts:47–59) |
What I could not fully verify
- Windows
process.argv[1]format for a globally-installed npm binary. On Windows, the global shim may resolve to a.cmdwrapper path, not the.jsentry. Cannot verify from diff. systeminformationmem.usedsemantics across platforms — whether the 85% threshold triggers at consistent "effective" memory utilization on macOS vs Linux vs Windows.- Whether the spin-wait in the happy-path integration test (
Date.now() < waitUntilloop) reliably creates a distinct ISO-second timestamp boundary in all CI environments (very fast machines). In practice, 1.1s is generous and CI completed in 304s total, suggesting this is reliable. listRegisteredRepos()error handling for a malformedregistry.json— would surface as an unhandled rejection. Pre-existing concern, not introduced here.
Final verdict
✅ Ready to merge — no blocking issues.
The throttle implementation directly addresses @magyargergo's CPU-exhaustion concern. It is well-designed: a pure, mockable predicate; a lazy-loaded systeminformation provider; safe provider-throw fallthrough; and a clean --no-throttle escape for CI agents. Unit coverage (15 tests) and integration coverage (6 tests) are thorough. CI is green on all three platforms.
The two NON-BLOCKING gaps (missing --allow-duplicate-name test; indefinite hang on perpetually-throttled machine) are both documented, pre-accepted tradeoffs or minor follow-up items. Neither blocks merge.
I'm not sure about this! I don't think we should let LLMs to be able to bypass this. Generally speaking, it is good to keep in mind that reseource exhaustion comes with a great cost.
I'd put a warning on overusing resrouces
I don't think this is a good argument, mainly because we will run subprocesses concurrently and the progressbar and other details from the subprocesses may cause a mess in the terminal. We need to write a manager for this that will handle this cleanly. Please try to spec out a @azizur100389 This feature needs more work but with this we will head to the right direction. |
…olds (abhigyanpatwari#1010 review) Address 2 of 4 points from @magyargergo's round-2 review on PR abhigyanpatwari#1010. Done in this commit: - Remove the --no-throttle CLI flag entirely. Per his direction: "I don't think we should let LLMs to be able to bypass this. Resource exhaustion comes with a great cost." The safeguard is now non-bypassable by any user-surface flag; analyze --all always pauses when CPU > 80% or memory > 85%. - Add warnIfThresholdsRisky (fires at analyze --all start when either GITNEXUS_THROTTLE_CPU or GITNEXUS_THROTTLE_MEM is set to >= 90%). Matches his "I'd put a warning on overusing resources". Operators can still tune the env vars, but if they push the threshold so high the safeguard barely engages, they see a clear one-time warning naming the specific metric at risk (CPU-only vs memory-only vs both). - Remove the --no-throttle integration test; replace with an integration test that asserts the overuse warning fires with correct specificity (CPU=95% shown, memory= not shown because default), and the batch still completes. - 10 new unit tests on the warning helper cover defaults (no warn), CPU-only / memory-only / both-risky trigger paths, the 89.9% safe boundary, and the constant's documented 90% value. Deferred to a follow-up PR (the other 2 review points): - Concurrent execution across repos - ProcessManager that runs the ingestion pipeline in-process (instead of spawning the whole gitnexus CLI) with per-repo LadybugDB connections Rationale for deferring: this is a significant architecture shift — concurrent LadybugDB connections (the single-writer invariant @magyargergo himself flagged earlier needs to be revisited), progress-bar multiplexing for concurrent children, and state-reset hazards for in-process pipeline calls. Too large to bundle onto an already-wide PR. Will open a separate RFC-style PR to spec the ProcessManager design before implementation. Local verification: - tsc --noEmit clean - resource-throttle.test.ts 25/25 (10 new warning-helper tests) - Full unit suite 4226 pass + 2 pre-existing git-utils env failures - cli-e2e.test.ts -t "analyze --all|remove|clean --all|analyze --name|throttle" 11/11 in 179s
|
@magyargergo — thanks for the round-2 direction. Pushed `6892c0d5`. 2 of 4 addressed in this commit1. Removed `--no-throttle` ✅ 2. Warning on overusing resources ✅ Tests: 10 new unit tests on the warning helper (defaults / CPU-only / memory-only / both-risky / 89.9% boundary / constant check) and a new integration test that exercises the full CLI path — sets `GITNEXUS_THROTTLE_CPU=95`, asserts the warning fires with correct specificity (CPU-only, no false memory mention), and that the batch still completes. 2 of 4 deferred to a follow-up PR — significant architecture shift3. Concurrent execution + 4. ProcessManager running the pipeline in-process with per-repo LadybugDB connections I've deferred these to a separate follow-up PR rather than bundling them on #1010. Reason: this is a significant architecture shift with real design questions, not a mechanical change:
ProposalShip #1010 with the trim above (the non-bypassable safeguard + env warning) as the "headed in the right direction" first step you called out, and I'll open a separate RFC-style PR to spec the `ProcessManager` design — you can weigh in on the shape before I implement. That keeps the architecture work visible and reviewable, rather than hidden inside a feature PR. If you'd rather block #1010 on the full redesign, say so and I'll either (a) close #1010 and open a fresh PR starting from the ProcessManager design, or (b) spike the ProcessManager here — your call. Local verification on 6892c0d
|
|
I'm sorry but we cannot ship your changes without it. We must maintain high-quality and good user experinece. This would move it to the wrong direction. |
|
You're right — I'll build the Implementation approach. Each pipeline will run in a Defaults I'll use unless you want different:
OK to proceed with this shape? |
|
Thanks for the direction on this one, @magyargergo. Closing this PR — the ProcessManager / worker_threads / in-process-pipeline architecture you've outlined is substantial enough to warrant its own focused PR rather than being bolted on top of this one. Mixing a significant architecture review with ~500 lines of partial implementation muddies both. Two things from this branch worth preserving for the redesign:
Happy to open a doc-only RFC PR for the ProcessManager design separately if that's useful — the architecture review then happens on a diff that's specifically about shape, not tangled with code. Your point about non-bypassable resource safeguards landed, and the 80/85 + env-var warning design is what ships. That's the piece of this PR that doesn't go to waste. |
|
Apologies for any inconvenience this may have caused. I agree that this may need some additional thought. It would be best to open a new issue so we can discuss the design and agree on the right direction before moving forward with the PR. We appreciate the effort you’ve put into this work, and there is clear value in it as it stands. |
Problem (#253)
Users with many indexed repos have no batch re-index command.
clean --allexists;analyze --allis the natural counterpart. After pulling updates across multiple projects, or running a scheduled re-index cron, users currently have to either script a loop over paths externally or iterate overgitnexus listoutput manually.What this PR adds
New flag:
gitnexus analyze --all [--force] [--embeddings] [--skills] [--skip-agents-md] [--no-stats] [-v]Iterates
~/.gitnexus/registry.jsonand spawns a childgitnexus analyze <path>for each entry.Mutually exclusive with
[path],--name <alias>,--allow-duplicate-name— those are inherently single-repo. The exclusion check runs BEFORE any registry read or filesystem access; users see a clear error with exit 1.Why child-process isolation (not in-process loop)
Each repo gets:
ensureHeap()re-exec pathA crash or unhandled rejection in one repo's analyze cannot corrupt sibling state. The existing analyze flow (SIGINT cleanup, heap re-spawn, progress bar,
runFullAnalysis) is reused verbatim — no in-process state-reset hazards, no need to prove idempotency of a 1000-line orchestrator across batch iterations.Overhead: ~1-2s per repo for process startup. Acceptable —
--allis cron/maintenance, not interactive.Missing-path tolerance
If a registry entry's path no longer exists on disk (user
rm -rf'd a project), the entry is skipped with a warning before spawning. Matchesclean --all's per-repo error-tolerance: one stale entry doesn't halt the batch.execArgv inheritance
Child spawn passes
process.execArgvthrough — critical for tsx/tsnode dev setups and for the vitest integration tests. Without it, children die withERR_UNKNOWN_FILE_EXTENSIONtrying to load.tssources. No-op in production (compiled.js, empty execArgv).Summary output
Exit 1 if K > 0, else 0. Skipped entries are not failures (they're registry self-heal candidates). Cron / CI sees partial-failure correctly.
Safety — no guardrails added or weakened
analyze --allonly spawns child analyze processes. Nofs.rmsites added.listRegisteredRepos(). Children write via their ownregisterRepocalls — the existing, tested path (withRegistryNameCollisionError+canonicalizePath+assertSafeStoragePathfrom #1003 still in force).Per the #1003 lineage: zero new safety surface, zero new
fs.rmsites.Tests
5 new integration tests in
test/integration/cli-e2e.test.ts:analyze --all --force→ assertindexedAtadvances on both, summary reports2 succeeded, 0 skipped, 0 failed, per-repo headings format correctly[path]→ exit 1 with--all cannot be combined with [path]...--name→ exit 1 with same shape, mentions--namerm -rf'd between registration and--all; summary reports1 succeeded, 1 skipped, 0 failed, the keep-repo's.gitnexus/stays intactVerification (local)
tsc --noEmitcleannpm run test:unit— 4184 pass (2 pre-existing git-utils env failures unchanged, unrelated; they also fail on pristine upstream main)cli-e2e.test.ts -t "analyze --all|remove|clean --all|analyze --name"— 10/10 pass (5 new + 5 pre-existing)Out of scope
analyze --all --parallel <n>for concurrent re-indexing. Flagged in the issue comments; declined for v1 because it raises LadybugDB single-writer contention, progress-bar multiplexing, and memory-pressure design questions — all better handled as a follow-up after this ships and real usage patterns emerge.Closes #253.