feat: add --ignore-embeddings-node-limit flag to bypass 50K node cap by RyuzakiH · Pull Request #389 · abhigyanpatwari/GitNexus

RyuzakiH · 2026-03-19T19:12:00Z

Summary

This PR adds a new CLI flag, --ignore-embeddings-node-limit, to the analyze command.

When used together with --embeddings, this flag bypasses the existing 50,000-node safety cap for embedding generation.

It also updates the skip message to clearly inform users that they can override the limit with --ignore-embeddings-node-limit.

Closes #382

Changes

Added --ignore-embeddings-node-limit to the analyze command
Allowed embedding generation to continue beyond the 50,000-node limit when the flag is provided
Updated the skip message to mention the override flag

Test Plan

Run npx gitnexus analyze --embeddings on a repository with more than 50,000 nodes and verify that embeddings are skipped with the updated message mentioning --ignore-embeddings-node-limit
Run npx gitnexus analyze --embeddings --ignore-embeddings-node-limit on the same repository and verify that embeddings are generated
Run npx gitnexus analyze --help and verify that the new flag appears in the CLI help output
Run the existing test suite (npm test) and verify that there are no regressions

vercel · 2026-03-19T19:12:05Z

@RyuzakiH is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

magyargergo · 2026-03-19T20:13:17Z

  .option('--embeddings', 'Enable embedding generation for semantic search (off by default)')
+  .option('--ignore-embeddings-node-limit', 'Bypass the 50,000-node safety cap for embedding generation (use with --embeddings on large repos)')


We could merge them together and have

--embeddings [limit]

What do you think?

zander-raycraft · 2026-03-21T17:20:46Z

@claude review this, check for redundancies or misses in optimization for the method for this new flag. Can you also suggest an e2e test for this

github-actions · 2026-03-21T17:21:02Z

Claude finished @zander-raycraft's task in 1m 53s —— View job

Code Review: `--ignore-embeddings-node-limit` flag

Gather context and read changed files
Review for redundancies and optimization misses
Suggest e2e test

Review

The implementation is minimal and correct. The logic in analyze.ts:271–275 is the key change:

if (stats.nodes > EMBEDDING_NODE_LIMIT && !options.ignoreEmbeddingsNodeLimit) {
  embeddingSkipReason = `skipped (${stats.nodes.toLocaleString()} nodes > ${EMBEDDING_NODE_LIMIT.toLocaleString()} limit; use --ignore-embeddings-node-limit to override)`;
} else {
  embeddingSkipped = false;
}

This is clean — no redundancy, no wasted work. Short-circuit logic is handled correctly by the outer if (options?.embeddings) guard.

Issues Found

1. Silent no-op when flag is used without --embeddings (medium)

--ignore-embeddings-node-limit without --embeddings is silently ignored. Users who misread the docs will see no feedback. A one-liner warning would help:

// In analyze.ts, after the verbose check (~line 81):
if (options?.ignoreEmbeddingsNodeLimit && !options?.embeddings) {
  console.warn('  Warning: --ignore-embeddings-node-limit has no effect without --embeddings\n');
}

Fix this →

2. No warning about memory risk when bypassing the cap (low)

The 50K limit exists to prevent OOM. When it's bypassed, the user gets no indication that they're in risky territory. Consider logging a one-time advisory after the node count is known:

// After the embeddingSkipped = false branch:
if (options.ignoreEmbeddingsNodeLimit) {
  console.warn(`  Warning: bypassing node limit (${stats.nodes.toLocaleString()} nodes) — ensure sufficient memory\n`);
}

3. Reviewer's suggestion: --embeddings [limit] (design question)

magyargergo suggested collapsing both flags into --embeddings [limit] so --embeddings uses the default cap and --embeddings 0 bypasses it. This would be a cleaner UX but is a larger API change. The current two-flag approach works fine as long as the no-op case (point 1) is handled.

4. AnalyzeOptions interface field placement (nit)

ignoreEmbeddingsNodeLimit is placed at line 49 between embeddings and skills. It would read more clearly directly after embeddings — which it already is. This is fine as-is.

Suggested E2E Test

The challenge: we can't easily create a >50K node repo in tests. The best approach is a two-part strategy — a unit test for the branch logic, and a CLI-flag-registration test using the existing e2e harness.

Unit test — add to test/unit/cli-commands.test.ts or a new test/unit/analyze-node-limit.test.ts:

import { describe, it, expect, vi, beforeEach } from 'vitest';

// Mock all heavy deps so we can call analyzeCommand in-process
vi.mock('../../src/core/ingestion/pipeline.js', () => ({
  runPipelineFromRepo: vi.fn().mockResolvedValue({
    graph: {},
    repoPath: '/tmp/fake',
    totalFileCount: 10,
    communityResult: { stats: { totalCommunities: 0 }, communities: [] },
    processResult: { stats: { totalProcesses: 0 } },
  }),
}));
vi.mock('../../src/core/lbug/lbug-adapter.js', () => ({
  initLbug: vi.fn().mockResolvedValue(undefined),
  loadGraphToLbug: vi.fn().mockResolvedValue({ warnings: [] }),
  getLbugStats: vi.fn().mockResolvedValue({ nodes: 75_000, edges: 0 }), // over limit
  executeQuery: vi.fn().mockResolvedValue([{ cnt: 0 }]),
  executeWithReusedStatement: vi.fn().mockResolvedValue(undefined),
  closeLbug: vi.fn().mockResolvedValue(undefined),
  createFTSIndex: vi.fn().mockResolvedValue(undefined),
  loadCachedEmbeddings: vi.fn().mockResolvedValue({ embeddingNodeIds: new Set(), embeddings: [] }),
}));
vi.mock('../../src/storage/repo-manager.js', () => ({
  getStoragePaths: vi.fn().mockReturnValue({ storagePath: '/tmp/.gitnexus', lbugPath: '/tmp/.gitnexus/lbug.db' }),
  saveMeta: vi.fn().mockResolvedValue(undefined),
  loadMeta: vi.fn().mockResolvedValue(null),
  addToGitignore: vi.fn().mockResolvedValue(undefined),
  registerRepo: vi.fn().mockResolvedValue(undefined),
  getGlobalRegistryPath: vi.fn().mockReturnValue('/tmp/.gitnexus/registry.json'),
  cleanupOldKuzuFiles: vi.fn().mockResolvedValue({ found: false, needsReindex: false }),
}));
vi.mock('../../src/storage/git.js', () => ({
  getCurrentCommit: vi.fn().mockReturnValue('abc123'),
  isGitRepo: vi.fn().mockReturnValue(true),
  getGitRoot: vi.fn().mockReturnValue('/tmp/fake'),
}));
vi.mock('../../src/cli/ai-context.js', () => ({
  generateAIContextFiles: vi.fn().mockResolvedValue({ files: [] }),
}));
vi.mock('../../src/cli/skill-gen.js', () => ({
  generateSkillFiles: vi.fn().mockResolvedValue({ skills: [] }),
}));

describe('analyzeCommand -- embeddings node limit', () => {
  let consoleOutput: string[];
  let exitCode: number | undefined;
  let embeddingPipelineCalled: boolean;

  beforeEach(() => {
    consoleOutput = [];
    exitCode = undefined;
    embeddingPipelineCalled = false;
    vi.spyOn(process, 'stdout', 'get').mockReturnValue(process.stdout); // no-op
    vi.spyOn(console, 'log').mockImplementation((...args) => { consoleOutput.push(args.join(' ')); });
    vi.spyOn(process, 'exit').mockImplementation((code?: number) => { exitCode = code; throw new Error('process.exit'); });
  });

  it('skips embeddings when node count exceeds limit and flag not set', async () => {
    vi.doMock('../../src/core/embeddings/embedding-pipeline.js', () => ({
      runEmbeddingPipeline: vi.fn().mockImplementation(() => { embeddingPipelineCalled = true; }),
    }));
    const { analyzeCommand } = await import('../../src/cli/analyze.js');

    await analyzeCommand('/tmp/fake', { embeddings: true }).catch(() => {});

    const summary = consoleOutput.find(l => l.includes('Embeddings'));
    expect(summary).toMatch(/skipped/);
    expect(summary).toMatch(/--ignore-embeddings-node-limit/);
    expect(embeddingPipelineCalled).toBe(false);
  });

  it('runs embeddings when node count exceeds limit and --ignore-embeddings-node-limit is set', async () => {
    const runEmbeddingPipeline = vi.fn().mockResolvedValue(undefined);
    vi.doMock('../../src/core/embeddings/embedding-pipeline.js', () => ({ runEmbeddingPipeline }));
    const { analyzeCommand } = await import('../../src/cli/analyze.js');

    await analyzeCommand('/tmp/fake', { embeddings: true, ignoreEmbeddingsNodeLimit: true }).catch(() => {});

    expect(runEmbeddingPipeline).toHaveBeenCalled();
    const summary = consoleOutput.find(l => l.includes('Embeddings'));
    expect(summary).not.toMatch(/skipped/);
  });
});

CLI-flag-registration test — add to the existing describe('unhappy path') block in test/integration/cli-e2e.test.ts:

it('analyze --help lists --ignore-embeddings-node-limit flag', () => {
  const result = runCliRaw(['analyze', '--help'], MINI_REPO);
  if (result.status === null) return;

  expect(result.status).toBe(0);
  expect(result.stdout).toMatch(/--ignore-embeddings-node-limit/);
  expect(result.stdout).toMatch(/--embeddings/);
});

This is lightweight, has zero deps on a large repo, and guards against the flag being accidentally removed.

Summary

Finding	Severity	Actionable?
Silent no-op without `--embeddings`	Medium	Yes — add a `console.warn`
No OOM warning when bypassing cap	Low	Yes — add advisory log
`--embeddings [limit]` API design	Design	Maintainer decision
Implementation logic	✅ Correct	No changes needed

zander-raycraft · 2026-03-21T18:12:24Z

@RyuzakiH can you address the issues raised in the comments above in a new commit to this pr?

ShunsukeHayashi · 2026-03-22T19:33:38Z

This is a must-have for Enterprise adoptions.

We process and track context across multiple large repositories and enterprise codebases for autonomous agents (using OpenClaw MAS with 39 agents). The 50,000-node limit has been a hard blocker for computing the full blast radius on enterprise-scale mono-repos.

We absolutely need the ability to compute full embeddings across large codebases, even if it takes more compute resources. This --ignore-embeddings-node-limit flag allows us to run DeepWiki generation on massive repos during CI/CD without the agent losing scope of distant dependencies.

Tested this locally by patching our version and it works great. Looking forward to this getting merged.

magyargergo · 2026-03-28T14:43:44Z

⚠️ Upcoming Prettier formatting — rebase instructions

PR #563 adds Prettier as the code formatter for the repo. When it merges, the bulk format commit will touch ~350 files (style-only: whitespace, quotes, trailing commas). Your branch will likely conflict.

After #563 merges, rebase your branch:

git fetch origin
git checkout <your-branch>
git rebase origin/main

# Conflicts will be formatting-only — accept your version:
git checkout --theirs .
git add .
git rebase --continue

# Then re-format your branch to match the new style:
npx prettier --write .
git add -A
git commit -m "style: apply prettier formatting"
git push --force-with-lease

New setup step: Run npm install at the repo root (not just in gitnexus/) to get prettier + activate the pre-commit hook. The hook auto-formats staged files on every commit going forward.

JorrinKievit · 2026-05-05T20:09:22Z

@magyargergo Hey! Also looking for this change, should I open a new PR?

magyargergo · 2026-05-05T20:50:55Z

@JorrinKievit feel free to

feat: add --ignore-embeddings-node-limit flag to bypass 50K node cap

1a74f8a

magyargergo reviewed Mar 19, 2026

View reviewed changes

zander-raycraft added enhancement New feature or request bug-fix PR tag for bug fixes labels Mar 21, 2026

zander-raycraft removed the bug-fix PR tag for bug fixes label Mar 21, 2026

magyargergo closed this May 5, 2026

JorrinKievit mentioned this pull request May 6, 2026

feat: add optional limit arg to --embeddings flag (closes #382) #1375

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add --ignore-embeddings-node-limit flag to bypass 50K node cap#389

feat: add --ignore-embeddings-node-limit flag to bypass 50K node cap#389
RyuzakiH wants to merge 1 commit into
abhigyanpatwari:mainfrom
RyuzakiH:feat/ignore-embeddings-node-limit

RyuzakiH commented Mar 19, 2026

Uh oh!

vercel Bot commented Mar 19, 2026

Uh oh!

magyargergo Mar 19, 2026

Uh oh!

zander-raycraft commented Mar 21, 2026

Uh oh!

github-actions Bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

zander-raycraft commented Mar 21, 2026

Uh oh!

ShunsukeHayashi commented Mar 22, 2026

Uh oh!

magyargergo commented Mar 28, 2026

Uh oh!

JorrinKievit commented May 5, 2026

Uh oh!

magyargergo commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		.option('--embeddings', 'Enable embedding generation for semantic search (off by default)')
		.option('--ignore-embeddings-node-limit', 'Bypass the 50,000-node safety cap for embedding generation (use with --embeddings on large repos)')

Uh oh!

Conversation

RyuzakiH commented Mar 19, 2026

Summary

Changes

Test Plan

Uh oh!

vercel Bot commented Mar 19, 2026

Uh oh!

magyargergo Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

zander-raycraft commented Mar 21, 2026

Uh oh!

github-actions Bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: --ignore-embeddings-node-limit flag

Review

Issues Found

Suggested E2E Test

Summary

Uh oh!

zander-raycraft commented Mar 21, 2026

Uh oh!

ShunsukeHayashi commented Mar 22, 2026

Uh oh!

magyargergo commented Mar 28, 2026

⚠️ Upcoming Prettier formatting — rebase instructions

Uh oh!

JorrinKievit commented May 5, 2026

Uh oh!

magyargergo commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions Bot commented Mar 21, 2026 •

edited

Loading

Code Review: `--ignore-embeddings-node-limit` flag