Skip to content

fix: prevent stack overflow and memory exhaustion on large repo analysis#814

Merged
magyargergo merged 10 commits into
mainfrom
copilot/fix-analysis-failure-large-repos
Apr 13, 2026
Merged

fix: prevent stack overflow and memory exhaustion on large repo analysis#814
magyargergo merged 10 commits into
mainfrom
copilot/fix-analysis-failure-large-repos

Conversation

Copilot AI commented Apr 13, 2026

Copy link
Copy Markdown
Contributor
  • Fix prettier formatting in lbug-adapter.ts
  • Diagnose skip-git-cli.test.ts failure: --stack-size= in NODE_OPTIONS rejected by Node 24
  • Fix ensureHeap() in analyze.ts: pass --stack-size as direct CLI arg only, keep only --max-old-space-size in NODE_OPTIONS
  • Verify skip-git-cli.test.ts passes (both tests green)
  • Remove debug test files
  • Run validation (Code Review ✅, CodeQL ✅)

@vercel

vercel Bot commented Apr 13, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 13, 2026 6:52pm

Request Review

- Convert c3Linearize from recursive to iterative (explicit work stack)
  to handle deep class hierarchies without stack overflow
- Replace push(...arr) spread patterns with safe loops in parse-worker.ts
  and lbug-adapter.ts to prevent stack overflow on large arrays
- Stream relationship CSV lines directly to per-pair temp files in
  lbug-adapter.ts instead of accumulating millions of lines in memory
- Add test for deep 500-level inheritance chain

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
- Auto-set --stack-size=4096 alongside --max-old-space-size in analyze
  command to prevent stack overflow on deep class hierarchies
- Add helpful error guidance for known large-repo failure modes
  (stack overflow, heap OOM, Map size limits)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
- Add error handling for write stream close in lbug-adapter.ts
- Handle backpressure when writing relationship CSV lines to disk
- Clarify ENTER/MERGE phase transition comment in resolve.ts
- Fix inconsistent stack size in error message (4096 not 8192)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix analysis failures on large repositories fix: prevent stack overflow and memory exhaustion on large repo analysis Apr 13, 2026
Copilot AI requested a review from magyargergo April 13, 2026 18:15
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude is this pr ready for production? Is this following clean code principles? Are there any performance bottlenecks? Is the code structured in a way that will unlock easy contributions in the future?

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 5s —— View job


PR Review: fix: prevent stack overflow and memory exhaustion on large repo analysis

Tasks

  • Understand the request
  • Read changed files
  • Evaluate production readiness
  • Assess clean code principles
  • Identify performance bottlenecks
  • Assess contribution-friendliness

Summary

The PR fixes three real, well-understood crash categories in large repo analysis. The root cause diagnosis is correct, and the approaches are sound. This is close to production-ready, but there are a few issues worth addressing before merging.


Production Readiness

Overall: 🟡 Mostly ready — one correctness concern and one resource-leak risk

✅ What's solid

  • Iterative C3 linearization (resolve.ts): The ENTER/MERGE two-phase pattern is logically correct. Cycle detection, parent ordering (reversed push so leftmost parent is computed first), and cache-fall-through when all parents are already cached all work as intended.
  • Streaming relationships to disk (lbug-adapter.ts): Replacing the in-memory relsByPair Map with per-pair WriteStream is the right fix. Backpressure handling via rl.pause()/ws.once('drain', ...) is correct.
  • push(...arr)for...of (parse-worker.ts): Simple and correct. The comment added is good.
  • Stack size flag (analyze.ts): Defensively added to ensureHeap() re-exec alongside the existing heap flag.

🔴 Resource leak — write streams not closed on readline error

In lbug-adapter.ts, if the readline emits an error, the Promise rejects (via rl.on('error', reject)) but the per-pair WriteStreams in pairWriteStreams are never closed:

// lbug-adapter.ts ~L259
await new Promise<void>((resolve, reject) => {
  const rl = createInterface({ ... });
  rl.on('line', (line) => { ... ws.write(...) ... });
  rl.on('close', resolve);
  rl.on('error', reject);  // ← WriteStreams are not cleaned up on this path
});

If the source CSV is corrupted or a disk error occurs mid-read, all open write streams become orphaned. Consider adding a finally-style cleanup or using rl.on('error', (err) => { for (const ws of pairWriteStreams.values()) ws.destroy(); reject(err); }).

Fix this →


🟡 Test depth is conservatively low

The new test uses DEPTH = 500 (mro-processor.test.ts:556):

// test/unit/mro-processor.test.ts
const DEPTH = 500;

The recursive version would typically overflow at ~1000–5000 levels depending on the platform's default stack. At 500, the test may pass with either the old or new implementation on some systems, making it a weaker regression guard than intended. DEPTH = 2000 would be a safer threshold that reliably triggers the old bug on all platforms.

Fix this →


Clean Code

Overall: 🟢 Good — well-commented, intentional trade-offs

  • resolve.ts: The const ENTER = 0, MERGE = 1 approach works, but using a union type (type Phase = 'enter' | 'merge') would make the intent more self-documenting without any performance cost. The existing comments are thorough.
  • Redundant guard in MERGE phase (resolve.ts:125-129): The check if (!directParents || directParents.length === 0) at the top of the MERGE phase is unreachable — the ENTER phase already handles that case and pops the frame. Removing it would reduce cognitive noise.
  • lbug-adapter.ts:353 uses failedPairCsvPaths.includes(pairCsvPath) on an array inside a loop. A Set<string> would be cleaner and O(1) here, though in practice the failure list is short.
  • The inProgress?: Set<string> parameter on c3Linearize is preserved for API compatibility but is less intuitive for the iterative implementation — a brief comment clarifying it's legacy-compatible would help.

Performance Bottlenecks

Overall: 🟡 No regressions introduced; one pre-existing O(n²) hotspot worth noting

  • C3 MERGE loop (resolve.ts:161-189): The sequences.some((s) => s.length > 0) outer loop combined with other.indexOf(candidate, 1) is O(n²) in the number of ancestors per class. This existed in the original recursive version too — not a regression — but it's the dominant cost for deeply diamond-inherited hierarchies.
  • Streaming lbug-adapter.ts: The new streaming approach is strictly better than the previous in-memory accumulation for large repos. No performance concerns introduced.
  • buildParentMapFromHeritage (resolve.ts:249): Uses the correct head-pointer BFS pattern (noted in the comment), avoiding the O(n) Array.shift() per dequeue. Well done.
  • pairWriteStreams growth: In extreme cases (many unique FROM→TO label pairs), you could accumulate a large number of open file descriptors. The current implementation is bounded by the number of node type combinations (~30 types = bounded), so this is not a practical concern.

Contribution Friendliness

Overall: 🟢 Good structure, one navigation concern

  • The new streaming section in lbug-adapter.ts is well-commented and follows the existing pattern of the file. New contributors can understand why disk streaming is used instead of in-memory accumulation.
  • lbug-adapter.ts is already long (~1100 lines after this PR). The relationship-loading logic could be extracted into a helper (streamRelsToDisk) to make the loadGraphToLbug function easier to follow at a glance. Not blocking, but worth considering.
  • analyze.ts error message additions (lines 299–315) are clear and actionable for users — excellent UX.
  • The test structure in mro-processor.test.ts matches existing patterns and is easy to extend.

Verdict

Criterion Rating
Production ready 🟡 Merge after fixing write-stream leak
Clean code 🟢 Good
Performance 🟢 No regressions
Contribution-friendly 🟢 Good

Blocking: The write-stream resource leak on readline error in lbug-adapter.ts.
Recommended: Increase test DEPTH from 500 to 2000.
Nice-to-have: Remove redundant MERGE-phase guard in resolve.ts, switch failedPairCsvPaths to a Set.

- Fix write-stream resource leak on readline error by destroying all
  open WriteStreams before rejecting (lbug-adapter.ts)
- Switch failedPairCsvPaths from array to Set for O(1) lookup
- Remove redundant MERGE-phase empty-parents guard in resolve.ts
  (unreachable — ENTER phase already handles that case)
- Increase deep inheritance test DEPTH from 500 to 2000 for
  reliable regression coverage across platforms

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/cf1f3e22-3864-454a-a3a5-2bded9ebfdba

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude is this pr ready for production? Is this following clean code principles? Are there any performance bottlenecks? Is the code structured in a way that will unlock easy contributions in the future?

@magyargergo magyargergo marked this pull request as ready for review April 13, 2026 18:27
@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 1m 34s —— View job


PR Review: fix: prevent stack overflow and memory exhaustion on large repo analysis

Tasks

  • Understand the request
  • Read changed files
  • Evaluate production readiness
  • Assess clean code principles
  • Identify performance bottlenecks
  • Assess contribution-friendliness

Summary

This PR fixes three real crash categories in large repo analysis: stack overflow from recursive C3 linearization, memory exhaustion from in-memory relationship accumulation, and push(...arr) stack overflows on very large arrays. The previous review identified several issues — all have been addressed in the latest commit. This is production-ready.


Production Readiness: 🟢 Ready to merge

All previously-blocking issues have been resolved:

✅ Write-stream resource leak — FIXED
lbug-adapter.ts:304–308 now correctly destroys all open write streams on readline error:

rl.on('error', (err) => {
  for (const ws of pairWriteStreams.values()) ws.destroy();
  reject(err);
});

✅ Test depth — FIXED
mro-processor.test.ts:548 now uses DEPTH = 2000 with a 15s timeout, reliably covering the failure threshold on all platforms. The comment explains the intent clearly.

✅ Redundant MERGE-phase guard — REMOVED
resolve.ts:122–125 now carries only a comment confirming the invariant holds, with no dead code branch.

failedPairCsvPaths — CONVERTED TO Set
lbug-adapter.ts:326 uses new Set<string>() and .has() / .add() throughout.

One minor observation that's pre-existing (not introduced by this PR):

gatherAncestors (resolve.ts:31) still uses queue.shift() which is O(n) per dequeue. The comment in buildParentMapFromHeritage correctly uses a head-pointer pattern to avoid this, but gatherAncestors — used separately by mro-processor.ts — doesn't. Not a blocker; just worth a future follow-up.


Clean Code: 🟢 Good

  • resolve.ts: The two-phase ENTER/MERGE stack pattern is well-commented with clear invariant notes. The const ENTER = 0, MERGE = 1 approach works; the comments are sufficient to orient new contributors. No dead code remains.
  • lbug-adapter.ts: The streaming logic is clearly explained with inline comments covering the backpressure rationale (pause/drain). The failedPairCsvPaths Set usage is idiomatic.
  • parse-worker.ts:765–767: The comment explaining why push(...arr) was replaced is exactly right — this kind of documentation of a non-obvious constraint is good practice.
  • analyze.ts:298–315: The error detection + actionable suggestion block is thorough and user-facing UX is meaningfully improved.
  • inProgress?: Set<string> parameter on c3Linearize: This optional parameter is now effectively unused in the iterative implementation (line 72 always creates a local visiting set or uses the passed-in one). A brief comment noting it's preserved for API compatibility would help future contributors understand why it exists.

Performance: 🟢 No regressions; improvements in two areas

  • Streaming lbug-adapter.ts: Disk streaming with backpressure is strictly better than in-memory Map<string, string[]> for large repos. Memory usage during relationship loading is now O(write buffer) rather than O(total relationships).
  • Iterative C3 in resolve.ts: The work-stack approach correctly avoids call-stack growth. The MERGE loop's sequences.some() + other.indexOf(candidate, 1) is O(n²) in ancestors per class — same as the original recursive version, not a regression.
  • buildParentMapFromHeritage head-pointer BFS: The queue[head++] pattern is correct and avoids O(n) Array.shift() per dequeue for wide hierarchies.
  • Stack flag in ensureHeap(): The --stack-size=4096 flag is a sound defensive measure. The check !nodeOpts.includes('--stack-size') prevents double-application.
  • One pre-existing concern: gatherAncestors (used for graph-level MRO emission, not the lookup path) uses queue.shift() — O(n) per dequeue. This predates this PR and doesn't affect the fixes introduced here.

Contribution Friendliness: 🟢 Good

  • The streaming pattern in lbug-adapter.ts is well-commented; a contributor can follow the flow without deep knowledge of the codebase.
  • The iterative C3 algorithm in resolve.ts is annotated at each phase boundary — the ENTER/MERGE labeling mirrors standard textbook descriptions and maps directly to the comment block at the top of c3Linearize.
  • analyze.ts error messages tell users exactly what to try, reducing support burden.
  • lbug-adapter.ts is now ~1100 lines. The relationship loading block (lines 249–408) is cohesive but long. Extracting it into a streamRelationshipsToDisk helper would improve navigability for future contributors — not blocking, but worth considering when the file next grows.
  • Test structure in mro-processor.test.ts matches existing patterns; easy to add further cases.

Verdict

Criterion Rating
Production ready 🟢 Ready to merge
Clean code 🟢 Good
Performance 🟢 No regressions
Contribution-friendly 🟢 Good

All blocking and recommended items from the prior review have been addressed. The only remaining items are minor pre-existing concerns not introduced by this PR.

Copilot AI and others added 2 commits April 13, 2026 18:35
Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b5ca33c4-bb03-402f-a206-21ea7e1e310e

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b5ca33c4-bb03-402f-a206-21ea7e1e310e

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/fc59cd11-348b-4e22-b9ea-98787300de48

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
…compat)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/fc59cd11-348b-4e22-b9ea-98787300de48

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
@magyargergo magyargergo merged commit 6388113 into main Apr 13, 2026
14 checks passed
@magyargergo magyargergo deleted the copilot/fix-analysis-failure-large-repos branch April 13, 2026 19:08
@magyargergo magyargergo mentioned this pull request Apr 13, 2026
2 tasks
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…sis (abhigyanpatwari#814)

* Initial plan

* fix: prevent stack overflow and memory issues on large repo analysis

- Convert c3Linearize from recursive to iterative (explicit work stack)
  to handle deep class hierarchies without stack overflow
- Replace push(...arr) spread patterns with safe loops in parse-worker.ts
  and lbug-adapter.ts to prevent stack overflow on large arrays
- Stream relationship CSV lines directly to per-pair temp files in
  lbug-adapter.ts instead of accumulating millions of lines in memory
- Add test for deep 500-level inheritance chain

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: add stack size flag and enhanced error messages for large repos

- Auto-set --stack-size=4096 alongside --max-old-space-size in analyze
  command to prevent stack overflow on deep class hierarchies
- Add helpful error guidance for known large-repo failure modes
  (stack overflow, heap OOM, Map size limits)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: address code review feedback

- Add error handling for write stream close in lbug-adapter.ts
- Handle backpressure when writing relationship CSV lines to disk
- Clarify ENTER/MERGE phase transition comment in resolve.ts
- Fix inconsistent stack size in error message (4096 not 8192)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/9cb2eed2-adc7-4fa4-9216-e7ac3facb9b5

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: address PR review — resource leak, redundant guard, Set, test depth

- Fix write-stream resource leak on readline error by destroying all
  open WriteStreams before rejecting (lbug-adapter.ts)
- Switch failedPairCsvPaths from array to Set for O(1) lookup
- Remove redundant MERGE-phase empty-parents guard in resolve.ts
  (unreachable — ENTER phase already handles that case)
- Increase deep inheritance test DEPTH from 500 to 2000 for
  reliable regression coverage across platforms

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/cf1f3e22-3864-454a-a3a5-2bded9ebfdba

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* style: fix prettier formatting in lbug-adapter.ts

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b5ca33c4-bb03-402f-a206-21ea7e1e310e

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* chore: revert unintended package.json/lock changes

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b5ca33c4-bb03-402f-a206-21ea7e1e310e

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: strip NODE_OPTIONS in skip-git-cli test child processes

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/fc59cd11-348b-4e22-b9ea-98787300de48

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: don't put --stack-size in NODE_OPTIONS (rejected by Node 24)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/fc59cd11-348b-4e22-b9ea-98787300de48

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix: pass --stack-size as CLI arg only, not in NODE_OPTIONS (Node 24 compat)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/fc59cd11-348b-4e22-b9ea-98787300de48

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Analysis fails on large repositories (Maximum call stack size exceeded)

2 participants