fix(csharp): eliminate global-namespace typeBindings O(files²) OOM (#1871) #1954

Merged
magyargergo merged 3 commits into main from issue-1871
May 31, 2026
Conversation

@magyargergo

@magyargergo magyargergo commented May 31, 2026

Collaborator

Summary

Fixes #1871: gitnexus analyze hangs for hours and OOMs at "Resolving types (Csharp [2/3] — analyzing types)" on large C# codebases with tens of thousands of files in the global (no-declared-namespace) namespace.

PR #1905 fixed the BindingRef twin of this via the workspaceFqnBindings fast-path, but left the typeBindings propagation loop in populateCsharpNamespaceSiblings untouched. That loop copies every global file's module-scope return-type bindings into every other global file's Scope.typeBindings:

  • with S files in the '' bucket and K distinct method names → O(S²) time and O(S·K) memory
  • at the reporter's ~36k files that's ~1.3B Map entries (~65–130 GB) → the 10-hour hang + OOM at 128 GB they reported (and why REGISTRY_PRIMARY_CSHARP=0 sidesteps it).
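
The arithmetic behind those figures can be checked with a short sketch. It assumes one unique module-scope binding per file (so K ≈ S) and a per-entry cost of roughly 50 to 100 bytes, which is back-derived from the ~65–130 GB estimate above rather than measured; the function names are illustrative and not part of the codebase.

```typescript
// Cross-file copies when each of S files receives the bindings of every
// OTHER file and each file contributes exactly one unique name.
function perFileCopyCount(files: number): number {
  return files * (files - 1);
}

// ASSUMPTION: bytes per Map entry, back-derived from the 65-130 GB estimate.
const BYTES_PER_ENTRY_LOW = 50;
const BYTES_PER_ENTRY_HIGH = 100;

interface HeapEstimate {
  entries: number;
  lowGB: number;
  highGB: number;
}

function estimateHeap(files: number): HeapEstimate {
  const entries = perFileCopyCount(files);
  return {
    entries,
    lowGB: (entries * BYTES_PER_ENTRY_LOW) / 1e9,
    highGB: (entries * BYTES_PER_ENTRY_HIGH) / 1e9,
  };
}

const reporterScale = estimateHeap(36_000);
console.log(reporterScale.entries); // 1295964000, i.e. ~1.3B entries
console.log(reporterScale.lowGB.toFixed(1), reporterScale.highGB.toFixed(1));
```

The same formula gives 249,500 copies at 500 files, 999,000 at 1000 and 3,998,000 at 2000.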

Root cause, measured

Instrumented a concentrated global-namespace fixture (unique method names = the real-world case):

| Files | typeBinding copies | time | scaling |
|---|---|---|---|
| 500 | 249,500 | 5.0s | |
| 1000 | 999,000 | 10.6s | linear |
| 2000 | 3,998,000 (≈S²) | 65s | 3.06× = quadratic |

A subtle detail hid this for a long time: if every method shares a name, the module-typeBinding keys collide and all copies are skipped (innerSets=0) — which is exactly what the old benchmark fixture did.
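
A simplified model of the old propagation loop shows why a shared method name masks the blow-up. This is a sketch only: it assumes the copy skips any key the target already holds, and `countCopies` and `makeFiles` are hypothetical helpers, not functions from `namespace-siblings.ts`.

```typescript
type TypeBindings = Map<string, string>; // method name -> return type (simplified)

// Hypothetical model of the per-file copy: every file receives every other
// file's module-scope bindings, skipping keys it already has.
function countCopies(files: TypeBindings[]): number {
  // Snapshot the originals so earlier copies never act as sources.
  const sources = files.map((f) => new Map(f));
  let copies = 0;
  files.forEach((target, i) => {
    sources.forEach((source, j) => {
      if (i === j) return;
      for (const [name, type] of source) {
        if (target.has(name)) continue; // key collision: copy skipped
        target.set(name, type);
        copies++;
      }
    });
  });
  return copies;
}

function makeFiles(count: number, uniqueNames: boolean): TypeBindings[] {
  return Array.from(
    { length: count },
    (_, i) => new Map<string, string>([[uniqueNames ? `Method${i}` : 'Method', `Type${i}`]]),
  );
}

console.log(countCopies(makeFiles(500, false))); // 0: every key collides
console.log(countCopies(makeFiles(500, true))); // 249500: the quadratic case
```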

Fix

Route global-namespace module typeBindings through a new scope-independent workspaceTypeBindings channel, populated once (O(K)) from the '' bucket and consulted as a fallback by the two typeBindings chain-walkers (findReceiverTypeBinding, followChainPostFinalize) — instead of copying per file. This is the typeBindings analogue of #1905's workspaceFqnBindings.
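
As a rough illustration of the shape of this change (not the actual implementation), the sketch below populates one shared map a single time and consults it only when the per-scope chain misses. The `Scope` interface, the first-writer-wins rule and both function names are assumptions made for the example.

```typescript
interface Scope {
  typeBindings: Map<string, string>;
  parent?: Scope;
}

// Populate the shared channel ONCE from the global ('') bucket.
// ASSUMPTION: on a name collision the first writer wins.
function populateWorkspaceChannel(globalFiles: Scope[]): Map<string, string> {
  const channel = new Map<string, string>();
  for (const file of globalFiles) {
    for (const [name, type] of file.typeBindings) {
      if (!channel.has(name)) channel.set(name, type);
    }
  }
  return channel;
}

// Walk the per-scope chain first; the shared channel is a miss-only fallback,
// so local bindings still shadow global ones.
function findTypeBinding(
  start: Scope,
  name: string,
  workspaceTypeBindings: Map<string, string>,
): string | undefined {
  for (let scope: Scope | undefined = start; scope; scope = scope.parent) {
    const hit = scope.typeBindings.get(name);
    if (hit !== undefined) return hit;
  }
  return workspaceTypeBindings.get(name);
}

const fileA: Scope = { typeBindings: new Map([['GetUser', 'User']]) };
const fileB: Scope = { typeBindings: new Map() };
const fileC: Scope = { typeBindings: new Map([['GetUser', 'LocalUser']]) };
const channel = populateWorkspaceChannel([fileA, fileB]);

console.log(findTypeBinding(fileB, 'GetUser', channel)); // "User" via the fallback
console.log(findTypeBinding(fileC, 'GetUser', channel)); // "LocalUser": local wins
```

Nothing is written into `fileB`, so memory grows with the number of distinct names rather than with files × names.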

After: 2000 files 6.3s (was 65s), 4000 files 6.8s, heap linear.

Faster and more correct

The C# spec makes the unnamed namespace a single declaration space whose members are "available for use in a named namespace" — global types are visible from every file. The old per-file copy only exposed them to other no-namespace files; named-namespace files never saw them (a latent correctness bug). Consulting one shared channel from every scope chain mirrors how Roslyn resolves against a single Compilation.GlobalNamespace symbol rather than copying symbols per file.

Regression guard

Strengthened csharp-pipeline-benchmark.test.ts so it would actually catch this:

  • unique method name per file (a shared name collapses the key and skips all copies, hiding the blow-up)
  • concentrated scales raised to 2000 so the existing < 3× sub-quadratic assertion trips on the regression (pre-fix 1000→2000 was 3.06×).
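
For reference, the scaling figures quoted in this PR are consistent with a metric of time growth divided by file-count growth, where 1.0 means linear. The sketch below assumes that definition, which is inferred from the published numbers and not read from the benchmark source; `normalizedScaling` is a hypothetical helper.

```typescript
interface Sample {
  files: number;
  seconds: number;
}

// ASSUMED metric: (time ratio) / (file-count ratio); 1.0 is linear scaling.
function normalizedScaling(prev: Sample, next: Sample): number {
  return next.seconds / prev.seconds / (next.files / prev.files);
}

const SUB_QUADRATIC_LIMIT = 3; // the existing "< 3x" assertion threshold

// Pre-fix measurements from the table above.
const firstStep = normalizedScaling({ files: 500, seconds: 5.0 }, { files: 1000, seconds: 10.6 });
const secondStep = normalizedScaling({ files: 1000, seconds: 10.6 }, { files: 2000, seconds: 65 });

console.log(firstStep.toFixed(2), firstStep < SUB_QUADRATIC_LIMIT); // 1.06 true
console.log(secondStep.toFixed(2), secondStep < SUB_QUADRATIC_LIMIT); // 3.07 false
```

Under this reading, only the 1000→2000 step crosses the threshold, which is why the concentrated scales had to reach 2000.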

Validation

Files

6 files, +90/−16:

  • model/scope-resolution-indexes.ts — new workspaceTypeBindings channel
  • finalize-orchestrator.ts — init
  • languages/csharp/namespace-siblings.ts — populate once, skip '' in per-file copy
  • scope-resolution/scope/walkers.ts + scope-resolution/passes/imported-return-types.ts — fallback in the two chain-walkers
  • test/integration/csharp-pipeline-benchmark.test.ts — regression guard

🤖 Generated with Claude Code


Update — named-namespace generalization folded in (#1871 follow-up)

The original commit fixed the OOM only for the global ('' / no-declared-namespace) bucket. The tri-review flagged, and a follow-up plan confirmed, that a solution with all files under one named namespace (e.g. file-scoped namespace Company.Product;, common in modern .NET) reproduced the same O(files²) blow-up — in both loops: the BindingRef per-scope augmentation (#1905's twin) and the typeBindings per-file copy. This PR now ships the complete fix (global + named).

What was added (second commit):

  • Namespace-keyed shared channels namespaceFqnBindings / namespaceTypeBindings (per-namespace analogues of the flat workspaceFqnBindings / workspaceTypeBindings) + an accessibleNamespacesByScope gate, populated once per named bucket from the existing expandedNamespaces derivation — O(defs), not O(files × defs).
  • The shared walkers (findReceiverTypeBinding, lookupBindingsAt, followChainPostFinalize) are now namespace-aware: after the per-scope chain and the flat global channel miss, they consult the per-namespace channels gated by the caller module's accessible namespaces. Language-neutral — only the C# hook populates the channels (AGENTS rule).
  • Precedence preserved: local chain → named namespace → global (named-before-global, matching the pre-#1871 chain order so a name present in both still resolves named-first).
  • using static member exposure and the global '' fast-paths are unchanged.
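
The stated precedence and the accessibility gate can be sketched as follows. The field names mirror the channels listed above, but their shapes here (plain maps keyed by namespace and by a module scope id) are assumptions for illustration, and `resolveTypeBinding` is a hypothetical stand-in for the real walkers.

```typescript
type Bindings = Map<string, string>;

interface Channels {
  workspaceTypeBindings: Bindings; // flat global ('') channel
  namespaceTypeBindings: Map<string, Bindings>; // one map per named namespace
  accessibleNamespacesByScope: Map<string, string[]>; // module scope id -> visible namespaces
}

function resolveTypeBinding(
  localChain: Bindings[],
  moduleScopeId: string,
  name: string,
  ch: Channels,
): string | undefined {
  // 1. Local scope chain always wins.
  for (const bindings of localChain) {
    const hit = bindings.get(name);
    if (hit !== undefined) return hit;
  }
  // 2. Named namespaces, but only those the caller's module can see.
  for (const ns of ch.accessibleNamespacesByScope.get(moduleScopeId) ?? []) {
    const hit = ch.namespaceTypeBindings.get(ns)?.get(name);
    if (hit !== undefined) return hit;
  }
  // 3. Flat global channel last.
  return ch.workspaceTypeBindings.get(name);
}

const channels: Channels = {
  workspaceTypeBindings: new Map([['Get', 'GlobalThing']]),
  namespaceTypeBindings: new Map([
    ['Company.Product', new Map([['Get', 'ProductThing'], ['OnlyNamed', 'Widget']])],
  ]),
  accessibleNamespacesByScope: new Map([['a.cs', ['Company.Product']]]),
};

console.log(resolveTypeBinding([], 'a.cs', 'Get', channels)); // "ProductThing"
console.log(resolveTypeBinding([], 'b.cs', 'Get', channels)); // "GlobalThing"
console.log(resolveTypeBinding([], 'b.cs', 'OnlyNamed', channels)); // undefined: no leak
```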

Parity-neutral: scripts/run-parity.ts --language csharp passes (legacy DAG == registry-primary, 218 tests each); C# resolver suite 386 tests green; scope-resolution unit suite 1104 green; new always-run walker-fallback coverage added (namespace-channel-lookup.test.ts), including the workspaceTypeBindings channel that previously had only gated-benchmark coverage.

Measured (concentrated named namespace, unique method names):

| Files | time | scaling |
|---|---|---|
| 500 | 4.3s | |
| 1000 | 4.8s | 0.56× (linear) |
| 2000 | 5.6s | 0.58× (linear) |

(vs. the quadratic 65s-at-2000 the global case showed pre-fix; edges scale linearly, confirming cross-file resolution still works.)

Post-Deploy Monitoring & Validation

  • Healthy signal: gitnexus analyze completes on large C# repos (single namespace or no namespace) in roughly linear time; no OOM at the "Resolving types (Csharp 2/3)" stage; heap stays bounded.
  • Watch: analyze wall-clock and peak RSS on large C#/Unity/.NET repos; the scope-parity CI job (must stay green — output-neutrality guarantee).
  • Failure signal / rollback trigger: any C# edge-count delta in the parity gate (would mean the refactor was not output-neutral), or a renewed hang/OOM at the type-resolution stage. Mitigation: REGISTRY_PRIMARY_CSHARP=0 remains the escape hatch; revert the two #1871 commits to fall back to per-file copying.
  • Validation window/owner: first large-C#-repo analyze after merge; repo maintainers.

…1871)

Large C# solutions with tens of thousands of files in the global
(unnamed) namespace OOM'd / hung for hours at "Resolving types
(Csharp 2/3)". PR #1905 fixed the BindingRef twin of this via the
`workspaceFqnBindings` fast-path, but left the typeBindings
propagation loop in `populateCsharpNamespaceSiblings` untouched: it
copies every global file's module-scope return-type bindings into
every OTHER global file's `Scope.typeBindings`. With S files in the
`''` bucket and K distinct method names, that is O(S²) time and
O(S·K) memory — ~1.3B Map entries (~65-130 GB) at 36k files.

Measured on a concentrated global-namespace fixture: the per-file
copy went quadratic (1000→2000 files = 3.06× for 2× the files,
65s at 2000). Route global-namespace module typeBindings through a
new scope-independent `workspaceTypeBindings` channel populated ONCE
(O(K)) and consulted as a fallback by the typeBindings chain-walkers
(`findReceiverTypeBinding`, `followChainPostFinalize`), instead of the
per-file copy. After: 2000 files 6.3s, 4000 files 6.8s, heap linear.

This also makes resolution MORE correct, not just faster. The C#
spec makes the unnamed namespace a single declaration space whose
members are "available for use in a named namespace", so global types
are visible from every file. The old per-file copy only exposed them
to OTHER no-namespace files; named-namespace files never saw them.
Consulting the shared channel from every scope chain mirrors how
Roslyn resolves against a single `Compilation.GlobalNamespace` symbol
rather than copying symbols per file.

Strengthen csharp-pipeline-benchmark.test.ts so it would catch this:
give each file a unique method name (a shared name collapses the
module-typeBinding key and skips all copies, hiding the blow-up) and
raise the concentrated scales to 2000 so the sub-quadratic assertion
trips on the regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 31, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| gitnexus | Ready | Preview, Comment | May 31, 2026 4:51pm |


@magyargergo magyargergo left a comment

Collaborator Author


PR Tri-Review: #1954 — eliminate global-namespace typeBindings O(files²) OOM

Methods (3 engines). GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, performance, maintainability, testing) = 7 Claude lanes, plus Codex (live, structured output) — the one genuinely independent engine. The two Claude families share priors, so the strong signal below is Codex + a Claude lane agreeing; Claude-only agreement is "consistent across personas." Findings 1–2 were also coordinator-verified by code-read, and gated through the synthesis critic.

Verdict: sound and mergeable for the reported case. CI is all-green — including typecheck, lint, format, and scope-parity (the broadening did not drift parity snapshots). The core fix is correct and the perf claim holds (concentrated-global 2000 files 65s→6.3s, heap linear).

Validated — credit where due

Codex and the Claude correctness/adversarial lanes independently refuted the main hazards, with file:line traces I re-confirmed:

  • No cross-language contamination. findReceiverTypeBinding is shared, but scopeResolutionPhase runs resolution per language and each runScopeResolution builds a fresh workspaceTypeBindings: new Map() (finalize-orchestrator.ts:148); only C# populates it. A PHP/JS/Go receiver can never reach a C# global binding. (Codex high + adversarial)
  • No mis-shadowing. Both walkers consult the per-scope chain first; the workspace map is a miss-only fallback, so local/named bindings still win. (Codex high + correctness)
  • No infinite loop in followChainPostFinalize: ws !== current + visited set + depth cap bound it. (Codex high + correctness)
  • Broadening is C#-spec-correct (global types visible from named-namespace files) and parity-clean.
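
The three termination guards named above (a self-reference check, a visited set and a depth cap) can be illustrated with a generic chain follower. This is a sketch of the technique only; `followChain`, its single-map input and the cap of 32 are hypothetical and do not reproduce `followChainPostFinalize`.

```typescript
const MAX_CHAIN_DEPTH = 32; // ASSUMPTION: the real cap may differ

// Follow name -> name bindings until nothing further resolves.
function followChain(start: string, bindings: Map<string, string>): string {
  const visited = new Set<string>([start]);
  let current = start;
  for (let depth = 0; depth < MAX_CHAIN_DEPTH; depth++) {
    const next = bindings.get(current);
    // Stop on a miss, a self-reference, or a name already seen (cycle).
    if (next === undefined || next === current || visited.has(next)) break;
    visited.add(next);
    current = next;
  }
  return current;
}

console.log(followChain('A', new Map([['A', 'B'], ['B', 'C']]))); // "C"
console.log(followChain('A', new Map([['A', 'B'], ['B', 'A']]))); // "B": cycle cut
console.log(followChain('A', new Map([['A', 'A']]))); // "A": self-reference
```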

Findings (both P2, non-blocking — see inline)

  1. Single large named namespace still hits the same O(files²) (namespace-siblings.ts:530). The fix special-cases only the '' bucket; the per-file copy still runs for every named bucket. Materializes when method return-type names are unique across files (common case). The "only bucket that grows to every-file size" comment is wrong for a concentrated named namespace. Pre-existing, not a regression.
  2. New fallback only guarded by a gated benchmark (csharp-pipeline-benchmark.test.ts:246). It's skipIf(!GITNEXUS_BENCH) and asserts timing not edges, so nothing in normal CI guards workspaceTypeBindings population or the walker fallback.

Lower-priority / body-only

  • Consistency: the new channel is absent from validate-bindings-immutability.ts (which validates workspaceFqnBindings), and invariant I8 in contract/scope-resolver.ts still says "two-channel" though there are now four. Minor; partly pre-existing drift since #1905. (maintainability)
  • Ordering nuance (low confidence): populateNamespaceSiblings runs before propagateImportedReturnTypes (run.ts:355 vs 377), so the channel can hold pre-collapse return-type refs → at worst under-resolution (a missing edge), never a wrong edge; followChainPostFinalize partially re-follows. Likely negligible for single-hop return bindings; a multi-hop global-alias fixture would settle it. (adversarial, conf ~55)
  • Bypass readers (refuted): Codex noted lookup-core.ts:316 and compound-receiver.ts:231 read typeBindings without the workspace fallback, but neither received global copies under the old scheme → no regression. Completeness watch only.

CI: all checks pass.

Automated multi-tool digest (GitNexus swarm + CE personas + Codex), critic-gated. Verify before acting; findings 1–2 are the actionable items and neither blocks the fix for the reported no-namespace case.

Comment thread gitnexus/src/core/ingestion/languages/csharp/namespace-siblings.ts Outdated
// (per-file global typeBindings copy), the 1000→2000 step measured ~3.06×
// for a 2× file increase — failing the <3 sub-quadratic assertion below.
// The workspaceTypeBindings fast-path keeps it ~linear (~1.2×).
const scales = [500, 1000, 2000];

Collaborator Author


[P2 · test gap] This benchmark is the only test exercising the new workspaceTypeBindings path, and it's describe.skipIf(!BENCH_ENABLED) (gated on GITNEXUS_BENCH=1) and asserts timing ratios, not resolved edges (skipGraphPhases: true). So in normal CI nothing guards the new behavior — a regression that broke workspaceTypeBindings population or the findReceiverTypeBinding / followChainPostFinalize fallback would pass.

(The existing always-run csharp-hooks test covers workspaceFqnBindings — a different channel — so it provides zero coverage here.)

Suggested: an always-run unit test that populates workspaceTypeBindings with one entry over an empty scope chain and asserts both walkers resolve through it, plus a first-wins collision case. [code-read]

@github-actions

github-actions Bot commented May 31, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
|---|---|---|
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
|---|---|---|---|---|
| 10514 | 10504 | 0 | 10 | 718s |

✅ All 10504 tests passed

10 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
|---|---|---|---|---|---|
| Statements | 80.27% | 36856/45914 | 79.84% | 📈 +0.4 | 🟢 ████████████████░░░░ |
| Branches | 68.86% | 23486/34103 | 68.5% | 📈 +0.4 | 🟢 █████████████░░░░░░░ |
| Functions | 85.32% | 3813/4469 | 84.94% | 📈 +0.4 | 🟢 █████████████████░░░ |
| Lines | 83.81% | 33183/39589 | 83.36% | 📈 +0.5 | 🟢 ████████████████░░░░ |

📋 View full run · Generated by CI

…ed namespaces (#1871)

#1954 eliminated the namespace-siblings O(files²) OOM only for the global
('' / no-declared-namespace) bucket. A solution with all files under one
named namespace (e.g. file-scoped `namespace Company.Product;`, common in
modern .NET) still reproduced the #1871 blow-up — and in BOTH loops: the
BindingRef per-scope augmentation (#1905's twin) AND the typeBindings
per-file copy (#1954's twin) were each still O(N²) for a named bucket.

Generalize the shared-channel approach to named namespaces:
- Add namespace-keyed channels `namespaceFqnBindings` / `namespaceTypeBindings`
  (the per-namespace analogues of `workspaceFqnBindings` / `workspaceTypeBindings`)
  plus `accessibleNamespacesByScope`, populated ONCE per named bucket from the
  existing `expandedNamespaces` derivation — O(defs), not O(files × defs).
- Make the shared walkers (`findReceiverTypeBinding`, `lookupBindingsAt`,
  `followChainPostFinalize`) namespace-aware: after the per-scope chain and the
  flat global channel miss, consult the per-namespace channels gated by the
  caller module's accessible namespaces. Language-neutral — only the C# hook
  populates the channels; the machinery names no language (AGENTS rule).
- Precedence preserved: local chain → named namespace → global. Named is
  consulted before the flat global channel because pre-#1871 named siblings
  lived in the chain / bindingAugmentations (above the workspace channel), so a
  name in both a named and the global namespace must still resolve named-first.
- `using static` member exposure and the global '' fast-paths are unchanged.

Parity-neutral: `run-parity.ts --language csharp` passes (legacy DAG ==
registry-primary, 218 tests each); the C# resolver suite (386 tests) is green.
Measured: a concentrated named namespace at 500/1000/2000 files now scales
linearly (~0.57×) and ~5.6s at 2000 files, vs the quadratic blow-up before.

Tests:
- New always-run unit coverage for the walker fallbacks
  (namespace-channel-lookup.test.ts): global `workspaceTypeBindings` (the #1954
  channel previously covered only by a gated benchmark), namespace gating /
  no-leak, named-before-global precedence, local shadowing, loop termination.
- Extend the immutability validator + invariant I8 to the new channels and
  `workspaceTypeBindings`; update the `mkIndexes` factory.
- Add a concentrated-NAMED-namespace shape to the C# pipeline benchmark with
  the sub-quadratic scaling assertion and an edge-count sanity check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo merged commit e275826 into main May 31, 2026
33 checks passed
@magyargergo magyargergo deleted the issue-1871 branch May 31, 2026 17:21
prajapatisparsh pushed a commit to prajapatisparsh/GitNexus that referenced this pull request Jun 1, 2026
…n the worker path (abhigyanpatwari#1951)

Registry-primary C# and Java produced zero EXTENDS/IMPLEMENTS edges when the
worker pool was engaged (large repos), so diagrams showed classes and
interfaces with no inheritance edges between them. Small fixtures stayed under
the worker threshold and ran sequentially, where the legacy heritage path is
intact, so the bug was invisible to existing tests.

Root cause: inheritance edges for migrated (registry-primary) languages came
only from the legacy `@heritage.*` -> processHeritage path, and the worker
pipeline drops those legacy artifacts for registry-primary languages via the
`shouldAccumulate` gate (parse-impl.ts). Scope-resolution runs in both parse
modes (worker-safe) but emitted nothing for C#/Java because, unlike C++, they
synthesized no `@reference.inherits` captures.

Fix (the existing C++ pattern): C#/Java now synthesize `@reference.inherits`
captures from their base lists, routing inheritance through scope-resolution.
The generic `preEmitInheritanceEdges` pass now decides EXTENDS vs IMPLEMENTS
from the resolved target's symbol kind (Interface -> IMPLEMENTS), mirroring the
legacy `resolveExtendsType` semantics so the registry path matches the legacy
DAG. C++ has no Interface targets, so it always takes the EXTENDS branch and is
unchanged. Capture scope per language matches the legacy heritage query (C#:
class+interface base lists; Java: class superclass + implemented interfaces) to
preserve scope-resolution parity.

The one-line worker fallback (always accumulating deferredWorkerHeritage) was
deliberately not used: it would resurrect the legacy DAG for migrated
languages, double-emit against C++, and re-introduce the O(files^2) heritage
cost the registry migration removes. No double-emission: in sequential mode the
legacy path emits first and scope-resolution dedups against the graph; in
worker mode the legacy path is dropped and scope-resolution fills the gap.

Touches none of PR abhigyanpatwari#1954's protected files.

Tests: new worker-forced regression (test/integration/heritage-worker-path.test.ts)
that forces the worker pool on small fixtures and asserts the edges; it fails
before this change (0 edges) and passes after. C# and Java scope-resolution
parity gates pass in both legacy and registry-primary modes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

gitnexus analyze hangs/OOMs during [Resolving types (Csharp [2/3] - analyzing types] on large same-namespace codebases

1 participant