fix(csharp): eliminate global-namespace typeBindings O(files²) OOM (#1871)#1954
Conversation
…1871) Large C# solutions with tens of thousands of files in the global (unnamed) namespace OOM'd / hung for hours at "Resolving types (Csharp 2/3)". PR #1905 fixed the BindingRef twin of this via the `workspaceFqnBindings` fast-path, but left the typeBindings propagation loop in `populateCsharpNamespaceSiblings` untouched: it copies every global file's module-scope return-type bindings into every OTHER global file's `Scope.typeBindings`. With S files in the `''` bucket and K distinct method names, that is O(S²) time and O(S·K) memory — ~1.3B Map entries (~65-130 GB) at 36k files. Measured on a concentrated global-namespace fixture: the per-file copy went quadratic (1000→2000 files = 3.06× for 2× the files, 65s at 2000). Route global-namespace module typeBindings through a new scope-independent `workspaceTypeBindings` channel populated ONCE (O(K)) and consulted as a fallback by the typeBindings chain-walkers (`findReceiverTypeBinding`, `followChainPostFinalize`), instead of the per-file copy. After: 2000 files 6.3s, 4000 files 6.8s, heap linear. This also makes resolution MORE correct, not just faster. The C# spec makes the unnamed namespace a single declaration space whose members are "available for use in a named namespace", so global types are visible from every file. The old per-file copy only exposed them to OTHER no-namespace files; named-namespace files never saw them. Consulting the shared channel from every scope chain mirrors how Roslyn resolves against a single `Compilation.GlobalNamespace` symbol rather than copying symbols per file. Strengthen csharp-pipeline-benchmark.test.ts so it would catch this: give each file a unique method name (a shared name collapses the module-typeBinding key and skips all copies, hiding the blow-up) and raise the concentrated scales to 2000 so the sub-quadratic assertion trips on the regression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
magyargergo
left a comment
There was a problem hiding this comment.
PR Tri-Review: #1954 — eliminate global-namespace typeBindings O(files²) OOM
Methods (3 engines). GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, performance, maintainability, testing) = 7 Claude lanes, plus Codex (live, structured output) — the one genuinely independent engine. The two Claude families share priors, so the strong signal below is Codex + a Claude lane agreeing; Claude-only agreement is "consistent across personas." Findings 1–2 were also coordinator-verified by code-read, and gated through the synthesis critic.
Verdict: sound and mergeable for the reported case. CI is all-green — including typecheck, lint, format, and scope-parity (the broadening did not drift parity snapshots). The core fix is correct and the perf claim holds (concentrated-global 2000 files 65s→6.3s, heap linear).
Validated — credit where due
Codex and the Claude correctness/adversarial lanes independently refuted the main hazards, with file:line traces I re-confirmed:
- No cross-language contamination.
findReceiverTypeBindingis shared, butscopeResolutionPhaseruns resolution per language and eachrunScopeResolutionbuilds a freshworkspaceTypeBindings: new Map()(finalize-orchestrator.ts:148); only C# populates it. A PHP/JS/Go receiver can never reach a C# global binding. (Codex high + adversarial) - No mis-shadowing. Both walkers consult the per-scope chain first; the workspace map is a miss-only fallback, so local/named bindings still win. (Codex high + correctness)
- No infinite loop in
followChainPostFinalize—ws !== current+visitedset + depth cap bound it. (Codex high + correctness) - Broadening is C#-spec-correct (global types visible from named-namespace files) and parity-clean.
Findings (both P2, non-blocking — see inline)
- Single large named namespace still hits the same O(files²) (
namespace-siblings.ts:530). The fix special-cases only the''bucket; the per-file copy still runs for every named bucket. Materializes when method return-type names are unique across files (common case). The "only bucket that grows to every-file size" comment is wrong for a concentrated named namespace. Pre-existing, not a regression. - New fallback only guarded by a gated benchmark (
csharp-pipeline-benchmark.test.ts:246). It'sskipIf(!GITNEXUS_BENCH)and asserts timing not edges, so nothing in normal CI guardsworkspaceTypeBindingspopulation or the walker fallback.
Lower-priority / body-only
- Consistency: the new channel is absent from
validate-bindings-immutability.ts(which validatesworkspaceFqnBindings), and invariant I8 incontract/scope-resolver.tsstill says "two-channel" though there are now four. Minor; partly pre-existing drift since #1905. (maintainability) - Ordering nuance (low confidence):
populateNamespaceSiblingsruns beforepropagateImportedReturnTypes(run.ts:355vs377), so the channel can hold pre-collapse return-type refs → at worst under-resolution (a missing edge), never a wrong edge;followChainPostFinalizepartially re-follows. Likely negligible for single-hop return bindings; a multi-hop global-alias fixture would settle it. (adversarial, conf ~55) - Bypass readers (refuted): Codex noted
lookup-core.ts:316andcompound-receiver.ts:231readtypeBindingswithout the workspace fallback, but neither received global copies under the old scheme → no regression. Completeness watch only.
CI: all checks pass.
Automated multi-tool digest (GitNexus swarm + CE personas + Codex), critic-gated. Verify before acting; findings 1–2 are the actionable items and neither blocks the fix for the reported no-namespace case.
| // (per-file global typeBindings copy), the 1000→2000 step measured ~3.06× | ||
| // for a 2× file increase — failing the <3 sub-quadratic assertion below. | ||
| // The workspaceTypeBindings fast-path keeps it ~linear (~1.2×). | ||
| const scales = [500, 1000, 2000]; |
There was a problem hiding this comment.
[P2 · test gap] This benchmark is the only test exercising the new workspaceTypeBindings path, and it's describe.skipIf(!BENCH_ENABLED) (gated on GITNEXUS_BENCH=1) and asserts timing ratios, not resolved edges (skipGraphPhases: true). So in normal CI nothing guards the new behavior — a regression that broke workspaceTypeBindings population or the findReceiverTypeBinding / followChainPostFinalize fallback would pass.
(The existing always-run csharp-hooks test covers workspaceFqnBindings — a different channel — so it provides zero coverage here.)
Suggested: an always-run unit test that populates workspaceTypeBindings with one entry over an empty scope chain and asserts both walkers resolve through it, plus a first-wins collision case. [code-read]
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10504 tests passed 10 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
…ed namespaces (#1871) #1954 eliminated the namespace-siblings O(files²) OOM only for the global ('' / no-declared-namespace) bucket. A solution with all files under one named namespace (e.g. file-scoped `namespace Company.Product;`, common in modern .NET) still reproduced the #1871 blow-up — and in BOTH loops: the BindingRef per-scope augmentation (#1905's twin) AND the typeBindings per-file copy (#1954's twin) were each still O(N²) for a named bucket. Generalize the shared-channel approach to named namespaces: - Add namespace-keyed channels `namespaceFqnBindings` / `namespaceTypeBindings` (the per-namespace analogues of `workspaceFqnBindings` / `workspaceTypeBindings`) plus `accessibleNamespacesByScope`, populated ONCE per named bucket from the existing `expandedNamespaces` derivation — O(defs), not O(files × defs). - Make the shared walkers (`findReceiverTypeBinding`, `lookupBindingsAt`, `followChainPostFinalize`) namespace-aware: after the per-scope chain and the flat global channel miss, consult the per-namespace channels gated by the caller module's accessible namespaces. Language-neutral — only the C# hook populates the channels; the machinery names no language (AGENTS rule). - Precedence preserved: local chain → named namespace → global. Named is consulted before the flat global channel because pre-#1871 named siblings lived in the chain / bindingAugmentations (above the workspace channel), so a name in both a named and the global namespace must still resolve named-first. - `using static` member exposure and the global '' fast-paths are unchanged. Parity-neutral: `run-parity.ts --language csharp` passes (legacy DAG == registry-primary, 218 tests each); the C# resolver suite (386 tests) is green. Measured: a concentrated named namespace at 500/1000/2000 files now scales linearly (~0.57×) and ~5.6s at 2000 files, vs the quadratic blow-up before. Tests: - New always-run unit coverage for the walker fallbacks (namespace-channel-lookup.test.ts): global `workspaceTypeBindings` (the #1954 channel previously covered only by a gated benchmark), namespace gating / no-leak, named-before-global precedence, local shadowing, loop termination. - Extend the immutability validator + invariant I8 to the new channels and `workspaceTypeBindings`; update the `mkIndexes` factory. - Add a concentrated-NAMED-namespace shape to the C# pipeline benchmark with the sub-quadratic scaling assertion and an edge-count sanity check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n the worker path (abhigyanpatwari#1951) Registry-primary C# and Java produced zero EXTENDS/IMPLEMENTS edges when the worker pool was engaged (large repos), so diagrams showed classes and interfaces with no inheritance edges between them. Small fixtures stayed under the worker threshold and ran sequentially, where the legacy heritage path is intact, so the bug was invisible to existing tests. Root cause: inheritance edges for migrated (registry-primary) languages came only from the legacy `@heritage.*` -> processHeritage path, and the worker pipeline drops those legacy artifacts for registry-primary languages via the `shouldAccumulate` gate (parse-impl.ts). Scope-resolution runs in both parse modes (worker-safe) but emitted nothing for C#/Java because, unlike C++, they synthesized no `@reference.inherits` captures. Fix (the existing C++ pattern): C#/Java now synthesize `@reference.inherits` captures from their base lists, routing inheritance through scope-resolution. The generic `preEmitInheritanceEdges` pass now decides EXTENDS vs IMPLEMENTS from the resolved target's symbol kind (Interface -> IMPLEMENTS), mirroring the legacy `resolveExtendsType` semantics so the registry path matches the legacy DAG. C++ has no Interface targets, so it always takes the EXTENDS branch and is unchanged. Capture scope per language matches the legacy heritage query (C#: class+interface base lists; Java: class superclass + implemented interfaces) to preserve scope-resolution parity. The one-line worker fallback (always accumulating deferredWorkerHeritage) was deliberately not used: it would resurrect the legacy DAG for migrated languages, double-emit against C++, and re-introduce the O(files^2) heritage cost the registry migration removes. No double-emission: in sequential mode the legacy path emits first and scope-resolution dedups against the graph; in worker mode the legacy path is dropped and scope-resolution fills the gap. Touches none of PR abhigyanpatwari#1954's protected files. Tests: new worker-forced regression (test/integration/heritage-worker-path.test.ts) that forces the worker pool on small fixtures and asserts the edges; it fails before this change (0 edges) and passes after. C# and Java scope-resolution parity gates pass in both legacy and registry-primary modes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Fixes #1871 —
gitnexus analyzehangs for hours and OOMs at "Resolving types (Csharp [2/3] — analyzing types)" on large C# codebases with tens of thousands of files in the global (no-declared-namespace) namespace.PR #1905 fixed the BindingRef twin of this via the
workspaceFqnBindingsfast-path, but left the typeBindings propagation loop inpopulateCsharpNamespaceSiblingsuntouched. That loop copies every global file's module-scope return-type bindings into every other global file'sScope.typeBindings:''bucket and K distinct method names → O(S²) time and O(S·K) memoryREGISTRY_PRIMARY_CSHARP=0sidesteps it).Root cause, measured
Instrumented a concentrated global-namespace fixture (unique method names = the real-world case):
Fix
Route global-namespace module typeBindings through a new scope-independent
workspaceTypeBindingschannel, populated once (O(K)) from the''bucket and consulted as a fallback by the two typeBindings chain-walkers (findReceiverTypeBinding,followChainPostFinalize) — instead of copying per file. This is the typeBindings analogue of #1905'sworkspaceFqnBindings.After: 2000 files 6.3s (was 65s), 4000 files 6.8s, heap linear.
Faster and more correct
The C# spec makes the unnamed namespace a single declaration space whose members are "available for use in a named namespace" — global types are visible from every file. The old per-file copy only exposed them to other no-namespace files; named-namespace files never saw them (a latent correctness bug). Consulting one shared channel from every scope chain mirrors how Roslyn resolves against a single
Compilation.GlobalNamespacesymbol rather than copying symbols per file.Regression guard
Strengthened
csharp-pipeline-benchmark.test.tsso it would actually catch this:< 3×sub-quadratic assertion trips on the regression (pre-fix 1000→2000 was 3.06×).Validation
npm run build(tsc) ✅Files
6 files, +90/−16:
model/scope-resolution-indexes.ts— newworkspaceTypeBindingschannelfinalize-orchestrator.ts— initlanguages/csharp/namespace-siblings.ts— populate once, skip''in per-file copyscope-resolution/scope/walkers.ts+scope-resolution/passes/imported-return-types.ts— fallback in the two chain-walkerstest/integration/csharp-pipeline-benchmark.test.ts— regression guard🤖 Generated with Claude Code
Update — named-namespace generalization folded in (#1871 follow-up)
The original commit fixed the OOM only for the global (
''/ no-declared-namespace) bucket. The tri-review flagged, and a follow-up plan confirmed, that a solution with all files under one named namespace (e.g. file-scopednamespace Company.Product;, common in modern .NET) reproduced the same O(files²) blow-up — in both loops: theBindingRefper-scope augmentation (#1905's twin) and the typeBindings per-file copy. This PR now ships the complete fix (global + named).What was added (second commit):
namespaceFqnBindings/namespaceTypeBindings(per-namespace analogues of the flatworkspaceFqnBindings/workspaceTypeBindings) + anaccessibleNamespacesByScopegate, populated once per named bucket from the existingexpandedNamespacesderivation — O(defs), not O(files × defs).findReceiverTypeBinding,lookupBindingsAt,followChainPostFinalize) are now namespace-aware: after the per-scope chain and the flat global channel miss, they consult the per-namespace channels gated by the caller module's accessible namespaces. Language-neutral — only the C# hook populates the channels (AGENTS rule).using staticmember exposure and the global''fast-paths are unchanged.Parity-neutral —
scripts/run-parity.ts --language csharppasses (legacy DAG == registry-primary, 218 tests each); C# resolver suite 386 tests green; scope-resolution unit suite 1104 green; new always-run walker-fallback coverage added (namespace-channel-lookup.test.ts), including theworkspaceTypeBindingschannel that previously had only gated-benchmark coverage.Measured (concentrated named namespace, unique method names):
(vs. the quadratic 65s-at-2000 the global case showed pre-fix; edges scale linearly, confirming cross-file resolution still works.)
Post-Deploy Monitoring & Validation
gitnexus analyzecompletes on large C# repos (single namespace or no namespace) in roughly linear time; no OOM at the "Resolving types (Csharp 2/3)" stage; heap stays bounded.scope-parityCI job (must stay green — output-neutrality guarantee).REGISTRY_PRIMARY_CSHARP=0remains the escape hatch; revert the two#1871commits to fall back to per-file copying.