fix(csharp): stop spurious IMPORTS edges from ungated using-resolution (#1881)#1908
Merged
Conversation
Types declared in the C# global (default) namespace are visible from every file, so the previous per-scope augmentation materialized O(scopes × defs) BindingRefs — on large Unity solutions (tens of thousands of global types) this caused severe slowness and OOM. Route global-namespace types through a single workspace-level binding channel (workspaceFqnBindings, consulted by lookupBindingsAt) for O(D) memory. Also fix quadratic costs in the non-global path: append defs in place instead of copying (was O(D²) per bucket), pre-index the first scope per file (was O(S²·D)), and seed de-dup sets instead of repeated .some scans. Add csharp-pipeline-benchmark.test.ts (mirrors the PHP benchmark) with spread and concentrated-global-namespace scenarios to track elapsedMs, peakHeapMB, nodeCount, and edgeCount. Post-fix runs show linear scaling and stable heap. Co-authored-by: Cursor <cursoragent@cursor.com>
Worker threads can't return tree-sitter Trees across MessageChannels, so the cross-phase tree cache is empty for worker-parsed files. The C# same-namespace pass (populateCsharpNamespaceSiblings -> extractFileStructure) then re-parsed every file with tree-sitter to find namespace / using-static nodes — effectively parsing a large solution a second time during scope resolution. Add a line-scanner fallback (extractCsharpStructureViaScanner) used only when no cached Tree is available, mirroring PHP's fix for issue #1741. It extracts the same namespaces / usingStaticPaths the AST walk produces for the common line-anchored forms (file-scoped + block namespaces, plain / global / aliased `using static`). The AST walk stays authoritative on the sequential / warm-cache path. Micro-benchmark over 3000 synthetic files: scanner is ~188x faster than parse+walk (0.001 vs 0.251 ms/file) with identical output on the parity spot-check; real-world files are larger, so the worker-path saving is bigger. Adds csharp-namespace-extraction.test.ts (12 cases) covering all declaration forms plus negative cases (using var, plain using, comments). Co-authored-by: Cursor <cursoragent@cursor.com>
… using-static perf Addresses the production-readiness review of the namespace-siblings OOM fix. - Add a unit test proving global-(default-)namespace C# types route to indexes.workspaceFqnBindings (one entry per simple name) with ZERO bindingAugmentations — pinning the O(D) invariant behind the #1871 Unity-scale OOM fix and guarding against a revert to per-scope O(scopes x defs) augmentation. (The csharp-hooks mock now supplies workspaceFqnBindings, which the global fast path reads directly.) - Correct the workspaceFqnBindings doc comment: it is shared by PHP (backslash-FQN keys) and C# (global-namespace simple-name keys); the two key formats are disjoint. - Pre-index parsedFiles by path before the `using static` member-injection loop, replacing an O(files) find-per-import with an O(1) Map lookup. Verified: tsc --noEmit clean; csharp-hooks + csharp-namespace-extraction suites pass (38 tests); prettier clean; eslint 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es, docs) Addresses the multi-agent code review of this PR — the concrete, defensible findings. Two items intentionally deferred (below). - namespace-siblings.ts: couple the augmentation bucket + its de-dup set into one nullable lifecycle, removing the seen!/bucketArr! non-null assertions (identical runtime, still lazy). - validate-bindings-immutability.ts: extend the dev-mode immutability validator to the third channel (workspaceFqnBindings) + a test; complete the validator test mock with workspaceFqnBindings. - walkers.ts: document that namesAtScope deliberately excludes the scope-independent workspaceFqnBindings channel (enumerating workspace names at every scope would flood per-scope callers; lookupBindingsAt still consults it when resolving a specific name). - scope-resolution-indexes.ts: reframe the workspaceFqnBindings doc to describe the key-format contract language-neutrally (examples, not language branching). - csharp-hooks.test.ts: assert workspace entries carry origin:'namespace'; add a partial-class test (same simple name, distinct nodeIds across global files → both kept); rename the stale "parses" cache-miss test to "scans". - csharp-pipeline-benchmark.test.ts: clearTimeout the Promise.race budget timer (dangling handle when the pipeline won the race). - csharp.test.ts: correct the #1066 comment — extractFileStructure no longer re-parses on cache miss (line scanner); only emitCsharpScopeCaptures re-parses. Deferred (surfaced, not applied): (1) worker-path scanner mis-reads namespace/using-static inside block comments and verbatim/raw strings — an explicitly documented trade-off mirroring the PHP scanner; hardening it to track comment/string state is a separate decision. (2) workspaceFqnBindings is read via an `as Map` cast; a type-safe mutable handle from finalize-orchestrator is a cross-module contract change. Verified: tsc --noEmit clean; 49 unit tests pass (incl. 3 new); prettier clean; eslint 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses the two deferred PR-review findings plus the remaining test gap. #1 — Worker-path scanner false positives: the line scanner now tracks block- comment and string state across lines (advanceCsScanState), so a `namespace` / `using static` keyword at the start of a line inside a block comment, verbatim string (@"..."), or raw string literal ("""...""") is no longer mistaken for a declaration on the worker cache-miss path. It matches only at code-state line starts. 5 new scanner tests cover the block-comment / raw / verbatim cases. #4 — workspaceFqnBindings type safety: the ReadonlyMap->Map cast is localized to one documented line, and global-namespace writes go through a new getWorkspaceBucket helper (mirroring getAugmentationBucket) rather than an inline `.set()` at the mutation site. #2 — lookupBindingsAt workspace-channel coverage: walkers-augmentations.test.ts now exercises the third (workspace) channel: workspace-only, append-after- finalized/augmented, and dedup-loses-to-finalized/augmented precedence. #5 — OOM CI guard: the deterministic O(D) invariant (zero per-scope augmentation for global types) is already asserted by the always-on csharp-hooks unit tests added earlier; the scale/time benchmark stays appropriately opt-in (skipIf). Verified: tsc --noEmit clean; 69 unit tests (4 suites) + 210 C# integration resolver tests pass; prettier clean; eslint 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The using-static member-injection loop and the cross-namespace import loop both de-duped via `bucketArr.some((b) => b.def.nodeId === ...)` — O(A) per item. Both now use a per-file `Map<simpleName, Set<nodeId>>`, seeded lazily from the augmentation bucket (capturing entries from earlier passes), matching the global and named-namespace paths. Same dedup semantics, O(1) amortized. Verified: tsc --noEmit clean; csharp-hooks unit (27) + C# integration resolver (210) tests pass; prettier + eslint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…paces (#1881) C# `using` directives were resolving via an ungated suffix match, so a BCL using like `System.Threading.Tasks` matched a coincidental local `Tasks.cs` and emitted spurious IMPORTS edges. Add a declared-namespace gate that only permits suffix-fallback when the import plausibly refers to an in-repo namespace (exact, immediate-parent-declared, or ancestor-of a declared namespace anchored at an in-repo root). Both resolution legs — the legacy DAG and the registry-primary scope resolver — thread the same evidence to the gate, including the no-csproj path. Declared namespaces are collected with #1905's comment/string-aware scanner (extractCsharpStructureViaScanner, lazily imported) instead of a regex, so `namespace` tokens in comments/strings can't seed phantom namespaces. Scan truncation or unreadable subtrees fail OPEN (gate disabled) and are logged. Stacked on #1905 (fix/csharp-namespace-scope-oom). Co-authored-by: Cursor <cursoragent@cursor.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Contributor
✨ PR AutofixFound fixable formatting / unused-import issues across 237 changed lines. Comment |
Contributor
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10237 tests passed 7 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
…1881) scanCSharpProject read every .cs/.csproj in full with no size guard and issued per-directory reads with no concurrency bound, an OOM/FD-exhaustion vector on large or generated repos. Add an fs.stat size guard before each read, reusing getMaxFileSizeBytes() (the same 512KB cap the Phase-1 walker uses). An oversized or unreadable .cs now signals truncation so the #1881 suffix-fallback gate fails OPEN rather than wrongly suppressing an import whose declaring namespace lived in the skipped file (previously a silent return left the scan looking complete). Adds a size-cap scan test.
…1881) The scan issued every .cs/.csproj read in a directory at once via Promise.all, so in-flight file descriptors scaled with the largest directory's file count. Issue reads in bounded windows (32, mirroring the Phase-1 filesystem-walker) via Promise.allSettled; an unexpected read/scan rejection now trips truncation (fail open) instead of rejecting the whole scan. Behavior-preserving for namespace collection (C# scope-resolution parity passes on both legs).
…gate (#1908) Reflow hand-wrapped lines in scope-resolver.ts and the csharp integration test that prettier collapses under printWidth 100. Formatting only, no behavioral change; clears the failing quality/format CI gate.
…able the #1881 gate (#1908) Code-review follow-up. The scan read each .cs fully into a string behind a 512KB size cap (the tree-sitter parse budget); a single larger generated file (*.g.cs, EF/gRPC output) tripped `truncated`, making the #1881 suffix-fallback gate fail open repo-wide and silently undoing the fix on real repos. Stream each .cs line-by-line via createReadStream + readline into a new incremental scanner (createCsharpStructureScanner) instead of buffering the whole file. Memory is now constant regardless of file size, so the per-file size cap is dropped for the namespace line-scan and large generated files are fully collected. extractCsharpStructureViaScanner is reimplemented on the same incremental scanner (byte-identical; C# parity 2/2). collectDeclaredNamespaces returns 'ok' | 'truncated' (truncation now only from an unreadable file) and the truncation warn lists its real causes. csproj reads keep their size guard. Prior art: ripgrep/ctags/Node readline stream rather than cap for line scans; GitHub (384KB) and Sourcegraph (1MB) cap only their full-content indexes.
…r CodeQL TOCTOU (#1908) CodeQL js/file-system-race flagged the fs.stat + fs.readFile size guard in readCsprojConfig as a check-then-use filesystem race. Replace it with a length-capped createReadStream (readFileTextCapped) — same memory bound on untrusted input, no stat-then-read race, and consistent with the streamed .cs scan. Behavior is unchanged for real .csproj files (parity 2/2).
…1908, Codex F1) A single scan truncation (unreadable dir/file, depth/dir cap) set one repo-wide `truncated` flag that made csharpSuffixFallbackAllowed fail open for EVERY import, silently re-enabling the #1881 BCL->local suffix matches. Add a CSHARP_EXTERNAL_ROOTS denylist (System/Microsoft/...): an external-rooted using that does not align with an in-repo declared namespace stays BLOCKED even under truncation, while genuinely local-looking usings still fail open. A repo that declares the root is allowed via the alignment escape hatch. Shared predicate, so both legs inherit it.
…dex F2) In the no-csproj branch of resolveCsharpImportTarget, resolveDirectMatch ran BEFORE the gate, so a path-aligned Legacy/System/Threading/Tasks.cs satisfied 'using System.Threading.Tasks;' even though System.* is not a declared in-repo namespace — while the legacy leg (gate-first) blocked it, so the legs were not equivalent. Run csharpSuffixFallbackAllowed first (return null on fail), then direct-match, then progressive stripping — mirroring the legacy ordering. Adds a no-csproj fixture with a deep path-aligned Tasks.cs and dual-leg integration describes (registry + forced-legacy), plus a path-aligned unit case. Parity 2/2.
… matchers (#1908, Codex F3) The line scanner treated its output as complete even when it missed valid C# namespace forms, so the gate failed CLOSED and over-blocked legit imports. Make CS_NAMESPACE_RE/CS_USING_STATIC_RE Unicode-aware (\p{L}\p{N} + u flag) and strip leading/segment @ so verbatim/Unicode identifiers are captured to match the AST. For forms the regex still can't capture (split across lines, not at line start, attributed), set a per-file 'incomplete' flag; collectDeclaredNamespaces returns 'truncated' for such files so the #1881 gate fails OPEN instead of dropping the namespace. High-precision detectors + guard tests keep ordinary forms (incl. // namespace comments) from tripping incomplete.
… Codex F4) readCsprojConfig read only the first 512KB of a .csproj and, on a match-miss, couldn't tell 'no RootNamespace' from 'RootNamespace past the cap' — both synthesized a filename root. A wrong authoritative root makes imports under the real root resolve to nothing AND suppresses the fallback. Replace the capped read with a streamed early-stop search (findCsprojRootNamespace) that reads until the tag or EOF: filename fallback ONLY on genuine read-to-EOF absence; on a soft-budget cap-hit or unreadable file, OMIT the config so the no-csproj fallback stays reachable. Removes the now-unused readFileTextCapped + getMaxFileSizeBytes cap from the scan. Parity 2/2.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1881: C#
usingdirectives were emitting spuriousIMPORTSedges to unrelated in-repo files. The import resolver's suffix-match fallback was ungated, so a BCL using such asusing System.Threading.Tasks;matched a coincidental localTasks.cs.csharp-namespace-gate.ts): suffix-fallback resolution is allowed only when the import plausibly refers to an in-repo namespace — exact match, immediate-parent declared, or strict ancestor of a declared namespace anchored at an in-repo root. The immediate-parent anchor is what stops a declared BCL prefix (e.g.namespace System;) from green-lighting unrelated BCL usings.loadImportConfigs) and the registry-primary scope resolver (loadCsharpResolutionConfig), including the no-csproj path (now stops the chain for non-aligned imports instead of falling through to the ungated generic strategy).extractCsharpStructureViaScanner, lazily + memoized-imported) instead of a regex, sonamespacetokens in comments/strings can't seed phantom namespaces. A singlecsharpScanToEvidenceprojection feeds both carriers so they can't drift.logger.warn, so a too-small cap can't silently suppress edges as a mystery regression.Changes
csharp-namespace-gate.ts(new) — pure gate predicates (importAlignsWithDeclaredNamespaces,csharpSuffixFallbackAllowed).language-config.ts— one-passscanCSharpProjectreuses the fix(csharp): eliminate namespace-siblings OOM and worker-path re-parse #1905 scanner; addscsharpScanToEvidence; truncation observability + fail-open onreaddirfailure.languages/csharp/resolution-config.ts(new) —loadCsharpResolutionConfigfor the registry leg.languages/csharp/import-target.ts,scope-resolver.ts— thread evidence through the registry leg.import-resolvers/{csharp.ts, configs/csharp.ts, types.ts}— gate the legacy leg incl. no-csproj parity.csharp-spurious-edgesfixture (covers BCL-vs-local collision both legs).Test plan
npx tsc --noEmit— cleanscope-resolution/csharp/*,import-resolver-factory) — 109 passresolvers/csharp.test.ts, both legs + fix(csharp): eliminate namespace-siblings OOM and worker-path re-parse #1905 namespace-siblings) — 214 passdist/; local worktree can't build — siblinggitnexus-sharednot installed, so CLI-spawn unit tests are skipped locally)Follow-up
Double-scan perf (one
scanCSharpProjectper analyze across both legs) intentionally not merged here: skippingloadImportConfigs' scan would alterimportMap(populated byaddImportEdgeregardless of registry-primary). Best coordinated in #1905 where the scan lifecycle is being reworked.Made with Cursor