Skip to content

fix(csharp): stop spurious IMPORTS edges from ungated using-resolution (#1881)#1908

Merged
magyargergo merged 27 commits into
mainfrom
fix/1881-csharp-unrelated-edges
May 30, 2026
Merged

fix(csharp): stop spurious IMPORTS edges from ungated using-resolution (#1881)#1908
magyargergo merged 27 commits into
mainfrom
fix/1881-csharp-unrelated-edges

Conversation

@magyargergo

Copy link
Copy Markdown
Collaborator

Summary

Fixes #1881: C# using directives were emitting spurious IMPORTS edges to unrelated in-repo files. The import resolver's suffix-match fallback was ungated, so a BCL using such as using System.Threading.Tasks; matched a coincidental local Tasks.cs.

  • Adds a declared-namespace gate (csharp-namespace-gate.ts): suffix-fallback resolution is allowed only when the import plausibly refers to an in-repo namespace — exact match, immediate-parent declared, or strict ancestor of a declared namespace anchored at an in-repo root. The immediate-parent anchor is what stops a declared BCL prefix (e.g. namespace System;) from green-lighting unrelated BCL usings.
  • Both resolution legs thread the same evidence to the gate: the legacy DAG (loadImportConfigs) and the registry-primary scope resolver (loadCsharpResolutionConfig), including the no-csproj path (now stops the chain for non-aligned imports instead of falling through to the ungated generic strategy).
  • Declared namespaces are collected with fix(csharp): eliminate namespace-siblings OOM and worker-path re-parse #1905's comment/string-aware scanner (extractCsharpStructureViaScanner, lazily + memoized-imported) instead of a regex, so namespace tokens in comments/strings can't seed phantom namespaces. A single csharpScanToEvidence projection feeds both carriers so they can't drift.
  • Fail-open safety: scan truncation (dir/depth caps) or unreadable subtrees disable the gate (allow fallback) and emit a logger.warn, so a too-small cap can't silently suppress edges as a mystery regression.

⚠️ Stacked on #1905 (fix/csharp-namespace-scope-oom) — base is set there so the diff is #1881-only. Retarget to main once #1905 merges.

Changes

  • csharp-namespace-gate.ts (new) — pure gate predicates (importAlignsWithDeclaredNamespaces, csharpSuffixFallbackAllowed).
  • language-config.ts — one-pass scanCSharpProject reuses the fix(csharp): eliminate namespace-siblings OOM and worker-path re-parse #1905 scanner; adds csharpScanToEvidence; truncation observability + fail-open on readdir failure.
  • languages/csharp/resolution-config.ts (new)loadCsharpResolutionConfig for the registry leg.
  • languages/csharp/import-target.ts, scope-resolver.ts — thread evidence through the registry leg.
  • import-resolvers/{csharp.ts, configs/csharp.ts, types.ts} — gate the legacy leg incl. no-csproj parity.
  • Tests + a csharp-spurious-edges fixture (covers BCL-vs-local collision both legs).

Test plan

  • npx tsc --noEmit — clean
  • C# unit (scope-resolution/csharp/*, import-resolver-factory) — 109 pass
  • C# integration (resolvers/csharp.test.ts, both legs + fix(csharp): eliminate namespace-siblings OOM and worker-path re-parse #1905 namespace-siblings) — 214 pass
  • CI full suite (builds dist/; local worktree can't build — sibling gitnexus-shared not installed, so CLI-spawn unit tests are skipped locally)

Follow-up

Double-scan perf (one scanCSharpProject per analyze across both legs) intentionally not merged here: skipping loadImportConfigs' scan would alter importMap (populated by addImportEdge regardless of registry-primary). Best coordinated in #1905 where the scan lifecycle is being reworked.

Made with Cursor

magyargergo and others added 12 commits May 29, 2026 13:49
Types declared in the C# global (default) namespace are visible from
every file, so the previous per-scope augmentation materialized
O(scopes × defs) BindingRefs — on large Unity solutions (tens of
thousands of global types) this caused severe slowness and OOM.

Route global-namespace types through a single workspace-level binding
channel (workspaceFqnBindings, consulted by lookupBindingsAt) for O(D)
memory. Also fix quadratic costs in the non-global path: append defs in
place instead of copying (was O(D²) per bucket), pre-index the first
scope per file (was O(S²·D)), and seed de-dup sets instead of repeated
.some scans.

Add csharp-pipeline-benchmark.test.ts (mirrors the PHP benchmark) with
spread and concentrated-global-namespace scenarios to track elapsedMs,
peakHeapMB, nodeCount, and edgeCount. Post-fix runs show linear scaling
and stable heap.

Co-authored-by: Cursor <cursoragent@cursor.com>
Worker threads can't return tree-sitter Trees across MessageChannels, so
the cross-phase tree cache is empty for worker-parsed files. The C#
same-namespace pass (populateCsharpNamespaceSiblings -> extractFileStructure)
then re-parsed every file with tree-sitter to find namespace / using-static
nodes — effectively parsing a large solution a second time during scope
resolution.

Add a line-scanner fallback (extractCsharpStructureViaScanner) used only
when no cached Tree is available, mirroring PHP's fix for issue #1741. It
extracts the same namespaces / usingStaticPaths the AST walk produces for
the common line-anchored forms (file-scoped + block namespaces, plain /
global / aliased `using static`). The AST walk stays authoritative on the
sequential / warm-cache path.

Micro-benchmark over 3000 synthetic files: scanner is ~188x faster than
parse+walk (0.001 vs 0.251 ms/file) with identical output on the parity
spot-check; real-world files are larger, so the worker-path saving is
bigger. Adds csharp-namespace-extraction.test.ts (12 cases) covering all
declaration forms plus negative cases (using var, plain using, comments).

Co-authored-by: Cursor <cursoragent@cursor.com>
… using-static perf

Addresses the production-readiness review of the namespace-siblings OOM fix.

- Add a unit test proving global-(default-)namespace C# types route to
  indexes.workspaceFqnBindings (one entry per simple name) with ZERO
  bindingAugmentations — pinning the O(D) invariant behind the #1871
  Unity-scale OOM fix and guarding against a revert to per-scope
  O(scopes x defs) augmentation. (The csharp-hooks mock now supplies
  workspaceFqnBindings, which the global fast path reads directly.)
- Correct the workspaceFqnBindings doc comment: it is shared by PHP
  (backslash-FQN keys) and C# (global-namespace simple-name keys); the two
  key formats are disjoint.
- Pre-index parsedFiles by path before the `using static` member-injection
  loop, replacing an O(files) find-per-import with an O(1) Map lookup.

Verified: tsc --noEmit clean; csharp-hooks + csharp-namespace-extraction
suites pass (38 tests); prettier clean; eslint 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es, docs)

Addresses the multi-agent code review of this PR — the concrete, defensible
findings. Two items intentionally deferred (below).

- namespace-siblings.ts: couple the augmentation bucket + its de-dup set into
  one nullable lifecycle, removing the seen!/bucketArr! non-null assertions
  (identical runtime, still lazy).
- validate-bindings-immutability.ts: extend the dev-mode immutability validator
  to the third channel (workspaceFqnBindings) + a test; complete the validator
  test mock with workspaceFqnBindings.
- walkers.ts: document that namesAtScope deliberately excludes the
  scope-independent workspaceFqnBindings channel (enumerating workspace names at
  every scope would flood per-scope callers; lookupBindingsAt still consults it
  when resolving a specific name).
- scope-resolution-indexes.ts: reframe the workspaceFqnBindings doc to describe
  the key-format contract language-neutrally (examples, not language branching).
- csharp-hooks.test.ts: assert workspace entries carry origin:'namespace'; add a
  partial-class test (same simple name, distinct nodeIds across global files →
  both kept); rename the stale "parses" cache-miss test to "scans".
- csharp-pipeline-benchmark.test.ts: clearTimeout the Promise.race budget timer
  (dangling handle when the pipeline won the race).
- csharp.test.ts: correct the #1066 comment — extractFileStructure no longer
  re-parses on cache miss (line scanner); only emitCsharpScopeCaptures re-parses.

Deferred (surfaced, not applied): (1) worker-path scanner mis-reads
namespace/using-static inside block comments and verbatim/raw strings — an
explicitly documented trade-off mirroring the PHP scanner; hardening it to track
comment/string state is a separate decision. (2) workspaceFqnBindings is read
via an `as Map` cast; a type-safe mutable handle from finalize-orchestrator is a
cross-module contract change.

Verified: tsc --noEmit clean; 49 unit tests pass (incl. 3 new); prettier clean;
eslint 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses the two deferred PR-review findings plus the remaining test gap.

#1 — Worker-path scanner false positives: the line scanner now tracks block-
comment and string state across lines (advanceCsScanState), so a `namespace` /
`using static` keyword at the start of a line inside a block comment, verbatim
string (@"..."), or raw string literal ("""...""") is no longer mistaken for a
declaration on the worker cache-miss path. It matches only at code-state line
starts. 5 new scanner tests cover the block-comment / raw / verbatim cases.

#4 — workspaceFqnBindings type safety: the ReadonlyMap->Map cast is localized
to one documented line, and global-namespace writes go through a new
getWorkspaceBucket helper (mirroring getAugmentationBucket) rather than an
inline `.set()` at the mutation site.

#2 — lookupBindingsAt workspace-channel coverage: walkers-augmentations.test.ts
now exercises the third (workspace) channel: workspace-only, append-after-
finalized/augmented, and dedup-loses-to-finalized/augmented precedence.

#5 — OOM CI guard: the deterministic O(D) invariant (zero per-scope
augmentation for global types) is already asserted by the always-on
csharp-hooks unit tests added earlier; the scale/time benchmark stays
appropriately opt-in (skipIf).

Verified: tsc --noEmit clean; 69 unit tests (4 suites) + 210 C# integration
resolver tests pass; prettier clean; eslint 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The using-static member-injection loop and the cross-namespace import loop both
de-duped via `bucketArr.some((b) => b.def.nodeId === ...)` — O(A) per item. Both
now use a per-file `Map<simpleName, Set<nodeId>>`, seeded lazily from the
augmentation bucket (capturing entries from earlier passes), matching the
global and named-namespace paths. Same dedup semantics, O(1) amortized.

Verified: tsc --noEmit clean; csharp-hooks unit (27) + C# integration resolver
(210) tests pass; prettier + eslint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…paces (#1881)

C# `using` directives were resolving via an ungated suffix match, so a BCL
using like `System.Threading.Tasks` matched a coincidental local `Tasks.cs`
and emitted spurious IMPORTS edges. Add a declared-namespace gate that only
permits suffix-fallback when the import plausibly refers to an in-repo
namespace (exact, immediate-parent-declared, or ancestor-of a declared
namespace anchored at an in-repo root). Both resolution legs — the legacy
DAG and the registry-primary scope resolver — thread the same evidence to
the gate, including the no-csproj path.

Declared namespaces are collected with #1905's comment/string-aware scanner
(extractCsharpStructureViaScanner, lazily imported) instead of a regex, so
`namespace` tokens in comments/strings can't seed phantom namespaces. Scan
truncation or unreadable subtrees fail OPEN (gate disabled) and are logged.

Stacked on #1905 (fix/csharp-namespace-scope-oom).

Co-authored-by: Cursor <cursoragent@cursor.com>
@vercel

vercel Bot commented May 29, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment May 30, 2026 8:32am

Request Review

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 237 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":1908,"changed_lines":237,"head_sha":"2f7ef9154242f2ea4caabe9396c01f8d6355f8ab","run_id":"26657669554","apply_command":"/autofix"}

Base automatically changed from fix/csharp-namespace-scope-oom to main May 29, 2026 19:25
@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10244 10237 0 7 643s

✅ All 10237 tests passed

7 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 80.02% 35957/44932 79.48% 📈 +0.5 🟢 ████████████████░░░░
Branches 68.69% 22956/33418 68% 📈 +0.7 🟢 █████████████░░░░░░░
Functions 85.03% 3705/4357 84.56% 📈 +0.5 🟢 █████████████████░░░
Lines 83.56% 32382/38750 82.97% 📈 +0.6 🟢 ████████████████░░░░

📋 View full run · Generated by CI

…1881)

scanCSharpProject read every .cs/.csproj in full with no size guard and
issued per-directory reads with no concurrency bound, an OOM/FD-exhaustion
vector on large or generated repos. Add an fs.stat size guard before each
read, reusing getMaxFileSizeBytes() (the same 512KB cap the Phase-1 walker
uses). An oversized or unreadable .cs now signals truncation so the #1881
suffix-fallback gate fails OPEN rather than wrongly suppressing an import
whose declaring namespace lived in the skipped file (previously a silent
return left the scan looking complete). Adds a size-cap scan test.
…1881)

The scan issued every .cs/.csproj read in a directory at once via
Promise.all, so in-flight file descriptors scaled with the largest
directory's file count. Issue reads in bounded windows (32, mirroring
the Phase-1 filesystem-walker) via Promise.allSettled; an unexpected
read/scan rejection now trips truncation (fail open) instead of
rejecting the whole scan. Behavior-preserving for namespace collection
(C# scope-resolution parity passes on both legs).
…gate (#1908)

Reflow hand-wrapped lines in scope-resolver.ts and the csharp integration
test that prettier collapses under printWidth 100. Formatting only, no
behavioral change; clears the failing quality/format CI gate.
Comment thread gitnexus/src/core/ingestion/language-config.ts Fixed
Comment thread gitnexus/src/core/ingestion/language-config.ts Fixed
…able the #1881 gate (#1908)

Code-review follow-up. The scan read each .cs fully into a string behind a
512KB size cap (the tree-sitter parse budget); a single larger generated file
(*.g.cs, EF/gRPC output) tripped `truncated`, making the #1881 suffix-fallback
gate fail open repo-wide and silently undoing the fix on real repos.

Stream each .cs line-by-line via createReadStream + readline into a new
incremental scanner (createCsharpStructureScanner) instead of buffering the
whole file. Memory is now constant regardless of file size, so the per-file
size cap is dropped for the namespace line-scan and large generated files are
fully collected. extractCsharpStructureViaScanner is reimplemented on the same
incremental scanner (byte-identical; C# parity 2/2). collectDeclaredNamespaces
returns 'ok' | 'truncated' (truncation now only from an unreadable file) and the
truncation warn lists its real causes. csproj reads keep their size guard.

Prior art: ripgrep/ctags/Node readline stream rather than cap for line scans;
GitHub (384KB) and Sourcegraph (1MB) cap only their full-content indexes.
…r CodeQL TOCTOU (#1908)

CodeQL js/file-system-race flagged the fs.stat + fs.readFile size guard in
readCsprojConfig as a check-then-use filesystem race. Replace it with a
length-capped createReadStream (readFileTextCapped) — same memory bound on
untrusted input, no stat-then-read race, and consistent with the streamed
.cs scan. Behavior is unchanged for real .csproj files (parity 2/2).
…1908, Codex F1)

A single scan truncation (unreadable dir/file, depth/dir cap) set one
repo-wide `truncated` flag that made csharpSuffixFallbackAllowed fail
open for EVERY import, silently re-enabling the #1881 BCL->local suffix
matches. Add a CSHARP_EXTERNAL_ROOTS denylist (System/Microsoft/...): an
external-rooted using that does not align with an in-repo declared
namespace stays BLOCKED even under truncation, while genuinely
local-looking usings still fail open. A repo that declares the root is
allowed via the alignment escape hatch. Shared predicate, so both legs
inherit it.
…dex F2)

In the no-csproj branch of resolveCsharpImportTarget, resolveDirectMatch
ran BEFORE the gate, so a path-aligned Legacy/System/Threading/Tasks.cs
satisfied 'using System.Threading.Tasks;' even though System.* is not a
declared in-repo namespace — while the legacy leg (gate-first) blocked
it, so the legs were not equivalent. Run csharpSuffixFallbackAllowed
first (return null on fail), then direct-match, then progressive
stripping — mirroring the legacy ordering. Adds a no-csproj fixture with
a deep path-aligned Tasks.cs and dual-leg integration describes (registry
+ forced-legacy), plus a path-aligned unit case. Parity 2/2.
… matchers (#1908, Codex F3)

The line scanner treated its output as complete even when it missed valid
C# namespace forms, so the gate failed CLOSED and over-blocked legit
imports. Make CS_NAMESPACE_RE/CS_USING_STATIC_RE Unicode-aware (\p{L}\p{N}
+ u flag) and strip leading/segment @ so verbatim/Unicode identifiers are
captured to match the AST. For forms the regex still can't capture (split
across lines, not at line start, attributed), set a per-file 'incomplete'
flag; collectDeclaredNamespaces returns 'truncated' for such files so the
#1881 gate fails OPEN instead of dropping the namespace. High-precision
detectors + guard tests keep ordinary forms (incl. // namespace comments)
from tripping incomplete.
… Codex F4)

readCsprojConfig read only the first 512KB of a .csproj and, on a
match-miss, couldn't tell 'no RootNamespace' from 'RootNamespace past
the cap' — both synthesized a filename root. A wrong authoritative root
makes imports under the real root resolve to nothing AND suppresses the
fallback. Replace the capped read with a streamed early-stop search
(findCsprojRootNamespace) that reads until the tag or EOF: filename
fallback ONLY on genuine read-to-EOF absence; on a soft-budget cap-hit or
unreadable file, OMIT the config so the no-csproj fallback stays
reachable. Removes the now-unused readFileTextCapped + getMaxFileSizeBytes
cap from the scan. Parity 2/2.
@magyargergo magyargergo merged commit 4b787be into main May 30, 2026
27 checks passed
@magyargergo magyargergo deleted the fix/1881-csharp-unrelated-edges branch May 30, 2026 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unrelated edges are generated for C# Files

2 participants