fix(ingestion): migrate all languages' inheritance to scope-resolution on the worker path (#1951) #1956
…n the worker path (#1951) Registry-primary C# and Java produced zero EXTENDS/IMPLEMENTS edges when the worker pool was engaged (large repos), so diagrams showed classes and interfaces with no inheritance edges between them. Small fixtures stayed under the worker threshold and ran sequentially, where the legacy heritage path is intact, so the bug was invisible to existing tests. Root cause: inheritance edges for migrated (registry-primary) languages came only from the legacy `@heritage.*` -> processHeritage path, and the worker pipeline drops those legacy artifacts for registry-primary languages via the `shouldAccumulate` gate (parse-impl.ts). Scope-resolution runs in both parse modes (worker-safe) but emitted nothing for C#/Java because, unlike C++, they synthesized no `@reference.inherits` captures. Fix (the existing C++ pattern): C#/Java now synthesize `@reference.inherits` captures from their base lists, routing inheritance through scope-resolution. The generic `preEmitInheritanceEdges` pass now decides EXTENDS vs IMPLEMENTS from the resolved target's symbol kind (Interface -> IMPLEMENTS), mirroring the legacy `resolveExtendsType` semantics so the registry path matches the legacy DAG. C++ has no Interface targets, so it always takes the EXTENDS branch and is unchanged. Capture scope per language matches the legacy heritage query (C#: class+interface base lists; Java: class superclass + implemented interfaces) to preserve scope-resolution parity. The one-line worker fallback (always accumulating deferredWorkerHeritage) was deliberately not used: it would resurrect the legacy DAG for migrated languages, double-emit against C++, and re-introduce the O(files^2) heritage cost the registry migration removes. No double-emission: in sequential mode the legacy path emits first and scope-resolution dedups against the graph; in worker mode the legacy path is dropped and scope-resolution fills the gap. Touches none of PR #1954's protected files. 
Tests: new worker-forced regression (test/integration/heritage-worker-path.test.ts) that forces the worker pool on small fixtures and asserts the edges; it fails before this change (0 edges) and passes after. C# and Java scope-resolution parity gates pass in both legacy and registry-primary modes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
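The EXTENDS vs IMPLEMENTS decision described in this commit can be illustrated with a minimal sketch. It is not the code of `preEmitInheritanceEdges`; the `ResolvedSymbol` type, the `inheritanceEdgeType` helper, and the sample class names are hypothetical, and only the `Interface -> IMPLEMENTS` rule comes from the commit message (a later commit in this PR extends the rule to traits).

```typescript
// Hypothetical, simplified model of the edge-type decision.
type SymbolKind = 'Class' | 'Interface' | 'Struct';
type InheritanceEdge = 'EXTENDS' | 'IMPLEMENTS';

interface ResolvedSymbol {
  name: string;
  kind: SymbolKind;
}

/** Pick the edge type from the resolved base's kind, not from source syntax. */
function inheritanceEdgeType(target: ResolvedSymbol): InheritanceEdge {
  return target.kind === 'Interface' ? 'IMPLEMENTS' : 'EXTENDS';
}

// C# writes `class User : BaseEntity, IAuditable` with both bases in one
// base list, so only the resolved kind can tell the two edge types apart.
const bases: ResolvedSymbol[] = [
  { name: 'BaseEntity', kind: 'Class' },
  { name: 'IAuditable', kind: 'Interface' },
];
const edges = bases.map((b) => `User ${inheritanceEdgeType(b)} ${b.name}`);
```

Deciding from the resolved target keeps the pass language-neutral: a language with no Interface targets always falls through to EXTENDS.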
CI Report: ✅ All checks passed
Test Results: ✅ All 10747 tests passed; 10 test(s) skipped
…e + normalize qualified-generic bases (#1951)

Addresses two defects found by adversarial/correctness review of the #1951 fix:

1. Wrong inheritance-edge source for C# primary-constructor classes (worker mode). A C# 12 primary constructor (`class User(int id) : Base`) is synthesized into the class scope, so `preEmitInheritanceEdges` -> tryEmitEdge -> `resolveCallerGraphId` degraded the EXTENDS/IMPLEMENTS source to `Constructor:User` instead of `Class:User`. That produced a meaningless constructor-rooted inheritance edge and, because `buildMro` only maps class-like graph ids, silently dropped the class from the MRO (breaking method-override/dispatch resolution). The pre-pass already resolves the correct class source for its dedup key; it now also passes that id to the emitter so the edge is always class-owned. C++/C are unaffected (their constructors are not in the class scope, so the resolved id is identical).
2. Fully-qualified generic C# bases (`A.B.Base<T>`) silently emitted no edge: `terminalTypeNameNode` returned the `generic_name` node verbatim (`Base<T>`), which never resolves. It now recurses on the qualified-name tail to reach the bare identifier (`Base`), matching the documented `IRepository<T>` -> `IRepository` normalization.

Also bundles the inheritance pre-pass's `tryEmitEdge` overrides (edge type + caller id) into a single options object instead of trailing positional params, and switches the Java inheritance walk to named-children traversal.

Tests (test/integration/heritage-worker-path.test.ts): add a primary-constructor + qualified-generic fixture asserting every inheritance edge source is the Class node (the regression) and that `App.Repo<int>` resolves; add a worker/sequential parity block asserting both modes yield identical, non-duplicated EXTENDS/IMPLEMENTS; pin edge `reason` to `scope-resolution: inherits`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
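The qualified-generic normalization (`A.B.Base<T>` -> `Base`) can be sketched as a small recursion. This is an illustration under assumptions, not the project's `terminalTypeNameNode`: the `TypeNode` shape is a simplified stand-in for a tree-sitter node, and the builder helpers are hypothetical.

```typescript
// Simplified stand-in for a syntax node; the shape is an assumption.
interface TypeNode {
  type: 'identifier' | 'generic_name' | 'qualified_name';
  text: string;
  children: TypeNode[];
}

function terminalTypeName(node: TypeNode): TypeNode | undefined {
  switch (node.type) {
    case 'identifier':
      return node;
    case 'generic_name':
      // `Base<T>`: the bare name is the identifier child.
      return node.children.find((c) => c.type === 'identifier');
    case 'qualified_name': {
      // `A.B.Base<T>`: recurse on the tail instead of returning it verbatim.
      const tail = node.children[node.children.length - 1];
      return tail ? terminalTypeName(tail) : undefined;
    }
  }
}

// Hypothetical builders for the example tree.
const id = (text: string): TypeNode => ({ type: 'identifier', text, children: [] });
const generic = (name: string, arg: string): TypeNode => ({
  type: 'generic_name',
  text: `${name}<${arg}>`,
  children: [id(name)],
});
const qualified = (left: TypeNode, right: TypeNode): TypeNode => ({
  type: 'qualified_name',
  text: `${left.text}.${right.text}`,
  children: [left, right],
});

const base = qualified(qualified(id('A'), id('B')), generic('Base', 'T'));
const resolvedName = terminalTypeName(base)?.text;
```

Returning the tail of a qualified name without recursing would yield `Base<T>`, which is the non-resolving text the commit describes.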
…itance captures C# now synthesizes @reference.inherits captures from class/interface base lists (plus the new csharp-primary-ctor-heritage fixture), which legitimately changes the C# capture-output corpus. Update the committed fingerprint; the scaling ratio stays linear (~1.06, budget 1.5). No other language drifted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esolution; gate legacy heritage by isRegistryPrimary (#1951)

Completes the #1951 fix. The reported "missing EXTENDS/IMPLEMENTS edges" was an incomplete migration: every language became registry-primary for calls/imports, but inheritance still came from the legacy `@heritage` path, which the worker pipeline drops for registry-primary languages, so inheritance edges silently vanished in worker mode for EVERY registry-primary language (reproduced: Python emitted 0 EXTENDS in worker mode), not just C#/Java. It hid because the parity suite runs sequentially, where legacy heritage still runs.

Migrate the remaining 9 languages' class/interface inheritance to scope-resolution (the C++/C#/Java `@reference.inherits` pattern), then suppress the legacy `@heritage` path for registry-primary languages with a bare `isRegistryPrimary` gate, identical to how call-processor / import-processor already gate. Scope-resolution is now the single inheritance source.

Per language (matching each legacy `@heritage` query scope exactly for parity):
- python/typescript/javascript/kotlin/swift: base-list / heritage-clause walks emitting `@reference.inherits`; EXTENDS vs IMPLEMENTS decided centrally from the resolved target's symbol kind.
- go: struct-embedding (anonymous fields), matching the legacy embedding scope.
- php: extends / implements / trait-use; trait targets map to IMPLEMENTS.
- rust: trait-impls via the `emitHeritageEdges` hook (impl owner != enclosing scope, so the generic inherits pass can't source it); emits S IMPLEMENTS T.
- ruby: `class < superclass` via captures, leaving the existing mixin `emitHeritageEdges` path intact.

Shared:
- preEmitInheritanceEdges (run.ts): IMPLEMENTS when the resolved target is an Interface OR Trait (matches legacy resolveExtendsType + the trait-impl branch); plus import-aware disambiguation for ambiguous bases (resolveAmbiguousInheritanceBaseViaImports in walkers.ts). The disambiguation engages only on a findClassBindingInScope miss, preferring the candidate whose file is imported/included by the referencing file (C++ `#include` disambiguation, C# `using` disambiguation). Never changes single-match resolution.
- heritage-processor: bare `isRegistryPrimary` skip, mirroring call/import.

Regenerated the 7 capture-golden snapshots and re-baselined the scope-capture fingerprints (go/rust/php/ruby/swift) for the new `@reference.inherits` output; all stay linear (scaling ~1.0).

Validation: scope-resolution parity 28/28 (all 14 languages, both REGISTRY_PRIMARY_<LANG>=0 and =1 legs); tsc clean; bench fingerprint gate green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
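The import-aware disambiguation for ambiguous bases can be sketched roughly as follows. This is not `resolveAmbiguousInheritanceBaseViaImports` itself: the `Candidate` type, the function name, the file paths, and the graph-id strings are hypothetical, and returning `undefined` when the imports do not single out one candidate is an assumption of this sketch.

```typescript
// Hypothetical candidate record for a same-named base type.
interface Candidate {
  graphId: string;
  filePath: string;
}

/**
 * Prefer the candidate whose file the referencing file imports/includes.
 * A single candidate is returned untouched, so single-match resolution
 * never changes.
 */
function resolveAmbiguousBase(
  candidates: Candidate[],
  importedFiles: ReadonlySet<string>,
): Candidate | undefined {
  if (candidates.length === 1) return candidates[0];
  const imported = candidates.filter((c) => importedFiles.has(c.filePath));
  // Assumption: emit nothing unless exactly one candidate is imported.
  return imported.length === 1 ? imported[0] : undefined;
}

// Two classes named `Handler` in different files (made-up paths).
const handlers: Candidate[] = [
  { graphId: 'Class:Models/Handler.cs:Handler', filePath: 'Models/Handler.cs' },
  { graphId: 'Class:Other/Handler.cs:Handler', filePath: 'Other/Handler.cs' },
];
const picked = resolveAmbiguousBase(handlers, new Set(['Models/Handler.cs']));
const refused = resolveAmbiguousBase(handlers, new Set<string>());
```

Because the helper only narrows an already ambiguous set, a referencing file with no relevant import produces no edge rather than an arbitrary one.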
…ate ts/js/kotlin in the bench While verifying the #1951 inheritance migration didn't introduce superlinear capture scaling, added scope-capture bench coverage for TypeScript, JavaScript, and Kotlin — three languages that had NO scaling gate. That revealed PRE-EXISTING O(n²) in the JS and Kotlin emitters (the #1951 inheritance synths are O(N) and were ruled out by measuring with them disabled). JS `emitJsScopeCaptures` and Kotlin `emitKotlinScopeCaptures` re-derived AST nodes with `findNodeAtRange(tree.rootNode, range, type)` PER query match — an O(matches × N) root walk (the #1848/#1915 pattern fixed for go/python but never caught here because these languages were unbenchmarked). Fixed by threading the tree-sitter captured node `c.node` (per-match nodeMap), mirroring the csharp/go implementations. Behavior-preserving: capture-output fingerprints are unchanged (byte-identical), so resolution/parity are untouched. JavaScript: scaling 2.96 -> 1.01 (~13x; 2995ms -> 232ms on the 800-entity bench) Kotlin: scaling 2.78 -> 0.84 (~8x; 4641ms -> 591ms) bench/scope-capture/measure.mjs now covers 10 languages (added ts/js/kotlin with inheritance-bearing synthetic units so the #1951 synth pass is timed at scale); baselines.json gains their fingerprints. `measure.mjs --check` PASS (10 languages), all scaling_ratio < 1.5. JS/Kotlin resolver parity unchanged in both REGISTRY_PRIMARY legs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
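The cost difference between re-deriving a node from the root per query match and reusing the captured node can be shown with a toy model. Everything here is a stand-in: `AstNode`, `QueryMatch`, and this `findNodeAtRange` (a plain depth-first walk with a visit counter) are hypothetical simplifications, not the project's helpers or the tree-sitter API.

```typescript
interface AstNode {
  type: string;
  start: number;
  end: number;
  children: AstNode[];
}
interface QueryMatch {
  node: AstNode; // the captured node, already in hand
  start: number;
  end: number;
  type: string;
}

// Stand-in root walk: visits nodes until range and type match.
function findNodeAtRange(
  root: AstNode,
  start: number,
  end: number,
  type: string,
  stats: { visits: number },
): AstNode | undefined {
  stats.visits++;
  if (root.type === type && root.start === start && root.end === end) return root;
  for (const child of root.children) {
    const hit = findNodeAtRange(child, start, end, type, stats);
    if (hit) return hit;
  }
  return undefined;
}

const N = 100;
const classes: AstNode[] = Array.from({ length: N }, (_, i) => ({
  type: 'class_declaration',
  start: i * 10,
  end: i * 10 + 5,
  children: [],
}));
const rootNode: AstNode = { type: 'program', start: 0, end: N * 10, children: classes };
const matches: QueryMatch[] = classes.map((node) => ({
  node,
  start: node.start,
  end: node.end,
  type: node.type,
}));

// Old shape: one root walk per match, O(matches x N) overall.
const slowStats = { visits: 0 };
const slow = matches.map((m) => findNodeAtRange(rootNode, m.start, m.end, m.type, slowStats));

// New shape: thread the captured node through, no walk at all.
const fast = matches.map((m) => m.node);
const sameNodes = slow.every((n, i) => n === fast[i]);
```

In this model the root-walk variant performs N(N+3)/2 node visits for N matches, while the captured-node variant returns the identical nodes without walking, which is why a fix of this shape can leave capture output unchanged.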
magyargergo
left a comment
Tri-review — scope-resolution inheritance migration (#1951)
Methods & engine independence. Two methods, both Claude — GitNexus swarm + Compound-Engineering personas (correctness, adversarial, performance, maintainability, testing returned findings; risk + test-CI lanes ended mid-investigation). Codex — the only independent engine — returned no findings, so this is a two-method, both-Claude review, NOT three independent confirmations. Every posted finding was coordinator-verified by re-reading the code ([code-read]); the adversarial lane also reproduced several via tsx.
Visibility. Current visible state is incomplete. I could verify the full diff, each per-language synth against its legacy @heritage query, the shared resolution path, and the worker-test coverage, but not the Codex lane (empty) nor the risk / test-CI agent lanes (ended mid-investigation). Treat those missing items as mandatory verification points, not confirmed facts.
Problem. #1951: registry-primary languages emitted 0 EXTENDS/IMPLEMENTS edges in worker mode (legacy @heritage dropped by the worker pipeline, no scope-resolution replacement). This PR migrates all 14 languages to scope-resolution, gates legacy heritage by isRegistryPrimary, adds import-aware ambiguous-base disambiguation, and fixes pre-existing js/kotlin O(n²) capture loops.
Repository history considered. Base e2758262 is the #1954 typeBindings-OOM merge. walkers.ts (gains the new disambiguation helper) is a memory-hot file — #1954 / #1905 / #1800 all fought OOM there. #1918 ("linearize scope-capture across ALL languages O(n²)→O(n)") claimed all languages but js/kotlin were unbenchmarked → it missed them; this PR closes that gap and adds their bench gate. The registry-primary inheritance pattern follows the C# (#1019) and Swift (#937/#1948) migrations.
Current state & merge status: checks pending (OPEN, mergeStateStatus=BLOCKED with CI in flight). 31 files, +1623/−209.
Branch hygiene: merge-from-main commit present but harmless and merge-safe — #1954 is merged in as the base; the change is additive and touches none of #1954's protected regions.
Findings
P1 — Rust trait-impl resolves the WRONG struct/trait on cross-module name collision.
- Risk: `emitRustTraitImplEdges` keys `graphIdByName` by bare simple name with last-write-wins, ignoring the impl's module → for two `struct User` (or two traits) across modules, an impl edge sources from the wrong `User` and the real one is missing. Legacy was file-scoped and refused ambiguous global matches; Rust is registry-primary so legacy is gated off → this hook is the sole emitter.
- Evidence to check: `gitnexus/src/core/ingestion/languages/rust/scope-resolver.ts:54`. [code-read + tsx-reproduced]
- Recommended fix: resolve struct/trait via the impl site's scope (`findClassBindingInScope`) / file-scoped context; refuse (emit nothing) on >1 candidate to preserve the legacy "wrong edge is worse than no edge" invariant.
- Blocks merge: yes.
P1 (test) — no worker-forced inheritance coverage for 9 of the 14 migrated languages.
- Risk: #1951 is worker-mode-only; the parity CI runs sequentially. The worker-forced test covers only C# and Java. The 9 others (python, typescript, javascript, go, php, kotlin, rust, swift, ruby) now depend entirely on their scope-resolution synth in worker mode (legacy gated off) — a worker-only capture regression in any would pass every CI gate.
- Evidence to check: `gitnexus/test/integration/heritage-worker-path.test.ts:49` (describe blocks are C#/Java only). [code-read]
- Recommended fix: add worker-forced describe blocks (force the pool + assert `usedWorkerPool===true` + the edge set) for at least go/python/php/rust.
- Blocks merge: maybe.
P2 — Java/TypeScript synths emit GENERIC-base edges the legacy @heritage query never matched (latent parity divergence + inaccurate doc).
- Risk: the legacy Java/TS heritage queries are `(type_identifier)`-only; the synths normalize `generic_type` (`Box<T>` → `Box`) and emit. `extends Box<T>` / `implements IFoo<T>` → registry emits, legacy emits none → the =0/=1 parity gate diverges the moment a generic-base fixture is added (none exists today, so it is green). The synth docs claim "matching the legacy query exactly", which is inaccurate. (C# and Rust legacy DO handle generics → at parity.)
- Evidence to check: legacy `gitnexus/src/core/ingestion/tree-sitter-queries.ts:784,788`; synth `gitnexus/src/core/ingestion/languages/java/captures.ts:271` and `typescript/captures.ts:507`. [code-read + tsx-reproduced]
- Recommended fix: widen the legacy Java/TS queries to match generic bases (registry is the more-correct behavior), or drop the synth generic branch; add a generic-base parity fixture; fix the docs.
- Blocks merge: no.
P2 — Swift qualified base Outer.Inner resolves to the qualifier, not the base.
- Risk: `swiftBaseTypeIdentifier` returns the FIRST `type_identifier` child of a `user_type` → `Outer` (the qualifier) for `class A: Outer.Inner`, not `Inner` (the base); wrong target, contradicting its own doc comment.
- Evidence to check: `gitnexus/src/core/ingestion/languages/swift/captures.ts:350`. [code-read + tsx-reproduced]
- Recommended fix: take the LAST `type_identifier` child (the base), per the doc.
- Blocks merge: no.
Back-and-forth avoided by verifying
- No double-emission: correctness traced both parse modes: legacy is fully gated for registry-primary in worker (`shouldAccumulate`) and sequential (`processHeritage` isRegistryPrimary skip); the `existing` dedup seed in `preEmitInheritanceEdges` is a no-op on the production path (live only in shadow mode).
- The js/kotlin O(n²) fix is byte-identical: correctness + adversarial each verified every converted call site (captured node == old `findNodeAtRange` result; bench fingerprints unchanged). All 10 benched languages < 1.5 scaling.
- Shared code is language-neutral (no DoD violation: `Interface||Trait` is a symbol-kind check with precedent in `resolveExtendsType`). Go-embedding / Kotlin-`:` / Swift-protocol edge types match legacy for resolved simple-name bases.
Lower-priority (verify before acting)
- PHP `gitnexus/src/core/ingestion/languages/php/captures.ts:325`: stale comment claims the central pass treats `Trait` as EXTENDS, but `run.ts:125` already maps `Trait` → IMPLEMENTS; misleading.
- Perf: kotlin synth uses `descendantsOfType` → `descendants` (O(N·depth) array-spread; `bench/scope-capture/measure.mjs` reports kotlin scaling 0.92, linear-slope, heap-heavy on huge files; pre-existing pattern). `gitnexus/src/core/ingestion/scope-resolution/scope/walkers.ts:332` allocates two Sets per ambiguous-name call in the OOM-sensitive file, not cached across calls.
- Maintainability: `kotlin/captures.ts` >1000 lines (extract the synth); `gitnexus/src/core/ingestion/languages/go/captures.ts:227` `findNamedChildOfType` duplicates canonical `findChild`; six byte-identical per-language `visit*` walkers; `tryEmitEdge`'s `inheritanceOverride` bag.
Open questions
- Generic-base inheritance: support it (widen the legacy queries) or match-to-legacy (drop from the synths)? The registry behavior is arguably the more correct one.
Verdict
not production-ready. The Rust cross-module name collision (P1) is a reachable wrong/missing-edge regression for a common Rust pattern (same-named structs/traits across modules), and the legacy path it replaces deliberately refused such ambiguity — it should be resolved or bounded before merge. The Java/TS/Swift name-extraction divergences (P2) and the absent worker-forced tests for 9 of 14 migrated languages (P1-test) are the exact bug class this PR fixes left unguarded, and should be addressed too. That said, the migration's core is solid and validated — scope-resolution parity is 28/28 across both legs, the js/kotlin O(n²) fix is byte-identical, there is no double-emission, and the shared code stays language-neutral — so these are bounded, fixable surface issues, not architectural problems.
Automated multi-tool review digest (2 methods, both Claude; Codex unavailable). Verify each finding before acting.
    skipWorkers: true,
    });

    describe('C# inheritance edges on the worker path (#1951)', () => {
[P1 — testing] No worker-forced coverage for 9 of 14 migrated languages. #1951 is worker-mode-only; the parity CI runs sequentially. This file forces the worker pool only for C#/Java. The 9 others (python/ts/js/go/php/kotlin/rust/swift/ruby) now depend entirely on their scope-resolution synth in worker mode (legacy gated off) — a worker-only capture regression in any passes every CI gate. Fix: add worker-forced describe blocks (force pool + assert usedWorkerPool===true + edge set) for at least go/python/php/rust, mirroring the C# block. [code-read]
    case 'scoped_type_identifier':
      // `java.io.Serializable` → trailing `type_identifier` (`Serializable`).
      return node.lastNamedChild;
    case 'generic_type': {
[P2 — correctness/parity] Generic-base divergence from legacy. This generic_type branch normalizes Box<T>→Box and emits an inheritance edge, but the legacy Java @heritage query (tree-sitter-queries.ts:784,788) is (type_identifier)-only and matches nothing for extends Box<T> / implements IFoo<T>. So registry(=1) emits an edge legacy(=0) doesn't → the parity gate diverges the moment a generic-base fixture exists (none does today, so it's green). The doc comment's "matching the legacy query exactly" is inaccurate. Same in typescript/captures.ts:507. (C#/Rust legacy DO handle generics → fine.) Fix: widen the legacy Java/TS queries to match generic bases (registry is more correct) or drop this branch; add a generic-base parity fixture. [code-read + tsx-reproduced]
    case 'identifier':
    case 'type_identifier':
      return node;
    case 'generic_type': {
[P2 — correctness/parity] Same generic-base divergence as Java. terminalTsTypeNameNode normalizes generic_type and the synth emits for implements IFoo<T> / extends Base<T>, but the legacy TS @heritage query is (type_identifier)-only → registry emits edges legacy doesn't. Latent =0/=1 parity break + the doc's "matching legacy" is inaccurate. See the Java comment for the fix. [code-read + tsx-reproduced]
Claude finished @magyargergo's task in 7m 36s

PR #1956: Production-Readiness Review

Review bar for this PR

Problem being solved

Issue #1951: In large repos, inheritance/implementation edges ( … The PR starts with a C#/Java fix then expands in commit …

Current PR state

Merge status and mergeability

checks failing

The CI main workflow ( …

Repository history considered
Branch hygiene assessment

merge-from-main commit present but harmless and merge-safe

Commits …

Understanding of the change

Legacy heritage gating: New …
Shared inheritance pre-pass: …
Edge override in …
Import-aware disambiguation: …
JS/Kotlin O(n²) fix: Both capture files were using …
Rust trait-impl bridge: …
Bench/golden: All 10 benchmarked languages now have linear-slope entries in …

Findings

Finding 1. Rust cross-module name collision: wrong/missing trait-impl edge
- Severity: P1
- Status: confirmed
- Risk: … Contrast with the legacy heritage processor ( …
- Evidence checked: …
- Recommended fix: Before building …
- Blocks merge: yes

Finding 2. No worker-forced inheritance coverage for 9 of 14 migrated languages
- Severity: P1
- Status: confirmed
- Risk: …
- Evidence checked: …
- Recommended fix: Add at minimum one worker-forced describe block each for Go (embedding), PHP (extends + trait use), Rust (trait impl), and Python (inheritance), the four highest-risk migrated languages. Each should assert …
- Blocks merge: yes. All 9 newly-migrated languages go unguarded against worker-only regressions with no forced-worker test coverage.

Finding 3. Java generic-base parity divergence from legacy
- Severity: P2
- Status: confirmed (latent)
- Risk: …
- Evidence checked: …
- Recommended fix: Either (a) widen the legacy Java heritage query to match …
- Blocks merge: no on its own, but this is latent parity drift the PR's author explicitly claims doesn't exist.

Finding 4. TypeScript generic-base parity divergence from legacy
- Severity: P2
- Status: confirmed (latent)
- Risk: Same pattern as Finding 3.
- Evidence checked: …
- Recommended fix: Same as Finding 3: widen legacy TS heritage query and add parity fixture, or document and test the intentional expansion.
- Blocks merge: no on its own.
Finding 5. Swift qualified base resolves to qualifier, not terminal base
- Severity: P2
- Status: unverified suspicion (intentional legacy match, semantic concern open)
- Risk: …
- Counter-evidence: The function's doc comment at lines 339–344 explicitly states "a qualified …
- Recommended fix: No change required for merge, but a Swift fixture with …
- Blocks merge: no (intentional legacy parity).

Finding 6. PHP stale trait comment
- Severity: low
- Status: confirmed (misleading but behavior is correct)
- Risk: …
- Evidence checked: …
- Recommended fix: Update the comment to: "The central pass ( …
- Blocks merge: no.

Finding 7. Kotlin/walkers perf allocation concern
- Severity: low-P2
- Status: unverified suspicion (benchmarks show linear, walkers concern minimal)
- Risk: Kotlin synth ( …
- Evidence checked: …
- Recommended fix: No change required before merge. Follow-up: cache the two Set allocations across calls in …
- Blocks merge: no.

Finding 8. Maintainability duplication
- Severity: low
- Status: unverified (code duplication exists; no confirmed correctness risk)
- Risk: Six per-language …
- Blocks merge: no.

Finding 9. CI / check status failing
- Severity: P1
- Status: confirmed
- Risk: CI workflow run …
- Evidence checked: …
- Recommended fix: Investigate and fix the failing test before merging. Re-run CI to confirm green.
- Blocks merge: yes.

PR-specific assessment sections

Worker-path inheritance restoration

Lane A: PASS for C#/Java, UNGUARDED for 9 others. The core mechanism is verified correct: (1) workers produce …

Per-language parity and synth scope

- C#: Synth matches legacy ( …
- Java: Synth extends beyond legacy by handling …
- Python: Synth matches legacy ( …
- TypeScript: Synth extends beyond legacy by handling …
- JavaScript: Synth matches legacy ( …
- Go: Synth adds struct embedding inheritance captures. Sequential parity passes. No worker-forced test.
- PHP: Synth covers class …
- Kotlin: Synth covers …
- Rust: …
- Swift: Synth covers …
- Ruby: Synth covers …
- C++: Pre-existing synth, unchanged. Behavior preserved.
- C, COBOL: No inheritance; no synth needed.
Edge source/target/type/reason correctness
Name resolution and ambiguity
MRO/downstream dispatch
Performance and memory
Test adequacy and CI wiring
Branch hygiene and mixed-domain risk

All 31 changed files are causally connected to the #1951 scope (inheritance edge migration + JS/Kotlin benchmark hardening). The all-language migration is justified: committing a partial migration (only C#/Java) while gating legacy for all 14 languages would leave 12 languages with neither legacy nor scope-resolution inheritance. JS/Kotlin O(n²) fix belongs here because it validates the benchmark gate added to prove the migration doesn't regress performance. No unrelated workflow, web, release, or dependency churn in the 31-file diff.

Hidden Unicode/hygiene
Back-and-forth avoided by verifying
Open questions
Final verdict

not production-ready

Three confirmed issues block merge: …

The core migration architecture is sound: the …
…collision (#1951) emitRustTraitImplEdges resolved both the struct S and trait T of an impl block through a workspace-wide simple-name index with last-write- wins, so two same-named structs/traits across modules collapsed onto one def — sourcing impl edges from the wrong module (and dropping the real edge). Thread the finalized ScopeResolutionIndexes through the emitHeritageEdges hook and resolve S/T from the impl block's own scope via findClassBindingInScope, falling back to import-aware disambiguation, refusing (emitting nothing) on ambiguity — restoring the legacy file-scoped path's 'a wrong edge is worse than no edge' contract. New rust-cross-module-collision fixture proves each edge sources from the User in its own module. rust-traits parity unchanged (2/2).
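The collision this commit fixes can be reproduced in a few lines. The sketch below is illustrative only: the `Def` record, the graph-id strings, and `resolveInImplScope` are hypothetical, scope is approximated by a module path, and the import-aware fallback the commit mentions is left out.

```typescript
// Hypothetical definition records; graph-id format is made up.
interface Def {
  graphId: string;
  name: string;
  modulePath: string;
}

const defs: Def[] = [
  { graphId: 'Struct:a/user.rs:User', name: 'User', modulePath: 'a' },
  { graphId: 'Struct:b/user.rs:User', name: 'User', modulePath: 'b' },
];

// Before: a workspace-wide simple-name index with last-write-wins.
// Both `User` structs collapse onto whichever was indexed last.
const graphIdByName = new Map<string, string>();
for (const d of defs) graphIdByName.set(d.name, d.graphId);

// After (sketch): resolve from the impl block's own scope and refuse,
// returning undefined, unless exactly one definition is visible there.
function resolveInImplScope(name: string, implModule: string): string | undefined {
  const visible = defs.filter((d) => d.name === name && d.modulePath === implModule);
  return visible.length === 1 ? visible[0]?.graphId : undefined;
}

const collapsed = graphIdByName.get('User');
const fromA = resolveInImplScope('User', 'a');
const fromB = resolveInImplScope('User', 'b');
const unknown = resolveInImplScope('User', 'c');
```

With the global index, an `impl` in module `a` would source its edge from the `User` in module `b`; the scoped lookup keeps each edge in its own module and emits nothing when it cannot decide.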
…c bases (#1951) The registry-primary scope-resolution synth normalized generic bases (Box<T> -> Box) and emitted inheritance edges, but the legacy @heritage queries were type_identifier-only and matched nothing for extends Box<T> / implements IFoo<T>. That made registry(=1) and legacy(=0) diverge the moment a generic-base fixture existed, a latent scope-parity break, and the synth docs' 'matching legacy exactly' was inaccurate. Widen the legacy Java queries (superclass/super_interfaces generic_type) and the TS implements clause (generic_type name:) to capture the inner type_identifier (the bare base name), so capture text is unchanged. TS extends already captured via value:(identifier). Both paths now agree on generic bases (the more-correct behavior, matching C#/Rust). New java-generic-base + typescript-generic-base fixtures assert the edges under BOTH parity legs; docs corrected.
…egment (#1951) swiftBaseTypeIdentifier returned the FIRST type_identifier of the flat user_type, so 'class A: Outer.Inner' synthesized an inheritance reference to the qualifier Outer instead of the nested base Inner — contradicting its own doc. Take the LAST type_identifier (Outer.Inner -> Inner; generic args sit in a sibling type_arguments node, so Box<Int> -> Box and Outer.Inner<T> -> Inner fall out naturally). firstInheritedType in receiver-binding.ts had the same latent qualifier bug (it feeds super-receiver binding) and is fixed in lockstep; both docs corrected. New capture-level unit test asserts the extracted base is Inner (the changed path; a bare nested-type name does not resolve to an edge in the current model, so this is the precise observable). swift-qualified-base fixture added; golden regenerated (purely additive — no existing entry drift, confirming non-qualified bases are byte-identical).
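The first-versus-last `type_identifier` choice is easy to see on a toy node. The `SwiftNode` shape and `baseTypeIdentifier` below are simplified stand-ins, not the project's `swiftBaseTypeIdentifier`; the flat `user_type` layout with generic arguments in a sibling `type_arguments` node follows the description in this commit message.

```typescript
// Simplified stand-in for a syntax node.
interface SwiftNode {
  type: string;
  text: string;
  namedChildren: SwiftNode[];
}

/** Return the LAST `type_identifier` child: the base, not the qualifier. */
function baseTypeIdentifier(userType: SwiftNode): SwiftNode | undefined {
  let last: SwiftNode | undefined;
  for (const child of userType.namedChildren) {
    if (child.type === 'type_identifier') last = child;
  }
  return last;
}

const leaf = (type: string, text: string): SwiftNode => ({ type, text, namedChildren: [] });

// `Outer.Inner<T>`: flat user_type with two identifiers and sibling type args.
const qualifiedGeneric: SwiftNode = {
  type: 'user_type',
  text: 'Outer.Inner<T>',
  namedChildren: [
    leaf('type_identifier', 'Outer'),
    leaf('type_identifier', 'Inner'),
    leaf('type_arguments', '<T>'),
  ],
};
// `Box<Int>`: a single identifier plus type args.
const simpleGeneric: SwiftNode = {
  type: 'user_type',
  text: 'Box<Int>',
  namedChildren: [leaf('type_identifier', 'Box'), leaf('type_arguments', '<Int>')],
};

const qualifiedBase = baseTypeIdentifier(qualifiedGeneric)?.text;
const simpleBase = baseTypeIdentifier(simpleGeneric)?.text;
```

Because the generic arguments are not `type_identifier` children in this layout, taking the last identifier handles qualified and generic bases with the same rule.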
) The synthesizePhpInheritanceReferences NOTE claimed the central pass classifies a resolved Trait as EXTENDS and 'must treat Trait as IMPLEMENTS for full parity' — but preEmitInheritanceEdges already maps Interface || Trait to IMPLEMENTS, so trait-use is correct on both paths. The stale note would mislead a future reader into 'fixing' a non-bug.
…anguages (#1951) #1951 was worker-mode-only, but worker-forced coverage existed for only C# and Java; the other 9 migrated languages depended entirely on their scope-resolution synth in worker mode (legacy heritage gated off), so a worker-only capture regression would pass every sequential CI gate. Add table-driven worker-forced blocks for go, python, php, rust, kotlin, ruby, typescript, javascript, and swift — each forces the pool, asserts usedWorkerPool === true (guarding against silent sequential fallback), and pins the exact EXTENDS / IMPLEMENTS edge set. Rust exercises the scope-aware trait-impl resolution and TS the generic-base path in worker mode. Header doc updated to the full migrated-language scope.
…1956 review fixtures Adding rust-cross-module-collision, swift-qualified-base, and typescript-generic-base to the lang-resolution corpus changes the order-independent capture fingerprint for those three languages (the bench globs lang-resolution/<lang>-*). The change is additive only — existing fixtures are byte-identical (swift golden regen was purely additive; ts/rust capture synth is untouched) and scaling stays linear (rust ~1.02, swift ~1.07, ts ~1.0). All 10 benched languages pass --check.
…lision fixture The rust-captures-golden test globs all rust-* lang-resolution fixtures; the new rust-cross-module-collision fixture (added for the #1951 review P1 fix) adds 4 entries to the snapshot. Purely additive — existing fixture digests are unchanged, confirming emitRustScopeCaptures is byte-identical (the P1 fix lives in scope-resolver.ts, not the capture path).
Tighten the Java/TS inheritance-synth doc comments: the legacy-query widening achieves parity on SIMPLE (unqualified) generic bases; qualified bases (a.b.Box, a.b.Box<T>) remain a pre-existing legacy gap the synth resolves but the legacy query does not (both outcomes safe). The prior 'both paths now agree on generic bases' overstated the scope. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ance captures The #1951 migration added python @reference.inherits capture synthesis (languages/python/captures.ts), which changes emitPythonScopeCaptures output and therefore its fingerprint. The original PR re-baselined the unified scope-capture bench (baselines.json) but missed python's SEPARATE harness (bench/python-scope/baseline-fingerprint.txt) — leaving the 'tests / benchmarks' job red. Update the committed fingerprint to match the legitimate post-migration capture output (scaling stays linear ~1.05; python resolver parity is 221/221 both legs). The import-target resolver fingerprint is unaffected and still passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1956 review) Address the code-review P2: preEmitInheritanceEdges inlined a two-set dedup guard (coarse `existing` per-(caller,target,type) gate seeded from the graph + per-site `seen` shared with the generic edge bridge) whose joint semantics weren't self-documenting at the call site. Extract emitInheritanceEdgeDirect, which owns both keys and documents the contract in one place, so a future inheritance pass reuses it rather than re-deriving the dual-key pattern. dedupKey and the rel: id shape are unchanged, so graph output stays byte-identical (verified: pipeline-graph-golden + heritage-worker-path green; tsc clean).
…y leg (#1956 review) The U8 hardening of the C# ambiguous-heritage assertion (pin Tier-2 resolution to Models/Handler.cs) is correct for the registry-primary model but broke the legacy parity leg: C# namespace `using` does not emit the file-level import edge in the legacy DAG (the same root cause as the existing `using-import edge ... through the scope-resolution path` expected-failure), so the legacy leg cannot disambiguate and resolveHeritageId refuses to a synthetic Class:/Interface: target. The old vacuous `if (targetFilePath)` guard passed in both legs only because it asserted nothing in the legacy leg. Per the migration policy (scope-resolution is the correct model), this is a scope-resolver-only correctness win: assert the correct registry-primary behavior unconditionally (no leg-conditional logic in the test) and list the test in LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES (helpers.ts), right beside the C# using-import-edge exclusion it shares a root cause with. Verified: full scope-parity suite 28/28 (all 14 languages, both legs) — C# was the only break; the fix is the only change needed.
…alized heritage) PR #1940 ("Centralize heritage supertype matching") refactored the legacy @heritage queries from hand-written per-language arms into config-driven alternations: heritage-extractors/configs/<lang>.ts declare the supertype node shapes, buildSupertypeAlternation() generates the [(a)(b)...]@heritage.* arms, and normalizeSupertypeName() reduces the matched supertype node to its innermost simple name at runtime. Conflict: tree-sitter-queries.ts (both sides rewrote the heritage section). Resolved by taking #1940's version wholesale: every change this branch made to that file (the U1 Rust scoped impl_item arm, the U2 Java/TS scoped arms, and the Java end-anchor 2-segment fix) is fully superseded. #1940's shape descriptors already include the same scoped shapes (Java/Rust scoped_type_identifier; TS member_expression + nested_type_identifier), and capturing the whole supertype node + normalizing at runtime structurally avoids the 2-segment double-match the end-anchor was guarding. The registry-primary synth changes (rust/ts captures.ts) are on a different path and untouched by #1940; they remain and continue to agree with the legacy leg. Verified on the merged tree: tsc clean; the java/typescript/rust qualified-base fixtures pass under BOTH legs (registry synth == #1940 legacy normalizer).
… 7 registry synths (#1956)

PR #1940 ("Centralize heritage supertype matching") widened the LEGACY @heritage leg (config-driven shapes + normalizeSupertypeName) to capture qualified / generic / scoped / attribute / subscript / record / struct / delegation / member-expression / interface-embed bases. But the registry-PRIMARY synths (the production path for migrated languages) were never widened to match, so in production these inheritance edges were silently dropped: a pre-existing gap exposed as a legacy>synth asymmetry by an audit of the post-#1940 merge. This is the exact #1951 theme; rust/ts/cpp already handled their qualified bases, so this brings the remaining 7 languages to parity:

- python: attribute (class X(pkg.Base)) + subscript (Generic[T]) bases
- go: qualified_type (pkg.Base), generic_type (Box[T]), pointer, AND interface_type embeds (the synth previously walked struct_type only)
- javascript: member_expression base (extends ns.Base)
- csharp: walk record_declaration + struct_declaration base_lists (incl. primary_constructor_base_type) + alias_qualified_name
- ruby: scope_resolution superclass (class C < Mod::Super)
- kotlin: explicit_delegation (class F : Iface by d)
- java: walk interface_declaration extends_interfaces (interface IA extends IB)

Each synth's base-name extractor was widened to return the trailing/inner node for the new shapes (existing simple-base path byte-identical); each was real-parse-verified so the synth's bare name equals normalizeSupertypeName(base), the legacy leg's reduction, guaranteeing registry<->legacy agreement. New <lang>-qualified-base / java-iface-extends fixtures + both-leg parity blocks assert the new edges; full parity suite is 28/28 (all 14 langs, both legs). Goldens regenerated additively (existing fixtures gain one inherits capture each); scope-capture + python-scope benches re-baselined, all linear.

Known follow-up: csharp record->record in the SAME namespace (record UserRecord : BaseEntity). The synth emits the capture but the same-namespace record-target binding is not resolved on the registry leg (legacy does emit it); this is a separate registry resolution gap, not asserted here to avoid a leg divergence.
* fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928)

Registry-primary scope-resolution path (the live one post-#942/#943):

- F35 [HIGH]: qualified / qualified-generic constructor calls. `new pkg.Foo()` parses as a `scoped_type_identifier` that the query bound only as `@reference.call.constructor.qualified` with no `@reference.name`, so the scope extractor fell back to the whole-expression anchor and the reference name became the raw `new pkg.Foo()` text (never resolved). Bind the simple-name tail (end-anchored last child) and add an arm for the previously uncaptured `new pkg.Box<String>()` (qualified + generic) shape.
- F38 [MEDIUM]: `super(...)` / `this(...)` explicit constructor invocations, modeled as `explicit_constructor_invocation` and never matched by the scope query, dropped the chained-constructor CALLS edges. Synthesize them with the target resolved structurally (this -> enclosing type name; super -> superclass tail via the shared javaBaseLookupNameNode, skipping implicit Object) plus arity for overload disambiguation.
- F41 [LOW]: interpretJavaTypeBinding stripped the qualifier before generics, so a qualified generic type arg (`Map<String, com.example.User>`) was cut inside the generic into `User>`. Strip generics first, then the qualifier; make the erasure fallback qualifier-tolerant.

F36/F37 already landed upstream (#1940/#1956); F39/F40 are legacy-bank remnants that are no longer consumed (legacy @import skipped in parse-worker; legacy @call never read in parse-impl) so they are intentionally left untouched.

Tests: low-level capture unit tests (constructor shapes incl. double-match guard; super/this/enum/implicit-Object), interpretJavaTypeBinding unit tests (qualified generic args + the corruption case), and end-to-end resolver tests with new fixtures asserting the CALLS edges resolve to the correct constructors.
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(scope-resolution): register Constructor overload keys so this()/super() chains don't self-loop (#1928 F38 review)

Review of #2045 caught two gaps; both confirmed by reproduction.

P2: F38 this() emitted a self-loop. On the java-explicit-constructor fixture, Child(int){ this(); } produced CALLS Child()#0 -> Child()#0 instead of Child(int)#1 -> Child()#0. Root cause is the language-agnostic graph-bridge: the parse phase mints distinct Constructor nodes (Child#0, Child#1) carrying parameterTypes, but node-lookup.ts registered the parameter-types / shape overload keys only for Function/Method, never Constructor, so both ctors collapsed onto the first-wins qualified/simple key and the caller Child(int) resolved to Child#0 (the this() target). Extend the overload keys to Constructor in both node-lookup.ts (registration) and ids.ts (lookup) via a shared isOverloadableCallable predicate. Verified the edge now connects distinct nodes (Child#1 -> Child#0); super(1)->Base#1 still correct. No cross-language regressions (the 9 worker-path failures reproduce identically on clean HEAD). Also harden the integration test: it matched the this() edge on name only, which a self-loop satisfies; now assert the endpoints are DISTINCT constructors.

P3: F41 order-regression guard was inert (List<Map<String,User>> normalizes to List under both strip orders). Add List<com.x.Foo<String>> -> List, which is corrupted to Foo<String>> under the old order and only correct generics-first.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(java): update fingerprint and add notes for constructor query captures in baselines.json

Updated the fingerprint for the Java section and added detailed notes regarding the enhancements in constructor query captures, including qualified and qualified-generic constructor queries. This change reflects ongoing improvements in the parsing layer coverage and fixture updates.
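The P2 collapse described in the scope-resolution commit above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the node shape, the key format, the `Child.Child` qualified name, and the `buildLookup`/`resolveId` helpers are hypothetical stand-ins, not the code in node-lookup.ts or ids.ts. Only the idea is taken from the commit message: overload keys were registered for Function/Method but not Constructor, so both constructors fell back to a first-wins plain key.

```typescript
type NodeKindSketch = "Function" | "Method" | "Constructor" | "Class";

interface CallableNodeSketch {
  id: string;
  kind: NodeKindSketch;
  qualifiedName: string;
  parameterTypes: readonly string[];
}

type KindPredicate = (kind: NodeKindSketch) => boolean;

// Overload key: qualified name plus parameter types (format is an assumption).
function overloadKey(node: CallableNodeSketch): string {
  return `${node.qualifiedName}(${node.parameterTypes.join(",")})`;
}

function buildLookup(
  nodes: readonly CallableNodeSketch[],
  isOverloadable: KindPredicate,
): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const node of nodes) {
    // Plain qualified key: the first registration wins.
    if (!lookup.has(node.qualifiedName)) lookup.set(node.qualifiedName, node.id);
    // Overload key: registered only for kinds the predicate accepts.
    if (isOverloadable(node.kind)) lookup.set(overloadKey(node), node.id);
  }
  return lookup;
}

function resolveId(
  lookup: Map<string, string>,
  node: CallableNodeSketch,
  isOverloadable: KindPredicate,
): string | undefined {
  const viaOverload = isOverloadable(node.kind)
    ? lookup.get(overloadKey(node))
    : undefined;
  return viaOverload ?? lookup.get(node.qualifiedName);
}

// Before the fix: constructors are not overloadable callables.
const beforeFix: KindPredicate = (kind) => kind === "Function" || kind === "Method";
// After the fix: one shared predicate that also accepts constructors.
const isOverloadableCallable: KindPredicate = (kind) =>
  kind === "Function" || kind === "Method" || kind === "Constructor";

const childNoArgs: CallableNodeSketch = {
  id: "Child#0", kind: "Constructor", qualifiedName: "Child.Child", parameterTypes: [],
};
const childInt: CallableNodeSketch = {
  id: "Child#1", kind: "Constructor", qualifiedName: "Child.Child", parameterTypes: ["int"],
};
const ctors = [childNoArgs, childInt];

// The caller of this() is Child(int). Resolve its own node id both ways.
const callerBefore = resolveId(buildLookup(ctors, beforeFix), childInt, beforeFix);
const callerAfter = resolveId(
  buildLookup(ctors, isOverloadableCallable), childInt, isOverloadableCallable,
);
console.log(callerBefore, callerAfter);
```

Using the same predicate on the registration side and the lookup side is the point of sharing it: if only one side knows about constructors, the keys never meet.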
---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>
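The F41 ordering issue, and why the first P3 guard was inert, can be shown with two small string helpers. This is only a sketch: `stripQualifier` and `stripGenerics` are hypothetical simplifications, not the body of interpretJavaTypeBinding, so intermediate values may differ from the real implementation.

```typescript
// Drop everything up to and including the last '.' (package or outer-type qualifier).
function stripQualifier(typeText: string): string {
  const lastDot = typeText.lastIndexOf(".");
  return lastDot === -1 ? typeText : typeText.slice(lastDot + 1);
}

// Drop the type-argument list: everything from the first '<' onward.
function stripGenerics(typeText: string): string {
  const firstAngle = typeText.indexOf("<");
  return firstAngle === -1 ? typeText : typeText.slice(0, firstAngle);
}

// Old order: the last '.' may sit inside the type arguments, so the cut lands there.
function qualifierFirst(typeText: string): string {
  return stripGenerics(stripQualifier(typeText));
}

// Fixed order: remove type arguments first, then the outer qualifier.
function genericsFirst(typeText: string): string {
  return stripQualifier(stripGenerics(typeText));
}

const samples = [
  "Map<String, com.example.User>", // qualified type argument: orders disagree
  "List<Map<String,User>>",        // no qualifier anywhere: orders agree (inert guard)
  "List<com.x.Foo<String>>",       // qualified generic argument: orders disagree
  "java.util.List<String>",        // qualified outer type: orders agree
];

for (const sample of samples) {
  console.log(sample, "|", qualifierFirst(sample), "|", genericsFirst(sample));
}
```

A regression guard for a strip order only has value if the two orders produce different results on its input, which is why an input with a qualifier inside the type arguments is needed.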
Summary
Fixes #1951: registry-primary languages produced zero `EXTENDS`/`IMPLEMENTS` edges in worker mode, so diagrams showed classes and interfaces with no inheritance edges between them. The bug was invisible to tests because small fixtures stay under the worker threshold and run sequentially, where the legacy heritage path is still intact.

This PR migrates every language's inheritance to the scope-resolution path and gates the legacy `@heritage` DAG behind `isRegistryPrimary`, so inheritance edges are emitted consistently in both the sequential and worker pipelines.

Root cause
Inheritance edges for migrated (registry-primary) languages came only from the legacy `@heritage.*` → `processHeritage` path. The worker pipeline drops those legacy artifacts for registry-primary languages via the `shouldAccumulate` gate (`parse-impl.ts`). Scope-resolution runs in both parse modes (it is worker-safe because it re-parses on cache miss) but emitted nothing for the migrated languages because, unlike C++, they synthesized no `@reference.inherits` captures.

Fix
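To make the core of this section concrete before the details: the edge type is chosen from the resolved target's symbol kind, and an edge is skipped if the graph already holds it. The sketch below is illustrative only. The type names, the edge-key format, and `emitInheritanceEdges` are assumptions, not the code in `preEmitInheritanceEdges`; only the `Interface`/`Trait` → `IMPLEMENTS` rule and the dedup-against-the-graph behavior come from this description.

```typescript
type TargetKindSketch = "Class" | "Struct" | "Interface" | "Trait";
type InheritanceEdgeType = "EXTENDS" | "IMPLEMENTS";

interface ResolvedInheritance {
  sourceId: string;
  targetId: string;
  targetKind: TargetKindSketch;
}

interface EdgeSketch {
  type: InheritanceEdgeType;
  sourceId: string;
  targetId: string;
}

// Interface and Trait targets are implemented; every other kind is extended.
function edgeTypeFor(targetKind: TargetKindSketch): InheritanceEdgeType {
  return targetKind === "Interface" || targetKind === "Trait"
    ? "IMPLEMENTS"
    : "EXTENDS";
}

// Emit one edge per resolved base, skipping edges the graph already holds
// (for example, edges the legacy path emitted first in sequential mode).
function emitInheritanceEdges(
  resolved: readonly ResolvedInheritance[],
  existingEdgeKeys: Set<string>,
): EdgeSketch[] {
  const emitted: EdgeSketch[] = [];
  for (const item of resolved) {
    const type = edgeTypeFor(item.targetKind);
    const key = `${type}:${item.sourceId}->${item.targetId}`; // key format is assumed
    if (existingEdgeKeys.has(key)) continue;
    existingEdgeKeys.add(key);
    emitted.push({ type, sourceId: item.sourceId, targetId: item.targetId });
  }
  return emitted;
}

// Sequential mode: the legacy path already emitted Dog EXTENDS Animal.
const graphEdgeKeys = new Set<string>(["EXTENDS:Class:Dog->Class:Animal"]);
const newEdges = emitInheritanceEdges(
  [
    { sourceId: "Class:Dog", targetId: "Class:Animal", targetKind: "Class" },
    { sourceId: "Class:Dog", targetId: "Interface:Walker", targetKind: "Interface" },
  ],
  graphEdgeKeys,
);
console.log(newEdges);
```

With an empty starting set (worker mode, where the legacy artifacts are dropped) the same call emits both edges, which is the gap-filling behavior this PR relies on.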
- `languages/*/captures.ts` synthesize `@reference.inherits` captures from each language's heritage syntax (base lists, `impl` blocks, `use` traits, embedding), routing inheritance through scope-resolution. Capture scope matches each language's legacy `@heritage` query to preserve parity.
- `scope-resolution/pipeline/run.ts` (`preEmitInheritanceEdges`, language-agnostic) decides `EXTENDS` vs `IMPLEMENTS` from the resolved target's symbol kind (`Interface`/`Trait` → `IMPLEMENTS`), mirroring legacy `resolveExtendsType`. C++ has no `Interface`/`Trait` targets, so it always emits `EXTENDS` and its behavior is unchanged. Adds import-aware disambiguation (`resolveAmbiguousInheritanceBaseViaImports`) for ambiguous same-named bases.
- The legacy `@heritage` DAG is gated behind `isRegistryPrimary`, and the pre-existing O(n²) `findNodeAtRange`-per-match scope-capture loops in js/kotlin were linearized (by threading the captured node) and added to the bench gate.

The one-line worker fallback (always accumulating `deferredWorkerHeritage`) was deliberately not used: it would resurrect the legacy DAG for migrated languages, double-emit against C++, and re-introduce the O(files²) heritage cost the registry migration removes. There is no double-emission: in sequential mode the legacy path emits first and scope-resolution dedups against the graph; in worker mode the legacy path is dropped and scope-resolution fills the gap.

Review findings addressed (#1956 self-review)
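The difference between a last-write-wins name index and scope-aware resolution that refuses on ambiguity, which the Rust finding in this section turns on, can be sketched as follows. This is a simplified model: file-level visibility stands in for real scope lookup, and `resolveOrRefuse`, the data shapes, and the file paths are all hypothetical rather than taken from the Rust scope resolver.

```typescript
interface TypeDefSketch {
  name: string;
  filePath: string;
}

// Workspace-wide simple-name index: later definitions overwrite earlier ones.
function lastWriteWinsIndex(
  defs: readonly TypeDefSketch[],
): Map<string, TypeDefSketch> {
  const index = new Map<string, TypeDefSketch>();
  for (const def of defs) index.set(def.name, def);
  return index;
}

// Simplified scope-aware lookup: same file first, then the file's imports;
// anything that is still not unique is refused (no edge rather than a wrong edge).
function resolveOrRefuse(
  name: string,
  fromFile: string,
  defs: readonly TypeDefSketch[],
  importsByFile: ReadonlyMap<string, ReadonlySet<string>>,
): TypeDefSketch | undefined {
  const candidates = defs.filter((def) => def.name === name);
  const local = candidates.filter((def) => def.filePath === fromFile);
  if (local.length > 0) return local.length === 1 ? local[0] : undefined;

  const imported = importsByFile.get(fromFile);
  const viaImports = candidates.filter(
    (def) => imported !== undefined && imported.has(def.filePath),
  );
  return viaImports.length === 1 ? viaImports[0] : undefined;
}

// Two modules each define a struct named User.
const userDefs: TypeDefSketch[] = [
  { name: "User", filePath: "a/models.rs" },
  { name: "User", filePath: "b/models.rs" },
];
const fileImports = new Map<string, ReadonlySet<string>>([
  ["c/handler.rs", new Set(["b/models.rs"])],
]);

const collapsed = lastWriteWinsIndex(userDefs).get("User");
const fromA = resolveOrRefuse("User", "a/models.rs", userDefs, fileImports);
const fromC = resolveOrRefuse("User", "c/handler.rs", userDefs, fileImports);
const fromD = resolveOrRefuse("User", "d/other.rs", userDefs, fileImports);
console.log(collapsed?.filePath, fromA?.filePath, fromC?.filePath, fromD?.filePath);
```

In this model the global index hands every caller the `b/models.rs` definition, while the scope-aware lookup gives each file its own `User` and returns nothing for a file that can see neither.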
- `emitRustTraitImplEdges` resolved the `impl` block's struct `S` and trait `T` through a workspace-wide simple-name index with last-write-wins, so two same-named structs/traits across modules collapsed onto one def, sourcing impl edges from the wrong module. It now threads the finalized `ScopeResolutionIndexes` into the `emitHeritageEdges` hook and resolves `S`/`T` scope-aware (`findClassBindingInScope` + import-aware fallback), refusing on ambiguity and restoring the legacy "a wrong edge is worse than no edge" invariant. The new `rust-cross-module-collision` fixture proves each edge sources from its own module.
- `heritage-worker-path.test.ts` covered only C#/Java; the other 9 languages depended entirely on their worker-mode synth with no worker-forced test. Added table-driven worker-forced blocks (force the pool, assert `usedWorkerPool === true` + the exact edge set) for go, python, php, rust, kotlin, ruby, typescript, javascript, and swift.
- … (`Box<T>` → `Box`) the legacy `@heritage` query never matched (`(type_identifier)`-only), a latent `=0`/`=1` scope-parity break. Widened the legacy Java queries (`superclass`/`super_interfaces` `generic_type`) and the TS `implements` clause to capture the inner `type_identifier`; both paths now agree on generic bases (consistent with C#/Rust). New `java-generic-base`/`typescript-generic-base` fixtures assert it under both parity legs. Inaccurate "matches legacy exactly" docs corrected.
- `swiftBaseTypeIdentifier` returned the first `type_identifier` of the flat `user_type`, so `class A: Outer.Inner` resolved to `Outer` instead of the base `Inner`. It now takes the trailing segment; `firstInheritedType` (super-receiver binding) had the same latent bug and is fixed in lockstep. New capture-level unit test + `swift-qualified-base` fixture.
- … `run.ts` `Trait` → `IMPLEMENTS` mapping.

Tests
- `test/integration/heritage-worker-path.test.ts` now covers all 11 migrated languages (forces the pool, asserts `usedWorkerPool === true` and the exact edge set). Fails before this change (0 edges in worker mode), passes after.
- `tsc --noEmit`, `eslint`, `prettier --check`, scope-capture bench (`--check`, fingerprints re-baselined for the 3 added fixtures, scaling stays linear), and the consolidated scope-parity runner are green.

Compatibility
Touches none of PR #1954's protected files. C++ inheritance behavior is unchanged (always `EXTENDS`). The shared `preEmitInheritanceEdges`/`emitHeritageEdges` changes stay language-neutral; the `emitHeritageEdges` contract gained an appended optional `scopes` parameter (Ruby's narrower hook stays assignable). Graph node/edge shapes and persisted IDs are unchanged.

Performance & heritage benchmarking
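The quadratic pattern this section refers to, re-locating each match's node by walking from the tree root versus reusing the node the query already captured, can be modeled on a toy tree. Everything here is an assumption for illustration: the node shape, `findNodeAtRangeSketch`, and the flat test tree are stand-ins, not the tree-sitter API or the repository's `findNodeAtRange`.

```typescript
interface SyntaxNodeSketch {
  startIndex: number;
  endIndex: number;
  children: SyntaxNodeSketch[];
}

interface MatchSketch {
  node: SyntaxNodeSketch; // the node the query already captured
}

// Depth-first search from the root for a node with exactly this range.
// Each call can visit O(n) nodes, so one call per match is O(n^2) overall.
function findNodeAtRangeSketch(
  root: SyntaxNodeSketch,
  start: number,
  end: number,
  counter: { steps: number },
): SyntaxNodeSketch | undefined {
  counter.steps++;
  if (root.startIndex === start && root.endIndex === end) return root;
  for (const child of root.children) {
    const hit = findNodeAtRangeSketch(child, start, end, counter);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// A root with n leaf children at disjoint ranges [2i, 2i + 1).
function buildFlatTree(n: number): SyntaxNodeSketch {
  const children: SyntaxNodeSketch[] = [];
  for (let i = 0; i < n; i++) {
    children.push({ startIndex: 2 * i, endIndex: 2 * i + 1, children: [] });
  }
  return { startIndex: 0, endIndex: 2 * n, children };
}

// Slow path: one root-walk per match.
function collectByRootWalk(
  root: SyntaxNodeSketch,
  matches: readonly MatchSketch[],
  counter: { steps: number },
): SyntaxNodeSketch[] {
  const out: SyntaxNodeSketch[] = [];
  for (const match of matches) {
    const found = findNodeAtRangeSketch(
      root, match.node.startIndex, match.node.endIndex, counter,
    );
    if (found !== undefined) out.push(found);
  }
  return out;
}

// Fast path: thread the captured node through; no tree walk at all.
function collectByThreading(matches: readonly MatchSketch[]): SyntaxNodeSketch[] {
  return matches.map((match) => match.node);
}

const benchSize = 200;
const flatRoot = buildFlatTree(benchSize);
const flatMatches: MatchSketch[] = flatRoot.children.map((node) => ({ node }));
const walkCounter = { steps: 0 };
const viaRootWalk = collectByRootWalk(flatRoot, flatMatches, walkCounter);
const viaThreading = collectByThreading(flatMatches);
console.log(walkCounter.steps, viaRootWalk.length, viaThreading.length);
```

Both paths return the same nodes, which mirrors the "capture output byte-identical" check described in this section; only the amount of work differs.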
The inheritance-capture synth (`@reference.inherits`) added by this migration was only driven at scale by the scope-capture benchmark for ts/js/kotlin; go/csharp/rust/php/ruby/swift/python had flat synthetic scale sources, and java, c, and c++ were not benched at all. Every language's synthetic bench source is now heritage-bearing, covering its distinct form(s): go struct-embedding; csharp/java extends+implements; rust trait-impl; php extends+trait-use; ruby superclass+mixin; swift superclass+protocol; python single+multiple inheritance; c++ single+multiple inheritance (c has no inheritance, so its source stays flat). The synth's per-node cost is therefore gated for every language (scaling < 1.5).

Adding the three previously-unbenched languages surfaced the same pre-existing O(n²) in each: the `findNodeAtRange(tree.rootNode, …)` per-match root-walk that #1848/#1915/#1918 fixed for the other languages but missed for java/c/c++ (they had no bench). All three were fixed by threading the query's captured node (capture output byte-identical, verified by before/after digests over every fixture): java 3.089→0.99, c 3.475→0.96, c++ 2.30→1.12. C++'s `emitCppInheritanceCaptures` is the original synth pattern #1951 followed for the other languages.

A pre-existing flaky test (`sequential-language-availability` → processHeritage skip-warning) is also de-flaked: #1951's `isRegistryPrimary` gate preempts the parser-availability skip for migrated languages, so the test now forces legacy mode to deterministically exercise the path it asserts.

Tri-review follow-ups
A subsequent tri-review (Codex/gpt-5.5 + CE personas) of this migration surfaced follow-ups, which were planned, deepened (4-lens document review), and implemented here as 10 commits:

- `impl crate::traits::Drawable for User` (and `impl m::Wrapped<T> for S`) now resolves by its trailing simple name in both legs (`bareTypeIdentifier` handles `scoped_type_identifier`/`generic_type`; the legacy Rust `@heritage` `impl_item` arm became one real-parse-verified alternation).
- … `@heritage` queries (Java `scoped_type_identifier`, TS `member_expression`/`nested_type_identifier`, plain + generic) to match the synth, and fixed a TS synth gap (`terminalTsTypeNameNode` now treats a `property_identifier` tail as a leaf, so `extends ns.Base` synthesizes an edge it previously dropped). New `java-qualified-base`/`typescript-qualified-base` parity fixtures run under both legs.
- … (`render(<Foo a b/>)`) no longer inherits the enclosing call's arity: an early JSX-anchor guard at the arity call site (JS + `.tsx`); the stale "resolves to null for a JSX anchor" comment is corrected.
- … `emitHeritageEdges` scope-index dependency; dropped the `tryEmitEdge` inheritance-override bag (emit directly via `emitInheritanceEdgeDirect`); extracted a shared `walkNamedTree` replacing seven near-identical per-language walkers; deduplicated the Swift base-tail helper into a leaf module.
- … `bench --check` + all seven `*-captures-golden` digests unchanged.
- … `if (targetFilePath)` guard); added `resolveAmbiguousInheritanceBaseViaImports` branch unit tests; de-languaged the EXTENDS-vs-IMPLEMENTS discriminator comment (AGENTS.md); clarified the Swift registry-primary availability-test titles (Swift stays registry-primary, no flag forcing).

Tier-2 review of the follow-ups (caught + fixed): the Java scoped `@heritage` arms double-matched a 2-segment qualified base (`extends Outer.Inner` parses both segments as direct `type_identifier` children → spurious `EXTENDS Outer` + a `=2`/`=1` parity break), masked because the fixtures used only 3-segment names. Fixed with a trailing end-anchor on all four Java arms, guarded by new 2-segment fixtures (`Plain`, `Two`), verified `=1`/`=1` both legs. The flagged P2 (dual dedup-set readability) was applied. One advisory accepted: the TS `member_expression` leaf also matches `extends this.Base`, which is parity-symmetric, below the confidence gate, and not tightenable without breaking legitimate deep-namespace bases. Every other challenged claim (U5 byte-identical emission, U1 alternation, U6 walker, U3 guard, U7 cycle-free) held up as sound under independent review.

Post-merge heritage audit + synth widening (7 languages)
Merging `origin/main` brought in PR #1940 ("Centralize heritage supertype matching"), which widened the legacy `@heritage` leg (config-driven shapes + a runtime `normalizeSupertypeName`) to capture qualified / generic / scoped / attribute / subscript / record / struct / delegation / member-expression / interface-embed bases. A per-language audit (one verifier per language, real-parse) confirmed the merge dropped nothing, but surfaced that the registry-primary synths (the production path) were never widened to match, silently dropping those inheritance edges in production. This is the same #1951 gap already fixed for rust/ts/cpp; the audit found it unfixed for 7 more languages, so this PR closes them:

- python: attribute (`class X(pkg.Base)`) + subscript (`Generic[T]`) bases
- go: `qualified_type` (`pkg.Base`), `generic_type` (`Box[T]`), pointer, AND `interface_type` embeds (the synth previously walked `struct_type` only)
- javascript: `member_expression` base (`extends ns.Base`)
- csharp: `record_declaration` + `struct_declaration` base-lists (incl. `primary_constructor_base_type`) + `alias_qualified_name`
- ruby: `scope_resolution` superclass (`class C < Mod::Super`)
- kotlin: `explicit_delegation` (`class F : Iface by d`)
- java: `interface_declaration` extends-interfaces (`interface IA extends IB`)

Each synth's base-name extractor was widened to reduce the new shapes to the bare name, real-parse-verified to equal `normalizeSupertypeName(base)` (the legacy leg's reduction) so registry↔legacy agree by construction; the existing simple-base path is byte-identical. New `<lang>-qualified-base`/`java-iface-extends` fixtures + both-leg parity blocks assert the edges. Full parity suite: 28/28 (all 14 languages, both legs). Goldens regenerated additively (existing fixtures gain one inherits capture each); scope-capture + python-scope benches re-baselined, all linear.

Known follow-up: csharp `record UserRecord : BaseEntity` in the same namespace. The synth emits the capture but the same-namespace record-target binding is not resolved on the registry leg (the legacy leg does emit it). A separate, narrow registry resolution gap, not asserted to avoid a leg divergence; both-safe (missing edge).

Residual Review Findings
Non-blocking items from the autonomous multi-agent code-review pass (verdict: ready to merge, with zero correctness/security/performance/contract/standards findings). No issue tracker is configured for autonomous filing, so they are recorded here as the durable sink.

- `gitnexus/src/core/ingestion/languages/rust/scope-resolver.ts:78`: the Rust refuse-on-ambiguity branch is not exercised end-to-end by a Rust fixture. The refusal helper (`resolveAmbiguousInheritanceBaseViaImports`) is already covered by existing C++/C# ambiguity tests, and the resolved path is covered by `rust-cross-module-collision` + `rust-traits` parity (both legs). A Rust-specific E2E fixture (two same-named structs, `impl` in a third file importing neither) asserting `0 IMPLEMENTS` would close it; deferred to avoid another `rust-*` golden/bench re-baseline.
- `gitnexus/src/core/ingestion/languages/swift/receiver-binding.ts:99`: `firstInheritedType`'s qualified-base change has no `super.method()` integration test. Low value: a qualified Swift superclass produces no observable edge in the current model; the parallel `swiftBaseTypeIdentifier` change is unit-tested.

Post-Deploy Monitoring & Validation
No additional operational monitoring required: these are offline code-graph ingestion changes (no runtime service, endpoint, or persistent-data surface). Correctness is gated entirely by CI:

- `ci-scope-parity`: all 14 languages, both `REGISTRY_PRIMARY_*=0/1` legs. The registry↔legacy edge set must stay identical (the 2-segment Java double-match would surface here as `=1`/`=0`).
- `tests / benchmarks`: `bench/scope-capture/measure.mjs --check` (13 langs, scaling < 1.5) + the `*-captures-golden` digests must stay byte-identical (re-baselined additively for rust/java/ts here).
- … `resolveAmbiguousInheritanceBaseViaImports` tests.

Healthy signal: all gates green. Failure signal: a parity `=1`/`=0` divergence or a golden/fingerprint drift on a language whose synth wasn't intended to change. Each unit is a standalone commit, so revert the offending commit.

🤖 Generated with Claude Code