Skip to content

fix(ingestion): migrate all languages' inheritance to scope-resolution on the worker path (#1951)#1956

Merged
magyargergo merged 38 commits into
mainfrom
fix/1951-extends-implements-edges
Jun 1, 2026
Merged

fix(ingestion): migrate all languages' inheritance to scope-resolution on the worker path (#1951)#1956
magyargergo merged 38 commits into
mainfrom
fix/1951-extends-implements-edges

Conversation

@magyargergo

@magyargergo magyargergo commented May 31, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes #1951 — registry-primary languages produced zero EXTENDS/IMPLEMENTS edges in worker mode, so diagrams showed classes and interfaces with no inheritance edges between them. The bug was invisible to tests because small fixtures stay under the worker threshold and run sequentially, where the legacy heritage path is still intact.

This PR migrates every language's inheritance to the scope-resolution path and gates the legacy @heritage DAG behind isRegistryPrimary, so inheritance edges are emitted consistently in both the sequential and worker pipelines.

Root cause

Inheritance edges for migrated (registry-primary) languages came only from the legacy @heritage.*processHeritage path. The worker pipeline drops those legacy artifacts for registry-primary languages via the shouldAccumulate gate (parse-impl.ts). Scope-resolution runs in both parse modes (worker-safe — it re-parses on cache miss) but emitted nothing for the migrated languages because, unlike C++, they synthesized no @reference.inherits captures.

Fix

  • languages/*/captures.ts synthesize @reference.inherits captures from each language's heritage syntax (base lists, impl blocks, use traits, embedding), routing inheritance through scope-resolution. Capture scope matches each language's legacy @heritage query to preserve parity.
  • scope-resolution/pipeline/run.ts (preEmitInheritanceEdges, language-agnostic) decides EXTENDS vs IMPLEMENTS from the resolved target's symbol kind (Interface/TraitIMPLEMENTS), mirroring legacy resolveExtendsType. C++ has no Interface/Trait targets → always EXTENDS, behavior unchanged. Adds import-aware disambiguation (resolveAmbiguousInheritanceBaseViaImports) for ambiguous same-named bases.
  • Legacy heritage is gated by isRegistryPrimary, and the js/kotlin pre-existing O(n²) findNodeAtRange-per-match scope-capture loops were linearized (threaded captured node) and added to the bench gate.

The one-line worker fallback (always accumulating deferredWorkerHeritage) was deliberately not used: it would resurrect the legacy DAG for migrated languages, double-emit against C++, and re-introduce the O(files²) heritage cost the registry migration removes. No double-emission: sequential mode → legacy emits first → scope-resolution dedups against the graph; worker mode → legacy dropped → scope-resolution fills the gap.

Review findings addressed (#1956 self-review)

  • P1 (Rust, blocks merge) — cross-module trait-impl collision. emitRustTraitImplEdges resolved the impl block's struct S and trait T through a workspace-wide simple-name index with last-write-wins, so two same-named structs/traits across modules collapsed onto one def — sourcing impl edges from the wrong module. Now threads the finalized ScopeResolutionIndexes into the emitHeritageEdges hook and resolves S/T scope-aware (findClassBindingInScope + import-aware fallback), refusing on ambiguity — restoring the legacy "a wrong edge is worse than no edge" invariant. New rust-cross-module-collision fixture proves each edge sources from its own module.
  • P1 (test) — worker-forced coverage for all migrated languages. heritage-worker-path.test.ts covered only C#/Java; the other 9 languages depended entirely on their worker-mode synth with no worker-forced test. Added table-driven worker-forced blocks (force the pool, assert usedWorkerPool === true + the exact edge set) for go, python, php, rust, kotlin, ruby, typescript, javascript, and swift.
  • P2 (Java/TS) — generic-base parity. The synths emitted generic-base edges (Box<T>Box) the legacy @heritage query never matched ((type_identifier)-only), a latent =0/=1 scope-parity break. Widened the legacy Java queries (superclass/super_interfaces generic_type) and the TS implements clause to capture the inner type_identifier — both paths now agree on generic bases (consistent with C#/Rust). New java-generic-base / typescript-generic-base fixtures assert it under both parity legs. Inaccurate "matches legacy exactly" docs corrected.
  • P2 (Swift) — qualified base resolved to the qualifier. swiftBaseTypeIdentifier returned the first type_identifier of the flat user_type, so class A: Outer.Inner resolved to Outer instead of the base Inner. Now takes the trailing segment; firstInheritedType (super-receiver binding) had the same latent bug and is fixed in lockstep. New capture-level unit test + swift-qualified-base fixture.
  • low (PHP) — stale comment corrected to point at the already-correct run.ts TraitIMPLEMENTS mapping.

Tests

  • Worker-forced regression test/integration/heritage-worker-path.test.ts now covers all 11 migrated languages (forces the pool, asserts usedWorkerPool === true and the exact edge set). Fails before this change (0 edges in worker mode), passes after.
  • Generic-base fixtures asserted under both parity legs; Rust collision + Swift qualified-base coverage added.
  • tsc --noEmit, eslint, prettier --check, scope-capture bench (--check, fingerprints re-baselined for the 3 added fixtures, scaling stays linear), and the consolidated scope-parity runner are green.

Compatibility

Touches none of PR #1954's protected files. C++ inheritance behavior is unchanged (always EXTENDS). The shared preEmitInheritanceEdges / emitHeritageEdges changes stay language-neutral; the emitHeritageEdges contract gained an appended optional scopes parameter (Ruby's narrower hook stays assignable). Graph node/edge shapes and persisted IDs are unchanged.

Performance & heritage benchmarking

The inheritance-capture synth (@reference.inherits) added by this migration was only driven at scale by the scope-capture benchmark for ts/js/kotlin; go/csharp/rust/php/ruby/swift/python had flat synthetic scale sources, and java, c, and c++ were not benched at all. Every language's synthetic bench source is now heritage-bearing, covering its distinct form(s) — go struct-embedding; csharp/java extends+implements; rust trait-impl; php extends+trait-use; ruby superclass+mixin; swift superclass+protocol; python single+multiple inheritance; c++ single+multiple inheritance (c has no inheritance — flat) — so the synth's per-node cost is gated for every language (scaling < 1.5).

Adding the three previously-unbenched languages surfaced the same pre-existing O(n²) in each — the findNodeAtRange(tree.rootNode, …) per-match root-walk that #1848/#1915/#1918 fixed for the other languages but missed for java/c/c++ (they had no bench). All three fixed by threading the query's captured node (capture output byte-identical, verified by before/after digests over every fixture): java 3.089→0.99, c 3.475→0.96, c++ 2.30→1.12. C++'s emitCppInheritanceCaptures is the original synth pattern #1951 followed for the other languages.

A pre-existing flaky test (sequential-language-availability → processHeritage skip-warning) is also de-flaked: #1951's isRegistryPrimary gate preempts the parser-availability skip for migrated languages, so the test now forces legacy mode to deterministically exercise the path it asserts.

Tri-review follow-ups

A subsequent tri-review (Codex/gpt-5.5 + CE personas) of this migration surfaced follow-ups, planned + deepened (4-lens document review) and implemented here as 10 commits:

  • U1 — Rust qualified/scoped trait-impl bases. impl crate::traits::Drawable for User (and impl m::Wrapped<T> for S) now resolves by its trailing simple name in both legs (bareTypeIdentifier handles scoped_type_identifier / generic_type; the legacy Rust @heritage impl_item arm became one real-parse-verified alternation).
  • U2 — Java/TS qualified + qualified-generic bases at parity. Widened the legacy @heritage queries (Java scoped_type_identifier, TS member_expression / nested_type_identifier, plain + generic) to match the synth, and fixed a TS synth gap (terminalTsTypeNameNode now treats a property_identifier tail as a leaf, so extends ns.Base synthesizes an edge it previously dropped). New java-qualified-base / typescript-qualified-base parity fixtures run under both legs.
  • U3 — JSX call-arity. A JSX element used as a call argument (render(<Foo a b/>)) no longer inherits the enclosing call's arity — an early JSX-anchor guard at the arity call site (JS + .tsx); the stale "resolves to null for a JSX anchor" comment is corrected.
  • U4–U7 — maintainability cleanup (byte-identical). Documented the emitHeritageEdges scope-index dependency; dropped the tryEmitEdge inheritance-override bag (emit directly via emitInheritanceEdgeDirect); extracted a shared walkNamedTree replacing seven near-identical per-language walkers; deduplicated the Swift base-tail helper into a leaf module. bench --check + all seven *-captures-golden digests unchanged.
  • U8 — test hardening. Pinned the C# ambiguous-heritage targets unconditionally (was a vacuous if (targetFilePath) guard); added resolveAmbiguousInheritanceBaseViaImports branch unit tests; de-languaged the EXTENDS-vs-IMPLEMENTS discriminator comment (AGENTS.md); clarified the Swift registry-primary availability-test titles (Swift stays registry-primary, no flag forcing).

Tier-2 review of the follow-ups (caught + fixed): the Java scoped @heritage arms double-matched a 2-segment qualified base (extends Outer.Inner parses both segments as direct type_identifier children → spurious EXTENDS Outer + a =2/=1 parity break), masked because the fixtures used only 3-segment names. Fixed with a trailing end-anchor on all four Java arms, guarded by new 2-segment fixtures (Plain, Two), verified =1/=1 both legs. The flagged P2 (dual dedup-set readability) was applied. One advisory accepted: the TS member_expression leaf also matches extends this.Base — parity-symmetric, below the confidence gate, and not tightenable without breaking legitimate deep-namespace bases. Every other claim (U5 byte-identical emission, U1 alternation, U6 walker, U3 guard, U7 cycle-free) was independently refuted as sound.

Post-merge heritage audit + synth widening (7 languages)

Merging origin/main brought in PR #1940 ("Centralize heritage supertype matching"), which widened the legacy @heritage leg (config-driven shapes + a runtime normalizeSupertypeName) to capture qualified / generic / scoped / attribute / subscript / record / struct / delegation / member-expression / interface-embed bases. A per-language audit (one verifier per language, real-parse) confirmed the merge dropped nothing — but surfaced that the registry-primary synths (the production path) were never widened to match, silently dropping those inheritance edges in production. This is the same #1951 gap already fixed for rust/ts/cpp; the audit found it unfixed for 7 more languages, so this PR closes them:

  • python — attribute (class X(pkg.Base)) + subscript (Generic[T]) bases
  • goqualified_type (pkg.Base), generic_type (Box[T]), pointer, AND interface_type embeds (the synth previously walked struct_type only)
  • javascriptmember_expression base (extends ns.Base)
  • csharprecord_declaration + struct_declaration base-lists (incl. primary_constructor_base_type) + alias_qualified_name
  • rubyscope_resolution superclass (class C < Mod::Super)
  • kotlinexplicit_delegation (class F : Iface by d)
  • javainterface_declaration extends-interfaces (interface IA extends IB)

Each synth's base-name extractor was widened to reduce the new shapes to the bare name, real-parse-verified to equal normalizeSupertypeName(base) (the legacy leg's reduction) so registry↔legacy agree by construction; the existing simple-base path is byte-identical. New <lang>-qualified-base / java-iface-extends fixtures + both-leg parity blocks assert the edges. Full parity suite: 28/28 (all 14 languages, both legs). Goldens regenerated additively (existing fixtures gain one inherits capture each); scope-capture + python-scope benches re-baselined, all linear.

Known follow-up: csharp record UserRecord : BaseEntity in the same namespace — the synth emits the capture but the same-namespace record-target binding is not resolved on the registry leg (the legacy leg does emit it). A separate, narrow registry resolution gap, not asserted to avoid a leg divergence; both-safe (missing edge).

Residual Review Findings

Non-blocking items from the autonomous multi-agent code-review pass (verdict: ready to merge — zero correctness/security/performance/contract/standards findings). No issue tracker is configured for autonomous filing, so they are recorded here as the durable sink.

  • P2gitnexus/src/core/ingestion/languages/rust/scope-resolver.ts:78: the Rust refuse-on-ambiguity branch is not exercised end-to-end by a Rust fixture. The refusal helper (resolveAmbiguousInheritanceBaseViaImports) is already covered by existing C++/C# ambiguity tests, and the resolved path is covered by rust-cross-module-collision + rust-traits parity (both legs). A Rust-specific E2E fixture (two same-named structs, impl in a third file importing neither) asserting 0 IMPLEMENTS would close it; deferred to avoid another rust-* golden/bench re-baseline.
  • P3gitnexus/src/core/ingestion/languages/swift/receiver-binding.ts:99: firstInheritedType's qualified-base change has no super.method() integration test. Low value — a qualified Swift superclass produces no observable edge in the current model; the parallel swiftBaseTypeIdentifier change is unit-tested.

Post-Deploy Monitoring & Validation

No additional operational monitoring required — these are offline code-graph ingestion changes (no runtime service, endpoint, or persistent-data surface). Correctness is gated entirely by CI:

  • ci-scope-parity — all 14 languages, both REGISTRY_PRIMARY_*=0/1 legs: the registry↔legacy edge set must stay identical (the 2-segment Java double-match would surface here as =1/=0).
  • tests / benchmarksbench/scope-capture/measure.mjs --check (13 langs, scaling < 1.5) + the *-captures-golden digests must stay byte-identical (re-baselined additively for rust/java/ts here).
  • The new qualified-base, JSX-arity, and resolveAmbiguousInheritanceBaseViaImports tests.

Healthy signal: all gates green. Failure signal: a parity =1/=0 divergence or a golden/fingerprint drift on a language whose synth wasn't intended to change — each unit is a standalone commit, so revert the offending commit.

🤖 Generated with Claude Code

…n the worker path (#1951)

Registry-primary C# and Java produced zero EXTENDS/IMPLEMENTS edges when the
worker pool was engaged (large repos), so diagrams showed classes and
interfaces with no inheritance edges between them. Small fixtures stayed under
the worker threshold and ran sequentially, where the legacy heritage path is
intact, so the bug was invisible to existing tests.

Root cause: inheritance edges for migrated (registry-primary) languages came
only from the legacy `@heritage.*` -> processHeritage path, and the worker
pipeline drops those legacy artifacts for registry-primary languages via the
`shouldAccumulate` gate (parse-impl.ts). Scope-resolution runs in both parse
modes (worker-safe) but emitted nothing for C#/Java because, unlike C++, they
synthesized no `@reference.inherits` captures.

Fix (the existing C++ pattern): C#/Java now synthesize `@reference.inherits`
captures from their base lists, routing inheritance through scope-resolution.
The generic `preEmitInheritanceEdges` pass now decides EXTENDS vs IMPLEMENTS
from the resolved target's symbol kind (Interface -> IMPLEMENTS), mirroring the
legacy `resolveExtendsType` semantics so the registry path matches the legacy
DAG. C++ has no Interface targets, so it always takes the EXTENDS branch and is
unchanged. Capture scope per language matches the legacy heritage query (C#:
class+interface base lists; Java: class superclass + implemented interfaces) to
preserve scope-resolution parity.

The one-line worker fallback (always accumulating deferredWorkerHeritage) was
deliberately not used: it would resurrect the legacy DAG for migrated
languages, double-emit against C++, and re-introduce the O(files^2) heritage
cost the registry migration removes. No double-emission: in sequential mode the
legacy path emits first and scope-resolution dedups against the graph; in
worker mode the legacy path is dropped and scope-resolution fills the gap.

Touches none of PR #1954's protected files.

Tests: new worker-forced regression (test/integration/heritage-worker-path.test.ts)
that forces the worker pool on small fixtures and asserts the edges; it fails
before this change (0 edges) and passes after. C# and Java scope-resolution
parity gates pass in both legacy and registry-primary modes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 31, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Jun 1, 2026 3:09pm

Request Review

@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10757 10747 0 10 668s

✅ All 10747 tests passed

10 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 80.33% 37397/46553 79.84% 📈 +0.5 🟢 ████████████████░░░░
Branches 68.89% 23801/34545 68.5% 📈 +0.4 🟢 █████████████░░░░░░░
Functions 85.47% 3879/4538 84.94% 📈 +0.5 🟢 █████████████████░░░
Lines 83.9% 33646/40100 83.36% 📈 +0.5 🟢 ████████████████░░░░

📋 View full run · Generated by CI

…e + normalize qualified-generic bases (#1951)

Addresses two defects found by adversarial/correctness review of the #1951 fix:

1. Wrong inheritance-edge source for C# primary-constructor classes (worker
   mode). A C# 12 primary constructor (`class User(int id) : Base`) is
   synthesized into the class scope, so `preEmitInheritanceEdges` -> tryEmitEdge
   -> `resolveCallerGraphId` degraded the EXTENDS/IMPLEMENTS source to
   `Constructor:User` instead of `Class:User`. That produced a meaningless
   constructor-rooted inheritance edge and, because `buildMro` only maps
   class-like graph ids, silently dropped the class from the MRO (breaking
   method-override/dispatch resolution). The pre-pass already resolves the
   correct class source for its dedup key; it now also passes that id to the
   emitter so the edge is always class-owned. C++/C are unaffected (their
   constructors are not in the class scope, so the resolved id is identical).

2. Fully-qualified generic C# bases (`A.B.Base<T>`) silently emitted no edge:
   `terminalTypeNameNode` returned the `generic_name` node verbatim
   (`Base<T>`), which never resolves. It now recurses on the qualified-name
   tail to reach the bare identifier (`Base`), matching the documented
   `IRepository<T>` -> `IRepository` normalization.

Also bundles the inheritance pre-pass's `tryEmitEdge` overrides (edge type +
caller id) into a single options object instead of trailing positional params,
and switches the Java inheritance walk to named-children traversal.

Tests (test/integration/heritage-worker-path.test.ts): add a primary-constructor
+ qualified-generic fixture asserting every inheritance edge source is the Class
node (the regression) and that `App.Repo<int>` resolves; add a worker/sequential
parity block asserting both modes yield identical, non-duplicated EXTENDS/
IMPLEMENTS; pin edge `reason` to `scope-resolution: inherits`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…itance captures

C# now synthesizes @reference.inherits captures from class/interface base
lists (plus the new csharp-primary-ctor-heritage fixture), which legitimately
changes the C# capture-output corpus. Update the committed fingerprint; the
scaling ratio stays linear (~1.06, budget 1.5). No other language drifted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esolution; gate legacy heritage by isRegistryPrimary (#1951)

Completes the #1951 fix. The reported "missing EXTENDS/IMPLEMENTS edges" was an
incomplete migration: every language became registry-primary for calls/imports,
but inheritance still came from the legacy `@heritage` path, which the worker
pipeline drops for registry-primary languages — so inheritance edges silently
vanished in worker mode for EVERY registry-primary language (reproduced: Python
emitted 0 EXTENDS in worker mode), not just C#/Java. It hid because the parity
suite runs sequentially, where legacy heritage still runs.

Migrate the remaining 9 languages' class/interface inheritance to scope-
resolution (the C++/C#/Java `@reference.inherits` pattern), then suppress the
legacy `@heritage` path for registry-primary languages with a bare
`isRegistryPrimary` gate — identical to how call-processor / import-processor
already gate. Scope-resolution is now the single inheritance source.

Per language (matching each legacy `@heritage` query scope exactly for parity):
- python/typescript/javascript/kotlin/swift: base-list / heritage-clause walks
  emitting `@reference.inherits`; EXTENDS vs IMPLEMENTS decided centrally from
  the resolved target's symbol kind.
- go: struct-embedding (anonymous fields), matching the legacy embedding scope.
- php: extends / implements / trait-use; trait targets map to IMPLEMENTS.
- rust: trait-impls via the `emitHeritageEdges` hook (impl owner != enclosing
  scope, so the generic inherits pass can't source it) — emits S IMPLEMENTS T.
- ruby: `class < superclass` via captures, leaving the existing mixin
  `emitHeritageEdges` path intact.

Shared:
- preEmitInheritanceEdges (run.ts): IMPLEMENTS when the resolved target is an
  Interface OR Trait (matches legacy resolveExtendsType + the trait-impl
  branch); plus import-aware disambiguation for ambiguous bases
  (resolveAmbiguousInheritanceBaseViaImports in walkers.ts) — engages only on a
  findClassBindingInScope miss, preferring the candidate whose file is
  imported/included by the referencing file (C++ `#include` disambiguation,
  C# `using` disambiguation). Never changes single-match resolution.
- heritage-processor: bare `isRegistryPrimary` skip, mirroring call/import.

Regenerated the 7 capture-golden snapshots and re-baselined the scope-capture
fingerprints (go/rust/php/ruby/swift) for the new `@reference.inherits` output;
all stay linear (scaling ~1.0).

Validation: scope-resolution parity 28/28 (all 14 languages, both
REGISTRY_PRIMARY_<LANG>=0 and =1 legs); tsc clean; bench fingerprint gate green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate ts/js/kotlin in the bench

While verifying the #1951 inheritance migration didn't introduce superlinear
capture scaling, added scope-capture bench coverage for TypeScript, JavaScript,
and Kotlin — three languages that had NO scaling gate. That revealed PRE-EXISTING
O(n²) in the JS and Kotlin emitters (the #1951 inheritance synths are O(N) and
were ruled out by measuring with them disabled).

JS `emitJsScopeCaptures` and Kotlin `emitKotlinScopeCaptures` re-derived AST
nodes with `findNodeAtRange(tree.rootNode, range, type)` PER query match — an
O(matches × N) root walk (the #1848/#1915 pattern fixed for go/python but never
caught here because these languages were unbenchmarked). Fixed by threading the
tree-sitter captured node `c.node` (per-match nodeMap), mirroring the csharp/go
implementations. Behavior-preserving: capture-output fingerprints are unchanged
(byte-identical), so resolution/parity are untouched.

  JavaScript: scaling 2.96 -> 1.01 (~13x; 2995ms -> 232ms on the 800-entity bench)
  Kotlin:     scaling 2.78 -> 0.84 (~8x;  4641ms -> 591ms)

bench/scope-capture/measure.mjs now covers 10 languages (added ts/js/kotlin with
inheritance-bearing synthetic units so the #1951 synth pass is timed at scale);
baselines.json gains their fingerprints. `measure.mjs --check` PASS (10
languages), all scaling_ratio < 1.5. JS/Kotlin resolver parity unchanged in both
REGISTRY_PRIMARY legs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review — scope-resolution inheritance migration (#1951)

Methods & engine independence. Two methods, both Claude — GitNexus swarm + Compound-Engineering personas (correctness, adversarial, performance, maintainability, testing returned findings; risk + test-CI lanes ended mid-investigation). Codex — the only independent engine — returned no findings, so this is a two-method, both-Claude review, NOT three independent confirmations. Every posted finding was coordinator-verified by re-reading the code ([code-read]); the adversarial lane also reproduced several via tsx.

Visibility. Current visible state is incomplete. I could verify the full diff, each per-language synth against its legacy @heritage query, the shared resolution path, and the worker-test coverage, but not the Codex lane (empty) nor the risk / test-CI agent lanes (ended mid-investigation). Treat those missing items as mandatory verification points, not confirmed facts.

Problem. #1951: registry-primary languages emitted 0 EXTENDS/IMPLEMENTS edges in worker mode (legacy @heritage dropped by the worker pipeline, no scope-resolution replacement). This PR migrates all 14 languages to scope-resolution, gates legacy heritage by isRegistryPrimary, adds import-aware ambiguous-base disambiguation, and fixes pre-existing js/kotlin O(n²) capture loops.

Repository history considered. Base e2758262 is the #1954 typeBindings-OOM merge. walkers.ts (gains the new disambiguation helper) is a memory-hot file — #1954 / #1905 / #1800 all fought OOM there. #1918 ("linearize scope-capture across ALL languages O(n²)→O(n)") claimed all languages but js/kotlin were unbenchmarked → it missed them; this PR closes that gap and adds their bench gate. The registry-primary inheritance pattern follows the C# (#1019) and Swift (#937/#1948) migrations.

Current state & merge status: checks pending (OPEN, mergeStateStatus=BLOCKED with CI in flight). 31 files, +1623/−209.

Branch hygiene: merge-from-main commit present but harmless and merge-safe#1954 is merged in as the base; the change is additive and touches none of #1954's protected regions.

Findings

P1 — Rust trait-impl resolves the WRONG struct/trait on cross-module name collision.

  • Risk: emitRustTraitImplEdges keys graphIdByName by bare simple name with last-write-wins, ignoring the impl's module → for two struct User (or two traits) across modules, an impl edge sources from the wrong User and the real one is missing. Legacy was file-scoped and refused ambiguous global matches; Rust is registry-primary so legacy is gated off → this hook is the sole emitter.
  • Evidence to check: gitnexus/src/core/ingestion/languages/rust/scope-resolver.ts:54. [code-read + tsx-reproduced]
  • Recommended fix: resolve struct/trait via the impl site's scope (findClassBindingInScope) / file-scoped context; refuse (emit nothing) on >1 candidate to preserve the legacy "wrong edge is worse than no edge" invariant.
  • Blocks merge: yes.

P1 (test) — no worker-forced inheritance coverage for 9 of the 14 migrated languages.

  • Risk: #1951 is worker-mode-only; the parity CI runs sequentially. The worker-forced test covers only C# and Java. The 9 others (python, typescript, javascript, go, php, kotlin, rust, swift, ruby) now depend entirely on their scope-resolution synth in worker mode (legacy gated off) — a worker-only capture regression in any would pass every CI gate.
  • Evidence to check: gitnexus/test/integration/heritage-worker-path.test.ts:49 (describe blocks are C#/Java only). [code-read]
  • Recommended fix: add worker-forced describe blocks (force the pool + assert usedWorkerPool===true + the edge set) for at least go/python/php/rust.
  • Blocks merge: maybe.

P2 — Java/TypeScript synths emit GENERIC-base edges the legacy @heritage query never matched (latent parity divergence + inaccurate doc).

  • Risk: the legacy Java/TS heritage queries are (type_identifier)-only; the synths normalize generic_type (Box<T>Box) and emit. extends Box<T> / implements IFoo<T> → registry emits, legacy emits none → the =0/=1 parity gate diverges the moment a generic-base fixture is added (none exists today, so it is green). The synth docs claim "matching the legacy query exactly" — inaccurate. (C# and Rust legacy DO handle generics → at parity.)
  • Evidence to check: legacy gitnexus/src/core/ingestion/tree-sitter-queries.ts:784,788; synth gitnexus/src/core/ingestion/languages/java/captures.ts:271 and typescript/captures.ts:507. [code-read + tsx-reproduced]
  • Recommended fix: widen the legacy Java/TS queries to match generic bases (registry is the more-correct behavior), or drop the synth generic branch; add a generic-base parity fixture; fix the docs.
  • Blocks merge: no.

P2 — Swift qualified base Outer.Inner resolves to the qualifier, not the base.

  • Risk: swiftBaseTypeIdentifier returns the FIRST type_identifier child of a user_typeOuter (the qualifier) for class A: Outer.Inner, not Inner (the base) — wrong target, contradicting its own doc comment.
  • Evidence to check: gitnexus/src/core/ingestion/languages/swift/captures.ts:350. [code-read + tsx-reproduced]
  • Recommended fix: take the LAST type_identifier child (the base), per the doc.
  • Blocks merge: no.

Back-and-forth avoided by verifying

  • No double-emission — correctness traced both parse modes: legacy is fully gated for registry-primary in worker (shouldAccumulate) and sequential (processHeritage isRegistryPrimary skip); the existing dedup seed in preEmitInheritanceEdges is a no-op on the production path (live only in shadow mode).
  • The js/kotlin O(n²) fix is byte-identical — correctness + adversarial each verified every converted call site (captured node == old findNodeAtRange result; bench fingerprints unchanged). All 10 benched languages < 1.5 scaling.
  • Shared code is language-neutral (no DoD violation — Interface||Trait is a symbol-kind check with precedent in resolveExtendsType). Go-embedding / Kotlin-: / Swift-protocol edge types match legacy for resolved simple-name bases.

Lower-priority (verify before acting)

  • PHP gitnexus/src/core/ingestion/languages/php/captures.ts:325 — stale comment claims the central pass treats Trait as EXTENDS, but run.ts:125 already maps Trait→IMPLEMENTS; misleading.
  • Perf — kotlin synth uses descendantsOfTypedescendants (O(N·depth) array-spread; bench/scope-capture/measure.mjs reports kotlin scaling 0.92 — linear-slope, heap-heavy on huge files; pre-existing pattern). gitnexus/src/core/ingestion/scope-resolution/scope/walkers.ts:332 allocates two Sets per ambiguous-name call in the OOM-sensitive file, not cached across calls.
  • Maintainabilitykotlin/captures.ts >1000 lines (extract the synth); gitnexus/src/core/ingestion/languages/go/captures.ts:227 findNamedChildOfType duplicates canonical findChild; six byte-identical per-language visit* walkers; tryEmitEdge's inheritanceOverride bag.

Open questions

  • Generic-base inheritance: support it (widen the legacy queries) or match-to-legacy (drop from the synths)? The registry behavior is arguably the more correct one.

Verdict

not production-ready. The Rust cross-module name collision (P1) is a reachable wrong/missing-edge regression for a common Rust pattern (same-named structs/traits across modules), and the legacy path it replaces deliberately refused such ambiguity — it should be resolved or bounded before merge. The Java/TS/Swift name-extraction divergences (P2) and the absent worker-forced tests for 9 of 14 migrated languages (P1-test) are the exact bug class this PR fixes left unguarded, and should be addressed too. That said, the migration's core is solid and validated — scope-resolution parity is 28/28 across both legs, the js/kotlin O(n²) fix is byte-identical, there is no double-emission, and the shared code stays language-neutral — so these are bounded, fixable surface issues, not architectural problems.

Automated multi-tool review digest (2 methods, both Claude; Codex unavailable). Verify each finding before acting.

Comment thread gitnexus/src/core/ingestion/languages/rust/scope-resolver.ts Outdated
skipWorkers: true,
});

describe('C# inheritance edges on the worker path (#1951)', () => {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1 — testing] No worker-forced coverage for 9 of 14 migrated languages. #1951 is worker-mode-only; the parity CI runs sequentially. This file forces the worker pool only for C#/Java. The 9 others (python/ts/js/go/php/kotlin/rust/swift/ruby) now depend entirely on their scope-resolution synth in worker mode (legacy gated off) — a worker-only capture regression in any passes every CI gate. Fix: add worker-forced describe blocks (force pool + assert usedWorkerPool===true + edge set) for at least go/python/php/rust, mirroring the C# block. [code-read]

case 'scoped_type_identifier':
// `java.io.Serializable` → trailing `type_identifier` (`Serializable`).
return node.lastNamedChild;
case 'generic_type': {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 — correctness/parity] Generic-base divergence from legacy. This generic_type branch normalizes Box<T>Box and emits an inheritance edge, but the legacy Java @heritage query (tree-sitter-queries.ts:784,788) is (type_identifier)-only and matches nothing for extends Box<T> / implements IFoo<T>. So registry(=1) emits an edge legacy(=0) doesn't → the parity gate diverges the moment a generic-base fixture exists (none does today, so it's green). The doc comment's "matching the legacy query exactly" is inaccurate. Same in typescript/captures.ts:507. (C#/Rust legacy DO handle generics → fine.) Fix: widen the legacy Java/TS queries to match generic bases (registry is more correct) or drop this branch; add a generic-base parity fixture. [code-read + tsx-reproduced]

case 'identifier':
case 'type_identifier':
return node;
case 'generic_type': {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 — correctness/parity] Same generic-base divergence as Java. terminalTsTypeNameNode normalizes generic_type and the synth emits for implements IFoo<T> / extends Base<T>, but the legacy TS @heritage query is (type_identifier)-only → registry emits edges legacy doesn't. Latent =0/=1 parity break + the doc's "matching legacy" is inaccurate. See the Java comment for the fix. [code-read + tsx-reproduced]

Comment thread gitnexus/src/core/ingestion/languages/swift/captures.ts Outdated
Comment thread gitnexus/src/core/ingestion/languages/php/captures.ts Outdated
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 7m 36s —— View job


PR #1956 — Production-Readiness Review

Todo List

  • Post initial comment
  • Read required docs (DoD.md, AGENTS.md, TESTING.md)
  • Check current CI status
  • Read key context files (registry-primary-flag, parse-impl, heritage-processor, scope-resolution pipeline, edges, walkers)
  • Read all changed language capture files (Rust scope-resolver, Java, TypeScript, Swift, PHP)
  • Read worker-path integration test
  • Run search term scans (shouldAccumulate, deferredWorkerHeritage, isRegistryPrimary, preEmitInheritanceEdges, graphIdByName, generic_type heritage queries)
  • Run hidden Unicode/bidi hygiene checks
  • Verify each already-left finding
  • Assess all lanes
  • Compile and post final verdict

Review bar for this PR

  • Worker-mode EXTENDS/IMPLEMENTS edges must be emitted on the actual worker parse path for every registry-primary language migrated in this PR; test must force the worker pool and assert usedWorkerPool === true.
  • Legacy heritage must only be gated for languages with a complete scope-resolution replacement; no previously-emitted edge may be silently dropped.
  • All inheritance edge sources must be class/interface/trait owner nodes; all targets must be resolved without wrong-ID cross-module collisions.
  • EXTENDS vs IMPLEMENTS must follow language semantics for classes, interfaces, traits, and mixed cases (C#, Java, Rust, PHP, Swift, Go, etc.).
  • No O(N²) capture loops, unbounded allocation in memory-hot paths, or OOM regressions; benchmark harness must enforce linear scaling.
  • Worker-forced tests must exist for every newly migrated language (or the gap must be explicitly justified and bounded); parity suite must cover both REGISTRY_PRIMARY_<LANG>=0 and =1.
  • All CI checks (typecheck, lint, format, coverage, cross-platform, parity, benchmarks) must be green before merge.
  • Final diff must match the stated scope; merge-from-main commits must be harmless and merge-safe.

Problem being solved

Issue #1951: In large repos, inheritance/implementation edges (EXTENDS, IMPLEMENTS) were completely missing from diagrams. The root cause is that C# and Java are registry-primary, meaning their @heritage.* captures are dropped in the worker pipeline by the shouldAccumulate gate in parse-impl.ts:699–709 (isRegistryPrimary(lang) → return false). Scope-resolution runs in worker mode but had no @reference.inherits captures for C# and Java (unlike C++), so it emitted nothing. Small fixtures stayed below the worker threshold and ran sequentially where the legacy path still fired, hiding the bug from existing tests.

The PR starts with a C#/Java fix then expands in commit b25e2c4 to migrate all 14 registry-primary languages' inheritance to scope-resolution and gate the legacy processHeritage path for all of them. A gap remains: the initial test coverage only forced worker mode for C# and Java; the 9 other newly migrated languages are now guarded only by sequential-mode parity tests.


Current PR state

  • Status: OPEN, not draft.
  • Head branch: fix/1951-extends-implements-edges, 7 commits.
  • Changed files: 31 (matches the changed_files list provided).
  • Checks: CI workflow run 26726641212 completed with conclusion: failure. No subsequent passing CI run is visible. 1 test failed / 10236 passed / 10 skipped (from the CI bot report). Typecheck: success. E2E: success. Gitleaks, Dependency Review, CodeQL, Docker: success.
  • Review state: No formal approvals. One comprehensive inline review comment (self-authored by PR author) with 6 inline findings.

Merge status and mergeability

checks failing

The CI main workflow (CI, run 26726641212) completed with conclusion: failure. One test is failing. There is no subsequent successful CI run visible. Per DoD §4.2 and §6, a change is not Done if CI is failing even if the code looks correct.


Repository history considered


Branch hygiene assessment

merge-from-main commit present but harmless and merge-safe

Commits 215b464 and 40cb5c1 merged main (Swift migration #1948 and C# typeBindings fix #1954, respectively) into the branch. Both are already in main. The final diff against main (31 files) covers only inheritance capture synths, scope-resolution pipeline changes, heritage-processor gating, bench/golden updates, and the new integration test. No unrelated workflow, web, release, or dependency changes appear in the 31-file final diff.


Understanding of the change

Legacy heritage gating: heritage-processor.ts:209 now skips isRegistryPrimary(language) files entirely in processHeritage. This is clean and matches the call-processor/import-processor precedent. processHeritageFromExtracted is NOT gated (it handles pre-extracted records for both legacy and scope-resolution paths) — correct.

New @reference.inherits synths: Each language's captures file adds a synthesize*InheritanceReferences(root) function that walks the AST and emits { '@reference.inherits': ..., '@reference.name': ... } capture records. These are consumed downstream by finalizeScopeModel which sees kind: 'inherits' reference sites.

Shared inheritance pre-pass: preEmitInheritanceEdges in run.ts:63–151 iterates all inherits reference sites, resolves the target def via findClassBindingInScope → import-aware disambiguation fallback, finds the enclosing class via findEnclosingClassDef, and emits EXTENDS or IMPLEMENTS based on targetDef.type === 'Interface' || targetDef.type === 'Trait'. All inherits sites are added to handledSites unconditionally so the generic reference bridge doesn't re-emit them.

Edge override in tryEmitEdge: The inheritanceOverride bag ({ edgeType?, callerGraphId? }) pins the edge to the enclosing class (not a method/constructor anchor) and to the correct EXTENDS/IMPLEMENTS type. This fixes the C# primary-constructor regression where resolveCallerGraphId would have degraded the source to the constructor.

Import-aware disambiguation: resolveAmbiguousInheritanceBaseViaImports in walkers.ts:332–378 is language-neutral and conservative: tiered exact-file then same-directory match; refuses on ≥2 survivors at any tier. Only engaged on multiple-candidate ambiguity.

JS/Kotlin O(n²) fix: Both capture files were using findNodeAtRange(tree.rootNode, capture.startPosition, type) per match — O(matches × rootChildren). Fixed by threading capture.node directly from the query match array, matching the prior #1915/#1918 pattern for Go/Python/etc.

Rust trait-impl bridge: emitRustTraitImplEdges (scope-resolver.ts:40–89) builds a workspace-wide Map<simpleName, graphId> and matches impl T for S reference sites against it. This is the sole emitter for Rust inheritance (legacy gated off).

Bench/golden: All 10 benchmarked languages now have linear-slope entries in baselines.json; JS/Kotlin/TS added fresh; others rebaselined for new @reference.inherits captures. Capture golden JSON files updated across all 7 changed languages.


Findings

Finding 1 — Rust cross-module name collision: wrong/missing trait-impl edge

Severity: P1

Status: confirmed

Risk: emitRustTraitImplEdges (scope-resolver.ts:47–54) builds graphIdByName as Map<string, string> keyed by simple name with last-write-wins. For two struct User or two traits named identically across modules, graphIdByName.get('User') returns whichever def was iterated last. An impl T1 for a::User can emit b::User IMPLEMENTS T1 (wrong source node), while the real edge is missing. The MRO built downstream consumes these wrong source IDs silently.

Contrast with the legacy heritage processor (heritage-processor.ts:74–77): it explicitly refuses global-tier collisions: "For global with multiple candidates, refuse (a wrong edge is worse than no edge)". This PR removes that safeguard for Rust without a replacement. Rust is registry-primary so legacy is gated off — this hook is the sole emitter.

Evidence checked: scope-resolver.ts:47–54, heritage-processor.ts:74–77. No fixture testing two same-named structs/traits across modules exists. The fix comment at scope-resolver.ts:26–38 claims workspace-wide simple-name match is fine because "a trait T is commonly declared in a different file from the impl block" — this is true but doesn't address the collision case.

Recommended fix: Before building graphIdByName, detect collisions: if a name already exists in the map with a different graph ID, remove it and add to a collision set; then skip lookup for names in the collision set. This restores the legacy "refuse on ambiguity" invariant. Add a fixture with struct User in two modules and a trait impl.

Blocks merge: yes


Finding 2 — No worker-forced inheritance coverage for 9 of 14 migrated languages

Severity: P1

Status: confirmed

Risk: heritage-worker-path.test.ts forces worker mode for C# and Java only (5 describe blocks covering csharp-proj, csharp-interface-heritage, java-heritage, csharp-primary-ctor-heritage, and worker/sequential parity for csharp-proj). The remaining 9 migrated languages — Python, TypeScript, JavaScript, Go, PHP, Kotlin, Rust, Swift, Ruby — were all migrated in commit b25e2c4 and have their legacy heritage gated off. Their parity CI tests run sequentially with REGISTRY_PRIMARY_<LANG>=0/1. A worker-only capture regression in any of those 9 would emit 0 edges in production while parity CI stays green — exactly the bug class this PR is meant to fix.

Evidence checked: heritage-worker-path.test.ts:1–187 — all describe blocks are C# or Java. registry-primary-flag.ts:71–86 confirms all 14 are migrated. heritage-processor.ts:209 gates all of them.

Recommended fix: Add at minimum one worker-forced describe block each for Go (embedding), PHP (extends + trait use), Rust (trait impl), and Python (inheritance) — the four highest-risk migrated languages. Each should assert usedWorkerPool === true and exact edge sets for EXTENDS/IMPLEMENTS. This is the exact pattern the C#/Java blocks establish.

Blocks merge: yes — all 9 newly-migrated languages go unguarded against worker-only regressions with no forced-worker test coverage.


Finding 3 — Java generic-base parity divergence from legacy

Severity: P2

Status: confirmed (latent)

Risk: javaBaseLookupNameNode (captures.ts:264–278) handles generic_type nodes (e.g., extends Box<T>) by recursing to the bare type_identifier (Box) and emitting an inheritance edge. The legacy Java heritage query (tree-sitter-queries.ts:783–788) captures only (type_identifier) @heritage.extends and (type_list (type_identifier) @heritage.implements) — no generic_type support. Result: registry mode emits an edge for extends Box<T>, legacy does not. The parity gate passes today only because no generic-base Java fixture exists. The doc claim "matching the legacy query exactly" is inaccurate.

Evidence checked: tree-sitter-queries.ts:783–788, captures.ts:264–278. No generic-base Java fixture in test/fixtures/lang-resolution/.

Recommended fix: Either (a) widen the legacy Java heritage query to match generic_type and add a generic-base parity fixture, or (b) drop the generic_type branch from the synth to match legacy. Option (a) is the more correct behavior since Java does allow extends Comparable<T>. Whichever choice is made must be explicit, documented, and tested.

Blocks merge: no on its own, but this is latent parity drift the PR's author explicitly claims doesn't exist.


Finding 4 — TypeScript generic-base parity divergence from legacy

Severity: P2

Status: confirmed (latent)

Risk: Same pattern as Finding 3. terminalTsTypeNameNode (typescript/captures.ts:502–521) handles generic_type nodes. Legacy TS heritage query (tree-sitter-queries.ts:317–327) captures (identifier) @heritage.extends and (type_identifier) @heritage.implements — no generic_type branch. No generic-base TypeScript fixture exists.

Evidence checked: tree-sitter-queries.ts:317–327, typescript/captures.ts:502–521.

Recommended fix: Same as Finding 3 — widen legacy TS heritage query and add parity fixture, or document and test the intentional expansion.

Blocks merge: no on its own.


Finding 5 — Swift qualified base resolves to qualifier, not terminal base

Severity: P2

Status: unverified suspicion — intentional legacy match, semantic concern open

Risk: swiftBaseTypeIdentifier (swift/captures.ts:345–353) returns the FIRST type_identifier child of a user_type. For a qualified inheritance base class A: Outer.Inner, if the tree-sitter-swift grammar represents Outer.Inner as a user_type with Outer as the first type_identifier, the function returns Outer rather than Inner, creating an inheritance edge to the wrong type.

Counter-evidence: The function's doc comment at lines 339–344 explicitly states "a qualified Outer.Inner keeps the leading segment (matching the legacy .scm first-position bind, and the proven firstInheritedType walk in receiver-binding.ts)". The reviewer's claim that this "contradicts its own doc comment" appears inaccurate — the comment agrees with the implementation. This is an intentional match to legacy behavior. Whether that legacy behavior is semantically correct for Swift qualified types is an open question but not a regression introduced by this PR.

Recommended fix: No change required for merge, but a Swift fixture with Outer.Inner inheritance and a comment clarifying the legacy match would prevent future confusion.

Blocks merge: no — intentional legacy parity.


Finding 6 — PHP stale trait comment

Severity: low

Status: confirmed (misleading but behavior is correct)

Risk: php/captures.ts:323–327 reads: "the central pass classifies a resolved Trait def as EXTENDS (only Interface yields IMPLEMENTS). See the resolver's risk note — the central target-kind discriminator must treat Trait as IMPLEMENTS for full parity." But run.ts:124–125 already contains targetDef.type === 'Interface' || targetDef.type === 'Trait' ? 'IMPLEMENTS' : 'EXTENDS'. The comment describes a problem that is already fixed, which will mislead future readers into attempting a "fix" for non-existing behavior.

Evidence checked: php/captures.ts:323–327, run.ts:124–125. Fix this →

Recommended fix: Update the comment to: "The central pass (run.ts:124) maps both Interface and Trait targets to IMPLEMENTS — parity with legacy trait-impl → IMPLEMENTS is maintained."

Blocks merge: no.


Finding 7 — Kotlin/walkers perf allocation concern

Severity: low-P2

Status: unverified suspicion — benchmarks show linear, walkers concern minimal

Risk: Kotlin synth (kotlin/captures.ts:217) uses descendantsOfType(rootNode, 'class_declaration') which returns an array, producing one allocation per file. The baselines.json shows scaling_budget: 1.5 for Kotlin which is linear-range. The resolveAmbiguousInheritanceBaseViaImports function in walkers.ts allocates two Set<string> per call (lines 358–363) but only fires on ambiguous-name cases (multiple class-like defs with same simple name), which is rare in production. The OOM sensitivity from #1954 is around C# namespace-type-bindings, not inheritance disambiguation.

Evidence checked: kotlin/captures.ts:217, walkers.ts:358–363, baselines.json (kotlin scaling_budget 1.5 with _added note), run.ts:344 (disambiguation called only on findClassBindingInScope miss).

Recommended fix: No change required before merge. Follow-up: cache the two Set allocations across calls in resolveAmbiguousInheritanceBaseViaImports if profiling reveals hot-path pressure.

Blocks merge: no.


Finding 8 — Maintainability duplication

Severity: low

Status: unverified (code duplication exists; no confirmed correctness risk)

Risk: Six per-language visitXxx helpers are structurally similar; kotlin/captures.ts exceeds 1000 lines; go/captures.ts:227 adds a findNamedChildOfType that may duplicate a canonical findChild helper; tryEmitEdge gains inheritanceOverride options bag. None of these create confirmed wrong behavior — they are maintainability debt.

Blocks merge: no.


Finding 9 — CI / check status failing

Severity: P1

Status: confirmed

Risk: CI workflow run 26726641212 completed with conclusion: failure. The CI bot comment from 2026-05-31T17:45:30Z reports 1 test failed / 10236 passed. No subsequent successful run is visible. Merging with a failing CI gate violates DoD §4.2 ("cd gitnexus && npm test must pass") and DoD §6 ("Not Done if CI is failing").

Evidence checked: mcp__github_ci__get_ci_status output — "conclusion": "failure" for run 26726641212. No passing run in the 13 visible runs.

Recommended fix: Investigate and fix the failing test before merging. Re-run CI to confirm green.

Blocks merge: yes.


PR-specific assessment sections

Worker-path inheritance restoration

Lane A — PASS for C#/Java, UNGUARDED for 9 others.

The core mechanism is verified correct: (1) workers produce @reference.inherits captures from the new per-language synths; (2) shouldAccumulate at parse-impl.ts:703–705 gates these out for registry-primary languages in deferredWorkerHeritage (no accumulation of legacy heritage artifacts); (3) scope-resolution's preEmitInheritanceEdges consumes kind: 'inherits' reference sites and emits the edges; (4) tryEmitEdge pins the source to the enclosing class via inheritanceOverride.callerGraphId. The C#/Java worker tests (5 describe blocks, usedWorkerPool === true assertion on all) prove the end-to-end path works for those two languages. For the 9 other languages migrated in b25e2c4, no worker-forced test exists — they rely entirely on sequential-mode parity tests.

Per-language parity and synth scope

C#: Synth matches legacy (base_list class + interface names, generic_name for generic bases). Parity fixtures exist (csharp-proj, csharp-interface-heritage, csharp-primary-ctor-heritage). ✓

Java: Synth extends beyond legacy by handling generic_type (finds 3). Latent parity gap (Finding 3).

Python: Synth matches legacy (argument_list-based heritage). Sequential parity passes. No worker-forced test.

TypeScript: Synth extends beyond legacy by handling generic_type for extends/implements clauses. Latent parity gap (Finding 4).

JavaScript: Synth matches legacy (extends_clause → identifier). Sequential parity passes. No worker-forced test.

Go: Synth adds struct embedding inheritance captures. Sequential parity passes. No worker-forced test.

PHP: Synth covers class extends, implements, trait use, trait-uses-trait. Stale comment (Finding 6) but behavior correct.

Kotlin: Synth covers delegation_specifier (: base list). Sequential parity passes. No worker-forced test.

Rust: emitRustTraitImplEdges custom hook (not the shared pre-pass). Name collision risk (Finding 1). No worker-forced test.

Swift: Synth covers inherits_from clauses. Qualified name returns first segment (Finding 5, intentional legacy match). No worker-forced test.

Ruby: Synth covers class inheritance only (no mixin — those go through extractFromCall which is unaffected). Sequential parity passes. No worker-forced test.

C++: Pre-existing synth, unchanged. Behavior preserved.

C, COBOL: No inheritance; no synth needed.

Edge source/target/type/reason correctness

  • EXTENDS vs IMPLEMENTS: run.ts:124–125 correctly maps Interface|Trait → IMPLEMENTS, else EXTENDS. Matches legacy resolveExtendsType behavior. PHP traits, Rust traits, Ruby include/extend/prepend all land on IMPLEMENTS. ✓
  • C# primary constructor source: Verified via test assertion at heritage-worker-path.test.ts:144 (extends_.every(e => e.sourceLabel === 'Class')). inheritanceOverride.callerGraphId = findEnclosingClassDef(...) pins the class node. ✓
  • Rust trait-impl source: The explicitReceiver.name provides the struct name S from impl T for S; graphIdByName.get(structName) provides the graph ID. Correct in single-candidate case; collision case produces wrong source (Finding 1). ✓ / ✗
  • Reason string: 'scope-resolution: inherits' for all scope-resolution emissions; 'trait-impl' for Rust (matches legacy). Verified via test at line 65. ✓

Name resolution and ambiguity

  • findClassBindingInScope resolves single-match names cleanly via scope chain. ✓
  • resolveAmbiguousInheritanceBaseViaImports falls back only on miss + ≥2 candidates; refuses on ≥2 tier-survivors. Conservative and safe. ✓
  • Qualified names (C#: App.Repo<int>, Java: java.io.Serializable, TS: models.Base) normalized to bare simple name via language-specific terminal-node extractors. ✓
  • Rust cross-module collision not handled (Finding 1). ✗

MRO/downstream dispatch

preEmitInheritanceEdges runs at run.ts:344 — before provider.buildMro at line 364. Inheritance edges are in the graph when MRO is built. Wrong Rust source IDs would cause classes to appear in the wrong MRO chain. For all other languages, source/target IDs are correct. Risk is bounded to Rust (Finding 1).

Performance and memory

  • JS/Kotlin O(n²) fix verified: both now thread capture.node from the query match; findNodeAtRange(tree.rootNode, ...) per-match calls removed. baselines.json shows scaling_budget: 1.5 (linear range) for both. ✓
  • All 10 languages in bench have scaling assertions. No benchmark-regression visible in baselines.json. ✓
  • walkers.ts OOM-sensitive paths: new resolveAmbiguousInheritanceBaseViaImports function is guarded to rare ambiguous-name cases; two Set allocations per call but function rarely fires. Low risk. ✓
  • emitRustTraitImplEdges iterates all parsedFiles × all defs for the name index and all parsedFiles × all referenceSites for the emit loop: O(files × defs + files × sites) — linear. ✓

Test adequacy and CI wiring

  • C#/Java worker-forced coverage: thorough. 5 describe blocks, all assert usedWorkerPool === true, exact edge sets, source labels, reason strings, no-duplicate checks. ✓
  • Worker-forced coverage for 9 other migrated languages: absent. ✗
  • Sequential parity: all 14 languages pass parity (28/28 per PR body). CI scope-parity gate auto-discovers MIGRATED_LANGUAGES and runs each test twice. ✓
  • Golden/fingerprint rebaselines: all have _rebaselined or _added notes explaining the legitimate capture-output change. ✓
  • Heritage-worker-path.test.ts is in test/integration/ and wired into npm run test:integration / npm test. ✓
  • CI is currently failing with 1 test (Finding 9). ✗

Branch hygiene and mixed-domain risk

All 31 changed files are causally connected to the #1951 scope (inheritance edge migration + JS/Kotlin benchmark hardening). The all-language migration is justified: committing a partial migration (only C#/Java) while gating legacy for all 14 languages would leave 12 languages with neither legacy nor scope-resolution inheritance. JS/Kotlin O(n²) fix belongs here because it validates the benchmark gate added to prove the migration doesn't regress performance. No unrelated workflow, web, release, or dependency churn in the 31-file diff.

Hidden Unicode/hygiene

git grep -nP "[\x{202A}-\x{202E}\x{2066}-\x{2069}]" was run against all changed directories. No hidden bidi Unicode controls found in changed source files. The GitHub warning visible in the PR diff view appears to be a generic warning, not specific to this PR's files.


Back-and-forth avoided by verifying

  • No double-emission: Confirmed. processHeritage:209 skips all registry-primary languages. preEmitInheritanceEdges seeds existing from current graph EXTENDS/IMPLEMENTS edges and skips already-present ones. Sequential mode: legacy skipped → scope-resolution emits. Worker mode: legacy dropped by shouldAccumulate → scope-resolution emits. No duplication.
  • JS/Kotlin output byte-identical: Confirmed. Bench fingerprints enforce this. findNodeAtRange removal does not change which captures fire — only how the matched node is retrieved.
  • Worker tests genuinely force the pool: Confirmed. workerThresholdsForTest: { minFiles: 1, minBytes: 1 } forces any repo through the worker path regardless of file count. usedWorkerPool === true asserted on every C#/Java describe block.
  • C# primary constructors source from Class node: Confirmed. inheritanceOverride.callerGraphId = findEnclosingClassDef(...) bypasses resolveCallerGraphId which would have returned the constructor. Test at line 144 asserts e.sourceLabel === 'Class' for all edges.
  • PHP trait-use correctly emits IMPLEMENTS: Confirmed. run.ts:124–125 maps Trait → IMPLEMENTS. The PHP stale comment is misleading but behavior is correct.
  • Legacy gate complete for all 14 languages: Confirmed. All 14 languages in MIGRATED_LANGUAGES have inheritance synths (C and COBOL don't need them — no OOP inheritance). processHeritage:209 skips all of them.
  • Import-aware disambiguation is conservative: Confirmed. resolveAmbiguousInheritanceBaseViaImports refuses on ≥2 tier survivors at both tiers. Never changes single-match behavior.

Open questions

  • 1 failing CI test: The exact test name is not available from the visible CI data. What test is failing, and is it directly caused by this PR or a pre-existing flake? This must be investigated and resolved before merge regardless.

Final verdict

not production-ready

Three confirmed issues block merge:

  1. Rust cross-module name collision (Finding 1, P1): emitRustTraitImplEdges emits wrong edges for same-named structs or traits across modules. The legacy path this replaces explicitly refused such collisions; the new code has no equivalent guard. Wrong inheritance edges propagate into MRO and method dispatch silently.

  2. Missing worker-forced tests for 9 of 14 migrated languages (Finding 2, P1): Commit b25e2c4 migrated all 14 languages and gated all their legacy paths, but only C# and Java have worker-forced test coverage. Python, Go, PHP, TypeScript, JavaScript, Kotlin, Rust, Swift, and Ruby can regress in worker mode with CI remaining green — exactly the bug class this PR is fixing.

  3. CI is failing (Finding 9, P1): One test is failing in the current CI run. Per DoD §4.2 and §6, this is a hard blocker.

The core migration architecture is sound: the preEmitInheritanceEdges pre-pass is clean, the IMPLEMENTS/EXTENDS discriminator is correct, the C# primary-constructor pin is correct, double-emission is properly prevented, and the JS/Kotlin O(n²) fix is byte-identical. These three bounded, fixable issues should be addressed before merge. The Java/TS generic-base latent parity gaps (Findings 3–4) should also be explicitly resolved or documented.

magyargergo and others added 8 commits June 1, 2026 04:35
…collision (#1951)

emitRustTraitImplEdges resolved both the struct S and trait T of an
impl block through a workspace-wide simple-name index with last-write-
wins, so two same-named structs/traits across modules collapsed onto one
def — sourcing impl edges from the wrong module (and dropping the real
edge). Thread the finalized ScopeResolutionIndexes through the
emitHeritageEdges hook and resolve S/T from the impl block's own scope
via findClassBindingInScope, falling back to import-aware
disambiguation, refusing (emitting nothing) on ambiguity — restoring the
legacy file-scoped path's 'a wrong edge is worse than no edge' contract.

New rust-cross-module-collision fixture proves each edge sources from
the User in its own module. rust-traits parity unchanged (2/2).
…c bases (#1951)

The registry-primary scope-resolution synth normalized generic bases
(Box<T> -> Box) and emitted inheritance edges, but the legacy @Heritage
queries were type_identifier-only and matched nothing for extends Box<T>
/ implements IFoo<T>. That made registry(=1) and legacy(=0) diverge the
moment a generic-base fixture existed — a latent scope-parity break, and
the synth docs' 'matching legacy exactly' was inaccurate.

Widen the legacy Java queries (superclass/super_interfaces generic_type)
and the TS implements clause (generic_type name:) to capture the inner
type_identifier — the bare base name, so capture text is unchanged. TS
extends already captured via value:(identifier). Both paths now agree on
generic bases (the more-correct behavior, matching C#/Rust). New
java-generic-base + typescript-generic-base fixtures assert the edges
under BOTH parity legs; docs corrected.
…egment (#1951)

swiftBaseTypeIdentifier returned the FIRST type_identifier of the flat
user_type, so 'class A: Outer.Inner' synthesized an inheritance
reference to the qualifier Outer instead of the nested base Inner —
contradicting its own doc. Take the LAST type_identifier (Outer.Inner ->
Inner; generic args sit in a sibling type_arguments node, so Box<Int> ->
Box and Outer.Inner<T> -> Inner fall out naturally). firstInheritedType
in receiver-binding.ts had the same latent qualifier bug (it feeds
super-receiver binding) and is fixed in lockstep; both docs corrected.

New capture-level unit test asserts the extracted base is Inner (the
changed path; a bare nested-type name does not resolve to an edge in the
current model, so this is the precise observable). swift-qualified-base
fixture added; golden regenerated (purely additive — no existing entry
drift, confirming non-qualified bases are byte-identical).
)

The synthesizePhpInheritanceReferences NOTE claimed the central pass
classifies a resolved Trait as EXTENDS and 'must treat Trait as
IMPLEMENTS for full parity' — but preEmitInheritanceEdges already maps
Interface || Trait to IMPLEMENTS, so trait-use is correct on both paths.
The stale note would mislead a future reader into 'fixing' a non-bug.
…anguages (#1951)

#1951 was worker-mode-only, but worker-forced coverage existed for only
C# and Java; the other 9 migrated languages depended entirely on their
scope-resolution synth in worker mode (legacy heritage gated off), so a
worker-only capture regression would pass every sequential CI gate.

Add table-driven worker-forced blocks for go, python, php, rust, kotlin,
ruby, typescript, javascript, and swift — each forces the pool, asserts
usedWorkerPool === true (guarding against silent sequential fallback),
and pins the exact EXTENDS / IMPLEMENTS edge set. Rust exercises the
scope-aware trait-impl resolution and TS the generic-base path in worker
mode. Header doc updated to the full migrated-language scope.
…1956 review fixtures

Adding rust-cross-module-collision, swift-qualified-base, and
typescript-generic-base to the lang-resolution corpus changes the
order-independent capture fingerprint for those three languages (the
bench globs lang-resolution/<lang>-*). The change is additive only —
existing fixtures are byte-identical (swift golden regen was purely
additive; ts/rust capture synth is untouched) and scaling stays linear
(rust ~1.02, swift ~1.07, ts ~1.0). All 10 benched languages pass
--check.
…lision fixture

The rust-captures-golden test globs all rust-* lang-resolution fixtures;
the new rust-cross-module-collision fixture (added for the #1951 review
P1 fix) adds 4 entries to the snapshot. Purely additive — existing
fixture digests are unchanged, confirming emitRustScopeCaptures is
byte-identical (the P1 fix lives in scope-resolver.ts, not the capture
path).
Tighten the Java/TS inheritance-synth doc comments: the legacy-query
widening achieves parity on SIMPLE (unqualified) generic bases; qualified
bases (a.b.Box, a.b.Box<T>) remain a pre-existing legacy gap the synth
resolves but the legacy query does not (both outcomes safe). The prior
'both paths now agree on generic bases' overstated the scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo changed the title fix(ingestion): emit C#/Java inheritance edges via scope-resolution on the worker path (#1951) fix(ingestion): migrate all languages' inheritance to scope-resolution on the worker path (#1951) Jun 1, 2026
…ance captures

The #1951 migration added python @reference.inherits capture synthesis
(languages/python/captures.ts), which changes emitPythonScopeCaptures
output and therefore its fingerprint. The original PR re-baselined the
unified scope-capture bench (baselines.json) but missed python's SEPARATE
harness (bench/python-scope/baseline-fingerprint.txt) — leaving the
'tests / benchmarks' job red. Update the committed fingerprint to match
the legitimate post-migration capture output (scaling stays linear ~1.05;
python resolver parity is 221/221 both legs). The import-target
resolver fingerprint is unaffected and still passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1956 review)

Address the code-review P2: preEmitInheritanceEdges inlined a two-set dedup guard
(coarse `existing` per-(caller,target,type) gate seeded from the graph + per-site
`seen` shared with the generic edge bridge) whose joint semantics weren't
self-documenting at the call site. Extract emitInheritanceEdgeDirect, which owns
both keys and documents the contract in one place, so a future inheritance pass
reuses it rather than re-deriving the dual-key pattern.

dedupKey and the rel: id shape are unchanged, so graph output stays byte-identical
(verified: pipeline-graph-golden + heritage-worker-path green; tsc clean).
…y leg (#1956 review)

The U8 hardening of the C# ambiguous-heritage assertion (pin Tier-2 resolution to
Models/Handler.cs) is correct for the registry-primary model but broke the legacy
parity leg: C# namespace `using` does not emit the file-level import edge in the
legacy DAG (the same root cause as the existing
`using-import edge ... through the scope-resolution path` expected-failure), so
the legacy leg cannot disambiguate and resolveHeritageId refuses to a synthetic
Class:/Interface: target. The old vacuous `if (targetFilePath)` guard passed in
both legs only because it asserted nothing in the legacy leg.

Per the migration policy (scope-resolution is the correct model), this is a
scope-resolver-only correctness win: assert the correct registry-primary behavior
unconditionally (no leg-conditional logic in the test) and list the test in
LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES (helpers.ts), right beside the C#
using-import-edge exclusion it shares a root cause with.

Verified: full scope-parity suite 28/28 (all 14 languages, both legs) — C# was
the only break; the fix is the only change needed.
…alized heritage)

PR #1940 ("Centralize heritage supertype matching") refactored the legacy
@Heritage queries from hand-written per-language arms into config-driven
alternations: heritage-extractors/configs/<lang>.ts declare the supertype node
shapes, buildSupertypeAlternation() generates the [(a)(b)...]@Heritage.* arms,
and normalizeSupertypeName() reduces the matched supertype node to its innermost
simple name at runtime.

Conflict: tree-sitter-queries.ts (both sides rewrote the heritage section).
Resolved by taking #1940's version wholesale — every change this branch made to
that file (the U1 Rust scoped impl_item arm, the U2 Java/TS scoped arms, and the
Java end-anchor 2-segment fix) is fully superseded: #1940's shape descriptors
already include the same scoped shapes (Java/Rust scoped_type_identifier; TS
member_expression + nested_type_identifier), and capturing the whole supertype
node + normalizing at runtime structurally avoids the 2-segment double-match the
end-anchor was guarding. The registry-primary synth changes (rust/ts captures.ts)
are on a different path and untouched by #1940; they remain and continue to agree
with the legacy leg.

Verified on the merged tree: tsc clean; the java/typescript/rust qualified-base
fixtures pass under BOTH legs (registry synth == #1940 legacy normalizer).
… 7 registry synths (#1956)

PR #1940 ("Centralize heritage supertype matching") widened the LEGACY @Heritage
leg (config-driven shapes + normalizeSupertypeName) to capture qualified /
generic / scoped / attribute / subscript / record / struct / delegation /
member-expression / interface-embed bases. But the registry-PRIMARY synths (the
production path for migrated languages) were never widened to match, so in
production these inheritance edges were silently dropped — a pre-existing gap
exposed as a legacy>synth asymmetry by an audit of the post-#1940 merge. This is
the exact #1951 theme; rust/ts/cpp already handled their qualified bases, so this
brings the remaining 7 languages to parity:

- python: attribute (class X(pkg.Base)) + subscript (Generic[T]) bases
- go:     qualified_type (pkg.Base), generic_type (Box[T]), pointer, AND
          interface_type embeds (the synth previously walked struct_type only)
- javascript: member_expression base (extends ns.Base)
- csharp: walk record_declaration + struct_declaration base_lists (incl.
          primary_constructor_base_type) + alias_qualified_name
- ruby:   scope_resolution superclass (class C < Mod::Super)
- kotlin: explicit_delegation (class F : Iface by d)
- java:   walk interface_declaration extends_interfaces (interface IA extends IB)

Each synth's base-name extractor was widened to return the trailing/inner node
for the new shapes (existing simple-base path byte-identical); each was
real-parse-verified so the synth's bare name equals normalizeSupertypeName(base)
— the legacy leg's reduction — guaranteeing registry<->legacy agreement. New
<lang>-qualified-base / java-iface-extends fixtures + both-leg parity blocks
assert the new edges; full parity suite is 28/28 (all 14 langs, both legs).
Goldens regenerated additively (existing fixtures gain one inherits capture
each); scope-capture + python-scope benches re-baselined, all linear.

Known follow-up: csharp record->record in the SAME namespace
(record UserRecord : BaseEntity) — the synth emits the capture but the
same-namespace record-target binding is not resolved on the registry leg
(legacy does emit it); a separate registry resolution gap, not asserted here to
avoid a leg divergence.
@magyargergo magyargergo merged commit 0fc0211 into main Jun 1, 2026
54 of 56 checks passed
@magyargergo magyargergo deleted the fix/1951-extends-implements-edges branch June 1, 2026 16:04
magyargergo added a commit that referenced this pull request Jun 5, 2026
* fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928)

Registry-primary scope-resolution path (the live one post-#942/#943):

- F35 [HIGH]: qualified / qualified-generic constructor calls. `new pkg.Foo()`
  parses as a `scoped_type_identifier` that the query bound only as
  `@reference.call.constructor.qualified` with no `@reference.name`, so the
  scope extractor fell back to the whole-expression anchor and the reference
  name became the raw `new pkg.Foo()` text (never resolved). Bind the simple
  -name tail (end-anchored last child) and add an arm for the previously
  uncaptured `new pkg.Box<String>()` (qualified + generic) shape.

- F38 [MEDIUM]: `super(...)` / `this(...)` explicit constructor invocations,
  modeled as `explicit_constructor_invocation` and never matched by the scope
  query, dropped the chained-constructor CALLS edges. Synthesize them with the
  target resolved structurally (this -> enclosing type name; super -> superclass
  tail via the shared javaBaseLookupNameNode, skipping implicit Object) plus
  arity for overload disambiguation.

- F41 [LOW]: interpretJavaTypeBinding stripped the qualifier before generics, so
  a qualified generic type arg (`Map<String, com.example.User>`) was cut inside
  the generic into `User>`. Strip generics first, then the qualifier; make the
  erasure fallback qualifier-tolerant.

F36/F37 already landed upstream (#1940/#1956); F39/F40 are legacy-bank remnants
that are no longer consumed (legacy @import skipped in parse-worker; legacy
@call never read in parse-impl) so they are intentionally left untouched.

Tests: low-level capture unit tests (constructor shapes incl. double-match
guard; super/this/enum/implicit-Object), interpretJavaTypeBinding unit tests
(qualified generic args + the corruption case), and end-to-end resolver tests
with new fixtures asserting the CALLS edges resolve to the correct constructors.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(scope-resolution): register Constructor overload keys so this()/super() chains don't self-loop (#1928 F38 review)

Review of #2045 caught two gaps; both confirmed by reproduction.

P2 — F38 this() emitted a self-loop. On the java-explicit-constructor fixture,
Child(int){ this(); } produced CALLS Child()#0 -> Child()#0 instead of
Child(int)#1 -> Child()#0. Root cause is the language-agnostic graph-bridge: the
parse phase mints distinct Constructor nodes (Child#0, Child#1) carrying
parameterTypes, but node-lookup.ts registered the parameter-types / shape
overload keys only for Function/Method, never Constructor, so both ctors
collapsed onto the first-wins qualified/simple key and the caller Child(int)
resolved to Child#0 (the this() target). Extend the overload keys to Constructor
in both node-lookup.ts (registration) and ids.ts (lookup) via a shared
isOverloadableCallable predicate. Verified the edge now connects distinct nodes
(Child#1 -> Child#0); super(1)->Base#1 still correct. No cross-language
regressions (the 9 worker-path failures reproduce identically on clean HEAD).

Also harden the integration test: it matched the this() edge on name only, which
a self-loop satisfies; now assert the endpoints are DISTINCT constructors.

P3 — F41 order-regression guard was inert (List<Map<String,User>> normalizes to
List under both strip orders). Add List<com.x.Foo<String>> -> List, which is
corrupted to Foo<String>> under the old order and only correct generics-first.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(java): update fingerprint and add notes for constructor query captures in baselines.json

Updated the fingerprint for the Java section and added detailed notes regarding the enhancements in constructor query captures, including qualified and qualified-generic constructor queries. This change reflects ongoing improvements in the parsing layer coverage and fixture updates.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extends & Implements Edges are missing

1 participant