
fix(audit): Centralize heritage supertype matching (#1921/#1922) #1940

Merged
magyargergo merged 12 commits into abhigyanpatwari:main from zander-raycraft:core/parsing-audit
Jun 1, 2026

Conversation

@zander-raycraft

@zander-raycraft zander-raycraft commented May 30, 2026

Collaborator

Summary

This PR fixes a systemic gap where heritage queries (extends, implements, embeds, trait impls) across nearly every OO language matched only the bare type_identifier base and silently dropped qualified, generic, scoped, and interface supertypes. The shape was being patched one language at a time and never generalized. This change introduces a single shared supertype-alternation builder plus a matching runtime name-normalizer, adopts it across all ten OO language heritage queries, and extends the heritage-extractors/configs/ framework (previously only go.ts and ruby.ts) to cover the rest. The result is one source of truth for "what a supertype can look like," so qualified, generic, scoped, and interface bases now produce EXTENDS and IMPLEMENTS edges instead of being lost.

What it changes

One shared alternation builder. buildSupertypeAlternation() takes a per-language shape descriptor (the set of tree-sitter node types a supertype can take) and returns the idiomatic [(a) (b) (c)] @heritage.* alternation fragment. The heritage query blocks interpolate this instead of each re-deriving the shape inline.
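A minimal sketch of what such a builder could look like, assuming a descriptor with a single `nodeTypes` field. The PR names `SupertypeShapeDescriptor` and `buildSupertypeAlternation()` but does not show their definitions, so the field name, the `HeritageTag` union, and the example Java query are illustrative assumptions rather than the merged code.

```typescript
// Hypothetical descriptor shape: the field name `nodeTypes` is an assumption.
interface SupertypeShapeDescriptor {
  nodeTypes: readonly string[];
}

// Hypothetical tag union covering the capture suffixes named in the PR.
type HeritageTag = "extends" | "implements" | "trait";

// Emits a tree-sitter alternation such as
//   [(type_identifier) (generic_type)] @heritage.extends
// Duplicate node types are dropped, keeping first-seen order.
function buildSupertypeAlternation(
  descriptor: SupertypeShapeDescriptor,
  tag: HeritageTag,
): string {
  const unique = descriptor.nodeTypes.filter(
    (type, index, all) => all.indexOf(type) === index,
  );
  if (unique.length === 0) {
    throw new Error("descriptor must list at least one supertype node type");
  }
  const arms = unique.map((type) => `(${type})`).join(" ");
  return `[${arms}] @heritage.${tag}`;
}

// Example descriptor using the Java node types listed in the findings table.
const javaShapes: SupertypeShapeDescriptor = {
  nodeTypes: ["type_identifier", "generic_type", "scoped_type_identifier", "generic_type"],
};

// Illustrative interpolation into a heritage block (query shape is assumed).
const javaExtendsQuery = `
(class_declaration
  name: (identifier) @heritage.class
  (superclass ${buildSupertypeAlternation(javaShapes, "extends")}))
`;
```

Keeping the fragment as a plain string means the same descriptor can feed both the query text and the runtime normalizer, which is the single-source-of-truth property described above.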

One shared name-normalizer. normalizeSupertypeName() reduces whatever node actually matched down to the innermost simple identifier, so Base<T>, pkg.Base, and ns::Base all collapse to Base. It is node-type driven (no language names), tries field access first then walks children, and skips generic argument lists and delegate or call subtrees (so Kotlin : Bar by baz resolves to Bar, not the delegate). This mirrors the existing C++ registry path so the simple-name ctx.resolve(name) contract keeps holding everywhere.
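The field-first, then children-walk reduction can be sketched against a tiny structural node type. This is an assumption-laden illustration: `NodeLike` and `fakeNode` are hypothetical stand-ins for real tree-sitter nodes, the three lists contain only a few example entries, and Kotlin delegation is left out of the sketch entirely.

```typescript
// Minimal structural view of a syntax node; only what this sketch needs.
interface NodeLike {
  type: string;
  text: string;
  namedChildren: NodeLike[];
  childForFieldName(field: string): NodeLike | null;
}

// Illustrative subsets only; the PR's real lists are not shown in the description.
const LEAF_TYPES = new Set(["identifier", "type_identifier", "constant"]);
const INNER_NAME_FIELDS = ["name", "type"];
const SKIPPED_INNER_TYPES = new Set(["type_arguments", "template_argument_list", "call_expression"]);

// Picks the child that most likely carries the name: a known field first,
// otherwise the right-most named child that is not a skipped subtree.
function innerNameNode(node: NodeLike): NodeLike | null {
  for (const field of INNER_NAME_FIELDS) {
    const viaField = node.childForFieldName(field);
    if (viaField !== null) return viaField;
  }
  for (let i = node.namedChildren.length - 1; i >= 0; i--) {
    const child = node.namedChildren[i];
    if (!SKIPPED_INNER_TYPES.has(child.type)) return child;
  }
  return null;
}

// Descends until a leaf identifier is reached; returns "" when nothing usable is found.
function normalizeSupertypeName(node: NodeLike): string {
  let current: NodeLike | null = node;
  // Bounded descent so a malformed tree cannot loop forever.
  for (let depth = 0; current !== null && depth < 32; depth++) {
    if (LEAF_TYPES.has(current.type)) return current.text;
    current = innerNameNode(current);
  }
  return "";
}

// Hypothetical helper for building hand-made nodes in examples and tests.
function fakeNode(
  type: string,
  text: string,
  children: NodeLike[] = [],
  fields: Record<string, NodeLike> = {},
): NodeLike {
  return {
    type,
    text,
    namedChildren: children,
    childForFieldName: (field) => fields[field] ?? null,
  };
}

const base = fakeNode("type_identifier", "Base");
const generic = fakeNode("generic_type", "Base<T>", [base, fakeNode("type_arguments", "<T>")]);
const scoped = fakeNode("scoped_type_identifier", "pkg.Base", [fakeNode("type_identifier", "pkg"), base]);
const qualified = fakeNode(
  "qualified_identifier",
  "ns::Base",
  [fakeNode("namespace_identifier", "ns"), base],
  { name: base },
);
```

Because the walk keys off node types and field names rather than language names, one function can serve every grammar, at the cost of maintaining the shared lists carefully.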

Per-language shape configs. Each OO language now declares its supertype node types in heritage-extractors/configs/<lang>.ts, consumed both by the query builder and by the runtime normalizer. The existing go.ts and ruby.ts keep their hooks (shouldSkipExtends, callBasedHeritage) and gain a shape descriptor alongside them.

Widened heritage queries and missing containers. Every legacy heritage block in tree-sitter-queries.ts now uses the builder output, and the previously absent container patterns are added: Java and TypeScript interface inheritance, C# record and struct base lists, and Go interface-in-interface embedding.

No downstream changes. Edge resolution, EXTENDS versus IMPLEMENTS selection, MRO, and the schema are untouched. The change lives entirely at the query and name-extraction layer, which is exactly why it is the durable fix. No language provider files were modified.

Issue Reference

Findings addressed

Every finding from the issue now emits its heritage edge, each covered by a fixture in test/integration/heritage-supertype-shapes.test.ts.

| Finding | Shape now captured | Fixture asserts | Done |
| --- | --- | --- | --- |
| Java F36 | generic and qualified supertypes (generic_type, scoped_type_identifier) | A->Base:extends, A->IFoo/Bar:implements | Yes |
| Java F37 | interface to interface (extends_interfaces) | IA->IB/IC:implements | Yes |
| TypeScript F81 | interface inheritance (extends_type_clause) | I->A/B:implements | Yes |
| TypeScript F82 | qualified base (member_expression) | C->Base:extends | Yes |
| JavaScript F43 | qualified superclass (member_expression) | C->Base:extends | Yes |
| C# F10 | record and struct heritage (record_declaration, struct_declaration base lists) | R->Base/IFoo, S->IFoo/IBar | Yes |
| C# F11 | qualified and primary-constructor base (qualified_name, primary_constructor_base_type) | A->Base (qualified), R->Base (primary ctor) | Yes |
| Go F31 | qualified, generic, and interface-in-interface embeds (qualified_type, generic_type, type_elem) | D->Base/Gen/Animal, I->Reader/Other, named field correctly skipped | Yes |
| Python F57 | qualified and subscripted bases (attribute, subscript) | C->Model (attribute), C->Generic (subscript) | Yes |
| Ruby F63 | qualified superclass and scoped class name (scope_resolution) | Bar->Sup:extends | Yes |
| Rust F69 | impl ScopedTrait for Type (scoped_type_identifier) | Foo->Trait:trait-impl | Yes |
| Kotlin F46 | interface delegation by (explicit_delegation) | Foo->Bar:extends | Yes |
| C++ F6 | templated and qualified bases in legacy CPP_QUERIES (template_type, qualified_identifier) | D->Base/Other | Yes |

What it fixes

Inheritance edges for qualified, generic, scoped, and interface supertypes were silently dropped across nearly every OO language, which fed directly into incomplete blast-radius and MRO results. The fix is centralized so the shape can no longer drift back open per language. Because a malformed tree-sitter query is caught and the whole file's heritage is skipped, a query-compile guard test now asserts that every language's full query string compiles, so a future bad fragment fails loudly in CI instead of regressing silently.

Files changed

| File | Type | Description |
| --- | --- | --- |
| src/core/ingestion/heritage-extractors/supertype-alternation.ts | Added | Shared, language-agnostic builder (buildSupertypeAlternation) and runtime name-normalizer (normalizeSupertypeName). |
| src/core/ingestion/heritage-extractors/configs/java.ts | Added | Java supertype shapes (generic, scoped, interface). |
| src/core/ingestion/heritage-extractors/configs/csharp.ts | Added | C# base-list shapes (qualified, scoped, generic, primary-constructor). |
| src/core/ingestion/heritage-extractors/configs/typescript.ts | Added | TypeScript class-extends and interface-extends shapes. |
| src/core/ingestion/heritage-extractors/configs/javascript.ts | Added | JavaScript class-extends shapes (bare and qualified). |
| src/core/ingestion/heritage-extractors/configs/python.ts | Added | Python superclass shapes (attribute, subscript). |
| src/core/ingestion/heritage-extractors/configs/rust.ts | Added | Rust impl trait and type shapes (scoped, generic). |
| src/core/ingestion/heritage-extractors/configs/kotlin.ts | Added | Kotlin delegation-specifier shapes (user type, constructor invocation, explicit delegation). |
| src/core/ingestion/heritage-extractors/configs/cpp.ts | Added | C++ base-class shapes (templated, qualified). |
| src/core/ingestion/heritage-extractors/generic.ts | Modified | Routes extends, implements, trait, and class names through normalizeSupertypeName; drops empty names. Covers both the worker and sequential paths. |
| src/core/ingestion/heritage-types.ts | Modified | Adds the SupertypeShapeDescriptor type consumed by the builder and configs. |
| src/core/ingestion/tree-sitter-queries.ts | Modified | All ten OO heritage blocks now interpolate the builder output, plus new container patterns for Java and TypeScript interfaces, C# record and struct, and Go interface embedding. |
| src/core/ingestion/heritage-extractors/configs/go.ts | Modified | Adds goHeritageShapes; existing shouldSkipExtends preserved. |
| src/core/ingestion/heritage-extractors/configs/ruby.ts | Modified | Adds rubyHeritageShapes; existing callBasedHeritage mixin routing preserved. |
| test/unit/supertype-alternation.test.ts | Added | Unit tests for the builder and normalizer across every supertype shape. |
| test/integration/heritage-supertype-shapes.test.ts | Added | Real tree-sitter fixture per finding, plus the query-compile guard for all languages. |

New flow

per-language config (configs/<lang>.ts)
        │   declares supertype node-type shapes
        ▼
buildSupertypeAlternation(descriptor, tag)
        │   emits  [(type_identifier) (qualified) (generic) (scoped) (interface)] @heritage.*
        ▼
tree-sitter-queries.ts
        │   interpolates the fragment into each language's heritage block
        ▼
parse phase (worker pool and sequential both share the same query)
        │   query captures @heritage.class / @heritage.extends / @heritage.implements / @heritage.trait
        ▼
heritageExtractor.extract(captureMap)
        │   normalizeSupertypeName() reduces each matched node to its simple name
        │   (Base<T> becomes Base, pkg.Base becomes Base, ns::Base becomes Base, "Bar by baz" becomes Bar)
        ▼
heritage-processor (resolveAndAddHeritageEdge / processHeritageFromExtracted)
        │   resolves the name and chooses EXTENDS vs IMPLEMENTS downstream
        ▼
KnowledgeGraph
            EXTENDS / IMPLEMENTS edges now present for every supertype shape

Tests

Typecheck (npx tsc --noEmit) is clean. The heritage suite passes (5 files, 104 tests), and a broad sweep of the heritage, inheritance, extends, implements, MRO, and interface tests passes with no regressions (563 tests). The only failing test in the broad sweep is a pre-existing, environment-specific worker-pool spawn check (ruby-sequential-mixin), confirmed to fail identically on baseline with these changes stashed, so it is unrelated to this work.

| Test file | Tests | Covers |
| --- | --- | --- |
| test/integration/heritage-supertype-shapes.test.ts | 29 | One fixture per finding across all languages, plus the query-compile guard for every supported grammar. |
| test/unit/supertype-alternation.test.ts | 23 | The builder (alternation output, de-duplication) and the normalizer (leaf, field, children, generic strip, scoped strip, delegation). |
| test/unit/heritage-extraction.test.ts | existing | Heritage extractor behavior, unchanged and still passing. |
| test/integration/heritage-extractor-wiring.test.ts | existing | Provider wiring, unchanged and still passing. |
| test/unit/heritage-processor.test.ts | existing | Edge resolution and emission, unchanged and still passing. |

Notes

This fixes the legacy @heritage.* query path, which is the live emission path for heritage in every language (it is ungated by the registry-primary migration). The registry path's existing C++ and Ruby-mixin emission is untouched and already correct. All thirteen node-type choices were validated against the official tree-sitter grammars at the versions this repo pins, and the alternation and hidden-rule handling were validated against the official tree-sitter query documentation.

References

Every supertype node type added to the configs was verified against the official tree-sitter grammar's generated node-types.json at the exact version this repo pins in package.json, so the shapes match the grammar actually loaded at parse time (no version drift). The shared-heritage approach itself (the bracketed alternation and the handling of hidden grammar rules) was verified against the official tree-sitter query documentation.

Grammar node-type sources (one per language, at the pinned version tag):

| Language | Pinned version | Source verified against |
| --- | --- | --- |
| Java | 0.23.5 | https://github.com/tree-sitter/tree-sitter-java/blob/v0.23.5/src/node-types.json |
| C# | 0.23.1 | https://github.com/tree-sitter/tree-sitter-c-sharp/blob/v0.23.1/src/node-types.json |
| TypeScript | 0.23.2 | https://github.com/tree-sitter/tree-sitter-typescript/blob/v0.23.2/typescript/src/node-types.json |
| JavaScript | 0.23.0 | https://github.com/tree-sitter/tree-sitter-javascript/blob/v0.23.0/src/node-types.json |
| Python | 0.23.4 | https://github.com/tree-sitter/tree-sitter-python/blob/v0.23.4/src/node-types.json |
| Go | 0.23.0 | https://github.com/tree-sitter/tree-sitter-go/blob/v0.23.0/src/node-types.json |
| Rust | 0.23.1 | https://github.com/tree-sitter/tree-sitter-rust/blob/v0.23.1/src/node-types.json |
| Ruby | 0.23.1 | https://github.com/tree-sitter/tree-sitter-ruby/blob/v0.23.1/src/node-types.json |
| C++ | 0.23.2 | https://github.com/tree-sitter/tree-sitter-cpp/blob/v0.23.2/src/node-types.json |
| Kotlin | 0.3.8 (community, fwcd) | https://github.com/fwcd/tree-sitter-kotlin/blob/0.3.8/src/node-types.json (repository confirmed via https://registry.npmjs.org/tree-sitter-kotlin) |

Tree-sitter pattern and concept documentation:

The existing C++ registry-path emitter (languages/cpp/captures.ts, iterBaseClasses and extractBaseLookupName) was used as the in-repo reference for the name-normalizer, since it already reduces templated and qualified C++ bases to a simple name.

Testing & verification

  • cd gitnexus && npm test
  • cd gitnexus && npm run test:integration (if core/indexing/MCP paths changed)
  • cd gitnexus && npx tsc --noEmit
  • cd gitnexus-web && npm test (if web changed)
  • cd gitnexus-web && npx tsc -b --noEmit (if web changed)
  • Manual / Playwright E2E (note environment — see gitnexus-web/e2e/)

Risk & rollout

Checklist

  • PR body meets repo minimum length (workflow may label short descriptions)
  • If AGENTS.md / overlays changed: headers, scope block, and changelog updated per project conventions
  • No secrets, tokens, or machine-specific paths committed

Refs #1922

Refs #1923

Refs #1927

Refs #1930

Refs #1934

…eric, scoped, and interface bases produce inheritance edges across all OO languages, with per-language configs and fixtures.
@vercel

vercel Bot commented May 30, 2026


@zander-raycraft is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 30, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 10678 | 10668 | 0 | 10 | 728s |

✅ All 10668 tests passed

10 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 80.32% | 37026/46095 | 79.84% | 📈 +0.5 | 🟢 ████████████████░░░░ |
| Branches | 68.91% | 23568/34201 | 68.5% | 📈 +0.4 | 🟢 █████████████░░░░░░░ |
| Functions | 85.38% | 3832/4488 | 84.94% | 📈 +0.4 | 🟢 █████████████████░░░ |
| Lines | 83.86% | 33333/39748 | 83.36% | 📈 +0.5 | 🟢 ████████████████░░░░ |

📋 View full run · Generated by CI

…meouts, ERROR/partial parse flags, tree-sitter pinned to 0.21.1, and CI ABI checks for every grammar.
@zander-raycraft

zander-raycraft commented May 30, 2026

Collaborator Author

Summary

This PR update closes both cross-cutting infra findings from #1922. infra-2 makes parseSourceSafe honor its "parse safely" contract: it now enforces a per-parse timeout and detects ERROR/partial parses. infra-1 turns the alleged tree-sitter ABI skew into an enforced guarantee: the runtime is pinned exactly and a blocking CI assertion proves every grammar's ABI is compatible with it.

What it changes

Per-parse timeout (infra-2). parseSourceSafe arms a wall-clock budget (GITNEXUS_PARSE_TIMEOUT_MS, default 15s, 0 disables) before parsing. On timeout the runtime returns null, so the function calls parser.reset(), clears the budget, and throws a typed ParseTimeoutError. The budget is always cleared in a finally so it never leaks onto the next parse of the reused singleton parser. This protects the sequential and group extraction paths, which previously had no per-parse guard (only the worker pool did).
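The arm, parse, reset-on-null, always-clear lifecycle described here can be sketched as follows. `TimedParser` is a hypothetical minimal interface covering only the methods the PR names (`setTimeoutMicros`, `parse`, `reset`), `parseWithBudget` is an illustrative name, and the exact fallback rules in `resolveParseTimeoutMs` are an assumption about what "fails safe" means.

```typescript
// Hypothetical minimal parser surface; not the full tree-sitter binding type.
interface TimedParser<Tree> {
  setTimeoutMicros(micros: number): void;
  parse(source: string): Tree | null;
  reset(): void;
}

class ParseTimeoutError extends Error {
  readonly budgetMs: number;
  constructor(budgetMs: number, label?: string) {
    super(`parse exceeded ${budgetMs}ms${label ? ` (${label})` : ""}`);
    this.name = "ParseTimeoutError";
    this.budgetMs = budgetMs;
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ParseTimeoutError.prototype);
  }
}

const DEFAULT_PARSE_TIMEOUT_MS = 15_000;

// Assumed policy: missing, non-numeric, or negative input falls back to the
// default; an explicit 0 disables the budget.
function resolveParseTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_PARSE_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_PARSE_TIMEOUT_MS;
}

function parseWithBudget<Tree>(
  parser: TimedParser<Tree>,
  source: string,
  budgetMs: number,
  label?: string,
): Tree {
  // Arm the budget; a value of 0 means "no limit".
  parser.setTimeoutMicros(budgetMs * 1000);
  try {
    const tree = parser.parse(source);
    if (tree === null) {
      // A null result is treated as a timeout: drop the partial parse state
      // so the reused parser starts clean next time.
      parser.reset();
      throw new ParseTimeoutError(budgetMs, label);
    }
    return tree;
  } finally {
    // Always clear the budget so it never leaks onto the next parse.
    parser.setTimeoutMicros(0);
  }
}
```

A caller would typically pass the environment value through `resolveParseTimeoutMs` once at startup and catch `ParseTimeoutError` per file, so one slow file degrades to a skip instead of a crash.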

Intrinsic ERROR detection (infra-2). After a successful parse, parseSourceSafe itself checks rootNode.hasError/isMissing and logs a throttled, structured debug record, then returns the tree anyway. Error recovery is a downgrade, never a drop, since hasError is common in otherwise-valid code (grammar lag, macros, embedded DSLs). A parseHadErrors(tree) helper is exported for callers that want the boolean.
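A rough sketch of the "flag but keep" behaviour with throttled logging. `TreeLike`, `RootLike`, and `createDegradedParseReporter` are hypothetical names, `hasError`/`isMissing` are modelled as plain boolean properties (an assumption about the binding), and the cap of 20 follows the figure quoted in the review of this PR.

```typescript
// Hypothetical minimal tree shape; real tree-sitter nodes expose far more.
interface RootLike {
  hasError: boolean;
  isMissing: boolean;
}
interface TreeLike {
  rootNode: RootLike | null;
}

// True when the parser had to recover from errors or insert missing nodes.
// A null root is treated as "no information" rather than as an error.
function parseHadErrors(tree: TreeLike): boolean {
  const root = tree.rootNode;
  return root !== null && (root.hasError || root.isMissing);
}

interface DegradedParseRecord {
  event: "degraded-parse";
  label: string;
}

const MAX_DEGRADED_LOGS = 20;

// Returns a reporter that logs at most MAX_DEGRADED_LOGS records until reset.
// The tree itself is never dropped; callers keep using it after reporting.
function createDegradedParseReporter(log: (record: DegradedParseRecord) => void) {
  let reported = 0;
  return {
    report(tree: TreeLike, label: string): boolean {
      const degraded = parseHadErrors(tree);
      if (degraded && reported < MAX_DEGRADED_LOGS) {
        reported++;
        log({ event: "degraded-parse", label });
      }
      return degraded;
    },
    // Call at the start of each run so a long-lived process logs again.
    reset(): void {
      reported = 0;
    },
  };
}
```

Scoping the counter to a reporter object with an explicit `reset()` is one way to keep the throttle per run rather than per process.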

Exact ABI pin (infra-1). tree-sitter is pinned from the floating caret ^0.21.1 to exact 0.21.1, making "pinned to a compatible pair" literally true rather than drifting across patch releases.

Blocking CI assertion (infra-1). check-tree-sitter-upgrade-readiness.py gains an --assert-current mode that asserts every grammar's compiled ABI (including the vendored ones) is within the installed runtime's accepted range, keyed off the runtime's own ABI constants so it self-adjusts after any future bump. A new per-OS load-smoke test loads and parses through every grammar (Swift included, since its prebuilt binary has no introspectable parser.c). Both run in a blocking abi-assert CI job wired into the ci-status gate.

Rationale

The issue framed the 0.21 runtime against 0.23.x grammars as a latent crash. Investigation showed the pairing is already ABI-compatible: every grammar peer-depends on tree-sitter ^0.21.x, the runtime declares LANGUAGE_VERSION 14 / MIN_COMPATIBLE 13, and every grammar emits ABI 14 (in range). There is also no tree-sitter runtime 0.23.x at all (the npm line is 0.21, 0.22, 0.25), and 0.25 ships no prebuilds (a source-compile install regression). So the durable fix is the exact pin plus an enforced CI compatibility check, not a risky runtime bump. The timeout uses setTimeoutMicros, which exists and works on the pinned 0.21.1 runtime (empirically a 2ms budget cut an 84ms parse at 2.1ms), and is placed behind a small shim so the future 0.25/0.26 progressCallback swap is a one function change. A separate, gated follow-up may bump to 0.22.4 (confirmed safe, prebuilds present, bufferSize retained); it is intentionally out of scope here.
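The range comparison behind the compatibility claim is small enough to sketch directly. The PR implements the check in a Python script; the TypeScript below only illustrates the comparison, and `RuntimeAbiRange` and `grammarsOutOfRange` are hypothetical names.

```typescript
// Hypothetical holder for the runtime's two ABI constants.
interface RuntimeAbiRange {
  minCompatible: number;   // oldest grammar ABI the runtime still loads
  languageVersion: number; // newest grammar ABI the runtime understands
}

// Returns the names of grammars whose compiled ABI falls outside the range,
// sorted for stable output. An empty list means every grammar is loadable.
function grammarsOutOfRange(
  runtime: RuntimeAbiRange,
  grammarAbis: Record<string, number>,
): string[] {
  return Object.keys(grammarAbis)
    .filter((name) => {
      const abi = grammarAbis[name];
      return abi < runtime.minCompatible || abi > runtime.languageVersion;
    })
    .sort();
}

// Values quoted in the rationale: runtime 0.21.1 accepts ABI 13 through 14.
const pinnedRuntime: RuntimeAbiRange = { minCompatible: 13, languageVersion: 14 };
```

Keying the check off the runtime's own constants, rather than hard-coding 13 and 14 in the assertion, is what lets it keep working after a later runtime bump.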

Issue Fix

Files changed

| File | Type | Description |
| --- | --- | --- |
| src/core/tree-sitter/safe-parse.ts | Modified | Per-parse timeout (budget, reset(), ParseTimeoutError, finally clear), intrinsic hasError/isMissing detection with throttled logging, parseHadErrors/getParseDiagnostics helpers, optional label param, version shim. |
| src/core/ingestion/import-processor.ts | Modified | Routes its existing astHasError telemetry through parseHadErrors. |
| package.json | Modified | tree-sitter ^0.21.1 to exact 0.21.1. |
| src/core/tree-sitter/parser-loader.ts | Modified | Adds listGrammarSources() introspection (additive) for the load-smoke. |
| src/core/group/extractors/grpc-patterns/proto.ts | Modified | Corrects a false "0.25 runtime" comment (runtime is 0.21.1, loads ABI 13 to 14). |
| .github/scripts/check-tree-sitter-upgrade-readiness.py | Modified | New --assert-current mode asserting every grammar ABI is in the current runtime's range, vendored grammars included, Swift deferred to the load-smoke. |
| .github/workflows/ci-tests.yml | Modified | Adds the blocking abi-assert matrix job (static check plus load-smoke) and exposes its result as an output. |
| .github/workflows/ci.yml | Modified | Wires the ABI result into the ci-status gate with an explicit failing clause. |
| scripts/cross-platform-tests.ts | Modified | Registers the load-smoke in NATIVE_ADDON_SMOKE for the OS matrix. |
| test/unit/safe-parse.test.ts | Modified | Timeout fires and parser recovers on the next parse; malformed input flagged degraded but kept; clean regression. |
| test/unit/cli-commands.test.ts | Modified | Updates a stale ^0.21.1 assertion to the exact pin. |
| test/unit/parser-loader-abi.test.ts | Added | Load-smoke over every SOURCES grammar (Swift included). |

Acceptance

| Requirement | Met by |
| --- | --- |
| Runtime + grammar ABI pinned to a compatible pair | exact tree-sitter: 0.21.1; pair proven ABI 14 in [13, 14] |
| with a CI/version assertion | --assert-current (npm + vendored) plus per-OS load-smoke, blocking in ci-status |
| parseSourceSafe enforces a timeout | budget plus null to reset() to ParseTimeoutError |
| and detects ERROR/partial parses | intrinsic hasError/isMissing check inside parseSourceSafe |

Tests

Typecheck clean (npx tsc --noEmit). Full unit suite 6800 passed, 0 failed. Full npm run test:integration (with its web bundling prebuild) exit 0, 3232 passed, 0 failed. The --assert-current gate exits 0 with all 14 introspectable grammars in range. New tests cover the timeout, the recover-after-timeout invariant, degraded-but-kept detection, and the grammar load-smoke.

@zander-raycraft zander-raycraft marked this pull request as ready for review May 31, 2026 00:36
@zander-raycraft zander-raycraft changed the title from "fix(audit): Centralize heritage supertype matching" to "fix(audit): Centralize heritage supertype matching (#1921/#1922)" May 31, 2026
@zander-raycraft

Collaborator Author

@magyargergo lmk your thoughts

@magyargergo

Collaborator

You are the best @zander-raycraft! Will have a look asap.

@magyargergo magyargergo left a comment

Collaborator


Automated tri-review digest — multi-lane Claude review (correctness, adversarial, reliability, maintainability, testing + dedicated risk & CI lanes). Codex was dispatched as the independent engine but returned only an async launch stub with no usable findings, so this is a Claude-multi-lane review, not an independent cross-engine one — lane agreement here means "consistent across personas," not independent confirmation. Verify before acting.

This is a solid, well-tested refactor. CI gating is correctly wired (the new abi-assert job blocks merge on all 3 platforms with no continue-on-error/soft-fail). The safe-parse budget lifecycle (arm → finally-clear → reset() on null-return) is correct on every traced path; setTimeoutMicros/reset() exist in the pinned 0.21.1 runtime; resolveParseTimeoutMs fails safe on bad env input; the 15s budget < 30s pool idle timeout relationship holds. Multi-interface implements A, B multiplicity is not regressed (integration test confirms both edges). The normalizer resolves all standard generic/qualified/scoped shapes correctly. Two real issues stand out.

P1 — A single slow file can abort the entire ingestion run (new failure mode)

parseSourceSafe previously never threw; it now arms a 15s budget on every parse and throws ParseTimeoutError on timeout (safe-parse.ts:222). The new docstring (safe-parse.ts:106) asserts every caller wraps the call in try/catch — but two post-finalize scope-resolution hooks do not:

  • populateNamespaceSiblings: run.ts:355, java/package-siblings.ts:21
  • populateRangeBindings: run.ts:380, go/range-binding.ts:24, cpp/range-bindings.ts, rust/range-binding.ts

On a tree-cache miss (the worker-mode default for these files) the hooks re-parse; a >15s pathological file now throws uncaught, runner.ts:191-213 wraps and re-throws, and the whole analyze run aborts — where pre-PR it produced a degraded tree and continued. The per-language emitScopeCaptures path is safe (wrapped at scope-extractor-bridge.ts:49); the gap is specifically these two hooks. Fix: wrap the two hook calls (or the parseSourceSafe calls inside them) in try/catch and skip the offending file, and correct the docstring. (Verified by code-read across the cited files; converged across the correctness, adversarial, and reliability lanes.)
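The suggested fix can be sketched as a per-file guard around the hook body. This is only an illustration of the pattern: `runHookOverFiles` and its callback are hypothetical, and the local `ParseTimeoutError` stands in for the class the PR adds.

```typescript
// Stand-in for the error class introduced by this PR.
class ParseTimeoutError extends Error {
  constructor(file: string) {
    super(`parse timed out: ${file}`);
    this.name = "ParseTimeoutError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ParseTimeoutError.prototype);
  }
}

interface HookResult {
  processed: string[];
  skipped: string[];
}

// Runs a re-parsing hook over each file. A timeout skips only that file;
// any other error still propagates, so real bugs are not hidden.
function runHookOverFiles(
  files: readonly string[],
  parseAndApply: (file: string) => void,
): HookResult {
  const result: HookResult = { processed: [], skipped: [] };
  for (const file of files) {
    try {
      parseAndApply(file);
      result.processed.push(file);
    } catch (err) {
      if (err instanceof ParseTimeoutError) {
        result.skipped.push(file);
        continue;
      }
      throw err;
    }
  }
  return result;
}
```

Catching only the typed timeout keeps the pre-PR "degrade and continue" behaviour for slow files without swallowing unrelated failures.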

P2 — Kotlin class X : Base by delegate records the delegate as the supertype

Reproduced end-to-end: class Repo : DataSource by source yields heritage edge Repo → source instead of Repo → DataSource. The normalizer's right-to-left children-walk (supertype-alternation.ts:146-153) reaches explicit_delegation's trailing simple_identifier delegate — a LEAF_TYPE that is not in SKIPPED_INNER_TYPES — before the leading user_type. The by call() form works (its call_expression is skipped), so the bug is the bare-identifier / navigation-expression delegate form — and the only Kotlin delegation test uses the call form. Fix: for explicit_delegation prefer the leading user_type (or skip the delegate-expression node types). (Reproduced via a standalone tree-sitter-kotlin probe.)

Lower-priority / to verify

  • P3 degradedParseCount (safe-parse.ts:130) is a module global the comment calls "per-run," but it never resets in a long-lived worker / MCP process — after 20 degraded files, all further degraded-parse debug logs are suppressed process-wide. Diagnostics-only.
  • P3 import-processor.ts:338 swapped tree.rootNode?.hasError for parseHadErrors(tree), which dereferences rootNode without the null-guard the success path uses (safe-parse.ts:234). Low likelihood, narrow — real grammars always expose a root.
  • Go interface embedding (type_elem ${GO_EMBED_ALT}) may also capture Go 1.18+ type-set elements (int | float64) as spurious extends edges — a new false-positive surface; unverified, worth a check.
  • PHP/Swift/Dart go through createHeritageExtractor and now run normalizeSupertypeName universally despite having no descriptor and no shape test. For PHP this collapses qualified names to the simple name (verified: Models\BaseModel becomes BaseModel), which is consistent with the PR's simple-name normalization thesis, but it's an untested behavior change for unported languages; confirm it matches resolution expectations.

Test gaps

No test exercises the timeout-throw propagation through the unguarded hooks (P1); the degraded-parse throttle (first-20-then-suppress) and the simplifyRawName fallback are untested; Kotlin constructor_invocation and bare-identifier delegation, and C# scoped_type / global::, are untested.

Maintainability (non-blocking)

No compile-time/registry gate links descriptor → *_ALT constant → query interpolation, so a new language config can be added and silently not wired. The cross-grammar INNER_NAME_FIELDS / LEAF_TYPES / SKIPPED_INNER_TYPES god-lists in supertype-alternation.ts have no per-shape coverage test (a typo'd or missing shape silently falls through to simplifyRawName).

CI

Green except a Vercel "Authorization required" failure (auth-only, irrelevant to this PR).


Automated multi-tool digest — verify findings before acting. Codex did not contribute; treat lane agreement as persona-consistency, not independent confirmation.

Comment thread gitnexus/src/core/tree-sitter/safe-parse.ts Outdated
// type arguments or the delegate expression, not the supertype name.
// (Kotlin `constructor_invocation`/`explicit_delegation` wrap the
// user_type first and a value_arguments / call_expression second.)
if (SKIPPED_INNER_TYPES.has(child.type)) continue;

Collaborator


P2 — Kotlin by-delegation yields the wrong heritage edge. This right-to-left children-walk reaches explicit_delegation's trailing simple_identifier (the delegate) before its leading user_type (the real supertype). simple_identifier is a LEAF_TYPE and is not in SKIPPED_INNER_TYPES, so it wins.

Reproduced with a tree-sitter-kotlin probe:

class Repo : DataSource by source     => normalized: "source"      (WRONG — should be DataSource)
class Repo : DataSource by provider() => normalized: "DataSource"  (call form works: call_expression is skipped)

So class Repo : DataSource by source records Repo extends source (a delegate property, not a type). The only Kotlin delegation test uses the by call() form, which masks this.

Fix: for explicit_delegation, prefer the leading user_type (or add the delegate-expression node types to the skip set). [reproduced].
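One possible shape for the first suggested fix, sketched against hand-built nodes rather than a real tree-sitter-kotlin parse. `NodeLike`, `delegationSupertype`, and `leaf` are hypothetical; the node type names (`explicit_delegation`, `user_type`, `simple_identifier`, `call_expression`) are the ones quoted in this thread.

```typescript
// Minimal structural view of a syntax node for this sketch.
interface NodeLike {
  type: string;
  text: string;
  namedChildren: NodeLike[];
}

// For `Base by delegate`, returns the leading user_type (the real supertype)
// and never looks at the delegate expression. Returns null for other nodes so
// the caller can fall back to the generic walk.
function delegationSupertype(node: NodeLike): NodeLike | null {
  if (node.type !== "explicit_delegation") return null;
  for (const child of node.namedChildren) {
    if (child.type === "user_type") return child;
  }
  return null;
}

// Hypothetical helper for building childless nodes.
function leaf(type: string, text: string): NodeLike {
  return { type, text, namedChildren: [] };
}

// class Repo : DataSource by source
const bareDelegate: NodeLike = {
  type: "explicit_delegation",
  text: "DataSource by source",
  namedChildren: [leaf("user_type", "DataSource"), leaf("simple_identifier", "source")],
};

// class Repo : DataSource by provider()
const callDelegate: NodeLike = {
  type: "explicit_delegation",
  text: "DataSource by provider()",
  namedChildren: [leaf("user_type", "DataSource"), leaf("call_expression", "provider()")],
};
```

Selecting the supertype by position and type, instead of extending the skip set, avoids having to enumerate every expression node type a delegate can take.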

@zander-raycraft

Collaborator Author

Thanks! @magyargergo

@magyargergo magyargergo left a comment

Collaborator


Nice work! 🔥

@magyargergo magyargergo merged commit fca30c7 into abhigyanpatwari:main Jun 1, 2026
33 of 34 checks passed
magyargergo added a commit that referenced this pull request Jun 1, 2026
…alized heritage)

PR #1940 ("Centralize heritage supertype matching") refactored the legacy
@heritage queries from hand-written per-language arms into config-driven
alternations: heritage-extractors/configs/<lang>.ts declare the supertype node
shapes, buildSupertypeAlternation() generates the [(a)(b)...]@heritage.* arms,
and normalizeSupertypeName() reduces the matched supertype node to its innermost
simple name at runtime.

Conflict: tree-sitter-queries.ts (both sides rewrote the heritage section).
Resolved by taking #1940's version wholesale — every change this branch made to
that file (the U1 Rust scoped impl_item arm, the U2 Java/TS scoped arms, and the
Java end-anchor 2-segment fix) is fully superseded: #1940's shape descriptors
already include the same scoped shapes (Java/Rust scoped_type_identifier; TS
member_expression + nested_type_identifier), and capturing the whole supertype
node + normalizing at runtime structurally avoids the 2-segment double-match the
end-anchor was guarding. The registry-primary synth changes (rust/ts captures.ts)
are on a different path and untouched by #1940; they remain and continue to agree
with the legacy leg.

Verified on the merged tree: tsc clean; the java/typescript/rust qualified-base
fixtures pass under BOTH legs (registry synth == #1940 legacy normalizer).
magyargergo added a commit that referenced this pull request Jun 1, 2026
… 7 registry synths (#1956)

PR #1940 ("Centralize heritage supertype matching") widened the LEGACY @heritage
leg (config-driven shapes + normalizeSupertypeName) to capture qualified /
generic / scoped / attribute / subscript / record / struct / delegation /
member-expression / interface-embed bases. But the registry-PRIMARY synths (the
production path for migrated languages) were never widened to match, so in
production these inheritance edges were silently dropped — a pre-existing gap
exposed as a legacy>synth asymmetry by an audit of the post-#1940 merge. This is
the exact #1951 theme; rust/ts/cpp already handled their qualified bases, so this
brings the remaining 7 languages to parity:

- python: attribute (class X(pkg.Base)) + subscript (Generic[T]) bases
- go:     qualified_type (pkg.Base), generic_type (Box[T]), pointer, AND
          interface_type embeds (the synth previously walked struct_type only)
- javascript: member_expression base (extends ns.Base)
- csharp: walk record_declaration + struct_declaration base_lists (incl.
          primary_constructor_base_type) + alias_qualified_name
- ruby:   scope_resolution superclass (class C < Mod::Super)
- kotlin: explicit_delegation (class F : Iface by d)
- java:   walk interface_declaration extends_interfaces (interface IA extends IB)

Each synth's base-name extractor was widened to return the trailing/inner node
for the new shapes (existing simple-base path byte-identical); each was
real-parse-verified so the synth's bare name equals normalizeSupertypeName(base)
— the legacy leg's reduction — guaranteeing registry<->legacy agreement. New
<lang>-qualified-base / java-iface-extends fixtures + both-leg parity blocks
assert the new edges; full parity suite is 28/28 (all 14 langs, both legs).
Goldens regenerated additively (existing fixtures gain one inherits capture
each); scope-capture + python-scope benches re-baselined, all linear.

Known follow-up: csharp record->record in the SAME namespace
(record UserRecord : BaseEntity) — the synth emits the capture but the
same-namespace record-target binding is not resolved on the registry leg
(legacy does emit it); a separate registry resolution gap, not asserted here to
avoid a leg divergence.
@magyargergo magyargergo linked an issue Jun 4, 2026 that may be closed by this pull request
5 tasks
magyargergo added a commit that referenced this pull request Jun 5, 2026
* fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928)

Registry-primary scope-resolution path (the live one post-#942/#943):

- F35 [HIGH]: qualified / qualified-generic constructor calls. `new pkg.Foo()`
  parses as a `scoped_type_identifier` that the query bound only as
  `@reference.call.constructor.qualified` with no `@reference.name`, so the
  scope extractor fell back to the whole-expression anchor and the reference
  name became the raw `new pkg.Foo()` text (never resolved). Bind the simple-name
  tail (end-anchored last child) and add an arm for the previously
  uncaptured `new pkg.Box<String>()` (qualified + generic) shape.

- F38 [MEDIUM]: `super(...)` / `this(...)` explicit constructor invocations,
  modeled as `explicit_constructor_invocation` and never matched by the scope
  query, dropped the chained-constructor CALLS edges. Synthesize them with the
  target resolved structurally (this -> enclosing type name; super -> superclass
  tail via the shared javaBaseLookupNameNode, skipping implicit Object) plus
  arity for overload disambiguation.

- F41 [LOW]: interpretJavaTypeBinding stripped the qualifier before generics, so
  a qualified generic type arg (`Map<String, com.example.User>`) was cut inside
  the generic into `User>`. Strip generics first, then the qualifier; make the
  erasure fallback qualifier-tolerant.

F36/F37 already landed upstream (#1940/#1956); F39/F40 are legacy-bank remnants
that are no longer consumed (legacy @import skipped in parse-worker; legacy
@call never read in parse-impl) so they are intentionally left untouched.

Tests: low-level capture unit tests (constructor shapes incl. double-match
guard; super/this/enum/implicit-Object), interpretJavaTypeBinding unit tests
(qualified generic args + the corruption case), and end-to-end resolver tests
with new fixtures asserting the CALLS edges resolve to the correct constructors.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(scope-resolution): register Constructor overload keys so this()/super() chains don't self-loop (#1928 F38 review)

Review of #2045 caught two gaps; both confirmed by reproduction.

P2 — F38 this() emitted a self-loop. On the java-explicit-constructor fixture,
Child(int){ this(); } produced CALLS Child()#0 -> Child()#0 instead of
Child(int)#1 -> Child()#0. Root cause is the language-agnostic graph-bridge: the
parse phase mints distinct Constructor nodes (Child#0, Child#1) carrying
parameterTypes, but node-lookup.ts registered the parameter-types / shape
overload keys only for Function/Method, never Constructor, so both ctors
collapsed onto the first-wins qualified/simple key and the caller Child(int)
resolved to Child#0 (the this() target). Extend the overload keys to Constructor
in both node-lookup.ts (registration) and ids.ts (lookup) via a shared
isOverloadableCallable predicate. Verified the edge now connects distinct nodes
(Child#1 -> Child#0); super(1)->Base#1 still correct. No cross-language
regressions (the 9 worker-path failures reproduce identically on clean HEAD).
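
The collapse described above can be reproduced with a minimal sketch of a first-wins index. The node shape, key format, and function signatures here are assumptions for illustration; only the idea of a shared predicate deciding which labels get overload keys comes from the commit, and the real `node-lookup.ts` / `ids.ts` code is not shown.

```typescript
// Hypothetical node and index shapes, for illustration only.
type Label = "Function" | "Method" | "Constructor";

interface CallableNode {
  id: string;
  label: Label;
  qualifiedName: string;
  parameterTypes: string[];
}

type Predicate = (label: Label) => boolean;

// Assumed key format: qualified name plus the parameter type list.
const overloadKey = (n: CallableNode): string =>
  `${n.qualifiedName}(${n.parameterTypes.join(",")})`;

// Registration: the qualified key is first-wins; overload keys are added
// only for labels the predicate accepts.
function register(nodes: CallableNode[], overloadable: Predicate): Map<string, string> {
  const index = new Map<string, string>();
  for (const n of nodes) {
    if (!index.has(n.qualifiedName)) index.set(n.qualifiedName, n.id);
    if (overloadable(n.label)) index.set(overloadKey(n), n.id);
  }
  return index;
}

// Lookup: prefer the overload key, then fall back to the qualified key.
function lookup(index: Map<string, string>, n: CallableNode, overloadable: Predicate): string | undefined {
  if (overloadable(n.label)) {
    const hit = index.get(overloadKey(n));
    if (hit !== undefined) return hit;
  }
  return index.get(n.qualifiedName);
}

const before: Predicate = (l) => l === "Function" || l === "Method";
const after: Predicate = (l) => before(l) || l === "Constructor";

const child0: CallableNode = { id: "Child#0", label: "Constructor", qualifiedName: "Child.Child", parameterTypes: [] };
const child1: CallableNode = { id: "Child#1", label: "Constructor", qualifiedName: "Child.Child", parameterTypes: ["int"] };

// Resolving the caller Child(int):
console.log(lookup(register([child0, child1], before), child1, before)); // Child#0
console.log(lookup(register([child0, child1], after), child1, after)); // Child#1
```

Using the same predicate for registration and lookup keeps the two sides from drifting apart, which is the motivation the commit gives for sharing it.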

Also harden the integration test: it matched the this() edge on name only, which
a self-loop satisfies; now assert the endpoints are DISTINCT constructors.

P3 — F41 order-regression guard was inert (List<Map<String,User>> normalizes to
List under both strip orders). Add List<com.x.Foo<String>> -> List, which is
corrupted to Foo<String>> under the old order and only correct generics-first.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(java): update fingerprint and add notes for constructor query captures in baselines.json

Update the Java fingerprint in baselines.json and add notes describing the new constructor query captures (qualified and qualified-generic constructor calls), reflecting the parsing-layer coverage and fixture changes.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>