fix(audit): Centralize heritage supertype matching (#1921/#1922)#1940
Conversation
…eric, scoped, and interface bases produce inheritance edges across all OO languages, with per-language configs and fixtures.
|
@zander-raycraft is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10668 tests passed 10 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
…meouts, ERROR/partial parse flags, tree-sitter pinned to 0.21.1, and CI ABI checks for every grammar.
SummaryThis PR update closes both cross-cutting infra findings from #1922. infra-2 makes What it changesPer-parse timeout (infra-2). Intrinsic ERROR detection (infra-2). After a successful parse, Exact ABI pin (infra-1). Blocking CI assertion (infra-1). RationaleThe issue framed the 0.21 runtime against 0.23.x grammars as a latent crash. Investigation showed the pairing is already ABI-compatible: every grammar peer-depends on Issue FixFiles changed
Acceptance
TestsTypecheck clean ( |
|
@magyargergo lmk your thoughts |
|
You are the best @zander-raycraft ! Will have look asap" |
magyargergo
left a comment
There was a problem hiding this comment.
Automated tri-review digest — multi-lane Claude review (correctness, adversarial, reliability, maintainability, testing + dedicated risk & CI lanes). Codex was dispatched as the independent engine but returned only an async launch stub with no usable findings, so this is a Claude-multi-lane review, not an independent cross-engine one — lane agreement here means "consistent across personas," not independent confirmation. Verify before acting.
This is a solid, well-tested refactor. CI gating is correctly wired (the new abi-assert job blocks merge on all 3 platforms with no continue-on-error/soft-fail). The safe-parse budget lifecycle (arm → finally-clear → reset() on null-return) is correct on every traced path; setTimeoutMicros/reset() exist in the pinned 0.21.1 runtime; resolveParseTimeoutMs fails safe on bad env input; the 15s budget < 30s pool idle timeout relationship holds. Multi-interface implements A, B multiplicity is not regressed (integration test confirms both edges). The normalizer resolves all standard generic/qualified/scoped shapes correctly. Two real issues stand out.
P1 — A single slow file can abort the entire ingestion run (new failure mode)
parseSourceSafe previously never threw; it now arms a 15s budget on every parse and throws ParseTimeoutError on timeout (safe-parse.ts:222). The new docstring (safe-parse.ts:106) asserts every caller wraps the call in try/catch — but two post-finalize scope-resolution hooks do not:
populateNamespaceSiblings—run.ts:355→java/package-siblings.ts:21populateRangeBindings—run.ts:380→go/range-binding.ts:24,cpp/range-bindings.ts,rust/range-binding.ts
On a tree-cache miss (the worker-mode default for these files) the hooks re-parse; a >15s pathological file now throws uncaught, runner.ts:191-213 wraps and re-throws, and the whole analyze run aborts — where pre-PR it produced a degraded tree and continued. The per-language emitScopeCaptures path is safe (wrapped at scope-extractor-bridge.ts:49); the gap is specifically these two hooks. Fix: wrap the two hook calls (or the parseSourceSafe calls inside them) in try/catch and skip the offending file, and correct the docstring. (Verified by code-read across the cited files; converged across the correctness, adversarial, and reliability lanes.)
P2 — Kotlin class X : Base by delegate records the delegate as the supertype
Reproduced end-to-end: class Repo : DataSource by source yields heritage edge Repo → source instead of Repo → DataSource. The normalizer's right-to-left children-walk (supertype-alternation.ts:146-153) reaches explicit_delegation's trailing simple_identifier delegate — a LEAF_TYPE that is not in SKIPPED_INNER_TYPES — before the leading user_type. The by call() form works (its call_expression is skipped), so the bug is the bare-identifier / navigation-expression delegate form — and the only Kotlin delegation test uses the call form. Fix: for explicit_delegation prefer the leading user_type (or skip the delegate-expression node types). (Reproduced via a standalone tree-sitter-kotlin probe.)
Lower-priority / to verify
- P3
degradedParseCount(safe-parse.ts:130) is a module global the comment calls "per-run," but it never resets in a long-lived worker / MCP process — after 20 degraded files, all further degraded-parse debug logs are suppressed process-wide. Diagnostics-only. - P3
import-processor.ts:338swappedtree.rootNode?.hasErrorforparseHadErrors(tree), which dereferencesrootNodewithout the null-guard the success path uses (safe-parse.ts:234). Low likelihood, narrow — real grammars always expose a root. - Go interface embedding
(type_elem ${GO_EMBED_ALT})may also capture Go 1.18+ type-set elements (int | float64) as spuriousextendsedges — a new false-positive surface; unverified, worth a check. - PHP/Swift/Dart go through
createHeritageExtractorand now runnormalizeSupertypeNameuniversally despite having no descriptor and no shape test. For PHP this collapses qualified names to the simple name (verified:Models\BaseModel→BaseModel), which is consistent with the PR's simple-name normalization thesis — but it's an untested behavior change for unported languages; confirm it matches resolution expectations.
Test gaps
No test exercises the timeout-throw propagation through the unguarded hooks (P1); the degraded-parse throttle (first-20-then-suppress) and the simplifyRawName fallback are untested; Kotlin constructor_invocation and bare-identifier delegation, and C# scoped_type / global::, are untested.
Maintainability (non-blocking)
No compile-time/registry gate links descriptor → *_ALT constant → query interpolation, so a new language config can be added and silently not wired. The cross-grammar INNER_NAME_FIELDS / LEAF_TYPES / SKIPPED_INNER_TYPES god-lists in supertype-alternation.ts have no per-shape coverage test (a typo'd or missing shape silently falls through to simplifyRawName).
CI
Green except a Vercel "Authorization required" failure (auth-only, irrelevant to this PR).
Automated multi-tool digest — verify findings before acting. Codex did not contribute; treat lane agreement as persona-consistency, not independent confirmation.
| // type arguments or the delegate expression, not the supertype name. | ||
| // (Kotlin `constructor_invocation`/`explicit_delegation` wrap the | ||
| // user_type first and a value_arguments / call_expression second.) | ||
| if (SKIPPED_INNER_TYPES.has(child.type)) continue; |
There was a problem hiding this comment.
P2 — Kotlin by-delegation yields the wrong heritage edge. This right-to-left children-walk reaches explicit_delegation's trailing simple_identifier (the delegate) before its leading user_type (the real supertype). simple_identifier is a LEAF_TYPE and is not in SKIPPED_INNER_TYPES, so it wins.
Reproduced with a tree-sitter-kotlin probe:
class Repo : DataSource by source => normalized: "source" (WRONG — should be DataSource)
class Repo : DataSource by provider() => normalized: "DataSource" (call form works: call_expression is skipped)
So class Repo : DataSource by source records Repo extends source (a delegate property, not a type). The only Kotlin delegation test uses the by call() form, which masks this.
Fix: for explicit_delegation, prefer the leading user_type (or add the delegate-expression node types to the skip set). [reproduced].
|
Thanks! @magyargergo |
…alized heritage) PR #1940 ("Centralize heritage supertype matching") refactored the legacy @Heritage queries from hand-written per-language arms into config-driven alternations: heritage-extractors/configs/<lang>.ts declare the supertype node shapes, buildSupertypeAlternation() generates the [(a)(b)...]@Heritage.* arms, and normalizeSupertypeName() reduces the matched supertype node to its innermost simple name at runtime. Conflict: tree-sitter-queries.ts (both sides rewrote the heritage section). Resolved by taking #1940's version wholesale — every change this branch made to that file (the U1 Rust scoped impl_item arm, the U2 Java/TS scoped arms, and the Java end-anchor 2-segment fix) is fully superseded: #1940's shape descriptors already include the same scoped shapes (Java/Rust scoped_type_identifier; TS member_expression + nested_type_identifier), and capturing the whole supertype node + normalizing at runtime structurally avoids the 2-segment double-match the end-anchor was guarding. The registry-primary synth changes (rust/ts captures.ts) are on a different path and untouched by #1940; they remain and continue to agree with the legacy leg. Verified on the merged tree: tsc clean; the java/typescript/rust qualified-base fixtures pass under BOTH legs (registry synth == #1940 legacy normalizer).
… 7 registry synths (#1956) PR #1940 ("Centralize heritage supertype matching") widened the LEGACY @Heritage leg (config-driven shapes + normalizeSupertypeName) to capture qualified / generic / scoped / attribute / subscript / record / struct / delegation / member-expression / interface-embed bases. But the registry-PRIMARY synths (the production path for migrated languages) were never widened to match, so in production these inheritance edges were silently dropped — a pre-existing gap exposed as a legacy>synth asymmetry by an audit of the post-#1940 merge. This is the exact #1951 theme; rust/ts/cpp already handled their qualified bases, so this brings the remaining 7 languages to parity: - python: attribute (class X(pkg.Base)) + subscript (Generic[T]) bases - go: qualified_type (pkg.Base), generic_type (Box[T]), pointer, AND interface_type embeds (the synth previously walked struct_type only) - javascript: member_expression base (extends ns.Base) - csharp: walk record_declaration + struct_declaration base_lists (incl. primary_constructor_base_type) + alias_qualified_name - ruby: scope_resolution superclass (class C < Mod::Super) - kotlin: explicit_delegation (class F : Iface by d) - java: walk interface_declaration extends_interfaces (interface IA extends IB) Each synth's base-name extractor was widened to return the trailing/inner node for the new shapes (existing simple-base path byte-identical); each was real-parse-verified so the synth's bare name equals normalizeSupertypeName(base) — the legacy leg's reduction — guaranteeing registry<->legacy agreement. New <lang>-qualified-base / java-iface-extends fixtures + both-leg parity blocks assert the new edges; full parity suite is 28/28 (all 14 langs, both legs). Goldens regenerated additively (existing fixtures gain one inherits capture each); scope-capture + python-scope benches re-baselined, all linear. Known follow-up: csharp record->record in the SAME namespace (record UserRecord : BaseEntity) — the synth emits the capture but the same-namespace record-target binding is not resolved on the registry leg (legacy does emit it); a separate registry resolution gap, not asserted here to avoid a leg divergence.
* fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928) Registry-primary scope-resolution path (the live one post-#942/#943): - F35 [HIGH]: qualified / qualified-generic constructor calls. `new pkg.Foo()` parses as a `scoped_type_identifier` that the query bound only as `@reference.call.constructor.qualified` with no `@reference.name`, so the scope extractor fell back to the whole-expression anchor and the reference name became the raw `new pkg.Foo()` text (never resolved). Bind the simple -name tail (end-anchored last child) and add an arm for the previously uncaptured `new pkg.Box<String>()` (qualified + generic) shape. - F38 [MEDIUM]: `super(...)` / `this(...)` explicit constructor invocations, modeled as `explicit_constructor_invocation` and never matched by the scope query, dropped the chained-constructor CALLS edges. Synthesize them with the target resolved structurally (this -> enclosing type name; super -> superclass tail via the shared javaBaseLookupNameNode, skipping implicit Object) plus arity for overload disambiguation. - F41 [LOW]: interpretJavaTypeBinding stripped the qualifier before generics, so a qualified generic type arg (`Map<String, com.example.User>`) was cut inside the generic into `User>`. Strip generics first, then the qualifier; make the erasure fallback qualifier-tolerant. F36/F37 already landed upstream (#1940/#1956); F39/F40 are legacy-bank remnants that are no longer consumed (legacy @import skipped in parse-worker; legacy @call never read in parse-impl) so they are intentionally left untouched. Tests: low-level capture unit tests (constructor shapes incl. double-match guard; super/this/enum/implicit-Object), interpretJavaTypeBinding unit tests (qualified generic args + the corruption case), and end-to-end resolver tests with new fixtures asserting the CALLS edges resolve to the correct constructors. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(scope-resolution): register Constructor overload keys so this()/super() chains don't self-loop (#1928 F38 review) Review of #2045 caught two gaps; both confirmed by reproduction. P2 — F38 this() emitted a self-loop. On the java-explicit-constructor fixture, Child(int){ this(); } produced CALLS Child()#0 -> Child()#0 instead of Child(int)#1 -> Child()#0. Root cause is the language-agnostic graph-bridge: the parse phase mints distinct Constructor nodes (Child#0, Child#1) carrying parameterTypes, but node-lookup.ts registered the parameter-types / shape overload keys only for Function/Method, never Constructor, so both ctors collapsed onto the first-wins qualified/simple key and the caller Child(int) resolved to Child#0 (the this() target). Extend the overload keys to Constructor in both node-lookup.ts (registration) and ids.ts (lookup) via a shared isOverloadableCallable predicate. Verified the edge now connects distinct nodes (Child#1 -> Child#0); super(1)->Base#1 still correct. No cross-language regressions (the 9 worker-path failures reproduce identically on clean HEAD). Also harden the integration test: it matched the this() edge on name only, which a self-loop satisfies; now assert the endpoints are DISTINCT constructors. P3 — F41 order-regression guard was inert (List<Map<String,User>> normalizes to List under both strip orders). Add List<com.x.Foo<String>> -> List, which is corrupted to Foo<String>> under the old order and only correct generics-first. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(java): update fingerprint and add notes for constructor query captures in baselines.json Updated the fingerprint for the Java section and added detailed notes regarding the enhancements in constructor query captures, including qualified and qualified-generic constructor queries. This change reflects ongoing improvements in the parsing layer coverage and fixture updates. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>
Summary
This PR fixes a systemic gap where heritage queries (extends, implements, embeds, trait impls) across nearly every OO language matched only the bare
type_identifierbase and silently dropped qualified, generic, scoped, and interface supertypes. The shape was being patched one language at a time and never generalized. This change introduces a single shared supertype-alternation builder plus a matching runtime name-normalizer, adopts it across all ten OO language heritage queries, and extends theheritage-extractors/configs/framework (previously onlygo.tsandruby.ts) to cover the rest. The result is one source of truth for "what a supertype can look like," so qualified, generic, scoped, and interface bases now produceEXTENDSandIMPLEMENTSedges instead of being lost.What it changes
One shared alternation builder.
buildSupertypeAlternation()takes a per-language shape descriptor (the set of tree-sitter node types a supertype can take) and returns the idiomatic[(a) (b) (c)] @heritage.*alternation fragment. The heritage query blocks interpolate this instead of each re-deriving the shape inline.One shared name-normalizer.
normalizeSupertypeName()reduces whatever node actually matched down to the innermost simple identifier, soBase<T>,pkg.Base, andns::Baseall collapse toBase. It is node-type driven (no language names), tries field access first then walks children, and skips generic argument lists and delegate or call subtrees (so Kotlin: Bar by bazresolves toBar, not the delegate). This mirrors the existing C++ registry path so the simple-namectx.resolve(name)contract keeps holding everywhere.Per-language shape configs. Each OO language now declares its supertype node types in
heritage-extractors/configs/<lang>.ts, consumed both by the query builder and by the runtime normalizer. The existinggo.tsandruby.tskeep their hooks (shouldSkipExtends,callBasedHeritage) and gain a shape descriptor alongside them.Widened heritage queries and missing containers. Every legacy heritage block in
tree-sitter-queries.tsnow uses the builder output, and the previously absent container patterns are added: Java and TypeScript interface inheritance, C#recordandstructbase lists, and Go interface-in-interface embedding.No downstream changes. Edge resolution,
EXTENDSversusIMPLEMENTSselection, MRO, and the schema are untouched. The change lives entirely at the query and name-extraction layer, which is exactly why it is the durable fix. No language provider files were modified.Issue Reference
Findings addressed
Every finding from the issue now emits its heritage edge, each covered by a fixture in
test/integration/heritage-supertype-shapes.test.ts.generic_type,scoped_type_identifier)A->Base:extends,A->IFoo/Bar:implementsextends_interfaces)IA->IB/IC:implementsextends_type_clause)I->A/B:implementsmember_expression)C->Base:extendsmember_expression)C->Base:extendsrecord_declaration,struct_declarationbase lists)R->Base/IFoo,S->IFoo/IBarqualified_name,primary_constructor_base_type)A->Base(qualified),R->Base(primary ctor)qualified_type,generic_type,type_elem)D->Base/Gen/Animal,I->Reader/Other, named field correctly skippedattribute,subscript)C->Model(attribute),C->Generic(subscript)scope_resolution)Bar->Sup:extendsimpl ScopedTrait for Type(scoped_type_identifier)Foo->Trait:trait-implby(explicit_delegation)Foo->Bar:extendsCPP_QUERIES(template_type,qualified_identifier)D->Base/OtherWhat it fixes
Inheritance edges for qualified, generic, scoped, and interface supertypes were silently dropped across nearly every OO language, which fed directly into incomplete blast-radius and MRO results. The fix is centralized so the shape can no longer drift back open per language. Because a malformed tree-sitter query is caught and the whole file's heritage is skipped, a query-compile guard test now asserts that every language's full query string compiles, so a future bad fragment fails loudly in CI instead of regressing silently.
Files changed
src/core/ingestion/heritage-extractors/supertype-alternation.tsbuildSupertypeAlternation) and runtime name-normalizer (normalizeSupertypeName).src/core/ingestion/heritage-extractors/configs/java.tssrc/core/ingestion/heritage-extractors/configs/csharp.tssrc/core/ingestion/heritage-extractors/configs/typescript.tssrc/core/ingestion/heritage-extractors/configs/javascript.tssrc/core/ingestion/heritage-extractors/configs/python.tssrc/core/ingestion/heritage-extractors/configs/rust.tssrc/core/ingestion/heritage-extractors/configs/kotlin.tssrc/core/ingestion/heritage-extractors/configs/cpp.tssrc/core/ingestion/heritage-extractors/generic.tsnormalizeSupertypeName; drops empty names. Covers both the worker and sequential paths.src/core/ingestion/heritage-types.tsSupertypeShapeDescriptortype consumed by the builder and configs.src/core/ingestion/tree-sitter-queries.tssrc/core/ingestion/heritage-extractors/configs/go.tsgoHeritageShapes; existingshouldSkipExtendspreserved.src/core/ingestion/heritage-extractors/configs/ruby.tsrubyHeritageShapes; existingcallBasedHeritagemixin routing preserved.test/unit/supertype-alternation.test.tstest/integration/heritage-supertype-shapes.test.tsNew flow
Tests
Typecheck (
npx tsc --noEmit) is clean. The heritage suite passes (5 files, 104 tests), and a broad sweep of the heritage, inheritance, extends, implements, MRO, and interface tests passes with no regressions (563 tests). The only failing test in the broad sweep is a pre-existing, environment-specific worker-pool spawn check (ruby-sequential-mixin), confirmed to fail identically on baseline with these changes stashed, so it is unrelated to this work.test/integration/heritage-supertype-shapes.test.tstest/unit/supertype-alternation.test.tstest/unit/heritage-extraction.test.tstest/integration/heritage-extractor-wiring.test.tstest/unit/heritage-processor.test.tsNotes
This fixes the legacy
@heritage.*query path, which is the live emission path for heritage in every language (it is ungated by the registry-primary migration). The registry path's existing C++ and Ruby-mixin emission is untouched and already correct. All thirteen node-type choices were validated against the official tree-sitter grammars at the versions this repo pins, and the alternation and hidden-rule handling were validated against the official tree-sitter query documentation.References
Every supertype node type added to the configs was verified against the official tree-sitter grammar's generated
node-types.jsonat the exact version this repo pins inpackage.json, so the shapes match the grammar actually loaded at parse time (no version drift). The shared-heritage approach itself (the bracketed alternation and the handling of hidden grammar rules) was verified against the official tree-sitter query documentation.Grammar node-type sources (one per language, at the pinned version tag):
Tree-sitter pattern and concept documentation:
[(a) (b) (c)](the one-of pattern the builder emits): https://tree-sitter.github.io/tree-sitter/using-parsers/queries/2-operators.htmlnode-types.jsonis the generated, authoritative description of a grammar's nodes (so it is the correct source to validate against): https://tree-sitter.github.io/tree-sitter/using-parsers/6-static-node-types.html_-prefixed rules such as Java_typeare enumerated as their concrete subtypes rather than matched directly): https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar.htmlThe existing C++ registry-path emitter (
languages/cpp/captures.ts,iterBaseClassesandextractBaseLookupName) was used as the in-repo reference for the name-normalizer, since it already reduces templated and qualified C++ bases to a simple name.Testing & verification
cd gitnexus && npm testcd gitnexus && npm run test:integration(if core/indexing/MCP paths changed)cd gitnexus && npx tsc --noEmitcd gitnexus-web && npm test(if web changed)cd gitnexus-web && npx tsc -b --noEmit(if web changed)gitnexus-web/e2e/)Risk & rollout
Checklist
AGENTS.md/ overlays changed: headers, scope block, and changelog updated per project conventionsRefs #1922
Refs #1923
Refs #1927
Refs #1930
Refs #1934