Skip to content

feat: language-aware code intelligence — symbol resolution, MRO, constructor discrimination#238

Merged
magyargergo merged 37 commits into
mainfrom
feat/method-resolution-order
Mar 13, 2026
Merged

feat: language-aware code intelligence — symbol resolution, MRO, constructor discrimination#238
magyargergo merged 37 commits into
mainfrom
feat/method-resolution-order

Conversation

@magyargergo

@magyargergo magyargergo commented Mar 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR builds a language-aware code intelligence engine -- the core of an LSP-grade symbol resolver that understands imports, inheritance, method ownership, constructor patterns, receiver-typed member calls, and call-site semantics across 9 languages.

Symbol Resolution Engine

  • 3-tier resolver: exact FQN -> scope-walk (file/namespace) -> guarded fuzzy fallback that refuses ambiguous matches
  • Import-aware disambiguation: resolves duplicate symbol names using import chains (PSR-4, JVM packages, Go packages, Rust modules, C++ includes, C# namespaces, Python relative imports)
  • Call-form discrimination: callForm field distinguishes regular calls, member calls (obj.method()), and constructor calls (new Foo() / struct literals) -- each form gets different candidate filtering
  • Arity filtering: narrows overloaded candidates by argument count when multiple symbols share a name
  • Receiver-type filtering: for member calls with typed receivers, filters candidates by ownerId matching the resolved receiver type -- disambiguates user.save() vs repo.save() when both User and Repo have save() methods

Receiver-Constrained Resolution (Phase 3)

  • TypeEnv: per-file Map<string, string> built from explicit type annotations (let x: T, T x, x: T, var x models.T, typed params)
  • extractReceiverName: extracts receiver identifier from member call AST across all 9 languages
  • resolveCallTarget step D: filters candidates by ownerId matching resolved receiver type nodeId
  • Go receiver types: findEnclosingClassId extracts receiver type from method_declaration (e.g., func (u *User) Save() -> User struct)
  • Safety: falls back to name-only resolution if receiver type unknown; refuses if 2+ candidates remain after filtering

Constructor & Struct Literal Resolution

  • new Foo() resolves to Constructor > Class (TS, Java, C#, C++, PHP, Kotlin)
  • Struct literals (User{...} in Go, User { ... } in Rust) resolve to Struct nodes
  • C# primary constructors (class User(string name, int age)) emit Constructor nodes with correct parameterCount
  • C# target-typed new (User u = new("x")) extracts type from variable declaration

Method Resolution Order (MRO)

  • Walks inheritance DAG, detects method-name collisions across parents
  • 5 language-specific strategies: C++ leftmost-base, C#/Java class-over-interface, Python C3 linearization, Rust qualified syntax, default BFS
  • Emits OVERRIDES edges when inherited methods collide (never targets Property nodes)

Heritage & Ownership

  • HAS_METHOD edges: Class->Method ownership from AST container analysis
  • Method signatures: parameterCount and returnType extracted for Method/Constructor nodes
  • Go struct embedding: EXTENDS edges for embedded structs + PackageMap optimization
  • Swift extension heritage: extension declarations emit EXTENDS edges
  • Rust grouped imports: use crate::{models, other} correctly expanded

Resolver Architecture

  • Language-specific resolvers extracted into resolvers/ directory (JVM, Go, C#, PHP, Rust, standard, utils)
  • import-processor.ts reduced from 1412 -> 711 lines (50% reduction)

Bug Fixes

  • C/C++ resolveImportPath no longer mangles #include paths
  • Heritage heuristic language-gated (no longer applies class/interface rules to wrong languages)
  • C# base_list distinguishes EXTENDS vs IMPLEMENTS via symbol table + I[A-Z] heuristic
  • Global tier no longer blocks resolution when kind/arity filtering can narrow to 1 candidate
  • Go qualified_type (models.User) correctly unwrapped in TypeEnv
  • Kotlin user_type wrapper and variable_declaration nesting handled in TypeEnv
  • C++ field_expression receiver extraction via argument field

Schema

  • Method table: parameterCount INT32, returnType STRING
  • Cypher examples updated with HAS_METHOD, OVERRIDES, diamond inheritance queries

Test plan

Integration tests -- 9 languages (test/integration/resolvers/)

Language Tests Coverage
TypeScript 19 heritage, imports, member-calls, constructor resolution, receiver-constrained
Java 22 heritage, ambiguous, arity filtering, member-calls, constructors, receiver-constrained
C# 23 heritage, ambiguous, arity, member-calls, primary constructors, receiver-constrained
C++ 18 diamond inheritance, ambiguous, arity, member-calls, constructors, receiver-constrained
Python 14 heritage, ambiguous, arity, member-calls, receiver-constrained
Go 18 struct embedding, PackageMap, ambiguous, arity, member-calls, struct literals, receiver-constrained
Rust 17 trait-impl, ambiguous, arity, member-calls, struct literals, receiver-constrained
Kotlin 18 delegation specifiers, JVM imports, ambiguous, arity, member-calls, receiver-constrained
PHP 29 PSR-4, traits, enums, MRO overrides, ambiguous, arity, member-calls, constructors, receiver-constrained

Unit tests

  • 13 MRO tests: C++ diamond, C#/Java class+interface, Python C3 + cycle detection, Rust traits
  • 21 HAS_METHOD tests: container types, findEnclosingClassId across all languages (incl. Go receiver types)
  • 11 method signature tests: extractMethodSignature for TS/Python/Java/C#
  • 14 heritage processor tests: C# interface resolution via symbol table
  • 5 constructor-aware filtering tests: kind narrowing, arity fallback
  • TypeEnv tests: type annotation extraction across 9 languages
  • Symbol resolver tests: scope-walk, ambiguity refusal, tier priority
  • Call-form tests: inferCallForm, extractReceiverName, countCallArguments

Stats

  • 230 files changed, 10,100 insertions, 973 deletions
  • 1242+ tests passing, 53 test files

@vercel

vercel Bot commented Mar 10, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Mar 13, 2026 0:25am

Request Review

Base automatically changed from refactor/pre-merge-consolidation to main March 10, 2026 23:03
Implement full MRO computation for multi-language inheritance hierarchies:

- HAS_METHOD edges: Class→Method ownership edges emitted during parsing
  (both worker pool and sequential fallback paths)
- Method signatures: extract parameterCount and returnType from AST nodes
- C# heritage fix: distinguish EXTENDS vs IMPLEMENTS for base_list captures
  using symbol table lookup + I[A-Z] naming heuristic fallback
- MRO processor (Phase 4.5): walks inheritance DAG, detects method-name
  collisions across parents, applies language-specific resolution:
  - C++: leftmost base class in declaration order wins
  - C#/Java: class method wins over interface default
  - Python: C3 linearization with cycle detection
  - Rust: no auto-resolution (requires qualified syntax)
  - Default: first definition in BFS order wins
- OVERRIDES edges emitted for resolved method collisions
- KuzuDB schema: Method table extended with parameterCount/returnType;
  dedicated CSV writer and COPY query for 10-column Method rows
- MCP tools: updated Cypher examples for HAS_METHOD, OVERRIDES, diamond

72 tests across 5 test files covering MRO resolution, HAS_METHOD edges,
method signature extraction, C# heritage resolution, and integration
tests across C#/Rust/Python/TS/Java/C++.
@github-actions

github-actions Bot commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Unit Tests success 3 platforms
✅ Integration success 3 OS x 4 groups = 12 jobs

Test Results

1636 passed
· 507 suites · 1636 total
· ⏱️ 38s
· 📊 1146 unit + 490 integration

Code Coverage

Combined (Unit + Integration)

Metric Coverage Covered Threshold Status
Statements 48.94% 3775/7713 26% 🟢 █████████░░░░░░░░░░░
Branches 41% 2188/5336 23% 🟢 ████████░░░░░░░░░░░░
Functions 51.04% 417/817 28% 🟢 ██████████░░░░░░░░░░
Lines 50.37% 3496/6940 27% 🟢 ██████████░░░░░░░░░░
Coverage breakdown by test suite

Unit Tests

Metric Coverage Covered Threshold Status
Statements 37.7% 2908/7713 26% 🟢 ███████░░░░░░░░░░░░░
Branches 32.36% 1727/5336 23% 🟢 ██████░░░░░░░░░░░░░░
Functions 39.16% 320/817 28% 🟢 ███████░░░░░░░░░░░░░
Lines 39.07% 2712/6940 27% 🟢 ███████░░░░░░░░░░░░░

Integration Tests

Metric Coverage Covered Threshold Status
Statements 24.69% 1905/7713 26% 🔴 ████░░░░░░░░░░░░░░░░
Branches 19.28% 1029/5336 23% 🔴 ███░░░░░░░░░░░░░░░░░
Functions 26.8% 219/817 28% 🔴 █████░░░░░░░░░░░░░░░
Lines 25.37% 1761/6940 27% 🔴 █████░░░░░░░░░░░░░░░
Coverage thresholds are auto-ratcheted — they only go up

Vitest thresholds.autoUpdate bumps the floor whenever local coverage exceeds it.
CI enforces the current thresholds; developers commit the ratcheted values.


📋 View full run · Generated by CI

Introduces a shared 3-tier resolveSymbol function used by both
heritage-processor and call-processor:
1. Same-file (lookupExactFull — authoritative)
2. Import-scoped (filtered by ImportMap — high confidence)
3. Global fuzzy (first match — low confidence fallback)

Adds lookupExactFull to SymbolTable returning full SymbolDefinition
with type info needed for heritage Class/Interface disambiguation.
@magyargergo

Copy link
Copy Markdown
Collaborator Author

@claude

@claude

claude Bot commented Mar 11, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 3m 24s —— View job


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Review MRO processor
  • Review HAS_METHOD implementation
  • Review heritage processor changes
  • Review schema/CSV changes
  • Post comprehensive review

Great work on this feature — the overall architecture is clean and the test coverage is solid. Here are my findings, grouped by severity.


Bugs / Critical Issues

1. KuzuDB schema missing Class→Property (and other) HAS_METHOD edge pairs (schema.ts)

HAS_METHOD edges are emitted for Property nodes (parse-worker.ts:955, parsing-processor.ts:252), but FROM Class TO \Property`is not listed inRELATION_SCHEMA. This will silently fail or error when loading the CSV into KuzuDB. The same may be true for other container→Property combinations (e.g. Struct→Property). You'll need to add these missing FROM/TO` pairs to the relation table schema.

Fix this →

2. findEnclosingClassId picks the trait name, not the struct name for impl Trait for Struct (utils.ts:271-275)

For Rust's impl MyTrait for MyStruct { fn foo() {} }, the impl_item AST node's first type_identifier child is MyTrait, not MyStruct. The current fallback:

current.children?.find((c: any) =>
  c.type === 'type_identifier' || c.type === 'identifier' || c.type === 'name'
);

…will find MyTrait first, so HAS_METHOD edges for methods inside impl Trait for Struct point to the wrong class. The for keyword is a sibling node that separates trait from implementor — you'd need to skip past it to find the implementing type.

Fix this →


Design / Semantic Issues

3. MRO emits OVERRIDES for Property name collisions (mro-processor.ts:341-399)

methodsByName is built from all HAS_METHOD target nodes, which includes Property nodes. If two parent classes have a property with the same name, the MRO processor will emit an OVERRIDES edge for it — semantically misleading since properties don't have resolution order semantics in most languages. Consider filtering methodMap to only include Method and Constructor nodes, or renaming the concept.

4. buildTransitiveEdgeTypes first-reach heuristic can misclassify ancestors (mro-processor.ts:421-453)

Consider a hierarchy where MyClass : BaseClass, IFoo and BaseClass also implements IFoo. Then IFoo is reachable via EXTENDS→EXTENDS through BaseClass. BFS first-reach would mark IFoo as EXTENDS rather than IMPLEMENTS. In practice this affects the resolveCsharpJava conflict resolution: an interface method from IFoo could be treated as a class method (winning the conflict when it should be a lower-priority interface method).

5. OVERRIDES edge direction is Class→Method (not Method→Method) (mro-processor.ts:389-398)

graph.addRelationship({
  sourceId: classId,
  targetId: resolution.resolvedTo,  // a methodId
  type: 'OVERRIDES',

The edge goes from the child class to the winning ancestor method. Cypher queries expecting (m1:Method)-[:OVERRIDES]->(m2:Method) won't work — you'd need (c:Class)-[:OVERRIDES]->(m:Method). This semantic should be documented explicitly, or consider emitting Method→Method edges from a (potentially synthesized) child method node.


Minor Issues

6. Default language fallback is 'typescript' instead of 'default' (mro-processor.ts:287)

const language = classNode.properties.language ?? 'typescript';

If TypeScript ever gets a dedicated MRO case added to the switch, this silent fallback would bypass the default: branch for nodes with missing language. Using 'unknown' or 'default' as fallback would be clearer intent.

Fix this →

7. I[A-Z] heuristic applied to all languages (heritage-processor.ts:49)

resolveExtendsType is called for all languages when processing heritage.extends captures (TypeScript, Python, C++, etc.). The I[A-Z] naming convention heuristic only makes sense for C# and Java. A class named Icon in Python or Inspector in C++ would incorrectly get an IMPLEMENTS edge. Consider gating the heuristic behind a language check.

Fix this →

8. extractMethodSignature misses Go return types

Go uses a different AST structure for return types (no type_annotation or return_type node types — Go uses return_type but as a field name, and results can be grouped in parameter_list). The extractMethodSignature function returns undefined for Go methods consistently, making returnType unusable for Go-heavy codebases. Worth adding a Go-specific branch or at least documenting the gap.

9. Symbol resolver Tier 3 returns the first global match silently (symbol-resolver.ts:46)

return allDefs[0];

When there are multiple classes with the same name across files, the first-indexed match wins with no signal about the ambiguity. This could cause incorrect EXTENDS/IMPLEMENTS edges to be drawn to the wrong class. Consider returning null when allDefs.length > 1 and the name is too common, or at minimum adding the confidence on the resulting relationship.


Observations (not bugs)

  • MROEntry results not persisted: The full linearized MRO order per class is computed but then discarded after computeMRO. If you want to expose MRO order via Cypher queries later, you'd need to store entries as node properties or in a dedicated table.
  • Constructor parameterCount/returnType: Constructor nodes get method signatures extracted, but returnType is always undefined (constructors have no explicit return). escapeCSVNumber defaults parameterCount to 0 which is fine, but the Constructor table schema (CODE_ELEMENT_BASE) doesn't include these columns. Since Constructors go to the constructor.csv (multiLang path), the fields are silently dropped — that's consistent, but worth knowing.
  • Duplicate HAS_METHOD logic: The same enclosing-class detection exists in both parse-worker.ts:954–967 and parsing-processor.ts:251–264. Any future bug fix must be applied in both places.

Overall the implementation is well-structured and the test suite is thorough. The Class→Property schema gap and the Rust impl Trait for Struct name issue are the two I'd prioritize fixing before merge.

- lookupExactFull now O(1) via direct SymbolDefinition storage in fileIndex
  (shared object references with globalIndex — zero additional memory)
- Added resolveSymbolInternal() preserving { definition, tier, candidateCount }
  for test assertions and logging
- Tier 3 now returns null when multiple global candidates exist instead of
  arbitrary allDefs[0] — a wrong edge is worse than no edge
- call-processor: renamed fuzzy-global → unique-global, removed dead branch
- 12 new tests: tier assertions, ambiguous refusal per language family,
  heritage false-positive guard, O(1) shared reference verification
Phase 5 critical fixes from all-language analysis:
- Python: add relative_import query capture (PEP 328) — `.models`, `..utils`
  were silently dropped, producing zero ImportMap entries
- Rust: extract prefix from grouped imports `crate::module::{A, B}` — brace
  groups previously failed resolution entirely
- Swift: use normalizedFileList for Windows path compatibility in module
  import resolution (matches Go's resolveGoPackage pattern)
- MRO: fix c_sharp → csharp language name mismatch (enum is 'csharp'),
  add Kotlin to C#/Java resolution rules (class method wins over interface)
…resolution

Add 32 integration tests across 6 language fixtures (TypeScript, C#, C++,
Java, Python, Rust) with exact toBe/toEqual assertions validating heritage
edges, import resolution, and trait implementations.

Fix C/C++ import resolution bug where dot-to-slash conversion mangled
include paths (e.g. "animal.h" became "animal/h"). Now skips conversion
for C/C++ languages which use actual file paths in #include directives.
…handle Rust grouped imports

- Gate I[A-Z] naming heuristic to C#/Java only (was firing for all languages)
- Swift unresolved types default to IMPLEMENTS (protocol conformance is the norm)
- Add tree-sitter query for Swift extension protocol conformance (extension Foo: Protocol)
- Handle Rust top-level grouped imports (use {crate::a, crate::b}) in both import loops
- Add 4 new heritage-processor tests (TypeScript refusal, Swift default, Swift Tier 1)
Add Go struct embedding detection (anonymous fields → EXTENDS edges) via
new tree-sitter heritage query with named-field filtering in both
parse-worker and heritage-processor paths.

Implement PackageMap optimization for Go cross-package resolution:
replace O(N) file-level ImportMap expansion with directory-level suffix
matching (Tier 2b in symbol resolver). Graph IMPORTS edges are preserved
via addImportGraphEdge split.

Remove overly broad @definition.type from GO_QUERIES that was
double-matching structs/interfaces as TypeAlias nodes, breaking Tier 3
unique-global resolution.

Add Go fixture (go-pkg) with Admin→User embedding, cross-package calls,
and 7 integration tests covering structs, functions, imports, calls,
and heritage edges.
Adds a kotlin-heritage fixture and 7 integration tests validating
class inheritance, interface implementation, JVM-style import
resolution, and symbol-table-driven EXTENDS/IMPLEMENTS disambiguation
via Kotlin delegation specifiers.
…ages

- Extract language-specific resolvers from import-processor.ts into
  resolvers/ directory (P7): jvm, go, csharp, php, rust, standard, utils
- import-processor.ts reduced from 1412 to 711 lines (50% reduction)
- Add comprehensive PHP integration tests: PSR-4 imports, traits, enums,
  heritage edges, method calls, MRO overrides
- Add ambiguous symbol resolution tests for all 9 languages verifying
  correct disambiguation via import chains
- Split monolithic lang-resolution.test.ts (1080 lines) into 9 per-language
  files under test/integration/resolvers/ with shared helpers
Repository owner deleted a comment from claude Bot Mar 11, 2026
@magyargergo

Copy link
Copy Markdown
Collaborator Author

@claude review this PR since it's been changed since you reviewed it.

@claude

claude Bot commented Mar 11, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 4m 6s —— View job


Code Review (Updated)

  • Gather context and understand the PR
  • Read key changed files
  • Check which previous review issues were addressed
  • Review new/changed code for fresh issues
  • Post comprehensive updated review

What Changed Since the Last Review

Two of the previously-flagged issues are now fixed:

  • Issue 7 (was Minor): I[A-Z] interface heuristic is now correctly language-gated to C#/Java only, with Swift getting its own IMPLEMENTS default (heritage-processor.ts:58-65).
  • Issue 9 (was Minor): Symbol resolver Tier 3 now returns null when allDefs.length > 1, properly refusing ambiguous global matches (symbol-resolver.ts:86-91). Call and route resolvers now delegate to resolveSymbol, picking up this behaviour automatically.

Still Outstanding from Previous Review

1. Class → Property schema pair missing (schema.ts — Bug, still unfixed)

HAS_METHOD edges are emitted from Class/Interface/Struct/Trait nodes to Property nodes (parse-worker.ts:964, parsing-processor.ts:252). The RELATION_SCHEMA in schema.ts includes:

FROM File TO `Property`          ✅
FROM Function TO `Property`      ✅
FROM Method TO `Property`        ✅
FROM Constructor TO `Property`   ✅

But these are still absent:

FROM Class TO `Property`         ❌
FROM Interface TO `Property`     ❌
FROM Struct TO `Property`        ❌
FROM Trait TO `Property`         ❌

Any HAS_METHOD edge where the source is a Class/Interface/Struct/Trait and the target is a Property will fail to be stored in KuzuDB. This is the highest-priority fix before merge.

Fix this →

2. findEnclosingClassId picks the trait name for impl Trait for Struct (utils.ts:271-275 — Bug, still unfixed)

The fallback children.find() returns the first type_identifier it encounters. In the tree-sitter Rust AST for impl MyTrait for MyStruct {}, the children are: impl, MyTrait, for, MyStruct, {...}. The first type_identifier is MyTrait, so generateId('Impl', 'path/to/file.rs:MyTrait') is used instead of MyStruct. All HAS_METHOD edges from such blocks point to the wrong identifier.

Fix this →

3. Default language fallback is 'typescript' (mro-processor.ts:287 — Minor, still unfixed)

const language = classNode.properties.language ?? 'typescript';

If TypeScript ever gets its own MRO case in the switch, nodes with a missing language property would silently route there instead of default:. Using 'unknown' is clearer intent.

Fix this →

4. extractMethodSignature misses Go return types (utils.ts:474-479 — Minor, still unfixed)

Go's return_type field exists in the tree-sitter grammar but uses a different AST structure than type_annotation. The function returns returnType: undefined for all Go methods, making the column unused for Go-heavy codebases. Worth at least a comment documenting the gap.


Design Issues Carried Over (No Changes)

5. MRO emits OVERRIDES for Property name collisions (mro-processor.ts:305-330)

methodsByName is populated from all HAS_METHOD targets, which includes Property nodes. Two parent classes with a property of the same name generate an OVERRIDES edge — semantically odd since properties don't have dispatch semantics in most languages. Filtering methodMap to only Method/Constructor nodes before the collision pass would tighten this.

6. buildTransitiveEdgeTypes first-reach heuristic (mro-processor.ts:421-453)

BFS propagates the edge type of the first path found. If BaseClass also implements IFoo, then IFoo reachable via Class→EXTENDS→BaseClass→IMPLEMENTS→IFoo would be classified as EXTENDS rather than IMPLEMENTS. This edge case means resolveCsharpJava could treat an interface method as a class method (wrong winner). Worth documenting this known limitation as a comment.

7. OVERRIDES edge semantic (mro-processor.ts:389-398)

The edge goes from the child class to the winning ancestor method (Class → Method). This differs from the natural Method → Method direction most developers would expect for OVERRIDES. This is workable since FROM Class TO Method exists in the schema, but Cypher queries need to use:

MATCH (c:Class)-[r:CodeRelation {type: 'OVERRIDES'}]->(m:Method)

rather than the more intuitive Method → Method form. The MCP tool descriptions and docstring should document this explicitly.


New Minor Issue

8. Redundant lookupFuzzy in processRoutesFromExtracted (call-processor.ts:291)

After resolveSymbol (which internally calls lookupFuzzy) returns a result, the function immediately calls lookupFuzzy again just to compute a confidence value:

const controllerDef = resolveSymbol(route.controllerName, ...);
// ...
const allControllerDefs = symbolTable.lookupFuzzy(route.controllerName); // redundant
const confidence = isImportedController ? 0.9 : allControllerDefs.length === 1 ? 0.7 : 0.5;

Since resolveSymbol already refuses to return when allDefs.length > 1, the allControllerDefs.length === 1 branch is always true here. The second lookupFuzzy call can be removed, simplifying to confidence = isImportedController ? 0.9 : 0.7.

Fix this →


Summary

The two critical fixes from last review remain: the Class→Property schema gap (#1) could silently drop edges at DB-load time, and the Rust impl Trait for Struct name issue (#2) misroutes HAS_METHOD edges. These should be addressed before merge. Everything else is either a minor improvement or an acceptable known limitation.

…IDES

Bugs fixed:
- Add 13 missing FROM/TO pairs in RELATION_SCHEMA for HAS_METHOD edges
  (Class/Interface/Struct/Trait/Impl/Record to Method/Constructor/Property)
- Fix findEnclosingClassId to pick implementing type for Rust
  impl Trait for Struct blocks (was picking trait name)
- Exclude Property nodes from MRO OVERRIDES collision detection
- Change MRO language fallback from typescript to unknown

Tests added:
- Unit: Property OVERRIDES exclusion (2 tests), Rust impl Trait for
  Struct name resolution (2 tests), schema HAS_METHOD pair coverage
- Integration: no OVERRIDES targets Property nodes across all 9 languages
- PHP fixture: added shared $status property to both traits to create
  real collision scenario for Property OVERRIDES exclusion test

Documentation:
- OVERRIDES edge direction (Class to Method), Go return type gap,
  BFS first-reach heuristic limitation
- Fix same-file confidence (0.85 → 0.95) to correctly outrank import-scoped (0.9)
- Fix Tier 1 overload preservation: use globalIndex filter instead of fileIndex lookup
- Add callable-kind guard: refuse CALLS edges to Interface and Enum symbols
- Fix Kotlin countCallArguments: handle call_suffix → value_arguments nesting
- Fix Kotlin extractFunctionName: add simple_identifier to fallback search
- Strictly type findParameterList and countCallArguments (remove all `any`)
- Add arity-based call resolution integration tests for 9 languages
- Add unit regression tests for Interface/Enum CALLS refusal
collectRustBindings was extracting the path segment (e.g. "models") from
`use crate::models::{User, Repo}` as a spurious NamedImportMap entry.
Skip scoped_identifier nodes that are direct children of scoped_use_list
since they are path prefixes, not importable symbols.

Adds rust-grouped-imports fixture and 4 integration tests verifying both
symbols resolve correctly and no spurious binding leaks through.
@abhigyanpatwari

Copy link
Copy Markdown
Owner

@magyargergo check these optimizations:

The Heritage resolution and Call resolution needs symbol table and import map respectively. So they can be parallelized

The MRO scan can benefit from a type-index. It will go from O ( all edges ) to O( inheritance edges )

Also I think we can cache import alias per-chunk since symbol table dont change mid-resolution
the heritage resolution and call resolution optimizations should give good performance boost for huge java repos especially
instead of awaiting processImports , processCalls and processHeritage u can just do

await processImports();
await Promise.all([
processCalls(),
processHeritage(),
]);
easy fix high impact, if i m not missing something here

@magyargergo

Copy link
Copy Markdown
Collaborator Author

@claude review the code and also focus on evaluating the comments above.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 6m 22s —— View job


Code Review (Updated) + Optimization Evaluation

  • Gather context — check what changed since last review (ce347de)
  • Read key changed files (pipeline.ts, mro-processor.ts, named-binding-extraction.ts, call-processor.ts, heritage-processor.ts)
  • Evaluate optimization suggestions from @abhigyanpatwari
  • Review new changes for bugs and design issues
  • Post comprehensive review

What Changed Since the Last Review

Two commits landed since the previous review (054a344):

  • 3b8adb8 — Fix spurious path-prefix binding for Rust grouped imports. collectRustBindings now detects scoped_identifier nodes whose parent.type === 'scoped_use_list' and skips extracting them as symbols (they are path prefixes, not importable names). 4 new integration tests added for rust-grouped-imports. This directly addresses Issue Add support for Ollama as a local inference backend #1 from the last review.

  • ce347de — Fix Kotlin lowercase member import regression introduced in 005ff3a. A guard exportedName[0] === exportedName[0].toLowerCase() correctly prevents import util.OneArg.writeAudit (member method import) from being added to NamedImportMap as a class-level binding.

Both fixes are correctly implemented. One residual limitation noted below.


Residual Limitation in 3b8adb8

Rust paths deeper than 2 levels still produce spurious path-segment bindings (named-binding-extraction.ts:244-251)

The guard fires only for scoped_identifier whose direct parent is scoped_use_list. For use crate::models::sub::{A, B}, the outer scoped_identifier(crate::models::sub) hits the guard and recurses into its children — one of which is the inner scoped_identifier(crate::models). This inner node has parent type scoped_identifier, not scoped_use_list, so it bypasses the guard and falls through to the "terminal scoped_identifier" case (hasDeeper = false), extracting models as a spurious binding.

Severity: Same as the original bug — lookupFuzzy('models') finds no graph node, chain walks return null, graceful fallback. No correctness impact in practice.

Minimal fix: Replace the recursion-into-children inside the guard with a simple return, since the use_list sibling of the path prefix is already handled by the final recursive case of the parent scoped_use_list:

// Current: recurses into children of the path prefix (unnecessary)
if (node.type === 'scoped_identifier' && node.parent?.type === 'scoped_use_list') {
  for (let i = 0; i < node.namedChildCount; i++) {
    const child = node.namedChild(i);
    if (child) collectRustBindings(child, bindings);
  }
  return;
}

// Simpler: just skip the path prefix entirely
if (node.type === 'scoped_identifier' && node.parent?.type === 'scoped_use_list') {
  return; // path prefix — the use_list sibling handles the actual symbols
}

The current recursion produces no bindings for 2-level paths, but introduces the spurious binding for 3+ level paths. The return form is correct for all depths.

Fix this →


Evaluation of @abhigyanpatwari's Optimization Suggestions

1. Parallelize processCalls + processHeritage with Promise.all — ✅ Correct and beneficial

The suggestion is architecturally sound. Looking at the pipeline (pipeline.ts:226-238):

// Current sequential (per-chunk):
await processImportsFromExtracted(...);          // populates importMap, packageMap, namedImportMap
await processCallsFromExtracted(...);            // reads symbolTable/importMap → writes to graph
await processHeritageFromExtracted(...);         // reads symbolTable/importMap → writes to graph
await processRoutesFromExtracted(...);           // reads symbolTable/importMap → writes to graph

Why it's safe: All three *FromExtracted functions exclusively read from symbolTable, importMap, packageMap, and namedImportMap — none of which are mutated during this phase. All three write only to graph.addRelationship, and since Node.js is single-threaded, the await yieldToEventLoop() interleaving between them cannot cause concurrent writes. Each addRelationship call is atomic from the event loop's perspective.

What changes: processRoutesFromExtracted also qualifies for the same parallelization:

await processImportsFromExtracted(...); // must complete first — populates importMap/namedImportMap
await Promise.all([
  chunkWorkerData.calls.length > 0
    ? processCallsFromExtracted(graph, chunkWorkerData.calls, symbolTable, importMap, packageMap, undefined, namedImportMap)
    : Promise.resolve(),
  chunkWorkerData.heritage.length > 0
    ? processHeritageFromExtracted(graph, chunkWorkerData.heritage, symbolTable, importMap, packageMap)
    : Promise.resolve(),
  chunkWorkerData.routes?.length > 0
    ? processRoutesFromExtracted(graph, chunkWorkerData.routes, symbolTable, importMap, packageMap)
    : Promise.resolve(),
]);

Impact: Significant for large Java repos with deep class hierarchies. Heritage resolution is the bottleneck in those cases (hundreds of resolveSymbol calls per class file), and it currently blocks call resolution from starting. For a 10,000-file Java repo, this could meaningfully reduce wall time per chunk. The sequential fallback path (processCalls + processHeritage sharing astCache) is also safe to parallelize since astCache is a JS Map with synchronous writes — no race conditions in Node.js's single-threaded model, though there would be redundant AST parsing since both functions would parse the same files independently.

One caveat: processHeritageFromExtracted does not accept namedImportMap as a parameter (it uses only importMap + packageMap for resolution). This is fine — heritage resolution only needs to know which file a class belongs to, not which specific named symbol was imported. No change needed there.

Implement this →


2. MRO type-index (O(all edges) → O(inheritance edges)) — ✅ Correct and impactful

buildAdjacency in mro-processor.ts:64-91 iterates all edges via graph.forEachRelationship, filtering for EXTENDS, IMPLEMENTS, and HAS_METHOD in the callback. For a large Java repo with millions of CALLS edges (every method call site), this scans all of them to find the comparatively few inheritance edges.

The fix requires adding a type-indexed accessor to KnowledgeGraph:

// types.ts
forEachRelationshipOfType: (type: string | string[], fn: (rel: GraphRelationship) => void) => void;

Implementation would maintain a Map<string, GraphRelationship[]> index populated in addRelationship. Then buildAdjacency calls:

graph.forEachRelationshipOfType(['EXTENDS', 'IMPLEMENTS', 'HAS_METHOD'], (rel) => { ... });

Impact: For a 100K-node graph with 500K edges (typical large Java repo), the MRO scan currently iterates all 500K edges. If there are 20K HAS_METHOD edges and 5K inheritance edges, the optimization reduces this by ~94%. The type-index approach is also reusable by other processors (e.g., communityProcessor could benefit from similar indexing).

Scope: 3 files — types.ts, graph.ts (implementation), mro-processor.ts (call site). This is a clean, bounded change.


3. Cache import alias per chunk — ✅ Valid micro-optimization, lower priority

The suggestion is correct: walkBindingChain is called once per call site for each member call resolution, and for a file with 50 call sites on User (all imported from the same module), 49 of those chain walks are redundant — the result is identical each time.

The cache would be:

const walkCache = new Map<string, SymbolDefinition[] | null>(); // key: `${filePath}:${name}`

Why it's lower priority than #1 and #2: Each chain walk is at most 5 Map.get lookups (O(1)) plus one lookupFuzzy call per hop. For most files, the chain is 0-1 hops (no re-export). The savings per call site are microseconds. For large files with hundreds of calls to the same type, savings accumulate, but this is not the primary bottleneck.

The optimization is worth doing when the other two are in place — it's a simple Map.get before the chain walk and Map.set after. The cache would need to be scoped per-chunk (created inside the chunk loop and discarded after).


Summary

Both new commits are correct and the previous Issue #1 (Rust spurious path binding) is resolved for the common 2-level path case. The residual limitation for 3+ level paths is low severity and easily fixable.

On the optimization suggestions: all three are technically sound. #1 (Promise.all parallelization) is the highest-impact, lowest-effort change — it's a one-line restructure with clear correctness guarantees in Node.js's single-threaded model. #2 (MRO type-index) is the most architecturally impactful, requiring a new graph accessor. #3 (alias walk cache) is a valid micro-optimization for large files.

The PR is in good shape. The Rust path-prefix recursion fix and the three optimizations are the only remaining actionable items.

…collision

Two methods named identically in different classes within the same file
previously shared a scope key, causing non-deterministic type resolution.
Now keys use funcName@startIndex for uniqueness.

Also adds tests documenting destructuring assignment extraction gap.
Merged origin/main (03bfa3c) into feat/method-resolution-order.
Conflicts in CLAUDE.md and AGENTS.md were GitNexus index stats —
kept our branch's numbers (1747 symbols, 4569 relationships).
Worker path now runs processCallsFromExtracted, processHeritageFromExtracted,
and processRoutesFromExtracted via Promise.all instead of sequentially.
Safe because all three only read shared state and write via addRelationship's
dedup guard. Sequential fallback path stays sequential (shared LRU astCache).

Also fixes Rust collectRustBindings spurious path-prefix bindings for 3+ level
grouped imports, and adds @param JSDoc for walkBindingChain's allDefs invariant.
Clarify that the parallelization safety comes from disjoint relationship
types + idempotent id-keyed Maps, not from lack of shared state (the
graph is shared). Strengthen allDefs JSDoc to describe silent-miss
consequence of passing pre-filtered results.
@magyargergo

Copy link
Copy Markdown
Collaborator Author

Performance & Code Quality Updates

Addressed remaining review feedback with two focused changes:

1. Parallelize processCalls + processHeritage + processRoutes (worker path)

Commit: 814fac1 + 0227448

Replaced sequential await calls with Promise.all in the worker path. Safe because:

  • Each processor writes disjoint relationship types (CALLS / EXTENDS+IMPLEMENTS / route-CALLS)
  • All writes go through addRelationship's idempotent if (!has) Map guard
  • Node.js single-threaded event loop prevents races between synchronous calls
  • Sequential fallback path left unchanged (shared LRU astCache makes parallelization counterproductive there)

Also removed redundant length > 0 guards — each function handles empty input gracefully.

2. Rust collectRustBindings path-prefix fix + JSDoc

Commit: 814fac1 + 0227448

  • Fixed spurious bindings for 3+ level Rust grouped imports (use crate::a::b::{X, Y} no longer emits b as a binding)
  • Added @param allDefs JSDoc to walkBindingChain documenting the no-pre-filtering invariant

Dropped (YAGNI)

  • Type-indexed edge access: 500K string comparisons is single-digit ms in V8, single callsite doesn't justify new API
  • walkBindingChain cache: Each chain is ≤5 Map.get lookups (microseconds), cache adds complexity for negligible gain

Test Results

All 1598+ tests pass. Only KuzuDB integration tests fail (pre-existing Windows file-lock issue).


🤖 Generated with Claude Code

…tables

Phase 1: Extract type binding logic from type-env.ts (635→125 LOC) into
type-extractors/ directory with per-language files and Record<SupportedLanguages,
LanguageTypeConfig> + satisfies dispatch.

Phase 2: Extract 5 config loaders from import-processor.ts into
language-config.ts (removed ~196 LOC of inline loaders).

Phase 3: Convert export-detection.ts switch/case to exhaustive
Record<SupportedLanguages, ExportChecker> + satisfies dispatch table,
fix node: any → SyntaxNode.

Also adds language feature matrix to README.

All 1146 unit tests and 433 integration tests pass.
…(Phase 1)

Extract per-language type extraction from type-env.ts (635→125 LOC) into
type-extractors/ with Record<SupportedLanguages, LanguageTypeConfig> + satisfies
dispatch. 9 per-language files, shared helpers, and barrel index.
Move 5 language-specific config loaders and their type interfaces from
import-processor.ts into standalone language-config.ts module.
# Conflicts:
#	gitnexus/src/core/ingestion/import-processor.ts
#	gitnexus/src/core/ingestion/language-config.ts
@magyargergo

Copy link
Copy Markdown
Collaborator Author

@claude review my latest changes and focus on the performance characteristics of this PR and how this evolved from the beginning. Include flow diagrams and breakdowns.

@claude

claude Bot commented Mar 13, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 3m 25s —— View job


Performance Analysis: Evolution of the Resolution Pipeline

  • Post initial comment
  • Read pipeline.ts and processing files
  • Analyze git log for performance evolution
  • Review latest commits (814fac1, 0227448)
  • Build flow diagrams and write analysis
  • Post final review

Latest Changes (814fac1 + 0227448)

The two most recent commits are correct and complete:

  • Pipeline parallelization (814fac1) — processCalls, processHeritage, processRoutes now run via Promise.all in the worker path. Node.js's single-threaded event loop makes this safe: all three processors only read shared state (symbolTable, importMap, packageMap, namedImportMap) and only write via addRelationship's idempotent dedup guard.
  • Rust path-prefix recursion fixed (814fac1) — collectRustBindings now returns early for scoped_identifier nodes whose parent is scoped_use_list, eliminating spurious path-segment bindings for grouped Rust imports at all depths.
  • JSDoc invariant on walkBindingChain (0227448) — @param allDefs documents the no-pre-filtering contract.
  • Safety comment on Promise.all (0227448) — explains why parallel execution is safe.

One residual note on the Rust fix: The previous collectRustBindings guard for scoped_use_list parent-nodes was changed from a recursive-into-children approach to a simple return. This is the correct fix — the use_list sibling handles the actual symbol names without any help from the path prefix. No issue here.


How Performance Evolved: The Full Story

This PR added approximately 13,800 lines across 37 commits. Below is a breakdown of every significant performance and correctness dimension, showing how each evolved from start to finish.


1. Ingestion Pipeline Architecture

Before this PR (baseline)
Each file was processed individually in a sequential loop: parse → resolve imports → resolve calls → resolve heritage. The symbol table was flat and could produce false positives via lookupFuzzy returning the first global match.

[Baseline per-file loop]

File_1 ──► parse ──► processImports ──► processCalls ──► processHeritage
File_2 ──► parse ──► processImports ──► processCalls ──► processHeritage
File_N ──► parse ──► processImports ──► processCalls ──► processHeritage

Wall time ∝ N × (T_parse + T_imports + T_calls + T_heritage)

After this PR (current)
Chunked memory-budget model with worker-pool parallelism, plus Promise.all for intra-chunk processing:

[Current chunked + parallel pipeline]

Chunk_K = up to 20MB of files

readFileContents(chunk_K)
        │
        ▼
processParsing ─── worker pool (parallel across files)
        │           └─ parse-worker: extract calls/heritage/routes/imports
        │               (AST never leaves worker — only serialized records returned)
        ▼
processImportsFromExtracted   ← sequential (populates importMap/packageMap/namedImportMap)
        │
        ▼
Promise.all([
  processCallsFromExtracted,    ─┐
  processHeritageFromExtracted,  ├─ all read-only on shared state, write-only on graph
  processRoutesFromExtracted,   ─┘
])
        │
        ▼
astCache.clear()   ← GC reclaims chunk memory

Wall time ∝ Σ(chunks) × max(T_calls, T_heritage, T_routes)   [not sum]

Impact of Promise.all parallelization (814fac1):

For a large Java repository with 5,000 files:

  • T_heritage ≈ 3× T_calls (deep class hierarchies, many resolveSymbol calls)
  • Sequential: T_calls + T_heritage + T_routes ≈ 4T
  • Parallel: max(T_calls, T_heritage, T_routes) ≈ T_heritage ≈ T
  • Theoretical speedup: ~3–4× for the resolution phase per chunk

The sequential fallback path (used when the worker pool fails) correctly stays sequential because it shares an LRU astCache — parallel access to a stateful LRU cache would produce non-deterministic eviction behavior.


2. Symbol Resolution: From O(N) First-Match to Tiered Confidence Filtering

This is the most architecturally significant evolution in the PR.

Stage 0 (pre-PR): raw lookupFuzzy, first match wins

calledName ──► lookupFuzzy() ──► allDefs[0]
                                  (could be any file, any definition)

False positives silently produced incorrect edges. Two classes named User in different files → always resolved to whichever was indexed first.

Stage 1 (d0a29a2 + fcf90dd): 3-Tier resolver, ambiguity refusal

calledName
    │
    ├─[Tier 1]──► lookupExactFull(currentFile, name) ──► if found → return (authoritative)
    │
    ├─[Tier 2a]─► allDefs.filter(def → importedFiles.has(def.filePath)) ──► if 1 found → return
    │
    ├─[Tier 2b]─► allDefs.filter(def → isFileInPackageDir(def.filePath, ...)) ──► if 1 found → return
    │
    └─[Tier 3]──► allDefs.length === 1 ? return allDefs[0] : null  ← REFUSE if ambiguous

This was the foundational correctness improvement: null (no edge) is better than a wrong edge.

Stage 2 (b7a8db9): NamedImportMap — file-level → name-level precision

Before: import { User, Repo } from './models' → only recorded models.ts in ImportMap. Both User and Repo were indistinguishable.

After: each named binding recorded individually. Resolution logic expanded:

[Tier 2a-named — added before allDefs early return to handle alias lookups]

name → namedImportMap.get(currentFile)?.get(name)
         → { sourcePath, exportedName }
         → lookupFuzzy(exportedName).filter(def → def.filePath === sourcePath)
         → if exactly 1 → return (import-scoped)

Stage 3 (7dbdb5c + 883b041): as alias support

import { User as U }lookupFuzzy('U') returns empty (no symbol named U exists). Without alias tracking, falls to global tier or null. With alias tracking:

name='U' → namedImportMap: U → { sourcePath: 'models.ts', exportedName: 'User' }
         → lookupFuzzy('User').filter(def → filePath === 'models.ts')
         → returns correct definition

Stage 4 (d8c1e1e): Re-export chain resolution (up to 5 hops)

file A imports { User } from B
B has no User definition — re-exports from C

walkBindingChain:
  depth=0: A.namedImports['User'] → { sourcePath: B, exportedName: 'User' }
           lookupFuzzy('User').filter(B) → empty
  depth=1: B.namedImports['User'] → { sourcePath: C, exportedName: 'User' }
           lookupFuzzy('User').filter(C) → found!  ← return

Cycle detection via visited Set prevents infinite loops.

Stage 5 (054a344 + 005ff3a): Full language coverage

walkBindingChain extracted as a shared utility (eliminating duplicate logic). Named binding extraction extended to all 9 languages including PHP flat/grouped non-aliased imports, Rust non-aliased use statements, Kotlin non-aliased class imports.

Stage 6 (814fac1): Parallelized in pipeline

Resolution itself didn't change; it now runs concurrently with heritage and route resolution per chunk.


3. Call Resolution Pipeline: 4-Stage Funnel

The call resolution funnel evolved from a single lookup to a 4-stage filter that progressively narrows candidates:

[Current resolveCallTarget pipeline]

calledName + currentFile
        │
        ▼
[A] collectTieredCandidates
        │  same-file      → {candidates: localDefs, tier: 'same-file'}
        │  named binding  → walkBindingChain → {candidates, tier: 'import-scoped'}
        │  import-scoped  → {candidates: importedDefs, tier: 'import-scoped'}
        │  package-scoped → {candidates: pkgDefs, tier: 'import-scoped'}
        │  unique-global  → {candidates: allDefs, tier: 'unique-global'}
        │
        ▼
[B] filterCallableCandidates
        │  free call      → keep Function, Method, Macro, Delegate
        │  constructor    → keep Constructor, Class, Struct, Record
        │  member call    → keep any callable
        │  argCount       → filter to exact arity match (variadic: any argCount passes)
        │
        ▼  if still > 1 and call is member call with known receiver type:
[C] receiver-type filtering
        │  receiverTypeName → lookupFuzzy(type) → typeNodeIds
        │  keep candidates where c.ownerId ∈ typeNodeIds
        │  if exactly 1 → emit edge
        │  if > 1 → refuse (ambiguous)
        │  if 0 → fall through to name-only
        │
        ▼
[D] Emit or refuse
        │  filteredCandidates.length === 1 → addRelationship(CALLS edge)
        └  else → null (no edge, no false positive)

Stage B (arity) was added in fc67a8c / b8d9182parameterCount added to SymbolDefinition.
Stage C (receiver type) was added in 167ac0eTypeEnv + receiverTypeName extraction.


4. TypeEnv: Receiver Type Tracking

[TypeEnv evolution]

v1 (b7a8db9): flat Map<varName, typeName> — last-write-wins
  Problem: function a() { user: User } + function b() { user: Repo }
           → 'user' → 'Repo' (overwrites 'User' non-deterministically)

v2 (b7a8db9): Map<funcName, Map<varName, typeName>> — per-function scope
  Problem: class A { save() {...} } + class B { save() {...} }
           → both 'save' scopes share key 'save' (last class processed wins)

v3 (9f83baa): Map<'funcName@startIndex', Map<varName, typeName>>
  Scope key = funcName + AST byte position → unique per method site
  Problem: eliminated ✓

The buildTypeEnv walk is O(AST nodes) per file — no global state, built once per file and discarded after call resolution.


5. MRO Processor: The One Dropped Optimization

buildAdjacency in mro-processor.ts:64 calls graph.forEachRelationship and scans all edge types:

graph.forEachRelationship((rel) => {
  if (rel.type === 'EXTENDS' || rel.type === 'IMPLEMENTS') { ... }  // ~5K edges
  if (rel.type === 'HAS_METHOD') { ... }                             // ~20K edges
  // but also iterates: CALLS (500K), MEMBER_OF, IMPORTS, etc.
});

For a large repo, this is O(all edges) when only O(inheritance edges) are needed. The PR considered a type-indexed accessor but dropped it as YAGNI — the comment notes "500K string comparisons is single-digit ms in V8". This is correct at current scale. For a Linux-kernel-sized codebase (millions of edges), this would become the MRO bottleneck.

The dropped optimization would have been:

// Hypothetical: type-indexed edge accessor
graph.forEachRelationshipOfType(['EXTENDS', 'IMPLEMENTS', 'HAS_METHOD'], (rel) => { ... });
// O(25K) instead of O(500K)

6. Memory Architecture: 20MB Chunk Budget

The chunk budget model (CHUNK_BYTE_BUDGET = 20MB) is a key performance-correctness tradeoff documented in the pipeline:

// Comment from pipeline.ts:193-194:
// "Calls/heritage use the symbol table built so far (symbols from earlier chunks
//  are already registered). This trades ~5% cross-chunk resolution accuracy for
//  200-400MB less memory — critical for Linux-kernel-scale repos."

This means calls to symbols defined in a later chunk are resolved as global fuzzy matches rather than import-scoped matches. For most repos this is a minor accuracy loss. For monorepos where barrel files import from many chunks, it could cause Tier 3 fallback for symbols that should be Tier 2.


Performance Characteristics Summary

Dimension Before PR After PR Change
Per-file resolution lookupFuzzy → first match 3-tier + NamedImportMap + re-exports Correctness: 50% → ~95% precision
Import precision File-level (ImportMap) Name-level (NamedImportMap, 7 languages) False positive rate significantly reduced
Alias resolution Not tracked 9 languages, re-export chain up to 5 hops New capability
Receiver-type filtering Not present Per-call TypeEnv lookup + ownerId filter New capability
TypeEnv scope isolation Flat (non-deterministic) Per-function, per-position scope key Correctness bug fixed
Pipeline model Per-file sequential Chunked (20MB budget), worker pool Memory: predictable; throughput: +worker pool
Intra-chunk resolution Sequential (calls→heritage→routes) Promise.all (parallel) ~3–4× speedup for heritage-heavy repos
MRO edge scan N/A O(all edges) per run Acceptable; type-index deferred as YAGNI
Test coverage Pre-existing 90 integration suites × 9 languages; 1,598+ tests Comprehensive

Remaining Performance Notes

  1. MRO buildAdjacency — the O(all edges) scan is acceptable now but is the natural next optimization target if the MRO phase shows up in profiling on large repos. A Map<edgeType, edge[]> index in the graph maintained by addRelationship would make this trivial.

  2. Sequential fallback path (pipeline.ts:270–279) — when the worker pool fails to initialize, the fallback re-reads source files and runs processCalls + processHeritage sequentially. These share the LRU astCache (which is the reason parallelization was explicitly not applied there). No issue, just worth noting that fallback performance is significantly worse than the happy path.

  3. walkBindingChain per-call invocation — called once per call site for every member call resolution, with up to 5 Map.get lookups per hop. For a file with hundreds of calls to the same imported type, each re-walks the chain independently. The dropped YAGNI cache (Map<filePath:name, SymbolDefinition[]>) would eliminate this for the typical case of many calls to one imported class. Low priority given the O(1) per-hop cost, but worth revisiting if profiling shows binding chain walks as a hot path.

@magyargergo magyargergo merged commit 1afe916 into main Mar 13, 2026
22 checks passed
@magyargergo magyargergo deleted the feat/method-resolution-order branch March 13, 2026 13:12
motolese pushed a commit to motolese/GitNexus that referenced this pull request Apr 23, 2026
…tructor discrimination (abhigyanpatwari#238)

* feat: add Method Resolution Order (MRO) with language-specific rules

Implement full MRO computation for multi-language inheritance hierarchies:

- HAS_METHOD edges: Class→Method ownership edges emitted during parsing
  (both worker pool and sequential fallback paths)
- Method signatures: extract parameterCount and returnType from AST nodes
- C# heritage fix: distinguish EXTENDS vs IMPLEMENTS for base_list captures
  using symbol table lookup + I[A-Z] naming heuristic fallback
- MRO processor (Phase 4.5): walks inheritance DAG, detects method-name
  collisions across parents, applies language-specific resolution:
  - C++: leftmost base class in declaration order wins
  - C#/Java: class method wins over interface default
  - Python: C3 linearization with cycle detection
  - Rust: no auto-resolution (requires qualified syntax)
  - Default: first definition in BFS order wins
- OVERRIDES edges emitted for resolved method collisions
- KuzuDB schema: Method table extended with parameterCount/returnType;
  dedicated CSV writer and COPY query for 10-column Method rows
- MCP tools: updated Cypher examples for HAS_METHOD, OVERRIDES, diamond

72 tests across 5 test files covering MRO resolution, HAS_METHOD edges,
method signature extraction, C# heritage resolution, and integration
tests across C#/Rust/Python/TS/Java/C++.

* feat: add scope-based symbol resolution replacing raw lookupFuzzy

Introduces a shared 3-tier resolveSymbol function used by both
heritage-processor and call-processor:
1. Same-file (lookupExactFull — authoritative)
2. Import-scoped (filtered by ImportMap — high confidence)
3. Global fuzzy (first match — low confidence fallback)

Adds lookupExactFull to SymbolTable returning full SymbolDefinition
with type info needed for heritage Class/Interface disambiguation.

* refactor: tighten symbol resolution — Tier 3 refuses ambiguous matches

- lookupExactFull now O(1) via direct SymbolDefinition storage in fileIndex
  (shared object references with globalIndex — zero additional memory)
- Added resolveSymbolInternal() preserving { definition, tier, candidateCount }
  for test assertions and logging
- Tier 3 now returns null when multiple global candidates exist instead of
  arbitrary allDefs[0] — a wrong edge is worse than no edge
- call-processor: renamed fuzzy-global → unique-global, removed dead branch
- 12 new tests: tier assertions, ambiguous refusal per language family,
  heritage false-positive guard, O(1) shared reference verification

* fix: critical language support bugs in import resolution and MRO

Phase 5 critical fixes from all-language analysis:
- Python: add relative_import query capture (PEP 328) — `.models`, `..utils`
  were silently dropped, producing zero ImportMap entries
- Rust: extract prefix from grouped imports `crate::module::{A, B}` — brace
  groups previously failed resolution entirely
- Swift: use normalizedFileList for Windows path compatibility in module
  import resolution (matches Go's resolveGoPackage pattern)
- MRO: fix c_sharp → csharp language name mismatch (enum is 'csharp'),
  add Kotlin to C#/Java resolution rules (class method wins over interface)

* feat: add strict multi-language integration tests + fix C/C++ import resolution

Add 32 integration tests across 6 language fixtures (TypeScript, C#, C++,
Java, Python, Rust) with exact toBe/toEqual assertions validating heritage
edges, import resolution, and trait implementations.

Fix C/C++ import resolution bug where dot-to-slash conversion mangled
include paths (e.g. "animal.h" became "animal/h"). Now skips conversion
for C/C++ languages which use actual file paths in #include directives.

* fix: language-gate heritage heuristic, add Swift extension heritage, handle Rust grouped imports

- Gate I[A-Z] naming heuristic to C#/Java only (was firing for all languages)
- Swift unresolved types default to IMPLEMENTS (protocol conformance is the norm)
- Add tree-sitter query for Swift extension protocol conformance (extension Foo: Protocol)
- Handle Rust top-level grouped imports (use {crate::a, crate::b}) in both import loops
- Add 4 new heritage-processor tests (TypeScript refusal, Swift default, Swift Tier 1)

* feat: add Go struct embedding heritage + PackageMap optimization

Add Go struct embedding detection (anonymous fields → EXTENDS edges) via
new tree-sitter heritage query with named-field filtering in both
parse-worker and heritage-processor paths.

Implement PackageMap optimization for Go cross-package resolution:
replace O(N) file-level ImportMap expansion with directory-level suffix
matching (Tier 2b in symbol resolver). Graph IMPORTS edges are preserved
via addImportGraphEdge split.

Remove overly broad @definition.type from GO_QUERIES that was
double-matching structs/interfaces as TypeAlias nodes, breaking Tier 3
unique-global resolution.

Add Go fixture (go-pkg) with Admin→User embedding, cross-package calls,
and 7 integration tests covering structs, functions, imports, calls,
and heritage edges.

* test: add Kotlin heritage integration tests

Adds a kotlin-heritage fixture and 7 integration tests validating
class inheritance, interface implementation, JVM-style import
resolution, and symbol-table-driven EXTENDS/IMPLEMENTS disambiguation
via Kotlin delegation specifiers.

* feat: extract resolvers, add PHP tests, ambiguous tests for all languages

- Extract language-specific resolvers from import-processor.ts into
  resolvers/ directory (P7): jvm, go, csharp, php, rust, standard, utils
- import-processor.ts reduced from 1412 to 711 lines (50% reduction)
- Add comprehensive PHP integration tests: PSR-4 imports, traits, enums,
  heritage edges, method calls, MRO overrides
- Add ambiguous symbol resolution tests for all 9 languages verifying
  correct disambiguation via import chains
- Split monolithic lang-resolution.test.ts (1080 lines) into 9 per-language
  files under test/integration/resolvers/ with shared helpers

* feat: update integration tests to include resolver tests for multiple languages

* fix: address code review — schema gap, Rust impl name, Property OVERRIDES

Bugs fixed:
- Add 13 missing FROM/TO pairs in RELATION_SCHEMA for HAS_METHOD edges
  (Class/Interface/Struct/Trait/Impl/Record to Method/Constructor/Property)
- Fix findEnclosingClassId to pick implementing type for Rust
  impl Trait for Struct blocks (was picking trait name)
- Exclude Property nodes from MRO OVERRIDES collision detection
- Change MRO language fallback from typescript to unknown

Tests added:
- Unit: Property OVERRIDES exclusion (2 tests), Rust impl Trait for
  Struct name resolution (2 tests), schema HAS_METHOD pair coverage
- Integration: no OVERRIDES targets Property nodes across all 9 languages
- PHP fixture: added shared $status property to both traits to create
  real collision scenario for Property OVERRIDES exclusion test

Documentation:
- OVERRIDES edge direction (Class to Method), Go return type gap,
  BFS first-reach heuristic limitation

* feat: harden CALLS-edge resolution — Phase 0 validation

- Fix same-file confidence (0.85 → 0.95) to correctly outrank import-scoped (0.9)
- Fix Tier 1 overload preservation: use globalIndex filter instead of fileIndex lookup
- Add callable-kind guard: refuse CALLS edges to Interface and Enum symbols
- Fix Kotlin countCallArguments: handle call_suffix → value_arguments nesting
- Fix Kotlin extractFunctionName: add simple_identifier to fallback search
- Strictly type findParameterList and countCallArguments (remove all `any`)
- Add arity-based call resolution integration tests for 9 languages
- Add unit regression tests for Interface/Enum CALLS refusal

* chore: remove C# build artifacts from fixtures

* feat: add call-form discrimination and ownerId to symbol table (Phase 1)

Add inferCallForm() and extractReceiverName() to distinguish free/member/constructor
calls at the AST level across all 9 languages. Add ownerId field to SymbolDefinition
linking Method/Constructor/Property to their owning class. Includes 36 unit tests
and member-call integration tests for all 9 languages (132 tests, 0 failures).

* feat: constructor/struct-literal resolution across all languages (Phase 2)

Add constructor discrimination to CALLS-edge resolution: new Foo(),
User{...} struct literals, and C# primary constructors now resolve to
Constructor/Class/Struct/Record nodes instead of being filtered out.

Queries: new_expression (C++), object_creation_expression (PHP),
composite_literal (Go), struct_expression (Rust), primary constructor
and implicit_object_creation_expression (C#).

Relaxes global tier in collectTieredCandidates to pass all candidates
through filterCallableCandidates, allowing kind/arity narrowing to
disambiguate at lower confidence.

* feat: receiver-constrained resolution with integration tests for all 9 languages

Add receiver-type filtering (Phase 3): when a member call like `user.save()`
has a known receiver type from TypeEnv, filter candidates by ownerId to
disambiguate methods with the same name across different classes.

Key changes:
- call-processor: build per-file TypeEnv, pass receiverTypeName to resolveCallTarget
- parse-worker: extract receiverTypeName from TypeEnv in worker thread
- resolveCallTarget: new step D filters by ownerId matching receiver type
- utils: extractReceiverName supports C++ field_expression (argument field)
- utils: findEnclosingClassId extracts Go method receiver types
- type-env: handle Go qualified_type, Kotlin user_type/variable_declaration
- parse-worker + parsing-processor: Function added to needsOwner for
  Kotlin/Rust/Python class methods captured as Function nodes

Integration tests added for receiver-constrained resolution across all 9
languages: TypeScript, Java, Python, Go, Rust, C++, C#, Kotlin, PHP.

* feat: NamedImportMap, scoped TypeEnv, broadened signatures + TS rest-param variadic fix

Address all 4 PR abhigyanpatwari#238 review items:
1. Remove redundant lookupFuzzy in processRoutesFromExtracted
2. Add NamedImportMap for TS/Python symbol-level import tracking (Tier 2a)
3. Make TypeEnv scope-aware (Map<scopeKey, Map<varName, type>>) to fix
   non-deterministic receiver resolution across functions
4. Broaden extractMethodSignature: Go/Rust/C++ return types, variadic
   detection for Go/Java/Python/C++/Kotlin/TypeScript rest params

Discovered and fixed: TS rest params (...args) were not detected as
variadic — added rest_pattern detection inside required_parameter nodes.

Integration tests added: scoped receiver, named import disambiguation,
and variadic call resolution for both TypeScript and Python.

* fix: alias import resolution, Go multi-assign TypeEnv, dead code removal

- NamedImportMap now stores {sourcePath, exportedName} so aliased imports
  (import { User as U }) resolve U → User in the source file
- Named binding check moved before empty-allDefs early return in both
  call-processor and symbol-resolver, fixing constructor calls via aliases
- Go extractFromGoShortVarDeclaration iterates all LHS/RHS pairs for
  multi-assignment (user, repo := User{}, Repo{}) instead of only first
- Remove unused TYPED_DECLARATION_TYPES set (TYPED_PARAMETER_TYPES kept)
- Integration tests for both fixes (go-multi-assign, typescript-alias-imports)

* feat: alias import extraction for Kotlin, Rust, PHP, C# + integration tests

Add named import alias extraction to both pipeline paths
(import-processor.ts and parse-worker.ts) for Kotlin, Rust, PHP,
and C#. Add integration test fixtures and tests for all 5 languages
(Python alias extraction already worked, just needed the test).

Each test verifies: class detection, member call resolution through
aliases to correct target files, and IMPORTS edge emission.

* refactor: use SupportedLanguages enum everywhere instead of raw strings

Replace all raw language string literals and `language: string` types
with the SupportedLanguages enum across 10 files. This ensures
compile-time safety for language dispatch and eliminates dead
`language === 'tsx'` checks (tsx maps to TypeScript in the enum).

* fix: tier-ordering bug, re-export chains, PHP grouped imports, Java named imports

- Fix collectTieredCandidates tier-ordering: same-file now checked before
  named bindings, preventing imports from shadowing local definitions
  (matches resolveSymbolInternal priority order)
- Add re-export chain resolution for TypeScript/JavaScript barrel files:
  export { X } from './base' and export type { X } from './base' now
  followed up to 5 hops through NamedImportMap
- Fix PHP grouped import alias extraction: use App\Models\{User, Repo as R}
  now correctly handled in both parse-worker and import-processor
- Add Java NamedImportMap support: import com.example.models.User now
  records User as a named binding for precise disambiguation
- Add 16 new integration tests across TypeScript, PHP, and Java resolvers
  (220 total resolver tests, all passing)

* refactor: consolidate alias extraction + add variadic/constructor/shadow integration tests

- Extract shared named-binding-extraction.ts from duplicate logic in
  import-processor.ts and parse-worker.ts (net -200 lines)
- Deduplicate appendKotlinWildcard (now imported from resolvers/index.ts)
- Add integration tests: constructor calls (Kotlin, Python), variadic
  resolution (Go, Java, C#, C++, Kotlin), re-export chains (Python),
  local definition shadowing (Python, Go)
- Add TODO(stack-graph) for TypeEnv scope key collision
- 225 integration tests passing (was 223)

* fix: PHP non-aliased imports, Python node identity, re-export chain dedup + local-shadow tests

- PHP flat non-aliased imports (use App\Models\User) now stored in NamedImportMap
- PHP grouped non-aliased imports ({User} in {User, Repo as R}) now stored in NamedImportMap
- Python: replace non-public child.id with child.startIndex for node identity
- Extract shared walkBindingChain() from symbol-resolver and call-processor
- Add PHP variadic resolution fixture + test (variadic_parameter already covers PHP)
- Add local-shadow integration tests for Java, C#, Kotlin, Rust, PHP, C++ (6 languages)

* feat: Rust non-aliased use bindings, Kotlin non-aliased imports, re-export chain resolution

Extend NamedImportMap coverage for Rust and Kotlin non-aliased imports:

- Rust: rename collectUseAsClauses → collectRustBindings, extract terminal
  scoped_identifier (use crate::models::User) and identifier in use_list
  (use crate::models::{User, Repo}) into NamedImportMap. This also enables
  pub use re-export chain following via walkBindingChain.
- Kotlin: extend extractKotlinNamedBindings to handle non-aliased imports
  (import com.example.User), skipping wildcard imports.
- Add rust-reexport-chain fixture + 3 integration tests verifying Handler{}
  resolves through mod.rs pub use to handler.rs.
- Add Kotlin heritage + constructor-calls reason assertions for non-aliased
  import-resolved resolution.
- Add C# heritage test documenting namespace import tier behavior.

* fix: skip Kotlin lowercase member imports in NamedImportMap

Member imports like `import util.OneArg.writeAudit` (lowercase last
segment) must not populate NamedImportMap — same-named function imports
from different classes collide, breaking arity-based disambiguation.
Apply the same guard Java already uses: skip lowercase last segments.

* fix: skip spurious path-prefix bindings in Rust grouped imports

collectRustBindings was extracting the path segment (e.g. "models") from
`use crate::models::{User, Repo}` as a spurious NamedImportMap entry.
Skip scoped_identifier nodes that are direct children of scoped_use_list
since they are path prefixes, not importable symbols.

Adds rust-grouped-imports fixture and 4 integration tests verifying both
symbols resolve correctly and no spurious binding leaks through.

* fix: use startIndex in TypeEnv scope key to prevent same-name method collision

Two methods named identically in different classes within the same file
previously shared a scope key, causing non-deterministic type resolution.
Now keys use funcName@startIndex for uniqueness.

Also adds tests documenting destructuring assignment extraction gap.

* test: document C# namespace-level import limitation in named binding extraction

* test: document same-arity overload discrimination limitation in call processor

* perf: parallelize calls/heritage/routes processing in worker path

Worker path now runs processCallsFromExtracted, processHeritageFromExtracted,
and processRoutesFromExtracted via Promise.all instead of sequentially.
Safe because all three only read shared state and write via addRelationship's
dedup guard. Sequential fallback path stays sequential (shared LRU astCache).

Also fixes Rust collectRustBindings spurious path-prefix bindings for 3+ level
grouped imports, and adds @param JSDoc for walkBindingChain's allDefs invariant.

* docs: improve Promise.all safety comment and walkBindingChain JSDoc

Clarify that the parallelization safety comes from disjoint relationship
types + idempotent id-keyed Maps, not from lack of shared state (the
graph is shared). Strengthen allDefs JSDoc to describe silent-miss
consequence of passing pre-filtered results.

* refactor: extract language-specific processing into modular dispatch tables

Phase 1: Extract type binding logic from type-env.ts (635→125 LOC) into
type-extractors/ directory with per-language files and Record<SupportedLanguages,
LanguageTypeConfig> + satisfies dispatch.

Phase 2: Extract 5 config loaders from import-processor.ts into
language-config.ts (removed ~196 LOC of inline loaders).

Phase 3: Convert export-detection.ts switch/case to exhaustive
Record<SupportedLanguages, ExportChecker> + satisfies dispatch table,
fix node: any → SyntaxNode.

Also adds language feature matrix to README.

All 1146 unit tests and 433 integration tests pass.

* refactor: extract type binding logic into type-extractors/ directory (Phase 1)

Extract per-language type extraction from type-env.ts (635→125 LOC) into
type-extractors/ with Record<SupportedLanguages, LanguageTypeConfig> + satisfies
dispatch. 9 per-language files, shared helpers, and barrel index.

* refactor: extract config loaders to language-config.ts (Phase 2)

Move 5 language-specific config loaders and their type interfaces from
import-processor.ts into standalone language-config.ts module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants