feat: language-aware code intelligence — symbol resolution, MRO, constructor discrimination by magyargergo · Pull Request #238 · abhigyanpatwari/GitNexus

magyargergo · 2026-03-10T22:58:05Z

Summary

This PR builds a language-aware code intelligence engine -- the core of an LSP-grade symbol resolver that understands imports, inheritance, method ownership, constructor patterns, receiver-typed member calls, and call-site semantics across 9 languages.

Symbol Resolution Engine

3-tier resolver: exact FQN -> scope-walk (file/namespace) -> guarded fuzzy fallback that refuses ambiguous matches
Import-aware disambiguation: resolves duplicate symbol names using import chains (PSR-4, JVM packages, Go packages, Rust modules, C++ includes, C# namespaces, Python relative imports)
Call-form discrimination: callForm field distinguishes regular calls, member calls (obj.method()), and constructor calls (new Foo() / struct literals) -- each form gets different candidate filtering
Arity filtering: narrows overloaded candidates by argument count when multiple symbols share a name
Receiver-type filtering: for member calls with typed receivers, filters candidates by ownerId matching the resolved receiver type -- disambiguates user.save() vs repo.save() when both User and Repo have save() methods

Receiver-Constrained Resolution (Phase 3)

TypeEnv: per-file Map<string, string> built from explicit type annotations (let x: T, T x, x: T, var x models.T, typed params)
extractReceiverName: extracts receiver identifier from member call AST across all 9 languages
resolveCallTarget step D: filters candidates by ownerId matching resolved receiver type nodeId
Go receiver types: findEnclosingClassId extracts receiver type from method_declaration (e.g., func (u *User) Save() -> User struct)
Safety: falls back to name-only resolution if receiver type unknown; refuses if 2+ candidates remain after filtering

Constructor & Struct Literal Resolution

new Foo() resolves to Constructor > Class (TS, Java, C#, C++, PHP, Kotlin)
Struct literals (User{...} in Go, User { ... } in Rust) resolve to Struct nodes
C# primary constructors (class User(string name, int age)) emit Constructor nodes with correct parameterCount
C# target-typed new (User u = new("x")) extracts type from variable declaration

Method Resolution Order (MRO)

Walks inheritance DAG, detects method-name collisions across parents
5 language-specific strategies: C++ leftmost-base, C#/Java class-over-interface, Python C3 linearization, Rust qualified syntax, default BFS
Emits OVERRIDES edges when inherited methods collide (never targets Property nodes)

Heritage & Ownership

HAS_METHOD edges: Class->Method ownership from AST container analysis
Method signatures: parameterCount and returnType extracted for Method/Constructor nodes
Go struct embedding: EXTENDS edges for embedded structs + PackageMap optimization
Swift extension heritage: extension declarations emit EXTENDS edges
Rust grouped imports: use crate::{models, other} correctly expanded

Resolver Architecture

Language-specific resolvers extracted into resolvers/ directory (JVM, Go, C#, PHP, Rust, standard, utils)
import-processor.ts reduced from 1412 -> 711 lines (50% reduction)

Bug Fixes

C/C++ resolveImportPath no longer mangles #include paths
Heritage heuristic language-gated (no longer applies class/interface rules to wrong languages)
C# base_list distinguishes EXTENDS vs IMPLEMENTS via symbol table + I[A-Z] heuristic
Global tier no longer blocks resolution when kind/arity filtering can narrow to 1 candidate
Go qualified_type (models.User) correctly unwrapped in TypeEnv
Kotlin user_type wrapper and variable_declaration nesting handled in TypeEnv
C++ field_expression receiver extraction via argument field

Schema

Method table: parameterCount INT32, returnType STRING
Cypher examples updated with HAS_METHOD, OVERRIDES, diamond inheritance queries

Test plan

Integration tests -- 9 languages (`test/integration/resolvers/`)

Language	Tests	Coverage
TypeScript	19	heritage, imports, member-calls, constructor resolution, receiver-constrained
Java	22	heritage, ambiguous, arity filtering, member-calls, constructors, receiver-constrained
C#	23	heritage, ambiguous, arity, member-calls, primary constructors, receiver-constrained
C++	18	diamond inheritance, ambiguous, arity, member-calls, constructors, receiver-constrained
Python	14	heritage, ambiguous, arity, member-calls, receiver-constrained
Go	18	struct embedding, PackageMap, ambiguous, arity, member-calls, struct literals, receiver-constrained
Rust	17	trait-impl, ambiguous, arity, member-calls, struct literals, receiver-constrained
Kotlin	18	delegation specifiers, JVM imports, ambiguous, arity, member-calls, receiver-constrained
PHP	29	PSR-4, traits, enums, MRO overrides, ambiguous, arity, member-calls, constructors, receiver-constrained

Unit tests

13 MRO tests: C++ diamond, C#/Java class+interface, Python C3 + cycle detection, Rust traits
21 HAS_METHOD tests: container types, findEnclosingClassId across all languages (incl. Go receiver types)
11 method signature tests: extractMethodSignature for TS/Python/Java/C#
14 heritage processor tests: C# interface resolution via symbol table
5 constructor-aware filtering tests: kind narrowing, arity fallback
TypeEnv tests: type annotation extraction across 9 languages
Symbol resolver tests: scope-walk, ambiguity refusal, tier priority
Call-form tests: inferCallForm, extractReceiverName, countCallArguments

Stats

230 files changed, 10,100 insertions, 973 deletions
1242+ tests passing, 53 test files

vercel · 2026-03-10T22:58:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
gitnexus	Ready	Preview, Comment	Mar 13, 2026 0:25am

Implement full MRO computation for multi-language inheritance hierarchies: - HAS_METHOD edges: Class→Method ownership edges emitted during parsing (both worker pool and sequential fallback paths) - Method signatures: extract parameterCount and returnType from AST nodes - C# heritage fix: distinguish EXTENDS vs IMPLEMENTS for base_list captures using symbol table lookup + I[A-Z] naming heuristic fallback - MRO processor (Phase 4.5): walks inheritance DAG, detects method-name collisions across parents, applies language-specific resolution: - C++: leftmost base class in declaration order wins - C#/Java: class method wins over interface default - Python: C3 linearization with cycle detection - Rust: no auto-resolution (requires qualified syntax) - Default: first definition in BFS order wins - OVERRIDES edges emitted for resolved method collisions - KuzuDB schema: Method table extended with parameterCount/returnType; dedicated CSV writer and COPY query for 10-column Method rows - MCP tools: updated Cypher examples for HAS_METHOD, OVERRIDES, diamond 72 tests across 5 test files covering MRO resolution, HAS_METHOD edges, method signature extraction, C# heritage resolution, and integration tests across C#/Rust/Python/TS/Java/C++.

github-actions · 2026-03-10T23:10:30Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Unit Tests	`success`	3 platforms
✅ Integration	`success`	3 OS x 4 groups = 12 jobs

Test Results

✅ 1636 passed
· 507 suites · 1636 total
· ⏱️ 38s
· 📊 1146 unit + 490 integration

Code Coverage

Combined (Unit + Integration)

Metric	Coverage	Covered	Threshold	Status
Statements	48.94%	3775/7713	26%	🟢 █████████░░░░░░░░░░░
Branches	41%	2188/5336	23%	🟢 ████████░░░░░░░░░░░░
Functions	51.04%	417/817	28%	🟢 ██████████░░░░░░░░░░
Lines	50.37%	3496/6940	27%	🟢 ██████████░░░░░░░░░░

Coverage breakdown by test suite

Unit Tests

Metric	Coverage	Covered	Threshold	Status
Statements	37.7%	2908/7713	26%	🟢 ███████░░░░░░░░░░░░░
Branches	32.36%	1727/5336	23%	🟢 ██████░░░░░░░░░░░░░░
Functions	39.16%	320/817	28%	🟢 ███████░░░░░░░░░░░░░
Lines	39.07%	2712/6940	27%	🟢 ███████░░░░░░░░░░░░░

Integration Tests

Metric	Coverage	Covered	Threshold	Status
Statements	24.69%	1905/7713	26%	🔴 ████░░░░░░░░░░░░░░░░
Branches	19.28%	1029/5336	23%	🔴 ███░░░░░░░░░░░░░░░░░
Functions	26.8%	219/817	28%	🔴 █████░░░░░░░░░░░░░░░
Lines	25.37%	1761/6940	27%	🔴 █████░░░░░░░░░░░░░░░

Coverage thresholds are auto-ratcheted — they only go up

Vitest thresholds.autoUpdate bumps the floor whenever local coverage exceeds it.
CI enforces the current thresholds; developers commit the ratcheted values.

_{📋 View full run · Generated by CI}

Introduces a shared 3-tier resolveSymbol function used by both heritage-processor and call-processor: 1. Same-file (lookupExactFull — authoritative) 2. Import-scoped (filtered by ImportMap — high confidence) 3. Global fuzzy (first match — low confidence fallback) Adds lookupExactFull to SymbolTable returning full SymbolDefinition with type info needed for heritage Class/Interface disambiguation.

magyargergo · 2026-03-11T00:26:49Z

@claude

claude · 2026-03-11T00:27:04Z

Claude finished @magyargergo's task in 3m 24s —— View job

Code Review

Great work on this feature — the overall architecture is clean and the test coverage is solid. Here are my findings, grouped by severity.

Bugs / Critical Issues

1. KuzuDB schema missing Class→Property (and other) HAS_METHOD edge pairs (schema.ts)

HAS_METHOD edges are emitted for Property nodes (parse-worker.ts:955, parsing-processor.ts:252), but FROM Class TO \Property`is not listed inRELATION_SCHEMA. This will silently fail or error when loading the CSV into KuzuDB. The same may be true for other container→Property combinations (e.g. Struct→Property). You'll need to add these missing FROM/TO` pairs to the relation table schema.

Fix this →

2. findEnclosingClassId picks the trait name, not the struct name for impl Trait for Struct (utils.ts:271-275)

For Rust's impl MyTrait for MyStruct { fn foo() {} }, the impl_item AST node's first type_identifier child is MyTrait, not MyStruct. The current fallback:

current.children?.find((c: any) =>
  c.type === 'type_identifier' || c.type === 'identifier' || c.type === 'name'
);

…will find MyTrait first, so HAS_METHOD edges for methods inside impl Trait for Struct point to the wrong class. The for keyword is a sibling node that separates trait from implementor — you'd need to skip past it to find the implementing type.

Fix this →

Design / Semantic Issues

3. MRO emits OVERRIDES for Property name collisions (mro-processor.ts:341-399)

methodsByName is built from all HAS_METHOD target nodes, which includes Property nodes. If two parent classes have a property with the same name, the MRO processor will emit an OVERRIDES edge for it — semantically misleading since properties don't have resolution order semantics in most languages. Consider filtering methodMap to only include Method and Constructor nodes, or renaming the concept.

4. buildTransitiveEdgeTypes first-reach heuristic can misclassify ancestors (mro-processor.ts:421-453)

Consider a hierarchy where MyClass : BaseClass, IFoo and BaseClass also implements IFoo. Then IFoo is reachable via EXTENDS→EXTENDS through BaseClass. BFS first-reach would mark IFoo as EXTENDS rather than IMPLEMENTS. In practice this affects the resolveCsharpJava conflict resolution: an interface method from IFoo could be treated as a class method (winning the conflict when it should be a lower-priority interface method).

5. OVERRIDES edge direction is Class→Method (not Method→Method) (mro-processor.ts:389-398)

graph.addRelationship({
  sourceId: classId,
  targetId: resolution.resolvedTo,  // a methodId
  type: 'OVERRIDES',

The edge goes from the child class to the winning ancestor method. Cypher queries expecting (m1:Method)-[:OVERRIDES]->(m2:Method) won't work — you'd need (c:Class)-[:OVERRIDES]->(m:Method). This semantic should be documented explicitly, or consider emitting Method→Method edges from a (potentially synthesized) child method node.

Minor Issues

6. Default language fallback is 'typescript' instead of 'default' (mro-processor.ts:287)

const language = classNode.properties.language ?? 'typescript';

If TypeScript ever gets a dedicated MRO case added to the switch, this silent fallback would bypass the default: branch for nodes with missing language. Using 'unknown' or 'default' as fallback would be clearer intent.

Fix this →

7. I[A-Z] heuristic applied to all languages (heritage-processor.ts:49)

resolveExtendsType is called for all languages when processing heritage.extends captures (TypeScript, Python, C++, etc.). The I[A-Z] naming convention heuristic only makes sense for C# and Java. A class named Icon in Python or Inspector in C++ would incorrectly get an IMPLEMENTS edge. Consider gating the heuristic behind a language check.

Fix this →

8. extractMethodSignature misses Go return types

Go uses a different AST structure for return types (no type_annotation or return_type node types — Go uses return_type but as a field name, and results can be grouped in parameter_list). The extractMethodSignature function returns undefined for Go methods consistently, making returnType unusable for Go-heavy codebases. Worth adding a Go-specific branch or at least documenting the gap.

9. Symbol resolver Tier 3 returns the first global match silently (symbol-resolver.ts:46)

return allDefs[0];

When there are multiple classes with the same name across files, the first-indexed match wins with no signal about the ambiguity. This could cause incorrect EXTENDS/IMPLEMENTS edges to be drawn to the wrong class. Consider returning null when allDefs.length > 1 and the name is too common, or at minimum adding the confidence on the resulting relationship.

Observations (not bugs)

MROEntry results not persisted: The full linearized MRO order per class is computed but then discarded after computeMRO. If you want to expose MRO order via Cypher queries later, you'd need to store entries as node properties or in a dedicated table.
Constructor parameterCount/returnType: Constructor nodes get method signatures extracted, but returnType is always undefined (constructors have no explicit return). escapeCSVNumber defaults parameterCount to 0 which is fine, but the Constructor table schema (CODE_ELEMENT_BASE) doesn't include these columns. Since Constructors go to the constructor.csv (multiLang path), the fields are silently dropped — that's consistent, but worth knowing.
Duplicate HAS_METHOD logic: The same enclosing-class detection exists in both parse-worker.ts:954–967 and parsing-processor.ts:251–264. Any future bug fix must be applied in both places.

Overall the implementation is well-structured and the test suite is thorough. The Class→Property schema gap and the Rust impl Trait for Struct name issue are the two I'd prioritize fixing before merge.

- lookupExactFull now O(1) via direct SymbolDefinition storage in fileIndex (shared object references with globalIndex — zero additional memory) - Added resolveSymbolInternal() preserving { definition, tier, candidateCount } for test assertions and logging - Tier 3 now returns null when multiple global candidates exist instead of arbitrary allDefs[0] — a wrong edge is worse than no edge - call-processor: renamed fuzzy-global → unique-global, removed dead branch - 12 new tests: tier assertions, ambiguous refusal per language family, heritage false-positive guard, O(1) shared reference verification

Phase 5 critical fixes from all-language analysis: - Python: add relative_import query capture (PEP 328) — `.models`, `..utils` were silently dropped, producing zero ImportMap entries - Rust: extract prefix from grouped imports `crate::module::{A, B}` — brace groups previously failed resolution entirely - Swift: use normalizedFileList for Windows path compatibility in module import resolution (matches Go's resolveGoPackage pattern) - MRO: fix c_sharp → csharp language name mismatch (enum is 'csharp'), add Kotlin to C#/Java resolution rules (class method wins over interface)

…resolution Add 32 integration tests across 6 language fixtures (TypeScript, C#, C++, Java, Python, Rust) with exact toBe/toEqual assertions validating heritage edges, import resolution, and trait implementations. Fix C/C++ import resolution bug where dot-to-slash conversion mangled include paths (e.g. "animal.h" became "animal/h"). Now skips conversion for C/C++ languages which use actual file paths in #include directives.

…handle Rust grouped imports - Gate I[A-Z] naming heuristic to C#/Java only (was firing for all languages) - Swift unresolved types default to IMPLEMENTS (protocol conformance is the norm) - Add tree-sitter query for Swift extension protocol conformance (extension Foo: Protocol) - Handle Rust top-level grouped imports (use {crate::a, crate::b}) in both import loops - Add 4 new heritage-processor tests (TypeScript refusal, Swift default, Swift Tier 1)

Add Go struct embedding detection (anonymous fields → EXTENDS edges) via new tree-sitter heritage query with named-field filtering in both parse-worker and heritage-processor paths. Implement PackageMap optimization for Go cross-package resolution: replace O(N) file-level ImportMap expansion with directory-level suffix matching (Tier 2b in symbol resolver). Graph IMPORTS edges are preserved via addImportGraphEdge split. Remove overly broad @definition.type from GO_QUERIES that was double-matching structs/interfaces as TypeAlias nodes, breaking Tier 3 unique-global resolution. Add Go fixture (go-pkg) with Admin→User embedding, cross-package calls, and 7 integration tests covering structs, functions, imports, calls, and heritage edges.

Adds a kotlin-heritage fixture and 7 integration tests validating class inheritance, interface implementation, JVM-style import resolution, and symbol-table-driven EXTENDS/IMPLEMENTS disambiguation via Kotlin delegation specifiers.

…ages - Extract language-specific resolvers from import-processor.ts into resolvers/ directory (P7): jvm, go, csharp, php, rust, standard, utils - import-processor.ts reduced from 1412 to 711 lines (50% reduction) - Add comprehensive PHP integration tests: PSR-4 imports, traits, enums, heritage edges, method calls, MRO overrides - Add ambiguous symbol resolution tests for all 9 languages verifying correct disambiguation via import chains - Split monolithic lang-resolution.test.ts (1080 lines) into 9 per-language files under test/integration/resolvers/ with shared helpers

… languages

magyargergo · 2026-03-11T13:46:00Z

@claude review this PR since it's been changed since you reviewed it.

claude · 2026-03-11T13:46:14Z

Claude finished @magyargergo's task in 4m 6s —— View job

Code Review (Updated)

Gather context and understand the PR
Read key changed files
Check which previous review issues were addressed
Review new/changed code for fresh issues
Post comprehensive updated review

What Changed Since the Last Review

Two of the previously-flagged issues are now fixed:

✅ Issue 7 (was Minor): I[A-Z] interface heuristic is now correctly language-gated to C#/Java only, with Swift getting its own IMPLEMENTS default (heritage-processor.ts:58-65).
✅ Issue 9 (was Minor): Symbol resolver Tier 3 now returns null when allDefs.length > 1, properly refusing ambiguous global matches (symbol-resolver.ts:86-91). Call and route resolvers now delegate to resolveSymbol, picking up this behaviour automatically.

Still Outstanding from Previous Review

1. Class → Property schema pair missing (schema.ts — Bug, still unfixed)

HAS_METHOD edges are emitted from Class/Interface/Struct/Trait nodes to Property nodes (parse-worker.ts:964, parsing-processor.ts:252). The RELATION_SCHEMA in schema.ts includes:

FROM File TO `Property`          ✅
FROM Function TO `Property`      ✅
FROM Method TO `Property`        ✅
FROM Constructor TO `Property`   ✅

But these are still absent:

FROM Class TO `Property`         ❌
FROM Interface TO `Property`     ❌
FROM Struct TO `Property`        ❌
FROM Trait TO `Property`         ❌

Any HAS_METHOD edge where the source is a Class/Interface/Struct/Trait and the target is a Property will fail to be stored in KuzuDB. This is the highest-priority fix before merge.

Fix this →

2. findEnclosingClassId picks the trait name for impl Trait for Struct (utils.ts:271-275 — Bug, still unfixed)

The fallback children.find() returns the first type_identifier it encounters. In the tree-sitter Rust AST for impl MyTrait for MyStruct {}, the children are: impl, MyTrait, for, MyStruct, {...}. The first type_identifier is MyTrait, so generateId('Impl', 'path/to/file.rs:MyTrait') is used instead of MyStruct. All HAS_METHOD edges from such blocks point to the wrong identifier.

Fix this →

3. Default language fallback is 'typescript' (mro-processor.ts:287 — Minor, still unfixed)

const language = classNode.properties.language ?? 'typescript';

If TypeScript ever gets its own MRO case in the switch, nodes with a missing language property would silently route there instead of default:. Using 'unknown' is clearer intent.

Fix this →

4. extractMethodSignature misses Go return types (utils.ts:474-479 — Minor, still unfixed)

Go's return_type field exists in the tree-sitter grammar but uses a different AST structure than type_annotation. The function returns returnType: undefined for all Go methods, making the column unused for Go-heavy codebases. Worth at least a comment documenting the gap.

Design Issues Carried Over (No Changes)

5. MRO emits OVERRIDES for Property name collisions (mro-processor.ts:305-330)

methodsByName is populated from all HAS_METHOD targets, which includes Property nodes. Two parent classes with a property of the same name generate an OVERRIDES edge — semantically odd since properties don't have dispatch semantics in most languages. Filtering methodMap to only Method/Constructor nodes before the collision pass would tighten this.

6. buildTransitiveEdgeTypes first-reach heuristic (mro-processor.ts:421-453)

BFS propagates the edge type of the first path found. If BaseClass also implements IFoo, then IFoo reachable via Class→EXTENDS→BaseClass→IMPLEMENTS→IFoo would be classified as EXTENDS rather than IMPLEMENTS. This edge case means resolveCsharpJava could treat an interface method as a class method (wrong winner). Worth documenting this known limitation as a comment.

7. OVERRIDES edge semantic (mro-processor.ts:389-398)

The edge goes from the child class to the winning ancestor method (Class → Method). This differs from the natural Method → Method direction most developers would expect for OVERRIDES. This is workable since FROM Class TO Method exists in the schema, but Cypher queries need to use:

MATCH (c:Class)-[r:CodeRelation {type: 'OVERRIDES'}]->(m:Method)

rather than the more intuitive Method → Method form. The MCP tool descriptions and docstring should document this explicitly.

New Minor Issue

8. Redundant lookupFuzzy in processRoutesFromExtracted (call-processor.ts:291)

After resolveSymbol (which internally calls lookupFuzzy) returns a result, the function immediately calls lookupFuzzy again just to compute a confidence value:

const controllerDef = resolveSymbol(route.controllerName, ...);
// ...
const allControllerDefs = symbolTable.lookupFuzzy(route.controllerName); // redundant
const confidence = isImportedController ? 0.9 : allControllerDefs.length === 1 ? 0.7 : 0.5;

Since resolveSymbol already refuses to return when allDefs.length > 1, the allControllerDefs.length === 1 branch is always true here. The second lookupFuzzy call can be removed, simplifying to confidence = isImportedController ? 0.9 : 0.7.

Fix this →

Summary

The two critical fixes from last review remain: the Class→Property schema gap (#1) could silently drop edges at DB-load time, and the Rust impl Trait for Struct name issue (#2) misroutes HAS_METHOD edges. These should be addressed before merge. Everything else is either a minor improvement or an acceptable known limitation.

…IDES Bugs fixed: - Add 13 missing FROM/TO pairs in RELATION_SCHEMA for HAS_METHOD edges (Class/Interface/Struct/Trait/Impl/Record to Method/Constructor/Property) - Fix findEnclosingClassId to pick implementing type for Rust impl Trait for Struct blocks (was picking trait name) - Exclude Property nodes from MRO OVERRIDES collision detection - Change MRO language fallback from typescript to unknown Tests added: - Unit: Property OVERRIDES exclusion (2 tests), Rust impl Trait for Struct name resolution (2 tests), schema HAS_METHOD pair coverage - Integration: no OVERRIDES targets Property nodes across all 9 languages - PHP fixture: added shared $status property to both traits to create real collision scenario for Property OVERRIDES exclusion test Documentation: - OVERRIDES edge direction (Class to Method), Go return type gap, BFS first-reach heuristic limitation

- Fix same-file confidence (0.85 → 0.95) to correctly outrank import-scoped (0.9) - Fix Tier 1 overload preservation: use globalIndex filter instead of fileIndex lookup - Add callable-kind guard: refuse CALLS edges to Interface and Enum symbols - Fix Kotlin countCallArguments: handle call_suffix → value_arguments nesting - Fix Kotlin extractFunctionName: add simple_identifier to fallback search - Strictly type findParameterList and countCallArguments (remove all `any`) - Add arity-based call resolution integration tests for 9 languages - Add unit regression tests for Interface/Enum CALLS refusal

collectRustBindings was extracting the path segment (e.g. "models") from `use crate::models::{User, Repo}` as a spurious NamedImportMap entry. Skip scoped_identifier nodes that are direct children of scoped_use_list since they are path prefixes, not importable symbols. Adds rust-grouped-imports fixture and 4 integration tests verifying both symbols resolve correctly and no spurious binding leaks through.

abhigyanpatwari · 2026-03-12T21:39:40Z

@magyargergo check these optimizations:

The Heritage resolution and Call resolution needs symbol table and import map respectively. So they can be parallelized

The MRO scan can benefit from a type-index. It will go from O ( all edges ) to O( inheritance edges )

Also I think we can cache import alias per-chunk since symbol table dont change mid-resolution
the heritage resolution and call resolution optimizations should give good performance boost for huge java repos especially
instead of awaiting processImports , processCalls and processHeritage u can just do

await processImports();
await Promise.all([
processCalls(),
processHeritage(),
]);
easy fix high impact, if i m not missing something here

magyargergo · 2026-03-13T05:45:52Z

@claude review the code and also focus on evaluating the comments above.

claude · 2026-03-13T05:46:06Z

Claude finished @magyargergo's task in 6m 22s —— View job

Code Review (Updated) + Optimization Evaluation

Gather context — check what changed since last review (ce347de)
Read key changed files (pipeline.ts, mro-processor.ts, named-binding-extraction.ts, call-processor.ts, heritage-processor.ts)
Evaluate optimization suggestions from @abhigyanpatwari
Review new changes for bugs and design issues
Post comprehensive review

What Changed Since the Last Review

Two commits landed since the previous review (054a344):

✅ 3b8adb8 — Fix spurious path-prefix binding for Rust grouped imports. collectRustBindings now detects scoped_identifier nodes whose parent.type === 'scoped_use_list' and skips extracting them as symbols (they are path prefixes, not importable names). 4 new integration tests added for rust-grouped-imports. This directly addresses Issue Add support for Ollama as a local inference backend #1 from the last review.
✅ ce347de — Fix Kotlin lowercase member import regression introduced in 005ff3a. A guard exportedName[0] === exportedName[0].toLowerCase() correctly prevents import util.OneArg.writeAudit (member method import) from being added to NamedImportMap as a class-level binding.

Both fixes are correctly implemented. One residual limitation noted below.

Residual Limitation in `3b8adb8`

Rust paths deeper than 2 levels still produce spurious path-segment bindings (named-binding-extraction.ts:244-251)

The guard fires only for scoped_identifier whose direct parent is scoped_use_list. For use crate::models::sub::{A, B}, the outer scoped_identifier(crate::models::sub) hits the guard and recurses into its children — one of which is the inner scoped_identifier(crate::models). This inner node has parent type scoped_identifier, not scoped_use_list, so it bypasses the guard and falls through to the "terminal scoped_identifier" case (hasDeeper = false), extracting models as a spurious binding.

Severity: Same as the original bug — lookupFuzzy('models') finds no graph node, chain walks return null, graceful fallback. No correctness impact in practice.

Minimal fix: Replace the recursion-into-children inside the guard with a simple return, since the use_list sibling of the path prefix is already handled by the final recursive case of the parent scoped_use_list:

// Current: recurses into children of the path prefix (unnecessary)
if (node.type === 'scoped_identifier' && node.parent?.type === 'scoped_use_list') {
  for (let i = 0; i < node.namedChildCount; i++) {
    const child = node.namedChild(i);
    if (child) collectRustBindings(child, bindings);
  }
  return;
}

// Simpler: just skip the path prefix entirely
if (node.type === 'scoped_identifier' && node.parent?.type === 'scoped_use_list') {
  return; // path prefix — the use_list sibling handles the actual symbols
}

The current recursion produces no bindings for 2-level paths, but introduces the spurious binding for 3+ level paths. The return form is correct for all depths.

Fix this →

Evaluation of @abhigyanpatwari's Optimization Suggestions

1. Parallelize `processCalls` + `processHeritage` with `Promise.all` — ✅ Correct and beneficial

The suggestion is architecturally sound. Looking at the pipeline (pipeline.ts:226-238):

// Current sequential (per-chunk):
await processImportsFromExtracted(...);          // populates importMap, packageMap, namedImportMap
await processCallsFromExtracted(...);            // reads symbolTable/importMap → writes to graph
await processHeritageFromExtracted(...);         // reads symbolTable/importMap → writes to graph
await processRoutesFromExtracted(...);           // reads symbolTable/importMap → writes to graph

Why it's safe: All three *FromExtracted functions exclusively read from symbolTable, importMap, packageMap, and namedImportMap — none of which are mutated during this phase. All three write only to graph.addRelationship, and since Node.js is single-threaded, the await yieldToEventLoop() interleaving between them cannot cause concurrent writes. Each addRelationship call is atomic from the event loop's perspective.

What changes: processRoutesFromExtracted also qualifies for the same parallelization:

await processImportsFromExtracted(...); // must complete first — populates importMap/namedImportMap
await Promise.all([
  chunkWorkerData.calls.length > 0
    ? processCallsFromExtracted(graph, chunkWorkerData.calls, symbolTable, importMap, packageMap, undefined, namedImportMap)
    : Promise.resolve(),
  chunkWorkerData.heritage.length > 0
    ? processHeritageFromExtracted(graph, chunkWorkerData.heritage, symbolTable, importMap, packageMap)
    : Promise.resolve(),
  chunkWorkerData.routes?.length > 0
    ? processRoutesFromExtracted(graph, chunkWorkerData.routes, symbolTable, importMap, packageMap)
    : Promise.resolve(),
]);

Impact: Significant for large Java repos with deep class hierarchies. Heritage resolution is the bottleneck in those cases (hundreds of resolveSymbol calls per class file), and it currently blocks call resolution from starting. For a 10,000-file Java repo, this could meaningfully reduce wall time per chunk. The sequential fallback path (processCalls + processHeritage sharing astCache) is also safe to parallelize since astCache is a JS Map with synchronous writes — no race conditions in Node.js's single-threaded model, though there would be redundant AST parsing since both functions would parse the same files independently.

One caveat: processHeritageFromExtracted does not accept namedImportMap as a parameter (it uses only importMap + packageMap for resolution). This is fine — heritage resolution only needs to know which file a class belongs to, not which specific named symbol was imported. No change needed there.

Implement this →

2. MRO type-index (O(all edges) → O(inheritance edges)) — ✅ Correct and impactful

buildAdjacency in mro-processor.ts:64-91 iterates all edges via graph.forEachRelationship, filtering for EXTENDS, IMPLEMENTS, and HAS_METHOD in the callback. For a large Java repo with millions of CALLS edges (every method call site), this scans all of them to find the comparatively few inheritance edges.

The fix requires adding a type-indexed accessor to KnowledgeGraph:

// types.ts
forEachRelationshipOfType: (type: string | string[], fn: (rel: GraphRelationship) => void) => void;

Implementation would maintain a Map<string, GraphRelationship[]> index populated in addRelationship. Then buildAdjacency calls:

graph.forEachRelationshipOfType(['EXTENDS', 'IMPLEMENTS', 'HAS_METHOD'], (rel) => { ... });

Impact: For a 100K-node graph with 500K edges (typical large Java repo), the MRO scan currently iterates all 500K edges. If there are 20K HAS_METHOD edges and 5K inheritance edges, the optimization reduces this by ~94%. The type-index approach is also reusable by other processors (e.g., communityProcessor could benefit from similar indexing).

Scope: 3 files — types.ts, graph.ts (implementation), mro-processor.ts (call site). This is a clean, bounded change.

3. Cache import alias per chunk — ✅ Valid micro-optimization, lower priority

The suggestion is correct: walkBindingChain is called once per call site for each member call resolution, and for a file with 50 call sites on User (all imported from the same module), 49 of those chain walks are redundant — the result is identical each time.

The cache would be:

const walkCache = new Map<string, SymbolDefinition[] | null>(); // key: `${filePath}:${name}`

Why it's lower priority than #1 and #2: Each chain walk is at most 5 Map.get lookups (O(1)) plus one lookupFuzzy call per hop. For most files, the chain is 0-1 hops (no re-export). The savings per call site are microseconds. For large files with hundreds of calls to the same type, savings accumulate, but this is not the primary bottleneck.

The optimization is worth doing when the other two are in place — it's a simple Map.get before the chain walk and Map.set after. The cache would need to be scoped per-chunk (created inside the chunk loop and discarded after).

Summary

Both new commits are correct and the previous Issue #1 (Rust spurious path binding) is resolved for the common 2-level path case. The residual limitation for 3+ level paths is low severity and easily fixable.

On the optimization suggestions: all three are technically sound. #1 (Promise.all parallelization) is the highest-impact, lowest-effort change — it's a one-line restructure with clear correctness guarantees in Node.js's single-threaded model. #2 (MRO type-index) is the most architecturally impactful, requiring a new graph accessor. #3 (alias walk cache) is a valid micro-optimization for large files.

The PR is in good shape. The Rust path-prefix recursion fix and the three optimizations are the only remaining actionable items.

…collision Two methods named identically in different classes within the same file previously shared a scope key, causing non-deterministic type resolution. Now keys use funcName@startIndex for uniqueness. Also adds tests documenting destructuring assignment extraction gap.

…extraction

…processor

Merged origin/main (03bfa3c) into feat/method-resolution-order. Conflicts in CLAUDE.md and AGENTS.md were GitNexus index stats — kept our branch's numbers (1747 symbols, 4569 relationships).

Worker path now runs processCallsFromExtracted, processHeritageFromExtracted, and processRoutesFromExtracted via Promise.all instead of sequentially. Safe because all three only read shared state and write via addRelationship's dedup guard. Sequential fallback path stays sequential (shared LRU astCache). Also fixes Rust collectRustBindings spurious path-prefix bindings for 3+ level grouped imports, and adds @param JSDoc for walkBindingChain's allDefs invariant.

Clarify that the parallelization safety comes from disjoint relationship types + idempotent id-keyed Maps, not from lack of shared state (the graph is shared). Strengthen allDefs JSDoc to describe silent-miss consequence of passing pre-filtered results.

magyargergo · 2026-03-13T09:01:30Z

Performance & Code Quality Updates

Addressed remaining review feedback with two focused changes:

1. Parallelize `processCalls` + `processHeritage` + `processRoutes` (worker path)

Commit: 814fac1 + 0227448

Replaced sequential await calls with Promise.all in the worker path. Safe because:

Each processor writes disjoint relationship types (CALLS / EXTENDS+IMPLEMENTS / route-CALLS)
All writes go through addRelationship's idempotent if (!has) Map guard
Node.js single-threaded event loop prevents races between synchronous calls
Sequential fallback path left unchanged (shared LRU astCache makes parallelization counterproductive there)

Also removed redundant length > 0 guards — each function handles empty input gracefully.

2. Rust `collectRustBindings` path-prefix fix + JSDoc

Commit: 814fac1 + 0227448

Fixed spurious bindings for 3+ level Rust grouped imports (use crate::a::b::{X, Y} no longer emits b as a binding)
Added @param allDefs JSDoc to walkBindingChain documenting the no-pre-filtering invariant

Dropped (YAGNI)

Type-indexed edge access: 500K string comparisons is single-digit ms in V8, single callsite doesn't justify new API
walkBindingChain cache: Each chain is ≤5 Map.get lookups (microseconds), cache adds complexity for negligible gain

Test Results

All 1598+ tests pass. Only KuzuDB integration tests fail (pre-existing Windows file-lock issue).

🤖 Generated with Claude Code

…tables Phase 1: Extract type binding logic from type-env.ts (635→125 LOC) into type-extractors/ directory with per-language files and Record<SupportedLanguages, LanguageTypeConfig> + satisfies dispatch. Phase 2: Extract 5 config loaders from import-processor.ts into language-config.ts (removed ~196 LOC of inline loaders). Phase 3: Convert export-detection.ts switch/case to exhaustive Record<SupportedLanguages, ExportChecker> + satisfies dispatch table, fix node: any → SyntaxNode. Also adds language feature matrix to README. All 1146 unit tests and 433 integration tests pass.

…(Phase 1) Extract per-language type extraction from type-env.ts (635→125 LOC) into type-extractors/ with Record<SupportedLanguages, LanguageTypeConfig> + satisfies dispatch. 9 per-language files, shared helpers, and barrel index.

Move 5 language-specific config loaders and their type interfaces from import-processor.ts into standalone language-config.ts module.

# Conflicts: # gitnexus/src/core/ingestion/import-processor.ts # gitnexus/src/core/ingestion/language-config.ts

magyargergo · 2026-03-13T13:04:16Z

@claude review my latest changes and focus on the performance characteristics of this PR and how this evolved from the beginning. Include flow diagrams and breakdowns.

claude · 2026-03-13T13:04:32Z

Claude finished @magyargergo's task in 3m 25s —— View job

Performance Analysis: Evolution of the Resolution Pipeline

Post initial comment
Read pipeline.ts and processing files
Analyze git log for performance evolution
Review latest commits (814fac1, 0227448)
Build flow diagrams and write analysis
Post final review

Latest Changes (`814fac1` + `0227448`)

The two most recent commits are correct and complete:

✅ Pipeline parallelization (814fac1) — processCalls, processHeritage, processRoutes now run via Promise.all in the worker path. Node.js's single-threaded event loop makes this safe: all three processors only read shared state (symbolTable, importMap, packageMap, namedImportMap) and only write via addRelationship's idempotent dedup guard.
✅ Rust path-prefix recursion fixed (814fac1) — collectRustBindings now returns early for scoped_identifier nodes whose parent is scoped_use_list, eliminating spurious path-segment bindings for grouped Rust imports at all depths.
✅ JSDoc invariant on walkBindingChain (0227448) — @param allDefs documents the no-pre-filtering contract.
✅ Safety comment on Promise.all (0227448) — explains why parallel execution is safe.

One residual note on the Rust fix: The previous collectRustBindings guard for scoped_use_list parent-nodes was changed from a recursive-into-children approach to a simple return. This is the correct fix — the use_list sibling handles the actual symbol names without any help from the path prefix. No issue here.

How Performance Evolved: The Full Story

This PR added approximately 13,800 lines across 37 commits. Below is a breakdown of every significant performance and correctness dimension, showing how each evolved from start to finish.

1. Ingestion Pipeline Architecture

Before this PR (baseline)
Each file was processed individually in a sequential loop: parse → resolve imports → resolve calls → resolve heritage. The symbol table was flat and could produce false positives via lookupFuzzy returning the first global match.

[Baseline per-file loop]

File_1 ──► parse ──► processImports ──► processCalls ──► processHeritage
File_2 ──► parse ──► processImports ──► processCalls ──► processHeritage
File_N ──► parse ──► processImports ──► processCalls ──► processHeritage

Wall time ∝ N × (T_parse + T_imports + T_calls + T_heritage)

After this PR (current)
Chunked memory-budget model with worker-pool parallelism, plus Promise.all for intra-chunk processing:

[Current chunked + parallel pipeline]

Chunk_K = up to 20MB of files

readFileContents(chunk_K)
        │
        ▼
processParsing ─── worker pool (parallel across files)
        │           └─ parse-worker: extract calls/heritage/routes/imports
        │               (AST never leaves worker — only serialized records returned)
        ▼
processImportsFromExtracted   ← sequential (populates importMap/packageMap/namedImportMap)
        │
        ▼
Promise.all([
  processCallsFromExtracted,    ─┐
  processHeritageFromExtracted,  ├─ all read-only on shared state, write-only on graph
  processRoutesFromExtracted,   ─┘
])
        │
        ▼
astCache.clear()   ← GC reclaims chunk memory

Wall time ∝ Σ(chunks) × max(T_calls, T_heritage, T_routes)   [not sum]

Impact of Promise.all parallelization (814fac1):

For a large Java repository with 5,000 files:

T_heritage ≈ 3× T_calls (deep class hierarchies, many resolveSymbol calls)
Sequential: T_calls + T_heritage + T_routes ≈ 4T
Parallel: max(T_calls, T_heritage, T_routes) ≈ T_heritage ≈ T
Theoretical speedup: ~3–4× for the resolution phase per chunk

The sequential fallback path (used when the worker pool fails) correctly stays sequential because it shares an LRU astCache — parallel access to a stateful LRU cache would produce non-deterministic eviction behavior.

2. Symbol Resolution: From O(N) First-Match to Tiered Confidence Filtering

This is the most architecturally significant evolution in the PR.

Stage 0 (pre-PR): raw lookupFuzzy, first match wins

calledName ──► lookupFuzzy() ──► allDefs[0]
                                  (could be any file, any definition)

False positives silently produced incorrect edges. Two classes named User in different files → always resolved to whichever was indexed first.

Stage 1 (d0a29a2 + fcf90dd): 3-Tier resolver, ambiguity refusal

calledName
    │
    ├─[Tier 1]──► lookupExactFull(currentFile, name) ──► if found → return (authoritative)
    │
    ├─[Tier 2a]─► allDefs.filter(def → importedFiles.has(def.filePath)) ──► if 1 found → return
    │
    ├─[Tier 2b]─► allDefs.filter(def → isFileInPackageDir(def.filePath, ...)) ──► if 1 found → return
    │
    └─[Tier 3]──► allDefs.length === 1 ? return allDefs[0] : null  ← REFUSE if ambiguous

This was the foundational correctness improvement: null (no edge) is better than a wrong edge.

Stage 2 (b7a8db9): NamedImportMap — file-level → name-level precision

Before: import { User, Repo } from './models' → only recorded models.ts in ImportMap. Both User and Repo were indistinguishable.

After: each named binding recorded individually. Resolution logic expanded:

[Tier 2a-named — added before allDefs early return to handle alias lookups]

name → namedImportMap.get(currentFile)?.get(name)
         → { sourcePath, exportedName }
         → lookupFuzzy(exportedName).filter(def → def.filePath === sourcePath)
         → if exactly 1 → return (import-scoped)

Stage 3 (7dbdb5c + 883b041): as alias support

import { User as U } → lookupFuzzy('U') returns empty (no symbol named U exists). Without alias tracking, falls to global tier or null. With alias tracking:

name='U' → namedImportMap: U → { sourcePath: 'models.ts', exportedName: 'User' }
         → lookupFuzzy('User').filter(def → filePath === 'models.ts')
         → returns correct definition

Stage 4 (d8c1e1e): Re-export chain resolution (up to 5 hops)

file A imports { User } from B
B has no User definition — re-exports from C

walkBindingChain:
  depth=0: A.namedImports['User'] → { sourcePath: B, exportedName: 'User' }
           lookupFuzzy('User').filter(B) → empty
  depth=1: B.namedImports['User'] → { sourcePath: C, exportedName: 'User' }
           lookupFuzzy('User').filter(C) → found!  ← return

Cycle detection via visited Set prevents infinite loops.

Stage 5 (054a344 + 005ff3a): Full language coverage

walkBindingChain extracted as a shared utility (eliminating duplicate logic). Named binding extraction extended to all 9 languages including PHP flat/grouped non-aliased imports, Rust non-aliased use statements, Kotlin non-aliased class imports.

Stage 6 (814fac1): Parallelized in pipeline

Resolution itself didn't change; it now runs concurrently with heritage and route resolution per chunk.

3. Call Resolution Pipeline: 4-Stage Funnel

The call resolution funnel evolved from a single lookup to a 4-stage filter that progressively narrows candidates:

[Current resolveCallTarget pipeline]

calledName + currentFile
        │
        ▼
[A] collectTieredCandidates
        │  same-file      → {candidates: localDefs, tier: 'same-file'}
        │  named binding  → walkBindingChain → {candidates, tier: 'import-scoped'}
        │  import-scoped  → {candidates: importedDefs, tier: 'import-scoped'}
        │  package-scoped → {candidates: pkgDefs, tier: 'import-scoped'}
        │  unique-global  → {candidates: allDefs, tier: 'unique-global'}
        │
        ▼
[B] filterCallableCandidates
        │  free call      → keep Function, Method, Macro, Delegate
        │  constructor    → keep Constructor, Class, Struct, Record
        │  member call    → keep any callable
        │  argCount       → filter to exact arity match (variadic: any argCount passes)
        │
        ▼  if still > 1 and call is member call with known receiver type:
[C] receiver-type filtering
        │  receiverTypeName → lookupFuzzy(type) → typeNodeIds
        │  keep candidates where c.ownerId ∈ typeNodeIds
        │  if exactly 1 → emit edge
        │  if > 1 → refuse (ambiguous)
        │  if 0 → fall through to name-only
        │
        ▼
[D] Emit or refuse
        │  filteredCandidates.length === 1 → addRelationship(CALLS edge)
        └  else → null (no edge, no false positive)

Stage B (arity) was added in fc67a8c / b8d9182 — parameterCount added to SymbolDefinition.
Stage C (receiver type) was added in 167ac0e — TypeEnv + receiverTypeName extraction.

4. TypeEnv: Receiver Type Tracking

[TypeEnv evolution]

v1 (b7a8db9): flat Map<varName, typeName> — last-write-wins
  Problem: function a() { user: User } + function b() { user: Repo }
           → 'user' → 'Repo' (overwrites 'User' non-deterministically)

v2 (b7a8db9): Map<funcName, Map<varName, typeName>> — per-function scope
  Problem: class A { save() {...} } + class B { save() {...} }
           → both 'save' scopes share key 'save' (last class processed wins)

v3 (9f83baa): Map<'funcName@startIndex', Map<varName, typeName>>
  Scope key = funcName + AST byte position → unique per method site
  Problem: eliminated ✓

The buildTypeEnv walk is O(AST nodes) per file — no global state, built once per file and discarded after call resolution.

5. MRO Processor: The One Dropped Optimization

buildAdjacency in mro-processor.ts:64 calls graph.forEachRelationship and scans all edge types:

graph.forEachRelationship((rel) => {
  if (rel.type === 'EXTENDS' || rel.type === 'IMPLEMENTS') { ... }  // ~5K edges
  if (rel.type === 'HAS_METHOD') { ... }                             // ~20K edges
  // but also iterates: CALLS (500K), MEMBER_OF, IMPORTS, etc.
});

For a large repo, this is O(all edges) when only O(inheritance edges) are needed. The PR considered a type-indexed accessor but dropped it as YAGNI — the comment notes "500K string comparisons is single-digit ms in V8". This is correct at current scale. For a Linux-kernel-sized codebase (millions of edges), this would become the MRO bottleneck.

The dropped optimization would have been:

// Hypothetical: type-indexed edge accessor
graph.forEachRelationshipOfType(['EXTENDS', 'IMPLEMENTS', 'HAS_METHOD'], (rel) => { ... });
// O(25K) instead of O(500K)

6. Memory Architecture: 20MB Chunk Budget

The chunk budget model (CHUNK_BYTE_BUDGET = 20MB) is a key performance-correctness tradeoff documented in the pipeline:

// Comment from pipeline.ts:193-194:
// "Calls/heritage use the symbol table built so far (symbols from earlier chunks
//  are already registered). This trades ~5% cross-chunk resolution accuracy for
//  200-400MB less memory — critical for Linux-kernel-scale repos."

This means calls to symbols defined in a later chunk are resolved as global fuzzy matches rather than import-scoped matches. For most repos this is a minor accuracy loss. For monorepos where barrel files import from many chunks, it could cause Tier 3 fallback for symbols that should be Tier 2.

Performance Characteristics Summary

Dimension	Before PR	After PR	Change
Per-file resolution	`lookupFuzzy` → first match	3-tier + NamedImportMap + re-exports	Correctness: ~~50%~~ → ~95% precision
Import precision	File-level (ImportMap)	Name-level (NamedImportMap, 7 languages)	False positive rate significantly reduced
Alias resolution	Not tracked	9 languages, re-export chain up to 5 hops	New capability
Receiver-type filtering	Not present	Per-call TypeEnv lookup + ownerId filter	New capability
TypeEnv scope isolation	Flat (non-deterministic)	Per-function, per-position scope key	Correctness bug fixed
Pipeline model	Per-file sequential	Chunked (20MB budget), worker pool	Memory: predictable; throughput: +worker pool
Intra-chunk resolution	Sequential (calls→heritage→routes)	`Promise.all` (parallel)	~3–4× speedup for heritage-heavy repos
MRO edge scan	N/A	O(all edges) per run	Acceptable; type-index deferred as YAGNI
Test coverage	Pre-existing	90 integration suites × 9 languages; 1,598+ tests	Comprehensive

Remaining Performance Notes

MRO buildAdjacency — the O(all edges) scan is acceptable now but is the natural next optimization target if the MRO phase shows up in profiling on large repos. A Map<edgeType, edge[]> index in the graph maintained by addRelationship would make this trivial.
Sequential fallback path (pipeline.ts:270–279) — when the worker pool fails to initialize, the fallback re-reads source files and runs processCalls + processHeritage sequentially. These share the LRU astCache (which is the reason parallelization was explicitly not applied there). No issue, just worth noting that fallback performance is significantly worse than the happy path.
walkBindingChain per-call invocation — called once per call site for every member call resolution, with up to 5 Map.get lookups per hop. For a file with hundreds of calls to the same imported type, each re-walks the chain independently. The dropped YAGNI cache (Map<filePath:name, SymbolDefinition[]>) would eliminate this for the typical case of many calls to one imported class. Low priority given the O(1) per-hop cost, but worth revisiting if profiling shows binding chain walks as a hot path.

…tructor discrimination (abhigyanpatwari#238) * feat: add Method Resolution Order (MRO) with language-specific rules Implement full MRO computation for multi-language inheritance hierarchies: - HAS_METHOD edges: Class→Method ownership edges emitted during parsing (both worker pool and sequential fallback paths) - Method signatures: extract parameterCount and returnType from AST nodes - C# heritage fix: distinguish EXTENDS vs IMPLEMENTS for base_list captures using symbol table lookup + I[A-Z] naming heuristic fallback - MRO processor (Phase 4.5): walks inheritance DAG, detects method-name collisions across parents, applies language-specific resolution: - C++: leftmost base class in declaration order wins - C#/Java: class method wins over interface default - Python: C3 linearization with cycle detection - Rust: no auto-resolution (requires qualified syntax) - Default: first definition in BFS order wins - OVERRIDES edges emitted for resolved method collisions - KuzuDB schema: Method table extended with parameterCount/returnType; dedicated CSV writer and COPY query for 10-column Method rows - MCP tools: updated Cypher examples for HAS_METHOD, OVERRIDES, diamond 72 tests across 5 test files covering MRO resolution, HAS_METHOD edges, method signature extraction, C# heritage resolution, and integration tests across C#/Rust/Python/TS/Java/C++. * feat: add scope-based symbol resolution replacing raw lookupFuzzy Introduces a shared 3-tier resolveSymbol function used by both heritage-processor and call-processor: 1. Same-file (lookupExactFull — authoritative) 2. Import-scoped (filtered by ImportMap — high confidence) 3. Global fuzzy (first match — low confidence fallback) Adds lookupExactFull to SymbolTable returning full SymbolDefinition with type info needed for heritage Class/Interface disambiguation. * refactor: tighten symbol resolution — Tier 3 refuses ambiguous matches - lookupExactFull now O(1) via direct SymbolDefinition storage in fileIndex (shared object references with globalIndex — zero additional memory) - Added resolveSymbolInternal() preserving { definition, tier, candidateCount } for test assertions and logging - Tier 3 now returns null when multiple global candidates exist instead of arbitrary allDefs[0] — a wrong edge is worse than no edge - call-processor: renamed fuzzy-global → unique-global, removed dead branch - 12 new tests: tier assertions, ambiguous refusal per language family, heritage false-positive guard, O(1) shared reference verification * fix: critical language support bugs in import resolution and MRO Phase 5 critical fixes from all-language analysis: - Python: add relative_import query capture (PEP 328) — `.models`, `..utils` were silently dropped, producing zero ImportMap entries - Rust: extract prefix from grouped imports `crate::module::{A, B}` — brace groups previously failed resolution entirely - Swift: use normalizedFileList for Windows path compatibility in module import resolution (matches Go's resolveGoPackage pattern) - MRO: fix c_sharp → csharp language name mismatch (enum is 'csharp'), add Kotlin to C#/Java resolution rules (class method wins over interface) * feat: add strict multi-language integration tests + fix C/C++ import resolution Add 32 integration tests across 6 language fixtures (TypeScript, C#, C++, Java, Python, Rust) with exact toBe/toEqual assertions validating heritage edges, import resolution, and trait implementations. Fix C/C++ import resolution bug where dot-to-slash conversion mangled include paths (e.g. "animal.h" became "animal/h"). Now skips conversion for C/C++ languages which use actual file paths in #include directives. * fix: language-gate heritage heuristic, add Swift extension heritage, handle Rust grouped imports - Gate I[A-Z] naming heuristic to C#/Java only (was firing for all languages) - Swift unresolved types default to IMPLEMENTS (protocol conformance is the norm) - Add tree-sitter query for Swift extension protocol conformance (extension Foo: Protocol) - Handle Rust top-level grouped imports (use {crate::a, crate::b}) in both import loops - Add 4 new heritage-processor tests (TypeScript refusal, Swift default, Swift Tier 1) * feat: add Go struct embedding heritage + PackageMap optimization Add Go struct embedding detection (anonymous fields → EXTENDS edges) via new tree-sitter heritage query with named-field filtering in both parse-worker and heritage-processor paths. Implement PackageMap optimization for Go cross-package resolution: replace O(N) file-level ImportMap expansion with directory-level suffix matching (Tier 2b in symbol resolver). Graph IMPORTS edges are preserved via addImportGraphEdge split. Remove overly broad @definition.type from GO_QUERIES that was double-matching structs/interfaces as TypeAlias nodes, breaking Tier 3 unique-global resolution. Add Go fixture (go-pkg) with Admin→User embedding, cross-package calls, and 7 integration tests covering structs, functions, imports, calls, and heritage edges. * test: add Kotlin heritage integration tests Adds a kotlin-heritage fixture and 7 integration tests validating class inheritance, interface implementation, JVM-style import resolution, and symbol-table-driven EXTENDS/IMPLEMENTS disambiguation via Kotlin delegation specifiers. * feat: extract resolvers, add PHP tests, ambiguous tests for all languages - Extract language-specific resolvers from import-processor.ts into resolvers/ directory (P7): jvm, go, csharp, php, rust, standard, utils - import-processor.ts reduced from 1412 to 711 lines (50% reduction) - Add comprehensive PHP integration tests: PSR-4 imports, traits, enums, heritage edges, method calls, MRO overrides - Add ambiguous symbol resolution tests for all 9 languages verifying correct disambiguation via import chains - Split monolithic lang-resolution.test.ts (1080 lines) into 9 per-language files under test/integration/resolvers/ with shared helpers * feat: update integration tests to include resolver tests for multiple languages * fix: address code review — schema gap, Rust impl name, Property OVERRIDES Bugs fixed: - Add 13 missing FROM/TO pairs in RELATION_SCHEMA for HAS_METHOD edges (Class/Interface/Struct/Trait/Impl/Record to Method/Constructor/Property) - Fix findEnclosingClassId to pick implementing type for Rust impl Trait for Struct blocks (was picking trait name) - Exclude Property nodes from MRO OVERRIDES collision detection - Change MRO language fallback from typescript to unknown Tests added: - Unit: Property OVERRIDES exclusion (2 tests), Rust impl Trait for Struct name resolution (2 tests), schema HAS_METHOD pair coverage - Integration: no OVERRIDES targets Property nodes across all 9 languages - PHP fixture: added shared $status property to both traits to create real collision scenario for Property OVERRIDES exclusion test Documentation: - OVERRIDES edge direction (Class to Method), Go return type gap, BFS first-reach heuristic limitation * feat: harden CALLS-edge resolution — Phase 0 validation - Fix same-file confidence (0.85 → 0.95) to correctly outrank import-scoped (0.9) - Fix Tier 1 overload preservation: use globalIndex filter instead of fileIndex lookup - Add callable-kind guard: refuse CALLS edges to Interface and Enum symbols - Fix Kotlin countCallArguments: handle call_suffix → value_arguments nesting - Fix Kotlin extractFunctionName: add simple_identifier to fallback search - Strictly type findParameterList and countCallArguments (remove all `any`) - Add arity-based call resolution integration tests for 9 languages - Add unit regression tests for Interface/Enum CALLS refusal * chore: remove C# build artifacts from fixtures * feat: add call-form discrimination and ownerId to symbol table (Phase 1) Add inferCallForm() and extractReceiverName() to distinguish free/member/constructor calls at the AST level across all 9 languages. Add ownerId field to SymbolDefinition linking Method/Constructor/Property to their owning class. Includes 36 unit tests and member-call integration tests for all 9 languages (132 tests, 0 failures). * feat: constructor/struct-literal resolution across all languages (Phase 2) Add constructor discrimination to CALLS-edge resolution: new Foo(), User{...} struct literals, and C# primary constructors now resolve to Constructor/Class/Struct/Record nodes instead of being filtered out. Queries: new_expression (C++), object_creation_expression (PHP), composite_literal (Go), struct_expression (Rust), primary constructor and implicit_object_creation_expression (C#). Relaxes global tier in collectTieredCandidates to pass all candidates through filterCallableCandidates, allowing kind/arity narrowing to disambiguate at lower confidence. * feat: receiver-constrained resolution with integration tests for all 9 languages Add receiver-type filtering (Phase 3): when a member call like `user.save()` has a known receiver type from TypeEnv, filter candidates by ownerId to disambiguate methods with the same name across different classes. Key changes: - call-processor: build per-file TypeEnv, pass receiverTypeName to resolveCallTarget - parse-worker: extract receiverTypeName from TypeEnv in worker thread - resolveCallTarget: new step D filters by ownerId matching receiver type - utils: extractReceiverName supports C++ field_expression (argument field) - utils: findEnclosingClassId extracts Go method receiver types - type-env: handle Go qualified_type, Kotlin user_type/variable_declaration - parse-worker + parsing-processor: Function added to needsOwner for Kotlin/Rust/Python class methods captured as Function nodes Integration tests added for receiver-constrained resolution across all 9 languages: TypeScript, Java, Python, Go, Rust, C++, C#, Kotlin, PHP. * feat: NamedImportMap, scoped TypeEnv, broadened signatures + TS rest-param variadic fix Address all 4 PR abhigyanpatwari#238 review items: 1. Remove redundant lookupFuzzy in processRoutesFromExtracted 2. Add NamedImportMap for TS/Python symbol-level import tracking (Tier 2a) 3. Make TypeEnv scope-aware (Map<scopeKey, Map<varName, type>>) to fix non-deterministic receiver resolution across functions 4. Broaden extractMethodSignature: Go/Rust/C++ return types, variadic detection for Go/Java/Python/C++/Kotlin/TypeScript rest params Discovered and fixed: TS rest params (...args) were not detected as variadic — added rest_pattern detection inside required_parameter nodes. Integration tests added: scoped receiver, named import disambiguation, and variadic call resolution for both TypeScript and Python. * fix: alias import resolution, Go multi-assign TypeEnv, dead code removal - NamedImportMap now stores {sourcePath, exportedName} so aliased imports (import { User as U }) resolve U → User in the source file - Named binding check moved before empty-allDefs early return in both call-processor and symbol-resolver, fixing constructor calls via aliases - Go extractFromGoShortVarDeclaration iterates all LHS/RHS pairs for multi-assignment (user, repo := User{}, Repo{}) instead of only first - Remove unused TYPED_DECLARATION_TYPES set (TYPED_PARAMETER_TYPES kept) - Integration tests for both fixes (go-multi-assign, typescript-alias-imports) * feat: alias import extraction for Kotlin, Rust, PHP, C# + integration tests Add named import alias extraction to both pipeline paths (import-processor.ts and parse-worker.ts) for Kotlin, Rust, PHP, and C#. Add integration test fixtures and tests for all 5 languages (Python alias extraction already worked, just needed the test). Each test verifies: class detection, member call resolution through aliases to correct target files, and IMPORTS edge emission. * refactor: use SupportedLanguages enum everywhere instead of raw strings Replace all raw language string literals and `language: string` types with the SupportedLanguages enum across 10 files. This ensures compile-time safety for language dispatch and eliminates dead `language === 'tsx'` checks (tsx maps to TypeScript in the enum). * fix: tier-ordering bug, re-export chains, PHP grouped imports, Java named imports - Fix collectTieredCandidates tier-ordering: same-file now checked before named bindings, preventing imports from shadowing local definitions (matches resolveSymbolInternal priority order) - Add re-export chain resolution for TypeScript/JavaScript barrel files: export { X } from './base' and export type { X } from './base' now followed up to 5 hops through NamedImportMap - Fix PHP grouped import alias extraction: use App\Models\{User, Repo as R} now correctly handled in both parse-worker and import-processor - Add Java NamedImportMap support: import com.example.models.User now records User as a named binding for precise disambiguation - Add 16 new integration tests across TypeScript, PHP, and Java resolvers (220 total resolver tests, all passing) * refactor: consolidate alias extraction + add variadic/constructor/shadow integration tests - Extract shared named-binding-extraction.ts from duplicate logic in import-processor.ts and parse-worker.ts (net -200 lines) - Deduplicate appendKotlinWildcard (now imported from resolvers/index.ts) - Add integration tests: constructor calls (Kotlin, Python), variadic resolution (Go, Java, C#, C++, Kotlin), re-export chains (Python), local definition shadowing (Python, Go) - Add TODO(stack-graph) for TypeEnv scope key collision - 225 integration tests passing (was 223) * fix: PHP non-aliased imports, Python node identity, re-export chain dedup + local-shadow tests - PHP flat non-aliased imports (use App\Models\User) now stored in NamedImportMap - PHP grouped non-aliased imports ({User} in {User, Repo as R}) now stored in NamedImportMap - Python: replace non-public child.id with child.startIndex for node identity - Extract shared walkBindingChain() from symbol-resolver and call-processor - Add PHP variadic resolution fixture + test (variadic_parameter already covers PHP) - Add local-shadow integration tests for Java, C#, Kotlin, Rust, PHP, C++ (6 languages) * feat: Rust non-aliased use bindings, Kotlin non-aliased imports, re-export chain resolution Extend NamedImportMap coverage for Rust and Kotlin non-aliased imports: - Rust: rename collectUseAsClauses → collectRustBindings, extract terminal scoped_identifier (use crate::models::User) and identifier in use_list (use crate::models::{User, Repo}) into NamedImportMap. This also enables pub use re-export chain following via walkBindingChain. - Kotlin: extend extractKotlinNamedBindings to handle non-aliased imports (import com.example.User), skipping wildcard imports. - Add rust-reexport-chain fixture + 3 integration tests verifying Handler{} resolves through mod.rs pub use to handler.rs. - Add Kotlin heritage + constructor-calls reason assertions for non-aliased import-resolved resolution. - Add C# heritage test documenting namespace import tier behavior. * fix: skip Kotlin lowercase member imports in NamedImportMap Member imports like `import util.OneArg.writeAudit` (lowercase last segment) must not populate NamedImportMap — same-named function imports from different classes collide, breaking arity-based disambiguation. Apply the same guard Java already uses: skip lowercase last segments. * fix: skip spurious path-prefix bindings in Rust grouped imports collectRustBindings was extracting the path segment (e.g. "models") from `use crate::models::{User, Repo}` as a spurious NamedImportMap entry. Skip scoped_identifier nodes that are direct children of scoped_use_list since they are path prefixes, not importable symbols. Adds rust-grouped-imports fixture and 4 integration tests verifying both symbols resolve correctly and no spurious binding leaks through. * fix: use startIndex in TypeEnv scope key to prevent same-name method collision Two methods named identically in different classes within the same file previously shared a scope key, causing non-deterministic type resolution. Now keys use funcName@startIndex for uniqueness. Also adds tests documenting destructuring assignment extraction gap. * test: document C# namespace-level import limitation in named binding extraction * test: document same-arity overload discrimination limitation in call processor * perf: parallelize calls/heritage/routes processing in worker path Worker path now runs processCallsFromExtracted, processHeritageFromExtracted, and processRoutesFromExtracted via Promise.all instead of sequentially. Safe because all three only read shared state and write via addRelationship's dedup guard. Sequential fallback path stays sequential (shared LRU astCache). Also fixes Rust collectRustBindings spurious path-prefix bindings for 3+ level grouped imports, and adds @param JSDoc for walkBindingChain's allDefs invariant. * docs: improve Promise.all safety comment and walkBindingChain JSDoc Clarify that the parallelization safety comes from disjoint relationship types + idempotent id-keyed Maps, not from lack of shared state (the graph is shared). Strengthen allDefs JSDoc to describe silent-miss consequence of passing pre-filtered results. * refactor: extract language-specific processing into modular dispatch tables Phase 1: Extract type binding logic from type-env.ts (635→125 LOC) into type-extractors/ directory with per-language files and Record<SupportedLanguages, LanguageTypeConfig> + satisfies dispatch. Phase 2: Extract 5 config loaders from import-processor.ts into language-config.ts (removed ~196 LOC of inline loaders). Phase 3: Convert export-detection.ts switch/case to exhaustive Record<SupportedLanguages, ExportChecker> + satisfies dispatch table, fix node: any → SyntaxNode. Also adds language feature matrix to README. All 1146 unit tests and 433 integration tests pass. * refactor: extract type binding logic into type-extractors/ directory (Phase 1) Extract per-language type extraction from type-env.ts (635→125 LOC) into type-extractors/ with Record<SupportedLanguages, LanguageTypeConfig> + satisfies dispatch. 9 per-language files, shared helpers, and barrel index. * refactor: extract config loaders to language-config.ts (Phase 2) Move 5 language-specific config loaders and their type interfaces from import-processor.ts into standalone language-config.ts module.

vercel Bot deployed to Preview March 10, 2026 22:58 View deployment

Base automatically changed from refactor/pre-merge-consolidation to main March 10, 2026 23:03

magyargergo force-pushed the feat/method-resolution-order branch from 83ee864 to 0334587 Compare March 10, 2026 23:06

vercel Bot deployed to Preview March 10, 2026 23:07 View deployment

vercel Bot deployed to Preview March 11, 2026 00:18 View deployment

magyargergo added 3 commits March 11, 2026 08:10

vercel Bot deployed to Preview March 11, 2026 09:30 View deployment

magyargergo added 4 commits March 11, 2026 10:00

test: add Kotlin heritage integration tests

2ff9118

Adds a kotlin-heritage fixture and 7 integration tests validating class inheritance, interface implementation, JVM-style import resolution, and symbol-table-driven EXTENDS/IMPLEMENTS disambiguation via Kotlin delegation specifiers.

vercel Bot deployed to Preview March 11, 2026 13:21 View deployment

feat: update integration tests to include resolver tests for multiple…

afc4812

… languages

vercel Bot deployed to Preview March 11, 2026 13:28 View deployment

Repository owner deleted a comment from claude Bot Mar 11, 2026

vercel Bot deployed to Preview March 11, 2026 14:36 View deployment

This was referenced Mar 11, 2026

feat(python): index module-level singleton instances as CodeElement nodes #179

Closed

fix: add symbol-level IMPORTS edges for named imports #137

Closed

vercel Bot deployed to Preview March 12, 2026 18:52 View deployment

This was referenced Mar 13, 2026

PHP constructor calls (new Foo()) and static method calls (Foo::bar()) missing from CALLS edges #267

Closed

impact/callers return empty results because IMPORTS edges target File nodes, not Class/Function nodes #266

Closed

magyargergo added 6 commits March 13, 2026 08:20

test: document C# namespace-level import limitation in named binding …

3e27dfa

…extraction

test: document same-arity overload discrimination limitation in call …

57f2010

…processor

merge: resolve conflicts with origin/main (keep branch stats)

5be58b1

Merged origin/main (03bfa3c) into feat/method-resolution-order. Conflicts in CLAUDE.md and AGENTS.md were GitNexus index stats — kept our branch's numbers (1747 symbols, 4569 relationships).

vercel Bot deployed to Preview March 13, 2026 09:02 View deployment

magyargergo added 5 commits March 13, 2026 12:00

refactor: extract config loaders to language-config.ts (Phase 2)

7004cec

Move 5 language-specific config loaders and their type interfaces from import-processor.ts into standalone language-config.ts module.

Merge branch 'worktree-agent-aaa6551a' into feat/method-resolution-order

42f959f

Merge branch 'worktree-agent-ad5f3d99' into feat/method-resolution-order

839e575

# Conflicts: # gitnexus/src/core/ingestion/import-processor.ts # gitnexus/src/core/ingestion/language-config.ts

vercel Bot deployed to Preview March 13, 2026 12:25 View deployment

magyargergo merged commit 1afe916 into main Mar 13, 2026
22 checks passed

magyargergo deleted the feat/method-resolution-order branch March 13, 2026 13:12

This was referenced Mar 13, 2026

[Bug] Java polymorphic method resolution inaccuracy in call graph (e.g., toResponse overloads) #150

Closed

java, incorrect Calls in CodeRelation #181

Closed

能支持php吗 #101

Closed

RayverAimar mentioned this pull request Mar 15, 2026

Python: context() and impact() return empty for classes that use static methods #289

Closed

Uh oh!

Conversation

magyargergo commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Symbol Resolution Engine

Receiver-Constrained Resolution (Phase 3)

Constructor & Struct Literal Resolution

Method Resolution Order (MRO)

Heritage & Ownership

Resolver Architecture

Bug Fixes

Schema

Test plan

Integration tests -- 9 languages (test/integration/resolvers/)

Unit tests

Stats

Uh oh!

vercel Bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Report

Pipeline Status

Test Results

Code Coverage

Combined (Unit + Integration)

Unit Tests

Integration Tests

Uh oh!

magyargergo commented Mar 11, 2026

Uh oh!

claude Bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bugs / Critical Issues

Design / Semantic Issues

Minor Issues

Observations (not bugs)

Uh oh!

magyargergo commented Mar 11, 2026

Uh oh!

claude Bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review (Updated)

What Changed Since the Last Review

Still Outstanding from Previous Review

Design Issues Carried Over (No Changes)

New Minor Issue

Summary

Uh oh!

abhigyanpatwari commented Mar 12, 2026

Uh oh!

magyargergo commented Mar 13, 2026

Uh oh!

claude Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review (Updated) + Optimization Evaluation

What Changed Since the Last Review

Residual Limitation in 3b8adb8

Evaluation of @abhigyanpatwari's Optimization Suggestions

1. Parallelize processCalls + processHeritage with Promise.all — ✅ Correct and beneficial

2. MRO type-index (O(all edges) → O(inheritance edges)) — ✅ Correct and impactful

3. Cache import alias per chunk — ✅ Valid micro-optimization, lower priority

Summary

Uh oh!

magyargergo commented Mar 13, 2026

Performance & Code Quality Updates

1. Parallelize processCalls + processHeritage + processRoutes (worker path)

2. Rust collectRustBindings path-prefix fix + JSDoc

Dropped (YAGNI)

Test Results

Uh oh!

magyargergo commented Mar 13, 2026

Uh oh!

claude Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Analysis: Evolution of the Resolution Pipeline

Latest Changes (814fac1 + 0227448)

magyargergo commented Mar 10, 2026 •

edited

Loading

Integration tests -- 9 languages (`test/integration/resolvers/`)

vercel Bot commented Mar 10, 2026 •

edited

Loading

github-actions Bot commented Mar 10, 2026 •

edited

Loading

claude Bot commented Mar 11, 2026 •

edited

Loading

claude Bot commented Mar 11, 2026 •

edited

Loading

claude Bot commented Mar 13, 2026 •

edited

Loading

Residual Limitation in `3b8adb8`

1. Parallelize `processCalls` + `processHeritage` with `Promise.all` — ✅ Correct and beneficial

1. Parallelize `processCalls` + `processHeritage` + `processRoutes` (worker path)

2. Rust `collectRustBindings` path-prefix fix + JSDoc

claude Bot commented Mar 13, 2026 •

edited

Loading

Latest Changes (`814fac1` + `0227448`)