Skip to content

feat(ingestion): language-agnostic call extractor with config+factory pattern#877

Merged
magyargergo merged 7 commits into
mainfrom
copilot/create-language-agnostic-call-extractor
Apr 16, 2026
Merged

feat(ingestion): language-agnostic call extractor with config+factory pattern#877
magyargergo merged 7 commits into
mainfrom
copilot/create-language-agnostic-call-extractor

Conversation

Copilot AI commented Apr 16, 2026

Copy link
Copy Markdown
Contributor
  • Fix prettier formatting issues in 4 files flagged by CI quality / format check
  • Verified npx prettier --check . passes on all changed files
  • Verified ESLint passes (exit code 0, only pre-existing warnings)
  • Verified TypeScript typecheck passes
  • Fix misleading callExtractor doc comment in language-provider.ts:162 — changed from "falls back to inline extraction" to "if unset, no calls are extracted for this language. All tree-sitter providers MUST supply this."
  • Add doc comment in generic.ts noting that extractLanguageCallSite is called on both invocation paths, so hooks must be idempotent
  • Expand C# test to verify full behavioral extraction (calledName, callForm, receiverName, typeAsReceiverHeuristic) for Console.WriteLine() and add lowercase receiver test
  • Fix misleading test name: "does not set typeAsReceiverHeuristic" → "sets typeAsReceiverHeuristic flag even for lowercase receivers"
  • Remove stale reference to deleted extractParsedCallSite() in parse-worker comment
  • All prettier, ESLint, TypeScript typecheck pass

@vercel

vercel Bot commented Apr 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 16, 2026 10:30am

Request Review

…e configs, and wire into providers

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
…all-processor, delete call-sites/

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Copilot AI changed the title [WIP] Add language-agnostic call extractor using config and factory pattern feat(ingestion): language-agnostic call extractor with config+factory pattern Apr 16, 2026
Copilot AI requested a review from magyargergo April 16, 2026 09:00
@magyargergo magyargergo marked this pull request as ready for review April 16, 2026 09:02
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

  • gitnexus/ → CLI + MCP
  • gitnexus-web/
  • gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.


CONTEXT TO LOAD FIRST
Read these before reviewing:

  • AGENTS.md
  • GUARDRAILS.md
  • CONTRIBUTING.md
  • TESTING.md
  • ARCHITECTURE.md

Additional context:


PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.


REVIEW RULES

  • Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
  • Every finding must include path:line.
  • If you make a behavioral claim, cite the code that proves it.
  • If you make a performance claim, explain the mechanism.
  • If something cannot be verified from the diff alone, explicitly say so.
  • Distinguish clearly between:
    • verified issue
    • plausible risk
    • unverified concern
  • Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
  • Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

  • BLOCKING → must be fixed before merge
  • NON-BLOCKING → valid issue, but merge may still be acceptable
  • NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

  • Does the implementation actually satisfy the PR claim?
    • ManifestExtractor is truly invoked
    • config.links produces non-zero cross-links where expected
  • Resolver contracts are preserved:
    • resolveSymbol remains exact-match
    • label-scoped Cypher remains correct per contract type
    • flag any regression toward fuzzy or unscoped matching
  • Graph schema integrity is preserved:
    • no silent changes to node labels
    • no silent changes to edge types
    • no silent changes to ID generation (generateId)
  • Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

  • naming quality
  • local cohesion
  • dead code
  • unnecessary abstraction
  • hidden control flow
  • confusing indirection
  • adherence to repo conventions:
    • direct imports from gitnexus-shared
    • no barrel re-export regressions
    • no // removed comments
    • no unused re-exports
  • no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

  • Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
  • Is there a regression guard for 0-link → N-link behavior?
  • Are assertions meaningful rather than tautological?
  • Are fixtures realistic for manifest inputs?
  • If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
  • Is there evidence the expected validation path would pass for staged gitnexus/ files?
    • tsc --noEmit
    • vitest run --project default
      If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

  • hot-path overhead in ingestion/group sync
  • excess allocations per manifest entry
  • redundant Cypher round-trips
  • missed batching or missed parallelism (Promise.all) where it materially matters
  • O(n²) or repeated lookup patterns on large repos
  • memoization tradeoffs:
    • correctness
    • invalidation
    • bounded vs unbounded memory growth
      Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

  • Windows/cross-platform safety:
    • stream lifecycle
    • FD/file handle lifecycle
    • path separator assumptions
    • anything resembling prior ENOTEMPTY-style lifecycle regressions
  • LadybugDB single-writer invariant is preserved
  • Embeddings preservation:
    • no silent breakage of --embeddings
    • .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
  • MCP contracts remain compatible:
    • group_*
    • query
    • context
    • impact
    • detect_changes
    • rename
    • cypher
      Flag any schema or contract break without migration note
  • staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
  • rollback safety:
    • can this PR be reverted safely without re-analyze?
    • if not, explain why

6. Maintainability

Check:

  • Does the change respect Pipeline Phase DAG boundaries?
  • Is cross-phase coupling being introduced?
  • Are manifest-extractor entry points discoverable from group/sync.ts?
  • Would the next contributor know where to extend this?
  • Is shared logic placed appropriately:
    • in gitnexus-shared/
    • or gitnexus/src/core/**/utils.ts
    • and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

  • no secrets, tokens, or machine-specific paths committed
  • no blind find-and-replace renames
  • if a symbol rename occurred, rename MCP tool or equivalent careful review was used
  • upstream impact was considered for modified shared symbols
  • no HIGH/CRITICAL impact left unaddressed
  • PR title follows conventional commit style + [area] Short description
  • PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

  • Evidence: path:line, path:line
  • Why it matters: ...
  • Recommendation: ...
  • Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

  • No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

  • ✅ Ready to merge — no blocking issues.
  • ⚠️ Merge after addressing blocking items.
  • ❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.


IMPORTANT REVIEW BEHAVIOR

  • Be strict on correctness and contracts.
  • Be conservative about operational risk.
  • Be explicit about uncertainty.
  • Do not suggest unrelated refactors.
  • Do not turn nits into blockers.
  • Do not reward intent; review actual implementation.

@github-actions

github-actions Bot commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 6m 7s —— View job


PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Tasks:

  • Post initial comment
  • Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Read changed files in the diff
  • Phase 1: Define production-ready bar
  • Phase 2: Review PR against that bar

Production-ready bar for this PR

Based on AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, and ARCHITECTURE.md, a production-ready refactor of the parse-phase call extraction must satisfy:

  • Behavioral parity: All call sites that were previously emitted must still be emitted with the same calledName, callForm, receiverName, receiverTypeName, argCount, and mixed-chain structure. No silent drops, no new false edges.
  • Graph schema intact: No changes to node labels, edge types, or generateId call signatures.
  • All 15 providers wired: callExtractor must be defined on every tree-sitter LanguageProvider; an undefined extractor silently drops all calls from that language.
  • Unit tests under gitnexus/test/unit/ covering the factory, per-language extraction, and provider wiring. Real tree-sitter grammars; no mocked AST stubs for behavioral tests.
  • tsc --noEmit + vitest run pass in gitnexus/.
  • No drive-by refactors beyond the stated scope (GUARDRAILS.md § Scope).
  • No secrets, machine-specific paths, or PolyForm metadata changes.
  • Disclosed behavioral changes: Any output change to ExtractedCall fields must be described in the PR.

Findings

1. Correctness & functional completeness

## NON-BLOCKING Undisclosed behavioral change: typeAsReceiverHeuristic now applies to C# and Kotlin regular member calls in parse-worker

  • Evidence:
    • Old extractParsedCallSite (call-sites/extract-language-call-site.ts, deleted): only non-null for Java method_reference nodes; C# and Kotlin always returned null.
    • Old parse-worker generic path (commit d024119, lines ~1830–1877): no type-as-receiver heuristic at all for regular member calls.
    • New csharpCallConfig (call-extractors/configs/csharp.ts:8): typeAsReceiverHeuristic: true.
    • New kotlinCallConfig (call-extractors/configs/jvm.ts:57): typeAsReceiverHeuristic: true.
    • New parse-worker Path 2 (workers/parse-worker.ts:1836–1844): heuristic applied based on callSite.typeAsReceiverHeuristic, which is now true for C# and Kotlin.
  • Why it matters: For any C# or Kotlin member call where the receiver is an uppercase identifier not bound in TypeEnv (e.g., Console.WriteLine(), Utils.parse()), parse-worker will now set receiverTypeName = receiverName. In the old code receiverTypeName was always undefined here. This produces new CALLS edges in the graph for these call sites that did not previously exist. The old code listed C#/Kotlin in the heuristic check in call-processor.ts:930 but that code was unreachable (inside if (languageSeed) which was always false for those languages).
  • Recommendation: This is likely correct and arguably fixes dead code, but must be called out explicitly in the PR description. A targeted integration test or golden-diff test verifying that Console.WriteLine() in C# now produces a CALLS edge to Console.WriteLine would make this intentional and auditable.
  • Confidence: High.

## NON-BLOCKING Misleading callExtractor doc comment — "falls back to inline extraction" is false

  • Evidence: language-provider.ts:163:
    /** ...
     * Default: undefined (falls back to inline extraction). */
    readonly callExtractor?: CallExtractor;
    The inline extraction in parse-worker was deleted by this PR. If callExtractor is undefined, parse-worker.ts:1655–1876 is skipped entirely and continue fires — all calls for that language are silently dropped.
  • Why it matters: A future contributor adding a new language without wiring callExtractor will get zero call edges with no error, relying on this misleading comment to conclude it's safe.
  • Recommendation: Change the comment to reflect reality: Default: undefined — if unset, no calls are extracted for this language. All tree-sitter providers MUST supply this. The test at call-extraction.test.ts:132–155 is a good regression guard but the comment will mislead contributors before they even see the test.
  • Confidence: High.

## NIT Double invocation of extractLanguageCallSite per Java call node

  • Evidence: For each Java call node, the factory's extract() is called twice by parse-worker:
    1. callExtractor.extract(callNode, undefined) (Path 1, parse-worker.ts:1662) — factory calls parseJavaMethodReference(callNode) internally.
    2. callExtractor.extract(callNode, callNameNode) (Path 2, parse-worker.ts:1827) — factory calls parseJavaMethodReference(callNode) again before falling through to generic.
      See generic.ts:41–49: extractLanguageCallSite is always tried first, regardless of which parse-worker path invoked extract().
  • Why it matters: parseJavaMethodReference is a cheap single-field type check, so this is not a practical performance issue. But the two-call design leaks the factory's internal retry logic into the caller.
  • Recommendation: Consider documenting in generic.ts or call-types.ts that extractLanguageCallSite is called on both extract(callNode, undefined) and extract(callNode, callNameNode) invocations, so language hooks must be idempotent.
  • Confidence: High.

2. Code clarity & clean code

  • Naming is consistent with method-types.ts / field-types.ts / method-extractors/generic.ts. Factory-pattern naming (createCallExtractor, CallExtractionConfig, CallExtractor) is clear.
  • No barrel re-export regressions observed.
  • No // removed comments.
  • No unused re-exports.
  • No drive-by refactors outside stated scope.
  • The heuristic logic in parse-worker is duplicated between Path 1 (parse-worker.ts:1673–1682) and Path 2 (parse-worker.ts:1836–1845) — same charCodeAt(0) >= 65 && <= 90 check. This was partially present before; the new code adds a second site. NIT, not blocking.

3. Test coverage & change safety

## NON-BLOCKING No test for the new behavioral path: C#/Kotlin typeAsReceiverHeuristic in generic extraction

  • Evidence: call-extraction.test.ts:304–320 tests only that the C# extractor sets typeAsReceiverHeuristic: true on the config. It does not test that, given a C# member call like Console.WriteLine(), the heuristic actually fires and sets receiverTypeName. The 'all tree-sitter providers have callExtractor defined' test (:132–155) guards wiring, which is good. The Java :: tests (:420–495) are solid.
  • Why it matters: The behavioral change described in Finding Add support for Ollama as a local inference backend #1 is not guarded by any test that would catch a regression or confirm the new behavior is as intended.
  • Recommendation: Add a test for C# (and Kotlin if grammar is available) that verifies extractor.extract(callNode, callNameNode).typeAsReceiverHeuristic === true for a member call, and separately verify that parse-worker emits receiverTypeName for such a call.
  • Confidence: High.

4. Performance

  • No new Cypher round-trips.
  • No O(n²) patterns introduced.
  • The double extractLanguageCallSite call is a single field check — negligible.
  • No memoization introduced, no invalidation risk.
  • No new allocations per call node beyond the existing spread patterns.
  • No issues found.

5. Operational risk

  • No file-handle or stream lifecycle changes.
  • No path separator assumptions.
  • LadybugDB single-writer invariant unaffected — no new analyze/load paths.
  • No changes to .gitnexus/meta.json.stats.embeddings paths.
  • MCP contracts (query, context, impact, detect_changes, rename, cypher, group_*) are unaffected — this PR only changes how ExtractedCall objects are constructed, not how the graph is queried.
  • Staleness detection in gitnexus/src/mcp/staleness.ts unaffected.
  • Rollback safety: reverting this PR requires a npx gitnexus analyze re-run to rebuild the call graph, since call edges may differ (especially for C# and Kotlin). This is the expected cost for any call-extraction change — no special migration needed, and LadybugDB is rebuilt from scratch on each analyze.

6. Maintainability

  • Pipeline Phase DAG boundaries are respected: changes are strictly within the parse phase's worker and sequential fallback path.
  • Per-language configs are in a discoverable location (call-extractors/configs/), matching the established pattern for method-extractors/configs/ and field-extractors/configs/.
  • The callExtractor?: CallExtractor optional field (with misleading comment, see Finding Welcome to GitNexus Discussions! #2) is the primary maintainability trap.
  • No shared logic moved to wrong layer; call-analysis.ts utilities are unchanged and correctly referenced.

Non-negotiables

  • No secrets or machine-specific paths: None found.
  • No blind find-and-replace renames: The deleted call-sites/ files were replaced with new call-extractors/ files via structural refactor, not text replacement. ✅
  • No HIGH/CRITICAL impact left unaddressed: All 15 providers are wired. ✅
  • PR title follows conventional commit: feat(ingestion): …
  • PolyForm license untouched: ✅

What I could not fully verify

  1. tsc --noEmit and vitest run results: I cannot run the test suite or TypeScript compiler to confirm CI passes. The firewall blocks in the PR body suggest the agent was able to run unit tests but hit DNS blocks on other operations.
  2. call-processor.ts generic path heuristic gap: The current call-processor generic path (after the callExtractor.extract(callNode, undefined) null-return) does not call callExtractor.extract(callNode, nameNode) and therefore never reads typeAsReceiverHeuristic from the extractor. I traced this but could not run the code to confirm whether the separate class-as-receiver fallback (ctx.resolve(receiverName, ...)) compensates equivalently.
  3. Kotlin :: callable references: Kotlin's tree-sitter queries do not appear to capture callable_reference nodes as @call. I could not find a grammar test confirming this — so the lack of extractLanguageCallSite in kotlinCallConfig is based on query inspection, not runtime proof.

Final verdict

⚠️ Merge after addressing blocking items.

There are no blocking issues in the traditional sense — the implementation is correct for all currently-supported languages. However, the undisclosed behavioral change in typeAsReceiverHeuristic for C# and Kotlin (Finding #1) should be documented in the PR description before merge, since it silently changes the call graph for two languages. The misleading doc comment (Finding #2) and missing behavioral test (Finding #3) should be addressed to prevent the next contributor from inadvertently disabling call extraction for a new language.

Priority order:

  1. Document the C#/Kotlin typeAsReceiverHeuristic behavioral change in the PR description (or confirm it's intentional with a test).
  2. Fix the language-provider.ts:163 doc comment to remove "falls back to inline extraction".
  3. (Optional before merge) Add a test confirming the heuristic fires correctly for C# member calls in parse-worker Path 2.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

  • gitnexus/ → CLI + MCP
  • gitnexus-web/
  • gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.


CONTEXT TO LOAD FIRST
Read these before reviewing:

  • AGENTS.md
  • GUARDRAILS.md
  • CONTRIBUTING.md
  • TESTING.md
  • ARCHITECTURE.md

Additional context:


PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.


REVIEW RULES

  • Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
  • Every finding must include path:line.
  • If you make a behavioral claim, cite the code that proves it.
  • If you make a performance claim, explain the mechanism.
  • If something cannot be verified from the diff alone, explicitly say so.
  • Distinguish clearly between:
    • verified issue
    • plausible risk
    • unverified concern
  • Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
  • Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

  • BLOCKING → must be fixed before merge
  • NON-BLOCKING → valid issue, but merge may still be acceptable
  • NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

  • Does the implementation actually satisfy the PR claim?
    • ManifestExtractor is truly invoked
    • config.links produces non-zero cross-links where expected
  • Resolver contracts are preserved:
    • resolveSymbol remains exact-match
    • label-scoped Cypher remains correct per contract type
    • flag any regression toward fuzzy or unscoped matching
  • Graph schema integrity is preserved:
    • no silent changes to node labels
    • no silent changes to edge types
    • no silent changes to ID generation (generateId)
  • Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

  • naming quality
  • local cohesion
  • dead code
  • unnecessary abstraction
  • hidden control flow
  • confusing indirection
  • adherence to repo conventions:
    • direct imports from gitnexus-shared
    • no barrel re-export regressions
    • no // removed comments
    • no unused re-exports
  • no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

  • Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
  • Is there a regression guard for 0-link → N-link behavior?
  • Are assertions meaningful rather than tautological?
  • Are fixtures realistic for manifest inputs?
  • If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
  • Is there evidence the expected validation path would pass for staged gitnexus/ files?
    • tsc --noEmit
    • vitest run --project default
      If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

  • hot-path overhead in ingestion/group sync
  • excess allocations per manifest entry
  • redundant Cypher round-trips
  • missed batching or missed parallelism (Promise.all) where it materially matters
  • O(n²) or repeated lookup patterns on large repos
  • memoization tradeoffs:
    • correctness
    • invalidation
    • bounded vs unbounded memory growth
      Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

  • Windows/cross-platform safety:
    • stream lifecycle
    • FD/file handle lifecycle
    • path separator assumptions
    • anything resembling prior ENOTEMPTY-style lifecycle regressions
  • LadybugDB single-writer invariant is preserved
  • Embeddings preservation:
    • no silent breakage of --embeddings
    • .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
  • MCP contracts remain compatible:
    • group_*
    • query
    • context
    • impact
    • detect_changes
    • rename
    • cypher
      Flag any schema or contract break without migration note
  • staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
  • rollback safety:
    • can this PR be reverted safely without re-analyze?
    • if not, explain why

6. Maintainability

Check:

  • Does the change respect Pipeline Phase DAG boundaries?
  • Is cross-phase coupling being introduced?
  • Are manifest-extractor entry points discoverable from group/sync.ts?
  • Would the next contributor know where to extend this?
  • Is shared logic placed appropriately:
    • in gitnexus-shared/
    • or gitnexus/src/core/**/utils.ts
    • and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

  • no secrets, tokens, or machine-specific paths committed
  • no blind find-and-replace renames
  • if a symbol rename occurred, rename MCP tool or equivalent careful review was used
  • upstream impact was considered for modified shared symbols
  • no HIGH/CRITICAL impact left unaddressed
  • PR title follows conventional commit style + [area] Short description
  • PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

  • Evidence: path:line, path:line
  • Why it matters: ...
  • Recommendation: ...
  • Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

  • No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

  • ✅ Ready to merge — no blocking issues.
  • ⚠️ Merge after addressing blocking items.
  • ❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.


IMPORTANT REVIEW BEHAVIOR

  • Be strict on correctness and contracts.
  • Be conservative about operational risk.
  • Be explicit about uncertainty.
  • Do not suggest unrelated refactors.
  • Do not turn nits into blockers.
  • Do not reward intent; review actual implementation.

@github-actions

github-actions Bot commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 3m 22s —— View job


PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Tasks:

  • Post initial comment
  • Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Read changed files in the diff
  • Phase 1: Define production-ready bar
  • Phase 2: Review PR against that bar

Production-ready bar for this PR

Based on AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, and ARCHITECTURE.md § Pipeline Phase DAG:

  • Behavioral parity: All call sites previously emitted must still be emitted with the same calledName, callForm, receiverName, receiverTypeName, argCount, and mixed-chain structure. No silent drops, no new false edges — or any behavioral change must be explicitly disclosed.
  • Graph schema intact: No changes to node labels, edge types, or generateId call signatures.
  • All tree-sitter providers wired: callExtractor must be defined on every tree-sitter LanguageProvider; an undefined extractor silently drops all calls from that language per the new code path.
  • Unit tests under gitnexus/test/unit/ covering factory construction, per-language behavioral extraction, and provider wiring. Real tree-sitter grammars (no AST stubs) for behavioral tests.
  • tsc --noEmit + vitest run must pass in gitnexus/.
  • No drive-by refactors beyond the stated scope (GUARDRAILS.md § Scope, CONTRIBUTING.md).
  • No secrets, machine-specific paths, or PolyForm metadata changes.
  • Pipeline Phase DAG boundaries respected: changes confined to the parse phase worker and call-processor; no cross-phase coupling introduced.

Findings

1. Correctness & functional completeness

## NON-BLOCKING call-processor Path 2 bypasses callExtractortypeAsReceiverHeuristic asymmetry

  • Evidence: call-processor.ts:987–1130 — after the if (callExtractor) block (Path 1, line 912–985), the generic path uses its own inline inferCallForm (line 1062) and extractReceiverName (line 1063), not callExtractor.extract(callNode, callNameNode). There is no reference to callSite.typeAsReceiverHeuristic in this path. Instead the call-processor uses ctx.resolve(receiverName, file.path) class lookup (lines 1116–1130) as the type-as-receiver fallback.
  • Why it matters: The PR's stated goal is a "language-agnostic" call extractor, but in call-processor.ts the typeAsReceiverHeuristic from the extractor config applies only to Path 1 (language-specific captures like Java ::). For regular C# member calls processed through call-processor's generic path, type-as-receiver is resolved via the graph (ctx.resolve()), which has different semantics than the parse-worker's uppercase-letter heuristic. The two paths will agree on fully-indexed repos where Console resolves as a class; they may disagree on partially-indexed or first-pass repos. This asymmetry is not documented.
  • Recommendation: Document the intentional split: the factory pattern is adopted for language-specific call shape extraction (Path 1) but call-processor's generic path retains its own richer resolution logic. A comment at call-processor.ts:987 explaining why callExtractor is not used for Path 2 here would prevent future contributors from introducing a misaligned refactor.
  • Confidence: High (verified from call-processor.ts:912–1130 and parse-worker.ts:1824).

## NON-BLOCKING No end-to-end test verifying parse-worker emits receiverTypeName for C# uppercase receivers

  • Evidence: call-extraction.test.ts:303–335 verifies the extractor returns typeAsReceiverHeuristic: true and receiverName: 'Console' for Console.WriteLine(). But no test verifies that parse-worker Path 2 (lines 1833–1842) actually fires the heuristic and sets receiverTypeName in the result.calls push. The test coverage confirms the extractor-layer output but stops short of the worker-layer emission.
  • Why it matters: The heuristic check at parse-worker.ts:1834–1842 (callSite.typeAsReceiverHeuristic && c0 >= 65 && c0 <= 90) is a separate conditional from what the test exercises. A future change that strips typeAsReceiverHeuristic from the generic path in the factory would pass all current tests while silently breaking receiverTypeName emission for C# calls in the worker.
  • Recommendation: Add one integration-level assertion in the test that calls result.calls emission logic (or a slim parse-worker harness test) verifying receiverTypeName === 'Console' for Console.WriteLine() in C#. This is a safety net, not a blocker.
  • Confidence: High.

2. Code clarity & clean code

  • Naming (CallExtractionConfig, CallExtractor, createCallExtractor) is consistent with method-types.ts / field-types.ts / method-extractors/generic.ts. Factory pattern is clear and follows established repo convention.
  • language-provider.ts:162–163: doc comment now correctly reads "if unset, no calls are extracted for this language. All tree-sitter providers MUST supply this." — the misleading "falls back to inline extraction" text from the original PR was fixed in commit baa61e1. ✅
  • generic.ts:39–43: idempotency note added for extractLanguageCallSite — correct and useful. ✅
  • No barrel re-export regressions, no // removed comments, no unused re-exports.
  • No drive-by refactors outside the stated scope.
  • The duplicate charCodeAt(0) >= 65 && <= 90 check at parse-worker Path 1 (line 1677) and Path 2 (line 1841) is a pre-existing pattern, not introduced by this PR.

3. Test coverage & change safety

  • call-extraction.test.ts — 527 lines, uses real tree-sitter grammars. Tests cover: factory construction (all 13 configs), provider wiring (15 tree-sitter languages), generic extraction (TypeScript, Python, Java, C#, Go, Rust, C++, PHP, Ruby), Java :: method references (4 cases), typeAsReceiverHeuristic config flags (all languages). This is solid.
  • C# behavioral tests (commit baa61e1): Console.WriteLine() full extraction and lowercase receiver test both present and correct.
  • Missing: Kotlin behavioral test (optional — grammar is marked optional in AGENTS.md; no grammar in test imports). Acceptable.
  • Missing: parse-worker Path 2 end-to-end receiverTypeName emission for C# (see Finding above).
  • Provider wiring test at line 136–158 covers all 15 tree-sitter languages including Vue. Java is correctly included.
  • The satisfies Record<SupportedLanguages, LanguageProvider> guard in languages/index.ts provides compile-time exhaustiveness, which is the repo's established safety mechanism.

4. Performance

  • No new Cypher round-trips. No O(n²) patterns. No redundant allocations per call node beyond pre-existing spread patterns.
  • parseJavaMethodReference (jvm.ts:15) is O(childCount) — single-pass child scan. Called at most twice per Java call node (Path 1 + potential generic path call), which is cheap.
  • No memoization introduced, no invalidation risk.
  • No issues found in this area.

5. Operational risk

  • No file-handle, stream, or path separator changes.
  • LadybugDB single-writer invariant unaffected — no new analyze/load paths.
  • Embeddings: no changes to --embeddings path, meta.json stats, or CSV streaming in run-analyze.ts. ✅
  • MCP contracts (query, context, impact, detect_changes, rename, cypher, group_*): unaffected. Changes are internal to call site construction in the parse phase, not to graph schema or query layer. ✅
  • Staleness detection (staleness.ts) unaffected.
  • Rollback safety: reverting requires npx gitnexus analyze re-run to rebuild call graph. Expected cost for any call-extraction change; no schema migration needed. LadybugDB rebuilt from scratch.

6. Maintainability

  • Pipeline Phase DAG boundaries respected: all changes confined to parse phase worker and call-processor. No cross-phase coupling. ✅
  • Per-language configs in call-extractors/configs/ match the established pattern for method-extractors/configs/ and field-extractors/configs/. New contributors will know where to add a config for a new language.
  • The callExtractor?: CallExtractor field on LanguageProvider is now clearly documented and the doc comment is accurate.
  • The asymmetry between parse-worker (uses factory for Path 2) and call-processor (own inline Path 2) is a maintainability risk if not documented, since the two paths look similar but diverge on receiver type resolution semantics.

Non-negotiables

  • No secrets or machine-specific paths: None found. ✅
  • No blind find-and-replace renames: Structural refactor via new files + deletion of call-sites/. ✅
  • No HIGH/CRITICAL impact left unaddressed: All 15 tree-sitter providers wired. ✅
  • PR title follows conventional commit: feat(ingestion): …
  • PolyForm license untouched: ✅

What I could not fully verify

  1. tsc --noEmit and vitest run results: Cannot execute the test suite or TypeScript compiler locally. Based on diff inspection, the types are consistent and the code compiles structurally.
  2. Kotlin behavioral extraction at runtime: Kotlin grammar is optional and not imported in the test file. Cannot verify that kotlinCallConfig correctly extracts member calls against real Kotlin AST nodes.
  3. call-processor Path 2 ctx.resolve() fallback coverage: I traced that the fallback at line 1116–1130 covers the type-as-receiver case for call-processor, but cannot run a cross-language call graph to confirm equivalence with parse-worker's uppercase-letter heuristic for C# static calls.

Final verdict

Ready to merge — no blocking issues.

The initial review from the previous pass identified three concerns (misleading doc comment, missing C# behavioral test, undisclosed typeAsReceiverHeuristic behavioral change). All three were addressed in subsequent commits (baa61e1, 6a71450). The remaining findings are NON-BLOCKING:

  • The call-processor Path 2 asymmetry is pre-existing architectural design (not introduced by this PR) and the ctx.resolve() fallback provides equivalent or richer coverage for the general case.
  • The missing parse-worker end-to-end receiverTypeName emission test is a safety net, not a correctness gap.

The factory/config pattern is cleanly implemented, follows established repo conventions, and all 15 tree-sitter providers are correctly wired with compile-time safety.


@magyargergo magyargergo merged commit 03821fa into main Apr 16, 2026
19 checks passed
@magyargergo magyargergo deleted the copilot/create-language-agnostic-call-extractor branch April 16, 2026 10:45
magyargergo added a commit that referenced this pull request Apr 18, 2026
Bumps gitnexus to v1.6.2 and adds the matching CHANGELOG entry.

Highlights since v1.6.1 (61 commits):
  - Docker support (#848)
  - Language-agnostic heritage / call / variable extractors
    (config+factory pattern, #877 #878 #890)
  - AST-aware embedding chunking (#889)
  - jQuery / axios HTTP consumer detection (#887)
  - SemanticModel wired as first-class resolution input, SM-20 (#885)
  - ImportSemantics split into per-strategy hooks (#886)
  - Python dotted-import fix (#899); worker warnings non-terminal
    (#900 / #261); global-install ENOTEMPTY fixes (#843 #846);
    embeddings staleness fix (#831)

See gitnexus/CHANGELOG.md for the full list.

After merge, tag `v1.6.2` triggers publish.yml which runs CI,
verifies tag↔package.json match, publishes to npm with provenance,
and creates the GitHub Release using the extracted CHANGELOG body.
@magyargergo magyargergo mentioned this pull request Apr 18, 2026
5 tasks
magyargergo added a commit that referenced this pull request Apr 18, 2026
Bumps gitnexus to v1.6.2 and adds the matching CHANGELOG entry.

Highlights since v1.6.1 (61 commits):
  - Docker support (#848)
  - Language-agnostic heritage / call / variable extractors
    (config+factory pattern, #877 #878 #890)
  - AST-aware embedding chunking (#889)
  - jQuery / axios HTTP consumer detection (#887)
  - SemanticModel wired as first-class resolution input, SM-20 (#885)
  - ImportSemantics split into per-strategy hooks (#886)
  - Python dotted-import fix (#899); worker warnings non-terminal
    (#900 / #261); global-install ENOTEMPTY fixes (#843 #846);
    embeddings staleness fix (#831)

See gitnexus/CHANGELOG.md for the full list.

After merge, tag `v1.6.2` triggers publish.yml which runs CI,
verifies tag↔package.json match, publishes to npm with provenance,
and creates the GitHub Release using the extracted CHANGELOG body.
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
… pattern (abhigyanpatwari#877)

* Initial plan

* feat(ingestion): add call-types, call-extractors factory, per-language configs, and wire into providers

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* feat(ingestion): replace inline call extraction in parse-worker and call-processor, delete call-sites/

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* test(ingestion): add unit tests for call extraction configs and factory

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* style: fix prettier formatting in call-extractor files

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/d4b06f56-03b6-4fa4-801f-7ddcc6e81f13

* fix: address review comments — doc comment, idempotency note, C# behavioral test

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

* fix: rename misleading test title, remove stale code reference in comment

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
Bumps gitnexus to v1.6.2 and adds the matching CHANGELOG entry.

Highlights since v1.6.1 (61 commits):
  - Docker support (abhigyanpatwari#848)
  - Language-agnostic heritage / call / variable extractors
    (config+factory pattern, abhigyanpatwari#877 abhigyanpatwari#878 abhigyanpatwari#890)
  - AST-aware embedding chunking (abhigyanpatwari#889)
  - jQuery / axios HTTP consumer detection (abhigyanpatwari#887)
  - SemanticModel wired as first-class resolution input, SM-20 (abhigyanpatwari#885)
  - ImportSemantics split into per-strategy hooks (abhigyanpatwari#886)
  - Python dotted-import fix (abhigyanpatwari#899); worker warnings non-terminal
    (abhigyanpatwari#900 / abhigyanpatwari#261); global-install ENOTEMPTY fixes (abhigyanpatwari#843 abhigyanpatwari#846);
    embeddings staleness fix (abhigyanpatwari#831)

See gitnexus/CHANGELOG.md for the full list.

After merge, tag `v1.6.2` triggers publish.yml which runs CI,
verifies tag↔package.json match, publishes to npm with provenance,
and creates the GitHub Release using the extracted CHANGELOG body.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(ingestion): language-agnostic call extractor with config+factory pattern

2 participants