feat(ingestion): language-agnostic call extractor with config+factory pattern by Copilot · Pull Request #877 · abhigyanpatwari/GitNexus

Copilot · 2026-04-16T08:36:01Z

vercel · 2026-04-16T08:36:07Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
gitnexus	Ready	Preview, Comment	Apr 16, 2026 10:30am

…e configs, and wire into providers Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

…all-processor, delete call-sites/ Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

magyargergo · 2026-04-16T09:13:11Z

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

gitnexus/ → CLI + MCP
gitnexus-web/
gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.

CONTEXT TO LOAD FIRST
Read these before reviewing:

AGENTS.md
GUARDRAILS.md
CONTRIBUTING.md
TESTING.md
ARCHITECTURE.md

Additional context:

If the diff touches ingestion, also read: ARCHITECTURE.md § Pipeline Phase DAG
PR [group/sync] Fix ManifestExtractor never called — config.links always produced 0 cross-links #827 diff area is primarily: gitnexus/src/core/group/ (sync + manifest extractor) plus associated tests

PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.

REVIEW RULES

Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
Every finding must include path:line.
If you make a behavioral claim, cite the code that proves it.
If you make a performance claim, explain the mechanism.
If something cannot be verified from the diff alone, explicitly say so.
Distinguish clearly between:
- verified issue
- plausible risk
- unverified concern
Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

BLOCKING → must be fixed before merge
NON-BLOCKING → valid issue, but merge may still be acceptable
NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

Does the implementation actually satisfy the PR claim?
- ManifestExtractor is truly invoked
- config.links produces non-zero cross-links where expected
Resolver contracts are preserved:
- resolveSymbol remains exact-match
- label-scoped Cypher remains correct per contract type
- flag any regression toward fuzzy or unscoped matching
Graph schema integrity is preserved:
- no silent changes to node labels
- no silent changes to edge types
- no silent changes to ID generation (generateId)
Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

naming quality
local cohesion
dead code
unnecessary abstraction
hidden control flow
confusing indirection
adherence to repo conventions:
- direct imports from gitnexus-shared
- no barrel re-export regressions
- no // removed comments
- no unused re-exports
no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
Is there a regression guard for 0-link → N-link behavior?
Are assertions meaningful rather than tautological?
Are fixtures realistic for manifest inputs?
If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
Is there evidence the expected validation path would pass for staged gitnexus/ files?
- tsc --noEmit
- vitest run --project default
  If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

hot-path overhead in ingestion/group sync
excess allocations per manifest entry
redundant Cypher round-trips
missed batching or missed parallelism (Promise.all) where it materially matters
O(n²) or repeated lookup patterns on large repos
memoization tradeoffs:
- correctness
- invalidation
- bounded vs unbounded memory growth
  Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

Windows/cross-platform safety:
- stream lifecycle
- FD/file handle lifecycle
- path separator assumptions
- anything resembling prior ENOTEMPTY-style lifecycle regressions
LadybugDB single-writer invariant is preserved
Embeddings preservation:
- no silent breakage of --embeddings
- .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
MCP contracts remain compatible:
- group_*
- query
- context
- impact
- detect_changes
- rename
- cypher
  Flag any schema or contract break without migration note
staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
rollback safety:
- can this PR be reverted safely without re-analyze?
- if not, explain why

6. Maintainability

Check:

Does the change respect Pipeline Phase DAG boundaries?
Is cross-phase coupling being introduced?
Are manifest-extractor entry points discoverable from group/sync.ts?
Would the next contributor know where to extend this?
Is shared logic placed appropriately:
- in gitnexus-shared/
- or gitnexus/src/core/**/utils.ts
- and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

no secrets, tokens, or machine-specific paths committed
no blind find-and-replace renames
if a symbol rename occurred, rename MCP tool or equivalent careful review was used
upstream impact was considered for modified shared symbols
no HIGH/CRITICAL impact left unaddressed
PR title follows conventional commit style + [area] Short description
PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

Evidence: path:line, path:line
Why it matters: ...
Recommendation: ...
Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

✅ Ready to merge — no blocking issues.
⚠️ Merge after addressing blocking items.
❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.

IMPORTANT REVIEW BEHAVIOR

Be strict on correctness and contracts.
Be conservative about operational risk.
Be explicit about uncertainty.
Do not suggest unrelated refactors.
Do not turn nits into blockers.
Do not reward intent; review actual implementation.

github-actions · 2026-04-16T09:13:24Z

Claude finished @magyargergo's task in 6m 7s —— View job

PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Tasks:

Post initial comment
Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
Read changed files in the diff
Phase 1: Define production-ready bar
Phase 2: Review PR against that bar

Production-ready bar for this PR

Based on AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, and ARCHITECTURE.md, a production-ready refactor of the parse-phase call extraction must satisfy:

Behavioral parity: All call sites that were previously emitted must still be emitted with the same calledName, callForm, receiverName, receiverTypeName, argCount, and mixed-chain structure. No silent drops, no new false edges.
Graph schema intact: No changes to node labels, edge types, or generateId call signatures.
All 15 providers wired: callExtractor must be defined on every tree-sitter LanguageProvider; an undefined extractor silently drops all calls from that language.
Unit tests under gitnexus/test/unit/ covering the factory, per-language extraction, and provider wiring. Real tree-sitter grammars; no mocked AST stubs for behavioral tests.
tsc --noEmit + vitest run pass in gitnexus/.
No drive-by refactors beyond the stated scope (GUARDRAILS.md § Scope).
No secrets, machine-specific paths, or PolyForm metadata changes.
Disclosed behavioral changes: Any output change to ExtractedCall fields must be described in the PR.

Findings

1. Correctness & functional completeness

## NON-BLOCKING Undisclosed behavioral change: typeAsReceiverHeuristic now applies to C# and Kotlin regular member calls in parse-worker

Evidence:
- Old extractParsedCallSite (call-sites/extract-language-call-site.ts, deleted): only non-null for Java method_reference nodes; C# and Kotlin always returned null.
- Old parse-worker generic path (commit d024119, lines ~1830–1877): no type-as-receiver heuristic at all for regular member calls.
- New csharpCallConfig (call-extractors/configs/csharp.ts:8): typeAsReceiverHeuristic: true.
- New kotlinCallConfig (call-extractors/configs/jvm.ts:57): typeAsReceiverHeuristic: true.
- New parse-worker Path 2 (workers/parse-worker.ts:1836–1844): heuristic applied based on callSite.typeAsReceiverHeuristic, which is now true for C# and Kotlin.
Why it matters: For any C# or Kotlin member call where the receiver is an uppercase identifier not bound in TypeEnv (e.g., Console.WriteLine(), Utils.parse()), parse-worker will now set receiverTypeName = receiverName. In the old code receiverTypeName was always undefined here. This produces new CALLS edges in the graph for these call sites that did not previously exist. The old code listed C#/Kotlin in the heuristic check in call-processor.ts:930 but that code was unreachable (inside if (languageSeed) which was always false for those languages).
Recommendation: This is likely correct and arguably fixes dead code, but must be called out explicitly in the PR description. A targeted integration test or golden-diff test verifying that Console.WriteLine() in C# now produces a CALLS edge to Console.WriteLine would make this intentional and auditable.
Confidence: High.

## NON-BLOCKING Misleading callExtractor doc comment — "falls back to inline extraction" is false

Evidence: language-provider.ts:163:
```
/** ...
 * Default: undefined (falls back to inline extraction). */
readonly callExtractor?: CallExtractor;
```
The inline extraction in parse-worker was deleted by this PR. If callExtractor is undefined, parse-worker.ts:1655–1876 is skipped entirely and continue fires — all calls for that language are silently dropped.
Why it matters: A future contributor adding a new language without wiring callExtractor will get zero call edges with no error, relying on this misleading comment to conclude it's safe.
Recommendation: Change the comment to reflect reality: Default: undefined — if unset, no calls are extracted for this language. All tree-sitter providers MUST supply this. The test at call-extraction.test.ts:132–155 is a good regression guard but the comment will mislead contributors before they even see the test.
Confidence: High.

## NIT Double invocation of extractLanguageCallSite per Java call node

Evidence: For each Java call node, the factory's extract() is called twice by parse-worker:
1. callExtractor.extract(callNode, undefined) (Path 1, parse-worker.ts:1662) — factory calls parseJavaMethodReference(callNode) internally.
2. callExtractor.extract(callNode, callNameNode) (Path 2, parse-worker.ts:1827) — factory calls parseJavaMethodReference(callNode) again before falling through to generic.
  See generic.ts:41–49: extractLanguageCallSite is always tried first, regardless of which parse-worker path invoked extract().
Why it matters: parseJavaMethodReference is a cheap single-field type check, so this is not a practical performance issue. But the two-call design leaks the factory's internal retry logic into the caller.
Recommendation: Consider documenting in generic.ts or call-types.ts that extractLanguageCallSite is called on both extract(callNode, undefined) and extract(callNode, callNameNode) invocations, so language hooks must be idempotent.
Confidence: High.

2. Code clarity & clean code

Naming is consistent with method-types.ts / field-types.ts / method-extractors/generic.ts. Factory-pattern naming (createCallExtractor, CallExtractionConfig, CallExtractor) is clear.
No barrel re-export regressions observed.
No // removed comments.
No unused re-exports.
No drive-by refactors outside stated scope.
The heuristic logic in parse-worker is duplicated between Path 1 (parse-worker.ts:1673–1682) and Path 2 (parse-worker.ts:1836–1845) — same charCodeAt(0) >= 65 && <= 90 check. This was partially present before; the new code adds a second site. NIT, not blocking.

3. Test coverage & change safety

## NON-BLOCKING No test for the new behavioral path: C#/Kotlin typeAsReceiverHeuristic in generic extraction

Evidence: call-extraction.test.ts:304–320 tests only that the C# extractor sets typeAsReceiverHeuristic: true on the config. It does not test that, given a C# member call like Console.WriteLine(), the heuristic actually fires and sets receiverTypeName. The 'all tree-sitter providers have callExtractor defined' test (:132–155) guards wiring, which is good. The Java :: tests (:420–495) are solid.
Why it matters: The behavioral change described in Finding Add support for Ollama as a local inference backend #1 is not guarded by any test that would catch a regression or confirm the new behavior is as intended.
Recommendation: Add a test for C# (and Kotlin if grammar is available) that verifies extractor.extract(callNode, callNameNode).typeAsReceiverHeuristic === true for a member call, and separately verify that parse-worker emits receiverTypeName for such a call.
Confidence: High.

4. Performance

No new Cypher round-trips.
No O(n²) patterns introduced.
The double extractLanguageCallSite call is a single field check — negligible.
No memoization introduced, no invalidation risk.
No new allocations per call node beyond the existing spread patterns.
No issues found.

5. Operational risk

No file-handle or stream lifecycle changes.
No path separator assumptions.
LadybugDB single-writer invariant unaffected — no new analyze/load paths.
No changes to .gitnexus/meta.json.stats.embeddings paths.
MCP contracts (query, context, impact, detect_changes, rename, cypher, group_*) are unaffected — this PR only changes how ExtractedCall objects are constructed, not how the graph is queried.
Staleness detection in gitnexus/src/mcp/staleness.ts unaffected.
Rollback safety: reverting this PR requires a npx gitnexus analyze re-run to rebuild the call graph, since call edges may differ (especially for C# and Kotlin). This is the expected cost for any call-extraction change — no special migration needed, and LadybugDB is rebuilt from scratch on each analyze.

6. Maintainability

Pipeline Phase DAG boundaries are respected: changes are strictly within the parse phase's worker and sequential fallback path.
Per-language configs are in a discoverable location (call-extractors/configs/), matching the established pattern for method-extractors/configs/ and field-extractors/configs/.
The callExtractor?: CallExtractor optional field (with misleading comment, see Finding Welcome to GitNexus Discussions! #2) is the primary maintainability trap.
No shared logic moved to wrong layer; call-analysis.ts utilities are unchanged and correctly referenced.

Non-negotiables

No secrets or machine-specific paths: None found.
No blind find-and-replace renames: The deleted call-sites/ files were replaced with new call-extractors/ files via structural refactor, not text replacement. ✅
No HIGH/CRITICAL impact left unaddressed: All 15 providers are wired. ✅
PR title follows conventional commit: feat(ingestion): … ✅
PolyForm license untouched: ✅

What I could not fully verify

tsc --noEmit and vitest run results: I cannot run the test suite or TypeScript compiler to confirm CI passes. The firewall blocks in the PR body suggest the agent was able to run unit tests but hit DNS blocks on other operations.
call-processor.ts generic path heuristic gap: The current call-processor generic path (after the callExtractor.extract(callNode, undefined) null-return) does not call callExtractor.extract(callNode, nameNode) and therefore never reads typeAsReceiverHeuristic from the extractor. I traced this but could not run the code to confirm whether the separate class-as-receiver fallback (ctx.resolve(receiverName, ...)) compensates equivalently.
Kotlin :: callable references: Kotlin's tree-sitter queries do not appear to capture callable_reference nodes as @call. I could not find a grammar test confirming this — so the lack of extractLanguageCallSite in kotlinCallConfig is based on query inspection, not runtime proof.

Final verdict

⚠️ Merge after addressing blocking items.

There are no blocking issues in the traditional sense — the implementation is correct for all currently-supported languages. However, the undisclosed behavioral change in typeAsReceiverHeuristic for C# and Kotlin (Finding #1) should be documented in the PR description before merge, since it silently changes the call graph for two languages. The misleading doc comment (Finding #2) and missing behavioral test (Finding #3) should be addressed to prevent the next contributor from inadvertently disabling call extraction for a new language.

Priority order:

Document the C#/Kotlin typeAsReceiverHeuristic behavioral change in the PR description (or confirm it's intentional with a test).
Fix the language-provider.ts:163 doc comment to remove "falls back to inline extraction".
(Optional before merge) Add a test confirming the heuristic fires correctly for C# member calls in parse-worker Path 2.

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/d4b06f56-03b6-4fa4-801f-7ddcc6e81f13

…vioral test Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

…ment Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

magyargergo · 2026-04-16T10:33:17Z

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

gitnexus/ → CLI + MCP
gitnexus-web/
gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.

CONTEXT TO LOAD FIRST
Read these before reviewing:

AGENTS.md
GUARDRAILS.md
CONTRIBUTING.md
TESTING.md
ARCHITECTURE.md

Additional context:

If the diff touches ingestion, also read: ARCHITECTURE.md § Pipeline Phase DAG
PR [group/sync] Fix ManifestExtractor never called — config.links always produced 0 cross-links #827 diff area is primarily: gitnexus/src/core/group/ (sync + manifest extractor) plus associated tests

PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.

REVIEW RULES

Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
Every finding must include path:line.
If you make a behavioral claim, cite the code that proves it.
If you make a performance claim, explain the mechanism.
If something cannot be verified from the diff alone, explicitly say so.
Distinguish clearly between:
- verified issue
- plausible risk
- unverified concern
Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

BLOCKING → must be fixed before merge
NON-BLOCKING → valid issue, but merge may still be acceptable
NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

Does the implementation actually satisfy the PR claim?
- ManifestExtractor is truly invoked
- config.links produces non-zero cross-links where expected
Resolver contracts are preserved:
- resolveSymbol remains exact-match
- label-scoped Cypher remains correct per contract type
- flag any regression toward fuzzy or unscoped matching
Graph schema integrity is preserved:
- no silent changes to node labels
- no silent changes to edge types
- no silent changes to ID generation (generateId)
Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

naming quality
local cohesion
dead code
unnecessary abstraction
hidden control flow
confusing indirection
adherence to repo conventions:
- direct imports from gitnexus-shared
- no barrel re-export regressions
- no // removed comments
- no unused re-exports
no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
Is there a regression guard for 0-link → N-link behavior?
Are assertions meaningful rather than tautological?
Are fixtures realistic for manifest inputs?
If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
Is there evidence the expected validation path would pass for staged gitnexus/ files?
- tsc --noEmit
- vitest run --project default
  If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

hot-path overhead in ingestion/group sync
excess allocations per manifest entry
redundant Cypher round-trips
missed batching or missed parallelism (Promise.all) where it materially matters
O(n²) or repeated lookup patterns on large repos
memoization tradeoffs:
- correctness
- invalidation
- bounded vs unbounded memory growth
  Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

Windows/cross-platform safety:
- stream lifecycle
- FD/file handle lifecycle
- path separator assumptions
- anything resembling prior ENOTEMPTY-style lifecycle regressions
LadybugDB single-writer invariant is preserved
Embeddings preservation:
- no silent breakage of --embeddings
- .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
MCP contracts remain compatible:
- group_*
- query
- context
- impact
- detect_changes
- rename
- cypher
  Flag any schema or contract break without migration note
staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
rollback safety:
- can this PR be reverted safely without re-analyze?
- if not, explain why

6. Maintainability

Check:

Does the change respect Pipeline Phase DAG boundaries?
Is cross-phase coupling being introduced?
Are manifest-extractor entry points discoverable from group/sync.ts?
Would the next contributor know where to extend this?
Is shared logic placed appropriately:
- in gitnexus-shared/
- or gitnexus/src/core/**/utils.ts
- and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

no secrets, tokens, or machine-specific paths committed
no blind find-and-replace renames
if a symbol rename occurred, rename MCP tool or equivalent careful review was used
upstream impact was considered for modified shared symbols
no HIGH/CRITICAL impact left unaddressed
PR title follows conventional commit style + [area] Short description
PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

Evidence: path:line, path:line
Why it matters: ...
Recommendation: ...
Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

✅ Ready to merge — no blocking issues.
⚠️ Merge after addressing blocking items.
❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.

IMPORTANT REVIEW BEHAVIOR

Be strict on correctness and contracts.
Be conservative about operational risk.
Be explicit about uncertainty.
Do not suggest unrelated refactors.
Do not turn nits into blockers.
Do not reward intent; review actual implementation.

github-actions · 2026-04-16T10:33:29Z

Claude finished @magyargergo's task in 3m 22s —— View job

PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Tasks:

Post initial comment
Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
Read changed files in the diff
Phase 1: Define production-ready bar
Phase 2: Review PR against that bar

Production-ready bar for this PR

Based on AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, and ARCHITECTURE.md § Pipeline Phase DAG:

Behavioral parity: All call sites previously emitted must still be emitted with the same calledName, callForm, receiverName, receiverTypeName, argCount, and mixed-chain structure. No silent drops, no new false edges — or any behavioral change must be explicitly disclosed.
Graph schema intact: No changes to node labels, edge types, or generateId call signatures.
All tree-sitter providers wired: callExtractor must be defined on every tree-sitter LanguageProvider; an undefined extractor silently drops all calls from that language per the new code path.
Unit tests under gitnexus/test/unit/ covering factory construction, per-language behavioral extraction, and provider wiring. Real tree-sitter grammars (no AST stubs) for behavioral tests.
tsc --noEmit + vitest run must pass in gitnexus/.
No drive-by refactors beyond the stated scope (GUARDRAILS.md § Scope, CONTRIBUTING.md).
No secrets, machine-specific paths, or PolyForm metadata changes.
Pipeline Phase DAG boundaries respected: changes confined to the parse phase worker and call-processor; no cross-phase coupling introduced.

Findings

1. Correctness & functional completeness

## NON-BLOCKING call-processor Path 2 bypasses callExtractor — typeAsReceiverHeuristic asymmetry

Evidence: call-processor.ts:987–1130 — after the if (callExtractor) block (Path 1, line 912–985), the generic path uses its own inline inferCallForm (line 1062) and extractReceiverName (line 1063), not callExtractor.extract(callNode, callNameNode). There is no reference to callSite.typeAsReceiverHeuristic in this path. Instead the call-processor uses ctx.resolve(receiverName, file.path) class lookup (lines 1116–1130) as the type-as-receiver fallback.
Why it matters: The PR's stated goal is a "language-agnostic" call extractor, but in call-processor.ts the typeAsReceiverHeuristic from the extractor config applies only to Path 1 (language-specific captures like Java ::). For regular C# member calls processed through call-processor's generic path, type-as-receiver is resolved via the graph (ctx.resolve()), which has different semantics than the parse-worker's uppercase-letter heuristic. The two paths will agree on fully-indexed repos where Console resolves as a class; they may disagree on partially-indexed or first-pass repos. This asymmetry is not documented.
Recommendation: Document the intentional split: the factory pattern is adopted for language-specific call shape extraction (Path 1) but call-processor's generic path retains its own richer resolution logic. A comment at call-processor.ts:987 explaining why callExtractor is not used for Path 2 here would prevent future contributors from introducing a misaligned refactor.
Confidence: High (verified from call-processor.ts:912–1130 and parse-worker.ts:1824).

## NON-BLOCKING No end-to-end test verifying parse-worker emits receiverTypeName for C# uppercase receivers

Evidence: call-extraction.test.ts:303–335 verifies the extractor returns typeAsReceiverHeuristic: true and receiverName: 'Console' for Console.WriteLine(). But no test verifies that parse-worker Path 2 (lines 1833–1842) actually fires the heuristic and sets receiverTypeName in the result.calls push. The test coverage confirms the extractor-layer output but stops short of the worker-layer emission.
Why it matters: The heuristic check at parse-worker.ts:1834–1842 (callSite.typeAsReceiverHeuristic && c0 >= 65 && c0 <= 90) is a separate conditional from what the test exercises. A future change that strips typeAsReceiverHeuristic from the generic path in the factory would pass all current tests while silently breaking receiverTypeName emission for C# calls in the worker.
Recommendation: Add one integration-level assertion in the test that calls result.calls emission logic (or a slim parse-worker harness test) verifying receiverTypeName === 'Console' for Console.WriteLine() in C#. This is a safety net, not a blocker.
Confidence: High.

2. Code clarity & clean code

Naming (CallExtractionConfig, CallExtractor, createCallExtractor) is consistent with method-types.ts / field-types.ts / method-extractors/generic.ts. Factory pattern is clear and follows established repo convention.
language-provider.ts:162–163: doc comment now correctly reads "if unset, no calls are extracted for this language. All tree-sitter providers MUST supply this." — the misleading "falls back to inline extraction" text from the original PR was fixed in commit baa61e1. ✅
generic.ts:39–43: idempotency note added for extractLanguageCallSite — correct and useful. ✅
No barrel re-export regressions, no // removed comments, no unused re-exports.
No drive-by refactors outside the stated scope.
The duplicate charCodeAt(0) >= 65 && <= 90 check at parse-worker Path 1 (line 1677) and Path 2 (line 1841) is a pre-existing pattern, not introduced by this PR.

3. Test coverage & change safety

call-extraction.test.ts — 527 lines, uses real tree-sitter grammars. Tests cover: factory construction (all 13 configs), provider wiring (15 tree-sitter languages), generic extraction (TypeScript, Python, Java, C#, Go, Rust, C++, PHP, Ruby), Java :: method references (4 cases), typeAsReceiverHeuristic config flags (all languages). This is solid.
C# behavioral tests (commit baa61e1): Console.WriteLine() full extraction and lowercase receiver test both present and correct.
Missing: Kotlin behavioral test (optional — grammar is marked optional in AGENTS.md; no grammar in test imports). Acceptable.
Missing: parse-worker Path 2 end-to-end receiverTypeName emission for C# (see Finding above).
Provider wiring test at line 136–158 covers all 15 tree-sitter languages including Vue. Java is correctly included.
The satisfies Record<SupportedLanguages, LanguageProvider> guard in languages/index.ts provides compile-time exhaustiveness, which is the repo's established safety mechanism.

4. Performance

No new Cypher round-trips. No O(n²) patterns. No redundant allocations per call node beyond pre-existing spread patterns.
parseJavaMethodReference (jvm.ts:15) is O(childCount) — single-pass child scan. Called at most twice per Java call node (Path 1 + potential generic path call), which is cheap.
No memoization introduced, no invalidation risk.
No issues found in this area.

5. Operational risk

No file-handle, stream, or path separator changes.
LadybugDB single-writer invariant unaffected — no new analyze/load paths.
Embeddings: no changes to --embeddings path, meta.json stats, or CSV streaming in run-analyze.ts. ✅
MCP contracts (query, context, impact, detect_changes, rename, cypher, group_*): unaffected. Changes are internal to call site construction in the parse phase, not to graph schema or query layer. ✅
Staleness detection (staleness.ts) unaffected.
Rollback safety: reverting requires npx gitnexus analyze re-run to rebuild call graph. Expected cost for any call-extraction change; no schema migration needed. LadybugDB rebuilt from scratch.

6. Maintainability

Pipeline Phase DAG boundaries respected: all changes confined to parse phase worker and call-processor. No cross-phase coupling. ✅
Per-language configs in call-extractors/configs/ match the established pattern for method-extractors/configs/ and field-extractors/configs/. New contributors will know where to add a config for a new language.
The callExtractor?: CallExtractor field on LanguageProvider is now clearly documented and the doc comment is accurate.
The asymmetry between parse-worker (uses factory for Path 2) and call-processor (own inline Path 2) is a maintainability risk if not documented, since the two paths look similar but diverge on receiver type resolution semantics.

Non-negotiables

No secrets or machine-specific paths: None found. ✅
No blind find-and-replace renames: Structural refactor via new files + deletion of call-sites/. ✅
No HIGH/CRITICAL impact left unaddressed: All 15 tree-sitter providers wired. ✅
PR title follows conventional commit: feat(ingestion): … ✅
PolyForm license untouched: ✅

What I could not fully verify

tsc --noEmit and vitest run results: Cannot execute the test suite or TypeScript compiler locally. Based on diff inspection, the types are consistent and the code compiles structurally.
Kotlin behavioral extraction at runtime: Kotlin grammar is optional and not imported in the test file. Cannot verify that kotlinCallConfig correctly extracts member calls against real Kotlin AST nodes.
call-processor Path 2 ctx.resolve() fallback coverage: I traced that the fallback at line 1116–1130 covers the type-as-receiver case for call-processor, but cannot run a cross-language call graph to confirm equivalence with parse-worker's uppercase-letter heuristic for C# static calls.

Final verdict

✅ Ready to merge — no blocking issues.

The initial review from the previous pass identified three concerns (misleading doc comment, missing C# behavioral test, undisclosed typeAsReceiverHeuristic behavioral change). All three were addressed in subsequent commits (baa61e1, 6a71450). The remaining findings are NON-BLOCKING:

The call-processor Path 2 asymmetry is pre-existing architectural design (not introduced by this PR) and the ctx.resolve() fallback provides equivalent or richer coverage for the general case.
The missing parse-worker end-to-end receiverTypeName emission test is a safety net, not a correctness gap.

The factory/config pattern is cleanly implemented, follows established repo conventions, and all 15 tree-sitter providers are correctly wired with compile-time safety.

Bumps gitnexus to v1.6.2 and adds the matching CHANGELOG entry. Highlights since v1.6.1 (61 commits): - Docker support (#848) - Language-agnostic heritage / call / variable extractors (config+factory pattern, #877 #878 #890) - AST-aware embedding chunking (#889) - jQuery / axios HTTP consumer detection (#887) - SemanticModel wired as first-class resolution input, SM-20 (#885) - ImportSemantics split into per-strategy hooks (#886) - Python dotted-import fix (#899); worker warnings non-terminal (#900 / #261); global-install ENOTEMPTY fixes (#843 #846); embeddings staleness fix (#831) See gitnexus/CHANGELOG.md for the full list. After merge, tag `v1.6.2` triggers publish.yml which runs CI, verifies tag↔package.json match, publishes to npm with provenance, and creates the GitHub Release using the extracted CHANGELOG body.

… pattern (abhigyanpatwari#877) * Initial plan * feat(ingestion): add call-types, call-extractors factory, per-language configs, and wire into providers Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com> * feat(ingestion): replace inline call extraction in parse-worker and call-processor, delete call-sites/ Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com> * test(ingestion): add unit tests for call extraction configs and factory Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/893afa77-5b34-4e6b-a1dc-03034261fb36 Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com> * style: fix prettier formatting in call-extractor files Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/d4b06f56-03b6-4fa4-801f-7ddcc6e81f13 * fix: address review comments — doc comment, idempotency note, C# behavioral test Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f * fix: rename misleading test title, remove stale code reference in comment Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

Bumps gitnexus to v1.6.2 and adds the matching CHANGELOG entry. Highlights since v1.6.1 (61 commits): - Docker support (abhigyanpatwari#848) - Language-agnostic heritage / call / variable extractors (config+factory pattern, abhigyanpatwari#877 abhigyanpatwari#878 abhigyanpatwari#890) - AST-aware embedding chunking (abhigyanpatwari#889) - jQuery / axios HTTP consumer detection (abhigyanpatwari#887) - SemanticModel wired as first-class resolution input, SM-20 (abhigyanpatwari#885) - ImportSemantics split into per-strategy hooks (abhigyanpatwari#886) - Python dotted-import fix (abhigyanpatwari#899); worker warnings non-terminal (abhigyanpatwari#900 / abhigyanpatwari#261); global-install ENOTEMPTY fixes (abhigyanpatwari#843 abhigyanpatwari#846); embeddings staleness fix (abhigyanpatwari#831) See gitnexus/CHANGELOG.md for the full list. After merge, tag `v1.6.2` triggers publish.yml which runs CI, verifies tag↔package.json match, publishes to npm with provenance, and creates the GitHub Release using the extracted CHANGELOG body.

Initial plan

62f4235

Copilot AI assigned Copilot and magyargergo Apr 16, 2026

Copilot started work on behalf of magyargergo April 16, 2026 08:36 View session

Copilot AI linked an issue Apr 16, 2026 that may be closed by this pull request

feat(ingestion): language-agnostic call extractor with config+factory pattern #874

Closed

7 tasks

vercel Bot deployed to Preview April 16, 2026 08:36 View deployment

vercel Bot deployed to Preview April 16, 2026 08:48 View deployment

vercel Bot deployed to Preview April 16, 2026 08:55 View deployment

vercel Bot deployed to Preview April 16, 2026 08:57 View deployment

Copilot AI changed the title ~~[WIP] Add language-agnostic call extractor using config and factory pattern~~ feat(ingestion): language-agnostic call extractor with config+factory pattern Apr 16, 2026

Copilot AI requested a review from magyargergo April 16, 2026 09:00

Copilot finished work on behalf of magyargergo April 16, 2026 09:00

magyargergo marked this pull request as ready for review April 16, 2026 09:02

Copilot started work on behalf of magyargergo April 16, 2026 09:26 View session

style: fix prettier formatting in call-extractor files

a3db736

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/d4b06f56-03b6-4fa4-801f-7ddcc6e81f13

vercel Bot deployed to Preview April 16, 2026 09:30 View deployment

Copilot finished work on behalf of magyargergo April 16, 2026 09:31

Copilot started work on behalf of magyargergo April 16, 2026 10:21 View session

fix: address review comments — doc comment, idempotency note, C# beha…

baa61e1

…vioral test Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

vercel Bot deployed to Preview April 16, 2026 10:27 View deployment

fix: rename misleading test title, remove stale code reference in com…

6a71450

…ment Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/e53e650b-fae6-4551-ab25-cda28e4d647f

vercel Bot deployed to Preview April 16, 2026 10:30 View deployment

Copilot finished work on behalf of magyargergo April 16, 2026 10:31

magyargergo merged commit 03821fa into main Apr 16, 2026
19 checks passed

magyargergo deleted the copilot/create-language-agnostic-call-extractor branch April 16, 2026 10:45

magyargergo mentioned this pull request Apr 18, 2026

chore(release): v1.6.2 #952

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(ingestion): language-agnostic call extractor with config+factory pattern#877

feat(ingestion): language-agnostic call extractor with config+factory pattern#877
magyargergo merged 7 commits into
mainfrom
copilot/create-language-agnostic-call-extractor

Copilot AI commented Apr 16, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

magyargergo commented Apr 16, 2026

Uh oh!

github-actions Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

magyargergo commented Apr 16, 2026

Uh oh!

github-actions Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Copilot AI commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vercel Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

magyargergo commented Apr 16, 2026

1. Correctness & functional completeness

2. Code clarity & clean code

3. Test coverage & change safety

4. Performance

5. Operational risk

6. Maintainability

Production-ready bar for this PR

Findings

[BLOCKING|NON-BLOCKING|NIT] Short title

What I could not fully verify

Final verdict

Uh oh!

github-actions Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Production-ready bar for this PR

Findings

1. Correctness & functional completeness

2. Code clarity & clean code

3. Test coverage & change safety

4. Performance

5. Operational risk

6. Maintainability

Non-negotiables

What I could not fully verify

Final verdict

Uh oh!

magyargergo commented Apr 16, 2026

1. Correctness & functional completeness

2. Code clarity & clean code

3. Test coverage & change safety

4. Performance

5. Operational risk

6. Maintainability

Production-ready bar for this PR

Findings

[BLOCKING|NON-BLOCKING|NIT] Short title

What I could not fully verify

Final verdict

Uh oh!

github-actions Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(ingestion): language-agnostic call extractor with config+factory pattern

Production-ready bar for this PR

Findings

1. Correctness & functional completeness

2. Code clarity & clean code

3. Test coverage & change safety

4. Performance

5. Operational risk

6. Maintainability

Non-negotiables

What I could not fully verify

Final verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Apr 16, 2026 •

edited

Loading

vercel Bot commented Apr 16, 2026 •

edited

Loading

github-actions Bot commented Apr 16, 2026 •

edited

Loading

github-actions Bot commented Apr 16, 2026 •

edited

Loading