
fix(group): derive grpc consumer FQN from java imports for client-jar consumers #1889

Merged
magyargergo merged 6 commits into abhigyanpatwari:main from henry201605:fix/grpc-consumer-fqn-from-import
May 29, 2026

@henry201605

Contributor

Problem

Java gRPC microservices commonly follow the client-jar pattern: the service owner publishes a pre-compiled stub jar to a Maven repository, and consumer repos depend on the jar instead of carrying the originating .proto files. This is the shape documented by the official gRPC Java quickstart, Alibaba HSF, ByteDance KiteX-Java, and google-cloud-java.

Before this PR, GrpcExtractor could only produce a fully-qualified contract id (grpc::<package>.<Service>/*) when the consumer repo also carried a matching .proto. Client-jar consumers have no proto, so they fell back to a short-name contract id (grpc::<Service>/*) that never matches the provider's package-qualified id. Cross-repo grpc cross-link counts dropped to zero on every realistic Java microservice group.

Concrete repro from a two-repo group (provider + consumer):

provider repo  85 contracts, all FQN  ✓  (same repo has .proto)
consumer repo  12 contracts, all SHORT ✗  (zero .proto in repo)

→ exact cross-link count = 0 across the entire group

Fix

Derive the proto package directly from each consumer file's import <pkg>.<XxxGrpc>; statement. The package from the import is exactly the proto package, so the contract id matches the provider's verbatim — no .proto lookup needed in the consumer repo.
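
To make the idea concrete, here is a minimal sketch of deriving a per-file `XxxGrpc → package` map from Java imports. It is not the PR's code: the PR uses a tree-sitter query, whereas this sketch swaps in a plain regular expression, and the names `buildGrpcImportMap` and `contractIdFor` are hypothetical.

```typescript
// Hypothetical stand-in for the tree-sitter query: a regex over Java source.
// It matches `import a.b.XxxGrpc;` but not `import static ...` or `import a.b.*;`.
function buildGrpcImportMap(javaSource: string): Map<string, string> {
  const importRe = /^\s*import\s+([\w.]+)\.(\w+Grpc)\s*;/gm;
  const imports = new Map<string, string>();
  let m: RegExpExecArray | null;
  while ((m = importRe.exec(javaSource)) !== null) {
    // m[2] is the generated class (XxxGrpc), m[1] is its package.
    imports.set(m[2], m[1]);
  }
  return imports;
}

// Emit a fully-qualified id when the import resolves, else the short-name id.
function contractIdFor(serviceName: string, imports: Map<string, string>): string {
  const pkg = imports.get(`${serviceName}Grpc`);
  return pkg ? `grpc::${pkg}.${serviceName}/*` : `grpc::${serviceName}/*`;
}

// Two files importing same-named services from different packages.
const apiFile = "import com.example.api.proto.service.ContentRpcServiceGrpc;";
const adminFile = "import com.example.admin.proto.service.ContentRpcServiceGrpc;";
console.log(contractIdFor("ContentRpcService", buildGrpcImportMap(apiFile)));
console.log(contractIdFor("ContentRpcService", buildGrpcImportMap(adminFile)));
```

Because the map is built per file, the two files resolve to different contract ids even though the short service name is identical.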

Implementation

  • grpc-patterns/types.ts: GrpcDetection gains an optional protoPackage field.

  • grpc-patterns/java.ts: adds GRPC_CLASS_IMPORT_PATTERNS, a tree-sitter query that captures import_declaration > scoped_identifier { scope, name } pairs where the imported name ends in Grpc. The plugin builds a per-file XxxGrpc → fullPackage map and tags every provider / consumer detection it emits.

  • grpc-extractor.ts: detectionToContract() now resolves the contract id in three steps:

    1. Detection-supplied protoPackage wins (skips the proto map entirely so an unrelated same-name service in the consumer repo can't blur the FQN).
    2. Otherwise consult the legacy per-repo proto map.
    3. Otherwise fall back to a short-name contract id, preserving pre-fix behaviour.

    Confidence stays at the "with proto" tier when the import path resolves — an import statement in real source code is at least as authoritative as a per-repo proto map.
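
The three-step resolution order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `detectionToContract()`: the field names `protoPackage`, `confidenceWithProto` and `confidenceWithoutProto` appear in this PR, but the `Contract` shape, the `packageSource` field, and a proto map reduced to `service → package` are hypothetical simplifications.

```typescript
interface Detection {
  serviceName: string;
  protoPackage?: string; // set by the plugin when an import resolves
  confidenceWithProto: number;
  confidenceWithoutProto: number;
}

type PackageSource = "import" | "proto-map" | "short-name";

interface Contract {
  contractId: string;
  confidence: number;
  packageSource: PackageSource; // hypothetical field, for illustration only
}

function serviceContractId(pkg: string | undefined, service: string): string {
  return pkg ? `grpc::${pkg}.${service}/*` : `grpc::${service}/*`;
}

function resolveContract(d: Detection, protoMap: Map<string, string>): Contract {
  // Step 1: a detection-supplied package wins and the proto map is not consulted.
  if (d.protoPackage) {
    return {
      contractId: serviceContractId(d.protoPackage, d.serviceName),
      confidence: d.confidenceWithProto,
      packageSource: "import",
    };
  }
  // Step 2: otherwise consult the per-repo proto map.
  const mapped = protoMap.get(d.serviceName);
  if (mapped) {
    return {
      contractId: serviceContractId(mapped, d.serviceName),
      confidence: d.confidenceWithProto,
      packageSource: "proto-map",
    };
  }
  // Step 3: otherwise fall back to the short-name contract id.
  return {
    contractId: serviceContractId(undefined, d.serviceName),
    confidence: d.confidenceWithoutProto,
    packageSource: "short-name",
  };
}
```

Keeping step 1 ahead of the map lookup is what prevents an unrelated same-name service in the consumer repo from influencing the emitted id.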

Same-short-name disambiguation

A real internal codebase defines two distinct ContentRpcService services in different proto packages (com.example.api.proto.service.ContentRpcService vs com.example.admin.proto.service.ContentRpcService). Consumer files importing the two flavours now emit two distinct FQNs; neither could be told apart from the other under the legacy short-name fallback.

Scope

This PR is Java-only. Other plugins (go.ts, python.ts, node.ts) keep their existing behaviour and can opt in later by setting protoPackage on their detections.

Tests

test/unit/group/grpc-extractor.test.ts adds a new Java client-jar consumer (import-derived FQN) describe block with 9 cases:

| Case | Pins |
| --- | --- |
| Consumer with import emits FQN without local proto | happy path |
| Provider with import emits FQN without local proto | happy path (server side) |
| Same short name across packages → distinct FQNs | disambiguation |
| Import-derived FQN overrides unrelated local proto with same short name | precedence |
| Consumer without import → no false positive | regression protection |
| Short import + local proto → import-derived path still wins (marker check) | precedence |
| Static and wildcard imports don't pollute the import map | regression protection |
| Provider with import in client-jar consumer repo | server side, no proto |
| Two files in one repo with different api/admin imports → file-local resolution | per-file isolation |

End-to-end verification

Ran the patched CLI on a real two-repo Java group: a consumer repo (zero .proto) plus its provider repo (has .proto). Synced as a group:

Before fix:  exact cross-link count = 0
After fix:   9 grpc cross-links, all from one consumer file
             (the file we converted from wildcard to specific imports)

Each cross-link's contractId exactly matches the provider's FQN (e.g. grpc::com.example.admin.proto.service.ApplicationRpcService/*), and meta.protoPackageSource: "import" confirms the import-derived path resolved them.

Verification

  • npx tsc --noEmit
  • npx tsc (dist rebuild) ✅
  • test/unit/group/grpc-extractor.test.ts — 60/60 pass (51 existing + 9 new)
  • test/unit/group/ — 30 files / 545 tests all green
  • npx prettier --check on touched files ✅
  • npx eslint on touched source files — 0 errors / 0 warnings

…consumers don't fall back to short names

Java gRPC microservices commonly follow the "client-jar" pattern: the
service owner publishes a pre-compiled stub jar to a Maven repository
and consumer repos depend on the jar instead of carrying the
originating `.proto` files. gRPC's official Java quickstart, Alibaba
HSF, ByteDance KiteX-Java and google-cloud-java all document this
shape.

Before this commit, `GrpcExtractor` resolved a fully-qualified
contract id (`grpc::<package>.<Service>/*`) only when the consumer
repo also carried a matching `.proto`. Client-jar consumers had no
proto, so they fell back to a short-name contract id
(`grpc::<Service>/*`) that never matched the provider's contract id.
Cross-repo grpc cross-link counts dropped to zero on every realistic
Java microservice group — including all of crsdp's `crsdp-backend →
unipus_cloud_framework` connections.

Fix: derive the proto package directly from each consumer file's
`import <pkg>.<XxxGrpc>;` statement. The package from the import is
exactly the proto package, so the contract id matches the provider's
verbatim — no `.proto` lookup needed in the consumer repo.

Implementation
--------------

* `grpc-patterns/types.ts` — `GrpcDetection` gains an optional
  `protoPackage` field. Plugins set it when the package can be
  derived from the source file alone.
* `grpc-patterns/java.ts` — adds `GRPC_CLASS_IMPORT_PATTERNS`, a
  tree-sitter query that captures every
  `import_declaration > scoped_identifier { scope, name }` pair where
  the imported name ends in `Grpc`. `import static …` and
  `import w.x.*;` are excluded by tree-sitter shape: the `name:` field
  is only present on the non-static, non-wildcard form. The plugin
  builds a per-file `XxxGrpc → fullPackage` map and tags every
  provider / consumer detection it emits.
* `grpc-extractor.ts` — `detectionToContract()` now resolves the
  contract id in three steps:
    1. detection-supplied `protoPackage` wins (skips the proto map
       entirely so an unrelated same-name service in the consumer
       repo can't blur the FQN);
    2. otherwise consult the legacy per-repo proto map;
    3. otherwise fall back to a short-name contract id, preserving
       pre-fix behaviour.
  Confidence stays at the "with proto" tier when the import path
  resolves: an import statement in real source is at least as
  authoritative as a per-repo proto map.

Same-short-name disambiguation
-------------------------------

The motivating case `unipus_cloud_framework` defines two distinct
`ContentRpcService` services in different proto packages
(`cn.unipus.ucf.api.proto.client.service.ContentRpcService` vs
`cn.unipus.ucf.admin.proto.client.service.ContentRpcService`). Two
consumer files importing the two flavours now emit two distinct FQNs;
neither could be told apart from the other under the legacy short-
name fallback.

Out of scope
------------

`import w.x.*;` (wildcard service imports) are left to the legacy
short-name fallback. Wildcard imports are discouraged by Google's
Java style guide and IntelliJ's defaults, and resolving them
unambiguously would require either group-level proto-package
catalogs or per-class disambiguation, both of which are larger
follow-ups. This commit only changes behaviour for the dominant
specific-import case.

Tests
-----

`test/unit/group/grpc-extractor.test.ts` adds a new "Java client-jar
consumer (import-derived FQN)" describe block with 9 cases covering
both the happy paths (consumer/provider FQN derivation, same-short-
name disambiguation, import-vs-local-proto precedence) and the
regression-protection paths (no import + no detection emitted, static
imports / wildcards ignored, mixed-file repos preserved).

End-to-end verification
-----------------------

Ran the patched cli on the real `crsdp-backend` (consumer, no
`.proto`) and `unipus_cloud_framework` (provider, has `.proto`)
repos. Synced as a two-repo group, every `XxxGrpc` referenced via a
specific import in `UcfAdminGrpcClientService.java` produced an FQN
contract id that exact-matched the provider repo's FQN — 9 grpc
cross-links surfaced where there were 0 before.

Verification
------------

* `npx tsc --noEmit`: pass
* `npx tsc` (dist rebuild): pass
* `test/unit/group/grpc-extractor.test.ts`: 60/60 pass (51 existing
  + 9 new)
* `test/unit/group/`: 30 files / 545 tests all green
* `npx prettier --check` on touched files: pass
* `npx eslint` on touched src files: 0 errors / 0 warnings
@vercel

vercel Bot commented May 29, 2026


Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 29, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 10080 | 10075 | 0 | 5 | 620s |

✅ All 10075 tests passed

5 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 79.86% | 35479/44425 | 79.48% | 📈 +0.4 | 🟢 ███████████████░░░░░ |
| Branches | 68.44% | 22629/33063 | 68% | 📈 +0.4 | 🟢 █████████████░░░░░░░ |
| Functions | 84.87% | 3659/4311 | 84.56% | 📈 +0.3 | 🟢 ████████████████░░░░ |
| Lines | 83.42% | 31966/38319 | 82.97% | 📈 +0.5 | 🟢 ████████████████░░░░ |


@github-actions

github-actions Bot commented May 29, 2026

Contributor

Claude finished @magyargergo's task in 8m 30s


PR #1889 Production-Readiness Review

Reviewer persona: staff-level GitNexus production-readiness — adversarial but fair.

Tasks

  • Post initial comment
  • Read repo docs (DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md)
  • Inspect branch hygiene (git log, diff stat)
  • Read changed files (grpc-extractor.ts, java.ts, types.ts, test.ts)
  • Read related files (matching.ts, sync pipeline, go.ts, proto.ts, index.ts)
  • Analyze Tree-sitter query correctness and import parsing
  • Analyze contract ID correctness — proto package vs java_package semantic risk
  • Analyze precedence logic and false positive risk
  • Analyze file-local isolation
  • Analyze confidence semantics
  • Analyze downstream exact/wildcard matching path
  • Analyze test coverage adequacy
  • Run unicode/hygiene checks

Review bar for this PR

Reviewed against: DoD.md v2.0.0, GUARDRAILS.md, AGENTS.md, CONTRIBUTING.md. Key gates applied:

  • DoD §2.1 — correctness end-to-end on real runtime path
  • DoD §2.4 — existing contracts preserved; user-visible behavior changes documented
  • DoD §2.7 — tests on the real changed path; assertions are exact, not bounds-only; fixtures realistic for the risk
  • DoD §6 — "the runtime path is not actually exercised by the tests" is a Not-Done signal

Problem being solved

Java gRPC client-jar consumers (no local .proto) fall back to short-name contract IDs (grpc::Service/*), which should match providers via the endsWith rule in runWildcardMatch but fail to create cross-links in some configurations. The fix derives the proto package from import <pkg>.<XxxGrpc>; statements, tagging protoPackage on detections so detectionToContract() can emit a fully-qualified service-level contract without a local proto file.


Current PR state

  • Branch: henry201605/GitNexus:fix/grpc-consumer-fqn-from-import → abhigyanpatwari/GitNexus:main
  • Files changed: 4 TypeScript files, +473 -0 (additive only)
  • CI: All 10,062 tests passing per CI report; CodeQL, Gitleaks, Dependency Review, Docker Build all listed as green.
  • No human reviews, no assignees, no labels, no milestone.

Merge status and mergeability

Branch has one merge-from-main commit (452890e) plus one functional fix commit (8b1073c). Diff is tightly scoped to the gRPC extractor, patterns, and test. No lockfile churn, no workflow changes, no generated artifacts. Branch hygiene is clean.

Cannot verify mergeability state, branch protection requirements, or CodeQL/Dependency Review conclusions programmatically from this environment. CI comment says all checks passed. Treat CI green as confirmed unless a check is known to be flaky.


Repository history considered


Branch hygiene assessment

Clean. The visible commits are 8b1073c (functional) + 452890e (merge-from-main). The diff is additive and tightly scoped. No unrelated churn. No conflict resolutions pulled in from main that contaminate the diff.


Understanding of the change

Resolution path added:

Before (for Java consumer with no local .proto):

  1. detectionToContract checks d.protoPackage → undefined, skip
  2. Checks proto map for service name → empty candidates → short-name fallback
  3. Emits grpc::AuthService/* with confidenceWithoutProto = 0.55

After (for Java consumer with import com.acme.auth.proto.AuthServiceGrpc;):

  1. GRPC_CLASS_IMPORT_PATTERNS fires → grpcClassImports["AuthServiceGrpc"] = "com.acme.auth.proto"
  2. protoPackageFor("AuthService") → "com.acme.auth.proto"
  3. Detection emits protoPackage: "com.acme.auth.proto"
  4. detectionToContract enters Step 1 branch → emits grpc::com.acme.auth.proto.AuthService/* with confidenceWithProto = 0.75

Matching path (confirmed from matching.ts):

Service-level contracts ending with /* are classified as wildcards (isServiceWildcard check at line 13–15 of matching.ts). They are:

  • Excluded from runExactMatch consumer list
  • Added to unmatched → sent to runWildcardMatch
  • In runWildcardMatch: consumer grpc::com.acme.auth.proto.authservice/* strips /* → fqService = "com.acme.auth.proto.authservice" → matches provider key grpc::com.acme.auth.proto.authservice/login via exact providerFqService === fqService comparison

The old short-name grpc::authservice/* matched via providerFqService.endsWith('.' + fqService) (the endsWith fallback for bare-name consumers). This distinction is critical — see Finding 1.
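
The matching rule described above can be approximated with a small sketch. This is reconstructed only from the behaviour described in this review, not copied from matching.ts; the helper names `serviceOf` and `wildcardMatches` are hypothetical, and lowercasing is assumed because the review shows lowercase keys.

```typescript
// Strip the "grpc::" prefix and the trailing "/<method>", then lowercase.
function serviceOf(contractId: string): string {
  const body = contractId.replace(/^grpc::/, "");
  const slash = body.lastIndexOf("/");
  return (slash >= 0 ? body.slice(0, slash) : body).toLowerCase();
}

function wildcardMatches(consumerId: string, providerId: string): boolean {
  // Only service-level consumer contracts (ending in "/*") are wildcards.
  if (!consumerId.endsWith("/*")) return false;
  const fqService = serviceOf(consumerId);
  const providerFqService = serviceOf(providerId);
  // Fully-qualified consumers must match the provider service exactly.
  if (providerFqService === fqService) return true;
  // The endsWith fallback applies only to bare (dot-free) consumer names.
  return !fqService.includes(".") && providerFqService.endsWith("." + fqService);
}

const provider = "grpc::com.example.AuthService/Login";
console.log(wildcardMatches("grpc::AuthService/*", provider));
console.log(wildcardMatches("grpc::com.example.AuthService/*", provider));
console.log(wildcardMatches("grpc::com.example.generated.AuthService/*", provider));
```

The third call is the case Finding 1 is concerned with: a dotted consumer id that differs from the provider's package skips the fallback entirely.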

Tree-sitter query: The GRPC_CLASS_IMPORT_PATTERNS query correctly captures:

  • scope: (_) @import_pkg → the entire package path as text (e.g., com.acme.auth.proto)
  • name: (identifier) @import_name (#match? @import_name "Grpc$") → the XxxGrpc simple name

Static imports are excluded because tree-sitter-java's import_declaration for import static … has a static keyword child, and the name: field semantics differ. Wildcard imports produce an asterisk child instead of identifier, also excluded. The query shape is correct for the standard import form. Confirmed clean for the common case.

File-local isolation: The grpcClassImports map is rebuilt per tree (per call to scan(tree)), which is per source file (the orchestrator calls plugin.scan(tree) once per file). Isolation is correct. The two-repo same-short-name test at line 692 confirms this.


Findings


Finding 1 — Java java_package option causes import-derived package to diverge from proto package, creating an active regression

Severity: BLOCKER

Risk: Confirmed production failure mode. Not a hypothetical: option java_package is routinely used in Java protobuf projects (Google Cloud Java APIs, internal enterprise APIs, any project that separates proto namespace from Java artifact namespace) to set the generated Java package independently from the proto package declaration.

Evidence checked:

  1. buildProtoContext() at grpc-extractor.ts:262 extracts proto package with:

    const pkgMatch = content.match(/^\s*package\s+([\w.]+)\s*;/m);

    No handling of option java_package. The proto plugin (proto.ts:57) similarly uses (package (full_ident) @pkg) — also the proto package declaration.

  2. The Java protobuf compiler generates XxxGrpc.java in the Java package specified by option java_package (or the proto package if not set). When option java_package = "com.example.generated"; and package com.example;, the generated file lives in com.example.generated but the proto package is com.example.

  3. The import in a consumer Java file: import com.example.generated.AuthServiceGrpc; → grpcClassImports["AuthServiceGrpc"] = "com.example.generated".

  4. Provider contract ID: grpc::com.example.AuthService/Login (from proto package).

  5. Consumer import-derived contract: grpc::com.example.generated.AuthService/*.

  6. In runWildcardMatch, fqService = "com.example.generated.authservice". The endsWith fallback (line 279 of matching.ts) only fires when !fqService.includes('.'). Since the import-derived FQN has dots, the endsWith rule does NOT fire. Result: zero cross-links.

  7. Critical regression: The old short-name consumer grpc::AuthService/* had fqService = "authservice" (no dots), which DID trigger the endsWith rule against "com.example.authservice", producing a wildcard cross-link. The PR converts this to a dots-containing FQN that fails the guard. Projects currently getting cross-links via the endsWith path LOSE those cross-links after this change.

  8. No occurrence of java_package anywhere in the codebase (confirmed by grep). No test covers proto package ≠ java package divergence.

Recommended fix: Either:

  • (a) In buildProtoContext, also parse option java_package and map the generated Java package to the service. Add a javaPackageProtoServiceInfo index so import-derived lookups can verify the match. If java_package is set and the import matches it, the import is still valid — use the resolved proto package, not the java package, for the contract ID.
  • (b) Scope the fix to consumer repos with NO local proto only (skip Step 1 when protoMap.has(d.serviceName)), and add a docs note that the fix only works when java_package is absent or equals the proto package.
  • (c) At minimum, add a test where proto package = "com.example" and java_package = "com.example.generated", and assert the behavior — even if the current behavior is documented as a known limitation.
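
As a rough illustration of option (a), the sketch below parses both `package` and `option java_package` from proto text and builds a reverse index from Java package to proto package. The `package` regex is the one quoted from buildProtoContext above; the `java_package` regex, the `ProtoInfo` shape, and the function names are assumptions made for this example.

```typescript
interface ProtoInfo {
  protoPackage: string;
  javaPackage: string; // equals protoPackage when the option is absent
  services: string[];
}

function parseProto(content: string): ProtoInfo {
  const pkg = content.match(/^\s*package\s+([\w.]+)\s*;/m);
  // Assumed pattern for `option java_package = "...";`.
  const javaPkg = content.match(/^\s*option\s+java_package\s*=\s*"([\w.]+)"\s*;/m);
  const protoPackage = pkg ? pkg[1] : "";
  const services: string[] = [];
  const serviceRe = /^\s*service\s+(\w+)/gm;
  let m: RegExpExecArray | null;
  while ((m = serviceRe.exec(content)) !== null) services.push(m[1]);
  return { protoPackage, javaPackage: javaPkg ? javaPkg[1] : protoPackage, services };
}

// Reverse index: "<javaPackage>.<Service>" -> proto package.
function buildJavaPackageIndex(protos: string[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const content of protos) {
    const info = parseProto(content);
    for (const service of info.services) {
      index.set(`${info.javaPackage}.${service}`, info.protoPackage);
    }
  }
  return index;
}

// Translate an import-derived package back to the proto package when known.
function toProtoPackage(importPkg: string, service: string, index: Map<string, string>): string {
  const hit = index.get(`${importPkg}.${service}`);
  return hit !== undefined ? hit : importPkg;
}
```

With such an index, an import of `com.example.generated.AuthServiceGrpc` can be mapped back to `com.example` for the contract id, provided some `.proto` in scope declares that `java_package`.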

Blocks merge: YES


Finding 2 — Import-derived package overrides proto map unconditionally, including when local proto and import diverge

Severity: HIGH

Risk: A provider repo carrying its own .proto (which is the authoritative source) can have its contract ID corrupted if the proto has option java_package and the Java source imports from the java package. The import-derived path wins over the correct proto map entry.

Evidence checked:

detectionToContract() at grpc-extractor.ts:486:

if (d.protoPackage) {
  // Step 1: import-derived package wins. Skip the proto map entirely
  // — it can only blur things (e.g. an unrelated same-name service in
  // the consumer repo would otherwise win the path heuristic and
  // produce the wrong FQN).

The comment explains the intended override (unrelated same-name service in consumer repo). But it does not account for the case where java_package ≠ package and the import correctly identifies the generated class but not the proto package. In that scenario the proto map holds the correct package; the import holds a different (wrong for contract ID purposes) package.

Test test_import_derived_fqn_overrides_unrelated_local_proto_with_same_short_name (line 517) only covers the case where the local proto is genuinely unrelated. No test exists where the local proto IS related but java_package causes a divergence.

Recommended fix: Same as Finding 1. When protoMap.has(d.serviceName) with a non-ambiguous single candidate, validate the import-derived package against the proto candidate before overriding. If they agree → use import-derived (with protoPackageSource: 'import'). If they disagree → use proto map (correct source), optionally logging a warning.
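
The agree/disagree rule in the recommended fix could look something like the sketch below. The `Resolution` shape and the `reconcile` name are hypothetical; `protoPackageSource` mirrors the meta field mentioned in this PR, and `importPackage` and `warning` are illustrative placeholders for how a divergent import might be recorded.

```typescript
interface Resolution {
  protoPackage: string;
  protoPackageSource: "import" | "proto-map";
  importPackage?: string; // recorded only when the import disagrees
  warning?: string;
}

function reconcile(importPkg: string, protoCandidates: string[]): Resolution {
  // Only a single, unambiguous proto candidate is trusted for cross-checking.
  const protoPkg = protoCandidates.length === 1 ? protoCandidates[0] : undefined;
  if (protoPkg === undefined || protoPkg === importPkg) {
    return { protoPackage: importPkg, protoPackageSource: "import" };
  }
  // Disagreement: the local proto is treated as authoritative.
  return {
    protoPackage: protoPkg,
    protoPackageSource: "proto-map",
    importPackage: importPkg,
    warning: `import package ${importPkg} differs from proto package ${protoPkg}`,
  };
}
```

Returning the divergent import alongside the chosen package keeps the disagreement visible to downstream consumers instead of silently discarding it.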

Blocks merge: maybe (same root cause as Finding 1; resolving Finding 1 addresses this)


Finding 3 — No end-to-end cross-link test; unit tests don't exercise the matching path

Severity: HIGH

Risk: DoD §2.7 states "Tests cover the real changed path — they would fail if behavior, wiring, or contracts were broken, not only if a mock were misconfigured." The real changed path is:

consumer file → plugin.scan() → detectionToContract() → dedupe() → [group sync] → runWildcardMatch() → CrossLink

All 9 new tests call extractor.extract() and assert contractId, confidence, and meta.protoPackageSource. None invoke runWildcardMatch or buildProviderIndex with a two-repo fixture and assert cross-link count or type.

Confirmed: The service-level wildcard contracts (grpc::pkg.Service/*) produced by this PR go to runWildcardMatch, NOT runExactMatch. The distinction matters: matchType: 'exact' vs matchType: 'wildcard', and the confidence value is Math.min(provider.confidence, consumer.confidence) in wildcard pass.

No test verifies that grpc::com.acme.auth.proto.AuthService/* from a consumer produces an actual wildcard cross-link against grpc::com.acme.auth.proto.AuthService/Login from a provider, in a two-repo group fixture.

Recommended fix: Add one integration-style test to grpc-extractor.test.ts (or sync.test.ts) that:

  1. Creates a provider fixture (Java proto-backed or proto-only) emitting grpc::com.acme.auth.proto.AuthService/Login
  2. Creates a consumer fixture (import-derived) emitting grpc::com.acme.auth.proto.AuthService/*
  3. Calls buildProviderIndex + runWildcardMatch
  4. Asserts cross-link count = 1, contractId matches, matchType = 'wildcard'

Blocks merge: maybe (strong DoD requirement, but the contract ID assertions do verify the precondition for a successful wildcard match)


Finding 4 — confidenceWithProto is used for import-derived packages that may be wrong

Severity: MEDIUM

Risk: When protoPackage is set (import-derived), the contract uses d.confidenceWithProto — 0.75 for consumers, 0.8 for providers. The original confidence tier meaning was: "we found the proto file, this is likely correct." An import statement is a real signal, but it captures the java package (which may not equal proto package). Using the same confidence tier conflates two different evidence qualities.

More concretely: if Finding 1 applies (wrong package due to java_package divergence), the wrong FQN is stored at confidence 0.75, making it appear more reliable than a low-confidence short-name that was at least safe (no false cross-links from it, because it used endsWith).

Recommended fix: Consider a confidenceWithImport tier (e.g., 0.70) between the two existing tiers. Or document explicitly that confidenceWithProto is the correct tier because the import statement is treated as equivalent to proto map evidence, and add a test that asserts downstream confidence values in the cross-link.
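
A separate tier could be expressed along these lines. The 0.75 and 0.55 consumer values and the `Math.min` rule for wildcard cross-links are taken from this review; the 0.70 import tier is only the example value suggested above, and the `Evidence` type and function name are hypothetical.

```typescript
type Evidence = "proto" | "import" | "none";

// Consumer-side confidence per evidence quality (the import tier is hypothetical).
const CONSUMER_CONFIDENCE: Record<Evidence, number> = {
  proto: 0.75,
  import: 0.7,
  none: 0.55,
};

// In the wildcard pass, a cross-link is only as confident as its weaker end.
function crossLinkConfidence(providerConfidence: number, consumerConfidence: number): number {
  return Math.min(providerConfidence, consumerConfidence);
}

console.log(crossLinkConfidence(0.8, CONSUMER_CONFIDENCE.import));
```

A test that asserts the cross-link value, rather than only the contract's own confidence, would pin the downstream effect of whichever tier is chosen.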

Blocks merge: no (cosmetic if Finding 1 is resolved)


Finding 5 — Tests use toBeGreaterThanOrEqual for some assertions (existing behavior, not new)

Severity: LOW

Risk: Existing tests at lines 301, 303, 320, 340 use toBeGreaterThanOrEqual(1) and toContain. DoD §2.7 says to use toBe/toEqual for exact expectations. The 9 new tests added by this PR DO use exact assertions (toHaveLength(1), toBe('grpc::...')), which is correct. The existing tests are pre-existing debt, not introduced by this PR.

Blocks merge: no


Finding 6 — No documentation update for user-visible behavior change

Severity: LOW

Risk: DoD §2.4: "If user-visible behavior... changes, the relevant docs are updated in the same change." This PR changes observable group sync output for Java gRPC repositories — consumers without local .proto now emit FQN contracts (vs short-name). The wildcard import limitation is not documented anywhere. docs/guides/microservices-grpc.md does not exist. CHANGELOG.md is not expected to be edited in PRs per DoD §6.

Recommended fix: Add a note to AGENTS.md (group section) or create/update any gRPC guide doc noting: Java-specific imports (import pkg.XxxGrpc;) now produce fully-qualified contract IDs; wildcard imports (import pkg.*;) and static imports fall back to proto-map resolution; when option java_package differs from proto package, results may be incorrect.

Blocks merge: no


PR-specific assessment sections

Java import parsing and Tree-sitter correctness:

The query is correct for standard non-static, non-wildcard imports. Confirmed:

  • import foo.BarGrpc; → captured correctly: scope="foo", name="BarGrpc"
  • import static foo.BarGrpc.*; → not matched (tree-sitter-java import_declaration with static has different node structure) ✓
  • import foo.*; → asterisk child, not identifier, not matched ✓
  • import foo.Outer.BarGrpc; → scope captures the scoped_identifier node foo.Outer (its .text = "foo.Outer"), so grpcClassImports["BarGrpc"] = "foo.Outer". For standard java_multiple_files = true gRPC generation, this is unlikely because each gRPC class gets its own file. But java_multiple_files = false (outer class wrapping) would produce nested imports; this edge case is untested and could produce wrong packages like grpc::foo.Outer.BarGrpc/*.

Contract ID format: serviceContractId(d.protoPackage, d.serviceName) produces grpc::com.acme.auth.proto.AuthService/* — same format as proto-map-derived FQNs. The trailing /* is consistent with existing behavior. ✓

File-local isolation: grpcClassImports is a new Map() inside each call to scan(tree). Fully local, no cross-file leakage. ✓

Same-short-name disambiguation: Two files with different imports for the same short service name get distinct FQNs. Correct and tested. ✓

Provider-side tagging: Provider Java source (extending XxxGrpc.XxxImplBase) also gets protoPackage from the import map. protoPackageFor(serviceName) correctly correlates the XxxGrpc outer class name with the import. This raises the same java_package concern for providers in repos that have both a local .proto and Java source — the import wins unconditionally.

Non-Java plugin compatibility: protoPackage?: string is optional; Go/Python/Node plugins that don't set it are unaffected. TypeScript strictness would catch any mandatory field addition. ✓

Downstream output: meta.package, meta.protoPackageSource, meta.service, meta.source are all set. No downstream code assumes protoPackageSource is absent (new field, additive). meta.package was absent before for no-proto detections; it's now set by Step 1. Any downstream code iterating meta would see a new key — low risk. ✓

Unicode hygiene: Files contain only visible non-ASCII characters in comments (em-dash, right-arrow, ellipsis). No hidden bidirectional controls found. ✓


Back-and-forth avoided by verifying

| Question | Verified answer |
| --- | --- |
| Does buildProtoContext handle option java_package? | No. grpc-extractor.ts:262 matches package\s+… only. No java_package handling anywhere in the codebase. |
| Does runWildcardMatch handle FQN consumer wildcards? | Yes. fqService = "com.acme.auth.proto.authservice" → exact-compares against providerFqService. Correct when packages match; fails silently when java_package ≠ package. |
| Do service-level wildcards go to exact match or wildcard match? | Wildcard match only. isServiceWildcard at matching.ts:13 returns true for /* suffix. runExactMatch explicitly excludes them. |
| Is protoPackageFor only called when a matching stub/provider detection fires? | Yes. The lookup ${serviceName}Grpc is keyed by the same service name as the stub usage. Unrelated *Grpc imports only pollute if the short name collides AND a detection fires. |
| Is the import map file-local? | Yes. grpcClassImports = new Map() at scan start; rebuilt per scan(tree) call. |
| What confidence values do existing tests assert? | New tests: 0.75 (consumer), 0.8 (provider). Matches confidenceWithProto field values in java.ts. ✓ |

Open questions that remain

None that can be resolved without the author's internal codebase or a concrete option java_package test fixture. The java_package divergence issue is verifiable from the codebase alone — it is a confirmed finding, not a suspicion.


Final verdict

not production-ready

The import-derived package captures the Java compilation package (java_package option), while provider contract IDs are built from the proto package declaration. When these diverge (any project using option java_package), the fix does not merely fail to create cross-links — it actively breaks existing wildcard matches that previously succeeded via the endsWith rule in runWildcardMatch. The !fqService.includes('.') guard at matching.ts:279 means dots-containing FQNs skip the endsWith fallback. Converting grpc::AuthService/* (no dots, endsWith applies) to grpc::java_pkg.AuthService/* (dots, endsWith skipped) is an active regression for any consumer repo where java_package ≠ proto package. No test in the PR covers this divergence. DoD §2.1 ("edge cases relevant to the changed surface are handled or explicitly documented as out of scope") and DoD §2.7 ("tests cover the real changed path") are both unmet for this scenario. The fix is sound and well-implemented for the common case where java_package is absent or equals the proto package; a targeted patch for that scope — or explicit test + doc acknowledgment of the java_package limitation — would move this to production-ready.

@magyargergo

Collaborator

@henry201605 could you please look into claude's findings? 🙏

magyargergo and others added 2 commits May 29, 2026 09:30
…grpc detection

Addresses Claude bot review on PR abhigyanpatwari#1889:

- Finding 1: parse `option java_package` when building proto context;
  add a reverse index so an import-derived package can be translated
  back to the proto package.
- Finding 2: when same-repo proto map has the service, use the proto
  package; warn and record `meta.importPackage` if the import disagrees.
- Finding 3: add an end-to-end wildcard match test (provider+consumer
  fixture, runs `buildProviderIndex`+`runWildcardMatch`).

Client-jar consumer + diverging `java_package` (no local proto)
remains a known limitation; pinned by a dedicated test.
@henry201605

Contributor Author

Thanks @magyargergo and @claude for the thorough review — pushed 9a3456c6 addressing the actionable findings.

What changed

| Finding | Status | Where |
| --- | --- | --- |
| #1 option java_package divergence (BLOCKER) | Fixed | buildProtoContext parses option java_package and builds a servicesByJavaPackage reverse index. detectionToContract Step 1 translates an import-derived package back to the proto package whenever a same-repo .proto declares a matching java_package. |
| #2 Import-derived overrides proto map (HIGH) | Fixed | detectionToContract Step 2 cross-checks the import against same-repo proto map candidates. Agreement → both paths emit the same FQN. Disagreement → proto wins, warning logged, meta.importPackage records the divergent import. |
| #3 No e2e cross-link test (HIGH) | Fixed | New Java client-jar consumer — end-to-end wildcard match describe block runs a two-repo fixture through buildProviderIndex + runWildcardMatch and asserts exactly one wildcard cross-link with the expected contract id. |
| #4 Confidence semantics for import path (MEDIUM) | n/a after #1 | The Step 1 translation produces a wire-correct FQN, so confidenceWithProto is genuinely earned. The disagreement path (Step 2 fallback) emits a warning and uses the proto package, also legitimately at the proto-tier confidence. |
| #5 Existing toBeGreaterThanOrEqual assertions | Pre-existing | Not introduced by this PR; left unchanged. |
| #6 Documentation update (LOW) | Inline | Decision-flow rationale is documented in the detectionToContract JSDoc in grpc-extractor.ts (5-step resolution order). Happy to add a paragraph to AGENTS.md if you'd prefer external docs. |

Known limitation (documented in code + tests)

Client-jar consumers whose published proto sets option java_package to a value that differs from package, in repos that don't carry the originating .proto, still produce an import-derived FQN that reflects the Java namespace rather than the proto namespace. Resolving this requires group-level proto knowledge (sharing reverse-index entries across all repos in a group), which felt out of scope for this PR. Pinned by the test_consumer_without_local_proto_and_diverging_java_package_is_known_limitation case.
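The Step 1 translation and the known limitation can be sketched together as follows. This is an illustration under assumptions, not the code in grpc-extractor.ts: the `ProtoServiceInfo` fields, the helper names `buildServicesByJavaPackage` and `contractIdFromImport`, and the example packages are all hypothetical.

```typescript
// Hypothetical shape; the real ProtoServiceInfo may carry more fields.
interface ProtoServiceInfo {
  serviceName: string;
  protoPackage: string;
  javaPackage?: string;
}

// Reverse index: java_package -> services declared under it.
// Entries are added only when java_package is set and differs from package.
function buildServicesByJavaPackage(
  services: ProtoServiceInfo[],
): Map<string, ProtoServiceInfo[]> {
  const index = new Map<string, ProtoServiceInfo[]>();
  for (const svc of services) {
    if (!svc.javaPackage || svc.javaPackage === svc.protoPackage) continue;
    const bucket = index.get(svc.javaPackage) ?? [];
    bucket.push(svc);
    index.set(svc.javaPackage, bucket);
  }
  return index;
}

// Translate an import-derived package back to the proto package when a
// same-repo .proto is known; otherwise fall back to the import package.
function contractIdFromImport(
  importPackage: string,
  serviceName: string,
  servicesByJavaPackage: Map<string, ProtoServiceInfo[]>,
): string {
  const hit = servicesByJavaPackage
    .get(importPackage)
    ?.find((svc) => svc.serviceName === serviceName);
  const pkg = hit ? hit.protoPackage : importPackage;
  return `grpc::${pkg}.${serviceName}/*`;
}

const localProtos: ProtoServiceInfo[] = [
  { serviceName: 'AuthService', protoPackage: 'auth.v1', javaPackage: 'com.example.auth' },
];

// Repo carries the .proto: the import is translated to the proto package.
const withLocalProto = contractIdFromImport(
  'com.example.auth', 'AuthService', buildServicesByJavaPackage(localProtos));

// Client-jar consumer with no .proto: the Java namespace leaks into the id.
const clientJarOnly = contractIdFromImport(
  'com.example.auth', 'AuthService', buildServicesByJavaPackage([]));

console.log(withLocalProto); // grpc::auth.v1.AuthService/*
console.log(clientJarOnly);  // grpc::com.example.auth.AuthService/*
```

The second call shows why the limitation needs group-level knowledge: with an empty local index there is nothing to translate the Java namespace against.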

Verification

  • npx tsc --noEmit
  • PR-touched test suites — all green
    • grpc-extractor.test.ts: 64/64 (60 existing + 3 java_package divergence + 1 e2e wildcard)
    • full test/unit/group/: 31 files / 567 tests
  • npx prettier --check on touched files ✅
  • npx eslint on touched source files — 0 errors

End-to-end re-verified against the two-repo Java group used to validate the original PR; same 9 grpc cross-links surface, no regression.

@magyargergo

Collaborator

Can you please benchmark your implementation? 🙏 We need to make sure we won't degrade performance, especially for big projects. We need a good and reliable algorithm to collect this information.

@henry201605

Contributor Author

@magyargergo benchmark done — no measurable degradation.

Setup

  • Workload: a 4-repo group simulating a microservice cluster
    • 1 server-side gRPC monorepo with .proto files, ~4.5k symbols, 12 services
    • 1 Java client-jar consumer (Spring + Feign + gRPC stubs), ~9k symbols
    • 1 Python (FastAPI) backend, ~1.4k symbols
    • 1 React frontend, ~3.9k symbols
  • Total: 309 contracts, 9 grpc cross-links
  • Command: npx gitnexus group sync <group> (full extract + match cascade)
  • 5 runs each, after a discarded warm-up to load Node modules and tree-sitter native bindings.
  • Same machine, same fixtures, only the binary changes between runs.

Results

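| Version | avg | median | min | max | contracts | cross-links |
| --- | --- | --- | --- | --- | --- | --- |
| main (2f15c1ec) | 11686 ms | 11384 ms | 11070 ms | 13092 ms | 309 | 9 |
| PR head (9a3456c6) | 11539 ms | 11350 ms | 11059 ms | 12369 ms | 309 | 9 |
| Δ | −1.3% | −0.3% | −0.1% | −5.5% | 0 | 0 |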

The PR run is, if anything, marginally faster than main; differences are within run-to-run jitter. Output sets are bit-identical.
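For anyone re-checking the Δ row, the percentages follow from the reported summary values. The sketch below only re-derives them; the `Summary` type and `pctDelta` helper are hypothetical names, and the per-run timings behind the summaries are not reproduced here.

```typescript
// Summary timings in milliseconds, copied from the results table.
interface Summary {
  avg: number;
  median: number;
  min: number;
  max: number;
}

const mainRun: Summary = { avg: 11686, median: 11384, min: 11070, max: 13092 };
const prHead: Summary = { avg: 11539, median: 11350, min: 11059, max: 12369 };

// Relative change of the candidate against the baseline, one decimal place.
function pctDelta(baseline: number, candidate: number): string {
  const pct = ((candidate - baseline) / baseline) * 100;
  return `${pct.toFixed(1)}%`;
}

const deltas = {
  avg: pctDelta(mainRun.avg, prHead.avg),
  median: pctDelta(mainRun.median, prHead.median),
  min: pctDelta(mainRun.min, prHead.min),
  max: pctDelta(mainRun.max, prHead.max),
};

console.log(deltas);
// { avg: '-1.3%', median: '-0.3%', min: '-0.1%', max: '-5.5%' }
```

The largest gap is in the max column, which reflects a single slow run on main rather than a systematic difference.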

Why the cost is small

The two added work units are O(1) per detection / O(F) per scan, where F is the number of .proto files:

  1. buildProtoContext does a single extra .match(/^…java_package…/m) per .proto file, plus inserts into one extra Map<string, ProtoServiceInfo[]> only when option java_package is set AND differs from package. Empty for repos that don't use the option (the common case).
  2. detectionToContract Step 1 is a hash lookup (javaPackageMap.get) and an Array.find over candidates that share a single short service name. The candidate set is bounded by service-name collisions — typically 1, occasionally 2.

No new I/O, no tree-sitter re-parse, no per-file scan amplification.
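The per-file cost in item 1 amounts to a regex match over text that is already in memory. A rough sketch, assuming simplified regexes and a hypothetical `parseProtoPackages` helper (the actual patterns in buildProtoContext are elided in the comment above and are not reproduced here):

```typescript
interface ProtoPackages {
  protoPackage?: string;
  javaPackage?: string;
  // True only when java_package is set and differs from package.
  diverges: boolean;
}

// Assumed, simplified patterns; real .proto files may need more tolerant
// handling (comments, single-quoted strings, unusual whitespace).
function parseProtoPackages(protoSource: string): ProtoPackages {
  const pkg = protoSource.match(/^\s*package\s+([\w.]+)\s*;/m);
  const javaPkg = protoSource.match(
    /^\s*option\s+java_package\s*=\s*"([^"]+)"\s*;/m,
  );
  const protoPackage = pkg?.[1];
  const javaPackage = javaPkg?.[1];
  return {
    protoPackage,
    javaPackage,
    diverges: javaPackage !== undefined && javaPackage !== protoPackage,
  };
}

const diverging = parseProtoPackages(
  [
    'syntax = "proto3";',
    'package auth.v1;',
    'option java_package = "com.example.auth";',
    'service AuthService {}',
  ].join('\n'),
);

const plain = parseProtoPackages(
  ['syntax = "proto3";', 'package auth.v1;', 'service AuthService {}'].join('\n'),
);

console.log(diverging.diverges, plain.diverges); // true false
```

Only files like the first one would contribute an entry to the reverse index, which is why that map stays empty for repos that never set the option.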

Caveat on scale

4 repos / 309 contracts is the largest realistic fixture I could assemble locally; I don't have a 50-repo / 5k-contract group on hand to exercise. Given the asymptotic argument above (constant per-detection overhead, no I/O), the added cost should scale linearly with the number of detections, but if you have a larger benchmark fixture you'd like me to run it against, please point me at it.

How to reproduce

# baseline (run from the repository root)
git checkout 2f15c1ec -- gitnexus/src/core/group/extractors/grpc-extractor.ts gitnexus/test/unit/group/grpc-extractor.test.ts
# subshell keeps the working directory at the repository root afterwards
(cd gitnexus && npx tsc && for i in 1 2 3 4 5; do node dist/cli/index.js group sync <your-group>; done)

# PR head (again from the repository root)
git checkout 9a3456c6 -- gitnexus/src/core/group/extractors/grpc-extractor.ts gitnexus/test/unit/group/grpc-extractor.test.ts
(cd gitnexus && npx tsc && for i in 1 2 3 4 5; do node dist/cli/index.js group sync <your-group>; done)

@magyargergo magyargergo merged commit 4bc8622 into abhigyanpatwari:main May 29, 2026
25 of 26 checks passed