Skip to content

fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928)#2045

Merged
magyargergo merged 5 commits into
abhigyanpatwari:mainfrom
Sweetdevil144:fix/java-1928-parsing-layer-coverage
Jun 5, 2026
Merged

fix(java): close parsing-layer coverage gaps F35/F38/F41 (#1928)#2045
magyargergo merged 5 commits into
abhigyanpatwari:mainfrom
Sweetdevil144:fix/java-1928-parsing-layer-coverage

Conversation

@Sweetdevil144

Copy link
Copy Markdown
Collaborator

Summary

Closes the three open findings of #1928 (parent epic #1919) in the registry-primary scope-resolution path — the live one after the legacy call/heritage/import DAG removal (#942/#943). F36/F37 already landed upstream (#1940/#1956); F39/F40 are verified-dead legacy-bank remnants and are intentionally left untouched.

  • F35 [HIGH] — qualified / qualified-generic constructor calls. new pkg.Foo() parses as a scoped_type_identifier that the scope query bound only as @reference.call.constructor.qualified with no @reference.name. The scope extractor then fell back to the whole-expression anchor (scope-extractor.ts nameCap = match['@reference.name'] ?? anchor), so the reference name became the raw new pkg.Foo() text and never resolved. Now the simple-name tail is bound via an end-anchored last child, plus a new arm for the previously uncaptured new pkg.Box<String>() (qualified and generic) shape. Mirrors the existing TS/JS new-expression qualified-constructor capture.
  • F38 [MEDIUM] — super(...) / this(...) explicit constructor invocations. Modeled by tree-sitter as explicit_constructor_invocation, never matched by the scope query, so chained-constructor CALLS edges were silently dropped. Synthesized (alongside the existing inheritance synth) with the target resolved structurally — this → enclosing type name; super → superclass tail via the shared javaBaseLookupNameNode (implicit Object skipped) — plus arity for overload disambiguation.
  • F41 [LOW] — interpretJavaTypeBinding strip order. The qualifier was stripped before generics, so a qualified generic type argument (Map<String, com.example.User>) was cut inside the generic into User>. Generics are now stripped first, then the qualifier; the erasure fallback is qualifier-tolerant.

Why F39/F40 are untouched

Traced end-to-end: the legacy @import capture is explicitly skipped (parse-worker.ts — IMPORTS now come from scope-resolution, #942/#943), and the legacy @call result feeds result.calls, which parse-impl.ts never reads (CALLS come solely from scope-resolution). Editing the legacy JAVA_QUERIES bank would change nothing functionally while churning the parity gate and risking the route/fetch path. Recommend closing F39/F40 as obsoleted by the registry-primary migration.

Test plan

New tests cover each finding at the capture level and end-to-end:

Made with Cursor

…wari#1928)

Registry-primary scope-resolution path (the live one post-abhigyanpatwari#942/abhigyanpatwari#943):

- F35 [HIGH]: qualified / qualified-generic constructor calls. `new pkg.Foo()`
  parses as a `scoped_type_identifier` that the query bound only as
  `@reference.call.constructor.qualified` with no `@reference.name`, so the
  scope extractor fell back to the whole-expression anchor and the reference
  name became the raw `new pkg.Foo()` text (never resolved). Bind the simple
  -name tail (end-anchored last child) and add an arm for the previously
  uncaptured `new pkg.Box<String>()` (qualified + generic) shape.

- F38 [MEDIUM]: `super(...)` / `this(...)` explicit constructor invocations,
  modeled as `explicit_constructor_invocation` and never matched by the scope
  query, dropped the chained-constructor CALLS edges. Synthesize them with the
  target resolved structurally (this -> enclosing type name; super -> superclass
  tail via the shared javaBaseLookupNameNode, skipping implicit Object) plus
  arity for overload disambiguation.

- F41 [LOW]: interpretJavaTypeBinding stripped the qualifier before generics, so
  a qualified generic type arg (`Map<String, com.example.User>`) was cut inside
  the generic into `User>`. Strip generics first, then the qualifier; make the
  erasure fallback qualifier-tolerant.

F36/F37 already landed upstream (abhigyanpatwari#1940/abhigyanpatwari#1956); F39/F40 are legacy-bank remnants
that are no longer consumed (legacy @import skipped in parse-worker; legacy
@call never read in parse-impl) so they are intentionally left untouched.

Tests: low-level capture unit tests (constructor shapes incl. double-match
guard; super/this/enum/implicit-Object), interpretJavaTypeBinding unit tests
(qualified generic args + the corruption case), and end-to-end resolver tests
with new fixtures asserting the CALLS edges resolve to the correct constructors.

Co-authored-by: Cursor <cursoragent@cursor.com>
@vercel

vercel Bot commented Jun 4, 2026

Copy link
Copy Markdown

@Sweetdevil144 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #2045 — tri-review (Java F35/F38/F41, #1928)

Methods & engines. Reviewed across three method families and ~9 lanes: GitNexus swarm (risk, test/CI), Compound-Engineering personas (correctness, adversarial, performance, testing), the three Java specialists requested (static-analysis/tree-sitter, JVM/Java-semantics, knowledge-graph builder), and Codex. Independence is asymmetric — every lane except Codex is Claude under a different prompt, so agreement is correlated. Codex was live and acted as the one independent engine (it CONFIRMED the F35 qualifier finding and REFUTED the F41 "regression"). Two swarm lanes (risk, test/CI) ended mid-investigation and contributed only partial notes; their domains were covered by the other lanes. The headline finding comes from a single (Claude) lane but was reproduced end-to-end by the coordinator, so it stands on reproduction, not corroboration count.

Bottom line: the three fixes are well-built and the parsing-layer logic is correct (F41 is provably non-corrupting; F35's query and F38's AST walk are verified against tree-sitter-java 0.23.5). The one thing worth fixing before merge: F38's this(...) chaining edge doesn't land — it materializes as a self-loop, and the new integration test masks it.

🔴 Headline (inline) — F38 this() emits a degenerate self-loop, not a ctor→ctor edge [reproduced]

Child(int x){ this(); } should produce CALLS Child(int)@L7 → Child()@L3; the pipeline actually emits Child()@L3 → Child()@L3 (selfLoop=true) — both endpoints collapse onto the first Child constructor, and the real relationship is lost. (super(1)Base is correct; the cross-class case works.)

  • Root cause (pre-existing, not in this diff): scope-resolution/graph-bridge/node-lookup.ts:101-135 registers overload-disambiguation keys (parameter-types / shape / constraints) only for Function/Method, never Constructor, so same-name ctors first-wins-collapse to one node. F38 is the first feature to create same-class ctor→ctor edges, so the first to surface it.
  • The test masks it: java-1928.test.ts matches the this() edge by name only, which a self-loop satisfies — green on a broken edge.
  • Suggested: at minimum strengthen the test to assert distinct source/target constructors (by startLine/arity) — that turns it red; the real fix extends the node-lookup overload keys to include Constructor. Fix here or track separately, but the test shouldn't certify a self-loop.

Inline (lower) — F41 test can't catch an order regression [reproduced]

java-interpret.test.ts covers List<Map<String, User>>List, but that input has no dot inside the generic args, so it yields List under both the old and new strip order — it can't detect a regression of the F41 reorder. The case that actually exercises it is a qualified nested element: add List<com.x.Foo<String>>List (corrupted Foo<String>> under the old order). Low severity; the fix itself is correct.

✅ What's solid (validation is a feature)

  • F41 reorder is correct and a strict improvement. A 500-input sweep found 0 regressions / 360 fixes; JVM-semantics confirms List<com.x.Foo<String>> → List is the JLS-correct erasure (the old Foo was an accidental, inconsistent result — unqualified List<Foo<String>> already returned List). Codex refuted the "regression" framing. The earlier-suspected nested-generic regression is not a defect.
  • F35 query verified against tree-sitter-java 0.23.5: the anchored tail binds the simple name (new a.b.Foo()Foo, never a/b), exactly one ctor ref per new (no double-match with the simple/generic/type-arg arms), array-creation correctly not matched, annotations robust.
  • F38 structural targeting is correct across nested/local/enum/record/generic/qualified-super; implicit-Object super correctly skipped; no crashes across 17 malformed inputs. The synth emits the right capture — the breakage is purely downstream in the graph-bridge.
  • No worker/non-worker parity gap: emitJavaScopeCaptures is the single shared emitScopeCaptures hook used by both pipelines (verified) — the trap that bit #1951/#2038 does not apply here.
  • New unit tests pass (23/23).

Lower-priority / known limitations (non-blocking)

  • F35 qualifier is captured but not threaded to resolution (Codex CONFIRMED). new pkg.Foo() resolves by the bare simple name Foo; the captured @reference.call.constructor.qualified is inert (no consumer). This is net-positive vs base (base emitted no edge for qualified new) and consistent with the codebase-wide simple-name resolution. ⚠️ The obvious "just emit @reference.qualified-name" fix is inert for constructor calls — there is no qualified channel in the call/MethodRegistry.lookup path (only the inherits/#1982 path consumes it). Real package-disambiguation is a larger, separate change. Only failure mode today: new pkg.Foo() with no pkg.Foo but a same-tail Foo elsewhere mis-binds (requires non-compiling source); the two-Foo case drops on tie.
  • F38 super to a same-tail qualified superclass (extends a.b.Base with a colliding c.d.Base) drops both EXTENDS and super-ctor edges — symmetric with the existing EXTENDS behavior (Java never threads a qualified base; the #1982 path is dormant for Java). Consistent, not newly broken.
  • Annotated superclass (extends @Ann Base) drops both EXTENDS and super-ctor edges — pre-existing javaBaseLookupNameNode gap (no annotated_type arm); a one-line arm would fix both.
  • Malformed extends { emits a @reference.name:"" ref on ERROR trees (resolves to nothing; no edge, no crash) — parity with the inheritance synth; an optional empty-name guard would tidy both.
  • Perf: emitJavaScopeCaptures now does the main query + two manual full-tree walks (inheritance + explicit-ctor). All O(n), and findEnclosingTypeDeclaration's parent walk is constant-depth — no O(n²). Given the active large-repo parse-OOM work (#1983), consider folding the two synth walks into one pass.
  • Test gaps: enum/record super(...) skip path, the super/this vs new parameter-types asymmetry (arity-only), and the this() distinct-source assertion (the masking gap above) are untested.

CI

gh pr checks: Vercel fails on authorization (deploy auth, not a code issue); "Apply conventional label" passes; "Validate PR title" skipped. No test-suite check is surfaced as a PR check, though the new test files are wired into the vitest default project (run by npm test). Mergeability: BLOCKED (driven by the Vercel auth failure), MERGEABLE otherwise.

Automated multi-tool digest (GitNexus swarm + Compound-Engineering personas + Codex). Verify before acting.

it('resolves `this()` in Child(int) to the Child constructor', () => {
const calls = getRelationships(result, 'CALLS');
const thisCall = calls.find(
(c) => c.target === 'Child' && c.targetLabel === 'Constructor' && c.source === 'Child',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 · reproduced] F38 this() produces a self-loop, and this assertion masks it.

Running the pipeline on this fixture, Child(int x){ this(); } emits CALLS Child()@L3 → Child()@L3 (selfLoop=true), not the intended Child(int)@L7 → Child()@L3 — both endpoints collapse onto the first Child constructor and the real ctor→ctor edge is lost (super(1)→Base resolves correctly).

This find(...) matches by name only (source==='Child' && target==='Child'), which a self-loop satisfies, so the suite stays green on a broken edge.

Root cause is pre-existing and outside this diff: scope-resolution/graph-bridge/node-lookup.ts:101-135 registers overload-disambiguation keys only for Function/Method, never Constructor, so same-name constructors first-wins-collapse to one graph node.

Fix: at minimum assert the edge connects distinct constructors (e.g. by startLine/arity or node id) so this test goes red and exposes the miss; the full fix extends the node-lookup/ids overload keys to include Constructor.

it('falls back to the raw class name for an unrecognized nested generic', () => {
// Nested generic args defeat the single/two-arg element extraction; the
// erasure fallback keeps the outer class name (pre-existing behavior).
expect(raw('List<Map<String, User>>')).toBe('List');

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · reproduced] This case can't catch an F41 order-regression.

List<Map<String, User>> has no . inside its generic args, so stripQualifier is a no-op here and it normalizes to List under both the old order (stripGeneric(stripQualifier(x))) and the new order this PR introduces — so it can't detect a regression of the reorder.

The case that actually exercises F41 is a qualified nested element. Add:

expect(raw('List<com.x.Foo<String>>')).toBe('List');

which produced a corrupted Foo<String>> under the old order and List after the fix. The fix itself is correct — this only hardens the guard.

Sweetdevil144 and others added 2 commits June 5, 2026 04:02
…uper() chains don't self-loop (abhigyanpatwari#1928 F38 review)

Review of abhigyanpatwari#2045 caught two gaps; both confirmed by reproduction.

P2 — F38 this() emitted a self-loop. On the java-explicit-constructor fixture,
Child(int){ this(); } produced CALLS Child()#0 -> Child()#0 instead of
Child(int)abhigyanpatwari#1 -> Child()#0. Root cause is the language-agnostic graph-bridge: the
parse phase mints distinct Constructor nodes (Child#0, Child#1) carrying
parameterTypes, but node-lookup.ts registered the parameter-types / shape
overload keys only for Function/Method, never Constructor, so both ctors
collapsed onto the first-wins qualified/simple key and the caller Child(int)
resolved to Child#0 (the this() target). Extend the overload keys to Constructor
in both node-lookup.ts (registration) and ids.ts (lookup) via a shared
isOverloadableCallable predicate. Verified the edge now connects distinct nodes
(Child#1 -> Child#0); super(1)->Base#1 still correct. No cross-language
regressions (the 9 worker-path failures reproduce identically on clean HEAD).

Also harden the integration test: it matched the this() edge on name only, which
a self-loop satisfies; now assert the endpoints are DISTINCT constructors.

P3 — F41 order-regression guard was inert (List<Map<String,User>> normalizes to
List under both strip orders). Add List<com.x.Foo<String>> -> List, which is
corrupted to Foo<String>> under the old order and only correct generics-first.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Sweetdevil144 Sweetdevil144 requested a review from magyargergo June 5, 2026 01:25
@magyargergo magyargergo linked an issue Jun 5, 2026 that may be closed by this pull request
7 tasks
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10518 10502 0 16 519s

✅ All 10502 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 80.26% 36699/45722 N/A% 🟢 ████████████████░░░░
Branches 68.89% 23386/33943 N/A% 🟢 █████████████░░░░░░░
Functions 85.65% 3874/4523 N/A% 🟢 █████████████████░░░
Lines 83.94% 33038/39358 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

…tures in baselines.json

Updated the fingerprint for the Java section and added detailed notes regarding the enhancements in constructor query captures, including qualified and qualified-generic constructor queries. This change reflects ongoing improvements in the parsing layer coverage and fixture updates.
@magyargergo magyargergo merged commit 281ce26 into abhigyanpatwari:main Jun 5, 2026
27 of 28 checks passed
@Sweetdevil144 Sweetdevil144 deleted the fix/java-1928-parsing-layer-coverage branch June 5, 2026 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Java: parsing-layer coverage gaps (7 findings)

2 participants