Skip to content

feat(ingestion): Java Spring route annotation → Route node extraction#2078

Merged
magyargergo merged 15 commits into
abhigyanpatwari:mainfrom
henry201605:feat/java-spring-route-ingestion
Jun 10, 2026
Merged

feat(ingestion): Java Spring route annotation → Route node extraction#2078
magyargergo merged 15 commits into
abhigyanpatwari:mainfrom
henry201605:feat/java-spring-route-ingestion

Conversation

@henry201605

@henry201605 henry201605 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds ingestion-layer support for Java Spring annotation-based routing, enabling route_map, shape_check, and api_impact MCP tools to work on Java Spring projects.

Before this PR: Route node count for all Java projects = 0. The group contract layer (http-patterns/java.ts) supported Spring annotations for cross-repo HTTP contract matching, but the ingestion pipeline did not generate graph Route nodes from them.

After this PR: Spring @GetMapping, @PostMapping, @PutMapping, @DeleteMapping, @PatchMapping annotations on methods generate Route nodes. Class-level @RequestMapping is correctly joined as a URL prefix, with per-class resolution for multiple controllers in one file.

Implementation

Route extraction is wired through the LanguageProvider contract rather than language-specific branches in the shared parse worker:

route-extractors/spring.ts (new)

The ingestion-layer Spring extractor. Uses a single predicate-free tree-sitter query to capture all route annotations, then a two-phase loop: pass 1 collects class-level @RequestMapping prefixes keyed by class_declaration node id; pass 2 resolves each method route's prefix via findEnclosingClass. Handles positional and named (path= / value=) arguments, and multiple classes per file with independent prefixes.

language-provider.ts + languages/java.ts

Added an optional extractDecoratorRoutes hook to LanguageProviderConfig. The Java provider registers extractSpringRoutes. The parse worker calls provider.extractDecoratorRoutes?.() generically — no language names in the shared dispatch path.

route-extractors/spring-shared.ts (new)

Shared primitives used by both the ingestion extractor (spring.ts) and the group extractor (http-patterns/java.ts) so the two can't drift apart: METHOD_ANNOTATION_TO_HTTP, findEnclosingClass, isRouteMemberKey, and a safe unquoteSpringLiteral. Lives in ingestion/ (lower layer); group imports from it, matching the existing group → ingestion direction.

Tests

  • test/unit/java-spring-route-ingestion.test.ts (9 cases): prefix joining, bare routes, all five HTTP verbs, @PatchMapping, named args, multi-class isolation, produces/consumes filtering, line numbers.
  • test/unit/spring-route-extractor-parity.test.ts (2 cases): parses one shared fixture through both the ingestion and group extractors and asserts they surface the same provider method/path combinations.
  • test/integration/spring-route-pipeline.test.ts (8 cases): full pipeline Route node + HANDLES_ROUTE edge generation.

Follow-up

The group-layer HttpRouteExtractor now runs both its graph path and source-scan path for Java and dedupes the result, making the source scan redundant for languages with full ingestion Route-node coverage. That optimization is tracked separately in #2138 (it also affects FastAPI / Laravel and needs to consider partial graph coverage).

…tion

Previously, GitNexus only supported Route node generation for JS/TS
ecosystems (Express, Next.js, Fastify, etc.) and Python (FastAPI, Flask).
Java Spring's annotation-based routing (@RequestMapping, @GetMapping,
@PostMapping, etc.) was only supported at the group contract layer
(http-patterns/java.ts) for cross-repo matching, but NOT at the
ingestion layer for generating graph Route nodes.

This commit adds ingestion-layer support:

1. JAVA_QUERIES (tree-sitter-queries.ts):
   - Added method-level annotation captures (@GetMapping, @PostMapping,
     @PutMapping, @DeleteMapping, @PatchMapping) → @decorator captures
   - Added class-level @RequestMapping@decorator capture (prefix)
   - Supports both positional ("/path") and named (path="/path",
     value="/path") annotation argument forms

2. parse-worker.ts:
   - Java class-level @RequestMapping is detected and stored as a prefix
     (not pushed as a standalone Route)
   - After per-file capture processing, the prefix is applied to all
     method-level routes in the same file via the existing
     ExtractedDecoratorRoute.prefix field
   - The routes phase (normalizeExtractedRoutePath) handles the prefix
     joining, producing final URLs like /api/users/list

3. Tests:
   - Unit test (worker-backed): 4 cases covering prefix joining,
     bare routes, class-level exclusion, multi-file isolation
   - Integration test (full pipeline): 6 cases covering end-to-end
     Route node + HANDLES_ROUTE edge generation

Closes the feature gap where `route_map`, `shape_check`, and
`api_impact` MCP tools returned empty results for Java Spring projects.
@henry201605 henry201605 requested a review from magyargergo as a code owner June 8, 2026 10:32
@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 15 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2078,"changed_lines":15,"head_sha":"fa0c9eedcb4412fe649a1dd678ffc8dc5e00c4a7","run_id":"27131833178","apply_command":"/autofix"}

@henry201605

Copy link
Copy Markdown
Contributor Author

/autofix

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

✅ Applied autofix and pushed a commit. (apply run)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10909 10893 0 16 541s

✅ All 10893 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.09% 35544/47334 N/A% 🟢 ███████████████░░░░░
Branches 62.85% 21972/34959 N/A% 🟢 ████████████░░░░░░░░
Functions 80.84% 3836/4745 N/A% 🟢 ████████████████░░░░
Lines 78.89% 32141/40739 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review: Java Spring route ingestion (PR #2078)

Verdict: changes requested. One failing CI check caused by this PR, plus three corroborated P2 issues — the headline being the language-specific Java logic added to the shared parse worker.

Methods & engine independence. Reviewed by 6 Claude reviewer lanes (correctness, adversarial, maintainability, testing returned full findings; the two GitNexus risk / test-CI lanes ended mid-investigation — their domains were covered by the CE lanes and by my own verification) plus Codex, the only genuinely independent engine here. The six Claude lanes share one underlying model, so their agreement is "consistent across personas," not independent confirmation — the strong signal below is Codex + a Claude lane agreeing, and several findings were verified by actually parsing real tree-sitter-java. Codex was live (foreground re-run after the first background pass returned empty).

What's solid (validated). The feature works end-to-end for the common one-controller-per-file case. It cleanly reuses the existing ExtractedDecoratorRoute.prefix plumbing (routes.ts joins prefix+path via normalizeExtractedRoutePath); javaClassPrefix is correctly per-file (no cross-file leak — refuted); a suspected mixed FastAPI+Spring prefix-clobber in parse-impl.ts was refuted by both Codex and correctness (the expansion is guarded on .py + decoratorReceiver==='router', neither of which Java sets); and normalizeExtractedRoutePath handles the slash-join edge cases. Positional + named-arg method annotations and GET/POST/PUT/DELETE all work. Good reuse of existing infrastructure.


Headline (the change requested) — [P2] Language-specific Java logic in shared parse-worker.ts

The PR adds the first language === SupportedLanguages.Java special-cases inside the previously language-agnostic decorator-route block (parse-worker.ts:1322, and the post-loop apply at :2279). AGENTS.md (§RFC #909) and CLAUDE.md both state: "Shared code in gitnexus/src/core/ingestion/ must not name languages — plug language behavior in via LanguageProvider/ScopeResolver hooks."

Honest framing: this extends existing tech-debt rather than uniquely breaking a rule — the file already names TS/Vue/Python/Go, and even the FastAPI/Laravel route extractors are reached via a one-line if (language === X) dispatch. So the achievable bar isn't "zero language names"; it's isolate the language logic into a module and leave only a thin dispatch line, the way the repo already does it: route-extractors/laravel.ts (extractLaravelRoutes, parse-worker:2255) and route-extractors/fastapi-router-bindings.ts (extractFastAPIRouterBindings, parse-worker:2266).

Recommended: extract the Spring logic to route-extractors/spring.ts (e.g. extractSpringRoutes(tree, filePath) returning routes with the class prefix already joined internally). This keeps the shared worker language-agnostic for this concern and dissolves the multi-class bug below (the extractor can hold a per-class prefix map). (Codex + ce-maintainability + gitnexus-risk.)


Inline findings (anchored on the diff)

  1. [P2 · CI blocker] quality / format is red because of this PR — root npx prettier --check . flags the new spring-route-pipeline.test.ts. Fix: prettier --write it.
  2. [P2 · bug] @PatchMapping routes silently dropped — ROUTE_DECORATOR_NAMES omits 'PatchMapping'. (Codex + adversarial + testing + verified.)
  3. [P2 · bug] Multiple @RequestMapping classes in one file → single javaClassPrefix scalar last-wins → wrong prefix on the earlier class's routes. (Codex + 4 Claude lanes; empirically parsed.)
  4. [P2 · test hygiene] The unit suite is a silent no-op in unit-only runs (runs zero production code); the integration test is what actually covers the feature. (Coordinator-verified.)

Lower-priority / coverage gaps (not blocking)

  • [P3] Method-level @RequestMapping(method=RequestMethod.POST) is recorded as httpMethod:'GET'. Latent onlyhttpMethod is not consumed for decorator routes (routes.ts:221-228 reads only routePath/prefix; the Route node is keyed on the URL at routes.ts:247). Becomes a real bug only if httpMethod is ever persisted/keyed. (correctness + Codex.)
  • [P3] Array-value @GetMapping({"/a","/b"}), constant-ref @GetMapping(PATH), and bare @GetMapping produce no query match and are silently dropped. (adversarial.)
  • [P3] Coverage gaps: named-arg class-level @RequestMapping(value=...) is untested (query added, fixtures only positional); the unit test asserts intermediate routePath/prefix rather than the joined URL (the integration test does assert /api/users/list); the HANDLES_ROUTE assertion uses >=1. (testing.)

Refuted (validation is a feature)

Cross-file prefix leak; mixed FastAPI+Spring prefix clobbering; route-path slash-join errors — all investigated and found sound.

CI

quality / format FAIL (this PR — finding 1) · Vercel FAIL (deploy authorization, unrelated) · CodeQL / lint / typecheck / Analyze pass · several platform-sensitive + coverage jobs still pending at review time.


Automated multi-tool digest (6 Claude reviewer lanes + Codex, the one independent engine). Inline anchors mark verified findings; please verify before acting.

Comment thread gitnexus/src/core/ingestion/workers/parse-worker.ts Outdated
@@ -0,0 +1,92 @@
/**

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 · CI blocker] This file fails quality / format. CI runs npx prettier --check . from the repo root and flags this file (job log: [warn] gitnexus/test/integration/spring-route-pipeline.test.ts → exit 1). A local prettier --check with the gitnexus-package config passes, but the root config is what gates CI. Fix: npx prettier --write gitnexus/test/integration/spring-route-pipeline.test.ts and commit. [reproduced — CI log]

Comment thread gitnexus/src/core/ingestion/workers/parse-worker.ts Outdated
Comment thread gitnexus/src/core/ingestion/workers/parse-worker.ts Outdated
Comment thread gitnexus/test/unit/java-spring-route-ingestion.test.ts Outdated
…ping, multi-class support

Addresses all P2 findings from tri-review:

1. **Architecture**: Extracted Spring route logic from parse-worker.ts into
   a dedicated `route-extractors/spring.ts` module (matching the pattern
   of `laravel.ts` and `fastapi-router-bindings.ts`). parse-worker now
   has a single dispatch line — no language-specific logic inline.

2. **PatchMapping bug**: Added `'PatchMapping'` to `ROUTE_DECORATOR_NAMES`
   (was silently dropped before).

3. **Multi-class bug**: The new `extractSpringRoutes` walks each class
   declaration independently with its own prefix — no more single-scalar
   `javaClassPrefix` last-wins issue.

4. **Test hygiene**: Unit tests now import `extractSpringRoutes` directly
   (no dist build / worker pool dependency). Tests run in all tiers.

5. **Removed JAVA_QUERIES decorator patterns**: The Spring extractor does
   its own AST walk, so the tree-sitter query captures for Java annotations
   are no longer needed (avoids duplicate route emission).

Additional test coverage:
- Multi-class in one file with independent prefixes
- @PatchMapping support
- Named annotation args (path= and value=) on class-level @RequestMapping
@henry201605

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review! All P2 findings addressed in e932605:

  1. Architecture — Extracted Spring logic into route-extractors/spring.ts (matching the laravel/fastapi pattern). parse-worker now has a single if (language === Java) dispatch line, no inline language logic.

  2. PatchMapping — Added to ROUTE_DECORATOR_NAMES + METHOD_ANNOTATION_TO_HTTP in the new module. Test added.

  3. Multi-class prefix last-winsextractSpringRoutes walks each class_declaration independently with its own prefix map. Added MultiController.java fixture + test for two classes in one file.

  4. Test hygiene — Unit tests now import extractSpringRoutes directly (tree-sitter in-process, no dist/worker dependency). Runs in all tiers.

  5. Formatprettier --write applied to all new files.

P3 items (array values, method=RequestMethod.X) left as-is per your framing — can follow up separately if needed.

Comment thread gitnexus/src/core/ingestion/workers/parse-worker.ts Outdated
Addresses the second review comment: instead of an inline
`if (language === SupportedLanguages.Java)` dispatch in parse-worker,
the Spring route extraction is now wired through a new optional
`extractDecoratorRoutes` hook on LanguageProviderConfig.

- Added `extractDecoratorRoutes` to LanguageProviderConfig interface
- Java provider registers `extractSpringRoutes` as its implementation
- parse-worker calls `provider.extractDecoratorRoutes?.()` generically
- Removed direct import of spring.ts from parse-worker

This keeps parse-worker fully language-agnostic — no language names
appear in the dispatch path for route extraction.
@henry201605

Copy link
Copy Markdown
Contributor Author

Good point — agreed that this should go through the provider rather than an inline language check.

Fixed in 33e7849: added an optional extractDecoratorRoutes hook to LanguageProviderConfig, wired it up in the Java provider, and parse-worker now calls provider.extractDecoratorRoutes?.() generically. No language names in the dispatch path anymore.

Let me know if there's anything else I should adjust!

Comment thread gitnexus/src/core/ingestion/route-extractors/spring.ts Outdated
Comment thread gitnexus/src/core/ingestion/language-provider.ts Outdated
Comment thread gitnexus/src/core/ingestion/language-provider.ts Outdated
Comment thread gitnexus/src/core/ingestion/route-extractors/spring.ts Outdated
magyargergo and others added 2 commits June 8, 2026 21:06
…orts

Addresses all 4 inline review comments:

1. Rewrote spring.ts to use a single predicate-free Parser.Query
   (same pattern as group-layer JAVA_ROUTE_ANNOTATION_PATTERNS).
   Two-phase loop: first pass collects class prefixes by node.id,
   second pass resolves method routes via findEnclosingClass.
   No more manual DFS / recursion.

2-3. Moved inline import(...) type references in language-provider.ts
     to proper top-level imports (Parser, ExtractedDecoratorRoute).

4. Covered by #1 — recursive helpers removed entirely.

Added 3 extra test cases: non-route named args filtering,
prefix isolation across mixed classes, line number accuracy.
@henry201605

henry201605 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! All four comments addressed:

  1. Tree-sitter captures instead of manual traversal — Rewrote spring.ts to use a single predicate-free Parser.Query. Two-phase loop: first pass collects class prefixes keyed by node.id, second pass resolves method routes via findEnclosingClass. No more manual DFS.

  2. Top-level import for Parser.Tree — Done, moved to the top of language-provider.ts.

  3. Top-level import for ExtractedDecoratorRoute — Same, both inline import(...) references removed.

  4. (Covered by point 1) — The recursive helpers are gone entirely now that we use tree-sitter captures.

Also added 3 extra test cases: non-route named args filtering (produces/consumes), prefix isolation across mixed classes (with/without prefix), and line number accuracy.

Thanks again for the careful review — much cleaner this way!

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude can you compare the route extrator in java.ts and spring.ts? What's the difference and how could we migrate spring.ts into java.ts and kotlin.ts?

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 1m 34s —— View job


Comparing group/extractors/http-patterns/java.ts vs route-extractors/spring.ts

  • Read spring.ts, java.ts, and related files
  • Compare the two extractors
  • Outline migration path to java.ts and kotlin.ts

Overview

The two files serve different layers of the pipeline and should not be confused by name alone:

group/extractors/http-patterns/java.ts route-extractors/spring.ts
Layer Group layer (cross-repo HTTP contract matching) Ingestion layer (graph Route node generation)
Output type HttpDetection[] (role, path, method, confidence) ExtractedDecoratorRoute[] (filePath, routePath, prefix, lineNumber)
Consumer api_impact, route_map, cross-repo contract diffing parse-worker → Route graph nodes + HANDLES_ROUTE edges
Scope Providers and consumers (RestTemplate, WebClient, OkHttp, Feign, Apache HttpClient) Providers only (@GetMapping etc.)
Query infra compilePatterns / runCompiledPatterns from tree-sitter-scanner.js new Parser.Query(Java, ...) directly

Key structural differences

1. What they do with the path

group/java.ts joins the class prefix and method path inline before returning:

// java.ts (group layer)
path: joinPath(prefix, route.rawPath),  // → "/api/users/list" fully resolved

route-extractors/spring.ts returns them separately — joining happens later in routes.ts via normalizeExtractedRoutePath:

// spring.ts (ingestion layer)
routePath: '/list',
prefix: '/api/users',  // joined downstream

2. Query coverage

group/java.ts also covers interface_declaration (for OpenFeign @FeignClient / @RequestLine contracts), and captures @member (the method name). spring.ts omits both — no interface support, no method name capture.

3. unquote vs unquoteLiteral

group/java.ts uses unquoteLiteral (from the scanner utils), which returns null for non-string nodes. spring.ts has its own unquote() that assumes the input is always a quoted string. The group-layer version is safer.

4. Cross-file inheritance

group/java.ts has scanSpringProject / collectSpringTypes which resolves interface→controller inheritance across files (a controller implementing a @RequestMapping interface inherits its routes). spring.ts has no equivalent — it's per-file only.


Migration path: spring.tsjava.ts + kotlin.ts

The spring.ts ingestion extractor is already wired correctly into java.ts (the language provider) via extractDecoratorRoutes: extractSpringRoutes. The question is whether the logic should live as a standalone spring.ts file or be absorbed.

Why spring.ts should stay separate (current approach is correct):

  • The group-layer java.ts is ~880 lines covering 6+ HTTP frameworks. Merging ingestion logic in would violate the layer separation this codebase enforces.
  • The query structure, output types, and path-joining semantics differ enough that sharing code would require awkward adapters.

What can be improved — sharing the query and helpers:

group/java.ts and route-extractors/spring.ts both define nearly identical:

  • METHOD_ANNOTATION_TO_HTTP (same map, duplicated)
  • findEnclosingClass (identical function, duplicated — see group/java.ts:368 vs spring.ts:101)
  • The tree-sitter query pattern (same 4-branch shape, minor differences in @member capture)
  • isRouteMemberKey / isRouteKey (same logic, different names)

These could be extracted to a shared route-extractors/spring-shared.ts:

// route-extractors/spring-shared.ts
export const METHOD_ANNOTATION_TO_HTTP: Record<string, string> = { ... };
export function findEnclosingClass(...) { ... }
export function isRouteMemberKey(...): boolean { ... }

Adding Kotlin support (kotlin.ts):

Spring annotations work identically in Kotlin, but the AST grammar differs:

  • Kotlin uses function_declaration instead of method_declaration
  • Annotations use multi_annotation or annotation nodes under modifiers (same field name, different grammar)

The migration would be:

  1. Parameterize spring.ts to accept the language grammar and method node type name:

    // spring.ts — generalized
    export function extractSpringRoutesGeneric(
      tree: Parser.Tree,
      filePath: string,
      language: Language,          // tree-sitter-java or tree-sitter-kotlin
      methodNodeType: string,      // 'method_declaration' or 'function_declaration'
      lineOffset = 0,
    ): ExtractedDecoratorRoute[] { ... }
    
    // Java convenience wrapper (keeps current API)
    export const extractSpringRoutes = (tree, filePath, lineOffset) =>
      extractSpringRoutesGeneric(tree, filePath, Java, 'method_declaration', lineOffset);
  2. Add a Kotlin wrapper (either in spring.ts or a new spring-kotlin.ts):

    import Kotlin from 'tree-sitter-kotlin';
    export const extractSpringKotlinRoutes = (tree, filePath, lineOffset) =>
      extractSpringRoutesGeneric(tree, filePath, Kotlin, 'function_declaration', lineOffset);
  3. Wire into kotlin.ts:

    // languages/kotlin.ts
    import { extractSpringKotlinRoutes } from '../route-extractors/spring.js';
    
    export const kotlinProvider = defineLanguage({
      ...
      extractDecoratorRoutes: extractSpringKotlinRoutes,
    });

The main validation needed before doing this: verify the Kotlin tree-sitter grammar uses the same annotation / annotation_argument_list / element_value_pair node type names as Java. If the annotation subtree differs, the query branches would need separate variants.

@magyargergo

Copy link
Copy Markdown
Collaborator

@henry201605 let me think about our approach a little bit. I have a feeling we are doing this processing twice.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude following upon the previous comment, analyze whether we are doing duplicated processes and wasting cpu time.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 37s —— View job


Analysis: CPU duplication between the two Spring route extractors

  • Post initial progress
  • Read route-extractors/spring.ts (ingestion layer)
  • Read group/extractors/http-patterns/java.ts (group layer)
  • Read parse-worker.ts — where ingestion extractor is called
  • Trace group/sync.tsHttpRouteExtractor — when group extractor runs
  • Analyze overlap / duplication

Short answer

Yes, the same Spring annotations are extracted twice — but across two different pipeline phases, not within a single command. There is also a subtler within-group sync duplication introduced by the graph-assisted path now having data it didn't have before.


Two separate phases, two separate parses

Phase Command Where Parses Java files?
Ingestion gitnexus analyze parse-worker.tsextractSpringRoutes ✅ Yes — tree-sitter in worker thread
Group sync gitnexus group sync group/sync.tsHttpRouteExtractor → group-layer java.ts ✅ Yes — tree-sitter via getScanInput

Both run the same predicate-free tree-sitter query against the same Java files: (class_declaration … @RequestMapping …) + (method_declaration … @{Get|Post|…}Mapping …). The METHOD_ANNOTATION_TO_HTTP map is literally copy-pasted in both modules (route-extractors/spring.ts:27 and group/http-patterns/java.ts:36).

The parse results are never shared: analyze and group sync are separate processes that each bring up their own Parser instance.


The subtler duplication inside group sync (the new wrinkle after this PR)

HttpRouteExtractor.extract() always runs two paths and merges them (http-route-extractor.ts:283-297):

// Source scan always runs to capture routes in languages/files not covered
// by graph edges; the glob and per-file parse results are cached above.
const providers = this.mergeGraphAndSourceContracts(
  await this.extractProvidersGraph(dbExecutor, getDetections),   // reads Route nodes from DB
  await this.extractProvidersSourceScan(files, getDetections),   // re-scans all files with java.ts
);

Before this PR: Java had no Route nodes → extractProvidersGraph returned nothing for Java → source scan was the only path → no redundancy, no waste.

After this PR: Java now has Route nodes → extractProvidersGraph finds them in DB and extractProvidersSourceScan tree-sitter-scans every Java file and finds the same routes → both paths produce identical contracts → mergeGraphAndSourceContracts deduplicates them. The tree-sitter parsing itself is cached per-file within a single group sync run (via cachedInputs), so the actual .parse() call only happens once. But the annotation-matching loop inside java.ts's scan() runs twice: once from graph path (to resolve the handler name) and once from source scan path (to emit the route).

Net effect: After this PR, during group sync on a Java Spring project, every route annotation is now processed once from the DB graph and once from the source scan — the latter being pure CPU waste because it can never add information the graph path doesn't already have.


Concrete CPU cost

For a repo with N Spring controller methods:

  • analyze: 1× tree-sitter parse + ROUTE_ANNOTATION_QUERY.matches() per file (via extractSpringRoutes)
  • group sync (after this PR): 1× tree-sitter parse per file (cached), but java.ts scan() runs twice per Route node's file (once per path), and mergeGraphAndSourceContracts deduplicates the results

The duplicated scan() calls are O(annotations) not O(file), but on a large Spring codebase with many annotations this is measurable — plus the deduplication work itself.


Root cause

HttpRouteExtractor was written when Java had no Route nodes in the graph (before this PR). The comment on line 285 says exactly this: "Source scan always runs to capture routes in languages/files not covered by graph edges". Now that this PR fills the graph with Java Route nodes, the source scan is no longer filling a gap for Java — it's duplicating the graph path.


What could be done (not a requirement for this PR)

  1. Short-term (no change needed now): The duplication is only wasteful, not incorrect — mergeGraphAndSourceContracts handles it. The PR is still correct.

  2. Medium-term optimization: extractProvidersSourceScan could skip files for which extractProvidersGraph already found full coverage (i.e., every graph Route node for a file also has a resolved handler name). This would require passing the graph coverage set into the source scan path.

  3. Long-term / architectural: If Route nodes are reliably emitted by ingestion for a language, group sync could skip source scanning for that language entirely and rely on the graph. This is a meaningful refactor of HttpRouteExtractor that would benefit Java, Python/FastAPI, PHP/Laravel, and any other language with full ingestion coverage.

For now, the correct framing is: this PR introduces the data that makes the source scan redundant for Java, but doesn't yet wire up the optimization to exploit that. That's a follow-up, not a blocker.

@magyargergo

Copy link
Copy Markdown
Collaborator

@henry201605 did you know about the group command?

@henry201605

Copy link
Copy Markdown
Contributor Author

Thanks for thinking through this carefully — and the Claude analysis above was really helpful in framing it.

To your question: yes, I'm familiar with group sync — I had the chance to contribute a few of the group-layer http-patterns PRs earlier (#1834, #1904, #1855), so I knew both layers parse Spring annotations and was aware of the overlap when adding the ingestion side.

On the "processing twice" point — you're right, and the analysis captures it well. There are really two kinds of duplication:

  • Source-level: METHOD_ANNOTATION_TO_HTTP, findEnclosingClass, and the route-key filter are copy-pasted between spring.ts and the group-layer java.ts.
  • Runtime inside group sync: now that Java has Route nodes, HttpRouteExtractor runs both its graph path and its source-scan path for Java and dedupes the result — so the source scan's annotation-matching loop is redundant work for Java (the tree-sitter parse itself is cached per file, but the scan loop runs twice). It's wasteful rather than incorrect, since mergeGraphAndSourceContracts handles it, but it's real and worth noting.

I think this PR is what surfaces that runtime redundancy (before it, Java had no Route nodes, so the source scan was the only path). The cleaner fix really sits in HttpRouteExtractor's "always source-scan" design — and it'd benefit Python/FastAPI and PHP/Laravel too — so my instinct is to handle that as a separate optimization rather than widen this PR.

What I'd be glad to clean up directly in this PR is the source-level duplication:

  • Extract METHOD_ANNOTATION_TO_HTTP, findEnclosingClass, and the route-key filter into a small shared module under route-extractors/, imported by both extractors (group → ingestion is the layer-correct direction, since group already imports from ingestion).
  • Align spring.ts's literal unquoting with the group layer's safer semantics — returning null for non-string nodes instead of assuming a quoted string.

And I agree with the analysis that the two extractors should otherwise stay separate — different output types (HttpDetection[] vs ExtractedDecoratorRoute[]), different path-joining, and the group-only Feign / RestTemplate / WebClient / interface-inheritance handling.

That's my read, but you know this codebase far better than I do — if you have a cleaner approach in mind for either the source-level sharing or the runtime path, I'm very happy to go with whatever direction you prefer. Just let me know.

@magyargergo

Copy link
Copy Markdown
Collaborator

Thanks, that makes sense, and I agree the two extractors should remain separate given that they serve different layers and produce different output types.

For this PR, could you please:

  • extract the shared Spring route primitives, such as the annotation-to-method mapping, enclosing-class lookup, and route-key filtering, so the ingestion and group implementations do not maintain those semantics independently;
  • add a small parity test using the same Spring fixture to confirm both extractors agree on the supported method/path combinations;
  • update the PR description to reflect the final provider-based implementation.

The broader HttpRouteExtractor optimization can be handled separately, since changing the graph/source-scan fallback behaviour is larger than the scope of this PR and should also consider FastAPI, Laravel, and partial graph coverage.

Please create and link a follow-up issue for that optimization. Once the items above are addressed, ping me and I’ll take another pass.

Addresses review follow-up on abhigyanpatwari#2078:

- Extract the primitives shared by the ingestion (route-extractors/spring.ts)
  and group (http-patterns/java.ts) Spring extractors into a new
  route-extractors/spring-shared.ts: METHOD_ANNOTATION_TO_HTTP,
  findEnclosingClass, isRouteMemberKey, and a safe unquoteSpringLiteral.
  Both extractors now import from it (group -> ingestion, the layer-correct
  direction) so the shared semantics can't drift apart.

- Replace spring.ts's local unquote() with the safer unquoteSpringLiteral
  (returns null for non-string nodes instead of assuming a quoted string).

- Add test/unit/spring-route-extractor-parity.test.ts: runs one shared Spring
  fixture through both extractors and asserts they surface the same provider
  method/path combinations.

The broader HttpRouteExtractor source-scan optimization is tracked in abhigyanpatwari#2138.
@henry201605

Copy link
Copy Markdown
Contributor Author

Thanks again for the detailed guidance on this — all four items are addressed now:

  1. Shared primitives — extracted METHOD_ANNOTATION_TO_HTTP, findEnclosingClass, and isRouteMemberKey (plus a safe unquoteSpringLiteral) into route-extractors/spring-shared.ts. Both the ingestion extractor (spring.ts) and the group extractor (http-patterns/java.ts) now import from it, so neither maintains those semantics independently.

  2. Parity test — added test/unit/spring-route-extractor-parity.test.ts, which runs one shared Spring fixture through both extractors and asserts they surface the same provider method/path combinations.

  3. PR description — updated to reflect the final provider-based implementation (the LanguageProvider.extractDecoratorRoutes hook plus the shared module).

  4. Follow-up issue — opened Optimize HttpRouteExtractor: skip source-scan for languages with full ingestion Route-node coverage #2138 to track the HttpRouteExtractor source-scan optimization, scoped to cover FastAPI/Laravel and partial graph coverage as you suggested.

Everything's green locally. Whenever you have a moment, I'd really appreciate another pass — and thanks again for taking the time to review this so carefully.

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tagging along!

@magyargergo magyargergo merged commit 36ca096 into abhigyanpatwari:main Jun 10, 2026
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants