fix(ingestion): reduce parse-phase memory for huge repos (#1983) #1

Closed
magyargergo wants to merge 1 commit into refactor/943-delete-resolution-context from fix/issue-1983-large-repo-parse-oom

@magyargergo

Owner

Summary

Fixes JavaScript heap OOM when analyzing very large repos (e.g. Linux kernel) during the parse phase. Stacked on abhigyanpatwari/GitNexus#2033 — merge abhigyanpatwari#2033 first, then open/retarget this against main.

  • Lazy disk-backed parse cache — flush each chunk to disk after merge; no longer retain hundreds of full worker payloads in RAM alongside the graph
  • Slim cache shards (schema v4) — drop legacy DAG fields unused after RING4-1/RING4-2
  • Defer worker ParsedFile for scope-resolver languages (scope-resolution re-extracts on main thread)
  • Incremental exportedTypeMap during chunk merge
  • GITNEXUS_DEBUG_HEAP=1 enables [gitnexus-heap] probes for OOM diagnosis
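
The first bullet (flush each chunk to disk after merge) could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `ParsedChunk`, `LazyChunkCache`, `mergeAndFlush`, and the JSON shard layout are hypothetical names and choices, and the real shard schema (v4) is not shown in this conversation.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical shape of one parsed chunk; the real worker payload is not shown here.
interface ParsedChunk {
  id: number;
  files: { path: string; symbols: string[] }[];
}

// Hypothetical disk-backed cache: keeps only shard paths in memory, never chunk bodies.
class LazyChunkCache {
  private readonly shardPaths = new Map<number, string>();

  constructor(private readonly dir: string) {
    fs.mkdirSync(dir, { recursive: true });
  }

  // Write the chunk to its own shard file so the caller can drop its reference.
  flush(chunk: ParsedChunk): void {
    const shard = path.join(this.dir, `chunk-${chunk.id}.json`);
    fs.writeFileSync(shard, JSON.stringify(chunk));
    this.shardPaths.set(chunk.id, shard);
  }

  // Re-read a chunk from disk only when it is actually needed.
  load(id: number): ParsedChunk | undefined {
    const shard = this.shardPaths.get(id);
    if (shard === undefined) return undefined;
    return JSON.parse(fs.readFileSync(shard, "utf8")) as ParsedChunk;
  }

  get size(): number {
    return this.shardPaths.size;
  }
}

// Merge each chunk into the (toy) graph, then flush it before touching the next one.
function mergeAndFlush(chunks: Iterable<ParsedChunk>, cache: LazyChunkCache): Set<string> {
  const graphSymbols = new Set<string>();
  for (const chunk of chunks) {
    for (const file of chunk.files) {
      for (const symbol of file.symbols) graphSymbols.add(`${file.path}#${symbol}`);
    }
    cache.flush(chunk); // after this point the chunk body lives only on disk
  }
  return graphSymbols;
}
```

The point of the design is that peak memory is bounded by the graph plus one chunk, rather than the graph plus every chunk seen so far.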

Fixes abhigyanpatwari#1983

Test plan

  • npx tsc --noEmit
  • vitest run test/unit/incremental-parse-cache.test.ts
  • vitest run test/unit/parse-impl-worker-lazy-cache.test.ts
  • Linux kernel repro: NODE_OPTIONS=--max-old-space-size=20480 GITNEXUS_DEBUG_HEAP=1 gitnexus analyze --verbose
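
For readers reproducing the run above, an environment-gated heap probe might be sketched as follows. Only the `GITNEXUS_DEBUG_HEAP` variable and the `[gitnexus-heap]` tag come from this PR; the function names and the exact line format are assumptions. `process.memoryUsage()` is the standard Node.js call.

```typescript
// Hypothetical formatter; the actual [gitnexus-heap] line format is not shown in this PR.
function formatHeapProbe(label: string, usage: { heapUsed: number; rss: number }): string {
  const toMb = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(1);
  return `[gitnexus-heap] ${label} heapUsed=${toMb(usage.heapUsed)}MB rss=${toMb(usage.rss)}MB`;
}

// Emits a probe line to stderr only when GITNEXUS_DEBUG_HEAP=1; otherwise does nothing.
function heapProbe(
  label: string,
  env: Record<string, string | undefined> = process.env,
): string | null {
  if (env.GITNEXUS_DEBUG_HEAP !== "1") return null;
  const line = formatHeapProbe(label, process.memoryUsage());
  console.error(line);
  return line;
}
```

Calling something like `heapProbe("after-chunk-merge")` at phase boundaries would show where heap growth happens without adding overhead to normal runs.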

Merge order

  1. Merge refactor(ingestion): delete legacy resolution context + tiered-lookup plumbing (RING4-2, #943) abhigyanpatwari/GitNexus#2033 to main
  2. Retarget this PR to abhigyanpatwari/GitNexus main (or cherry-pick the single commit 665c60b8)

Made with Cursor

…wari#1983)

Stop retaining full parse-cache chunks in RAM alongside the merged graph,
slim on-disk shards, defer worker ParsedFile emission for scope-resolver
languages, and add GITNEXUS_DEBUG_HEAP probes for OOM diagnosis.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6892dd08-665e-444f-a321-5cab7c1a964e

@magyargergo magyargergo closed this Jun 4, 2026
magyargergo added a commit that referenced this pull request Jun 10, 2026
…abhigyanpatwari#2078)

* feat(ingestion): add Java Spring route annotation → Route node extraction

Previously, GitNexus only supported Route node generation for JS/TS
ecosystems (Express, Next.js, Fastify, etc.) and Python (FastAPI, Flask).
Java Spring's annotation-based routing (@RequestMapping, @GetMapping,
@PostMapping, etc.) was only supported at the group contract layer
(http-patterns/java.ts) for cross-repo matching, but NOT at the
ingestion layer for generating graph Route nodes.

This commit adds ingestion-layer support:

1. JAVA_QUERIES (tree-sitter-queries.ts):
   - Added method-level annotation captures (@GetMapping, @PostMapping,
     @PutMapping, @DeleteMapping, @PatchMapping) → @decorator captures
   - Added class-level @RequestMapping → @decorator capture (prefix)
   - Supports both positional ("/path") and named (path="/path",
     value="/path") annotation argument forms

2. parse-worker.ts:
   - Java class-level @RequestMapping is detected and stored as a prefix
     (not pushed as a standalone Route)
   - After per-file capture processing, the prefix is applied to all
     method-level routes in the same file via the existing
     ExtractedDecoratorRoute.prefix field
   - The routes phase (normalizeExtractedRoutePath) handles the prefix
     joining, producing final URLs like /api/users/list

3. Tests:
   - Unit test (worker-backed): 4 cases covering prefix joining,
     bare routes, class-level exclusion, multi-file isolation
   - Integration test (full pipeline): 6 cases covering end-to-end
     Route node + HANDLES_ROUTE edge generation

Closes the feature gap where `route_map`, `shape_check`, and
`api_impact` MCP tools returned empty results for Java Spring projects.
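
The prefix joining described in item 2 can be sketched as below. This is not the body of `normalizeExtractedRoutePath`, which is not shown here; `joinRoutePath` and `DecoratorRoute` are hypothetical stand-ins that only illustrate how a class-level prefix and a method-level path combine into a URL such as /api/users/list.

```typescript
// Hypothetical subset of ExtractedDecoratorRoute; only the fields used here.
interface DecoratorRoute {
  method: string;
  path: string;
  prefix?: string;
}

// Join an optional class-level prefix with a method-level path,
// collapsing duplicate or missing slashes.
function joinRoutePath(prefix: string | undefined, routePath: string): string {
  const segments = [prefix ?? "", routePath]
    .flatMap((part) => part.split("/"))
    .filter((segment) => segment.length > 0);
  return "/" + segments.join("/");
}

function finalUrl(route: DecoratorRoute): string {
  return joinRoutePath(route.prefix, route.path);
}
```

With this sketch, a `@GetMapping("/list")` inside a class annotated `@RequestMapping("/api/users")` resolves to `/api/users/list`, while a route with no prefix keeps its own path.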

* chore(autofix): apply prettier + eslint fixes via /autofix command

* fix: address review findings — extract spring.ts module, fix PatchMapping, multi-class support

Addresses all P2 findings from tri-review:

1. **Architecture**: Extracted Spring route logic from parse-worker.ts into
   a dedicated `route-extractors/spring.ts` module (matching the pattern
   of `laravel.ts` and `fastapi-router-bindings.ts`). parse-worker now
   has a single dispatch line — no language-specific logic inline.

2. **PatchMapping bug**: Added `'PatchMapping'` to `ROUTE_DECORATOR_NAMES`
   (was silently dropped before).

3. **Multi-class bug**: The new `extractSpringRoutes` walks each class
   declaration independently with its own prefix — no more single-scalar
   `javaClassPrefix` last-wins issue.

4. **Test hygiene**: Unit tests now import `extractSpringRoutes` directly
   (no dist build / worker pool dependency). Tests run in all tiers.

5. **Removed JAVA_QUERIES decorator patterns**: The Spring extractor does
   its own AST walk, so the tree-sitter query captures for Java annotations
   are no longer needed (avoids duplicate route emission).

Additional test coverage:
- Multi-class in one file with independent prefixes
- @PatchMapping support
- Named annotation args (path= and value=) on class-level @RequestMapping

* refactor: move Spring route extraction to LanguageProvider hook

Addresses the second review comment: instead of an inline
`if (language === SupportedLanguages.Java)` dispatch in parse-worker,
the Spring route extraction is now wired through a new optional
`extractDecoratorRoutes` hook on LanguageProviderConfig.

- Added `extractDecoratorRoutes` to LanguageProviderConfig interface
- Java provider registers `extractSpringRoutes` as its implementation
- parse-worker calls `provider.extractDecoratorRoutes?.()` generically
- Removed direct import of spring.ts from parse-worker

This keeps parse-worker fully language-agnostic — no language names
appear in the dispatch path for route extraction.
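
A minimal sketch of the optional-hook dispatch described above. The hook name `extractDecoratorRoutes` comes from this commit, but its signature here (taking raw source text) is an assumption, as are `ProviderConfig`, `ExtractedRoute`, and `collectRoutes`. The regex is a stand-in for illustration only; the actual extractor works on the tree-sitter AST.

```typescript
// Hypothetical simplified route shape.
interface ExtractedRoute {
  method: string;
  path: string;
}

// Hypothetical slice of a language provider config with an optional hook.
interface ProviderConfig {
  language: string;
  extractDecoratorRoutes?: (source: string) => ExtractedRoute[];
}

// Language-agnostic dispatch: no language name appears here.
function collectRoutes(provider: ProviderConfig, source: string): ExtractedRoute[] {
  return provider.extractDecoratorRoutes?.(source) ?? [];
}

// Toy Java-like provider; the regex only handles the positional form @XMapping("/path").
const javaLikeProvider: ProviderConfig = {
  language: "java",
  extractDecoratorRoutes: (source) => {
    const routes: ExtractedRoute[] = [];
    const pattern = /@(Get|Post|Put|Delete|Patch)Mapping\("([^"]*)"\)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      routes.push({ method: match[1].toUpperCase(), path: match[2] });
    }
    return routes;
  },
};

// A provider that does not register the hook simply yields no routes.
const hooklessProvider: ProviderConfig = { language: "python" };
```

Providers that have nothing to contribute omit the hook entirely, so adding a new language never requires touching the dispatch site.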

* refactor: rewrite spring.ts with tree-sitter captures, fix inline imports

Addresses all 4 inline review comments:

1. Rewrote spring.ts to use a single predicate-free Parser.Query
   (same pattern as group-layer JAVA_ROUTE_ANNOTATION_PATTERNS).
   Two-phase loop: first pass collects class prefixes by node.id,
   second pass resolves method routes via findEnclosingClass.
   No more manual DFS / recursion.

2-3. Moved inline import(...) type references in language-provider.ts
     to proper top-level imports (Parser, ExtractedDecoratorRoute).

4. Covered by #1 — recursive helpers removed entirely.

Added 3 extra test cases: non-route named args filtering,
prefix isolation across mixed classes, line number accuracy.
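
The two-phase loop from item 1 can be illustrated without tree-sitter by using plain objects in place of syntax nodes. Everything below is a sketch under that assumption: `SketchNode`, `Capture`, and `resolveRoutes` are hypothetical, and this `findEnclosingClass` is a simplified stand-in for the helper of the same name mentioned in the commit.

```typescript
// Stand-in for a syntax node: just an id, a type, and a parent link.
interface SketchNode {
  id: number;
  type: string;
  parent: SketchNode | null;
}

// Stand-in for query captures: class-level prefixes and method-level routes.
type Capture =
  | { kind: "classPrefix"; node: SketchNode; path: string }
  | { kind: "methodRoute"; node: SketchNode; method: string; path: string };

interface ResolvedRoute {
  method: string;
  path: string;
  prefix?: string;
}

// Walk parent links until a class declaration is found (iterative, no recursion).
function findEnclosingClass(node: SketchNode): SketchNode | null {
  for (let cur = node.parent; cur !== null; cur = cur.parent) {
    if (cur.type === "class_declaration") return cur;
  }
  return null;
}

function resolveRoutes(captures: Capture[]): ResolvedRoute[] {
  // Phase 1: collect each class's prefix, keyed by the class node's id.
  const prefixByClassId = new Map<number, string>();
  for (const capture of captures) {
    if (capture.kind === "classPrefix") prefixByClassId.set(capture.node.id, capture.path);
  }

  // Phase 2: attach to each method route the prefix of its own enclosing class.
  const routes: ResolvedRoute[] = [];
  for (const capture of captures) {
    if (capture.kind !== "methodRoute") continue;
    const enclosing = findEnclosingClass(capture.node);
    const prefix = enclosing ? prefixByClassId.get(enclosing.id) : undefined;
    const route: ResolvedRoute = { method: capture.method, path: capture.path };
    if (prefix !== undefined) route.prefix = prefix;
    routes.push(route);
  }
  return routes;
}
```

Keying prefixes by class node id is what keeps several classes in one file independent, which a single per-file prefix variable cannot do.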

* refactor: extract shared Spring route primitives + add parity test

Addresses review follow-up on abhigyanpatwari#2078:

- Extract the primitives shared by the ingestion (route-extractors/spring.ts)
  and group (http-patterns/java.ts) Spring extractors into a new
  route-extractors/spring-shared.ts: METHOD_ANNOTATION_TO_HTTP,
  findEnclosingClass, isRouteMemberKey, and a safe unquoteSpringLiteral.
  Both extractors now import from it (group -> ingestion, the layer-correct
  direction) so the shared semantics can't drift apart.

- Replace spring.ts's local unquote() with the safer unquoteSpringLiteral
  (returns null for non-string nodes instead of assuming a quoted string).

- Add test/unit/spring-route-extractor-parity.test.ts: runs one shared Spring
  fixture through both extractors and asserts they surface the same provider
  method/path combinations.
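
As a rough picture of what two of the shared primitives might look like: the names `METHOD_ANNOTATION_TO_HTTP` and `unquoteSpringLiteral` come from this commit, but the bodies below are assumptions, and `LiteralNode` is a hypothetical stand-in for a tree-sitter node (`string_literal` is the tree-sitter-java node type for string literals).

```typescript
// Assumed contents: one entry per method-level Spring mapping annotation named in this PR.
const METHOD_ANNOTATION_TO_HTTP: Readonly<Record<string, string>> = {
  GetMapping: "GET",
  PostMapping: "POST",
  PutMapping: "PUT",
  DeleteMapping: "DELETE",
  PatchMapping: "PATCH",
};

// Hypothetical stand-in for a syntax node: its type and raw source text.
interface LiteralNode {
  type: string;
  text: string;
}

// Return the unquoted value for string literals, and null for anything else
// (for example a constant reference), instead of assuming surrounding quotes.
function unquoteSpringLiteral(node: LiteralNode): string | null {
  if (node.type !== "string_literal") return null;
  const text = node.text;
  if (text.length < 2 || !text.startsWith('"') || !text.endsWith('"')) return null;
  return text.slice(1, -1);
}
```

Returning null forces both extractors to handle the non-literal case explicitly, which is one way a parity test between them stays meaningful.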

The broader HttpRouteExtractor source-scan optimization is tracked in abhigyanpatwari#2138.

---------

Co-authored-by: henry <zhangwei2017@unipus.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>