feat: add ORM dataflow detection (Prisma + Supabase) #511

Merged
magyargergo merged 2 commits into abhigyanpatwari:main from marxo126:feat/orm-dataflow-detection on Mar 26, 2026

Conversation

@marxo126

Contributor

Summary

  • Add QUERIES relationship type to detect ORM data access patterns from Prisma and Supabase JS client calls
  • Creates edges from consumer files to model CodeElement nodes with method-specific reason fields (e.g., prisma-findMany, supabase-select)
  • Adds Prisma/Supabase framework detection patterns for path-based and AST-based entry point scoring

Changes

| File | Change |
| --- | --- |
| schema.ts | Add QUERIES to REL_TYPES, add FROM Function/Method TO CodeElement pairs |
| parse-worker.ts | Add ExtractedORMQuery interface, regex extraction for prisma.model.method() and supabase.from('table').method() |
| parsing-processor.ts | Propagate ormQueries through WorkerExtractedData |
| pipeline.ts | Phase 3.7 (ORM Dataflow Detection): accumulates queries from workers + sequential path, creates CodeElement model nodes and QUERIES edges |
| graph/types.ts | Add QUERIES to RelationshipType union |
| framework-detection.ts | Add Prisma schema and Supabase client path/AST detection patterns |

Detection Patterns

Prisma: prisma.user.findMany(), prisma.post.create(), etc. — regex captures model name + method
Supabase: supabase.from('bookings').select(), supabase.from('users').insert() — regex captures table name + method
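
For readers who want to see the two patterns concretely, here is a minimal sketch of regex-based extraction. The regexes, the `ExtractedORMQuery` field names, and the function body are assumptions written for illustration, not the code from this PR; only the function name, its three-argument call shape, the `$`-prefix filter, and the `includes` fast-path guards are taken from the discussion in this thread.

```typescript
// Hypothetical shape; the PR's ExtractedORMQuery may carry different fields.
interface ExtractedORMQuery {
  filePath: string;
  orm: 'prisma' | 'supabase';
  model: string;
  method: string;
}

// Assumed patterns (not copied from the PR).
// prisma.<model>.<method>(   e.g. prisma.user.findMany(
const PRISMA_QUERY_RE = /\bprisma\.(\$?\w+)\.(\w+)\s*\(/g;
// supabase.from('<table>').<method>(   e.g. supabase.from('bookings').select(
const SUPABASE_QUERY_RE = /\bsupabase\.from\(\s*['"](\w+)['"]\s*\)\s*\.(\w+)\s*\(/g;

function collectMatches(
  re: RegExp,
  orm: 'prisma' | 'supabase',
  filePath: string,
  content: string,
  out: ExtractedORMQuery[],
): void {
  re.lastIndex = 0; // global regexes keep state between exec() calls
  let m: RegExpExecArray | null;
  while ((m = re.exec(content)) !== null) {
    const model = m[1];
    const method = m[2];
    // Skip client-level members such as prisma.$transaction or prisma.$queryRaw.
    if (model.startsWith('$')) continue;
    out.push({ filePath, orm, model, method });
  }
}

function extractORMQueries(
  filePath: string,
  content: string,
  out: ExtractedORMQuery[],
): void {
  // Cheap substring checks before running the regexes.
  if (content.includes('prisma.')) {
    collectMatches(PRISMA_QUERY_RE, 'prisma', filePath, content, out);
  }
  if (content.includes('supabase.from')) {
    collectMatches(SUPABASE_QUERY_RE, 'supabase', filePath, content, out);
  }
}
```

A pattern like this only sees calls written on a client variable literally named `prisma` or `supabase`, and it misses chains where `.from(...)` starts on a new line; a regex approach trades that coverage for speed.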

Test plan

  • Integration test with fixture repo containing Prisma and Supabase query patterns
  • Verifies QUERIES edges created from consumer files to model nodes
  • Verifies CodeElement nodes created for each model/table
  • Verifies reason field encodes ORM type and method
  • Full test suite passes apart from 2 pre-existing, unrelated failures in skills-e2e (3715 of 3730 tests passed, 13 skipped)
  • Manual verification on real projects with Prisma/Supabase usage

🤖 Generated with Claude Code

@vercel

vercel Bot commented Mar 25, 2026


@marxo126 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@marxo126

Contributor Author

Why this is useful

Currently GitNexus traces function calls, imports, and route handlers — but ORM data access is invisible. When a developer calls prisma.user.findMany() or supabase.from('bookings').select(), there's no edge linking that file to the data model it depends on. This means:

  • gitnexus_impact can't warn you that changing a Prisma model will break files that query it
  • gitnexus_query can't find "all files that access the users table"
  • Process detection misses data access as part of execution flows

With QUERIES edges, the knowledge graph now captures the full picture: HTTP request → handler → ORM query → model. This is especially valuable for Next.js + Prisma and Supabase-based projects where data access is the core of the application logic.

Test run verification

ORM integration test (3/3 passed):

✓ creates QUERIES edges for Prisma calls (1ms)
✓ creates QUERIES edges for Supabase calls (0ms)
✓ creates CodeElement nodes for ORM models (0ms)

Full test suite (3715/3730 passed, 13 skipped):

Test Files  2 failed | 94 passed (96)
     Tests  2 failed | 3715 passed | 13 skipped (3730)
  Duration  571.43s

The 2 failures are pre-existing in skills-e2e.test.ts — unrelated to this PR (they fail on main as well).

@marxo126 marxo126 force-pushed the feat/orm-dataflow-detection branch from 5d0d3ed to cf5501a Compare March 25, 2026 17:36
@github-actions

github-actions Bot commented Mar 25, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 4270 | 4269 | 0 | 1 | 176s |

✅ All 4269 tests passed

1 test skipped:
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 71.01% | 10639/14981 | 71.04% | 📉 -0.0 | 🔴 |
| Branches | 61.61% | 7209/11700 | 61.63% | 📉 -0.0 | 🔴 |
| Functions | 74.45% | 930/1249 | 74.45% | = 0.0 | 🟢 |
| Lines | 73.35% | 9500/12950 | 73.38% | 📉 -0.0 | 🔴 |

📋 View full run · Generated by CI

@marxo126 marxo126 force-pushed the feat/orm-dataflow-detection branch from cf5501a to 9dc845b Compare March 25, 2026 19:52
@marxo126

Contributor Author

Rebased onto main

Force-pushed 9dc845b — rebased from 11 commits (included stale upstream merges from cursor-cli-wiki, Expo Router, PHP response shapes PRs) down to 1 clean commit on top of current main.

The ORM dataflow detection code is unchanged, only the branch history was cleaned up.

@magyargergo

Collaborator

@marxo126 can you please make sure CI is green? 🙏

@magyargergo

Collaborator

@claude Let's run a final review if this PR is production ready and ready to be merged

@github-actions

github-actions Bot commented Mar 26, 2026

Contributor

Claude finished @magyargergo's task in 4m 40s


Production Readiness Review

  • Gather context and understand the request
  • Read all changed files
  • Review core logic (pipeline, parse-worker, parsing-processor)
  • Review schema and type changes
  • Review framework detection
  • Review tests and fixtures
  • Assess production readiness and provide final verdict

Verdict: ❌ Not Ready to Merge — Critical Bug Found

The feature concept is solid and the integration with the pipeline is well-structured, but there is one critical bug that would silently make ORM detection a no-op in production.


🔴 Critical: extractORMQueries is never called in the worker path

gitnexus/src/core/ingestion/workers/parse-worker.ts, ~line 849

The extractORMQueries function is defined and exported from parse-worker.ts, and ParseWorkerResult includes an ormQueries: [] field — but the function is never called inside processBatch or processFileGroup. The worker accumulation loop in parsing-processor.ts (line 108) and pipeline.ts (line 761) both collect from result.ormQueries, but workers always return an empty array.

ORM detection only works on the sequential fallback path (sequentialChunkPaths loop in pipeline.ts, lines 799–802). For any real-world repository large enough to engage the worker pool (which is the default), zero QUERIES edges will ever be created.

The integration test passes because the 4-file fixture repo likely falls back to the sequential path entirely.

Fix: Add a call to extractORMQueries inside processFileGroup in parse-worker.ts, after the existing route extraction at line 1401:

// After: result.routes.push(...extractedRoutes);
extractORMQueries(file.path, file.content, result.ormQueries);


🟡 Medium: Regex duplication between parse-worker.ts and pipeline.ts

PRISMA_QUERY_RE, SUPABASE_QUERY_RE, and the extraction logic are copy-pasted verbatim into pipeline.ts (lines 1296–1321) with a comment explaining it avoids importing worker-only code. This is a valid reason, but a single regex update in one file won't be reflected in the other.

Suggested fix: Extract the regex constants and extraction logic to a shared utility module (e.g., utils/orm-extraction.ts) importable from both parse-worker.ts and pipeline.ts.


🟡 Medium: Prisma model name casing mismatch — existing node lookup always misses

In processORMQueries (pipeline.ts ~line 1341):

const candidateIds = [
  generateId('Class', `${q.model}`),   // 'user'
  generateId('Interface', `${q.model}`),
  generateId('CodeElement', `${q.model}`),
];
const existing = candidateIds.find(id => graph.getNode(id));

The regex captures lowercase model names (e.g. prisma.user.findMany → 'user'), but Prisma schema model names are PascalCase (User). So generateId('Class', 'user') will never match an existing Class node for User, and the fallback to a synthetic CodeElement node always fires. This is low-severity (the graph still works), but the intent of reusing existing model nodes is never achieved.
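
One possible way to close the casing gap is to try the PascalCase form of the captured accessor before the raw form. The sketch below assumes the usual Prisma Client convention that a model named `User` is exposed as `prisma.user` (first letter lower-cased). `toModelName`, `candidateIds`, `findExistingModelNode`, and the `Label:name` ID format are hypothetical stand-ins; the project's real `generateId` and graph lookup may look different.

```typescript
// Hypothetical helper: upper-case the first character so the client accessor
// `user` (or `blogPost`) lines up with the schema model `User` (or `BlogPost`).
function toModelName(accessor: string): string {
  return accessor.charAt(0).toUpperCase() + accessor.slice(1);
}

// Stand-in for the project's generateId; the real ID format is an assumption.
function generateId(label: string, name: string): string {
  return `${label}:${name}`;
}

// Build lookup candidates, PascalCase first, then the raw captured name.
function candidateIds(accessor: string): string[] {
  const pascal = toModelName(accessor);
  const names = pascal === accessor ? [accessor] : [pascal, accessor];
  const labels = ['Class', 'Interface', 'CodeElement'];
  const ids: string[] = [];
  for (const name of names) {
    for (const label of labels) {
      ids.push(generateId(label, name));
    }
  }
  return ids;
}

// hasNode abstracts over graph.getNode(id) so the sketch stays self-contained.
function findExistingModelNode(
  accessor: string,
  hasNode: (id: string) => boolean,
): string | undefined {
  return candidateIds(accessor).find((id) => hasNode(id));
}
```

Supabase table names are plain strings and often snake_case, so this normalization would only make sense for the Prisma branch.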


🟢 What's Good

  • Schema changes are correct: QUERIES is properly added to REL_TYPES in schema.ts and RelationshipType in graph/types.ts. The FROM File TO CodeElement pair already exists in RELATION_SCHEMA.
  • Deduplication in processORMQueries: The seenEdges set correctly prevents duplicate edges per file×model×method.
  • Prisma $ prefix filtering: model.startsWith('$') correctly skips prisma.$connect(), prisma.$transaction(), etc.
  • Fast-path checks: content.includes('prisma.') / content.includes('supabase.from') guard before running the regex — good for performance.
  • Framework detection additions: Prisma schema path detection and Supabase client path detection in framework-detection.ts are reasonable and non-breaking.
  • CI is green: All 4198 tests pass, coverage drop is minor.
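
The deduplication and reason-field points above can be illustrated with a small sketch. It assumes a flat query record and edge shape invented for this example (`ORMQuery`, `QueriesEdge`, `buildQueriesEdges`, and the `File:`/`CodeElement:` ID prefixes are all hypothetical); only the `seenEdges` set, the file×model×method key, and the `prisma-findMany` style reason strings come from this thread.

```typescript
// Hypothetical record and edge shapes, for illustration only.
interface ORMQuery {
  filePath: string;
  orm: 'prisma' | 'supabase';
  model: string;
  method: string;
}

interface QueriesEdge {
  sourceId: string;
  targetId: string;
  type: 'QUERIES';
  reason: string; // e.g. "prisma-findMany", "supabase-select"
}

function buildQueriesEdges(queries: ORMQuery[]): QueriesEdge[] {
  const seenEdges = new Set<string>();
  const edges: QueriesEdge[] = [];
  for (const q of queries) {
    // One edge per file x model x method; repeated calls in a file collapse.
    const key = `${q.filePath}|${q.orm}|${q.model}|${q.method}`;
    if (seenEdges.has(key)) continue;
    seenEdges.add(key);
    edges.push({
      sourceId: `File:${q.filePath}`, // assumed ID format
      targetId: `CodeElement:${q.model}`, // assumed ID format
      type: 'QUERIES',
      reason: `${q.orm}-${q.method}`,
    });
  }
  return edges;
}
```

Keying on the method as well as the model keeps `findMany` and `create` against the same model as separate edges, which is what lets the reason field distinguish reads from writes.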

🔵 Minor: Indentation inconsistency

gitnexus/src/core/ingestion/pipeline.ts, line 632 — allORMQueries is declared with 4-space indent while all surrounding declarations use 2-space. Not a bug, but cosmetically inconsistent.


Test Coverage Gap

The integration test (orm-dataflow.test.ts) only exercises the sequential path for a tiny fixture. There are no tests that:

  • Verify ORM queries are collected by the worker (extractORMQueries exported function)
  • Test prisma.$connect() or prisma.$transaction() are correctly filtered
  • Test multi-model access patterns across multiple files

Once the worker-path bug is fixed, the test should be verified against a worker-pool execution (or a unit test added for extractORMQueries directly).


Bottom line: Fix the missing extractORMQueries(file.path, file.content, result.ormQueries) call inside processFileGroup in parse-worker.ts and this PR will be functionally correct. The other issues are polish items that don't block the feature from working.

@magyargergo

Collaborator

Can you please resolve the merge conflicts as well? 🙏

@marxo126 marxo126 force-pushed the feat/orm-dataflow-detection branch from 9dc845b to e5e8450 Compare March 26, 2026 06:19
@marxo126

Contributor Author

Rebased onto latest upstream main — merge conflicts resolved.

Add QUERIES relationship type to detect data access patterns from Prisma
and Supabase JS client calls. Creates edges from consumer files to model
CodeElement nodes with method-specific reason fields.

- Add QUERIES to REL_TYPES and FROM/TO pairs for Function/Method→CodeElement
- Add ExtractedORMQuery interface and regex extraction in parse-worker
- Add processORMQueries() in pipeline Phase 3.7 with model node creation
- Add Prisma/Supabase path + AST framework detection patterns
- Add integration test with fixture repo validating QUERIES edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@marxo126 marxo126 force-pushed the feat/orm-dataflow-detection branch from e5e8450 to e301d74 Compare March 26, 2026 06:45
@marxo126

Contributor Author

Rebased onto latest upstream/main (includes merged #501 and #504). CI should pass now — the shape-check-regression.test.ts failure was because the branch was based on older main before PR #501 merged.

@magyargergo

Collaborator

@claude Let's run a final review if this PR is production ready and ready to be merged.

@github-actions

github-actions Bot commented Mar 26, 2026

Contributor

Claude finished @magyargergo's task in 1m 42s


Production Readiness Review

  • Gather context and understand the request
  • Read all changed files
  • Verify critical bug from previous review is fixed
  • Review remaining medium/minor issues
  • Check test coverage and CI status
  • Provide final verdict

Verdict: ❌ Not Ready to Merge — Critical Bug Still Present

The previous Claude review identified a critical bug. After checking the latest commit (e301d74), that bug has not been fixed.


🔴 Critical: extractORMQueries is still never called in the worker path

gitnexus/src/core/ingestion/workers/parse-worker.ts, lines 1401–1407

processFileGroup ends without ever calling extractORMQueries:

    // Extract framework routes via provider detection (e.g., Laravel routes.php)
    if (provider.isRouteFile?.(file.path)) {
      const extractedRoutes = extractLaravelRoutes(tree, file.path);
      result.routes.push(...extractedRoutes);
    }
  }  // ← processFileGroup ends here — extractORMQueries is never called
};

The extractORMQueries function is defined at line 849 and exported, but is never invoked anywhere in parse-worker.ts. Workers always return ormQueries: []. ORM detection only works on the sequential fallback path (pipeline.ts lines 805–808), which is only hit for small repos that don't engage the worker pool.

The integration test passes because the 4-file fixture always falls back to sequential execution.

Required fix: Add one line inside the for (const file of files) loop in processFileGroup, after route extraction (line 1404):

    result.routes.push(...extractedRoutes);
  }
  // Add this:
  extractORMQueries(file.path, file.content, result.ormQueries);


🟡 Medium: Regex duplication (still present)

PRISMA_QUERY_RE, SUPABASE_QUERY_RE, and the extraction logic are copy-pasted as extractORMQueriesInline in pipeline.ts (lines 1392–1418). A regex update in one place won't be reflected in the other. This medium issue was raised in the previous review and remains unaddressed.

🟡 Medium: Prisma model name casing mismatch (still present)

The regex captures lowercase model names (prisma.user.findMany → 'user'), but the lookup attempts generateId('Class', 'user'), which will never match a Prisma schema User class node. The fallback synthetic CodeElement always fires. The feature still "works" but never reuses existing model nodes.

🔵 Minor: Indentation inconsistency (still present)

pipeline.ts line 638: allORMQueries is declared with 4-space indent while surrounding declarations use 2-space.


What's Still Good

  • Schema changes (QUERIES in REL_TYPES, RelationshipType) are correct
  • processORMQueries deduplication via seenEdges is correct
  • $-prefix filtering for prisma.$connect() etc. is correct
  • Fast-path content guards (includes('prisma.')) are correct
  • Framework detection additions are non-breaking
  • CI passes (4256/4257 tests) — but only because the tiny fixture never hits the worker path

Summary: The one-line fix (calling extractORMQueries inside processFileGroup) is all that's blocking this from being functionally correct. The medium/minor issues are polish. Please fix the worker-path call and re-run CI before merging.

ORM detection (Prisma, Supabase) was defined in parse-worker.ts but
never called inside processFileGroup, making it dead code in the worker
path. Only the sequential fallback in pipeline.ts ran ORM extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@marxo126

Contributor Author

Fix pushed: worker path now calls extractORMQueries

extractORMQueries was defined in parse-worker.ts but never called inside processFileGroup. ORM detection only worked on the sequential fallback path (small repos). Now it runs in the worker path too.

Change: Added extractORMQueries(file.path, file.content, result.ormQueries) call in the per-file loop after Laravel route extraction.

Typecheck passes, all ORM tests pass.

@magyargergo

Collaborator

@claude Let's run a final review if this PR is production ready and ready to be merged

@github-actions

github-actions Bot commented Mar 26, 2026

Contributor

Claude finished @magyargergo's task in 1m 24s


Production Readiness Review

  • Gather context and understand the request
  • Verify critical bug fix from previous reviews (worker path calling extractORMQueries)
  • Review remaining medium/minor issues
  • Check CI status
  • Provide final verdict

Verdict: ✅ Ready to Merge

The critical bug identified in previous reviews has been fixed. The PR is functionally correct and CI is green.


🟢 Critical Bug: Fixed

gitnexus/src/core/ingestion/workers/parse-worker.ts, lines 1407–1408

The missing extractORMQueries call is now present inside the processFileGroup per-file loop (commit 0048a59):

    // Extract ORM queries (Prisma, Supabase)
    extractORMQueries(file.path, file.content, result.ormQueries);
  }
};

ORM detection now works on both the worker path (large repos) and the sequential fallback path (small repos).


🟡 Medium Issues (non-blocking, polish items)

These were flagged in prior reviews and remain unaddressed, but neither blocks correctness:

  1. Regex duplication: PRISMA_QUERY_RE, SUPABASE_QUERY_RE, and the extraction logic are copy-pasted as extractORMQueriesInline in pipeline.ts (lines 1392–1418). A shared utility module would eliminate the dual-maintenance risk.

  2. Prisma model casing mismatch: The regex captures lowercase model names (prisma.user.findMany → 'user'), but Prisma schema nodes are PascalCase (User). The generateId('Class', 'user') lookup at pipeline.ts:1438 will never match an existing class node, so the synthetic CodeElement fallback always fires. The graph is still useful, but the intent of reusing existing model nodes is never achieved.

🔵 Minor Issue (non-blocking)

  • Indentation inconsistency: allORMQueries at pipeline.ts:638 uses 4-space indent while all surrounding declarations use 2-space.

What's Good

  • Critical worker path bug fixed (commit 0048a59)
  • CI green: 4269/4270 tests pass (1 skipped — documented Ruby limitation, unrelated)
  • Schema changes correct: QUERIES in REL_TYPES and RelationshipType
  • Deduplication: seenEdges set correctly prevents duplicate edges per file×model×method
  • $-prefix filtering: prisma.$connect(), prisma.$transaction() correctly skipped
  • Fast-path guards: content.includes('prisma.') checks before regex — good for perf
  • Framework detection additions: non-breaking Prisma schema and Supabase client path patterns

The two medium issues and one minor issue are polish items that can be addressed in a follow-up PR. The core feature is now correct end-to-end.

@magyargergo magyargergo merged commit a047a08 into abhigyanpatwari:main Mar 26, 2026
9 of 10 checks passed
motolese pushed a commit to motolese/datamoto-gitnexus that referenced this pull request Apr 23, 2026
2 participants