Skip to content

fix(group): Union HTTP graph and source contracts#1709

Merged
magyargergo merged 7 commits into
abhigyanpatwari:mainfrom
azizur100389:codex/http-provider-source-union
May 20, 2026
Merged

fix(group): Union HTTP graph and source contracts#1709
magyargergo merged 7 commits into
abhigyanpatwari:mainfrom
azizur100389:codex/http-provider-source-union

Conversation

@azizur100389

Copy link
Copy Markdown
Contributor

Fixes #1672

Summary:

  • union graph-assisted HTTP contracts with source-scan contracts instead of using source scan only when the graph is empty
  • keep graph-derived contracts authoritative for duplicate contract IDs because they carry richer symbol metadata
  • apply the same supplemental behavior to provider and consumer HTTP contracts
  • add polyglot-style regression coverage for graph routes plus source-scan-only routes in the same repo

Validation:

  • npx vitest run test/unit/group/http-route-extractor.test.ts -t "supplements graph"
  • npx vitest run test/unit/group/http-route-extractor.test.ts
  • npx vitest run test/unit/group/http-route-multi-verb.test.ts
  • npx vitest run test/unit/group
  • npx tsc --noEmit
  • git diff --check

Note: npx gitnexus impact ... and npx gitnexus detect_changes --scope staged still exit 1 with no output in this checkout, matching the local index issue seen on earlier PRs.

@vercel

vercel Bot commented May 19, 2026

Copy link
Copy Markdown

@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
9270 9269 0 1 469s

✅ All 9269 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 78.61% 30118/38311 N/A% 🟢 ███████████████░░░░░
Branches 67.15% 19241/28652 N/A% 🟢 █████████████░░░░░░░
Functions 83.62% 3033/3627 N/A% 🟢 ████████████████░░░░
Lines 81.91% 27147/33139 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 6m 17s —— View job


PR #1709 Adversarial Review — Complete

Tasks:

  • Read repo guidance files (DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Verify PR metadata locally (SHA, commit count, changed files, branch hygiene)
  • Inspect changed files: http-route-extractor.ts, http-route-extractor.test.ts
  • Inspect related files: normalization.ts, types.ts, contract-extractor.ts, callers
  • Search for key symbols: mergeGraphAndSourceContracts, dedupeContracts, contractIdFor, etc.
  • Run hidden Unicode checks
  • Assess CI status at reviewed SHA
  • Compile findings and post final review

Review bar for this PR

  • Provider and consumer extraction must both union graph-assisted contracts with source-scan contracts (not fallback-only), with graph contractIds winning on collision.
  • Graph-derived metadata (symbolUid, symbolRef, symbolName, confidence: 0.9, meta.extractionStrategy: 'graph_assisted', routeSource/fetchReason) must survive unchanged in the output.
  • Source-scan must still run when dbExecutor === null or when graph queries throw; no regression on existing pure source-scan behavior.
  • Dedupe semantics must not drop legitimate contracts or introduce duplicates that break downstream grouping or cross-link matching.
  • Tests must fail on the old fallback-only behavior and prove both supplement (different contractId) and conflict-resolution (same contractId, graph wins) scenarios.
  • Hidden/bidi Unicode controls in executable code or tests block merge. Performance regressions outside documented trade-offs block merge.

Current PR state

  • Base: abhigyanpatwari:main · Head branch: azizur100389:codex/http-provider-source-union
  • Full head SHA (local): c100aa3d254e700e71d0c7b5109d139dcedceab2 (merge commit) · Feature commit: 409b887cf71537c498ae6b137dea3b291006f47e
  • Changed files: exactly 2 per feature commit: gitnexus/src/core/group/extractors/http-route-extractor.ts (+30/-14) and gitnexus/test/unit/group/http-route-extractor.test.ts (+124/-0)
  • CI at reviewed SHA: Feature commit 409b887 — all checks passed (per in-PR CI report: 9231 tests, tsc, E2E). At merge HEAD c100aa3 (this review's SHA): Gitleaks ✅, Dependency Review ✅, CodeQL ✅, PR Autofix ✅; CI and Docker Build in progress at time of review, expected to pass as no TypeScript changes were introduced by the merge commit.
  • Visibility note: The merge commit brings eval/uv.lock from main (dependency bump 92ad0f5), which appears in the GitHub diff view alongside the two feature files. This is harmless but worth acknowledging.

Branch hygiene assessment

Classification: merge-from-main commit present but harmless and merge-safe.

Evidence: git log shows two commits since branch point — 409b887 (feature, touches only the 2 advertised files) and c100aa3 (merge of main, introduces eval/uv.lock from 92ad0f5 — a Python dependency bump with zero TypeScript impact). No unrelated extractor edits, no generated artifacts, no workflow churn, no lockfile drift beyond the benign merge pull. The merge commit does not block merge-readiness but does mean GitHub's "squash and merge" will include the uv.lock hunk; maintainers may prefer a rebase to produce a single clean commit, but this is a workflow preference, not a correctness blocker.


Understanding of the change

Old failure mode: extractProvidersSourceScan and extractConsumersSourceScan were each gated on graphProviders.length > 0 ? graphProviders : sourceScan. In a polyglot repo where Java HANDLES_ROUTE edges existed, the truthy branch fired and the entire Go/Python/TypeScript source-scan result was silently discarded.

New behavior — providers: mergeGraphAndSourceContracts(graphProviders, extractProvidersSourceScan(files, getDetections)) runs unconditionally. Graph rows are appended first; source-scan contracts whose contractId was not already seen (seeded from graph) are appended after. Graph entry wins on contractId collision.

New behavior — consumers: Identical pattern. mergeGraphAndSourceContracts(graphConsumers, extractConsumersSourceScan(files, getDetections)).

Null/throw fallback: When dbExecutor === null, graphProviders = []mergeGraphAndSourceContracts([], sourceScan) reduces to "return all unique-by-contractId source contracts." When extractProvidersGraph throws, it returns [] and the same reduction applies. Fallback path preserved.


Findings

Finding 1 — mergeGraphAndSourceContracts deduplicates source contracts more aggressively than the old null-executor path

  • Severity: minor
  • File: gitnexus/src/core/group/extractors/http-route-extractor.ts:479
  • Evidence: The private dedupeContracts at line 467 keys on ${contractId}|${filePath}|${name}, so two files each registering GET /api/health survive it as distinct contracts. Then mergeGraphAndSourceContracts([], sourceContracts) at line 199–202 deduplicates further by contractId only, retaining only the first. In the old null-executor path, extractProvidersSourceScan returned the per-file-distinct result directly; no second dedup pass collapsed it. normalization.dedupeContracts (used by bridge-db.ts:364) keys on (repo, contractId, role, filePath) — it is designed to hold per-file entries, so dropping the second file's contract upstream means it never reaches downstream merge-by-richness.
  • Risk: For source-only repos where the same method+path is legitimately registered in two different files (e.g., two microservices sharing the same repo path), one contract is now silently dropped by the extractor. The downstream richness-merge in normalization.ts would have handled it gracefully if it were passed through.
  • Recommended fix: Document the "one contract per contractId per role" invariant explicitly in the mergeGraphAndSourceContracts JSDoc, and add a test asserting that null-executor extraction from two files with the same route returns exactly one contract. If multi-file same-route is a valid product case, propagate duplicates through and let normalization.dedupeContracts select the richest.
  • Blocks merge: no — the same-contractId collapse is almost certainly the intended product semantics given that the downstream bridge-db node ID is sha256(repo‖contractId‖role‖filePath) and the graph model treats each route as a unique endpoint. But the behavioral change is currently untested and undocumented.

Finding 2 — Performance: unconditional source scan now runs on every group sync, including graph-rich repos

  • Severity: minor
  • File: gitnexus/src/core/group/extractors/http-route-extractor.ts:197–211
  • Evidence: Before this change, a repo where graphProviders.length > 0 skipped the glob and all tree-sitter parses for source scan. Now extractProvidersSourceScan(await getScannedFiles(), getDetections) runs unconditionally. getScannedFiles() is memoized (lines 190–195) and getDetections() is cached per file (lines 162–186), so graph-enrichment calls pre-warm the cache for graph files. The actual extra cost is: one glob(HTTP_SCAN_GLOB) per extract() call for the file list, plus tree-sitter parsing of files not already touched by graph enrichment.
  • Risk: On large repos with full graph coverage the glob is O(n) filesystem traversal that was previously skipped. Not catastrophic given caching, but it contradicts DoD 2.6 ("No repeated avoidable work, unnecessary scans") without an explicit documented rationale.
  • Recommended fix: Add a brief inline comment at the mergeGraphAndSourceContracts call site (or at getScannedFiles) explaining the intentional trade-off: "Source scan always runs to capture language paths not covered by the graph; glob runs at most once and parse results are cached." This satisfies DoD 2.6 by making the trade-off explicit.
  • Blocks merge: no — the caching mitigates the worst case, and the fix is inherently O(scan) for correctness.

Finding 3 — include-extractor.ts still uses the old fallback pattern (out of scope but flagged)

  • Severity: minor (informational, not a blocker for this PR)
  • File: gitnexus/src/core/group/extractors/include-extractor.ts:404
  • Evidence: grep found if (graphProviders.length > 0) return graphProviders; — the same pre-fix fallback pattern. This PR only claims to fix http-route-extractor.ts, so this is out of scope. Flagging for awareness.
  • Risk: If include detection has the same polyglot-drop bug as HTTP, it would need a parallel fix.
  • Recommended fix: Track as a follow-up issue if include extraction has the same problem.
  • Blocks merge: no.

PR-specific assessment sections

A. Provider union correctness ✅

extractProvidersGraph executes unconditionally when dbExecutor != null (line 197–198). extractProvidersSourceScan now executes regardless of graph result (lines 200–202). mergeGraphAndSourceContracts seeds seenContractIds from graph (line 483), appends all graph entries first (line 484), then appends only unseen source entries (lines 485–489). A source contract with the same contractId as a graph contract is skipped. Graph query failure returns [] and the merge correctly reduces to source-only. dbExecutor === null path is also correct (graphProviders is []).

The Java controller test at line 96–165 verifies the polyglot union: the graph mock returns GET /api/v2/users (Java), the Go file produces GET /api/health via source scan, both appear in output. Crucially expect(graphRouteMatches).toHaveLength(1) at line 157 proves that when BOTH graph and source scan produce http::GET::/api/v2/users (the Java controller file is parsed by both), only one survives — and line 158–159 confirms it is the graph-metadata copy (symbolUid: 'uid-ctrl-list', extractionStrategy: 'graph_assisted'). DoD requirement #8 (graph-authoritative conflict resolution) is covered by this test.

B. Consumer union correctness ✅

Same union pattern at lines 204–209. Graph consumers keep symbolUid, fetchReason, meta.extractionStrategy: 'graph_assisted' (lines 416–430). Source-scan consumers get extractionStrategy: 'source_scan' (line 459). The consumer supplement test at line 815–866 verifies graphConsumer.symbolUid === 'uid-fn-fetch' and sourceConsumer.extractionStrategy === 'source_scan'.

Consumer method correctness: The GET-default risk is pre-existing behavior (unchanged by this PR). When graph has a single consumer detection match, its method is used (line 394–396); with ambiguity, GET is the default. If graph emits GET /api/orders and source scan finds POST /api/orders, they produce distinct contractIds (http::GET:: vs http::POST::) and both appear — no false dedup. The prompt's concern about wrong impact edges does not materialize from this PR.

C. Graph authority and dedupe ✅

mergeGraphAndSourceContracts does not dedupe graph contracts among themselves (intentionally). The graph can return two rows for the same contractId (different files), and both survive to the downstream normalization.dedupeContracts which merges by richness score. The helper is called separately for providers and consumers — cross-role contractId collisions cannot occur within a single merge call. A defensive role-prefixed key (${role}|${contractId}) would be marginally safer if this helper were ever reused across roles, but as a private method with two call sites that clearly separate roles, the current form is acceptable.

D. Performance and scan caching ✅ with note

getScannedFiles is lazy and memoized (lines 190–195): glob runs at most once per extract(), shared between provider and consumer paths. cachedDetections (lines 162–186): each file parsed at most once; extractProvidersGraph calls getDetections(filePath) per graph row, pre-warming the cache for graph files before source scan touches them. The incremental cost of source scan on a fully graph-covered repo is one glob call + tree-sitter parse of files not in the graph. This is correct and necessary. See Finding 2 for the recommendation to document the trade-off.

E. Tests and CI ✅

New tests cover:

  • Provider supplement (Java graph + Go source scan, different contractIds) — line 96
  • Provider same-contractId conflict (implicit: Java file produces same route via both paths, toHaveLength(1) + graph metadata assertions) — lines 154–164
  • Consumer supplement (graph fetch + source-scan fetch('/api/health'), different contractIds) — line 815
  • Graph query failure falls back to source scan (throwingExecutor test at line 879) — pre-existing but covers new code path ✅
  • dbExecutor === null covered by the many null-executor tests — pre-existing ✅
  • getScannedFiles and detection caching share state between providers and consumers (not directly unit-tested as a performance property, but structurally sound)

Gap: no test asserting that mergeGraphAndSourceContracts([], [fileA_contract, fileB_contract_sameid]) returns only fileA_contract. This is covered by the behavioral change in Finding 1, but is currently untested.

CI: feature commit 409b887 — 9231 tests passed, tsc clean, E2E passed. Merge-head CI in progress at time of review; no TypeScript changes in the merge commit, expected to stay green.

F. Hidden Unicode / hygiene ✅

grep -Pn "[\x80-\xff]" on both changed files returned no output. No non-ASCII bytes in the changed files. GitHub's rendered bidi warnings were false positives, possibly triggered by em-dash or arrow characters in existing file comments that are harmless visible punctuation. Not a blocker.

G. Downstream contract compatibility ✅

normalization.dedupeContracts (used by bridge-db.ts:364) keys on (repo, contractId, role, filePath) and merges by richness score. It handles cases where multiple contracts share a contractId gracefully. The bridge node ID (contractNodeId at bridge-db.ts:53) is sha256(repo‖contractId‖role‖filePath), so per-file contract nodes are stable. The new behavior produces more contracts in the polyglot case (source-scan supplement is now included), all of which are valid inputs to the downstream. Contract IDs remain stable — contractIdFor(method, normalizeHttpPath(path)) is unchanged. No impact-analysis false positives for ignored files because createIgnoreFilter prunes the glob before source scan runs (pre-existing, unchanged by this PR).


Final verdict: production-ready with minor follow-ups

The fix correctly addresses the polyglot HTTP contract loss in #1672. The core logic in mergeGraphAndSourceContracts is sound: graph contracts are authoritative, source-scan supplements fill gaps, the helper is called separately for providers and consumers eliminating cross-role collision risk, and graph query failures and null-executor paths degrade gracefully to source-only behavior. Test coverage proves the supplement scenario, the same-contractId conflict (implicitly via toHaveLength(1) + graph metadata assertions), and the graph-throw fallback. The two minor findings — undocumented performance trade-off and a subtle behavioral change in the null-executor path's per-file deduplication semantics — do not block merge. The branch hygiene is acceptable (merge-from-main commit with an unrelated eval/uv.lock hunk is harmless). Hidden Unicode checks are clean.

Follow-ups recommended (non-blocking):

  1. Add a comment at the mergeGraphAndSourceContracts call site documenting the intentional "always scan" trade-off (Finding 2).
  2. Add a test for mergeGraphAndSourceContracts([], [A, B_same_id]) → returns only [A] to document and pin the source-dedup-by-contractId behavior (Finding 1).

@magyargergo

Copy link
Copy Markdown
Collaborator

@azizur100389 could you please look into claude's findings? 🙏

@azizur100389

Copy link
Copy Markdown
Contributor Author

Addressed Claude review follow-ups on #1709. Changes: added an inline note documenting the intentional always-source-scan trade-off and its caching, and added a public extractor regression for source-only duplicate provider contractIds so the one-contract-per-contractId behavior is pinned without testing a private helper. Validation: focused duplicate-provider test passed; focused graph supplement tests passed; full http-route-extractor unit file passed 32/32; http-route-multi-verb passed 10/10; full test/unit/group passed 502 passed / 8 skipped; npx tsc --noEmit passed; Prettier check passed; git diff --check passed. I also ran npm test: with TEMP/TMP moved to C:\tmp it reached 8993 passed / 43 skipped with one unrelated local failure in test/integration/analyze-heap-oom-e2e.test.ts, where the OOM child exited 0 locally instead of nonzero. I did not skip or alter that test. Local gitnexus impact/detect_changes still exit 1 with no output in this checkout.

@magyargergo magyargergo merged commit aa8f4d6 into abhigyanpatwari:main May 20, 2026
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP source-scan skipped whenever the graph returns any routes; polyglot repos silently drop providers

2 participants