Skip to content

fix(group): contract extractors honour .gitnexusignore via shared IgnoreService (#1185)#1247

Merged
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
azizur100389:fix/group-extractors-ignore-service
May 1, 2026
Merged

fix(group): contract extractors honour .gitnexusignore via shared IgnoreService (#1185)#1247
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
azizur100389:fix/group-extractors-ignore-service

Conversation

@azizur100389

Copy link
Copy Markdown
Contributor

Summary

Reported in #1185 by @DungNg03051999: the HTTP, gRPC, and topic contract extractors globbed the repo with a hardcoded ignore: ['**/node_modules/**', '**/.git/**', '**/dist/**', '**/build/**', '**/vendor/**'] array, bypassing the shared IgnoreService that the rest of the ingestion pipeline uses for .gitnexusignore + .gitignore parsing. Reporter measured 8 false-positive consumer contracts from a Python venv (mentor_env/) that .gitnexusignore was supposed to exclude.

Fix

Each extractor's glob call now consumes the shared filter, mirroring the canonical pattern in filesystem-walker.ts:6,34:

+ import { createIgnoreFilter } from '../../../config/ignore-service.js';
...
+ const ignoreFilter = await createIgnoreFilter(repoPath);
  return glob(SCAN_GLOB, {
    cwd: repoPath,
-   ignore: ['**/node_modules/**', '**/.git/**', '**/dist/**', '**/build/**', '**/vendor/**'],
+   ignore: ignoreFilter,
    nodir: true,
  });

Touched call sites:

  • gitnexus/src/core/group/extractors/http-route-extractor.tsscanFiles
  • gitnexus/src/core/group/extractors/grpc-extractor.ts — both glob calls (buildProtoContext for .proto files and the source-scan inside extract)
  • gitnexus/src/core/group/extractors/topic-extractor.tsextract's glob

Backward compatibility

All 5 hardcoded entries (node_modules, .git, dist, build, vendor) are in DEFAULT_IGNORE_LIST already (verified). Default behaviour for any repo without a .gitnexusignore is byte-identical. Users now additionally get:

Topic-extractor specifics

topic-extractor.ts previously had a 6th hardcoded entry **/*_test.go (Go test convention pushed to the glob level so the orchestrator stays language-agnostic — comment preserved in spirit). The fix wraps createIgnoreFilter so that glob-level pruning of *_test.go is preserved without falling back to a post-hoc .filter():

const baseFilter = await createIgnoreFilter(repoPath);
const ignoreFilter: typeof baseFilter = {
  ignored: (p) => p.relative().endsWith('_test.go') || baseFilter.ignored(p),
  childrenIgnored: (p) => baseFilter.childrenIgnored(p),
};

Wrapper short-circuits on the _test.go check before delegating to the base filter, so glob never reads those files. Regression-locked by a new test that writes a Sarama consumer call inside *_test.go and asserts no contract is emitted.

Tests

A respects .gitnexusignore (#1185) describe block was added to each of:

  • test/unit/group/http-route-extractor.test.ts
  • test/unit/group/grpc-extractor.test.ts
  • test/unit/group/topic-extractor.test.ts

Each exercises the full extractor with real temp directories: a control file in a normal path, a vendored file under mentor_env/ (matching reporter's repro), and a .gitnexusignore excluding mentor_env/. Asserts the control survives and the excluded path produces no contracts. The topic test additionally pins *_test.go glob-level pruning.

Verification

Check Result
tsc --noEmit ✅ clean
Targeted vitest (3 extractor files) ✅ 108/108 pass
npm run test:unit (full) ✅ 4811 passed / 10 skipped

The 3 unit-suite failures (2× git-utils.test.ts plus 1× skip-git-cli.test.ts) are pre-existing environment failures on the current upstream/main — verified via git stash + re-run on a clean tree (same 3 failures, unchanged). No CI-relevant regression.

Why this is safe

Property Status
Scope 3 source files + 3 test files. No public API change, no contract drift.
Cross-platform Path normalization handled inside IgnoreService (already battle-tested in the main ingestion walker).
AGENTS.md / --help accuracy The CLI help text has long mentioned GITNEXUS_NO_GITIGNORE=1 Skip .gitignore parsing (still reads .gitnexusignore) — implying universal honouring. This PR makes that contract true for the contract-extractor surface.
Performance createIgnoreFilter is called once per glob invocation, identical cost to the previous hardcoded array (no per-file overhead).

Closes #1185.

…oreService (abhigyanpatwari#1185)

The HTTP, gRPC, and topic contract extractors each globbed the repo
with a hardcoded `ignore: ['**/node_modules/**', '**/.git/**',
'**/dist/**', '**/build/**', '**/vendor/**']` array, bypassing the
shared `IgnoreService` that the rest of the ingestion pipeline uses
for `.gitnexusignore` and `.gitignore` parsing. Result: a vendored
Python venv (`mentor_env/`), generated stubs, or any user-defined
exclusion silently produced false-positive contracts.

Replace each hardcoded array with `createIgnoreFilter(repoPath)`,
mirroring the canonical pattern in `filesystem-walker.ts`. The 5
hardcoded names are all in `DEFAULT_IGNORE_LIST`, so default
behaviour is preserved; users now also get `.gitnexusignore`
patterns, the rest of the hardcoded list (e.g. `__pycache__`,
`.pytest_cache`), and the `.gitnexusignore` negation semantics
introduced in abhigyanpatwari#771.

The topic extractor additionally filters Go `*_test.go` at the glob
level. That filter is preserved via a small wrapper around
`createIgnoreFilter` that short-circuits before delegating, so
glob-level pruning still applies and the existing `_test.go` skip
test (with new content asserting the pruning is real) still passes.

Tests added to all three `*-extractor.test.ts` files exercising
`.gitnexusignore` honouring end-to-end via real temp directories.
@vercel

vercel Bot commented May 1, 2026

Copy link
Copy Markdown

@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
7737 7736 0 1 351s

✅ All 7736 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 77.04% 22855/29666 77.02% 📈 +0.0 🟢 ███████████████░░░░░
Branches 65.95% 14688/22270 65.97% 📉 -0.0 🔴 █████████████░░░░░░░
Functions 81.99% 2249/2743 81.86% 📈 +0.1 🟢 ████████████████░░░░
Lines 79.9% 20625/25811 79.89% 📈 +0.0 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 50s —— View job


PR #1247 Review — fix(group): contract extractors honour .gitnexusignore via shared IgnoreService

Tasks:

  • Post initial comment
  • Read DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md
  • Read ignore-service.ts and filesystem-walker.ts
  • Read all 3 changed extractor source files
  • Read all 3 changed extractor test files
  • Analyze IgnoreService contract and correctness
  • Analyze HTTP extractor changes
  • Analyze gRPC extractor changes (proto + source scan paths)
  • Analyze topic extractor changes + *_test.go wrapper
  • Verify DEFAULT_IGNORE_LIST coverage
  • Assess test coverage quality
  • Write final review

Review bar for this PR

  • HTTP/gRPC/topic extractors honour .gitnexusignore, .gitignore, and default ignore rules on the real runtime path (not test-only shims).
  • gRPC honours ignores in both .proto context building and source scanning.
  • Topic extraction preserves *_test.go glob-level pruning.
  • All 5 hardcoded entries (node_modules, .git, dist, build, vendor) remain effective via DEFAULT_IGNORE_LIST.
  • Excluded files produce no provider/consumer contracts, no symbolRef.filePath paths under ignored dirs.
  • Non-ignored file extraction behaviour is unchanged.
  • Tests cover the reporter's mentor_env/ false-positive class and *_test.go regression guard.
  • CI green on all platforms.

Issue and PR context considered

#1185 symptom: HTTP, gRPC, and topic extractors used hardcoded glob ignore arrays (node_modules, .git, dist, build, vendor). These bypassed IgnoreService, causing files matching .gitnexusignore (e.g. mentor_env/ Python venv) to be scanned and emitted as false-positive contracts. Reporter measured 8 spurious consumer contracts.

Affected extractors: http-route-extractor.ts, grpc-extractor.ts, topic-extractor.ts + matching tests.

Changed files: 3 source files (+36 lines), 3 test files (+142 lines), 0 public API or contract format changes.

CI status: ✅ typecheck clean, ✅ 7,735 tests passed on 3 platforms, ✅ E2E passed, 1 unrelated Ruby skip. SHA 61520aa is the PR head and all checks are green.

Core risk areas: (1) IgnoreService used incorrectly with glob; (2) gRPC source-scan path left uncovered by tests; (3) *_test.go wrapper regression; (4) default ignore parity.


Understanding of the change

#1185 showed that group extractors were silently scanning directories that .gitnexusignore was supposed to exclude, because they maintained their own hardcoded ignore: [...] arrays separate from the shared IgnoreService used by the main ingestion walker. The fix replaces those arrays with createIgnoreFilter(repoPath) — the same call that filesystem-walker.ts:34 makes — so all three extractors now share the same ignore logic. Default behaviour is byte-identical because every old hardcoded entry (node_modules, .git, dist, build, vendor) is present in DEFAULT_IGNORE_LIST. The topic extractor adds a thin wrapper that retains *_test.go glob-level pruning (not part of DEFAULT_IGNORE_LIST) before delegating to the base filter. CI and typecheck are green.


Findings

[medium] gRPC test claims source-scan coverage it does not exercise

  • Category: Test coverage
  • Files: gitnexus/test/unit/group/grpc-extractor.test.ts:617–656
  • Issue: The test title and block comment state "Covers both the .proto path (proto-context build) and the source-scan path (Java/Go/Node/Python consumer detection)." In reality, only a .proto file is written under mentor_env/. No source file (.java, .go, .ts, .py) with a detectable gRPC consumer pattern is placed in mentor_env/ to exercise the second createIgnoreFilter call in extract() at line 414–419 of grpc-extractor.ts. The comment overclaims the test's scope.
  • Why it matters here: The gRPC extractor is the only one with two separate glob calls (proto-context + source scan), and the PR description specifically calls out both. The source-scan glob is structurally identical to the proto-scan (same createIgnoreFilter pattern), so the implementation is correct by inspection. But the stated test coverage assertion is false, and a future refactor that accidentally reverted the source-scan path would not be caught by this test.
  • Recommended fix: Add a source file under mentor_env/ with a detectable gRPC consumer pattern (e.g. a Go file with NewOrderServiceClient(...)), and assert its contract is not emitted. Minimal addition, follows the existing test structure. Fix this →
  • Blocks merge: No — the implementation is provably correct by inspection. The test just does not prove what it claims.

[low] Extractor tests cover only .gitnexusignore; .gitignore integration is untested at this level

  • Category: Test completeness
  • Files: All three new test describe blocks.
  • Issue: Every new test writes .gitnexusignore and relies on it. No test verifies that a pattern in .gitignore alone is also honoured. createIgnoreFilter reads both files and the integration point is correctly wired, so this is a theoretical gap rather than a bug. IgnoreService itself has existing tests for .gitignore. But the extractors have no direct test of the .gitignore code path.
  • Why it matters here: The PR description claims "users now additionally get their own .gitignore patterns honoured" — a claim that is not directly exercised by any new test.
  • Recommended fix: Add one minimal test per extractor (or one shared one) that writes a .gitignore excluding mentor_env/ with no .gitnexusignore, and asserts the excluded path produces no contract. Alternatively, the PR description can explicitly note reliance on IgnoreService-level coverage.
  • Blocks merge: No.

[low] Negation semantics (!pattern) not covered at extractor level

  • Category: Test completeness
  • Files: All three new test describe blocks.
  • Issue: The .gitnexusignore negation semantics (!mentor_env/keep/) introduced in Feature: Allow indexing test directories (__tests__, __mocks__) #771 are not exercised in any extractor test. This is acceptable because IgnoreService's own tests cover negation, and the extractors use the filter object directly without touching negation logic. But the gap exists.
  • Why it matters here: Negation is an important user feature and the PR description cites it. If the filter object were ever wrapped or proxied in a way that broke negation, there would be no extractor-level regression catch.
  • Recommended fix: Acceptable to leave as follow-up referencing existing IgnoreService negation tests.
  • Blocks merge: No.

Definition of Done check

DoD item Status
HTTP route extraction honours .gitnexusignore, .gitignore, and default ignore rules ✅ satisfied — createIgnoreFilter wired on real path; test proves .gitnexusignore case
gRPC extraction honours ignores in both .proto context building and source scanning ✅ satisfied (implementation) / ⚠️ partially satisfied (tests — proto path proven, source-scan only by inspection)
Topic extraction honours ignores while preserving *_test.go glob-level pruning ✅ satisfied — wrapper is correct; two tests prove both behaviours
Existing default ignores remain effective for node_modules, .git, dist, build, vendor ✅ satisfied — all 5 names confirmed in DEFAULT_IGNORE_LIST
.gitnexusignore negation semantics consistent with ingestion walker ✅ satisfied at implementation level (same filter object); not tested at extractor level
Excluded files produce no provider/consumer contracts and no symbolRef paths under ignored dirs ✅ satisfied — all three tests assert symbolRef.filePath check
Existing HTTP/gRPC/topic contract detection behaviour unchanged for non-ignored files ✅ satisfied — all pre-existing tests pass (7,735 total)
Tests cover reporter-style mentor_env/ false-positive case ✅ satisfied — all three tests include mentor_env/ scenario
CI, typecheck, focused tests, and full relevant tests are green ✅ satisfied

Ignore semantics assessment

.gitnexusignore: Correctly wired through createIgnoreFilter in all three extractors. Tested in all three new test blocks with the reporter's exact directory name (mentor_env/).

.gitignore: Correctly included via loadIgnoreRules (both .gitignore and .gitnexusignore are read unless GITNEXUS_NO_GITIGNORE=1). Not tested at the extractor level, but the integration point is correct.

GITNEXUS_NO_GITIGNORE: Flows through createIgnoreFilterloadIgnoreRules unchanged. Extractors automatically respect this env var with no additional code; no extractor bypasses it.

Default ignores: node_modules, .git, dist, build, vendor are all present in DEFAULT_IGNORE_LIST (lines 24, 7, 44, 45, 26 of ignore-service.ts). Default behaviour is byte-identical for repos without .gitnexusignore. Additionally, the richer DEFAULT_IGNORE_LIST entries (__pycache__, venv, .venv, etc.) are now also effective.

Negation semantics: createIgnoreFilter implements hasExplicitUnignore with ancestor-walk and last-match-wins semantics. Extractor tests do not cover this, but the shared filter object is used correctly.

Windows/path normalization: p.relative() from path-scurry returns POSIX paths on all platforms (documented in ignore-service.ts:435). Topic wrapper's endsWith('_test.go') is safe. makeContract in gRPC normalizes backslashes via .replace(/\\/g, '/'). CI on 3 platforms confirms this.


Extractor assessment

HTTP route extractor: scanFiles(repoPath) correctly calls createIgnoreFilter(repoPath) once per extract() invocation (lazy evaluation via getScannedFiles() caches the result). nodir: true, cwd: repoPath, HTTP_SCAN_GLOB unchanged. Detection logic untouched. ✅

gRPC proto context: buildProtoContext at line 235 calls createIgnoreFilter(repoPath) before the .proto glob. Ignored .proto files never enter the service map. Provider contracts from ignored protos cannot be emitted. ✅

gRPC source scan: extract() at line 414 calls createIgnoreFilter(repoPath) before the source-scan glob. Creates a separate filter instance (slight duplication of I/O vs. proto-context call, but createIgnoreFilter reads at most 2 files at root — acceptable). Ignored source files are not parsed. ✅ implementation / ⚠️ not test-proven

Topic extractor: createIgnoreFilter(repoPath) called once; result wrapped with *_test.go short-circuit before delegating to base filter for ignored; childrenIgnored delegates cleanly. One parser reused across files. ✅

Topic *_test.go pruning: Wrapper is at line 68–71. p.relative().endsWith('_test.go') fires before baseFilter.ignored(p), so test files are never read. Second test (still prunes _test.go even when .gitnexusignore is empty) writes a real Sarama consumer call and asserts zero contracts — this is a strong regression guard. ✅

Unchanged contract detection for non-ignored files: Pre-existing tests (Spring, Express, Gin, NestJS, gRPC, Kafka, RabbitMQ, NATS) all pass, proving the change is transparent for normal repos. ✅


Test assessment

Tests added: 3 describe('respects .gitnexusignore (#1185)') blocks, 1–2 it cases each. Total ~142 lines.

HTTP ignore tests: Control at src/routes/users.ts, ignored under mentor_env/lib/leaked.ts using same Express pattern. Asserts control emitted, ignored not emitted, and no symbolRef.filePath starts with mentor_env/. Calls extract(null, ...) to force source-scan path. ✅

gRPC proto coverage: Control at proto/auth.proto, ignored at mentor_env/lib/leaked.proto. Both assert contract presence/absence and symbolRef path. ✅ for proto-context path.

gRPC source-scan coverage: No source file in mentor_env/ with a detectable gRPC consumer pattern. The test name and comment overclaim. ⚠️ gap

Topic mentor_env test: Control at src/EventHandler.java, ignored at mentor_env/lib/LeakedHandler.java with @KafkaListener. Asserts ids contain user.created, not leaked.event, and no mentor_env/ in symbolRef. ✅

Topic _test.go test: Writes real Sarama ConsumePartition("real-topic-from-test", ...) in src/orders_test.go, asserts zero contracts. This is a strong, meaningful test. ✅

Missing cases: .gitignore-only test per extractor; gRPC ignored source file; negation (!pattern) at extractor level.


Validation assessment

CI status: ✅ All checks green at HEAD SHA 61520aa. Typecheck (tsc --noEmit), unit tests (3 platforms), E2E passed. 7,735 passed / 1 skipped (unrelated Ruby). The 3 pre-existing local failures cited in the PR body (git-utils, skip-git-cli) are confirmed pre-existing and do not appear in CI.

Targeted tests: The 3 new extractor test files are part of npm run test:unit which passed. The PR reports 108/108 pass on targeted extractor suites.

Platform coverage: macOS, Windows, Linux — all passing.

Unverified: .gitignore-only code path not tested in extractors. Negation semantics not tested in extractors. Neither blocks merge given IgnoreService coverage.


Final verdict

production-ready with minor follow-ups

All three extractors are correctly wired to createIgnoreFilter on the real runtime path. DEFAULT_IGNORE_LIST covers every previously hardcoded entry. The topic *_test.go wrapper is correct and strongly regression-locked. CI is green on 3 platforms with 7,735 tests passing. The only meaningful gap is that the gRPC test claims to cover both proto-context and source-scan ignore paths but only exercises the proto-context path — the source-scan implementation is correct by inspection (identical pattern to the proto-scan), but the test overclaims its coverage. The .gitignore and negation paths are not tested at the extractor level, which is acceptable given IgnoreService-level coverage. These are follow-up items, not blockers.

@magyargergo

Copy link
Copy Markdown
Collaborator

@azizur100389 can you please look at these findings? 🙏

…overage (abhigyanpatwari#1185)

Addresses two findings from the @claude review on PR abhigyanpatwari#1247:

[medium] The gRPC ignore test claimed to cover both proto-context and
source-scan paths but only wrote a .proto file under mentor_env/.
Added a Python `_pb2_grpc.<Name>Stub(channel)` consumer file under the
same ignored dir (mirroring the canonical pattern from
`test_extract_python_stub_returns_consumer`); without the
`.gitnexusignore` filter that file would emit a consumer contract.
The test now exercises both `createIgnoreFilter` calls inside the gRPC
extractor (`buildProtoContext` + `extract`) in a single run, with both
defence-in-depth path-prefix assertions and a specific
`role: consumer` LeakedService assertion.

[low] Added one shared .gitignore-only test on the HTTP extractor.
`createIgnoreFilter` reads both `.gitignore` and `.gitnexusignore` via
`loadIgnoreRules`, but no extractor-level test exercised the
`.gitignore` path. One shared test is sufficient because all three
extractors consume the same filter object — verified at
`IgnoreService` level already.

The remaining [low] finding — "negation semantics (!pattern) not
tested at extractor level" — is deferred deliberately, not skipped.
Three reasons:

  1. The negation logic (introduced in abhigyanpatwari#771) lives entirely inside
     `createIgnoreFilter`'s `hasExplicitUnignore` ancestor-walk in
     `ignore-service.ts`. The extractors only consume the returned
     filter object — they never inspect patterns, never call
     `hasExplicitUnignore` directly, and have no code path that could
     diverge from the IgnoreService's negation behaviour.

  2. Negation is already locked in by 8 dedicated unit tests in
     `test/unit/ignore-service.test.ts` (the abhigyanpatwari#771 suite), plus the
     `!parent/` + `parent/child/` last-match-wins regression test
     added in PR abhigyanpatwari#1046. An extractor-level negation test would
     re-prove the same code path and would not catch any failure mode
     the existing tests don't already catch.

  3. The bot itself flagged the gap as "Acceptable to leave as
     follow-up referencing existing IgnoreService negation tests" —
     the deferral matches its own recommendation.

If a future change inserts an extractor-side wrapper around the filter
(as topic-extractor.ts already does for `*_test.go`) that could
plausibly affect negation, an extractor-level negation test should be
added at that point — not pre-emptively here.
@azizur100389

Copy link
Copy Markdown
Contributor Author

Addressed in 8e2752a1. Thanks @magyargergo for summoning the review — the medium finding was a real test-coverage gap on my end.

[medium] gRPC source-scan path now actually exercised. The original test wrote only a .proto under mentor_env/ so extract()'s second createIgnoreFilter call (the source-scan glob at line 414) was unproven. Added a Python _pb2_grpc.<Name>Stub(channel) consumer file under mentor_env/lib/leaked_consumer.py, mirroring the canonical pattern from test_extract_python_stub_returns_consumer above. Without the .gitnexusignore filter that file would emit a grpc::*/LeakedService consumer contract; with it, the test now asserts:

  • the existing path-prefix defence-in-depth (no symbolRef.filePath under mentor_env/),
  • the proto-context-specific assertion (grpc::leaked.LeakedService/Ping undefined),
  • and a new consumer-specific assertion (no role: consumer contract whose contractId matches LeakedService).

Both globs are now exercised in the same run, and the test name + comment are updated to accurately reflect what's covered.

[low] .gitignore-only path now covered at extractor level. Added one shared test on the HTTP extractor — same Express pattern as the existing test, but writes .gitignore instead of .gitnexusignore. One test is sufficient (per the bot's "or one shared one" suggestion) because all three extractors consume the same filter object; the .gitignore parsing happens inside loadIgnoreRules which is shared between them.

[low] negation deferred deliberately. The commit body spells out the rationale; short version:

  1. Negation logic (introduced in Feature: Allow indexing test directories (__tests__, __mocks__) #771) lives entirely in createIgnoreFilter's hasExplicitUnignore ancestor walk. Extractors consume the filter object opaquely — there is no extractor-side surface that could diverge from IgnoreService's negation behaviour.
  2. Negation is already locked in by 8 dedicated tests in test/unit/ignore-service.test.ts plus the !parent/ + parent/child/ last-match-wins regression test from PR feat(ingestion): GITNEXUS_INDEX_TEST_DIRS opt-in for __tests__ / __mocks__ (#771) #1046.
  3. The bot itself flagged this as "Acceptable to leave as follow-up referencing existing IgnoreService negation tests" — the deferral matches its recommendation.

If a future change introduces an extractor-side wrapper around the filter (as topic-extractor.ts already does for *_test.go) that could affect negation, an extractor-level negation test should be added at that point — not pre-emptively here.

Local verification on 8e2752a1:

CI re-running.

@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 54s —— View job


PR #1247 Review — fix(group): contract extractors honour .gitnexusignore via shared IgnoreService

Tasks:

  • Post initial comment
  • Read IgnoreService and filesystem-walker
  • Read all 3 changed extractor source files
  • Read all 3 changed extractor test files
  • Inspect DEFAULT_IGNORE_LIST and GITNEXUS_NO_GITIGNORE
  • Analyze IgnoreService contract and glob wiring
  • Analyze HTTP extractor changes
  • Analyze gRPC extractor changes (proto + source scan)
  • Analyze topic extractor + *_test.go wrapper
  • Verify default ignore parity
  • Assess tests — gRPC source-scan, .gitignore, _test.go content
  • Inspect CI and validation status
  • Write final review

Review bar for this PR

  • HTTP route extraction honours .gitnexusignore, .gitignore, and default ignore rules via createIgnoreFilter — not post-filtered.
  • gRPC extraction honours ignores in both the .proto context glob (buildProtoContext) and the source-scan glob (extract).
  • Topic extraction honours ignores while preserving *_test.go glob-level pruning (never read, not post-filtered).
  • All 5 hardcoded entries (node_modules, .git, dist, build, vendor) remain effective via DEFAULT_IGNORE_LIST.
  • Excluded files produce no provider or consumer contracts; no symbolRef.filePath paths under ignored dirs.
  • Existing HTTP/gRPC/topic contract detection is unchanged for non-ignored files.
  • CI, typecheck, and full unit tests on 3 platforms are green at latest PR head SHA.

Issue and PR context considered

#1185 symptom: HTTP, gRPC, and topic extractors maintained their own hardcoded ignore: ['node_modules', '.git', 'dist', 'build', 'vendor'] glob arrays, bypassing IgnoreService. Files in .gitnexusignore-excluded dirs (e.g. mentor_env/ Python venv) were scanned and emitted 8 false-positive consumer contracts.

Affected extractors: http-route-extractor.ts, grpc-extractor.ts (2 glob call sites), topic-extractor.ts + their test files.

Changed files: 3 source files (+22 lines net), 3 test files (+205 lines). No public API or contract-format changes.

CI status: ✅ All checks green at HEAD SHA 8e2752a. 7,736 tests passed on 3 platforms, typecheck clean, E2E passed. 1 pre-existing Ruby skip unrelated to this change.

Core risk areas verified: (1) IgnoreService glob wiring — confirmed correct; (2) gRPC both glob paths — both tested with real detectable patterns; (3) *_test.go wrapper regression — strongly locked; (4) default ignore parity — all 5 entries confirmed in DEFAULT_IGNORE_LIST.


Understanding of the change

#1185 showed that group extractors silently scanned directories that .gitnexusignore was supposed to exclude, because each maintained its own hardcoded ignore: [...] array disconnected from the shared IgnoreService used by the ingestion walker. The fix replaces those arrays with createIgnoreFilter(repoPath) — the same call filesystem-walker.ts:34 makes — so all three extractors now share the same ignore semantics: .gitnexusignore, .gitignore, DEFAULT_IGNORE_LIST, negation rules, and child pruning. Default behaviour is byte-identical because every previously hardcoded entry (node_modules, .git, dist, build, vendor) is present in DEFAULT_IGNORE_LIST (lines 23, 8, 45, 46, 26). The topic extractor adds a thin ignored wrapper that retains *_test.go glob-level pruning before delegating to the base filter — preserving behaviour that was not in DEFAULT_IGNORE_LIST. Both gRPC glob paths call createIgnoreFilter independently (minor I/O duplication, reads at most 2 root files — negligible). CI green on 3 platforms.


Findings

No production-readiness issues found against the current DoD bar.


Definition of Done check

DoD item Status
HTTP route extraction honours .gitnexusignore, .gitignore, and default ignore rules ✅ satisfied — createIgnoreFilter wired; both .gitnexusignore and .gitignore-only paths tested
gRPC extraction honours ignores in both .proto context building and source scanning ✅ satisfied — both globs call createIgnoreFilter; test writes leaked.proto + leaked_consumer.py under mentor_env/ and asserts neither emits a contract
Topic extraction honours ignores while preserving *_test.go glob-level pruning ✅ satisfied — wrapper short-circuits on endsWith('_test.go') before base filter; regression-locked by Sarama ConsumePartition call inside _test.go with zero-contract assertion
Existing default ignores remain effective for node_modules, .git, dist, build, vendor ✅ satisfied — all 5 confirmed in DEFAULT_IGNORE_LIST at lines 23, 8, 45, 46, 26
.gitnexusignore negation semantics consistent with ingestion walker ✅ satisfied at implementation level — same filter object from createIgnoreFilter; deferred extractor-level test acceptable per IgnoreService-level coverage
Excluded files produce no provider/consumer contracts; no symbolRef paths under ignored dirs ✅ satisfied — all three tests assert symbolRef.filePath prefix check
Existing HTTP/gRPC/topic contract detection unchanged for non-ignored files ✅ satisfied — all pre-existing tests pass (7,736 total)
CI, typecheck, focused tests, and full relevant tests are green ✅ satisfied — SHA 8e2752a, all checks green

Ignore semantics assessment

.gitnexusignore: Correctly wired through createIgnoreFilter in all three extractors. Tested in all three new describe blocks with the reporter's mentor_env/ directory name.

.gitignore: Correctly included via loadIgnoreRules (reads both .gitignore and .gitnexusignore unless GITNEXUS_NO_GITIGNORE=1). Now tested at the extractor level: http-route-extractor.test.ts includes a dedicated case that writes only .gitignore (no .gitnexusignore) and asserts the excluded path produces no contract (8e2752a). One test is sufficient because all three extractors consume the same createIgnoreFilter object — .gitignore parsing is not extractor-specific.

GITNEXUS_NO_GITIGNORE: Flows through createIgnoreFilterloadIgnoreRulesskipGitignore at ignore-service.ts:357. No extractor bypasses this. No extractor-specific env test is needed; IgnoreService owns the setting.

Default ignores: node_modules (line 23), .git (line 8), dist (line 45), build (line 46), vendor (line 26) — all confirmed in DEFAULT_IGNORE_LIST. Topic's **/*_test.go was never in DEFAULT_IGNORE_LIST; it is preserved by the topic-specific wrapper. Default behaviour is byte-identical for repos without .gitnexusignore.

Negation semantics: hasExplicitUnignore performs an ancestor-walk with last-match-wins via ig.ignores(rel) guard. Extractors consume the filter object opaquely — there is no extractor-side logic that could diverge from IgnoreService. Deferred extractor-level test is acceptable given 8 dedicated negation tests in ignore-service.test.ts plus the #1046 regression test.

Windows/path normalization: p.relative() from path-scurry returns POSIX paths on all platforms (documented in ignore-service.ts:435). Topic wrapper's endsWith('_test.go') is safe. makeContract in gRPC normalizes backslashes at filePath.replace(/\\/g, '/'). CI on 3 platforms (macOS, Windows, Linux) confirms this.


Extractor assessment

HTTP route extractor: scanFiles(repoPath) at line 211 calls createIgnoreFilter(repoPath) once, passes the filter directly to glob. nodir: true, cwd: repoPath, HTTP_SCAN_GLOB all unchanged. getScannedFiles() lazy-caches the result — createIgnoreFilter is called at most once per extract() invocation even when both provider and consumer fallback paths trigger. Detection logic entirely untouched. ✅

gRPC proto context: buildProtoContext(repoPath) at line 235 calls createIgnoreFilter(repoPath). Ignored .proto files never enter the service map, cannot create provider contracts, and cannot contribute package/service resolution context to the source scan. ✅

gRPC source scan: extract() at line 414 calls createIgnoreFilter(repoPath) independently. Creates a second filter instance (minor duplication of the 2-file root read from buildProtoContext). The duplication is acceptable — createIgnoreFilter reads at most 2 small root files. The test verifies this path: leaked_consumer.py contains leaked_pb2_grpc.LeakedServiceStub(channel), which matches STUB_RE = /^(\w+)Stub$/ extracting LeakedService as the service name — without the ignore filter this would emit a grpc::*/LeakedService consumer contract. With the filter it does not. ✅

Topic extractor: createIgnoreFilter(repoPath) called once at line 67; result wrapped with _test.go short-circuit at line 69. p.relative().endsWith('_test.go') fires before baseFilter.ignored(p) — Go test files are never read. childrenIgnored delegates cleanly to base filter (no wrapping needed for directory descent). ✅

Topic *_test.go pruning: The wrapper at line 68–71 is a precise, minimal addition. p.relative() returns POSIX paths so .endsWith('_test.go') is correct for Go convention (lowercase). Previous behaviour pruned exactly **/*_test.go; wrapper behaviour is semantically identical — any file whose repo-relative path ends in _test.go is treated as ignored at glob level. The _test.go regression test writes a real Sarama ConsumePartition("real-topic-from-test", ...) call and asserts zero contracts — this is a meaningful, not inert, regression guard. ✅

Unchanged contract detection for non-ignored files: All pre-existing tests pass (7,736 total across 3 platforms). Spring @GetMapping, Express routes, NestJS, gRPC Java/Go/Node/Python, Kafka, RabbitMQ, NATS patterns all intact. ✅


Test assessment

Tests added: 3 describe('respects .gitnexusignore (#1185)') blocks + topic *_test.go regression test. Total ~205 new lines across two commits.

HTTP .gitnexusignore case: Control at src/routes/users.ts (Express GET /api/users), ignored at mentor_env/lib/leaked.ts (Express GET /api/leaked). Asserts control emitted, ignored absent, no symbolRef.filePath under mentor_env/. Calls extract(null, ...) forcing source-scan. ✅

HTTP .gitignore-only case: Identical Express pattern, writes only .gitignore (no .gitnexusignore). Asserts same three conditions. Proves .gitignore parsing is wired to extractor globs. ✅ (added in 8e2752a)

gRPC proto + source-scan: Control at proto/auth.proto (service AuthService). Ignored at mentor_env/lib/leaked.proto (service LeakedService) and mentor_env/lib/leaked_consumer.py (leaked_pb2_grpc.LeakedServiceStub(channel)). Single test run exercises both glob paths. Asserts: control contract defined, no symbolRef under mentor_env/, grpc::leaked.LeakedService/Ping undefined, no role: consumer contract matching /LeakedService/. The Python consumer WOULD emit without the filter (confirmed by reading STUB_RE). ✅

Topic .gitnexusignore case: Control at src/EventHandler.java (@KafkaListener(topics = "user.created")), ignored at mentor_env/lib/LeakedHandler.java (@KafkaListener(topics = "leaked.event")). Asserts user.created present, leaked.event absent, no symbolRef under mentor_env/. ✅

Topic *_test.go regression test: Real Sarama ConsumePartition("real-topic-from-test", ...) inside src/orders_test.go. Asserts zero contracts. This is a strong regression guard — the file would produce a detectable contract if the wrapper were removed. ✅

Missing cases: Negation (!pattern) not tested at extractor level — deferred with sound rationale (filter object used opaquely, 8 IgnoreService-level negation tests exist). gRPC and topic .gitignore-only paths not tested — acceptable given HTTP covers the shared integration point and all three extractors consume identical createIgnoreFilter output.


Validation assessment

CI: ✅ All checks green at HEAD SHA 8e2752a (latest commit on branch). Typecheck (tsc --noEmit), unit tests on macOS/Windows/Linux, E2E all passed.

Test counts: 7,736 passed / 1 skipped (pre-existing Ruby skip, unrelated). PR body reports 3 pre-existing local failures (git-utils.test.ts × 2, skip-git-cli.test.ts × 1) — these are confirmed absent from CI (environment-specific tmpdir issues, already present on clean upstream/main).

Targeted extractor tests: Part of npm run test:unit which passed. PR reports 109/109 on the 3 focused extractor files (up from 108 with the added .gitignore-only test).

Platform coverage: macOS, Windows, Linux — all passing. Backslash normalization in makeContract and p.relative() POSIX guarantee cover Windows.

Not verified by tests: gRPC .gitignore-only path, negation at extractor level — both explicitly deferred with sound rationale. Neither blocks merge.


Final verdict

production-ready

All three extractor paths are correctly wired to createIgnoreFilter on the real runtime path. The previously-flagged medium gap (gRPC source-scan path only provable by inspection) has been closed: commit 8e2752a adds a Python LeakedServiceStub(channel) file under mentor_env/ that the STUB_RE pattern would match, making the test a true regression guard for both gRPC glob paths. The .gitignore-only extractor-level smoke test added to the HTTP suite proves the end-to-end integration point shared by all three extractors. The topic *_test.go wrapper is correct and regression-locked by a real Sarama consumer call. DEFAULT_IGNORE_LIST covers all 5 previously hardcoded entries; default behaviour is byte-identical. CI is green on all 3 platforms with 7,736 tests passing. There are no remaining blocking or meaningful non-blocking issues — this PR is ready to merge.

@magyargergo magyargergo merged commit 4be4abe into abhigyanpatwari:main May 1, 2026
18 of 19 checks passed
@magyargergo

Copy link
Copy Markdown
Collaborator

Thank you for your contribution!

evander-wang pushed a commit to evander-wang/GitNexus that referenced this pull request May 3, 2026
…oreService (abhigyanpatwari#1185) (abhigyanpatwari#1247)

* fix(group): contract extractors honour .gitnexusignore via shared IgnoreService (abhigyanpatwari#1185)

The HTTP, gRPC, and topic contract extractors each globbed the repo
with a hardcoded `ignore: ['**/node_modules/**', '**/.git/**',
'**/dist/**', '**/build/**', '**/vendor/**']` array, bypassing the
shared `IgnoreService` that the rest of the ingestion pipeline uses
for `.gitnexusignore` and `.gitignore` parsing. Result: a vendored
Python venv (`mentor_env/`), generated stubs, or any user-defined
exclusion silently produced false-positive contracts.

Replace each hardcoded array with `createIgnoreFilter(repoPath)`,
mirroring the canonical pattern in `filesystem-walker.ts`. The 5
hardcoded names are all in `DEFAULT_IGNORE_LIST`, so default
behaviour is preserved; users now also get `.gitnexusignore`
patterns, the rest of the hardcoded list (e.g. `__pycache__`,
`.pytest_cache`), and the `.gitnexusignore` negation semantics
introduced in abhigyanpatwari#771.

The topic extractor additionally filters Go `*_test.go` at the glob
level. That filter is preserved via a small wrapper around
`createIgnoreFilter` that short-circuits before delegating, so
glob-level pruning still applies and the existing `_test.go` skip
test (with new content asserting the pruning is real) still passes.

Tests added to all three `*-extractor.test.ts` files exercising
`.gitnexusignore` honouring end-to-end via real temp directories.

* test(group): exercise gRPC source-scan ignore + add .gitignore-only coverage (abhigyanpatwari#1185)

Addresses two findings from the @claude review on PR abhigyanpatwari#1247:

[medium] The gRPC ignore test claimed to cover both proto-context and
source-scan paths but only wrote a .proto file under mentor_env/.
Added a Python `_pb2_grpc.<Name>Stub(channel)` consumer file under the
same ignored dir (mirroring the canonical pattern from
`test_extract_python_stub_returns_consumer`); without the
`.gitnexusignore` filter that file would emit a consumer contract.
The test now exercises both `createIgnoreFilter` calls inside the gRPC
extractor (`buildProtoContext` + `extract`) in a single run, with both
defence-in-depth path-prefix assertions and a specific
`role: consumer` LeakedService assertion.

[low] Added one shared .gitignore-only test on the HTTP extractor.
`createIgnoreFilter` reads both `.gitignore` and `.gitnexusignore` via
`loadIgnoreRules`, but no extractor-level test exercised the
`.gitignore` path. One shared test is sufficient because all three
extractors consume the same filter object — verified at
`IgnoreService` level already.

The remaining [low] finding — "negation semantics (!pattern) not
tested at extractor level" — is deferred deliberately, not skipped.
Three reasons:

  1. The negation logic (introduced in abhigyanpatwari#771) lives entirely inside
     `createIgnoreFilter`'s `hasExplicitUnignore` ancestor-walk in
     `ignore-service.ts`. The extractors only consume the returned
     filter object — they never inspect patterns, never call
     `hasExplicitUnignore` directly, and have no code path that could
     diverge from the IgnoreService's negation behaviour.

  2. Negation is already locked in by 8 dedicated unit tests in
     `test/unit/ignore-service.test.ts` (the abhigyanpatwari#771 suite), plus the
     `!parent/` + `parent/child/` last-match-wins regression test
     added in PR abhigyanpatwari#1046. An extractor-level negation test would
     re-prove the same code path and would not catch any failure mode
     the existing tests don't already catch.

  3. The bot itself flagged the gap as "Acceptable to leave as
     follow-up referencing existing IgnoreService negation tests" —
     the deferral matches its own recommendation.

If a future change inserts an extractor-side wrapper around the filter
(as topic-extractor.ts already does for `*_test.go`) that could
plausibly affect negation, an extractor-level negation test should be
added at that point — not pre-emptively here.
@magyargergo magyargergo mentioned this pull request May 10, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug][group] Contract extractors bypass IgnoreService — .gitnexusignore not honored during extraction

2 participants