feat(ingestion): ScopeExtractor driver — 5-pass CaptureMatch → ParsedFile (#919, RFC #909 Ring 2 PKG)#965
Conversation
…File (#919, RFC #909 Ring 2 PKG) Kicks off Ring 2 PKG. Implements RFC §5.3 + §3.2 Phase 1: the central, source-agnostic driver that turns a language provider's `CaptureMatch[]` into a `ParsedFile` — the per-file artifact the finalize orchestrator (#921) feeds into the shared `finalize()` algorithm (#915). ## Files ### New shared contracts - `gitnexus-shared/src/scope-resolution/parsed-file.ts` Per-file extraction artifact: scopes, parsedImports, localDefs, referenceSites. Structural superset of `FinalizeFile` so the finalize orchestrator threads `ParsedFile` through unchanged. - `gitnexus-shared/src/scope-resolution/reference-site.ts` Pre-resolution usage fact: name, atRange, inScope, kind, optional callForm/explicitReceiver/arity. Converted to `Reference` records by the resolution phase (populates `ReferenceIndex`). ### Ring 1 collateral tweak - `language-provider.ts: emitScopeCaptures` now returns `Promise<readonly CaptureMatch[]>` (was `readonly Capture[]`). Pre-grouping per tree-sitter match is the provider's job — the extractor expects coherent matches, not flat captures. No consumers yet (all languages still on legacy DAG), so no breakage. Docstring updated. ### New CLI module - `gitnexus/src/core/ingestion/scope-extractor.ts` Single entry point: `extract(matches, filePath, provider): ParsedFile`. Five-pass pipeline: Pass 1 — Build scope tree. `@scope.*` → `ScopeDraft[]` via range-containment parent derivation. Honors `provider.shouldCreateScope` (skip-but-reparent-children) and `provider.resolveScopeKind`. Throws `ScopeTreeInvariantError` via `buildScopeTree` on malformed input. Pass 2 — Attach declarations + local bindings. `@declaration.*` → `SymbolDefinition` + `BindingRef { origin: 'local' }`. Default attachment: innermost containing scope. Hoisting via `provider.bindingScopeFor`. Pass 3 — Collect raw imports. `@import.*` → `ParsedImport` via `provider.interpretImport`. Attached to ParsedFile (finalize resolves owning scope in Phase 2). Pass 4 — Collect type bindings. `@type-binding.*` → `TypeRef` via `provider.interpretTypeBinding` → `scope.typeBindings`. Hoistable via `bindingScopeFor`. Pass 5 — Collect reference sites. `@reference.*` → `ReferenceSite[]`. Call form from declarative sub-tag (`@reference.call.member`) or `provider.classifyCallForm`. ### Tests - `gitnexus/test/unit/scope-resolution/scope-extractor.test.ts` 23 tests organized by pass + one end-to-end fixture exercising all 5 passes together. MockProvider emits synthetic `CaptureMatch[]` with no AST — extractor is pure given those. ## Design notes - **Source-agnostic.** No `Tree` / `SyntaxNode` types leak into the driver. Works for tree-sitter providers and COBOL's regex tagger. - **One AST walk per language.** Providers do the walk inside `emitScopeCaptures`; this driver does zero traversal. - **Invariants delegated.** `ScopeTree.buildScopeTree` enforces structural rules (non-Module has parent, parent contains child, siblings don't overlap). The extractor doesn't try to repair malformed captures. - **Sub-tag whitelist.** `@reference.receiver`, `@declaration.name`, `@import.source`, etc. are known sub-tags — excluded from anchor selection so the broadest-range heuristic doesn't mis-identify them as anchors for their topic. Bug surfaced in the end-to-end fixture test (member call with a large-range receiver) and was fixed before commit. ## Verification - `tsc --noEmit` clean (both `gitnexus-shared` and `gitnexus`) - `gitnexus-shared` build clean - 23/23 new tests pass - Full scope-resolution / model / shadow suite: **285/285 pass** ## Closes part of #909. Unblocks - #920 parse-worker integration (emit ParsedFile from the worker) - #921 finalize orchestrator (consume ParsedFile[] workspace-wide) - #922 per-language import adapters
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review. You are reviewing a PR in the GitNexus monorepo:
Your task has 2 phases, in this exact order: PHASE 1 — DEFINE THE BAR PHASE 2 — REVIEW THE PR AGAINST THAT BAR CONTEXT TO LOAD FIRST
Additional context:
PRIMARY OBJECTIVE Do not optimize for completeness at the expense of signal. REVIEW RULES
For each finding, assign one severity:
REPO-SPECIFIC REVIEW CHECKLIST 1. Correctness & functional completenessCheck:
2. Code clarity & clean codeCheck:
3. Test coverage & change safetyEvaluate against TESTING.md:
4. PerformanceInspect for:
5. Operational riskCheck:
6. MaintainabilityCheck:
NON-NEGOTIABLES TO VERIFY
OUTPUT FORMAT Production-ready bar for this PRGive a concise repo-specific definition in 5–10 bullets. FindingsFor each finding, use this structure exactly: [BLOCKING|NON-BLOCKING|NIT] Short title
Only include findings that are real and supported.
What I could not fully verifyList any behavior you could not prove from the diff, tests, or available context. Final verdictChoose exactly one:
If verdict is not ready, list the blocking items in priority order. IMPORTANT REVIEW BEHAVIOR
You must also ensure we are on track with the RFC plan! Check the issue tickets involved! |
|
Claude finished @magyargergo's task in 4m 39s —— View job PR Review: feat(ingestion): ScopeExtractor driver (#965)Tasks
Production-ready bar for this PRBased on AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, and ARCHITECTURE.md:
Findings1. Correctness & functional completenessNo issues found in the extractor's core logic. The five-pass pipeline matches RFC §5.3 exactly:
The
2. Code clarity & clean codeNON-BLOCKING —
|
| Concern | Assessment |
|---|---|
| Language-specific logic in shared ingestion | None. Every language decision is behind an optional hook. |
| Cross-phase coupling | None. extract() takes captures, returns ParsedFile — no Phase dependency. |
ParsedFile/ReferenceSite placement |
Correct: gitnexus-shared/src/scope-resolution/. |
index.ts exports |
Type-only exports; no value re-export regressions. No barrel anti-patterns. |
| Drive-by refactors | None found. The language-provider.ts change is strictly within scope (updated return type + JSDoc). |
KNOWN_SUB_TAGS extensibility |
One-line addition per new sub-tag convention; well-documented. |
| Next contributor discoverability | extract() is the clear entry point. Five-pass structure mirrors the RFC section numbering. Tests are organized by pass for easy regression localization. |
The deliberate isolation strategy (extractor is callable but not wired to the pipeline) is architecturally sound and matches the Ring 2 PKG ticket breakdown.
Non-negotiables
| Check | Result |
|---|---|
| No secrets/tokens/machine-specific paths | ✅ |
| No blind find-and-replace renames | ✅ (Capture → CaptureMatch is a targeted return-type change on an un-implemented hook) |
emitScopeCaptures change: upstream impact considered |
✅ Zero consumers confirmed |
| No HIGH/CRITICAL impact unaddressed | ✅ (no consumers of changed symbol) |
| PR title follows conventional commit | ✅ feat(ingestion): ... |
| PolyForm license untouched | ✅ |
RFC Plan alignment
Ring 2 SHARED is fully landed: #912 (ScopeTree + PositionIndex), #915 (finalize), #917 (registries), #918 (shadow). This PR (#919) is the correct next step as Ring 2 PKG kickoff. The remaining Ring 2 PKG work (#920 parse-worker, #921 finalize orchestrator, #922 import adapters, #923 shadow harness) is correctly deferred. On track.
What I could not fully verify
tsc --noEmitpass for both packages — claimed in the PR description, not locally runnable in this review context. The diff is type-clean on inspection.vitest run285/285 — claimed, not re-runnable here. Test logic appears sound.buildScopeTreeandbuildPositionIndexinvariant behavior under the exact draft snapshots used in tests — these are Ring 2 SHARED (RING2-SHARED-1: Scope / ScopeTree / PositionIndex in gitnexus-shared #912) and assumed correct per prior review of that PR.ScopeTree.getScopebehavior whenscopeTree.byIdcontains pre-mutation snapshots — correct structurally, but the interaction withclassifyCallFormimplementations is only verifiable once Ring 3 providers are written.
Final verdict
✅ Ready to merge — no blocking issues.
The extractor is correctly implemented, source-agnostic, invariant-enforcing, and well-tested for its stated scope. The two NON-BLOCKING findings are real but not merge-blockers for a deliberately-unwired driver:
ownerDefIdForstub — should be tracked in a follow-up issue or explicitly noted in RING2-PKG-2: parse-worker.ts integration — emit ParsedFile artifacts #920/RING2-PKG-3: Finalize orchestrator — invoke shared finalize algorithm; write to MutableSemanticModel #921 as a Ring 3 provider responsibility. Fix this →- Missing regression test for receiver-wider-than-anchor — the bug was fixed but the specific failure mode isn't guarded against regression. Fix this →
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 6724 tests passed 97 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
Addresses all 5 items from the PR #965 review in-PR. ## Structural changes - **Extract `ScopeExtractorHooks` as the narrow dependency surface.** The extractor now declares its dependency on a `Pick`-narrowed subset of `LanguageProvider` (just the 6 scope-resolution hooks it actually reads). Test mocks implement exactly that interface — no more `as unknown as LanguageProvider` cast hiding missing-field bugs. Adding a new hook read becomes a compile error, not a silent test pass. (Finding 3.2) - **Remove dead `ownerDefIdFor` stub + `isOwnerKind` helper.** The function always returned `undefined` with `void innermost; void drafts;` suppressors — an incomplete-implementation signal. The code path was also misleading: creating a clone of the def with `ownerId: undefined` is structurally identical to keeping the original. Pass 2 now keeps the def as-is. Contract is documented in a code comment: providers that need `ownerId` set it from their declaration hook; `finalize` (via #914 `MethodDispatchIndex`) fills in method/field `ownerId` in a post-extraction pass that has full def visibility. (Finding 2.1) - **Standardize `filePath` threading across passes 4 and 5.** Pass 4 was reading `drafts[0]!.filePath`; pass 5 was reading `anyFilePathFromScopeTree(scopeTree)`. Both equivalent but inconsistent. Both now take `filePath` as a parameter from the top-level `extract()` call. The `anyFilePathFromScopeTree` helper is removed. (Finding 2.2) ## Documentation - **Snapshot-semantics comment on `scopeTree` + `positionIndex`.** The hooks called during Passes 2-5 receive a `scopeTree` built BEFORE any bindings/ownedDefs/typeBindings were written. Hooks MUST NOT rely on `scope.bindings` etc. being populated — they're for parent/range/kind queries only. Added a doc block at the `scopeTree`/`positionIndex` construction site so future Ring 3 implementers don't write a `classifyCallForm` that reads bindings. (Finding 2.3) ## Tests - **Regression for the anchor-vs-receiver bug** (Finding 3.1): a member-call match where `@reference.receiver` spans columns 0-10 (wider) and the call name spans 11-15 (narrower). Without the `KNOWN_SUB_TAGS` exclusion, the broadest-range heuristic would have picked the receiver; the test pins that the call name is the one that ends up in `referenceSites[0].name`. - **Mock provider now types exactly `ScopeExtractorHooks`**, no more double-cast. Any future hook added to `extract()` that isn't in `ScopeExtractorHooks` is a compile error. ## Verification - `tsc --noEmit` clean in both `gitnexus-shared` and `gitnexus` - `gitnexus-shared` build clean - 24/24 scope-extractor tests pass (+1 regression) - Full scope-resolution / model / shadow suite: **286/286 pass**
…File (abhigyanpatwari#919, RFC abhigyanpatwari#909 Ring 2 PKG) (abhigyanpatwari#965) * feat(ingestion): ScopeExtractor driver — 5-pass CaptureMatch → ParsedFile (abhigyanpatwari#919, RFC abhigyanpatwari#909 Ring 2 PKG) Kicks off Ring 2 PKG. Implements RFC §5.3 + §3.2 Phase 1: the central, source-agnostic driver that turns a language provider's `CaptureMatch[]` into a `ParsedFile` — the per-file artifact the finalize orchestrator (abhigyanpatwari#921) feeds into the shared `finalize()` algorithm (abhigyanpatwari#915). ## Files ### New shared contracts - `gitnexus-shared/src/scope-resolution/parsed-file.ts` Per-file extraction artifact: scopes, parsedImports, localDefs, referenceSites. Structural superset of `FinalizeFile` so the finalize orchestrator threads `ParsedFile` through unchanged. - `gitnexus-shared/src/scope-resolution/reference-site.ts` Pre-resolution usage fact: name, atRange, inScope, kind, optional callForm/explicitReceiver/arity. Converted to `Reference` records by the resolution phase (populates `ReferenceIndex`). ### Ring 1 collateral tweak - `language-provider.ts: emitScopeCaptures` now returns `Promise<readonly CaptureMatch[]>` (was `readonly Capture[]`). Pre-grouping per tree-sitter match is the provider's job — the extractor expects coherent matches, not flat captures. No consumers yet (all languages still on legacy DAG), so no breakage. Docstring updated. ### New CLI module - `gitnexus/src/core/ingestion/scope-extractor.ts` Single entry point: `extract(matches, filePath, provider): ParsedFile`. Five-pass pipeline: Pass 1 — Build scope tree. `@scope.*` → `ScopeDraft[]` via range-containment parent derivation. Honors `provider.shouldCreateScope` (skip-but-reparent-children) and `provider.resolveScopeKind`. Throws `ScopeTreeInvariantError` via `buildScopeTree` on malformed input. Pass 2 — Attach declarations + local bindings. `@declaration.*` → `SymbolDefinition` + `BindingRef { origin: 'local' }`. Default attachment: innermost containing scope. Hoisting via `provider.bindingScopeFor`. Pass 3 — Collect raw imports. `@import.*` → `ParsedImport` via `provider.interpretImport`. Attached to ParsedFile (finalize resolves owning scope in Phase 2). Pass 4 — Collect type bindings. `@type-binding.*` → `TypeRef` via `provider.interpretTypeBinding` → `scope.typeBindings`. Hoistable via `bindingScopeFor`. Pass 5 — Collect reference sites. `@reference.*` → `ReferenceSite[]`. Call form from declarative sub-tag (`@reference.call.member`) or `provider.classifyCallForm`. ### Tests - `gitnexus/test/unit/scope-resolution/scope-extractor.test.ts` 23 tests organized by pass + one end-to-end fixture exercising all 5 passes together. MockProvider emits synthetic `CaptureMatch[]` with no AST — extractor is pure given those. ## Design notes - **Source-agnostic.** No `Tree` / `SyntaxNode` types leak into the driver. Works for tree-sitter providers and COBOL's regex tagger. - **One AST walk per language.** Providers do the walk inside `emitScopeCaptures`; this driver does zero traversal. - **Invariants delegated.** `ScopeTree.buildScopeTree` enforces structural rules (non-Module has parent, parent contains child, siblings don't overlap). The extractor doesn't try to repair malformed captures. - **Sub-tag whitelist.** `@reference.receiver`, `@declaration.name`, `@import.source`, etc. are known sub-tags — excluded from anchor selection so the broadest-range heuristic doesn't mis-identify them as anchors for their topic. Bug surfaced in the end-to-end fixture test (member call with a large-range receiver) and was fixed before commit. ## Verification - `tsc --noEmit` clean (both `gitnexus-shared` and `gitnexus`) - `gitnexus-shared` build clean - 23/23 new tests pass - Full scope-resolution / model / shadow suite: **285/285 pass** ## Closes part of abhigyanpatwari#909. Unblocks - abhigyanpatwari#920 parse-worker integration (emit ParsedFile from the worker) - abhigyanpatwari#921 finalize orchestrator (consume ParsedFile[] workspace-wide) - abhigyanpatwari#922 per-language import adapters * chore(ingestion): address abhigyanpatwari#919 review findings on the extractor Addresses all 5 items from the PR abhigyanpatwari#965 review in-PR. ## Structural changes - **Extract `ScopeExtractorHooks` as the narrow dependency surface.** The extractor now declares its dependency on a `Pick`-narrowed subset of `LanguageProvider` (just the 6 scope-resolution hooks it actually reads). Test mocks implement exactly that interface — no more `as unknown as LanguageProvider` cast hiding missing-field bugs. Adding a new hook read becomes a compile error, not a silent test pass. (Finding 3.2) - **Remove dead `ownerDefIdFor` stub + `isOwnerKind` helper.** The function always returned `undefined` with `void innermost; void drafts;` suppressors — an incomplete-implementation signal. The code path was also misleading: creating a clone of the def with `ownerId: undefined` is structurally identical to keeping the original. Pass 2 now keeps the def as-is. Contract is documented in a code comment: providers that need `ownerId` set it from their declaration hook; `finalize` (via abhigyanpatwari#914 `MethodDispatchIndex`) fills in method/field `ownerId` in a post-extraction pass that has full def visibility. (Finding 2.1) - **Standardize `filePath` threading across passes 4 and 5.** Pass 4 was reading `drafts[0]!.filePath`; pass 5 was reading `anyFilePathFromScopeTree(scopeTree)`. Both equivalent but inconsistent. Both now take `filePath` as a parameter from the top-level `extract()` call. The `anyFilePathFromScopeTree` helper is removed. (Finding 2.2) ## Documentation - **Snapshot-semantics comment on `scopeTree` + `positionIndex`.** The hooks called during Passes 2-5 receive a `scopeTree` built BEFORE any bindings/ownedDefs/typeBindings were written. Hooks MUST NOT rely on `scope.bindings` etc. being populated — they're for parent/range/kind queries only. Added a doc block at the `scopeTree`/`positionIndex` construction site so future Ring 3 implementers don't write a `classifyCallForm` that reads bindings. (Finding 2.3) ## Tests - **Regression for the anchor-vs-receiver bug** (Finding 3.1): a member-call match where `@reference.receiver` spans columns 0-10 (wider) and the call name spans 11-15 (narrower). Without the `KNOWN_SUB_TAGS` exclusion, the broadest-range heuristic would have picked the receiver; the test pins that the call name is the one that ends up in `referenceSites[0].name`. - **Mock provider now types exactly `ScopeExtractorHooks`**, no more double-cast. Any future hook added to `extract()` that isn't in `ScopeExtractorHooks` is a compile error. ## Verification - `tsc --noEmit` clean in both `gitnexus-shared` and `gitnexus` - `gitnexus-shared` build clean - 24/24 scope-extractor tests pass (+1 regression) - Full scope-resolution / model / shadow suite: **286/286 pass**
Closes #919. Ring 2 PKG kickoff — the central driver that turns a language provider's capture matches into a per-file
ParsedFileartifact. Foundation for the rest of Ring 2 PKG (#920 parse-worker integration, #921 finalize orchestrator, #922 import adapters).What's new
New shared contracts (
gitnexus-shared)Ring 1 collateral tweak
`LanguageProvider.emitScopeCaptures` now returns `Promise<readonly CaptureMatch[]>` (was `readonly Capture[]`). Pre-grouping per tree-sitter match is the provider's job — the extractor expects coherent matches, not flat captures. No consumers yet (all languages still on legacy DAG), so no breakage.
New CLI module (`gitnexus/`)
`src/core/ingestion/scope-extractor.ts` — single entry point:
```ts
extract(matches: readonly CaptureMatch[], filePath: string, provider: LanguageProvider): ParsedFile
```
The five passes (RFC §5.3)
Design principles
Tests (23, all passing)
Organized by pass so regressions localize:
Verification
What's NOT in this PR
The extractor is callable in isolation (tests prove this) but not wired into the real pipeline yet. This deliberate isolation keeps #919's scope tight and lets the downstream tickets land as small, reviewable integrations.
Part of