feat(ingestion): wire ScopeExtractor into parse-worker + processor (#920, RFC #909 Ring 2 PKG) #969

Merged
magyargergo merged 1 commit into main from rfc/scope-resolution/920-parse-worker-integration
Apr 18, 2026

@magyargergo

Collaborator

Closes #920. Plumbs the ScopeExtractor (#919) into the real parsing pipeline. `ParsedFile` artifacts now flow from workers to the parsing-processor without changing any legacy-DAG behavior.

What ships

`scope-extractor-bridge.ts` (new)

extractParsedFile(provider, sourceText, filePath, onWarn?): ParsedFile | undefined
  • Short-circuits with `undefined` when the provider has not implemented `emitScopeCaptures` — every language today
  • Invokes the hook + `ScopeExtractor.extract`, returns a `ParsedFile`
  • Swallows exceptions on both sides. Failures route through `onWarn` (or `console.warn`) and return `undefined` — scope-extraction errors never break legacy parsing on the same file
  • Standalone module (not nested in `parse-worker.ts`) so tests can import it directly without triggering the worker's top-level `parentPort!.on(...)`
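
The bridge contract described above (short-circuit, then hook plus extractor, with every failure swallowed) can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `CaptureMatch`, `ParsedFile`, and `LanguageProvider` shapes are simplified assumptions, the hook's argument list is assumed, and `extractScopes` is a hypothetical stand-in for `ScopeExtractor.extract`.

```typescript
// Hypothetical, simplified shapes; the real GitNexus types are richer.
interface CaptureMatch {
  name: string;
}
interface ParsedFile {
  filePath: string;
  captures: readonly CaptureMatch[];
}
interface LanguageProvider {
  // Optional hook; providers that have not migrated leave it undefined.
  emitScopeCaptures?: (sourceText: string, filePath: string) => readonly CaptureMatch[];
}
type ScopeBridgeWarn = (message: string) => void;

// Stand-in for ScopeExtractor.extract (assumed signature); throws on bad input.
function extractScopes(captures: readonly CaptureMatch[], filePath: string): ParsedFile {
  if (captures.length === 0) {
    throw new Error("no Module scope");
  }
  return { filePath, captures };
}

function extractParsedFile(
  provider: LanguageProvider,
  sourceText: string,
  filePath: string,
  onWarn?: ScopeBridgeWarn,
): ParsedFile | undefined {
  // Default no-op path: the hook is not implemented.
  if (provider.emitScopeCaptures === undefined) {
    return undefined;
  }
  try {
    const captures = provider.emitScopeCaptures(sourceText, filePath);
    return extractScopes(captures, filePath);
  } catch (err) {
    // Swallow failures from the hook and from the extractor alike.
    const detail = err instanceof Error ? err.message : String(err);
    const warn: ScopeBridgeWarn = onWarn ?? ((m) => console.warn(m));
    warn(`scope extraction failed for ${filePath}: ${detail}`);
    return undefined;
  }
}
```

Because the function never throws, a caller can run it ahead of legacy extraction without wrapping it in its own `try`/`catch`.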

`parse-worker.ts`

  • `ParseWorkerResult.parsedFiles: ParsedFile[]` added
  • `processFileGroup` invokes `extractParsedFile` AFTER tree parse, BEFORE legacy extraction. Worker threads an `onWarn` callback that routes bridge warnings through `parentPort.postMessage({ type: 'warning', message })`
  • `mergeResult` includes `parsedFiles` in sub-batch merges
  • Initial + reset accumulator templates include `parsedFiles: []`
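
As an illustration of the worker-side changes listed above, the sketch below shows a sub-batch merge that carries `parsedFiles` and an `onWarn` callback that forwards warnings as `{ type: 'warning', message }` messages. The `calls` field, the `WarningPort` interface, `emptyResult`, and `makeOnWarn` are hypothetical names used only for this example; the real worker posts through `parentPort` and has many more result fields.

```typescript
// Simplified, assumed shapes for illustration only.
interface ParsedFile {
  filePath: string;
}
interface ParseWorkerResult {
  calls: string[]; // stand-in for the existing legacy fields
  parsedFiles: ParsedFile[];
}
// Minimal view of a message port (parentPort in the real worker).
interface WarningPort {
  postMessage(message: { type: "warning"; message: string }): void;
}

// Accumulator template: every template must initialize parsedFiles.
const emptyResult = (): ParseWorkerResult => ({ calls: [], parsedFiles: [] });

function appendAll<T>(target: T[], source: readonly T[]): void {
  for (const item of source) {
    target.push(item);
  }
}

// Sub-batch merge: parsedFiles travels alongside the legacy fields.
function mergeResult(target: ParseWorkerResult, source: ParseWorkerResult): void {
  appendAll(target.calls, source.calls);
  appendAll(target.parsedFiles, source.parsedFiles);
}

// Routes bridge warnings to the main thread instead of console.warn.
function makeOnWarn(port: WarningPort): (message: string) => void {
  return (message) => port.postMessage({ type: "warning", message });
}
```

If a template or the merge omitted `parsedFiles`, artifacts from earlier sub-batches would be dropped silently, which is why each accumulator template starts with an empty array.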

`parsing-processor.ts`

  • `WorkerExtractedData.parsedFiles: ParsedFile[]` added
  • Empty-result branch and the across-chunk aggregation both include `parsedFiles`. Aggregation is tolerant of workers that don't emit the field (older builds / partial rollouts)
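
The tolerant aggregation mentioned above might look roughly like this. It is a sketch under assumptions: `ChunkResult`, `aggregateChunks`, and the `calls` field are hypothetical names, and only the guard on a possibly missing `parsedFiles` reflects the behavior described in this PR.

```typescript
// Simplified, assumed shapes for illustration only.
interface ParsedFile {
  filePath: string;
}
// A worker result as seen by the processor; older workers may omit parsedFiles.
interface ChunkResult {
  calls: string[];
  parsedFiles?: ParsedFile[];
}
interface WorkerExtractedData {
  calls: string[];
  parsedFiles: ParsedFile[];
}

function aggregateChunks(results: readonly ChunkResult[]): WorkerExtractedData {
  // The empty-result case falls out naturally: both arrays stay empty.
  const data: WorkerExtractedData = { calls: [], parsedFiles: [] };
  for (const result of results) {
    data.calls.push(...result.calls);
    // Guard: skip the field when a worker did not emit it.
    if (result.parsedFiles) {
      data.parsedFiles.push(...result.parsedFiles);
    }
  }
  return data;
}
```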

Ring 1 tweak: `emitScopeCaptures` sync return

`readonly CaptureMatch[]` (was `Promise<readonly CaptureMatch[]>`). Tree-sitter and COBOL's regex tagger are both synchronous; no foreseeable async need. Sync lets the worker invoke it inline without cascading `async` through the batch pipeline + IPC handler. No consumers yet, so no breakage.
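
To show why the synchronous return type matters, here is a small sketch in which a plain, non-`async` loop calls the hook inline. The single-argument hook, `captureNamesForBatch`, and `toyProvider` are assumptions made for this example; the toy regex only loosely imitates a synchronous tagger.

```typescript
// Simplified, assumed shapes for illustration only.
interface CaptureMatch {
  name: string;
}
interface LanguageProvider {
  // Synchronous hook: returns the captures directly, not a Promise.
  emitScopeCaptures?: (sourceText: string) => readonly CaptureMatch[];
}

// Plain synchronous loop: no async/await is needed anywhere in the call chain.
function captureNamesForBatch(
  provider: LanguageProvider,
  sources: readonly string[],
): string[][] {
  return sources.map((src) =>
    (provider.emitScopeCaptures?.(src) ?? []).map((c) => c.name),
  );
}

// Hypothetical regex-based provider; real providers use tree-sitter queries.
const toyProvider: LanguageProvider = {
  emitScopeCaptures: (sourceText) => {
    const matches: CaptureMatch[] = [];
    const pattern = /function\s+(\w+)/g;
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(sourceText)) !== null) {
      matches.push({ name: m[1] ?? "" });
    }
    return matches;
  },
};
```

Had the hook returned a `Promise`, `captureNamesForBatch` and every caller above it would have needed to become `async` as well.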

Design constraints honored

| Acceptance criterion | How |
| --- | --- |
| Per-file artifact production is parallel across worker threads | Bridge runs inside `processFileGroup`, one instance per worker thread |
| No change to existing extraction outputs | Helper runs BEFORE legacy extraction; its output goes into a separate `parsedFiles[]` field; legacy code path untouched |
| IPC payload size growth | `parsedFiles` is empty (length 0) until a provider migrates, so there is zero growth today. When migrated, growth is bounded by file size, similar to existing fields |

Tests (9 new; full suite 311/311)

`gitnexus/test/unit/scope-resolution/parse-worker-scope-integration.test.ts`:

  • Not-migrated (2): undefined-returning hook · short-circuit never invokes extractor
  • Migrated (3): happy path · argument threading · honors `shouldCreateScope` override
  • Error resilience (4): hook throws · extractor throws (no Module) · extractor throws (sibling overlap) · `onWarn` gets routed message with filePath + error body

Verification

  • `tsc --noEmit` clean in both packages
  • `gitnexus-shared` build clean
  • 311/311 combined scope-resolution / shadow / model / flag suite (+9 new)

What's deferred (not this PR)

Part of

, RFC #909 Ring 2 PKG)

Plumbs the ScopeExtractor (#919) into the real parsing pipeline.
`ParsedFile` artifacts now flow from workers to the parsing-processor
without changing any legacy-DAG behavior.

## Shipped

### `gitnexus/src/core/ingestion/scope-extractor-bridge.ts` (new)

  - `extractParsedFile(provider, sourceText, filePath, onWarn?)`
  - Short-circuits (returns `undefined`) when the provider has not
    implemented `emitScopeCaptures`. True for every language today —
    this is the default no-op path.
  - Invokes the hook + `ScopeExtractor.extract`, returns a `ParsedFile`.
  - **Swallows exceptions on both sides.** Failures route through the
    optional `onWarn` callback (or `console.warn`) and return
    `undefined`. Scope-extraction errors NEVER break legacy parsing on
    the same file.
  - Standalone module (not nested in `parse-worker.ts`) so tests can
    import it directly without triggering the worker's top-level
    `parentPort!.on(...)`.

### `gitnexus/src/core/ingestion/workers/parse-worker.ts`

  - `ParseWorkerResult.parsedFiles: ParsedFile[]` added.
  - `processFileGroup` calls `extractParsedFile` AFTER tree parse,
    BEFORE legacy extraction. Worker provides an `onWarn` callback that
    routes bridge warnings through `parentPort.postMessage({ type:
    'warning', message })`.
  - `mergeResult` includes `parsedFiles` in the sub-batch merge.
  - Initial + reset accumulator templates include `parsedFiles: []`.

### `gitnexus/src/core/ingestion/parsing-processor.ts`

  - `WorkerExtractedData.parsedFiles: ParsedFile[]` added.
  - Empty-result branch and the across-chunk aggregation both include
    `parsedFiles`. Aggregation is tolerant of workers that don't emit
    the field (older builds / partial rollouts).

### Ring 1 tweak: `emitScopeCaptures` sync return

`readonly CaptureMatch[]` (was `Promise<readonly CaptureMatch[]>`).
Tree-sitter and COBOL's regex tagger are both synchronous; no
foreseeable need for async work inside this hook. Sync lets the
already-sync worker pipeline invoke it inline without cascading
`async` up through the batch driver + IPC handler.

## Tests (9 new; full suite 311/311)

`gitnexus/test/unit/scope-resolution/parse-worker-scope-integration.test.ts`:
  - Not-migrated (2): undefined-returning hook · never-invokes-extractor
  - Migrated (3): happy path · argument threading · honors
    `shouldCreateScope` override
  - Error resilience (4): hook throws · extractor throws (no Module) ·
    extractor throws (sibling overlap) · `onWarn` gets routed
    message with filePath + error body

## Verification

  - `tsc --noEmit` clean in both packages
  - `gitnexus-shared` build clean
  - 311/311 combined scope-resolution / shadow / model / flag suite
  - 9/9 new bridge tests

## What's NOT in this PR (still deferred to #921)

  - Actually using the `parsedFiles` — that's the finalize orchestrator.
  - `ModuleScopeIndex.byFilePath` materialization — belongs alongside
    the rest of the SemanticModel indexes in #921.

## Closes part of #909. Unblocks
  - #921 finalize-orchestrator — consumes `WorkerExtractedData.parsedFiles`
@vercel

vercel Bot commented Apr 18, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| gitnexus | Ready | Preview, Comment | Apr 18, 2026 7:12pm |

@magyargergo

Collaborator Author

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

  • gitnexus/ → CLI + MCP
  • gitnexus-web/
  • gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.


CONTEXT TO LOAD FIRST
Read these before reviewing:

  • AGENTS.md
  • GUARDRAILS.md
  • CONTRIBUTING.md
  • TESTING.md
  • ARCHITECTURE.md

Additional context:


PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.


REVIEW RULES

  • Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
  • Every finding must include path:line.
  • If you make a behavioral claim, cite the code that proves it.
  • If you make a performance claim, explain the mechanism.
  • If something cannot be verified from the diff alone, explicitly say so.
  • Distinguish clearly between:
    • verified issue
    • plausible risk
    • unverified concern
  • Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
  • Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

  • BLOCKING → must be fixed before merge
  • NON-BLOCKING → valid issue, but merge may still be acceptable
  • NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

  • Does the implementation actually satisfy the PR claim?
    • ManifestExtractor is truly invoked
    • config.links produces non-zero cross-links where expected
  • Resolver contracts are preserved:
    • resolveSymbol remains exact-match
    • label-scoped Cypher remains correct per contract type
    • flag any regression toward fuzzy or unscoped matching
  • Graph schema integrity is preserved:
    • no silent changes to node labels
    • no silent changes to edge types
    • no silent changes to ID generation (generateId)
  • Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

  • naming quality
  • local cohesion
  • dead code
  • unnecessary abstraction
  • hidden control flow
  • confusing indirection
  • adherence to repo conventions:
    • direct imports from gitnexus-shared
    • no barrel re-export regressions
    • no // removed comments
    • no unused re-exports
  • no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

  • Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
  • Is there a regression guard for 0-link → N-link behavior?
  • Are assertions meaningful rather than tautological?
  • Are fixtures realistic for manifest inputs?
  • If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
  • Is there evidence the expected validation path would pass for staged gitnexus/ files?
    • tsc --noEmit
    • vitest run --project default
      If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

  • hot-path overhead in ingestion/group sync
  • excess allocations per manifest entry
  • redundant Cypher round-trips
  • missed batching or missed parallelism (Promise.all) where it materially matters
  • O(n²) or repeated lookup patterns on large repos
  • memoization tradeoffs:
    • correctness
    • invalidation
    • bounded vs unbounded memory growth
      Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

  • Windows/cross-platform safety:
    • stream lifecycle
    • FD/file handle lifecycle
    • path separator assumptions
    • anything resembling prior ENOTEMPTY-style lifecycle regressions
  • LadybugDB single-writer invariant is preserved
  • Embeddings preservation:
    • no silent breakage of --embeddings
    • .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
  • MCP contracts remain compatible:
    • group_*
    • query
    • context
    • impact
    • detect_changes
    • rename
    • cypher
      Flag any schema or contract break without migration note
  • staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
  • rollback safety:
    • can this PR be reverted safely without re-analyze?
    • if not, explain why

6. Maintainability

Check:

  • Does the change respect Pipeline Phase DAG boundaries?
  • Is cross-phase coupling being introduced?
  • Are manifest-extractor entry points discoverable from group/sync.ts?
  • Would the next contributor know where to extend this?
  • Is shared logic placed appropriately:
    • in gitnexus-shared/
    • or gitnexus/src/core/**/utils.ts
    • and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

  • no secrets, tokens, or machine-specific paths committed
  • no blind find-and-replace renames
  • if a symbol rename occurred, rename MCP tool or equivalent careful review was used
  • upstream impact was considered for modified shared symbols
  • no HIGH/CRITICAL impact left unaddressed
  • PR title follows conventional commit style + [area] Short description
  • PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

  • Evidence: path:line, path:line
  • Why it matters: ...
  • Recommendation: ...
  • Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

  • No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

  • ✅ Ready to merge — no blocking issues.
  • ⚠️ Merge after addressing blocking items.
  • ❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.


IMPORTANT REVIEW BEHAVIOR

  • Be strict on correctness and contracts.
  • Be conservative about operational risk.
  • Be explicit about uncertainty.
  • Do not suggest unrelated refactors.
  • Do not turn nits into blockers.
  • Do not reward intent; review actual implementation.

You must also ensure we are on track with the RFC plan! Check the issue tickets involved!

@github-actions

github-actions Bot commented Apr 18, 2026

Contributor

Claude finished @magyargergo's task in 3m 33s


PR #969 — Senior Review: feat(ingestion): wire ScopeExtractor into parse-worker + processor


Production-ready bar for this PR

Based on repo docs and the affected area (ingestion → parse phase → worker pipeline):

  1. Non-breaking additive contract: new fields on ParseWorkerResult / WorkerExtractedData must not alter any existing extraction output; legacy DAG path must be entirely unaffected.
  2. IPC contract preservation: any new worker→main message type must already be declared in WorkerOutgoingMessage and handled in worker-pool.ts before or in the same PR.
  3. Phase DAG boundary: changes must stay within the parse phase. No new cross-phase coupling allowed.
  4. Failure isolation: scope-extraction errors must never propagate to legacy parsing on the same file.
  5. Type safety: tsc --noEmit clean in both gitnexus/ and gitnexus-shared/.
  6. Unit test coverage under gitnexus/test/unit/ covering the new wired path: short-circuit, happy path, argument threading, and error resilience.
  7. Zero operational regression: no MCP contract changes, no graph schema writes, no LadybugDB mutations, no embeddings path touched.
  8. No secrets, machine paths, or blind renames per GUARDRAILS.md.
  9. PR title follows conventional-commit format per CONTRIBUTING.md (enforced by pr-labeler.yml).

Findings

1. Correctness & functional completeness

No blocking issues. Verified:

  • extractParsedFile is invoked in processFileGroup (parse-worker.ts:1418) before legacy extraction begins. Result goes into result.parsedFiles (parse-worker.ts:1422) — entirely independent of the legacy code path. ✓
  • mergeResult includes parsedFiles via appendAll (parse-worker.ts:2337). All three accumulator templates (initial, per-sub-batch reset, accumulated) initialize parsedFiles: []. ✓
  • parsing-processor.ts aggregates result.parsedFiles with a nil-safe guard (if (result.parsedFiles)) at line 175 — tolerant of any older worker builds. ✓
  • emitScopeCaptures sync return type change (language-provider.ts:332): confirmed zero existing implementations across the whole src/ tree. The change has no breakage surface today and no consumer was using the prior async shape. ✓
  • parsedFiles is intentionally not consumed in parse-impl.ts, confirmed by checking all chunkWorkerData.* accesses. Deferred correctly to #921 (RING2-PKG-3: Finalize orchestrator, which invokes the shared finalize algorithm and writes to MutableSemanticModel). ✓
  • warning message type (parse-worker.ts:1419): already declared in WorkerOutgoingMessage union (worker-pool.ts:27) and handled at worker-pool.ts:124 → console.warn. No new IPC contract introduced. ✓

2. Code clarity & clean code

No issues found in this area based on the diff and reviewed context.

The separation of scope-extractor-bridge.ts from parse-worker.ts is correct and well-motivated (top-level parentPort!.on in parse-worker.ts fires on import — stated in the module JSDoc). Direct import in tests would otherwise trigger a worker-context assertion. The bridge is a single-responsibility module with clear contract documentation.

ScopeBridgeWarn type alias is appropriately lightweight. No barrel re-export regressions. No // removed comments. No drive-by refactors observed.

3. Test coverage & change safety

Nine tests in gitnexus/test/unit/scope-resolution/parse-worker-scope-integration.test.ts cover:

  • Not-migrated short-circuit (2 tests: undefined return + no-throw guarantee)
  • Happy path: ParsedFile structure, argument forwarding, shouldCreateScope hook integration (3 tests)
  • Error resilience: hook throws, onWarn routing, extractor throws on missing Module scope, extractor throws on sibling overlap (4 tests)

The overlap test (parse-worker-scope-integration.test.ts:152–163) is a good regression pin — it confirms the ScopeTreeInvariantError from #912 is caught and isolated.

NON-BLOCKING — Worker-level parsedFiles aggregation not directly tested
  • Evidence: mergeResult (parse-worker.ts:2337) and the parsedFiles: [] initializers in the three accumulator templates are not exercised by any test.
  • Why it matters: When a language eventually implements emitScopeCaptures, a regression in mergeResult would silently drop ParsedFile artifacts from sub-batches without failing any test.
  • Recommendation: When Ring 3 per-language work begins (or by #921, the RING2-PKG-3 finalize orchestrator, at the latest), add an integration-level test that sends a two-file batch to processFileGroup and asserts both ParsedFile entries survive mergeResult. Not urgent while zero languages implement the hook.
  • Confidence: High (gap verified by search; impact low today).

4. Performance

No issues found in this area based on the diff and reviewed context.

For all current languages (zero emitScopeCaptures implementations): extractParsedFile executes one === undefined guard and returns — O(1) per file, negligible. result.parsedFiles.push is conditionally unreachable today. appendAll over an empty array is a no-op. When a language migrates, growth is bounded to one ParsedFile per file — identical scaling to existing fields like calls[].

5. Operational risk

No issues found in this area based on the diff and reviewed context.

  • No MCP tool contract changes; no graph writes; no LadybugDB mutations; no --embeddings path touched; staleness logic untouched.
  • IPC payload growth is zero until a language migrates.
  • Rollback: safe. parsedFiles is a new additive field, and no downstream consumer (#921, the RING2-PKG-3 finalize orchestrator) is merged yet.
  • No Windows/path-separator concerns in the new code (string concatenation in scope-extractor-bridge.ts:47 uses template literals, not path joins — appropriate for error messages).
NIT — Sequential fallback produces no parsedFiles

6. Maintainability

No issues found in this area based on the diff and reviewed context.

Change is fully contained within the parse phase boundary. The bridge module is discoverable: language-provider.ts JSDoc → scope-extractor-bridge.ts → parse-worker.ts. WorkerExtractedData.parsedFiles JSDoc explicitly points to #921. The next contributor wiring a language provider to emitScopeCaptures has a clear, tested path to follow.


RFC #909 Ring 2 progress check

| Ticket | Status | Notes |
| --- | --- | --- |
| #910, #911, #912 (Ring 2 SHARED deps) | Merged | Pre-requisites ✓ |
| #917 ClassRegistry/MethodRegistry/FieldRegistry | Merged (1bf9fb4) | |
| #919 ScopeExtractor driver | Merged (c6a291d) | Direct dependency ✓ |
| #920 (this PR) | Open | Wires bridge into parse-worker |
| #924 REGISTRY_PRIMARY_ flag | Merged (eece634) | |
| #921 finalize-orchestrator | Blocked on this PR | Unblocked once merged ✓ |

Ring 2 PKG is on track. This PR correctly unblocks #921 by providing WorkerExtractedData.parsedFiles.


What I could not fully verify

  1. tsc --noEmit result: CI is still in progress. The PR description claims clean typecheck in both packages, but I cannot confirm from the diff alone. The type signatures look correct.
  2. 311/311 test pass: Claimed by PR description; CI pending. The 9 new tests look correctly structured for Vitest.
  3. console.warn fallback in extractParsedFile (scope-extractor-bridge.ts:51) when onWarn is absent: correctly falls back, but there's no test asserting console.warn is called in that case (only the onWarn path is tested). Minor omission — not a blocking concern.
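
The untested `console.warn` fallback noted in item 3 could be pinned with a check along these lines. This is a framework-free sketch: `warnOrFallback` is a hypothetical stand-in for the bridge's fallback logic, and `captureConsoleWarn` is an illustrative helper; in the repo's Vitest suite a spy on `console.warn` would be the more idiomatic route.

```typescript
// Hypothetical stand-in for the bridge's warning fallback (not the real code).
function warnOrFallback(message: string, onWarn?: (m: string) => void): void {
  const warn = onWarn ?? ((m: string) => console.warn(m));
  warn(message);
}

// Runs fn with console.warn temporarily replaced and returns what was logged.
function captureConsoleWarn(fn: () => void): string[] {
  const logged: string[] = [];
  const original = console.warn;
  console.warn = (...args: unknown[]) => {
    logged.push(args.map(String).join(" "));
  };
  try {
    fn();
  } finally {
    // Always restore the real console.warn, even if fn throws.
    console.warn = original;
  }
  return logged;
}

// Without onWarn, the message should reach console.warn.
const viaFallback = captureConsoleWarn(() =>
  warnOrFallback("scope extraction failed for a.ts"),
);
// With onWarn, console.warn should stay silent.
const viaCallback = captureConsoleWarn(() =>
  warnOrFallback("handled elsewhere", () => {}),
);
```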

Final verdict

Ready to merge — no blocking issues.

The PR is a clean, well-isolated additive step. The bridge is correct, the legacy DAG is provably unaffected (zero emitScopeCaptures implementations today), IPC is wired into an existing handler, and the 9 tests pin all three invariants stated in the PR description. The one non-blocking gap (worker-level aggregation test) and one nit (sequential fallback gap documentation) are both appropriate to defer given the dormant state of the feature.


@github-actions

Contributor

CI Report

Some checks failed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ❌ Tests | failure | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 6846 | 6749 | 0 | 97 | 261s |

✅ All 6749 tests passed

97 test(s) skipped:
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 73.35% | 18160/24756 | 73.41% | 📉 -0.1 | 🔴 ██████████████░░░░░░ |
| Branches | 62.24% | 11508/18489 | 62.3% | 📉 -0.1 | 🔴 ████████████░░░░░░░░ |
| Functions | 78.13% | 1730/2214 | 78.2% | 📉 -0.1 | 🔴 ███████████████░░░░░ |
| Lines | 75.92% | 16456/21674 | 75.98% | 📉 -0.1 | 🔴 ███████████████░░░░░ |


@magyargergo magyargergo merged commit 39b5d29 into main Apr 18, 2026
17 of 19 checks passed
@magyargergo magyargergo deleted the rfc/scope-resolution/920-parse-worker-integration branch April 18, 2026 19:27
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…bhigyanpatwari#920, RFC abhigyanpatwari#909 Ring 2 PKG) (abhigyanpatwari#969)

Development

Successfully merging this pull request may close these issues.

RING2-PKG-2: parse-worker.ts integration — emit ParsedFile artifacts
