Skip to content

refactor(pipeline): DAG-based phase architecture + container-logic extraction to LanguageProvider#809

Merged
magyargergo merged 24 commits into
mainfrom
copilot/refactor-move-container-logic
Apr 13, 2026
Merged

refactor(pipeline): DAG-based phase architecture + container-logic extraction to LanguageProvider#809
magyargergo merged 24 commits into
mainfrom
copilot/refactor-move-container-logic

Conversation

Copilot AI commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Overview

Two interleaved refactors landed on this branch:

  1. Container logic moved into LanguageProvider — language-specific knowledge of which AST nodes count as containers (classes, interfaces, modules, etc.) is now owned by each language provider rather than scattered through generic processors.
  2. Ingestion pipeline rebuilt as a typed phase DAGpipeline.ts shrunk from ~1,950 lines to ~130, with each phase extracted into pipeline-phases/ as a typed node in a dependency graph, scheduled by a generic runner via topological sort.

A long review series (passes 1–5) drove sustained cleanup of correctness, type, and lifecycle issues across both refactors.

Architecture changes

  • Phase DAG runner — each phase declares typed deps and a typed output; the runner resolves deps via topological sort and passes PhaseResult<T> maps. Runner now filters results to only the declared deps before passing to execute(), so phases can't accidentally read undeclared upstream output.
  • PipelineContext is fully readonly — phase outputs flow only through typed deps, not through context mutation.
  • Pipeline-phases folder with one file per phase: scan, structure, markdown, cobol, parse, routes, tools, orm, crossFile, mro, communities, processes.
  • LanguageProvider owns container-node detection; resolveEnclosingOwner is config-driven with a runtime staticOwnerTypes guard at the factory chokepoint.

Correctness, lifecycle & type safety

  • Phase errors are wrapped with phase name and emit a terminal error progress event.
  • bindingAccumulator disposal moved into crossFile's try/finally (single-disposer ownership documented; sequential fallback also wrapped in try/finally so cleanup survives throws).
  • Minimal cycle reporting + resolveEnclosingOwner loop safeguards.
  • allFetchCalls and other parse outputs typed readonly to prevent downstream mutation.
  • ParseOutput.exportedTypeMap exposed as truly ReadonlyMap<…, ReadonlyMap<…>>. Worker-path graph→exports enrichment moved into parse-impl so the snapshot is fully populated at parse return; crossFile builds its own local mutable working copy for per-file re-resolution writes — no cast at the boundary.
  • Redundant parse deps dropped from mro/communities/processes (they only needed totalFiles, which originates in structure).
  • hasSynthesized flag guards the unconditional final synthesizeWildcardImportBindings call when per-chunk/fallback synthesis already ran.
  • isDev centralized in utils/env.ts (no more duplication across processors).
  • lineNumberAtOffset binary-search JSDoc improved; O(n²) line calc fixed.
  • Shared allPathSet built once in structure and consumed across phases (was rebuilt per phase).
  • Cycle-handling JSDoc in graph-sort.ts made consistent across module-level and function-level docs.

Testing

  • New DAG runner unit tests + dep-isolation test.
  • Direct unit coverage for wildcard-synthesis and cross-file-impl (incl. assertion that crossFile does NOT mutate the parse-supplied exportedTypeMap).
  • Golden-file graph-parity regression guard on mini-repo fixture.
  • Regression tests for config-driven staticOwnerTypes and resolveEnclosingOwner hook.
  • Documents intentional phase: 'parsing' progress label on cross-file re-resolution so telemetry bucketing stays consistent.

tsc --noEmit clean. Full Vitest run green except for one pre-existing unrelated dart parser-loader native-module failure.

Test plan

  • tsc --noEmit clean across gitnexus/ and gitnexus-shared/
  • Full Vitest suite (5,884 / 5,886 effective pass; dart query-compilation failure is pre-existing, unrelated to this branch)
  • DAG runner dep-isolation test passes
  • Graph-parity golden-file regression test passes
  • Wildcard-synthesis + cross-file-impl unit tests pass
  • CodeQL scan ✅

@vercel

vercel Bot commented Apr 13, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 13, 2026 7:20pm

Request Review

…ovider

- Add resolveEnclosingOwner hook to LanguageProviderConfig
- Add staticOwnerTypes to MethodExtractionConfig
- Implement Ruby resolveEnclosingOwner (singleton_class → class/module)
- Replace hardcoded STATIC_OWNER_TYPES with config.staticOwnerTypes
- Move Ruby static types to rubyMethodConfig
- Move Kotlin static types to kotlinMethodConfig
- Remove Ruby singleton_class branch from findEnclosingClassInfo
- Collapse seqFindEnclosingClassNode/seqFindRawEnclosingContainerNode
  into single provider-aware seqFindEnclosingOwnerNode
- Update worker path to pass provider.resolveEnclosingOwner

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/bc9f9d4d-f749-4872-9ff2-17fc86e08787

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
…olveEnclosingOwner hook

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/bc9f9d4d-f749-4872-9ff2-17fc86e08787

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor move language-specific container node logic into LanguageProvider Refactor: move language-specific container node logic into LanguageProvider Apr 13, 2026
Copilot AI requested a review from magyargergo April 13, 2026 12:39
…tion

Restructure the ingestion pipeline from a ~1800-line monolithic orchestrator
into a DAG (Directed Acyclic Graph) of named phases with explicit dependencies.

New files under pipeline-phases/:
- types.ts: PipelinePhase, PipelineContext, PhaseResult contracts
- runner.ts: DAG runner with topological sort validation
- scan.ts, structure.ts, markdown.ts, cobol.ts: early phases
- parse.ts + parse-impl.ts: chunked parse + resolve (the core)
- routes.ts, tools.ts, orm.ts: post-parse enrichment phases
- cross-file.ts + cross-file-impl.ts: cross-file binding propagation
- mro.ts, communities.ts, processes.ts: graph analysis phases
- index.ts: barrel export

pipeline.ts reduced from ~1960 lines to ~184 lines:
- DAG phase array declaration
- runPipelineFromRepo as thin orchestrator
- topologicalLevelSort retained for backward compat

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/136bf9c3-2f4f-449b-9fff-001332c8371c
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude review this from performance, architecture perspective. is there any bugs? does it follow clean code principles?

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 45s —— View job


PR Review: Performance, Architecture, Clean Code & Bugs

Tasks:

  • Gather context and understand the PR
  • Read core changed files (pipeline.ts, runner.ts, types.ts, phase files)
  • Analyze architecture and design patterns
  • Identify performance concerns
  • Check for bugs
  • Evaluate clean code principles
  • Post comprehensive review

Summary

This is a well-executed refactor that reduces pipeline.ts from ~1960 to ~184 lines by introducing a DAG-based phase architecture. The overall design is sound. Below is a detailed review across all four dimensions.


Architecture

Strengths

  • Clean DAG execution model with typed phase inputs/outputs.
  • Each phase is independently testable (mock ctx + deps map).
  • PipelinePhase<T> is a minimal, composable interface.
  • Barrel export in index.ts is clean and the phase registry in buildPhaseList is easy to extend.

Issues

🔴 Circular dependency

cross-file-impl.ts:19 imports topologicalLevelSort from ../pipeline.js, but pipeline.ts imports from pipeline-phases/index.tscross-file.tscross-file-impl.ts. This is a circular module dependency:

pipeline.ts → pipeline-phases/cross-file.ts → cross-file-impl.ts → pipeline.ts

Node.js can resolve some cycles via hoisting, but this is fragile and can yield undefined imports at startup. topologicalLevelSort should be moved to a shared utility (e.g., utils/graph-sort.ts) to break the cycle. Fix this →

🟡 Design goal contradiction in types.ts

The file header promises "no shared mutable state between phases" (line 9), but PipelineContext.graph is described as a "Mutable knowledge graph — the single shared accumulator" (line 25). Every phase mutates the same KnowledgeGraph object. The readonly modifier only prevents reassigning the reference, not mutations. This comment should either accurately describe the design or the design should be reconsidered. The current approach is pragmatic but the stated goal is misleading.

🟡 Parallel phase opportunities not taken

routes, tools, and orm phases all depend only on parse and are mutually independent — they could run concurrently. The runner comment acknowledges this but dismisses it. Given that routes.ts alone does several file I/O operations, running these three phases in parallel would be a meaningful improvement. This is a missed opportunity worth tracking as a follow-up. Fix this →

🟡 Redundant parse dep in mro and communities

mroPhase declares deps: ['crossFile', 'parse'] and communitiesPhase declares deps: ['mro', 'parse']. Both only read totalFiles from parse for progress reporting — not a real data dependency. Since crossFile already depends on parse, these transitive parse deps extend the dependency list without value. A PipelineContext.totalFiles field would be cleaner.


Bugs

🔴 mroPhase emits wrong phase name in progress

mro.ts:39 calls ctx.onProgress with phase: 'parsing'. This should not be 'parsing' — the MRO phase is distinct. Users would see "Computing method resolution order..." attributed to the parse phase in the UI, potentially causing confusion in progress monitoring. Fix this →

🔴 routesPhase mutates ParseOutput.allFetchCalls

routes.ts:248 directly pushes into the allFetchCalls array that came from ParseOutput:

allFetchCalls.push({ filePath, fetchURL: normalized, lineNumber: 0 });

ParseOutput is a shared result — mutating it from a consumer phase violates the explicit data flow contract. If tools or another phase were ever run concurrently with routes, this would be a race condition. Even serially, it's surprising that the parse phase's output is retroactively modified. Fix this →

🟡 communities.ts:37 progress reports phase: 'communities' but starts at 82% — same start as cross-file-impl.ts

crossFilePhase reports percent: 82 (cross-file-impl.ts:96) and communitiesPhase also starts at percent: 82 (communities.ts:36). These overlap in the progress timeline, which will cause the progress indicator to jump backwards or appear stalled.


Performance

🟡 O(n²) line number calculation in extractORMQueriesInline

parse-impl.ts:249:

lineNumber: content.substring(0, m.index).split('\n').length - 1,

For each regex match, this allocates a new string and splits it. With many ORM queries in a large file this is O(file_size × match_count). A simple counter incremented by \n characters up to m.index would be O(file_size) total. Fix this →

🟡 synthesizeWildcardImportBindings called up to N+2 times

In runChunkedParseAndResolve, this function (which traverses all graph nodes and relationships) is called:

  1. Once per chunk where chunkNeedsSynthesis[chunkIdx] is true (line 496)
  2. Once before sequential fallback processing (line 629)
  3. Once globally at the end (line 698)

The final call at line 698 is always executed, even when all prior chunks already synthesized. For large repos, this is redundant work. A boolean flag tracking whether any synthesis has run would help.

🟡 File contents re-read in cross-file propagation

cross-file-impl.ts:147 calls readFileContents for level candidates. These files were already read during the parse phase and their contents discarded. Carrying forward a Map<string, string> of file contents from the parse phase output would save the second disk read for cross-file re-resolution.

🟢 Minor: Array.shift() in topological sort

runner.ts:61 uses queue.shift() which is O(n) due to array reindexing. With 12 phases this is negligible, but for future extensibility a proper queue or reversed-array pop would be more correct.


Clean Code

🟡 Misleading _item variable names

parse-impl.ts:509-576 and parsing-processor.ts:149-160 use _item in for-of loops where the variable is actively used:

for (const _item of chunkWorkerData.calls) deferredWorkerCalls.push(_item);

The _ prefix convention signals an intentionally unused variable. These should be item or a more descriptive name. Fix this →

Additionally, the pattern for (const item of arr) target.push(item) can simply be target.push(...arr) or target.push.apply(target, arr) (safe for large arrays that could hit call stack limits with spread).

🟡 isDev redeclared in every file

const isDev = process.env.NODE_ENV === 'development'; appears in at least 8 phase files and impl files. This is harmless but noisy. A shared utils/env.ts export or a single import would be more DRY.

🟡 isDev defined inside execute() in cross-file.ts

cross-file.ts:35 declares isDev inside the execute function body, while every other phase file declares it at module scope. This inconsistency is minor but worth aligning.

🟢 Progress percentage comments/values are scattered across phases with no central budget

Each phase hardcodes progress percentages (e.g., mro: 81%, communities: 82-92%, processes: 94-99%). These are not coordinated — any change to one phase's range requires manually adjusting others. A progress "budget" table in the PipelineContext or a constant in types.ts would make this maintainable.

🟢 processes.ts is the largest phase at 170 lines, but it's doing two things

The node/edge creation for Process nodes and the route/tool linking are two distinct responsibilities. The route-to-process linking could be a separate small helper or even a dedicated phase depending on future growth.


Overall Assessment

The refactor is architecturally sound and a significant improvement to maintainability. The critical items to address before merging are:

  1. The circular dependency (cross-file-impl.tspipeline.ts)
  2. The mroPhase wrong progress phase name
  3. The allFetchCalls mutation in routes.ts

The performance and clean code items are lower priority but worth tracking.

- Extract synthesizeWildcardImportBindings to wildcard-synthesis.ts
- Extract extractORMQueriesInline to orm-extraction.ts
- Create shared constants.ts for AST_CACHE_CAP
- Fix inline type import in orm.ts (use proper top-level import)
- Add comprehensive JSDoc to getPhaseOutput explaining type safety
- Move isDev to module level in cross-file.ts (consistency)
- Improve module-level documentation across files
- Organize barrel exports in index.ts with section comments

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/2bd6d4aa-6271-4009-8dd2-332ea8ec73ab

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude review this from performance, architecture perspective. is there any bugs? does it follow clean code principles?

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 3m 40s —— View job


PR Review: Performance, Architecture, Clean Code & Bugs

Tasks:

  • Gather context and understand the PR
  • Read core changed files (runner.ts, types.ts, mro.ts, routes.ts, cross-file-impl.ts, graph-sort.ts, parse.ts, orm-extraction.ts, constants.ts, pipeline.ts)
  • Verify previously-flagged fixes are in place
  • Analyze architecture and design patterns
  • Identify performance concerns
  • Check for bugs
  • Evaluate clean code principles
  • Post comprehensive review

Summary

All critical and major issues from the three previous review passes have been resolved. The PR is in mergeable shape. This pass focuses on verifying those fixes and surfacing any remaining observations.


✅ Previously-Flagged Issues — All Resolved

Issue Status Evidence
Circular dependency (cross-file-impl.ts → pipeline.ts) ✅ Fixed cross-file-impl.ts:23 imports from ../utils/graph-sort.js; pipeline.ts:43 re-exports
mroPhase wrong progress phase name ('parsing') ✅ Fixed mro.ts:34 now uses phase: 'enriching'
Progress going backwards (mro 81% after crossFile 82%) ✅ Fixed mro.ts:35 emits percent: 83; communities starts at 84
routesPhase mutating ParseOutput.allFetchCalls ✅ Fixed routes.ts:61: const allFetchCalls = [...parseFetchCalls]
ParseOutput.allFetchCalls mutable type ✅ Fixed parse.ts:36: readonly allFetchCalls: readonly ExtractedFetchCall[]
_item variable naming in for-of loops ✅ Fixed parse-impl.ts:369–382 uses item throughout
isDev duplicated across pipeline-phases ✅ Fixed All phase files import from ./constants.jsutils/env.ts
isDev duplicated outside pipeline-phases ✅ Fixed language-config.ts:5, import-processor.ts:27, process-processor.ts:18 all import from ./utils/env.js
O(n²) line number calculation in ORM extraction ✅ Fixed orm-extraction.ts: buildLineOffsets + binary search lineNumberAtOffset
types.ts design goal comment contradiction ✅ Fixed Comment now accurately describes the shared mutable accumulator
Runner exposes full result map (hidden coupling risk) ✅ Fixed runner.ts:100–104 filters to declared deps only
Redundant parse dep in mro, communities, processes ✅ Fixed All three use ctx.totalFiles; mro drops parse dep entirely
Cycle files JSDoc inaccuracy in graph-sort.ts ✅ Fixed Both module-level and function-level docs say "processed last, in an undefined order"
Dep isolation test missing ✅ Added pipeline-runner.test.ts:192–221 — verifies deps.has('a') is false for undeclared phase

Remaining Observations

🟡 Performance

synthesizeWildcardImportBindings called up to N+2 times in parse-impl.ts

Three call sites exist:

  • parse-impl.ts:302: once per worker chunk where chunkNeedsSynthesis[chunkIdx] is true
  • parse-impl.ts:435: once before sequential fallback processing
  • parse-impl.ts:504: once globally at the end (always executes)

The final call at line 504 runs even when all prior chunks already synthesized, traversing all graph nodes and relationships again for no additional effect. A boolean flag tracking whether synthesis has run would eliminate the redundant final pass for repos where every chunk needed synthesis. Fix this →

File contents re-read in cross-file propagation

cross-file-impl.ts:148 calls readFileContents(repoPath, levelPaths) for each topological level's candidates. These same files were already read and discarded during the parse phase. For large repos this is a second disk round-trip for every cross-file candidate. Carrying a Map<string, string> file-content cache forward in ParseOutput would save the repeated I/O.

Array.shift() in runner.ts:60

O(n) per dequeue due to array reindexing. With 12 phases this is imperceptible. Flagging for completeness as the phase graph grows. Fix this →

🟡 Architecture

PipelineContext.totalFiles is the only non-readonly field

types.ts:40 declares totalFiles: number without readonly — intentionally writable so parsePhase can set it once. Every other PipelineContext field is readonly. The JSDoc at line 36–40 explains the design, but there is no runtime or compile-time guard preventing a subsequent phase from overwriting it. Low risk in the current static graph, worth a comment noting "written exactly once, by parsePhase".

cross-file-impl.ts:94–95 still reports phase: 'parsing' internally

onProgress({
  phase: 'parsing',
  percent: 82,
  message: `Cross-file type propagation (${filesWithGaps}+ files)...`,

The outer crossFilePhase is distinct from parse, but the inner impl reports under 'parsing'. This is arguably intentional (the UI progress arc covers 20–82% as "parsing"), but a consumer that routes telemetry by phase name would bucket cross-file work under parse. No direct bug, but worth documenting.

🟢 Minor Style

for (const item of arr) target.push(item) pattern in parse-impl.ts:369–382

Nine consecutive single-element push loops remain. These could be Array.prototype.push.apply(target, arr) (safe for large arrays, avoids call-stack limit of spread), or at minimum target.push(...arr) for small arrays. The current code is correct; this is purely cosmetic.

Progress percentage budget remains implicit

Ranges are hardcoded across files: parse 20–82%, crossFile 82%, mro 83%, communities 84–93%, processes 94–99%. A constants object mapping phase → {start, end} in types.ts or constants.ts would prevent silent overlap if any phase range is adjusted in the future.


Overall Assessment

Ready to merge. All previously identified critical and major issues are cleanly resolved. The architecture is sound, data flow is explicit, and the dep-isolation test correctly enforces the contract. The remaining items above are all 🟡/🟢 carry-overs — worth tracking as follow-ups but not blockers.


…ror progress event

Restores phase diagnostics at CLI/MCP boundary. runPipeline now wraps
phase.execute() in try/catch and rethrows with 'Phase <name> failed: ...'
preserving the original via { cause }. Also emits a terminal
{ phase: 'error' } progress event so subscribers see the failure before
the rejection propagates. Handler errors during error reporting are
swallowed to keep the original cause authoritative.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U1)
…ally; make single-use

crossFile.execute() now wraps its body in try/finally so the accumulator
is released on both the happy path and when runCrossFileBindingPropagation
throws. Dev-mode telemetry stays inside the try block before dispose (all
three counters return 0 after dispose clears internal maps).

BindingAccumulator becomes single-use: appendFile after dispose now throws
'BindingAccumulator: use after dispose' instead of silently re-animating
via the old _disposed auto-clear. Docs updated; the only production
construction site (parse-impl) always creates a fresh instance per run,
so no caller relied on the re-use contract.

Residual risk documented in crossFile module JSDoc: a future phase
inserted between parse and crossFile that throws would still leak the
accumulator. Any such phase must manage accumulator lifetime explicitly.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U2)
Investigation (plan U3) confirms: `importCtx` (ImportResolutionContext)
is a scratch workspace with no downstream consumer after parse.
`resolutionContext` (returned to crossFile) is a distinct object that
owns importMap / namedImportMap / packageMap / moduleAliasMap / model,
and never closes over importCtx. cross-file-impl consumes only that
ctx via processCalls. The two confusingly-similar "context" names
were the root of the adversarial reviewer's concern — comment locks
in the invariant so the next reader sees it.

No behavioral change.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U3)
…rseOutput

totalFiles was a hidden mutable field on PipelineContext written by
parse and read by mro/communities/processes — five reviewers flagged
this as a violation of the immutable-context invariant. Removed from
PipelineContext, which is now fully readonly, and made the implicit
temporal dep explicit: mro/communities/processes now declare 'parse'
as a dep and read totalFiles via getPhaseOutput<ParseOutput>(...).

No behavior change. Topo-sort unchanged because parse was already a
transitive dep through crossFile.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U4)
…kepoint

createMethodExtractor now rejects MethodExtractionConfigs that list
companion_object / singleton_class / object_declaration in
typeDeclarationNodes but omit the matching entry from staticOwnerTypes.
Fails loudly at provider construction time instead of producing
silent isStatic=false on the 50000th file analyzed.

Opt-out convention preserved: an explicit `new Set()` (empty Set)
signals intentional exclusion and passes the guard (memory obs #30588).

All 13 existing language configs pass the guard; the new negative test
fails without it. Test-first.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U5)
…vives throws

The sequential-fallback block in runChunkedParseAndResolve now runs
inside a try/finally that guarantees astCache.clear(), accumulator
finalize, and enrichExportedTypeMap execute even if readFileContents
or processCalls throws mid-fallback. Cleanup failures are caught
inside the finally so they can't mask the original error.

Accumulator disposal ownership remains with crossFile (U2) — U6 only
adds astCache cleanup and preserves finalize ordering on the error
path.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U6)
…-file-impl

Both modules previously had zero direct unit coverage — branches were
exercised only through integration tests' happy paths.

wildcard-synthesis.test.ts covers: Go graph-IMPORTS fallback, Python
moduleAliasMap build, MAX_SYNTHETIC_BINDINGS_PER_FILE cap, dedup
against existing namedImportMap entries, and empty-exportedSymbols
early return.

cross-file-impl.test.ts covers: gapRatio below threshold no-op,
MAX_CROSS_FILE_REPROCESS cap, graph-only exportedTypeMap fallback,
and empty namedImportMap short-circuit.

Tests assert current behavior — any future regression flips them.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U7)
…o fixture

Pins the current post-P1/P2 graph output (57 symbols, 92 relationships,
4 processes, deterministic edge digest) so future silent refactors
cannot drift behavior unnoticed. If any count changes or any edge
rewires, the test fails with a readable diff listing what changed
and a copy-pasteable UPDATE_GOLDEN=1 regen command.

Edge digest keyed by symbolic (label, name, filePath) triples rather
than raw generateId output — stays meaningful across id-encoding
refactors while still catching real semantic rewiring.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U8)
…afeguards

U9: runner cycle detection now reports only the SCC members via DFS
back-edge trace ('Cycle detected: A -> B -> C -> A') rather than
everything with inDegree > 0 (which mixed cycle members with blocked
dependents). Also emits the 'error' progress event for graph-
validation failures, symmetric with U1's runtime-error path.

U16: findEnclosingClassInfo now defends against language-provider
hooks that return non-container nodes — visitedContainers Set breaks
repeat-visit loops, MAX_ENCLOSING_WALK_ITERATIONS is belt-and-braces.
Documented the hook contract invariant so future provider authors
know the walk-continues-upward expectation.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U9, U16)
…t, graph-sort naming

Bundles plan units U10, U11, U12, U14, U15:

U10 — Type hygiene: readonly ParseOutput arrays (allExtractedRoutes,
allDecoratorRoutes, allToolDefs, allORMQueries, allPaths); removed
redundant 'as string[] | undefined' cast in routes.ts and 'as URL' in
parse-impl.ts; WorkerPool is now 'import type'. Readonly contract
propagated into processORMQueries (only iterates).

U11 — Dead code & shims: deleted constants.ts shim (AST_CACHE_CAP
inlined into its sole real consumer cross-file-impl.ts; isDev
consumers now import directly from ../utils/env.js). Removed internal
utility re-exports from pipeline-phases/index.ts (no external
consumers). Removed topologicalLevelSort re-export from pipeline.ts;
updated topological-sort.test.ts to import from the canonical
utils/graph-sort.js. Stripped 'Phase 3+4:' stale JSDoc from
parse-impl.ts.

U12 — Perf: StructureOutput now carries allPathSet (ReadonlySet<string>)
built once; cobol, markdown, and cross-file-impl consume the shared
set instead of allocating their own. Parse forwards it via
ParseOutput.allPathSet; processCobol/processMarkdown widened to
ReadonlySet<string>.

U14 — graph-sort.ts: renamed local 'inDegree' to
'pendingImportsPerFile' with expanded JSDoc explaining the reverse-
graph Kahn's formulation and warning future maintainers not to
'correct' it to standard in-degree semantics. Added self-edge test.

U15 — Unconditional worker-fallback logging: removed isDev guard on
the worker-pool-creation-failure console.warn so operators can
diagnose perf degradations in production.

No behavior change. U8 golden-file test confirms pipeline output is
byte-identical.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U10, U11, U12, U14, U15)
…o 1.3.0

U13 — documentation fixes:

ARCHITECTURE.md: the prior insertion of the 'Pipeline Phase DAG'
section orphaned 7 rows from the 'Where to change what' header.
Moved those 7 rows back up under their header so the table reads
contiguously; DAG section now follows the completed table.

AGENTS.md + CLAUDE.md: bumped version 1.2.0 -> 1.3.0, updated Last
reviewed to 2026-04-13, added matching Changelog row documenting
the GitNexus index stats refresh after the DAG refactor. Stat
bumps (symbols/relationships/execution flows) that were sitting
uncommitted in the working tree are now landed under a proper
changelog entry per each file's own documented schema.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U13)
@magyargergo magyargergo marked this pull request as ready for review April 13, 2026 18:30
@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
6278 6181 0 97 247s

✅ All 6181 tests passed

97 test(s) skipped — expand for details
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 72.74% 16743/23015 72.55% 📈 +0.2 🟢 ██████████████░░░░░░
Branches 61.61% 10661/17302 61.49% 📈 +0.1 🟢 ████████████░░░░░░░░
Functions 77.72% 1560/2007 77.44% 📈 +0.3 🟢 ███████████████░░░░░
Lines 75.16% 15208/20234 74.97% 📈 +0.2 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude review this from performance, architecture perspective. is there any bugs? does it follow clean code principles?

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 20s —— View job


PR Review — Pass 5

Tasks:

  • Gather context and understand the PR
  • Read current state of key files (runner.ts, parse-impl.ts, mro.ts, cross-file-impl.ts, types.ts, communities.ts, processes.ts, parse.ts, wildcard-synthesis.ts)
  • Verify remaining items from pass 4 are resolved
  • Analyze for new issues (performance, architecture, bugs, clean code)
  • Post comprehensive review

Summary

Pass 4 declared "ready to merge" — verified that all four critical issues are still cleanly resolved. However, three items that pass 4 marked as ✅ Fixed are actually still present in the code, and a new mutable-output issue is worth flagging.


❌ Pass 4 Incorrectly Marked as Fixed

🟡 Redundant parse dep in mro, communities, and processes — NOT Fixed

Pass 4 said: "All three use ctx.totalFiles". This is incorrect. The code does not have a ctx.totalFiles field. Reading the actual files:

  • mro.ts:27: deps: ['crossFile', 'parse'] — still declares parse
  • mro.ts:33: const { totalFiles } = getPhaseOutput<ParseOutput>(deps, 'parse') — still reads from parse output
  • communities.ts:24: deps: ['mro', 'parse'] — still declares parse
  • communities.ts:30: const { totalFiles } = getPhaseOutput<ParseOutput>(deps, 'parse') — still reads from parse output
  • processes.ts:28: deps: ['communities', 'routes', 'tools', 'parse'] — still declares parse
  • processes.ts:34: const { totalFiles } = getPhaseOutput<ParseOutput>(deps, 'parse') — still reads from parse output

PipelineContext (types.ts) has no totalFiles field. The fix was not applied. Three phases still declare a structural data dependency on parse solely for a progress-reporting counter. Fix this →

🟡 cross-file-impl.ts reports phase: 'parsing' — NOT Fixed

Pass 4 said: "cross-file-impl.ts:94–95 still reports phase: 'parsing' internally — arguably intentional". The current code at line 97 confirms it is still phase: 'parsing'. This isn't a blocker but it means any telemetry or monitoring that groups events by phase name will bucket cross-file re-resolution time under "parsing". At minimum a comment should document this as intentional.

🟡 synthesizeWildcardImportBindings redundant final call — NOT Fixed

parse-impl.ts:536 unconditionally calls synthesizeWildcardImportBindings at the end of runChunkedParseAndResolve, even when all chunks already ran per-chunk synthesis at line 302. For a repo where every chunk has wildcard-import files, the final call at line 536 traverses all graph nodes and relationships again for zero net effect. A simple let hasSynthesized = false flag set at line 302 would let the line 536 call be skipped. Fix this →


New Observation

🟡 ParseOutput.exportedTypeMap is mutable and mutated downstream — asymmetric with the allFetchCalls fix

parse.ts:35 declares:

exportedTypeMap: ExportedTypeMap;

(no readonly, no Readonly<> wrapper — ExportedTypeMap is a Map)

cross-file-impl.ts:51–54 mutates it directly:

const graphExports = buildExportedTypeMapFromGraph(graph, ctx.model.symbols);
for (const [fp, exports] of graphExports) exportedTypeMap.set(fp, exports);

The allFetchCalls mutation bug was fixed in a prior pass by making that array readonly. The same pattern applies here: crossFilePhase is retroactively modifying a field in ParseOutput — an upstream phase's output. This is intentional (crossFile enriches the map), but the type does not enforce it, so any other downstream phase (mro, communities, processes) could also mutate it without a type error. Typing it as ReadonlyMap<string, ReadonlyMap<string, string>> in ParseOutput would enforce that only crossFilePhase (which receives the mutable reference directly from runChunkedParseAndResolve) can write to it. Fix this →


Carry-over Items (Still Present, Not Blockers)

Issue Location Priority
Array.shift() O(n) dequeue runner.ts:60 🟢
File contents re-read in cross-file propagation cross-file-impl.ts:150 🟡
for (const item of arr) target.push(item) pattern parse-impl.ts:315–382 🟢
Progress percentage budget implicit and scattered All phase files 🟢

Overall Assessment

The four critical issues from passes 1–2 remain fixed. The PR is functionally sound. Three items that pass 4 incorrectly marked as resolved are still present — the most actionable is the redundant parse dep in three phases, which could be addressed with the ctx.totalFiles approach or alternatively by promoting totalFiles directly from StructureOutput into the context at pipeline setup time. The exportedTypeMap mutability asymmetry is a new 🟡 worth addressing before the pattern proliferates.

…ut.exportedTypeMap, skip redundant wildcard synth

- mro/communities/processes: switch redundant `parse` dep to `structure` —
  totalFiles originates in structure, so depending on parse for it was a
  spurious data dep that obscured the real DAG.
- ParseOutput.exportedTypeMap: typed as truly ReadonlyMap<...,ReadonlyMap>>;
  graph→exports enrichment moved into parse-impl so the snapshot is
  fully populated at parse return. crossFile builds its own local mutable
  working copy for per-file re-resolution writes — no cast at the boundary.
- parse-impl: hasSynthesized flag guards the unconditional final
  synthesizeWildcardImportBindings call when per-chunk/fallback synthesis
  already ran (graph-global + idempotent across chunks).
- cross-file-impl: documented the intentional `phase: 'parsing'` progress
  label so telemetry bucketing stays consistent with the parse phase.
- cross-file-impl test: replaced the now-moved fallback-enrichment
  assertion with a stronger one — crossFile must not mutate the
  parse-supplied map.

Addresses PR #809 review pass 5 carry-overs.
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude review this from performance, architecture perspective. is there any bugs? does it follow clean code principles?

@github-actions

github-actions Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 2m 11s —— View job


PR Review — Pass 6

Tasks:

  • Gather context and understand the PR
  • Read current state of key files (mro.ts, communities.ts, processes.ts, parse-impl.ts, cross-file-impl.ts, parse.ts, types.ts, runner.ts, structure.ts, cross-file.ts)
  • Verify which Pass 5 items are resolved vs still open
  • Check for new issues not previously identified
  • Post comprehensive review

Summary

All previously-flagged issues from Pass 5 are now resolved. The PR is clean and ready to merge. Below is a verification of those fixes plus a handful of new minor observations.


✅ Pass 5 Items — All Resolved

Issue Status Evidence
Redundant parse dep in mro, communities, processes ✅ Fixed All three now declare 'structure' instead of 'parse'; read totalFiles via getPhaseOutput<StructureOutput>(deps, 'structure')
synthesizeWildcardImportBindings redundant final call ✅ Fixed parse-impl.ts:242 adds let hasSynthesized = false; final call at line 549 is guarded by if (!hasSynthesized)
ParseOutput.exportedTypeMap mutable ✅ Fixed parse.ts:44 types it as ReadonlyMap<string, ReadonlyMap<string, string>>; cross-file-impl.ts:55–58 builds a local mutable working copy
cross-file-impl.ts phase: 'parsing' undocumented ✅ Documented cross-file-impl.ts:98–105 has an explicit comment explaining the intentional bucketing under 'parsing' and when to revisit it

Remaining Observations

🟡 ParseOutput.bindingAccumulator and resolutionContext lack readonly

parse.ts:50–52:

bindingAccumulator: BindingAccumulator;
resolutionContext: ReturnType<typeof createResolutionContext>;

All other ParseOutput array fields (allFetchCalls, allExtractedRoutes, etc.) are marked readonly. These two are not, meaning any phase with parse in its deps could reassign the fields. In practice, only crossFile accesses these (and correctly disposes the accumulator in its finally block), but the type does not enforce it. Adding readonly here would complete the consistency and prevent accidental reassignment. Fix this →

🟡 ParseOutput.totalFiles lacks readonly

parse.ts:58:

totalFiles: number;

This is the only scalar field in ParseOutput without readonly. mro, communities, and processes no longer read it from ParseOutput (they use structure now), but crossFile still reads it at cross-file.ts:52. There is no reason it should be writable after parse returns. Fix this →

🟡 Inner Map values in exportedTypeMap are mutable at runtime despite ReadonlyMap type

runChunkedParseAndResolve (line 235) builds exportedTypeMap as Map<string, Map<string, string>>. The ParseOutput interface widens the type to ReadonlyMap<string, ReadonlyMap<string, string>>. TypeScript accepts this because Map extends ReadonlyMap, but the inner Map<string, string> instances are still mutable objects at runtime — the ReadonlyMap wrapping is purely a type-level constraint and can be cast away without a type error.

In practice this is safe today: cross-file-impl.ts:57 correctly copies with new Map(exports) instead of mutating. However, a future consumer receiving parseOutput.exportedTypeMap.get(fp) has a runtime-mutable map despite the readonly type. Object.freeze() on the inner maps (or using a ReadonlyMap constructor wrapper) would close the gap if strict immutability is desired.

🟢 Array.shift() O(n) dequeue in runner.ts:60

Still present. With 12 phases this is entirely imperceptible — flagging as a carry-over from prior passes.

🟢 for (const item of arr) target.push(item) pattern in parse-impl.ts:325–330

Three consecutive single-push loops:

for (const item of chunkWorkerData.calls) deferredWorkerCalls.push(item);
for (const item of chunkWorkerData.heritage) deferredWorkerHeritage.push(item);
for (const item of chunkWorkerData.constructorBindings) deferredConstructorBindings.push(item);

Could be Array.prototype.push.apply(target, source) (safe for large arrays). Cosmetic only.

🟢 Progress percentage budget still implicit

Progress ranges (structure: 15–20%, parse: 20–82%, mro: 83%, communities: 84–93%, processes: 94–99%) remain hardcoded per-phase with no central table. Carry-over from prior passes — worth a follow-up issue but not a blocker.


Overall Assessment

Ready to merge. All critical and major issues from all five prior passes are cleanly resolved. The architecture is sound, data flow is explicit and enforced at the type boundary, the dep-isolation runner correctly filters to declared dependencies, and the hasSynthesized flag eliminates the redundant wildcard-synthesis traversal. The remaining items above are all 🟡/🟢 — good candidates for a follow-up cleanup PR but not blockers for this one.


@magyargergo magyargergo changed the title Refactor: move language-specific container node logic into LanguageProvider refactor(pipeline): DAG-based phase architecture + container-logic extraction to LanguageProvider Apr 13, 2026
@magyargergo magyargergo merged commit 26ff700 into main Apr 13, 2026
14 checks passed
@magyargergo magyargergo deleted the copilot/refactor-move-container-logic branch April 13, 2026 19:31
@magyargergo magyargergo mentioned this pull request Apr 13, 2026
2 tasks
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…traction to LanguageProvider (abhigyanpatwari#809)

* Initial plan

* refactor: move language-specific container node logic into LanguageProvider

- Add resolveEnclosingOwner hook to LanguageProviderConfig
- Add staticOwnerTypes to MethodExtractionConfig
- Implement Ruby resolveEnclosingOwner (singleton_class → class/module)
- Replace hardcoded STATIC_OWNER_TYPES with config.staticOwnerTypes
- Move Ruby static types to rubyMethodConfig
- Move Kotlin static types to kotlinMethodConfig
- Remove Ruby singleton_class branch from findEnclosingClassInfo
- Collapse seqFindEnclosingClassNode/seqFindRawEnclosingContainerNode
  into single provider-aware seqFindEnclosingOwnerNode
- Update worker path to pass provider.resolveEnclosingOwner

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/bc9f9d4d-f749-4872-9ff2-17fc86e08787

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* test: add regression tests for config-driven staticOwnerTypes and resolveEnclosingOwner hook

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/bc9f9d4d-f749-4872-9ff2-17fc86e08787

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* refactor: implement DAG-based pipeline architecture with phase extraction

Restructure the ingestion pipeline from a ~1800-line monolithic orchestrator
into a DAG (Directed Acyclic Graph) of named phases with explicit dependencies.

New files under pipeline-phases/:
- types.ts: PipelinePhase, PipelineContext, PhaseResult contracts
- runner.ts: DAG runner with topological sort validation
- scan.ts, structure.ts, markdown.ts, cobol.ts: early phases
- parse.ts + parse-impl.ts: chunked parse + resolve (the core)
- routes.ts, tools.ts, orm.ts: post-parse enrichment phases
- cross-file.ts + cross-file-impl.ts: cross-file binding propagation
- mro.ts, communities.ts, processes.ts: graph analysis phases
- index.ts: barrel export

pipeline.ts reduced from ~1960 lines to ~184 lines:
- DAG phase array declaration
- runPipelineFromRepo as thin orchestrator
- topologicalLevelSort retained for backward compat

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/136bf9c3-2f4f-449b-9fff-001332c8371c

* test: add DAG runner unit tests, update ARCHITECTURE.md with phase DAG docs

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/136bf9c3-2f4f-449b-9fff-001332c8371c

* fix: address code review - pass resolutionContext through parse output, fix worker URL path

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/136bf9c3-2f4f-449b-9fff-001332c8371c

* fix: declare transitive parse dependency explicitly in mro/communities/processes phases

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/136bf9c3-2f4f-449b-9fff-001332c8371c

* refactor: improve pipeline-phases clean code and folder structure

- Extract synthesizeWildcardImportBindings to wildcard-synthesis.ts
- Extract extractORMQueriesInline to orm-extraction.ts
- Create shared constants.ts for AST_CACHE_CAP
- Fix inline type import in orm.ts (use proper top-level import)
- Add comprehensive JSDoc to getPhaseOutput explaining type safety
- Move isDev to module level in cross-file.ts (consistency)
- Improve module-level documentation across files
- Organize barrel exports in index.ts with section comments

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/2bd6d4aa-6271-4009-8dd2-332ea8ec73ab

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* address review feedback: fix circular dep, allFetchCalls mutation, progress bugs, remove DAG naming, extract isDev, fix _item naming, fix O(n²) line calc

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/6cf53c9b-d55d-4c6f-bf3d-7bfb82d512b6

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* improve JSDoc on lineNumberAtOffset binary search

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/6cf53c9b-d55d-4c6f-bf3d-7bfb82d512b6

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* address review: filter deps in runner, move totalFiles to ctx, fix cycle JSDoc, centralize isDev, remove DAG naming

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b388424f-b939-4a94-97de-3855f9465564

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix doc consistency in graph-sort.ts module-level and function-level JSDoc

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/b388424f-b939-4a94-97de-3855f9465564

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix(pipeline): wrap phase errors with phase name and emit terminal error progress event

Restores phase diagnostics at CLI/MCP boundary. runPipeline now wraps
phase.execute() in try/catch and rethrows with 'Phase <name> failed: ...'
preserving the original via { cause }. Also emits a terminal
{ phase: 'error' } progress event so subscribers see the failure before
the rejection propagates. Handler errors during error reporting are
swallowed to keep the original cause authoritative.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U1)

* fix(pipeline): move bindingAccumulator dispose into crossFile try/finally; make single-use

crossFile.execute() now wraps its body in try/finally so the accumulator
is released on both the happy path and when runCrossFileBindingPropagation
throws. Dev-mode telemetry stays inside the try block before dispose (all
three counters return 0 after dispose clears internal maps).

BindingAccumulator becomes single-use: appendFile after dispose now throws
'BindingAccumulator: use after dispose' instead of silently re-animating
via the old _disposed auto-clear. Docs updated; the only production
construction site (parse-impl) always creates a fresh instance per run,
so no caller relied on the re-use contract.

Residual risk documented in crossFile module JSDoc: a future phase
inserted between parse and crossFile that throws would still leak the
accumulator. Any such phase must manage accumulator lifetime explicitly.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U2)

* docs(pipeline): explain why importCtx teardown is safe before crossFile

Investigation (plan U3) confirms: `importCtx` (ImportResolutionContext)
is a scratch workspace with no downstream consumer after parse.
`resolutionContext` (returned to crossFile) is a distinct object that
owns importMap / namedImportMap / packageMap / moduleAliasMap / model,
and never closes over importCtx. cross-file-impl consumes only that
ctx via processCalls. The two confusingly-similar "context" names
were the root of the adversarial reviewer's concern — comment locks
in the invariant so the next reader sees it.

No behavioral change.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U3)

* refactor(pipeline): remove ctx.totalFiles side-channel; promote to ParseOutput

totalFiles was a hidden mutable field on PipelineContext written by
parse and read by mro/communities/processes — five reviewers flagged
this as a violation of the immutable-context invariant. Removed from
PipelineContext, which is now fully readonly, and made the implicit
temporal dep explicit: mro/communities/processes now declare 'parse'
as a dep and read totalFiles via getPhaseOutput<ParseOutput>(...).

No behavior change. Topo-sort unchanged because parse was already a
transitive dep through crossFile.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U4)

* feat(method-extractor): runtime staticOwnerTypes guard at factory chokepoint

createMethodExtractor now rejects MethodExtractionConfigs that list
companion_object / singleton_class / object_declaration in
typeDeclarationNodes but omit the matching entry from staticOwnerTypes.
Fails loudly at provider construction time instead of producing
silent isStatic=false on the 50000th file analyzed.

Opt-out convention preserved: an explicit `new Set()` (empty Set)
signals intentional exclusion and passes the guard (memory obs #30588).

All 13 existing language configs pass the guard; the new negative test
fails without it. Test-first.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U5)

* fix(pipeline): wrap sequential-fallback in try/finally so cleanup survives throws

The sequential-fallback block in runChunkedParseAndResolve now runs
inside a try/finally that guarantees astCache.clear(), accumulator
finalize, and enrichExportedTypeMap execute even if readFileContents
or processCalls throws mid-fallback. Cleanup failures are caught
inside the finally so they can't mask the original error.

Accumulator disposal ownership remains with crossFile (U2) — U6 only
adds astCache cleanup and preserves finalize ordering on the error
path.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U6)

* test(pipeline): direct unit coverage for wildcard-synthesis and cross-file-impl

Both modules previously had zero direct unit coverage — branches were
exercised only through integration tests' happy paths.

wildcard-synthesis.test.ts covers: Go graph-IMPORTS fallback, Python
moduleAliasMap build, MAX_SYNTHETIC_BINDINGS_PER_FILE cap, dedup
against existing namedImportMap entries, and empty-exportedSymbols
early return.

cross-file-impl.test.ts covers: gapRatio below threshold no-op,
MAX_CROSS_FILE_REPROCESS cap, graph-only exportedTypeMap fallback,
and empty namedImportMap short-circuit.

Tests assert current behavior — any future regression flips them.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U7)

* test(pipeline): golden-file graph-parity regression guard on mini-repo fixture

Pins the current post-P1/P2 graph output (57 symbols, 92 relationships,
4 processes, deterministic edge digest) so future silent refactors
cannot drift behavior unnoticed. If any count changes or any edge
rewires, the test fails with a readable diff listing what changed
and a copy-pasteable UPDATE_GOLDEN=1 regen command.

Edge digest keyed by symbolic (label, name, filePath) triples rather
than raw generateId output — stays meaningful across id-encoding
refactors while still catching real semantic rewiring.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U8)

* fix(pipeline): minimal cycle reporting + resolveEnclosingOwner loop safeguards

U9: runner cycle detection now reports only the SCC members via DFS
back-edge trace ('Cycle detected: A -> B -> C -> A') rather than
everything with inDegree > 0 (which mixed cycle members with blocked
dependents). Also emits the 'error' progress event for graph-
validation failures, symmetric with U1's runtime-error path.

U16: findEnclosingClassInfo now defends against language-provider
hooks that return non-container nodes — visitedContainers Set breaks
repeat-visit loops, MAX_ENCLOSING_WALK_ITERATIONS is belt-and-braces.
Documented the hook contract invariant so future provider authors
know the walk-continues-upward expectation.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U9, U16)

* refactor(pipeline): type hygiene, dead code cleanup, shared allPathSet, graph-sort naming

Bundles plan units U10, U11, U12, U14, U15:

U10 — Type hygiene: readonly ParseOutput arrays (allExtractedRoutes,
allDecoratorRoutes, allToolDefs, allORMQueries, allPaths); removed
redundant 'as string[] | undefined' cast in routes.ts and 'as URL' in
parse-impl.ts; WorkerPool is now 'import type'. Readonly contract
propagated into processORMQueries (only iterates).

U11 — Dead code & shims: deleted constants.ts shim (AST_CACHE_CAP
inlined into its sole real consumer cross-file-impl.ts; isDev
consumers now import directly from ../utils/env.js). Removed internal
utility re-exports from pipeline-phases/index.ts (no external
consumers). Removed topologicalLevelSort re-export from pipeline.ts;
updated topological-sort.test.ts to import from the canonical
utils/graph-sort.js. Stripped 'Phase 3+4:' stale JSDoc from
parse-impl.ts.

U12 — Perf: StructureOutput now carries allPathSet (ReadonlySet<string>)
built once; cobol, markdown, and cross-file-impl consume the shared
set instead of allocating their own. Parse forwards it via
ParseOutput.allPathSet; processCobol/processMarkdown widened to
ReadonlySet<string>.

U14 — graph-sort.ts: renamed local 'inDegree' to
'pendingImportsPerFile' with expanded JSDoc explaining the reverse-
graph Kahn's formulation and warning future maintainers not to
'correct' it to standard in-degree semantics. Added self-edge test.

U15 — Unconditional worker-fallback logging: removed isDev guard on
the worker-pool-creation-failure console.warn so operators can
diagnose perf degradations in production.

No behavior change. U8 golden-file test confirms pipeline output is
byte-identical.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U10, U11, U12, U14, U15)

* docs: fix ARCHITECTURE.md table integrity; bump AGENTS.md/CLAUDE.md to 1.3.0

U13 — documentation fixes:

ARCHITECTURE.md: the prior insertion of the 'Pipeline Phase DAG'
section orphaned 7 rows from the 'Where to change what' header.
Moved those 7 rows back up under their header so the table reads
contiguously; DAG section now follows the completed table.

AGENTS.md + CLAUDE.md: bumped version 1.2.0 -> 1.3.0, updated Last
reviewed to 2026-04-13, added matching Changelog row documenting
the GitNexus index stats refresh after the DAG refactor. Stat
bumps (symbols/relationships/execution flows) that were sitting
uncommitted in the working tree are now landed under a proper
changelog entry per each file's own documented schema.

Plan: docs/plans/2026-04-13-001-fix-pipeline-dag-refactor-review-findings-plan.md (U13)

* refactor(pipeline): drop spurious parse deps, true-readonly ParseOutput.exportedTypeMap, skip redundant wildcard synth

- mro/communities/processes: switch redundant `parse` dep to `structure` —
  totalFiles originates in structure, so depending on parse for it was a
  spurious data dep that obscured the real DAG.
- ParseOutput.exportedTypeMap: typed as truly ReadonlyMap<...,ReadonlyMap>>;
  graph→exports enrichment moved into parse-impl so the snapshot is
  fully populated at parse return. crossFile builds its own local mutable
  working copy for per-file re-resolution writes — no cast at the boundary.
- parse-impl: hasSynthesized flag guards the unconditional final
  synthesizeWildcardImportBindings call when per-chunk/fallback synthesis
  already ran (graph-global + idempotent across chunks).
- cross-file-impl: documented the intentional `phase: 'parsing'` progress
  label so telemetry bucketing stays consistent with the parse phase.
- cross-file-impl test: replaced the now-moved fallback-enrichment
  assertion with a stronger one — crossFile must not mutate the
  parse-supplied map.

Addresses PR abhigyanpatwari#809 review pass 5 carry-overs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Co-authored-by: Gergo Magyar <gergomagyar@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor: move language-specific container node logic into LanguageProvider

2 participants