feat(ingestion): add control-flow-graph layer for TS/JS (#2081)#2099
Conversation
) U1 of M1 (CFG layer). Plain JSON-serializable CFG data model (BasicBlockData/ CfgEdgeData/FunctionCfg — must survive the worker→main boundary + ParsedFile store), a CfgBuilder accumulator (leaders→blocks→edges, synthetic ENTRY/EXIT, idempotent edges), a ControlFlowContext (break/continue/switch + labeled-jump target stacks), and a TraversalResult ({entry, dangling exits}). AST-agnostic and unit-tested on the classic control-flow topologies (if/else, while back-edge, mid-block return, labeled break/continue) the S2 spike validated; reachability helper backs the R9 property test.
…npatwari#2081) Add the TS/JS CfgVisitor that walks a function's tree-sitter AST and drives the U1 CfgBuilder to produce a serializable FunctionCfg. One visitor covers both languages (shared grammar family). Handles the classic CFG hazards explicitly (R2, R10): - loops allocate a dedicated loop-exit block so `break` has a concrete target before the loop's successor is known; `continue`/back-edge close the loop (while, do-while, C-for with init-once + increment-as-continue-target, for-in, for-of) - switch fallthrough falls out naturally: a non-breaking case yields exits we wire to the next case as `fallthrough`; a breaking case wires to the switch exit via ControlFlowContext - try/catch/finally: normal completion AND exceptional flow both route through finally (post-domination); a conservative exceptional edge models that the protected region may raise to its handler (not just explicit `throw`) - labeled break/continue resolve against the labeled loop's frame - early return/throw wire to EXIT/handler and terminate their block 19 hazard tests (one per construct) + AC1 10-function fixture; all green. No change to the committed U1 core or ControlFlowContext. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ence (abhigyanpatwari#2081) Run the CFG visitor in the parse worker (where the AST lives), serialize the per-function CFG onto a new ParsedFile.cfgSideChannel, and keep it coherent across the disk-backed store and the warm/durable parse cache (R3, R4). - gitnexus-shared parsed-file.ts: add `cfgSideChannel?: unknown` as a DISTINCT field from captureSideChannel (different producer/consumer/lifecycle; plain JSON data — blocks/edges deliberately lack the `nodeId` the store's interning reviver keys on, so no mis-interning). - cfg/types.ts + visitors/typescript.ts: add CfgVisitor.isFunction so the worker enumerates functions (and applies the line budget) by a cheap node-type test. - cfg/collect.ts (new): collectFunctionCfgs walks the tree, builds one CFG per function (nested included), applies maxFunctionLines (over-cap = skipped). - language-provider.ts: add `cfgVisitor?: CfgVisitor<SyntaxNode>` hook; typescript.ts attaches it to both the TS and JS providers (shared grammar). - parse-worker.ts: read pdg + pdgMaxFunctionLines from workerData (read once at init — the worker never sees PipelineOptions), gate the build, attach cfgSideChannel alongside captureSideChannel. - parse-cache.ts: bump SCHEMA_BUMP 4→5 (ParsedFile shape changed) and fold the pdg flag into computeChunkHash so a pdg-off cached chunk is NOT reused on a --pdg run (the abhigyanpatwari#2038-class warm-cache trap). Default path keeps its keys. - worker-pool.ts + parse-impl.ts + pipeline.ts: thread pdg/pdgMaxFunctionLines PipelineOptions → WorkerPoolOptions → workerData, and into the chunk-hash key. 9 boundary tests: collect contract, JSON round-trip identity (no AST leakage), the pdg cache-key guard, the line-cap skip, and the no-visitor gate. Full CFG suite (U1+U2+U3) green; build clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2081) Emit persisted BasicBlock nodes + CFG edges from each ParsedFile's worker-built cfgSideChannel, INSIDE scope-resolution's Phase-4 graph emission — the last point where the worker-built CFGs are loaded (emitParsedFiles carries the channel; the disk store is cleared right after the orchestrator returns). This is the architecture the doc-review corrected to: a standalone post-`mro` phase (the issue's literal subtask) provably reads empty data (KTD1). - cfg/emit.ts (new): pure emitFileCfgs(graph, cfgs, maxEdgesPerFunction, onWarn). BasicBlock id = `BasicBlock:<filePath>:<functionStartLine>:<blockIndex>` (KTD3 — funcStart disambiguates blocks across functions in one file; no `name` column). CFG edge = CodeRelation type 'CFG' with the edge KIND (seq/cond-true/…) in `reason` (kinds can't be their own edge type). Per- function edge cap stops at the cap and warns with the dropped count — no silent truncation (R6/KTD6). - run.ts: pdg-gated emit pass over emitParsedFiles after emitPostResolutionEdges (store still live); RunScopeResolutionInput gains pdg + pdgMaxEdgesPerFunction. - phase.ts: thread ctx.options.pdg / pdgMaxEdgesPerFunction into the call. - pipeline.ts: PipelineOptions.pdgMaxEdgesPerFunction. 6 tests: node/edge shape (KTD3 id, no name, type='CFG', kind in reason), cross-function id uniqueness, AC2 reachability-from-ENTRY property, the edge cap's no-silent-truncation contract, and empty-input no-op. Flag-off byte-identity + full runPipelineFromRepo round-trip land in U7. Build clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ks) (abhigyanpatwari#2081) Expose the CFG/PDG substrate as an opt-in and thread it from CLI/.gitnexusrc to the single source of truth (PipelineOptions.pdg), which fans out to BOTH sinks already wired in U3/U4: the worker build gate (workerData.pdg) and the scope-resolution emit gate. Off by default (R7). - cli/index.ts: `--pdg` commander flag. - cli/analyze.ts: AnalyzeOptions.pdg + pass `pdg` into runFullAnalysis options. - cli/analyze-config.ts: KEY_SPECS `pdg` (boolean) so `.gitnexusrc { "pdg": true }` normalizes and a non-boolean value fails closed with GitNexusRcError. - core/run-analyze.ts: AnalyzeOptions.pdg → runPipelineFromRepo({ pdg }). (The internal PipelineOptions/WorkerPoolOptions/workerData fields + the parse-cache key fold landed in U3/U4; this unit adds the user-facing surface. The budget knobs stay at internal defaults for M1.) Tests: analyze-config pdg normalization + non-boolean rejection; opt-in.test.ts covers the CLI/file merge precedence and that pdg perturbs the chunk-dispatch key. The full worker-build + main-emit round-trip is the U7 integration test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ocs (abhigyanpatwari#2081) Acceptance criteria for the M1 CFG layer: - AC1: a 10-function TS fixture's CFG node/edge set matches a committed snapshot (cfg-snapshot.test.ts). - AC2: every BasicBlock is reachable from its function ENTRY (property test over the emitted graph; the fixture has no dead code). - AC3: hazard fixtures lock the classic-bug coverage — try/throw/finally post-domination + labeled break/continue resolution. - AC4: the existing pipeline-graph-golden test stays byte-identical with --pdg off (verified; no UPDATE_GOLDEN), proving the opt-in adds zero default-run drift. - End-to-end (pipeline-pdg.test.ts): runPipelineFromRepo({ pdg: true }) on a tiny repo emits BasicBlock nodes + CFG edges with both endpoints present — the true both-sinks proof (worker builds → store → scope-resolution emits); the default run emits zero. Docs: CHANGELOG M1 entry, ARCHITECTURE "Optional CFG/PDG emission" subsection (why emit is in-phase, not post-mro), README CFG language-support note. Full CFG suite (U1–U7): 56 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wari#2081) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
) Review (10 reviewers) confirmed OFF-path byte-identity (adversarial + golden) and found defects all within the --pdg path. Fixes: - P1 same-line BasicBlock id collision: add a start-column disambiguator to FunctionCfg + the id (`BasicBlock:<file>:<line>:<col>:<idx>`) so two functions sharing a start line no longer collide under first-writer-wins addNode. - P1 worker crash-cascade: per-file try/catch around collectFunctionCfgs so a CFG-build throw cannot escape to the language-group catch and silently drop every remaining file in the group. - P2 edge-cap drop now logs unconditionally (input.onWarn is validator-gated/ silent in prod) — upholds the no-silent-truncation guarantee. - P2 Array.isArray guard before the cfgSideChannel cast in run.ts. - P2 maxFunctionLines default: worker applies DEFAULT_PDG_MAX_FUNCTION_LINES=2000 when unset; caps forwarded through run-analyze AnalyzeOptions (closes the server-path drop). - P3 README duplicate paragraph removed; `0`-vs-default docstrings corrected; CLI --pdg flag made language-neutral; reachableBlocks JSDoc corrected. - Documented the break-through-finally + stacked-label CFG limitations. - Tests: same-line id-collision regression, standalone throw→EXIT, dead-code- after-return, async/generator/method coverage, strengthened labeled-continue. Refuted: the HTTP-500 getNodeQuery finding — M0 already shipped the BasicBlock branch + name-floor (R12/web-safety handled). CFG + analyze-config suites: 95 tests green; golden parity (AC4) byte-identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 11146 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
…ation (abhigyanpatwari#2081) Closes the M1 review's requires_verification perf gap ("no benchmark for collectFunctionCfgs; a wall-time + cfgSideChannel byte-size regression gate would catch the extendBlock concatenation before kernel scale"). - bench/cfg/measure.mjs (new): build-free tsx harness timing collectFunctionCfgs (parse once, reuse the tree) across three scaling scenarios — straight-line (extendBlock path), many-functions (collect walk), branchy (block/edge growth) — at 500→2000. Reports a wall-time scaling ratio AND a cfgSideChannel byte-size ratio, plus an order-independent sha256 over the emitted blocks/edges as the behavior gate. `--check` compares both ratios + the fingerprint against bench/cfg/baselines.json; mirrors the scope-capture / python-scope harnesses. - .github/workflows/ci-tests.yml: run the gate on every test job (build-free, alongside the existing scope-capture guards) so an O(n^2) re-regression fails CI. - cfg-builder.ts: structural fix for the one real hotspot the bench surfaced — accumulate basic-block text as fragments joined once in finish(), instead of concatenating onto a growing string per coalesced statement (O(n^2) → O(n)). Behavior-identical (the CFG fingerprint + the AC1 snapshot are unchanged). Measured (post-fix): time ratios straight-line ~1.3, many-functions ~1.0, branchy ~1.1 (all sub-quadratic; a true O(n^2) would be ~4.0). cfgSideChannel bytes scale linearly (~1.0-1.04). 60 CFG tests green; build clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2081) Extend bench/cfg/measure.mjs beyond wall-time to the two other scalability dimensions that matter at kernel scale: - DISK growth: utf8 byte size of the serialized cfgSideChannel — exactly what a --pdg run writes onto every ParsedFile shard (durable store + parse cache). - MEMORY growth: retained JS heap of the cfgSideChannel payload, measured by the release-delta method (heap held minus heap after dropping it) — robust to pre-existing garbage and dead-stable run-to-run. Needs `node --expose-gc`; without it the heap metric is null and its gate is skipped (local runs still work). ci-tests.yml now passes --expose-gc so the heap gate runs in CI. Both gated on linear scaling in baselines.json (disk_bytes_budget / heap_budget 1.2-1.3). Measured: disk ~1.0-1.04, retained heap ~0.87-1.0 — both linear (~1KB/function each; ~2MB heap / 1.6MB disk at 2000 functions, --pdg only). Bumped REPS 7->15 to stabilize the noisier time signal and widened the coarse time tripwire budgets (the disk/heap gates carry the tight regression detection). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tri-review + CFG-domain-expert lane — M1 CFG layer (#2081)Methods: GitNexus swarm + Compound-Engineering personas + Codex (the one genuinely independent engine — live) + a CFG / program-analysis domain-expert lane (added for completeness). 9 lanes; 2 What's solid (validated across lanes): the OFF-path byte-identity holds (every CFG path gated; pdg-off chunk keys unchanged; Headline findings1. [P1 for the M2 taint consumer · Codex + CFG-expert + correctness] Exceptional 2. [P1 · Codex, independent engine] 3. [P2 · Codex + CFG-expert] Stacked/block labels strand a block. 4. [P2 · Codex + CFG-expert + 3 lanes · DOCUMENTED] 5. [P2 · performance, measured] Benchmark self-review (the new perf gate, under scrutiny)6. [P3 · adversarial, reproduced] The 7. [P3 · adversarial + performance + Codex] Heap gate silently no-ops without 8. [P3 · Codex] Ratio-only gates miss absolute (constant-factor) regressions — a 10× linear blowup keeps ratios ~1.0. Enhancement: add deterministic absolute byte/heap-per-function ceilings. Refuted / disagreement / lower-priority
CI: green except Vercel (deploy-auth, non-blocking). Automated multi-tool digest (3 methods + a domain-expert lane). The author will address the corroborated findings in-PR. Verify before acting. |
…wari#2081) Corroborated findings from the tri-review (Codex + CE personas + GitNexus swarm + a CFG/program-analysis domain-expert lane). The OFF-path stays byte-identical; all fixes are within the --pdg path or the benchmark. - [Codex+CFG-expert] Exceptional `throw` edges now wire EVERY block in a try's protected region to the handler, not just the body ENTRY. A branched try body (`try { if (x) { use(t); } } catch`) previously left interior blocks with no path to `catch` — a taint false-negative into the handler for the M2 PDG pass. - [Codex+CFG-expert] An unresolved labeled jump (a stacked outer label or a labeled non-loop block) now routes to the function EXIT instead of leaving a dangling sink — restores the single-exit invariant post-dominator/PDG computation needs. - [Codex] computeChunkHash now folds pdgMaxFunctionLines/pdgMaxEdgesPerFunction into the chunk key (not just the pdg boolean), so a warm cache built under one cap is never served to a run with a different cap (abhigyanpatwari#2038 class, extended to the budgets). Adds PdgCacheKey; boolean form kept for back-compat. - [perf] visitTry resolves catch/finally in a single namedChild pass (the double `namedChildren.find` allocated two throwaway arrays). - [adversarial] The bench `straight-line` scenario now runs at 2000->8000: output is a constant 4 blocks so disk/heap can't see the concat path, and at the old N a genuine O(n²) was masked by V8 cons-strings. Verified at the new N: the array-join impl ~1.0, a rope-optimized `+=` ~1.0 (correctly not flagged), a real O(n²) (re-join-every-append) ~3.8 — budget tightened 2.0->1.5. - [adversarial+Codex] The bench `--check` now FAILS LOUDLY when run without `--expose-gc` instead of silently skipping the retained-heap gate. - Doc: re-labeled the finally-bypass as a SOUNDNESS (false-negative) limitation tracked for M2, not mere "precision." 3 new regression tests (branched-try interior→handler, stacked-label→EXIT, cap-fold key). 99 CFG tests pass; build clean; bench gate green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve conflict in run-analyze.ts: combine M1 CFG/PDG opt-in options (pdg/pdgMaxFunctionLines/pdgMaxEdgesPerFunction) with main's fetchWrappers option in the runPipelineFromRepo call. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
magyargergo
left a comment
There was a problem hiding this comment.
Multi-lane review digest — PR #2099 (CFG layer for TS/JS, #2081 M1)
Methods & engine breakdown: 6 Claude reviewer lanes (GitNexus swarm + Compound-Engineering personas: correctness, adversarial, performance, maintainability, testing, risk) + direct coordinator verification. Codex was unavailable (quota exhausted) — this is an all-Claude review; cross-lane agreement means "consistent across personas," not independent-engine confirmation. The two headline findings were verified directly by the coordinator (one reproduced against a real tree-sitter parse, one code-read through the full persistence chain). Digest was vetted by an independent synthesis-critic pass before posting.
What's solid (validated, not just absence of findings)
- The
--pdg-off default path is genuinely zero-overhead and byte-identical: gated at worker init (PDG_ENABLED, parse-worker.ts), at emit (input.pdg === true, run.ts), and at the cache key (pdg-off keys verbatim pre-#2081) — pinned by the pre-existing pipeline-graph-golden edgeDigest plus pipeline-pdg.test.ts's flag-off zero-count. Confirmed by 3 lanes + coordinator. - Both P1s from the earlier review round are fixed:
functionStartColumndisambiguates same-line-function BasicBlock ids, and the worker CFG build has a per-file try/catch so one bad file can't drop its language group. - The pdg cache namespace is honored end-to-end (worker shards keyed by the main-thread hash; run-scoped store cleared per run; warm byte-copy keyed identically): a
--pdgrun cannot hit a non-pdg chunk or vice versa. - New
CFGedges are inert in impact's whitelist traversal; BasicBlock read paths were already handled in M0 (api.ts). The merge commit (58f13ad) preserves both parents' changes (verified against each parent). CI builds dist before vitest, and the new bench gate hard-fails on real thresholds (fingerprint + time/disk/heap ratios).
Headline findings (inline comments)
- P1
run-analyze.ts:666—--pdgon an already-indexed repo silently persists ~zero CFG (incremental writeback is pdg-mode-blind). [code-read, full chain verified] - P2
cfg/visitors/typescript.ts:464— emptycatch {}treated as no catch; swallowed-exception flow bypasses the catch and exits the function. [reproduced] - P2
pipeline-phases/parse-impl.ts:750— emit-time-onlymaxEdgesPerFunctionfolded into the parse-cache key → spurious full re-parse on cap change. [code-read; 2 lanes] - P3
scope-resolution/pipeline/run.ts:711— outer-array-only guard; a malformedcfgSideChannelelement still throws mid-graph-build, the exact failure the adjacent comment claims to prevent. [code-read; 2 lanes] - P3
cfg/visitors/typescript.ts:351—forwith a body but no increment: phantomheader→headerloop-back self-edge while the real back-edge is labeledseq. [agent-reproduced + code-read] - P3
storage/parse-cache.ts:162— misleading comment: pdg-off keys are unchanged, but SCHEMA_BUMP 4→5 still cold-invalidates every cache once on upgrade. [code-read]
Lower-priority notes (no inline anchor)
- Alternating pdg/non-pdg runs mutually evict each other's cache + durable store on every flip (end-of-run pruning keeps only this run's keys + sibling-branch keys) — consider unioning the same branch's prior keys. (single lane, not coordinator-verified)
- Empty try body with a catch routes normal flow through the dead catch block (agent-reproduced; rare pattern).
- Degraded re-extract paths (missing durable shards) silently produce per-file CFG holes on
--pdgruns — the existing warnings don't mention CFG loss; consider an emitted-with-CFG counter in the run summary. - Implicit exceptions inside a catch body are not routed to
finally(only explicitthrows; asymmetric vs the try body's blanket handler edges). with_statementis absent fromCONTROL_FLOW_TYPES(areturninsidewithis flattened to fallthrough; sloppy-mode-only exposure).emit.ts:28doc says cap0⇒ "use this default" but the code makes0⇒ unlimited (run.ts's doc is the correct one).CfgEmitResult.cappedFunctionsandCollectedCfgs.skippedare computed but never surfaced in production logs.--pdgships CFG-only in M1 — worth a one-line scope note in the help text (or a--cfgalias) so config authors know the flag will expand.- Perf: blanket try-body throw edges and ~1KB/function
cfgSideChannelare real but bounded and bench-gated; a try-heavy bench scenario would close the one ungated pattern.
Test gaps (consistent across 3 lanes)
- No
--pdg-through-incremental-writeback integration test — would have caught the P1 (index → re-analyze--pdg→ assert BasicBlock count > 0). - No warm-cache cross-flag test (pdg-off warm store → pdg-on run on the same storagePath).
- Visitor shapes untested: return-in-try-with-finally, throw-in-catch-with-finally, nested try, switch default-in-middle, empty shared cases,
for(;;). - No malformed-
cfgSideChannel-element test through run.ts.
Refuted suspicions (probed, no divergence)
Cache poisoning across pdg namespaces; workerData plumbing with absent store paths; JSON-serialization hazards (−0/Infinity/reviver collisions); DB injection via BasicBlock text (CSV + escapeValue paths verified); quadratic collect.ts tree walk; toothless bench gate.
CI
Green at review time except Vercel (deploy-auth, known infra) and the windows/coverage jobs still pending.
Coverage: full 38-file diff read; visitor/builder/emit/cache/worker paths deep-read; persistence chain code-read; no end-to-end --pdg analyze run was executed (worktree build constraint).
Automated multi-lane review digest (all-Claude, Codex unavailable) — verify findings before acting on them.
| workerPoolSize: options.workerPoolSize, | ||
| // CFG/PDG opt-in (#2081 M1). PipelineOptions.pdg fans out to the worker | ||
| // build gate (workerData.pdg) and the scope-resolution emit gate. | ||
| pdg: options.pdg === true, |
There was a problem hiding this comment.
P1 — --pdg silently persists no CFG on an already-indexed repo [code-read, full chain verified]
Trigger: repo already indexed (meta.fileHashes populated) → user runs gitnexus analyze --pdg with no file changes — the natural first use of this flag.
isIncremental(this file, ~line 694) has no pdg term, and RepoMeta records no pdg mode (zeropdgrefs instorage/repo-manager.ts), so the mode flip is invisible.- Workers then pay the full CFG cost (cache misses everywhere — pdg-namespaced keys) and scope-resolution emits BasicBlocks into the in-memory graph…
- …but
extractChangedSubgraph(core/incremental/subgraph-extract.ts:70-85) keeps only nodes whosefilePathis in the changed set (+Community/Process). Withchanged=0, every BasicBlock node is dropped from the written subgraph. The run logsIncremental: changed=0and succeeds with zero BasicBlock rows. - Converse flip (plain
analyzeafter a--pdgindex — e.g. the standard re-index hook):deleteNodesForFileremoves changed files' BasicBlocks, the pdg-off run rewrites none → a zombie mixed-coverage CFG layer only--forcecan clean.
Fix: record the pdg mode (and caps) in RepoMeta and force full writeback (or expand the write set to all CFG-bearing files) on mode flip; at minimum, loudly suggest --force when --pdg hits an incremental-eligible non-pdg index. A small integration test (index → re-analyze --pdg → assert BasicBlock count > 0) would pin this.
| // durable store. A stale or wrong-shape value (e.g. a pre-SCHEMA_BUMP | ||
| // shard that slipped the version gate) must skip emission, not throw a | ||
| // TypeError mid-graph-build and abort scope-resolution for the language. | ||
| if (!Array.isArray(cfgs) || cfgs.length === 0) continue; |
There was a problem hiding this comment.
P3 — guard validates only the outer array; the comment overpromises [code-read; 2 lanes]
The comment above says a stale/wrong-shape value "must skip emission, not throw a TypeError mid-graph-build" — but the check is Array.isArray(cfgs) plus a cast. A malformed element (e.g. [{}]) reaches emitFileCfgs, which destructures and iterates cfg.blocks → TypeError on the main thread, with no per-file catch here (unlike the worker side), aborting the language's scope-resolution pass — exactly the failure the comment claims is defended against. Precondition (a shard that passes the version gate with mismatched element shape) is low-probability today, but M2 will add fields to FunctionCfg and this is the seam where a missed SCHEMA_BUMP turns into a crash.
Fix: per-element shape check (Array.isArray(cfg?.blocks) && Array.isArray(cfg?.edges) && typeof cfg?.filePath === 'string') with a warn-and-skip, or wrap the per-file emitFileCfgs call in try/catch mirroring the worker-side isolation.
…once (abhigyanpatwari#2099 F6) The computeChunkHash comment claimed pdg-off warm caches "survive this change untouched" — true for the key FORMAT, but misleading as an upgrade-behavior promise: SCHEMA_BUMP 4→5 changes PARSE_CACHE_VERSION and both stores hard-invalidate on it. Separate the two facts so the next cache change isn't reasoned about from a false premise. Review finding F6 (P3) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2099 F5) A for with a body but no increment emitted an unconditional header→header 'loop-back' self-edge (a path that never executes the body) while the real back-edge body→header was labeled 'seq'. Any consumer identifying loops via reason='loop-back' picked the phantom edge and excluded the body from the natural loop. Gate the self-edge on the body being absent (the one case where the header genuinely re-tests itself) and carry 'loop-back' on the body's exits when they ARE the back-edge, matching visitWhile/visitForIn. Review finding F5 (P3) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ari#2099 F2) visitTry keyed handler semantics off the traversal result — null for an empty body, since visitSeq([]) returns null — instead of the syntactic clause. An empty `catch {}` was therefore treated as NO catch: the swallowed exception escaped to the outer handler/EXIT, the no-catch re-propagation misfired past finally, and code after a try whose body always throws became unreachable from ENTRY — a hard false-negative source for the M2 taint pass, on an extremely common pattern. Synthesize one empty block spanning the clause (entry == sole exit) when the catch body traverses to null, before the protected region is walked. Exception flow lands in it and rejoins the normal continuation; all downstream wiring (handler selection, finally routing, the !catchRes re-propagation gate) operates on the syntactically-correct shape. Review finding F2 (P2, reproduced) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2099 F4) The cfgSideChannel guard checked only Array.isArray before casting to FunctionCfg[] — its own comment promised a wrong-shape value would 'skip emission, not throw a TypeError mid-graph-build', but a malformed ELEMENT sailed through. Worse, the obvious-looking failure shape never throws at all: emitFileCfgs string-templates any edge endpoint into the BasicBlock id and graph inserts are no-throw, so a non-integer endpoint silently became a dangling 'BasicBlock:…:undefined' edge that degrades the DB rel-pair COPY to row-by-row fallback inserts much later. Layered fix matching house precedents (parsedfile-store reviver, worker-side per-file catch): a per-element shape+content predicate (arrays + integer edge endpoints) that warns and skips malformed elements while valid siblings still emit, plus a per-file try/catch backstop for shapes that genuinely throw (e.g. a null inside blocks). Review finding F4 (P3) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…igyanpatwari#2099 F3) pdgMaxEdgesPerFunction is applied exclusively in emitFileCfgs during scope-resolution on the main thread — the worker never receives it (workerData carries only pdg + pdgMaxFunctionLines), so the cached worker output is byte-identical across cap values. Folding it into the chunk key (added by a prior review round) only converted a free knob into a repo-sized cost: every cap change forced a full re-parse and a durable-store rewrite of unchanged data. Keep pdg + maxFunctionLines (genuinely worker-visible, shape the cached cfgSideChannel) and document the classification test in the PdgCacheKey doc comment so the next option gets sorted deliberately: worker-shard inputs go in this key; persisted-graph-only inputs belong in the RepoMeta pdg stamp (F1). Chunks written under the old ns string miss once and prune — no migration needed. Review finding F3 (P2) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mode flip (abhigyanpatwari#2099 F1) Running --pdg against an already-indexed repo silently persisted ~zero CFG: incremental eligibility had no pdg term, RepoMeta recorded no mode, and extractChangedSubgraph keeps only changed-file nodes — on a no-change --pdg re-run every freshly built BasicBlock was dropped from the written subgraph ('Incremental: changed=0', run succeeds, zero rows). The converse flip left zombie mixed-coverage blocks only --force could clean. Worse, a clean-tree flip hit the alreadyUpToDate fast path and never ran the pipeline at all. - RepoMeta gains an additive-optional pdg stamp ({maxFunctionLines, maxEdgesPerFunction}, resolved values; absent ≡ pdg-off, which covers every legacy meta). No INCREMENTAL_SCHEMA_VERSION bump — that would force a one-time full rebuild for everyone. The end-of-run meta is a fresh literal, so omitting the field on a pdg-off run is what clears the stamp after an on→off flip. - pdgModeMismatch (pure, exported) compares the resolved triple; the flip check sits before the fast path and always logs its notice (not gated on options.force — --skills implies force with no message of its own), naming the .gitnexusrc pdg key that pins the mode. - The full-rebuild branch now writes the incrementalInProgress dirty flag (toWriteCount: 0 sentinel) before the wipe whenever a prior meta exists, mirroring the incremental branch. This closes the crash window where a rebuild dying between the bulk load and saveMeta left meta/DB inconsistent and the fast path certified zombie (or missing) CFG rows indefinitely — and incidentally closes the same pre-existing hole for user --force runs. Recovery log reworded accordingly. Tests: pdg-mode-flip.test.ts (real git + LadybugDB; primary assertion is a direct BasicBlock table count — meta.stats aggregates nondeterministic Community/Process rows) covering off→on, steady-state fast path, on→off zombie cleanup, cap-change rebuild, and dirty-flag + flip composition; pure-helper tests for default resolution and the 0=unlimited carve-out. Review finding F1 (P1) of PR abhigyanpatwari#2099 tri-review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Closes the M1 milestone of Epic #2087 (issue #2081). Populates the M0
BasicBlock/CFGgraph substrate (#2080, PR #2092) with real per-function control-flow graphs for TypeScript & JavaScript, strictly behind an opt-in--pdgflag.Default
analyzeruns are byte-identical to before — verified by the golden-graph parity test (AC4) with the flag off, and independently corroborated by an adversarial review pass.What it does (opt-in
--pdg)ParsedFile.cfgSideChannelas plain data.BasicBlocknodes +CFGedges from that side-channel while the disk-backed ParsedFile store is still live — the architecture the doc-review corrected to (a standalone post-mrophase would read an empty store; see KTD1).--pdgrun (the perf(ingestion): Linux-kernel-scale analysis — worker-pool parse + finalize O(n²) + scope-resolution memory wall (#1983) #2038-class trap), withSCHEMA_BUMP4→5.Control-flow coverage
if/else/else-if,while/do-while/for/for-in/for-of(header + back-edge + exit),switch(case dispatch + fallthrough vsbreak),try/catch/finally(normal and exceptional flow both route throughfinally), labeledbreak/continue, earlyreturn/throw, expression-bodied arrows, async/generator/method bodies.Implementation units
CfgVisitor(one hazard per test)cfgSideChannel+ cache coherence (SCHEMA_BUMP+ pdg-folded chunk key) +LanguageProvider.cfgVisitorhookBasicBlock+CFGemit within scope-resolution (per-function edge cap, no silent truncation)--pdgopt-in plumbing (CLI +.gitnexusrc→ both the worker build gate and the emit gate)Acceptance criteria
BasicBlockreachable from its function ENTRY--pdgoff (byte-identical)runPipelineFromRepo({ pdg: true })emits a queryable CFG; default run emits zeroReview (tri-method, applied in-PR)
A 10-reviewer pass (correctness/adversarial/performance/api-contract/reliability/testing/maintainability/project-standards/agent-native/learnings) confirmed the OFF-path byte-identity and found defects all within the
--pdgpath, all fixed in this PR:BasicBlockid collision → start-column disambiguator (BasicBlock:<file>:<line>:<col>:<idx>)cfgSideChannelcast →Array.isArrayguardmaxFunctionLinesno default →DEFAULT_PDG_MAX_FUNCTION_LINES=2000+ caps forwarded through the server path0-vs-default docstrings, language-named CLI flag, stale JSDocRefuted: an HTTP-500
getNodeQueryfinding — M0 already shipped theBasicBlockquery branch + name-floor (R12/web-safety handled).Known limitations (scoped to M1, documented in code)
break/continue/return) out of atry-with-finallyedges directly to its target rather than routing throughfinallyfirst (conservative under-approximation; sound for a CFG). The general fix duplicatesfinallyper exit path — deferred to when the downstream taint analysis (M2+) needs the precision. Normal completion andthrowdo route throughfinally.outer: inner: for) and break-to-block-label are unmodeled (the jump becomes a CFG sink — a missing edge, never a mis-route). Single-labeled loops/switches resolve correctly.pdgMaxFunctionLines/pdgMaxEdgesPerFunctionhave no CLI flag (plan-scoped to programmatic/server use for M1); reachable viaPipelineOptions+runFullAnalysis.Tests
56 CFG tests (unit + integration) + analyze-config; full suite green locally; golden parity byte-identical.
🤖 Generated with Claude Code