Skip to content

feat(cfg): intra-procedural REACHING_DEF data-dependence layer (#2082)#2160

Merged
magyargergo merged 20 commits into
abhigyanpatwari:mainfrom
magyargergo:feat/reaching-def-m2
Jun 11, 2026
Merged

feat(cfg): intra-procedural REACHING_DEF data-dependence layer (#2082)#2160
magyargergo merged 20 commits into
abhigyanpatwari:mainfrom
magyargergo:feat/reaching-def-m2

Conversation

@magyargergo

Copy link
Copy Markdown
Collaborator

Implements M2 of Epic A (#2087): intra-procedural reaching definitions — the classic GEN/KILL monotone fixpoint over the M1 per-function CFG — producing statement-granular def→use facts (the substrate the M3 taint engine consumes) and a budgeted, persisted REACHING_DEF edge projection. Closes #2082.

Everything is strictly behind --pdg; a flag-off run stays byte-identical (golden gate green).

What this adds

Layer 2 of the taint-first PDG substrate (L1 CFG merged in #2099):

  • Per-statement def/use harvesting (cfg/visitors/typescript-harvest.ts) — runs in the parse worker where the AST lives, at every block-text entry site (incl. return/throw arguments, expression-bodied arrows, for-in/of heads, switch case tests, catch params, ENTRY params). Two-phase and order-independent: a declaration pre-scan builds the complete lexical scope tree first (the CFG walk is not source-order — finally-before-try, for-init-last — so declare-as-you-walk would mis-key common code). Bindings are table-indexed (name:declLine:declCol identity), so shadowed names never conflate and the serialized payload stays ~2× M1 instead of ~5× (measured). SCHEMA_BUMP 5→6.
  • The solver (cfg/reaching-defs.ts) — pure, deterministic computeReachingDefs(cfg, limits?): RPO worklist, reference-shared def-sets (kill is total per binding, so transfers alias or replace — never deep-copy), throw edges contribute IN∪OUT to handlers (an exception can fire before a block's defs complete), and the canonical intra-block sweep recovers statement granularity from M1's coalesced blocks without re-splitting the CFG. maxFacts bounds the O(defs×uses)-by-spec materialization spike. M3 calls this same function in-phase; the persisted edges are a projection, never the taint substrate.
  • Persistence (cfg/emit.ts + scope-resolution wiring) — facts dedup to (defBlock, useBlock, binding) before budgeting (relationship ids are in-memory-only; finer rows would be byte-identical duplicates), edge ids are content-derived with a single function anchor, reason carries the plain variable name (M0/S1 verdict), and the per-function cap (4000, Joern's prior art) truncates-and-warns naming the top bindings by fact count — both truncation layers warn unconditionally (R7).
  • CFG soundness carried forward from the M1 tri-review:
    • Early exits now route through finally with target-relative threading via finalizer frames interleaved on ControlFlowContext — a break whose loop is wholly inside the try keeps its direct edge (re-routing it would let a finally redefinition falsely kill in-loop defs). Completion edges get dedicated finally-return/break/continue kinds, preserving the "bare jump kind ⟹ source-block terminator" invariant. Threading protocol lives in control-flow-context.ts so future language visitors can't drift on it.
    • Catch params define in their own once-executed block (prepending into a loop-header entry would re-gen per iteration and kill loop-carried redefs).
    • Throw model stays the sound over-approximation (every protected block → handler); per-leader precision deferred to M3 by design.
  • Coherence — the new emit-time cap joins RepoMeta.pdg and pdgModeMismatch now compares the key union structurally, so an M1→M2 upgrade on an already-indexed repo auto-forces the full writeback that populates REACHING_DEF rows (no --force needed), and future knobs join the comparison automatically. The cap is deliberately NOT in the parse-cache chunk key (feat(ingestion): add control-flow-graph layer for TS/JS (#2081) #2099 F3 discipline).
  • detect_changes noise fix — on a --pdg index every edited function used to surface N nameless BasicBlock pseudo-symbols; the hunk-overlap query (and the BM25 filePath fallback) now exclude them by id prefix.
  • Bench gates — two new scenarios: dense-bindings (bindings×blocks scale jointly; catches the per-use-rescan O(n³) class at rd ratio ≥16) and fact-fanout (facts quadratic by spec → gated on boundedness at the fact limit, not linearity), plus an absolute side-channel byte ceiling (a ratio gate is blind to constant-factor encoding bloat). All 5 scenarios pass.

Testing

  • 164 tests across 14 files (unit hazards on hand-built CFGs, parser-direct harvest/walk-order traps, emit/budget/guard, R5 facts snapshot on the extended fixture, pipeline both-sinks, durable-store fact round-trip, real-DB mode-flip + REACHING_DEF row smoke + detect_changes filter).
  • Golden flag-off byte-identity green; bench --check green; tsc clean in both packages; root prettier clean.
  • A 7-angle review pass ran pre-PR; all 8 verified correctness findings fixed in-branch (defKey overflow guard, catch-param loop-header kill, JS class-name defs, intra-statement write-then-read over-approximation, graceful CFG-only degradation on malformed facts, id-prefix exclusion, structural comparator, bench gate self-disable).

Known limitations (documented in-code, M3/M4 scope)

  • Closure/callback flows are invisible in both directions (nested function bodies are opaque) — named in the M3 consumer contract; M4 territory.
  • Member/property writes are not scalar defs (TypeScript-CFA precedent).
  • Shared-finally path conflation and the throw over-approximation add facts, never lose them.

Known residuals (accepted, low-risk)

  • The cfg unit-test files each carry a local parse/collect/cfgOf harness (3 copies) — consolidation into test/helpers/ deferred to avoid late-cycle churn.
  • Pre-existing (not introduced here): the server analyze path forwards no pdg options at all, so a server-triggered re-analysis of a CLI-built --pdg index runs pdg-off and removes the layer per the documented F1 flip semantics; an M1-era binary downgraded onto an M2 stamp would not detect the new field. Both are M3-surface decisions.

Post-Deploy Monitoring & Validation

No production service impact — opt-in CLI/indexer feature, flag-off byte-identical (golden-gated in CI). For dogfooding validation: run gitnexus analyze --pdg, check [reaching-defs] warns in the log (cap/fact-limit truncations name file:line and top bindings), MATCH (:BasicBlock)-[r:CodeRelation {type:'REACHING_DEF'}]->(:BasicBlock) RETURN count(r) > 0, and PROF_SCOPE_RESOLUTION=1 shows the pdg= segment. Rollback = re-run without --pdg (full writeback removes the layer).


Plan: docs/plans/2026-06-10-002-feat-reaching-def-m2-plan.md (4 research lanes + 3-agent deepening + doc review). Next: M3 #2083 — intra-procedural taint, the first reliable-taint ship.

🤖 Generated with Claude Code

…k, class defs, intra-statement reads, graceful fact degradation (abhigyanpatwari#2082)

- reaching-defs: STMT_STRIDE 2^16→2^21 + upfront aliasing bail-out; a use
  that shares its statement with a def now also sees the same-statement def
  (assign-and-test idiom was a taint false negative); drop dead posInOrder
- visitor: catch-param def gets its own once-executed block (prepending into
  a loop-header entry re-genned per iteration and killed loop-carried
  redefs); unresolved-label jumps now thread all active finallys; the
  finalizer-threading protocol moved to control-flow-context as shared
  helpers for future language visitors
- harvest: class declarations def their name (was a bogus use in JS, silent
  skip in TS); class-expression names stay internal
- emit: isEmitSafeCfg adds index==position contiguity; fact validation split
  into hasEmitSafeFacts so malformed facts degrade to CFG-only instead of
  dropping the function's whole CFG layer; facts-per-edge multiplier single
  source; lazy top-binding tally; dead solveMs removed
- run-analyze: pdgModeMismatch compares the key union structurally — new
  resolved knobs join the comparison automatically
- mcp: BasicBlock exclusion via id prefix (NULL-name rows of real symbols
  are no longer dropped) + same filter on the BM25 filePath fallback
- bench: rd ratio denominator clamped (gate no longer self-disables at fast
  small-N); PROF-gated pdg timing in run.ts
@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 16 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2160,"changed_lines":16,"head_sha":"5fd049c99b7b0b77be636a2572ac9ca84eacbb45","run_id":"27308753046","apply_command":"/autofix"}

@magyargergo magyargergo changed the title feat(cfg): M2 — intra-procedural REACHING_DEF data-dependence layer (#2082) feat(cfg): intra-procedural REACHING_DEF data-dependence layer (#2082) Jun 10, 2026
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
11303 11287 0 16 598s

✅ All 11287 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.77% 37315/49243 N/A% 🟢 ███████████████░░░░░
Branches 63.6% 23120/36349 N/A% 🟢 ████████████░░░░░░░░
Functions 81.44% 4021/4937 N/A% 🟢 ████████████████░░░░
Lines 79.49% 33692/42384 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review digest — PR #2160 (M2 REACHING_DEF layer)

Methods & engines: 2-method, ALL-CLAUDE review (Codex unavailable — usage limit; resets Jun 11 06:24 UTC). Lanes: 4 compound-engineering personas (correctness, adversarial — both ran empirical tsx repros against the real visitor+solver; performance, maintainability) + 2 GitNexus lanes (risk, test/CI) which twice ended mid-investigation; the coordinator absorbed their unique scopes directly (stale-test blast sweep executed; non-pdg query blast radius verified against schemas + the PR's own real-DB test). Cross-lane agreement here is "consistent across personas," not independent-engine confirmation.

What's solid (validated, not just unflagged): the adversarial lane constructed and FAILED to break: determinism (5× byte-identical solves; same-line arrows), 50-deep nested try/finally kill chains (semantically exact), throw-before-def seed survival into catch, 3000-block scale (57ms), emit caps/dedup/id idempotency, and a long harvester identity list (sibling-block shadowing, computed-key destructuring, for-await, destructured catch params, tagged templates, arguments). The correctness lane refuted the suspected mergePreds aliasing, worklist accounting drift, and sweep/fixpoint divergence with traces. The finally-threading topology (target-relative, nested chains, labeled continue through switch) held under attack.

Headline findings (coordinator-reproduced)

  1. [P1][reproduced · same-engine] Conditional-context defs are unconditional total kills → taint false negatives on core JS idioms. gitnexus/src/core/ingestion/cfg/visitors/typescript-harvest.ts (walkValue assignment_expression, ~line 398): a def inside a short-circuit RHS or ternary arm (if (a && (x = clean())) {} sink(x); cached ?? (cached = load())) is harvested as a must-def; the solver's total kill then erases the prior def. Reproduced: source→sink fact ABSENT when a falsy keeps source live. Same mechanism covers case-test defs (P3 variant). Fix direction: may-def (gen-without-kill) facts for conditionally-evaluated contexts. (correctness + adversarial lanes each reproduced it — same engine, so convergent rather than independent — and the coordinator re-reproduced it.)
  2. [P1][reproduced · same-engine] Unresolved-label jumps lose the real continuation → false kills. gitnexus/src/core/ingestion/cfg/visitors/typescript.ts (visitBreak fallback): blk: { if (c) break blk; x = 2; } sink(x) routes the break to EXIT, removing the only path that keeps x = 1 live — x=1→sink fact ABSENT. The in-code comment claims this is "sound for dataflow" — it is not (it under-approximates paths). Affects labeled non-loop blocks and the OUTER label of doubly-labeled constructs. Fix direction: model labeled blocks as break-targets (join block + labeled frame), labels as a list. (adversarial reproduced + correctness residual + coordinator.)
  3. [P1][reproduced · same-engine] Throw edges miss intermediate defs in multi-def coalesced blocks. gitnexus/src/core/ingestion/cfg/reaching-defs.ts (throw-pred IN∪OUT): try { x = parse(a); x = normalize(x); } catch { sink(x) }parse→sink fact ABSENT though normalize(x) throwing delivers exactly parse's value. IN∪OUT covers only entry/final states, not "after def_i". Fix direction: throw-pred contribution = IN(from) ∪ ALL def sites of the block (monotone, no serialization change). (correctness reproduced + coordinator.)
  4. [P1][reproduced] CI breakage: stale DEFAULTS in an untouched test suite. gitnexus/test/unit/run-analyze.test.ts:333 — the DEFAULTS constant omits the new maxReachingDefEdgesPerFunction: 4000 field that resolvePdgConfig({pdg:true}) now returns, so its strict toEqual expectations fail. The two failing tests (coordinator ran the file): resolvePdgConfig: caps resolve to their defaults; 0 = unlimited is preserved and explicit defaults compare equal to absent caps (KTD5 normalization). The pending tests CI job will go red. Anchor: the resolvePdgConfig change in gitnexus/src/core/run-analyze.ts.

Also inline-worthy

  1. [P2][adversarial-reproduced / coordinator code-read] Initializer-less var x; harvested as a def — a runtime no-op kills the live def (x = source(); var x; sink(x) loses source→sink). typescript-harvest.ts variable_declaration case: only def when the declarator has a value.
  2. [P2][CI fact] quality/lint fails on the new fixturetest/integration/cfg/fixtures/ten-functions.ts:116,118 prefer-const (the withShadowing let s is never reassigned; reassign or restructure — const would change the binding-kind the test exercises).

Body-only (lower confidence / lower severity)

  • [P2][code-read] resolve() parent-chain walk is O(depth) per identifier — quadratic on deeply-chained single-statement expressions (generated code); a nearest-scope cache per node id at prescan time makes it O(1). No bench scenario covers expression-depth. (performance)
  • [P3] isEmitSafeCfg doesn't validate entryIndex; a corrupt element passes guards and the solver throw costs the whole FILE's REACHING_DEF (caught by run.ts try/catch with a misleading message). (correctness, reproduced by lane)
  • [P3] STMT_STRIDE bail-out reuses status 'truncated' → the emit warn misnames it as the fact-materialization limit. (correctness)
  • [P3] (x) += 1 / (x)++ parenthesized lvalues lose the def (unwrap before the identifier test). (correctness, reproduced by lane)
  • [P3] Same-name var + function-declaration: two bindings for one JS variable + lexical-position kill (pathological input). (adversarial)
  • [P3] Comparator typing: pdgModeMismatch's Record<string,unknown> casts erase type safety; iterating resolvePdgConfig's own keys encodes ownership more directly. (maintainability)
  • [P3] Test-harness triplication across the three cfg unit test files (known residual from the PR description; maintainability concurs not blocking).
  • Perf residuals: dedup string keys / singleton Set allocations are micro at current scale; the documented Map-spine copy (~5× normalized on dense-bindings, budget 10) is a known, benched shape.

Refuted / cleared

mergePreds shared-set mutation; worklist pending drift; sweep/fixpoint OUT divergence; bare for (x of xs) defs; labeled continue through switch+finally; throw-into-finally re-propagation; non-pdg blast radius of the changed MCP queries — [code-read + partial run] the coordinator grepped every node-table DDL (id STRING … PRIMARY KEY (id) on all) and personally ran the PR's real-DB test exercising the STARTS WITH predicate; the empty-result behavior on indexes with no BasicBlock rows is inferred from the query logic, not separately executed; both-undefined pdgModeMismatch fast path; SCHEMA_BUMP lockstep invalidation; stale-test blast sweep (only run-analyze.test.ts affected; the 4 parse-cache suites pass).

Test gaps (for the fix round)

No tests pin: conditional-context defs (1), labeled-block continuation facts (2), multi-def-block throw states (3), initializer-less var (5), entryIndex guard, malformedFactFunctions counter, the finally-* KIND regression (a kind rename to bare jumps would pass current reachability-shaped tests).

CI at review time

quality/lint FAIL (finding 6), tests job will fail (finding 4); typecheck/format/golden PASS; macos/windows/bench jobs pending. PR is BEHIND main by one unrelated merge (MERGEABLE).

Automated multi-lane digest (all-Claude this run — verify before acting). Coverage: full source diff read; snapshots/fixture blobs skimmed.

if (value) this.walkValue(value, acc);
}
return;
case 'assignment_expression': {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1][reproduced · same-engine] A def inside a conditionally-evaluated subexpression (short-circuit RHS, ternary arm) is harvested as an unconditional def, and the solver's total kill then erases the prior def — a taint false negative on core idioms.

Repro (run through the real visitor+solver): function f(a) { let x = source(); if (a && (x = clean())) {} sink(x); } → the only fact is clean→sink; source→sink is missing, though with a falsy that's exactly what executes. Same family: cached ?? (cached = load()), cond ? (x = v) : 0, and the case-test attach in visitSwitch.

Fix direction: harvest defs reached under a conditional context (RHS of &&/||/??, ternary arms, case tests) into a may-def field (mayDefs) with gen-without-kill semantics in the solver. Two lanes converged on this independently of each other (same engine) and the coordinator re-reproduced it.

// execution provably runs every finally between the jump and wherever it
// lands... up to the ones the conservative EXIT routing over-includes).
// Sound for dataflow either way: extra paths, never a bypassed finally.
const { target, finalizers } = res ?? {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1][reproduced · same-engine] The unresolved-label fallback removes the jump's REAL continuation: blk: { if (c) break blk; x = 2; } sink(x) routes the break to EXIT, so the only path keeping x = 1 live disappears — x=1→sink fact missing (false kill). The comment's "sound for dataflow either way" claim is wrong for reaching-defs: routing to EXIT under-approximates paths. Affects labeled non-loop blocks and the OUTER label of doubly-labeled constructs (outer: inner: do {…} while).

Fix direction: model labeled statements generically — a labeled non-loop block pushes a break-target frame whose target is a synthesized join block after the body; carry labels as a list so doubly-labeled loops attach both. The target-relative finally threading then stays exact for these jumps too.

}

// ── adjacency (sorted for deterministic merges) ─────────────────────────
// A `throw` edge contributes IN(from) ∪ OUT(from) to its handler, not just

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1][reproduced · same-engine] IN∪OUT covers only a block's entry and final states — the intermediate defs of a multi-def coalesced block are invisible to the handler. Repro: let x = seed(a); try { x = parse(a); x = normalize(x); } catch (e) { sink(x); } → facts into sink are seed→sink and normalize→sink only; parse→sink is missing, though normalize(x) throwing delivers exactly parse's value. Everyday code shape (x = a(); x = b(x); in a try).

Fix direction: for throw-predecessors contribute IN(from) ∪ all def sites of the block (a static per-block all-defs map) instead of IN∪OUT — monotone, deterministic, no serialization change.

? {
maxFunctionLines: options.pdgMaxFunctionLines ?? DEFAULT_PDG_MAX_FUNCTION_LINES,
maxEdgesPerFunction: options.pdgMaxEdgesPerFunction ?? DEFAULT_MAX_CFG_EDGES_PER_FUNCTION,
maxReachingDefEdgesPerFunction:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1][reproduced] This new resolved field breaks an untouched suite: test/unit/run-analyze.test.ts:333's DEFAULTS constant was not updated, so its strict toEqual expectations fail — 2 tests red on this head (resolvePdgConfig: caps resolve to their defaults… and explicit defaults compare equal to absent caps (KTD5 normalization)); the pending tests CI job will fail. Coordinator ran the file to confirm. A blast sweep of the rest of test/ found no other stale suite (the 4 parse-cache suites pass).

Fix: add maxReachingDefEdgesPerFunction: 4000 to that DEFAULTS and to the 0 = unlimited variant's expectation.

case 'shorthand_property_identifier':
this.use(node, acc);
return;
case 'lexical_declaration':

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2][lane-reproduced / coordinator code-read] An initializer-less var x; declarator is harvested as a def, but at runtime a hoisted bare var mid-function writes nothing — the fabricated def kills the live one: x = source(); var x; sink(x) loses source→sink. Only record the def when the declarator has a value (the let/const case genuinely initializes, so lexical_declaration is fine; the bug is variable_declaration declarators without initializers).

}

export function withShadowing(): void {
let s = 1;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2][CI] quality / lint fails here: prefer-const ×2 (lines 116/118) — the shadowing fixture's let s bindings are never reassigned. Note plain const would change the binding kind the harvest test exercises; reassigning both bindings (e.g. s = s + 1; in each scope) keeps the let semantics and enriches the RD facts the snapshot pins.

@magyargergo

magyargergo commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

All findings from tri-review 4471987625 are fixed in e356803..98f19c3 (one commit per finding) — none deferred:

# Finding Fix
P1-1 Conditional-context defs were unconditional total kills StatementFacts.mayDefs (gen-without-kill) for short-circuit RHS, ternary arms, logical assignment (`x
P1-2 Unresolved-label jumps lost the real continuation Labeled statements modeled generically: label LISTS on loop/switch frames (outer: inner: for resolves both); labeled non-loop statements get a break-target frame with a synthesized join — and labels compose with finalizer threading
P1-3 Throw edges missed intermediate defs of coalesced blocks Throw-preds now contribute IN ∪ allDefs(block) (subsumes OUT); parse→sink forms when normalize(x) throws
P1-4 Stale DEFAULTS in run-analyze.test.ts (2 CI failures) Updated for the M2 stamp field; blast sweep confirmed no other stale suite
P2-5 Bare var x; fabricated a killing def Declarator defs only when initialized (lexical_declaration unchanged)
P2-6 Fixture prefer-const lint errors Both shadowing bindings now genuinely reassign (richer RD facts, same let semantics)
P3s entryIndex guard · stride-bail misdiagnosis · (x)++ lvalues · resolve() depth walk entry/exit join isEmitSafeCfg; distinct overflow status + warn; lvalue unwrapping; prescan nearest-scope cache makes resolution O(scope-chain)

17 new regression tests pin every fix; each finding landed as its own commit per maintainer preference (history rewritten in-place, tree-identical to the reviewed fix state) (each P1's exact repro is now an acceptance test asserting the previously-missing fact). Full validation: 210 tests green in isolation, golden flag-off byte-identity green, bench --check green (fingerprints unchanged — no scenario uses the affected constructs), tsc + prettier clean. Not fixed by design: same-name var+function-declaration identity (pathological input, documented) and the comparator-typing advisory (the mismatch-on-doubt bias is the safe direction).

magyargergo added a commit to magyargergo/GitNexus that referenced this pull request Jun 10, 2026
…ed-statement modeling, throw all-defs, stale test defaults (abhigyanpatwari#2160 review)

Addresses review 4471987625 (all 4 P1s + both P2s + the P3s):

- harvester: defs inside conditionally-evaluated subexpressions (short-circuit
  RHS, ternary arms, logical assignment, switch case tests) are now MAY-defs
  (StatementFacts.mayDefs) — gen without kill; treating them as must-defs
  falsely killed the prior def on the not-taken path (taint false negative on
  the lazy-init idiom). Bare `var x;` declarators no longer fabricate a def
  (runtime no-op). Parenthesized/non-null lvalue wrappers unwrap. resolve()
  now uses a prescan-built nearest-scope cache (O(scope-chain), not
  O(AST-depth) per identifier).
- visitor: labeled statements are modeled generically — loops/switches carry
  their full label LIST (`outer: inner: for` resolves both), labeled
  non-loop statements get a break-target frame with a synthesized join (the
  EXIT fallback removed the real continuation and falsely killed live defs).
- solver: throw-predecessors now contribute IN ∪ allDefs(block) so the
  handler sees INTERMEDIATE defs of multi-def coalesced blocks (OUT's
  last-def-wins missed the def that is live exactly when the next statement
  throws); may-def gen/sweep semantics; distinct 'overflow' status for the
  STMT_STRIDE bail-out (was misreported as the fact limit).
- emit: entry/exit index validation joins isEmitSafeCfg; hasEmitSafeFacts
  validates mayDefs; distinct overflow warn.
- tests: run-analyze.test.ts DEFAULTS updated for the M2 stamp field (2 CI
  failures); fixture prefer-const lint errors fixed by real reassignments;
  regression tests pin every fix (17 new tests).
…g defaults

The DEFAULTS constant lacked the maxReachingDefEdgesPerFunction field that
resolvePdgConfig resolves since the M2 stamp landed, failing two strict
toEqual expectations (the CI 'tests' job failures). Models M2 steady-state
equality; the M1-era-stamp upgrade path stays pinned in pdg-mode-flip.test.ts.

Finding P1-4 of review 4471987625 (abhigyanpatwari#2160).
…onst CI errors

Both withShadowing let bindings now genuinely reassign (s = s + 1 per scope),
clearing the two prefer-const errors that failed quality/lint. Plain const
would change the binding kind the harvest test exercises; reassignment keeps
the let semantics and enriches the reaching-defs facts the snapshot pins
(snapshot + per-binding assertion updated accordingly).

Finding P2-6 of review 4471987625 (abhigyanpatwari#2160).
A corrupted side-channel element with an out-of-range entryIndex passed
isEmitSafeCfg and threw inside the reaching-defs RPO walk — caught by the
per-FILE try/catch, costing every sibling function's REACHING_DEF projection
instead of the one element (and logging a misleading message). entry/exit
join the guard's id-anchor checks.

Finding P3 (entryIndex) of review 4471987625 (abhigyanpatwari#2160).
… status

The STMT_STRIDE aliasing guard reused status 'truncated', so the emit warn
misnamed it as the fact-materialization limit (printing an unrelated maxFacts
value, including '(0)' when unlimited) and telemetry conflated the two. A
distinct 'overflow' status gets its own warn naming the actual cause; the
function's CFG layer is explicitly unaffected.

Finding P3 (stride-bail diagnosis) of review 4471987625 (abhigyanpatwari#2160).
resolve() walked the AST parent chain per identifier — O(expression nesting
depth), quadratic on deeply-chained single-statement expressions in generated
code (not caught by any bench scenario, which scale blocks/bindings, not
expression depth). The prescan already visits every node once, so caching its
innermost scope makes phase-2 resolution O(scope-chain). Behavior-identical;
the parent-chain walk survives as fallback for prescan-unvisited nodes.

Finding P2 (resolve depth walk) of review 4471987625 (abhigyanpatwari#2160).
A bare `var x;` mid-function is hoisted and writes nothing at runtime, but
the harvester recorded a def — fabricating a kill of the live def in the
same block: `x = source(); var x; sink(x)` lost the source→sink fact (a
reaching-defs false negative). Defs now require an initializer for
variable_declaration declarators; let/const genuinely initialize and keep
their def.

Finding P2-5 of review 4471987625 (abhigyanpatwari#2160).
…tection

`(x) += 1` and `(x)++` gated the def on the node type being exactly
'identifier', so the parenthesized form fell to the uses-only branch — the
def (and its kill) silently vanished. Wrappers that don't change the lvalue
(parenthesized_expression, TS non_null_expression) now unwrap at all three
lvalue sites.

Finding P3 (parenthesized lvalues) of review 4471987625 (abhigyanpatwari#2160).
A def inside a short-circuit right operand, ternary arm, logical assignment,
or switch case test was harvested as a must-def; the solver's total kill then
erased the prior def on the not-taken path — a taint false negative on core
idioms (`if (a && (x = clean())) {} sink(x)` lost source→sink;
`cached ?? (cached = load())` likewise). StatementFacts gains an optional
mayDefs field (conditional-context tracking in the harvester); the solver's
per-block GEN carries {set, kills} so a may-def UNIONS into the binding's set
instead of replacing it, in both the transfer and the statement sweep; the
emit fact-guard validates mayDefs indices; switch case tests harvest via the
conditional path.

Finding P1-1 of review 4471987625 (abhigyanpatwari#2160).
… continuation

A break to a label the visitor didn't model (labeled non-loop block, the
OUTER label of a doubly-labeled construct) routed to EXIT, REMOVING the only
path that kept the pre-jump def live — a reaching-defs false kill the in-code
comment wrongly called sound. Loop/switch frames now carry their full label
LIST (`outer: inner: for` resolves both); a labeled non-loop statement gets
a break-target frame whose target is a synthesized join after the body; an
unlabeled break never matches a block frame; labels compose with finalizer
threading (a labeled break crossing a finally still threads it).

Finding P1-2 of review 4471987625 (abhigyanpatwari#2160).
The throw contribution was IN ∪ OUT — entry and final states only. The
intermediate defs of a multi-def coalesced block were invisible to the
handler, though they are exactly what the catch observes when a later
statement throws: `try { x = parse(a); x = normalize(x); } catch { sink(x) }`
lost the parse→sink fact (normalize throwing delivers parse's value). Throw
predecessors now contribute IN(from) ∪ allDefs(from) — a static per-block
all-def-sites map — which subsumes OUT; monotone and deterministic.

Finding P1-3 of review 4471987625 (abhigyanpatwari#2160).
@magyargergo magyargergo force-pushed the feat/reaching-def-m2 branch from 2f8625a to 98f19c3 Compare June 11, 2026 04:34
@magyargergo magyargergo merged commit bde340a into abhigyanpatwari:main Jun 11, 2026
27 of 28 checks passed
@magyargergo magyargergo deleted the feat/reaching-def-m2 branch June 11, 2026 04:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

M2 — Intra-procedural data dependence: REACHING_DEF (GEN/KILL)

1 participant