perf(cpp): index ADL candidates once instead of per-site rescans by magyargergo · Pull Request #1990 · abhigyanpatwari/GitNexus

magyargergo · 2026-06-03T08:23:58Z

Problem

C++ scope-resolution emit dominates large-repo analysis. On a 5,969-file C++ repo the run spent ~6.76h in the scope-resolution emit phase — ~70% of the entire 9.76h analysis:

extract=713ms finalize=66514ms propagate=18839ms resolve=3548ms emit=24347727ms (5969 files)
[scope-resolution:cpp] 5969 files → 28184 IMPORTS + 60760 reference edges (650298 unresolved sites)

resolve (registry lookup) was only 3.5s, so the cost was entirely in fallback edge emission.

Root cause

pickCppAdlCandidates (argument-dependent lookup) runs once per unresolved ADL-eligible call site, and each invocation:

rescans every parsed file (rebuilding a per-file scope map each call),
scans every workspace def via findCppClassDefBySimpleName, and
uses an O(scopes²) child-scope walk for hidden friends.

That is O(unresolved sites × files). With ~650k unresolved C++ sites the emit phase goes super-linear.

Fix

Build an AdlCandidateIndex once per pipeline run (lazy, guarded by parsedFiles identity, reset in clearCppAdlState) and query it per site:

Index	Replaces
`classDefsBySimple`	per-site all-defs scan (preserves `defs.byId` order → identical first-match / ambiguous semantics)
`nsCandidates`	per-site all-files namespace scan (keeps inline-namespace transparency)
`friendCandidates`	per-site hidden-friend scan; a parent→children index removes the O(scopes²) walk
`nsFunctionsByQName` / `nsFunctionsBySimple`	per-site all-files function-reference scan

A monotonic seqByNodeId (file-major; namespace defs before friend/member defs within a file) lets the per-site query merge candidates across associated namespaces, dedup by nodeId, and sort — reproducing the exact legacy candidate set and order.

Per-site cost drops from O(sites × files) to O(associated namespaces); the index build is a single O(scopes + defs) pass.

Results

Benchmark (fixed 80 files, growing unresolved ADL sites), emit phase only:

unresolved sites	before	after
500	87 ms	7 ms
1000	232 ms	9 ms
2000	(≫, linear)	17 ms

emit goes from linear-in-sites to flat. The eliminated term scales with file count, so on the real 5,969-file repo (~75× more files) the speedup is ~1000×+ — the 6.76h emit collapses to the one-time index build plus a few seconds of per-site queries.

Validation

Parity: synthetic candidate output (overload narrowing + hidden-friend resolution) is byte-identical before/after.
270/270 C++ integration resolver tests pass; 4/4 resolver-parity-expected-failures pass.
tsc --noEmit and eslint both clean.

🤖 Generated with Claude Code

Follow-up hardening + benchmarks (post-tri-review)

Addresses the PR's own tri-review and adds committed performance guards. No change to the index's runtime behavior — these are docs, a dev/test-only assertion, and tests/benchmarks.

Docs — corrected the state-lifecycle comment (3→5 state pieces) and the clearCppAdlState caller attribution; documented ensureAdlIndex's full input set + clear-between-passes contract.
?? 0 invariant guard — validateAdlSeqCoverage runs in buildAdlIndex under the resolver's opt-in validation gate (NODE_ENV!=production && VALIDATE_SEMANTIC_MODEL!=0); a broken seq-coverage invariant now throws loudly in dev/CI instead of silently dropping a CALLS edge. Production behavior and the hot path are unchanged.
Parity fixture — cpp-adl-ns-plus-hidden-friend-same-name reaches one callable through each merge bucket (hidden-friend via friendCandidates, namespace member via nsCandidates), so dropping either bucket fails the suite.
ADL emit benchmark (cpp-adl-benchmark.test.ts) — co-scales files and sites so the old O(sites×files) cost is O(N²) and the new is O(N); isolates the scope-resolution emit ms (via the logger test destination) and asserts sub-quadratic scaling. GITNEXUS_BENCH=1, build-free.
C++ pipeline benchmark (cpp-pipeline-benchmark.test.ts) — fills the one missing per-language pipeline benchmark; file-count scaling with a coarse time guard + a deterministic node-ratio guard.

Validation: 272/272 cpp integration tests, 4/4 new unit tests, both benchmarks pass; tsc + eslint clean.

Post-Deploy Monitoring & Validation

No additional operational monitoring required — the change is internal to the C++ analyzer (no runtime/service surface); correctness is covered by the resolver test suite and the new dev-gated invariant guard, and CI runs all of it.

C++ scope-resolution `emit` dominated large-repo analysis (~6.76h on a 5,969-file repo — ~70% of the total run). `pickCppAdlCandidates` ran once per unresolved ADL-eligible call site and each time: - rescanned every parsed file (rebuilding a per-file scope map per call), - scanned every workspace def (`findCppClassDefBySimpleName`), and - used an O(scopes²) child-scope walk for hidden friends. That is O(unresolved sites × files); with hundreds of thousands of unresolved C++ sites the emit phase went super-linear. `resolve` (registry lookup) was only 3.5s — the cost was entirely in fallback edge emission. Build an `AdlCandidateIndex` once per run (lazy, guarded by `parsedFiles` identity, reset in `clearCppAdlState`) and query it per site: - `classDefsBySimple` — preserves `defs.byId` order so first-match / ambiguous semantics are identical to the legacy linear scan. - `nsCandidates` — namespace-owned callables, with inline-namespace transparency. - `friendCandidates` — hidden-friend + class-member callables; a parent→children scope index replaces the O(scopes²) walk. - `nsFunctionsByQName` / `nsFunctionsBySimple` — function-reference ADL path. A monotonic `seqByNodeId` (file-major; namespace defs before friend/member defs within a file) lets the per-site query merge candidates across associated namespaces, dedup by nodeId, and sort — reproducing the exact legacy candidate set and order. Per-site cost drops from O(sites × files) to O(associated namespaces); the emit phase goes from linear-in-sites to flat. Benchmark (files=80): emit at 1000 sites 232ms → 9ms, 2000 sites flat at 17ms; the eliminated term scales with file count, so the speedup is ~1000×+ on the real 5,969-file repo. Behavior is unchanged: synthetic candidate output is byte-identical before/after, all 270 C++ integration resolver tests and 4/4 resolver-parity-expected-failures pass, and tsc + eslint are clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-03T08:24:02Z

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

magyargergo

Tri-review digest — PR #1990 (perf(cpp): index ADL candidates once)

Methods (engine breakdown): 8 reviewers — 7 Claude lanes (gitnexus risk-architect, test-ci-verifier; CE correctness, adversarial, performance, maintainability, testing) + 1 independent engine, Codex. Note: Codex was requested as gpt-5.5 but its runtime resolved to "GPT-5 Codex", not gpt-5.5 — so this is one genuinely-independent engine, not a confirmed gpt-5.5 run. Two of the three "methods" are the same engine (Claude) under different persona prompts, so Claude-only agreement is "consistent across personas," not independent confirmation; the strong signal here is Codex + Claude lanes agreeing there are no blocking bugs.

Verdict: production-ready with minor follow-ups

No lane found a P0/P1. Codex: SHIP (equivalence, cache lifecycle, correctness traps, memory all REFUTED). The remaining items are P2/P3 documentation, defensive-hardening, and test-coverage suggestions — none gate the merge.

Branch hygiene: clean single-purpose fix — one file (adl.ts, +245/−138), no unrelated churn, whitespace clean, no bidi/hidden Unicode. Merge state: MERGEABLE (GitHub mergeStateStatus: BLOCKED reflects pending required checks/review, not a conflict); a few CI checks (quality / format, Build & Push, e2e) were still pending at review time.

What the review validated (credit)

The behavior-preserving claim held up under adversarial scrutiny across both engines:

Equivalence / candidate order — REFUTED as a divergence. Each def is owned by exactly one scope, so seqByNodeId is unique per def; the build's file-major traversal (namespaces before friends per file) reproduces the legacy push order. Notably, correctness + adversarial independently showed candidate order is not even observable in the emitted graph: the sole consumer (free-call-fallback merged-ADL branch) either narrows to a unique survivor (order-independent) or suppresses on >1 (count-based). So order parity is doubly safe.
classDefsBySimple first-match/ambiguous semantics are identical to the legacy scopes.defs.byId linear scan (order preserved).
Inline-namespace transparency, Constructor inclusion split, and the hidden-friend path (now O(scopes) via a parent→children index instead of O(scopes²)) are all faithful to legacy.
Cache lifecycle — REFUTED as stale-data: parsedFiles is a fresh array per runScopeResolution, clearCppAdlState runs in loadResolutionConfig each pass, and the MCP analyze worker is a forked child that exits — no cross-run reuse.
Memory — bounded: the index holds references to defs that already exist; it's dropped per run.
Independent re-run of cpp.test.ts by two lanes reproduced 270/270 passing.

Findings (all non-blocking)

[P3 · inline] Stale state-lifecycle docs. The header lifecycle block (L50–58) and the clearCppAdlState JSDoc (L324) say "Three module-level maps … cleared via clearCppAdlState() (called from clearFileLocalNames)". This PR added two more state vars (adlIndex at L142, adlIndexSource at L143, reset at L330–331), so the count is now stale at five; and the "called from clearFileLocalNames" attribution was already wrong pre-PR — clearCppAdlState is called from cppScopeResolver.loadResolutionConfig (scope-resolver.ts:66), while clearFileLocalNames (file-local-linkage.ts:99) does not call it. A future maintainer adding C++ state may miss resetting the new vars. [code-read] — flagged by risk + maintainability lanes. Fix: update the lifecycle block to list all five state pieces and correct the caller name (and the L324 docstring). (Inline comment anchored at the new reset on L330; the stale prose itself is unchanged context above.)

[P3 · body] ?? 0 seq fallback is dead-defensive. bySeq.set(idx.seqByNodeId.get(def.nodeId) ?? 0, def) (L430): every def in a candidate bucket has its seq assigned in the same build block, so ?? 0 is provably unreachable today (confirmed by Codex + correctness + adversarial + performance). But if a future change ever bucketed a def without a seq, two such defs would collide on key 0 and one would be silently dropped (→ a missed CALLS edge, no error). [code-read] — flagged by 6 of 8 lanes. Optional hardening: replace ?? 0 with a loud assertion/throw, or start seq at 1 and treat 0 as "missing," so a broken invariant fails in tests rather than silently.

[P3 · body] Lazy-cache guard documents only one of its inputs. ensureAdlIndex invalidates on parsedFiles reference identity, but the index is also a function of scopes and the module-level classToNamespaceQualifiedName. Correct for the current pipeline (all built together per pass). A future incremental/watch path that reuses a parsedFiles array with different scopes without clearCppAdlState would serve a stale index. [code-read] — risk + adversarial + performance. Fix: a one-line comment on ensureAdlIndex stating the index is also keyed on scopes/classToNamespaceQualifiedName and callers must clear between such passes.

[P2→P3 · body] Parity guarded only by existing integration tests + an uncommitted harness. The 270 cpp.test.ts cases assert final CALLS edges (downstream of overload narrowing), and the byte-identical candidate-list check lives in a throwaway harness that isn't in the diff. A candidate-set/order regression that doesn't flip a narrowing outcome would pass CI. Given order is non-observable (above), real risk is low, so this is test-hardening, not a blocker. Suggestion (from the testing lane): add a small fixture where one associated namespace has both a namespace-level callable and a same-named hidden friend, asserting the resolved edge — this locks the merge order against future regressions.

CI

Core checks green: Analyze (js-ts, python), CodeQL, typecheck-web, tree-sitter ABI (macOS), scope-parity, tests, gitleaks. Vercel "fail" is deploy-authorization only (not a code failure). A few checks (quality / format, Build & Push, e2e) were still pending at review time.

Coverage

Full — single-file diff (adl.ts, +245/−138), read in its entirety; no partial-coverage caveats.

Automated multi-tool digest (7 Claude persona lanes + Codex). Two-thirds of the lanes share the Claude engine, so weight Codex+Claude agreement over persona consensus. Verify findings against the code before acting.

magyargergo · 2026-06-03T08:53:30Z

  argInfoBySite.clear();
  noAdlSites.clear();
  classToNamespaceQualifiedName.clear();
+  adlIndex = undefined;


[P3 · maintainability/docs] Stale state-lifecycle comment. This PR adds adlIndex/adlIndexSource (reset here at L330-331), which makes the module header block at L50-58 and this function's JSDoc at L324 stale on two counts:

They say "Three module-level maps" — there are now five state pieces.

They say clearCppAdlState() is "called from clearFileLocalNames" — it is actually called from cppScopeResolver.loadResolutionConfig (scope-resolver.ts:66); clearFileLocalNames (file-local-linkage.ts:99) does not call it (this attribution was already inaccurate pre-PR).

A maintainer adding C++ resolver state later may follow the comment and miss resetting the two new vars. Fix: update the L50-58 lifecycle block to list all five state pieces and correct the caller name (and the L324 docstring). [code-read]

github-actions · 2026-06-03T09:16:25Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
10990	10975	0	15	690s

✅ All 10975 tests passed

15 test(s) skipped — expand for details

COBOL pipeline benchmark > scales with file count
C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
C++ pipeline benchmark > scales with file count
C# pipeline benchmark > scales with file count — namespaces spread across the solution
C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
Go pipeline benchmark > scales with file count (workers enabled)
Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
Go structural interface detection benchmark > scales linearly with interface × struct count
Go structural interface detection split-phase benchmark > separates index-build and detection time
PHP pipeline benchmark > scales with file count (workers enabled)
Ruby pipeline benchmark > scales with file count (workers enabled)
Rust pipeline benchmark > scales with file count (workers enabled)
run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	80.33%	38319/47701	79.84%	📈 +0.5	🟢 ████████████████░░░░
Branches	68.89%	24345/35334	68.5%	📈 +0.4	🟢 █████████████░░░░░░░
Functions	85.53%	3993/4668	84.94%	📈 +0.6	🟢 █████████████████░░░
Lines	83.93%	34473/41073	83.36%	📈 +0.6	🟢 ████████████████░░░░

_{📋 View full run · Generated by CI}

The header lifecycle block listed three module-level maps and named clearFileLocalNames as the reset caller; both became inaccurate when the candidate index was added. Enumerate all five state pieces, name the real caller (loadResolutionConfig), and document that ensureAdlIndex's staleness guard keys on parsedFiles identity while the index also depends on scopes and classToNamespaceQualifiedName. Addresses PR abhigyanpatwari#1990 tri-review (U1, U3). Doc-only; no behavior change.

pickCppAdlCandidates sorts merged candidates by seqByNodeId with a `?? 0` fallback. That fallback is unreachable today (every bucketed def is seq-assigned in the same build block), but a future regression could break it and silently collapse two seq-0 candidates, dropping a CALLS edge with no error. Add validateAdlSeqCoverage and run it from buildAdlIndex under the resolver's opt-in validation gate (NODE_ENV!=production && VALIDATE_SEMANTIC_MODEL!=0), so a broken invariant throws loudly in dev/CI instead. Production behavior and the hot path are unchanged. Unit-tested; 270/270 cpp integration tests pass with the guard active. Addresses PR abhigyanpatwari#1990 tri-review (U2).

…merge pickCppAdlCandidates merges friendCandidates (hidden friends of associated classes) and nsCandidates (namespace-owned callables) for a single associated namespace. The byte-identical-parity claim rested only on an uncommitted harness. Add a fixture that reaches one callable through each bucket — combine only via a hidden friend, process only via a namespace member — so dropping either bucket from the merge fails the suite. Candidate order is not observable (narrowing resolves a unique survivor or suppresses), so the guard is on the set. Addresses PR abhigyanpatwari#1990 tri-review (U4).

Guards the PR abhigyanpatwari#1990 optimization against reintroducing the O(sites x files) ADL candidate scan. Generates many UNRESOLVED ADL sites (class-typed arg + a callee declared nowhere) and co-scales files and sites with N, so the old cost is O(N^2) and the new cost O(N). Isolates the scope-resolution emit ms from parse-dominated wall time via the logger test destination (capture verified) and asserts the end-to-end emit ratio stays under fileRatio^1.5. Gated by GITNEXUS_BENCH=1; runs build-free (workerPoolSize: 0). Addresses the benchmark request alongside PR abhigyanpatwari#1990 (U5).

Fills the one missing per-language pipeline benchmark (cobol/csharp/go/php/ ruby/rust already have one); modeled on cobol-pipeline-benchmark.test.ts. Generates synthetic C++ with constant per-file work and constant header fan-out, sweeps file count through the full pipeline, and guards linearity with a coarse time-ratio bound plus a deterministic node-ratio bound (the non-flaky guard against reintroducing O(fileCount^2) work). Gated by GITNEXUS_BENCH=1; runs build-free (workerPoolSize: 0). Addresses the benchmark request alongside PR abhigyanpatwari#1990 (U6).

github-actions · 2026-06-03T10:14:47Z

✨ PR Autofix

Found fixable formatting / unused-import issues across 18 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":1990,"changed_lines":18,"head_sha":"4f5ca77ffe898b2a842603fea0a66b6cec1e7b7a","run_id":"26878272531","apply_command":"/autofix"}

The U4 parity fixture (cpp-adl-ns-plus-hidden-friend-same-name) lives under test/fixtures/lang-resolution/cpp-*, so its lib.h + app.cpp join the cpp scope-capture bench corpus (bench/scope-capture/measure.mjs). That is pure fixture-corpus growth — no scope-extractor change, existing fixtures' captures byte-identical — so the cpp fingerprint legitimately drifts (fixture_count 265->267). Rebaseline cpp to match, as abhigyanpatwari#1965/abhigyanpatwari#1975 did for earlier fixture additions. Verified: --check PASS for all 14 languages. Addresses PR abhigyanpatwari#1990 tri-review (U4 follow-on).

magyargergo linked an issue Jun 3, 2026 that may be closed by this pull request

gitnexus analyze hangs during [Resolving types (Cpp [3/3] - linking symbols] #1973

Closed

Merge branch 'main' into perf/cpp-adl-index-once

28e0507

magyargergo commented Jun 3, 2026

View reviewed changes

magyargergo added 5 commits June 3, 2026 10:12

magyargergo added 2 commits June 3, 2026 10:16

style(cpp): prettier-format adl benchmark

e2ba4b7

magyargergo merged commit f1b8438 into abhigyanpatwari:main Jun 3, 2026
29 of 30 checks passed

magyargergo deleted the perf/cpp-adl-index-once branch June 3, 2026 11:25

This was referenced Jun 3, 2026

fix(php): import decomposition, enum cases, anonymous class scope — F53,F54,F55 (#1931) #1989

Merged

feat(cpp): resolve inheritance-lattice member lookup #2077

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(cpp): index ADL candidates once instead of per-site rescans#1990

perf(cpp): index ADL candidates once instead of per-site rescans#1990
magyargergo merged 9 commits into
abhigyanpatwari:mainfrom
magyargergo:perf/cpp-adl-index-once

magyargergo commented Jun 3, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Jun 3, 2026

Uh oh!

magyargergo left a comment

Uh oh!

magyargergo Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

magyargergo commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Fix

Results

Validation

Follow-up hardening + benchmarks (post-tri-review)

Post-Deploy Monitoring & Validation

Uh oh!

vercel Bot commented Jun 3, 2026

Uh oh!

magyargergo left a comment

Choose a reason for hiding this comment

Tri-review digest — PR #1990 (perf(cpp): index ADL candidates once)

Verdict: production-ready with minor follow-ups

What the review validated (credit)

Findings (all non-blocking)

CI

Coverage

Uh oh!

magyargergo Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Report

Pipeline Status

Test Results

Code Coverage

Tests

Uh oh!

github-actions Bot commented Jun 3, 2026

✨ PR Autofix

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

magyargergo commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading