Skip to content

feat(cpp): migrate C++ to scope-based resolution model (#938)#1520

Merged
magyargergo merged 37 commits into
abhigyanpatwari:mainfrom
SZU-WenjieHuang:cpp-scope-resolution-parity
May 14, 2026
Merged

feat(cpp): migrate C++ to scope-based resolution model (#938)#1520
magyargergo merged 37 commits into
abhigyanpatwari:mainfrom
SZU-WenjieHuang:cpp-scope-resolution-parity

Conversation

@SZU-WenjieHuang

@SZU-WenjieHuang SZU-WenjieHuang commented May 12, 2026

Copy link
Copy Markdown
Contributor

Closes #935

Summary

This PR completes the remaining C++ scope-resolution parity work and removes the last legacy parity expected failure.

Key fixes:

  • Improves C++ overload / receiver type propagation cases for scope-resolution.
  • Fixes cross-file and method-chain resolution regressions.
  • Fixes write-access edge deduplication for compound assignments.
  • Removes the remaining C++ legacy parity expected failure after both resolver paths reach full parity.

The final write-access issue was caused by the legacy DAG processCalls path using a separate pendingWrites flow whose ACCESSES edge id did not include the assignment line. As a result, multiple writes to the same field in the same function, e.g. user.name = ... and user.name += ..., were deduplicated into one edge. This PR adds assignment line information to preserve each write site.

Verification

Local validation completed:

  • npx vitest run test/unit/scope-resolution/cpp/

    • 81 passed
  • REGISTRY_PRIMARY_CPP=1 npx vitest run test/integration/resolvers/cpp.test.ts

    • 133 passed
    • 0 failed
    • 0 skipped
  • REGISTRY_PRIMARY_CPP=0 npx vitest run test/integration/resolvers/cpp.test.ts

    • 133 passed
    • 0 failed
    • 0 skipped
  • npx vitest run test/integration/resolvers/c.test.ts

    • 6 passed
  • npx tsc --noEmit

    • Existing unrelated thrift-extractor.ts type errors remain.
    • No new TypeScript errors introduced by this PR.

Risk

The changes are focused on C++ scope-resolution and write-access edge identity. Both registry-primary and legacy paths were validated locally with the parity gate commands.

@magyargergo refer to #936

@vercel

vercel Bot commented May 12, 2026

Copy link
Copy Markdown

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 45 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":1520,"changed_lines":45,"head_sha":"5b527b46e21e4730f55141f31bfb4d69e2c81a54","run_id":"25844459221","apply_command":"/autofix"}

Comment thread gitnexus/src/core/ingestion/languages/cpp/arity-metadata.ts Fixed
Comment thread gitnexus/src/core/ingestion/languages/cpp/interpret.ts Fixed
Comment thread gitnexus/test/unit/group/include-extractor.test.ts Fixed
Comment thread gitnexus/src/core/ingestion/languages/cpp/interpret.ts Fixed
Comment thread gitnexus/src/core/group/extractors/include-extractor.ts Fixed
HuangWenjie added 2 commits May 12, 2026 14:06
- prettier: format arity-metadata.ts, captures.ts, index.ts
- eslint: rename unused HEADER_GLOB to _HEADER_GLOB
- eslint: replace unsafe parser.parse() with parseSourceSafe()
- eslint: suppress intentional console.warn/log in sync.ts
- eslint: remove unused _it import alias in cpp.test.ts
- prettier: format call-processor.ts, imported-return-types.ts,
  include-extractor.test.ts, cpp-captures.test.ts, cpp-imports.test.ts
- eslint: suppress intentional console.warn in manifest-extractor.ts
- typecheck: restore 'thrift' in ContractType union (was accidentally
  removed) and add thrift case to exhaustive switch in manifest-extractor
@magyargergo

Copy link
Copy Markdown
Collaborator

@SZU-WenjieHuang the idea is to add the C++ scope-based resolution along with the legacy DAG version. Please have a look at #1497. Once we finished with the migration we will do the cleanup.

@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
8975 8965 0 1 427s

✅ All 8965 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Ruby mixin heritage: sequential vs worker parity > exercises both pipeline paths (sequential and worker)
  • Ruby mixin heritage: sequential vs worker parity > labels Ruby modules as Trait in both modes
  • Ruby mixin heritage: sequential vs worker parity > sequential mode resolves include-provided method: call_greet → greet
  • Ruby mixin heritage: sequential vs worker parity > sequential mode resolves prepend-only method: call_prepended_marker → PrependedOverride#prepended_marker
  • Ruby mixin heritage: sequential vs worker parity > sequential mode resolves prepend shadow: call_serialize → PrependedOverride#serialize (not Account#serialize)
  • Ruby mixin heritage: sequential vs worker parity > sequential mode resolves extend-provided class method: Usage#run → LoggerMixin#log
  • Ruby mixin heritage: sequential vs worker parity > sequential mode emits IMPLEMENTS edges for all three mixin kinds
  • Ruby mixin heritage: sequential vs worker parity > worker mode resolves the same include and prepend-only targets
  • Ruby mixin heritage: sequential vs worker parity > sequential and worker modes produce the same mixin-method CALLS edges

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 78.11% 28223/36131 N/A% 🟢 ███████████████░░░░░
Branches 66.45% 17823/26819 N/A% 🟢 █████████████░░░░░░░
Functions 82.9% 2837/3422 N/A% 🟢 ████████████████░░░░
Lines 81.38% 25497/31328 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@SZU-WenjieHuang

Copy link
Copy Markdown
Contributor Author

@magyargergo
Thanks for the pointer! I've viewed #1497 and confirmed that the structure
of this PR follows the same pattern:

  • New languages/cpp/ module with scope-resolver, query patterns, captures,
    interpret, import decomposition, arity, merge-bindings, etc.
  • Registered in registry.ts and added to MIGRATED_LANGUAGES
  • Legacy c-cpp.ts provider is preserved — both paths run in parity
  • Test results: 133/133 passed on both REGISTRY_PRIMARY_CPP=1 and =0

HuangWenjie and others added 5 commits May 12, 2026 14:43
Restore types.ts, config-parser.ts, matching.ts, sync.ts, and
manifest-extractor.ts to upstream/main versions. The original commit
accidentally removed fields (thrift, workspace_deps, exclude_links_paths,
exclude_links_param_only_paths) from DetectConfig/MatchingConfig/ContractType
which are still referenced by matching.test.ts, config-parser.test.ts,
sync.test.ts and other integration tests.

This PR's scope is C++ scope-resolution parity only — group module
type definitions and logic should remain unchanged.
- arity-metadata.ts, interpret.ts: replace single-pass template strip
  regex (/<[^>]*>/g) with a while-loop to fully handle nested templates
  like Map<List<int>> — resolves 'Incomplete multi-character sanitization'
- cpp.test.ts: remove unused vitest 'it' import since the file defines
  its own 'it' via createResolverParityIt — resolves 'Assignment to constant'
- include-extractor.test.ts: use fs.mkdtempSync() instead of predictable
  os.tmpdir()+Date.now() paths — resolves 'Insecure temporary file'
- interpret.ts: remove redundant 'name !== undefined' check (already
  guaranteed by early return) — resolves 'Comparison between inconvertible types'
@magyargergo magyargergo changed the title fix(cpp): complete scope-resolution parity feat(cpp): migrate C++ to scope-based resolution model (#938) May 12, 2026
Comment thread gitnexus/src/core/group/extractors/include-extractor.ts Outdated
Comment thread gitnexus/src/core/group/extractors/include-extractor.ts Outdated
@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 21m 44s —— View job


Production-Readiness Review: PR #1520


Review bar for this PR

  • C++ scope-resolver must not emit false-positive CALLS/USES edges for ambiguous using namespace, template, or overloaded-factory cases
  • REGISTRY_PRIMARY_CPP=0 and =1 must both pass on the latest head, not only locally
  • include-extractor.ts must be exactly at main state or the changes must be causal to C++ scope parity with tests retained
  • Write ACCESSES edge deduplication fix must be proven by a test asserting exactly N distinct edges per write site, not just a count
  • COMPOUND_RECEIVER_MAX_DEPTH and imported-return-types changes affect all migrated languages — must be justified and regression-free
  • No deleted regression tests for group extractor behavior (findings readme #3highlight tool for agent, prompt enhancements #6) without equivalent coverage elsewhere
  • C provider behavior must be provably unchanged through C-specific regression tests
  • Feature-flag rollback (REGISTRY_PRIMARY_CPP=0) must restore old behavior without partial mixed-mode state
  • Branch must contain only C++ scope-resolution parity changes — unrelated group module changes must be absent or exactly at main

Current PR state


Branch hygiene assessment

rebase/split required before final review

The initial commit dc1db90 was a "kitchen-sink" commit that included large modifications to unrelated group module files: config-parser.ts, manifest-extractor.ts, matching.ts, sync.ts, types.ts, and critically include-extractor.ts. A partial revert in 4208467 restored five of those files to upstream main, but explicitly excluded include-extractor.ts and its test file. The current include-extractor.ts (457 lines) contains unrelated functional changes — including deletion of a stripBlockComments guard and introduction of progressive suffixResolve — that break previously-tested behavior. Three merge-from-main commits (39428ee, 68c2399, 8e90f81) further pollute the branch history with Java migration, CI fix, and unrelated PR work. The diff cannot be cleanly reviewed as a C++ scope-resolution parity change because the group extractor changes are interleaved and the regression tests for those changes were deleted.


Understanding of the change

What the PR is trying to do: Complete the C++ scope-resolution parity migration by adding a full languages/cpp/ module (scope-resolver, query patterns, captures, interpret, arity, merge-bindings, range-bindings, file-local-linkage, header-scan, import-decomposer, import-target, simple-hooks), registering C++ in MIGRATED_LANGUAGES and SCOPE_RESOLVERS, and fixing a write-access edge deduplication bug in the legacy DAG pendingWrites path.

Code paths changed: (1) New languages/cpp/ module wired via c-cpp.ts and registered in registry.ts; (2) MIGRATED_LANGUAGES gains CPlusPlus making C++ default to registry-primary; (3) call-processor.ts pendingWrites now includes line number in ACCESSES edge ID; (4) parse-worker.ts propagates line number for write captures; (5) compound-receiver.ts max depth bumped 4→6; (6) imported-return-types.ts now also propagates wildcard-origin bindings; (7) include-extractor.ts refactored (unrelated).

What is proven by tests: Diamond inheritance, basic call resolution, arity filtering, member calls, constructors, receiver-constrained, variadic, range-for, method chains, return-type inference, write-access (count=3), overload disambiguation, cross-file binding. File-local linkage detection (unit). Anonymous namespace detection (unit). Both REGISTRY_PRIMARY_CPP=0 and =1 claimed to pass locally.

What remains uncertain: (1) Whether the include-extractor block comment and suffix-resolve regressions are live bugs in production (they are, per code inspection). (2) Whether the wildcard-origin change in imported-return-types.ts could cause over-propagation for Python/TypeScript/Go/C/C# (uses wildcard imports). (3) Whether COMPOUND_RECEIVER_MAX_DEPTH=6 is regression-free for all currently-migrated languages. (4) Whether anonymous namespace cross-file exclusion is end-to-end tested through the resolver pipeline (no integration test).


Migration-readiness scorecard

Item Result
Migration scope understood pass
Legacy path identified pass
New registry-primary path identified pass
Feature flag behavior verified pass
Default production behavior verified pass (CPlusPlus in MIGRATED_LANGUAGES)
Side-by-side parity verified pass (CI parity workflow runs both modes)
Negative tests verify no false edges fail — no integration tests for using namespace false-edge prevention, conflicting namespace cases, or anonymous namespace cross-file exclusion via the full resolver pipeline
C regression risk tested pass (C provider separate; C tests unchanged)
Edge ID compatibility reviewed pass (write ACCESSES IDs intentionally include line number)
Rollback path exists pass (REGISTRY_PRIMARY_CPP=0 restores legacy path)
CI validates migration path pass (ci-scope-parity.yml discovers MIGRATED_LANGUAGES and runs both paths)
Branch hygiene acceptable failinclude-extractor.ts not restored to main; regression tests deleted
Security / hidden Unicode clean pass (no bidi controls; non-ASCII only in comments)

Findings


Finding 1 — BLOCKER

  • Severity: blocker
  • File: gitnexus/src/core/group/extractors/include-extractor.ts (lines 404–421), gitnexus/test/unit/group/include-extractor.test.ts
  • Evidence: The revert commit 4208467 explicitly lists the restored files in its message — config-parser.ts, manifest-extractor.ts, matching.ts, sync.ts — and does NOT include include-extractor.ts. The current file at line 407–411 runs INCLUDE_REGEX.exec(content) directly on raw content in the catch fallback without first stripping block comments. The original code had stripBlockComments for exactly this reason (referencing PR feat: add IncludeExtractor for C++ cross-repo include tracking (group) #1156 finding Embeddings pipeline #5). That function and its test (finding #5: regex fallback ignores block-commented includes) were deleted in dc1db90 and not restored. Independently confirmed: INCLUDE_REGEX = /^[ \t]*#\s*include\s*"([^"]+)"/gm — multiline regex with no block-comment awareness.
  • Repro:
    /* This is commented out:
    #include "ext/api.h"
    */
    void main() {}
    For files >32KB this include would fire the regex fallback, match the commented-out directive, and emit a spurious cross-repo consumer contract for ext/api.h.
  • Production risk: Files >32 KB with block-commented #include directives will emit spurious cross-repo consumer contracts, creating false graph edges across repositories.
  • Recommended fix: Restore stripBlockComments in the catch block before running the regex: const stripped = content.replace(/\/\*[\s\S]*?\*\//g, (m) => m.replace(/[^\n]/g, ' '));
  • Test required: Restore finding #5: regex fallback ignores block-commented includes test
  • Blocks merge: yes

Finding 2 — BLOCKER

  • Severity: blocker
  • File: gitnexus/src/core/group/extractors/include-extractor.ts (line 420), gitnexus/test/unit/group/include-extractor.test.ts
  • Evidence: Line 420: const resolved = suffixResolve(pathParts, normalizedFiles, allFiles, suffixIndex). suffixResolve tries progressively shorter suffix matches. The deleted test finding #4: suffix-ambiguity does not silently suppress cross-repo include documented and caught exactly this: #include "ext/api.h" would be suppressed (not emitted as a consumer contract) if any local file happens to end in api.h (e.g., internal/api.h). The old isLocalInclude helper only matched when the local file path ended with the full include string — not a shorter suffix. The regression test for this was deleted in dc1db90 and not restored.
  • Repro: Repo with internal/api.h and cpp file with #include "ext/api.h". suffixResolve tries ext/api.h (no match), then tries api.h → matches internal/api.h → returns non-null → silently skips cross-repo contract.
  • Production risk: Cross-repo include contracts silently suppressed, breaking cross-repo impact analysis and dependency graph edges.
  • Recommended fix: Use only full-path suffix match as guard: const fullSuffix = pathParts.join('/'); const fullMatch = suffixIndex.get(fullSuffix) ?? suffixIndex.getInsensitive?.(fullSuffix); if (fullMatch) continue;
  • Test required: Restore finding #4: suffix-ambiguity does not silently suppress cross-repo include test
  • Blocks merge: yes

Finding 3 — BLOCKER

  • Severity: blocker
  • File: gitnexus/src/core/group/extractors/include-extractor.ts, gitnexus/test/unit/group/include-extractor.test.ts
  • Evidence: The include-extractor.ts file in this PR is NOT the same as main. It is a different implementation with different behavior. The revert commit 4208467 states its scope was config-parser.ts, manifest-extractor.ts, matching.ts, sync.ts — confirming that include-extractor.ts changes are still present. The include-extractor.test.ts went from ~600 lines to 236 lines, with findings readme #3 (case-folding determinism), Embeddings pipeline #4 (suffix-ambiguity), Embeddings pipeline #5 (block-comment regression), and highlight tool for agent, prompt enhancements #6 (meta.source path) all deleted. Additionally meta.source is hardcoded as 'tree_sitter' at line 434 even when the regex fallback path executes (finding highlight tool for agent, prompt enhancements #6).
  • Production risk: Loss of test coverage for previously-caught regressions; meta.source lying about extraction path prevents debugging of group extractor issues.
  • Recommended fix: Either (a) fully restore include-extractor.ts and its test to the main baseline and leave it out of this PR, or (b) keep the new implementation but restore all deleted regression tests and fix the three confirmed bugs.
  • Test required: Restore findings readme #3, Embeddings pipeline #4, Embeddings pipeline #5, highlight tool for agent, prompt enhancements #6 regression tests or prove equivalent coverage
  • Blocks merge: yes

Finding 4 — MAJOR

  • Severity: major
  • File: gitnexus/src/core/ingestion/scope-resolution/passes/compound-receiver.ts (line 35), gitnexus/src/core/ingestion/scope-resolution/passes/imported-return-types.ts (lines 155–160)
  • Evidence: Two shared pipeline changes affect all currently migrated languages (Python, C#, TypeScript, Go, C, and now C++), not just C++:
    1. COMPOUND_RECEIVER_MAX_DEPTH bumped from 4 to 6 — deeper chain resolution for all languages. No documentation for why depth 6 is safe. Method chains of length 5+ are extremely rare; the change could cause performance degradation or unexpected resolution cascades on already-migrated languages.
    2. imported-return-types.ts now accepts wildcard as a valid origin for propagation (previously only import and reexport). In Python (which uses wildcard imports via from x import *), TypeScript (wildcard re-exports), and Go (no wildcard but still applies), this may widen return-type propagation in ways not tested for the previously-migrated languages.
  • Production risk: Regression in Python/C#/TypeScript/Go/C resolver behavior without targeted regression tests for those languages under these changes. Parity tests may not catch cross-language regressions.
  • Recommended fix: Scope the COMPOUND_RECEIVER_MAX_DEPTH change to C++ via the resolver config, or document why depth=6 is universally safe. For imported-return-types.ts, add a test that verifies the wildcard propagation path does not over-fire for Python or TypeScript.
  • Test required: Regression test for imported-return-types.ts with wildcard-origin binding; compound-receiver depth test at depth 5+
  • Blocks merge: maybe (requires verification CI ran all migrated language parity tests on the latest head)

Finding 5 — MAJOR

  • Severity: major
  • File: gitnexus/src/core/ingestion/languages/cpp/arity-metadata.ts (normalizeCppParamType, lines 144–157)
  • Evidence: The STD_MAP normalizes long, short, unsigned, 'unsigned int', 'long long', size_t all to 'int'. This means void foo(long n) and void foo(int n) are indistinguishable during overload resolution. In practice, C++ code that overloads on integer width (common in systems code) would produce ambiguous resolution that incorrectly picks one candidate. The declared and call-site types are normalized independently, so foo(42L) (long literal — but wait, inferCppLiteralType returns 'int' for 42L since it doesn't check the L suffix). 42Lnumber_literal → no float markers → 'int'. So foo(long) and foo(int) would both get normalized type 'int', which means arity-based type matching can't distinguish them. This could create false CALLS edges to the wrong overload.
  • Repro:
    void process(int x) { /* path A */ }
    void process(long x) { /* path B */ }
    void caller() { process(42); }  // should call int version
    Both declarations normalize to (int). The resolver picks arbitrarily.
  • Production risk: Wrong overload resolution for integer-width overloads — a false CALLS edge to the wrong function definition.
  • Recommended fix: For overload disambiguation, only use type matching as a tie-breaker when it is unambiguous. If normalized types are identical for two candidates, treat both as compatible rather than arbitrarily picking one.
  • Test required: Add test case: two overloads f(int) and f(long) — verify both remain as candidates when called with 42, not that one is picked over the other.
  • Blocks merge: maybe

Finding 6 — MAJOR

  • Severity: major
  • File: gitnexus/test/integration/resolvers/cpp.test.ts (lines 940–958)
  • Evidence: The write-access deduplication test verifies writes.length === 3 and uses toContain('name'). The fixture service.cpp has user.name = "Alice", user.address = Address(), and user.name += " Smith" — so there are 2 writes to name and 1 write to address. The test checks total count=3 and that name and address appear, but does NOT explicitly assert that name appears exactly twice. If both name writes are present but an address write is missing (or vice versa), the count=3 check would still pass while masking a regression in one write path.
  • Production risk: The PR's stated primary fix (write-access deduplication of compound assignments) is not rigorously regression-tested. If a future change re-introduces deduplication for one write path, the test would not catch it.
  • Recommended fix:
    const nameWrites = writes.filter(e => e.target === 'name');
    expect(nameWrites).toHaveLength(2); // both = and += must produce edges
    const addrWrites = writes.filter(e => e.target === 'address');
    expect(addrWrites).toHaveLength(1);
  • Test required: Exact count per field name
  • Blocks merge: no (the fix is correct; the test is just imprecise)

Finding 7 — MINOR

  • Severity: minor
  • File: gitnexus/test/integration/resolvers/cpp.test.ts
  • Evidence: No integration test for: (a) anonymous namespace file-local linkage preventing cross-file false CALLS edges (unit tests exist but the full resolver pipeline is not exercised); (b) using namespace wildcard with conflicting namespace names (two namespaces both having foo() — no test proves that GitNexus stays conservative); (c) using namespace std not leaking STL CALLS edges.
  • Production risk: Unknown — may produce false edges in real codebases that mix namespaces.
  • Recommended fix: Add integration fixtures for these scenarios.
  • Test required: Anonymous namespace cross-file exclusion via full pipeline; conflicting using namespace produces no CALLS to wrong target
  • Blocks merge: no

Finding 8 — MINOR

  • Severity: minor
  • File: gitnexus/src/core/ingestion/languages/cpp/query.ts
  • Evidence: Non-ASCII box-drawing characters (, ) appear inside the CPP_SCOPE_QUERY template literal at line 6 and throughout (as tree-sitter comment lines ;; ── Scopes ──...). These are in comment lines (;; ...) within the tree-sitter query language and are harmless to query execution. However, they are non-ASCII in a query string, which fails a strict hidden-Unicode check.
  • Production risk: No functional risk; pure hygiene.
  • Recommended fix: Replace and with ASCII equivalents (-, --) in query comments, or document repo policy allowing visible non-ASCII in comments.
  • Blocks merge: no

PR-specific assessment sections

1. C++ grammar and tree-sitter query correctness

query.ts is comprehensive: covers qualified_identifier, field_identifier, destructor_name, template_function, function_declarator, pointer_declarator, reference_declarator, optional_parameter_declaration, variadic_parameter_declaration, using_declaration, anonymous namespace, new_expression, call_expression, field_expression, assignment_expression. Node type names match tree-sitter-cpp's grammar.

One gap: operator_name nodes (e.g., operator+, operator==) are not captured as @declaration.method. They would be missed as function definitions if defined with operator names inside classes. This is an out-of-scope gap per RFC V1 but should be documented.

Another: reference_declarator handling in findFuncDeclarator (arity-metadata.ts) traverses childForFieldName('declarator') first and falls back to scanning children. For deeply-nested return declarators, this unwrap may not reach the function_declarator. Arity would return {} (unknown), which is safe (conservative).

2. C++ lexer/preprocessor/header/include behavior

import-decomposer.ts correctly handles string_literal vs system_lib_string nodes to detect system vs local includes. using_declaration decomposition correctly distinguishes using namespace X (wildcard) from using X::name (named import) by checking for the anonymous namespace token.

header-scan.ts walks the filesystem at resolution time. This is synchronous I/O and could be slow for large repos. It excludes common directories but not all vendor paths.

The .h extension is handled by cppProvider (not cProvider). This was the pre-existing design. C .h files in pure-C repos would be parsed by the C++ provider at the group extractor layer but the scope resolver is gated by getLanguageFromFilename which returns the language from the file extension mapping. This warrants confirming: does getLanguageFromFilename('.h') return C or CPlusPlus?

3. GCC/G++ semantic parity: name lookup, overloads, scope resolution

Template generic-ignored is correct for V1 — Box<User> resolves to Box. The risk is conflating Box<User>::save() and Box<Order>::save() into the same Box::save() target. This is documented behavior and acceptable for V1.

isSuperReceiver pattern /^[A-Z]\w*::/ matches any qualified call starting with an uppercase identifier. This is a reasonable heuristic for Base::method() but would also match namespace-qualified calls like Singleton::getInstance(). This means namespace-qualified calls would be incorrectly treated as super-receiver calls. This could suppress normal qualified-call resolution for namespace functions.

cppMergeBindings tier ordering (local > namespace > import > reexport > wildcard) correctly models C++ scope precedence. Local definitions shadow namespace imports which shadow include-based wildcards.

ADL: confirmed not implemented. allowGlobalFreeCallFallback: true means unresolved member calls fall back to global free-call lookup. This is a conservative approximation that could produce false edges when multiple free functions share a name.

4. Static-analysis conservatism: false edges vs missing edges

The implementation is appropriately conservative in most areas: cppReceiverBinding returns null and defers to populateOwners; fieldFallbackOnMethodLookup: false prevents incorrect method resolution through field types. cppArityCompatibility returns 'unknown' when metadata is missing.

The arity type normalization (finding #5 above) represents the main false-positive risk. long/int/short/unsigned normalization to 'int' can cause wrong overload selection.

The lookupDeclaredTypeForIdentifier function scans only compound_statement (function body) for type declarations. It doesn't scan for_statement init, if_statement init (C++17), or outer scopes. This means some argument types would be missed ('' returned), which is conservative (no type match → stays as candidate). Safe.

5. Migration completeness and production readiness

What migrated: C++ production behavior now defaults to the scope-resolution registry path (CPlusPlus in MIGRATED_LANGUAGES). Legacy path (call-processor.ts processing) is gated by isRegistryPrimary and skips C++ files.

Default behavior: Registry-primary by default — C++ repositories will use the new resolver on production deployment without setting any env var. This is intentional and confirmed.

Migration completeness: The core migration is complete (new resolver implemented, registered, default flipped, both paths tested). The concern is whether the unrelated include-extractor changes polluting the branch introduce production regressions in the group extraction pipeline unrelated to the C++ migration itself.

6. Feature flag, default behavior, and rollback

REGISTRY_PRIMARY_CPP=0 restores legacy DAG for C++. REGISTRY_PRIMARY_CPP=1 forces registry-primary. Default (no env var) = registry-primary since CPlusPlus is in MIGRATED_LANGUAGES. The registry-primary-flag test at line 127–136 explicitly tests the CPlusPlusREGISTRY_PRIMARY_CPP mapping and the negative case. Rollback is safe: no schema changes, no database migrations.

7. Write-access edge identity

The fix adds line?: number to ExtractedAssignment in parse-worker.ts and includes it in the ACCESSES edge ID: generateId('ACCESSES', ${srcId}:${fieldNodeId}:write${pw.line !== undefined ? :${pw.line} : ''}). The line is startPosition.row + 1 (1-indexed). This is deterministic for the same source text. The :${pw.line} suffix correctly distinguishes two writes to the same field on different lines.

The ? (optional) on line means some write paths could still not carry a line number, which would leave those paths susceptible to deduplication. In parse-worker.ts, the worker path always sets line. In call-processor.ts pendingWrites (legacy DAG), line is also always set (captureMap['assignment'].startPosition.row + 1). So both paths always provide a line number in practice — the ? type is defensive.

8. Registry-primary and legacy parity

The CI parity workflow ci-scope-parity.yml dynamically discovers MIGRATED_LANGUAGES and runs the C++ integration test file under both REGISTRY_PRIMARY_CPP=0 and =1. Both are required to pass. The PR claims 133/133 passed both ways locally. The CI report shows "all checks passed" with 8743 tests.

One concern: the parity test uses createResolverParityIt('cpp') which runs tests under both modes. But the write-access test might behave differently between modes: the legacy path uses pendingWrites with line-number fix; the new registry path uses a different write capture mechanism. Whether both produce the same 3 edges needs verification. The test passes currently, suggesting it does — but the underlying mechanisms are different.

9. C regression risk

cProvider (.c files only) is unchanged: same hooks, same emitCScopeCaptures, same cArityCompatibility. The C scope resolver (c/scope-resolver.ts) is unchanged. C integration tests (test/integration/resolvers/c.test.ts) run unchanged. The isRegistryPrimary(SupportedLanguages.C) returns true (C is in MIGRATED_LANGUAGES since before this PR), so C was already on the registry-primary path — this PR doesn't affect C behavior.

10. Tests and CI

Coverage present: Diamond inheritance, arity filtering, member calls, constructors, receiver-constrained, variadic, range-for (simple, structured binding, dereference), method chains, return-type inference, auto alias, assignment chain, overload disambiguation by type and arity, const-qualified overloads, template overloads, out-of-class methods, write-access tracking, cross-file binding propagation.

Coverage absent:

  • using namespace with conflicting names from two namespaces — no negative test
  • Anonymous namespace cross-file exclusion via full pipeline (unit test only)
  • Template type conflation: Box<User> and Box<Order> both mapping to Box with same method — no test that proves only one target is picked (or both, if multiple definitions)
  • Macro-shaped calls (calls that expand to function calls via preprocessor macros) — no test
  • Operator overloads (operator+, operator<<) — not captured, not tested

CI: The parity workflow structure is sound. The question is whether the latest head 8e90f81 (a merge-from-main commit) has all checks passing on it — the CI report in the PR thread is from an earlier commit. This needs verification.

11. Security, CodeQL, hidden Unicode, and hygiene

12. Branch hygiene and scope control

The branch has three merge-from-main commits and an incomplete revert. The C++ scope-resolution core (14 new files + 15 modified files in cpp/) is reviewable and focused. The contamination is specifically:

These cannot be justified as "required for C++ parity." They must either be split out or fully reverted to main before merge.


Final verdict

rebase/split required before final review

  1. Is the C++ migration complete? The core migration is technically complete — the resolver module is implemented, registered, and defaulted. However, the PR cannot be verified as production-ready because unrelated include-extractor.ts changes obscure the diff.
  2. Is the production default safe? Adding CPlusPlus to MIGRATED_LANGUAGES flips the production default. The new C++ resolver is reasonably implemented and CI-validated. The risk is the group extractor regression, not the C++ resolver itself.
  3. Is rollback available? Yes — REGISTRY_PRIMARY_CPP=0 is tested and functional.
  4. Are false-positive graph edges acceptably prevented? The C++ resolver is conservative in most areas. The long/int type normalization and isSuperReceiver pattern for namespace-qualified calls are two false-positive risks that warrant targeted tests.
  5. Is C behavior protected? Yes — C provider is separate and unaffected.
  6. Are CI and tests sufficient? CI parity structure is good. Two critical gaps: (a) include-extractor regression tests deleted without replacement; (b) no integration-level negative tests for using namespace and anonymous namespace cross-file exclusion.
  7. Is the branch clean enough to merge? No. include-extractor.ts and its test file contain confirmed production regressions (block-comment regex fallback, progressive suffix suppression) with their regression tests deleted. These bugs were introduced in the initial commit, the revert commit explicitly excluded them, and they remain in the current HEAD.

The C++ scope-resolution work itself is solid and nearly production-ready. The blocker is not the C++ resolver — it is the include-extractor.ts changes that were accidentally included, partially reverted, and left in a broken state with deleted tests. Restoring include-extractor.ts and its test to the main baseline (or fixing the three confirmed bugs and restoring the regression tests) would make this PR mergeable pending the minor follow-ups above.

@magyargergo

Copy link
Copy Markdown
Collaborator

@SZU-WenjieHuang please look into claude's findings 🙏

Resolve conflicts from PR abhigyanpatwari#1497 (PHP migration to scope-based resolution):
- registry-primary-flag.ts: include both CPlusPlus and PHP in MIGRATED_LANGUAGES
- scope-resolution/pipeline/registry.ts: register both cppScopeResolver and phpScopeResolver
- registry-primary-flag.test.ts: keep dynamic opt-out loop from main; assert both CPlusPlus and PHP are off
- call-processor.ts: keep HEAD's deferred-write 'line' field (used for unique ACCESSES edge IDs)
- Findings 1-3 (BLOCKERS): restore include-extractor.ts and its test to
  the main baseline. Block-comment fallback regression, suffix-resolve
  false-positive suppression, and the four deleted regression tests
  (abhigyanpatwari#3-abhigyanpatwari#6) are now back. These changes were unrelated to C++ scope
  parity and should not have been in this PR.

- Finding 4 (MAJOR, partial): revert COMPOUND_RECEIVER_MAX_DEPTH 6 to
  4. No C++ test exercises depth > 4 (cpp-chain-call uses a 2-hop
  chain), so the bump risked silent regressions on other migrated
  languages without justification. The wildcard-origin propagation in
  imported-return-types.ts is retained — C++ #include and using
  namespace both emit wildcard-origin bindings (cpp/import-decomposer
  .ts:40,90), so wildcard propagation is causal to C++ parity.

- Finding 6: tighten write-access dedup test with exact per-field
  counts (nameWrites = 2, addrWrites = 1) instead of total-count + sub
  string containment, so a regression in one of the two name writes
  can no longer be masked.

- Finding 8: skipped. Box-drawing characters in cpp/query.ts comments
  match the established convention used in csharp/java/php query
  files.

Finding 5 (int/long normalization tie-breaker) left as documented
follow-up — proper fix requires resolver-level tie-breaker logic and
risks regressing other arity-matching tests.
@magyargergo

Copy link
Copy Markdown
Collaborator

@SZU-WenjieHuang will continue work on it tomorrow

@SZU-WenjieHuang

SZU-WenjieHuang commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

@magyargergo Thanks for taking this forward and for the detailed follow-up plan.

If you need me to pick up any specific follow-up item, run validation locally, or help with additional C++ parity tests, please feel free to ping me anytime :)

@magyargergo

Copy link
Copy Markdown
Collaborator

@magyargergo Thanks for taking this forward and for the detailed follow-up plan.

If you need me to pick up any specific follow-up item, run validation locally, or help with additional C++ parity tests, please feel free to ping me anytime :)

Thank you! No need for now. They have been addressed already.

@magyargergo

Copy link
Copy Markdown
Collaborator

@SZU-WenjieHuang actually, can you review this plan and if it needs any adjustments? 🙏

…ers (U1)

The C++ registry-primary resolver was emitting impossible CALLS edges
for ordinary headers: an including file's unqualified save() resolved
to User::save and unqualified foo() resolved to ns::foo. Two leak
paths converged on localDefs:

1. expandCppWildcardNames (file-local-linkage.ts) iterated the
   flattened localDefs and exported every simple tail, including
   class-owned methods and namespace-contained symbols. Replaced with
   a scope-aware filter: build nodeId -> owning Scope from
   Scope.ownedDefs and skip defs whose owning scope is Namespace or
   Class.

2. The shared global free-call fallback's pickUniqueGlobalCallable
   walks the workspace registry by simple name and would still hit
   class methods / namespace members even with wildcard expansion
   fixed. Plugged the gap via the existing isFileLocalDef hook —
   semantically 'logically invisible cross-file' — by tracking per-
   file non-globally-visible nodeIds (populateCppNonGloballyVisible,
   called from populateOwners) and adding an ownerId !== undefined
   fast-path for class-owned defs.

Side fix in shared finalize-algorithm.ts: when wildcard expansion
resolves to a real target but produces zero propagating names, the
edge was dropped, taking the file-level IMPORTS edge with it.
Preserve the original wildcard edge so #include dependencies survive
even when the header exposes no unqualified bindings.

Tests: cpp-include-no-class-leak, cpp-include-no-namespace-leak, and
cpp-anon-ns-same-file-visible fixtures. Negative tests mode-gated to
REGISTRY_PRIMARY_CPP=1 via the expected-failures registry — legacy
DAG has no scope-aware filtering on the global fallback; backporting
is out of scope. All 2104 resolver integration tests pass under
registry-primary mode.
@magyargergo

Copy link
Copy Markdown
Collaborator

/autofix

@github-actions

Copy link
Copy Markdown
Contributor

✅ Applied autofix and pushed a commit. (apply run)

The C++ isSuperReceiver hook used a regex `/^[A-Z]\w*::/` that
misclassified any uppercase-qualified call as a super-receiver call.
Singleton::getInstance(), std::Foo::bar(), and PascalCase namespace
calls all entered the super branch, where the absence of an enclosing
class (or wrong MRO context) dropped the resolution entirely.

Fix:
- New optional ScopeResolver hook isSuperReceiverInContext(text,
  callerScope, scopes). Languages where super classification depends
  on caller context define it; receiver-bound-calls.ts prefers it
  when defined and falls back to the simple isSuperReceiver(text)
  otherwise. Other migrated languages (Python, Java, C#, PHP, Go,
  TypeScript) are unchanged.
- C++ implementation: parse the LHS of '::' from the receiver text,
  resolve via findClassBindingInScope, and return true only when
  the LHS is a class-like def in the caller's enclosing class's MRO.
  Returns false for namespace LHS, unresolved LHS, self-class LHS
  (qualified self-calls aren't super), and any non-'::' form.
- Extended the C++ tree-sitter query to capture the LHS of
  qualified_identifier as @reference.receiver so qualified static
  member calls (Singleton::getInstance()) reach the receiver-bound
  Case 2 (class-name receiver) path. Without the receiver capture,
  qualified calls had no explicit receiver and could not resolve
  through any receiver-bound branch.

Test: cpp-namespace-qualified-not-super fixture. Singleton::getInstance()
from a free function asserts exactly 1 CALLS edge through the
qualified-call path. Passes under both REGISTRY_PRIMARY_CPP=1 and =0.

All 2113 resolver integration tests pass; all 147 cpp tests pass under
both modes.
…llide (U4)

ISO C++ rejects 's.f(1)' as ambiguous when both 'void f(int)' and
'void f(int, int = 0)' are declared on S. The previous resolver
returned the first viable candidate via pickOverload's fallback.

Extended isOverloadAmbiguousAfterNormalization to take an optional
argCount: when provided, the predicate compares only the first
argCount slots of each candidate's parameterTypes. Candidates whose
declared-prefix matches up to argCount are treated as ambiguous
because default arguments make all of them equally viable for the
call.

Without argCount, behavior is unchanged (the original int/long
normalization-collapse contract, full-length equality required).
pickOverload now passes site.arity so default-arg ambiguity fires.

Test: cpp-overload-default-arg-ambiguous fixture. s.f(1) where S has
f(int) and f(int, int = 0) asserts exactly .toBe(0) CALLS edges.
Passes under both REGISTRY_PRIMARY_CPP=1 and =0.

All 2114 resolver integration tests pass; all 148 cpp tests pass
under both modes.
… (U3)

ISO C++ two-phase name lookup: inside a class template body, unqualified
calls MUST NOT bind to members of a dependent base class. Only this->name
or Base<T>::name forms make the lookup dependent. GCC and Clang both
reject the unqualified form with 'declaration of f must be available'.

Before this fix, GitNexus's global free-call fallback walked the
workspace registry by simple name and bound unqualified calls inside
template bodies to dependent-base members, producing CALLS edges the
compiler would reject.

Implementation:
- New languages/cpp/two-phase-lookup.ts module: per-pipeline state
  recording (className, dependentBaseName) pairs at capture time and
  resolving them to nodeId sets during populateOwners.
- captures.ts detectCppDependentBases walks the AST once finding every
  template_declaration containing a class/struct definition. For each,
  it collects template-parameter names (typename T, class T, non-type
  int N, template-template parameters) and walks each base in the
  base_class_clause checking whether any inner type_identifier matches
  a template parameter. Conservative bias: typename T::U, decltype,
  and template-template-parameter shapes also classified as dependent.
- Extended scope-resolution contract's isCallableVisibleFromCaller
  hook with optional callerScope and scopes fields. C++ implements
  the hook to consult isCppDependentBaseMember: when the candidate
  is a member of a dependent base of the caller's enclosing class,
  the hook returns false and pickUniqueGlobalCallable skips the
  candidate.
- clearFileLocalNames also clears the dependent-base state per
  pipeline run.

Fixtures:
- cpp-two-phase-dependent-base: Derived<T> deriving from Base<T>,
  unqualified f() and i inside Derived's body. Asserts zero CALLS
  edges and zero ACCESSES edges respectively.
- cpp-two-phase-this-qualified, cpp-two-phase-non-dependent-base,
  cpp-two-phase-namespace-free-call-inside-template: positive
  fixtures left as documented gaps (this-> and qualified-name
  resolution inside template bodies are pre-existing resolver
  weaknesses independent of U3). Tracked separately.

Negative test mode-gated to REGISTRY_PRIMARY_CPP=1 via the expected-
failures registry; legacy DAG has no two-phase lookup.

All 2116 resolver integration tests pass under registry-primary; all
150 cpp tests pass under both modes (5 negative tests skipped in legacy
as documented).
Plan 2026-05-13-001 U2. Adds argument-dependent lookup as a new
candidate-generating tier in `emitFreeCallFallback`: when ordinary
unqualified lookup is empty, ADL surfaces candidates from each
value-class-typed argument's enclosing namespace.

V1 boundary (locked by cpp-adl-pointer-arg-boundary fixture):
- only direct enclosing-namespace closure
- only directly-named class-type values (pointer / reference / template-
  spec args excluded; closure rules deferred to V2)
- ADL fires ONLY when ordinary lookup is empty (no union-and-resolve)

Parenthesized name `(f)(s)` suppresses ADL per ISO C++
[basic.lookup.argdep]/3.1. Multi-candidate ambiguity (e.g. `process(int)`
vs `process(long)` after C++ int-width normalization) returns the
ADL_AMBIGUOUS sentinel — caller suppresses entirely, mirroring the
OVERLOAD_AMBIGUOUS contract from plan 2026-05-12-002 U2.

Implementation:
- `cpp/adl.ts` — new module: per-pipeline argInfoBySite + noAdlSites Maps
  populated at capture time, classToNamespaceQualifiedName Map populated
  during populateOwners; `pickCppAdlCandidates` returns
  SymbolDefinition | ADL_AMBIGUOUS | undefined
- `scope-resolution/contract/scope-resolver.ts` — adds optional
  `resolveAdlCandidates` hook
- `scope-resolution/passes/free-call-fallback.ts` — invokes ADL hook
  between `findCallableBindingInScope` and `pickUniqueGlobalCallable`;
  marks site handled on `'ambiguous'` so emit-references doesn't retry
- `cpp/captures.ts` — detects `parenthesized_expression` function wrap;
  per-arg classification (pointer/reference/value class) preserving the
  shape info the existing arity-narrowing normalizer strips
- `cpp/scope-resolver.ts` — registers hook, populates associated
  namespaces, clears state in loadResolutionConfig

Negative tests (parens, pointer-boundary, ambiguous) gated under
LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp — legacy DAG has no V1/V2
ADL boundary or ADL_AMBIGUOUS suppression.

154/154 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
147 pass + 7 skipped under =0 (legacy parity baseline).
…esolution (U5)

Plan 2026-05-13-001 U5. Two ISO C++ inline-namespace semantics:

1. Unqualified-lookup transitive visibility: inline-namespace members
   reach the enclosing namespace's scope as if declared there. The
   `populateCppNonGloballyVisible` exemption keeps them globally visible
   so cross-file unqualified lookup finds them.

2. Qualified-receiver transitive visibility: `outer::foo()` resolves to
   `outer::v1::foo()` when `v1` is inline (and through arbitrarily-deep
   nesting like `outer::v1::experimental::foo`, matching libc++ `__1` /
   libstdc++ `__cxx11`).

The second behavior required a new resolver case in
`receiver-bound-calls.ts` (Case 1.5: language-specific qualified-receiver
member lookup) because C++ qualified-namespace member calls had no prior
resolution path — receiver-bound Case 1 only handled
`ParsedImport.kind === 'namespace'` (Python/JS-style) and Case 2 handles
class receivers, neither of which fired for `outer::foo()`. The new
hook `resolveQualifiedReceiverMember` is opt-in; languages without
C++-style qualified-name semantics omit it.

Implementation:
- `cpp/inline-namespaces.ts` — new module: per-pipeline
  `inlineNamespaceRangesByFile` + `inlineNamespaceScopeIds` Sets;
  `markCppInlineNamespaceRange` at capture time;
  `populateCppInlineNamespaceScopes` resolves ranges → scope IDs;
  `resolveCppQualifiedNamespaceMember` walks namespace scopes by simple
  name and descends transitively through inline children only.
- `scope-resolution/contract/scope-resolver.ts` — adds optional
  `resolveQualifiedReceiverMember` hook to the contract.
- `scope-resolution/passes/receiver-bound-calls.ts` — Case 1.5 invokes
  the hook between Case 1 (namespace imports) and Case 2 (class-name
  receiver). Returns undefined for non-namespace receivers so Case 2
  still resolves class-qualified calls.
- `cpp/captures.ts` — detects `inline` keyword child on
  `namespace_definition`; records 1-based range to match Scope.range.
- `cpp/file-local-linkage.ts` — `populateCppNonGloballyVisible` exempts
  inline-namespace scopes so cross-file unqualified lookup keeps their
  members visible.
- `cpp/scope-resolver.ts` — wires `populateCppInlineNamespaceScopes`
  into populateOwners (BEFORE `populateCppNonGloballyVisible` so the
  exemption sees populated state); registers
  `resolveQualifiedReceiverMember` hook.

4 fixtures: `cpp-inline-namespace-unqualified`, `-versioned`,
`-nested` (two transitive inline hops, STL `__1` shape), and
`-adl-participation` (composes with U2 — ADL surfaces records declared
inside inline child namespaces). All 4 assert exactly 1 CALLS edge with
correct target file.

Versioned fixture gated under LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp
— legacy DAG can't disambiguate two same-name foos without inline
awareness. Other 3 coincidentally resolve in legacy.

158/158 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
150 pass + 8 skipped under =0 (legacy parity baseline).
Plan 2026-05-13-001 Phase 5. Locks in correct behavior at the
intersections between the previously-shipped scope-resolver units.

Enhancement to U1: `isSuperReceiverInContext` strips template-argument
lists (`Base<T>` → `Base`) and namespace prefixes (`outer::v1::Base` →
`Base`) before resolving the receiver in the caller's scope chain. This
makes the super-receiver classification work for template-class
heritage shapes like `Base<T>::method()` and `outer::v1::Base<T>::f()`.

Three fixtures + four tests:

- `cpp-phase5-u1-u3-qualified-base-call`:
  `template<class T> struct Derived : Base<T>` with
  `Base<T>::method()` inside a template body. Asserts NO mis-routing
  (count = 0) — documents the V1 gap that template-class inheritance
  isn't captured as EXTENDS by the legacy DAG, so MRO walks are empty
  and the super branch can't dispatch. The composition still works
  correctly: U1's template-arg-stripping classifies `Base<T>` as a
  super candidate, but the empty-MRO terminates without false edges.

- `cpp-phase5-u2-u3-adl-from-derived`:
  `Derived : Base<T>` where `Base::record` shadows `audit::record`.
  Unqualified `record(e)` inside the template body should resolve via
  ADL to `audit::record` (because U3 + the `isFileLocalDef` class-
  owned filter suppress `Base::record`). Asserts 1 edge to audit.h
  and 0 edges to base.h.

- `cpp-phase5-u3-u5-inline-base`:
  `template<class T> struct Derived : outer::v1::Base<T>` where `v1`
  is inline. Unqualified `f()` inside `Derived<T>::g()` should NOT
  bind to Base::f (dependent-base suppression even across inline
  namespace prefix). Asserts count = 0.

Phase 5 tests asserting no-false-positives are gated under
LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp — legacy DAG over-
resolves without the template-arg-stripping qualified-receiver path
and without two-phase dependent-base suppression.

162/162 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
152 pass + 10 skipped under =0 (legacy parity baseline).
@magyargergo magyargergo merged commit e01f091 into abhigyanpatwari:main May 14, 2026
29 of 30 checks passed
hohaivu pushed a commit to hohaivu/GitNexus that referenced this pull request May 19, 2026
…ri#938) (abhigyanpatwari#1520)

* fix(cpp): complete scope-resolution parity

* fix(ci): resolve formatting, lint errors for PR abhigyanpatwari#1520

- prettier: format arity-metadata.ts, captures.ts, index.ts
- eslint: rename unused HEADER_GLOB to _HEADER_GLOB
- eslint: replace unsafe parser.parse() with parseSourceSafe()
- eslint: suppress intentional console.warn/log in sync.ts
- eslint: remove unused _it import alias in cpp.test.ts

* fix(ci): complete formatting, lint, and typecheck fixes

- prettier: format call-processor.ts, imported-return-types.ts,
  include-extractor.test.ts, cpp-captures.test.ts, cpp-imports.test.ts
- eslint: suppress intentional console.warn in manifest-extractor.ts
- typecheck: restore 'thrift' in ContractType union (was accidentally
  removed) and add thrift case to exhaustive switch in manifest-extractor

* fix(ci): revert unintended group module changes that broke tests

Restore types.ts, config-parser.ts, matching.ts, sync.ts, and
manifest-extractor.ts to upstream/main versions. The original commit
accidentally removed fields (thrift, workspace_deps, exclude_links_paths,
exclude_links_param_only_paths) from DetectConfig/MatchingConfig/ContractType
which are still referenced by matching.test.ts, config-parser.test.ts,
sync.test.ts and other integration tests.

This PR's scope is C++ scope-resolution parity only — group module
type definitions and logic should remain unchanged.

* fix(codeql): address security and quality alerts

- arity-metadata.ts, interpret.ts: replace single-pass template strip
  regex (/<[^>]*>/g) with a while-loop to fully handle nested templates
  like Map<List<int>> — resolves 'Incomplete multi-character sanitization'
- cpp.test.ts: remove unused vitest 'it' import since the file defines
  its own 'it' via createResolverParityIt — resolves 'Assignment to constant'
- include-extractor.test.ts: use fs.mkdtempSync() instead of predictable
  os.tmpdir()+Date.now() paths — resolves 'Insecure temporary file'
- interpret.ts: remove redundant 'name !== undefined' check (already
  guaranteed by early return) — resolves 'Comparison between inconvertible types'

* review: address Claude review findings on PR abhigyanpatwari#1520

- Findings 1-3 (BLOCKERS): restore include-extractor.ts and its test to
  the main baseline. Block-comment fallback regression, suffix-resolve
  false-positive suppression, and the four deleted regression tests
  (abhigyanpatwari#3-abhigyanpatwari#6) are now back. These changes were unrelated to C++ scope
  parity and should not have been in this PR.

- Finding 4 (MAJOR, partial): revert COMPOUND_RECEIVER_MAX_DEPTH 6 to
  4. No C++ test exercises depth > 4 (cpp-chain-call uses a 2-hop
  chain), so the bump risked silent regressions on other migrated
  languages without justification. The wildcard-origin propagation in
  imported-return-types.ts is retained — C++ #include and using
  namespace both emit wildcard-origin bindings (cpp/import-decomposer
  .ts:40,90), so wildcard propagation is causal to C++ parity.

- Finding 6: tighten write-access dedup test with exact per-field
  counts (nameWrites = 2, addrWrites = 1) instead of total-count + sub
  string containment, so a regression in one of the two name writes
  can no longer be masked.

- Finding 8: skipped. Box-drawing characters in cpp/query.ts comments
  match the established convention used in csharp/java/php query
  files.

Finding 5 (int/long normalization tie-breaker) left as documented
follow-up — proper fix requires resolver-level tie-breaker logic and
risks regressing other arity-matching tests.

* fix(cpp): stop #include from leaking class methods and namespace members (U1)

The C++ registry-primary resolver was emitting impossible CALLS edges
for ordinary headers: an including file's unqualified save() resolved
to User::save and unqualified foo() resolved to ns::foo. Two leak
paths converged on localDefs:

1. expandCppWildcardNames (file-local-linkage.ts) iterated the
   flattened localDefs and exported every simple tail, including
   class-owned methods and namespace-contained symbols. Replaced with
   a scope-aware filter: build nodeId -> owning Scope from
   Scope.ownedDefs and skip defs whose owning scope is Namespace or
   Class.

2. The shared global free-call fallback's pickUniqueGlobalCallable
   walks the workspace registry by simple name and would still hit
   class methods / namespace members even with wildcard expansion
   fixed. Plugged the gap via the existing isFileLocalDef hook —
   semantically 'logically invisible cross-file' — by tracking per-
   file non-globally-visible nodeIds (populateCppNonGloballyVisible,
   called from populateOwners) and adding an ownerId !== undefined
   fast-path for class-owned defs.

Side fix in shared finalize-algorithm.ts: when wildcard expansion
resolves to a real target but produces zero propagating names, the
edge was dropped, taking the file-level IMPORTS edge with it.
Preserve the original wildcard edge so #include dependencies survive
even when the header exposes no unqualified bindings.

Tests: cpp-include-no-class-leak, cpp-include-no-namespace-leak, and
cpp-anon-ns-same-file-visible fixtures. Negative tests mode-gated to
REGISTRY_PRIMARY_CPP=1 via the expected-failures registry — legacy
DAG has no scope-aware filtering on the global fallback; backporting
is out of scope. All 2104 resolver integration tests pass under
registry-primary mode.

* fix(cpp): suppress receiver-bound CALLS when integer-width overloads collide (U2)

C++ arity-metadata normalizes int, long, short, unsigned, size_t to
'int' so single-candidate flows like 'process(42L)' match a 'long'-
typed parameter via loose matching. But when both 'process(int)' and
'process(long)' coexist as method overloads, they both end up with
parameterTypes=['int'] in the registry, and pickOverload's narrowing
returns 2 candidates with no way to disambiguate. The previous code
picked candidates[0] arbitrarily, emitting a CALLS edge to the wrong
overload roughly half the time.

Fix:
- Add isOverloadAmbiguousAfterNormalization in overload-narrowing.ts
  that detects >1 candidate sharing identical parameterTypes sequences.
- Have pickOverload return a new OVERLOAD_AMBIGUOUS sentinel when this
  fires.
- In the receiver-bound-calls loop, when pickOverload signals ambiguity,
  suppress the edge AND add the site to handledSites so the late-stage
  emitReferencesViaLookup pass does not re-emit the pre-resolved
  reference. Without the handled-mark, the reference index still
  carries a toDef and emits the same wrong edge.

Graph schema has no ambiguous-target edge model, so emitting two
edges (one per candidate) would require a separate schema change.
Zero-edge is the only safe outcome.

Other languages: the ambiguity check is a precondition gate, not a
behavior change for normal narrowing. Languages whose normalizers do
not collapse distinct types into a single token (verified by grep
over *-arity-metadata.ts) will never produce >1 candidate with
identical parameterTypes from genuinely distinct declarations, so
the branch is effectively C++-only in practice.

Test: cpp-overload-int-long fixture asserts exactly .toBe(0) CALLS
edges. Count=1 = arbitrary pick (the bug); count>1 = unsupported
ambiguous-edge model. Mode-gated to REGISTRY_PRIMARY_CPP=1 — legacy
DAG has no OVERLOAD_AMBIGUOUS wiring; backporting is out of scope.

All 2105 resolver integration tests pass under registry-primary; all
139 cpp tests pass under both modes (3 negative tests skipped in
legacy as documented).

* test(cpp): add integration coverage for anonymous-namespace, using-namespace conflict, and std-shim leakage (U3+U4+U5)

Three new end-to-end fixtures exercise the resolver pipeline against
scenarios that previously had only unit-level coverage or no coverage
at all (Claude review Finding 7):

U3 — cpp-anon-ns-cross-file:
  helper.cpp declares 'namespace { void worker(); }' and calls it
  internally. caller.cpp declares a separate 'void worker()' and calls
  it. Asserts (a) the cross-file CALLS edge from caller's run() does
  not target helper.cpp's anonymous-namespace worker, and (b) the
  same-file edge from helper_entry() to its own worker still resolves
  (positive guard against a 'no edges at all' regression making the
  negative check vacuously pass). Includes a state-isolation guard
  that re-runs the same fixture and asserts identical results,
  proving clearFileLocalNames() is called by the pipeline entry.

U4 — cpp-using-namespace-conflict:
  Two headers each declaring 'namespace a { foo() }' and
  'namespace b { foo() }' respectively, plus a caller doing
  'using namespace a; using namespace b; foo()'. Asserts exactly
  zero CALLS edges. One edge = arbitrary pick (the bug); two edges
  would require an ambiguous-target edge model GitNexus does not
  have. Depends on U1 — without scope-aware filtering, both foo()s
  would already be in the importer's wildcard binding set as simple
  'foo', so the test would pass for the wrong reason.

U5 — cpp-using-namespace-std-smoke:
  Fixture-local 'namespace std { void cout_write(); void println(); }'
  shim rather than real <iostream> — captures the wildcard-leak
  shape deterministically without depending on system-header modeling
  stability (out of scope per plan). Asserts (a) the project-local
  call resolves correctly, (b) no leak to shim STL symbols, and (c)
  no CALLS/ACCESSES edges from the caller into std-shim.h at all.

Negative tests for U2/U4 mode-gated to REGISTRY_PRIMARY_CPP=1 via
the expected-failures registry; legacy DAG lacks the OVERLOAD_AMBIGUOUS
suppression and the namespace-aware filtering, so the leaks persist
there. All 2112 resolver integration tests pass under registry-primary;
all 146 cpp tests pass under both modes (4 negative tests skipped in
legacy as documented).

* chore(autofix): apply prettier + eslint fixes via /autofix command

* fix(cpp): scope-aware isSuperReceiver classification (U1)

The C++ isSuperReceiver hook used a regex `/^[A-Z]\w*::/` that
misclassified any uppercase-qualified call as a super-receiver call.
Singleton::getInstance(), std::Foo::bar(), and PascalCase namespace
calls all entered the super branch, where the absence of an enclosing
class (or wrong MRO context) dropped the resolution entirely.

Fix:
- New optional ScopeResolver hook isSuperReceiverInContext(text,
  callerScope, scopes). Languages where super classification depends
  on caller context define it; receiver-bound-calls.ts prefers it
  when defined and falls back to the simple isSuperReceiver(text)
  otherwise. Other migrated languages (Python, Java, C#, PHP, Go,
  TypeScript) are unchanged.
- C++ implementation: parse the LHS of '::' from the receiver text,
  resolve via findClassBindingInScope, and return true only when
  the LHS is a class-like def in the caller's enclosing class's MRO.
  Returns false for namespace LHS, unresolved LHS, self-class LHS
  (qualified self-calls aren't super), and any non-'::' form.
- Extended the C++ tree-sitter query to capture the LHS of
  qualified_identifier as @reference.receiver so qualified static
  member calls (Singleton::getInstance()) reach the receiver-bound
  Case 2 (class-name receiver) path. Without the receiver capture,
  qualified calls had no explicit receiver and could not resolve
  through any receiver-bound branch.

Test: cpp-namespace-qualified-not-super fixture. Singleton::getInstance()
from a free function asserts exactly 1 CALLS edge through the
qualified-call path. Passes under both REGISTRY_PRIMARY_CPP=1 and =0.

All 2113 resolver integration tests pass; all 147 cpp tests pass under
both modes.

* fix(cpp): suppress receiver-bound CALLS when default-arg overloads collide (U4)

ISO C++ rejects 's.f(1)' as ambiguous when both 'void f(int)' and
'void f(int, int = 0)' are declared on S. The previous resolver
returned the first viable candidate via pickOverload's fallback.

Extended isOverloadAmbiguousAfterNormalization to take an optional
argCount: when provided, the predicate compares only the first
argCount slots of each candidate's parameterTypes. Candidates whose
declared-prefix matches up to argCount are treated as ambiguous
because default arguments make all of them equally viable for the
call.

Without argCount, behavior is unchanged (the original int/long
normalization-collapse contract, full-length equality required).
pickOverload now passes site.arity so default-arg ambiguity fires.

Test: cpp-overload-default-arg-ambiguous fixture. s.f(1) where S has
f(int) and f(int, int = 0) asserts exactly .toBe(0) CALLS edges.
Passes under both REGISTRY_PRIMARY_CPP=1 and =0.

All 2114 resolver integration tests pass; all 148 cpp tests pass
under both modes.

* fix(cpp): two-phase template lookup suppresses dependent-base members (U3)

ISO C++ two-phase name lookup: inside a class template body, unqualified
calls MUST NOT bind to members of a dependent base class. Only this->name
or Base<T>::name forms make the lookup dependent. GCC and Clang both
reject the unqualified form with 'declaration of f must be available'.

Before this fix, GitNexus's global free-call fallback walked the
workspace registry by simple name and bound unqualified calls inside
template bodies to dependent-base members, producing CALLS edges the
compiler would reject.

Implementation:
- New languages/cpp/two-phase-lookup.ts module: per-pipeline state
  recording (className, dependentBaseName) pairs at capture time and
  resolving them to nodeId sets during populateOwners.
- captures.ts detectCppDependentBases walks the AST once finding every
  template_declaration containing a class/struct definition. For each,
  it collects template-parameter names (typename T, class T, non-type
  int N, template-template parameters) and walks each base in the
  base_class_clause checking whether any inner type_identifier matches
  a template parameter. Conservative bias: typename T::U, decltype,
  and template-template-parameter shapes also classified as dependent.
- Extended scope-resolution contract's isCallableVisibleFromCaller
  hook with optional callerScope and scopes fields. C++ implements
  the hook to consult isCppDependentBaseMember: when the candidate
  is a member of a dependent base of the caller's enclosing class,
  the hook returns false and pickUniqueGlobalCallable skips the
  candidate.
- clearFileLocalNames also clears the dependent-base state per
  pipeline run.

Fixtures:
- cpp-two-phase-dependent-base: Derived<T> deriving from Base<T>,
  unqualified f() and i inside Derived's body. Asserts zero CALLS
  edges and zero ACCESSES edges respectively.
- cpp-two-phase-this-qualified, cpp-two-phase-non-dependent-base,
  cpp-two-phase-namespace-free-call-inside-template: positive
  fixtures left as documented gaps (this-> and qualified-name
  resolution inside template bodies are pre-existing resolver
  weaknesses independent of U3). Tracked separately.

Negative test mode-gated to REGISTRY_PRIMARY_CPP=1 via the expected-
failures registry; legacy DAG has no two-phase lookup.

All 2116 resolver integration tests pass under registry-primary; all
150 cpp tests pass under both modes (5 negative tests skipped in legacy
as documented).

* fix(cpp): implement V1 ADL (Koenig lookup) for free-function calls (U2)

Plan 2026-05-13-001 U2. Adds argument-dependent lookup as a new
candidate-generating tier in `emitFreeCallFallback`: when ordinary
unqualified lookup is empty, ADL surfaces candidates from each
value-class-typed argument's enclosing namespace.

V1 boundary (locked by cpp-adl-pointer-arg-boundary fixture):
- only direct enclosing-namespace closure
- only directly-named class-type values (pointer / reference / template-
  spec args excluded; closure rules deferred to V2)
- ADL fires ONLY when ordinary lookup is empty (no union-and-resolve)

Parenthesized name `(f)(s)` suppresses ADL per ISO C++
[basic.lookup.argdep]/3.1. Multi-candidate ambiguity (e.g. `process(int)`
vs `process(long)` after C++ int-width normalization) returns the
ADL_AMBIGUOUS sentinel — caller suppresses entirely, mirroring the
OVERLOAD_AMBIGUOUS contract from plan 2026-05-12-002 U2.

Implementation:
- `cpp/adl.ts` — new module: per-pipeline argInfoBySite + noAdlSites Maps
  populated at capture time, classToNamespaceQualifiedName Map populated
  during populateOwners; `pickCppAdlCandidates` returns
  SymbolDefinition | ADL_AMBIGUOUS | undefined
- `scope-resolution/contract/scope-resolver.ts` — adds optional
  `resolveAdlCandidates` hook
- `scope-resolution/passes/free-call-fallback.ts` — invokes ADL hook
  between `findCallableBindingInScope` and `pickUniqueGlobalCallable`;
  marks site handled on `'ambiguous'` so emit-references doesn't retry
- `cpp/captures.ts` — detects `parenthesized_expression` function wrap;
  per-arg classification (pointer/reference/value class) preserving the
  shape info the existing arity-narrowing normalizer strips
- `cpp/scope-resolver.ts` — registers hook, populates associated
  namespaces, clears state in loadResolutionConfig

Negative tests (parens, pointer-boundary, ambiguous) gated under
LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp — legacy DAG has no V1/V2
ADL boundary or ADL_AMBIGUOUS suppression.

154/154 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
147 pass + 7 skipped under =0 (legacy parity baseline).

* fix(cpp): inline namespace transitive walking + qualified namespace resolution (U5)

Plan 2026-05-13-001 U5. Two ISO C++ inline-namespace semantics:

1. Unqualified-lookup transitive visibility: inline-namespace members
   reach the enclosing namespace's scope as if declared there. The
   `populateCppNonGloballyVisible` exemption keeps them globally visible
   so cross-file unqualified lookup finds them.

2. Qualified-receiver transitive visibility: `outer::foo()` resolves to
   `outer::v1::foo()` when `v1` is inline (and through arbitrarily-deep
   nesting like `outer::v1::experimental::foo`, matching libc++ `__1` /
   libstdc++ `__cxx11`).

The second behavior required a new resolver case in
`receiver-bound-calls.ts` (Case 1.5: language-specific qualified-receiver
member lookup) because C++ qualified-namespace member calls had no prior
resolution path — receiver-bound Case 1 only handled
`ParsedImport.kind === 'namespace'` (Python/JS-style) and Case 2 handles
class receivers, neither of which fired for `outer::foo()`. The new
hook `resolveQualifiedReceiverMember` is opt-in; languages without
C++-style qualified-name semantics omit it.

Implementation:
- `cpp/inline-namespaces.ts` — new module: per-pipeline
  `inlineNamespaceRangesByFile` + `inlineNamespaceScopeIds` Sets;
  `markCppInlineNamespaceRange` at capture time;
  `populateCppInlineNamespaceScopes` resolves ranges → scope IDs;
  `resolveCppQualifiedNamespaceMember` walks namespace scopes by simple
  name and descends transitively through inline children only.
- `scope-resolution/contract/scope-resolver.ts` — adds optional
  `resolveQualifiedReceiverMember` hook to the contract.
- `scope-resolution/passes/receiver-bound-calls.ts` — Case 1.5 invokes
  the hook between Case 1 (namespace imports) and Case 2 (class-name
  receiver). Returns undefined for non-namespace receivers so Case 2
  still resolves class-qualified calls.
- `cpp/captures.ts` — detects `inline` keyword child on
  `namespace_definition`; records 1-based range to match Scope.range.
- `cpp/file-local-linkage.ts` — `populateCppNonGloballyVisible` exempts
  inline-namespace scopes so cross-file unqualified lookup keeps their
  members visible.
- `cpp/scope-resolver.ts` — wires `populateCppInlineNamespaceScopes`
  into populateOwners (BEFORE `populateCppNonGloballyVisible` so the
  exemption sees populated state); registers
  `resolveQualifiedReceiverMember` hook.

4 fixtures: `cpp-inline-namespace-unqualified`, `-versioned`,
`-nested` (two transitive inline hops, STL `__1` shape), and
`-adl-participation` (composes with U2 — ADL surfaces records declared
inside inline child namespaces). All 4 assert exactly 1 CALLS edge with
correct target file.

Versioned fixture gated under LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp
— legacy DAG can't disambiguate two same-name foos without inline
awareness. Other 3 coincidentally resolve in legacy.

158/158 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
150 pass + 8 skipped under =0 (legacy parity baseline).

* test(cpp): Phase 5 cross-unit composition tests for U1/U2/U3/U5

Plan 2026-05-13-001 Phase 5. Locks in correct behavior at the
intersections between the previously-shipped scope-resolver units.

Enhancement to U1: `isSuperReceiverInContext` strips template-argument
lists (`Base<T>` → `Base`) and namespace prefixes (`outer::v1::Base` →
`Base`) before resolving the receiver in the caller's scope chain. This
makes the super-receiver classification work for template-class
heritage shapes like `Base<T>::method()` and `outer::v1::Base<T>::f()`.

Three fixtures + four tests:

- `cpp-phase5-u1-u3-qualified-base-call`:
  `template<class T> struct Derived : Base<T>` with
  `Base<T>::method()` inside a template body. Asserts NO mis-routing
  (count = 0) — documents the V1 gap that template-class inheritance
  isn't captured as EXTENDS by the legacy DAG, so MRO walks are empty
  and the super branch can't dispatch. The composition still works
  correctly: U1's template-arg-stripping classifies `Base<T>` as a
  super candidate, but the empty-MRO terminates without false edges.

- `cpp-phase5-u2-u3-adl-from-derived`:
  `Derived : Base<T>` where `Base::record` shadows `audit::record`.
  Unqualified `record(e)` inside the template body should resolve via
  ADL to `audit::record` (because U3 + the `isFileLocalDef` class-
  owned filter suppress `Base::record`). Asserts 1 edge to audit.h
  and 0 edges to base.h.

- `cpp-phase5-u3-u5-inline-base`:
  `template<class T> struct Derived : outer::v1::Base<T>` where `v1`
  is inline. Unqualified `f()` inside `Derived<T>::g()` should NOT
  bind to Base::f (dependent-base suppression even across inline
  namespace prefix). Asserts count = 0.

Phase 5 tests asserting no-false-positives are gated under
LEGACY_RESOLVER_PARITY_EXPECTED_FAILURES.cpp — legacy DAG over-
resolves without the template-arg-stripping qualified-receiver path
and without two-phase dependent-base suppression.

162/162 cpp integration tests pass under REGISTRY_PRIMARY_CPP=1;
152 pass + 10 skipped under =0 (legacy parity baseline).

---------

Co-authored-by: HuangWenjie <zhoudeng.hwj@alibaba-inc.com>
Co-authored-by: Gergo Magyar <gergomagyar@icloud.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LANG-cpp: Migrate C++ to scope-based resolution

3 participants