feat(ingestion): Add C++ parameter type class sidecar#1642

Merged
magyargergo merged 3 commits into abhigyanpatwari:main from azizur100389:codex/cpp-type-class-sidecar
May 16, 2026

Conversation

@azizur100389

Contributor

Summary

  • add an additive parameterTypeClasses sidecar on SymbolDefinition
  • preserve C++ cv/ref/pointer shape from parameter declarations while keeping coarse parameterTypes unchanged
  • thread the sidecar through C++ scope captures and symbol-table metadata
  • document that the sidecar does not participate in graph node identity

Fixes #1630

Tests

  • npx vitest run test/unit/scope-resolution/cpp/cpp-arity.test.ts
  • npx tsc --noEmit
  • npx vitest run test/integration/resolvers/cpp.test.ts
  • REGISTRY_PRIMARY_CPP=0 npx vitest run test/integration/resolvers/cpp.test.ts
  • git diff --check

Note: npx gitnexus impact ... / detect_changes ... currently exit 1 with no output in this checkout because the local GitNexus index/registry is unavailable, so the change was kept narrowly scoped and validated manually with the C++ resolver suites.

@vercel

vercel Bot commented May 16, 2026


@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 16, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 9136 | 9135 | 0 | 1 | 455s |

✅ All 9135 tests passed

1 test skipped:
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 78.37% 29328/37421 N/A% 🟢 ███████████████░░░░░
Branches 66.84% 18664/27920 N/A% 🟢 █████████████░░░░░░░
Functions 83.28% 2930/3518 N/A% 🟢 ████████████████░░░░
Lines 81.66% 26455/32394 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo changed the title Add C++ parameter type class sidecar feat(ingestion): Add C++ parameter type class sidecar May 16, 2026
@github-actions

github-actions Bot commented May 16, 2026

Contributor

Claude finished @magyargergo's task in 7m 46s


Adversarial Production-Readiness Review — PR #1642

Review completed. Adversarial inspection of all 9 changed files, graph identity code, resolver passes, converter rank, and ingestion threading.

Todo List

  • Post initial comment
  • Read repo context files (DoD.md, AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Inspect all 9 changed files
  • Inspect related files (graph identity, resolver passes, conversion-rank, arity.ts, etc.)
  • Run diff verification and hidden Unicode checks
  • Trace sidecar data flow end-to-end
  • Verify graph identity code (ids.ts)
  • Check CI status
  • Compile findings and post full review

Review Bar for This PR

This review applies the full multi-lane adversarial bar: (A) shared schema + graph identity, (B) C++ arity + type-shape classification, (C) ingestion/capture/worker threading, (D) resolver behavior preservation, (E) tests + CI + hidden Unicode. Each lane is independently merge-blocking.


Current PR State

| Attribute | Value |
| --- | --- |
| Commit count | 1 (0b8630a) |
| Changed files | 9 (exact match with stated list) |
| CI (from PR comment) | ✅ 9108/9108 passed |
| Typecheck | ✅ tsc --noEmit success |
| E2E | ✅ (gitnexus-web scope; not applicable here) |

Branch Hygiene Assessment

Clean feature/fix PR. Verified locally: git log --oneline shows exactly 1 commit ahead of main. git diff --name-only HEAD~1 HEAD lists exactly the 9 expected sidecar files. No unrelated overload-resolution, ADL, type-traits, workflow, formatting, generated registry, or graph-ID churn is present.


Understanding of the Change

This PR adds an optional parameterTypeClasses?: ParameterTypeClass[] sidecar on SymbolDefinition. The ParameterTypeClass type stores { base, cv, indirection, pointerDepth } per parameter. normalizeCppParamType is intentionally left coarse and unchanged. The sidecar is computed by classifyCppParameterType inside computeCppDeclarationArity, serialised as a JSON synthetic capture @declaration.parameter-type-classes, validated and deserialised in parseJsonParameterTypeClassesCapture inside the scope-extractor, and threaded through the symbol table, chunk merge, and worker ParsedSymbol. No resolver reads it yet.
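
For orientation, the sidecar shape described above could look roughly like the sketch below. The field names (base, cv, indirection, pointerDepth) come from the review text; the exact union members and the `SymbolDefinitionSketch` interface are assumptions for illustration, not the definitions in gitnexus-shared.

```typescript
// Assumed enum members; only 'const', 'pointer', 'value', 'lvalue-ref' and
// 'unknown' are named in the review, the rest are guesses.
type CvQualifier = 'none' | 'const' | 'volatile' | 'const-volatile';
type Indirection = 'value' | 'pointer' | 'lvalue-ref' | 'rvalue-ref' | 'unknown';

interface ParameterTypeClass {
  base: string; // same vocabulary as the coarse parameterTypes entry
  cv: CvQualifier;
  indirection: Indirection;
  pointerDepth: number;
}

// Hypothetical, trimmed-down stand-in for SymbolDefinition.
interface SymbolDefinitionSketch {
  qualifiedName: string;
  parameterTypes?: string[];
  parameterTypeClasses?: ParameterTypeClass[]; // optional, additive sidecar
}

// void f(const int* p, int& r): the coarse types stay 'int', the sidecar keeps the shape.
const withSidecar: SymbolDefinitionSketch = {
  qualifiedName: 'f',
  parameterTypes: ['int', 'int'],
  parameterTypeClasses: [
    { base: 'int', cv: 'const', indirection: 'pointer', pointerDepth: 1 },
    { base: 'int', cv: 'none', indirection: 'lvalue-ref', pointerDepth: 0 },
  ],
};

// A definition produced without the sidecar simply omits the field.
const withoutSidecar: SymbolDefinitionSketch = {
  qualifiedName: 'g',
  parameterTypes: ['int'],
};
```

Because the field is optional, consumers have to treat `undefined` as "no shape information" rather than as an empty parameter list.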


Findings

Finding 1 — @declaration.parameter-type-classes absent from KNOWN_SUB_TAGS

Severity: Low / hygiene
Evidence: In scope-extractor.ts:1016–1031, KNOWN_SUB_TAGS includes @declaration.parameter-types, @declaration.parameter-count, @declaration.required-parameter-count as excluded-from-anchor candidates, but does not include @declaration.parameter-type-classes. In anchorCaptureFor, any capture name NOT in KNOWN_SUB_TAGS is eligible to become the declaration anchor.
Risk: In practice the real anchor (e.g. @declaration.function, from tree-sitter query) wins because (a) it is added to grouped before the synthetic captures, and (b) the span comparison uses strict > so ties preserve the first entry. However this creates a fragile correctness dependency on iteration order and span equality. Any future refactoring that reorders the grouped map construction or uses a different synthetic node range would silently make @declaration.parameter-type-classes the anchor, causing malformed definitions to flow into the DAG.
Recommended fix: Add '@declaration.parameter-type-classes' to KNOWN_SUB_TAGS in scope-extractor.ts.
Blocks merge: No. Present behavior is correct; this is a maintenance hazard.
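
The ordering dependency in Finding 1 can be illustrated with a minimal sketch. `pickAnchor` below is a hypothetical stand-in for anchorCaptureFor, written only from the behavior described in the finding (excluded tags are skipped, the widest span wins, strict `>` keeps the first of equal spans); the real function may differ.

```typescript
interface Capture {
  name: string;
  start: number;
  end: number;
}

// Sub-tags the review says are already excluded from anchor selection.
const BASE_SUB_TAGS = [
  '@declaration.parameter-types',
  '@declaration.parameter-count',
  '@declaration.required-parameter-count',
];

// Hypothetical anchor picker: skip excluded names, keep the widest span,
// and on ties keep whichever capture was seen first (strict `>`).
function pickAnchor(captures: Capture[], excluded: Set<string>): Capture | undefined {
  let best: Capture | undefined;
  for (const c of captures) {
    if (excluded.has(c.name)) continue;
    if (best === undefined || c.end - c.start > best.end - best.start) best = c;
  }
  return best;
}

const realAnchor: Capture = { name: '@declaration.function', start: 0, end: 40 };
const synthetic: Capture = { name: '@declaration.parameter-type-classes', start: 0, end: 40 };

// Today: the real anchor is seen first, so it wins the tie.
const currentOrder = pickAnchor([realAnchor, synthetic], new Set(BASE_SUB_TAGS));
// After a hypothetical reordering: the synthetic capture wins the tie.
const reordered = pickAnchor([synthetic, realAnchor], new Set(BASE_SUB_TAGS));
// With the recommended fix: the order no longer matters.
const hardened = pickAnchor(
  [synthetic, realAnchor],
  new Set(BASE_SUB_TAGS.concat('@declaration.parameter-type-classes')),
);
```

With the extra entry in the excluded set, the result is the same for both iteration orders, which is the point of the recommended fix.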


Finding 2 — cv field is a coarse "any-cv-present" signal, not a top-level vs. pointee cv decomposition

Severity: Medium (for future consumers)
Evidence: In arity-metadata.ts:186–195, classifyCppParameterType detects const and volatile by matching /\bconst\b/ and /\bvolatile\b/ anywhere in source (the full parameter spelling). This means const int* p (pointer-to-const) and int* const p (const-pointer) both produce cv: 'const'. These are distinct types in ISO C++: std::is_same_v<const int*, int* const> is false. The field comment reads "Top-level cv signal preserved from the original C++ parameter spelling"; "Top-level cv" is misleading because the implementation cannot distinguish these two cases.
Risk: Issue #1630's stated future consumers include std::is_same_v<T, int*>, std::is_same_v<T, int&>, and qualification-conversion consumers for #1629/#1637. If any of those consumers treat cv as a precise top-level-cv field they will conflate const int* with int* const and emit wrong type-trait or conversion-rank results.
Recommended fix: Reword the comment to say explicitly that this is a coarse "cv-qualifier present anywhere in the spelling" signal (not a decomposed top-level vs. pointee cv). Add a JSDoc note listing the known limitation. Alternatively, implement parser-aware decomposition, but the coarse signal, documented honestly, is acceptable for the sidecar's current consumers.
Blocks merge: No. No consumer currently reads cv. Minor follow-up required before #1629/#1637 land.
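
A small sketch of why the coarse signal conflates the two spellings. `coarseCv` mirrors the regex approach described in the evidence; `splitPointerCv` is a hypothetical refinement for simple pointer declarators only and is not part of the PR.

```typescript
type Cv = 'none' | 'const' | 'volatile' | 'const-volatile';

// Coarse detection as described in the finding: a keyword anywhere in the spelling.
function coarseCv(source: string): Cv {
  const hasConst = /\bconst\b/.test(source);
  const hasVolatile = /\bvolatile\b/.test(source);
  if (hasConst && hasVolatile) return 'const-volatile';
  if (hasConst) return 'const';
  if (hasVolatile) return 'volatile';
  return 'none';
}

// Hypothetical refinement: qualifiers after the last `*` apply to the pointer
// itself (top-level), everything before it is attributed to the pointee.
// Only meaningful for simple declarators such as `const int* p`.
function splitPointerCv(source: string): { topLevel: Cv; pointee: Cv } | undefined {
  const lastStar = source.lastIndexOf('*');
  if (lastStar < 0) return undefined;
  return {
    pointee: coarseCv(source.slice(0, lastStar)),
    topLevel: coarseCv(source.slice(lastStar + 1)),
  };
}

const pointerToConst = coarseCv('const int* p'); // 'const'
const constPointer = coarseCv('int* const p'); // 'const' as well
const splitA = splitPointerCv('const int* p'); // pointee 'const', topLevel 'none'
const splitB = splitPointerCv('int* const p'); // pointee 'none', topLevel 'const'
```

The text-based split is still only a heuristic; a parser-aware decomposition, as the recommended fix mentions, would be the more robust route.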


Finding 3 — Regex pointer-depth misclassifies complex declarators

Severity: Medium (for future consumers)
Evidence: In arity-metadata.ts:197, pointerDepth is computed as (source.match(/\*/g) ?? []).length, counting ALL * characters in the full parameter spelling. This misclassifies:

| Declarator | Expected | Actual |
| --- | --- | --- |
| int (*fp)(int) | unknown (function pointer) | indirection: 'pointer', pointerDepth: 1 |
| void (*cb)() | unknown (function pointer) | indirection: 'pointer', pointerDepth: 1 |
| std::vector<int*> v | indirection: 'value', pointerDepth: 0 | indirection: 'pointer', pointerDepth: 1 |
| int C::* mp | unknown (pointer-to-member) | indirection: 'pointer', pointerDepth: 1 |
| int (&arr)[3] | unknown | indirection: 'lvalue-ref', pointerDepth: 0 |

Issue #1630 explicitly states: "missing sidecar data for unsupported forms such as function pointers or arrays should fall back to unknown rather than misclassify."
Risk: Future #1629/#1637 consumers reading pointerDepth or indirection will trust falsely precise values and may emit wrong conversion ranks or type-trait predicates. This is the highest-risk finding because the sidecar is designed to be trusted by downstream consumers.
Recommended fix: Before setting indirection: 'pointer', detect parenthesized declarators that begin with "(*", which suggest function-pointer or pointer-to-member syntax (regex \(\s*\*), and return unknownTypeClass(base) for those forms. Similarly, detect template arguments containing * (regex <[^>]*\*[^>]*>) and exclude them from the pointer count. This restricts reliable classification to simple scalar/ref/pointer forms.
Blocks merge: No for today (no consumer reads this yet). Yes before #1629/#1637 land without a conservative fallback.
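
One way to make the classifier conservative, sketched under assumptions: `classifyShapeConservatively` and `stripTemplateArgs` are hypothetical names, the input is assumed to be a parameter spelling without a default argument, and declarators that mix `*` and `&` are deliberately left as unknown. This is an illustration of the fallback idea, not the PR's implementation.

```typescript
type Indirection = 'value' | 'pointer' | 'lvalue-ref' | 'rvalue-ref' | 'unknown';

interface Shape {
  indirection: Indirection;
  pointerDepth: number;
}

const UNKNOWN_SHAPE: Shape = { indirection: 'unknown', pointerDepth: 0 };

// Drop everything inside angle brackets so `*` in template arguments is not counted.
function stripTemplateArgs(source: string): string {
  let depth = 0;
  let out = '';
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === '<') depth++;
    else if (ch === '>') depth = Math.max(0, depth - 1);
    else if (depth === 0) out += ch;
  }
  return out;
}

function classifyShapeConservatively(source: string): Shape {
  // Parenthesized declarators: function pointers, pointer/reference to array.
  if (/\(\s*[*&]/.test(source)) return UNKNOWN_SHAPE;
  // Pointer-to-member declarators such as `int C::* mp`.
  if (/::\s*\*/.test(source)) return UNKNOWN_SHAPE;
  // Array declarators such as `int arr[]`.
  if (source.indexOf('[') >= 0) return UNKNOWN_SHAPE;

  const outer = stripTemplateArgs(source);
  const pointerDepth = (outer.match(/\*/g) ?? []).length;
  const hasRef = outer.indexOf('&') >= 0;
  // Mixed forms such as `int*& r` are left unclassified in this sketch.
  if (pointerDepth > 0 && hasRef) return UNKNOWN_SHAPE;
  if (pointerDepth > 0) return { indirection: 'pointer', pointerDepth };
  if (outer.indexOf('&&') >= 0) return { indirection: 'rvalue-ref', pointerDepth: 0 };
  if (hasRef) return { indirection: 'lvalue-ref', pointerDepth: 0 };
  return { indirection: 'value', pointerDepth: 0 };
}
```

The design choice is to prefer `unknown` whenever the spelling leaves the simple scalar/ref/pointer subset, so that a downstream consumer never sees a confident but wrong `pointerDepth`.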


Finding 4 — Missing test coverage for edge declarator shapes and invariants

Severity: Low
Evidence: cpp-arity.test.ts covers int, const int*, int&, int&&, int* p (via classifyCppParameterType direct call), and coarse parameterTypes stability. Missing:

  • Function pointer parameter returning unknown (critical given Finding 3)
  • Array parameter int arr[] classification
  • int** (double pointer depth 2)
  • const int& (const on a reference — technically cv on the referenced object, not a reference)
  • const int* const p (const pointer to const)
  • std::vector<int*> or any template with embedded *
  • Explicit assertion: parameterTypes.length === parameterTypeClasses.length
  • Variadic function: parameterTypeClasses has same count as parameterTypes (including the ... slot)

Risk: Without a function-pointer test asserting unknown, the misclassification in Finding 3 cannot be caught by CI.
Recommended fix: Add test cases for the missing shapes listed above.
Blocks merge: No, but required alongside or before the Finding 3 fix.


PR-Specific Assessment Sections

Shared Schema and Graph Identity Safety — ✅ CLEAR

ParameterTypeClass is correctly exported from gitnexus-shared/src/index.ts:29 alongside SymbolDefinition. parameterTypeClasses is ? optional on SymbolDefinition at line 42. Non-C++ symbols require no migration. Old caches reading a SymbolDefinition without this field will simply have undefined — safe.

Graph identity audit: resolveDefGraphId (graph-bridge/ids.ts:72–115) constructs lookup keys using only qualifiedName, type, parameterTypes, and templateArguments. parameterTypeClasses is not referenced anywhere in graph-bridge/. The makeDefId function in scope-extractor.ts:701–708 uses filePath, range, type, and name — no sidecar. Confirmed: graph node IDs and edge target IDs are unchanged.
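
The identity argument can be pictured with a minimal sketch. `lookupKeySketch` is a hypothetical simplification of resolveDefGraphId: it only uses the four fields the audit lists, and the key format is invented for illustration.

```typescript
interface DefSketch {
  qualifiedName: string;
  type: string;
  parameterTypes?: string[];
  templateArguments?: string[];
  parameterTypeClasses?: { base: string; cv: string; indirection: string; pointerDepth: number }[];
}

// Hypothetical key builder: reads only the fields named in the audit.
function lookupKeySketch(def: DefSketch): string {
  return [
    def.type,
    def.qualifiedName,
    (def.parameterTypes ?? []).join(','),
    (def.templateArguments ?? []).join(','),
  ].join('#');
}

const plainDef: DefSketch = { qualifiedName: 'ns::f', type: 'Function', parameterTypes: ['int'] };
const defWithSidecar: DefSketch = {
  ...plainDef,
  parameterTypeClasses: [{ base: 'int', cv: 'const', indirection: 'pointer', pointerDepth: 1 }],
};

const keyBefore = lookupKeySketch(plainDef);
const keyAfter = lookupKeySketch(defWithSidecar);
```

Since the key builder never reads `parameterTypeClasses`, adding the sidecar leaves the key unchanged, which matches the audit's conclusion.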

C++ parameterTypes Stability — ✅ CLEAR

No duplicate push. The GitHub-rendered diff gave the false impression of two types.push() calls in the non-variadic branch. The actual file at arity-metadata.ts:88–94 contains exactly one types.push(normalizeCppParamType(rawType)) per non-variadic parameter. The typeClasses.push(...) is the sidecar accumulator — a separate array.

normalizeCppParamType is unchanged (arity-metadata.ts:141–176). int, const int*, int&, and int&& all produce coarse int as before. Variadic and C-style ... handling is unchanged.

types.length === typeClasses.length is structurally maintained: every branch of the for (const p of params) loop pushes exactly one element to both types and typeClasses, including the hasEllipsis tail append at lines 97–100.
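
The length invariant follows from pushing to both arrays in lockstep. The sketch below assumes hypothetical helper names (`coarseBase`, `collectArity`) and a made-up placeholder entry for the `...` slot; it only illustrates the structure described above.

```typescript
interface TypeClassSketch {
  base: string;
  indirection: 'value' | 'pointer' | 'unknown';
}

interface AritySketch {
  parameterTypes: string[];
  parameterTypeClasses: TypeClassSketch[];
}

// Hypothetical stand-in for normalizeCppParamType: strip cv keywords and * / &.
function coarseBase(rawType: string): string {
  return rawType.replace(/\b(const|volatile)\b|[*&]/g, '').trim();
}

function collectArity(rawTypes: string[], hasEllipsis: boolean): AritySketch {
  const types: string[] = [];
  const typeClasses: TypeClassSketch[] = [];
  for (const rawType of rawTypes) {
    const base = coarseBase(rawType);
    // Exactly one push per array per parameter keeps the lengths equal.
    types.push(base);
    typeClasses.push({ base, indirection: rawType.indexOf('*') >= 0 ? 'pointer' : 'value' });
  }
  if (hasEllipsis) {
    // The variadic tail gets a slot in both arrays (placeholder values assumed).
    types.push('...');
    typeClasses.push({ base: '...', indirection: 'unknown' });
  }
  return { parameterTypes: types, parameterTypeClasses: typeClasses };
}

const arity = collectArity(['int', 'const int*'], true);
```

A unit test asserting `parameterTypes.length === parameterTypeClasses.length` on results like `arity` would turn this structural property into a checked one.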

parameterTypeClasses Sidecar Correctness — ⚠️ MINOR ISSUES

See Findings 2 and 3. The core structure (ParameterTypeClass with base, cv, indirection, pointerDepth) is well-designed. base correctly reuses normalizeCppParamType so it is consistent with the coarse parameterTypes vocabulary. unknownTypeClass is correctly applied for variadic ... parameters. Parsing validation in parseJsonParameterTypeClassesCapture is strict — all enum values and numeric pointerDepth are validated before acceptance.
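
Strict validation of the JSON capture might be sketched as follows. `parseTypeClassesSketch` is a hypothetical analogue of parseJsonParameterTypeClassesCapture; the accepted enum members and the choice to reject the whole payload on any invalid entry are assumptions.

```typescript
type Cv = 'none' | 'const' | 'volatile' | 'const-volatile';
type Indirection = 'value' | 'pointer' | 'lvalue-ref' | 'rvalue-ref' | 'unknown';

interface ParameterTypeClass {
  base: string;
  cv: Cv;
  indirection: Indirection;
  pointerDepth: number;
}

const CV_VALUES: string[] = ['none', 'const', 'volatile', 'const-volatile'];
const INDIRECTION_VALUES: string[] = ['value', 'pointer', 'lvalue-ref', 'rvalue-ref', 'unknown'];

// Returns undefined unless every entry has the expected fields and values.
function parseTypeClassesSketch(json: string | undefined): ParameterTypeClass[] | undefined {
  if (json === undefined) return undefined;
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return undefined; // malformed JSON: drop the sidecar rather than guess
  }
  if (!Array.isArray(raw)) return undefined;
  const out: ParameterTypeClass[] = [];
  for (const item of raw) {
    if (typeof item !== 'object' || item === null) return undefined;
    const { base, cv, indirection, pointerDepth } = item as Record<string, unknown>;
    if (typeof base !== 'string') return undefined;
    if (typeof cv !== 'string' || CV_VALUES.indexOf(cv) < 0) return undefined;
    if (typeof indirection !== 'string' || INDIRECTION_VALUES.indexOf(indirection) < 0) return undefined;
    if (typeof pointerDepth !== 'number' || !Number.isInteger(pointerDepth) || pointerDepth < 0) return undefined;
    out.push({ base, cv: cv as Cv, indirection: indirection as Indirection, pointerDepth });
  }
  return out;
}

const accepted = parseTypeClassesSketch(
  JSON.stringify([{ base: 'int', cv: 'const', indirection: 'pointer', pointerDepth: 1 }]),
);
const rejected = parseTypeClassesSketch(
  JSON.stringify([{ base: 'int', cv: 'mutable', indirection: 'pointer', pointerDepth: 1 }]),
);
```

Rejecting the payload as a whole keeps the sidecar either fully trustworthy or absent, which fits its optional, additive role.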

Classifier Conservatism for cv/ref/pointer forms — ⚠️ NOT CONSERVATIVE ENOUGH

See Findings 2 and 3. Simple forms (int, int*, int&, int&&, const int*) are classified correctly. Complex forms (function pointers, pointer-to-member, template args with *, arrays) produce overconfident results rather than unknown. The cv field cannot distinguish top-level vs. pointee cv. This is a documented gap in Issue #1630 itself (fallback to unknown requirement), not met by this PR for the listed complex forms.

Ingestion/Capture/Worker Data-Flow Completeness — ✅ CLEAR WITH OBSERVATION

Data flow fully traced:

  1. computeCppDeclarationArity → typeClasses (arity-metadata.ts:78–106)
  2. emitCppScopeCaptures → syntheticCapture('@declaration.parameter-type-classes', fnNode, JSON.stringify(arity.parameterTypeClasses)) (captures.ts:117–122)
  3. scope-extractor.ts:549–551: parseJsonParameterTypeClassesCapture(match['@declaration.parameter-type-classes']) → stored in localDefs
  4. parsing-processor.ts:131: parameterTypeClasses: sym.parameterTypeClasses in mergeChunkResults
  5. parse-worker.ts:2310: parameterTypeClasses: methodProps.parameterTypeClasses (observation: this is undefined for C++ because MethodInfo doesn't carry parameterTypeClasses, but this is acceptable because the sidecar lands in parsedFiles via the scope-capture path, not in symbol table entries from the legacy DAG path)

Observation: The legacy DAG / methodExtractor symbol table path does not populate parameterTypeClasses. The sidecar is only available in ParsedFile.localDefs (scope-resolution path). This is architecturally intentional given the stated C++ migration to scope-resolution primary, but it means a consumer reading SymbolDefinition.parameterTypeClasses from the legacy symbol table index will see undefined. Future consumers must read from scope-resolution artifacts, not the legacy symbol table.

Resolver Behavior Preservation — ✅ CLEAR

overload-narrowing.ts, conversion-rank.ts, free-call-fallback.ts, receiver-bound-calls.ts, constraint-filter (absent), and scope-resolver.ts — none consume parameterTypeClasses. Confirmed by exhaustive grep across gitnexus/src/core/ingestion/scope-resolution/ and gitnexus/src/core/ingestion/languages/cpp/ with zero matches outside the producer files. ADL behavior, conversion sequence ranking (PR #1606), and legacy parity are unaffected.

Relationship to #1629, #1637, #1578/#1606, and #1579 — ADEQUATE FOUNDATION WITH CAVEATS

Tests and CI Coverage — ✅ CI CLEAR; TESTS INCOMPLETE

CI: 9108/9108 tests passed (all platforms). Typecheck clean. The cpp parity scope is covered within the test suite.

Unit tests cover the essential happy-path shapes. Missing cases (Finding 4) mean Finding 3's misclassification is not caught by CI. The three most important additions before #1629/#1637 land: function-pointer → unknown, template-with-pointer-arg → value not misclassified, and the parameterTypes.length === parameterTypeClasses.length invariant assertion.

Hidden Unicode / Hygiene — ✅ CLEAR

No bidi control characters found in any changed file. The non-ASCII characters in comments are em-dashes and right-arrows in JSDoc comments, consistent with existing repository style throughout scope-extractor.ts and other files. The GitHub "hidden Unicode" warning appears to be triggered by these comment characters, not by any executable code or bidi controls.


Final Verdict

production-ready with minor follow-ups

The merge-blocking concerns from the DoD are all clear: (1) parameterTypes are not duplicated — the visible GitHub diff artifact is NOT present in the actual file; (2) graph node IDs are provably unchanged — resolveDefGraphId does not touch parameterTypeClasses; (3) no resolver consumes the sidecar; (4) no bidi controls; (5) CI is fully green at 9108/9108. The schema is additive, optional, and backward-compatible. The ingestion pipeline threading is complete for the scope-resolution path.

The non-blocking follow-ups that MUST land before #1629 or #1637 consume the sidecar are: (a) restrict the classifier to return unknownTypeClass for function-pointer syntax ("(*"), templates with * inside angle brackets, and pointer-to-member declarators (Finding 3); (b) clarify the cv field documentation to state it is a coarse "any-cv-present" signal, not a decomposed top-level vs. pointee cv (Finding 2); (c) add @declaration.parameter-type-classes to KNOWN_SUB_TAGS in scope-extractor.ts (Finding 1); and (d) add unit tests for function-pointer and template-pointer edge cases plus a length-parity assertion (Finding 4). None of these block today's metadata-only merge, but if any of findings 2–3 are unaddressed when #1629/#1637 land, they become merge-blocking at that point.

@magyargergo magyargergo mentioned this pull request May 16, 2026
9 tasks
@magyargergo magyargergo merged commit 2376912 into abhigyanpatwari:main May 16, 2026
29 of 30 checks passed
hohaivu pushed a commit to hohaivu/GitNexus that referenced this pull request May 19, 2026
…i#1642)

Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>

Development

Successfully merging this pull request may close these issues.

cpp: type-class sidecar for normalizeCppParamType (cv/ref/pointer)

2 participants