106 changes: 102 additions & 4 deletions gitnexus/src/core/ingestion/languages/cpp/adl.ts
@@ -21,9 +21,25 @@
* - `audit::Event& r`, `audit::Event&& rr`
* - `std::vector<audit::Event>` (template namespace + template-arg namespaces)
*
- * Function-pointer arguments and the rest of the full closure are still
- * deliberately excluded. V2 additionally walks class ancestors (via MRO),
- * so base-class enclosing namespaces also contribute associated namespaces.
* V2 additionally walks class ancestors (via MRO), so base-class enclosing
* namespaces also contribute associated namespaces.
*
* **GitNexus approximation (not strict ISO C++ ADL):** passing a qualified
* function reference like `utils::worker` contributes `utils` to the associated
* set, enabling resolution of unqualified calls like `with_callback(utils::worker)`
* to `utils::with_callback`. Under ISO C++ `[basic.lookup.argdep]`, associated
* entities for function-type arguments come from the **parameter types and return
* type** of each function in the overload set — NOT the function's enclosing
* namespace. For `void worker()`, the standard-compliant associated set is empty.
* GitNexus instead contributes the enclosing namespace of any Function/Method
* def whose simple name matches, because this covers the dominant real-world
* ADL pattern at an acceptable cost in precision.
*
* For qualified refs (e.g. `utils::worker`) the namespace is confirmed via a
* workspace lookup (only contributed when a Function/Method named `worker` exists
* in `utils`). For unqualified refs the workspace is searched for any Function/Method
* def with that simple name. Locally-declared function-pointer variables
* (e.g. `void (*g)()`) and function parameters are excluded from this path.
*
* The current implementation also short-circuits to ADL only when ordinary lookup is empty
* (`findCallableBindingInScope` returned undefined). ISO C++ would
@@ -65,6 +81,7 @@ import {
* Per-argument shape information collected at capture time. ADL fires for
* arguments where `simpleClassName !== ''`, including class pointers and
* references whose declarator chain resolves to a named class type.
* Free-function reference arguments use `functionRefText`.
*/
export interface CppAdlArgInfo {
/** Simple class-like type name (last segment of qualified name); empty
@@ -82,6 +99,15 @@ export interface CppAdlArgInfo {
/** Enclosing namespaces extracted from explicit type template arguments,
* recursively bounded. */
readonly templateArgNamespaces: readonly string[];
/** When set, the arg is a potential free-function reference (not a locally-
* declared function-pointer variable or function parameter). Contains the
* identifier text as written in source (e.g. `"utils::worker"` or
* `"worker"`). GitNexus approximation: the function's enclosing namespace
* is contributed to the ADL associated set. For qualified refs a workspace
* lookup confirms a Function/Method with that simple name exists in the
* namespace before contributing; for unqualified refs every namespace
* containing a matching Function/Method def is contributed. */
readonly functionRefText?: string;
}

const argInfoBySite = new Map<string, readonly CppAdlArgInfo[]>();
@@ -179,10 +205,14 @@ export function pickCppAdlCandidates(
const args = argInfoBySite.get(key);
if (args === undefined || args.length === 0) return undefined;

- // Collect associated namespace QNames from every participating class-typed arg.
// Collect associated namespace QNames from every participating class-typed arg
// and from function-reference args.
const associatedNamespaces = new Set<string>();
for (const arg of args) {
collectAssociatedNamespacesForAdlArg(arg, scopes, associatedNamespaces);
if (arg.functionRefText !== undefined) {
collectFunctionRefNamespaces(arg.functionRefText, parsedFiles, associatedNamespaces);
}
}
if (associatedNamespaces.size === 0) return undefined;

@@ -389,3 +419,71 @@ function findCppClassDefBySimpleName(
if (firstMatch === undefined) return undefined;
return { classDef: firstMatch, ambiguous: false };
}

/**
* Contribute associated namespaces for a function-reference argument.
*
* - **Qualified refs** (`utils::worker`, `outer::inner::fn`): the namespace
* is extracted from the qualifier text (converting `::` to `.` for dot-joined
* QName matching). A workspace lookup then **verifies** that a Function or
* Method def named `worker` (the simple name after the last `::`) actually
* exists in the extracted namespace. This prevents false positives from
* namespace-qualified variables, enum values, and static data members, which
* also produce `qualified_identifier` AST nodes in tree-sitter-cpp (the
* AST node type alone does not distinguish functions from non-function names).
* - **Unqualified refs** (`worker`): the workspace is searched for any
* Function/Method def whose simple name matches. Every distinct enclosing
* namespace found is added; overloads within the same namespace produce
* a single entry, and GitNexus does not select a specific overload at this stage.
*/
function collectFunctionRefNamespaces(
refText: string,
parsedFiles: readonly ParsedFile[],
out: Set<string>,
): void {
const colonIdx = refText.lastIndexOf('::');
if (colonIdx !== -1) {
// Qualified ref: extract namespace prefix and normalise :: → dot notation.
const nsText = refText.slice(0, colonIdx).replace(/::/g, '.');
if (nsText === '') return;
const simpleName = refText.slice(colonIdx + 2);
// Verify that a Function/Method named `simpleName` exists in `nsText`.
// Without this guard every `a::b` qualified_identifier arg (variable,
// enum value, static member, type alias) would blindly contribute `a`
// to the associated set and risk a false-positive CALLS edge.
for (const parsed of parsedFiles) {
const scopesById = new Map<ScopeId, (typeof parsed.scopes)[number]>();
for (const sc of parsed.scopes) scopesById.set(sc.id, sc);
for (const scope of parsed.scopes) {
if (scope.kind !== 'Namespace') continue;
if (computeNamespaceQName(scope, scopesById) !== nsText) continue;
for (const def of scope.ownedDefs) {
if (def.type !== 'Function' && def.type !== 'Method') continue;
const simple = def.qualifiedName?.split('.').pop() ?? def.qualifiedName ?? '';
if (simple === simpleName) {
out.add(nsText);
return; // Namespace confirmed; no need to scan further files.
}
}
}
}
return;
}

// Unqualified: search all namespace scopes for a Function/Method def with
// this simple name and contribute its enclosing namespace.
for (const parsed of parsedFiles) {
const scopesById = new Map<ScopeId, (typeof parsed.scopes)[number]>();
for (const sc of parsed.scopes) scopesById.set(sc.id, sc);
for (const scope of parsed.scopes) {
if (scope.kind !== 'Namespace') continue;
for (const def of scope.ownedDefs) {
if (def.type !== 'Function' && def.type !== 'Method') continue;
const simple = def.qualifiedName?.split('.').pop() ?? def.qualifiedName ?? '';
if (simple !== refText) continue;
const nsQName = computeNamespaceQName(scope, scopesById);
if (nsQName !== '') out.add(nsQName);
}
}
}
}
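The two lookup modes of `collectFunctionRefNamespaces` can be illustrated without tree-sitter or GitNexus's scope types. The sketch below is a simplified model, assuming a hypothetical flat list of namespace-owned defs (`NsDef`) in place of the `ParsedFile` scopes; `functionRefNamespaces`, `simpleNameOf` and `isCallable` are illustrative names, not GitNexus APIs. Namespaces use the same dot-joined form as the source.

```typescript
// Hypothetical flattened view of namespace-owned defs (stands in for ParsedFile scopes).
interface NsDef {
  readonly namespace: string; // dot-joined, e.g. "outer.inner"
  readonly kind: 'Function' | 'Method' | 'Variable';
  readonly qualifiedName: string; // e.g. "utils.worker"
}

// Mirrors the source: the simple name is the last dot-separated segment.
function simpleNameOf(qualifiedName: string): string {
  return qualifiedName.split('.').pop() ?? qualifiedName;
}

function isCallable(def: NsDef): boolean {
  return def.kind === 'Function' || def.kind === 'Method';
}

// Sketch of the qualified / unqualified contribution rules.
function functionRefNamespaces(refText: string, defs: readonly NsDef[]): Set<string> {
  const out = new Set<string>();
  const colonIdx = refText.lastIndexOf('::');
  if (colonIdx !== -1) {
    // Qualified: contribute the prefix only if a callable with that simple name lives there.
    const ns = refText.slice(0, colonIdx).replace(/::/g, '.');
    if (ns === '') return out;
    const simple = refText.slice(colonIdx + 2);
    const confirmed = defs.some(
      (d) => d.namespace === ns && isCallable(d) && simpleNameOf(d.qualifiedName) === simple,
    );
    if (confirmed) out.add(ns);
    return out;
  }
  // Unqualified: every namespace that owns a callable with this simple name.
  for (const d of defs) {
    if (isCallable(d) && d.namespace !== '' && simpleNameOf(d.qualifiedName) === refText) {
      out.add(d.namespace);
    }
  }
  return out;
}

const defs: NsDef[] = [
  { namespace: 'utils', kind: 'Function', qualifiedName: 'utils.worker' },
  { namespace: 'utils', kind: 'Function', qualifiedName: 'utils.with_callback' },
  { namespace: 'cfg', kind: 'Variable', qualifiedName: 'cfg.limit' },
  { namespace: 'legacy', kind: 'Function', qualifiedName: 'legacy.worker' },
];

console.log(Array.from(functionRefNamespaces('utils::worker', defs))); // expected: utils only
console.log(Array.from(functionRefNamespaces('cfg::limit', defs))); // expected: empty (a variable)
console.log(Array.from(functionRefNamespaces('worker', defs))); // expected: utils and legacy
```

The `cfg::limit` case shows why the qualified path verifies the def kind: without that check, any `a::b` argument would pull `a` into the associated set.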
141 changes: 137 additions & 4 deletions gitnexus/src/core/ingestion/languages/cpp/captures.ts
@@ -785,17 +785,91 @@ function classifyAdlArg(argNode: SyntaxNode): CppAdlArgInfo {
) {
return EMPTY_ADL_ARG;
}
// Qualified expression (a::b) — may be a function, variable, enum value,
// or static member. Record as a potential function reference; resolution
// time verifies via workspace lookup that a Function/Method with this simple
// name exists in the extracted namespace before contributing to the set.
if (argNode.type === 'qualified_identifier') {
return {
simpleClassName: '',
templateSimpleClassName: '',
templateNamespace: '',
templateArgClassNames: [],
templateArgNamespaces: [],
functionRefText: argNode.text,
};
}
// Variable reference — look up its declared type (preserving pointer /
// reference / qualified-name shape; the existing arity-narrowing helper
// strips this info).
if (argNode.type === 'identifier') {
- return lookupAdlIdentifierType(argNode);
const result = lookupAdlIdentifierType(argNode);
if (result === null) {
// Not found in the local compound_statement scope — could be a
// free-function reference (unqualified name, namespace scope).
return {
simpleClassName: '',
templateSimpleClassName: '',
templateNamespace: '',
templateArgClassNames: [],
templateArgNamespaces: [],
functionRefText: argNode.text,
};
}
return result;
}
// Other shapes (calls, member access, operators) — V1 unsupported.
return EMPTY_ADL_ARG;
}

- function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo {
/**
* Returns `true` when `varName` appears as a parameter name in the nearest
* enclosing `function_definition` or `function_declarator` that contains
* `identNode`. Parameters live in `parameter_list` (a sibling of the
* `compound_statement`), so the `compound_statement`-local declaration scan
* in `lookupAdlIdentifierType` would not find them — causing them to be
* mistakenly classified as potential free-function references.
*
* In tree-sitter-cpp a `function_definition` does NOT expose `parameters`
* as a direct named field; parameters live inside the nested
* `function_declarator`. For `function_declarator` nodes the `parameters`
* field IS direct. Both cases are handled below.
*/
function isIdentifierAFunctionParameter(identNode: SyntaxNode, varName: string): boolean {
let node: SyntaxNode | null = identNode.parent;
let safety = 64;
while (node !== null && safety-- > 0) {
let params: SyntaxNode | null = null;
if (node.type === 'function_declarator') {
// parameters is a direct field on function_declarator.
params = node.childForFieldName('parameters');
} else if (node.type === 'function_definition') {
// function_definition carries parameters inside its `declarator` field
// (which is a function_declarator). Walk through it.
const decl = node.childForFieldName('declarator');
if (decl !== null && decl.type === 'function_declarator') {
params = decl.childForFieldName('parameters');
}
}
if (params !== null) {
for (let i = 0; i < params.namedChildCount; i++) {
const param = params.namedChild(i);
if (param === null) continue;
const declNode = param.childForFieldName('declarator');
if (declNode === null) continue;
const leafName = extractDeclaratorLeafName(declNode);
if (leafName === varName) return true;
}
// Only check the immediately enclosing function — do not climb further.
break;
}
if (node.type === 'translation_unit') break;
node = node.parent;
}
return false;
}

function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo | null {
const varName = identNode.text;
let scope: SyntaxNode | null = identNode.parent;
while (
@@ -805,8 +879,17 @@ function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo {
) {
scope = scope.parent;
}
- if (scope === null) return EMPTY_ADL_ARG;
if (scope === null) return null;

// Function parameters live in the enclosing function's `parameter_list`,
// NOT inside the `compound_statement`, so the declaration scan below would
// never find them and would return `null` — incorrectly triggering the
// free-function-reference path. Check the parameter_list first.
if (isIdentifierAFunctionParameter(identNode, varName)) {
return EMPTY_ADL_ARG;
}

let foundAsLocalFunctionPointer = false;
for (let i = 0; i < scope.childCount; i++) {
const stmt = scope.child(i);
if (stmt === null || stmt.type !== 'declaration') continue;
@@ -833,6 +916,9 @@ function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo {
if (inner.type === 'pointer_declarator') {
if (findFirstDescendantOfType(inner, 'function_declarator') !== null) {
isFunctionPointer = true;
// Extract the name from within the function-pointer declarator chain
// so `foundAsLocalFunctionPointer` can detect a matching declaration.
nameText = extractDeclaratorLeafName(inner);
break;
}
const next = inner.childForFieldName('declarator');
@@ -862,12 +948,21 @@ function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo {
}
if (inner.type === 'function_declarator') {
isFunctionPointer = true;
// Extract the name from the inner declarator (e.g. `(*g)` in `void (*g)()`).
const innerDecl = inner.childForFieldName('declarator');
if (innerDecl !== null) nameText = extractDeclaratorLeafName(innerDecl);
break;
}
// Reached the leaf — usually `identifier`. Take its text.
nameText = inner.text;
break;
}
if (nameText === varName && isFunctionPointer) {
// Explicitly declared as a function-pointer variable — must not be
// treated as a free-function reference by the caller.
foundAsLocalFunctionPointer = true;
continue;
}
if (isFunctionPointer || nameText !== varName) continue;

const simpleClassName = extractAdlSimpleTypeName(typeNode);
@@ -885,7 +980,22 @@ function lookupAdlIdentifierType(identNode: SyntaxNode): CppAdlArgInfo {
templateArgNamespaces,
};
}
- return EMPTY_ADL_ARG;
// If the identifier was found in local scope as a function-pointer variable,
// return EMPTY_ADL_ARG so the caller does NOT treat it as a free-function
// reference. Otherwise return null to indicate "not in local scope".
//
// Known limitation (Finding 4): variables whose type is a typedef/using alias
// for a function-pointer type are NOT detected here. For example:
// using Callback = void (*)();
// Callback g;
// foo(g); // `g`'s declarator is `identifier` with type `Callback`
// The declarator has no `pointer_declarator` wrapper, so `isFunctionPointer`
// stays false and `extractAdlSimpleTypeName` returns `"Callback"`. ADL then
// looks for a class named `Callback`; if none exists, this degrades to
// EMPTY_ADL_ARG (class not found → no namespace contributed). If a class
// named `Callback` does exist, a spurious namespace contribution could occur.
// Risk is low in practice; a future fix should resolve the typedef/alias chain.
return foundAsLocalFunctionPointer ? EMPTY_ADL_ARG : null;
}

/** Extract the simple class-like type name from a `type:` field node.
@@ -1040,6 +1150,29 @@ function extractNamespaceFromQualifiedText(text: string): string {
return normalizeCppNamespaceQName(cleaned.slice(0, idx));
}

/**
* Walk a declarator node chain, unwrapping pointer/reference/function/
* parenthesized wrappers, and return the text of the innermost identifier.
* Returns `null` when no identifier is found within `safety` steps.
* Used by `lookupAdlIdentifierType` to extract the variable name from
* function-pointer declarator trees such as `(*g)()` in `void (*g)()`.
*/
function extractDeclaratorLeafName(node: SyntaxNode): string | null {
let cur: SyntaxNode = node;
let safety = 16;
while (safety-- > 0) {
if (cur.type === 'identifier' || cur.type === 'type_identifier') return cur.text;
// Common wrapper nodes — follow the 'declarator' field when present.
const next =
cur.childForFieldName('declarator') ??
// parenthesized_declarator: single named child
(cur.type === 'parenthesized_declarator' ? cur.namedChild(0) : null);
if (next === null) return null;
cur = next;
}
return null;
}

/**
* Check if a C++ function_definition or declaration has `static` storage class.
*/
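The decision order that `classifyAdlArg` and `lookupAdlIdentifierType` now implement for one argument can be summarised as a small pure function. This is a sketch over a hypothetical, pre-digested description of the argument (`ArgShape`, `LocalDecl` and `classify` are assumptions); the real code derives the same facts by walking tree-sitter-cpp nodes. As in the source, the parameter check runs before the local declaration scan.

```typescript
// Hypothetical facts about a same-block local declaration of the identifier.
interface LocalDecl {
  readonly isFunctionPointer: boolean;
  readonly simpleClassName: string; // '' when the declared type is not class-like
}

// Hypothetical pre-digested description of one call argument.
type ArgShape =
  | { readonly kind: 'qualified_identifier'; readonly text: string }
  | {
      readonly kind: 'identifier';
      readonly text: string;
      readonly isParameter: boolean; // declared in the enclosing function's parameter_list
      readonly localDecl?: LocalDecl; // declared in the enclosing compound_statement
    }
  | { readonly kind: 'other' };

type AdlClassification =
  | { readonly path: 'none' }
  | { readonly path: 'class'; readonly simpleClassName: string }
  | { readonly path: 'function-ref'; readonly functionRefText: string };

function classify(arg: ArgShape): AdlClassification {
  switch (arg.kind) {
    case 'qualified_identifier':
      // Could also be a variable or enum value; resolution time verifies it is a function.
      return { path: 'function-ref', functionRefText: arg.text };
    case 'identifier':
      // Parameters never take the function-reference path.
      if (arg.isParameter) return { path: 'none' };
      if (arg.localDecl !== undefined) {
        // A local function-pointer variable shadows any namespace-scope function.
        if (arg.localDecl.isFunctionPointer) return { path: 'none' };
        if (arg.localDecl.simpleClassName === '') return { path: 'none' };
        return { path: 'class', simpleClassName: arg.localDecl.simpleClassName };
      }
      // Not declared locally: treat as a potential free-function reference.
      return { path: 'function-ref', functionRefText: arg.text };
    default:
      // Calls, member access, operators: unsupported.
      return { path: 'none' };
  }
}

console.log(classify({ kind: 'qualified_identifier', text: 'utils::worker' }).path); // function-ref
console.log(
  classify({ kind: 'identifier', text: 'g', isParameter: false, localDecl: { isFunctionPointer: true, simpleClassName: '' } }).path,
); // none
```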
@@ -0,0 +1,7 @@
#include "utils.h"

namespace caller {
void run() {
with_callback(utils::worker);
}
}
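In this fixture the argument `utils::worker` is captured as a `qualified_identifier`, so its text is split at the last `::` before the workspace check. A minimal sketch of that string step, mirroring the `lastIndexOf('::')` and `replace(/::/g, '.')` logic in `collectFunctionRefNamespaces` (`splitQualifiedRef` and `SplitRef` are hypothetical names):

```typescript
interface SplitRef {
  readonly namespace: string; // dot-joined, e.g. "outer.inner"
  readonly simpleName: string;
}

// Hypothetical helper mirroring the qualified branch of collectFunctionRefNamespaces.
function splitQualifiedRef(refText: string): SplitRef | null {
  const idx = refText.lastIndexOf('::');
  if (idx === -1) return null; // unqualified: handled by the workspace-wide search instead
  return {
    namespace: refText.slice(0, idx).replace(/::/g, '.'),
    simpleName: refText.slice(idx + 2),
  };
}

console.log(splitQualifiedRef('utils::worker')); // namespace "utils", simpleName "worker"
console.log(splitQualifiedRef('outer::inner::fn')); // namespace "outer.inner", simpleName "fn"
console.log(splitQualifiedRef('::utils::worker')); // namespace ".utils" (leading :: kept)
```

As the last call shows, slicing and replacing keeps a leading global qualifier as a leading dot, so `::utils::worker` would not match the namespace `utils`; whether GitNexus normalises that case elsewhere is not visible in this diff.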
@@ -0,0 +1,7 @@
#pragma once

namespace utils {
void worker();
void worker(int n);
void with_callback(int n);
}
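This fixture declares two `worker` overloads in `utils`. Because contributions are collected into a `Set<string>`, both overloads should map to a single `utils` entry and no overload is selected. A minimal sketch, assuming defs are available as dot-joined qualified names; unlike the source, which reads the namespace from the enclosing scope, the sketch derives it from the qualified name (`fixtureDefs` and `namespacesForSimpleName` are hypothetical):

```typescript
// Dot-joined qualified names for the Function defs in this fixture (assumed representation).
const fixtureDefs: readonly string[] = ['utils.worker', 'utils.worker', 'utils.with_callback'];

// Collect the enclosing namespace of every def whose simple name matches.
function namespacesForSimpleName(qualifiedNames: readonly string[], simple: string): Set<string> {
  const out = new Set<string>();
  for (const qn of qualifiedNames) {
    const parts = qn.split('.');
    const last = parts.pop();
    // Skip non-matching names and defs with no enclosing namespace.
    if (last !== simple || parts.length === 0) continue;
    out.add(parts.join('.'));
  }
  return out;
}

const result = namespacesForSimpleName(fixtureDefs, 'worker');
console.log(result.size); // 1
```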
@@ -0,0 +1,7 @@
#include "utils.h"

namespace caller {
void run() {
with_callback(utils::worker);
}
}
@@ -0,0 +1,6 @@
#pragma once

namespace utils {
void worker();
void with_callback(int n);
}
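For this fixture's `void worker()`, the `adl.ts` comment contrasts the strict rule with GitNexus's approximation. The sketch below makes that contrast concrete for a single, non-overloaded function under simplifying assumptions: types are modeled as either fundamental or a class with a known enclosing namespace (`TypeRef`, `FnSig` and both functions are hypothetical), and further associated entities the standard adds for class types, such as base classes, are ignored.

```typescript
// Simplified type model: fundamental types contribute nothing; class types carry their namespace.
type TypeRef = { readonly kind: 'fundamental' } | { readonly kind: 'class'; readonly namespace: string };

interface FnSig {
  readonly enclosingNamespace: string;
  readonly returnType: TypeRef;
  readonly paramTypes: readonly TypeRef[];
}

// Simplified ISO-style set: namespaces of class types among the return and parameter types.
function strictAssociated(fn: FnSig): Set<string> {
  const out = new Set<string>();
  for (const t of [fn.returnType, ...fn.paramTypes]) {
    if (t.kind === 'class') out.add(t.namespace);
  }
  return out;
}

// GitNexus-style approximation: the function's own enclosing namespace.
function approximateAssociated(fn: FnSig): Set<string> {
  return fn.enclosingNamespace === '' ? new Set<string>() : new Set<string>([fn.enclosingNamespace]);
}

// utils::worker from this fixture: void worker();
const worker: FnSig = { enclosingNamespace: 'utils', returnType: { kind: 'fundamental' }, paramTypes: [] };
console.log(strictAssociated(worker).size); // 0
console.log(Array.from(approximateAssociated(worker)).join(',')); // utils
```

Under the strict rule the call `with_callback(utils::worker)` would find nothing through ADL here, which is exactly the gap the approximation is meant to close.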
@@ -0,0 +1,13 @@
#include "audit.h"

namespace app {
void run() {
// `g` is a locally-declared function-pointer variable. audit::g() also
// exists in the workspace. The local-fp guard (foundAsLocalFunctionPointer)
// must detect `g` as a function-pointer variable declaration and return
// EMPTY_ADL_ARG, preventing the workspace scan that would otherwise find
// audit::g and contribute `audit` to the ADL associated set.
void (*g)();
record(g);
}
}
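The local declaration `void (*g)();` above is what `extractDeclaratorLeafName` has to see through: tree-sitter-cpp parses its declarator as a `function_declarator` whose `declarator` field is a `parenthesized_declarator` wrapping a `pointer_declarator` around the identifier `g`. The sketch below replays the helper's walk over a hand-built mock tree; `MockNode`, `mockNode` and `leafName` are hypothetical stand-ins that expose only the two `SyntaxNode` methods the helper uses.

```typescript
// Hypothetical stand-in exposing only the SyntaxNode members the walk uses.
interface MockNode {
  readonly type: string;
  readonly text: string;
  childForFieldName(fieldName: string): MockNode | null;
  namedChild(index: number): MockNode | null;
}

function mockNode(
  type: string,
  text: string,
  fields: Partial<Record<string, MockNode>> = {},
  named: MockNode[] = [],
): MockNode {
  return {
    type,
    text,
    childForFieldName: (fieldName) => fields[fieldName] ?? null,
    namedChild: (index) => (index < named.length ? named[index] : null),
  };
}

// Same walk as extractDeclaratorLeafName: follow `declarator` fields, unwrap parentheses.
function leafName(start: MockNode): string | null {
  let cur = start;
  for (let safety = 16; safety > 0; safety--) {
    if (cur.type === 'identifier' || cur.type === 'type_identifier') return cur.text;
    const next =
      cur.childForFieldName('declarator') ??
      (cur.type === 'parenthesized_declarator' ? cur.namedChild(0) : null);
    if (next === null) return null;
    cur = next;
  }
  return null;
}

// Hand-built declarator tree for `void (*g)();`
const ident = mockNode('identifier', 'g');
const ptr = mockNode('pointer_declarator', '*g', { declarator: ident });
const paren = mockNode('parenthesized_declarator', '(*g)', {}, [ptr]);
const params = mockNode('parameter_list', '()');
const fnDecl = mockNode('function_declarator', '(*g)()', { declarator: paren, parameters: params });

console.log(leafName(fnDecl)); // g
```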
@@ -0,0 +1,11 @@
#pragma once

namespace audit {
// A free function named `g` exists in the workspace. Without the local-fp
// guard, the locally-declared `void (*g)()` variable would not be matched
// by the declaration scan, would be treated as a free-function ref, and the
// workspace search would contribute `audit`. This test verifies that the
// local fp variable shadows this function and no namespace is contributed.
void g();
void record(void (*fn)());
}
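`record` takes its callback as a parameter, which is the other shape excluded from the function-reference path. In a body such as `void forward(void (*fn)()) { record(fn); }` (a hypothetical caller, not part of these fixtures), `fn` lives in the enclosing function's `parameter_list`, outside the `compound_statement` that `lookupAdlIdentifierType` scans. The sketch below models `isIdentifierAFunctionParameter` with a hypothetical parent-linked node shape (`TreeNode`, where only function nodes carry `paramNames`), stopping at the first enclosing function as the source does.

```typescript
// Hypothetical parent-linked node; only function nodes carry parameter names.
interface TreeNode {
  readonly type: string;
  readonly parent: TreeNode | null;
  readonly paramNames?: readonly string[];
}

function isFunctionParameter(ident: TreeNode, varName: string): boolean {
  let cur: TreeNode | null = ident.parent;
  let safety = 64;
  while (cur !== null && safety-- > 0) {
    if (cur.paramNames !== undefined) {
      // Only the nearest enclosing function is checked; do not climb further.
      return cur.paramNames.indexOf(varName) !== -1;
    }
    if (cur.type === 'translation_unit') break;
    cur = cur.parent;
  }
  return false;
}

// Models: void forward(void (*fn)()) { record(fn); }
const tu: TreeNode = { type: 'translation_unit', parent: null };
const forward: TreeNode = { type: 'function_definition', parent: tu, paramNames: ['fn'] };
const body: TreeNode = { type: 'compound_statement', parent: forward };
const callExpr: TreeNode = { type: 'call_expression', parent: body };
const argFn: TreeNode = { type: 'identifier', parent: callExpr };

console.log(isFunctionParameter(argFn, 'fn')); // true
console.log(isFunctionParameter(argFn, 'worker')); // false
```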