diff --git a/.cursor/plans/enhance_523ca41c.plan.md b/.cursor/plans/enhance_523ca41c.plan.md new file mode 100644 index 0000000000..471e13f67b --- /dev/null +++ b/.cursor/plans/enhance_523ca41c.plan.md @@ -0,0 +1,239 @@ +--- +name: Enhance +overview: Restructure GitNexus LLM tools to leverage clusters and processes for better code understanding. Remove unused highlight tool, add new tools (explore, overview), enhance existing tools with cluster/process context, and improve impact analysis reliability. +todos: [] +--- + +# Enhanced LLM Tools with Cluster and Process Integration + +## Summary + +Consolidate GitNexus from 6 tools to **7 focused tools** that leverage the pre-computed clusters (Communities) and processes for richer context. Remove the highlight tool, add `explore` and `overview` tools, and enhance `search` and `blastRadius` with cluster/process awareness. + +## Final Tool Set + +| Tool | Status | Purpose ||------|--------|---------|| `search` | Enhance | Hybrid search + group results by process/cluster || `grep` | Keep | Regex pattern search || `read` | Keep | Read file content || `explore` | **New** | Deep dive on one symbol, cluster, or process || `overview` | **New** | Codebase map (all clusters + all processes) || `impact` | Enhance | Rename from blastRadius, add process/cluster context, increase limits || `cypher` | Keep | Raw graph queries || `highlight` | **Remove** | No longer needed | + +## Architecture + +```mermaid +flowchart TD + subgraph tools [LLM Tools Layer] + search[search] + grep[grep] + read[read] + explore[explore] + overview[overview] + impact[impact] + cypher[cypher] + end + + subgraph graph [Knowledge Graph] + nodes[Nodes: File, Function, Class...] 
+ communities[Community Nodes] + processes[Process Nodes] + edges[CodeRelation Edges] + memberOf[MEMBER_OF Edges] + stepIn[STEP_IN_PROCESS Edges] + end + + search --> edges + search --> communities + search --> processes + explore --> communities + explore --> processes + explore --> memberOf + explore --> stepIn + overview --> communities + overview --> processes + impact --> edges + impact --> communities + impact --> processes + cypher --> graph +``` + + + +## File Changes + +### 1. Remove Highlight Tool + +**File:** [gitnexus/src/core/llm/tools.ts](gitnexus/src/core/llm/tools.ts) + +- Delete the `highlightTool` definition (lines ~395-414) +- Remove `highlightTool` from the returned array (line ~862) +- Remove highlight marker logic from `blastRadius` output (line ~814-816) + +**File:** [gitnexus/src/core/llm/agent.ts](gitnexus/src/core/llm/agent.ts) + +- Remove highlight references from system prompt (lines 70, 77) +- Update tool list in prompt to reflect new tools + +**File:** [gitnexus/src/core/llm/types.ts](gitnexus/src/core/llm/types.ts) + +- Remove `'highlight'` from `AgentStreamChunk.type` union (line 180) +- Remove `highlightNodeIds` property (line 187-188) + +### 2. 
Add `explore` Tool + +**File:** [gitnexus/src/core/llm/tools.ts](gitnexus/src/core/llm/tools.ts)New tool that auto-detects target type and returns comprehensive context: + +```typescript +explore({ + target: string, // Name of symbol, cluster, or process + type?: 'symbol' | 'cluster' | 'process' // Optional, auto-detected +}) +``` + +**Functionality:** + +- For symbols: Query node, get MEMBER_OF cluster, get STEP_IN_PROCESS processes, get 1-hop connections +- For clusters: Query Community node, get members via MEMBER_OF, get processes that touch this cluster +- For processes: Query Process node, get steps via STEP_IN_PROCESS with step order, get clusters touched + +**Cypher queries needed:** + +```cypher +-- Symbol cluster membership +MATCH (s {name: $name})-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) +RETURN c.label, c.description + +-- Symbol process participation +MATCH (s {name: $name})-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) +RETURN p.label, r.step, p.stepCount + +-- Process steps in order +MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process {id: $processId}) +RETURN s.name, s.filePath, r.step +ORDER BY r.step +``` + + + +### 3. Add `overview` Tool + +**File:** [gitnexus/src/core/llm/tools.ts](gitnexus/src/core/llm/tools.ts)New tool that returns codebase structure: + +```typescript +overview() // No parameters +``` + +**Functionality:** + +- Query all Community nodes with member counts +- Query all Process nodes with step counts and types +- Calculate cluster dependencies (cross-cluster CALLS) +- Identify critical paths (most connected processes) + +**Output format:** + +```javascript +CLUSTERS (N total): +| Cluster | Symbols | Cohesion | Description | +... + +PROCESSES (N total): +| Process | Steps | Type | Clusters | +... + +CRITICAL PATHS: +- LoginFlow (45 edges) +... +``` + + + +### 4. 
Enhance `search` Tool + +**File:** [gitnexus/src/core/llm/tools.ts](gitnexus/src/core/llm/tools.ts)Modify existing search to group results by process:**Current:** Returns flat list with 1-hop connections**Enhanced:** Groups results by process, adds cluster context**Changes:** + +- After hybrid search, query STEP_IN_PROCESS for each result +- Group results by process ID +- Sort processes by number of matching results (relevance) +- Add cluster label for each result via MEMBER_OF query +- Keep 1-hop connections as optional detail + +**New parameter:** + +```typescript +search({ + query: string, + groupByProcess?: boolean, // Default: true + limit?: number +}) +``` + + + +### 5. Enhance `impact` Tool (rename from blastRadius) + +**File:** [gitnexus/src/core/llm/tools.ts](gitnexus/src/core/llm/tools.ts)**Rename:** `blastRadiusTool` to `impactTool`**Enhancements:** + +1. Increase LIMIT clauses: 100 to 300 (depth 1), 100 to 200 (depth 2), 50 to 100 (depth 3) +2. Add affected processes section (query STEP_IN_PROCESS for all affected symbols) +3. Add affected clusters section (query MEMBER_OF for all affected symbols) +4. Add risk assessment summary +5. Surface confidence scores more prominently (group by confidence level) + +**New output sections:** + +```javascript +AFFECTED PROCESSES: +- LoginFlow - BROKEN at step 2 +- SignupFlow - BROKEN at step 1 + +AFFECTED CLUSTERS: +- Authentication (direct) +- API Routes (indirect) + +RISK: CRITICAL +- N direct callers +- N processes affected +- N clusters affected +``` + + + +### 6. Increase Process Detection Limits + +**File:** [gitnexus/src/core/ingestion/process-processor.ts](gitnexus/src/core/ingestion/process-processor.ts)Change default config (lines 27-32): + +```typescript +const DEFAULT_CONFIG: ProcessDetectionConfig = { + maxTraceDepth: 10, // Keep + maxBranching: 4, // Was 3 + maxProcesses: 75, // Was 50 + minSteps: 2, // Keep +}; +``` + + + +### 7. 
Update System Prompt + +**File:** [gitnexus/src/core/llm/agent.ts](gitnexus/src/core/llm/agent.ts)Update BASE_SYSTEM_PROMPT to reflect new tools: + +```javascript +## TOOLS +- **search** - Hybrid search. Results grouped by process with cluster context. +- **grep** - Regex pattern search for exact strings. +- **read** - Read file content. +- **explore** - Deep dive on a symbol, cluster, or process. Shows membership, participation, connections. +- **overview** - Codebase map showing all clusters and processes. +- **impact** - Impact analysis. Shows affected processes, clusters, and risk level. +- **cypher** - Raw Cypher queries against the graph. + +## GRAPH SCHEMA +Nodes: File, Folder, Function, Class, Interface, Method, Community, Process +Relations: CodeRelation with type: CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, MEMBER_OF, STEP_IN_PROCESS +``` + + + +## Implementation Order + +1. Remove highlight tool (cleanup) +2. Increase process detection limits +3. Add overview tool (simplest new tool) +4. Add explore tool +5. Enhance impact tool \ No newline at end of file diff --git a/.sisyphus/drafts/gitnexus-brainstorming.md b/.sisyphus/drafts/gitnexus-brainstorming.md new file mode 100644 index 0000000000..95865a68b5 --- /dev/null +++ b/.sisyphus/drafts/gitnexus-brainstorming.md @@ -0,0 +1,18 @@ +# Draft: Gitnexus Brainstorming - Clustering & Process Maps + +## Initial Context +- Project: **GitnexusV2** +- Structure: + - `gitnexus/` (Likely the core application) + - `gitnexus-mcp/` (Likely a Model Context Protocol server) +- Goal: Make it accurate and usable for smaller/dumber models. +- Current Focus: Implementing **Clustering** and **Process Maps**. + +## Findings +- **Clustering**: Found `gitnexus/src/core/ingestion/cluster-enricher.ts`. +- **Process Maps**: No files matched `*process*map*` yet. Searching content next. + +## Open Questions +- How is "process map" defined in this context? (Graph, mermaid diagram, flowchart?) 
+- What is the input for clustering? (Code chunks, files, commits?) +- What is the intended output for "smaller models"? (Simplified context, summaries?) diff --git a/.sisyphus/drafts/noodlbox-comparison.md b/.sisyphus/drafts/noodlbox-comparison.md new file mode 100644 index 0000000000..3ef3adc488 --- /dev/null +++ b/.sisyphus/drafts/noodlbox-comparison.md @@ -0,0 +1,34 @@ +# Draft: Gitnexus vs Noodlbox Strategy + +## Objectives +- Understand GitnexusV2 current state and goals. +- Analyze Noodlbox capabilities from provided URL. +- Compare features, architecture, and value proposition. +- Provide strategic views and recommendations. + +## Research Findings +- [GitnexusV2]: Zero-server, browser-native (WASM), KuzuDB based. Graph + Vector hybrid search. +- [Noodlbox]: CLI-first, heavy install. Has "Session Hooks" and "Search Hooks" via plugins/CLI. + +## Comparison Points +- **Core Philosophy**: Both bet on "Knowledge Graph + MCP" as the future. Noodlbox validates Gitnexus's direction. +- **Architecture**: + - *Noodlbox*: CLI/Binary based. Likely local server management. + - *Gitnexus*: Zero-server, Browser-native (WASM). Lower friction, higher privacy. +- **Features**: + - *Communities/Processes*: Both have them. Noodlbox uses them for "context injection". Gitnexus uses them for "visual exploration + query". + - *Impact Analysis*: Noodlbox has polished workflows (e.g., `detect_impact staged`). Gitnexus has the engine (`blastRadius`) but maybe not the specific workflow wrappers yet. +- **UX/Integration**: + - *Noodlbox*: "Hooks" (Session/Search) are a killer feature. Proactively injecting context into the agent's session. + - *Gitnexus*: Powerful tools, but relies on agent *pulling* data? + +## Strategic Views +1. **Validation**: The market direction is confirmed. You are building the right thing. +2. **differentiation**: Lean into "Zero-Setup / Browser-Native". Noodlbox requires `noodl init` and CLI handling. Gitnexus could just *be*. +3. 
**Opportunity**: Steal the "Session/Search Hooks" pattern. Make the agent smarter *automatically* without the user asking "check impact". +4. **Workflow Polish**: Noodlbox's `/detect_impact staged` is a great specific use case. Gitnexus should wrap `blastRadius` into similar concrete workflows. + +## Technical Feasibility (Interception) +- **Cursor**: Use `.cursorrules` to "shadow" default tools. Instruct agent to ALWAYS use `gitnexus_search` instead of `grep`. +- **Claude Code**: Likely uses a private plugin API for `PreToolUse`. We can't match this exactly without an official plugin, but we can approximate it with strong prompt instructions in `AGENTS.md`. +- **MCP Shadowing**: Define tools with names that conflict (e.g., `grep`)? No, unsafe. Better to use "Virtual Hooks" via system prompt instructions. diff --git a/gitnexus-mcp/package.json b/gitnexus-mcp/package.json index 4203ff69fb..037571f935 100644 --- a/gitnexus-mcp/package.json +++ b/gitnexus-mcp/package.json @@ -1,6 +1,6 @@ { "name": "gitnexus-mcp", - "version": "0.1.1", + "version": "0.2.0", "description": "MCP server for GitNexus code intelligence - connect Cursor, Claude, and other AI agents to your codebase", "author": "Abhigyan Patwari", "license": "MIT", @@ -44,4 +44,4 @@ "engines": { "node": ">=18.0.0" } -} +} \ No newline at end of file diff --git a/gitnexus-mcp/src/mcp/server.ts b/gitnexus-mcp/src/mcp/server.ts index 81b663613e..a5c4ef7e7a 100644 --- a/gitnexus-mcp/src/mcp/server.ts +++ b/gitnexus-mcp/src/mcp/server.ts @@ -74,22 +74,28 @@ function formatContextAsMarkdown(context: CodebaseContext): string { // Usage hints lines.push('## 🛠️ Available Tools'); lines.push(''); - lines.push('- **search**: Semantic search across the codebase'); - lines.push('- **cypher**: Execute Cypher queries on the knowledge graph'); - lines.push('- **blastRadius**: Analyze impact of changes to a node'); - lines.push('- **highlight**: Visualize nodes in the graph'); + lines.push('- **search**: Semantic + 
keyword search across codebase'); + lines.push('- **cypher**: Execute Cypher queries on knowledge graph'); + lines.push('- **grep**: Regex pattern search in files'); + lines.push('- **read**: Read file contents'); + lines.push('- **explore**: Deep dive on symbol, cluster, or process'); + lines.push('- **overview**: Codebase map (all clusters + processes)'); + lines.push('- **impact**: Analyze change impact (upstream/downstream)'); + lines.push('- **highlight**: Visualize nodes in graph'); lines.push(''); lines.push('## Graph Schema'); lines.push(''); - lines.push('**Node Types**: File, Folder, Function, Class, Interface, Method'); + lines.push('**Node Types**: File, Folder, Function, Class, Interface, Method, Community, Process'); lines.push(''); lines.push('**Relation**: `CodeRelation` with `type` property:'); - lines.push('- CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS'); + lines.push('- CALLS, IMPORTS, EXTENDS, IMPLEMENTS, CONTAINS, DEFINES'); + lines.push('- MEMBER_OF (symbol → community), STEP_IN_PROCESS (symbol → process)'); lines.push(''); lines.push('**Example Cypher Queries**:'); lines.push('```cypher'); lines.push('MATCH (f:Function) RETURN f.name LIMIT 10'); lines.push("MATCH (f:File)-[:CodeRelation {type: 'IMPORTS'}]->(g:File) RETURN f.name, g.name"); + lines.push("MATCH (s)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) RETURN c.label, count(s)"); lines.push('```'); return lines.join('\n'); diff --git a/gitnexus-mcp/src/mcp/tools.ts b/gitnexus-mcp/src/mcp/tools.ts index 9ca2d27d2b..84ab18d72a 100644 --- a/gitnexus-mcp/src/mcp/tools.ts +++ b/gitnexus-mcp/src/mcp/tools.ts @@ -41,19 +41,20 @@ ALWAYS call this first to understand the codebase before searching or querying.` { name: 'search', description: `Hybrid search (keyword + semantic) across the codebase. -Returns code nodes with their graph connections. +Returns code nodes with their graph connections, grouped by process. 
WHEN TO USE: - Finding implementations ("where is auth handled?") - Understanding code flow ("what calls UserService?") - Locating patterns ("find all API endpoints") -RETURNS: Array of {name, type, filePath, code, connections[]}`, +RETURNS: Array of {name, type, filePath, code, connections[], cluster, processes[]}`, inputSchema: { type: 'object', properties: { query: { type: 'string', description: 'Natural language or keyword search query' }, limit: { type: 'number', description: 'Max results to return', default: 10 }, + groupByProcess: { type: 'boolean', description: 'Group results by process', default: true }, }, required: ['query'], }, @@ -63,23 +64,23 @@ RETURNS: Array of {name, type, filePath, code, connections[]}`, description: `Execute Cypher query against the code knowledge graph. SCHEMA: -- Nodes: File, Function, Class, Interface, Method -- Edges: CALLS, IMPORTS, EXTENDS, IMPLEMENTS, CONTAINS +- Nodes: File, Folder, Function, Class, Interface, Method, Community, Process +- Edges via CodeRelation.type: CALLS, IMPORTS, EXTENDS, IMPLEMENTS, CONTAINS, DEFINES, MEMBER_OF, STEP_IN_PROCESS EXAMPLES: • Find callers of a function: - MATCH (a)-[:CALLS]->(b:Function {name: "validateUser"}) RETURN a.name, a.filePath + MATCH (a)-[:CodeRelation {type: 'CALLS'}]->(b:Function {name: "validateUser"}) RETURN a.name, a.filePath -• Find class hierarchy: - MATCH (c:Class)-[:EXTENDS*]->(base) WHERE c.name = "AdminUser" RETURN base.name +• Find all functions in a community: + MATCH (f:Function)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community {label: "Auth"}) RETURN f.name -• Impact analysis (what depends on X): - MATCH (target:Function {name: $name})<-[:CALLS*1..3]-(caller) RETURN DISTINCT caller +• Find steps in a process: + MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process {label: "UserLogin"}) RETURN s.name, r.step ORDER BY r.step TIPS: -- Relationship types are UPPERCASE: CALLS, IMPORTS, EXTENDS -- Node labels are PascalCase: Function, Class, 
Interface -- Properties: name, filePath, code, startLine, endLine`, +- All relationships use CodeRelation table with 'type' property +- Community = functional cluster detected by Leiden algorithm +- Process = execution flow trace from entry point to terminal`, inputSchema: { type: 'object', properties: { @@ -133,17 +134,60 @@ RETURNS: {filePath, content, language, lines}`, }, }, { - name: 'blastRadius', + name: 'explore', + description: `Deep dive on a symbol, cluster, or process. + +TYPE: symbol | cluster | process + +For SYMBOL: Shows cluster membership, process participation, callers/callees +For CLUSTER: Shows members, cohesion score, processes touching it +For PROCESS: Shows step-by-step trace, clusters traversed, entry/terminal points + +Use after search to understand context of a specific node.`, + inputSchema: { + type: 'object', + properties: { + name: { type: 'string', description: 'Name of symbol, cluster, or process to explore' }, + type: { type: 'string', description: 'Type: symbol, cluster, or process' }, + }, + required: ['name', 'type'], + }, + }, + { + name: 'overview', + description: `Get codebase map showing all clusters and processes. + +Returns: +- All communities (clusters) with member counts and cohesion scores +- All processes with step counts and types (intra/cross-community) +- High-level architectural view + +Use to understand overall codebase structure before diving deep.`, + inputSchema: { + type: 'object', + properties: { + showProcesses: { type: 'boolean', description: 'Include process list', default: true }, + showClusters: { type: 'boolean', description: 'Include cluster list', default: true }, + limit: { type: 'number', description: 'Max items per category', default: 20 }, + }, + required: [], + }, + }, + { + name: 'impact', description: `Analyze the impact of changing a code element. Returns all nodes affected by modifying the target, with distance, edge type, and confidence. USE BEFORE making changes to understand ripple effects. 
-Output format (compact tabular): - Type|Name|File:Line|EdgeType|Confidence% +Output includes: +- Affected processes (with step positions) +- Affected clusters (direct/indirect) +- Risk assessment (critical/high/medium/low) +- Callers/dependents grouped by depth EdgeType: CALLS, IMPORTS, EXTENDS, IMPLEMENTS -Confidence: 100% = certain, <80% = fuzzy match [fuzzy] +Confidence: 100% = certain, <80% = fuzzy match Depth groups: - d=1: WILL BREAK (direct callers/importers) @@ -155,7 +199,7 @@ Depth groups: target: { type: 'string', description: 'Name of function, class, or file to analyze' }, direction: { type: 'string', description: 'upstream (what depends on this) or downstream (what this depends on)' }, maxDepth: { type: 'number', description: 'Max relationship depth (default: 3)', default: 3 }, - relationTypes: { type: 'array', items: { type: 'string' }, description: 'Filter: CALLS, IMPORTS, EXTENDS, IMPLEMENTS, CONTAINS, DEFINES (default: usage-based)' }, + relationTypes: { type: 'array', items: { type: 'string' }, description: 'Filter: CALLS, IMPORTS, EXTENDS, IMPLEMENTS (default: usage-based)' }, includeTests: { type: 'boolean', description: 'Include test files (default: false)' }, minConfidence: { type: 'number', description: 'Minimum confidence 0-1 (default: 0.7)' }, }, diff --git a/gitnexus/docs/FRAMEWORK_SUPPORT.md b/gitnexus/docs/FRAMEWORK_SUPPORT.md new file mode 100644 index 0000000000..71cca03e44 --- /dev/null +++ b/gitnexus/docs/FRAMEWORK_SUPPORT.md @@ -0,0 +1,74 @@ +# Framework Support for Entry Point Detection + +GitNexus automatically detects frameworks and boosts entry point scores for known patterns. 
+ +## Status Legend +- ✅ Supported (path-based detection) +- ❌ Not yet supported + +--- + +## JavaScript / TypeScript + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| Next.js (Pages) | ✅ | `/pages/*.tsx` | 3.0x | +| Next.js (App) | ✅ | `/app/*/page.tsx` | 3.0x | +| Next.js API | ✅ | `/pages/api/*`, `/app/*/route.ts` | 3.0x | +| Express.js | ✅ | `/routes/*` | 2.5x | +| React | ✅ | `/components/*.tsx` (PascalCase) | 1.5x | +| NestJS | ❌ | TODO: `@Controller` decorator | - | + +## Python + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| Django | ✅ | `views.py`, `urls.py` | 3.0x | +| FastAPI | ✅ | `/routers/*`, `/endpoints/*` | 2.5x | +| Flask | ✅ | `/routes/*` | 2.5x | + +## Java + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| Spring Boot | ✅ | `/controller/*`, `*Controller.java` | 3.0x | +| JAX-RS | ❌ | TODO: `@Path` annotation | - | + +## C# + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| ASP.NET Core | ✅ | `/Controllers/*`, `*Controller.cs` | 3.0x | +| Blazor | ✅ | `/Pages/*.razor` | 2.5x | + +## Go + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| net/http | ✅ | `/handlers/*`, `main.go` | 2.5-3.0x | +| Gin/Echo | ✅ | `/handlers/*`, `/routes/*` | 2.5x | + +## Rust + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| Actix/Axum/Rocket | ✅ | `/handlers/*`, `main.rs` | 2.5-3.0x | + +## C / C++ + +| Framework | Status | Detection Pattern | Multiplier | +|-----------|--------|-------------------|------------| +| Generic | ✅ | `main.c`, `main.cpp` | 3.0x | + +--- + +## Adding New Framework Support + +1. 
Edit `framework-detection.ts` → `detectFrameworkFromPath()` +2. Add path pattern with appropriate multiplier +3. Update this documentation +4. Test with a sample project + +## Graceful Fallback + +Unknown frameworks return `null`, resulting in a **1.0x multiplier** (no bonus, no penalty). diff --git a/gitnexus/src/components/ActivityFeed.tsx b/gitnexus/src/components/ActivityFeed.tsx index 0f4444e0df..a802b859b6 100644 --- a/gitnexus/src/components/ActivityFeed.tsx +++ b/gitnexus/src/components/ActivityFeed.tsx @@ -6,7 +6,7 @@ */ import { useState, useEffect, useRef } from 'react'; -import { Activity, Search, Database, Terminal, Eye, Loader2, CheckCircle, XCircle, Clock, FileText, Zap } from 'lucide-react'; +import { Activity, Search, Database, Terminal, Loader2, CheckCircle, XCircle, Clock, FileText, Zap, Map, Compass } from 'lucide-react'; import { getMCPClient, type ActivityEvent } from '../core/mcp/mcp-client'; // Tool icons @@ -16,8 +16,9 @@ const TOOL_ICONS: Record = { cypher: Database, grep: Terminal, read: FileText, - blastRadius: Activity, - highlight: Eye, + impact: Activity, + overview: Map, + explore: Compass, }; // Tool colors @@ -27,8 +28,9 @@ const TOOL_COLORS: Record = { cypher: 'text-purple-400', grep: 'text-green-400', read: 'text-blue-400', - blastRadius: 'text-rose-400', - highlight: 'text-teal-400', + impact: 'text-rose-400', + overview: 'text-indigo-400', + explore: 'text-teal-400', }; export function ActivityFeed() { diff --git a/gitnexus/src/components/Header.tsx b/gitnexus/src/components/Header.tsx index 9e4936223e..c1926842aa 100644 --- a/gitnexus/src/components/Header.tsx +++ b/gitnexus/src/components/Header.tsx @@ -238,27 +238,74 @@ export const Header = ({ onFocusNode }: HeaderProps) => { const results = await runQuery(query); return results; }} - onBlastRadius={async (nodeId, hops = 2) => { - // Run blast radius query + onImpact={async (nodeId: string, hops = 2) => { + // Run impact analysis query const query = ` MATCH 
(start)-[*1..${hops}]-(connected) WHERE start.id = '${nodeId}' OR start.name = '${nodeId}' RETURN DISTINCT connected.id AS id, connected.name AS name, labels(connected) AS labels `; const results = await runQuery(query); - // Trigger ripple animation on blast radius results + // Trigger ripple animation on impact results const nodeIds = results.map((r: any) => r.id).filter(Boolean); if (nodeIds.length > 0) { triggerNodeAnimation(nodeIds, 'ripple'); } return results; }} - onHighlight={(nodeIds) => { - // Highlight nodes in the graph - setHighlightedNodeIds(new Set(nodeIds)); - // Trigger glow animation on highlighted nodes - if (nodeIds.length > 0) { - triggerNodeAnimation(nodeIds, 'glow'); + onOverview={async () => { + // Return codebase overview: clusters + processes + const clustersQuery = ` + MATCH (c:Community) + OPTIONAL MATCH (c)<-[:CodeRelation {type: 'MEMBER_OF'}]-(m) + RETURN c.id AS id, c.label AS label, c.cohesion AS cohesion, c.description AS description, count(m) AS memberCount + ORDER BY memberCount DESC + LIMIT 50 + `; + const processesQuery = ` + MATCH (p:Process) + RETURN p.id AS id, p.label AS label, p.processType AS type, p.stepCount AS steps + ORDER BY p.stepCount DESC + LIMIT 50 + `; + const [clusters, processes] = await Promise.all([ + runQuery(clustersQuery), + runQuery(processesQuery), + ]); + return { clusters, processes }; + }} + onExplore={async (target: string, type?: 'symbol' | 'cluster' | 'process') => { + // Explore a specific target + if (type === 'cluster' || target.startsWith('comm_')) { + const query = ` + MATCH (c:Community) + WHERE c.id = '${target}' OR c.label CONTAINS '${target}' + OPTIONAL MATCH (c)<-[:CodeRelation {type: 'MEMBER_OF'}]-(m) + RETURN c.id AS id, c.label AS label, c.description AS description, collect(m.name)[0..10] AS members + LIMIT 1 + `; + return await runQuery(query); + } else if (type === 'process' || target.startsWith('proc_')) { + const query = ` + MATCH (p:Process) + WHERE p.id = '${target}' OR p.label 
CONTAINS '${target}' + OPTIONAL MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p) + RETURN p.id AS id, p.label AS label, p.stepCount AS steps, collect({name: s.name, step: r.step})[0..20] AS trace + LIMIT 1 + `; + return await runQuery(query); + } else { + // Symbol exploration + const query = ` + MATCH (n) + WHERE n.name = '${target}' OR n.id ENDS WITH ':${target}' + OPTIONAL MATCH (n)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) + OPTIONAL MATCH (n)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + RETURN n.id AS id, n.name AS name, n.filePath AS filePath, label(n) AS nodeType, + c.label AS cluster, collect({process: p.label, step: r.step}) AS processes + LIMIT 1 + `; + return await runQuery(query); } }} getContext={async () => { diff --git a/gitnexus/src/components/MCPToggle.tsx b/gitnexus/src/components/MCPToggle.tsx index 46b765889d..87c2c3e94b 100644 --- a/gitnexus/src/components/MCPToggle.tsx +++ b/gitnexus/src/components/MCPToggle.tsx @@ -14,10 +14,11 @@ type ConnectionState = 'disconnected' | 'connecting' | 'connected' | 'error'; interface MCPToggleProps { onSearch?: (query: string, limit?: number) => Promise; onCypher?: (query: string) => Promise; - onBlastRadius?: (nodeId: string, hops?: number) => Promise; - onHighlight?: (nodeIds: string[], color?: string) => void; + onImpact?: (nodeId: string, hops?: number) => Promise; onGrep?: (pattern: string, caseSensitive?: boolean, maxResults?: number) => Promise; onRead?: (filePath: string, startLine?: number, endLine?: number) => Promise; + onOverview?: () => Promise; + onExplore?: (target: string, type?: 'symbol' | 'cluster' | 'process') => Promise; showOnboardingTip?: boolean; getContext?: () => Promise; } @@ -37,10 +38,11 @@ const MCP_CONFIG = `{ export function MCPToggle({ onSearch, onCypher, - onBlastRadius, - onHighlight, + onImpact, onGrep, onRead, + onOverview, + onExplore, showOnboardingTip = false, getContext, }: MCPToggleProps = {}) { @@ -92,10 +94,11 @@ export function 
MCPToggle({ // Register tool handlers if (onSearch) client.registerHandler('search', async (params) => onSearch(params.query, params.limit)); if (onCypher) client.registerHandler('cypher', async (params) => onCypher(params.query)); - if (onBlastRadius) client.registerHandler('blastRadius', async (params) => onBlastRadius(params.nodeId, params.hops)); - if (onHighlight) client.registerHandler('highlight', async (params) => { onHighlight(params.nodeIds, params.color); return { highlighted: params.nodeIds.length }; }); + if (onImpact) client.registerHandler('impact', async (params) => onImpact(params.nodeId, params.hops)); if (onGrep) client.registerHandler('grep', async (params) => onGrep(params.pattern, params.caseSensitive, params.maxResults)); if (onRead) client.registerHandler('read', async (params) => onRead(params.filePath, params.startLine, params.endLine)); + if (onOverview) client.registerHandler('overview', async () => onOverview()); + if (onExplore) client.registerHandler('explore', async (params) => onExplore(params.target, params.type)); if (getContext) client.registerHandler('context', async () => getContext()); setStatus('connected'); @@ -114,7 +117,7 @@ export function MCPToggle({ } catch { setStatus('error'); } - }, [onSearch, onCypher, onBlastRadius, onHighlight, onGrep, onRead, getContext]); + }, [onSearch, onCypher, onImpact, onGrep, onRead, onOverview, onExplore, getContext]); const disconnect = useCallback(() => { const client = getMCPClient(); diff --git a/gitnexus/src/core/graph/types.ts b/gitnexus/src/core/graph/types.ts index 1e8fa20373..7bc9a5a95f 100644 --- a/gitnexus/src/core/graph/types.ts +++ b/gitnexus/src/core/graph/types.ts @@ -14,7 +14,8 @@ export type NodeLabel = | 'Import' | 'Type' | 'CodeElement' - | 'Community'; + | 'Community' + | 'Process'; export type NodeProperties = { @@ -31,6 +32,15 @@ export type NodeProperties = { keywords?: string[], description?: string, enrichedBy?: 'heuristic' | 'llm', + // Process-specific 
properties + processType?: 'intra_community' | 'cross_community', + stepCount?: number, + communities?: string[], + entryPointId?: string, + terminalId?: string, + // Entry point scoring (computed by process detection) + entryPointScore?: number, + entryPointReason?: string, } export type RelationshipType = @@ -45,6 +55,7 @@ export type RelationshipType = | 'IMPLEMENTS' | 'EXTENDS' | 'MEMBER_OF' + | 'STEP_IN_PROCESS' export interface GraphNode { id: string, @@ -61,6 +72,8 @@ export interface GraphRelationship { confidence: number, /** Resolution reason: 'import-resolved', 'same-file', 'fuzzy-global', or empty for non-CALLS */ reason: string, + /** Step number for STEP_IN_PROCESS relationships (1-indexed) */ + step?: number, } export interface KnowledgeGraph { diff --git a/gitnexus/src/core/ingestion/call-processor.ts b/gitnexus/src/core/ingestion/call-processor.ts index d0ac9b22e9..2b71c5aaa0 100644 --- a/gitnexus/src/core/ingestion/call-processor.ts +++ b/gitnexus/src/core/ingestion/call-processor.ts @@ -34,6 +34,7 @@ const FUNCTION_NODE_TYPES = new Set([ 'local_function_statement', // Rust 'function_item', + 'impl_item', // Methods inside impl blocks ]); /** @@ -51,21 +52,46 @@ const findEnclosingFunction = ( if (FUNCTION_NODE_TYPES.has(current.type)) { // Found enclosing function - try to get its name let funcName: string | null = null; + let label = 'Function'; // Different node types have different name locations if (current.type === 'function_declaration' || current.type === 'function_definition' || current.type === 'async_function_declaration' || - current.type === 'generator_function_declaration') { + current.type === 'generator_function_declaration' || + current.type === 'function_item') { // Rust function // Named function: function foo() {} const nameNode = current.childForFieldName?.('name') || current.children?.find((c: any) => c.type === 'identifier' || c.type === 'property_identifier'); funcName = nameNode?.text; + } else if (current.type === 
'impl_item') { + // Rust method inside impl block: wrapper around function_item or const_item + // We need to look inside for the function_item + const funcItem = current.children?.find((c: any) => c.type === 'function_item'); + if (funcItem) { + const nameNode = funcItem.childForFieldName?.('name') || + funcItem.children?.find((c: any) => c.type === 'identifier'); + funcName = nameNode?.text; + label = 'Method'; + } } else if (current.type === 'method_definition') { - // Method: foo() {} inside class + // Method: foo() {} inside class (JS/TS) const nameNode = current.childForFieldName?.('name') || current.children?.find((c: any) => c.type === 'property_identifier'); funcName = nameNode?.text; + label = 'Method'; + } else if (current.type === 'method_declaration') { + // Java method: public void foo() {} + const nameNode = current.childForFieldName?.('name') || + current.children?.find((c: any) => c.type === 'identifier'); + funcName = nameNode?.text; + label = 'Method'; + } else if (current.type === 'constructor_declaration') { + // Java constructor: public ClassName() {} + const nameNode = current.childForFieldName?.('name') || + current.children?.find((c: any) => c.type === 'identifier'); + funcName = nameNode?.text; + label = 'Method'; // Treat constructors as methods for process detection } else if (current.type === 'arrow_function' || current.type === 'function_expression') { // Arrow/expression: const foo = () => {} - check parent variable declarator const parent = current.parent; @@ -78,12 +104,18 @@ const findEnclosingFunction = ( if (funcName) { // Look up the function in symbol table to get its node ID + // Try exact match first const nodeId = symbolTable.lookupExact(filePath, funcName); if (nodeId) return nodeId; - // Fallback: generate ID based on name and file - const fallbackLabel = current.type === 'method_definition' ? 
'Method' : 'Function'; - return generateId(fallbackLabel, `${filePath}:${funcName}`); + // Try construct ID manually if lookup fails (common for non-exported internal functions) + // Format should match what parsing-processor generates: "Function:path/to/file:funcName" + // Check if we already have a node with this ID in the symbol table to be safe + const generatedId = generateId(label, `${filePath}:${funcName}`); + + // Ideally we should verify this ID exists, but strictly speaking if we are inside it, + // it SHOULD exist. Returning it is better than falling back to File. + return generatedId; } // Couldn't determine function name - try parent (might be nested) diff --git a/gitnexus/src/core/ingestion/entry-point-scoring.ts b/gitnexus/src/core/ingestion/entry-point-scoring.ts new file mode 100644 index 0000000000..1ef3d3ddc6 --- /dev/null +++ b/gitnexus/src/core/ingestion/entry-point-scoring.ts @@ -0,0 +1,281 @@ +/** + * Entry Point Scoring + * + * Calculates entry point scores for process detection based on: + * 1. Call ratio (existing algorithm - callees / (callers + 1)) + * 2. Export status (exported functions get higher priority) + * 3. Name patterns (functions matching entry point patterns like handle*, on*, *Controller) + * 4. Framework detection (path-based detection for Next.js, Express, Django, etc.) + * + * This module is language-agnostic - language-specific patterns are defined per language. 
+ */ + +import { detectFrameworkFromPath } from './framework-detection'; + +// ============================================================================ +// NAME PATTERNS - All 9 supported languages +// ============================================================================ + +/** + * Common entry point naming patterns by language + * These patterns indicate functions that are likely feature entry points + */ +const ENTRY_POINT_PATTERNS: Record<string, RegExp[]> = { + // Universal patterns (apply to all languages) + '*': [ + /^(main|init|bootstrap|start|run|setup|configure)$/i, + /^handle[A-Z]/, // handleLogin, handleSubmit + /^on[A-Z]/, // onClick, onSubmit + /Handler$/, // RequestHandler + /Controller$/, // UserController + /^process[A-Z]/, // processPayment + /^execute[A-Z]/, // executeQuery + /^perform[A-Z]/, // performAction + /^dispatch[A-Z]/, // dispatchEvent + /^trigger[A-Z]/, // triggerAction + /^fire[A-Z]/, // fireEvent + /^emit[A-Z]/, // emitEvent + ], + + // JavaScript/TypeScript + 'javascript': [ + /^use[A-Z]/, // React hooks (useEffect, etc.)
+ ], + 'typescript': [ + /^use[A-Z]/, // React hooks + ], + + // Python + 'python': [ + /^app$/, // Flask/FastAPI app + /^(get|post|put|delete|patch)_/i, // REST conventions + /^api_/, // API functions + /^view_/, // Django views + ], + + // Java + 'java': [ + /^do[A-Z]/, // doGet, doPost (Servlets) + /^create[A-Z]/, // Factory patterns + /^build[A-Z]/, // Builder patterns + /Service$/, // UserService + ], + + // C# + 'csharp': [ + /^(Get|Post|Put|Delete)/, // ASP.NET conventions + /Action$/, // MVC actions + /^On[A-Z]/, // Event handlers + /Async$/, // Async entry points + ], + + // Go + 'go': [ + /Handler$/, // http.Handler pattern + /^Serve/, // ServeHTTP + /^New[A-Z]/, // Constructor pattern (returns new instance) + /^Make[A-Z]/, // Make functions + ], + + // Rust + 'rust': [ + /^(get|post|put|delete)_handler$/i, + /^handle_/, // handle_request + /^new$/, // Constructor pattern + /^run$/, // run entry point + /^spawn/, // Async spawn + ], + + // C - explicit main() boost (critical for C programs) + 'c': [ + /^main$/, // THE entry point + /^init_/, // Initialization functions + /^start_/, // Start functions + /^run_/, // Run functions + ], + + // C++ - same as C plus class patterns + 'cpp': [ + /^main$/, // THE entry point + /^init_/, + /^Create[A-Z]/, // Factory patterns + /^Run$/, // Run methods + /^Start$/, // Start methods + ], +}; + +// ============================================================================ +// UTILITY PATTERNS - Functions that should be penalized +// ============================================================================ + +/** + * Patterns that indicate utility/helper functions (NOT entry points) + * These get penalized in scoring + */ +const UTILITY_PATTERNS: RegExp[] = [ + /^(get|set|is|has|can|should|will|did)[A-Z]/, // Accessors/predicates + /^_/, // Private by convention + /^(format|parse|validate|convert|transform)/i, // Transformation utilities + /^(log|debug|error|warn|info)$/i, // Logging + /^(to|from)[A-Z]/, // 
Conversions + /^(encode|decode)/i, // Encoding utilities + /^(serialize|deserialize)/i, // Serialization + /^(clone|copy|deep)/i, // Cloning utilities + /^(merge|extend|assign)/i, // Object utilities + /^(filter|map|reduce|sort|find)/i, // Collection utilities (standalone) + /Helper$/, + /Util$/, + /Utils$/, + /^utils?$/i, + /^helpers?$/i, +]; + +// ============================================================================ +// TYPES +// ============================================================================ + +export interface EntryPointScoreResult { + score: number; + reasons: string[]; +} + +// ============================================================================ +// MAIN SCORING FUNCTION +// ============================================================================ + +/** + * Calculate an entry point score for a function/method + * + * Higher scores indicate better entry point candidates. + * Score = baseScore × exportMultiplier × nameMultiplier × frameworkMultiplier + * + * @param name - Function/method name + * @param language - Programming language + * @param isExported - Whether the function is exported/public + * @param callerCount - Number of functions that call this function + * @param calleeCount - Number of functions this function calls + * @returns Score and array of reasons explaining the score + */ +export function calculateEntryPointScore( + name: string, + language: string, + isExported: boolean, + callerCount: number, + calleeCount: number, + filePath: string = '' // Optional for backwards compatibility +): EntryPointScoreResult { + const reasons: string[] = []; + + // Must have outgoing calls to be an entry point (we need to trace forward) + if (calleeCount === 0) { + return { score: 0, reasons: ['no-outgoing-calls'] }; + } + + // Base score: call ratio (existing algorithm) + // High ratio = calls many, called by few = likely entry point + const baseScore = calleeCount / (callerCount + 1); + reasons.push(`base:${baseScore.toFixed(2)}`); + + // Export
bonus: exported/public functions are more likely entry points + const exportMultiplier = isExported ? 2.0 : 1.0; + if (isExported) { + reasons.push('exported'); + } + + // Name pattern scoring + let nameMultiplier = 1.0; + + // Check negative patterns first (utilities get penalized) + if (UTILITY_PATTERNS.some(p => p.test(name))) { + nameMultiplier = 0.3; // Significant penalty + reasons.push('utility-pattern'); + } else { + // Check positive patterns + const universalPatterns = ENTRY_POINT_PATTERNS['*'] || []; + const langPatterns = ENTRY_POINT_PATTERNS[language] || []; + const allPatterns = [...universalPatterns, ...langPatterns]; + + if (allPatterns.some(p => p.test(name))) { + nameMultiplier = 1.5; // Bonus for matching entry point pattern + reasons.push('entry-pattern'); + } + } + + // Framework detection bonus (Phase 2) + let frameworkMultiplier = 1.0; + if (filePath) { + const frameworkHint = detectFrameworkFromPath(filePath); + if (frameworkHint) { + frameworkMultiplier = frameworkHint.entryPointMultiplier; + reasons.push(`framework:${frameworkHint.reason}`); + } + } + + // Calculate final score + const finalScore = baseScore * exportMultiplier * nameMultiplier * frameworkMultiplier; + + return { + score: finalScore, + reasons, + }; +} + +// ============================================================================ +// HELPER FUNCTIONS +// ============================================================================ + +/** + * Check if a file path is a test file (should be excluded from entry points) + * Covers common test file patterns across all supported languages + */ +export function isTestFile(filePath: string): boolean { + const p = filePath.toLowerCase().replace(/\\/g, '/'); + + return ( + // JavaScript/TypeScript test patterns + p.includes('.test.') || + p.includes('.spec.') || + p.includes('__tests__/') || + p.includes('__mocks__/') || + // Generic test folders + p.includes('/test/') || + p.includes('/tests/') || + p.includes('/testing/') || + // 
Python test patterns + p.endsWith('_test.py') || + p.includes('/test_') || + // Go test patterns + p.endsWith('_test.go') || + // Java test patterns + p.includes('/src/test/') || + // Rust test patterns (inline tests are different, but test files) + p.includes('/tests/') || + // C# test patterns + p.includes('.tests/') || + p.includes('tests.cs') + ); +} + +/** + * Check if a file path is likely a utility/helper file + * These might still have entry points but should be lower priority + */ +export function isUtilityFile(filePath: string): boolean { + const p = filePath.toLowerCase().replace(/\\/g, '/'); + + return ( + p.includes('/utils/') || + p.includes('/util/') || + p.includes('/helpers/') || + p.includes('/helper/') || + p.includes('/common/') || + p.includes('/shared/') || + p.includes('/lib/') || + p.endsWith('/utils.ts') || + p.endsWith('/utils.js') || + p.endsWith('/helpers.ts') || + p.endsWith('/helpers.js') || + p.endsWith('_utils.py') || + p.endsWith('_helpers.py') + ); +} diff --git a/gitnexus/src/core/ingestion/framework-detection.ts b/gitnexus/src/core/ingestion/framework-detection.ts new file mode 100644 index 0000000000..d3c75ab876 --- /dev/null +++ b/gitnexus/src/core/ingestion/framework-detection.ts @@ -0,0 +1,243 @@ +/** + * Framework Detection + * + * Detects frameworks from file path patterns and provides entry point multipliers. + * This enables framework-aware entry point scoring. + * + * DESIGN: Returns null for unknown frameworks, which causes a 1.0 multiplier + * (no bonus, no penalty) - same behavior as before this feature. 
+ */ + +// ============================================================================ +// TYPES +// ============================================================================ + +export interface FrameworkHint { + framework: string; + entryPointMultiplier: number; + reason: string; +} + +// ============================================================================ +// PATH-BASED FRAMEWORK DETECTION +// ============================================================================ + +/** + * Detect framework from file path patterns + * + * This provides entry point multipliers based on well-known framework conventions. + * Returns null if no framework pattern is detected (falls back to 1.0 multiplier). + */ +export function detectFrameworkFromPath(filePath: string): FrameworkHint | null { + // Normalize path separators and ensure leading slash for consistent matching + let p = filePath.toLowerCase().replace(/\\/g, '/'); + if (!p.startsWith('/')) { + p = '/' + p; // Add leading slash so patterns like '/app/' match 'app/...' 
+ } + + // ========== JAVASCRIPT / TYPESCRIPT FRAMEWORKS ========== + + // Next.js - Pages Router (high confidence) + if (p.includes('/pages/') && !p.includes('/_') && !p.includes('/api/')) { + if (p.endsWith('.tsx') || p.endsWith('.ts') || p.endsWith('.jsx') || p.endsWith('.js')) { + return { framework: 'nextjs-pages', entryPointMultiplier: 3.0, reason: 'nextjs-page' }; + } + } + + // Next.js - App Router (page.tsx files) + if (p.includes('/app/') && ( + p.endsWith('page.tsx') || p.endsWith('page.ts') || + p.endsWith('page.jsx') || p.endsWith('page.js') + )) { + return { framework: 'nextjs-app', entryPointMultiplier: 3.0, reason: 'nextjs-app-page' }; + } + + // Next.js - API Routes + if (p.includes('/pages/api/') || (p.includes('/app/') && p.includes('/api/') && p.endsWith('route.ts'))) { + return { framework: 'nextjs-api', entryPointMultiplier: 3.0, reason: 'nextjs-api-route' }; + } + + // Next.js - Layout files (moderate - they're entry-ish but not the main entry) + if (p.includes('/app/') && (p.endsWith('layout.tsx') || p.endsWith('layout.ts'))) { + return { framework: 'nextjs-app', entryPointMultiplier: 2.0, reason: 'nextjs-layout' }; + } + + // Express / Node.js routes + if (p.includes('/routes/') && (p.endsWith('.ts') || p.endsWith('.js'))) { + return { framework: 'express', entryPointMultiplier: 2.5, reason: 'routes-folder' }; + } + + // Generic controllers (MVC pattern) + if (p.includes('/controllers/') && (p.endsWith('.ts') || p.endsWith('.js'))) { + return { framework: 'mvc', entryPointMultiplier: 2.5, reason: 'controllers-folder' }; + } + + // Generic handlers + if (p.includes('/handlers/') && (p.endsWith('.ts') || p.endsWith('.js'))) { + return { framework: 'handlers', entryPointMultiplier: 2.5, reason: 'handlers-folder' }; + } + + // React components (lower priority - not all are entry points) + if ((p.includes('/components/') || p.includes('/views/')) && + (p.endsWith('.tsx') || p.endsWith('.jsx'))) { + // Only boost if PascalCase filename (likely a 
component, not util) + const fileName = filePath.replace(/\\/g, '/').split('/').pop() || ''; // Use original casing: p is lowercased, so a PascalCase test on it never matches + if (/^[A-Z]/.test(fileName)) { + return { framework: 'react', entryPointMultiplier: 1.5, reason: 'react-component' }; + } + } + + // ========== PYTHON FRAMEWORKS ========== + + // Django views (high confidence) + if (p.endsWith('views.py')) { + return { framework: 'django', entryPointMultiplier: 3.0, reason: 'django-views' }; + } + + // Django URL configs + if (p.endsWith('urls.py')) { + return { framework: 'django', entryPointMultiplier: 2.0, reason: 'django-urls' }; + } + + // FastAPI / Flask routers + if ((p.includes('/routers/') || p.includes('/endpoints/') || p.includes('/routes/')) && + p.endsWith('.py')) { + return { framework: 'fastapi', entryPointMultiplier: 2.5, reason: 'api-routers' }; + } + + // Python API folder + if (p.includes('/api/') && p.endsWith('.py') && !p.endsWith('__init__.py')) { + return { framework: 'python-api', entryPointMultiplier: 2.0, reason: 'api-folder' }; + } + + // ========== JAVA FRAMEWORKS ========== + + // Spring Boot controllers + if ((p.includes('/controller/') || p.includes('/controllers/')) && p.endsWith('.java')) { + return { framework: 'spring', entryPointMultiplier: 3.0, reason: 'spring-controller' }; + } + + // Spring Boot - files ending in Controller.java + if (p.endsWith('controller.java')) { + return { framework: 'spring', entryPointMultiplier: 3.0, reason: 'spring-controller-file' }; + } + + // Java service layer (often entry points for business logic) + if ((p.includes('/service/') || p.includes('/services/')) && p.endsWith('.java')) { + return { framework: 'java-service', entryPointMultiplier: 1.8, reason: 'java-service' }; + } + + // ========== C# / .NET FRAMEWORKS ========== + + // ASP.NET Controllers + if (p.includes('/controllers/') && p.endsWith('.cs')) { + return { framework: 'aspnet', entryPointMultiplier: 3.0, reason: 'aspnet-controller' }; + } + + // ASP.NET - files ending in Controller.cs + if (p.endsWith('controller.cs')) { + return {
framework: 'aspnet', entryPointMultiplier: 3.0, reason: 'aspnet-controller-file' }; + } + + // Blazor pages + if (p.includes('/pages/') && p.endsWith('.razor')) { + return { framework: 'blazor', entryPointMultiplier: 2.5, reason: 'blazor-page' }; + } + + // ========== GO FRAMEWORKS ========== + + // Go handlers + if ((p.includes('/handlers/') || p.includes('/handler/')) && p.endsWith('.go')) { + return { framework: 'go-http', entryPointMultiplier: 2.5, reason: 'go-handlers' }; + } + + // Go routes + if (p.includes('/routes/') && p.endsWith('.go')) { + return { framework: 'go-http', entryPointMultiplier: 2.5, reason: 'go-routes' }; + } + + // Go controllers + if (p.includes('/controllers/') && p.endsWith('.go')) { + return { framework: 'go-mvc', entryPointMultiplier: 2.5, reason: 'go-controller' }; + } + + // Go main.go files and .go files under cmd/ (THE entry point) + if (p.endsWith('/main.go') || (p.includes('/cmd/') && p.endsWith('.go'))) { + return { framework: 'go', entryPointMultiplier: 3.0, reason: 'go-main' }; + } + + // ========== RUST FRAMEWORKS ========== + + // Rust handlers/routes + if ((p.includes('/handlers/') || p.includes('/routes/')) && p.endsWith('.rs')) { + return { framework: 'rust-web', entryPointMultiplier: 2.5, reason: 'rust-handlers' }; + } + + // Rust main.rs (THE entry point) + if (p.endsWith('/main.rs')) { + return { framework: 'rust', entryPointMultiplier: 3.0, reason: 'rust-main' }; + } + + // Rust bin folder (executables) + if (p.includes('/bin/') && p.endsWith('.rs')) { + return { framework: 'rust', entryPointMultiplier: 2.5, reason: 'rust-bin' }; + } + + // ========== C / C++ ========== + + // C/C++ main files + if (p.endsWith('/main.c') || p.endsWith('/main.cpp') || p.endsWith('/main.cc')) { + return { framework: 'c-cpp', entryPointMultiplier: 3.0, reason: 'c-main' }; + } + + // C/C++ src folder entry points (if named specifically) + if ((p.includes('/src/') && (p.endsWith('/app.c') || p.endsWith('/app.cpp')))) { + return { framework: 'c-cpp',
entryPointMultiplier: 2.5, reason: 'c-app' }; + } + + // ========== GENERIC PATTERNS ========== + + // Any language: index files in API folders + if (p.includes('/api/') && ( + p.endsWith('/index.ts') || p.endsWith('/index.js') || + p.endsWith('/__init__.py') + )) { + return { framework: 'api', entryPointMultiplier: 1.8, reason: 'api-index' }; + } + + // No framework detected - return null for graceful fallback (1.0 multiplier) + return null; +} + +// ============================================================================ +// FUTURE: AST-BASED PATTERNS (for Phase 3) +// ============================================================================ + +/** + * Patterns that indicate entry points within code (for future AST-based detection) + * These would require parsing decorators/annotations in the code itself. + */ +export const FRAMEWORK_AST_PATTERNS = { + // JavaScript/TypeScript decorators + 'nestjs': ['@Controller', '@Get', '@Post', '@Put', '@Delete', '@Patch'], + 'express': ['app.get', 'app.post', 'app.put', 'app.delete', 'router.get', 'router.post'], + + // Python decorators + 'fastapi': ['@app.get', '@app.post', '@app.put', '@app.delete', '@router.get'], + 'flask': ['@app.route', '@blueprint.route'], + + // Java annotations + 'spring': ['@RestController', '@Controller', '@GetMapping', '@PostMapping', '@RequestMapping'], + 'jaxrs': ['@Path', '@GET', '@POST', '@PUT', '@DELETE'], + + // C# attributes + 'aspnet': ['[ApiController]', '[HttpGet]', '[HttpPost]', '[Route]'], + + // Go patterns (function signatures) + 'go-http': ['http.Handler', 'http.HandlerFunc', 'ServeHTTP'], + + // Rust macros + 'actix': ['#[get', '#[post', '#[put', '#[delete'], + 'axum': ['Router::new'], + 'rocket': ['#[get', '#[post'], +}; diff --git a/gitnexus/src/core/ingestion/import-processor.ts b/gitnexus/src/core/ingestion/import-processor.ts index 0dd9b4893a..c0cb6bd681 100644 --- a/gitnexus/src/core/ingestion/import-processor.ts +++ b/gitnexus/src/core/ingestion/import-processor.ts 
@@ -11,16 +11,18 @@ export type ImportMap = Map<string, Set<string>>; export const createImportMap = (): ImportMap => new Map(); -// Helper: Resolve relative paths (e.g. "../utils" -> "src/lib/utils.ts") +// Helper: Resolve import paths (relative and absolute/package-style) const resolveImportPath = ( currentFile: string, importPath: string, - allFiles: Set<string> + allFiles: Set<string>, + allFileList: string[], + resolveCache: Map<string, string | null> ): string | null => { - // 1. Handle non-relative imports (libraries like 'react') - if (!importPath.startsWith('.')) return null; // We skip node_modules for now + const cacheKey = `${currentFile}::${importPath}`; + if (resolveCache.has(cacheKey)) return resolveCache.get(cacheKey) ?? null; - // 2. Resolve '..' and '.' + // 1. Resolve '..' and '.' for relative imports const currentDir = currentFile.split('/').slice(0, -1); const parts = importPath.split('/'); @@ -35,7 +37,7 @@ const resolveImportPath = ( const basePath = currentDir.join('/'); - // 3. Try extensions for all supported languages + // 2. Try extensions for all supported languages const extensions = [ '', // TypeScript/JavaScript @@ -54,11 +56,51 @@ const resolveImportPath = ( '.rs', '/mod.rs' ]; - for (const ext of extensions) { - const candidate = basePath + ext; - if (allFiles.has(candidate)) return candidate; + if (importPath.startsWith('.')) { + for (const ext of extensions) { + const candidate = basePath + ext; + if (allFiles.has(candidate)) { + resolveCache.set(cacheKey, candidate); + return candidate; + } + } + resolveCache.set(cacheKey, null); + return null; + } + + // 3. Handle absolute/package imports (Java, Go, Python, etc.) + if (importPath.endsWith('.*')) { + resolveCache.set(cacheKey, null); + return null; + } + + const pathLike = importPath.includes('/') + ?
importPath + : importPath.replace(/\./g, '/'); + const pathParts = pathLike.split('/').filter(Boolean); + + // Normalize all file paths to forward slashes for matching + const normalizedFileList = allFileList.map(p => p.replace(/\\/g, '/')); + + for (let i = 0; i < pathParts.length; i++) { + const suffix = pathParts.slice(i).join('/'); + for (const ext of extensions) { + const suffixWithExt = suffix + ext; + // Require path separator before match to avoid false positives like "View.java" matching "RootView.java" + const suffixPattern = '/' + suffixWithExt; + const matchIdx = normalizedFileList.findIndex(filePath => + filePath.endsWith(suffixPattern) || filePath.toLowerCase().endsWith(suffixPattern.toLowerCase()) + ); + if (matchIdx !== -1) { + const match = allFileList[matchIdx]; + resolveCache.set(cacheKey, match); + return match; + } + } + } + // Unresolved imports (external packages, SDK imports) are expected - don't log + resolveCache.set(cacheKey, null); return null; }; @@ -72,6 +114,12 @@ export const processImports = async ( // Create a Set of all file paths for fast lookup during resolution const allFilePaths = new Set(files.map(f => f.path)); const parser = await loadParser(); + const resolveCache = new Map(); + const allFileList = files.map(f => f.path); + + // Track import statistics + let totalImportsFound = 0; + let totalImportsResolved = 0; for (let i = 0; i < files.length; i++) { const file = files[i]; @@ -102,6 +150,8 @@ export const processImports = async ( try { query = parser.getLanguage().query(queryStr); matches = query.matches(tree.rootNode); + + // Removed verbose Java import logging } catch (queryError: any) { // Detailed debug logging for query failures console.group(`🔴 Query Error: ${file.path}`); @@ -123,13 +173,27 @@ export const processImports = async ( if (captureMap['import']) { const sourceNode = captureMap['import.source']; - if (!sourceNode) return; + if (!sourceNode) { + if (import.meta.env.DEV) { + console.log(`⚠️ Import
captured but no source node in ${file.path}`); + } + return; + } // Clean path (remove quotes) const rawImportPath = sourceNode.text.replace(/['"]/g, ''); + totalImportsFound++; + + // Removed verbose per-import logging // Resolve to actual file in the system - const resolvedPath = resolveImportPath(file.path, rawImportPath, allFilePaths); + const resolvedPath = resolveImportPath( + file.path, + rawImportPath, + allFilePaths, + allFileList, + resolveCache + ); if (resolvedPath) { // A. Update Graph (File -> IMPORTS -> File) @@ -137,6 +201,8 @@ export const processImports = async ( const targetId = generateId('File', resolvedPath); const relId = generateId('IMPORTS', `${file.path}->${resolvedPath}`); + totalImportsResolved++; + graph.addRelationship({ id: relId, sourceId, @@ -161,6 +227,10 @@ export const processImports = async ( tree.delete(); } } + + if (import.meta.env.DEV) { + console.log(`📊 Import processing complete: ${totalImportsResolved}/${totalImportsFound} imports resolved to graph edges`); + } }; diff --git a/gitnexus/src/core/ingestion/parsing-processor.ts b/gitnexus/src/core/ingestion/parsing-processor.ts index 79d53216ec..807bcf5815 100644 --- a/gitnexus/src/core/ingestion/parsing-processor.ts +++ b/gitnexus/src/core/ingestion/parsing-processor.ts @@ -8,6 +8,108 @@ import { getLanguageFromFilename } from './utils'; export type FileProgressCallback = (current: number, total: number, filePath: string) => void; +// ============================================================================ +// EXPORT DETECTION - Language-specific visibility detection +// ============================================================================ + +/** + * Check if a symbol (function, class, etc.)
is exported/public + * Handles all 9 supported languages with explicit logic + * + * @param node - The AST node for the symbol name + * @param name - The symbol name + * @param language - The programming language + * @returns true if the symbol is exported/public + */ +const isNodeExported = (node: any, name: string, language: string): boolean => { + let current = node; + + switch (language) { + // JavaScript/TypeScript: Check for export keyword in ancestors + case 'javascript': + case 'typescript': + while (current) { + const type = current.type; + if (type === 'export_statement' || + type === 'export_specifier' || + type === 'lexical_declaration' && current.parent?.type === 'export_statement') { + return true; + } + // Also check if text starts with 'export ' + if (current.text?.startsWith('export ')) { + return true; + } + current = current.parent; + } + return false; + + // Python: Public if no leading underscore (convention) + case 'python': + return !name.startsWith('_'); + + // Java: Check for 'public' modifier + // In tree-sitter Java, modifiers are siblings of the name node, not parents + case 'java': + while (current) { + // Check if this node or any sibling is a 'modifiers' node containing 'public' + if (current.parent) { + const parent = current.parent; + // Check all children of the parent for modifiers + for (let i = 0; i < parent.childCount; i++) { + const child = parent.child(i); + if (child?.type === 'modifiers' && child.text?.includes('public')) { + return true; + } + } + // Also check if the parent's text starts with 'public' (fallback) + if (parent.type === 'method_declaration' || parent.type === 'constructor_declaration') { + if (parent.text?.trimStart().startsWith('public')) { + return true; + } + } + } + current = current.parent; + } + return false; + + // C#: Check for 'public' modifier in ancestors + case 'csharp': + while (current) { + if (current.type === 'modifier' || current.type === 'modifiers') { + if 
(current.text?.includes('public')) return true; + } + current = current.parent; + } + return false; + + // Go: Uppercase first letter = exported + case 'go': + if (name.length === 0) return false; + const first = name[0]; + // Must be uppercase letter (not a number or symbol) + return first === first.toUpperCase() && first !== first.toLowerCase(); + + // Rust: Check for 'pub' visibility modifier + case 'rust': + while (current) { + if (current.type === 'visibility_modifier') { + if (current.text?.includes('pub')) return true; + } + current = current.parent; + } + return false; + + // C/C++: No native export concept at language level + // Entry points will be detected via name patterns (main, etc.) + case 'c': + case 'cpp': + return false; + + default: + return false; + } +}; + export const processParsing = async ( graph: KnowledgeGraph, files: { path: string; content: string }[], @@ -123,7 +225,8 @@ export const processParsing = async ( filePath: file.path, startLine: nameNode.startPosition.row, endLine: nameNode.endPosition.row, - language: language + language: language, + isExported: isNodeExported(nameNode, nodeName, language), } }; diff --git a/gitnexus/src/core/ingestion/pipeline.ts b/gitnexus/src/core/ingestion/pipeline.ts index a273d1a5db..8c276b3125 100644 --- a/gitnexus/src/core/ingestion/pipeline.ts +++ b/gitnexus/src/core/ingestion/pipeline.ts @@ -6,6 +6,7 @@ import { processImports, createImportMap } from './import-processor'; import { processCalls } from './call-processor'; import { processHeritage } from './heritage-processor'; import { processCommunities, CommunityDetectionResult } from './community-processor'; +import { processProcesses, ProcessDetectionResult } from './process-processor'; import { createSymbolTable } from './symbol-table'; import { createASTCache } from './ast-cache'; import { PipelineProgress, PipelineResult } from '../../types/pipeline'; @@ -122,6 +123,16 @@ export const runPipelineFromFiles = async ( stats: { filesProcessed: 
current, totalFiles: total, nodesCreated: graph.nodeCount }, }); }); + + // Debug: Count IMPORTS relationships + if (import.meta.env.DEV) { + const importsCount = graph.relationships.filter(r => r.type === 'IMPORTS').length; + console.log(`📊 Pipeline: After import phase, graph has ${importsCount} IMPORTS relationships (total: ${graph.relationshipCount})`); + if (importsCount > 0) { + const sample = graph.relationships.filter(r => r.type === 'IMPORTS').slice(0, 3); + sample.forEach(r => console.log(` Sample IMPORTS: ${r.sourceId} → ${r.targetId}`)); + } + } // Phase 5: Calls (82-98%) @@ -210,12 +221,70 @@ export const runPipelineFromFiles = async ( }); }); + // Phase 8: Process Detection (98-99%) + onProgress({ + phase: 'processes', + percent: 98, + message: 'Detecting execution flows...', + stats: { filesProcessed: files.length, totalFiles: files.length, nodesCreated: graph.nodeCount }, + }); + + const processResult = await processProcesses( + graph, + communityResult.memberships, + (message, progress) => { + const processProgress = 98 + (progress * 0.01); + onProgress({ + phase: 'processes', + percent: Math.round(processProgress), + message, + stats: { filesProcessed: files.length, totalFiles: files.length, nodesCreated: graph.nodeCount }, + }); + } + ); + + // Log process detection results + if (import.meta.env.DEV) { + console.log(`🔄 Process detection: ${processResult.stats.totalProcesses} processes found (${processResult.stats.crossCommunityCount} cross-community)`); + } + + // Add Process nodes to the graph + processResult.processes.forEach(proc => { + graph.addNode({ + id: proc.id, + label: 'Process' as const, + properties: { + name: proc.label, + filePath: '', + heuristicLabel: proc.heuristicLabel, + processType: proc.processType, + stepCount: proc.stepCount, + communities: proc.communities, + entryPointId: proc.entryPointId, + terminalId: proc.terminalId, + } + }); + }); + + // Add STEP_IN_PROCESS relationships + processResult.steps.forEach(step =>
{ + graph.addRelationship({ + id: `${step.nodeId}_step_${step.step}_${step.processId}`, + type: 'STEP_IN_PROCESS', + sourceId: step.nodeId, + targetId: step.processId, + confidence: 1.0, + reason: 'trace-detection', + step: step.step, + }); + }); + - // Phase 8: Complete (100%) + // Phase 9: Complete (100%) onProgress({ phase: 'complete', percent: 100, - message: `Graph complete! ${communityResult.stats.totalCommunities} communities detected.`, + message: `Graph complete! ${communityResult.stats.totalCommunities} communities, ${processResult.stats.totalProcesses} processes detected.`, stats: { filesProcessed: files.length, totalFiles: files.length, @@ -226,7 +295,7 @@ export const runPipelineFromFiles = async ( // Cleanup WASM memory before returning astCache.clear(); - return { graph, fileContents, communityResult }; + return { graph, fileContents, communityResult, processResult }; } catch (error) { cleanup(); diff --git a/gitnexus/src/core/ingestion/process-processor.ts b/gitnexus/src/core/ingestion/process-processor.ts new file mode 100644 index 0000000000..cf983d2e62 --- /dev/null +++ b/gitnexus/src/core/ingestion/process-processor.ts @@ -0,0 +1,409 @@ +/** + * Process Detection Processor + * + * Detects execution flows (Processes) in the code graph by: + * 1. Finding entry points (functions with no internal callers) + * 2. Tracing forward via CALLS edges (BFS) + * 3. Grouping and deduplicating similar paths + * 4. Labeling with heuristic names + * + * Processes help agents understand how features work through the codebase. 
+ */ + +import { KnowledgeGraph, GraphNode, GraphRelationship, NodeLabel } from '../graph/types'; +import { CommunityMembership } from './community-processor'; +import { calculateEntryPointScore, isTestFile } from './entry-point-scoring'; + +// ============================================================================ +// CONFIGURATION +// ============================================================================ + +export interface ProcessDetectionConfig { + maxTraceDepth: number; // Maximum steps to trace (default: 10) + maxBranching: number; // Max branches to follow per node (default: 4) + maxProcesses: number; // Maximum processes to detect (default: 75) + minSteps: number; // Minimum steps for a valid process (default: 2) +} + +const DEFAULT_CONFIG: ProcessDetectionConfig = { + maxTraceDepth: 10, + maxBranching: 4, + maxProcesses: 75, + minSteps: 2, +}; + +// ============================================================================ +// TYPES +// ============================================================================ + +export interface ProcessNode { + id: string; // e.g. "proc_0_handlelogin" (proc_<index>_<sanitized entry name>) + label: string; // "HandleLogin → CreateSession" + heuristicLabel: string; + processType: 'intra_community' | 'cross_community'; + stepCount: number; + communities: string[]; // Community IDs touched + entryPointId: string; + terminalId: string; + trace: string[]; // Ordered array of node IDs +} + +export interface ProcessStep { + nodeId: string; + processId: string; + step: number; // 1-indexed position in trace +} + +export interface ProcessDetectionResult { + processes: ProcessNode[]; + steps: ProcessStep[]; + stats: { + totalProcesses: number; + crossCommunityCount: number; + avgStepCount: number; + entryPointsFound: number; + }; +} + +// ============================================================================ +// MAIN PROCESSOR +// ============================================================================ + +/** + * Detect processes 
(execution flows) in the knowledge graph + * + * This runs AFTER community detection, using CALLS edges to trace flows. + */ +export const processProcesses = async ( + knowledgeGraph: KnowledgeGraph, + memberships: CommunityMembership[], + onProgress?: (message: string, progress: number) => void, + config: Partial<ProcessDetectionConfig> = {} +): Promise<ProcessDetectionResult> => { + const cfg = { ...DEFAULT_CONFIG, ...config }; + + onProgress?.('Finding entry points...', 0); + + // Build lookup maps + const membershipMap = new Map<string, string>(); + memberships.forEach(m => membershipMap.set(m.nodeId, m.communityId)); + + const callsEdges = buildCallsGraph(knowledgeGraph); + const reverseCallsEdges = buildReverseCallsGraph(knowledgeGraph); + const nodeMap = new Map<string, GraphNode>(); + knowledgeGraph.nodes.forEach(n => nodeMap.set(n.id, n)); + + // Step 1: Find entry points (functions that call others but have few callers) + const entryPoints = findEntryPoints(knowledgeGraph, reverseCallsEdges, callsEdges); + + onProgress?.(`Found ${entryPoints.length} entry points, tracing flows...`, 20); + + // Step 2: Trace processes from each entry point + const allTraces: string[][] = []; + + for (let i = 0; i < entryPoints.length && allTraces.length < cfg.maxProcesses * 2; i++) { + const entryId = entryPoints[i]; + const traces = traceFromEntryPoint(entryId, callsEdges, cfg); + + // Filter out traces that are too short + traces.filter(t => t.length >= cfg.minSteps).forEach(t => allTraces.push(t)); + + if (i % 10 === 0) { + onProgress?.(`Tracing entry point ${i + 1}/${entryPoints.length}...`, 20 + (i / entryPoints.length) * 40); + } + } + + onProgress?.(`Found ${allTraces.length} traces, deduplicating...`, 60); + + // Step 3: Deduplicate similar traces + const uniqueTraces = deduplicateTraces(allTraces); + + // Step 4: Limit to max processes (prioritize longer traces) + const limitedTraces = uniqueTraces + .sort((a, b) => b.length - a.length) + .slice(0, 
cfg.maxProcesses); + + onProgress?.(`Creating ${limitedTraces.length} process nodes...`, 80); + + // Step 5: Create process nodes + const processes: ProcessNode[] = []; + const steps: ProcessStep[] = []; + + limitedTraces.forEach((trace, idx) => { + const entryPointId = trace[0]; + const terminalId = trace[trace.length - 1]; + + // Get communities touched + const communitiesSet = new Set<string>(); + trace.forEach(nodeId => { + const comm = membershipMap.get(nodeId); + if (comm) communitiesSet.add(comm); + }); + const communities = Array.from(communitiesSet); + + // Determine process type + const processType: 'intra_community' | 'cross_community' = + communities.length > 1 ? 'cross_community' : 'intra_community'; + + // Generate label + const entryNode = nodeMap.get(entryPointId); + const terminalNode = nodeMap.get(terminalId); + const entryName = entryNode?.properties.name || 'Unknown'; + const terminalName = terminalNode?.properties.name || 'Unknown'; + const heuristicLabel = `${capitalize(entryName)} → ${capitalize(terminalName)}`; + + const processId = `proc_${idx}_${sanitizeId(entryName)}`; + + processes.push({ + id: processId, + label: heuristicLabel, + heuristicLabel, + processType, + stepCount: trace.length, + communities, + entryPointId, + terminalId, + trace, + }); + + // Create step relationships + trace.forEach((nodeId, stepIdx) => { + steps.push({ + nodeId, + processId, + step: stepIdx + 1, // 1-indexed + }); + }); + }); + + onProgress?.('Process detection complete!', 100); + + // Calculate stats + const crossCommunityCount = processes.filter(p => p.processType === 'cross_community').length; + const avgStepCount = processes.length > 0 + ? 
processes.reduce((sum, p) => sum + p.stepCount, 0) / processes.length + : 0; + + return { + processes, + steps, + stats: { + totalProcesses: processes.length, + crossCommunityCount, + avgStepCount: Math.round(avgStepCount * 10) / 10, + entryPointsFound: entryPoints.length, + }, + }; +}; + +// ============================================================================ +// HELPER: Build CALLS adjacency list +// ============================================================================ + +type AdjacencyList = Map<string, string[]>; + +const buildCallsGraph = (graph: KnowledgeGraph): AdjacencyList => { + const adj = new Map<string, string[]>(); + + graph.relationships.forEach(rel => { + if (rel.type === 'CALLS') { + if (!adj.has(rel.sourceId)) { + adj.set(rel.sourceId, []); + } + adj.get(rel.sourceId)!.push(rel.targetId); + } + }); + + return adj; +}; + +const buildReverseCallsGraph = (graph: KnowledgeGraph): AdjacencyList => { + const adj = new Map<string, string[]>(); + + graph.relationships.forEach(rel => { + if (rel.type === 'CALLS') { + if (!adj.has(rel.targetId)) { + adj.set(rel.targetId, []); + } + adj.get(rel.targetId)!.push(rel.sourceId); + } + }); + + return adj; +}; + +/** + * Find functions/methods that are good entry points for tracing. + * + * Entry points are scored based on: + * 1. Call ratio (calls many, called by few) + * 2. Export status (exported/public functions rank higher) + * 3. Name patterns (handle*, on*, *Controller, etc.) + * + * Test files are excluded entirely. 
+ */ +const findEntryPoints = ( + graph: KnowledgeGraph, + reverseCallsEdges: AdjacencyList, + callsEdges: AdjacencyList +): string[] => { + const symbolTypes = new Set(['Function', 'Method']); + const entryPointCandidates: { + id: string; + score: number; + reasons: string[]; + }[] = []; + + graph.nodes.forEach(node => { + if (!symbolTypes.has(node.label)) return; + + const filePath = node.properties.filePath || ''; + + // Skip test files entirely + if (isTestFile(filePath)) return; + + const callers = reverseCallsEdges.get(node.id) || []; + const callees = callsEdges.get(node.id) || []; + + // Must have at least 1 outgoing call to trace forward + if (callees.length === 0) return; + + // Calculate entry point score using new scoring system + const { score, reasons } = calculateEntryPointScore( + node.properties.name, + node.properties.language || 'javascript', + node.properties.isExported ?? false, + callers.length, + callees.length, + filePath // Pass filePath for framework detection + ); + + if (score > 0) { + entryPointCandidates.push({ id: node.id, score, reasons }); + } + }); + + // Sort by score descending and return top candidates + const sorted = entryPointCandidates.sort((a, b) => b.score - a.score); + + // DEBUG: Log top candidates with new scoring details + if (sorted.length > 0 && typeof import.meta !== 'undefined' && import.meta.env?.DEV) { + console.log(`[Process] Top 10 entry point candidates (new scoring):`); + sorted.slice(0, 10).forEach((c, i) => { + const node = graph.nodes.find(n => n.id === c.id); + const exported = node?.properties.isExported ? '✓' : '✗'; + const shortPath = node?.properties.filePath?.split('/').slice(-2).join('/') || ''; + console.log(` ${i+1}. 
${node?.properties.name} [exported:${exported}] (${shortPath})`); + console.log(` score: ${c.score.toFixed(2)} = [${c.reasons.join(' × ')}]`); + }); + } + + return sorted + .slice(0, 200) // Limit to prevent explosion + .map(c => c.id); +}; + +// ============================================================================ +// HELPER: Trace from entry point (BFS) +// ============================================================================ + +/** + * Trace forward from an entry point using BFS. + * Returns all distinct paths up to maxDepth. + */ +const traceFromEntryPoint = ( + entryId: string, + callsEdges: AdjacencyList, + config: ProcessDetectionConfig +): string[][] => { + const traces: string[][] = []; + + // BFS with path tracking + // Each queue item: [currentNodeId, pathSoFar] + const queue: [string, string[]][] = [[entryId, [entryId]]]; + const visited = new Set(); + + while (queue.length > 0 && traces.length < config.maxBranching * 3) { + const [currentId, path] = queue.shift()!; + + // Get outgoing calls + const callees = callsEdges.get(currentId) || []; + + if (callees.length === 0) { + // Terminal node - this is a complete trace + if (path.length >= config.minSteps) { + traces.push([...path]); + } + } else if (path.length >= config.maxTraceDepth) { + // Max depth reached - save what we have + if (path.length >= config.minSteps) { + traces.push([...path]); + } + } else { + // Continue tracing - limit branching + const limitedCallees = callees.slice(0, config.maxBranching); + let addedBranch = false; + + for (const calleeId of limitedCallees) { + // Avoid cycles + if (!path.includes(calleeId)) { + queue.push([calleeId, [...path, calleeId]]); + addedBranch = true; + } + } + + // If all branches were cycles, save current path as terminal + if (!addedBranch && path.length >= config.minSteps) { + traces.push([...path]); + } + } + } + + return traces; +}; + +// ============================================================================ +// HELPER: 
Deduplicate traces +// ============================================================================ + +/** + * Merge traces that are subsets of other traces. + * Keep longer traces, remove redundant shorter ones. + */ +const deduplicateTraces = (traces: string[][]): string[][] => { + if (traces.length === 0) return []; + + // Sort by length descending + const sorted = [...traces].sort((a, b) => b.length - a.length); + const unique: string[][] = []; + + for (const trace of sorted) { + // Check if this trace is a subset of any already-added trace + const traceKey = trace.join('->'); + const isSubset = unique.some(existing => { + const existingKey = existing.join('->'); + return existingKey.includes(traceKey); + }); + + if (!isSubset) { + unique.push(trace); + } + } + + return unique; +}; + +// ============================================================================ +// HELPER: String utilities +// ============================================================================ + +const capitalize = (s: string): string => { + if (!s) return s; + return s.charAt(0).toUpperCase() + s.slice(1); +}; + +const sanitizeId = (s: string): string => { + return s.replace(/[^a-zA-Z0-9]/g, '_').substring(0, 20).toLowerCase(); +}; diff --git a/gitnexus/src/core/ingestion/tree-sitter-queries.ts b/gitnexus/src/core/ingestion/tree-sitter-queries.ts index 464ed1f868..a931b4a400 100644 --- a/gitnexus/src/core/ingestion/tree-sitter-queries.ts +++ b/gitnexus/src/core/ingestion/tree-sitter-queries.ts @@ -160,8 +160,8 @@ export const JAVA_QUERIES = ` (method_declaration name: (identifier) @name) @definition.method (constructor_declaration name: (identifier) @name) @definition.constructor -; Imports -(import_declaration (scoped_identifier) @import.source) @import +; Imports - capture any import declaration child as source +(import_declaration (_) @import.source) @import ; Calls (method_invocation name: (identifier) @call.name) @call diff --git a/gitnexus/src/core/kuzu/csv-generator.ts 
b/gitnexus/src/core/kuzu/csv-generator.ts index 0b16898936..43df569cf5 100644 --- a/gitnexus/src/core/kuzu/csv-generator.ts +++ b/gitnexus/src/core/kuzu/csv-generator.ts @@ -226,9 +226,35 @@ const generateCommunityCSV = (nodes: GraphNode[]): string => { return rows.join('\n'); }; -// ============================================================================ -// RELATIONSHIP CSV GENERATOR (Single Table) -// ============================================================================ +/** + * Generate CSV for Process nodes + * Headers: id,label,heuristicLabel,processType,stepCount,communities,entryPointId,terminalId + */ +const generateProcessCSV = (nodes: GraphNode[]): string => { + const headers = ['id', 'label', 'heuristicLabel', 'processType', 'stepCount', 'communities', 'entryPointId', 'terminalId']; + const rows: string[] = [headers.join(',')]; + + for (const node of nodes) { + if (node.label !== 'Process') continue; + + // Handle communities array (string[]) + const communities = (node.properties as any).communities || []; + const communitiesStr = `[${communities.map((c: string) => `'${c.replace(/'/g, "''")}'`).join(',')}]`; + + rows.push([ + escapeCSVField(node.id), + escapeCSVField(node.properties.name || ''), // label stores name + escapeCSVField((node.properties as any).heuristicLabel || ''), + escapeCSVField((node.properties as any).processType || ''), + escapeCSVNumber((node.properties as any).stepCount, 0), + escapeCSVField(communitiesStr), // Needs CSV escaping because it contains commas! 
+ escapeCSVField((node.properties as any).entryPointId || ''), + escapeCSVField((node.properties as any).terminalId || ''), + ].join(',')); + } + + return rows.join('\n'); +}; /** * Generate CSV for the single CodeRelation table @@ -238,7 +264,7 @@ const generateCommunityCSV = (nodes: GraphNode[]): string => { * reason: 'import-resolved' | 'same-file' | 'fuzzy-global' (or empty for non-CALLS) */ const generateRelationCSV = (graph: KnowledgeGraph): string => { - const headers = ['from', 'to', 'type', 'confidence', 'reason']; + const headers = ['from', 'to', 'type', 'confidence', 'reason', 'step']; const rows: string[] = [headers.join(',')]; for (const rel of graph.relationships) { @@ -248,6 +274,7 @@ const generateRelationCSV = (graph: KnowledgeGraph): string => { escapeCSVField(rel.type), escapeCSVNumber(rel.confidence, 1.0), escapeCSVField(rel.reason), + escapeCSVNumber((rel as any).step, 0), ].join(',')); } @@ -278,6 +305,7 @@ export const generateAllCSVs = ( nodeCSVs.set('Method', generateCodeElementCSV(nodes, 'Method', fileContents)); nodeCSVs.set('CodeElement', generateCodeElementCSV(nodes, 'CodeElement', fileContents)); nodeCSVs.set('Community', generateCommunityCSV(nodes)); + nodeCSVs.set('Process', generateProcessCSV(nodes)); // Generate single relation CSV const relCSV = generateRelationCSV(graph); diff --git a/gitnexus/src/core/kuzu/kuzu-adapter.ts b/gitnexus/src/core/kuzu/kuzu-adapter.ts index 73aa40b254..c16e2edf3f 100644 --- a/gitnexus/src/core/kuzu/kuzu-adapter.ts +++ b/gitnexus/src/core/kuzu/kuzu-adapter.ts @@ -120,12 +120,15 @@ export const loadGraphToKuzu = async ( for (const line of relLines) { try { // Parse CSV - handle quoted fields and numeric confidence - // Format: "from","to","type",confidence,"reason" - const match = line.match(/"([^"]*)","([^"]*)","([^"]*)",([0-9.]+),"([^"]*)"/); + // Parse CSV - handle quoted fields and numeric confidence + // Format: "from","to","type",confidence,"reason",step + // Note: step is unquoted numeric + const 
match = line.match(/"([^"]*)","([^"]*)","([^"]*)",([0-9.]+),"([^"]*)",([0-9-]+)/); if (!match) continue; - const [, fromId, toId, relType, confidenceStr, reason] = match; + const [, fromId, toId, relType, confidenceStr, reason, stepStr] = match; const confidence = parseFloat(confidenceStr) || 1.0; + const step = parseInt(stepStr) || 0; // Extract labels from node IDs // Community nodes have IDs like "comm_14" (no colon) @@ -134,6 +137,9 @@ export const loadGraphToKuzu = async ( if (nodeId.startsWith('comm_')) { return 'Community'; } + if (nodeId.startsWith('proc_')) { + return 'Process'; + } return nodeId.split(':')[0]; }; @@ -150,7 +156,7 @@ export const loadGraphToKuzu = async ( const insertQuery = ` MATCH (a:${fromLabel} {id: '${fromId.replace(/'/g, "''")}'}), (b:${toLabel} {id: '${toId.replace(/'/g, "''")}'}) - CREATE (a)-[:${REL_TABLE_NAME} {type: '${relType}', confidence: ${confidence}, reason: '${reason.replace(/'/g, "''")}'}]->(b) + CREATE (a)-[:${REL_TABLE_NAME} {type: '${relType}', confidence: ${confidence}, reason: '${reason.replace(/'/g, "''")}', step: ${step}}]->(b) `; await conn.query(insertQuery); insertedRels++; @@ -162,6 +168,7 @@ export const loadGraphToKuzu = async ( const [, fromId, toId, relType] = match; const getNodeLabel = (nodeId: string): string => { if (nodeId.startsWith('comm_')) return 'Community'; + if (nodeId.startsWith('proc_')) return 'Process'; return nodeId.split(':')[0]; }; const fromLabel = getNodeLabel(fromId); @@ -229,6 +236,9 @@ const getCopyQuery = (table: NodeTableName, path: string): string => { if (table === 'Community') { return `COPY Community(id, label, heuristicLabel, keywords, description, enrichedBy, cohesion, symbolCount) FROM "${path}" (HEADER=true, PARALLEL=false)`; } + if (table === 'Process') { + return `COPY Process(id, label, heuristicLabel, processType, stepCount, communities, entryPointId, terminalId) FROM "${path}" (HEADER=true, PARALLEL=false)`; + } // All code element tables: Function, Class, Interface, 
Method, CodeElement return `COPY ${table}(id, name, filePath, startLine, endLine, content) FROM "${path}" (HEADER=true, PARALLEL=false)`; }; diff --git a/gitnexus/src/core/kuzu/schema.ts b/gitnexus/src/core/kuzu/schema.ts index 437383c2d9..6c20b4bd56 100644 --- a/gitnexus/src/core/kuzu/schema.ts +++ b/gitnexus/src/core/kuzu/schema.ts @@ -13,7 +13,7 @@ // NODE TABLE NAMES // ============================================================================ export const NODE_TABLES = [ - 'File', 'Folder', 'Function', 'Class', 'Interface', 'Method', 'CodeElement', 'Community', + 'File', 'Folder', 'Function', 'Class', 'Interface', 'Method', 'CodeElement', 'Community', 'Process', // Multi-language support 'Struct', 'Enum', 'Macro', 'Typedef', 'Union', 'Namespace', 'Trait', 'Impl', 'TypeAlias', 'Const', 'Static', 'Property', 'Record', 'Delegate', 'Annotation', 'Constructor', 'Template', 'Module' @@ -26,7 +26,7 @@ export type NodeTableName = typeof NODE_TABLES[number]; export const REL_TABLE_NAME = 'CodeRelation'; // Valid relation types -export const REL_TYPES = ['CONTAINS', 'DEFINES', 'IMPORTS', 'CALLS', 'EXTENDS', 'IMPLEMENTS', 'MEMBER_OF'] as const; +export const REL_TYPES = ['CONTAINS', 'DEFINES', 'IMPORTS', 'CALLS', 'EXTENDS', 'IMPLEMENTS', 'MEMBER_OF', 'STEP_IN_PROCESS'] as const; export type RelType = typeof REL_TYPES[number]; // ============================================================================ @@ -127,6 +127,23 @@ CREATE NODE TABLE Community ( PRIMARY KEY (id) )`; +// ============================================================================ +// PROCESS NODE TABLE (for execution flow detection) +// ============================================================================ + +export const PROCESS_SCHEMA = ` +CREATE NODE TABLE Process ( + id STRING, + label STRING, + heuristicLabel STRING, + processType STRING, + stepCount INT32, + communities STRING[], + entryPointId STRING, + terminalId STRING, + PRIMARY KEY (id) +)`; + // 
============================================================================ // MULTI-LANGUAGE NODE TABLE SCHEMAS // ============================================================================ @@ -206,31 +223,64 @@ CREATE REL TABLE ${REL_TABLE_NAME} ( FROM Function TO \`Enum\`, FROM Function TO Namespace, FROM Function TO TypeAlias, + FROM Function TO \`Module\`, + FROM Function TO Impl, + FROM Function TO Interface, + FROM Function TO Constructor, FROM Class TO Method, FROM Class TO Function, FROM Class TO Class, FROM Class TO Interface, FROM Class TO Community, FROM Class TO Template, + FROM Class TO TypeAlias, + FROM Class TO \`Struct\`, + FROM Class TO \`Enum\`, + FROM Class TO Constructor, FROM Method TO Function, FROM Method TO Method, FROM Method TO Class, FROM Method TO Community, FROM Method TO Template, FROM Method TO \`Struct\`, + FROM Method TO TypeAlias, + FROM Method TO \`Enum\`, + FROM Method TO \`Macro\`, + FROM Method TO Namespace, + FROM Method TO \`Module\`, + FROM Method TO Impl, + FROM Method TO Interface, + FROM Method TO Constructor, FROM Template TO Template, FROM Template TO Function, FROM Template TO Method, FROM Template TO Class, FROM Template TO \`Struct\`, + FROM Template TO TypeAlias, + FROM Template TO \`Enum\`, + FROM Template TO \`Macro\`, + FROM Template TO Interface, + FROM Template TO Constructor, + FROM \`Module\` TO \`Module\`, FROM CodeElement TO Community, FROM Interface TO Community, + FROM Interface TO Function, + FROM Interface TO Method, + FROM Interface TO Class, + FROM Interface TO Interface, + FROM Interface TO TypeAlias, + FROM Interface TO \`Struct\`, + FROM Interface TO Constructor, FROM \`Struct\` TO Community, FROM \`Struct\` TO Trait, FROM \`Struct\` TO Function, FROM \`Struct\` TO Method, FROM \`Enum\` TO Community, FROM \`Macro\` TO Community, + FROM \`Macro\` TO Function, + FROM \`Macro\` TO Method, + FROM \`Module\` TO Function, + FROM \`Module\` TO Method, FROM Typedef TO Community, FROM \`Union\` TO 
Community, FROM Namespace TO Community, @@ -247,11 +297,45 @@ CREATE REL TABLE ${REL_TABLE_NAME} ( FROM Constructor TO Community, FROM Constructor TO Interface, FROM Constructor TO Class, + FROM Constructor TO Method, + FROM Constructor TO Function, + FROM Constructor TO Constructor, + FROM Constructor TO \`Struct\`, + FROM Constructor TO \`Macro\`, + FROM Constructor TO Template, + FROM Constructor TO TypeAlias, + FROM Constructor TO \`Enum\`, + FROM Constructor TO Impl, + FROM Constructor TO Namespace, FROM Template TO Community, FROM \`Module\` TO Community, + FROM Function TO Process, + FROM Method TO Process, + FROM Class TO Process, + FROM Interface TO Process, + FROM \`Struct\` TO Process, + FROM Constructor TO Process, + FROM \`Module\` TO Process, + FROM \`Macro\` TO Process, + FROM Impl TO Process, + FROM Typedef TO Process, + FROM TypeAlias TO Process, + FROM \`Enum\` TO Process, + FROM \`Union\` TO Process, + FROM Namespace TO Process, + FROM Trait TO Process, + FROM \`Const\` TO Process, + FROM Static TO Process, + FROM Property TO Process, + FROM Record TO Process, + FROM Delegate TO Process, + FROM Annotation TO Process, + FROM Template TO Process, + FROM CodeElement TO Process, type STRING, confidence DOUBLE, - reason STRING + reason STRING, + step INT32 )`; // ============================================================================ @@ -288,6 +372,7 @@ export const NODE_SCHEMA_QUERIES = [ METHOD_SCHEMA, CODE_ELEMENT_SCHEMA, COMMUNITY_SCHEMA, + PROCESS_SCHEMA, // Multi-language support STRUCT_SCHEMA, ENUM_SCHEMA, diff --git a/gitnexus/src/core/llm/agent.ts b/gitnexus/src/core/llm/agent.ts index 3584655c48..8ec2cc9d03 100644 --- a/gitnexus/src/core/llm/agent.ts +++ b/gitnexus/src/core/llm/agent.ts @@ -67,26 +67,37 @@ You are an investigator. For each question: 3. **Trace** β†’ Use cypher to follow connections in the graph 4. **Cite** β†’ Ground every finding with [[file:line]] or [[Type:Name]] 5. 
**Validate** β†’ Use cypher to validate the results and confirm completeness of context before final output. ( MUST DO ) -6. **Highlight** β†’ Visualize key nodes with highlight ## πŸ› οΈ TOOLS -- **\`search\`** β€” Hybrid search (keyword + semantic). Returns code matches with graph connections. +- **\`search\`** β€” Hybrid search. Results grouped by process with cluster context. - **\`cypher\`** β€” Cypher queries against the graph. Use \`{{QUERY_VECTOR}}\` for vector search. - **\`grep\`** β€” Regex search. Best for exact strings, TODOs, error codes. - **\`read\`** β€” Read file content. Always use after search/grep to see full code. -- **\`highlight\`** β€” Highlight nodes in the visual graph. -- **\`blastRadius\`** β€” Impact analysis. Output is graph-verified (trusted). Run optional grep for dynamic patterns if thoroughness needed. +- **\`explore\`** β€” Deep dive on a symbol, cluster, or process. Shows membership, participation, connections. +- **\`overview\`** β€” Codebase map showing all clusters and processes. +- **\`impact\`** β€” Impact analysis. Shows affected processes, clusters, and risk level. ## πŸ“Š GRAPH SCHEMA -Nodes: File, Folder, Function, Class, Interface, Method, CodeElement -Relation: \`CodeRelation\` with \`type\` property: CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS +Nodes: File, Folder, Function, Class, Interface, Method, Community, Process +Relations: \`CodeRelation\` with \`type\` property: CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, MEMBER_OF, STEP_IN_PROCESS + +## πŸ“ GRAPH SEMANTICS (Important!) +**Edge Types:** +- \`CALLS\`: Method invocation OR constructor injection. If A receives B as parameter and uses it, Aβ†’B is CALLS. This is intentional simplification. +- \`IMPORTS\`: File-level import/include statement. +- \`EXTENDS/IMPLEMENTS\`: Class inheritance. 
+ +**Process Nodes:** +- Process labels use format: "EntryPoint β†’ Terminal" (e.g., "onCreate β†’ showToast") +- These are heuristic names from tracing execution flow, NOT application-defined names +- Entry points are detected via export status, naming patterns, and framework conventions Cypher examples: - \`MATCH (f:Function) RETURN f.name LIMIT 10\` - \`MATCH (f:File)-[:CodeRelation {type: 'IMPORTS'}]->(g:File) RETURN f.name, g.name\` ## πŸ“CRITICAL RULES -- **blastRadius output is trusted.** Do NOT re-validate with cypher. Optionally run the suggested grep commands for dynamic patterns. +- **impact output is trusted.** Do NOT re-validate with cypher. Optionally run the suggested grep commands for dynamic patterns. - **Cite or retract.** Never state something you can't ground. - **Read before concluding.** Don't guess from names alone. - **Retry on failure.** If a tool fails, fix the input and try again. @@ -99,7 +110,7 @@ Think like a senior architect. Be conciseβ€”no fluff, short, precise and to the - Use tables for comparisons/rankings - Use mermaid diagrams for flows/dependencies - Surface deep insights: patterns, coupling, design decisions -- End with **TL;DR** (1-2 sentences) +- End with **TL;DR** (short summary of the response, summing up the response and the most critical parts) ## MERMAID RULES When generating diagrams: diff --git a/gitnexus/src/core/llm/tools.ts b/gitnexus/src/core/llm/tools.ts index b86357ebe4..879e451fcb 100644 --- a/gitnexus/src/core/llm/tools.ts +++ b/gitnexus/src/core/llm/tools.ts @@ -1,13 +1,14 @@ /** * Graph RAG Tools for LangChain Agent * - * Consolidated tools (6 total): - * - search: Hybrid search (BM25 + semantic + RRF) with 1-hop expansion + * Consolidated tools (7 total): + * - search: Hybrid search (BM25 + semantic + RRF), grouped by process/cluster * - cypher: Execute Cypher queries (auto-embeds {{QUERY_VECTOR}} if present) * - grep: Regex pattern search across files * - read: Read file content by path - * - highlight: 
Highlight nodes in graph UI - * - blastRadius: Impact analysis (what depends on / is affected by changes) + * - overview: Codebase map (clusters + processes) + * - explore: Deep dive on a symbol, cluster, or process + * - impact: Impact analysis (what depends on / is affected by changes) */ import { tool } from '@langchain/core/tools'; @@ -36,8 +37,9 @@ export const createGraphRAGTools = ( * Unified search tool: BM25 + Semantic + RRF, with 1-hop graph context */ const searchTool = tool( - async ({ query, limit }: { query: string; limit?: number }) => { + async ({ query, limit, groupByProcess }: { query: string; limit?: number; groupByProcess?: boolean }) => { const k = limit ?? 10; + const shouldGroup = groupByProcess ?? true; // Step 1: Hybrid search (BM25 + semantic with RRF) let searchResults: any[] = []; @@ -62,12 +64,26 @@ export const createGraphRAGTools = ( return `No code found matching "${query}". Try different terms or use grep for exact patterns.`; } - // Step 2: Get 1-hop connections for each result - const resultsWithContext: string[] = []; + type ProcessInfo = { id: string; label: string; step?: number; stepCount?: number }; + type ResultInfo = { + idx: number; + nodeId: string; + name: string; + label: string; + filePath: string; + location: string; + sources: string; + score: string; + connections: string; + clusterLabel: string; + processes: ProcessInfo[]; + }; + + const results: ResultInfo[] = []; for (let i = 0; i < Math.min(searchResults.length, k); i++) { const r = searchResults[i]; - const nodeId = r.nodeId || r.id; + const nodeId = r.nodeId || r.id || ''; const name = r.name || r.filePath?.split('/').pop() || 'Unknown'; const label = r.label || 'File'; const filePath = r.filePath || ''; @@ -91,7 +107,6 @@ export const createGraphRAGTools = ( `; const connRes = await executeQuery(connectionsQuery); if (connRes.length > 0) { - // Result is nested array: [[outgoing], [incoming]] or {outgoing: [], incoming: []} const row = connRes[0]; const 
rawOutgoing = Array.isArray(row) ? row[0] : (row.outgoing || []); const rawIncoming = Array.isArray(row) ? row[1] : (row.incoming || []); @@ -116,18 +131,130 @@ export const createGraphRAGTools = ( } } - resultsWithContext.push( - `[${i + 1}] ${label}: ${name}${score}\n ID: ${nodeId}\n File: ${filePath}${location}\n Found by: ${sources}${connections}` - ); + // Cluster membership + let clusterLabel = 'Unclustered'; + if (nodeId) { + try { + const nodeLabel = nodeId.split(':')[0]; + const clusterQuery = ` + MATCH (n:${nodeLabel} {id: '${nodeId.replace(/'/g, "''")}'}) + MATCH (n)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) + RETURN c.label AS label + LIMIT 1 + `; + const clusterRes = await executeQuery(clusterQuery); + if (clusterRes.length > 0) { + const row = clusterRes[0]; + const labelValue = Array.isArray(row) ? row[0] : row.label; + if (labelValue) clusterLabel = labelValue; + } + } catch { + // Skip cluster lookup if query fails + } + } + + // Process participation + const processes: ProcessInfo[] = []; + if (nodeId) { + try { + const nodeLabel = nodeId.split(':')[0]; + const processQuery = ` + MATCH (n:${nodeLabel} {id: '${nodeId.replace(/'/g, "''")}'}) + MATCH (n)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + RETURN p.id AS id, p.label AS label, r.step AS step, p.stepCount AS stepCount + ORDER BY r.step + `; + const procRes = await executeQuery(processQuery); + for (const row of procRes) { + const id = Array.isArray(row) ? row[0] : row.id; + const labelValue = Array.isArray(row) ? row[1] : row.label; + const step = Array.isArray(row) ? row[2] : row.step; + const stepCount = Array.isArray(row) ? 
row[3] : row.stepCount; + if (id && labelValue) { + processes.push({ id, label: labelValue, step, stepCount }); + } + } + } catch { + // Skip process lookup if query fails + } + } + + results.push({ + idx: i + 1, + nodeId, + name, + label, + filePath, + location, + sources, + score, + connections, + clusterLabel, + processes, + }); } - return `Found ${searchResults.length} matches:\n\n${resultsWithContext.join('\n\n')}`; + const formatResult = (r: ResultInfo, stepInfo?: ProcessInfo) => { + const stepLabel = stepInfo?.step ? ` (step ${stepInfo.step}/${stepInfo.stepCount ?? '?'})` : ''; + return `[${r.idx}] ${r.label}: ${r.name}${r.score}${stepLabel}\n ID: ${r.nodeId}\n File: ${r.filePath}${r.location}\n Cluster: ${r.clusterLabel}\n Found by: ${r.sources}${r.connections}`; + }; + + if (!shouldGroup) { + return `Found ${searchResults.length} matches:\n\n${results.map(r => formatResult(r)).join('\n\n')}`; + } + + // Group by process (or "No process") + const processMap = new Map(); + const noProcessKey = '__no_process__'; + + for (const r of results) { + if (r.processes.length === 0) { + if (!processMap.has(noProcessKey)) { + processMap.set(noProcessKey, { label: 'No process', entries: [] }); + } + processMap.get(noProcessKey)!.entries.push({ result: r }); + continue; + } + + for (const p of r.processes) { + if (!processMap.has(p.id)) { + processMap.set(p.id, { label: p.label, stepCount: p.stepCount, entries: [] }); + } + processMap.get(p.id)!.entries.push({ result: r, step: p.step, stepCount: p.stepCount }); + } + } + + const sortedProcesses = Array.from(processMap.entries()).sort((a, b) => { + const aCount = a[1].entries.length; + const bCount = b[1].entries.length; + return bCount - aCount; + }); + + const lines: string[] = []; + lines.push(`Found ${searchResults.length} matches grouped by process:`); + lines.push(''); + + for (const [pid, group] of sortedProcesses) { + const stepInfo = group.stepCount ? 
`, ${group.stepCount} steps` : ''; + const header = pid === noProcessKey + ? `NO PROCESS (${group.entries.length} matches)` + : `PROCESS: ${group.label} (${group.entries.length} matches${stepInfo})`; + lines.push(header); + group.entries.forEach(entry => { + const stepLabel = entry.step ? { id: pid, label: group.label, step: entry.step, stepCount: entry.stepCount } : undefined; + lines.push(formatResult(entry.result, stepLabel)); + }); + lines.push(''); + } + + return lines.join('\n').trim(); }, { name: 'search', - description: 'Search for code by keywords or concepts. Combines keyword matching and semantic understanding. Returns relevant code with their graph connections (what calls them, what they import, etc.).', + description: 'Search for code by keywords or concepts. Combines keyword matching and semantic understanding. Groups results by process with cluster context.', schema: z.object({ query: z.string().describe('What you are looking for (e.g., "authentication middleware", "database connection")'), + groupByProcess: z.boolean().optional().nullable().describe('Group results by process (default: true)'), limit: z.number().optional().nullable().describe('Max results to return (default: 10)'), }), } @@ -389,35 +516,364 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, ); // ============================================================================ - // TOOL 5: HIGHLIGHT (Highlight nodes in graph UI) + // TOOL 5: OVERVIEW (Codebase map) // ============================================================================ - const highlightTool = tool( - async ({ nodeIds, description }: { nodeIds: string[]; description?: string }) => { - if (!nodeIds || nodeIds.length === 0) { - return 'No node IDs provided.'; + const overviewTool = tool( + async () => { + try { + const clustersQuery = ` + MATCH (c:Community) + RETURN c.id AS id, c.label AS label, c.cohesion AS cohesion, c.symbolCount AS symbolCount, c.description AS description + ORDER BY c.symbolCount DESC + LIMIT 
200 + `; + const processesQuery = ` + MATCH (p:Process) + RETURN p.id AS id, p.label AS label, p.processType AS type, p.stepCount AS stepCount, p.communities AS communities + ORDER BY p.stepCount DESC + LIMIT 200 + `; + const depsQuery = ` + MATCH (a)-[:CodeRelation {type: 'CALLS'}]->(b) + MATCH (a)-[:CodeRelation {type: 'MEMBER_OF'}]->(c1:Community) + MATCH (b)-[:CodeRelation {type: 'MEMBER_OF'}]->(c2:Community) + WHERE c1.id <> c2.id + RETURN c1.label AS \`from\`, c2.label AS \`to\`, COUNT(*) AS calls + ORDER BY calls DESC + LIMIT 15 + `; + const criticalQuery = ` + MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + RETURN p.label AS label, COUNT(r) AS steps + ORDER BY steps DESC + LIMIT 10 + `; + + const [clusters, processes, deps, critical] = await Promise.all([ + executeQuery(clustersQuery), + executeQuery(processesQuery), + executeQuery(depsQuery), + executeQuery(criticalQuery), + ]); + + const clusterLines = clusters.map((row: any) => { + const label = Array.isArray(row) ? row[1] : row.label; + const symbols = Array.isArray(row) ? row[3] : row.symbolCount; + const cohesion = Array.isArray(row) ? row[2] : row.cohesion; + const desc = Array.isArray(row) ? row[4] : row.description; + const cohesionText = cohesion !== null && cohesion !== undefined ? Number(cohesion).toFixed(2) : ''; + return `| ${label || ''} | ${symbols ?? ''} | ${cohesionText} | ${desc ?? ''} |`; + }); + + const processLines = processes.map((row: any) => { + const label = Array.isArray(row) ? row[1] : row.label; + const steps = Array.isArray(row) ? row[3] : row.stepCount; + const type = Array.isArray(row) ? row[2] : row.type; + const communities = Array.isArray(row) ? row[4] : row.communities; + const clusterText = Array.isArray(communities) ? communities.length : (communities ? 1 : 0); + return `| ${label || ''} | ${steps ?? ''} | ${type ?? ''} | ${clusterText} |`; + }); + + const depLines = deps.map((row: any) => { + const from = Array.isArray(row) ? 
row[0] : row.from; + const to = Array.isArray(row) ? row[1] : row.to; + const calls = Array.isArray(row) ? row[2] : row.calls; + return `- ${from} -> ${to} (${calls} calls)`; + }); + + const criticalLines = critical.map((row: any) => { + const label = Array.isArray(row) ? row[0] : row.label; + const steps = Array.isArray(row) ? row[1] : row.steps; + return `- ${label} (${steps} steps)`; + }); + + return [ + `CLUSTERS (${clusters.length} total):`, + `| Cluster | Symbols | Cohesion | Description |`, + `| --- | --- | --- | --- |`, + ...clusterLines, + ``, + `PROCESSES (${processes.length} total):`, + `| Process | Steps | Type | Clusters |`, + `| --- | --- | --- | --- |`, + ...processLines, + ``, + `CLUSTER DEPENDENCIES:`, + ...(depLines.length > 0 ? depLines : ['- None found']), + ``, + `CRITICAL PATHS:`, + ...(criticalLines.length > 0 ? criticalLines : ['- None found']), + ].join('\n'); + } catch (error) { + return `Overview error: ${error instanceof Error ? error.message : String(error)}`; + } + }, + { + name: 'overview', + description: 'Codebase map showing all clusters and processes, plus cross-cluster dependencies.', + schema: z.object({}), + } + ); + + // ============================================================================ + // TOOL 6: EXPLORE (Deep dive on symbol, cluster, or process) + // ============================================================================ + + const exploreTool = tool( + async ({ target, type }: { target: string; type?: 'symbol' | 'cluster' | 'process' | null }) => { + const safeTarget = target.replace(/'/g, "''"); + let resolvedType = type ?? null; + let processRow: any | null = null; + let communityRow: any | null = null; + let symbolRow: any | null = null; + + const getRowValue = (row: any, idx: number, key: string) => Array.isArray(row) ? 
row[idx] : row[key]; + + if (!resolvedType || resolvedType === 'process') { + const processQuery = ` + MATCH (p:Process) + WHERE p.id = '${safeTarget}' OR p.label = '${safeTarget}' + RETURN p.id AS id, p.label AS label, p.processType AS type, p.stepCount AS stepCount + LIMIT 1 + `; + const processRes = await executeQuery(processQuery); + if (processRes.length > 0) { + processRow = processRes[0]; + resolvedType = 'process'; + } + } + + if (!resolvedType || resolvedType === 'cluster') { + const communityQuery = ` + MATCH (c:Community) + WHERE c.id = '${safeTarget}' OR c.label = '${safeTarget}' OR c.heuristicLabel = '${safeTarget}' + RETURN c.id AS id, c.label AS label, c.cohesion AS cohesion, c.symbolCount AS symbolCount, c.description AS description + LIMIT 1 + `; + const communityRes = await executeQuery(communityQuery); + if (communityRes.length > 0) { + communityRow = communityRes[0]; + resolvedType = 'cluster'; + } } - const marker = `[HIGHLIGHT_NODES:${nodeIds.join(',')}]`; - const desc = description || `Highlighting ${nodeIds.length} node(s)`; + if (!resolvedType || resolvedType === 'symbol') { + const symbolQuery = ` + MATCH (n) + WHERE n.name = '${safeTarget}' OR n.id = '${safeTarget}' OR n.filePath = '${safeTarget}' + RETURN n.id AS id, n.name AS name, n.filePath AS filePath, label(n) AS nodeType + LIMIT 5 + `; + const symbolRes = await executeQuery(symbolQuery); + if (symbolRes.length > 0) { + symbolRow = symbolRes[0]; + resolvedType = 'symbol'; + } + } - return `${desc}\n\n${marker}\n\nNodes highlighted in the graph.`; + if (!resolvedType) { + return `Could not find "${target}" as a symbol, cluster, or process. 
Try search first.`; + } + + if (resolvedType === 'process') { + const pid = getRowValue(processRow, 0, 'id'); + const label = getRowValue(processRow, 1, 'label'); + const ptype = getRowValue(processRow, 2, 'type'); + const stepCount = getRowValue(processRow, 3, 'stepCount'); + + const stepsQuery = ` + MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process {id: '${pid.replace(/'/g, "''")}'}) + RETURN s.name AS name, s.filePath AS filePath, r.step AS step + ORDER BY r.step + `; + const clustersQuery = ` + MATCH (c:Community)<-[:CodeRelation {type: 'MEMBER_OF'}]-(s) + MATCH (s)-[:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process {id: '${pid.replace(/'/g, "''")}'}) + RETURN DISTINCT c.id AS id, c.label AS label, c.description AS description + ORDER BY c.label + LIMIT 20 + `; + + const [steps, clusters] = await Promise.all([ + executeQuery(stepsQuery), + executeQuery(clustersQuery), + ]); + + const stepLines = steps.map((row: any) => { + const name = getRowValue(row, 0, 'name'); + const filePath = getRowValue(row, 1, 'filePath'); + const step = getRowValue(row, 2, 'step'); + return `- ${step}. ${name} (${filePath || 'n/a'})`; + }); + + const clusterLines = clusters.map((row: any) => { + const clabel = getRowValue(row, 1, 'label'); + const desc = getRowValue(row, 2, 'description'); + return `- ${clabel}${desc ? ` — ${desc}` : ''}`; + }); + + return [ + `PROCESS: ${label}`, + `Type: ${ptype || 'n/a'}`, + `Steps: ${stepCount ?? steps.length}`, + ``, + `STEPS:`, + ...(stepLines.length > 0 ? stepLines : ['- None found']), + ``, + `CLUSTERS TOUCHED:`, + ...(clusterLines.length > 0 ? 
clusterLines : ['- None found']), + ].join('\n'); + } + + if (resolvedType === 'cluster') { + const cid = getRowValue(communityRow, 0, 'id'); + const label = getRowValue(communityRow, 1, 'label'); + const cohesion = getRowValue(communityRow, 2, 'cohesion'); + const symbolCount = getRowValue(communityRow, 3, 'symbolCount'); + const description = getRowValue(communityRow, 4, 'description'); + + const membersQuery = ` + MATCH (c:Community {id: '${cid.replace(/'/g, "''")}'})<-[:CodeRelation {type: 'MEMBER_OF'}]-(m) + RETURN m.name AS name, m.filePath AS filePath, label(m) AS nodeType + LIMIT 50 + `; + const processesQuery = ` + MATCH (c:Community {id: '${cid.replace(/'/g, "''")}'})<-[:CodeRelation {type: 'MEMBER_OF'}]-(s) + MATCH (s)-[:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + RETURN DISTINCT p.id AS id, p.label AS label, p.stepCount AS stepCount + ORDER BY p.stepCount DESC + LIMIT 20 + `; + + const [members, processes] = await Promise.all([ + executeQuery(membersQuery), + executeQuery(processesQuery), + ]); + + const memberLines = members.map((row: any) => { + const name = getRowValue(row, 0, 'name'); + const filePath = getRowValue(row, 1, 'filePath'); + const nodeType = getRowValue(row, 2, 'nodeType'); + return `- ${nodeType}: ${name} (${filePath || 'n/a'})`; + }); + + const processLines = processes.map((row: any) => { + const plabel = getRowValue(row, 1, 'label'); + const steps = getRowValue(row, 2, 'stepCount'); + return `- ${plabel} (${steps} steps)`; + }); + + return [ + `CLUSTER: ${label}`, + `Symbols: ${symbolCount ?? members.length}`, + `Cohesion: ${cohesion !== null && cohesion !== undefined ? Number(cohesion).toFixed(2) : 'n/a'}`, + `Description: ${description || 'n/a'}`, + ``, + `TOP MEMBERS:`, + ...(memberLines.length > 0 ? memberLines : ['- None found']), + ``, + `PROCESSES TOUCHING THIS CLUSTER:`, + ...(processLines.length > 0 ? 
processLines : ['- None found']), + ].join('\n'); + } + + if (resolvedType === 'symbol') { + const nodeId = getRowValue(symbolRow, 0, 'id'); + const name = getRowValue(symbolRow, 1, 'name'); + const filePath = getRowValue(symbolRow, 2, 'filePath'); + const nodeType = getRowValue(symbolRow, 3, 'nodeType'); + + const clusterQuery = ` + MATCH (n:${nodeType} {id: '${String(nodeId).replace(/'/g, "''")}'}) + MATCH (n)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) + RETURN c.label AS label, c.description AS description + LIMIT 1 + `; + const processQuery = ` + MATCH (n:${nodeType} {id: '${String(nodeId).replace(/'/g, "''")}'}) + MATCH (n)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + RETURN p.label AS label, r.step AS step, p.stepCount AS stepCount + ORDER BY r.step + `; + const connectionsQuery = ` + MATCH (n:${nodeType} {id: '${String(nodeId).replace(/'/g, "''")}'}) + OPTIONAL MATCH (n)-[r1:CodeRelation]->(dst) + OPTIONAL MATCH (src)-[r2:CodeRelation]->(n) + RETURN + collect(DISTINCT {name: dst.name, type: r1.type, confidence: r1.confidence}) AS outgoing, + collect(DISTINCT {name: src.name, type: r2.type, confidence: r2.confidence}) AS incoming + LIMIT 1 + `; + + const [clusterRes, processRes, connRes] = await Promise.all([ + executeQuery(clusterQuery), + executeQuery(processQuery), + executeQuery(connectionsQuery), + ]); + + const clusterLabel = clusterRes.length > 0 ? getRowValue(clusterRes[0], 0, 'label') : 'Unclustered'; + const clusterDesc = clusterRes.length > 0 ? getRowValue(clusterRes[0], 1, 'description') : ''; + + const processLines = processRes.map((row: any) => { + const plabel = getRowValue(row, 0, 'label'); + const step = getRowValue(row, 1, 'step'); + const stepCount = getRowValue(row, 2, 'stepCount'); + return `- ${plabel} (step ${step}/${stepCount ?? '?'})`; + }); + + let connections = 'None'; + if (connRes.length > 0) { + const row = connRes[0]; + const rawOutgoing = Array.isArray(row) ? 
row[0] : (row.outgoing || []); + const rawIncoming = Array.isArray(row) ? row[1] : (row.incoming || []); + const outgoing = (rawOutgoing || []).filter((c: any) => c && c.name).slice(0, 5); + const incoming = (rawIncoming || []).filter((c: any) => c && c.name).slice(0, 5); + + const fmt = (c: any, dir: 'out' | 'in') => { + const conf = c.confidence ? Math.round(c.confidence * 100) : 100; + return dir === 'out' + ? `-[${c.type} ${conf}%]-> ${c.name}` + : `<-[${c.type} ${conf}%]- ${c.name}`; + }; + const outList = outgoing.map((c: any) => fmt(c, 'out')); + const inList = incoming.map((c: any) => fmt(c, 'in')); + if (outList.length || inList.length) { + connections = [...outList, ...inList].join(', '); + } + } + + return [ + `SYMBOL: ${nodeType} ${name}`, + `ID: ${nodeId}`, + `File: ${filePath || 'n/a'}`, + `Cluster: ${clusterLabel}${clusterDesc ? ` — ${clusterDesc}` : ''}`, + ``, + `PROCESSES:`, + ...(processLines.length > 0 ? processLines : ['- None found']), + ``, + `CONNECTIONS:`, + connections, + ].join('\n'); + } + + return `Unable to explore "${target}".`; }, { - name: 'highlight', - description: 'Highlight nodes in the visual graph. Use node IDs from search/cypher results (format: Label:filepath:name).', + name: 'explore', + description: 'Deep dive on a symbol, cluster, or process. 
Shows membership, participation, and connections.', schema: z.object({ - nodeIds: z.array(z.string()).describe('Node IDs to highlight (e.g., ["Function:src/utils.ts:calculate"])'), - description: z.string().optional().nullable().describe('What these nodes represent'), + target: z.string().describe('Name or ID of a symbol, cluster, or process'), + type: z.enum(['symbol', 'cluster', 'process']).optional().nullable().describe('Optional target type (auto-detected if omitted)'), }), } ); // ============================================================================ - // TOOL 6: BLAST RADIUS (Impact analysis) + // TOOL 7: IMPACT (Impact analysis) // ============================================================================ - const blastRadiusTool = tool( + const impactTool = tool( async ({ target, direction, maxDepth, relationTypes, includeTests, minConfidence }: { target: string; direction: 'upstream' | 'downstream'; @@ -452,12 +908,23 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, : 'Dependencies this RELIES ON'; // Try to find the target node first - const findTargetQuery = ` - MATCH (n) - WHERE n.name = '${target.replace(/'/g, "''")}' - RETURN n.id AS id, label(n) AS nodeType, n.filePath AS filePath - LIMIT 5 - `; + // If target contains '/', search by filePath; otherwise by name + const isPathQuery = target.includes('/'); + const escapedTarget = target.replace(/'/g, "''"); + + const findTargetQuery = isPathQuery + ? ` + MATCH (n) + WHERE n.filePath IS NOT NULL AND n.filePath CONTAINS '${escapedTarget}' + RETURN n.id AS id, label(n) AS nodeType, n.filePath AS filePath + LIMIT 10 + ` + : ` + MATCH (n) + WHERE n.name = '${escapedTarget}' + RETURN n.id AS id, label(n) AS nodeType, n.filePath AS filePath + LIMIT 10 + `; let targetResults; try { @@ -470,12 +937,40 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, return `Could not find "${target}" in the codebase. 
Try using the search tool first to find the exact name.`; } - // Use the first match - const targetNode = targetResults[0]; + // Handle multiple matches - require disambiguation + const allPaths = targetResults.map((r: any) => Array.isArray(r) ? r[2] : r.filePath).filter(Boolean); + + // If multiple matches and target doesn't look like a specific path, ask for clarification + if (targetResults.length > 1 && !target.includes('/')) { + return `⚠️ AMBIGUOUS TARGET: Multiple files named "${target}" found:\n\n${allPaths.map((p: string, i: number) => `${i + 1}. ${p}`).join('\n')}\n\nPlease specify which file you mean by using a more specific path, e.g.:\n- impact("${allPaths[0].split('/').slice(-3).join('/')}")\n- impact("${allPaths[1]?.split('/').slice(-3).join('/') || allPaths[0]}")`; + } + + // If target contains a path, try to find matching file + let targetNode = targetResults[0]; + if (target.includes('/') && targetResults.length > 1) { + const exactMatch = targetResults.find((r: any) => { + const path = Array.isArray(r) ? r[2] : r.filePath; + return path && path.toLowerCase().includes(target.toLowerCase()); + }); + if (exactMatch) { + targetNode = exactMatch; + } else { + // Still ambiguous even with path + return `⚠️ AMBIGUOUS TARGET: Could not uniquely match "${target}". Found:\n\n${allPaths.map((p: string, i: number) => `${i + 1}. ${p}`).join('\n')}\n\nPlease use a more specific path.`; + } + } + const targetId = Array.isArray(targetNode) ? targetNode[0] : targetNode.id; const targetType = Array.isArray(targetNode) ? targetNode[1] : targetNode.nodeType; const targetFilePath = Array.isArray(targetNode) ? 
targetNode[2] : targetNode.filePath; + if (import.meta.env.DEV) { + console.log(`🎯 Impact: Found target "${target}" → id=${targetId}, type=${targetType}, filePath=${targetFilePath}`); + } + + // No more multipleMatchWarning needed - we either disambiguated or returned early + const multipleMatchWarning = ''; + // For File targets, find what calls code INSIDE the file (by filePath) // For code elements (Function, Class, etc.), use the direct id const isFileTarget = targetType === 'File'; @@ -505,7 +1000,7 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r.type AS edgeType, r.confidence AS confidence, r.reason AS reason - LIMIT 100 + LIMIT 300 ` : ` MATCH (target {id: '${targetId.replace(/'/g, "''")}'}) @@ -522,7 +1017,7 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r.type AS edgeType, r.confidence AS confidence, r.reason AS reason - LIMIT 100 + LIMIT 300 ` : isFileTarget ? ` @@ -541,7 +1036,7 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r.type AS edgeType, r.confidence AS confidence, r.reason AS reason - LIMIT 100 + LIMIT 300 ` : ` MATCH (target {id: '${targetId.replace(/'/g, "''")}'}) @@ -558,10 +1053,21 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r.type AS edgeType, r.confidence AS confidence, r.reason AS reason - LIMIT 100 + LIMIT 300 `; - depthQueries.push(executeQuery(d1Query).catch(err => { - if (import.meta.env.DEV) console.warn('Blast radius d=1 query failed:', err); + if (import.meta.env.DEV) { + console.log(`🔍 Impact d=1 query:\n${d1Query}`); + } + depthQueries.push(executeQuery(d1Query).then(results => { + if (import.meta.env.DEV) { + console.log(`📊 Impact d=1 results: ${results.length} rows`); + if (results.length > 0) { + console.log(' Sample:', results.slice(0, 3)); + } + } + return results; + }).catch(err => { + if (import.meta.env.DEV) console.warn('Impact d=1 query failed:', err); return []; })); @@ -586,7 +1092,7 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r2.type AS edgeType, r2.confidence AS confidence, r2.reason AS 
reason - LIMIT 100 + LIMIT 200 ` : ` MATCH (target {id: '${targetId.replace(/'/g, "''")}'}) @@ -606,10 +1112,10 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r2.type AS edgeType, r2.confidence AS confidence, r2.reason AS reason - LIMIT 100 + LIMIT 200 `; depthQueries.push(executeQuery(d2Query).catch(err => { - if (import.meta.env.DEV) console.warn('Blast radius d=2 query failed:', err); + if (import.meta.env.DEV) console.warn('Impact d=2 query failed:', err); return []; })); } @@ -637,7 +1143,7 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r3.type AS edgeType, r3.confidence AS confidence, r3.reason AS reason - LIMIT 50 + LIMIT 100 ` : ` MATCH (target {id: '${targetId.replace(/'/g, "''")}'}) @@ -659,10 +1165,10 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, r3.type AS edgeType, r3.confidence AS confidence, r3.reason AS reason - LIMIT 50 + LIMIT 100 `; depthQueries.push(executeQuery(d3Query).catch(err => { - if (import.meta.env.DEV) console.warn('Blast radius d=3 query failed:', err); + if (import.meta.env.DEV) console.warn('Impact d=3 query failed:', err); return []; })); } @@ -718,12 +1224,142 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, const totalAffected = allNodeIds.length; if (totalAffected === 0) { - return `No ${direction} dependencies found for "${target}" (types: ${activeRelTypes.join(', ')}). This code appears to be ${direction === 'upstream' ? 
'unused (not called by anything)' : 'self-contained (no outgoing dependencies)'}.`; + if (isFileTarget) { + const escapeRegex = (value: string) => value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); + const targetFileName = (targetFilePath || target).split('/').pop() || target; + const baseName = targetFileName.replace(/\.[^/.]+$/, ''); + const refRegex = new RegExp(`\\b${escapeRegex(baseName)}\\b`, 'g'); + const hints: Array<{ file: string; line: number; content: string }> = []; + const hintLimit = 15; + + for (const [filePath, content] of fileContents.entries()) { + if (filePath === targetFilePath) continue; + const lines = content.split('\n'); + for (let i = 0; i < lines.length; i++) { + if (refRegex.test(lines[i])) { + hints.push({ + file: filePath, + line: i + 1, + content: lines[i].trim().slice(0, 150), + }); + if (hints.length >= hintLimit) break; + } + refRegex.lastIndex = 0; + } + if (hints.length >= hintLimit) break; + } + + if (hints.length > 0) { + const formatted = hints.map(h => `${h.file}:${h.line}: ${h.content}`).join('\n'); + return `No ${direction} dependencies found for "${target}" (types: ${activeRelTypes.join(', ')}), but textual references were detected (graph may be incomplete):\n\n${formatted}${multipleMatchWarning}`; + } + } + + return `No ${direction} dependencies found for "${target}" (types: ${activeRelTypes.join(', ')}). This code appears to be ${direction === 'upstream' ? 'unused (not called by anything)' : 'self-contained (no outgoing dependencies)'}.${multipleMatchWarning}`; + } + + const depth1 = byDepth.get(1) || []; + const depth2 = byDepth.get(2) || []; + const depth3 = byDepth.get(3) || []; + + // Confidence buckets + const confidenceBuckets = { high: 0, medium: 0, low: 0 }; + for (const nodes of byDepth.values()) { + for (const n of nodes) { + const conf = n.confidence ?? 
1; + if (conf >= 0.9) confidenceBuckets.high += 1; + else if (conf >= 0.8) confidenceBuckets.medium += 1; + else confidenceBuckets.low += 1; + } + } + + // Affected processes and clusters + const maxIdsForContext = 500; + const trimmedIds = allNodeIds.slice(0, maxIdsForContext); + const idList = trimmedIds.map(id => `'${id.replace(/'/g, "''")}'`).join(', '); + let affectedProcesses: Array<{ label: string; hits: number; minStep: number | null; stepCount: number | null }> = []; + let affectedClusters: Array<{ label: string; hits: number; impact: string }> = []; + + if (trimmedIds.length > 0) { + const processQuery = ` + MATCH (s)-[r:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process) + WHERE s.id IN [${idList}] + RETURN p.label AS label, COUNT(DISTINCT s.id) AS hits, MIN(r.step) AS minStep, p.stepCount AS stepCount + ORDER BY hits DESC + LIMIT 20 + `; + const clusterQuery = ` + MATCH (s)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) + WHERE s.id IN [${idList}] + RETURN c.label AS label, COUNT(DISTINCT s.id) AS hits + ORDER BY hits DESC + LIMIT 20 + `; + const directIdList = depth1.map(n => `'${n.id.replace(/'/g, "''")}'`).join(', '); + const directClusterQuery = depth1.length > 0 ? ` + MATCH (s)-[:CodeRelation {type: 'MEMBER_OF'}]->(c:Community) + WHERE s.id IN [${directIdList}] + RETURN DISTINCT c.label AS label + ` : ''; + + const [processRes, clusterRes, directClusterRes] = await Promise.all([ + executeQuery(processQuery), + executeQuery(clusterQuery), + directClusterQuery ? executeQuery(directClusterQuery) : Promise.resolve([]), + ]); + + const directClusterSet = new Set(); + directClusterRes.forEach((row: any) => { + const label = Array.isArray(row) ? row[0] : row.label; + if (label) directClusterSet.add(label); + }); + + affectedProcesses = processRes.map((row: any) => ({ + label: Array.isArray(row) ? row[0] : row.label, + hits: Array.isArray(row) ? row[1] : row.hits, + minStep: Array.isArray(row) ? row[2] : row.minStep, + stepCount: Array.isArray(row) ? 
row[3] : row.stepCount, + })); + + affectedClusters = clusterRes.map((row: any) => { + const label = Array.isArray(row) ? row[0] : row.label; + const hits = Array.isArray(row) ? row[1] : row.hits; + const impact = directClusterSet.has(label) ? 'direct' : 'indirect'; + return { label, hits, impact }; + }); + } + + const directCount = depth1.length; + const processCount = affectedProcesses.length; + const clusterCount = affectedClusters.length; + let risk = 'LOW'; + if (directCount >= 30 || processCount >= 5 || clusterCount >= 5 || totalAffected >= 200) { + risk = 'CRITICAL'; + } else if (directCount >= 15 || processCount >= 3 || clusterCount >= 3 || totalAffected >= 100) { + risk = 'HIGH'; + } else if (directCount >= 5 || totalAffected >= 30) { + risk = 'MEDIUM'; } // ===== COMPACT TABULAR OUTPUT ===== const lines: string[] = [ - `🔴 BLAST RADIUS: ${target} | ${direction} | ${totalAffected} affected`, + `🔴 IMPACT: ${target} | ${direction} | ${totalAffected} affected`, + `Confidence: High ${confidenceBuckets.high} | Medium ${confidenceBuckets.medium} | Low ${confidenceBuckets.low}`, + ``, + `AFFECTED PROCESSES:`, + ...(affectedProcesses.length > 0 + ? affectedProcesses.map(p => `- ${p.label} - BROKEN at step ${p.minStep ?? '?'} (${p.hits} symbols, ${p.stepCount ?? '?'} steps)`) + : ['- None found']), + ``, + `AFFECTED CLUSTERS:`, + ...(affectedClusters.length > 0 + ? affectedClusters.map(c => `- ${c.label} (${c.impact}, ${c.hits} symbols)`) + : ['- None found']), + ``, + `RISK: ${risk}`, + `- Direct callers: ${directCount}`, + `- Processes affected: ${processCount}`, + `- Clusters affected: ${clusterCount}`, ``, ]; @@ -767,7 +1403,6 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, }; // Depth 1 - Critical (with call site snippets) - const depth1 = byDepth.get(1) || []; if (depth1.length > 0) { const header = direction === 'upstream' ? 
`d=1 (Directly DEPEND ON ${target}):` @@ -786,7 +1421,6 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, } // Depth 2 - High impact - const depth2 = byDepth.get(2) || []; if (depth2.length > 0) { const header = direction === 'upstream' ? `d=2 (Indirectly DEPEND ON ${target}):` @@ -798,7 +1432,6 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, } // Depth 3 - Transitive - const depth3 = byDepth.get(3) || []; if (depth3.length > 0) { lines.push(`d=3 (Deep impact/dependency):`); depth3.slice(0, 5).forEach(n => lines.push(formatNode(n))); @@ -809,17 +1442,16 @@ MATCH (n:Function {id: emb.nodeId}) RETURN n`, // Compact footer lines.push(`✅ GRAPH ANALYSIS COMPLETE (trusted)`); lines.push(`⚠️ Optional: grep("${target}") for dynamic patterns`); + if (multipleMatchWarning) { + lines.push(multipleMatchWarning); + } lines.push(``); - // Add the marker for UI highlighting - const marker = `[BLAST_RADIUS:${allNodeIds.join(',')}]`; - lines.push(marker); - return lines.join('\n'); }, { - name: 'blastRadius', - description: `Analyze the blast radius (impact) of changing a function, class, or file. + name: 'impact', + description: `Analyze the impact of changing a function, class, or file. Use when users ask: - "What would break if I changed X?" 
@@ -838,7 +1470,12 @@ Confidence: 100% = certain, <80% = fuzzy match (may be false positive)
 
 relationTypes filter (optional):
 - Default: CALLS, IMPORTS, EXTENDS, IMPLEMENTS (usage-based)
-- Can add CONTAINS, DEFINES for structural analysis`,
+- Can add CONTAINS, DEFINES for structural analysis
+
+Additional output sections:
+- Affected processes (with step impact)
+- Affected clusters (direct/indirect)
+- Risk summary (based on direct callers, processes, clusters)`,
       schema: z.object({
         target: z.string().describe('Name of the function, class, or file to analyze'),
         direction: z.enum(['upstream', 'downstream']).describe('upstream = what depends on this; downstream = what this depends on'),
@@ -859,7 +1496,8 @@ relationTypes filter (optional):
     cypherTool,
     grepTool,
     readTool,
-    highlightTool,
-    blastRadiusTool,
+    overviewTool,
+    exploreTool,
+    impactTool,
   ];
 };
diff --git a/gitnexus/src/core/llm/types.ts b/gitnexus/src/core/llm/types.ts
index 8df3114b0f..d9ca8b30b7 100644
--- a/gitnexus/src/core/llm/types.ts
+++ b/gitnexus/src/core/llm/types.ts
@@ -177,15 +177,13 @@ export interface ToolCallInfo {
  * Now supports step-based streaming where each step is a distinct message
  */
 export interface AgentStreamChunk {
-  type: 'reasoning' | 'tool_call' | 'tool_result' | 'content' | 'highlight' | 'error' | 'done';
+  type: 'reasoning' | 'tool_call' | 'tool_result' | 'content' | 'error' | 'done';
   /** LLM's reasoning/thinking text (shown as a step) */
   reasoning?: string;
   /** Final answer content (streamed token by token) */
   content?: string;
   /** Tool call information */
   toolCall?: ToolCallInfo;
-  /** Node IDs to highlight in the graph */
-  highlightNodeIds?: string[];
   /** Error message */
   error?: string;
 }
diff --git a/gitnexus/src/hooks/useAppState.tsx b/gitnexus/src/hooks/useAppState.tsx
index bce50e94e0..21d6ef32b3 100644
--- a/gitnexus/src/hooks/useAppState.tsx
+++ b/gitnexus/src/hooks/useAppState.tsx
@@ -1042,10 +1042,10 @@ export const AppStateProvider = ({ children }: { children: ReactNode }) => {
             }
           }
 
-          // Parse blast radius marker from tool results
-          const blastMatch = tc.result.match(/\[BLAST_RADIUS:([^\]]+)\]/);
-          if (blastMatch) {
-            const rawIds = blastMatch[1].split(',').map((id: string) => id.trim()).filter(Boolean);
+          // Parse impact marker from tool results
+          const impactMatch = tc.result.match(/\[IMPACT:([^\]]+)\]/);
+          if (impactMatch) {
+            const rawIds = impactMatch[1].split(',').map((id: string) => id.trim()).filter(Boolean);
             if (rawIds.length > 0 && graph) {
               const matchedIds = new Set();
               const graphNodeIds = graph.nodes.map(n => n.id);
diff --git a/gitnexus/src/lib/constants.ts b/gitnexus/src/lib/constants.ts
index a5c231b327..efe94286d2 100644
--- a/gitnexus/src/lib/constants.ts
+++ b/gitnexus/src/lib/constants.ts
@@ -18,6 +18,7 @@ export const NODE_COLORS: Record = {
   Type: '#a78bfa', // Violet light
   CodeElement: '#64748b', // Slate - muted
   Community: '#818cf8', // Indigo light - cluster indicator
+  Process: '#f43f5e', // Rose - execution flow indicator
 };
 
 // Node sizes by type - clear visual hierarchy with dramatic size differences
@@ -39,6 +40,7 @@ export const NODE_SIZES: Record = {
   Type: 3, // Type alias - small
   CodeElement: 2, // Generic small
   Community: 0, // Hidden by default - metadata node
+  Process: 0, // Hidden by default - metadata node
 };
 
 // Community color palette for cluster-based coloring
diff --git a/gitnexus/src/types/pipeline.ts b/gitnexus/src/types/pipeline.ts
index 0c7fec2938..123be720bc 100644
--- a/gitnexus/src/types/pipeline.ts
+++ b/gitnexus/src/types/pipeline.ts
@@ -1,7 +1,8 @@
 import { GraphNode, GraphRelationship, KnowledgeGraph } from '../core/graph/types';
 import { CommunityDetectionResult } from '../core/ingestion/community-processor';
+import { ProcessDetectionResult } from '../core/ingestion/process-processor';
 
-export type PipelinePhase = 'idle' | 'extracting' | 'structure' | 'parsing' | 'imports' | 'calls' | 'heritage' | 'communities' | 'enriching' | 'complete' | 'error';
+export type PipelinePhase = 'idle' | 'extracting' | 'structure' | 'parsing' | 'imports' | 'calls' | 'heritage' | 'communities' | 'processes' | 'enriching' | 'complete' | 'error';
 
 export interface PipelineProgress {
   phase: PipelinePhase;
@@ -20,6 +21,7 @@ export interface PipelineResult {
   graph: KnowledgeGraph;
   fileContents: Map;
   communityResult?: CommunityDetectionResult;
+  processResult?: ProcessDetectionResult;
 }
 
 // Serializable version for Web Worker communication
diff --git a/todo_live_check_feature.md b/todo_live_check_feature.md
new file mode 100644
index 0000000000..fceab3bd91
--- /dev/null
+++ b/todo_live_check_feature.md
@@ -0,0 +1,171 @@
+# Feature Specification: GitNexus "Guardian" (Live Impact Check)
+
+## 1. Overview
+The "Guardian" is an active, background monitoring system that provides real-time feedback to developers as they modify code. It leverages the deterministic Knowledge Graph (KuzuDB) to perform instant "Impact Analysis" and "Architecture Linting" without incurring LLM token costs.
+
+**Goal:** Provide a "Safety Net" that catches breaking changes, side effects, and architectural violations *before* a commit is made.
+
+## 2. Core Capabilities
+
+### A. Live "Blast Radius" Detection
+* **Trigger:**
+    * **Manual:** File Save / Debounced Keystroke.
+    * **AI-Aware Heuristic:** "Burst Write Cooldown" (See Section 3).
+* **Logic:**
+    1. Identify modified symbols (Functions, Classes) via incremental Tree-sitter parsing.
+    2. Execute Graph Query (Cypher) to find dependents.
+        * `MATCH (modified)<-[:CALLS*1..5]-(affected) RETURN affected`
+    3. Filter "affected" nodes that are outside the current file.
+* **User Experience:**
+    * **Toast/Status Bar:** "⚠️ Modification affects 12 external files."
+    * **Panel:** List of affected files/functions (e.g., "Breaks `PaymentService.process()`").
+    * **"Fix Prompt" Generator:** Button to copy a prompt for the AI agent (e.g., "Check `AuthService.ts` for regressions caused by my changes to `Login.tsx`").
+* **Cost:** **Zero Tokens.** (Pure Graph Traversal).
+
+### B. Architecture "Linting"
+* **Trigger:** File Save / New Import Added.
+* **Logic:**
+    1. Detect new `IMPORTS` or `CALLS` edges in the graph.
+    2. Check against defined "Layer Rules" (e.g., defined in `.gitnexus/rules.yaml`).
+        * *Rule Example:* `Frontend` cannot import `Database`.
+    3. Execute Graph Query:
+        * `MATCH (source)-[:IMPORTS]->(target) WHERE source.layer = 'Frontend' AND target.layer = 'Database' RETURN source, target`
+* **User Experience:**
+    * **Inline Warning:** "❌ Architectural Violation: UI component cannot directly access Database types."
+* **Cost:** **Zero Tokens.** (Rule-based Graph Matching).
+
+### C. "Smart" Explanation (On Demand)
+* **Trigger:** User clicks "Explain Risk" on a warning.
+* **Logic:**
+    1. Gather context: Source code of the change + Signatures of affected functions.
+    2. Send structured prompt to LLM (Small model: GPT-4o-mini / Local Llama).
+    3. *Prompt:* "The user modified `calculateTotal()`. This function is called by `InvoiceGenerator`. Explain potential risks."
+* **User Experience:**
+    * Natural language summary: "Changing the return type of `calculateTotal` will cause a compilation error in `InvoiceGenerator` which expects a number."
+* **Cost:** **Low.** (Only on user request, highly targeted context).
+
+## 3. Technical Architecture
+
+### The "Watcher" Loop
+1. **File Watcher:** Listens for `change` events in the workspace.
+2. **AI Detection Heuristic (Burst Mode):**
+    * *Logic:* Detect rapid file writes (3+ files in <2s) typical of AI Agents.
+    * *Action:* Suppress immediate checks (don't spam). Wait for "Cooldown" (e.g., 5s silence).
+    * *Trigger:* After cooldown, run "Full Impact Scan" and present summary.
+3. **Incremental Parser:** Updates the KuzuDB graph for the specific file (milliseconds).
+    * *Note:* Uses Tree-sitter's incremental parsing, which reuses the unchanged parts of the previous syntax tree.
+4. **Graph Engine:** Runs pre-compiled Cypher queries against the updated graph.
+5. **Client Interface (MCP/Extension):**
+    * Push notifications to VS Code / Cursor.
+    * Update "Impact" panel in the browser UI.
+
+## 4. Agent Integration (Mandatory Workflow)
+
+To ensure AI agents (Cursor, Claude, etc.) utilize GitNexus, we leverage "Prompt Engineering via Context" and Tool Definitions.
+
+### A. The `.cursorrules` Protocol
+* **Mechanism:** A `.cursorrules` file in the project root enforces agent behavior.
+* **Content:**
+    ```markdown
+    # GitNexus Safety Protocol
+    You have access to `@gitnexus` tools.
+
+    **MANDATORY WORKFLOW:**
+    1. **PLANNING:** Before writing code, use `gitnexus_get_dependencies` to map the blast radius.
+    2. **VERIFICATION:** After EVERY code modification, you MUST run `gitnexus_detect_impact`.
+    3. **REPORTING:** If risks are found, fix them or report to user.
+    ```
+
+### B. "Bossy" Tool Descriptions
+* **Mechanism:** Update MCP Tool definitions to be imperative.
+* **Description:** "CRITICAL SAFETY TOOL. MUST be used after modifying code. Scans graph for broken dependencies. Returns affected files."
+
+### C. "Fix Prompt" Generator
+* **Mechanism:** If the Agent ignores the tools and the "Guardian" detects a break, the popup offers a "Copy Fix Prompt" button.
+* **Prompt:** "Your changes to X broke Y. Use `gitnexus_detect_impact` to verify and fix."
+
+## 5. Distributed Knowledge Graph (Git-Native Architecture)
+
+To enable "B2B / Team" features without a central server, we use Git itself as the synchronization mechanism for the Knowledge Graph. This is the **"Git-Native Knowledge Graph"** architecture.
+
+### A. The Core Concept: "Graph Manifest"
+Committing the raw KuzuDB database files (binaries) to Git is impractical. Instead, we use a lightweight, diff-friendly **Graph Manifest**.
+
+* **File Path:** `.gitnexus/graph-state.jsonl.gz`
+* **Content:** A compressed JSON Lines dump of the Nodes and Edges (semantic data only).
+* **Purpose:** Acts as the "Transport Layer" for the graph between machines.
+
+### B. The Workflow
+
+#### Step 1: The "Write" Op (Local Dev)
+1. **Code Change:** Developer modifies `User.ts`.
+2. **Local Indexing:** GitNexus CLI updates local KuzuDB instantly (Incremental Update).
+3. **Pre-Commit Hook:**
+    * Trigger: `git commit`
+    * Action: GitNexus dumps the current KuzuDB state to `.gitnexus/graph-state.jsonl.gz`.
+    * Optimization: Only dumps semantic data (e.g., "Func A calls Func B"), not the full AST, keeping it small.
+
+#### Step 2: The "Transport" (Git Sync)
+* `git push` uploads the Code + Graph Manifest.
+* **Crucial:** The graph version is now cryptographically tied to the commit hash. No "drift" between code and graph.
+
+#### Step 3: The "Read" Op (Teammate Pull)
+1. **Git Pull:** Teammate receives new code + new manifest.
+2. **Post-Merge Hook / Hydration:**
+    * Trigger: Git detects change in `.gitnexus`.
+    * Action: GitNexus CLI reads the manifest and **bulk-inserts** it into the local KuzuDB.
+    * **Result:** Teammate has a fully queryable graph in seconds (vs. minutes of re-parsing).
+
+### C. Conflict Resolution: "Discard and Rebuild"
+What happens if two devs change the graph simultaneously?
+
+* **Scenario:** Merge conflict in `.gitnexus/graph-state.jsonl.gz`.
+* **Strategy:**
+    1. GitNexus detects the conflict in the manifest file.
+    2. It **discards** the conflicted manifest.
+    3. It runs the **Parser** locally on the *merged* source code (Source of Truth).
+    4. It generates a **fresh, correct** manifest.
+* **Philosophy:** The Graph is a *derivative* of the Code. We never manually merge the graph; we regenerate it from the source.
+
+### D. Architecture Diagram (Mermaid)
+
+```mermaid
+graph TD
+    subgraph "Developer A (Write)"
+        CodeA[User.ts] -->|Parser| DB_A[(Local KuzuDB)]
+        DB_A -->|Pre-Commit Export| ManifestA[.gitnexus/graph-state.jsonl.gz]
+        ManifestA -->|git push| GitHub
+    end
+
+    subgraph "GitHub / Git Server"
+        GitHub -->|git pull| DevB_Repo
+    end
+
+    subgraph "Developer B (Read)"
+        DevB_Repo[Code + Manifest] -->|Hydration Hook| DB_B[(Local KuzuDB)]
+        ManifestA -->|Bulk Insert| DB_B
+        DB_B -->|Instant Query| Cursor_B[Cursor / IDE]
+    end
+
+    subgraph "Enterprise Hub (Monetization)"
+        GitHub -->|Webhook| HubServer[Node.js Hub]
+        HubServer -->|Download Manifest| CentralDB[(Neo4j / Postgres)]
+        CentralDB -->|Analytics API| Dashboard[CTO Dashboard]
+    end
+```
+
+### E. B2B / Enterprise "Hub" Integration
+The "Hub" is a lightweight server that monetizes this architecture.
+1. **Action:** Subscribes to the repo's webhooks.
+2. **Ingestion:** Downloads *only* the manifest file (not the source code).
+3. **Storage:** Loads it into a centralized DB (Neo4j/Postgres) for organization-wide queries.
+4. **Security:** "We don't see your code, only your graph structure."
+
+
+## 6. Implementation Roadmap
+1. **Phase 1:** Implement `FileWatcher` in CLI + Incremental Graph Update.
+2. **Phase 2:** Create `ImpactQuery` engine (Cypher queries for dependents).
+3. **Phase 3:** Build MCP Tool `get_live_impact` for Editor integration and "Bossy" descriptions.
+4. **Phase 4:** Implement Git-Native Graph Sync (Manifest generation + Pre-commit hook).
+5. **Phase 5:** Add Architecture Rule definition schema (`.gitnexus/rules.yaml`).
+
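Reviewer sketch (not part of the patch): the "Burst Write Cooldown" heuristic from the Watcher Loop in Section 3 is the piece Phase 1 depends on most, so a minimal TypeScript sketch of it may help pin down the intended behavior. The thresholds (3+ files in <2s, 5s of silence) come from the spec. Everything else is an assumption for illustration: `BurstDetector`, `onFileChange`, and `tick` are hypothetical names, not existing GitNexus APIs, and timestamps are passed in explicitly so the logic does not depend on real timers.

```typescript
// Hypothetical sketch: BurstDetector, onFileChange and tick are illustrative
// names, not existing GitNexus APIs.

type WriteDecision = 'check-now' | 'suppressed';

interface BurstOptions {
  burstFiles: number;    // distinct files that count as a burst (spec: 3+)
  burstWindowMs: number; // window the writes must fall into (spec: <2s)
  cooldownMs: number;    // silence required before the full scan (spec: e.g. 5s)
}

class BurstDetector {
  private opts: BurstOptions;
  private recent: { path: string; at: number }[] = [];
  private pending = new Set<string>();
  private inBurst = false;
  private lastWriteAt = 0;

  constructor(opts: BurstOptions = { burstFiles: 3, burstWindowMs: 2000, cooldownMs: 5000 }) {
    this.opts = opts;
  }

  /** Record a file write and decide whether a per-file check should run now. */
  onFileChange(path: string, now: number): WriteDecision {
    this.lastWriteAt = now;
    // Keep only writes that are still inside the burst window, then add this one.
    this.recent = this.recent.filter(w => now - w.at < this.opts.burstWindowMs);
    this.recent.push({ path, at: now });

    if (this.inBurst) {
      // Already suppressing: remember the file for the full scan.
      this.pending.add(path);
    } else {
      const distinct = new Set(this.recent.map(w => w.path)).size;
      if (distinct >= this.opts.burstFiles) {
        // Burst starts: queue every file seen in the window, including this one.
        this.inBurst = true;
        for (const w of this.recent) this.pending.add(w.path);
      }
    }
    return this.inBurst ? 'suppressed' : 'check-now';
  }

  /** Poll from a timer: returns the files for a full scan once the cooldown has elapsed, else null. */
  tick(now: number): string[] | null {
    if (!this.inBurst || now - this.lastWriteAt < this.opts.cooldownMs) return null;
    const files = Array.from(this.pending).sort();
    this.pending.clear();
    this.recent = [];
    this.inBurst = false;
    return files;
  }
}
```

Under these assumptions a watcher would call `onFileChange` from its `change` handler and `tick` from a short interval timer. One design question the sketch leaves open is that files written just before the burst threshold is reached have already had an immediate check and are scanned again in the full scan; whether that duplication is acceptable is a call for the implementation.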