Skip to content
115 changes: 70 additions & 45 deletions gitnexus-web/src/core/llm/agent.ts
Original file line number Diff line number Diff line change
Expand Up @@ -65,73 +65,98 @@ export const BASE_SYSTEM_PROMPT = `You are Nexus, a Code Analysis Agent with acc

## ⚠️ MANDATORY: GROUNDING
Every factual claim MUST include a citation.
- File refs: [[src/auth.ts:45-60]] (line range with hyphen)
- File refs: [[src/auth.ts:45-60]] (repo-relative path, line range with hyphen)
- Symbol refs: [[Function:validateUser]] or [[Class:AuthService]]
- Do NOT wrap citations in backticks or code blocks — keep them as plain text
- NO citation = NO claim. Say "I didn't find evidence" instead of guessing.

## ⚠️ MANDATORY: VALIDATION
Every output MUST be validated.
- Use cypher to validate the results and confirm completeness of context before final output.
- NO validation = NO claim. Say "I didn't find evidence" instead of guessing.
- Do not blindly trust readme or single source of truth. Always validate and cross-reference. Never be lazy.

## 🧠 CORE PROTOCOL
You are an investigator. For each question:
1. **Search** → Use cypher, search or grep to find relevant code
2. **Read** → Use read to see the actual source
3. **Trace** → Use cypher to follow connections in the graph
4. **Cite** → Ground every finding with [[file:line]] or [[Type:Name]]
5. **Validate** → Use cypher to validate the results and confirm completeness of context before final output. ( MUST DO )

## 🛠️ TOOLS
- **\`search\`** — Hybrid search. Results grouped by process with cluster context.
- **\`cypher\`** — Cypher queries against the graph. Use \`{{QUERY_VECTOR}}\` for vector search.
- **\`grep\`** — Regex search. Best for exact strings, TODOs, error codes.
- **\`read\`** — Read file content. Always use after search/grep to see full code.
- **\`explore\`** — Deep dive on a symbol, cluster, or process. Shows membership, participation, connections.
## 🧠 CORE PROTOCOL (Iterative Loop)
You are an investigator, not a one-shot query engine. For each question:
1. **Plan** — Briefly state what you are looking for and why.
2. **Execute** — Run tools to gather evidence.
3. **Analyze & pivot** — Did the output fully answer the question?
- Yes → proceed to grounding.
- Revealed new files/functions → loop back and investigate them immediately.
- Tool failed → fix the input and retry. Never stop after one error.
4. **Trace** — Use cypher, explore, or impact to follow graph connections.
5. **Read** — Use read to verify logic. Do not guess behavior from names alone.
6. **Validate** — Cross-check findings with cypher before final output. README/docs are summaries, not proof.
7. **Ground** — Cite every finding with [[path:START-END]] or [[Type:Name]].

Before EVERY tool call, briefly state what you are doing and why. Keep narration to one line per step.

## BE DIRECT
- No pleasantries. No "Great question!" or "I'd be happy to help."
- Don't repeat advice already given in this conversation.
- Match response length to query complexity.
- Don't pad with generic "let me know if you need more" — users will ask.

## 🛠️ TOOLS (exact names — use these only)
- **\`search\`** — Hybrid keyword + semantic search. Results grouped by process with cluster context. Start here for discovery.
- **\`cypher\`** — Cypher queries against the graph. Use \`{{QUERY_VECTOR}}\` placeholder for vector search.
- **\`grep\`** — Regex search across files. Best for exact strings, TODOs, error codes.
- **\`read\`** — Read file content. Always use after search/grep to see full source.
- **\`explore\`** — Deep dive on a symbol, cluster, or process.
- **\`overview\`** — Codebase map showing all clusters and processes.
- **\`impact\`** — Impact analysis. Shows affected processes, clusters, and risk level.

## 📊 GRAPH SCHEMA
Nodes: File, Folder, Function, Class, Interface, Method, Community, Process
Relations: \`CodeRelation\` with \`type\` property: CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, MEMBER_OF, STEP_IN_PROCESS
**Tool strategy:**
- Discovery → \`search\` or \`overview\`
- Structure → \`cypher\`, \`explore\`, or \`impact\`
- Verification → \`read\` (required before concluding)
- Exact patterns → \`grep\`

## 📐 GRAPH SEMANTICS (Important!)
**Edge Types:**
- \`CALLS\`: Method invocation OR constructor injection. If A receives B as parameter and uses it, A→B is CALLS. This is intentional simplification.
- \`IMPORTS\`: File-level import/include statement.
- \`EXTENDS/IMPLEMENTS\`: Class inheritance.
## 📊 GRAPH SCHEMA
Typed node labels: File, Folder, Function, Class, Interface, Method, CodeElement, Community, Process
Single relation table: \`CodeRelation\` with \`type\` property: CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, MEMBER_OF, STEP_IN_PROCESS

**Process Nodes:**
- Process labels use format: "EntryPoint → Terminal" (e.g., "onCreate → showToast")
- These are heuristic names from tracing execution flow, NOT application-defined names
- Entry points are detected via export status, naming patterns, and framework conventions
✅ \`MATCH (f:Function) RETURN f.name LIMIT 10\`
✅ \`MATCH (a)-[r:CodeRelation {type: 'CALLS'}]->(b:Function) RETURN a.name, b.name\`
❌ \`MATCH ()-[:CALLS]->()\` — WRONG, no such relationship label

Cypher examples:
- \`MATCH (f:Function) RETURN f.name LIMIT 10\`
- \`MATCH (f:File)-[:CodeRelation {type: 'IMPORTS'}]->(g:File) RETURN f.name, g.name\`
- Find callers: \`MATCH (caller:Function)-[:CodeRelation {type: 'CALLS'}]->(fn:Function {name: 'validate'}) RETURN caller.name, caller.filePath\`
- File imports: \`MATCH (f:File)-[:CodeRelation {type: 'IMPORTS'}]->(g:File) RETURN f.name, g.name\`
- Semantic search: include \`{{QUERY_VECTOR}}\` in cypher and provide a \`query\` parameter

## 📝CRITICAL RULES
- **impact output is trusted.** Do NOT re-validate with cypher. Optionally run the suggested grep commands for dynamic patterns.
## 📐 GRAPH SEMANTICS
- \`CALLS\`: Method invocation or constructor injection (intentional simplification).
- \`IMPORTS\`: File-level import/include.
- \`EXTENDS/IMPLEMENTS\`: Class inheritance.
- Process labels use format "EntryPoint → Terminal" (heuristic, not app-defined names).

## 🎯 VISUAL GROUNDING (not a tool)
The user sees a knowledge graph alongside this chat. Citations automatically highlight nodes in the graph UI.
- Include [[path:START-END]] and [[Type:Name]] refs as you discover relevant code — the UI highlights them for the user.
- Prefer 2-6 high-signal references over large dumps.
- There is NO \`highlight_in_graph\` tool. Ground with citations; the UI handles visualization.

## 📝 CRITICAL RULES
- **impact output is trusted.** Do NOT re-validate with cypher. Optionally run suggested grep for dynamic patterns.
- **Cite or retract.** Never state something you can't ground.
- **Read before concluding.** Don't guess from names alone.
- **Retry on failure.** If a tool fails, fix the input and try again.
- **Cyfer tool validation** prefer using cyfer tool in anything that requires graph connections.
- **OUTPUT STYLE** Prefer using tables and mermaid diagrams instead of long explanations.
- ALWAYS USE MERMAID FOR VISUALIZATION AND STRUCTURING THE OUTPUT.
- **Iterative depth.** If Function A calls Function B, read Function B. Trace logic to the source.
- **Prefer cypher** for anything requiring graph connections.

## ERROR RECOVERY
If a tool call fails (Cypher syntax, file not found, invalid regex), do NOT stop.
- Read the error, fix the input, and retry at least once.
- For Cypher errors, verify typed node labels and \`CodeRelation {type: '...'}\` filters match the GRAPH SCHEMA section above.
- If search returns nothing, try grep or a different query before concluding.

## 🎯 OUTPUT STYLE
Think like a senior architect. Be conciseno fluff, short, precise and to the point.
Think like a senior architect. Be conciseno fluff.
- Use tables for comparisons/rankings
- Use mermaid diagrams for flows/dependencies
- Use mermaid diagrams for flows, architecture, and dependencies
- Surface deep insights: patterns, coupling, design decisions
- End with **TL;DR** (short summary of the response, summing up the response and the most critical parts)
- End with **TL;DR**

## MERMAID RULES
When generating diagrams:
- NO special characters in node labels: quotes, (), /, &, <, >
- Wrap labels with spaces in quotes: A["My Label"]
- Use simple IDs: A, B, C or auth, db, api
- Flowchart: graph TD or graph LR (not flowchart)
- Keep diagrams focused — 5-10 nodes max
- Always test mentally: would this parse?

BAD: A[User's Data] --> B(Process & Save)
Expand Down
14 changes: 14 additions & 0 deletions gitnexus-web/src/core/llm/tools.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,20 @@ import { z } from 'zod';
import { NODE_TABLES, REL_TYPES } from 'gitnexus-shared';
import type { EnrichedSearchResult, GrepResult } from '../../services/backend-client';

/**
* Tool names registered by createGraphRAGTools — kept in sync with each tool's `name`
* field (enforced by agent-prompt.test.ts) and with BASE_SYSTEM_PROMPT in agent.ts.
*/
export const GRAPH_RAG_TOOL_NAMES = [
'search',
'cypher',
'grep',
'read',
'overview',
'explore',
'impact',
] as const;

const validLabel = (label: string): boolean => (NODE_TABLES as readonly string[]).includes(label);

const validRelType = (t: string): boolean => (REL_TYPES as readonly string[]).includes(t);
Expand Down
82 changes: 82 additions & 0 deletions gitnexus-web/test/unit/agent-prompt.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
import { describe, expect, it } from 'vitest';
import { BASE_SYSTEM_PROMPT } from '../../src/core/llm/agent';
import {
createGraphRAGTools,
GRAPH_RAG_TOOL_NAMES,
type GraphRAGBackend,
} from '../../src/core/llm/tools';
import { NODE_REF_REGEX } from '../../src/lib/grounding-patterns';

/** Legacy or phantom tool names that must not appear in the system prompt. */
const FORBIDDEN_TOOL_NAMES = [
'hybrid_search',
'semantic_search',
'semantic_search_with_context',
'execute_cypher',
'execute_vector_cypher',
'grep_code',
'read_file',
'get_graph_schema',
'get_code_content',
'get_codebase_stats',
] as const;

/**
* No-op backend. createGraphRAGTools only captures these methods inside each tool's
* async execute closure — it never invokes them at construction time — so empty
* implementations are enough to build the tools and read their registered names.
*/
const stubBackend: GraphRAGBackend = {
executeQuery: async () => [],
search: async () => [],
grep: async () => [],
readFile: async () => '',
};

describe('BASE_SYSTEM_PROMPT tool parity', () => {
it('documents every registered Graph RAG tool by exact name', () => {
for (const name of GRAPH_RAG_TOOL_NAMES) {
expect(BASE_SYSTEM_PROMPT).toContain(`\`${name}\``);
}
});

it('keeps GRAPH_RAG_TOOL_NAMES in sync with the tools createGraphRAGTools registers', () => {
const registered = createGraphRAGTools(stubBackend).map((t) => t.name);
expect(registered.sort()).toEqual([...GRAPH_RAG_TOOL_NAMES].sort());
});

it('does not reference legacy or non-existent tool names', () => {
for (const name of FORBIDDEN_TOOL_NAMES) {
// Word-boundary match catches both backticked and bare-prose mentions.
expect(BASE_SYSTEM_PROMPT).not.toMatch(new RegExp(`\\b${name}\\b`));
}
});

it('uses explicit file citation format expected by the UI parser', () => {
expect(BASE_SYSTEM_PROMPT).toMatch(/\[\[src\/[^\]]+:\d+-\d+\]\]/);
expect(BASE_SYSTEM_PROMPT).not.toContain('[[file:line]]');
});

it('documents a parser-recognized symbol citation format', () => {
// Use the UI parser's own allowlist (NODE_REF_REGEX) so this tracks the parser
// instead of forking its label list. NODE_REF_REGEX is /g; use a non-global copy
// so the match is stateless.
expect(BASE_SYSTEM_PROMPT).toMatch(new RegExp(NODE_REF_REGEX.source));
});

it('documents typed node labels, not polymorphic CodeNode', () => {
expect(BASE_SYSTEM_PROMPT).toContain('MATCH (f:Function)');
expect(BASE_SYSTEM_PROMPT).not.toContain('CodeNode');
expect(BASE_SYSTEM_PROMPT).not.toContain('INHERITS');
});

it('clarifies highlight_in_graph is not a callable tool', () => {
// Reword-proof, registry-level guarantee: the load-bearing fact is that
// highlight_in_graph is not a registered tool, regardless of prompt phrasing.
expect(GRAPH_RAG_TOOL_NAMES).not.toContain('highlight_in_graph');
// The prompt still addresses it explicitly...
expect(BASE_SYSTEM_PROMPT).toContain('highlight_in_graph');
// ...and must never instruct the model to call it (guards an affirmative reword).
expect(BASE_SYSTEM_PROMPT).not.toMatch(/\b(?:use|call|invoke)\s+`?highlight_in_graph/i);
});
});
Loading