fix(web): align agent system prompt with registered tools #1984
Rewrites BASE_SYSTEM_PROMPT to fix tool-name mismatches, citation format, and schema guidance from PR #14 tri-review, and adds unit tests that guard prompt ↔ tool registry parity. Co-authored-by: Cursor <cursoragent@cursor.com>
magyargergo
left a comment
🔭 Tri-review — fix(web): align agent system prompt with registered tools
Methods & engine breakdown. Reviewed three ways: the GitNexus reviewer swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) — six Claude lanes — plus Codex, the one genuinely independent engine. Codex was sandbox-limited this run: it couldn't use its local shell or the GitNexus index, but recovered file access via its GitHub tools and read the full diff, the tool definitions, and the gitnexus-shared schema constants. Its visible analysis corroborated two points (the validation-section fold-in is benign; the schema constants it located back the schema-match check), but it did not return a retrievable final findings list. So treat this as a Claude-consensus review with partial Codex corroboration — not three independent confirmations (the six Claude lanes share priors, so their agreement is "consistent across personas," not independent).
This holds up well. I re-read the code to confirm the central claim end-to-end: the prompt's seven tool names, the graph-schema node labels & relation types (all members of NODE_TABLES/REL_TYPES in gitnexus-shared), the cypher {{QUERY_VECTOR}} + query routing, and the [[path:START-END]] / [[Type:Name]] citation format all match the real createGraphRAGTools registration (tools.ts:1497) and the UI grounding parser. The "There is NO highlight_in_graph tool" line is accurate. All five new test cases (8 expect() calls) genuinely pass. This is a clean fix for a prompt that had drifted from the actual toolset.
Headline (inline) — P2, non-blocking
The new test guards prompt ↔ GRAPH_RAG_TOOL_NAMES but not GRAPH_RAG_TOOL_NAMES ↔ the actual registered tools. GRAPH_RAG_TOOL_NAMES is a third hand-maintained copy; the test never imports createGraphRAGTools, so a future tool rename/add/remove can keep all five test cases green while the prompt mislabels a real tool — the LLM then emits a tool-call name LangChain can't route, and that tool silently fails at runtime. One assertion closes the loop (see inline comment). Flagged by five of the six lanes (risk, test-CI, adversarial, maintainability, testing; correctness noted it as a testing gap); trigger confirmed by code-read. Non-blocking — the three lists are in sync today.
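The closing assertion could be sketched roughly as below. This is only an illustration under stated assumptions: `TOOL_NAMES`, `createTools`, and the `{ name }` tool shape are hypothetical stand-ins for GRAPH_RAG_TOOL_NAMES and createGraphRAGTools, whose real signatures are not shown in this review.

```typescript
// Hypothetical stand-ins: the real const and factory live in tools.ts and
// their exact signatures are not shown in this review.
type Tool = { name: string };

// Hand-maintained list that the prompt test already checks against.
const TOOL_NAMES: readonly string[] = [
  "search", "cypher", "grep", "read", "explore", "overview", "impact",
];

// Assumed factory shape: takes a backend, returns tool objects with a `name`.
// The tools never run here, so a no-op stub backend is enough.
function createTools(_backend: object): Tool[] {
  return [
    { name: "search" }, { name: "cypher" }, { name: "grep" }, { name: "read" },
    { name: "explore" }, { name: "overview" }, { name: "impact" },
  ];
}

// Report drift in both directions between declared and registered names.
function nameDrift(declared: readonly string[], registered: readonly string[]) {
  return {
    missing: declared.filter((n) => registered.indexOf(n) === -1),
    extra: registered.filter((n) => declared.indexOf(n) === -1),
  };
}

const registeredNames = createTools({}).map((t) => t.name);
const drift = nameDrift(TOOL_NAMES, registeredNames);
console.log(drift);
```

Reporting `missing` and `extra` separately, rather than a single boolean, would make a failing run say which side drifted.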
Minor (optional)
- Test-only const on the public barrel: GRAPH_RAG_TOOL_NAMES is consumed only by the test, which imports it directly from ./tools; the index.ts:24 re-export adds public surface for no runtime consumer. (maintainability)
- Const comment understates the coupling: tools.ts:19 says "keep in sync with BASE_SYSTEM_PROMPT"; the real invariant is registration ↔ const ↔ prompt. (maintainability)
- A few brittle assertions: FORBIDDEN_TOOL_NAMES only blocks back-ticked names (a bare-prose legacy mention would evade it); the highlight_in_graph regex leans on single-line token co-occurrence; the citation regex matches the example, not the instruction; the new [[Type:Name]] symbol-ref form is untested. All P3 hardening, not defects. (adversarial, testing)
Refuted (validation is a feature)
- Dropping the "MANDATORY: VALIDATION" heading is not a regression — it was folded into CORE PROTOCOL step 6, the "Cite or retract" rule, and a new ERROR RECOVERY section. (risk, correctness, adversarial; Codex concurred)
- Tool-name substring-collision (search vs hybrid_search), barrel-export name collision, and "citation placeholders are unparseable" were each probed and refuted: the concrete prompt examples are parser-valid.
Pre-existing (not introduced here)
- The UI's NODE_REF_REGEX allowlist (grounding-patterns.ts:8) omits Community/Process, yet the prompt lists them as node labels, so a [[Process:…]] citation would be silently dropped. Pre-existing: the prompt's examples only use Function/Class; worth a separate ticket, not this PR.
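To illustrate how an allowlist-based parser drops such a citation without any error, here is a minimal sketch. The label list and regex are assumptions made for the example; the real NODE_REF_REGEX in grounding-patterns.ts may differ in both labels and syntax.

```typescript
// Hypothetical allowlist; the real NODE_REF_REGEX may list other labels.
const ALLOWED_LABELS = ["Function", "Class", "Method", "Interface"];

// Builds a pattern like \[\[(Function|Class|...):([^\]]+)\]\]
const NODE_REF_SOURCE =
  "\\[\\[(" + ALLOWED_LABELS.join("|") + "):([^\\]]+)\\]\\]";

type NodeRef = { label: string; name: string };

// Collect every [[Label:Name]] whose label is on the allowlist.
// Anything else simply never matches, so it is dropped silently.
function extractNodeRefs(text: string): NodeRef[] {
  const re = new RegExp(NODE_REF_SOURCE, "g");
  const refs: NodeRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    refs.push({ label: m[1], name: m[2] });
  }
  return refs;
}

const refs = extractNodeRefs("See [[Function:parseConfig]] and [[Process:ingest]].");
console.log(refs);
```

In this sketch only the Function reference is returned; the Process reference produces neither a match nor a warning, which is the failure mode described above.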
CI & merge
- Branch hygiene: merge-from-main commit present but harmless and merge-safe (Merge branch 'main' brought the branch up to date; the web change is one focused commit).
- Merge state: checks pending (BLOCKED on required checks; no conflicts). The web gates are green: typecheck-web, lint, format, Build & Push gitnexus-web, e2e / e2e (chromium), e2e / Check web module changes, CodeQL. The one web-relevant check still pending is tests / ubuntu / coverage (it runs the new agent-prompt.test.ts); the rest of the pending set (Build & Push gitnexus, scope-parity, tree-sitter ABI windows, windows platform-sensitive) is ingestion/CLI and unrelated to this web-only change.
Final verdict — production-ready with minor follow-ups
No correctness defects: the prompt-vs-tools alignment, graph schema, and citation format were verified end-to-end and the new tests pass. The single P2 (a const↔registration parity gap in the test) is a non-blocking hardening that a one-line assertion resolves; the other items are P3 polish or pre-existing. Before merge, just let the remaining tests / ubuntu / coverage check (which runs the new test) go green.
Automated multi-tool digest (GitNexus swarm + Compound-Engineering + Codex). Verify findings before acting. No blocking issues; the one inline item is a non-blocking test-coverage enhancement.
CI Report
✅ All checks passed
Test Results
✅ All 10947 tests passed; 13 test(s) skipped
U1: assert GRAPH_RAG_TOOL_NAMES equals the names createGraphRAGTools actually registers (via a no-op stub backend), closing the const<->registration drift gap the prompt-parity test previously missed. U2: make the forbidden-name guard word-boundary (catches bare-prose mentions, not just backticked); make the highlight_in_graph guarantee registry-level (reword-proof) plus a presence check; add a parser-recognized [[Type:Name]] symbol-citation assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
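A word-boundary guard of the kind U2 describes might look like the sketch below. The forbidden names are hypothetical examples (the real FORBIDDEN_TOOL_NAMES list is not shown in this conversation), and `findForbidden` is an illustrative helper, not a function from the repository.

```typescript
// Hypothetical legacy names, used only for illustration.
const FORBIDDEN = ["hybrid_search", "semantic_search"];

// Escape regex metacharacters so a tool name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every forbidden name that appears as a whole word in the prompt,
// whether or not it is wrapped in backticks.
function findForbidden(prompt: string, forbidden: readonly string[]): string[] {
  return forbidden.filter((name) =>
    new RegExp("\\b" + escapeRegExp(name) + "\\b").test(prompt),
  );
}

console.log(findForbidden("Use hybrid_search for recall.", FORBIDDEN));
console.log(findForbidden("Use `search` first.", FORBIDDEN));
```

Because `_` counts as a word character, `\bsearch\b` does not match inside `hybrid_search`, so a whole-word check also avoids the substring-collision concern raised in the review.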
U3: GRAPH_RAG_TOOL_NAMES has no runtime consumer -- the parity test imports it directly from ./tools -- so remove it from the public index.ts barrel re-export. Update the constant's doc comment to name the registration<->const<->prompt coupling now enforced by agent-prompt.test.ts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Source the symbol-citation assertion from the UI parser's own NODE_REF_REGEX instead of a hardcoded 4-label subset, so the test tracks the parser's allowlist rather than forking it. Also drop a redundant array spread and an unnecessary readonly-tuple cast surfaced by the simplify pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code review noted the registry-absence + bare-presence pair would pass if a future prompt edit affirmatively instructed calling highlight_in_graph (string present, still not registered). Add an assertion that the prompt never says use/call/invoke highlight_in_graph -- restoring the protective intent of the replaced negation check without its brittleness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
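One way to express that guard is sketched below. The pattern is an assumption about what "use/call/invoke highlight_in_graph" means in practice, namely the verb immediately followed by the tool name with an optional "the" or backtick in between; the actual assertion in agent-prompt.test.ts may be written differently.

```typescript
// True when the prompt tells the model to use/call/invoke the given tool.
// `tool` is assumed to be a plain identifier (letters, digits, underscores),
// so it is not regex-escaped here.
function affirmativelyInstructs(prompt: string, tool: string): boolean {
  const re = new RegExp(
    "\\b(?:use|call|invoke)\\s+(?:the\\s+)?`?" + tool + "\\b",
    "i",
  );
  return re.test(prompt);
}

const safePrompt = "There is NO highlight_in_graph tool.";
const badPrompt = "Then call highlight_in_graph to show the results.";

console.log(affirmativelyInstructs(safePrompt, "highlight_in_graph"));
console.log(affirmativelyInstructs(badPrompt, "highlight_in_graph"));
```

Note that this sketch would also flag a negated phrasing such as "do not call highlight_in_graph", which is consistent with requiring that the prompt never uses those verb forms with the tool name at all.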
…system-prompt Fold the prompt<->tools parity test hardening into PR #1984: const<->registration parity gate, word-boundary forbidden-name guard, NODE_REF_REGEX-derived symbol-ref assertion, and a guard against affirmative highlight_in_graph call instructions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Addresses all blocking and high-priority findings from the PR #14 tri-review for the current gitnexus-web agent:
- Rewrites BASE_SYSTEM_PROMPT with the iterative investigation loop while using exact registered tool names (search, cypher, grep, read, explore, overview, impact)
- Citation format ([[path:START-END]], [[Type:Name]]) matching the UI parser
- CodeRelation {type: '...'} schema (no CodeNode/INHERITS drift)
- No highlight_in_graph tool in this codebase
- Exports GRAPH_RAG_TOOL_NAMES and adds unit tests to prevent prompt ↔ tool registry regressions
Test plan
- cd gitnexus-web && npm test -- test/unit/agent-prompt.test.ts
- Tool calls (search, cypher, read) dispatch successfully
Made with Cursor