fix(web): align agent system prompt with registered tools #1984

Merged
magyargergo merged 8 commits into main from fix/web-agent-system-prompt on Jun 3, 2026

Conversation

@magyargergo

Collaborator

Summary

Addresses all blocking and high-priority findings from the PR #14 tri-review for the current gitnexus-web agent:

  • Rewrites BASE_SYSTEM_PROMPT with the iterative investigation loop while using exact registered tool names (search, cypher, grep, read, explore, overview, impact)
  • Restores explicit citation rules ([[path:START-END]], [[Type:Name]]) matching the UI parser
  • Documents typed node labels + CodeRelation {type: '...'} schema (no CodeNode / INHERITS drift)
  • Clarifies that graph highlighting is citation-driven — there is no highlight_in_graph tool in this codebase
  • Restores BE DIRECT, MERMAID RULES, and ERROR RECOVERY sections removed in PR #14 (prompt changes)
  • Exports GRAPH_RAG_TOOL_NAMES and adds unit tests to prevent prompt ↔ tool registry regressions
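
For readers unfamiliar with the two citation shapes listed above, here is a minimal sketch of how such markers could be extracted from agent output. The regexes, the `parseCitations` name, and the sample path are assumptions made for illustration; they are not the patterns the UI parser actually uses.

```typescript
// Hypothetical sketch: these patterns are assumptions for illustration,
// not the regexes the gitnexus-web UI parser actually uses.
type Citation =
  | { kind: "file"; path: string; start: number; end: number }
  | { kind: "symbol"; nodeType: string; name: string };

function parseCitations(text: string): Citation[] {
  // Fresh regex objects per call so global-flag lastIndex state never leaks.
  const fileRef = /\[\[([^\[\]:]+):(\d+)-(\d+)\]\]/g; // [[path:START-END]]
  const symbolRef = /\[\[([A-Z][A-Za-z]*):([A-Za-z_$][\w$.]*)\]\]/g; // [[Type:Name]]
  const found: Citation[] = [];
  let m: RegExpExecArray | null;

  // File-range citations are collected first, then symbol citations.
  while ((m = fileRef.exec(text)) !== null) {
    found.push({ kind: "file", path: m[1], start: Number(m[2]), end: Number(m[3]) });
  }
  while ((m = symbolRef.exec(text)) !== null) {
    found.push({ kind: "symbol", nodeType: m[1], name: m[2] });
  }
  return found;
}

// Sample text with one citation of each shape (path and range are made up).
const sampleAnswer =
  "See [[src/agent/tools.ts:10-42]] and [[Function:createGraphRAGTools]].";
console.log(parseCitations(sampleAnswer));
```

The two patterns are kept disjoint on purpose: a file citation needs a numeric range after the colon, while a symbol citation needs an identifier, so neither shape is read as the other.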

Test plan

  • cd gitnexus-web && npm test -- test/unit/agent-prompt.test.ts
  • Manual: run agent in web UI and confirm discovery tools (search, cypher, read) dispatch successfully

Made with Cursor

Rewrites BASE_SYSTEM_PROMPT to fix tool-name mismatches, citation format,
and schema guidance from PR #14 tri-review, and adds unit tests that
guard prompt ↔ tool registry parity.

Co-authored-by: Cursor <cursoragent@cursor.com>
@vercel

vercel Bot commented Jun 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| gitnexus | Ready | Preview, Comment | Jun 3, 2026 6:25am |

@magyargergo magyargergo left a comment

Collaborator Author

🔭 Tri-review — fix(web): align agent system prompt with registered tools

Methods & engine breakdown. Reviewed three ways: the GitNexus reviewer swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) — six Claude lanes — plus Codex, the one genuinely independent engine. Codex was sandbox-limited this run: it couldn't use its local shell or the GitNexus index, but recovered file access via its GitHub tools and read the full diff, the tool definitions, and the gitnexus-shared schema constants. Its visible analysis corroborated two points (the validation-section fold-in is benign; the schema constants it located back the schema-match check), but it did not return a retrievable final findings list. So treat this as a Claude-consensus review with partial Codex corroboration — not three independent confirmations (the six Claude lanes share priors, so their agreement is "consistent across personas," not independent).

This holds up well. I re-read the code to confirm the central claim end-to-end: the prompt's seven tool names, the graph-schema node labels & relation types (all members of NODE_TABLES/REL_TYPES in gitnexus-shared), the cypher {{QUERY_VECTOR}} + query routing, and the [[path:START-END]] / [[Type:Name]] citation format all match the real createGraphRAGTools registration (tools.ts:1497) and the UI grounding parser. The "There is NO highlight_in_graph tool" line is accurate. All five new test cases (8 expect() calls) genuinely pass. This is a clean fix for a prompt that had drifted from the actual toolset.

Headline (inline) — P2, non-blocking

The new test guards prompt ↔ GRAPH_RAG_TOOL_NAMES but not GRAPH_RAG_TOOL_NAMES ↔ the actual registered tools. GRAPH_RAG_TOOL_NAMES is a third hand-maintained copy; the test never imports createGraphRAGTools, so a future tool rename/add/remove can keep all five test cases green while the prompt mislabels a real tool — the LLM then emits a tool-call name LangChain can't route, and that tool silently fails at runtime. One assertion closes the loop (see inline comment). Flagged by five of the six lanes (risk, test-CI, adversarial, maintainability, testing; correctness noted it as a testing gap); trigger confirmed by code-read. Non-blocking — the three lists are in sync today.
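
To make the gap concrete, the missing assertion amounts to comparing the hand-maintained constant against whatever the factory really registers. The sketch below uses stand-ins: `createGraphRAGToolsStub`, `ToolLike`, and `diffToolNames` are hypothetical names, and the real `createGraphRAGTools` signature is not reproduced here.

```typescript
// Hypothetical stand-ins: the real createGraphRAGTools and GRAPH_RAG_TOOL_NAMES
// live in gitnexus-web; the shapes below are assumed for illustration only.
interface ToolLike {
  name: string;
}

// The seven tool names the PR description lists.
const GRAPH_RAG_TOOL_NAMES = [
  "search", "cypher", "grep", "read", "explore", "overview", "impact",
] as const;

// Stand-in factory that "registers" one tool object per name.
function createGraphRAGToolsStub(): ToolLike[] {
  const names = ["search", "cypher", "grep", "read", "explore", "overview", "impact"];
  return names.map((name) => ({ name }));
}

// Reports names present in one list but not the other.
// Two empty arrays mean the constant and the registry agree.
function diffToolNames(registered: readonly string[], declared: readonly string[]) {
  const missingFromConst = registered.filter((n) => declared.indexOf(n) === -1);
  const missingFromRegistry = declared.filter((n) => registered.indexOf(n) === -1);
  return { missingFromConst, missingFromRegistry };
}

const registeredNames = createGraphRAGToolsStub().map((t) => t.name);
console.log(diffToolNames(registeredNames, GRAPH_RAG_TOOL_NAMES));
```

Reporting the difference in both directions, rather than a bare equality check, would tell a maintainer whether a tool was renamed in the registry or only in the constant.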

Minor (optional)

  • Test-only const on the public barrel: GRAPH_RAG_TOOL_NAMES is consumed only by the test, which imports it directly from ./tools; the index.ts:24 re-export adds public surface for no runtime consumer. (maintainability)
  • Const comment understates the coupling: tools.ts:19 says "keep in sync with BASE_SYSTEM_PROMPT"; the real invariant is registration ↔ const ↔ prompt. (maintainability)
  • A few brittle assertions: FORBIDDEN_TOOL_NAMES only blocks back-ticked names (a bare-prose legacy mention would evade it); the highlight_in_graph regex leans on single-line token co-occurrence; the citation regex matches the example, not the instruction; the new [[Type:Name]] symbol-ref form is untested. All P3 hardening, not defects. (adversarial, testing)
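
The back-tick limitation in the last item can be shown with a small comparison. This is a sketch under assumptions: `hybrid_search` is treated here as a legacy name purely for illustration, and both helper functions are hypothetical rather than taken from agent-prompt.test.ts.

```typescript
// Hypothetical helpers; neither is taken from the actual test file.

// Escapes regex metacharacters so a tool name can be embedded in a pattern.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Back-tick-only check: finds `name` only when wrapped in back-ticks.
function mentionsBackticked(prompt: string, name: string): boolean {
  return prompt.indexOf("`" + name + "`") !== -1;
}

// Word-boundary check: finds the name anywhere it stands as a whole word.
function mentionsAsWord(prompt: string, name: string): boolean {
  return new RegExp("\\b" + escapeRegExp(name) + "\\b").test(prompt);
}

// A bare-prose mention of an assumed legacy name.
const bareProse = "First call hybrid_search, then read the file.";
console.log(mentionsBackticked(bareProse, "hybrid_search")); // back-tick check
console.log(mentionsAsWord(bareProse, "hybrid_search")); // word-boundary check
```

Because `_` counts as a word character, a `\b`-anchored pattern for `search` does not fire inside `hybrid_search`, which is consistent with the substring-collision concern being refuted.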

Refuted (validation is a feature)

  • Dropping the "MANDATORY: VALIDATION" heading is not a regression — it was folded into CORE PROTOCOL step 6, the "Cite or retract" rule, and a new ERROR RECOVERY section. (risk, correctness, adversarial; Codex concurred)
  • Tool-name substring-collision (search vs hybrid_search), barrel-export name collision, and "citation placeholders are unparseable" were each probed and refuted — the concrete prompt examples are parser-valid.

Pre-existing (not introduced here)

  • The UI's NODE_REF_REGEX allowlist (grounding-patterns.ts:8) omits Community/Process, yet the prompt lists them as node labels, so a [[Process:…]] citation would be silently dropped. Pre-existing — the prompt's examples only use Function/Class; worth a separate ticket, not this PR.
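
A sketch of why an allowlist gap drops citations silently: with a label-allowlisted pattern, an unlisted label simply produces no match rather than an error. The allowlist contents, `buildNodeRefRegex`, and `extractNodeRefs` are assumptions; the real NODE_REF_REGEX in grounding-patterns.ts is not reproduced here.

```typescript
// Hypothetical reconstruction: the real NODE_REF_REGEX is not shown in this
// thread, so the allowlist and pattern below are illustrative assumptions.
function buildNodeRefRegex(labels: string[]): RegExp {
  return new RegExp("\\[\\[(" + labels.join("|") + "):([^\\]]+)\\]\\]", "g");
}

// Returns every "Label:Name" reference whose label is in the allowlist.
function extractNodeRefs(text: string, labels: string[]): string[] {
  const re = buildNodeRefRegex(labels);
  const refs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    refs.push(m[1] + ":" + m[2]);
  }
  return refs;
}

const answer = "[[Function:parse]] feeds [[Process:checkout]]";

// With Process missing from the allowlist, its citation yields no match at all.
console.log(extractNodeRefs(answer, ["Function", "Class"]));
// Adding the label makes the same citation visible.
console.log(extractNodeRefs(answer, ["Function", "Class", "Process"]));
```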

CI & merge

  • Branch hygiene: merge-from-main commit present but harmless and merge-safe (Merge branch 'main' brought the branch up to date; the web change is one focused commit).
  • Merge state: checks pending (BLOCKED on required checks; no conflicts). The web gates are green — typecheck-web, lint, format, Build & Push gitnexus-web, e2e / e2e (chromium), e2e / Check web module changes, CodeQL. The one web-relevant check still pending is tests / ubuntu / coverage (it runs the new agent-prompt.test.ts); the rest of the pending set (Build & Push gitnexus, scope-parity, tree-sitter ABI windows, windows platform-sensitive) is ingestion/CLI and unrelated to this web-only change.

Final verdict — production-ready with minor follow-ups

No correctness defects: the prompt-vs-tools alignment, graph schema, and citation format were verified end-to-end and the new tests pass. The single P2 (a const↔registration parity gap in the test) is a non-blocking hardening that a one-line assertion resolves; the other items are P3 polish or pre-existing. Before merge, just let the remaining tests / ubuntu / coverage check (which runs the new test) go green.


Automated multi-tool digest (GitNexus swarm + Compound-Engineering + Codex). Verify findings before acting. No blocking issues; the one inline item is a non-blocking test-coverage enhancement.

Comment thread gitnexus-web/test/unit/agent-prompt.test.ts Outdated
@github-actions

github-actions Bot commented Jun 3, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
|---|---|---|
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
|---|---|---|---|---|
| 10960 | 10947 | 0 | 13 | 662s |

✅ All 10947 tests passed

13 test(s) skipped:
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • run.cjs direct-exec entrypoint (#1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
|---|---|---|---|---|---|
| Statements | 80.3% | 38245/47625 | 79.84% | 📈 +0.5 | 🟢 ████████████████░░░░ |
| Branches | 68.85% | 24321/35320 | 68.5% | 📈 +0.3 | 🟢 █████████████░░░░░░░ |
| Functions | 85.45% | 3978/4655 | 84.94% | 📈 +0.5 | 🟢 █████████████████░░░ |
| Lines | 83.91% | 34403/40998 | 83.36% | 📈 +0.5 | 🟢 ████████████████░░░░ |

📋 View full run · Generated by CI

magyargergo and others added 5 commits June 3, 2026 05:45
U1: assert GRAPH_RAG_TOOL_NAMES equals the names createGraphRAGTools actually registers (via a no-op stub backend), closing the const<->registration drift gap the prompt-parity test previously missed.

U2: make the forbidden-name guard word-boundary (catches bare-prose mentions, not just backticked); make the highlight_in_graph guarantee registry-level (reword-proof) plus a presence check; add a parser-recognized [[Type:Name]] symbol-citation assertion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
U3: GRAPH_RAG_TOOL_NAMES has no runtime consumer -- the parity test imports it directly from ./tools -- so remove it from the public index.ts barrel re-export. Update the constant's doc comment to name the registration<->const<->prompt coupling now enforced by agent-prompt.test.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Source the symbol-citation assertion from the UI parser's own NODE_REF_REGEX instead of a hardcoded 4-label subset, so the test tracks the parser's allowlist rather than forking it. Also drop a redundant array spread and an unnecessary readonly-tuple cast surfaced by the simplify pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code review noted the registry-absence + bare-presence pair would pass if a future prompt edit affirmatively instructed calling highlight_in_graph (string present, still not registered). Add an assertion that the prompt never says use/call/invoke highlight_in_graph -- restoring the protective intent of the replaced negation check without its brittleness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
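
The affirmative-instruction guard described in the commit above could look roughly like the following. This is a sketch only: the pattern and the `instructsHighlightCall` name are assumptions, not the assertion that landed in agent-prompt.test.ts.

```typescript
// Hypothetical guard; the real test's pattern is not shown in this thread.
// Matches "use", "call", or "invoke" directly followed by the tool name,
// optionally with "the" and/or an opening back-tick in between.
// Known limitation of this sketch: a negated imperative such as
// "never call highlight_in_graph" would also match.
const AFFIRMATIVE_HIGHLIGHT_CALL =
  /\b(?:use|call|invoke)\s+(?:the\s+)?`?highlight_in_graph\b/i;

function instructsHighlightCall(prompt: string): boolean {
  return AFFIRMATIVE_HIGHLIGHT_CALL.test(prompt);
}

// A plain statement that the tool does not exist is allowed.
console.log(
  instructsHighlightCall("There is NO highlight_in_graph tool; highlighting is citation-driven."),
);
// An instruction to call the tool is flagged.
console.log(instructsHighlightCall("Call highlight_in_graph to focus the node."));
```

Keying on the verb next to the name, rather than on the name appearing at all, lets the prompt keep its explicit "no such tool" sentence while still catching a later edit that tells the model to call it.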
…system-prompt

Fold the prompt<->tools parity test hardening into PR #1984: const<->registration parity gate, word-boundary forbidden-name guard, NODE_REF_REGEX-derived symbol-ref assertion, and a guard against affirmative highlight_in_graph call instructions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo merged commit 78ad6bc into main Jun 3, 2026
31 checks passed
@magyargergo magyargergo deleted the fix/web-agent-system-prompt branch June 3, 2026 07:00