docs: agent development framework, GitHub templates, eval refactor #479

Merged
magyargergo merged 18 commits into abhigyanpatwari:main from jreakin:docs/agent-framework-templates-eval
Mar 25, 2026

@jreakin jreakin commented Mar 23, 2026

Contributor

Summary

Documentation, templates, and eval tooling. Zero runtime impact. Split from #303 (PR 6 of 6). Depends on #486.

Why this matters

AI agents (Claude Code, Cursor, GitHub Copilot) increasingly contribute to GitNexus alongside human developers. Without structured guidance, agents make well-intentioned but unsafe changes: renaming symbols with find-and-replace instead of the graph-aware rename tool, skipping impact analysis before edits, committing without running detect_changes, or accidentally deleting embeddings by running analyze without --embeddings.

This PR introduces a layered documentation framework that gives both human and AI contributors the same guardrails:

  1. What to do before editing — run impact analysis, check blast radius, understand risk levels
  2. What never to do — no secrets, no find-and-replace renames, no commits without change detection
  3. How to recover — copy-paste runbook commands for every common failure mode
  4. How to contribute — PR template enforces test verification, conventional commits, area-touched checkboxes

The framework is self-referential: the project is indexed by GitNexus itself, so agents reading AGENTS.md are instructed to use GitNexus MCP tools to safely navigate the codebase they're modifying. This makes AI-assisted contributions reproducible and auditable.


Agent Framework

These documents form a layered system — policy on top, operations in the middle, architecture at the base:

Policy layer — rules and safety gates:

  • AGENTS.md — Canonical agent instructions. Mandatory impact analysis before edits (gitnexus_impact), change detection before commits (gitnexus_detect_changes), scope boundaries (what agents can read/write/execute vs. what's off-limits). Includes GitNexus MCP tool quick reference and impact risk levels (d=1 WILL BREAK → d=3 MAY NEED TESTING). Recurring failure patterns ("Signs") with fixes.
  • CLAUDE.md — Claude Code-specific deltas. PreToolUse hooks for hard gates (e.g., tests before git_commit). Execution sequence discipline for long sessions. Defers all MCP guidance to AGENTS.md as single source of truth.
  • GUARDRAILS.md — Safety boundaries for humans and AI. Five non-negotiables: no secrets, use rename tool, impact analysis, detect_changes, preserve embeddings. Escalation scenarios (HIGH/CRITICAL risk, CI/release changes, data loss uncertainty). Publishing and supply chain rules.

Operations layer — what to run and when:

  • TESTING.md — Test structure, commands, and targets for both packages (gitnexus CLI: vitest; gitnexus-web: vitest + Playwright E2E). Categories: unit, integration, eval-style golden sets, E2E. Known issues (LadybugDB locking in containers).
  • RUNBOOK.md — Copy-paste operations: index a repo, generate embeddings, troubleshoot MCP (no repos → run analyze; wrong repo → use list_repos + repo param), clean slate, CI failure repro, memory/locking issues.

Architecture layer — system context:

  • ARCHITECTURE.md — Monorepo layout (gitnexus CLI/MCP, gitnexus-web React/Vite, plugin packages, eval harness). End-to-end data flow: Tree-sitter parsing → LadybugDB persistence → MCP tools/resources. Where-to-change-what map for common modifications.
  • llms.txt — Minimal pointer document listing key docs and directories for LLM context seeding.

Editor Integration

  • .cursor/index.mdc — Always-on Cursor Cloud rules pointing to AGENTS.md for canonical instructions.
  • .cursor/rules/100-monorepo.mdc — Glob-scoped build/test quick reference per package. Known issues (native build tools, LadybugDB locking).
  • .cursorrules — Legacy deprecation stub pointing to .cursor/index.mdc.

GitHub Templates

  • PR template — Structured sections: summary, motivation, areas touched (checkboxes for CLI, web, CI, docs, eval), scope/constraints, implementation notes, testing/verification (with test command checkboxes), risk/rollout, and checklist (PR body, AGENTS.md changelog, no secrets, validation).
  • Bug report form — Area dropdown (CLI/core, web UI, CI, docs, other), structured fields for summary, context, expected/actual behavior, reproduction steps, environment, logs/screenshots.
  • Feature request form — Area dropdown, problem/opportunity, proposed solution, alternatives, acceptance criteria, contributor willingness checkbox.
  • config.yml — Blank issues enabled for edge cases outside the templates.

Eval Harness

  • tool_registry.py — Centralized tool script specs and metric keys (was scattered across multiple files).
  • constants.py — Shared constants for eval/bridge timeouts and defaults.
  • utils/errors.py — Debug-gated exception sanitization using traceback.format_exception (not format_exc) for reliable traceback capture. log_safe_exception helper for consistent error logging without stack trace leaks.
  • bridge/mcp_bridge.py — Centralized MCP timeout constants, improved subprocess stream closing, refactored response reading.
  • analysis/analyze_results.py — Longest-suffix-first run-ID parsing, flexible mode ordering, shared metric keys.
  • Property-based tests via Hypothesis: run-ID parsing, tool script rendering, error sanitization edge cases.
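
To illustrate why the run-ID parser tries the longest suffix first, here is a minimal sketch. The `<model>_<mode>` layout, the mode names, and the `parse_run_id` signature are assumptions made for the example, not the actual code in analysis/analyze_results.py.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical mode names, chosen so that one is a suffix of another.
KNOWN_MODES = ("augment", "native", "native_augment")


def parse_run_id(
    run_id: str, modes: Sequence[str] = KNOWN_MODES
) -> Tuple[str, Optional[str]]:
    """Split ``<model>_<mode>`` into (model, mode), preferring the longest mode."""
    # Try longer suffixes first so "native_augment" wins over "augment".
    for mode in sorted(modes, key=len, reverse=True):
        suffix = "_" + mode
        if run_id.endswith(suffix) and len(run_id) > len(suffix):
            return run_id[: -len(suffix)], mode
    # No known suffix: treat the whole ID as the model name.
    return run_id, None


print(parse_run_id("claude-sonnet_native_augment"))
print(parse_run_id("claude-sonnet_native"))
print(parse_run_id("unknown-run"))
```

Checking suffixes in arbitrary order could match `augment` first and leave `claude-sonnet_native` as the model name; sorting by length removes that ambiguity.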

Test plan

  • All docs render correctly on GitHub
  • Eval tests pass locally
  • PR template renders with correct checkboxes
  • Issue forms show area dropdowns
  • .cursor/index.mdc loads in Cursor Cloud

Part 6 of 6 — split from #303. Stack: PR1 (#474 ✅) → PR2 (#475 ✅) → PR3 (#476 ✅) → PR4 (#477 ✅) → PR5 (#486) → PR6

Copilot AI review requested due to automatic review settings March 23, 2026 08:09
@vercel

vercel Bot commented Mar 23, 2026


@jreakin is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

Copilot AI left a comment


Pull request overview

This PR expands the repo’s contributor/agent documentation and GitHub templates, adds/updates test infrastructure (Vitest + Playwright) for gitnexus-web, and refactors parts of the eval harness + web-side ingestion/LadybugDB tooling.

Changes:

  • Added/updated core “agent framework” + runbook/testing/architecture docs and GitHub PR/issue templates.
  • Introduced gitnexus-web test harness (Vitest config, unit tests, Playwright E2E) and centralized UI constants / icon imports.
  • Refactored web worker + LadybugDB adapter behaviors (pipeline finalization, server-graph loading, query guards, settings storage migration) and eval harness utilities (tool registry + error sanitization + Hypothesis tests).

Reviewed changes

Copilot reviewed 83 out of 86 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `llms.txt` | Adds minimal pointer doc listing key repo docs and directories. |
| `gitnexus/vitest.config.ts` | Removes coverage auto-update setting; minor formatting. |
| `gitnexus/package.json` | Updates prepare script to set git hooks path after build. |
| `gitnexus-web/vitest.config.ts` | Adds Vitest configuration for web package (jsdom, coverage, aliases). |
| `gitnexus-web/test/unit/utils.test.ts` | Unit tests for generateId. |
| `gitnexus-web/test/unit/settings-service.test.ts` | Unit tests for sessionStorage-based settings service + provider config logic. |
| `gitnexus-web/test/unit/server-connection.test.ts` | Unit tests for server URL normalization + file content extraction. |
| `gitnexus-web/test/unit/security-guards.test.ts` | Tests for label/rel allowlists, ID regex, and read-only write detection regex. |
| `gitnexus-web/test/unit/path-resolution.test.ts` | Tests for new path resolution helpers. |
| `gitnexus-web/test/unit/mermaid-generator.test.ts` | Tests for mermaid generator output. |
| `gitnexus-web/test/unit/graph.test.ts` | Tests for graph add/dedup behavior. |
| `gitnexus-web/test/unit/constants.test.ts` | Tests ensuring constants consistency (colors/sizes/edges). |
| `gitnexus-web/test/setup.ts` | Vitest setup resetting storage between tests. |
| `gitnexus-web/test/fixtures/graph.ts` | Shared graph factories for tests. |
| `gitnexus-web/src/workers/ingestion.worker.ts` | Refactors pipeline finalization; renames/implements server graph loading; clamps negative scores; improves enrichment updates. |
| `gitnexus-web/src/types/lbug-wasm.d.ts` | Updates LadybugDB WASM type declarations. |
| `gitnexus-web/src/lib/path-resolution.ts` | Adds reusable path normalization + “best match” resolver. |
| `gitnexus-web/src/lib/lucide-icons.tsx` | Adds deep icon imports to reduce bundle size. |
| `gitnexus-web/src/lib/grounding-patterns.ts` | Adds shared regexes for file/node grounding references. |
| `gitnexus-web/src/lib/constants.ts` | Adds Section styling in node color/size maps. |
| `gitnexus-web/src/hooks/useBackend.ts` | Centralizes backend URL defaults and debounce constant; adds dev-only fetch warning. |
| `gitnexus-web/src/hooks/app-state/graph.tsx` | Adds a dedicated GraphState context/provider. |
| `gitnexus-web/src/core/llm/types.ts` | Centralizes provider default base URLs via UI constants. |
| `gitnexus-web/src/core/llm/tools.ts` | Adds label/rel-type allowlist checks before interpolating into Cypher; sanitizes relationTypes input. |
| `gitnexus-web/src/core/llm/settings-service.ts` | Migrates settings to sessionStorage with localStorage migration; refactors provider config selection; centralizes OpenRouter model fetch URL. |
| `gitnexus-web/src/core/llm/agent.ts` | Centralizes base URLs; improves streaming tool-call parsing; replaces boolean tool tracking with a counter. |
| `gitnexus-web/src/core/lbug/schema.ts` | Extends REL_TYPES list with additional relation types. |
| `gitnexus-web/src/core/lbug/lbug-adapter.ts` | Adds init promise; enforces readOnly by default; switches to getAllRows; adds prepared/batched relation insert logic; adds test-mode skip for bulk load. |
| `gitnexus-web/src/core/lbug/csv-generator.ts` | Adds multi-language CSV generation; fixes CSV escaping for community keywords. |
| `gitnexus-web/src/core/graph/types.ts` | Adds Section to NodeLabel. |
| `gitnexus-web/src/config/ui-constants.ts` | Introduces centralized UI/provider constants (timeouts/default URLs). |
| `gitnexus-web/src/components/StatusBar.tsx` | Uses memoization for primary language; switches icon import; adds test id for “Ready”. |
| `gitnexus-web/src/components/SettingsPanel.tsx` | Switches to deep icon imports; extracts provider cards; sessionStorage messaging; fixes timeout cleanup. |
| `gitnexus-web/src/components/ProcessesPanel.tsx` | Adds ID allowlist filter/guard for Cypher queries involving process/step IDs. |
| `gitnexus-web/src/components/ProcessFlowModal.tsx` | Switches icon imports; sanitizes mermaid SVG; improves keyboard handler guard. |
| `gitnexus-web/src/components/MarkdownRenderer.tsx` | Switches icon imports; clears copy-state timeout on unmount. |
| `gitnexus-web/src/components/GraphCanvas.tsx` | Uses O(1) node lookups; clears more highlight state when toggling off; switches icon imports. |
| `gitnexus-web/src/components/FileTreePanel.tsx` | Switches icon imports; makes search recursive. |
| `gitnexus-web/src/components/DropZone.tsx` | Switches icon imports; wraps localStorage get/set in try/catch. |
| `gitnexus-web/src/components/CodeReferencesPanel.tsx` | Switches icon imports; cancels rAF chain on cleanup. |
| `gitnexus-web/src/App.tsx` | Refactors embeddings startup; changes server-connect flow to load worker DB then init agent; uses centralized error delay. |
| `gitnexus-web/playwright.config.ts` | Adds Playwright config with optional insecure mode and software WebGL args. |
| `gitnexus-web/package.json` | Adds engines/scripts, Playwright + testing deps, adds LadybugDB + DOMPurify. |
| `gitnexus-web/e2e/server-connect.spec.ts` | Adds E2E tests for server connect, AI panel, processes, highlight clearing. |
| `gitnexus-web/e2e/manual-record.spec.ts` | Adds manual debug recording spec (skipped by default). |
| `gitnexus-web/e2e/debug-issues.spec.ts` | Adds targeted debug harness spec (skipped unless enabled). |
| `eval/utils/errors.py` | Adds debug-gated exception sanitization + log helper. |
| `eval/utils/__init__.py` | Marks eval utils as a package. |
| `eval/tool_registry.py` | Centralizes tool script specs and metric keys. |
| `eval/tests/test_tool_scripts.py` | Tests tool script rendering behavior. |
| `eval/tests/test_property_based.py` | Adds Hypothesis property-based tests for run-id parsing/tool script rendering/error sanitization. |
| `eval/tests/test_parse_run_id.py` | Adds targeted tests for run-id parsing. |
| `eval/tests/test_errors.py` | Adds tests for sanitized exception output with/without debug. |
| `eval/tests/conftest.py` | Ensures eval root is importable in tests. |
| `eval/tests/__init__.py` | Marks eval tests as a package. |
| `eval/run_eval.py` | Refactors build/agent/env creation; replaces raw traceback logging with sanitized error recording. |
| `eval/pyproject.toml` | Adds Hypothesis/coverage deps; includes additional files in wheel build. |
| `eval/environments/gitnexus_docker.py` | Refactors tool script installation to use registry; centralizes constants; uses safe exception logging. |
| `eval/constants.py` | Introduces shared constants for eval/bridge timeouts and defaults. |
| `eval/bridge/mcp_bridge.py` | Centralizes timeouts; improves subprocess stream closing; refactors response reading helpers; safe exception logging. |
| `eval/analysis/analyze_results.py` | Improves run-id parsing (longest known suffix); uses shared tool metric keys; adds flexible mode ordering. |
| `eval/agents/gitnexus_agent.py` | Centralizes tool binaries/metric keys; uses default augment timeout constant. |
| `eval/README.md` | Documents debug logging env var behavior. |
| `TESTING.md` | Adds consolidated testing documentation for both packages. |
| `RUNBOOK.md` | Adds operational runbook for common workflows and troubleshooting. |
| `README.md` | Adds “Development” section linking to new docs. |
| `GUARDRAILS.md` | Adds contributor/agent guardrails and escalation guidance. |
| `CONTRIBUTING.md` | Adds contribution guidelines and PR expectations. |
| `CLAUDE.md` | Adds structured Claude Code-specific guidance and changelog. |
| `ARCHITECTURE.md` | Adds architecture overview and where-to-change-what map. |
| `AGENTS.md` | Adds structured agent instructions and Cursor-specific guidance. |
| `.gitignore` | Adds ignores for handoff docs, Playwright artifacts, and Python test artifacts. |
| `.github/workflows/ci.yml` | Adds E2E workflow orchestration and updates CI gate/meta artifact. |
| `.github/workflows/ci-quality.yml` | Adds gitnexus-web typecheck job. |
| `.github/workflows/ci-e2e.yml` | Adds reusable web E2E workflow triggered only on gitnexus-web/** changes. |
| `.github/PULL_REQUEST_TEMPLATE.md` | Adds PR template with area/touch + test checklist. |
| `.github/ISSUE_TEMPLATE/feature_request.yml` | Adds feature request issue form with area dropdown. |
| `.github/ISSUE_TEMPLATE/config.yml` | Enables blank issues. |
| `.github/ISSUE_TEMPLATE/bug_report.yml` | Adds bug report issue form with structured fields. |
| `.githooks/pre-commit` | Adds pre-commit hook to run relevant typecheck/tests based on staged paths. |
| `.cursorrules` | Deprecates legacy Cursor rules file in favor of .cursor/index.mdc. |
| `.cursor/rules/100-monorepo.mdc` | Adds glob-scoped monorepo build/test quick refs for Cursor. |
| `.cursor/index.mdc` | Adds always-on Cursor rules pointing to canonical docs. |

Comment thread gitnexus-web/src/components/ProcessesPanel.tsx Outdated
Comment thread gitnexus-web/src/core/lbug/lbug-adapter.ts
Comment thread gitnexus/package.json Outdated
Comment thread eval/utils/errors.py
Comment thread gitnexus-web/src/App.tsx
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch 5 times, most recently from a2ff6a8 to bce7652 Compare March 23, 2026 08:40
@jreakin

jreakin commented Mar 23, 2026

Contributor Author

Copilot review addressed:

| Comment | Action |
| --- | --- |
| `sanitize_exception()` uses `format_exc()` | Fixed (53e5ec0): switched to `traceback.format_exception(type(exc), exc, exc.__traceback__)`, which reliably captures the passed exception's traceback even outside an active except block. |
| PR description says "zero runtime impact" | The cumulative diff includes runtime changes from PRs 1-5 in the stack. This PR (docs/eval only) has zero runtime impact on its own, so the description is accurate for the incremental diff. |
| `isSafeId` rejects `/` | Already fixed in PR2 #475. |
| `loadGraphToLbug` doesn't close db/conn | Already fixed in PR3 #476. |
| `prepare` script Windows compat | Already fixed in PR1 #474. |
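
The difference between the two traceback calls can be seen in a small sketch. The `debug` parameter and the body of `sanitize_exception` below are assumptions for illustration; only the `traceback.format_exception(type(exc), exc, exc.__traceback__)` call is taken from the fix described above.

```python
import traceback


def sanitize_exception(exc: BaseException, debug: bool = False) -> str:
    """Return a log-safe description of exc; full traceback only when debug is on."""
    if not debug:
        # Type and message only: no file paths or stack frames reach the logs.
        return f"{type(exc).__name__}: {exc}"
    # format_exception works from the exception object itself, so it does not
    # depend on being called inside an active except block.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def capture() -> BaseException:
    """Raise and catch an error, then hand the exception object to the caller."""
    try:
        raise ValueError("bad run id")
    except ValueError as err:
        return err


saved = capture()  # the except block has already finished here
print(traceback.format_exc().strip())        # knows nothing about `saved`
print(sanitize_exception(saved))             # short, sanitized form
print(sanitize_exception(saved, debug=True)) # full traceback of `saved`
```

Because `format_exc()` reads whatever exception is currently being handled, it cannot recover the traceback of an exception object that was stored and logged later; passing the object explicitly avoids that dependency.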

@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch 15 times, most recently from e27063e to 5aa3866 Compare March 23, 2026 15:16
@magyargergo

Collaborator

Can you please rebase your changes on top of main? I merged your changes from #477 into origin/main

jreakin and others added 2 commits March 23, 2026 13:38
CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts

Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script
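
As a rough illustration of selecting checks from staged paths, the sketch below maps staged files to the package that owns them. It is written in Python for readability and is an assumption about the approach, not the contents of .githooks/pre-commit; the function names are hypothetical.

```python
import subprocess
from typing import Iterable, List

# Package roots taken from the monorepo layout described in this PR.
PACKAGES = ("gitnexus", "gitnexus-web")


def staged_paths() -> List[str]:
    """List files staged for commit using standard git plumbing."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def packages_to_check(paths: Iterable[str]) -> List[str]:
    """Return the packages that own at least one staged path."""
    # Compare the first path segment exactly, so "gitnexus-web/..." is not
    # mistaken for the "gitnexus" package.
    touched = {path.split("/", 1)[0] for path in paths}
    return [pkg for pkg in PACKAGES if pkg in touched]


print(packages_to_check(["gitnexus-web/src/App.tsx", "README.md"]))
```

A hook built this way would call `packages_to_check(staged_paths())` and run typecheck and unit tests only for the returned packages.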

Test infrastructure:
- Vitest + React Testing Library: 58 unit tests
  (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects devDependency additions (vitest, playwright, wait-on,
@testing-library, etc.) from package.json changes in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from cd330d3 to 5291552 Compare March 23, 2026 18:39
E2E tests referenced these data-testid attributes but they didn't
exist in ProcessItem. All 6 E2E testids now have matching source
elements: status-ready, process-list-loaded, process-row,
process-view-button, process-highlight-button, server-url-input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from 416f990 to e27199a Compare March 23, 2026 19:26
…solving

networkidle waits for zero network activity for 500ms, but Vite's HMR
WebSocket stays open permanently, causing page.goto to timeout at 60s
on all tests after the first. The explicit toBeVisible waits on UI
elements are sufficient and deterministic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from e27199a to ba4d406 Compare March 23, 2026 19:36
…s pass locally

Root cause: test 1 clicked the Server button before React hydrated,
so the tab content never rendered and the input wasn't found.

Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness

Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from ba4d406 to 48d7c21 Compare March 23, 2026 19:48
gitnexus analyze can crash with "double free or corruption" (known
issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is
usually written successfully before the crash. The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)
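
The three steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the workflow's actual shell; `index_step_ok` is a hypothetical helper name.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def index_step_ok(command: Sequence[str], repo_dir: Path) -> bool:
    """Run the indexing command and judge success by the index, not the exit code."""
    result = subprocess.run(command, cwd=repo_dir)
    if result.returncode != 0:
        # Step 1: a non-zero exit is only a warning, since the crash can
        # happen during shutdown after the index has been written.
        print(f"warning: indexer exited with {result.returncode}", file=sys.stderr)
    # Steps 2 and 3: the step passes only if the .gitnexus index exists.
    return (repo_dir / ".gitnexus").exists()
```

In CI this would be called with something like `index_step_ok(["npx", "gitnexus", "analyze"], Path("."))`, failing the job only when it returns `False`.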

All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from 48d7c21 to 52167a6 Compare March 23, 2026 20:31
jreakin and others added 4 commits March 23, 2026 15:57
The previous echo string had special characters that broke bash
quoting in GitHub Actions. Simplified to: analyze || true, then
check if .gitnexus exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer

Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc

GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms

Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eption

format_exc() returns the currently handled exception traceback, which
may be unrelated if called outside an active except block. Using
format_exception(type(exc), exc, exc.__traceback__) reliably captures
the passed exception's traceback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook
  checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook
  section (husky), update CI integration to list actual workflow files
  (ci-quality, ci-tests, ci-e2e)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jreakin jreakin force-pushed the docs/agent-framework-templates-eval branch from 52167a6 to 53d2004 Compare March 23, 2026 20:58
@magyargergo

Collaborator

@claude act as a senior context engineer and LLM expert. Ultra-think while reviewing the AI-related md files, and research whether this is the right direction for setting up Cursor and whether it is optimal for token usage.

@github-actions

github-actions Bot commented Mar 23, 2026

Contributor

Claude finished @magyargergo's task in 2m 21s


AI Context Engineering Review — Cursor Setup & Token Optimization

Tasks:

  • Understand the request
  • Read AGENTS.md, CLAUDE.md, GUARDRAILS.md
  • Read .cursor/index.mdc, .cursor/rules/100-monorepo.mdc, .cursorrules
  • Read llms.txt, ARCHITECTURE.md, CONTRIBUTING.md
  • Analyze token efficiency of current setup
  • Write detailed review

Direction Assessment: Mostly Right, But Critical Token Waste Found

The layered architecture (policy → operations → architecture) is the correct mental model for multi-agent contexts. The decisions to use .cursor/index.mdc as a pointer, scope rules with globs, and deprecate .cursorrules all follow current best practices. However, two critical duplication bugs waste roughly 700–1,400 tokens per session, and several structural improvements are worth making.


🔴 Critical: gitnexus:start Block Is Loaded Twice Per Claude Code Session

CLAUDE.md lines 60–161 contain the full `gitnexus:start` block, the same ~100-line GitNexus tool reference that also lives in AGENTS.md. CLAUDE.md is always loaded by Claude Code, so every session pays for this block once, and pays for it again whenever AGENTS.md is read. Line 58 of CLAUDE.md even says:

"For all GitNexus MCP guidance…, use AGENTS.md as the single source of truth."

…and then immediately embeds the full block, contradicting itself.

Token cost: ~700 tokens per Claude Code session, every session, even for doc-only or eval work.

Fix: Replace lines 60–161 in CLAUDE.md with a single pointer:

GitNexus MCP rules are in the `gitnexus:start` block in **AGENTS.md**; load that section when working with MCP tools or the graph index.

🔴 Critical: gitnexus:start Block Is Duplicated Inside AGENTS.md Itself

In AGENTS.md, the Reference Documentation bullet at line 51 embeds the entire gitnexus:start block inline as the value of a list item (lines 52–151), and then the same block appears again as a standalone section at lines 162–262. This means any agent that reads AGENTS.md linearly processes the block twice.

This is likely a markdown formatting artifact — the reference text tried to say "see the block below" but instead pasted the block as literal text. The reference bullet should read:

- **GitNexus:** skills under `.claude/skills/gitnexus/`; machine-oriented rules in the `gitnexus:start` / `gitnexus:end` block below.

🟡 High: .cursor/index.mdc Is Too Minimal for "Always On" Use

Current content (~13 lines) just says "go read AGENTS.md." With alwaysApply: true, Cursor injects this file's content into every request context — which means the 3 most critical rules (impact before edit, detect_changes before commit, never find-and-replace rename) are not in that always-on context. They only get applied if the model decides to fetch AGENTS.md.

For hard safety gates, the critical rules need to live in the always-on file itself. Best practice is a two-tier approach:

  • .cursor/index.mdc (alwaysApply: true) — 15–20 lines max: the 3–5 non-negotiables + pointer to AGENTS.md for details
  • AGENTS.md — full reference, read on demand for complex tasks

Recommended addition to .cursor/index.mdc:

## Non-negotiables (always apply)
- NEVER edit a function/class/method without running `gitnexus_impact` first.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename`.  
- NEVER commit without running `gitnexus_detect_changes()`.
- NEVER ignore HIGH/CRITICAL risk warnings.
- NEVER run `npx gitnexus analyze` without `--embeddings` if `.gitnexus/meta.json` shows stored embeddings.

Full rules → **AGENTS.md** (`gitnexus:start` block, Cursor Cloud section).

This adds ~200 tokens to every Cursor request but guarantees the safety gates are applied — far better than the current approach where they might be skipped.


🟡 Medium: Scope Table Repeated 3× Across Files

The read/write/execute/off-limits table appears in AGENTS.md, CLAUDE.md, and GUARDRAILS.md with essentially the same content. This costs ~300–400 tokens per file that gets loaded and adds maintenance burden (update one, forget the others).

Recommended: Keep it only in AGENTS.md as the single source of truth. CLAUDE.md and GUARDRAILS.md should reference it via a link, not re-state it.


🟡 Medium: No Cursor Rule for eval/

.cursor/rules/100-monorepo.mdc has globs gitnexus/** and gitnexus-web/** but nothing for eval/**. Python/eval contributors working in that directory get no guidance about the uv run stack, Hypothesis tests, or the bridge timeout constants. A 10-line 200-eval.mdc with Python-specific commands would close this gap.


🟢 Good: Structural Patterns That Are Correct

| Pattern | Why it's right |
| --- | --- |
| `.cursor/index.mdc` as pointer, not content dump | Follows Cursor's recommended rule architecture; avoids putting stale rules in an always-on file |
| Glob-scoped rules in `.cursor/rules/*.mdc` | Correct: context is injected only when working on matching files, not for every request |
| Deprecating `.cursorrules` | Right call; `.cursorrules` is loaded for legacy compatibility only |
| GUARDRAILS.md "Signs" pattern | Excellent LLM prompt engineering: trigger/instruction/reason triples are highly effective for consistent agent behavior |
| llms.txt for context seeding | Follows the llms.txt convention correctly; good for RAG-style tooling |
| Self-referential indexing (the repo indexed by GitNexus itself) | Genuinely clever and architecturally sound; closes the loop for AI-assisted contributions |

🔵 Minor: llms.txt Could Be More Useful

Current llms.txt lists files without explaining priority or usage context. LLM-aware tools that consume llms.txt benefit from brief annotations:

# Core docs (read first for any contribution)
AGENTS.md — canonical agent instructions, GitNexus MCP tool reference
GUARDRAILS.md — non-negotiables and escalation triggers

# Operational (read when running commands or debugging)
RUNBOOK.md — copy-paste commands for common workflows
TESTING.md — test commands for both packages

Token Budget Summary

| Item | Est. tokens | Status |
| --- | --- | --- |
| `gitnexus:start` block in CLAUDE.md | ~700/session | ❌ Remove |
| `gitnexus:start` block doubled in AGENTS.md | ~700 when read | ❌ Fix |
| Scope table ×3 | ~400/file loaded | 🟡 Consolidate |
| `.cursor/index.mdc` always-on | ~113/Cursor req | 🟡 Expand to ~250 with critical rules |
| `100-monorepo.mdc` on-match | ~125 | ✅ Appropriate |

The two critical fixes (remove block from CLAUDE.md, fix AGENTS.md double-embed) would save ~700–1,400 tokens per session with zero loss of information, since AGENTS.md remains the authoritative source.
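
The 700–1,400 range follows directly from the estimates in the token budget summary; a short sketch of the arithmetic, using only those approximate figures (the constant and function names are illustrative):

```python
# Estimates taken from the token budget summary (all approximate).
CLAUDE_MD_BLOCK = 700      # duplicate block loaded in every Claude Code session
AGENTS_MD_DUPLICATE = 700  # second copy, paid only when AGENTS.md is read
INDEX_MDC_BEFORE = 113     # current always-on Cursor rule
INDEX_MDC_AFTER = 250      # after adding the non-negotiables


def session_savings(agents_md_read: bool) -> int:
    """Tokens saved in one Claude Code session by removing both duplicates."""
    return CLAUDE_MD_BLOCK + (AGENTS_MD_DUPLICATE if agents_md_read else 0)


print(session_savings(agents_md_read=False))  # lower bound of the range
print(session_savings(agents_md_read=True))   # upper bound of the range
print(INDEX_MDC_AFTER - INDEX_MDC_BEFORE)     # extra cost per Cursor request
```

The lower bound applies to sessions that never open AGENTS.md; the upper bound applies when the agent reads it in full.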


Summary

The direction is correct — layered docs, glob-scoped Cursor rules, pointer-based always-on context, and self-referential indexing are all sound patterns. The issues are execution-level: the gitnexus:start block was accidentally duplicated into CLAUDE.md (and doubled inside AGENTS.md), and .cursor/index.mdc is too minimal to guarantee the critical safety rules are applied in every Cursor session. Fix those two things and this framework is solid.

@magyargergo

Collaborator

Cc @abhigyanpatwari

Please have a look at this @jreakin 🙏

jreakin and others added 2 commits March 23, 2026 20:32
…atwari#486

- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration),
  add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@magyargergo

Collaborator

Can you please rebase this PR on top of origin/main?

# Conflicts:
#	.github/workflows/ci-e2e.yml
#	.github/workflows/ci-report.yml
#	.github/workflows/ci-tests.yml
#	.github/workflows/ci.yml
#	gitnexus-web/e2e/debug-issues.spec.ts
#	gitnexus-web/e2e/server-connect.spec.ts
@jreakin jreakin marked this pull request as draft March 24, 2026 06:13
… Cursor rules

- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24

Saves ~1,400 tokens/session with zero information loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 24, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 4134 | 4132 | 0 | 2 | 168s |

✅ All 4132 tests passed

2 test(s) skipped:
  • withTestLbugDB(lock-retry) > withLbugDb retry behaviour > throws after max retry attempts
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 70.71% | 10216/14447 | 70.71% | = 0.0 | 🟢 ██████████████░░░░░░ |
| Branches | 61.26% | 6932/11315 | 61.25% | 📈 +0.0 | 🟢 ████████████░░░░░░░░ |
| Functions | 73.76% | 897/1216 | 73.76% | = 0.0 | 🟢 ██████████████░░░░░░ |
| Lines | 73.09% | 9122/12480 | 73.09% | = 0.0 | 🟢 ██████████████░░░░░░ |

📋 View full run · Generated by CI

@jreakin jreakin marked this pull request as ready for review March 24, 2026 06:22
@jreakin

jreakin commented Mar 24, 2026

Contributor Author

Addressed all context engineering review findings

Pushed 7020293 addressing every item from the AI context engineering review:

Critical (token waste)

| Finding | Fix | Tokens saved |
|---|---|---|
| gitnexus:start block duplicated in CLAUDE.md | Replaced ~100-line block with 1-line pointer to AGENTS.md | ~700/session |
| gitnexus:start block doubled inside AGENTS.md (inlined in Reference Docs bullet + standalone) | Fixed reference bullet to say "see block below" | ~700 when read |

High

| Finding | Fix |
|---|---|
| .cursor/index.mdc too minimal for always-on use | Added 5 non-negotiable safety rules (impact before edit, no find-and-replace rename, detect_changes before commit, no ignoring HIGH/CRITICAL, embeddings guard) |

Medium

| Finding | Fix |
|---|---|
| Scope table repeated in CLAUDE.md + AGENTS.md | Removed from CLAUDE.md, replaced with pointer to AGENTS.md |
| No Cursor rule for eval/ directory | Added .cursor/rules/200-eval.mdc (glob-scoped, Python/Hypothesis/uv commands) |

Minor

| Finding | Fix |
|---|---|
| llms.txt lacks priority annotations | Reorganized into Core → Operational → Architecture tiers with descriptions |

Also in this push

Test results

  • gitnexus-web: 198/198 passed
  • gitnexus CLI: 3832/3840 passed (8 pre-existing failures in express-routes and python-mcp-tools resolver integration tests — unrelated to this PR)

Net effect: -230 lines, +49 lines, saving ~1,400 tokens/session with zero information loss.


🤖 Generated with Claude Code

@magyargergo

Collaborator

@jreakin Can you please resolve these conflicts? I'll merge your changes after this is green! 🚀

@magyargergo magyargergo merged commit c68d797 into abhigyanpatwari:main Mar 25, 2026
9 of 10 checks passed
@jreakin jreakin deleted the docs/agent-framework-templates-eval branch March 26, 2026 01:11
motolese pushed a commit to motolese/datamoto-gitnexus that referenced this pull request Apr 23, 2026
…bhigyanpatwari#479)

* ci: E2E workflow, web typecheck job, pre-commit hook, test suite

CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts

Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script

Test infrastructure:
- Vitest + React Testing Library: 58 unit tests
  (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update gitnexus-web package-lock.json

Reflects devDependency additions (vitest, playwright, wait-on,
@testing-library, etc.) from package.json changes in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add missing process-list-loaded testid, increase CI timeouts

- Add data-testid="process-list-loaded" to ProcessesPanel (E2E tests
  were waiting for an element that didn't exist)
- Increase server connect timeouts from 5s to 10s for slower CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): run gitnexus-web unit tests in CI, remove unused variable

- Add gitnexus-web npm ci + vitest run to ci-tests.yml so web unit
  tests are gated by the CI status check (were only running locally)
- Remove unused IS_PLAYWRIGHT_AUTOMATION variable from E2E spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-row testid, wait for networkidle on page load

- Add data-testid="process-row" to ProcessItem component (E2E tests
  referenced it but it didn't exist in the source)
- Use waitUntil: 'networkidle' on page.goto to ensure Vite dev server
  is fully ready before interacting (fixes first-test timeout in CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-view-button and process-highlight-button testids

E2E tests referenced these data-testid attributes but they didn't
exist in ProcessItem. All 6 E2E testids now have matching source
elements: status-ready, process-list-loaded, process-row,
process-view-button, process-highlight-button, server-url-input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove networkidle — Vite HMR WebSocket prevents it from resolving

networkidle waits for zero network activity for 500ms, but Vite's HMR
WebSocket stays open permanently, causing page.goto to time out at 60s
on all tests after the first. The explicit toBeVisible waits on UI
elements are sufficient and deterministic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): wait for Server button visibility, add CI retry, all 5 tests pass locally

Root cause: test 1 clicked the Server button before React hydrated,
so the tab content never rendered and the input wasn't found.

Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness

Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): tolerate LadybugDB native crash during analyze step

gitnexus analyze can crash with "double free or corruption" (known
issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is
usually written successfully before the crash. The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)

All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
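
The three-step tolerance logic described in the commit above can be sketched in Python. This is only an illustration: the actual workflow step is shell inside GitHub Actions, and the helper name `run_tolerating_crash`, the simulated commands, and exit code 134 are all assumptions made for the demo, not part of the repository.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tolerating_crash(cmd: list[str], index_dir: Path) -> bool:
    """Hypothetical helper: run cmd, tolerate a non-zero exit, and report
    success only if the expected index directory exists afterwards."""
    result = subprocess.run(cmd, check=False)  # step 1: do not fail on exit code
    if result.returncode != 0:
        print(f"warning: command exited with {result.returncode}", file=sys.stderr)
    return index_dir.is_dir()  # steps 2 and 3: the index decides pass or fail


with tempfile.TemporaryDirectory() as tmp:
    # Simulated "crash during shutdown": the index is written, then exit != 0.
    index = Path(tmp) / ".gitnexus"
    crash_after_write = [
        sys.executable, "-c",
        "import os, sys; os.mkdir(sys.argv[1]); sys.exit(134)",
        str(index),
    ]
    wrote_index = run_tolerating_crash(crash_after_write, index)

    # Simulated real failure: the process dies before any index is written.
    missing = Path(tmp) / "missing"
    crash_before_write = [sys.executable, "-c", "import sys; sys.exit(134)"]
    no_index = run_tolerating_crash(crash_before_write, missing)
```

In this sketch the first run counts as a success despite the non-zero exit, while the second is treated as a real failure because no index directory appears.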

* fix(ci): fix shell quoting in analyze step, simplify to || true

The previous echo string had special characters that broke bash
quoting in GitHub Actions. Simplified to: analyze || true, then
check if .gitnexus exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add agent development framework, GitHub templates, eval refactor

Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer

Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc

GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms

Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(eval): use format_exception instead of format_exc in sanitize_exception

format_exc() returns the currently handled exception traceback, which
may be unrelated if called outside an active except block. Using
format_exception(type(exc), exc, exc.__traceback__) reliably captures
the passed exception's traceback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
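
A minimal sketch of the difference this fix relies on, using only the standard `traceback` module. The two function names below are hypothetical stand-ins and do not reproduce the eval harness's actual `sanitize_exception` code.

```python
import traceback


def describe_with_format_exc(exc: BaseException) -> str:
    # Problematic variant: ignores `exc` and formats whatever exception is
    # currently being handled, which is nothing outside an except block.
    return traceback.format_exc()


def describe_with_format_exception(exc: BaseException) -> str:
    # Fixed variant: formats the traceback attached to the passed exception.
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )


def capture() -> BaseException:
    # Raise and catch an exception, then hand it back to the caller so it
    # can be formatted later, outside the except block.
    try:
        raise ValueError("bad input")
    except ValueError as err:
        return err


captured = capture()
outside = describe_with_format_exc(captured)       # no ValueError details
fixed = describe_with_format_exception(captured)   # includes the ValueError
```

Called after the `except` block has finished, the first variant has no active exception to report, so its output says nothing about the `ValueError`. The second variant still produces the full traceback because it reads it from the exception object itself.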

* docs: update CONTRIBUTING.md and TESTING.md for current CI/hook setup

- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook
  checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook
  section (husky), update CI integration to list actual workflow files
  (ci-quality, ci-tests, ci-e2e)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update testing docs to reflect CI/E2E changes from PR abhigyanpatwari#486

- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration),
  add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address context engineering review — deduplicate tokens, expand Cursor rules

- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24

Saves ~1,400 tokens/session with zero information loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>