docs: agent development framework, GitHub templates, eval refactor by jreakin · Pull Request #479 · abhigyanpatwari/GitNexus

jreakin · 2026-03-23T08:09:22Z

Summary

Documentation, templates, and eval tooling. Zero runtime impact. Split from #303 (PR 6 of 6). Depends on #486.

Why this matters

GitNexus is increasingly contributed to by AI agents (Claude Code, Cursor, GitHub Copilot) alongside human developers. Without structured guidance, agents make well-intentioned but unsafe changes — renaming symbols with find-and-replace instead of the graph-aware rename tool, skipping impact analysis before edits, committing without running detect_changes, or accidentally deleting embeddings by running analyze without --embeddings.

This PR introduces a layered documentation framework that gives both human and AI contributors the same guardrails:

What to do before editing — run impact analysis, check blast radius, understand risk levels
What never to do — no secrets, no find-and-replace renames, no commits without change detection
How to recover — copy-paste runbook commands for every common failure mode
How to contribute — PR template enforces test verification, conventional commits, area-touched checkboxes

The framework is self-referential: the project is indexed by GitNexus itself, so agents reading AGENTS.md are instructed to use GitNexus MCP tools to safely navigate the codebase they're modifying. This makes AI-assisted contributions reproducible and auditable.

Agent Framework

These documents form a layered system — policy on top, operations in the middle, architecture at the base:

Policy layer — rules and safety gates:

AGENTS.md — Canonical agent instructions. Mandatory impact analysis before edits (gitnexus_impact), change detection before commits (gitnexus_detect_changes), scope boundaries (what agents can read/write/execute vs. what's off-limits). Includes GitNexus MCP tool quick reference and impact risk levels (d=1 WILL BREAK → d=3 MAY NEED TESTING). Recurring failure patterns ("Signs") with fixes.
CLAUDE.md — Claude Code-specific deltas. PreToolUse hooks for hard gates (e.g., tests before git_commit). Execution sequence discipline for long sessions. Defers all MCP guidance to AGENTS.md as single source of truth.
GUARDRAILS.md — Safety boundaries for humans and AI. Five non-negotiables: no secrets, use rename tool, impact analysis, detect_changes, preserve embeddings. Escalation scenarios (HIGH/CRITICAL risk, CI/release changes, data loss uncertainty). Publishing and supply chain rules.

Operations layer — what to run and when:

TESTING.md — Test structure, commands, and targets for both packages (gitnexus CLI: vitest; gitnexus-web: vitest + Playwright E2E). Categories: unit, integration, eval-style golden sets, E2E. Known issues (LadybugDB locking in containers).
RUNBOOK.md — Copy-paste operations: index a repo, generate embeddings, troubleshoot MCP (no repos → run analyze; wrong repo → use list_repos + repo param), clean slate, CI failure repro, memory/locking issues.

Architecture layer — system context:

ARCHITECTURE.md — Monorepo layout (gitnexus CLI/MCP, gitnexus-web React/Vite, plugin packages, eval harness). End-to-end data flow: Tree-sitter parsing → LadybugDB persistence → MCP tools/resources. Where-to-change-what map for common modifications.
llms.txt — Minimal pointer document listing key docs and directories for LLM context seeding.

Editor Integration

.cursor/index.mdc — Always-on Cursor Cloud rules pointing to AGENTS.md for canonical instructions.
.cursor/rules/100-monorepo.mdc — Glob-scoped build/test quick reference per package. Known issues (native build tools, LadybugDB locking).
.cursorrules — Legacy deprecation stub pointing to .cursor/index.mdc.

GitHub Templates

PR template — Structured sections: summary, motivation, areas touched (checkboxes for CLI, web, CI, docs, eval), scope/constraints, implementation notes, testing/verification (with test command checkboxes), risk/rollout, and checklist (PR body, AGENTS.md changelog, no secrets, validation).
Bug report form — Area dropdown (CLI/core, web UI, CI, docs, other), structured fields for summary, context, expected/actual behavior, reproduction steps, environment, logs/screenshots.
Feature request form — Area dropdown, problem/opportunity, proposed solution, alternatives, acceptance criteria, contributor willingness checkbox.
config.yml — Blank issues enabled for edge cases outside the templates.

Eval Harness

tool_registry.py — Centralized tool script specs and metric keys (was scattered across multiple files).
constants.py — Shared constants for eval/bridge timeouts and defaults.
utils/errors.py — Debug-gated exception sanitization using traceback.format_exception (not format_exc) for reliable traceback capture. log_safe_exception helper for consistent error logging without stack trace leaks.
bridge/mcp_bridge.py — Centralized MCP timeout constants, improved subprocess stream closing, refactored response reading.
analysis/analyze_results.py — Longest-suffix-first run-ID parsing, flexible mode ordering, shared metric keys.
Property-based tests via Hypothesis: run-ID parsing, tool script rendering, error sanitization edge cases.
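
To illustrate why longest-suffix-first matching matters for run-ID parsing, here is a minimal sketch. The run-ID layout (`<model>_<mode>`), the entries in `KNOWN_MODES`, and the `parse_run_id` signature are assumptions made for this example; they are not taken from `analysis/analyze_results.py`.

```python
from typing import Optional, Tuple

# Hypothetical mode suffixes; the real list lives in the eval harness.
KNOWN_MODES = ("native", "augment", "native_augment")


def parse_run_id(run_id: str) -> Tuple[str, Optional[str]]:
    """Split a run ID into (model, mode), trying the longest known suffix first."""
    # Sorting by length (descending) ensures "native_augment" is tried
    # before "augment", which is also a suffix of the same run ID.
    for mode in sorted(KNOWN_MODES, key=len, reverse=True):
        suffix = "_" + mode
        if run_id.endswith(suffix) and len(run_id) > len(suffix):
            return run_id[: -len(suffix)], mode
    # No known mode suffix: treat the whole string as the model name.
    return run_id, None
```

With a first-match scan in declaration order, `model-a_native_augment` would match `augment` and yield the model name `model-a_native`; checking longer suffixes first avoids that ambiguity.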

Test plan

All docs render correctly on GitHub
Eval tests pass locally
PR template renders with correct checkboxes
Issue forms show area dropdowns
.cursor/index.mdc loads in Cursor Cloud

Part 6 of 6 — split from #303. Stack: PR1 (#474 ✅) → PR2 (#475 ✅) → PR3 (#476 ✅) → PR4 (#477 ✅) → PR5 (#486) → PR6

Copilot

Pull request overview

This PR expands the repo’s contributor/agent documentation and GitHub templates, adds/updates test infrastructure (Vitest + Playwright) for gitnexus-web, and refactors parts of the eval harness + web-side ingestion/LadybugDB tooling.

Changes:

Added/updated core “agent framework” + runbook/testing/architecture docs and GitHub PR/issue templates.
Introduced gitnexus-web test harness (Vitest config, unit tests, Playwright E2E) and centralized UI constants / icon imports.
Refactored web worker + LadybugDB adapter behaviors (pipeline finalization, server-graph loading, query guards, settings storage migration) and eval harness utilities (tool registry + error sanitization + Hypothesis tests).

Reviewed changes

Copilot reviewed 83 out of 86 changed files in this pull request and generated 5 comments.

File	Description
`llms.txt`	Adds minimal pointer doc listing key repo docs and directories.
`gitnexus/vitest.config.ts`	Removes coverage auto-update setting; minor formatting.
`gitnexus/package.json`	Updates `prepare` script to set git hooks path after build.
`gitnexus-web/vitest.config.ts`	Adds Vitest configuration for web package (jsdom, coverage, aliases).
`gitnexus-web/test/unit/utils.test.ts`	Unit tests for `generateId`.
`gitnexus-web/test/unit/settings-service.test.ts`	Unit tests for sessionStorage-based settings service + provider config logic.
`gitnexus-web/test/unit/server-connection.test.ts`	Unit tests for server URL normalization + file content extraction.
`gitnexus-web/test/unit/security-guards.test.ts`	Tests for label/rel allowlists, ID regex, and read-only write detection regex.
`gitnexus-web/test/unit/path-resolution.test.ts`	Tests for new path resolution helpers.
`gitnexus-web/test/unit/mermaid-generator.test.ts`	Tests for mermaid generator output.
`gitnexus-web/test/unit/graph.test.ts`	Tests for graph add/dedup behavior.
`gitnexus-web/test/unit/constants.test.ts`	Tests ensuring constants consistency (colors/sizes/edges).
`gitnexus-web/test/setup.ts`	Vitest setup resetting storage between tests.
`gitnexus-web/test/fixtures/graph.ts`	Shared graph factories for tests.
`gitnexus-web/src/workers/ingestion.worker.ts`	Refactors pipeline finalization; renames/implements server graph loading; clamps negative scores; improves enrichment updates.
`gitnexus-web/src/types/lbug-wasm.d.ts`	Updates LadybugDB WASM type declarations.
`gitnexus-web/src/lib/path-resolution.ts`	Adds reusable path normalization + “best match” resolver.
`gitnexus-web/src/lib/lucide-icons.tsx`	Adds deep icon imports to reduce bundle size.
`gitnexus-web/src/lib/grounding-patterns.ts`	Adds shared regexes for file/node grounding references.
`gitnexus-web/src/lib/constants.ts`	Adds `Section` styling in node color/size maps.
`gitnexus-web/src/hooks/useBackend.ts`	Centralizes backend URL defaults and debounce constant; adds dev-only fetch warning.
`gitnexus-web/src/hooks/app-state/graph.tsx`	Adds a dedicated GraphState context/provider.
`gitnexus-web/src/core/llm/types.ts`	Centralizes provider default base URLs via UI constants.
`gitnexus-web/src/core/llm/tools.ts`	Adds label/rel-type allowlist checks before interpolating into Cypher; sanitizes relationTypes input.
`gitnexus-web/src/core/llm/settings-service.ts`	Migrates settings to sessionStorage with localStorage migration; refactors provider config selection; centralizes OpenRouter model fetch URL.
`gitnexus-web/src/core/llm/agent.ts`	Centralizes base URLs; improves streaming tool-call parsing; replaces boolean tool tracking with a counter.
`gitnexus-web/src/core/lbug/schema.ts`	Extends `REL_TYPES` list with additional relation types.
`gitnexus-web/src/core/lbug/lbug-adapter.ts`	Adds init promise; enforces readOnly by default; switches to `getAllRows`; adds prepared/batched relation insert logic; adds test-mode skip for bulk load.
`gitnexus-web/src/core/lbug/csv-generator.ts`	Adds multi-language CSV generation; fixes CSV escaping for community keywords.
`gitnexus-web/src/core/graph/types.ts`	Adds `Section` to `NodeLabel`.
`gitnexus-web/src/config/ui-constants.ts`	Introduces centralized UI/provider constants (timeouts/default URLs).
`gitnexus-web/src/components/StatusBar.tsx`	Uses memoization for primary language; switches icon import; adds test id for “Ready”.
`gitnexus-web/src/components/SettingsPanel.tsx`	Switches to deep icon imports; extracts provider cards; sessionStorage messaging; fixes timeout cleanup.
`gitnexus-web/src/components/ProcessesPanel.tsx`	Adds ID allowlist filter/guard for Cypher queries involving process/step IDs.
`gitnexus-web/src/components/ProcessFlowModal.tsx`	Switches icon imports; sanitizes mermaid SVG; improves keyboard handler guard.
`gitnexus-web/src/components/MarkdownRenderer.tsx`	Switches icon imports; clears copy-state timeout on unmount.
`gitnexus-web/src/components/GraphCanvas.tsx`	Uses O(1) node lookups; clears more highlight state when toggling off; switches icon imports.
`gitnexus-web/src/components/FileTreePanel.tsx`	Switches icon imports; makes search recursive.
`gitnexus-web/src/components/DropZone.tsx`	Switches icon imports; wraps localStorage get/set in try/catch.
`gitnexus-web/src/components/CodeReferencesPanel.tsx`	Switches icon imports; cancels rAF chain on cleanup.
`gitnexus-web/src/App.tsx`	Refactors embeddings startup; changes server-connect flow to load worker DB then init agent; uses centralized error delay.
`gitnexus-web/playwright.config.ts`	Adds Playwright config with optional insecure mode and software WebGL args.
`gitnexus-web/package.json`	Adds engines/scripts, Playwright + testing deps, adds LadybugDB + DOMPurify.
`gitnexus-web/e2e/server-connect.spec.ts`	Adds E2E tests for server connect, AI panel, processes, highlight clearing.
`gitnexus-web/e2e/manual-record.spec.ts`	Adds manual debug recording spec (skipped by default).
`gitnexus-web/e2e/debug-issues.spec.ts`	Adds targeted debug harness spec (skipped unless enabled).
`eval/utils/errors.py`	Adds debug-gated exception sanitization + log helper.
`eval/utils/__init__.py`	Marks eval utils as a package.
`eval/tool_registry.py`	Centralizes tool script specs and metric keys.
`eval/tests/test_tool_scripts.py`	Tests tool script rendering behavior.
`eval/tests/test_property_based.py`	Adds Hypothesis property-based tests for run-id parsing/tool script rendering/error sanitization.
`eval/tests/test_parse_run_id.py`	Adds targeted tests for run-id parsing.
`eval/tests/test_errors.py`	Adds tests for sanitized exception output with/without debug.
`eval/tests/conftest.py`	Ensures eval root is importable in tests.
`eval/tests/__init__.py`	Marks eval tests as a package.
`eval/run_eval.py`	Refactors build/agent/env creation; replaces raw traceback logging with sanitized error recording.
`eval/pyproject.toml`	Adds Hypothesis/coverage deps; includes additional files in wheel build.
`eval/environments/gitnexus_docker.py`	Refactors tool script installation to use registry; centralizes constants; uses safe exception logging.
`eval/constants.py`	Introduces shared constants for eval/bridge timeouts and defaults.
`eval/bridge/mcp_bridge.py`	Centralizes timeouts; improves subprocess stream closing; refactors response reading helpers; safe exception logging.
`eval/analysis/analyze_results.py`	Improves run-id parsing (longest known suffix); uses shared tool metric keys; adds flexible mode ordering.
`eval/agents/gitnexus_agent.py`	Centralizes tool binaries/metric keys; uses default augment timeout constant.
`eval/README.md`	Documents debug logging env var behavior.
`TESTING.md`	Adds consolidated testing documentation for both packages.
`RUNBOOK.md`	Adds operational runbook for common workflows and troubleshooting.
`README.md`	Adds “Development” section linking to new docs.
`GUARDRAILS.md`	Adds contributor/agent guardrails and escalation guidance.
`CONTRIBUTING.md`	Adds contribution guidelines and PR expectations.
`CLAUDE.md`	Adds structured Claude Code-specific guidance and changelog.
`ARCHITECTURE.md`	Adds architecture overview and where-to-change-what map.
`AGENTS.md`	Adds structured agent instructions and Cursor-specific guidance.
`.gitignore`	Adds ignores for handoff docs, Playwright artifacts, and Python test artifacts.
`.github/workflows/ci.yml`	Adds E2E workflow orchestration and updates CI gate/meta artifact.
`.github/workflows/ci-quality.yml`	Adds `gitnexus-web` typecheck job.
`.github/workflows/ci-e2e.yml`	Adds reusable web E2E workflow triggered only on `gitnexus-web/**` changes.
`.github/PULL_REQUEST_TEMPLATE.md`	Adds PR template with area/touch + test checklist.
`.github/ISSUE_TEMPLATE/feature_request.yml`	Adds feature request issue form with area dropdown.
`.github/ISSUE_TEMPLATE/config.yml`	Enables blank issues.
`.github/ISSUE_TEMPLATE/bug_report.yml`	Adds bug report issue form with structured fields.
`.githooks/pre-commit`	Adds pre-commit hook to run relevant typecheck/tests based on staged paths.
`.cursorrules`	Deprecates legacy Cursor rules file in favor of `.cursor/index.mdc`.
`.cursor/rules/100-monorepo.mdc`	Adds glob-scoped monorepo build/test quick refs for Cursor.
`.cursor/index.mdc`	Adds always-on Cursor rules pointing to canonical docs.

jreakin · 2026-03-23T08:44:25Z

Copilot review addressed:

Comment	Action
`sanitize_exception()` uses `format_exc()`	Fixed (`53e5ec0`) — switched to `traceback.format_exception(type(exc), exc, exc.__traceback__)` which reliably captures the passed exception's traceback even outside an active `except` block.
PR description says "zero runtime impact"	The cumulative diff includes runtime changes from PRs 1-5 in the stack. This PR (docs/eval only) has zero runtime impact on its own — the description is accurate for the incremental diff.
`isSafeId` rejects `/`	Already fixed in PR2 #475.
`loadGraphToLbug` doesn't close db/conn	Already fixed in PR3 #476.
`prepare` script Windows compat	Already fixed in PR1 #474.

magyargergo · 2026-03-23T17:02:41Z

Can you please rebase your changes on top of main? I merged your changes from #477 into origin/main

CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts

Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script

Test infrastructure:
- Vitest + React Testing Library: 58 unit tests (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reflects devDependency additions (vitest, playwright, wait-on, @testing-library, etc.) from package.json changes in this PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

E2E tests referenced these data-testid attributes but they didn't exist in ProcessItem. All 6 E2E testids now have matching source elements: status-ready, process-list-loaded, process-row, process-view-button, process-highlight-button, server-url-input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…solving

networkidle waits for zero network activity for 500ms, but Vite's HMR WebSocket stays open permanently, causing page.goto to timeout at 60s on all tests after the first. The explicit toBeVisible waits on UI elements are sufficient and deterministic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s pass locally

Root cause: test 1 clicked the Server button before React hydrated, so the tab content never rendered and the input wasn't found.

Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness

Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gitnexus analyze can crash with "double free or corruption" (known issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is usually written successfully before the crash.

The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)

All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous echo string had special characters that broke bash quoting in GitHub Actions. Simplified to: analyze || true, then check if .gitnexus exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
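
The "run analyze, ignore its exit code, then check for the index" logic described in these two commits can be sketched in Python as follows. The actual CI step is a shell command in a GitHub Actions workflow; the function name `run_analyze_tolerantly` and its parameters are hypothetical and exist only for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_analyze_tolerantly(cmd: Sequence[str], index_path: Path) -> bool:
    """Run cmd without failing on a non-zero exit; succeed only if index_path exists."""
    # check=False mirrors `analyze || true`: a crash during shutdown is tolerated.
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print(f"warning: command exited with code {result.returncode}", file=sys.stderr)
    # The real failure signal is a missing index, not the exit code.
    return index_path.exists()
```

A caller would fail the job only when this returns `False`, which corresponds to step 3 in the commit message above.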

Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer

Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc

GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms

Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eption

format_exc() returns the currently handled exception traceback, which may be unrelated if called outside an active except block. Using format_exception(type(exc), exc, exc.__traceback__) reliably captures the passed exception's traceback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
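
A minimal sketch of the pattern this commit describes, assuming a simplified signature: the `debug` parameter and the summary format are illustrative assumptions, and the real `sanitize_exception()` in `eval/utils/errors.py` may gate on an environment variable and sanitize more aggressively.

```python
import traceback


def sanitize_exception(exc: BaseException, debug: bool = False) -> str:
    """Return a log-safe description of exc; include the traceback only in debug mode."""
    if not debug:
        # Without debug, expose only the exception type and message.
        return f"{type(exc).__name__}: {exc}"
    # format_exception reads the traceback from the exception object itself,
    # so it does not depend on being called inside an active `except` block.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def capture() -> BaseException:
    """Raise and catch an error, returning the exception object for later logging."""
    try:
        raise ValueError("bad input")
    except ValueError as err:
        return err


saved = capture()  # the except block has already exited at this point
report = sanitize_exception(saved, debug=True)
```

Here `report` still contains the traceback for the `ValueError`, even though it is produced after the handler has finished, which is the situation where `traceback.format_exc()` has no current exception to report.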

- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook section (husky), update CI integration to list actual workflow files (ci-quality, ci-tests, ci-e2e)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

magyargergo · 2026-03-23T22:11:35Z

@claude act as senior context engineer and expert in LLMs. Ultra-think while reviewing the AI related md files and research about if this is the right direction of setting up cursor and is optimal for token usage.

github-actions · 2026-03-23T22:11:46Z

Claude finished @magyargergo's task in 2m 21s

AI Context Engineering Review — Cursor Setup & Token Optimization

Tasks:

Understand the request
Read AGENTS.md, CLAUDE.md, GUARDRAILS.md
Read .cursor/index.mdc, .cursor/rules/100-monorepo.mdc, .cursorrules
Read llms.txt, ARCHITECTURE.md, CONTRIBUTING.md
Analyze token efficiency of current setup
Write detailed review

Direction Assessment: Mostly Right, But Critical Token Waste Found

The layered architecture (policy → operations → architecture) is the correct mental model for multi-agent contexts. The decisions to use .cursor/index.mdc as a pointer, scope rules with globs, and deprecate .cursorrules all follow current best practices. However, there are two critical duplication bugs that together waste an estimated ~700–1,400 tokens per session, plus several structural improvements worth making.

🔴 Critical: `gitnexus:start` Block Is Loaded Twice Per Claude Code Session

CLAUDE.md lines 60–161 contain the full `gitnexus:start` block, the same ~100-line GitNexus tool reference that also lives in AGENTS.md. CLAUDE.md is always loaded by Claude Code, so every session pays for this block once up front and again whenever AGENTS.md is read. Line 58 of CLAUDE.md even says:

"For all GitNexus MCP guidance…, use AGENTS.md as the single source of truth."

…and then immediately embeds the full block, contradicting itself.

Token cost: ~700 tokens per Claude Code session, every session, even for doc-only or eval work.

Fix: Replace lines 60–161 in CLAUDE.md with a single pointer:

GitNexus MCP rules are in the `gitnexus:start` block in **AGENTS.md** — load that section when working with MCP tools or the graph index.

🔴 Critical: `gitnexus:start` Block Is Duplicated Inside AGENTS.md Itself

In AGENTS.md, the Reference Documentation bullet at line 51 embeds the entire gitnexus:start block inline as the value of a list item (lines 52–151), and then the same block appears again as a standalone section at lines 162–262. This means any agent that reads AGENTS.md linearly processes the block twice.

This is likely a markdown formatting artifact — the reference text tried to say "see the block below" but instead pasted the block as literal text. The reference bullet should read:

- **GitNexus:** skills under `.claude/skills/gitnexus/`; machine-oriented rules in the `gitnexus:start` … `gitnexus:end` block below.

🟡 High: `.cursor/index.mdc` Is Too Minimal for "Always On" Use

Current content (~13 lines) just says "go read AGENTS.md." With alwaysApply: true, Cursor injects this file's content into every request context — which means the 3 most critical rules (impact before edit, detect_changes before commit, never find-and-replace rename) are not in that always-on context. They only get applied if the model decides to fetch AGENTS.md.

For hard safety gates, the critical rules need to live in the always-on file itself. Best practice is a two-tier approach:

.cursor/index.mdc (alwaysApply: true) — 15–20 lines max: the 3–5 non-negotiables + pointer to AGENTS.md for details
AGENTS.md — full reference, read on demand for complex tasks

Recommended addition to .cursor/index.mdc:

## Non-negotiables (always apply)
- NEVER edit a function/class/method without running `gitnexus_impact` first.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename`.  
- NEVER commit without running `gitnexus_detect_changes()`.
- NEVER ignore HIGH/CRITICAL risk warnings.
- NEVER run `npx gitnexus analyze` without `--embeddings` if `.gitnexus/meta.json` shows stored embeddings.

Full rules → **AGENTS.md** (`gitnexus:start` block, Cursor Cloud section).

This adds ~200 tokens to every Cursor request but guarantees the safety gates are applied — far better than the current approach where they might be skipped.

🟡 Medium: Scope Table Repeated 3× Across Files

The read/write/execute/off-limits table appears in AGENTS.md, CLAUDE.md, and GUARDRAILS.md with essentially the same content. This costs ~300–400 tokens per file that gets loaded and adds maintenance burden (update one, forget the others).

Recommended: Keep it only in AGENTS.md as the single source of truth. CLAUDE.md and GUARDRAILS.md should reference it via a link, not re-state it.

🟡 Medium: No Cursor Rule for `eval/`

.cursor/rules/100-monorepo.mdc has globs gitnexus/** and gitnexus-web/** but nothing for eval/**. Python/eval contributors working in that directory get no guidance about the uv run stack, Hypothesis tests, or the bridge timeout constants. A 10-line 200-eval.mdc with Python-specific commands would close this gap.

🟢 Good: Structural Patterns That Are Correct

Pattern	Why it's right
`.cursor/index.mdc` as pointer, not content dump	Follows Cursor's recommended rule architecture; avoids putting stale rules in an always-on file
Glob-scoped rules in `.cursor/rules/*.mdc`	Correct — context is injected only when working on matching files, not for every request
Deprecating `.cursorrules`	Right call; `.cursorrules` is loaded for legacy compatibility only
GUARDRAILS.md "Signs" pattern	Excellent LLM prompt engineering — trigger/instruction/reason triples are highly effective for consistent agent behavior
`llms.txt` for context seeding	Follows the llms.txt convention correctly; good for RAG-style tooling
Self-referential indexing (the repo indexed by GitNexus itself)	Genuinely clever and architecturally sound; closes the loop for AI-assisted contributions

🔵 Minor: `llms.txt` Could Be More Useful

Current llms.txt lists files without explaining priority or usage context. LLM-aware tools that consume llms.txt benefit from brief annotations:

# Core docs (read first for any contribution)
AGENTS.md — canonical agent instructions, GitNexus MCP tool reference
GUARDRAILS.md — non-negotiables and escalation triggers

# Operational (read when running commands or debugging)
RUNBOOK.md — copy-paste commands for common workflows
TESTING.md — test commands for both packages

Token Budget Summary

Item	Est. tokens	Status
`gitnexus:start` block in CLAUDE.md	~700/session	❌ Remove
`gitnexus:start` block doubled in AGENTS.md	~700 when read	❌ Fix
Scope table ×3	~400/file loaded	🟡 Consolidate
`.cursor/index.mdc` always-on	~113/Cursor req	🟡 Expand to ~250 with critical rules
`100-monorepo.mdc` on-match	~125	✅ Appropriate

The two critical fixes (remove block from CLAUDE.md, fix AGENTS.md double-embed) would save ~700–1,400 tokens per session with zero loss of information, since AGENTS.md remains the authoritative source.
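
The ~700–1,400 range can be reproduced from the table's estimates with a short calculation. This is one reading of the numbers: the lower bound assumes AGENTS.md is not read in a session, so only the CLAUDE.md duplicate is saved, and the upper bound assumes it is read. The function and constant names are hypothetical.

```python
from typing import Tuple

# Estimates from the token budget table; both are approximate.
CLAUDE_MD_DUPLICATE = 700      # gitnexus:start block loaded with CLAUDE.md every session
AGENTS_MD_DOUBLE_EMBED = 700   # second copy inside AGENTS.md, paid only when it is read


def savings_range(always: int, conditional: int) -> Tuple[int, int]:
    """Return (minimum, maximum) tokens saved per session."""
    # Minimum: only the always-loaded duplicate is removed.
    # Maximum: the conditional duplicate would also have been read.
    return always, always + conditional


low, high = savings_range(CLAUDE_MD_DUPLICATE, AGENTS_MD_DOUBLE_EMBED)
print(f"~{low}-{high} tokens saved per session")
```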

Summary

The direction is correct — layered docs, glob-scoped Cursor rules, pointer-based always-on context, and self-referential indexing are all sound patterns. The issues are execution-level: the gitnexus:start block was accidentally duplicated into CLAUDE.md (and doubled inside AGENTS.md), and .cursor/index.mdc is too minimal to guarantee the critical safety rules are applied in every Cursor session. Fix those two things and this framework is solid.

magyargergo · 2026-03-23T22:15:42Z

Cc @abhigyanpatwari

Please have a look at this @jreakin 🙏

…k-templates-eval

…atwari#486

- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration), add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

magyargergo · 2026-03-24T06:02:22Z

Can you please rebase this PR on top of origin/main?

# Conflicts:
# .github/workflows/ci-e2e.yml
# .github/workflows/ci-report.yml
# .github/workflows/ci-tests.yml
# .github/workflows/ci.yml
# gitnexus-web/e2e/debug-issues.spec.ts
# gitnexus-web/e2e/server-connect.spec.ts

… Cursor rules

- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24

Saves ~1,400 tokens/session with zero information loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-24T06:19:03Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
4134	4132	0	2	168s

✅ All 4132 tests passed

2 test(s) skipped:

withTestLbugDB(lock-retry) > withLbugDb retry behaviour > throws after max retry attempts
buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	70.71%	10216/14447	70.71%	＝ 0.0	🟢 ██████████████░░░░░░
Branches	61.26%	6932/11315	61.25%	📈 +0.0	🟢 ████████████░░░░░░░░
Functions	73.76%	897/1216	73.76%	＝ 0.0	🟢 ██████████████░░░░░░
Lines	73.09%	9122/12480	73.09%	＝ 0.0	🟢 ██████████████░░░░░░

jreakin · 2026-03-24T06:23:23Z

Addressed all context engineering review findings

Pushed 7020293 addressing every item from the AI context engineering review:

Critical (token waste)

Finding	Fix	Tokens saved
`gitnexus:start` block duplicated in CLAUDE.md	Replaced ~100-line block with 1-line pointer to AGENTS.md	~700/session
`gitnexus:start` block doubled inside AGENTS.md (inlined in Reference Docs bullet + standalone)	Fixed reference bullet to say "see block below"	~700 when read

High

Finding	Fix
`.cursor/index.mdc` too minimal for always-on use	Added 5 non-negotiable safety rules (impact before edit, no find-and-replace rename, detect_changes before commit, no ignoring HIGH/CRITICAL, embeddings guard)

Medium

Finding	Fix
Scope table repeated in CLAUDE.md + AGENTS.md	Removed from CLAUDE.md, replaced with pointer to AGENTS.md
No Cursor rule for `eval/` directory	Added `.cursor/rules/200-eval.mdc` (glob-scoped, Python/Hypothesis/uv commands)

Minor

Finding	Fix
`llms.txt` lacks priority annotations	Reorganized into Core → Operational → Architecture tiers with descriptions

Also in this push

Merged upstream/main after PR #486 ("ci: E2E workflow, web typecheck job, pre-commit hook, test suite") landed, and resolved 6 conflicts
Bumped version headers to 1.2.0 and last-reviewed to 2026-03-24 in both AGENTS.md and CLAUDE.md

Test results

gitnexus-web: 198/198 passed
gitnexus CLI: 3832/3840 passed (8 pre-existing failures in express-routes and python-mcp-tools resolver integration tests — unrelated to this PR)

Net effect: -230 lines, +49 lines, saving ~1,400 tokens/session with zero information loss.

🤖 Generated with Claude Code

magyargergo · 2026-03-24T14:07:47Z

@jreakin Can you please resolve these conflicts? I'll merge your changes after this is green! 🚀

…bhigyanpatwari#479)

* ci: E2E workflow, web typecheck job, pre-commit hook, test suite

CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts

Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script

Test infrastructure:
- Vitest + React Testing Library: 58 unit tests (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update gitnexus-web package-lock.json

Reflects devDependency additions (vitest, playwright, wait-on, @testing-library, etc.) from package.json changes in this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add missing process-list-loaded testid, increase CI timeouts

- Add data-testid="process-list-loaded" to ProcessesPanel (E2E tests were waiting for an element that didn't exist)
- Increase server connect timeouts from 5s to 10s for slower CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): run gitnexus-web unit tests in CI, remove unused variable

- Add gitnexus-web npm ci + vitest run to ci-tests.yml so web unit tests are gated by the CI status check (were only running locally)
- Remove unused IS_PLAYWRIGHT_AUTOMATION variable from E2E spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-row testid, wait for networkidle on page load

- Add data-testid="process-row" to ProcessItem component (E2E tests referenced it but it didn't exist in the source)
- Use waitUntil: 'networkidle' on page.goto to ensure Vite dev server is fully ready before interacting (fixes first-test timeout in CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-view-button and process-highlight-button testids

E2E tests referenced these data-testid attributes but they didn't exist in ProcessItem. All 6 E2E testids now have matching source elements: status-ready, process-list-loaded, process-row, process-view-button, process-highlight-button, server-url-input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove networkidle — Vite HMR WebSocket prevents it from resolving

networkidle waits for zero network activity for 500ms, but Vite's HMR WebSocket stays open permanently, causing page.goto to timeout at 60s on all tests after the first. The explicit toBeVisible waits on UI elements are sufficient and deterministic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): wait for Server button visibility, add CI retry, all 5 tests pass locally

Root cause: test 1 clicked the Server button before React hydrated, so the tab content never rendered and the input wasn't found.

Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness

Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): tolerate LadybugDB native crash during analyze step

gitnexus analyze can crash with "double free or corruption" (known issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is usually written successfully before the crash. The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)

All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): fix shell quoting in analyze step, simplify to || true

The previous echo string had special characters that broke bash quoting in GitHub Actions. Simplified to: analyze || true, then check if .gitnexus exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add agent development framework, GitHub templates, eval refactor

Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer

Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc

GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms

Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(eval): use format_exception instead of format_exc in sanitize_exception

format_exc() returns the currently handled exception traceback, which may be unrelated if called outside an active except block. Using format_exception(type(exc), exc, exc.__traceback__) reliably captures the passed exception's traceback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update CONTRIBUTING.md and TESTING.md for current CI/hook setup

- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook section (husky), update CI integration to list actual workflow files (ci-quality, ci-tests, ci-e2e)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update testing docs to reflect CI/E2E changes from PR abhigyanpatwari#486

- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration), add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address context engineering review — deduplicate tokens, expand Cursor rules

- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24

Saves ~1,400 tokens/session with zero information loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>
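The `fix(eval)` commit above turns on a difference between two standard-library calls that is easy to miss. The following is a minimal sketch of that difference, not the project's actual `sanitize_exception`: the helper names `format_passed_exception` and `capture` are hypothetical, and any sanitization the real utility performs is left out.

```python
import traceback


def format_passed_exception(exc: BaseException) -> str:
    """Hypothetical helper: format the traceback of the exception passed in."""
    # Works from the exception object itself, so it does not depend on
    # whether an except block is currently active.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def capture() -> ValueError:
    """Raise and catch an error, returning the exception object for later use."""
    try:
        raise ValueError("bad input")
    except ValueError as exc:
        return exc


saved = capture()

# Called outside an except block, format_exc() has no handled exception
# to report, so the saved error does not appear in its output.
outside = traceback.format_exc()

# format_exception() still recovers the saved exception's traceback.
formatted = format_passed_exception(saved)

print(formatted)
```

Passing the exception object explicitly keeps the helper usable from logging or reporting code that runs after the `except` block has exited.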

Copilot AI review requested due to automatic review settings March 23, 2026 08:09

Copilot started reviewing on behalf of jreakin March 23, 2026 08:09

jreakin mentioned this pull request Mar 23, 2026

fix(web): LadybugDB query fix, server graph loading, highlights, and test suite #303

Closed

9 tasks

Copilot AI reviewed Mar 23, 2026

Comment thread gitnexus-web/src/components/ProcessesPanel.tsx Outdated

Comment thread gitnexus-web/src/core/lbug/lbug-adapter.ts

Comment thread gitnexus/package.json Outdated

Comment thread eval/utils/errors.py

Comment thread gitnexus-web/src/App.tsx

jreakin force-pushed the docs/agent-framework-templates-eval branch 5 times, most recently from a2ff6a8 to bce7652, on March 23, 2026 08:40

jreakin force-pushed the docs/agent-framework-templates-eval branch 15 times, most recently from e27063e to 5aa3866, on March 23, 2026 15:16

jreakin and others added 2 commits March 23, 2026 13:38

chore: update gitnexus-web package-lock.json

7bc0e73

Reflects devDependency additions (vitest, playwright, wait-on, @testing-library, etc.) from package.json changes in this PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jreakin force-pushed the docs/agent-framework-templates-eval branch from cd330d3 to 5291552 on March 23, 2026 18:39

jreakin force-pushed the docs/agent-framework-templates-eval branch from 416f990 to e27199a on March 23, 2026 19:26

jreakin force-pushed the docs/agent-framework-templates-eval branch from e27199a to ba4d406 on March 23, 2026 19:36

jreakin force-pushed the docs/agent-framework-templates-eval branch from ba4d406 to 48d7c21 on March 23, 2026 19:48

jreakin force-pushed the docs/agent-framework-templates-eval branch from 48d7c21 to 52167a6 on March 23, 2026 20:31

jreakin and others added 4 commits March 23, 2026 15:57

jreakin force-pushed the docs/agent-framework-templates-eval branch from 52167a6 to 53d2004 on March 23, 2026 20:58

jreakin and others added 2 commits March 23, 2026 20:32

Merge remote-tracking branch 'upstream/main' into docs/agent-framework-templates-eval

e45def5

Merge remote-tracking branch 'upstream/main' into pr-479

7e4318a

# Conflicts:
#   .github/workflows/ci-e2e.yml
#   .github/workflows/ci-report.yml
#   .github/workflows/ci-tests.yml
#   .github/workflows/ci.yml
#   gitnexus-web/e2e/debug-issues.spec.ts
#   gitnexus-web/e2e/server-connect.spec.ts

jreakin marked this pull request as draft March 24, 2026 06:13

jreakin marked this pull request as ready for review March 24, 2026 06:22

Merge branch 'main' into docs/agent-framework-templates-eval

bca71b8

magyargergo merged commit c68d797 into abhigyanpatwari:main Mar 25, 2026
9 of 10 checks passed

jreakin deleted the docs/agent-framework-templates-eval branch March 26, 2026 01:11
