docs: agent development framework, GitHub templates, eval refactor#479
@jreakin is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
Pull request overview
This PR expands the repo’s contributor/agent documentation and GitHub templates, adds/updates test infrastructure (Vitest + Playwright) for gitnexus-web, and refactors parts of the eval harness + web-side ingestion/LadybugDB tooling.
Changes:
- Added/updated core “agent framework” + runbook/testing/architecture docs and GitHub PR/issue templates.
- Introduced `gitnexus-web` test harness (Vitest config, unit tests, Playwright E2E) and centralized UI constants / icon imports.
- Refactored web worker + LadybugDB adapter behaviors (pipeline finalization, server-graph loading, query guards, settings storage migration) and eval harness utilities (tool registry + error sanitization + Hypothesis tests).
Reviewed changes
Copilot reviewed 83 out of 86 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `llms.txt` | Adds minimal pointer doc listing key repo docs and directories. |
| `gitnexus/vitest.config.ts` | Removes coverage auto-update setting; minor formatting. |
| `gitnexus/package.json` | Updates prepare script to set git hooks path after build. |
| `gitnexus-web/vitest.config.ts` | Adds Vitest configuration for web package (jsdom, coverage, aliases). |
| `gitnexus-web/test/unit/utils.test.ts` | Unit tests for generateId. |
| `gitnexus-web/test/unit/settings-service.test.ts` | Unit tests for sessionStorage-based settings service + provider config logic. |
| `gitnexus-web/test/unit/server-connection.test.ts` | Unit tests for server URL normalization + file content extraction. |
| `gitnexus-web/test/unit/security-guards.test.ts` | Tests for label/rel allowlists, ID regex, and read-only write detection regex. |
| `gitnexus-web/test/unit/path-resolution.test.ts` | Tests for new path resolution helpers. |
| `gitnexus-web/test/unit/mermaid-generator.test.ts` | Tests for mermaid generator output. |
| `gitnexus-web/test/unit/graph.test.ts` | Tests for graph add/dedup behavior. |
| `gitnexus-web/test/unit/constants.test.ts` | Tests ensuring constants consistency (colors/sizes/edges). |
| `gitnexus-web/test/setup.ts` | Vitest setup resetting storage between tests. |
| `gitnexus-web/test/fixtures/graph.ts` | Shared graph factories for tests. |
| `gitnexus-web/src/workers/ingestion.worker.ts` | Refactors pipeline finalization; renames/implements server graph loading; clamps negative scores; improves enrichment updates. |
| `gitnexus-web/src/types/lbug-wasm.d.ts` | Updates LadybugDB WASM type declarations. |
| `gitnexus-web/src/lib/path-resolution.ts` | Adds reusable path normalization + “best match” resolver. |
| `gitnexus-web/src/lib/lucide-icons.tsx` | Adds deep icon imports to reduce bundle size. |
| `gitnexus-web/src/lib/grounding-patterns.ts` | Adds shared regexes for file/node grounding references. |
| `gitnexus-web/src/lib/constants.ts` | Adds Section styling in node color/size maps. |
| `gitnexus-web/src/hooks/useBackend.ts` | Centralizes backend URL defaults and debounce constant; adds dev-only fetch warning. |
| `gitnexus-web/src/hooks/app-state/graph.tsx` | Adds a dedicated GraphState context/provider. |
| `gitnexus-web/src/core/llm/types.ts` | Centralizes provider default base URLs via UI constants. |
| `gitnexus-web/src/core/llm/tools.ts` | Adds label/rel-type allowlist checks before interpolating into Cypher; sanitizes relationTypes input. |
| `gitnexus-web/src/core/llm/settings-service.ts` | Migrates settings to sessionStorage with localStorage migration; refactors provider config selection; centralizes OpenRouter model fetch URL. |
| `gitnexus-web/src/core/llm/agent.ts` | Centralizes base URLs; improves streaming tool-call parsing; replaces boolean tool tracking with a counter. |
| `gitnexus-web/src/core/lbug/schema.ts` | Extends REL_TYPES list with additional relation types. |
| `gitnexus-web/src/core/lbug/lbug-adapter.ts` | Adds init promise; enforces readOnly by default; switches to getAllRows; adds prepared/batched relation insert logic; adds test-mode skip for bulk load. |
| `gitnexus-web/src/core/lbug/csv-generator.ts` | Adds multi-language CSV generation; fixes CSV escaping for community keywords. |
| `gitnexus-web/src/core/graph/types.ts` | Adds Section to NodeLabel. |
| `gitnexus-web/src/config/ui-constants.ts` | Introduces centralized UI/provider constants (timeouts/default URLs). |
| `gitnexus-web/src/components/StatusBar.tsx` | Uses memoization for primary language; switches icon import; adds test id for “Ready”. |
| `gitnexus-web/src/components/SettingsPanel.tsx` | Switches to deep icon imports; extracts provider cards; sessionStorage messaging; fixes timeout cleanup. |
| `gitnexus-web/src/components/ProcessesPanel.tsx` | Adds ID allowlist filter/guard for Cypher queries involving process/step IDs. |
| `gitnexus-web/src/components/ProcessFlowModal.tsx` | Switches icon imports; sanitizes mermaid SVG; improves keyboard handler guard. |
| `gitnexus-web/src/components/MarkdownRenderer.tsx` | Switches icon imports; clears copy-state timeout on unmount. |
| `gitnexus-web/src/components/GraphCanvas.tsx` | Uses O(1) node lookups; clears more highlight state when toggling off; switches icon imports. |
| `gitnexus-web/src/components/FileTreePanel.tsx` | Switches icon imports; makes search recursive. |
| `gitnexus-web/src/components/DropZone.tsx` | Switches icon imports; wraps localStorage get/set in try/catch. |
| `gitnexus-web/src/components/CodeReferencesPanel.tsx` | Switches icon imports; cancels rAF chain on cleanup. |
| `gitnexus-web/src/App.tsx` | Refactors embeddings startup; changes server-connect flow to load worker DB then init agent; uses centralized error delay. |
| `gitnexus-web/playwright.config.ts` | Adds Playwright config with optional insecure mode and software WebGL args. |
| `gitnexus-web/package.json` | Adds engines/scripts, Playwright + testing deps, adds LadybugDB + DOMPurify. |
| `gitnexus-web/e2e/server-connect.spec.ts` | Adds E2E tests for server connect, AI panel, processes, highlight clearing. |
| `gitnexus-web/e2e/manual-record.spec.ts` | Adds manual debug recording spec (skipped by default). |
| `gitnexus-web/e2e/debug-issues.spec.ts` | Adds targeted debug harness spec (skipped unless enabled). |
| `eval/utils/errors.py` | Adds debug-gated exception sanitization + log helper. |
| `eval/utils/__init__.py` | Marks eval utils as a package. |
| `eval/tool_registry.py` | Centralizes tool script specs and metric keys. |
| `eval/tests/test_tool_scripts.py` | Tests tool script rendering behavior. |
| `eval/tests/test_property_based.py` | Adds Hypothesis property-based tests for run-id parsing/tool script rendering/error sanitization. |
| `eval/tests/test_parse_run_id.py` | Adds targeted tests for run-id parsing. |
| `eval/tests/test_errors.py` | Adds tests for sanitized exception output with/without debug. |
| `eval/tests/conftest.py` | Ensures eval root is importable in tests. |
| `eval/tests/__init__.py` | Marks eval tests as a package. |
| `eval/run_eval.py` | Refactors build/agent/env creation; replaces raw traceback logging with sanitized error recording. |
| `eval/pyproject.toml` | Adds Hypothesis/coverage deps; includes additional files in wheel build. |
| `eval/environments/gitnexus_docker.py` | Refactors tool script installation to use registry; centralizes constants; uses safe exception logging. |
| `eval/constants.py` | Introduces shared constants for eval/bridge timeouts and defaults. |
| `eval/bridge/mcp_bridge.py` | Centralizes timeouts; improves subprocess stream closing; refactors response reading helpers; safe exception logging. |
| `eval/analysis/analyze_results.py` | Improves run-id parsing (longest known suffix); uses shared tool metric keys; adds flexible mode ordering. |
| `eval/agents/gitnexus_agent.py` | Centralizes tool binaries/metric keys; uses default augment timeout constant. |
| `eval/README.md` | Documents debug logging env var behavior. |
| `TESTING.md` | Adds consolidated testing documentation for both packages. |
| `RUNBOOK.md` | Adds operational runbook for common workflows and troubleshooting. |
| `README.md` | Adds “Development” section linking to new docs. |
| `GUARDRAILS.md` | Adds contributor/agent guardrails and escalation guidance. |
| `CONTRIBUTING.md` | Adds contribution guidelines and PR expectations. |
| `CLAUDE.md` | Adds structured Claude Code-specific guidance and changelog. |
| `ARCHITECTURE.md` | Adds architecture overview and where-to-change-what map. |
| `AGENTS.md` | Adds structured agent instructions and Cursor-specific guidance. |
| `.gitignore` | Adds ignores for handoff docs, Playwright artifacts, and Python test artifacts. |
| `.github/workflows/ci.yml` | Adds E2E workflow orchestration and updates CI gate/meta artifact. |
| `.github/workflows/ci-quality.yml` | Adds gitnexus-web typecheck job. |
| `.github/workflows/ci-e2e.yml` | Adds reusable web E2E workflow triggered only on gitnexus-web/** changes. |
| `.github/PULL_REQUEST_TEMPLATE.md` | Adds PR template with area/touch + test checklist. |
| `.github/ISSUE_TEMPLATE/feature_request.yml` | Adds feature request issue form with area dropdown. |
| `.github/ISSUE_TEMPLATE/config.yml` | Enables blank issues. |
| `.github/ISSUE_TEMPLATE/bug_report.yml` | Adds bug report issue form with structured fields. |
| `.githooks/pre-commit` | Adds pre-commit hook to run relevant typecheck/tests based on staged paths. |
| `.cursorrules` | Deprecates legacy Cursor rules file in favor of .cursor/index.mdc. |
| `.cursor/rules/100-monorepo.mdc` | Adds glob-scoped monorepo build/test quick refs for Cursor. |
| `.cursor/index.mdc` | Adds always-on Cursor rules pointing to canonical docs. |
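
Several rows above (`tools.ts`, `ProcessesPanel.tsx`, `security-guards.test.ts`) mention allowlist checks applied before values are interpolated into Cypher. The PR's implementation is TypeScript and is not shown on this page, so the following is only a minimal Python sketch of the general pattern. The label set, the ID pattern, and the function name are hypothetical and chosen for illustration.

```python
import re

# Hypothetical allowlist; the real labels are defined in the PR's own code.
ALLOWED_LABELS = frozenset({"File", "Function", "Class", "Section"})
# Hypothetical ID shape: letters, digits, and a few path-like separators.
SAFE_ID = re.compile(r"[A-Za-z0-9_:./-]+")


def build_label_query(label: str, node_id: str) -> tuple[str, dict[str, str]]:
    """Return a Cypher string plus parameters, rejecting unvetted input."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label not allowed: {label!r}")
    if not SAFE_ID.fullmatch(node_id):
        raise ValueError(f"unsafe id: {node_id!r}")
    # Only the vetted label is interpolated; the id travels as a parameter.
    return f"MATCH (n:{label}) WHERE n.id = $id RETURN n", {"id": node_id}
```

The design point is that anything spliced into the query text comes from a closed set, while free-form values are passed as parameters or rejected by a strict pattern.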
a2ff6a8 to bce7652
|
Copilot review addressed:
|
e27063e to 5aa3866
|
Can you please rebase your changes on top of |
CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts
Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script
Test infrastructure:
- Vitest + React Testing Library: 58 unit tests (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
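
The pre-commit hook described above chooses checks per package, and the review summary says it does so based on staged paths. The hook itself is not shown on this page, so this is only a sketch of that selection step in Python. The prefix mapping and function name are assumptions for illustration.

```python
# Hypothetical mapping from path prefix to the package whose checks should run.
PACKAGE_PREFIXES = {
    "gitnexus/": "gitnexus",
    "gitnexus-web/": "gitnexus-web",
}


def packages_to_check(staged_paths: list[str]) -> list[str]:
    """Return the packages touched by the staged paths, in a stable order."""
    touched = {
        package
        for path in staged_paths
        for prefix, package in PACKAGE_PREFIXES.items()
        if path.startswith(prefix)
    }
    return sorted(touched)


# Staged paths would normally come from `git diff --cached --name-only`.
print(packages_to_check(["gitnexus-web/src/App.tsx", "README.md"]))
```

Matching on the trailing slash keeps `gitnexus-web/` paths from being counted as `gitnexus/` paths.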
Reflects devDependency additions (vitest, playwright, wait-on, @testing-library, etc.) from package.json changes in this PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cd330d3 to 5291552
E2E tests referenced these data-testid attributes but they didn't exist in ProcessItem. All 6 E2E testids now have matching source elements: status-ready, process-list-loaded, process-row, process-view-button, process-highlight-button, server-url-input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
416f990 to e27199a
…solving
networkidle waits for zero network activity for 500ms, but Vite's HMR WebSocket stays open permanently, causing page.goto to time out at 60s on all tests after the first. The explicit toBeVisible waits on UI elements are sufficient and deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e27199a to ba4d406
…s pass locally
Root cause: test 1 clicked the Server button before React hydrated, so the tab content never rendered and the input wasn't found.
Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness
Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ba4d406 to 48d7c21
gitnexus analyze can crash with "double free or corruption" (known issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is usually written successfully before the crash. The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)
All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
48d7c21 to 52167a6
The previous echo string had special characters that broke bash quoting in GitHub Actions. Simplified to: analyze || true, then check if .gitnexus exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
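
The workflow logic in the two commits above (run analyze, tolerate a non-zero exit, then fail only if no index exists) is implemented in shell inside GitHub Actions, which is not shown here. As a rough illustration of the same control flow, here is a Python sketch. The function name is hypothetical, and treating `.gitnexus` as a path whose existence signals success is taken from the commit text rather than from the workflow file.

```python
import subprocess
from pathlib import Path


def analyze_succeeded(command: list[str], index_path: Path) -> bool:
    """Run the command, ignore its exit code, and report whether the index exists."""
    result = subprocess.run(command, check=False)
    if result.returncode != 0:
        # A crash during shutdown can still leave a usable index behind.
        print(f"warning: command exited with {result.returncode}; checking for index")
    return index_path.exists()


# Hypothetical usage, mirroring `analyze || true` followed by an existence check:
# ok = analyze_succeeded(["gitnexus", "analyze"], Path(".gitnexus"))
```

The trade-off is that a genuine analyze failure that still leaves a stale index on disk would pass this check, so it suits a clean CI checkout better than a reused workspace.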
Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer
Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc
GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms
Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eption
format_exc() returns the currently handled exception traceback, which may be unrelated if called outside an active except block. Using format_exception(type(exc), exc, exc.__traceback__) reliably captures the passed exception's traceback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
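
The difference described in this commit can be shown with a short, self-contained sketch. The `sanitize_exception` below is not the PR's function: its signature and the `debug` flag are assumptions standing in for the debug-gated behavior the PR describes. Only `traceback.format_exception` and `traceback.format_exc` are real standard-library calls.

```python
import traceback


def sanitize_exception(exc: BaseException, debug: bool = False) -> str:
    """Short summary by default; the full traceback of *exc* only in debug mode."""
    if not debug:
        return f"{type(exc).__name__}: {exc}"
    # Formats the traceback attached to the object that was passed in,
    # regardless of which exception (if any) is currently being handled.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def failing() -> None:
    raise ValueError("bad input")


try:
    failing()
except ValueError as caught:
    saved = caught  # keep a reference for use after the except block ends

# No exception is being handled at this point, so format_exc() cannot
# describe `saved`; format_exception() still can.
outside = traceback.format_exc()
full = sanitize_exception(saved, debug=True)
```

Passing the exception object explicitly makes the helper safe to call from logging code that runs after the `except` block has finished.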
- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook section (husky), update CI integration to list actual workflow files (ci-quality, ci-tests, ci-e2e)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
52167a6 to 53d2004
|
@claude act as senior context engineer and expert in LLMs. Ultra-think while reviewing the AI related md files and research about if this is the right direction of setting up cursor and is optimal for token usage. |
|
Claude finished @magyargergo's task in 2m 21s
AI Context Engineering Review: Cursor Setup & Token Optimization
Tasks:
Direction Assessment: Mostly Right, But Critical Token Waste Found
The layered architecture (policy → operations → architecture) is the correct mental model for multi-agent contexts. The decisions to use
🔴 Critical:
|
| Pattern | Why it's right |
|---|---|
| `.cursor/index.mdc` as pointer, not content dump | Follows Cursor's recommended rule architecture; avoids putting stale rules in an always-on file |
| Glob-scoped rules in `.cursor/rules/*.mdc` | Correct: context is injected only when working on matching files, not for every request |
| Deprecating `.cursorrules` | Right call; .cursorrules is loaded for legacy compatibility only |
| GUARDRAILS.md "Signs" pattern | Excellent LLM prompt engineering: trigger/instruction/reason triples are highly effective for consistent agent behavior |
| `llms.txt` for context seeding | Follows the llms.txt convention correctly; good for RAG-style tooling |
| Self-referential indexing (the repo indexed by GitNexus itself) | Genuinely clever and architecturally sound; closes the loop for AI-assisted contributions |
🔵 Minor: llms.txt Could Be More Useful
Current llms.txt lists files without explaining priority or usage context. LLM-aware tools that consume llms.txt benefit from brief annotations:
# Core docs (read first for any contribution)
AGENTS.md — canonical agent instructions, GitNexus MCP tool reference
GUARDRAILS.md — non-negotiables and escalation triggers
# Operational (read when running commands or debugging)
RUNBOOK.md — copy-paste commands for common workflows
TESTING.md — test commands for both packages
Token Budget Summary
| Item | Est. tokens | Status |
|---|---|---|
| `gitnexus:start` block in `CLAUDE.md` | ~700/session | ❌ Remove |
| `gitnexus:start` block doubled in `AGENTS.md` | ~700 when read | ❌ Fix |
| Scope table ×3 | ~400/file loaded | 🟡 Consolidate |
| `.cursor/index.mdc` always-on | ~113/Cursor req | 🟡 Expand to ~250 with critical rules |
| `100-monorepo.mdc` on-match | ~125 | ✅ Appropriate |
The two critical fixes (remove block from CLAUDE.md, fix AGENTS.md double-embed) would save ~700–1,400 tokens per session with zero loss of information, since AGENTS.md remains the authoritative source.
Summary
The direction is correct — layered docs, glob-scoped Cursor rules, pointer-based always-on context, and self-referential indexing are all sound patterns. The issues are execution-level: the gitnexus:start block was accidentally duplicated into CLAUDE.md (and doubled inside AGENTS.md), and .cursor/index.mdc is too minimal to guarantee the critical safety rules are applied in every Cursor session. Fix those two things and this framework is solid.
|
Please have a look at this @jreakin 🙏 |
…atwari#486
- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration), add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Can you please rebase this PR on top of origin/main? |
# Conflicts: # .github/workflows/ci-e2e.yml # .github/workflows/ci-report.yml # .github/workflows/ci-tests.yml # .github/workflows/ci.yml # gitnexus-web/e2e/debug-issues.spec.ts # gitnexus-web/e2e/server-connect.spec.ts
… Cursor rules
- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24
Saves ~1,400 tokens/session with zero information loss.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI Report
✅ All checks passed
Pipeline Status
Test Results
✅ All 4132 tests passed; 2 test(s) skipped
Code Coverage
Tests
Generated by CI |
Addressed all context engineering review findings
Pushed
Critical (token waste)
High
Medium
Minor
Also in this push
Test results
Net effect: -230 lines, +49 lines, saving ~1,400 tokens/session with zero information loss. 🤖 Generated with Claude Code |
|
@jreakin Can you please resolve these conflicts? I'll merge your changes after this is green! 🚀 |
…bhigyanpatwari#479)

* ci: E2E workflow, web typecheck job, pre-commit hook, test suite
CI:
- ci.yml consolidated to reference ci-tests.yml
- ci-quality.yml: add typecheck-web job for gitnexus-web/
- ci-e2e.yml: E2E workflow with dorny/paths-filter (web changes only)
- ci-report.yml: remove dead integration-reports references
- CI gate allows skipped E2E status
- .gitignore: playwright artifacts, eval test artifacts
Pre-commit hook:
- .githooks/pre-commit: typecheck + unit tests for both packages
- Activated via git config core.hooksPath in prepare script
Test infrastructure:
- Vitest + React Testing Library: 58 unit tests (graph, server-connection, mermaid, settings, constants, utils, paths)
- Playwright E2E: 5 tests + manual recording harness
- vitest.config from vitest/config, engines.node >= 20
- Playwright artifacts retain-on-failure
- wait-on in devDependencies
- vitest/coverage-v8 aligned with vitest 4.x
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update gitnexus-web package-lock.json
Reflects devDependency additions (vitest, playwright, wait-on, @testing-library, etc.) from package.json changes in this PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add missing process-list-loaded testid, increase CI timeouts
- Add data-testid="process-list-loaded" to ProcessesPanel (E2E tests were waiting for an element that didn't exist)
- Increase server connect timeouts from 5s to 10s for slower CI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): run gitnexus-web unit tests in CI, remove unused variable
- Add gitnexus-web npm ci + vitest run to ci-tests.yml so web unit tests are gated by the CI status check (were only running locally)
- Remove unused IS_PLAYWRIGHT_AUTOMATION variable from E2E spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-row testid, wait for networkidle on page load
- Add data-testid="process-row" to ProcessItem component (E2E tests referenced it but it didn't exist in the source)
- Use waitUntil: 'networkidle' on page.goto to ensure Vite dev server is fully ready before interacting (fixes first-test timeout in CI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): add process-view-button and process-highlight-button testids
E2E tests referenced these data-testid attributes but they didn't exist in ProcessItem. All 6 E2E testids now have matching source elements: status-ready, process-list-loaded, process-row, process-view-button, process-highlight-button, server-url-input.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove networkidle - Vite HMR WebSocket prevents it from resolving
networkidle waits for zero network activity for 500ms, but Vite's HMR WebSocket stays open permanently, causing page.goto to time out at 60s on all tests after the first. The explicit toBeVisible waits on UI elements are sufficient and deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): wait for Server button visibility, add CI retry, all 5 tests pass locally
Root cause: test 1 clicked the Server button before React hydrated, so the tab content never rendered and the input wasn't found.
Fixes:
- Wait for Server button toBeVisible before clicking
- Increase input wait to 15s
- Remove networkidle (Vite HMR WebSocket prevents it from resolving)
- Add retries: 1 in CI for transient cold-start flakiness
Verified locally: all 5 E2E tests pass, 198 unit tests pass, typecheck clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): tolerate LadybugDB native crash during analyze step
gitnexus analyze can crash with "double free or corruption" (known issue abhigyanpatwari#273) during the LadybugDB native addon shutdown. The index is usually written successfully before the crash. The workflow now:
1. Allows analyze to exit non-zero with a warning
2. Verifies .gitnexus index was actually created
3. Only fails if no index exists (real failure)
All tests verified locally: 198 unit, 5 E2E pass, typecheck clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): fix shell quoting in analyze step, simplify to || true
The previous echo string had special characters that broke bash quoting in GitHub Actions. Simplified to: analyze || true, then check if .gitnexus exists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add agent development framework, GitHub templates, eval refactor
Agent framework (layered docs for AI-assisted contributions):
- AGENTS.md: canonical instructions, impact analysis, MCP tools
- CLAUDE.md: Claude Code-specific deltas and hooks
- GUARDRAILS.md: safety boundaries, non-negotiables, escalation
- ARCHITECTURE.md: monorepo layout, data flow map
- TESTING.md: test structure, commands, categories
- RUNBOOK.md: copy-paste operations for dev/CI/MCP
- llms.txt: minimal LLM context pointer
Editor integration:
- .cursor/index.mdc + rules/100-monorepo.mdc
GitHub templates:
- PR template with areas-touched checkboxes
- Bug report + feature request issue forms
Eval harness:
- Refactored mcp_bridge, tool_registry, constants
- Error sanitization utilities
- Property-based tests via Hypothesis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(eval): use format_exception instead of format_exc in sanitize_exception
format_exc() returns the currently handled exception traceback, which may be unrelated if called outside an active except block. Using format_exception(type(exc), exc, exc.__traceback__) reliably captures the passed exception's traceback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update CONTRIBUTING.md and TESTING.md for current CI/hook setup
- CONTRIBUTING.md: add gitnexus-web typecheck command, pre-commit hook checklist item
- TESTING.md: add gitnexus-web typecheck command, pre-commit hook section (husky), update CI integration to list actual workflow files (ci-quality, ci-tests, ci-e2e)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update testing docs to reflect CI/E2E changes from PR abhigyanpatwari#486
- AGENTS.md: update test counts (CLI ~2000 unit, ~1850 integration), add gitnexus-web testing section (198 unit, 5 E2E with commands)
- RUNBOOK.md: fix Node requirement to >=20, fix E2E local repro command
- TESTING.md: E2E uses data-testid selectors + real servers, not mocks
- .cursor/rules/100-monorepo.mdc: add web test/E2E commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address context engineering review - deduplicate tokens, expand Cursor rules
- Remove ~100-line gitnexus:start block from CLAUDE.md (was duplicated from AGENTS.md)
- Fix gitnexus:start block inlined inside AGENTS.md Reference Docs bullet (doubled)
- Replace CLAUDE.md scope table with pointer to AGENTS.md (single source of truth)
- Expand .cursor/index.mdc with 5 non-negotiable safety rules for always-on context
- Add .cursor/rules/200-eval.mdc with Python/eval commands (glob-scoped to eval/**)
- Improve llms.txt with priority annotations and descriptions
- Bump version headers to 1.2.0, last-reviewed to 2026-03-24
Saves ~1,400 tokens/session with zero information loss.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Gergő Magyar <gergomagyar@icloud.com>
Summary
Documentation, templates, and eval tooling. Zero runtime impact. Split from #303 (PR 6 of 6). Depends on #486.
Why this matters
GitNexus is increasingly contributed to by AI agents (Claude Code, Cursor, GitHub Copilot) alongside human developers. Without structured guidance, agents make well-intentioned but unsafe changes: renaming symbols with find-and-replace instead of the graph-aware rename tool, skipping impact analysis before edits, committing without running `detect_changes`, or accidentally deleting embeddings by running `analyze` without `--embeddings`.
This PR introduces a layered documentation framework that gives both human and AI contributors the same guardrails.
The framework is self-referential: the project is indexed by GitNexus itself, so agents reading AGENTS.md are instructed to use GitNexus MCP tools to safely navigate the codebase they're modifying. This makes AI-assisted contributions reproducible and auditable.
Agent Framework
These documents form a layered system: policy on top, operations in the middle, architecture at the base.
Policy layer (rules and safety gates):
- `AGENTS.md`: Canonical agent instructions. Mandatory impact analysis before edits (`gitnexus_impact`), change detection before commits (`gitnexus_detect_changes`), scope boundaries (what agents can read/write/execute vs. what's off-limits). Includes GitNexus MCP tool quick reference and impact risk levels (d=1 WILL BREAK → d=3 MAY NEED TESTING). Recurring failure patterns ("Signs") with fixes.
- `CLAUDE.md`: Claude Code-specific deltas. PreToolUse hooks for hard gates (e.g., tests before `git_commit`). Execution sequence discipline for long sessions. Defers all MCP guidance to AGENTS.md as single source of truth.
- `GUARDRAILS.md`: Safety boundaries for humans and AI. Five non-negotiables: no secrets, use rename tool, impact analysis, detect_changes, preserve embeddings. Escalation scenarios (HIGH/CRITICAL risk, CI/release changes, data loss uncertainty). Publishing and supply chain rules.
Operations layer (what to run and when):
- `TESTING.md`: Test structure, commands, and targets for both packages (gitnexus CLI: vitest; gitnexus-web: vitest + Playwright E2E). Categories: unit, integration, eval-style golden sets, E2E. Known issues (LadybugDB locking in containers).
- `RUNBOOK.md`: Copy-paste operations: index a repo, generate embeddings, troubleshoot MCP (no repos → run analyze; wrong repo → use list_repos + repo param), clean slate, CI failure repro, memory/locking issues.
Architecture layer (system context):
- `ARCHITECTURE.md`: Monorepo layout (gitnexus CLI/MCP, gitnexus-web React/Vite, plugin packages, eval harness). End-to-end data flow: Tree-sitter parsing → LadybugDB persistence → MCP tools/resources. Where-to-change-what map for common modifications.
- `llms.txt`: Minimal pointer document listing key docs and directories for LLM context seeding.
Editor Integration
- `.cursor/index.mdc`: Always-on Cursor Cloud rules pointing to AGENTS.md for canonical instructions.
- `.cursor/rules/100-monorepo.mdc`: Glob-scoped build/test quick reference per package. Known issues (native build tools, LadybugDB locking).
- `.cursorrules`: Legacy deprecation stub pointing to `.cursor/index.mdc`.
GitHub Templates
- `config.yml`: Blank issues enabled for edge cases outside the templates.
Eval Harness
- `tool_registry.py`: Centralized tool script specs and metric keys (was scattered across multiple files).
- `constants.py`: Shared constants for eval/bridge timeouts and defaults.
- `utils/errors.py`: Debug-gated exception sanitization using `traceback.format_exception` (not `format_exc`) for reliable traceback capture. `log_safe_exception` helper for consistent error logging without stack trace leaks.
- `bridge/mcp_bridge.py`: Centralized MCP timeout constants, improved subprocess stream closing, refactored response reading.
- `analysis/analyze_results.py`: Longest-suffix-first run-ID parsing, flexible mode ordering, shared metric keys.
Test plan
- `.cursor/index.mdc` loads in Cursor Cloud
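
The longest-suffix-first run-ID parsing listed under Eval Harness is easiest to see with a small example. The sketch below is not the code from `analyze_results.py`: the `<model>_<mode>` run-ID shape, the mode names, and the function name are assumptions chosen so that one mode is a suffix of another, which is the case where ordering matters.

```python
# Hypothetical mode names; "augment" is deliberately a suffix of "native_augment".
KNOWN_MODES = ("baseline", "augment", "native_augment")


def parse_run_id(run_id: str, modes: tuple[str, ...] = KNOWN_MODES) -> tuple[str, str]:
    """Split '<model>_<mode>' by trying the longest known mode suffix first."""
    for mode in sorted(modes, key=len, reverse=True):
        suffix = f"_{mode}"
        # Require a non-empty model name in front of the suffix.
        if run_id.endswith(suffix) and len(run_id) > len(suffix):
            return run_id[: -len(suffix)], mode
    raise ValueError(f"no known mode suffix in run id: {run_id!r}")


print(parse_run_id("model-a_native_augment"))
```

Checking shorter suffixes first would match `_augment` here and leave `model-a_native` as the model name, which is the mis-parse that longest-first ordering avoids.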