diff --git a/AGENTS.md b/AGENTS.md index 651657b026..f212fae0ce 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,117 +1,120 @@ - - + + -Last reviewed: 2026-04-13 +Last reviewed: 2026-04-16 **Project:** GitNexus · **Environment:** dev · **Maintainer:** repository maintainers (see GitHub) -This file uses a standard agent header (version, scope, model policy, reference docs, changelog), adapted for this **TypeScript/JavaScript monorepo**. - ## Scope -| | | -|--|--| -| **Reads** | Repository tree as needed for the task: `gitnexus/`, `gitnexus-web/`, `eval/`, plugin packages, `.github/`, `.gitnexus/` when present, and docs. | -| **Writes** | Only paths required for the requested change; keep diffs minimal. Update lockfiles when dependencies change. | -| **Executes** | `npm`, `npx`, `node` under `gitnexus/` and `gitnexus-web/`; `uv run` for Python under `eval/` when applicable; shell utilities for documented CI/dev workflows. | -| **Off-limits** | User secrets (e.g. real `.env`), production deployment credentials, unrelated repositories, destructive git history operations without explicit human confirmation. | +| Boundary | Rule | +|----------|------| +| **Reads** | `gitnexus/`, `gitnexus-web/`, `eval/`, plugin packages, `.github/`, `.gitnexus/`, docs. | +| **Writes** | Only paths required for the change; keep diffs minimal. Update lockfiles when deps change. | +| **Executes** | `npm`, `npx`, `node` under `gitnexus/` and `gitnexus-web/`; `uv run` for Python under `eval/`; documented CI/dev workflows. | +| **Off-limits** | Real `.env` / secrets, production credentials, unrelated repos, destructive git ops without confirmation. | ## Model Configuration -- **Primary:** Pin in **Cursor** (Settings → model). Use a **named** model (e.g. GPT-5.2, Claude Sonnet 4.x). Avoid relying on **Auto** when reproducibility or audit trail matters. -- **Fallback:** As configured in Cursor or your organization (do not encode `latest` or wildcards in automation configs). 
-- **Notes:** The open-source GitNexus CLI indexer does not call an LLM. Optional Nexus AI in the web UI uses end-user provider keys and models. +- **Primary:** Use a named model (e.g. Claude Sonnet 4.x). Avoid `Auto` or unversioned `latest` when reproducibility matters. +- **Notes:** The GitNexus CLI indexer does not call an LLM. ## Execution Sequence (complex tasks) -Long sessions dilute instructions. For **multi-step** work, state up front: - +For multi-step work, state up front: 1. Which rules in this file and **[GUARDRAILS.md](GUARDRAILS.md)** apply (and any relevant Signs). -2. Current **Scope** boundaries (Reads / Writes / Off-limits). -3. Which **validation commands** you will run (e.g. `cd gitnexus && npm test`, `npx tsc --noEmit`). +2. Current **Scope** boundaries. +3. Which **validation commands** you will run (`cd gitnexus && npm test`, `npx tsc --noEmit`). -On very long threads, the human may add *“Remember: apply all AGENTS.md rules”* to re-weight rule tokens against context dilution. +On long threads, *"Remember: apply all AGENTS.md rules"* re-weights these instructions against context dilution. ## Claude Code hooks -Hooks enforce gates that prompts cannot. In **Claude Code**, **PreToolUse** hooks can block tools such as `git_commit` until checks pass. Adapt to this repo: e.g. `cd gitnexus && npm test` before commit. +**PreToolUse** hooks can block tools (e.g. `git_commit`) until checks pass. Adapt to this repo: `cd gitnexus && npm test` before commit. -## Context budget (Cursor / standards) +## Context budget -Generic “core standards” playbooks are often long and stack-specific. For this monorepo, commands and gotchas live under **Cursor Cloud specific instructions** below and in **[CONTRIBUTING.md](CONTRIBUTING.md)**. If always-on rules grow, split domain rules into **`.cursor/rules/*.mdc`** (globs). **Cursor:** project-wide rules live in **`.cursor/index.mdc`** (YAML frontmatter with `alwaysApply: true`). 
**Claude Code:** optionally load a **`STANDARDS.md`** only when needed (e.g. *“When writing new code, read STANDARDS.md”*) to save context. +Commands and gotchas live under **Repo reference** below and in **[CONTRIBUTING.md](CONTRIBUTING.md)**. If always-on rules grow, split into **`.cursor/rules/*.mdc`** (globs). **Cursor:** project-wide rules in `.cursor/index.mdc`. **Claude Code:** load `STANDARDS.md` only when needed. -## Reference Documentation +## Reference docs -- **This repository:** **[ARCHITECTURE.md](ARCHITECTURE.md)**, **[CONTRIBUTING.md](CONTRIBUTING.md)**, **[GUARDRAILS.md](GUARDRAILS.md)**. -- **Cursor:** `.cursor/index.mdc` (always-on rules); optional `.cursor/rules/*.mdc` (glob-scoped). Legacy `.cursorrules` is deprecated — see `.cursor/index.mdc`. -- **Optional local files:** `NOTES.md` (short vendor-neutral project snapshot). For handoffs, keep notes local (e.g., a scratch file outside the repo) rather than committing `HANDOFF.md`. -- **GitNexus:** skills under `.claude/skills/gitnexus/`; machine-oriented rules in the `gitnexus:start` … `gitnexus:end` block below. +- **[ARCHITECTURE.md](ARCHITECTURE.md)**, **[CONTRIBUTING.md](CONTRIBUTING.md)**, **[GUARDRAILS.md](GUARDRAILS.md)** +- **Cursor:** `.cursor/index.mdc` (always-on); `.cursor/rules/*.mdc` (glob-scoped). Legacy `.cursorrules` deprecated. +- **GitNexus:** skills in `.claude/skills/gitnexus/`; MCP rules in `gitnexus:start` block below. ## Changelog | Date | Version | Change | |------|---------|--------| +| 2026-04-16 | 1.4.0 | Fixed: web UI description, pre-commit behavior, MCP tools (7->16), added gitnexus-shared, removed stale vite-plugin-wasm gotcha. | | 2026-04-13 | 1.3.0 | Updated GitNexus index stats after DAG refactor. | -| 2026-03-24 | 1.2.0 | Fixed gitnexus:start block duplication (was inlined in Reference Docs bullet). | -| 2026-03-23 | 1.1.0 | Updated agent instructions (sections, references, Cursor layout). | -| 2026-03-22 | 1.0.0 | Added structured agent header and changelog. 
| +| 2026-03-24 | 1.2.0 | Fixed gitnexus:start block duplication. | +| 2026-03-23 | 1.1.0 | Updated agent instructions, references, Cursor layout. | +| 2026-03-22 | 1.0.0 | Initial structured header and changelog. | --- # GitNexus — Code Intelligence -This project is indexed by GitNexus as **GitNexus** (4325 symbols, 10556 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely. +Indexed as **GitNexus** (4325 symbols, 10556 relationships, 300 execution flows). Use MCP tools to understand code, assess impact, and navigate safely. -> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first. +> If any tool warns the index is stale, run `npx gitnexus analyze` first. ## Always Do -- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user. -- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows. -- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits. -- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance. -- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`. +- **MUST run impact analysis before editing any symbol.** `gitnexus_impact({target: "symbolName", direction: "upstream"})` — report blast radius to the user. +- **MUST run `gitnexus_detect_changes()` before committing** — verify only expected symbols and flows are affected. +- **MUST warn the user** if impact returns HIGH or CRITICAL risk. 
+- Explore unfamiliar code with `gitnexus_query({query: "concept"})` (process-grouped, ranked) instead of grepping. +- Full context on a symbol: `gitnexus_context({name: "symbolName"})`. ## When Debugging -1. `gitnexus_query({query: ""})` — find execution flows related to the issue -2. `gitnexus_context({name: ""})` — see all callers, callees, and process participation -3. `READ gitnexus://repo/GitNexus/process/{processName}` — trace the full execution flow step by step -4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed +1. `gitnexus_query({query: ""})` — find related execution flows +2. `gitnexus_context({name: ""})` — callers, callees, process participation +3. `READ gitnexus://repo/GitNexus/process/{processName}` — trace flow step by step +4. Regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` ## When Refactoring -- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`. -- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code. -- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed. +- **Rename:** `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Graph edits are safe; text_search edits need manual review. +- **Extract/Split:** `gitnexus_context` (incoming/outgoing refs) then `gitnexus_impact` (upstream callers) before moving code. +- **After any refactor:** `gitnexus_detect_changes({scope: "all"})` to verify scope. ## Never Do -- NEVER edit a function, class, or method without first running `gitnexus_impact` on it. 
-- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis. -- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph. -- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope. +- Edit a symbol without running `gitnexus_impact` first. +- Ignore HIGH/CRITICAL risk warnings. +- Rename with find-and-replace — use `gitnexus_rename`. +- Commit without `gitnexus_detect_changes()`. ## Tools Quick Reference -| Tool | When to use | Command | +| Tool | When to use | Example | |------|-------------|---------| +| `list_repos` | Discover indexed repos | `gitnexus_list_repos({})` | | `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` | | `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` | | `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` | | `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` | | `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` | | `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` | +| `api_impact` | Pre-change API route impact | `gitnexus_api_impact({route: "/api/users", method: "GET"})` | +| `route_map` | Route → handler → consumer map | `gitnexus_route_map({})` | +| `tool_map` | MCP/RPC tool definitions | `gitnexus_tool_map({})` | +| `shape_check` | Response shape vs consumer access | `gitnexus_shape_check({route: "/api/users"})` | +| `group_list` | List repo groups | `gitnexus_group_list({})` | +| `group_query` | Cross-repo search in a group | `gitnexus_group_query({name: "myGroup", query: "auth"})` | +| `group_sync` | Rebuild group Contract Registry | `gitnexus_group_sync({name: "myGroup"})` | +| `group_contracts` | Inspect group contracts | `gitnexus_group_contracts({name: "myGroup"})` | +| `group_status` | Group staleness report | 
`gitnexus_group_status({name: "myGroup"})` | ## Impact Risk Levels | Depth | Meaning | Action | |-------|---------|--------| -| d=1 | WILL BREAK — direct callers/importers | MUST update these | +| d=1 | WILL BREAK — direct callers/importers | MUST update | | d=2 | LIKELY AFFECTED — indirect deps | Should test | | d=3 | MAY NEED TESTING — transitive | Test if critical path | @@ -119,87 +122,80 @@ This project is indexed by GitNexus as **GitNexus** (4325 symbols, 10556 relatio | Resource | Use for | |----------|---------| -| `gitnexus://repo/GitNexus/context` | Codebase overview, check index freshness | +| `gitnexus://repo/GitNexus/context` | Codebase overview, index freshness | | `gitnexus://repo/GitNexus/clusters` | All functional areas | | `gitnexus://repo/GitNexus/processes` | All execution flows | | `gitnexus://repo/GitNexus/process/{name}` | Step-by-step execution trace | ## Self-Check Before Finishing -Before completing any code modification task, verify: 1. `gitnexus_impact` was run for all modified symbols -2. No HIGH/CRITICAL risk warnings were ignored -3. `gitnexus_detect_changes()` confirms changes match expected scope -4. All d=1 (WILL BREAK) dependents were updated +2. No HIGH/CRITICAL warnings were ignored +3. `gitnexus_detect_changes()` confirms expected scope +4. All d=1 dependents were updated ## Keeping the Index Fresh -After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it: - -```bash -npx gitnexus analyze -``` - -If the index previously included embeddings, preserve them by adding `--embeddings`: - ```bash -npx gitnexus analyze --embeddings +npx gitnexus analyze # basic refresh +npx gitnexus analyze --embeddings # preserve embeddings ``` -To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). 
**Running analyze without `--embeddings` will delete any previously generated embeddings.** +Check `.gitnexus/meta.json` `stats.embeddings` (0 = none). Running without `--embeddings` deletes existing vectors. -> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`. +> Claude Code: PostToolUse hook handles this after `git commit` and `git merge`. -## CLI +## CLI Skills -| Task | Read this skill file | -|------|---------------------| -| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` | -| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` | -| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` | -| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` | -| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` | -| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` | +| Task | Skill file | +|------|-----------| +| Architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` | +| Blast radius / "What breaks?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` | +| Debugging / "Why is X failing?" 
| `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` | +| Refactoring | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` | +| Tools/resources/schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` | +| CLI commands (index, status, clean, wiki) | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` | -## Cursor Cloud specific instructions - -### Repository structure +## Repo reference -This is a monorepo with two main products and supporting config packages: +### Packages -| Component | Path | Purpose | -|-----------|------|---------| -| **GitNexus CLI/Core** | `gitnexus/` | Main product — TypeScript CLI, indexing pipeline, MCP server. Published to npm. | -| **GitNexus Web UI** | `gitnexus-web/` | React/Vite browser app — graph explorer + AI chat. Runs entirely in WASM. | -| Claude Plugin | `gitnexus-claude-plugin/` | Static config for Claude marketplace (no build). | -| Cursor Integration | `gitnexus-cursor-integration/` | Static config for Cursor editor (no build). | -| SWE-bench Eval | `eval/` | Python evaluation harness (optional; needs Docker + LLM API keys). | +| Package | Path | Purpose | +|---------|------|---------| +| **CLI/Core** | `gitnexus/` | TypeScript CLI, indexing pipeline, MCP server. Published to npm. | +| **Web UI** | `gitnexus-web/` | React/Vite thin client. All queries via `gitnexus serve` HTTP API. | +| **Shared** | `gitnexus-shared/` | Shared TypeScript types and constants. | +| Claude Plugin | `gitnexus-claude-plugin/` | Static config for Claude marketplace. | +| Cursor Integration | `gitnexus-cursor-integration/` | Static config for Cursor editor. | +| Eval | `eval/` | Python evaluation harness (Docker + LLM API keys). 
| ### Running services -- **CLI/Core**: `cd gitnexus && npm run dev` (tsx watch mode) or `npm run build && node dist/cli/index.js ` -- **Web UI**: `cd gitnexus-web && npm run dev` (Vite on port 5173) -- **Backend mode**: `cd && node /workspace/gitnexus/dist/cli/index.js serve` (HTTP API on port 3741 by default) +```bash +cd gitnexus && npm run dev # CLI: tsx watch mode +cd gitnexus-web && npm run dev # Web UI: Vite on port 5173 +npx gitnexus serve # HTTP API on port 4747 (from any indexed repo) +``` ### Testing **CLI / Core (`gitnexus/`)** -- **Unit tests**: `cd gitnexus && npm test` (vitest, ~2000 tests) -- **Integration tests**: `cd gitnexus && npm run test:integration` (vitest, ~1850 tests). Two LadybugDB file-locking tests (`lbug-core-adapter`, `search-core`) may fail in containerized environments due to `/tmp` locking limitations — this is a known environment issue, not a code bug. -- **TypeScript check**: `cd gitnexus && npx tsc --noEmit` +- `npm test` — full vitest suite (~2000 tests) +- `npm run test:unit` — unit tests only +- `npm run test:integration` — integration (~1850 tests). LadybugDB file-locking tests may fail in containers (known env issue). +- `npx tsc --noEmit` — typecheck **Web UI (`gitnexus-web/`)** -- **Unit tests**: `cd gitnexus-web && npm test` (vitest, ~200 tests) -- **E2E tests**: `cd gitnexus-web && E2E=1 npx playwright test` (Playwright, 5 tests — requires `gitnexus serve` + `npm run dev` running) -- **TypeScript check**: `cd gitnexus-web && npx tsc -b --noEmit` +- `npm test` — vitest (~200 tests) +- `npm run test:e2e` — Playwright (7 spec files; requires `gitnexus serve` + `npm run dev`) +- `npx tsc -b --noEmit` — typecheck -No separate lint command is configured; TypeScript strict checking serves as the primary static analysis. +**Pre-commit hook** (`.husky/pre-commit`): formatting (prettier via lint-staged) + typecheck for staged packages. Tests do **not** run in pre-commit — CI only. 
### Gotchas -- `npm install` in `gitnexus/` triggers `prepare` (builds via `tsc`) and `postinstall` (patches tree-sitter-swift). Native tree-sitter bindings require `python3`, `make`, and `g++` to be present. -- `tree-sitter-kotlin` and `tree-sitter-swift` are optional dependencies — install warnings for these are expected and non-blocking. -- The Web UI uses `vite-plugin-wasm` and requires `Cross-Origin-Opener-Policy`/`Cross-Origin-Embedder-Policy` headers for `SharedArrayBuffer` (handled automatically by Vite dev server). -- There is no ESLint/Prettier configuration in this repo. +- `npm install` in `gitnexus/` triggers `prepare` (builds via `tsc`) and `postinstall` (patches tree-sitter-swift, builds tree-sitter-proto). Native bindings need `python3`, `make`, `g++`. +- `tree-sitter-kotlin` and `tree-sitter-swift` are optional — install warnings expected. +- ESLint configured via `eslint.config.mjs` (TS, React Hooks, unused-imports). No `npm run lint` script; use `npx eslint .`. Prettier runs via lint-staged. CI checks both in `ci-quality.yml`. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index ac4f46aefa..1be9468c3f 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,99 +1,129 @@ # Architecture — GitNexus -This repository is a **monorepo** with two main products: the **CLI / MCP package** (`gitnexus/`) and the **browser UI** (`gitnexus-web/`). Supporting folders ship editor integrations and plugins without changing the core graph engine. +Monorepo: **CLI/MCP** (`gitnexus/`) + **browser UI** (`gitnexus-web/`). ## Repository layout | Path | Role | |------|------| -| `gitnexus/` | Published npm package `gitnexus`: CLI, MCP server (stdio), local HTTP API for bridge mode, ingestion pipeline, LadybugDB graph, embeddings (optional). | -| `gitnexus-web/` | Vite + React UI: in-browser indexing (WASM), graph visualization, optional connection to `gitnexus serve`. 
| -| `.claude/`, `gitnexus-claude-plugin/`, `gitnexus-cursor-integration/` | Packaged **skills** and plugin metadata so agents discover the same workflows as documented in `AGENTS.md`. | -| `eval/` | Evaluation harnesses and docs for benchmarking tool usage. | -| `.github/` | CI workflows (quality, unit, integration, E2E) and composite actions. | +| `gitnexus/` | npm package `gitnexus`: CLI, MCP server (stdio), HTTP API, ingestion pipeline, LadybugDB graph, embeddings. | +| `gitnexus-web/` | Vite + React thin client: graph explorer + AI chat. All queries via `gitnexus serve` HTTP API. | +| `gitnexus-shared/` | Shared TypeScript types and constants (consumed by CLI and Web). | +| `.claude/`, `gitnexus-claude-plugin/`, `gitnexus-cursor-integration/` | Agent skills and plugin metadata. | +| `eval/` | Evaluation harnesses for benchmarking tool usage. | +| `.github/` | CI workflows + composite actions (`setup-gitnexus/`, `setup-gitnexus-web/`). | ## End-to-end flow: index → graph → tools -1. **Ingestion** (`gitnexus analyze`) - - Entry: `gitnexus/src/cli/analyze.ts` → `runPipelineFromRepo` in `gitnexus/src/core/ingestion/pipeline.ts`. - - The pipeline is structured as a **DAG (Directed Acyclic Graph)** of named phases (see [Pipeline Phase DAG](#pipeline-phase-dag) below). - - Output is loaded into **LadybugDB** under **`.gitnexus/`** at the repo root (`lbug/`, `meta.json`, etc.). Optional **FTS** indexes and **embeddings** attach to the same store. - - The repo is registered in **`~/.gitnexus/registry.json`** so MCP can find it from any working directory. +1. **Ingestion** — `analyze.ts` → `runFullAnalysis` (`run-analyze.ts`) → `runPipelineFromRepo` (`pipeline.ts`). DAG of 12 phases builds a `KnowledgeGraph` in memory, then loads into LadybugDB under `.gitnexus/`. Repo registered in `~/.gitnexus/registry.json` for MCP discovery. -2. **Persistence & metadata** - - `gitnexus/src/storage/repo-manager.ts` — paths, registry, cleanup of legacy Kuzu artifacts. 
- - `gitnexus/src/core/lbug/lbug-adapter.ts` — graph load, queries, embedding restore batches. +2. **Persistence** — `repo-manager.ts` (paths, registry, KuzuDB cleanup). `lbug-adapter.ts` (graph load, queries, embedding batches). -3. **Query & agents** - - **MCP (stdio):** `gitnexus/src/cli/mcp.ts` → `startMCPServer` → `LocalBackend` (`gitnexus/src/mcp/local/local-backend.ts`) opens registered repos and serves **tools** from `gitnexus/src/mcp/tools.ts` and **resources** from `gitnexus/src/mcp/resources.ts`. - - **Bridge HTTP:** `gitnexus/src/cli/serve.ts` → Express app in `gitnexus/src/server/api.ts` (CORS-limited) exposes REST + MCP-over-HTTP for the web UI. - - **CLI tools (no MCP):** `gitnexus query`, `context`, `impact`, `cypher` in `gitnexus/src/cli/tool.ts` call the same backend for scripts and CI. +3. **Query layer** — three interfaces to the same backend: + - **MCP (stdio):** `mcp.ts` → `LocalBackend` → tools (`tools.ts`) + resources (`resources.ts`) + - **HTTP bridge:** `serve.ts` → Express (`api.ts`, `mcp-http.ts`) for web UI + - **CLI direct:** `gitnexus query|context|impact|cypher` in `tool.ts` -4. **Staleness** - - `gitnexus/src/mcp/staleness.ts` compares indexed `lastCommit` to `HEAD` and surfaces hints when the graph is behind git. +4. **Staleness** — `staleness.ts` compares indexed `lastCommit` to `HEAD`, surfaces hints. -## MCP tools (summary) +## MCP tools | Tool | Purpose | |------|---------| -| `list_repos` | Discover indexed repositories when more than one is registered. | -| `query` | Natural-language / keyword search over the graph (hybrid BM25 + optional vectors). | -| `cypher` | Ad hoc **Cypher** against the schema (see resource `gitnexus://repo/{name}/schema`). | -| `context` | Callers, callees, processes for one symbol (with disambiguation). | -| `impact` | Blast radius (upstream/downstream) with depth and risk summary. | -| `detect_changes` | Map git diffs to affected symbols and processes. 
| -| `rename` | Graph-assisted rename with `dry_run` preview (`graph` vs `text_search` confidence). | +| `list_repos` | Discover indexed repos | +| `query` | Hybrid BM25 + vector search over the graph | +| `cypher` | Ad hoc Cypher against the schema | +| `context` | Callers, callees, processes for one symbol | +| `impact` | Blast radius (upstream/downstream) with risk summary | +| `detect_changes` | Map git diffs to affected symbols and processes | +| `rename` | Graph-assisted multi-file rename with `dry_run` preview | +| `api_impact` | Pre-change impact report for an API route handler | +| `route_map` | API route → handler → consumer mappings | +| `tool_map` | MCP/RPC tool definitions and handlers | +| `shape_check` | Response shape vs consumer property access mismatches | +| `group_list` | List repo groups or details for one group | +| `group_query` | Cross-repo search in a group (reciprocal rank fusion) | +| `group_sync` | Rebuild group Contract Registry (`contracts.json`) | +| `group_contracts` | Inspect group contracts and cross-links | +| `group_status` | Index and Contract Registry staleness per repo in a group | ## Where to change what -| If you are changing… | Start in… | -|----------------------|-----------| -| CLI commands / flags | `gitnexus/src/cli/` (`index.ts`, per-command modules). | -| Parsing or graph construction | `gitnexus/src/core/ingestion/pipeline-phases/` (individual phase files), `pipeline.ts` (orchestrator). | -| Graph schema / DB access | `gitnexus/src/core/lbug/` (`schema.ts`, `lbug-adapter.ts`), `gitnexus/src/mcp/core/lbug-adapter.ts` if MCP-specific. | -| MCP protocol, tools, resources | `gitnexus/src/mcp/server.ts`, `tools.ts`, `resources.ts`. | -| Search ranking | `gitnexus/src/core/search/` (BM25, hybrid fusion). | -| Embeddings | `gitnexus/src/core/embeddings/`, phases in `analyze.ts`. | -| Wiki generation | `gitnexus/src/core/wiki/`. | -| Web UI behavior | `gitnexus-web/src/` (components, workers, graph client). 
| -| CI | `.github/workflows/*.yml`, `.github/actions/setup-gitnexus/`. | +| Concern | Start in | +|---------|----------| +| CLI commands/flags | `src/cli/` (`index.ts`, per-command modules) | +| Parsing/graph construction | `src/core/ingestion/pipeline-phases/` + `pipeline.ts` | +| Graph schema/DB | `src/core/lbug/` (`schema.ts`, `lbug-adapter.ts`) | +| MCP tools/resources | `src/mcp/server.ts`, `tools.ts`, `resources.ts` | +| Search ranking | `src/core/search/` (BM25, hybrid fusion) | +| Embeddings | `src/core/embeddings/` + `src/core/run-analyze.ts` | +| Wiki generation | `src/core/wiki/` | +| Language support | `src/core/ingestion/languages/` + `tree-sitter-queries.ts` + `gitnexus-shared/src/languages.ts` | +| Import resolution | `src/core/ingestion/import-processor.ts` + `model/resolution-context.ts` | +| Call resolution/MRO | `src/core/ingestion/call-processor.ts` + `model/resolve.ts` | +| Type extraction | `src/core/ingestion/type-extractors/` | +| Worker pool | `src/core/ingestion/workers/` | +| Web UI | `gitnexus-web/src/` | +| CI | `.github/workflows/*.yml`, `.github/actions/` | + +> Paths above are relative to `gitnexus/` unless they start with `gitnexus-web/` or `.github/`. + +--- ## Pipeline Phase DAG -The ingestion pipeline is a DAG of named phases. Each phase is defined in its own file under `gitnexus/src/core/ingestion/pipeline-phases/` with explicit dependencies, typed inputs, and typed outputs. +12 phases defined in `gitnexus/src/core/ingestion/pipeline-phases/`, each with explicit `deps` and typed output. 
``` scan → structure → [markdown, cobol] → parse → [routes, tools, orm] → crossFile → mro → communities → processes ``` -### Phase files - -| Phase | File | Dependencies | What it does | -|-------|------|-------------|--------------| -| `scan` | `scan.ts` | (root) | Walk repo filesystem, collect paths + sizes | -| `structure` | `structure.ts` | `scan` | Build File/Folder nodes + CONTAINS edges | -| `markdown` | `markdown.ts` | `structure` | Extract headings and cross-links from .md/.mdx | -| `cobol` | `cobol.ts` | `structure` | Regex-based COBOL/JCL extraction | -| `parse` | `parse.ts` + `parse-impl.ts` | `structure`, `markdown`, `cobol` | Chunked tree-sitter parse, import/call/heritage resolution | -| `routes` | `routes.ts` | `parse` | Route registry (Next.js, Expo, PHP, decorator-based) | -| `tools` | `tools.ts` | `parse` | MCP/RPC tool detection | -| `orm` | `orm.ts` | `parse` | Prisma/Supabase ORM query edges | -| `crossFile` | `cross-file.ts` + `cross-file-impl.ts` | `parse`, `routes`, `tools`, `orm` | Cross-file type propagation in topological order | -| `mro` | `mro.ts` | `crossFile` | Method Resolution Order, METHOD_OVERRIDES edges | -| `communities` | `communities.ts` | `mro` | Leiden community detection | -| `processes` | `processes.ts` | `communities`, `routes`, `tools` | Execution flow detection, Route/Tool → Process links | +| Phase | File | Deps | Output | +|-------|------|------|--------| +| `scan` | `scan.ts` | (root) | File paths + sizes | +| `structure` | `structure.ts` | `scan` | File/Folder nodes, CONTAINS edges, `allPathSet` | +| `markdown` | `markdown.ts` | `structure` | Section nodes, cross-link edges from .md/.mdx | +| `cobol` | `cobol.ts` | `structure` | COBOL program/paragraph/section nodes (regex, no tree-sitter) | +| `parse` | `parse.ts` + `parse-impl.ts` | `structure`, `markdown`, `cobol` | Symbol nodes, IMPORTS/CALLS/EXTENDS edges, extracted routes/tools/ORM queries | +| `routes` | `routes.ts` | `parse` | Route nodes + HANDLES_ROUTE 
edges (Next.js, Expo, PHP, decorators) | +| `tools` | `tools.ts` | `parse` | Tool nodes + HANDLES_TOOL edges | +| `orm` | `orm.ts` | `parse` | QUERIES edges (Prisma, Supabase) | +| `crossFile` | `cross-file.ts` + `cross-file-impl.ts` | `parse`, `routes`, `tools`, `orm` | Cross-file type propagation in topological import order | +| `mro` | `mro.ts` | `crossFile`, `structure` | METHOD_OVERRIDES + METHOD_IMPLEMENTS edges | +| `communities` | `communities.ts` | `mro`, `structure` | Community nodes + MEMBER_OF edges (Leiden algorithm) | +| `processes` | `processes.ts` | `communities`, `routes`, `tools`, `structure` | Process nodes + STEP_IN_PROCESS edges | + +**Non-phase files in the same directory:** `parse-impl.ts`, `cross-file-impl.ts` (implementation), `wildcard-synthesis.ts` (whole-module import expansion), `orm-extraction.ts` (sequential ORM fallback), `types.ts`, `runner.ts`, `index.ts`. + +### DAG runner + +`runner.ts` — static phase graph, no plugins, compile-time type safety. + +1. **Validation** — Kahn's topological sort. Rejects on: duplicate names, missing deps, cycles (DFS traces the concrete cycle path, e.g., `A -> B -> C -> A`, plus count of transitively blocked dependents). + +2. **Execution** — sequential in topological order. Each phase receives: + - `ctx: PipelineContext` — shared mutable `KnowledgeGraph`, `repoPath`, progress callback, options + - `deps: ReadonlyMap` — **declared deps only** (runner filters the results map to prevent hidden coupling) + +3. **Error handling** — wraps phase errors with the phase name, emits terminal `error` progress event, swallows progress handler errors to preserve the original cause. + +4. **Timing** — per-phase `durationMs` in `PhaseResult`, dev-mode console logging. + +**Design patterns:** +- **Single graph accumulator** — all phases mutate the same `KnowledgeGraph` in `ctx`; the graph is the primary output. +- **Typed phase access** — `getPhaseOutput(deps, 'name')` for type-safe upstream results. 
+- **Binding accumulator lifecycle** — created in `parse`, disposed by `crossFile` (in `finally`). No other phase should take ownership.
+- **Skippable phases** — `skipGraphPhases` omits MRO/communities/processes (faster tests). `skipWorkers` forces sequential parsing.
 ### How to add a new phase
-1. Create a new file in `pipeline-phases/` (e.g. `my-phase.ts`)
-2. Define a `PipelinePhase` object with `name`, `deps`, and `execute(ctx, deps)`
-3. Export it from `pipeline-phases/index.ts`
-4. Add it to the `buildPhaseList()` function in `pipeline.ts`
+1. Create `pipeline-phases/my-phase.ts` with a `PipelinePhase` (name, deps, execute)
+2. Export from `pipeline-phases/index.ts`
+3. Add to `buildPhaseList()` in `pipeline.ts`
 ```typescript
-// pipeline-phases/my-phase.ts
-import type { PipelinePhase, PipelineContext, PhaseResult } from './types.js';
+import type { PipelinePhase, PhaseResult } from './types.js';
 import { getPhaseOutput } from './types.js';
 import type { ParseOutput } from './parse.js';
@@ -101,81 +131,168 @@ export interface MyPhaseOutput { /* ... */ }
 export const myPhase: PipelinePhase = {
   name: 'myPhase',
-  deps: ['parse'], // runs after parse completes
+  deps: ['parse'],
   async execute(ctx, deps) {
     const { allPaths } = getPhaseOutput(deps, 'parse');
-    // ... do work, write to ctx.graph ...
+    // ... write to ctx.graph ...
     return { /* typed output */ };
   },
 };
 ```
-### DAG runner
+---
+
+## Language-agnostic graph feeding
+
+16 languages → single unified graph. Four abstraction layers:
+
+```
+  Unified Graph Schema (44 node types, 21 relationship types)
+      ↑
+  Unified Resolution (3-tier name lookup + MRO walk)
+      ↑
+  Language Providers (import semantics, type config, export checker, MRO strategy)
+      ↑
+  Tree-Sitter Queries (per-language S-expressions, unified capture tags)
+```
+
+### Language providers
+
+Each language implements `LanguageProvider` (`language-provider.ts`). Key fields:
+
+| Field | Purpose |
+|-------|---------|
+| `id`, `extensions` | Language identity and file matching |
+| `treeSitterQueries` | S-expression queries for AST extraction |
+| `importSemantics` | `named` / `wildcard-leaf` / `wildcard-transitive` / `namespace` |
+| `importResolver` | Language-specific path → file resolution |
+| `exportChecker` | Public/exported symbol detection |
+| `typeConfig` | Type annotation extraction rules |
+| `mroStrategy` | `first-wins` / `c3` / `none` |
+
+16 providers in `languages/index.ts` via `satisfies Record` — missing a language is a compile error.
+
+### Unified capture tags
+
+Per-language tree-sitter queries use different AST node names but produce the **same semantic capture tags**: `@definition.class`, `@definition.function`, `@call.name`, `@import.source`, `@heritage.extends`. Downstream extraction needs no language branching. Defined in `tree-sitter-queries.ts`.
+
+### Import resolution
+
+Unified 3-tier algorithm (`model/resolution-context.ts`), per-language `importSemantics` controls which tier activates:
+
+| Tier | Confidence | Mechanism |
+|------|-----------|-----------|
+| 1 — same-file | 0.95 | Symbol table for caller's file |
+| 2 — import-scoped | 0.9 | `NamedImportMap` chains (named) or all files in `importMap` (wildcard) |
+| 3 — global | 0.5 | O(1) index lookups: class, impl, callable. Fallback only |
+
+| Import strategy | Languages | Behavior |
+|----------------|-----------|----------|
+| `named` | TS, JS, Java, C#, Rust, PHP, Kotlin | Only explicitly imported names visible |
+| `wildcard-leaf` | Go, Ruby, Swift, Dart | Whole-package import, no transitive re-exports |
+| `wildcard-transitive` | C, C++ | `#include` closure chains through re-exports |
+| `namespace` | Python | Module aliases resolved at call site |
+
+### Chunked parse-and-resolve
+
+`parse` processes files in ~20 MB byte-budget chunks to bound memory. Per chunk:
+1. Worker pool dispatches files (or sequential fallback via `skipWorkers`)
+2. Each worker: detect language → load grammar → run queries → return unified `ParseWorkerResult`
+3. Synthesize wildcard bindings (`wildcard-synthesis.ts`)
+4. Resolve imports and heritage
+5. Collect `BindingAccumulator` entries for cross-file propagation
+
+Workers: `workers/worker-pool.ts`, `workers/parse-worker.ts`.
+
+### Heritage and MRO
-The runner (`pipeline-phases/runner.ts`) validates the DAG at startup (detects cycles and missing deps via topological sort), then executes phases in dependency order. Each phase receives:
-- `ctx: PipelineContext` — shared graph, repoPath, progress callback
-- `deps: Map` — outputs from all upstream phases
+All languages emit unified `ExtractedHeritage` (child, parent, `EXTENDS`/`IMPLEMENTS`). MRO phase walks the heritage graph using per-language strategy:
+- **`first-wins`** — Java, C#, C++, TS, Ruby, Go
+- **`c3`** — Python (C3 linearization)
+- **`none`** — single-inheritance languages
+
+Unified walk: `lookupMethodByOwnerWithMRO()` in `model/resolve.ts`.
+
+---
+
+## Full analysis flow
+
+`runFullAnalysis` in `run-analyze.ts` orchestrates everything around the pipeline:
+
+```
+CLI (analyze.ts) → runFullAnalysis(repoPath, options, callbacks)
+  1. Early exit if lastCommit == HEAD (unless --force) [0%]
+  2. Cache existing embeddings from prior index [0%]
+  3. runPipelineFromRepo() → KnowledgeGraph [0-60%]
+  4. Clean up legacy KuzuDB files [60%]
+  5. initLbug() → loadGraphToLbug() via CSV streaming [60-85%]
+  6. Create FTS indexes (File, Function, Class, Method...) [85-90%]
+  7. Restore cached embeddings (batch insert) [88%]
+  8. Generate new embeddings if --embeddings [90-98%]
+  9. Save metadata + register repo + update .gitignore [98-100%]
+  10. Generate AI context files (AGENTS.md, CLAUDE.md) [100%]
+```
+
+**Options:** `--force` (rebuild regardless), `--embeddings` (opt-in, skipped if >50k nodes), `--skipGit`, `--noStats`.
+
+## Storage
+
+```
+/.gitnexus/
+  ├── lbug # LadybugDB database
+  ├── lbug.wal # Write-ahead log
+  ├── lbug.lock # Single-writer lock
+  └── meta.json # lastCommit, indexedAt, stats
+
+~/.gitnexus/
+  └── registry.json # Global repo registry (MCP discovery)
+```
+
+Managed by `repo-manager.ts`.
+
+## LadybugDB schema
+
+Defined in `lbug/schema.ts`. Separate node tables per type, single `CodeRelation` table.
+
+**Node tables:** File, Folder, Function, Class, Interface, Method, Constructor, CodeElement, Struct, Enum, Macro, Typedef, Union, Namespace, Trait, Impl, TypeAlias, Const, Static, Property, Record, Delegate, Annotation, Template, Module, Community, Process, Route, Tool, Section, Embedding.
+
+**Relation types** (`CodeRelation.type`): CONTAINS, DEFINES, CALLS, IMPORTS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, STEP_IN_PROCESS, HANDLES_ROUTE, FETCHES, HANDLES_TOOL, ENTRY_POINT_OF.
+
+## Embeddings and search
+
+**Embeddings** (`src/core/embeddings/`): Snowflake arctic-embed-xs (384D). Embeddable: File, Function, Class, Method, Interface. Incremental via SHA1 content hash. Separate `Embedding` table.
+
+**Search** (`src/core/search/`): Hybrid BM25 + semantic vector, merged via Reciprocal Rank Fusion (K=60).
 ## Known limitations
 ### Overloaded method resolution
-Method and Constructor node IDs include an arity suffix (`#`) to
-disambiguate overloaded methods. Two overloads with different parameter counts
-produce distinct graph nodes: `Method:file:Class.method#1` vs
-`Method:file:Class.method#2`.
-
-**Same-arity overload disambiguation:** When two overloads share the same
-parameter count but differ in types (e.g. `save(int)` vs `save(String)`), a
-type-hash suffix `~type1,type2` is appended to produce distinct node IDs:
-`Method:file:Class.save#1~int` vs `Method:file:Class.save#1~String`. The suffix
-is only added when a same-arity collision is detected within a class and all
-parameters have non-null type annotations. Languages without type info (Python,
-Ruby, JS) fall back to arity-only IDs. TypeScript/JavaScript overload signatures
-are intentionally excluded from type-hashing because they are declaration-only
-contracts that should collapse to the implementation body's node ID. See issue
-\#651.
-
-**C++ const-qualified overload disambiguation:** Methods overloaded by const
-qualification (e.g. `begin()` vs `begin() const`) are disambiguated via an
-`isConst` property and a `$const` ID suffix appended to the const-qualified
-variant when a non-const collision exists. The `$const` suffix appears after the
-type-hash suffix: e.g. `Method:file:Container.begin#0$const`.
-
-**Generic/template type preservation in type-hash:** The type-hash suffix uses
-`rawType` (full AST text including generic/template args) rather than the
-simplified `type` from `extractSimpleTypeName`. This means C++ template overloads
-like `process(vector)` vs `process(vector)` produce distinct IDs:
-`~vector` vs `~vector`. Java generic overloads like
-`process(List)` vs `process(List)` are a compile error due to
-type erasure, so this gap is theoretical for Java.
-
-**ID stability on first overload:** Type and const tags are collision-only. When
-a class has `save(int)` as its only `save` method, the ID is `save#1` (no tag).
-Adding `save(String)` changes the original to `save#1~int`. This is correct for
-fresh analysis but means IDs are not stable across overload additions. Future
-incremental re-analysis should account for this.
-
-**Variadic method matching:** When one side is variadic (`parameterCount`
-undefined) and the other has a fixed count, `METHOD_IMPLEMENTS` edges are
-emitted with confidence 0.7 instead of 1.0. Variadic methods like
-`foo(String... args)` may superficially match `foo(String s)` by type but
-are not guaranteed to be interchangeable across all languages (Java/Kotlin
-accept this via varargs sugar; TypeScript, C#, Rust do not).
-
-**Confidence tiering** for `METHOD_IMPLEMENTS` edges:
-
-| Match quality | Confidence | When |
-|---|---|---|
-| Exact parameter types match | 1.0 | Both sides have `parameterTypes` arrays and they match |
-| Arity (count) matches | 1.0 | Both sides have `parameterCount`, types unavailable |
-| Variadic vs fixed | 0.7 | One side is variadic, other has fixed count |
-| Lenient (insufficient info) | 0.7 | One or both sides lack type and count data |
+Node IDs use arity suffix (`#`): `Method:file:Class.method#1` vs `#2`.
+
+**Same-arity disambiguation:** type-hash suffix `~type1,type2` when collision detected and type annotations present. Languages without types (Python, Ruby, JS) use arity-only. TS/JS overload signatures excluded (collapse to implementation body). See #651.
+
+**C++ const-qualified:** `$const` suffix after type-hash when non-const collision exists: `Method:file:Container.begin#0$const`.
+
+**Generic/template types:** type-hash uses `rawType` (full AST text including generics): `~vector` vs `~vector`.
+
+**ID stability:** collision-only tags mean IDs change when overloads are added. `save#1` becomes `save#1~int` when `save(String)` is added.
+
+**Variadic matching:** confidence 0.7 when one side is variadic and the other has fixed count.
+
+**METHOD_IMPLEMENTS confidence tiering:**
+
+| Match quality | Confidence |
+|---|---|
+| Exact parameter types match | 1.0 |
+| Arity match, types unavailable | 1.0 |
+| Variadic vs fixed | 0.7 |
+| Insufficient info | 0.7 |
 ## Related docs
-- [MIGRATION.md](MIGRATION.md) — breaking changes and migration guidance.
-- [RUNBOOK.md](RUNBOOK.md) — operational commands and recovery.
-- [GUARDRAILS.md](GUARDRAILS.md) — safety boundaries for humans and agents.
-- [TESTING.md](TESTING.md) — how to run tests.
-- `AGENTS.md` / `CLAUDE.md` — agent workflows and tool usage expectations for **this** repo when indexed by GitNexus.
+- [MIGRATION.md](MIGRATION.md) — breaking changes and migration guidance
+- [RUNBOOK.md](RUNBOOK.md) — operational commands and recovery
+- [GUARDRAILS.md](GUARDRAILS.md) — safety boundaries for humans and agents
+- [TESTING.md](TESTING.md) — how to run tests
+- `AGENTS.md` / `CLAUDE.md` — agent workflows and tool usage
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d2d48f017a..22104edb47 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -62,7 +62,7 @@ Commits within a PR may use any style — only the **merged PR title** shows up
 - [ ] Typecheck passes: `npx tsc --noEmit` in `gitnexus/` and `npx tsc -b --noEmit` in `gitnexus-web/`.
 - [ ] No secrets, tokens, or machine-specific paths committed.
 - [ ] Documentation updated if behavior or public CLI/MCP contract changes.
-- [ ] Pre-commit hook runs clean (`.husky/pre-commit` — typecheck + unit tests for staged packages).
+- [ ] Pre-commit hook runs clean (`.husky/pre-commit` — formatting via lint-staged + typecheck for staged packages; tests run in CI only).
 ## Code review
diff --git a/GUARDRAILS.md b/GUARDRAILS.md
index 7401c79c56..ac48ab906e 100644
--- a/GUARDRAILS.md
+++ b/GUARDRAILS.md
@@ -1,72 +1,69 @@
-# Guardrails — GitNexus (repo + agents)
+# Guardrails — GitNexus
-Rules for **human contributors** and **AI agents** working on this codebase or publishing artifacts. These complement `AGENTS.md` / `CLAUDE.md` (which focus on GitNexus-in-GitNexus workflows).
+Rules for **human contributors** and **AI agents**. Complements `AGENTS.md` (workflows) and `CONTRIBUTING.md` (PR process).
-## Scope (typical agent session)
+## Scope (least privilege)
-When automating changes in this repository, treat scope as **least privilege**:
+- **Read:** Source, tests, docs, public config as needed.
+- **Write:** Only files required for the fix or feature; no unrelated formatting or refactors.
+- **Execute:** Tests, typecheck, documented CLI commands. No destructive commands on user data without approval.
+- **Off-limits:** Other people's machines, production deployments you don't own, credentials you lack permission to use.
-- **Read:** Source, tests, docs, public config as needed for the task.
-- **Write:** Only files required for the requested fix or feature; avoid unrelated formatting or refactors.
-- **Execute:** Tests, typecheck, and documented CLI commands; do not run destructive commands on user data outside the repo without explicit approval.
-- **Off-limits:** Other people’s machines, production deployments you don’t own, and credentials you didn’t receive permission to use.
-
-Adjust explicitly if the maintainer defines a different scope for a task.
+Maintainer may widen scope per task.
 ---
 ## Non-negotiables
-1. **Never commit secrets** — API keys, tokens, `.env` with real values, private URLs, or session cookies. Use `.env.example` with placeholders only.
-2. **Never rename symbols with blind find-and-replace** when working in a GitNexus-indexed project — use the **`rename` MCP tool** with **`dry_run: true` first**, then review `graph` vs `text_search` edits. (There is no separate `gitnexus rename` CLI; renaming goes through MCP or editor integration.)
-3. **Run impact analysis before editing shared symbols** — use **`impact`** (upstream) for functions/classes/methods others call; do not ignore **HIGH** / **CRITICAL** risk without maintainer sign-off.
-4. **Prefer `detect_changes` before commit** — confirm diffs map to expected symbols/processes when the graph is available.
-5. **Preserve embeddings** — if `.gitnexus/meta.json` shows embeddings, run `npx gitnexus analyze --embeddings` when refreshing the index; plain `analyze` can drop them.
+1. **Never commit secrets** — API keys, tokens, real `.env` values, private URLs, session cookies. Use `.env.example` with placeholders.
+2. **Never rename with find-and-replace** in GitNexus-indexed projects — use `rename` MCP tool with `dry_run: true` first, review `graph` vs `text_search` edits. No separate `gitnexus rename` CLI exists.
+3. **Run impact analysis before editing shared symbols** — `impact` (upstream) for functions/classes/methods others call. Do not ignore HIGH/CRITICAL without maintainer sign-off.
+4. **Run `detect_changes` before commit** — confirm diffs map to expected symbols/processes when the graph is available.
+5. **Preserve embeddings** — if `.gitnexus/meta.json` shows embeddings, use `npx gitnexus analyze --embeddings`; plain `analyze` drops them.
 ---
 ## Signs (recurring failure patterns)
-Use this format: **Trigger → Instruction → Reason**.
-Append new Signs here when the same mistake repeats (e.g. CI broken twice the same way).
+Format: **Trigger → Instruction → Reason**. Append new Signs when the same mistake repeats.
-### Sign: Stale graph after edits
+### Stale graph after edits
-- **Trigger:** MCP or resources warn the index is behind `HEAD`, or code search doesn’t match latest commit.
-- **Instruction:** Run `npx gitnexus analyze` from the repo root (plus `--embeddings` if the project used them).
-- **Reason:** Tools query LadybugDB built at last analyze; git changes are invisible until re-indexed.
+- **Trigger:** MCP warns index is behind `HEAD`, or search doesn't match latest commit.
+- **Do:** `npx gitnexus analyze` (plus `--embeddings` if used).
+- **Why:** Tools query LadybugDB from last analyze; git changes are invisible until re-indexed.
-### Sign: Embeddings vanished after analyze
+### Embeddings vanished after analyze
-- **Trigger:** Semantic search quality drops; `stats.embeddings` in `.gitnexus/meta.json` is 0 after a refresh.
-- **Instruction:** Re-run `npx gitnexus analyze --embeddings` and confirm `meta.json` reflects stored embeddings.
-- **Reason:** Embedding generation is opt-in; analyze without the flag does not preserve prior vectors.
+- **Trigger:** Semantic search quality drops; `stats.embeddings` in `meta.json` is 0 after refresh.
+- **Do:** `npx gitnexus analyze --embeddings`, confirm `meta.json` reflects stored embeddings.
+- **Why:** Embedding generation is opt-in; analyze without the flag does not preserve prior vectors.
-### Sign: MCP lists no repos
+### MCP lists no repos
-- **Trigger:** MCP stderr says no indexed repos.
-- **Instruction:** Run `npx gitnexus analyze` in the target repository; verify `npx gitnexus list` shows it.
-- **Reason:** The MCP server discovers repos via `~/.gitnexus/registry.json`, populated by analyze.
+- **Trigger:** MCP stderr says no indexed repos.
+- **Do:** `npx gitnexus analyze` in the target repo; verify `npx gitnexus list` shows it.
+- **Why:** MCP discovers repos via `~/.gitnexus/registry.json`, populated by analyze.
-### Sign: Wrong repo in multi-repo setups
+### Wrong repo in multi-repo setups
-- **Trigger:** Query/impact results clearly belong to another project.
-- **Instruction:** Call `list_repos`, then pass **`repo`** on subsequent tools (or use per-workspace MCP config).
-- **Reason:** Default target may be ambiguous when multiple repos are registered.
+- **Trigger:** Query/impact results belong to another project.
+- **Do:** Call `list_repos`, then pass `repo` on subsequent tools.
+- **Why:** Default target is ambiguous when multiple repos are registered.
-### Sign: LadybugDB lock / “database busy”
+### LadybugDB lock / "database busy"
-- **Trigger:** Errors opening `.gitnexus/lbug` while MCP and analyze both run.
-- **Instruction:** Stop overlapping processes; one writer at a time. Retry analyze or restart MCP.
-- **Reason:** Embedded DB expects single-process ownership of the store.
+- **Trigger:** Errors opening `.gitnexus/lbug` while MCP and analyze both run.
+- **Do:** Stop overlapping processes (one writer at a time). Retry analyze or restart MCP.
+- **Why:** Embedded DB expects single-process ownership.
 ---
 ## Publishing & supply chain
-- **npm:** Do not publish from unreviewed automation; follow maintainer release process. Bump version intentionally; tag releases to match `package.json`.
-- **Dependencies:** Prefer minimal, auditable changes to `package.json`; run tests and CI after lockfile updates.
-- **License:** This project ships under **PolyForm Noncommercial 1.0.0** — do not relicense or imply a different license in docs or metadata without maintainer approval.
+- **npm:** Do not publish from unreviewed automation. Bump version intentionally; tag releases to match `package.json`.
+- **Dependencies:** Minimal, auditable `package.json` changes; run tests and CI after lockfile updates.
+- **License:** PolyForm Noncommercial 1.0.0 — do not relicense without maintainer approval.
 ---
@@ -74,15 +71,15 @@ Append new Signs here when the same mistake repeats (e.g. CI broken twice the sa
 Stop and ask a **human maintainer** when:
-- Impact analysis shows **HIGH** / **CRITICAL** risk and the task still requires the change.
-- You need to alter **CI**, **release**, or **security-sensitive** config.
-- Requirements conflict (e.g. “speed up analyze” vs “must keep all embeddings on huge repo”).
+- Impact analysis shows HIGH/CRITICAL risk and the task still requires the change.
+- You need to alter CI, release, or security-sensitive config.
+- Requirements conflict (e.g. "speed up analyze" vs "must keep all embeddings on huge repo").
 - You are unsure whether data loss is acceptable (`clean`, forced migrations, schema changes).
 ---
 ## Related docs
-- [ARCHITECTURE.md](ARCHITECTURE.md) — components and data flow.
-- [RUNBOOK.md](RUNBOOK.md) — commands for recovery.
-- [CONTRIBUTING.md](CONTRIBUTING.md) — PR and commit expectations.
+- [ARCHITECTURE.md](ARCHITECTURE.md) — components and data flow
+- [RUNBOOK.md](RUNBOOK.md) — commands for recovery
+- [CONTRIBUTING.md](CONTRIBUTING.md) — PR and commit expectations
diff --git a/TESTING.md b/TESTING.md
index 8d267983a2..cf481d32b2 100644
--- a/TESTING.md
+++ b/TESTING.md
@@ -20,9 +20,9 @@ From repository root, unless noted:
 cd gitnexus
 npm install
 npm run build
-npm test # unit: vitest run test/unit
+npm test # full suite: vitest run
+npm run test:unit # unit only: vitest run test/unit
 npm run test:integration # integration suite
-npm run test:all
 npm run test:coverage
 npx tsc --noEmit # typecheck (matches CI)
 ```
@@ -42,8 +42,11 @@ npm run test:e2e # Playwright (requires gitnexus serve + npm run dev)
 A husky pre-commit hook (`.husky/pre-commit`) runs automatically on every `git commit`:
-- **`gitnexus-web/` files staged** → `tsc -b --noEmit` + `vitest run`
-- **`gitnexus/` files staged** → `tsc --noEmit` + `vitest run --project default`
+1. **Formatting** — `lint-staged` runs prettier on staged files
+2. **`gitnexus-web/` files staged** → `tsc -b --noEmit`
+3. **`gitnexus/` files staged** → `tsc --noEmit`
+
+Tests do **not** run in the pre-commit hook — they run in CI (`ci-tests.yml`) only.
 Skip with `git commit --no-verify` (use sparingly).
@@ -77,7 +80,7 @@ Re-run the full relevant suite when:
 GitHub Actions (`.github/workflows/ci.yml`) orchestrate:
-- **`ci-quality.yml`** — `tsc --noEmit` for `gitnexus/` + `tsc -b --noEmit` for `gitnexus-web/`
+- **`ci-quality.yml`** — prettier format check, eslint lint, `tsc --noEmit` for `gitnexus/`, `tsc -b --noEmit` for `gitnexus-web/`
 - **`ci-tests.yml`** — `vitest run` with coverage (ubuntu) + cross-platform (macOS, Windows)
 - **`ci-e2e.yml`** — Playwright E2E tests, gated on `gitnexus-web/**` changes
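
For readers of the DAG runner notes in this diff, the validation step (Kahn's topological sort rejecting duplicate names, missing deps, and cycles) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `runner.ts`: the `PhaseSpec` shape and the `topoSortPhases` name are hypothetical, and unlike the runner described above it only lists the unresolved phases instead of tracing the concrete cycle path with DFS.

```typescript
// Hypothetical shape: only `name` and `deps` are taken from the phase description.
interface PhaseSpec {
  name: string;
  deps: string[];
}

// Kahn's algorithm: repeatedly emit phases whose dependencies have all been emitted.
// Throws on duplicate names, missing deps, or cycles.
function topoSortPhases(phases: PhaseSpec[]): string[] {
  const byName = new Map<string, PhaseSpec>();
  for (const p of phases) {
    if (byName.has(p.name)) throw new Error(`Duplicate phase name: ${p.name}`);
    byName.set(p.name, p);
  }

  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const p of phases) {
    inDegree.set(p.name, p.deps.length);
    for (const d of p.deps) {
      if (!byName.has(d)) throw new Error(`Phase ${p.name} depends on missing phase: ${d}`);
      dependents.set(d, [...(dependents.get(d) ?? []), p.name]);
    }
  }

  // Start from phases with no dependencies (the roots of the graph).
  const ready = phases.filter((p) => p.deps.length === 0).map((p) => p.name);
  const order: string[] = [];
  while (ready.length > 0) {
    const name = ready.shift()!;
    order.push(name);
    for (const next of dependents.get(name) ?? []) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Anything never emitted is on a cycle or blocked behind one.
  if (order.length !== phases.length) {
    const stuck = phases.map((p) => p.name).filter((n) => !order.includes(n));
    throw new Error(`Cycle detected; unresolved phases: ${stuck.join(', ')}`);
  }
  return order;
}

// Usage with the first five phases from the phase table.
const phaseOrder = topoSortPhases([
  { name: 'scan', deps: [] },
  { name: 'structure', deps: ['scan'] },
  { name: 'markdown', deps: ['structure'] },
  { name: 'cobol', deps: ['structure'] },
  { name: 'parse', deps: ['structure', 'markdown', 'cobol'] },
]);
console.log(phaseOrder.join(' -> '));
```

Because the sort is computed once up front, a runner built this way can execute phases sequentially in the returned order and hand each phase only the results of its declared deps.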