diff --git a/README.md b/README.md
index 3248a48f..4faea949 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,20 @@
 [![Python](https://img.shields.io/pypi/pyversions/bicameral-mcp)](https://pypi.org/project/bicameral-mcp/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![CI](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/test-mcp-regression.yml?branch=main&label=tests)](https://github.com/BicameralAI/bicameral-mcp/actions)
+[![Lint + Types](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/lint-and-typecheck.yml?branch=main&label=lint%2Btypes)](https://github.com/BicameralAI/bicameral-mcp/actions/workflows/lint-and-typecheck.yml)
+[![Secret scan](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/secret-scan.yml?branch=main&label=secret-scan)](https://github.com/BicameralAI/bicameral-mcp/actions/workflows/secret-scan.yml)
 
 AI agents ship code fast. They forget what your team agreed — and requirement gaps surfaced mid-implementation are buried under thousands of lines of code.
 
 Bicameral MCP is a **spec compliance layer** for AI-assisted engineering. Local-first; runs as an [MCP server](https://spec.modelcontextprotocol.io/). It ingests your meeting transcripts, PRDs, and Slack threads, captures any mid-implementation decision that was not discussed, to be ratified async by your product owner, and pins each one to the code that implements it — so your agent finds out the moment it drifts from either the written spec or the spoken one.
 
+| | |
+|---|---|
+| **Maturity** | Published on PyPI; local-first MCP server; Solo + Team modes (`setup` wizard picks at install) |
+| **Footprint** | Embedded SurrealDB in-process — no separate server, no daemon; install via `uv` or `pip` |
+| **Trust boundary** | The OS user account. Code, decisions, and transcripts stay on your machine unless you opt into Team mode (which shares an append-only event file via a substrate *you* own) |
+| **Assurance** | Phase-gated regression suite on real adapters (`memory://`); sociable handler/ledger tests; lint+types and secret-scan CI gates. Broader security/governance gates tracked in [#557](https://github.com/BicameralAI/bicameral-mcp/issues/557) |
+
 ---
 
 ## Quickstart
diff --git a/tests/README.md b/tests/README.md
index d784215e..f780797f 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -1,54 +1,76 @@
 # MCP Regression Tests
 
-Tests are gated by phase. Each phase gate is an env var. Run only what's implemented.
+The suite is **phase-gated**: each phase layers on the previous one and is toggled
+by an environment variable, so you can run only what is wired up locally. All
+phases run against **real adapters** — the legacy mock layer is retired (see
+`mocks/README.md` for history). In tests the embedded SurrealDB runs in-process
+via `SURREAL_URL=memory://` (no server, no persistence).
 
-## Running tests
+## Quickstart
 
 ```bash
-source .venv/bin/activate # or: python -m pytest directly via .venv/bin/pytest
+source .venv/bin/activate # or call .venv/bin/pytest directly
 
-# Packaging / startup smoke
+# Packaging / startup smoke — registers and lists every MCP tool
 bicameral-mcp --smoke-test
 
-# Phase 0 — always green (mocks only, no dependencies)
-pytest tests/test_phase0_mocks.py -v
-
-# Phase 1 — requires real code locator (Silong's work)
-USE_REAL_CODE_LOCATOR=1 REPO_PATH=/path/to/repo pytest tests/test_phase1_code_locator.py -v
-
-# Phase 2 — embedded SurrealDB path for tests
-USE_REAL_LEDGER=1 SURREAL_URL=memory:// pytest tests/test_phase2_ledger.py -v
-
-# Phase 3 — full integration (requires both)
-USE_REAL_CODE_LOCATOR=1 USE_REAL_LEDGER=1 SURREAL_URL=memory:// REPO_PATH=/path/to/repo pytest tests/test_phase3_integration.py -v
-
-# All phases at once (use for CI once all phases are complete)
-pytest tests/ -v
+# Full suite, the way CI runs it
+SURREAL_URL=memory:// pytest tests/ -v
 ```
 
-## Phase status
+## Phase gates
 
-| File | Passes without dependencies | Unblocked by |
-|------|-----------------------------|--------------|
-| `test_phase0_mocks.py` | YES | — |
-| `test_phase1_code_locator.py` | NO | real code locator index + provider credentials |
-| `test_phase2_ledger.py` | NO | `USE_REAL_LEDGER=1` + `memory://` or SurrealDB URL |
-| `test_phase3_integration.py` | NO | Both Phase 1 + Phase 2 complete |
+| Phase | File | Gate (env) | Validates |
+|---|---|---|---|
+| 1 | `test_phase1_code_locator.py` | `USE_REAL_CODE_LOCATOR=1` + `REPO_PATH=…` | Code-locator correctness: located paths exist on disk, symbols are real repo names, confidence in range |
+| 2 | `test_phase2_ledger.py` | `USE_REAL_LEDGER=1` + `SURREAL_URL=memory://` | Ledger correctness: idempotent ingest, BM25 search relevance, file→decision reverse traversal, `link_commit` status updates |
+| 3 | `test_phase3_integration.py` | Both of the above | End-to-end: ingest transcript → code locator → graph store → query-back coheres |
 
-## What each phase validates
+```bash
+# Phase 1
+USE_REAL_CODE_LOCATOR=1 REPO_PATH=/path/to/repo pytest tests/test_phase1_code_locator.py -v
 
-**Phase 0**: Contract shapes. Do all 4 tools return valid Pydantic types? Are all required fields present?
+# Phase 2
+USE_REAL_LEDGER=1 SURREAL_URL=memory:// pytest tests/test_phase2_ledger.py -v
 
-**Phase 1**: Code locator correctness. Do located file paths exist on disk? Are symbols real names from the repo? Is confidence in the expected range?
+# Phase 3 (full integration — needs both gates)
+USE_REAL_CODE_LOCATOR=1 USE_REAL_LEDGER=1 SURREAL_URL=memory:// REPO_PATH=/path/to/repo \
+  pytest tests/test_phase3_integration.py -v
+```
 
-**Phase 2**: Ledger correctness. Is ingestion idempotent? Does BM25 search return relevant results? Does reverse traversal (file → decisions) work? Does `link_commit` update statuses correctly?
+## Environment variables
 
-**Phase 3**: End-to-end pipeline. Does ingesting a sample transcript + running code locator + storing in graph + querying back produce a coherent result?
+| Var | Default | Effect |
+|---|---|---|
+| `SURREAL_URL` | `memory://` | Ledger URL for tests (in-process, no persistence). Override when exercising a persistent SurrealKV path. |
+| `USE_REAL_CODE_LOCATOR` | unset | Gate phase-1/3 code-locator tests on a real tree-sitter index. |
+| `USE_REAL_LEDGER` | unset | Gate phase-2/3 tests on a real embedded SurrealDB adapter. |
+| `REPO_PATH` | `.` | Repo the code locator indexes. |
 
 ## Packaging smoke
 
-The installable package surface is now the first startup check:
+The installable surface is the first startup check:
 
-1. `pip install -r requirements.txt`
+1. `pip install -e ".[test]"`
 2. `bicameral-mcp --smoke-test`
-3. Verify the command prints the 5 registered tool names
+3. It prints the server name/version and **every registered MCP tool name** — 20
+   today (18 `bicameral.*` ledger/session tools + the 2 code-locator primitives
+   `validate_symbols` and `get_neighbors`). The asserted source of truth is
+   `EXPECTED_TOOL_NAMES` in `server.py`; the smoke test fails if the live registry
+   drifts from it. The user-facing subset is documented in the root `README.md`
+   § MCP Tools Reference.
+
+## Sociable testing
+
+Handler and ledger tests default to **sociable** units (real `memory://` adapter,
+`SimpleNamespace` ctx) — not mocks. The full contract and the reference patterns
+are in the repo-root `CLAUDE.md` § "Sociable Testing for UX Paths".
+
+## What CI runs
+
+`.github/workflows/test-mcp-regression.yml` runs the phase suites plus the ledger,
+schema-recovery, replay-determinism, extractor-parity, shadow-dispatch, and
+dashboard tests in a single `pytest` invocation against `SURREAL_URL=memory://`,
+then uploads JUnit XML + a self-contained HTML report as artifacts. The end-to-end
+user-flow suite is separate and currently shelved to manual dispatch — see
+`tests/e2e/README.md`.
diff --git a/tests/e2e/README.md b/tests/e2e/README.md
index 567fb4e8..b29ca823 100644
--- a/tests/e2e/README.md
+++ b/tests/e2e/README.md
@@ -1,14 +1,20 @@
 # v0 user flow e2e
 
-End-to-end validation of `BicameralAI/bicameral#108`'s six canonical user
-flows, driven by **real Claude Code CLI sessions** with `bicameral-mcp`
-registered as an MCP server. Test fixture: a pinned commit of
-`github.com/desktop/desktop`, with `docs/process/roadmap.md` as ingest
-content.
-
-This is the canonical CI test for the spec. The handler-replay simulation
-at `scripts/sim_issue_108_flows.py` complements it for fast local iteration
-on handler logic without burning Claude API calls.
+End-to-end validation of the canonical user flows in
+`BicameralAI/bicameral#108`, driven by **real Claude Code CLI sessions** with
+`bicameral-mcp` registered as an MCP server. Five flows (1–5) are automated.
+Test fixture: a pinned commit of `github.com/desktop/desktop`, with
+`docs/process/roadmap.md` as ingest content.
+
+> **Status: shelved to manual dispatch (#556).** This suite is no longer a PR
+> gate. The harness accumulated maintenance debt — API-key credit exhaustion,
+> agent-budget non-determinism (#272), and twice-reworked auth (#528, #540) —
+> that blocked PRs without actionable signal. The test code and prompts are
+> preserved; run it manually via **Actions → v0 user flow e2e → Run workflow**.
+> A replacement validation strategy is tracked in RFQ #555.
+
+The handler-replay simulation at `scripts/sim_issue_108_flows.py` is the fast
+local path for iterating on handler logic without burning Claude API calls.
 
 ## What it tests
 
@@ -66,10 +72,13 @@ per flow.
 
 ## CI
 
-GitHub Actions workflow: `.github/workflows/v0-user-flow-e2e.yml`.
+GitHub Actions workflow: `.github/workflows/v0-user-flow-e2e.yml` —
+**dispatch-only (shelved, #556)**.
 
-- Triggers on PRs touching `tests/e2e/**`, `handlers/**`, `ledger/**`,
-  `contracts.py`, `skills/bicameral-*/**`, or the workflow itself.
+- **No PR trigger.** Run manually: Actions → *v0 user flow e2e* → *Run workflow*.
+  (It previously triggered on PRs touching `tests/e2e/**`, `handlers/**`,
+  `ledger/**`, `contracts.py`, or `skills/bicameral-*/**`.)
+- Replacement validation strategy: RFQ #555.
 - Runs in the `ci-test` GitHub environment for `ANTHROPIC_API_KEY` (switched
   from `production` + `CLAUDE_CODE_OAUTH_TOKEN` in #528 after the org
   subscription was disabled).
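
Review note on the phase gates in the `tests/README.md` hunk: the gate table can be read as a small lookup from test file to the environment variables that must be set. The sketch below is only an illustration of that reading, not code from the repository. `PHASE_GATES` and `enabled_phase_files` are hypothetical names, and treating a gate as "on" only when the variable equals `"1"` is an assumption based on the `USE_REAL_…=1` commands shown in the diff.

```python
from typing import Mapping

# Hypothetical lookup transcribed from the "Phase gates" table in the diff:
# each phase file maps to the env vars that must all be enabled for it to run.
PHASE_GATES = {
    "tests/test_phase1_code_locator.py": ("USE_REAL_CODE_LOCATOR",),
    "tests/test_phase2_ledger.py": ("USE_REAL_LEDGER",),
    "tests/test_phase3_integration.py": ("USE_REAL_CODE_LOCATOR", "USE_REAL_LEDGER"),
}


def enabled_phase_files(env: Mapping[str, str]) -> list[str]:
    """Return the phase files whose gate variables are all set to "1".

    Assumption: a gate counts as enabled only for the exact value "1";
    the real suite may accept other truthy values.
    """
    return [
        path
        for path, gates in PHASE_GATES.items()
        if all(env.get(var) == "1" for var in gates)
    ]
```

Calling `enabled_phase_files(os.environ)` after `import os` would list the phase files a local shell is currently set up to exercise; with neither gate set, the list is empty.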