BicameralAI · jinhongkuan · Jun 9, 2026 · Jun 8, 2026 · Jun 9, 2026
@@ -8,11 +8,20 @@
 [![Python](https://img.shields.io/pypi/pyversions/bicameral-mcp)](https://pypi.org/project/bicameral-mcp/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![CI](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/test-mcp-regression.yml?branch=main&label=tests)](https://github.com/BicameralAI/bicameral-mcp/actions)
+[![Lint + Types](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/lint-and-typecheck.yml?branch=main&label=lint%2Btypes)](https://github.com/BicameralAI/bicameral-mcp/actions/workflows/lint-and-typecheck.yml)
+[![Secret scan](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/secret-scan.yml?branch=main&label=secret-scan)](https://github.com/BicameralAI/bicameral-mcp/actions/workflows/secret-scan.yml)
 
 AI agents ship code fast. They forget what your team agreed — and requirement gaps surfaced mid-implementation are buried under thousands of lines of code.
 
 Bicameral MCP is a **spec compliance layer** for AI-assisted engineering. Local-first; runs as an [MCP server](https://spec.modelcontextprotocol.io/). It ingests your meeting transcripts, PRDs, and Slack threads; captures any mid-implementation decision that was never discussed, so your product owner can ratify it asynchronously; and pins each decision to the code that implements it, so your agent finds out the moment it drifts from either the written spec or the spoken one.
 
+| | |
+|---|---|
+| **Maturity** | Published on PyPI; local-first MCP server; Solo and Team modes (the `setup` wizard selects the mode at install) |
+| **Footprint** | Embedded SurrealDB in-process — no separate server, no daemon; install via `uv` or `pip` |
+| **Trust boundary** | The OS user account. Code, decisions, and transcripts stay on your machine unless you opt into Team mode (which shares an append-only event file via a substrate *you* own) |
+| **Assurance** | Phase-gated regression suite on real adapters (`memory://`); sociable handler/ledger tests; lint+types and secret-scan CI gates. Broader security/governance gates tracked in [#557](https://github.com/BicameralAI/bicameral-mcp/issues/557) |
+
 ---
 
 ## Quickstart

@@ -1,54 +1,76 @@
 # MCP Regression Tests
 
-Tests are gated by phase. Each phase gate is an env var. Run only what's implemented.
+The suite is **phase-gated**: each phase layers on the previous one and is toggled
+by an environment variable, so you can run only what is wired up locally. All
+phases run against **real adapters** — the legacy mock layer is retired (see
+`mocks/README.md` for history). In tests the embedded SurrealDB runs in-process
+via `SURREAL_URL=memory://` (no server, no persistence).
 
-## Running tests
+## Quickstart
 
 ```bash
-source .venv/bin/activate  # or: python -m pytest directly via .venv/bin/pytest
+source .venv/bin/activate            # or call .venv/bin/pytest directly
 
-# Packaging / startup smoke
+# Packaging / startup smoke — registers and lists every MCP tool
 bicameral-mcp --smoke-test
 
-# Phase 0 — always green (mocks only, no dependencies)
-pytest tests/test_phase0_mocks.py -v
-
-# Phase 1 — requires real code locator (Silong's work)
-USE_REAL_CODE_LOCATOR=1 REPO_PATH=/path/to/repo pytest tests/test_phase1_code_locator.py -v
-
-# Phase 2 — embedded SurrealDB path for tests
-USE_REAL_LEDGER=1 SURREAL_URL=memory:// pytest tests/test_phase2_ledger.py -v
-
-# Phase 3 — full integration (requires both)
-USE_REAL_CODE_LOCATOR=1 USE_REAL_LEDGER=1 SURREAL_URL=memory:// REPO_PATH=/path/to/repo pytest tests/test_phase3_integration.py -v
-
-# All phases at once (use for CI once all phases are complete)
-pytest tests/ -v
+# Full suite, the way CI runs it
+SURREAL_URL=memory:// pytest tests/ -v
 ```
 
-## Phase status
+## Phase gates
 
-| File | Passes without dependencies | Unblocked by |
-|------|-----------------------------|--------------|
-| `test_phase0_mocks.py` | YES | — |
-| `test_phase1_code_locator.py` | NO | real code locator index + provider credentials |
-| `test_phase2_ledger.py` | NO | `USE_REAL_LEDGER=1` + `memory://` or SurrealDB URL |
-| `test_phase3_integration.py` | NO | Both Phase 1 + Phase 2 complete |
+| Phase | File | Gate (env) | Validates |
+|---|---|---|---|
+| 1 | `test_phase1_code_locator.py` | `USE_REAL_CODE_LOCATOR=1` + `REPO_PATH=…` | Code-locator correctness: located paths exist on disk, symbols are real names from the repo, confidence is in the expected range |
+| 2 | `test_phase2_ledger.py` | `USE_REAL_LEDGER=1` + `SURREAL_URL=memory://` | Ledger correctness: idempotent ingest, BM25 search relevance, file→decision reverse traversal, `link_commit` status updates |
+| 3 | `test_phase3_integration.py` | Both of the above | End-to-end: ingest a sample transcript → run the code locator → store in the graph → query back and get a coherent result |
 
-## What each phase validates
+```bash
+# Phase 1
+USE_REAL_CODE_LOCATOR=1 REPO_PATH=/path/to/repo pytest tests/test_phase1_code_locator.py -v
 
-**Phase 0**: Contract shapes. Do all 4 tools return valid Pydantic types? Are all required fields present?
+# Phase 2
+USE_REAL_LEDGER=1 SURREAL_URL=memory:// pytest tests/test_phase2_ledger.py -v
 
-**Phase 1**: Code locator correctness. Do located file paths exist on disk? Are symbols real names from the repo? Is confidence in the expected range?
+# Phase 3 (full integration — needs both gates)
+USE_REAL_CODE_LOCATOR=1 USE_REAL_LEDGER=1 SURREAL_URL=memory:// REPO_PATH=/path/to/repo \
+  pytest tests/test_phase3_integration.py -v
+```
 
-**Phase 2**: Ledger correctness. Is ingestion idempotent? Does BM25 search return relevant results? Does reverse traversal (file → decisions) work? Does `link_commit` update statuses correctly?
+## Environment variables
 
-**Phase 3**: End-to-end pipeline. Does ingesting a sample transcript + running code locator + storing in graph + querying back produce a coherent result?
+| Var | Default | Effect |
+|---|---|---|
+| `SURREAL_URL` | `memory://` | Ledger URL for tests (in-process, no persistence). Override when exercising a persistent SurrealKV path. |
+| `USE_REAL_CODE_LOCATOR` | unset | Set to `1` to enable the phase 1 and 3 code-locator tests, which run against a real tree-sitter index. |
+| `USE_REAL_LEDGER` | unset | Set to `1` to enable the phase 2 and 3 tests, which run against a real embedded SurrealDB adapter. |
+| `REPO_PATH` | `.` | Repo the code locator indexes. |
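
As a rough illustration of how the variables in this table combine, the sketch below resolves the documented defaults and works out which phases would run. The helper names `gate_on` and `resolve_test_env` are hypothetical, and treating only the literal `1` as "on" is an assumption; the real suite may parse its gates differently.

```python
import os
from typing import Mapping, Optional


def gate_on(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a gate variable is set to "1".

    Assumption: only the literal "1" enables a gate; the real suite may
    accept other truthy values.
    """
    source = os.environ if env is None else env
    return source.get(name) == "1"


def resolve_test_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Apply the documented defaults and list the phases that would run."""
    source = os.environ if env is None else env
    locator = gate_on("USE_REAL_CODE_LOCATOR", source)
    ledger = gate_on("USE_REAL_LEDGER", source)

    phases = []
    if locator:
        phases.append(1)
    if ledger:
        phases.append(2)
    if locator and ledger:
        phases.append(3)  # phase 3 needs both gates

    return {
        "surreal_url": source.get("SURREAL_URL", "memory://"),
        "repo_path": source.get("REPO_PATH", "."),
        "phases": phases,
    }


if __name__ == "__main__":
    print(resolve_test_env())
```

In a pytest module, a helper like `gate_on` could feed `pytest.mark.skipif(not gate_on("USE_REAL_LEDGER"), reason="ledger gate is off")`, so that ungated phases are skipped rather than failed.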
 
 ## Packaging smoke
 
-The installable package surface is now the first startup check:
+The installable surface is the first startup check:
 
-1. `pip install -r requirements.txt`
+1. `pip install -e ".[test]"`
 2. `bicameral-mcp --smoke-test`
-3. Verify the command prints the 5 registered tool names
+3. It prints the server name/version and **every registered MCP tool name** — 20
+   today (18 `bicameral.*` ledger/session tools + the 2 code-locator primitives
+   `validate_symbols` and `get_neighbors`). The asserted source of truth is
+   `EXPECTED_TOOL_NAMES` in `server.py`; the smoke test fails if the live registry
+   drifts from it. The user-facing subset is documented in the root `README.md`
+   § MCP Tools Reference.
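
The drift check described in step 3 can be pictured as a set comparison between the expected names and the live registry. This is a minimal sketch, not the code in `server.py`: `registry_drift` and `smoke_check` are hypothetical helpers, and how the real smoke test reports a failure is an assumption.

```python
from typing import Dict, Iterable, List


def registry_drift(expected: Iterable[str], registered: Iterable[str]) -> Dict[str, List[str]]:
    """Compare expected tool names with the names actually registered."""
    expected_set, registered_set = set(expected), set(registered)
    return {
        "missing": sorted(expected_set - registered_set),      # expected, not registered
        "unexpected": sorted(registered_set - expected_set),   # registered, not expected
    }


def smoke_check(expected: Iterable[str], registered: Iterable[str]) -> int:
    """Return a process-style exit code: 0 when in sync, 1 on drift."""
    drift = registry_drift(expected, registered)
    for name in drift["missing"]:
        print(f"missing tool: {name}")
    for name in drift["unexpected"]:
        print(f"unexpected tool: {name}")
    return 1 if drift["missing"] or drift["unexpected"] else 0


if __name__ == "__main__":
    # Only the two code-locator primitives named in this README are used here;
    # "extra_tool" is a placeholder that simulates drift.
    expected = ["validate_symbols", "get_neighbors"]
    print(smoke_check(expected, ["get_neighbors", "extra_tool"]))
```

Reporting both directions of drift separates a tool that silently disappeared from one that was added without updating the expected list.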
+
+## Sociable testing
+
+Handler and ledger tests default to **sociable** units (real `memory://` adapter,
+`SimpleNamespace` ctx) — not mocks. The full contract and the reference patterns
+are in the repo-root `CLAUDE.md` § "Sociable Testing for UX Paths".
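
For orientation only, the shape of a sociable test might look like the sketch below: the handler is called with a plain `SimpleNamespace` ctx that carries a working collaborator instead of a mock with scripted return values. Everything named here is hypothetical. `InMemoryLedger` is a toy stand-in for the real `memory://` adapter (whose API is not shown in this README), and `handle_ingest` is not an actual handler; `CLAUDE.md` remains the reference for the real patterns.

```python
from types import SimpleNamespace


class InMemoryLedger:
    """Toy stand-in for the real memory:// adapter (hypothetical API)."""

    def __init__(self) -> None:
        self._decisions = {}

    def ingest(self, decision_id: str, text: str) -> bool:
        """Store a decision; return True only the first time an id is seen."""
        is_new = decision_id not in self._decisions
        self._decisions[decision_id] = text
        return is_new

    def count(self) -> int:
        return len(self._decisions)


def handle_ingest(ctx, decision_id: str, text: str) -> dict:
    """Hypothetical handler: writes through whatever ledger the ctx carries."""
    created = ctx.ledger.ingest(decision_id, text)
    return {"created": created, "total": ctx.ledger.count()}


def make_ctx() -> SimpleNamespace:
    """Build a ctx with a working ledger rather than a mock."""
    return SimpleNamespace(ledger=InMemoryLedger())


def test_ingest_is_idempotent() -> None:
    ctx = make_ctx()
    first = handle_ingest(ctx, "d1", "Use append-only events")
    second = handle_ingest(ctx, "d1", "Use append-only events")
    assert first == {"created": True, "total": 1}
    assert second == {"created": False, "total": 1}


if __name__ == "__main__":
    test_ingest_is_idempotent()
    print("ok")
```

The assertions check observable state after two real calls, which is the property a mocked ledger would not exercise.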
+
+## What CI runs
+
+`.github/workflows/test-mcp-regression.yml` runs the phase suites plus the ledger,
+schema-recovery, replay-determinism, extractor-parity, shadow-dispatch, and
+dashboard tests in a single `pytest` invocation against `SURREAL_URL=memory://`,
+then uploads JUnit XML + a self-contained HTML report as artifacts. The end-to-end
+user-flow suite is separate and currently shelved to manual dispatch — see
+`tests/e2e/README.md`.
@@ -1,14 +1,20 @@
 # v0 user flow e2e
 
-End-to-end validation of `BicameralAI/bicameral#108`'s six canonical user
-flows, driven by **real Claude Code CLI sessions** with `bicameral-mcp`
-registered as an MCP server. Test fixture: a pinned commit of
-`github.com/desktop/desktop`, with `docs/process/roadmap.md` as ingest
-content.
-
-This is the canonical CI test for the spec. The handler-replay simulation
-at `scripts/sim_issue_108_flows.py` complements it for fast local iteration
-on handler logic without burning Claude API calls.
+End-to-end validation of the canonical user flows in
+`BicameralAI/bicameral#108`, driven by **real Claude Code CLI sessions** with
+`bicameral-mcp` registered as an MCP server. Five flows (1–5) are automated.
+Test fixture: a pinned commit of `github.com/desktop/desktop`, with
+`docs/process/roadmap.md` as ingest content.
+
+> **Status: shelved to manual dispatch (#556).** This suite is no longer a PR
+> gate. The harness accumulated maintenance debt — API-key credit exhaustion,
+> agent-budget non-determinism (#272), and twice-reworked auth (#528, #540) —
+> that blocked PRs without actionable signal. The test code and prompts are
+> preserved; run it manually via **Actions → v0 user flow e2e → Run workflow**.
+> A replacement validation strategy is tracked in RFQ #555.
+
+The handler-replay simulation at `scripts/sim_issue_108_flows.py` is the fast
+local path for iterating on handler logic without burning Claude API calls.
 
 ## What it tests
 
@@ -66,10 +72,13 @@ per flow.
 
 ## CI
 
-GitHub Actions workflow: `.github/workflows/v0-user-flow-e2e.yml`.
+GitHub Actions workflow: `.github/workflows/v0-user-flow-e2e.yml` —
+**dispatch-only (shelved, #556)**.
 
-- Triggers on PRs touching `tests/e2e/**`, `handlers/**`, `ledger/**`,
-  `contracts.py`, `skills/bicameral-*/**`, or the workflow itself.
+- **No PR trigger.** Run manually: Actions → *v0 user flow e2e* → *Run workflow*.
+  (It previously triggered on PRs touching `tests/e2e/**`, `handlers/**`,
+  `ledger/**`, `contracts.py`, or `skills/bicameral-*/**`.)
+- Replacement validation strategy: RFQ #555.
 - Runs in the `ci-test` GitHub environment for `ANTHROPIC_API_KEY`
   (switched from `production` + `CLAUDE_CODE_OAUTH_TOKEN` in #528 after the
   org subscription was disabled).