Merged
16 commits
2 changes: 2 additions & 0 deletions .github/workflows/test-mcp-regression.yml
@@ -77,6 +77,8 @@ jobs:
tests/test_phase1_code_locator.py
tests/test_phase2_ledger.py
tests/test_phase3_integration.py
tests/test_legacy_ledger_fixtures.py
tests/test_schema_recoverable_errors.py
-v --tb=short
--junitxml=test-results/results.xml
--html=test-results/report.html --self-contained-html
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,26 @@ All notable changes to bicameral-mcp are tracked here. Format loosely follows

- **#243 Piece B — Eager symbol-index initialization at server startup, fail-loud on empty index.** Pre-fix, `get_code_locator()` returned a fresh `RealCodeLocatorAdapter` per call AND lazy-initialized the symbol index on first `_ensure_initialized()` call — so the FIRST tool call after server boot paid the index-build cost AND could race the index check on concurrent dispatch. Post-fix, `get_code_locator()` is **singleton-by-`REPO_PATH`** (multi-repo correctness preserved) with a new `reset_code_locator_cache()` test-only hook. New `async def RealCodeLocatorAdapter.initialize()` wraps `_ensure_initialized()` in a thread-pool executor for the `serve_stdio()` startup path; idempotent on already-initialized adapters. `serve_stdio()` calls `await get_code_locator().initialize()` between the dashboard sidecar start and the consent-notice block — **fail-loud**: an empty/missing index now propagates `RuntimeError("Code locator index is empty...")` and aborts boot with a clear stderr message ("Run: python -m code_locator index <repo_path>") rather than silently degrading every preflight call. Lazy init via `_ensure_initialized` remains for test contexts where mock adapters bypass startup. 4 new tests in `tests/test_preflight_graph_expansion.py` cover singleton-by-`REPO_PATH` (with reset), `initialize()` idempotency, `RuntimeError` propagation, and the `serve_stdio` boot-refusal path. Together with Piece A: production deployments can no longer accumulate silent-fallback hours; either the index is healthy at boot OR the operator gets a loud failure they can act on. Refs #243 (parent #173 / PR #174). Plan signoff via https://github.com/BicameralAI/bicameral-mcp/issues/243#issuecomment-4414163338.
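A minimal sketch of the Piece B shape, assuming a stand-in adapter. The names `get_code_locator`, `reset_code_locator_cache`, `RealCodeLocatorAdapter.initialize`, and `_ensure_initialized` come from the entry above; their bodies, the `_build_index` helper, the module-level cache, and the `os.getcwd()` fallback when `REPO_PATH` is unset are hypothetical. It shows the cache keyed by repo path, the idempotent async wrapper that moves the blocking build onto the default thread-pool executor, and the fail-loud `RuntimeError`.

```python
import asyncio
import os
import threading


class RealCodeLocatorAdapter:
    """Stand-in adapter: only the initialization plumbing is sketched."""

    def __init__(self, repo_path: str) -> None:
        self.repo_path = repo_path
        self._initialized = False
        self._index: dict[str, str] = {}

    def _build_index(self) -> dict[str, str]:
        # Hypothetical: the real adapter loads its symbol index from disk.
        return {}

    def _ensure_initialized(self) -> None:
        # Lazy path, still reachable from tests that never call initialize().
        if self._initialized:
            return
        self._index = self._build_index()
        if not self._index:
            # Wording approximated from the changelog entry.
            raise RuntimeError(
                "Code locator index is empty. "
                f"Run: python -m code_locator index {self.repo_path}"
            )
        self._initialized = True

    async def initialize(self) -> None:
        # Eager path for server boot; returns immediately if already built.
        if self._initialized:
            return
        loop = asyncio.get_running_loop()
        # The blocking build runs in the default executor; errors re-raise here.
        await loop.run_in_executor(None, self._ensure_initialized)


_CACHE: dict[str, RealCodeLocatorAdapter] = {}
_CACHE_LOCK = threading.Lock()


def get_code_locator() -> RealCodeLocatorAdapter:
    # One adapter per repo path, so a different repo gets its own index.
    repo_path = os.environ.get("REPO_PATH", os.getcwd())
    with _CACHE_LOCK:
        adapter = _CACHE.get(repo_path)
        if adapter is None:
            adapter = _CACHE[repo_path] = RealCodeLocatorAdapter(repo_path)
        return adapter


def reset_code_locator_cache() -> None:
    """Test-only hook: forget every cached adapter."""
    with _CACHE_LOCK:
        _CACHE.clear()
```

At boot, a caller in this style would `await get_code_locator().initialize()` and, on `RuntimeError`, print the message to stderr and exit instead of serving degraded preflight results. Keying the cache on the path rather than holding one global adapter is what preserves multi-repo correctness.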

## v0.14.5 — bicameral_diagnose tool + reset --replay-from-events + README polish (triage)

Triages the #296 ledger-resilience track and the #299 README pass onto main. New `bicameral_diagnose` MCP tool gives operators a privacy-preserving structural read of their ledger (recovery_path classification + table counts + recent audit events) without crashing on init when the ledger is corrupt — pairs with `bicameral_reset --replay-from-events` for the dataloss-avoidant recovery path. README opens with a position-take pitch + three-scene demo video drop-in (ingest → preflight → ratify async).

### Added

- **`bicameral_diagnose` MCP tool (#296).** Read-only structural diagnosis that opens a raw `LedgerClient` (no `init_schema` / `migrate`) so it works even when the normal adapter connect crashes during init. Returns `recovery_path` classification (`clean` / `fixable` / `reset_rebuild` / `reset_destructive`), `schema_meta` state, table counts, recent `warn`|`error` audit events, and a `next_action` recommendation. Slash alias `/bicameral-diagnose`. New skill at `skills/bicameral-diagnose/SKILL.md`. Privacy-preserving — only structural shape leaves the machine, never decision content.
- **`bicameral_reset --replay-from-events` flag (#296).** Rebuild the ledger from the team-mode JSONL event substrate instead of nuking decision rows. Use when the binary store is corrupt but the event log is intact — recovers without dataloss in team mode.
- **Legacy-ledger fixtures + replay tests (#296).** New `tests/fixtures/legacy_ledgers/` holds reproducible byte-level corruption fixtures (e.g. `v3_yields_source_span.py` for the v16→v17 yields integrity bug). `tests/test_legacy_ledger_fixtures.py` and `tests/test_schema_recoverable_errors.py` exercise the recovery paths against these fixtures.
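The `--replay-from-events` flag can be pictured as a fold over JSONL lines. This is a hypothetical sketch: the `op`/`id`/`data` event shape, ordering by file name, and the in-memory dict standing in for the ledger are all assumptions, not the materializer's real format. It returns the rebuilt state plus a count analogous to the `events_replayed` field on `ResetResponse`.

```python
import json
from pathlib import Path


def replay_from_events(events_dir: Path) -> tuple[dict[str, dict], int]:
    """Rebuild decision state from *.jsonl files; return (state, events_replayed)."""
    state: dict[str, dict] = {}
    replayed = 0
    # Sorted file names stand in for whatever ordering the real materializer uses.
    for path in sorted(events_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if not line:
                    continue  # tolerate blank lines
                event = json.loads(line)
                if event["op"] == "upsert":
                    state.setdefault(event["id"], {}).update(event["data"])
                elif event["op"] == "delete":
                    state.pop(event["id"], None)
                else:
                    continue  # unknown op: skip it rather than abort the rebuild
                replayed += 1
    return state, replayed
```

Because the fold only reads the event log, it can run after the binary store has been wiped, which is why the flag recovers without data loss as long as the log itself is intact.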

### Fixed

- **`ledger/schema.py`: resilient init + v16→v17 yields integrity cleanup (#296).** On v0.14.4 ledgers, stale `source_span:...` records in the `yields.in` field made `DEFINE INDEX OVERWRITE idx_yields_unique` fail on every connect, blocking `ingest` / `history` / `preflight` until a manual reset. The migration now sweeps these stale rows during the v16→v17 step, and the init path tolerates the recovery without aborting. Pairs with `bicameral_diagnose` for operator visibility into whether the migration ran.
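One way to picture an init path that tolerates a failing `DEFINE` without aborting: downgrade known-recoverable failures to a `warn` audit event (this PR adds `schema_define_skipped` for that case) and hand the skipped statements to the migration step. A sketch under assumptions: `execute`, `emit_audit`, and the prefix-based notion of "recoverable" are hypothetical, and the real `ledger/schema.py` may decide differently.

```python
from typing import Callable


def init_schema_resilient(
    statements: list[str],
    execute: Callable[[str], None],
    emit_audit: Callable[[str, dict], None],
    recoverable_prefixes: tuple[str, ...] = ("DEFINE INDEX",),
) -> list[str]:
    """Run each schema statement, skipping recoverable failures.

    Returns the skipped statements so the caller can defer them to migrate.
    """
    skipped: list[str] = []
    for stmt in statements:
        try:
            execute(stmt)
        except Exception as exc:  # noqa: BLE001
            if not stmt.lstrip().upper().startswith(recoverable_prefixes):
                raise  # unrecognized failure: stay loud
            # Record the skip at warn level so diagnose can surface it later.
            emit_audit("schema_define_skipped", {"statement": stmt, "error": str(exc)})
            skipped.append(stmt)
    return skipped
```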

### Documentation

- **README opener rewrite (#299).** Two-paragraph position-take: paragraph 1 names the failure mode ("requirement gaps surfaced mid-implementation are buried under thousands of lines of code"); paragraph 2 introduces Bicameral MCP as a **spec compliance layer** for AI-assisted engineering that ingests transcripts / PRDs / Slack threads, captures any mid-implementation decision that was not discussed (to be ratified async by the product owner), and pins each one to the implementing code.
- **README demo video section (#299).** Replaces the dashboard image transition with a three-beat demo loop — ingest (PM/dev) → preflight (auto) → ratify async (product owner) — each as an inline `user-attachments` video drop-in so the videos render on github.com without asset-path coupling.
- **README star CTA relocation (#299).** Moved from the top header (where it sat awkwardly between the hero image and the logo) to a centered placement immediately after the demo videos — natural post-demo conversion beat.

## v0.14.4 — Hotfix: skip install-time manifest verification until release-side sigstore wiring lands

Hotfix on v0.14.3. Unblocks every fresh `bicameral-mcp setup` against the published wheel — the v0.14.x wheels ship `skills-manifest.toml` and `hooks-manifest.json` but no `.sig`/`.crt` companions yet (release-side sigstore signing is a deferred follow-up of #218 LLM-06 / #237 LLM-11). The install-time verifiers were hitting a missing-signature path their own docstrings claim is unreachable, raising `SignatureError` and aborting setup.
21 changes: 21 additions & 0 deletions CLAUDE.md
@@ -20,6 +20,27 @@ worse than a compile error because it fails at runtime in production sessions.
- [ ] Did a new tool get added? → Create `skills/<tool-name>/SKILL.md`
- [ ] Did a status literal gain a new value (e.g. `"proposal"`)? → Update every skill that renders status

## Sociable Testing for UX Paths (Mandatory for Handlers + Ledger)

Default to **sociable unit tests** ([Martin Fowler, "On the Diverse And Fantastical Shape of Testing"](https://martinfowler.com/articles/2021-test-shapes.html)) for anything the MCP agent actually invokes: handlers under `handlers/`, ledger queries in `ledger/`, and the contracts they return. A test is **solitary** when it replaces a collaborator we ship to users (the `ctx`, the `ledger`, a handler in the call graph) with a `MagicMock` / `AsyncMock` / `patch(...)`; it's **sociable** when it runs the real collaborator and only seams off something we genuinely can't run in tests (network, time, external SaaS, an injected failure mode like "symbol disappears").

The motivation is concrete: AI-authored tests skew solitary because mocks are easy to make pass. A solitary test for `get_session_start_banner` stayed green for months while `get_decisions_by_status` was selecting an undefined `decision_id` field and returning `None` for every banner row — agents saw null IDs in production while the suite reported full coverage. The first sociable run caught it.

**Rules**

1. **Handler tests** (`tests/test_<handler>*.py`) — instantiate a real `SurrealDBLedgerAdapter` over `memory://` and seed rows with the production schema. Reference pattern: `tests/test_codegenome_continuity_service.py::_fresh_adapter` and `tests/test_sync_middleware.py::_make_real_adapter`.
2. **Ledger query tests** — never `MagicMock` the client. Use the real `LedgerClient(url="memory://", ...)` + `init_schema` + `migrate`.
3. **`ctx` should be `SimpleNamespace`, not `MagicMock`** — when a handler grows a new required field, `SimpleNamespace` raises `AttributeError` and the test fails honestly; `MagicMock` silently invents the field.
4. **Narrow seams are fine** when the alternative is impossible or fragile: patching `ledger.status.resolve_symbol_lines` to simulate a missing symbol (`tests/test_link_commit_grounding.py:185`), patching `handle_link_commit` when testing the *caller's* cache logic (not link_commit itself), patching `time.monotonic` for TTL math.
5. **Solitary is correct for** pure helpers (`_check_payload_size` standalone), external boundaries we can't run (`tests/test_backends_google_drive_unit.py`), and concurrency primitives that don't talk to collaborators (`repo_write_barrier` tests).
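Rule 3 is easy to see in isolation. In this sketch `handle_banner` is a hypothetical handler that has just grown a required `ctx.session_id` field; only `SimpleNamespace` and `MagicMock` (both standard library) are real.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def handle_banner(ctx) -> str:
    """Hypothetical handler that now also needs ctx.session_id."""
    return f"{ctx.repo_path}: {ctx.session_id}"


# Solitary: MagicMock fabricates `session_id`, so the call "succeeds".
solitary_result = handle_banner(MagicMock(repo_path="/repo"))
print(solitary_result)  # the banner embeds a mock repr instead of a real id

# SimpleNamespace holds only the fields the test set, so the gap surfaces.
try:
    handle_banner(SimpleNamespace(repo_path="/repo"))
except AttributeError as exc:
    print("test fails honestly:", exc)

# Once the test supplies the new field, it passes for a real reason.
print(handle_banner(SimpleNamespace(repo_path="/repo", session_id="s-1")))
```

With `MagicMock`, the missing field silently becomes a mock object and the banner still looks plausible; with `SimpleNamespace`, the test fails at the exact line that needs the new field.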

**Checklist before opening a tests-only PR**

- [ ] Does the test instantiate `MagicMock` for `ctx` or `ledger`? → Replace with `SimpleNamespace` + real adapter unless one of the "solitary is correct" exceptions applies.
- [ ] Does the test hand-craft a row dict that mimics what the ledger returns? → Seed the real ledger and let it produce the row.
- [ ] Does an `assert_called_once_with(<exact SQL or arg list>)` mirror the production code? → That's a tautology. Replace it with an assertion on observable behavior (what the user/agent sees).
- [ ] Does the failure mode under test (e.g. symbol disappeared, ledger crashed) actually require a patch? → Yes is fine; pin the patch to the narrowest seam.
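A narrow seam in practice, following rule 4: the object under test runs for real, and only the clock is patched. `TTLCache` is a hypothetical stand-in for handler-level cache logic; `unittest.mock.patch` and `time.monotonic` are the standard-library pieces.

```python
import time
from unittest.mock import patch


class TTLCache:
    """Hypothetical TTL cache standing in for a handler's cache logic."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def put(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value


def test_entry_expires_after_ttl() -> None:
    cache = TTLCache(ttl_seconds=30)
    # Patch only the clock; the cache itself is not mocked.
    with patch("time.monotonic", return_value=100.0):
        cache.put("k", "v")
    with patch("time.monotonic", return_value=129.0):
        assert cache.get("k") == "v"
    with patch("time.monotonic", return_value=131.0):
        assert cache.get("k") is None
```

Patching `"time.monotonic"` works here because the class looks the function up through the `time` module on every call; code that did `from time import monotonic` would need the patch target to be its own module's name instead.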

## Auto-Tick Rule

After completing **any** implementation work in this directory:
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![CI](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/test-mcp-regression.yml?branch=main&label=tests)](https://github.com/BicameralAI/bicameral-mcp/actions)

-AI agents ship code fast. They forget what your team agreed — and most agreements emerge mid-flight, in corrections and side comments that never reach a doc.
+AI agents ship code fast. They forget what your team agreed — and requirement gaps surfaced mid-implementation are buried under thousands of lines of code.

Bicameral MCP is a **spec compliance layer** for AI-assisted engineering. Local-first; runs as an [MCP server](https://spec.modelcontextprotocol.io/). It ingests your meeting transcripts, PRDs, and Slack threads, captures any mid-implementation decision that was not discussed, to be ratified async by your product owner, and pins each one to the code that implements it — so your agent finds out the moment it drifts from either the written spec or the spoken one.

2 changes: 1 addition & 1 deletion RECOMMENDED_VERSION
@@ -1 +1 @@
-0.14.4
+0.14.5
3 changes: 3 additions & 0 deletions audit_log.py
@@ -71,6 +71,8 @@ class AuditEventType(enum.StrEnum):
# #252 Layer 2 — wire-format sentinel observability
LEDGER_SCHEMA_VERIFIED = "ledger_schema_verified"
LEDGER_VERSION_DRIFT = "ledger_version_drift"
# #296 — recoverable schema-definition skip (init_schema deferring to migrate)
SCHEMA_DEFINE_SKIPPED = "schema_define_skipped"


_LEVEL_BY_EVENT: dict[AuditEventType, str] = {
@@ -84,6 +86,7 @@ class AuditEventType(enum.StrEnum):
AuditEventType.ERROR: "error",
AuditEventType.LEDGER_SCHEMA_VERIFIED: "info",
AuditEventType.LEDGER_VERSION_DRIFT: "warn",
AuditEventType.SCHEMA_DEFINE_SKIPPED: "warn",
}

_LEVEL_RANK = {"info": 10, "warn": 20, "error": 30}
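The level table in this hunk decides which events `bicameral_diagnose` surfaces as recent `warn`/`error` activity. A sketch of a threshold check over that table, with several assumptions: the enum is a subset, it uses a `str` mixin instead of `enum.StrEnum` so it also runs before Python 3.11, the value string for `ERROR` is assumed, and `at_or_above` is a hypothetical helper rather than a function in `audit_log.py`.

```python
import enum


class AuditEventType(str, enum.Enum):
    # Subset of the real enum; the ERROR value string is an assumption.
    ERROR = "error"
    LEDGER_SCHEMA_VERIFIED = "ledger_schema_verified"
    LEDGER_VERSION_DRIFT = "ledger_version_drift"
    SCHEMA_DEFINE_SKIPPED = "schema_define_skipped"


_LEVEL_BY_EVENT: dict[AuditEventType, str] = {
    AuditEventType.ERROR: "error",
    AuditEventType.LEDGER_SCHEMA_VERIFIED: "info",
    AuditEventType.LEDGER_VERSION_DRIFT: "warn",
    AuditEventType.SCHEMA_DEFINE_SKIPPED: "warn",
}

_LEVEL_RANK = {"info": 10, "warn": 20, "error": 30}


def at_or_above(event: AuditEventType, threshold: str = "warn") -> bool:
    """True when the event's level meets the threshold (hypothetical helper)."""
    level = _LEVEL_BY_EVENT.get(event, "info")  # unmapped events count as info
    return _LEVEL_RANK[level] >= _LEVEL_RANK[threshold]
```

Mapping `SCHEMA_DEFINE_SKIPPED` to `warn` rather than `info` is what makes a deferred schema definition visible in a warn-filtered tail.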
78 changes: 58 additions & 20 deletions cli/_diagnose_gather.py
@@ -23,9 +23,8 @@
_RECENT_EVENT_TAIL = 5


def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
def _read_ledger_metadata_for_url(url: str) -> tuple[str, int | None, str | None]:
"""Return (ledger_url, size_bytes_or_None, mtime_iso_or_None)."""
url = getattr(adapter, "_url", "")
if not url.startswith("surrealkv://"):
return url, None, None
path_str = url.removeprefix("surrealkv://")
@@ -37,21 +36,23 @@ def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
return url, stat.st_size, mtime_iso


async def _read_bicameral_meta(
adapter,
) -> tuple[str | None, str | None, str | None, str, str]:
"""Return (first_write, last_write, last_write_at_iso, drift_status, running).
def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
return _read_ledger_metadata_for_url(getattr(adapter, "_url", ""))

``drift_status`` is one of: ``"first-write"`` / ``"match"`` / ``"drift"`` /
``"unavailable"`` (table missing, e.g., pre-Layer-2 ledger).
"""

async def _read_bicameral_meta_raw(
client,
) -> tuple[str | None, str | None, str | None, str, str]:
"""Same shape as ``_read_bicameral_meta`` but operates on a raw
``LedgerClient``. Used by the MCP ``bicameral_diagnose`` tool, which
must work without ``init_schema``/``migrate`` succeeding."""
try:
running = importlib.metadata.version("surrealdb")
except importlib.metadata.PackageNotFoundError:
running = "unknown"

try:
rows = await adapter._client.query("SELECT * FROM bicameral_meta LIMIT 1")
rows = await client.query("SELECT * FROM bicameral_meta LIMIT 1")
except Exception: # noqa: BLE001 — table missing is the load-bearing case
return None, None, None, "unavailable", running

@@ -71,9 +72,20 @@ async def _read_bicameral_meta(
return first, last, last_at_iso, "drift", running


async def _read_schema_version(adapter) -> int | None:
async def _read_bicameral_meta(
adapter,
) -> tuple[str | None, str | None, str | None, str, str]:
"""Return (first_write, last_write, last_write_at_iso, drift_status, running).

``drift_status`` is one of: ``"first-write"`` / ``"match"`` / ``"drift"`` /
``"unavailable"`` (table missing, e.g., pre-Layer-2 ledger).
"""
return await _read_bicameral_meta_raw(adapter._client)


async def _read_schema_version_raw(client) -> int | None:
try:
rows = await adapter._client.query("SELECT version FROM schema_meta LIMIT 1")
rows = await client.query("SELECT version FROM schema_meta LIMIT 1")
except Exception: # noqa: BLE001
return None
if not rows:
@@ -82,11 +94,15 @@ async def _read_schema_version(adapter) -> int | None:
return int(val) if val is not None else None


async def _read_table_counts(adapter) -> dict[str, int]:
async def _read_schema_version(adapter) -> int | None:
return await _read_schema_version_raw(adapter._client)


async def _read_table_counts_raw(client) -> dict[str, int]:
counts: dict[str, int] = {}
for table in _CANONICAL_TABLES:
try:
rows = await adapter._client.query(f"SELECT count() AS n FROM {table} GROUP ALL")
rows = await client.query(f"SELECT count() AS n FROM {table} GROUP ALL")
except Exception: # noqa: BLE001 — missing table is acceptable (pre-v16)
continue
if rows:
@@ -95,6 +111,10 @@ async def _read_table_counts(adapter) -> dict[str, int]:
return counts


async def _read_table_counts(adapter) -> dict[str, int]:
return await _read_table_counts_raw(adapter._client)


def _resolve_audit_log_channel() -> tuple[str, Path | None]:
"""Return (channel_label, configured_file_path_or_None)."""
raw = os.getenv("BICAMERAL_AUDIT_LOG", "stderr").strip()
@@ -195,19 +215,28 @@ def _fetch_recommended() -> str | None:
return None


async def gather_diagnosis(adapter) -> Diagnosis:
"""Collect every allowlisted field from the running install + ledger."""
async def gather_diagnosis_raw(client, ledger_url: str) -> Diagnosis:
"""Same allowlisted gather as ``gather_diagnosis`` but takes a raw
``LedgerClient`` and an explicit ``ledger_url``.

Used by the MCP ``bicameral_diagnose`` tool, which opens a raw client
so it can produce a report even when ``adapter.connect()`` (and its
init_schema / migrate calls) would crash on a corrupted ledger. The
CLI ``bicameral-mcp diagnose`` keeps using ``gather_diagnosis``
(adapter-based) because it benefits from the adapter's connection
lifecycle in the happy-path operator-bug-report flow.
"""
try:
bicameral_version = importlib.metadata.version("bicameral-mcp")
except importlib.metadata.PackageNotFoundError:
bicameral_version = "unknown"

from ledger.schema import SCHEMA_VERSION

ledger_url, size_bytes, mtime_iso = _read_ledger_metadata(adapter)
first, last, last_at_iso, drift_status, running = await _read_bicameral_meta(adapter)
schema_recorded = await _read_schema_version(adapter)
table_counts = await _read_table_counts(adapter)
_, size_bytes, mtime_iso = _read_ledger_metadata_for_url(ledger_url)
first, last, last_at_iso, drift_status, running = await _read_bicameral_meta_raw(client)
schema_recorded = await _read_schema_version_raw(client)
table_counts = await _read_table_counts_raw(client)
channel_label, audit_path = _resolve_audit_log_channel()
recent_events = _tail_recent_events(audit_path, _RECENT_EVENT_TAIL)

@@ -242,3 +271,12 @@ async def gather_diagnosis(adapter) -> Diagnosis:
recent_events=recent_events,
suggestions=suggestions,
)


async def gather_diagnosis(adapter) -> Diagnosis:
"""Adapter-flavoured wrapper over ``gather_diagnosis_raw``.

Reads the ledger URL off the adapter and forwards to the raw helper.
Existing CLI callers (`bicameral-mcp diagnose`) keep this entry point.
"""
return await gather_diagnosis_raw(adapter._client, getattr(adapter, "_url", ""))
37 changes: 37 additions & 0 deletions contracts.py
@@ -650,6 +650,43 @@ class ResetResponse(BaseModel):
replay_plan: list[ResetReplayEntry] = []
replay_errors: list[str] = []
next_action: str
# #296 Layer E — automated rebuild from .bicameral/events/*.jsonl
# after wipe. Populated only when `replay_from_events=True` and
# `confirm=True`; reports how many events the materializer replayed.
events_replayed: int = 0


# ── Tool 8 (new): /bicameral_diagnose ────────────────────────────────


class DiagnoseResponse(BaseModel):
"""Read-only diagnostic snapshot. Mirrors the CLI ``bicameral-mcp
diagnose`` output but returns structured fields so agents can render
a recovery prompt deterministically.

`recovery_path` classifies the next operator action:
- ``clean`` — ledger looks healthy, no remediation needed
- ``fixable`` — schema is behind binary; next normal call migrates
- ``reset_rebuild`` — ledger broken AND events present → reset
with `replay_from_events=True` recovers without data loss
- ``reset_destructive`` — ledger broken AND no events → reset
loses decision history; user must explicitly accept

`diagnosis` carries the same structural-metadata-only fields the
CLI emits (see ``cli.diagnose.Diagnosis``); empty when the raw
client could not connect.
"""

ledger_url: str
connect_error: str = ""
recovery_path: Literal[
"clean",
"fixable",
"reset_rebuild",
"reset_destructive",
]
diagnosis: dict | None = None
next_action: str


# ── Tool 9: /bicameral_preflight ─────────────────────────────────────
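The `recovery_path` docstring in `DiagnoseResponse` reads as a small decision table. A hypothetical classifier that reproduces it, assuming "ledger broken" means the raw client reported a `connect_error` and "events present" is a boolean the caller computes. The real tool may weigh more signals; for example, a recorded schema newer than the binary gets no special treatment here.

```python
from typing import Literal, Optional

RecoveryPath = Literal["clean", "fixable", "reset_rebuild", "reset_destructive"]


def classify_recovery_path(
    connect_error: str,
    schema_recorded: Optional[int],
    schema_binary: int,
    events_present: bool,
) -> RecoveryPath:
    """Hypothetical mapping from diagnosis inputs to the documented paths."""
    if connect_error:
        # Ledger is broken: whether events exist decides if a reset loses data.
        return "reset_rebuild" if events_present else "reset_destructive"
    if schema_recorded is not None and schema_recorded < schema_binary:
        return "fixable"  # the next normal call runs migrate
    return "clean"
```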