BicameralAI · jinhongkuan · May 10, 2026
@@ -77,6 +77,8 @@ jobs:
           tests/test_phase1_code_locator.py
           tests/test_phase2_ledger.py
           tests/test_phase3_integration.py
+          tests/test_legacy_ledger_fixtures.py
+          tests/test_schema_recoverable_errors.py
           -v --tb=short
           --junitxml=test-results/results.xml
           --html=test-results/report.html --self-contained-html

@@ -13,6 +13,26 @@ All notable changes to bicameral-mcp are tracked here. Format loosely follows
 
 - **#243 Piece B — Eager symbol-index initialization at server startup, fail-loud on empty index.** Pre-fix, `get_code_locator()` returned a fresh `RealCodeLocatorAdapter` per call AND lazy-initialized the symbol index on first `_ensure_initialized()` call — so the FIRST tool call after server boot paid the index-build cost AND could race the index check on concurrent dispatch. Post-fix, `get_code_locator()` is **singleton-by-`REPO_PATH`** (multi-repo correctness preserved) with a new `reset_code_locator_cache()` test-only hook. New `async def RealCodeLocatorAdapter.initialize()` wraps `_ensure_initialized()` in a thread-pool executor for the `serve_stdio()` startup path; idempotent on already-initialized adapters. `serve_stdio()` calls `await get_code_locator().initialize()` between the dashboard sidecar start and the consent-notice block — **fail-loud**: an empty/missing index now propagates `RuntimeError("Code locator index is empty...")` and aborts boot with a clear stderr message ("Run: python -m code_locator index <repo_path>") rather than silently degrading every preflight call. Lazy init via `_ensure_initialized` remains for test contexts where mock adapters bypass startup. 4 new tests in `tests/test_preflight_graph_expansion.py` cover singleton-by-`REPO_PATH` (with reset), `initialize()` idempotency, `RuntimeError` propagation, and the `serve_stdio` boot-refusal path. Together with Piece A: production deployments can no longer accumulate silent-fallback hours; either the index is healthy at boot OR the operator gets a loud failure they can act on. Refs #243 (parent #173 / PR #174). Plan signoff via https://github.com/BicameralAI/bicameral-mcp/issues/243#issuecomment-4414163338.
 
+## v0.14.5 — bicameral_diagnose tool + reset --replay-from-events + README polish (triage)
+
+Triages the #296 ledger-resilience track and the #299 README pass onto main. The new `bicameral_diagnose` MCP tool gives operators a privacy-preserving structural read of their ledger (`recovery_path` classification, table counts, and recent audit events) without crashing on init when the ledger is corrupt. It pairs with `bicameral_reset --replay-from-events`, the recovery path that avoids data loss. The README now opens with a position-take pitch and a three-scene demo video drop-in (ingest → preflight → ratify async).
+
+### Added
+
+- **`bicameral_diagnose` MCP tool (#296).** Read-only structural diagnosis that opens a raw `LedgerClient` (no `init_schema` / `migrate`) so it works even when the normal adapter connect crashes during init. Returns `recovery_path` classification (`clean` / `fixable` / `reset_rebuild` / `reset_destructive`), `schema_meta` state, table counts, recent `warn`|`error` audit events, and a `next_action` recommendation. Slash alias `/bicameral-diagnose`. New skill at `skills/bicameral-diagnose/SKILL.md`. Privacy-preserving — only structural shape leaves the machine, never decision content.
+- **`bicameral_reset --replay-from-events` flag (#296).** Rebuilds the ledger from the team-mode JSONL event substrate instead of wiping decision rows. Use it when the binary store is corrupt but the event log is intact; in team mode this recovers without data loss.
+- **Legacy-ledger fixtures + replay tests (#296).** New `tests/fixtures/legacy_ledgers/` holds reproducible byte-level corruption fixtures (e.g. `v3_yields_source_span.py` for the v16→v17 yields integrity bug). `tests/test_legacy_ledger_fixtures.py` and `tests/test_schema_recoverable_errors.py` exercise the recovery paths against these fixtures.
+
+### Fixed
+
+- **`ledger/schema.py`: resilient init + v16→v17 yields integrity cleanup (#296).** v0.14.4 ledgers with stale `source_span:...` records on the `yields.in` field rejected `DEFINE INDEX OVERWRITE idx_yields_unique` on every connect, blocking `ingest` / `history` / `preflight` until manual reset. The migration now sweeps these stale rows during the v16→v17 step and the init path tolerates the recovery without aborting. Pairs with `bicameral_diagnose` for operator visibility into whether the migration ran.
+
+### Documentation
+
+- **README opener rewrite (#299).** Two-paragraph position-take: paragraph 1 names the failure mode ("requirement gaps surfaced mid-implementation are buried under thousands of lines of code"); paragraph 2 introduces Bicameral MCP as a **spec compliance layer** for AI-assisted engineering that ingests transcripts / PRDs / Slack threads, captures any mid-implementation decision that was not discussed (to be ratified async by the product owner), and pins each one to the implementing code.
+- **README demo video section (#299).** Replaces the dashboard image transition with a three-beat demo loop — ingest (PM/dev) → preflight (auto) → ratify async (product owner) — each as an inline `user-attachments` video drop-in so the videos render on github.com without asset-path coupling.
+- **README star CTA relocation (#299).** Moved from the top header (where it sat awkwardly between the hero image and the logo) to a centered placement immediately after the demo videos — natural post-demo conversion beat.
+
 ## v0.14.4 — Hotfix: skip install-time manifest verification until release-side sigstore wiring lands
 
 Hotfix on v0.14.3. Unblocks every fresh `bicameral-mcp setup` against the published wheel — the v0.14.x wheels ship `skills-manifest.toml` and `hooks-manifest.json` but no `.sig`/`.crt` companions yet (release-side sigstore signing is a deferred follow-up of #218 LLM-06 / #237 LLM-11). The install-time verifiers were hitting a missing-signature path their own docstrings claim is unreachable, raising `SignatureError` and aborting setup.

@@ -20,6 +20,27 @@ worse than a compile error because it fails at runtime in production sessions.
 - [ ] Did a new tool get added? → Create `skills/<tool-name>/SKILL.md`
 - [ ] Did a status literal gain a new value (e.g. `"proposal"`)? → Update every skill that renders status
 
+## Sociable Testing for UX Paths (Mandatory for Handlers + Ledger)
+
+Default to **sociable unit tests** ([Martin Fowler, "On the Diverse And Fantastical Shapes of Testing"](https://martinfowler.com/articles/2021-test-shapes.html)) for anything the MCP agent actually invokes: handlers under `handlers/`, ledger queries in `ledger/`, and the contracts they return. A test is **solitary** when it replaces a collaborator we ship to users (the `ctx`, the `ledger`, a handler in the call graph) with a `MagicMock` / `AsyncMock` / `patch(...)`; it's **sociable** when it runs the real collaborator and only seams off something we genuinely can't run in tests (network, time, external SaaS, an injected failure mode like "symbol disappears").
+
+The motivation is concrete: AI-authored tests skew solitary because mocks are easy to make pass. A solitary test for `get_session_start_banner` stayed green for months while `get_decisions_by_status` was selecting an undefined `decision_id` field and returning `None` for every banner row — agents saw null IDs in production while the suite reported full coverage. The first sociable run caught it.
+
+**Rules**
+
+1. **Handler tests** (`tests/test_<handler>*.py`) — instantiate a real `SurrealDBLedgerAdapter` over `memory://` and seed rows with the production schema. Reference pattern: `tests/test_codegenome_continuity_service.py::_fresh_adapter` and `tests/test_sync_middleware.py::_make_real_adapter`.
+2. **Ledger query tests** — never `MagicMock` the client. Use the real `LedgerClient(url="memory://", ...)` + `init_schema` + `migrate`.
+3. **`ctx` should be `SimpleNamespace`, not `MagicMock`** — when a handler grows a new required field, `SimpleNamespace` raises `AttributeError` and the test fails honestly; `MagicMock` silently invents the field.
+4. **Narrow seams are fine** when the alternative is impossible or fragile: patching `ledger.status.resolve_symbol_lines` to simulate a missing symbol (`tests/test_link_commit_grounding.py:185`), patching `handle_link_commit` when testing the *caller's* cache logic (not link_commit itself), patching `time.monotonic` for TTL math.
+5. **Solitary is correct for** pure helpers (`_check_payload_size` standalone), external boundaries we can't run (`tests/test_backends_google_drive_unit.py`), and concurrency primitives that don't talk to collaborators (`repo_write_barrier` tests).
+
+**Checklist before opening a tests-only PR**
+
+- [ ] Does the test instantiate `MagicMock` for `ctx` or `ledger`? → Replace with `SimpleNamespace` + real adapter unless one of the "solitary is correct" exceptions applies.
+- [ ] Does the test hand-craft a row dict that mimics what the ledger returns? → Seed the real ledger and let it produce the row.
+- [ ] Does an `assert_called_once_with(<exact SQL or arg list>)` mirror the production code? → That's a tautology. Replace it with an assertion on observable behavior (what the user/agent sees).
+- [ ] Does the failure mode under test (e.g. symbol disappeared, ledger crashed) actually require a patch? → Yes is fine; pin the patch to the narrowest seam.
+
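Rule 3 can be seen in isolation with a minimal sketch. The handler fragment and the `repo_path` field below are hypothetical, invented only for illustration; `SimpleNamespace` and `MagicMock` are the standard-library classes the rule names.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def read_repo_path(ctx):
    """Hypothetical handler fragment: reads a field it now requires on ctx."""
    return ctx.repo_path


def surfaces_missing_field(ctx) -> bool:
    """Return True when the missing field makes the call fail loudly."""
    try:
        read_repo_path(ctx)
    except AttributeError:
        return True
    return False


# Both ctx objects were built before `repo_path` became required.
honest_ctx = SimpleNamespace(ledger=object())
permissive_ctx = MagicMock()

print(surfaces_missing_field(honest_ctx))      # SimpleNamespace raises AttributeError
print(surfaces_missing_field(permissive_ctx))  # MagicMock fabricates the attribute
```

With `SimpleNamespace` the test breaks as soon as the handler's contract grows, which is the signal the rule is after; with `MagicMock` the same change goes unnoticed.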
 ## Auto-Tick Rule
 
 After completing **any** implementation work in this directory:

@@ -9,7 +9,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![CI](https://img.shields.io/github/actions/workflow/status/BicameralAI/bicameral-mcp/test-mcp-regression.yml?branch=main&label=tests)](https://github.com/BicameralAI/bicameral-mcp/actions)
 
-AI agents ship code fast. They forget what your team agreed — and most agreements emerge mid-flight, in corrections and side comments that never reach a doc.
+AI agents ship code fast. They forget what your team agreed — and requirement gaps surfaced mid-implementation are buried under thousands of lines of code.
 
 Bicameral MCP is a **spec compliance layer** for AI-assisted engineering. Local-first; runs as an [MCP server](https://spec.modelcontextprotocol.io/). It ingests your meeting transcripts, PRDs, and Slack threads, captures any mid-implementation decision that was not discussed, to be ratified async by your product owner, and pins each one to the code that implements it — so your agent finds out the moment it drifts from either the written spec or the spoken one.
 

@@ -1 +1 @@
-0.14.4
+0.14.5
@@ -71,6 +71,8 @@ class AuditEventType(enum.StrEnum):
     # #252 Layer 2 — wire-format sentinel observability
     LEDGER_SCHEMA_VERIFIED = "ledger_schema_verified"
     LEDGER_VERSION_DRIFT = "ledger_version_drift"
+    # #296 — recoverable schema-definition skip (init_schema deferring to migrate)
+    SCHEMA_DEFINE_SKIPPED = "schema_define_skipped"
 
 
 _LEVEL_BY_EVENT: dict[AuditEventType, str] = {
@@ -84,6 +86,7 @@ class AuditEventType(enum.StrEnum):
     AuditEventType.ERROR: "error",
     AuditEventType.LEDGER_SCHEMA_VERIFIED: "info",
     AuditEventType.LEDGER_VERSION_DRIFT: "warn",
+    AuditEventType.SCHEMA_DEFINE_SKIPPED: "warn",
 }
 
 _LEVEL_RANK = {"info": 10, "warn": 20, "error": 30}
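One way the two tables above could combine is a threshold filter. This is a sketch under assumptions: the diff does not show how `_LEVEL_RANK` is consumed, the enum is trimmed to three members, and `should_emit` and `min_level` are hypothetical names.

```python
import enum


class AuditEventType(str, enum.Enum):  # stand-in for the enum.StrEnum in the diff
    LEDGER_SCHEMA_VERIFIED = "ledger_schema_verified"
    LEDGER_VERSION_DRIFT = "ledger_version_drift"
    SCHEMA_DEFINE_SKIPPED = "schema_define_skipped"


_LEVEL_BY_EVENT = {
    AuditEventType.LEDGER_SCHEMA_VERIFIED: "info",
    AuditEventType.LEDGER_VERSION_DRIFT: "warn",
    AuditEventType.SCHEMA_DEFINE_SKIPPED: "warn",
}

_LEVEL_RANK = {"info": 10, "warn": 20, "error": 30}


def should_emit(event: AuditEventType, min_level: str = "warn") -> bool:
    """Hypothetical filter: keep events whose level is at or above min_level."""
    level = _LEVEL_BY_EVENT.get(event, "info")
    return _LEVEL_RANK[level] >= _LEVEL_RANK[min_level]
```

Under this reading, mapping `SCHEMA_DEFINE_SKIPPED` to `"warn"` would place it among the `warn`|`error` events that the changelog says `bicameral_diagnose` reports.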

@@ -23,9 +23,8 @@
 _RECENT_EVENT_TAIL = 5
 
 
-def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
+def _read_ledger_metadata_for_url(url: str) -> tuple[str, int | None, str | None]:
     """Return (ledger_url, size_bytes_or_None, mtime_iso_or_None)."""
-    url = getattr(adapter, "_url", "")
     if not url.startswith("surrealkv://"):
         return url, None, None
     path_str = url.removeprefix("surrealkv://")
@@ -37,21 +36,23 @@ def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
     return url, stat.st_size, mtime_iso
 
 
-async def _read_bicameral_meta(
-    adapter,
-) -> tuple[str | None, str | None, str | None, str, str]:
-    """Return (first_write, last_write, last_write_at_iso, drift_status, running).
+def _read_ledger_metadata(adapter) -> tuple[str, int | None, str | None]:
+    return _read_ledger_metadata_for_url(getattr(adapter, "_url", ""))
 
-    ``drift_status`` is one of: ``"first-write"`` / ``"match"`` / ``"drift"`` /
-    ``"unavailable"`` (table missing, e.g., pre-Layer-2 ledger).
-    """
+
+async def _read_bicameral_meta_raw(
+    client,
+) -> tuple[str | None, str | None, str | None, str, str]:
+    """Same shape as ``_read_bicameral_meta`` but operates on a raw
+    ``LedgerClient``. Used by the MCP ``bicameral_diagnose`` tool, which
+    must work without ``init_schema``/``migrate`` succeeding."""
     try:
         running = importlib.metadata.version("surrealdb")
     except importlib.metadata.PackageNotFoundError:
         running = "unknown"
 
     try:
-        rows = await adapter._client.query("SELECT * FROM bicameral_meta LIMIT 1")
+        rows = await client.query("SELECT * FROM bicameral_meta LIMIT 1")
     except Exception:  # noqa: BLE001 — table missing is the load-bearing case
         return None, None, None, "unavailable", running
 
@@ -71,9 +72,20 @@ async def _read_bicameral_meta(
     return first, last, last_at_iso, "drift", running
 
 
-async def _read_schema_version(adapter) -> int | None:
+async def _read_bicameral_meta(
+    adapter,
+) -> tuple[str | None, str | None, str | None, str, str]:
+    """Return (first_write, last_write, last_write_at_iso, drift_status, running).
+
+    ``drift_status`` is one of: ``"first-write"`` / ``"match"`` / ``"drift"`` /
+    ``"unavailable"`` (table missing, e.g., pre-Layer-2 ledger).
+    """
+    return await _read_bicameral_meta_raw(adapter._client)
+
+
+async def _read_schema_version_raw(client) -> int | None:
     try:
-        rows = await adapter._client.query("SELECT version FROM schema_meta LIMIT 1")
+        rows = await client.query("SELECT version FROM schema_meta LIMIT 1")
     except Exception:  # noqa: BLE001
         return None
     if not rows:
@@ -82,11 +94,15 @@ async def _read_schema_version(adapter) -> int | None:
     return int(val) if val is not None else None
 
 
-async def _read_table_counts(adapter) -> dict[str, int]:
+async def _read_schema_version(adapter) -> int | None:
+    return await _read_schema_version_raw(adapter._client)
+
+
+async def _read_table_counts_raw(client) -> dict[str, int]:
     counts: dict[str, int] = {}
     for table in _CANONICAL_TABLES:
         try:
-            rows = await adapter._client.query(f"SELECT count() AS n FROM {table} GROUP ALL")
+            rows = await client.query(f"SELECT count() AS n FROM {table} GROUP ALL")
         except Exception:  # noqa: BLE001 — missing table is acceptable (pre-v16)
             continue
         if rows:
@@ -95,6 +111,10 @@ async def _read_table_counts(adapter) -> dict[str, int]:
     return counts
 
 
+async def _read_table_counts(adapter) -> dict[str, int]:
+    return await _read_table_counts_raw(adapter._client)
+
+
 def _resolve_audit_log_channel() -> tuple[str, Path | None]:
     """Return (channel_label, configured_file_path_or_None)."""
     raw = os.getenv("BICAMERAL_AUDIT_LOG", "stderr").strip()
@@ -195,19 +215,28 @@ def _fetch_recommended() -> str | None:
         return None
 
 
-async def gather_diagnosis(adapter) -> Diagnosis:
-    """Collect every allowlisted field from the running install + ledger."""
+async def gather_diagnosis_raw(client, ledger_url: str) -> Diagnosis:
+    """Same allowlisted gather as ``gather_diagnosis`` but takes a raw
+    ``LedgerClient`` and an explicit ``ledger_url``.
+
+    Used by the MCP ``bicameral_diagnose`` tool, which opens a raw client
+    so it can produce a report even when ``adapter.connect()`` (and its
+    init_schema / migrate calls) would crash on a corrupted ledger. The
+    CLI ``bicameral-mcp diagnose`` keeps using ``gather_diagnosis``
+    (adapter-based) because it benefits from the adapter's connection
+    lifecycle in the happy-path operator-bug-report flow.
+    """
     try:
         bicameral_version = importlib.metadata.version("bicameral-mcp")
     except importlib.metadata.PackageNotFoundError:
         bicameral_version = "unknown"
 
     from ledger.schema import SCHEMA_VERSION
 
-    ledger_url, size_bytes, mtime_iso = _read_ledger_metadata(adapter)
-    first, last, last_at_iso, drift_status, running = await _read_bicameral_meta(adapter)
-    schema_recorded = await _read_schema_version(adapter)
-    table_counts = await _read_table_counts(adapter)
+    _, size_bytes, mtime_iso = _read_ledger_metadata_for_url(ledger_url)
+    first, last, last_at_iso, drift_status, running = await _read_bicameral_meta_raw(client)
+    schema_recorded = await _read_schema_version_raw(client)
+    table_counts = await _read_table_counts_raw(client)
     channel_label, audit_path = _resolve_audit_log_channel()
     recent_events = _tail_recent_events(audit_path, _RECENT_EVENT_TAIL)
 
@@ -242,3 +271,12 @@ async def gather_diagnosis(adapter) -> Diagnosis:
         recent_events=recent_events,
         suggestions=suggestions,
     )
+
+
+async def gather_diagnosis(adapter) -> Diagnosis:
+    """Adapter-flavoured wrapper over ``gather_diagnosis_raw``.
+
+    Reads the ledger URL off the adapter and forwards to the raw helper.
+    Existing CLI callers (`bicameral-mcp diagnose`) keep this entry point.
+    """
+    return await gather_diagnosis_raw(adapter._client, getattr(adapter, "_url", ""))
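The raw helpers above swallow per-table query failures so that a partially built ledger still yields a report. A minimal sketch of that behavior follows, assuming a hypothetical `StubClient` whose `query` coroutine raises for a missing table and returns rows shaped like `[{"n": <count>}]`. The table names and the row shape are illustrative assumptions, not taken from the diff.

```python
import asyncio

TABLES = ("decision", "yields", "schema_meta")  # illustrative names only


class StubClient:
    """Hypothetical stand-in for a raw ledger client."""

    def __init__(self, counts: dict[str, int]):
        self._counts = counts

    async def query(self, sql: str) -> list[dict]:
        # Pull the table name out of "SELECT ... FROM <table> GROUP ALL".
        table = sql.split(" FROM ")[1].split()[0]
        if table not in self._counts:
            raise RuntimeError(f"table {table} does not exist")
        return [{"n": self._counts[table]}]


async def read_table_counts_raw(client) -> dict[str, int]:
    counts: dict[str, int] = {}
    for table in TABLES:
        try:
            rows = await client.query(f"SELECT count() AS n FROM {table} GROUP ALL")
        except Exception:  # missing table: skip it rather than abort the report
            continue
        if rows:
            counts[table] = int(rows[0]["n"])
    return counts


result = asyncio.run(read_table_counts_raw(StubClient({"decision": 3, "yields": 7})))
print(result)
```

The missing `schema_meta` table is simply absent from the result, which is the property a diagnosis tool needs when it runs against a ledger that never finished `init_schema`.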
@@ -650,6 +650,43 @@ class ResetResponse(BaseModel):
     replay_plan: list[ResetReplayEntry] = []
     replay_errors: list[str] = []
     next_action: str
+    # #296 Layer E — automated rebuild from .bicameral/events/*.jsonl
+    # after wipe. Populated only when `replay_from_events=True` and
+    # `confirm=True`; reports how many events the materializer replayed.
+    events_replayed: int = 0
+
+
+# ── Tool 8 (new): /bicameral_diagnose ────────────────────────────────
+
+
+class DiagnoseResponse(BaseModel):
+    """Read-only diagnostic snapshot. Mirrors the CLI ``bicameral-mcp
+    diagnose`` output but returns structured fields so agents can render
+    a recovery prompt deterministically.
+
+    ``recovery_path`` classifies the next operator action:
+      - ``clean``: ledger looks healthy, no remediation needed
+      - ``fixable``: schema is behind binary; next normal call migrates
+      - ``reset_rebuild``: ledger broken AND events present → reset
+        with ``replay_from_events=True`` recovers without data loss
+      - ``reset_destructive``: ledger broken AND no events → reset
+        loses decision history; user must explicitly accept
+
+    ``diagnosis`` carries the same structural-metadata-only fields the
+    CLI emits (see ``cli.diagnose.Diagnosis``); empty when the raw
+    client could not connect.
+    """
+
+    ledger_url: str
+    connect_error: str = ""
+    recovery_path: Literal[
+        "clean",
+        "fixable",
+        "reset_rebuild",
+        "reset_destructive",
+    ]
+    diagnosis: dict | None = None
+    next_action: str
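The four `recovery_path` values in the docstring suggest a small decision table. The sketch below is one plausible reading of it, not the tool's actual logic: `classify_recovery_path` and its three boolean inputs are hypothetical.

```python
from typing import Literal

RecoveryPath = Literal["clean", "fixable", "reset_rebuild", "reset_destructive"]


def classify_recovery_path(
    ledger_broken: bool,
    schema_behind_binary: bool,
    events_present: bool,
) -> RecoveryPath:
    """Hypothetical classifier mirroring the docstring's four cases."""
    if ledger_broken:
        # A broken ledger always needs a reset; the event log decides
        # whether that reset can rebuild decision history afterwards.
        return "reset_rebuild" if events_present else "reset_destructive"
    if schema_behind_binary:
        # The next normal call is expected to run the migration.
        return "fixable"
    return "clean"
```

Checking `ledger_broken` first keeps the destructive outcome reachable only when no event log exists, matching the docstring's requirement that the user explicitly accept losing history.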
 
 
 # ── Tool 9: /bicameral_preflight ─────────────────────────────────────