triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299) by jinhongkuan · Pull Request #298 · BicameralAI/bicameral-mcp

jinhongkuan · 2026-05-10T07:14:35Z

Summary

Triages the resilient-ledger-migration work from #297 (against dev) onto main via the triage-from-dev branch. Identical commit set, cherry-picked off origin/main after a clean reset.

Companion P0 issue: #296 (replay determinism regression suite).

What ships

Same four logical layers as #297:

fix(ledger) — tolerant init_schema (expected a record< substring catch + SCHEMA_DEFINE_SKIPPED audit event) + extracted _clean_yields_legacy_rows / _dedupe_yields helpers + new _migrate_v16_to_v17 that re-runs the cleanup. SCHEMA_VERSION 16 → 17.
feat(diagnose) — bicameral.diagnose MCP tool that opens a raw LedgerClient (no init/migrate), so it works even when normal connect crashes. Returns a structured recovery_path (clean / fixable / reset_rebuild / reset_destructive). Shared gather_diagnosis_raw between CLI and MCP — one body, two thin wrappers.
feat(reset) — bicameral_reset(replay_from_events=true) reuses EventMaterializer.replay_new_events to rebuild the ledger from .bicameral/events/*.jsonl after wipe. Mutually exclusive with wipe_mode="full".
test(ledger) — fixture-replay regression (parametrized over frozen DB shapes — first fixture is the user's exact dogfood symptom) + error-format provocation suite (locks down the SurrealDB substring contract).

Plus the three lint/type follow-ups from #297 review iteration:

chore(lint) — handlers.diagnose import order
chore(format) — ruff format on the five new/touched files
fix(types) — RecoveryPath Literal alias for mypy

Why a separate PR to main (and not waiting on #297 to merge dev → main)

The dogfood crash motivating this work bricks every ledger op until the v17 migration runs. Going through dev → main on the normal cadence delays the unblock for any operator on main who hits the same bad-row state. Triaging directly onto main via triage-from-dev matches the convention used for #290 / #291 hotfixes earlier in this milestone.

Test plan

Local: pytest tests/test_schema_recoverable_errors.py tests/test_legacy_ledger_fixtures.py tests/test_diagnose_gather.py tests/test_reset.py → 27 passed.

CI green on the regression matrix (Linux + Windows).
CI green on the new fixture-replay suite + error-format provocation.
CI green on ruff + mypy.
Manual: dogfood ledger upgrades cleanly to v17 and the legacy source_span rows in yields are removed.
Manual: bicameral.diagnose returns recovery_path: "clean" after upgrade.

Companion PRs / issues

feat(ledger): resilient migration + bicameral_diagnose + reset --replay-from-events (#296) #297 (triage/resilient-ledger-migration → dev) — same commits, going through the dev queue.
#296 (P0) — replay determinism regression suite, gates Layer E's correctness claim.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a diagnostic tool for read-only ledger connectivity and health assessment.
- Added event replay capability to rebuild ledger data from stored event logs after reset.
Bug Fixes
- Enhanced robustness of schema definition handling for recoverable SurrealDB errors.
- Added cleanup migration for legacy ledger data corruption in the yields table.
Tests
- Added regression test suite for legacy ledger fixture validation and schema error pattern matching.

Two paired changes that together unblock any ledger whose schema_meta.version rolled past 5 with v3-era source_span rows still in the yields table — the dogfood crash that motivated this work. **Tolerant init_schema (Layer A).** \`_execute_define_idempotent\` now recognizes a small allowlist of SurrealDB error substrings as recoverable: existing "already exists" and "already contains" plus the new "expected a record<" (TYPE constraint mismatch when DEFINE TABLE/INDEX OVERWRITE re-validates legacy rows) and a defensive "but expected" for adjacent value-type errors. The patterns live in module-level RECOVERABLE_DEFINE_PATTERNS so the substring contract is testable. When a recoverable error fires, init logs a warning and emits a SCHEMA_DEFINE_SKIPPED audit event so the next migration's cleanup is visible in the audit log. **Re-runnable yields cleanup (Layer B).** The v4→v5 cleanup body is extracted into shared \`_clean_yields_legacy_rows\` + \`_dedupe_yields\` helpers. v4→v5 still calls them, so historical behaviour is unchanged. The new \`_migrate_v16_to_v17\` calls the same helpers, which means any DB that ran past v5 with the corruption still present gets the cleanup on the v16→v17 boundary instead of permanently — the migrate() loop only iterates \`range(current+1, SCHEMA_VERSION+1)\`, so a one-shot historical migration is permanently unreachable for users who upgraded later. \`SCHEMA_VERSION\` bumps to 17. SCHEMA_COMPATIBILITY[17] is a placeholder release-eng pins at PR merge. **Audit log.** New \`AuditEventType.SCHEMA_DEFINE_SKIPPED\` (warn-level) emitted by \`_execute_define_idempotent\` whenever a UNIQUE-violation or type-mismatch is swallowed — gives operators visibility into the in-flight cleanup without breaking init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…296) Mirrors the existing \`bicameral-mcp diagnose\` CLI as an MCP tool so agents can fire it from any tool-error envelope. Critical property: the handler opens a raw \`LedgerClient\` and never goes through \`adapter.connect()\` — which means it produces a useful diagnosis even when normal connect crashes inside \`init_schema\` / \`migrate\`, the exact failure mode that motivated #296. **Refactor.** \`cli/_diagnose_gather.py\` grows three raw-client variants (\`_read_bicameral_meta_raw\`, \`_read_schema_version_raw\`, \`_read_table_counts_raw\`) and a \`gather_diagnosis_raw(client, ledger_url)\` entry point. The existing adapter-flavoured \`gather_diagnosis(adapter)\` becomes a one-line wrapper, so the CLI keeps its current shape and the test suite (\`test_diagnose_*\`) is unchanged. **New handler + tool.** \`handlers/diagnose.handle_diagnose\` opens a raw client against the resolved ledger URL, calls \`gather_diagnosis_raw\`, and classifies the result into a \`recovery_path\`: - \`clean\` — schema matches binary, tables look sane. - \`fixable\` — schema behind binary; next normal connect migrates. - \`reset_rebuild\` — ledger broken AND \`.bicameral/events/\` has events on disk; \`bicameral_reset(replay_from_events=true, confirm=true)\` recovers without data loss. - \`reset_destructive\` — ledger broken AND no events; reset loses decision history, user must accept. \`next_action\` is a human-readable string the agent renders verbatim. Wired into \`server.py\` list_tools + dispatch as \`bicameral.diagnose\`. Pydantic contract \`DiagnoseResponse\` lives alongside \`ResetResponse\` in \`contracts.py\`. **SKILL contract.** \`skills/bicameral-diagnose/SKILL.md\` documents auto-fire from any schema/migration error envelope, the recovery-path → command matrix, and the read-only invariant (repair is CLI-driven, never an in-session agent action). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e (#296) \`bicameral_reset(replay_from_events=True, confirm=True)\` (only valid with \`wipe_mode="ledger"\`) wipes the materialized DB, resets the team-mode watermark, and replays every event in \`.bicameral/events/*.jsonl\` back through the same ingest path team mode uses. The event log is the canonical record (committed to git in team mode); replay is recovery, not destruction. Implementation reuses \`EventMaterializer.replay_new_events\` — the exact code team mode runs on every sync — so replay-vs-live divergence is impossible by construction. Determinism of the underlying canonical_id resolution is tracked separately under #296 (P0 follow-up: regression suite covering every event type the materializer dispatches on). **Behaviour.** - Dry run reports on-disk event count via \`events_replayed\` so the user sees scale before confirming. - Post-confirm response reports actual replayed count + any \`replay_errors\` (wipe still committed; replay errors don't destroy data, only block recovery). - \`replay_from_events=True\` + \`wipe_mode="full"\` is rejected with a clear error — full-wipe deletes the substrate we'd replay from. - Honours \`BICAMERAL_DATA_PATH\` symmetrically with \`adapters/ledger.py\`'s team-mode write path so the replay reads from the same directory the writers wrote to. **Skill.** \`skills/bicameral-reset/SKILL.md\` documents the new flag, the exclusivity rule with full-wipe, and the recommended pairing with \`bicameral_diagnose\`'s \`recovery_path="reset_rebuild"\`. **Contracts.** \`ResetResponse.events_replayed\` (default 0) carries the count. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two CI test classes that lock down the safety contract for layers A and B. Both run on every PR via the existing \`test-mcp-regression.yml\` workflow. **Fixture-replay suite (\`test_legacy_ledger_fixtures.py\`).** Frozen DB shapes under \`tests/fixtures/legacy_ledgers/\`. Each fixture is a Python module exporting an async \`build(client)\` coroutine that mutates the client into a known-historical bad state. The parametrized test: 1. Builds the bad state in a \`memory://\` client. 2. Runs the production \`init_schema\` + \`migrate\` path. 3. Asserts schema reaches \`SCHEMA_VERSION\`, fixture-specific invariants hold, and a second \`init_schema\` + \`migrate\` is a no-op (idempotent). First fixture is the user's exact dogfood symptom: \`v3_yields_source_span\` — \`yields.in = source_span:*\` rows surviving past v5, schema_meta.version = 16 to force the v16→v17 boundary. Adding a new fixture later requires zero test code: register it in \`FIXTURES\` and the parametrized test runs against it. README under the fixtures dir documents the registration shape. **Error-format provocation suite (\`test_schema_recoverable_errors.py\`).** Three tests that lock down the SurrealDB error-string contract layer A depends on: - UNIQUE-violation produces a substring matching \`RECOVERABLE_DEFINE_PATTERNS\` (existing v4→v5 path). - Type-mismatch on \`DEFINE INDEX OVERWRITE\` against legacy rows produces a matching substring (the #296 scenario). - All entries in \`RECOVERABLE_DEFINE_PATTERNS\` are lowercase (the catch lower-cases the message before substring match). When a future \`surrealdb-py\` bump changes the format, these tests fail loudly with the exact message the maintainer needs to update the patterns list — turning a silent regression into a known-loud signal. **CI wiring.** \`test-mcp-regression.yml\` lists the two new files in the pytest invocation. \`test_ledger_bicameral_meta_migration.py\` had a hardcoded \`assert SCHEMA_VERSION == 16\` updated to \`assert SCHEMA_VERSION >= 16\` so future bumps land cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ruff I001 — alphabetical block. handlers.diagnose belongs after handlers.bind, not after handlers.ratify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`ruff format --check .` was failing on the five files added in this PR. No behavioural changes — purely whitespace/line-wrapping per the project's ruff config. 42-test smoke pass green after the reformat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mypy flagged DiagnoseResponse.recovery_path as expecting Literal['clean','fixable','reset_rebuild','reset_destructive'] while _classify_recovery returned plain str. Add a `RecoveryPath` type alias and use it on both the function signature and the intermediate `path` binding for the rec>exp branch. Pure typing — no runtime change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-10T07:14:47Z

📝 Walkthrough

Walkthrough

This PR introduces comprehensive ledger diagnostics and recovery workflows: a new read-only bicameral.diagnose MCP tool that reports ledger health and recovery paths (clean, fixable, reset_rebuild, reset_destructive), schema version bump to v17 with yields table cleanup migrations, event replay support in the reset handler, and test infrastructure validating recovery paths through fixture-based corruption scenarios.

Changes

Ledger Diagnostics & Recovery Workflows

Layer / File(s)	Summary
Data Contracts & Audit Events `contracts.py`, `audit_log.py`	`DiagnoseResponse` defines ledger health snapshot with recovery paths and connection errors; `ResetResponse` adds `events_replayed` counter; audit system adds `SCHEMA_DEFINE_SKIPPED` event type.
Schema Versions & Migrations `ledger/schema.py`	Increments `SCHEMA_VERSION` to 17; introduces `RECOVERABLE_DEFINE_PATTERNS` for idempotent DEFINE handling; refactors yields cleanup into helpers and adds v16→v17 migration to remediate unreachable v4→v5 corruption.
Diagnosis Gathering (Raw Client) `cli/_diagnose_gather.py`	Refactors diagnosis gathering into URL-based and raw client-driven helpers; `gather_diagnosis_raw(client, ledger_url)` assembles full diagnosis for use when adapter fails; adapter-based wrapper delegates to raw implementation.
Diagnose MCP Handler & Recovery Classification `handlers/diagnose.py`	Implements `handle_diagnose(ctx)` to resolve ledger URL, gather diagnosis, classify recovery path based on schema version and event presence, and return `DiagnoseResponse` with next-action guidance; includes `_classify_recovery` and `_events_present` helpers.
Reset Handler with Event Replay `handlers/reset.py`	Extends `handle_reset` with `replay_from_events` parameter; rejects replay+full-wipe combination; for ledger-wipe+replay, counts on-disk events, executes replay via `EventMaterializer.replay_new_events` after wipe, captures replay failures in response.
MCP Server Tool Registration `server.py`	Imports `handle_diagnose`; registers `bicameral.diagnose` MCP tool with empty input schema; wires tool dispatcher to route `diagnose` calls to handler.
Skill Documentation & Fixture Guide `skills/bicameral-diagnose/SKILL.md`, `skills/bicameral-reset/SKILL.md`, `tests/fixtures/legacy_ledgers/README.md`	Defines diagnostic trigger conditions, recovery-path matrix, and auto-invocation flow; documents `replay_from_events` mode constraints; provides legacy fixture framework and contributor guide.
Legacy Ledger Fixtures & Recovery Tests `tests/fixtures/legacy_ledgers/v3_yields_source_span.py`, `tests/test_legacy_ledger_fixtures.py`, `tests/test_schema_recoverable_errors.py`, `tests/test_ledger_bicameral_meta_migration.py`, `.github/workflows/test-mcp-regression.yml`	Adds v3-era corruption fixture; parametrized regression suite validating cleanup and idempotency; contract tests for recoverable DEFINE patterns; updates migration test expectations; registers new test modules in CI.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

BicameralAI/bicameral-mcp#29: Both PRs modify the CI workflow test configuration (.github/workflows/test-mcp-regression.yml) and establish legacy ledger fixture infrastructure for regression testing.
BicameralAI/bicameral-mcp#12: Both PRs modify the reset handler (handlers/reset.py) and ResetResponse contract to support new recovery workflows.
BicameralAI/bicameral-mcp#128: Both PRs interact with the event replay surface (EventMaterializer.replay_new_events, .bicameral/events/*.jsonl reading/writing).

Suggested labels

flow:release

Suggested reviewers

Knapp-Kevin

Poem

🐰 A diagnosis so keen, now your ledger is seen,
Recovery paths clear as the morning dew—
From yields with old ties to a v17 reprise,
Replaying events to rebuild anew! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 73.17% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The PR title comprehensively summarizes the four main logical layers being implemented: resilient ledger migration, bicameral_diagnose, reset --replay-from-events, plus supporting test infrastructure and documentation updates, aligned with the objectives and all major file changes across ledger/schema, handlers, contracts, and test suites.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch triage-from-dev

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server.py (1)
137-156: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Smoke test will fail — EXPECTED_TOOL_NAMES is missing bicameral.diagnose.

list_tools() now returns 19 tools (the 18 listed here plus the new bicameral.diagnose registered at lines 822–834), but run_smoke_test() does a strict equality compare:
if tool_names != EXPECTED_TOOL_NAMES:
    raise RuntimeError(f"Unexpected MCP tool registry: {tool_names!r} != {EXPECTED_TOOL_NAMES!r}")
This means bicameral-mcp --smoke-test (and any CI step that calls it) will raise immediately. Add the new tool name in the same slot order as list_tools() (between bicameral.usage_summary and validate_symbols).
🔧 Proposed fix
     "bicameral.feedback",
     "bicameral.usage_summary",
+    "bicameral.diagnose",
     "validate_symbols",
     "get_neighbors",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.py` around lines 137 - 156, The EXPECTED_TOOL_NAMES constant is
missing the newly registered tool "bicameral.diagnose", causing
run_smoke_test()'s strict equality check against list_tools() to fail; update
EXPECTED_TOOL_NAMES to include "bicameral.diagnose" in the exact position shown
by list_tools() (between "bicameral.usage_summary" and "validate_symbols") so
the order and contents match list_tools() and the smoke test passes.

🧹 Nitpick comments (4)

server.py (1)

822-834: ⚡ Quick win

Tool registration and dispatch wiring look correct.

The empty input schema ({"type": "object", "properties": {}}) is appropriate for a no-arg read-only probe, and the dispatch arm correctly aliases both bicameral.diagnose and the bare diagnose form for parity with the rest of the registry. replay_from_events=arguments.get("replay_from_events", False) defaults match the handler signature and avoid breaking existing reset callers.

One thing to consider: the bicameral.reset tool's inputSchema (lines 331–357) doesn't advertise the new replay_from_events property. Agents discovering tools via list_tools won't see it; the dispatch will accept it from arguments because the handler defaults to False, but the tool description should document the new flag for the call site.

🔧 Suggested addition to `bicameral.reset` inputSchema

                     "wipe_mode": {
                         "type": "string",
                         "enum": ["ledger", "full"],
                         "default": "ledger",
                         "description": (
                             "'ledger' (default): wipe materialized DB rows only — config and event "
                             "files are preserved, server stays live. Use for bug/pollution recovery. "
                             "'full': delete the entire .bicameral/ directory. Nuclear option — "
                             "removes config, team event history, and all data. Always confirm "
                             "with the user after showing the dry-run warning."
                         ),
                     },
+                    "replay_from_events": {
+                        "type": "boolean",
+                        "default": False,
+                        "description": (
+                            "When true (only valid with wipe_mode='ledger'), wipes the materialized "
+                            "DB and rebuilds it by replaying .bicameral/events/*.jsonl through the "
+                            "ingest path. Mutually exclusive with wipe_mode='full' — full-wipe "
+                            "deletes the events substrate. Use after bicameral.diagnose returns "
+                            "recovery_path='reset_rebuild'."
+                        ),
+                    },
                 },
             },
         ),

Also applies to: 1043-1052

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.py` around lines 822 - 834, The bicameral.reset tool's inputSchema is
missing the newly supported replay_from_events flag, so update the inputSchema
for Tool(name="bicameral.reset") to include a boolean property
"replay_from_events" (default false) and a description so agents that call
list_tools can discover it; ensure the schema key name matches the argument used
in the handler (replay_from_events) and keep the handler's default
(replay_from_events=False) unchanged so behavior remains backward-compatible.

tests/fixtures/legacy_ledgers/README.md (1)

25-33: ⚡ Quick win

Document the optional assert_clean(client) hook.

tests/test_legacy_ledger_fixtures.py calls module.assert_clean(c) when present (and the v3 fixture exports one), but the authoring section here only mentions build(client). New fixture authors will not know the contract for fixture-specific invariants.

📝 Proposed README addition

 ```python
 # tests/fixtures/legacy_ledgers/<name>.py
 async def build(client):
     await client.execute("…raw SurrealQL that produces the bad state…")
+
+# Optional: asserted after init_schema + migrate runs in the
+# parametrized regression suite. Use it for fixture-specific
+# post-cleanup invariants beyond the generic schema_meta.version
+# check.
+async def assert_clean(client):
+    rows = await client.query("…")
+    assert …, "describe what stale state would have leaked through"
 ```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/fixtures/legacy_ledgers/README.md` around lines 25 - 33, Update the
README entry for legacy ledger fixtures to document the optional
assert_clean(client) hook (in addition to build(client)); state that
tests/test_legacy_ledger_fixtures.py will call module.assert_clean(c) when
present (and v3 fixtures should export it), that it runs after init_schema +
migrate in the parametrized regression suite, and that authors should implement
async def assert_clean(client) to query the DB and assert any fixture-specific
post-cleanup invariants (raising/asserting with a descriptive message when stale
state is detected).

tests/test_legacy_ledger_fixtures.py (1)

26-39: 💤 Low value

Registry ergonomics: hard-coded import partially defeats the "register-only" promise.

The docstring states "Adding a new fixture requires no test code — register it in FIXTURES" (line 11-12), but the current shape requires both a top-level import v3_yields_source_span line and a FIXTURES entry. A future contributor following the docstring will register a fixture and watch the suite silently skip it (or rather, fail on NameError). Consider discovering fixtures via pkgutil.iter_modules(_FIXTURES_DIR) or importlib.import_module(slug) at module load and dropping the hard-coded import.
🔧 Sketch of registry-only loading
-import v3_yields_source_span  # noqa: E402 — see sys.path comment
-
-# Each entry: (slug, module). Module must export `build(client)` and
-# may export `assert_clean(client)` for fixture-specific invariants.
-FIXTURES = [
-    ("v3_yields_source_span", v3_yields_source_span),
-]
+# Slugs of fixture modules under tests/fixtures/legacy_ledgers/.
+# Each module must export an async `build(client)` and may export
+# an async `assert_clean(client)`.
+_FIXTURE_SLUGS = ["v3_yields_source_span"]
+FIXTURES = [(slug, importlib.import_module(slug)) for slug in _FIXTURE_SLUGS]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_legacy_ledger_fixtures.py` around lines 26 - 39, The hard-coded
import of v3_yields_source_span defeats the "register-only" promise; instead
dynamically discover and import modules from _FIXTURES_DIR and populate FIXTURES
by iterating over module names (e.g., using pkgutil.iter_modules or
importlib.import_module on the entries found in _FIXTURES_DIR) so you only need
to register the slug; keep the sys.path insertion for _FIXTURES_DIR, iterate
discovered module names to import each module (verify the module exports
build(client) and optional assert_clean(client)), and construct FIXTURES as a
list of (slug, module) tuples rather than relying on a literal import like
v3_yields_source_span.

ledger/schema.py (1)

525-525: 💤 Low value

Parameterize DELETE statements to eliminate SQL-injection linting warning.

OpenGrep flags DELETE {row['id']} (lines 525, 551) as SQL-injection shape. While row['id'] is DB-derived (not user input) and risk is low, SurrealDB 2.x supports parameter binding for DELETE targets. Use:
await client.execute("DELETE $id", {"id": row['id']})
This eliminates the linting noise without changing logic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ledger/schema.py` at line 525, The DELETE calls currently build SQL via
f"DELETE {row['id']}" which triggers SQL-injection lint warnings; change these
to use SurrealDB parameter binding by calling client.execute with a
parameterized query like "DELETE $id" and passing the params dict {"id":
row['id']}; update each occurrence where client.execute is invoked with f"DELETE
{row['id']}" (and any similar string-built DELETE) to use the parameterized form
to remove the lint warning while preserving behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@handlers/diagnose.py`:
- Around line 156-166: The _events_present function currently checks only the
ledger_url parent and ignores BICAMERAL_DATA_PATH; update it to reuse the same
resolver used by handlers.reset (either import and call
handlers.reset:_resolve_events_dir or move that resolver to a shared helper and
call it) so the events directory is resolved identically to handlers.reset.py,
then check that resolved path for any "*.jsonl" files (keep function name
_events_present and the same boolean return semantics).
- Around line 139-153: The current rec is None branch from
_read_schema_version_raw can misclassify unreadable/corrupted ledgers as
"clean"; update the logic that currently returns "clean" when rec is None so it
only does so if there's a positive signal (e.g., diagnosis.drift_status !=
"unavailable" and diagnosis.table_counts is non-empty). If rec is None but
drift_status == "unavailable" or table_counts is empty, return "fixable" (or an
appropriate non-clean recovery_path) with a message indicating the schema row is
unreadable and remediation/inspection is required; adjust the variables
referenced (rec, diagnosis.drift_status, diagnosis.table_counts) and the
returned tuples to implement this gating.

In `@ledger/schema.py`:
- Around line 522-531: The per-row delete loop silently swallows exceptions;
replace the bare "except Exception: pass" that surrounds the
client.execute(f\"DELETE {row['id']}\") call with logged handling that records
the row id and exception (e.g., use logger.exception or logger.error with
row['id'] and the caught exception), optionally incrementing a failure counter
while still continuing to the next row; apply the same change to the other
identical loop block (the one around lines 547-559) so operators can correlate
failures to specific stale rows instead of losing the error signal.

---

Outside diff comments:
In `@server.py`:
- Around line 137-156: The EXPECTED_TOOL_NAMES constant is missing the newly
registered tool "bicameral.diagnose", causing run_smoke_test()'s strict equality
check against list_tools() to fail; update EXPECTED_TOOL_NAMES to include
"bicameral.diagnose" in the exact position shown by list_tools() (between
"bicameral.usage_summary" and "validate_symbols") so the order and contents
match list_tools() and the smoke test passes.

---

Nitpick comments:
In `@ledger/schema.py`:
- Line 525: The DELETE calls currently build SQL via f"DELETE {row['id']}" which
triggers SQL-injection lint warnings; change these to use SurrealDB parameter
binding by calling client.execute with a parameterized query like "DELETE $id"
and passing the params dict {"id": row['id']}; update each occurrence where
client.execute is invoked with f"DELETE {row['id']}" (and any similar
string-built DELETE) to use the parameterized form to remove the lint warning
while preserving behavior.

In `@server.py`:
- Around line 822-834: The bicameral.reset tool's inputSchema is missing the
newly supported replay_from_events flag, so update the inputSchema for
Tool(name="bicameral.reset") to include a boolean property "replay_from_events"
(default false) and a description so agents that call list_tools can discover
it; ensure the schema key name matches the argument used in the handler
(replay_from_events) and keep the handler's default (replay_from_events=False)
unchanged so behavior remains backward-compatible.

In `@tests/fixtures/legacy_ledgers/README.md`:
- Around line 25-33: Update the README entry for legacy ledger fixtures to
document the optional assert_clean(client) hook (in addition to build(client));
state that tests/test_legacy_ledger_fixtures.py will call module.assert_clean(c)
when present (and v3 fixtures should export it), that it runs after init_schema
+ migrate in the parametrized regression suite, and that authors should
implement async def assert_clean(client) to query the DB and assert any
fixture-specific post-cleanup invariants (raising/asserting with a descriptive
message when stale state is detected).

In `@tests/test_legacy_ledger_fixtures.py`:
- Around line 26-39: The hard-coded import of v3_yields_source_span defeats the
"register-only" promise; instead dynamically discover and import modules from
_FIXTURES_DIR and populate FIXTURES by iterating over module names (e.g., using
pkgutil.iter_modules or importlib.import_module on the entries found in
_FIXTURES_DIR) so you only need to register the slug; keep the sys.path
insertion for _FIXTURES_DIR, iterate discovered module names to import each
module (verify the module exports build(client) and optional
assert_clean(client)), and construct FIXTURES as a list of (slug, module) tuples
rather than relying on a literal import like v3_yields_source_span.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04b33658-0c24-4068-888c-eb74adee573f

📥 Commits

Reviewing files that changed from the base of the PR and between 178251f and 9383748.

📒 Files selected for processing (16)

.github/workflows/test-mcp-regression.yml
audit_log.py
cli/_diagnose_gather.py
contracts.py
handlers/diagnose.py
handlers/reset.py
ledger/schema.py
server.py
skills/bicameral-diagnose/SKILL.md
skills/bicameral-reset/SKILL.md
tests/fixtures/legacy_ledgers/README.md
tests/fixtures/legacy_ledgers/__init__.py
tests/fixtures/legacy_ledgers/v3_yields_source_span.py
tests/test_ledger_bicameral_meta_migration.py
tests/test_legacy_ledger_fixtures.py
tests/test_schema_recoverable_errors.py

coderabbitai · 2026-05-10T07:22:31Z

+    if rec is None:
+        return "clean", (
+            "Schema version not yet recorded — likely a fresh install. "
+            "Any tool call will initialise the ledger."
+        )
+
+    # rec == exp — confirm the table counts look sane
+    table_counts = diagnosis.table_counts or {}
+    if not table_counts:
+        return "fixable", (
+            "Schema version matches but no tables visible. "
+            "Connect may have stopped mid-init; re-run a tool call to retry."
+        )
+
+    return "clean", (f"Ledger is at expected schema v{exp}. No remediation needed.")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

rec is None may misclassify a broken ledger as clean.

_read_schema_version_raw returns None for two distinct conditions: (a) fresh install with no schema_meta row, and (b) the SELECT version FROM schema_meta query itself raised (table missing or read failed). Routing both to recovery_path="clean" with the message "likely a fresh install" can mask a corrupted/locked ledger that connected just enough to surface but cannot read its own schema row.

Consider gating the "clean" branch on a positive signal — e.g., also require diagnosis.drift_status != "unavailable" (which _read_bicameral_meta_raw returns when bicameral_meta is missing), or require non-empty table_counts. Otherwise, when the same DB is partially up but schema/meta tables are unreadable, both _read_schema_version_raw and _read_bicameral_meta_raw return None/"unavailable" and the user is told no remediation is needed.

if rec is None: + if (diagnosis.table_counts or {}) or diagnosis.drift_status != "unavailable": + return "clean", ( + "Schema version not yet recorded — likely a fresh install. " + "Any tool call will initialise the ledger." + ) + return "fixable", ( + "Schema and meta tables unreadable — connect succeeded but the ledger " + "appears uninitialised. Run any tool to trigger init_schema; if that fails, " + "rerun `bicameral-mcp diagnose --repair` from the CLI." + ) - return "clean", ( - "Schema version not yet recorded — likely a fresh install. " - "Any tool call will initialise the ledger." - )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@handlers/diagnose.py` around lines 139 - 153, The current rec is None branch from _read_schema_version_raw can misclassify unreadable/corrupted ledgers as "clean"; update the logic that currently returns "clean" when rec is None so it only does so if there's a positive signal (e.g., diagnosis.drift_status != "unavailable" and diagnosis.table_counts is non-empty). If rec is None but drift_status == "unavailable" or table_counts is empty, return "fixable" (or an appropriate non-clean recovery_path) with a message indicating the schema row is unreadable and remediation/inspection is required; adjust the variables referenced (rec, diagnosis.drift_status, diagnosis.table_counts) and the returned tuples to implement this gating.

coderabbitai · 2026-05-10T07:22:31Z

+def _events_present(ledger_url: str) -> bool:
+    """Best-effort check for ``.bicameral/events/*.jsonl``."""
+    if not ledger_url.startswith("surrealkv://"):
+        return False
+    from pathlib import Path
+
+    db_path = Path(ledger_url.removeprefix("surrealkv://"))
+    events_dir = db_path.parent / "events"
+    if not events_dir.exists():
+        return False
+    return any(events_dir.glob("*.jsonl"))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_events_present ignores BICAMERAL_DATA_PATH, diverging from handlers/reset.py.

handlers/reset.py:_resolve_events_dir resolves the events directory by first consulting BICAMERAL_DATA_PATH (with the comment "Honours BICAMERAL_DATA_PATH symmetrically with adapters/ledger.py"), then falling back to the ledger-URL parent. This helper only inspects the ledger-URL parent.

When an operator sets BICAMERAL_DATA_PATH to a directory other than the surrealkv ledger's parent, recovery_path will be classified reset_destructive here, and the renderer will tell the user "no events on disk → user must accept data loss" (per the DiagnoseResponse contract docstring). The user then runs bicameral_reset(replay_from_events=False, confirm=True) and loses decision history that bicameral_reset(replay_from_events=True, confirm=True) would in fact have replayed (because reset's own events-dir resolver picks BICAMERAL_DATA_PATH). This silently turns a reset_rebuild situation into a reset_destructive outcome.

Reuse the same resolver — either import _resolve_events_dir from handlers.reset, or factor it into a shared module so both code paths can't drift again.

🔧 Suggested fix (one of two options)

Option A — delegate to the existing helper:

def _events_present(ledger_url: str) -> bool: """Best-effort check for ``.bicameral/events/*.jsonl``.""" - if not ledger_url.startswith("surrealkv://"): - return False - from pathlib import Path - - db_path = Path(ledger_url.removeprefix("surrealkv://")) - events_dir = db_path.parent / "events" - if not events_dir.exists(): - return False - return any(events_dir.glob("*.jsonl")) + import os + from pathlib import Path + + data_path = os.environ.get("BICAMERAL_DATA_PATH") + if data_path: + events_dir = Path(data_path) / ".bicameral" / "events" + elif ledger_url.startswith("surrealkv://"): + db_path = Path(ledger_url.removeprefix("surrealkv://")) + events_dir = db_path.parent / "events" + else: + return False + if not events_dir.exists(): + return False + return any(events_dir.glob("*.jsonl"))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@handlers/diagnose.py` around lines 156 - 166, The _events_present function currently checks only the ledger_url parent and ignores BICAMERAL_DATA_PATH; update it to reuse the same resolver used by handlers.reset (either import and call handlers.reset:_resolve_events_dir or move that resolver to a shared helper and call it) so the events directory is resolved identically to handlers.reset.py, then check that resolved path for any "*.jsonl" files (keep function name _events_present and the same boolean return semantics).

coderabbitai · 2026-05-10T07:22:31Z

+    removed = 0
+    for row in stale or []:
+        try:
+            await client.execute(f"DELETE {row['id']}")
+            removed += 1
+        except Exception:
+            pass
+    if removed:
+        logger.info("[migration] %s: removed %d stale legacy yields edges", log_tag, removed)
+    return removed


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Per-row delete failures are silently dropped — operator loses cleanup signal.

When client.execute(f"DELETE {row['id']}") fails, the bare except Exception: pass discards the row id and the exception. If a meaningful subset of deletes fails (e.g., due to a transient SurrealDB binding issue), removed undercounts and the dirty rows survive. Combined with _execute_define_idempotent swallowing the subsequent idx_yields_unique "already contains" failure (line 1018), the migration reports success and bumps schema_meta.version to 17 with the legacy rows still present — the exact failure mode #296 was meant to escape.

At minimum, log per-row failures so an operator can correlate a follow-up bicameral.diagnose finding back to the specific stale rows.

🪵 Proposed fix to surface per-row failures

for row in stale or []: try: await client.execute(f"DELETE {row['id']}") removed += 1 - except Exception: - pass + except Exception as exc: + logger.warning( + "[migration] %s: failed to delete stale yields row %s: %s", + log_tag, row.get("id"), exc, + )

if key in seen: try: await client.execute(f"DELETE {row['id']}") removed += 1 - except Exception: - pass + except Exception as exc: + logger.warning( + "[migration] %s: failed to delete duplicate yields row %s: %s", + log_tag, row.get("id"), exc, + ) else: seen.add(key)

Also applies to: 547-559

🧰 Tools

🪛 OpenGrep (1.20.0)

[ERROR] 525-525: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ledger/schema.py` around lines 522 - 531, The per-row delete loop silently swallows exceptions; replace the bare "except Exception: pass" that surrounds the client.execute(f\"DELETE {row['id']}\") call with logged handling that records the row id and exception (e.g., use logger.exception or logger.error with row['id'] and the caught exception), optionally incrementing a failure counter while still continuing to the next row; apply the same change to the other identical loop block (the one around lines 547-559) so operators can correlate failures to specific stale rows instead of losing the error signal.

Replaces the dashboard image at the bottom of "How It Feels" with a three-beat demo video section (ingest -> preflight -> ratify async) referencing GitHub user-attachments URLs so videos render as inline players. Moves the "Star on GitHub" banner from the top header to a centered placement immediately after the demo, turning it into a post-demo conversion beat instead of a misaligned header element. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…itch Replaces the single-line MCP-server description with a position-take opener: paragraph 1 names the failure mode (agreements emerge mid-flight, never reach a doc); paragraph 2 introduces Bicameral MCP as a spec compliance layer that captures both formal source materials (transcripts, PRDs, Slack) and undiscussed mid-implementation decisions to be ratified async by the product owner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jinhongkuan and others added 7 commits May 10, 2026 00:13

chore(lint): fix handlers.diagnose import order in server.py

fa4ca79

Ruff I001 — alphabetical block. handlers.diagnose belongs after handlers.bind, not after handlers.ratify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jinhongkuan temporarily deployed to ci-test May 10, 2026 07:14 — with GitHub Actions Inactive

jinhongkuan temporarily deployed to production May 10, 2026 07:14 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 10, 2026 07:14 — with GitHub Actions Failure

coderabbitai Bot reviewed May 10, 2026

View reviewed changes

jinhongkuan and others added 2 commits May 10, 2026 00:55

jinhongkuan changed the title ~~triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296)~~ triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299) May 10, 2026

jinhongkuan temporarily deployed to ci-test May 10, 2026 07:55 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 10, 2026 07:55 — with GitHub Actions Failure

jinhongkuan temporarily deployed to production May 10, 2026 07:55 — with GitHub Actions Inactive

jinhongkuan merged commit 6d96ac3 into main May 10, 2026
8 of 9 checks passed

This was referenced May 17, 2026

fix(ledger): route ledger_sync deserialization warnings to wipe-and-replay recovery (#301) #403

Merged

release: v0.16.0 — Ledger Locator + partial verdict + recovery hardening (triage) #421

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299)#298

triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299)#298
jinhongkuan merged 9 commits into
mainfrom
triage-from-dev

jinhongkuan commented May 10, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 10, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 10, 2026

Uh oh!

coderabbitai Bot May 10, 2026

Uh oh!

coderabbitai Bot May 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jinhongkuan commented May 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What ships

Why a separate PR to main (and not waiting on #297 to merge dev → main)

Test plan

Companion PRs / issues

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jinhongkuan commented May 10, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 10, 2026 •

edited

Loading