Skip to content

triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299)#298

Merged
jinhongkuan merged 9 commits into
mainfrom
triage-from-dev
May 10, 2026
Merged

triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299)#298
jinhongkuan merged 9 commits into
mainfrom
triage-from-dev

Conversation

@jinhongkuan

@jinhongkuan jinhongkuan commented May 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Triages the resilient-ledger-migration work from #297 (against dev) onto main via the triage-from-dev branch. Identical commit set, cherry-picked off origin/main after a clean reset.

Companion P0 issue: #296 (replay determinism regression suite).

What ships

Same four logical layers as #297:

  • fix(ledger) — tolerant init_schema (expected a record< substring catch + SCHEMA_DEFINE_SKIPPED audit event) + extracted _clean_yields_legacy_rows / _dedupe_yields helpers + new _migrate_v16_to_v17 that re-runs the cleanup. SCHEMA_VERSION 16 → 17.
  • feat(diagnose)bicameral.diagnose MCP tool that opens a raw LedgerClient (no init/migrate), so it works even when normal connect crashes. Returns a structured recovery_path (clean / fixable / reset_rebuild / reset_destructive). Shared gather_diagnosis_raw between CLI and MCP — one body, two thin wrappers.
  • feat(reset)bicameral_reset(replay_from_events=true) reuses EventMaterializer.replay_new_events to rebuild the ledger from .bicameral/events/*.jsonl after wipe. Mutually exclusive with wipe_mode="full".
  • test(ledger) — fixture-replay regression (parametrized over frozen DB shapes — first fixture is the user's exact dogfood symptom) + error-format provocation suite (locks down the SurrealDB substring contract).

Plus the three lint/type follow-ups from #297 review iteration:

  • chore(lint) — handlers.diagnose import order
  • chore(format) — ruff format on the five new/touched files
  • fix(types)RecoveryPath Literal alias for mypy

Why a separate PR to main (and not waiting on #297 to merge dev → main)

The dogfood crash motivating this work bricks every ledger op until the v17 migration runs. Going through dev → main on the normal cadence delays the unblock for any operator on main who hits the same bad-row state. Triaging directly onto main via triage-from-dev matches the convention used for #290 / #291 hotfixes earlier in this milestone.

Test plan

Local: pytest tests/test_schema_recoverable_errors.py tests/test_legacy_ledger_fixtures.py tests/test_diagnose_gather.py tests/test_reset.py → 27 passed.

  • CI green on the regression matrix (Linux + Windows).
  • CI green on the new fixture-replay suite + error-format provocation.
  • CI green on ruff + mypy.
  • Manual: dogfood ledger upgrades cleanly to v17 and the legacy source_span rows in yields are removed.
  • Manual: bicameral.diagnose returns recovery_path: "clean" after upgrade.

Companion PRs / issues

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a diagnostic tool for read-only ledger connectivity and health assessment.
    • Added event replay capability to rebuild ledger data from stored event logs after reset.
  • Bug Fixes

    • Enhanced robustness of schema definition handling for recoverable SurrealDB errors.
    • Added cleanup migration for legacy ledger data corruption in the yields table.
  • Tests

    • Added regression test suite for legacy ledger fixture validation and schema error pattern matching.

Review Change Stack

jinhongkuan and others added 7 commits May 10, 2026 00:13
Two paired changes that together unblock any ledger whose
schema_meta.version rolled past 5 with v3-era source_span rows still
in the yields table — the dogfood crash that motivated this work.

**Tolerant init_schema (Layer A).**

\`_execute_define_idempotent\` now recognizes a small allowlist of
SurrealDB error substrings as recoverable: existing "already exists"
and "already contains" plus the new "expected a record<" (TYPE
constraint mismatch when DEFINE TABLE/INDEX OVERWRITE re-validates
legacy rows) and a defensive "but expected" for adjacent value-type
errors. The patterns live in module-level RECOVERABLE_DEFINE_PATTERNS
so the substring contract is testable. When a recoverable error
fires, init logs a warning and emits a SCHEMA_DEFINE_SKIPPED audit
event so the next migration's cleanup is visible in the audit log.

**Re-runnable yields cleanup (Layer B).**

The v4→v5 cleanup body is extracted into shared
\`_clean_yields_legacy_rows\` + \`_dedupe_yields\` helpers. v4→v5 still
calls them, so historical behaviour is unchanged. The new
\`_migrate_v16_to_v17\` calls the same helpers, which means any DB
that ran past v5 with the corruption still present gets the cleanup
on the v16→v17 boundary instead of permanently — the migrate() loop
only iterates \`range(current+1, SCHEMA_VERSION+1)\`, so a one-shot
historical migration is permanently unreachable for users who
upgraded later.

\`SCHEMA_VERSION\` bumps to 17. SCHEMA_COMPATIBILITY[17] is a
placeholder release-eng pins at PR merge.

**Audit log.**

New \`AuditEventType.SCHEMA_DEFINE_SKIPPED\` (warn-level) emitted by
\`_execute_define_idempotent\` whenever a UNIQUE-violation or
type-mismatch is swallowed — gives operators visibility into the
in-flight cleanup without breaking init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…296)

Mirrors the existing \`bicameral-mcp diagnose\` CLI as an MCP tool so
agents can fire it from any tool-error envelope. Critical property:
the handler opens a raw \`LedgerClient\` and never goes through
\`adapter.connect()\` — which means it produces a useful diagnosis
even when normal connect crashes inside \`init_schema\` / \`migrate\`,
the exact failure mode that motivated #296.

**Refactor.**

\`cli/_diagnose_gather.py\` grows three raw-client variants
(\`_read_bicameral_meta_raw\`, \`_read_schema_version_raw\`,
\`_read_table_counts_raw\`) and a \`gather_diagnosis_raw(client,
ledger_url)\` entry point. The existing adapter-flavoured
\`gather_diagnosis(adapter)\` becomes a one-line wrapper, so the CLI
keeps its current shape and the test suite (\`test_diagnose_*\`) is
unchanged.

**New handler + tool.**

\`handlers/diagnose.handle_diagnose\` opens a raw client against the
resolved ledger URL, calls \`gather_diagnosis_raw\`, and classifies
the result into a \`recovery_path\`:

- \`clean\` — schema matches binary, tables look sane.
- \`fixable\` — schema behind binary; next normal connect migrates.
- \`reset_rebuild\` — ledger broken AND \`.bicameral/events/\` has
  events on disk; \`bicameral_reset(replay_from_events=true,
  confirm=true)\` recovers without data loss.
- \`reset_destructive\` — ledger broken AND no events; reset loses
  decision history, user must accept.

\`next_action\` is a human-readable string the agent renders verbatim.
Wired into \`server.py\` list_tools + dispatch as
\`bicameral.diagnose\`. Pydantic contract \`DiagnoseResponse\` lives
alongside \`ResetResponse\` in \`contracts.py\`.

**SKILL contract.**

\`skills/bicameral-diagnose/SKILL.md\` documents auto-fire from any
schema/migration error envelope, the recovery-path → command matrix,
and the read-only invariant (repair is CLI-driven, never an
in-session agent action).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e (#296)

\`bicameral_reset(replay_from_events=True, confirm=True)\` (only valid
with \`wipe_mode="ledger"\`) wipes the materialized DB, resets the
team-mode watermark, and replays every event in
\`.bicameral/events/*.jsonl\` back through the same ingest path team
mode uses. The event log is the canonical record (committed to git
in team mode); replay is recovery, not destruction.

Implementation reuses \`EventMaterializer.replay_new_events\` — the
exact code team mode runs on every sync — so replay-vs-live
divergence is impossible by construction. Determinism of the
underlying canonical_id resolution is tracked separately under #296
(P0 follow-up: regression suite covering every event type the
materializer dispatches on).

**Behaviour.**

- Dry run reports on-disk event count via \`events_replayed\` so the
  user sees scale before confirming.
- Post-confirm response reports actual replayed count + any
  \`replay_errors\` (wipe still committed; replay errors don't
  destroy data, only block recovery).
- \`replay_from_events=True\` + \`wipe_mode="full"\` is rejected with
  a clear error — full-wipe deletes the substrate we'd replay from.
- Honours \`BICAMERAL_DATA_PATH\` symmetrically with
  \`adapters/ledger.py\`'s team-mode write path so the replay reads
  from the same directory the writers wrote to.

**Skill.**

\`skills/bicameral-reset/SKILL.md\` documents the new flag, the
exclusivity rule with full-wipe, and the recommended pairing with
\`bicameral_diagnose\`'s \`recovery_path="reset_rebuild"\`.

**Contracts.**

\`ResetResponse.events_replayed\` (default 0) carries the count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI test classes that lock down the safety contract for layers A
and B. Both run on every PR via the existing
\`test-mcp-regression.yml\` workflow.

**Fixture-replay suite (\`test_legacy_ledger_fixtures.py\`).**

Frozen DB shapes under \`tests/fixtures/legacy_ledgers/\`. Each
fixture is a Python module exporting an async \`build(client)\`
coroutine that mutates the client into a known-historical bad state.
The parametrized test:

  1. Builds the bad state in a \`memory://\` client.
  2. Runs the production \`init_schema\` + \`migrate\` path.
  3. Asserts schema reaches \`SCHEMA_VERSION\`, fixture-specific
     invariants hold, and a second \`init_schema\` + \`migrate\` is
     a no-op (idempotent).

First fixture is the user's exact dogfood symptom:
\`v3_yields_source_span\` — \`yields.in = source_span:*\` rows
surviving past v5, schema_meta.version = 16 to force the v16→v17
boundary. Adding a new fixture later requires zero test code:
register it in \`FIXTURES\` and the parametrized test runs against
it. README under the fixtures dir documents the registration shape.

**Error-format provocation suite (\`test_schema_recoverable_errors.py\`).**

Three tests that lock down the SurrealDB error-string contract layer
A depends on:

  - UNIQUE-violation produces a substring matching
    \`RECOVERABLE_DEFINE_PATTERNS\` (existing v4→v5 path).
  - Type-mismatch on \`DEFINE INDEX OVERWRITE\` against legacy rows
    produces a matching substring (the #296 scenario).
  - All entries in \`RECOVERABLE_DEFINE_PATTERNS\` are lowercase
    (the catch lower-cases the message before substring match).

When a future \`surrealdb-py\` bump changes the format, these tests
fail loudly with the exact message the maintainer needs to update
the patterns list — turning a silent regression into a known-loud
signal.

**CI wiring.**

\`test-mcp-regression.yml\` lists the two new files in the pytest
invocation. \`test_ledger_bicameral_meta_migration.py\` had a
hardcoded \`assert SCHEMA_VERSION == 16\` updated to
\`assert SCHEMA_VERSION >= 16\` so future bumps land cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ruff I001 — alphabetical block. handlers.diagnose belongs after
handlers.bind, not after handlers.ratify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ruff format --check .` was failing on the five files added in this
PR. No behavioural changes — purely whitespace/line-wrapping per the
project's ruff config. 42-test smoke pass green after the reformat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mypy flagged DiagnoseResponse.recovery_path as expecting
Literal['clean','fixable','reset_rebuild','reset_destructive']
while _classify_recovery returned plain str. Add a `RecoveryPath`
type alias and use it on both the function signature and the
intermediate `path` binding for the rec>exp branch. Pure typing —
no runtime change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 10, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR introduces comprehensive ledger diagnostics and recovery workflows: a new read-only bicameral.diagnose MCP tool that reports ledger health and recovery paths (clean, fixable, reset_rebuild, reset_destructive), schema version bump to v17 with yields table cleanup migrations, event replay support in the reset handler, and test infrastructure validating recovery paths through fixture-based corruption scenarios.

Changes

Ledger Diagnostics & Recovery Workflows

Layer / File(s) Summary
Data Contracts & Audit Events
contracts.py, audit_log.py
DiagnoseResponse defines ledger health snapshot with recovery paths and connection errors; ResetResponse adds events_replayed counter; audit system adds SCHEMA_DEFINE_SKIPPED event type.
Schema Versions & Migrations
ledger/schema.py
Increments SCHEMA_VERSION to 17; introduces RECOVERABLE_DEFINE_PATTERNS for idempotent DEFINE handling; refactors yields cleanup into helpers and adds v16→v17 migration to remediate unreachable v4→v5 corruption.
Diagnosis Gathering (Raw Client)
cli/_diagnose_gather.py
Refactors diagnosis gathering into URL-based and raw client-driven helpers; gather_diagnosis_raw(client, ledger_url) assembles full diagnosis for use when adapter fails; adapter-based wrapper delegates to raw implementation.
Diagnose MCP Handler & Recovery Classification
handlers/diagnose.py
Implements handle_diagnose(ctx) to resolve ledger URL, gather diagnosis, classify recovery path based on schema version and event presence, and return DiagnoseResponse with next-action guidance; includes _classify_recovery and _events_present helpers.
Reset Handler with Event Replay
handlers/reset.py
Extends handle_reset with replay_from_events parameter; rejects replay+full-wipe combination; for ledger-wipe+replay, counts on-disk events, executes replay via EventMaterializer.replay_new_events after wipe, captures replay failures in response.
MCP Server Tool Registration
server.py
Imports handle_diagnose; registers bicameral.diagnose MCP tool with empty input schema; wires tool dispatcher to route diagnose calls to handler.
Skill Documentation & Fixture Guide
skills/bicameral-diagnose/SKILL.md, skills/bicameral-reset/SKILL.md, tests/fixtures/legacy_ledgers/README.md
Defines diagnostic trigger conditions, recovery-path matrix, and auto-invocation flow; documents replay_from_events mode constraints; provides legacy fixture framework and contributor guide.
Legacy Ledger Fixtures & Recovery Tests
tests/fixtures/legacy_ledgers/v3_yields_source_span.py, tests/test_legacy_ledger_fixtures.py, tests/test_schema_recoverable_errors.py, tests/test_ledger_bicameral_meta_migration.py, .github/workflows/test-mcp-regression.yml
Adds v3-era corruption fixture; parametrized regression suite validating cleanup and idempotency; contract tests for recoverable DEFINE patterns; updates migration test expectations; registers new test modules in CI.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • BicameralAI/bicameral-mcp#29: Both PRs modify the CI workflow test configuration (.github/workflows/test-mcp-regression.yml) and establish legacy ledger fixture infrastructure for regression testing.
  • BicameralAI/bicameral-mcp#12: Both PRs modify the reset handler (handlers/reset.py) and ResetResponse contract to support new recovery workflows.
  • BicameralAI/bicameral-mcp#128: Both PRs interact with the event replay surface (EventMaterializer.replay_new_events, .bicameral/events/*.jsonl reading/writing).

Suggested labels

flow:release

Suggested reviewers

  • Knapp-Kevin

Poem

🐰 A diagnosis so keen, now your ledger is seen,
Recovery paths clear as the morning dew—
From yields with old ties to a v17 reprise,
Replaying events to rebuild anew! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The PR title comprehensively summarizes the four main logical layers being implemented: resilient ledger migration, bicameral_diagnose, reset --replay-from-events, plus supporting test infrastructure and documentation updates, aligned with the objectives and all major file changes across ledger/schema, handlers, contracts, and test suites.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch triage-from-dev

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server.py (1)

137-156: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Smoke test will fail — EXPECTED_TOOL_NAMES is missing bicameral.diagnose.

list_tools() now returns 19 tools (the 18 listed here plus the new bicameral.diagnose registered at lines 822–834), but run_smoke_test() does a strict equality compare:

if tool_names != EXPECTED_TOOL_NAMES:
    raise RuntimeError(f"Unexpected MCP tool registry: {tool_names!r} != {EXPECTED_TOOL_NAMES!r}")

This means bicameral-mcp --smoke-test (and any CI step that calls it) will raise immediately. Add the new tool name in the same slot order as list_tools() (between bicameral.usage_summary and validate_symbols).

🔧 Proposed fix
     "bicameral.feedback",
     "bicameral.usage_summary",
+    "bicameral.diagnose",
     "validate_symbols",
     "get_neighbors",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.py` around lines 137 - 156, The EXPECTED_TOOL_NAMES constant is
missing the newly registered tool "bicameral.diagnose", causing
run_smoke_test()'s strict equality check against list_tools() to fail; update
EXPECTED_TOOL_NAMES to include "bicameral.diagnose" in the exact position shown
by list_tools() (between "bicameral.usage_summary" and "validate_symbols") so
the order and contents match list_tools() and the smoke test passes.
🧹 Nitpick comments (4)
server.py (1)

822-834: ⚡ Quick win

Tool registration and dispatch wiring look correct.

The empty input schema ({"type": "object", "properties": {}}) is appropriate for a no-arg read-only probe, and the dispatch arm correctly aliases both bicameral.diagnose and the bare diagnose form for parity with the rest of the registry. replay_from_events=arguments.get("replay_from_events", False) defaults match the handler signature and avoid breaking existing reset callers.

One thing to consider: the bicameral.reset tool's inputSchema (lines 331–357) doesn't advertise the new replay_from_events property. Agents discovering tools via list_tools won't see it; the dispatch will accept it from arguments because the handler defaults to False, but the tool description should document the new flag for the call site.

🔧 Suggested addition to `bicameral.reset` inputSchema
                     "wipe_mode": {
                         "type": "string",
                         "enum": ["ledger", "full"],
                         "default": "ledger",
                         "description": (
                             "'ledger' (default): wipe materialized DB rows only — config and event "
                             "files are preserved, server stays live. Use for bug/pollution recovery. "
                             "'full': delete the entire .bicameral/ directory. Nuclear option — "
                             "removes config, team event history, and all data. Always confirm "
                             "with the user after showing the dry-run warning."
                         ),
                     },
+                    "replay_from_events": {
+                        "type": "boolean",
+                        "default": False,
+                        "description": (
+                            "When true (only valid with wipe_mode='ledger'), wipes the materialized "
+                            "DB and rebuilds it by replaying .bicameral/events/*.jsonl through the "
+                            "ingest path. Mutually exclusive with wipe_mode='full' — full-wipe "
+                            "deletes the events substrate. Use after bicameral.diagnose returns "
+                            "recovery_path='reset_rebuild'."
+                        ),
+                    },
                 },
             },
         ),

Also applies to: 1043-1052

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.py` around lines 822 - 834, The bicameral.reset tool's inputSchema is
missing the newly supported replay_from_events flag, so update the inputSchema
for Tool(name="bicameral.reset") to include a boolean property
"replay_from_events" (default false) and a description so agents that call
list_tools can discover it; ensure the schema key name matches the argument used
in the handler (replay_from_events) and keep the handler's default
(replay_from_events=False) unchanged so behavior remains backward-compatible.
tests/fixtures/legacy_ledgers/README.md (1)

25-33: ⚡ Quick win

Document the optional assert_clean(client) hook.

tests/test_legacy_ledger_fixtures.py calls module.assert_clean(c) when present (and the v3 fixture exports one), but the authoring section here only mentions build(client). New fixture authors will not know the contract for fixture-specific invariants.

📝 Proposed README addition
 ```python
 # tests/fixtures/legacy_ledgers/<name>.py
 async def build(client):
     await client.execute("…raw SurrealQL that produces the bad state…")
+
+# Optional: asserted after init_schema + migrate runs in the
+# parametrized regression suite. Use it for fixture-specific
+# post-cleanup invariants beyond the generic schema_meta.version
+# check.
+async def assert_clean(client):
+    rows = await client.query("…")
+    assert …, "describe what stale state would have leaked through"
 ```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/fixtures/legacy_ledgers/README.md` around lines 25 - 33, Update the
README entry for legacy ledger fixtures to document the optional
assert_clean(client) hook (in addition to build(client)); state that
tests/test_legacy_ledger_fixtures.py will call module.assert_clean(c) when
present (and v3 fixtures should export it), that it runs after init_schema +
migrate in the parametrized regression suite, and that authors should implement
async def assert_clean(client) to query the DB and assert any fixture-specific
post-cleanup invariants (raising/asserting with a descriptive message when stale
state is detected).
tests/test_legacy_ledger_fixtures.py (1)

26-39: 💤 Low value

Registry ergonomics: hard-coded import partially defeats the "register-only" promise.

The docstring states "Adding a new fixture requires no test code — register it in FIXTURES" (line 11-12), but the current shape requires both a top-level import v3_yields_source_span line and a FIXTURES entry. A future contributor following the docstring will register a fixture and watch the suite silently skip it (or rather, fail on NameError). Consider discovering fixtures via pkgutil.iter_modules(_FIXTURES_DIR) or importlib.import_module(slug) at module load and dropping the hard-coded import.

🔧 Sketch of registry-only loading
-import v3_yields_source_span  # noqa: E402 — see sys.path comment
-
-# Each entry: (slug, module). Module must export `build(client)` and
-# may export `assert_clean(client)` for fixture-specific invariants.
-FIXTURES = [
-    ("v3_yields_source_span", v3_yields_source_span),
-]
+# Slugs of fixture modules under tests/fixtures/legacy_ledgers/.
+# Each module must export an async `build(client)` and may export
+# an async `assert_clean(client)`.
+_FIXTURE_SLUGS = ["v3_yields_source_span"]
+FIXTURES = [(slug, importlib.import_module(slug)) for slug in _FIXTURE_SLUGS]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_legacy_ledger_fixtures.py` around lines 26 - 39, The hard-coded
import of v3_yields_source_span defeats the "register-only" promise; instead
dynamically discover and import modules from _FIXTURES_DIR and populate FIXTURES
by iterating over module names (e.g., using pkgutil.iter_modules or
importlib.import_module on the entries found in _FIXTURES_DIR) so you only need
to register the slug; keep the sys.path insertion for _FIXTURES_DIR, iterate
discovered module names to import each module (verify the module exports
build(client) and optional assert_clean(client)), and construct FIXTURES as a
list of (slug, module) tuples rather than relying on a literal import like
v3_yields_source_span.
ledger/schema.py (1)

525-525: 💤 Low value

Parameterize DELETE statements to eliminate SQL-injection linting warning.

OpenGrep flags DELETE {row['id']} (lines 525, 551) as SQL-injection shape. While row['id'] is DB-derived (not user input) and risk is low, SurrealDB 2.x supports parameter binding for DELETE targets. Use:

await client.execute("DELETE $id", {"id": row['id']})

This eliminates the linting noise without changing logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ledger/schema.py` at line 525, The DELETE calls currently build SQL via
f"DELETE {row['id']}" which triggers SQL-injection lint warnings; change these
to use SurrealDB parameter binding by calling client.execute with a
parameterized query like "DELETE $id" and passing the params dict {"id":
row['id']}; update each occurrence where client.execute is invoked with f"DELETE
{row['id']}" (and any similar string-built DELETE) to use the parameterized form
to remove the lint warning while preserving behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@handlers/diagnose.py`:
- Around line 156-166: The _events_present function currently checks only the
ledger_url parent and ignores BICAMERAL_DATA_PATH; update it to reuse the same
resolver used by handlers.reset (either import and call
handlers.reset:_resolve_events_dir or move that resolver to a shared helper and
call it) so the events directory is resolved identically to handlers.reset.py,
then check that resolved path for any "*.jsonl" files (keep function name
_events_present and the same boolean return semantics).
- Around line 139-153: The current rec is None branch from
_read_schema_version_raw can misclassify unreadable/corrupted ledgers as
"clean"; update the logic that currently returns "clean" when rec is None so it
only does so if there's a positive signal (e.g., diagnosis.drift_status !=
"unavailable" and diagnosis.table_counts is non-empty). If rec is None but
drift_status == "unavailable" or table_counts is empty, return "fixable" (or an
appropriate non-clean recovery_path) with a message indicating the schema row is
unreadable and remediation/inspection is required; adjust the variables
referenced (rec, diagnosis.drift_status, diagnosis.table_counts) and the
returned tuples to implement this gating.

In `@ledger/schema.py`:
- Around line 522-531: The per-row delete loop silently swallows exceptions;
replace the bare "except Exception: pass" that surrounds the
client.execute(f\"DELETE {row['id']}\") call with logged handling that records
the row id and exception (e.g., use logger.exception or logger.error with
row['id'] and the caught exception), optionally incrementing a failure counter
while still continuing to the next row; apply the same change to the other
identical loop block (the one around lines 547-559) so operators can correlate
failures to specific stale rows instead of losing the error signal.

---

Outside diff comments:
In `@server.py`:
- Around line 137-156: The EXPECTED_TOOL_NAMES constant is missing the newly
registered tool "bicameral.diagnose", causing run_smoke_test()'s strict equality
check against list_tools() to fail; update EXPECTED_TOOL_NAMES to include
"bicameral.diagnose" in the exact position shown by list_tools() (between
"bicameral.usage_summary" and "validate_symbols") so the order and contents
match list_tools() and the smoke test passes.

---

Nitpick comments:
In `@ledger/schema.py`:
- Line 525: The DELETE calls currently build SQL via f"DELETE {row['id']}" which
triggers SQL-injection lint warnings; change these to use SurrealDB parameter
binding by calling client.execute with a parameterized query like "DELETE $id"
and passing the params dict {"id": row['id']}; update each occurrence where
client.execute is invoked with f"DELETE {row['id']}" (and any similar
string-built DELETE) to use the parameterized form to remove the lint warning
while preserving behavior.

In `@server.py`:
- Around line 822-834: The bicameral.reset tool's inputSchema is missing the
newly supported replay_from_events flag, so update the inputSchema for
Tool(name="bicameral.reset") to include a boolean property "replay_from_events"
(default false) and a description so agents that call list_tools can discover
it; ensure the schema key name matches the argument used in the handler
(replay_from_events) and keep the handler's default (replay_from_events=False)
unchanged so behavior remains backward-compatible.

In `@tests/fixtures/legacy_ledgers/README.md`:
- Around line 25-33: Update the README entry for legacy ledger fixtures to
document the optional assert_clean(client) hook (in addition to build(client));
state that tests/test_legacy_ledger_fixtures.py will call module.assert_clean(c)
when present (and v3 fixtures should export it), that it runs after init_schema
+ migrate in the parametrized regression suite, and that authors should
implement async def assert_clean(client) to query the DB and assert any
fixture-specific post-cleanup invariants (raising/asserting with a descriptive
message when stale state is detected).

In `@tests/test_legacy_ledger_fixtures.py`:
- Around line 26-39: The hard-coded import of v3_yields_source_span defeats the
"register-only" promise; instead dynamically discover and import modules from
_FIXTURES_DIR and populate FIXTURES by iterating over module names (e.g., using
pkgutil.iter_modules or importlib.import_module on the entries found in
_FIXTURES_DIR) so you only need to register the slug; keep the sys.path
insertion for _FIXTURES_DIR, iterate discovered module names to import each
module (verify the module exports build(client) and optional
assert_clean(client)), and construct FIXTURES as a list of (slug, module) tuples
rather than relying on a literal import like v3_yields_source_span.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04b33658-0c24-4068-888c-eb74adee573f

📥 Commits

Reviewing files that changed from the base of the PR and between 178251f and 9383748.

📒 Files selected for processing (16)
  • .github/workflows/test-mcp-regression.yml
  • audit_log.py
  • cli/_diagnose_gather.py
  • contracts.py
  • handlers/diagnose.py
  • handlers/reset.py
  • ledger/schema.py
  • server.py
  • skills/bicameral-diagnose/SKILL.md
  • skills/bicameral-reset/SKILL.md
  • tests/fixtures/legacy_ledgers/README.md
  • tests/fixtures/legacy_ledgers/__init__.py
  • tests/fixtures/legacy_ledgers/v3_yields_source_span.py
  • tests/test_ledger_bicameral_meta_migration.py
  • tests/test_legacy_ledger_fixtures.py
  • tests/test_schema_recoverable_errors.py

Comment thread handlers/diagnose.py
Comment on lines +139 to +153
if rec is None:
return "clean", (
"Schema version not yet recorded — likely a fresh install. "
"Any tool call will initialise the ledger."
)

# rec == exp — confirm the table counts look sane
table_counts = diagnosis.table_counts or {}
if not table_counts:
return "fixable", (
"Schema version matches but no tables visible. "
"Connect may have stopped mid-init; re-run a tool call to retry."
)

return "clean", (f"Ledger is at expected schema v{exp}. No remediation needed.")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

rec is None may misclassify a broken ledger as clean.

_read_schema_version_raw returns None for two distinct conditions: (a) fresh install with no schema_meta row, and (b) the SELECT version FROM schema_meta query itself raised (table missing or read failed). Routing both to recovery_path="clean" with the message "likely a fresh install" can mask a corrupted/locked ledger that connected just enough to surface but cannot read its own schema row.

Consider gating the "clean" branch on a positive signal — e.g., also require diagnosis.drift_status != "unavailable" (which _read_bicameral_meta_raw returns when bicameral_meta is missing), or require non-empty table_counts. Otherwise, when the same DB is partially up but schema/meta tables are unreadable, both _read_schema_version_raw and _read_bicameral_meta_raw return None/"unavailable" and the user is told no remediation is needed.

     if rec is None:
+        if (diagnosis.table_counts or {}) or diagnosis.drift_status != "unavailable":
+            return "clean", (
+                "Schema version not yet recorded — likely a fresh install. "
+                "Any tool call will initialise the ledger."
+            )
+        return "fixable", (
+            "Schema and meta tables unreadable — connect succeeded but the ledger "
+            "appears uninitialised. Run any tool to trigger init_schema; if that fails, "
+            "rerun `bicameral-mcp diagnose --repair` from the CLI."
+        )
-        return "clean", (
-            "Schema version not yet recorded — likely a fresh install. "
-            "Any tool call will initialise the ledger."
-        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@handlers/diagnose.py` around lines 139 - 153, The current rec is None branch
from _read_schema_version_raw can misclassify unreadable/corrupted ledgers as
"clean"; update the logic that currently returns "clean" when rec is None so it
only does so if there's a positive signal (e.g., diagnosis.drift_status !=
"unavailable" and diagnosis.table_counts is non-empty). If rec is None but
drift_status == "unavailable" or table_counts is empty, return "fixable" (or an
appropriate non-clean recovery_path) with a message indicating the schema row is
unreadable and remediation/inspection is required; adjust the variables
referenced (rec, diagnosis.drift_status, diagnosis.table_counts) and the
returned tuples to implement this gating.

Comment thread handlers/diagnose.py
Comment on lines +156 to +166
def _events_present(ledger_url: str) -> bool:
"""Best-effort check for ``.bicameral/events/*.jsonl``."""
if not ledger_url.startswith("surrealkv://"):
return False
from pathlib import Path

db_path = Path(ledger_url.removeprefix("surrealkv://"))
events_dir = db_path.parent / "events"
if not events_dir.exists():
return False
return any(events_dir.glob("*.jsonl"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_events_present ignores BICAMERAL_DATA_PATH, diverging from handlers/reset.py.

handlers/reset.py:_resolve_events_dir resolves the events directory by first consulting BICAMERAL_DATA_PATH (with the comment "Honours BICAMERAL_DATA_PATH symmetrically with adapters/ledger.py"), then falling back to the ledger-URL parent. This helper only inspects the ledger-URL parent.

When an operator sets BICAMERAL_DATA_PATH to a directory other than the surrealkv ledger's parent, recovery_path will be classified reset_destructive here, and the renderer will tell the user "no events on disk → user must accept data loss" (per the DiagnoseResponse contract docstring). The user then runs bicameral_reset(replay_from_events=False, confirm=True) and loses decision history that bicameral_reset(replay_from_events=True, confirm=True) would in fact have replayed (because reset's own events-dir resolver picks BICAMERAL_DATA_PATH). This silently turns a reset_rebuild situation into a reset_destructive outcome.

Reuse the same resolver — either import _resolve_events_dir from handlers.reset, or factor it into a shared module so both code paths can't drift again.

🔧 Suggested fix (one of two options)

Option A — delegate to the existing helper:

 def _events_present(ledger_url: str) -> bool:
     """Best-effort check for ``.bicameral/events/*.jsonl``."""
-    if not ledger_url.startswith("surrealkv://"):
-        return False
-    from pathlib import Path
-
-    db_path = Path(ledger_url.removeprefix("surrealkv://"))
-    events_dir = db_path.parent / "events"
-    if not events_dir.exists():
-        return False
-    return any(events_dir.glob("*.jsonl"))
+    import os
+    from pathlib import Path
+
+    data_path = os.environ.get("BICAMERAL_DATA_PATH")
+    if data_path:
+        events_dir = Path(data_path) / ".bicameral" / "events"
+    elif ledger_url.startswith("surrealkv://"):
+        db_path = Path(ledger_url.removeprefix("surrealkv://"))
+        events_dir = db_path.parent / "events"
+    else:
+        return False
+    if not events_dir.exists():
+        return False
+    return any(events_dir.glob("*.jsonl"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@handlers/diagnose.py` around lines 156 - 166, The _events_present function
currently checks only the ledger_url parent and ignores BICAMERAL_DATA_PATH;
update it to reuse the same resolver used by handlers.reset (either import and
call handlers.reset:_resolve_events_dir or move that resolver to a shared helper
and call it) so the events directory is resolved identically to
handlers.reset.py, then check that resolved path for any "*.jsonl" files (keep
function name _events_present and the same boolean return semantics).

Comment thread ledger/schema.py
Comment on lines +522 to +531
removed = 0
for row in stale or []:
try:
await client.execute(f"DELETE {row['id']}")
removed += 1
except Exception:
pass
if removed:
logger.info("[migration] %s: removed %d stale legacy yields edges", log_tag, removed)
return removed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Per-row delete failures are silently dropped — operator loses cleanup signal.

When client.execute(f"DELETE {row['id']}") fails, the bare except Exception: pass discards the row id and the exception. If a meaningful subset of deletes fails (e.g., due to a transient SurrealDB binding issue), removed undercounts and the dirty rows survive. Combined with _execute_define_idempotent swallowing the subsequent idx_yields_unique "already contains" failure (line 1018), the migration reports success and bumps schema_meta.version to 17 with the legacy rows still present — the exact failure mode #296 was meant to escape.

At minimum, log per-row failures so an operator can correlate a follow-up bicameral.diagnose finding back to the specific stale rows.

🪵 Proposed fix to surface per-row failures
     for row in stale or []:
         try:
             await client.execute(f"DELETE {row['id']}")
             removed += 1
-        except Exception:
-            pass
+        except Exception as exc:
+            logger.warning(
+                "[migration] %s: failed to delete stale yields row %s: %s",
+                log_tag, row.get("id"), exc,
+            )
         if key in seen:
             try:
                 await client.execute(f"DELETE {row['id']}")
                 removed += 1
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.warning(
+                    "[migration] %s: failed to delete duplicate yields row %s: %s",
+                    log_tag, row.get("id"), exc,
+                )
         else:
             seen.add(key)

Also applies to: 547-559

🧰 Tools
🪛 OpenGrep (1.20.0)

[ERROR] 525-525: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ledger/schema.py` around lines 522 - 531, The per-row delete loop silently
swallows exceptions; replace the bare "except Exception: pass" that surrounds
the client.execute(f\"DELETE {row['id']}\") call with logged handling that
records the row id and exception (e.g., use logger.exception or logger.error
with row['id'] and the caught exception), optionally incrementing a failure
counter while still continuing to the next row; apply the same change to the
other identical loop block (the one around lines 547-559) so operators can
correlate failures to specific stale rows instead of losing the error signal.

jinhongkuan and others added 2 commits May 10, 2026 00:55
Replaces the dashboard image at the bottom of "How It Feels" with a
three-beat demo video section (ingest -> preflight -> ratify async)
referencing GitHub user-attachments URLs so videos render as inline
players. Moves the "Star on GitHub" banner from the top header to a
centered placement immediately after the demo, turning it into a
post-demo conversion beat instead of a misaligned header element.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itch

Replaces the single-line MCP-server description with a position-take
opener: paragraph 1 names the failure mode (agreements emerge mid-flight,
never reach a doc); paragraph 2 introduces Bicameral MCP as a spec
compliance layer that captures both formal source materials (transcripts,
PRDs, Slack) and undiscussed mid-implementation decisions to be ratified
async by the product owner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan changed the title triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) triage→main: resilient ledger migration + bicameral_diagnose + reset --replay-from-events (#296) + README demo videos & opener rewrite (#299) May 10, 2026
@jinhongkuan jinhongkuan had a problem deploying to recording-approval May 10, 2026 07:55 — with GitHub Actions Failure
@jinhongkuan jinhongkuan merged commit 6d96ac3 into main May 10, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant