Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,24 @@
All notable changes to bicameral-mcp are tracked here. Format loosely follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## v0.15.1 — hotfix: route `ledger_sync` deserialization warnings to a wipe-and-replay recovery path (#301)

Hotfix for [#301](https://github.com/BicameralAI/bicameral-mcp/issues/301). v0.15.0 already added a row-level deserialization probe to `bicameral.diagnose` ([`cli/_diagnose_gather.py::_probe_row_deserialization`](cli/_diagnose_gather.py)), but the probe's findings stopped at the `suggestions` list — `_classify_recovery` in `handlers/diagnose.py` still inspected only `schema_meta.version`, so an agent that ran `diagnose` after a `link_commit` failure would see `recovery_path: "clean"` and `next_action: "No remediation needed."` while the ledger was actually unreadable.

This release closes the loop end-to-end:

- **`LedgerDeserializationError`** (subclass of `LedgerError`) is raised from `ledger.client.query` / `ledger.client.execute` whenever the SurrealDB SDK returns `Invalid revision \`N\` for type \`Value\`` or a `deserialization error` wrapper. The exception message embeds the recovery command, so the agent sees the wipe-and-replay instruction inside the MCP error envelope without needing a second `diagnose` round-trip.
- **`handlers/diagnose.py::_classify_recovery`** now consults `Diagnosis.row_probe_warnings` *before* the schema-version checks. Non-empty warnings route to `reset_rebuild` (when `.bicameral/events/*.jsonl` is present) or `reset_destructive` (no events on disk) with a `next_action` that quotes the exact `bicameral_reset(wipe_mode="ledger", replay_from_events=…, confirm=True)` call.
- **`handlers/sync_middleware.py::ensure_ledger_synced`** re-raises `LedgerDeserializationError` instead of swallowing it at DEBUG. The broad `except Exception` is still in place for transient catch-up failures (the original "best-effort" contract); only deserialization errors break out, because they're the one class of failure the agent must surface to the user.

### Fixed

- **#301** — `bicameral.link_commit` no longer fails with a bare `LedgerError("SurrealDB rejected query: ...")` when `ledger_sync` rows can't be deserialized; the new exception class names the recovery path. `bicameral.diagnose` now classifies the same failure mode under `reset_rebuild` / `reset_destructive` so the agent's next_action is actionable.

### Notes for v0.14.x → v0.15.x upgraders hit by #301

The SurrealDB SDK pin (`surrealdb==2.0.0`) is unchanged between v0.14.x and v0.15.1; upgrading the package does NOT rewrite the persisted SurrealKV record format. An installation that hit the `Invalid revision \`3\` for type \`Value\`` error on v0.14.x will continue to see the same SDK-level deserialization failure on first `link_commit`. What changes in v0.15.1 is the *visibility*: the error now surfaces with an explicit recovery command instead of as a generic `LedgerError`, and `diagnose` correctly routes to a reset path. To recover, run `bicameral_reset(wipe_mode="ledger", replay_from_events=True, confirm=True)` (if `.bicameral/events/*.jsonl` is present) or `bicameral_reset(wipe_mode="ledger", confirm=True)` otherwise.

## v0.15.0 — PII archive, hard-delete `remove_decision`, schema v17→v24 chain, team-mode foundations

Cumulative release draining the dev → main backlog accumulated since v0.14.7. Lands the **#221 PII archive** (operator-erasable PII surface), retires the **soft-delete tombstone model** for `bicameral.remove_decision` (now hard-delete by default), brings the constant-time **`bicameral_meta.decision_revision` counter** (#87) into the preflight dedup path, ships **`bicameral.admin/query`** and **dashboard source view** (#278 Phase 1+3), wires the **`LocalDirectorySourceAdapter`** and **`sync-and-brief`** team-mode flows (#344, #279), and adds the **code-locator singleton + eager startup init** that moves index work off the MCP stdio handshake (#243, #380).
Expand Down
2 changes: 1 addition & 1 deletion RECOMMENDED_VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.15.0
0.15.1
20 changes: 19 additions & 1 deletion handlers/diagnose.py
Original file line number Diff line number Diff line change
Expand Up @@ -121,8 +121,26 @@ def _classify_recovery(diagnosis) -> tuple[RecoveryPath, str]:
exp = diagnosis.schema_version_expected
has_events = _events_present(diagnosis.ledger_url)

if rec is not None and rec > exp:
# #301 — row-level deserialization warnings outrank the schema-version
# check. Schema can read clean (`schema_meta` is a tiny row, never the
# one that breaks) while the operational tables hit
# ``Invalid revision `N` for type Value``. The probe in
# cli/_diagnose_gather.py::_probe_row_deserialization catches that; here
# we route the verdict so the agent sees recovery_path=reset_rebuild
# instead of "clean, no remediation needed".
row_warnings: list[str] = list(getattr(diagnosis, "row_probe_warnings", []) or [])
if row_warnings:
path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings}))
Comment on lines +133 to +134

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify duplicate typed declarations are removed (expect 1 match after fix).
rg -nP '\bpath:\s*RecoveryPath\b|\bwarning_path:\s*RecoveryPath\b' handlers/diagnose.py

Repository: BicameralAI/bicameral-mcp

Length of output: 245


🏁 Script executed:

sed -n '120,150p' handlers/diagnose.py | cat -n

Repository: BicameralAI/bicameral-mcp

Length of output: 1932


🏁 Script executed:

sed -n '133,141p' handlers/diagnose.py | cat -n

Repository: BicameralAI/bicameral-mcp

Length of output: 628


Resolve path redefinition to unblock mypy.

The typed assignment on line 133 conflicts with the typed assignment on line 143 (no-redef), causing CI failure. Both are separate conditional branches that independently define the same variable name. The proposed fix correctly renames line 133's variable to warning_path and updates the corresponding return statement, eliminating the redefinition while preserving the distinct logic for each recovery path.

🛠️ Proposed fix
-        path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
+        warning_path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
         tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings}))
-        return path, (
+        return warning_path, (
             f"Row-level deserialization warnings on {tables} — likely a "
             "SurrealDB embedded-SDK record-format mismatch. Run "
             f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, "
             "confirm=True)` to wipe and replay from .bicameral/events/."
         )
🧰 Tools
🪛 GitHub Actions: Lint & Type Check / 0_ruff + mypy.txt

[error] mypy . failed with 1 error (no-redef). Checked 134 source files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@handlers/diagnose.py` around lines 133 - 134, Rename the variable currently
assigned as path in the branch that handles row_warnings to warning_path (e.g.,
change "path: RecoveryPath = ..." to "warning_path: RecoveryPath = ...") and
update any subsequent uses/return in that branch to return warning_path instead
of path so it no longer collides with the other branch's path variable; locate
this in the function handling recovery paths where has_events, row_warnings,
tables are computed and adjust the corresponding return statement(s) to
reference warning_path.

return path, (
f"Row-level deserialization warnings on {tables} — likely a "
"SurrealDB embedded-SDK record-format mismatch. Run "
f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, "
"confirm=True)` to wipe and replay from .bicameral/events/."
)
Comment on lines +136 to +140

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make next_action text consistent with destructive recovery.

When has_events is False, the command correctly uses replay_from_events=False, but the sentence still says "wipe and replay from .bicameral/events/". That instruction is contradictory for the destructive path.

✏️ Proposed fix
         path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
         tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings}))
+        replay_text = (
+            "to wipe and replay from .bicameral/events/."
+            if has_events
+            else "to wipe the ledger (no replayable .bicameral/events/*.jsonl found)."
+        )
         return path, (
             f"Row-level deserialization warnings on {tables} — likely a "
             "SurrealDB embedded-SDK record-format mismatch. Run "
             f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, "
-            "confirm=True)` to wipe and replay from .bicameral/events/."
+            f"confirm=True)` {replay_text}"
         )
🧰 Tools
🪛 GitHub Actions: Lint & Type Check / 0_ruff + mypy.txt

[error] mypy . failed with 1 error (no-redef). Checked 134 source files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@handlers/diagnose.py` around lines 136 - 140, The next_action string
currently instructs users to "wipe and replay from .bicameral/events/" even when
has_events is False; update the conditional message construction around
next_action (the code that formats the string with
replay_from_events={has_events}) so when has_events is False it does not mention
replaying from .bicameral/events (e.g., change the tail to "wipe only (no events
to replay)" or similar), otherwise keep the existing "wipe and replay from
.bicameral/events/" wording when has_events is True.


if rec is not None and rec > exp:
path = "reset_rebuild" if has_events else "reset_destructive"
return path, (
f"Ledger schema v{rec} is newer than this binary (v{exp}). "
f"Upgrade `bicameral-mcp` to a version that understands v{rec}, "
Expand Down
10 changes: 10 additions & 0 deletions handlers/sync_middleware.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@
from datetime import UTC, datetime
from typing import TYPE_CHECKING

from ledger.client import LedgerDeserializationError

if TYPE_CHECKING:
from contracts import LinkCommitResponse, SessionStartBanner

Expand Down Expand Up @@ -288,6 +290,14 @@ async def ensure_ledger_synced(ctx) -> LinkCommitResponse | None:
_LAST_SYNCED_SHA = live_head
logger.debug("[sync_middleware] catch-up ran for %s", live_head[:8])
return result
except LedgerDeserializationError:
# #301 — row-format mismatch on ledger_sync / yields / etc. This is
# the *opposite* of a swallowable error: the ledger is in a state
# the user must repair before any tool will work. Re-raise so the
# MCP transport layer surfaces the exception (with embedded
# RECOVERY_HINT) to the agent instead of silently logging at DEBUG.
logger.warning("[sync_middleware] ledger row-format mismatch — surfacing to agent")
raise
except Exception as exc:
logger.debug("[sync_middleware] catch-up failed: %s", exc)
return None
59 changes: 59 additions & 0 deletions ledger/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,55 @@ class LedgerError(RuntimeError):
"""


class LedgerDeserializationError(LedgerError):
"""Raised when SurrealDB's embedded SDK can't decode a row on read (#301).

The on-disk SurrealKV record header carries a revision number that must
match the surrealdb-py deserializer. A mismatch surfaces as
``Invalid revision `N` for type `Value```. This is *below* the schema
layer — `init_schema`/`migrate` don't help because the migration code
can't read the row either. The fix is to wipe the affected cursor and
replay from the event log via
``bicameral_reset(wipe_mode="ledger", replay_from_events=True, confirm=True)``.

Subclass of ``LedgerError`` so existing ``except LedgerError`` handlers
catch it; callers that surface a recovery hint to the agent match on
``LedgerDeserializationError`` specifically.
"""

RECOVERY_HINT = (
"Row-level deserialization failed — likely a SurrealDB embedded SDK "
"revision mismatch on persisted rows. Run "
"`bicameral_reset(wipe_mode='ledger', replay_from_events=True, "
"confirm=True)` to wipe and replay from .bicameral/events/, or "
"`bicameral-mcp diagnose` for a full report."
)

def __init__(self, *, raw: str, sql_prefix: str) -> None:
self.raw = raw
self.sql_prefix = sql_prefix
super().__init__(
f"SurrealDB row deserialization failed: {raw}\n"
f"SQL: {sql_prefix}\n"
f"Recovery: {self.RECOVERY_HINT}"
)


_DESERIALIZATION_SIGNATURES = ("Invalid revision", "deserialization error")


def _is_deserialization_error(raw: str) -> bool:
"""Return True when ``raw`` looks like a SurrealKV row-format mismatch.

Matches the two signatures observed in #301 and the related ``yields``
incident: the SDK's ``Invalid revision `N` for type `T``` and the more
general ``A deserialization error occured`` wrapper. Both come back as
strings the caller sees as ``LedgerError`` today — this helper is the
classification seam.
"""
return any(sig in raw for sig in _DESERIALIZATION_SIGNATURES)


class LedgerTimeoutError(LedgerError):
"""Raised when a ledger query exceeds its wallclock timeout budget.

Expand Down Expand Up @@ -270,8 +319,13 @@ async def query(
try:
result = await self._run_with_timeout(sql, vars, timeout_class)
except SurrealError as exc:
msg = str(exc)
if _is_deserialization_error(msg):
raise LedgerDeserializationError(raw=msg, sql_prefix=sql[:300]) from exc
raise LedgerError(f"SurrealDB rejected query: {exc}\nSQL: {sql[:300]}") from exc
if isinstance(result, str):
if _is_deserialization_error(result):
raise LedgerDeserializationError(raw=result, sql_prefix=sql[:300])
raise LedgerError(f"SurrealDB rejected query: {result}\nSQL: {sql[:300]}")
return _normalize(result) if isinstance(result, list) else []

Expand All @@ -297,8 +351,13 @@ async def execute(
try:
result = await self._run_with_timeout(sql, vars, timeout_class)
except SurrealError as exc:
msg = str(exc)
if _is_deserialization_error(msg):
raise LedgerDeserializationError(raw=msg, sql_prefix=sql[:300]) from exc
raise LedgerError(f"SurrealDB rejected statement: {exc}\nSQL: {sql[:300]}") from exc
if isinstance(result, str):
if _is_deserialization_error(result):
raise LedgerDeserializationError(raw=result, sql_prefix=sql[:300])
raise LedgerError(f"SurrealDB rejected statement: {result}\nSQL: {sql[:300]}")

async def execute_many(
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "bicameral-mcp"
version = "0.15.0"
version = "0.15.1"
description = "Decision ledger MCP server — ingests meeting transcripts, maps decisions to code, tracks drift"
readme = "README.md"
requires-python = ">=3.10"
Expand Down
Loading
Loading