fix(ledger): route ledger_sync deserialization warnings to wipe-and-replay recovery (#301)#403
Conversation
…eplay recovery (#301) v0.15.0 added the row-level probe (cli/_diagnose_gather.py::_probe_row_deserialization) but its findings stopped at the suggestions list — _classify_recovery still inspected only schema_meta.version, so an agent that ran diagnose after a link_commit failure saw recovery_path=clean / "No remediation needed" while the ledger was actually unreadable. This wires the probe through: - New LedgerDeserializationError (subclass of LedgerError) is raised from ledger.client.query/execute when SurrealDB returns "Invalid revision \`N\` for type \`Value\`" or a "deserialization error" wrapper. The exception message embeds the recovery command so the agent sees the wipe-and-replay instruction inside the MCP error envelope. - handlers/diagnose.py::_classify_recovery consults row_probe_warnings before the schema-version checks and routes to reset_rebuild / reset_destructive with a quoted bicameral_reset(...) next_action. - handlers/sync_middleware.py::ensure_ledger_synced re-raises LedgerDeserializationError instead of swallowing it at DEBUG. The broad except Exception still catches transient catch-up failures. The SurrealDB SDK pin is unchanged — v0.14.x users hit by #301 still need to wipe and replay; this PR makes the recovery path discoverable instead of leaving them with a bare LedgerError. Bumps version → 0.15.1 and RECOMMENDED_VERSION → 0.15.1. Closes #301 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThis PR fixes issue ChangesDeserialization Error Detection & Recovery Routing
Sequence DiagramsequenceDiagram
participant Agent as Agent/MCP
participant Middleware as ensure_ledger_synced
participant Ledger as LedgerClient
participant Diagnose as _classify_recovery
Agent->>Middleware: sync request
Middleware->>Ledger: query/execute (HEAD catch-up)
Ledger-->>Middleware: SurrealError (deserialization)
Middleware->>Middleware: detect LedgerDeserializationError
Middleware-->>Agent: re-raise to transport
Agent->>Diagnose: call diagnose()
Diagnose->>Diagnose: check row_probe_warnings
Diagnose-->>Agent: recovery_path: reset_rebuild/destructive
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
… no-redef `path` is annotated at line 133 (the new #301 row_probe_warnings branch) and also at line 143 (existing schema-newer-than-binary branch). Same scope → same name → mypy no-redef. Drop the later annotation; type is unchanged because the literal still narrows to `RecoveryPath`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_ledger_sync_deserialization_recovery_301.py (1)
74-108: 💤 Low valueConsider adding schema initialization for guideline alignment.
The coding guideline states: "For ledger query tests, never
MagicMockthe client; use the realLedgerClient(url="memory://", ...)+init_schema+migrate". These tests correctly use the real client but skip schema initialization. While not strictly necessary for narrow seam tests that patch_db.query(the schema is never queried), addinginit_schemaandmigratewould improve consistency with the guideline and make the test setup more realistic.♻️ Example for test_query_raises_deserialization_error_when_surrealdb_complains
async def test_query_raises_deserialization_error_when_surrealdb_complains(): """The classifier triggers on a real LedgerClient.query() path. Narrow seam: we patch the surrealdb-py async call so it raises ``SurrealError("Invalid revision ...")`` — this is the documented failure mode for SurrealKV record-format drift and cannot be triggered naturally against ``memory://``. """ client = LedgerClient(url="memory://", ns="t301_q", db="ledger_test") await client.connect() + await init_schema(client) + await migrate(client) try:Apply the same pattern to
test_query_with_non_deserialization_error_still_raises_plain_ledger_error.As per coding guidelines: "For ledger query tests, never
MagicMockthe client; use the realLedgerClient(url="memory://", ...)+init_schema+migrate".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_ledger_sync_deserialization_recovery_301.py` around lines 74 - 108, Add schema initialization to both tests by calling the real LedgerClient's init_schema and migrate before exercising the patched query: after await client.connect() invoke await client.init_schema() and await client.migrate() in test_query_raises_deserialization_error_when_surrealdb_complains and test_query_with_non_deserialization_error_still_raises_plain_ledger_error so the in-memory client is set up per guidelines while keeping the existing patching of client._db.query and existing asserts intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@handlers/diagnose.py`:
- Around line 136-140: The next_action string currently instructs users to "wipe
and replay from .bicameral/events/" even when has_events is False; update the
conditional message construction around next_action (the code that formats the
string with replay_from_events={has_events}) so when has_events is False it does
not mention replaying from .bicameral/events (e.g., change the tail to "wipe
only (no events to replay)" or similar), otherwise keep the existing "wipe and
replay from .bicameral/events/" wording when has_events is True.
- Around line 133-134: Rename the variable currently assigned as path in the
branch that handles row_warnings to warning_path (e.g., change "path:
RecoveryPath = ..." to "warning_path: RecoveryPath = ...") and update any
subsequent uses/return in that branch to return warning_path instead of path so
it no longer collides with the other branch's path variable; locate this in the
function handling recovery paths where has_events, row_warnings, tables are
computed and adjust the corresponding return statement(s) to reference
warning_path.
---
Nitpick comments:
In `@tests/test_ledger_sync_deserialization_recovery_301.py`:
- Around line 74-108: Add schema initialization to both tests by calling the
real LedgerClient's init_schema and migrate before exercising the patched query:
after await client.connect() invoke await client.init_schema() and await
client.migrate() in
test_query_raises_deserialization_error_when_surrealdb_complains and
test_query_with_non_deserialization_error_still_raises_plain_ledger_error so the
in-memory client is set up per guidelines while keeping the existing patching of
client._db.query and existing asserts intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f0ecdd36-bc22-49e2-89a6-fe9f79b17ea8
📒 Files selected for processing (7)
CHANGELOG.mdRECOMMENDED_VERSIONhandlers/diagnose.pyhandlers/sync_middleware.pyledger/client.pypyproject.tomltests/test_ledger_sync_deserialization_recovery_301.py
| path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive" | ||
| tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings})) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify duplicate typed declarations are removed (expect 1 match after fix).
rg -nP '\bpath:\s*RecoveryPath\b|\bwarning_path:\s*RecoveryPath\b' handlers/diagnose.pyRepository: BicameralAI/bicameral-mcp
Length of output: 245
🏁 Script executed:
sed -n '120,150p' handlers/diagnose.py | cat -nRepository: BicameralAI/bicameral-mcp
Length of output: 1932
🏁 Script executed:
sed -n '133,141p' handlers/diagnose.py | cat -nRepository: BicameralAI/bicameral-mcp
Length of output: 628
Resolve path redefinition to unblock mypy.
The typed assignment on line 133 conflicts with the typed assignment on line 143 (no-redef), causing CI failure. Both are separate conditional branches that independently define the same variable name. The proposed fix correctly renames line 133's variable to warning_path and updates the corresponding return statement, eliminating the redefinition while preserving the distinct logic for each recovery path.
🛠️ Proposed fix
- path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
+ warning_path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings}))
- return path, (
+ return warning_path, (
f"Row-level deserialization warnings on {tables} — likely a "
"SurrealDB embedded-SDK record-format mismatch. Run "
f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, "
"confirm=True)` to wipe and replay from .bicameral/events/."
)🧰 Tools
🪛 GitHub Actions: Lint & Type Check / 0_ruff + mypy.txt
[error] mypy . failed with 1 error (no-redef). Checked 134 source files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@handlers/diagnose.py` around lines 133 - 134, Rename the variable currently
assigned as path in the branch that handles row_warnings to warning_path (e.g.,
change "path: RecoveryPath = ..." to "warning_path: RecoveryPath = ...") and
update any subsequent uses/return in that branch to return warning_path instead
of path so it no longer collides with the other branch's path variable; locate
this in the function handling recovery paths where has_events, row_warnings,
tables are computed and adjust the corresponding return statement(s) to
reference warning_path.
| f"Row-level deserialization warnings on {tables} — likely a " | ||
| "SurrealDB embedded-SDK record-format mismatch. Run " | ||
| f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, " | ||
| "confirm=True)` to wipe and replay from .bicameral/events/." | ||
| ) |
There was a problem hiding this comment.
Make next_action text consistent with destructive recovery.
When has_events is False, the command correctly uses replay_from_events=False, but the sentence still says "wipe and replay from .bicameral/events/". That instruction is contradictory for the destructive path.
✏️ Proposed fix
path: RecoveryPath = "reset_rebuild" if has_events else "reset_destructive"
tables = ", ".join(sorted({w.split(":", 1)[0] for w in row_warnings}))
+ replay_text = (
+ "to wipe and replay from .bicameral/events/."
+ if has_events
+ else "to wipe the ledger (no replayable .bicameral/events/*.jsonl found)."
+ )
return path, (
f"Row-level deserialization warnings on {tables} — likely a "
"SurrealDB embedded-SDK record-format mismatch. Run "
f"`bicameral_reset(wipe_mode='ledger', replay_from_events={has_events}, "
- "confirm=True)` to wipe and replay from .bicameral/events/."
+ f"confirm=True)` {replay_text}"
)🧰 Tools
🪛 GitHub Actions: Lint & Type Check / 0_ruff + mypy.txt
[error] mypy . failed with 1 error (no-redef). Checked 134 source files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@handlers/diagnose.py` around lines 136 - 140, The next_action string
currently instructs users to "wipe and replay from .bicameral/events/" even when
has_events is False; update the conditional message construction around
next_action (the code that formats the string with
replay_from_events={has_events}) so when has_events is False it does not mention
replaying from .bicameral/events (e.g., change the tail to "wipe only (no events
to replay)" or similar), otherwise keep the existing "wipe and replay from
.bicameral/events/" wording when has_events is True.
Why
#301 —
bicameral.link_commitfails withSurrealDB rejected query: Versioned error: A deserialization error occured: Invalid revision \3` for type `Value`, and the agent's natural recovery instinct (runbicameral.diagnose) returnsrecovery_path: clean/next_action: "Ledger is at expected schema v17. No remediation needed."`. The user is stuck.v0.15.0 already added a row-level probe (
cli/_diagnose_gather.py::_probe_row_deserialization) that catches this failure mode and writes the warning intoDiagnosis.row_probe_warnings+ thesuggestionslist. But the probe's findings stopped there —handlers/diagnose.py::_classify_recoveryonly inspectsschema_meta.version, so therecovery_pathenum (which the agent's skill text branches on) staysclean. Andhandlers/sync_middleware.py::ensure_ledger_syncedswallows the underlyingLedgerErrorat DEBUG, so the agent never even sees the error message from the sync attempt.Net effect on v0.15.0: the probe runs but its verdict doesn't reach the agent's decision surface.
What
LedgerDeserializationError(subclass ofLedgerError) —ledger.client.query/ledger.client.executeraise it instead of the generic class when SurrealDB returns a record-format mismatch. The exception message embeds the recovery command (bicameral_reset(wipe_mode='ledger', replay_from_events=True, confirm=True)), so the agent sees the wipe-and-replay instruction inside the MCP error envelope._classify_recoverynow consultsDiagnosis.row_probe_warningsbefore the schema-version checks. Non-empty warnings route toreset_rebuild(when.bicameral/events/*.jsonlis present next to the ledger) orreset_destructive(no events on disk) with anext_actionthat quotes the exactbicameral_reset(...)call.ensure_ledger_syncedre-raisesLedgerDeserializationErrorinstead of swallowing it at DEBUG. The broad `except Exception` is still in place for transient catch-up failures (the original best-effort contract); only deserialization errors break out, because they're the one class of failure the agent must surface to the user.pyproject.toml,RECOMMENDED_VERSION).Out of scope
surrealdb==2.0.0) stays — bumping it would require a separate schema + format-migration story.bicameral_resetwithconfirm=True). The hotfix only makes the recovery path discoverable.ledger_syncrows persisted by prior versions. Affected users still run the reset command.cli/_diagnose_gather.py::_probe_row_deserializationis untouched — it already exists from v0.15.0 (commit 72bbd20).Acceptance
tests/test_ledger_sync_deserialization_recovery_301.py— 13 sociable tests covering (a) classification ofInvalid revision/deserialization errorsubstrings, (b)_classify_recoveryrouting onrow_probe_warnings(reset_rebuild w/ events, reset_destructive w/o), (c)ensure_ledger_syncedre-raises the new class but still swallows unrelatedRuntimeErrors, (d) the new exception is a subclass ofLedgerErrorso existing handler blocks still catch it.LedgerDeserializationError.RECOVERY_HINTmentionsbicameral_resetandreplay_from_events=True.handlers/diagnose.py::_classify_recoveryreturnsreset_rebuildorreset_destructive(notclean) whendiagnosis.row_probe_warningsis non-empty, even ifschema_recorded == schema_expected.Closes #301
🤖 Generated with Claude Code
Summary by CodeRabbit
v0.15.1 Release Notes
Bug Fixes
Tests