feat(cli): assistant db repair — conversation-backfill step + open-failure catch#32642
Conversation
…ty-step open-failure catch
Adds the second step to the `assistant db repair` sequence — replays
`<workspace>/conversations/<id>/{meta.json,messages.jsonl}` into the
SQLite conversations/messages tables so a wiped or restored-from-old-
backup database can be rebuilt from the on-disk view.
Architecture: the recovery body lives in a new shared module
`workspace/recovery/conversations-from-disk.ts` that takes a drizzle
handle + workspace dir and returns `{ recovered, skipped, errors,
warnings }`. Two callers consume it:
1. workspace migration 028 — runs once at startup against the daemon's
global `getDb()` (rewritten from 271 → 46 lines, delegates to the
shared function)
2. `db repair` conversation-backfill step — opens its own RW
bun:sqlite handle with the same pragmas as the daemon, wraps it in
drizzle, calls the shared function
Idempotent: the per-conversation existence check guards both call
sites. Malformed `meta.json` / `messages.jsonl` lines are skipped
with warnings (capped at 20 in human output, full list in --json up
to a 500-entry memory cap).
Two follow-ups from PR #32632 review folded in:
- Vargas: dropped `(this PR)` / `(next PR)` / `(future)` PR-
chronology callouts from `repair-steps.ts` and `repair.ts`
module docs and from the `STEPS` comment. Rewritten to describe
the abstraction (sequence of steps, append by extending the array)
rather than the timeline. Codified in the software-engineering
skill's `comments.md` as a lesson entry.
- Codex P2: `integrity-check` step now catches `new Database(…)`
failures (file-is-a-directory, unreadable file, header so broken
SQLite refuses to attach) and surfaces them as a structured
`status: "error"` with `data.openFailed: true` rather than
letting the runner's generic "this is a bug" fallback eat it.
Tests: 16 unit tests in `repair.test.ts` (11 carried, 5 new — 1
open-failure + 4 backfill: disk-only convo backfills + verifies SQLite
rows, idempotency on second run, empty-conversations-dir nothing-to-
backfill summary, malformed meta.json surfaced as a warning without
erroring the step). Migration 028's 10 tests all still pass against
the refactored delegator.
Smoke-tested on the live ~4 GB workspace DB:
[1/2] integrity-check — ok no corruption detected (40.1s)
scanned 993,829 pages
[2/2] conversation-backfill — ok nothing to backfill
(773 on-disk conversations already present) (1.5s)
Done. 2 steps ran: 2 ok, 0 failed
real 0m42.483s
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87c1265e12
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| import { getDb } from "../../memory/db-connection.js"; | ||
| import { conversations, messages } from "../../memory/schema/conversations.js"; | ||
| import { getLogger } from "../../util/logger.js"; | ||
| import { recoverConversationsFromDisk } from "../recovery/conversations-from-disk.js"; |
There was a problem hiding this comment.
Keep migration 028 self-contained
assistant/src/workspace/migrations/AGENTS.md requires each workspace migration to be fully self-contained and not import shared modules outside ./types.js, ./utils.js, the logger, and runtime built-ins. Pulling the recovery body from ../recovery/conversations-from-disk.js makes this already-shipped migration depend on mutable repair code, so later fixes to the CLI path can silently change what migration 028 does on fresh installs or reruns, violating the append-only migration contract.
Useful? React with 👍 / 👎.
| const existing = db | ||
| .select() | ||
| .from(conversations) | ||
| .where(eq(conversations.id, meta.id)) | ||
| .get(); |
There was a problem hiding this comment.
Handle read failures during backfill
When the DB file opens but the conversations table cannot be read (for example a corrupt assistant.db with on-disk conversation directories present), this existence check throws outside the insert-level try/catch. The step runner then reports step threw an unexpected error — this is a bug instead of a structured repair error, so the new repair path gives users a false tool-bug diagnostic precisely in a corruption scenario it is meant to handle.
Useful? React with 👍 / 👎.
Reverts the `workspace/recovery/conversations-from-disk.ts` shared module + the migration 028 delegator collapse. Migration 028 is back to its original 271-line form (unchanged from main); the repair step gets its own self-contained copy inlined into `repair-step-conversation-backfill.ts`. Migrations are frozen historical snapshots. Sharing live code between a migration and an evolving CLI command risks changing the migration's behavior on workspaces that have already run it. The two consumers should be free to drift — bug fixes or schema changes in the repair step shouldn't retroactively alter what migration 028 does.
Second step in the
assistant db repairsequence (third PR in the recovery workstream after #32606 and #32632).What it does
Replays the on-disk view at
<workspace>/conversations/<id>/{meta.json,messages.jsonl}back into the SQLiteconversationsandmessagestables. The on-disk files are written by the runtime as the source of truth for the disk view; if SQLite was wiped, restored from an old backup, or otherwise lost rows, this step rebuilds them.The recovery body is a new shared module at
assistant/src/workspace/recovery/conversations-from-disk.ts. Two callers consume it:getDb(). Rewritten from 271 → 46 lines, delegating to the shared function. All 10 of its existing tests still pass.db repairconversation-backfill step — opens its own RWbun:sqlitehandle (mirroring the daemon's pragmas: WAL, synchronous=FULL, busy_timeout=5000, foreign_keys=ON), wraps it in drizzle, calls the shared function. Stays local-transport so the command works when the daemon is down.Idempotency is enforced by the per-conversation existence check inside the shared function. Malformed
meta.jsonormessages.jsonllines are skipped with warnings rather than crashing the step. Warning output is capped at 20 lines in human mode (full list in--json, hard cap of 500 entries in memory).Two follow-ups from #32632 review folded in
Vargas — drop PR/future references from comments
The
repair-steps.tsmodule doc,repair.tsmodule doc, and theSTEPScomment all listed the step sequence with(this PR),(next PR),(future)annotations naming what hadn't shipped yet. Rewritten to describe the abstraction (sequence of steps, append by extending the array) without enumerating future entries.Codified the broader rule in the software-engineering skill's
comments.md: future-narration is just as bad as history-narration — both belong in the PR thread, not the code.Codex P2 — handle DB open failures as repair errors
The
integrity-checkstep'snew Database(ctx.dbPath, { readonly: true })could throw when the file is unreadable, owned by root, orassistant.dbis a directory. The throw escaped the innertryaroundPRAGMA integrity_check, so the runner's generic fallback reported "step threw an unexpected error — this is a bug" instead of an actionable open diagnostic.Now wrapped in a top-level
try/catchthat returns a structuredstatus: "error"withsummary: "could not open database for integrity check",data.openFailed: true, and the SQLite error message as a detail line. The synthetic-bug fallback is reserved for genuine unexpected throws.Tests
16 unit tests in
repair.test.ts(11 carried forward, 5 new):data.openFailed: true, "could not open" summary, NOT flagged as a bug.meta.json+messages.jsonlon disk only, runs repair, verifies the conversation row + message count landed in SQLite via direct query.recovered: 0, skipped: 1.okstatus.meta.json— surfaced as a warning, step staysok, skip counter increments.Migration 028's 10 existing tests all still pass against the refactored delegator.
Smoke test on the live ~4 GB workspace DB
Backfill correctly found 773 on-disk conversations all already present in SQLite (the daemon writes both paths on every conversation), no recovery needed.
Files
assistant/src/workspace/recovery/conversations-from-disk.ts— NEW shared core (~290 lines)assistant/src/workspace/migrations/028-recover-conversations-from-disk-view.ts— 271 → 46 lines, delegatesassistant/src/cli/commands/db/repair-step-conversation-backfill.ts— NEW step (~135 lines)assistant/src/cli/commands/db/repair-step-integrity.ts— open-failure catch addedassistant/src/cli/commands/db/repair-steps.ts— module doc cleanedassistant/src/cli/commands/db/repair.ts— module doc cleaned,conversationBackfillStepadded toSTEPSassistant/src/cli/commands/db/__tests__/repair.test.ts— 5 new tests