feat(cli): add assistant db repair with integrity-check step#32632
Conversation
Introduces a step-runner framework for the repair surface so future remediation passes (conversation backfill, memory consolidation, etc.) can be appended without restructuring the command. Each step produces a structured `StepResult` and the runner aggregates them into a `RepairReport` that renders as plain text or as JSON via `--json`. First step: `integrity-check` — runs `PRAGMA integrity_check` on a read-only handle. Full scan (not quick_check) because a user typing `repair` is opting in to a thorough probe. Healthy DBs report `ok` + page count. Damaged DBs surface the integrity_check rows verbatim, capped at 20 lines in human output (full list in --json). Severely-malformed DBs whose pragma throws before yielding rows are normalized into the same corruption signal, not flagged as a step bug. Also drops the "(read-only by default)" qualifier from the parent `db` description per review feedback on #32606 — no flag exists to flip the default, so the qualifier had no referent. Gateway risk registry: `db repair` registered as medium. First step is read-only; later steps will mutate. Smoke-tested on the live ~4 GB workspace DB (993 829 pages, 5m44s, no corruption). 11 unit tests pass covering healthy DB, two corrupt seed shapes, missing DB, --json shape, and the four runner semantics (sequential order, continue-on-error, halt-on-error, throw capture).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17459e48d9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| async function runIntegrityCheck(ctx: RepairContext): Promise<StepResult> { | ||
| return withDb<StepResult>( | ||
| () => new Database(ctx.dbPath, { readonly: true }), |
There was a problem hiding this comment.
Handle database open failures as repair errors
When the file exists but SQLite cannot open it at all (for example an unreadable/root-owned file or assistant.db being a directory), this constructor throws before the inner try around PRAGMA integrity_check, so the generic runner reports step threw an unexpected error — this is a bug instead of an actionable repair/open diagnostic. Since this recovery command is meant for damaged local databases and db status already treats open failures as user-facing errors, catch open failures here and return a normal status: "error" result.
Useful? React with 👍 / 👎.
| * 1. integrity check (this PR) | ||
| * 2. conversation backfill (next PR — replay /workspace/conversations | ||
| * into SQLite) | ||
| * 3. … more to come (memory consolidation, lost-and-found triage, etc.) |
There was a problem hiding this comment.
Delete references to pr or the future in comments
…ilure catch (#32642) * feat(cli): assistant db repair — conversation-backfill step + integrity-step open-failure catch Adds the second step to the `assistant db repair` sequence — replays `<workspace>/conversations/<id>/{meta.json,messages.jsonl}` into the SQLite conversations/messages tables so a wiped or restored-from-old- backup database can be rebuilt from the on-disk view. Architecture: the recovery body lives in a new shared module `workspace/recovery/conversations-from-disk.ts` that takes a drizzle handle + workspace dir and returns `{ recovered, skipped, errors, warnings }`. Two callers consume it: 1. workspace migration 028 — runs once at startup against the daemon's global `getDb()` (rewritten from 271 → 46 lines, delegates to the shared function) 2. `db repair` conversation-backfill step — opens its own RW bun:sqlite handle with the same pragmas as the daemon, wraps it in drizzle, calls the shared function Idempotent: the per-conversation existence check guards both call sites. Malformed `meta.json` / `messages.jsonl` lines are skipped with warnings (capped at 20 in human output, full list in --json up to a 500-entry memory cap). Two follow-ups from PR #32632 review folded in: - Vargas: dropped `(this PR)` / `(next PR)` / `(future)` PR- chronology callouts from `repair-steps.ts` and `repair.ts` module docs and from the `STEPS` comment. Rewritten to describe the abstraction (sequence of steps, append by extending the array) rather than the timeline. Codified in the software-engineering skill's `comments.md` as a lesson entry. - Codex P2: `integrity-check` step now catches `new Database(…)` failures (file-is-a-directory, unreadable file, header so broken SQLite refuses to attach) and surfaces them as a structured `status: "error"` with `data.openFailed: true` rather than letting the runner's generic "this is a bug" fallback eat it. Tests: 16 unit tests in `repair.test.ts` (11 carried, 5 new — 1 open-failure + 4 backfill: disk-only convo backfills + verifies SQLite rows, idempotency on second run, empty-conversations-dir nothing-to- backfill summary, malformed meta.json surfaced as a warning without erroring the step). Migration 028's 10 tests all still pass against the refactored delegator. Smoke-tested on the live ~4 GB workspace DB: [1/2] integrity-check — ok no corruption detected (40.1s) scanned 993,829 pages [2/2] conversation-backfill — ok nothing to backfill (773 on-disk conversations already present) (1.5s) Done. 2 steps ran: 2 ok, 0 failed real 0m42.483s * refactor: don't share recovery logic between migration 028 and db repair Reverts the `workspace/recovery/conversations-from-disk.ts` shared module + the migration 028 delegator collapse. Migration 028 is back to its original 271-line form (unchanged from main); the repair step gets its own self-contained copy inlined into `repair-step-conversation-backfill.ts`. Migrations are frozen historical snapshots. Sharing live code between a migration and an evolving CLI command risks changing the migration's behavior on workspaces that have already run it. The two consumers should be free to drift — bug fixes or schema changes in the repair step shouldn't retroactively alter what migration 028 does. --------- Co-authored-by: vellum-apollo-bot[bot] <242025090+vellum-apollo-bot[bot]@users.noreply.github.com>
What
Adds
assistant db repair— the second piece of theassistant dbrecovery surface, after
db status(#32606).This PR ships a step-runner framework + the first repair step
(integrity check). Subsequent PRs append more steps to the sequence;
the wiring here is structured so they don't need to touch anything
besides adding their step to the
STEPSarray.What this command does
Walks every page in the assistant SQLite database and reports
corruption.
Runs
PRAGMA integrity_checkon a read-only handle. Full scan (notquick_check) — a user typingrepairis opting in to a thoroughprobe.
Output paths:
at 20 lines in human mode; full list in
--json)exit 1, normalized into the same corruption signal rather than
being flagged as a step bug
backup list--json→ singleRepairReportpayload withdbPath,okCount,errorCount,halted, and per-stepresult.datafor scriptingStep-runner framework
Each step is a
RepairStepwithname,description, andrun(ctx).The runner:
StepResult(okorerror) for each stepDB shouldn't block conversation backfill from disk)
halt: truesynthetic
errorresult with detail "step threw an unexpectederror — this is a bug"
durationMsper stepThat gives PR 3 (conversation backfill) and beyond a stable interface:
add a
RepairSteptoSTEPSand the runner does the rest.Risk
assistant db repairregistered asmediumin the gateway riskregistry. The integrity-check step is strictly read-only, but the
description is accurate because future steps in the sequence will
mutate the database to recover state. Approving the path once gates
the whole future surface.
Description fix (review feedback on #32606)
@dvargasfuertes called out that the parent
dbdescription's"(read-only by default)" qualifier had no referent — there's no flag
to flip it writable, and the parent has both read-only (
status) andmutating (
repair) subcommands. Dropped the qualifier; description isnow just
"Inspect and repair the assistant SQLite database". Lessonabsorbed into the software-engineering skill's
cli-design.md.Testing
11 unit tests, all using real
bun:sqlitedatabases in tmp dirs (theintegrity check needs to walk actual pages — mocking would defeat the
point):
--jsonshape correct (steps, okCount, durationMs)--jsoncarries full error list--jsoncarriesmissing: truehalt: truedurationMsSmoke-tested against the live ~4 GB workspace DB:
5m44s wall-clock for ~994K pages on a ~4 GB DB. Acceptable for a
command the user only runs when they suspect trouble.
--jsonand missing-DB paths verified manually too.What's next
PR 3:
conversation-backfillstep that replays/workspace/conversationsinto SQLite. Reuses the logic frommigration 028 (already does meta + jsonl → SQLite with
conversation-level idempotency). Goal: a fresh DB after a wipe can be
rebuilt from the on-disk view.
After PR 3: write up the remaining ~95 tables and propose other
recovery steps (memory consolidation, lost-and-found triage, etc.).