feat(cli): add `assistant db repair` with integrity-check step by vellum-apollo-bot[bot] · Pull Request #32632 · vellum-ai/vellum-assistant

vellum-apollo-bot · 2026-05-29T21:57:15Z

What

Adds assistant db repair — the second piece of the assistant db
recovery surface, after db status (#32606).

This PR ships a step-runner framework + the first repair step
(integrity check). Subsequent PRs append more steps to the sequence;
the wiring here is structured so they don't need to touch anything
besides adding their step to the STEPS array.

What this command does

Walks every page in the assistant SQLite database and reports
corruption.

$ assistant db repair
[1/1] integrity-check — starting
        Walk every database page and verify b-tree consistency
[1/1] integrity-check — ok  no corruption detected  (340.6s)
        scanned 993,829 pages

Done. 1 step ran: 1 ok, 0 failed

Runs PRAGMA integrity_check on a read-only handle. Full scan (not
quick_check) — a user typing repair is opting in to a thorough
probe.

Output paths:

Healthy DB → exit 0, page count detail line
Corrupt DB → exit 1, integrity violations surfaced verbatim (capped
at 20 lines in human mode; full list in --json)
Severely malformed DB (pragma throws before yielding rows) →
exit 1, normalized into the same corruption signal rather than
being flagged as a step bug
Missing DB → exit 1, loud stderr error pointing at backup list
--json → single RepairReport payload with dbPath, okCount,
errorCount, halted, and per-step result.data for scripting

Step-runner framework

Each step is a RepairStep with name, description, and run(ctx).
The runner:

Runs steps sequentially (later steps may depend on earlier ones)
Captures StepResult (ok or error) for each step
Continues past non-halting failures (best-effort repair — a corrupt
DB shouldn't block conversation backfill from disk)
Stops the sequence only when a step explicitly returns halt: true
Never throws — uncaught errors from a step are captured as a
synthetic error result with detail "step threw an unexpected
error — this is a bug"
Records durationMs per step

That gives PR 3 (conversation backfill) and beyond a stable interface:
add a RepairStep to STEPS and the runner does the rest.

Risk

assistant db repair registered as medium in the gateway risk
registry. The integrity-check step is strictly read-only, but the
description is accurate because future steps in the sequence will
mutate the database to recover state. Approving the path once gates
the whole future surface.

Description fix (review feedback on #32606)

@dvargasfuertes called out that the parent db description's
"(read-only by default)" qualifier had no referent — there's no flag
to flip it writable, and the parent has both read-only (status) and
mutating (repair) subcommands. Dropped the qualifier; description is
now just "Inspect and repair the assistant SQLite database". Lesson
absorbed into the software-engineering skill's cli-design.md.

Testing

11 unit tests, all using real bun:sqlite databases in tmp dirs (the
integrity check needs to walk actual pages — mocking would defeat the
point):

Healthy DB → integrity check passes, exit 0
Healthy DB → --json shape correct (steps, okCount, durationMs)
Corrupt DB → corruption surfaced, exit 1, not flagged as a bug
Corrupt DB → --json carries full error list
Missing DB → loud stderr, exit 1
Missing DB → --json carries missing: true
Runner: sequential order
Runner: continues past non-halting failures
Runner: stops on halt: true
Runner: captures thrown errors as synthetic error results
Runner: records non-negative durationMs

Smoke-tested against the live ~4 GB workspace DB:

[1/1] integrity-check — ok  no corruption detected  (340.6s)
        scanned 993,829 pages
Done. 1 step ran: 1 ok, 0 failed

5m44s wall-clock for ~994K pages on a ~4 GB DB. Acceptable for a
command the user only runs when they suspect trouble.

--json and missing-DB paths verified manually too.

What's next

PR 3: conversation-backfill step that replays
/workspace/conversations into SQLite. Reuses the logic from
migration 028 (already does meta + jsonl → SQLite with
conversation-level idempotency). Goal: a fresh DB after a wipe can be
rebuilt from the on-disk view.

After PR 3: write up the remaining ~95 tables and propose other
recovery steps (memory consolidation, lost-and-found triage, etc.).

Introduces a step-runner framework for the repair surface so future remediation passes (conversation backfill, memory consolidation, etc.) can be appended without restructuring the command. Each step produces a structured `StepResult` and the runner aggregates them into a `RepairReport` that renders as plain text or as JSON via `--json`. First step: `integrity-check` — runs `PRAGMA integrity_check` on a read-only handle. Full scan (not quick_check) because a user typing `repair` is opting in to a thorough probe. Healthy DBs report `ok` + page count. Damaged DBs surface the integrity_check rows verbatim, capped at 20 lines in human output (full list in --json). Severely-malformed DBs whose pragma throws before yielding rows are normalized into the same corruption signal, not flagged as a step bug. Also drops the "(read-only by default)" qualifier from the parent `db` description per review feedback on #32606 — no flag exists to flip the default, so the qualifier had no referent. Gateway risk registry: `db repair` registered as medium. First step is read-only; later steps will mutate. Smoke-tested on the live ~4 GB workspace DB (993 829 pages, 5m44s, no corruption). 11 unit tests pass covering healthy DB, two corrupt seed shapes, missing DB, --json shape, and the four runner semantics (sequential order, continue-on-error, halt-on-error, throw capture).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17459e48d9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T22:00:03Z

+
+async function runIntegrityCheck(ctx: RepairContext): Promise<StepResult> {
+  return withDb<StepResult>(
+    () => new Database(ctx.dbPath, { readonly: true }),


Handle database open failures as repair errors

When the file exists but SQLite cannot open it at all (for example an unreadable/root-owned file or assistant.db being a directory), this constructor throws before the inner try around PRAGMA integrity_check, so the generic runner reports step threw an unexpected error — this is a bug instead of an actionable repair/open diagnostic. Since this recovery command is meant for damaged local databases and db status already treats open failures as user-facing errors, catch open failures here and return a normal status: "error" result.

Useful? React with 👍 / 👎.

dvargasfuertes · 2026-05-29T22:14:24Z

+ *   1. integrity check       (this PR)
+ *   2. conversation backfill (next PR — replay /workspace/conversations
+ *      into SQLite)
+ *   3. … more to come (memory consolidation, lost-and-found triage, etc.)


Delete references to pr or the future in comments

…ilure catch (#32642) * feat(cli): assistant db repair — conversation-backfill step + integrity-step open-failure catch Adds the second step to the `assistant db repair` sequence — replays `<workspace>/conversations/<id>/{meta.json,messages.jsonl}` into the SQLite conversations/messages tables so a wiped or restored-from-old- backup database can be rebuilt from the on-disk view. Architecture: the recovery body lives in a new shared module `workspace/recovery/conversations-from-disk.ts` that takes a drizzle handle + workspace dir and returns `{ recovered, skipped, errors, warnings }`. Two callers consume it: 1. workspace migration 028 — runs once at startup against the daemon's global `getDb()` (rewritten from 271 → 46 lines, delegates to the shared function) 2. `db repair` conversation-backfill step — opens its own RW bun:sqlite handle with the same pragmas as the daemon, wraps it in drizzle, calls the shared function Idempotent: the per-conversation existence check guards both call sites. Malformed `meta.json` / `messages.jsonl` lines are skipped with warnings (capped at 20 in human output, full list in --json up to a 500-entry memory cap). Two follow-ups from PR #32632 review folded in: - Vargas: dropped `(this PR)` / `(next PR)` / `(future)` PR- chronology callouts from `repair-steps.ts` and `repair.ts` module docs and from the `STEPS` comment. Rewritten to describe the abstraction (sequence of steps, append by extending the array) rather than the timeline. Codified in the software-engineering skill's `comments.md` as a lesson entry. - Codex P2: `integrity-check` step now catches `new Database(…)` failures (file-is-a-directory, unreadable file, header so broken SQLite refuses to attach) and surfaces them as a structured `status: "error"` with `data.openFailed: true` rather than letting the runner's generic "this is a bug" fallback eat it. Tests: 16 unit tests in `repair.test.ts` (11 carried, 5 new — 1 open-failure + 4 backfill: disk-only convo backfills + verifies SQLite rows, idempotency on second run, empty-conversations-dir nothing-to- backfill summary, malformed meta.json surfaced as a warning without erroring the step). Migration 028's 10 tests all still pass against the refactored delegator. Smoke-tested on the live ~4 GB workspace DB: [1/2] integrity-check — ok no corruption detected (40.1s) scanned 993,829 pages [2/2] conversation-backfill — ok nothing to backfill (773 on-disk conversations already present) (1.5s) Done. 2 steps ran: 2 ok, 0 failed real 0m42.483s * refactor: don't share recovery logic between migration 028 and db repair Reverts the `workspace/recovery/conversations-from-disk.ts` shared module + the migration 028 delegator collapse. Migration 028 is back to its original 271-line form (unchanged from main); the repair step gets its own self-contained copy inlined into `repair-step-conversation-backfill.ts`. Migrations are frozen historical snapshots. Sharing live code between a migration and an evolving CLI command risks changing the migration's behavior on workspaces that have already run it. The two consumers should be free to drift — bug fixes or schema changes in the repair step shouldn't retroactively alter what migration 028 does. --------- Co-authored-by: vellum-apollo-bot[bot] <242025090+vellum-apollo-bot[bot]@users.noreply.github.com>

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

dvargasfuertes approved these changes May 29, 2026

View reviewed changes

dvargasfuertes merged commit 77a20bc into main May 29, 2026
14 checks passed

dvargasfuertes deleted the apollo/assistant-db-repair branch May 29, 2026 22:15

vellum-apollo-bot Bot mentioned this pull request May 29, 2026

feat(cli): assistant db repair — conversation-backfill step + open-failure catch #32642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): add `assistant db repair` with integrity-check step#32632

feat(cli): add `assistant db repair` with integrity-check step#32632
dvargasfuertes merged 1 commit into
mainfrom
apollo/assistant-db-repair

vellum-apollo-bot Bot commented May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

dvargasfuertes May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vellum-apollo-bot Bot commented May 29, 2026

What

What this command does

Step-runner framework

Risk

Description fix (review feedback on #32606)

Testing

What's next

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dvargasfuertes May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant