feat(cli): assistant db repair — conversation-backfill step + open-failure catch by vellum-apollo-bot[bot] · Pull Request #32642 · vellum-ai/vellum-assistant

vellum-apollo-bot · 2026-05-29T22:35:58Z

Second step in the assistant db repair sequence (third PR in the recovery workstream after #32606 and #32632).

What it does

Replays the on-disk view at <workspace>/conversations/<id>/{meta.json,messages.jsonl} back into the SQLite conversations and messages tables. The on-disk files are written by the runtime as the source of truth for the disk view; if SQLite was wiped, restored from an old backup, or otherwise lost rows, this step rebuilds them.
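As a rough illustration of the on-disk layout described above, the sketch below enumerates `<workspace>/conversations/<id>/` directories that contain a `meta.json`. The names `DiskConversationDir` and `listDiskConversations` are hypothetical and do not come from the PR; the real step reads the conversation id from `meta.json` rather than from the directory name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shape: one entry per conversation directory found on disk.
interface DiskConversationDir {
  dirName: string;
  metaPath: string;
  messagesPath: string;
}

// Lists <workspaceDir>/conversations/<dir>/ entries that contain a meta.json.
// A missing messages.jsonl is tolerated here; the caller decides how to handle it.
function listDiskConversations(workspaceDir: string): DiskConversationDir[] {
  const root = path.join(workspaceDir, "conversations");
  if (!fs.existsSync(root)) return [];

  const found: DiskConversationDir[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // ignore stray files
    const dir = path.join(root, entry.name);
    const metaPath = path.join(dir, "meta.json");
    if (!fs.existsSync(metaPath)) continue; // nothing to replay without meta.json
    found.push({
      dirName: entry.name,
      metaPath,
      messagesPath: path.join(dir, "messages.jsonl"),
    });
  }
  // Sort for a deterministic processing order.
  return found.sort((a, b) => a.dirName.localeCompare(b.dirName));
}
```

Returning an empty list when the `conversations` directory is absent lines up with the "nothing to backfill" outcome covered by the tests.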

The recovery body is a new shared module at assistant/src/workspace/recovery/conversations-from-disk.ts. Two callers consume it:

Workspace migration 028 — runs once at startup against the daemon's global getDb(). Rewritten from 271 → 46 lines, delegating to the shared function. All 10 of its existing tests still pass.
db repair conversation-backfill step — opens its own RW bun:sqlite handle (mirroring the daemon's pragmas: WAL, synchronous=FULL, busy_timeout=5000, foreign_keys=ON), wraps it in drizzle, calls the shared function. Stays local-transport so the command works when the daemon is down.
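The pragma mirroring can be sketched as follows. To keep the example runnable without `bun:sqlite`, it targets a minimal structural interface (`SqlExec`, a hypothetical name) rather than the real `Database` class; `applyDaemonPragmas` is likewise an illustrative helper, not a function from the PR. The pragma values are the ones listed in the description.

```typescript
// Minimal structural type so the sketch does not depend on bun:sqlite.
// Any handle with a method that executes a single SQL statement will do.
interface SqlExec {
  exec(sql: string): void;
}

// Pragma values as listed in the PR description.
const DAEMON_PRAGMAS: readonly string[] = [
  "PRAGMA journal_mode = WAL",
  "PRAGMA synchronous = FULL",
  "PRAGMA busy_timeout = 5000",
  "PRAGMA foreign_keys = ON",
];

// Applies each pragma in order on a freshly opened read-write handle.
function applyDaemonPragmas(db: SqlExec): void {
  for (const pragma of DAEMON_PRAGMAS) {
    db.exec(pragma);
  }
}
```

Keeping the list in one constant makes it easier to compare against the daemon's own connection setup when either side changes.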

Idempotency is enforced by the per-conversation existence check inside the shared function. Malformed meta.json or messages.jsonl lines are skipped with warnings rather than crashing the step. Warning output is capped at 20 lines in human mode (full list in --json, hard cap of 500 entries in memory).
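A minimal sketch of the skip-with-warning behavior and the two caps, assuming warnings are plain strings. `parseJsonlLenient`, `warningsForHumanOutput`, and the constant names are hypothetical; only the numbers 20 and 500 come from the description.

```typescript
const MAX_WARNINGS_IN_MEMORY = 500; // hard cap on warnings retained in memory
const MAX_WARNINGS_HUMAN = 20; // display cap in human mode

interface ParsedJsonl {
  rows: unknown[];
  warnings: string[];
  droppedWarnings: number; // warnings not retained because of the memory cap
}

// Parses JSONL text line by line. Malformed lines are skipped and recorded
// as warnings instead of aborting the whole file.
function parseJsonlLenient(text: string, label: string): ParsedJsonl {
  const rows: unknown[] = [];
  const warnings: string[] = [];
  let droppedWarnings = 0;

  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // blank lines are not an error
    try {
      rows.push(JSON.parse(line));
    } catch {
      if (warnings.length < MAX_WARNINGS_IN_MEMORY) {
        warnings.push(`${label}:${index + 1}: malformed JSON line skipped`);
      } else {
        droppedWarnings++;
      }
    }
  });
  return { rows, warnings, droppedWarnings };
}

// Truncates the warning list for human-readable output; JSON output would
// emit the retained list in full.
function warningsForHumanOutput(warnings: readonly string[]): string[] {
  if (warnings.length <= MAX_WARNINGS_HUMAN) return [...warnings];
  const hidden = warnings.length - MAX_WARNINGS_HUMAN;
  return [
    ...warnings.slice(0, MAX_WARNINGS_HUMAN),
    `... and ${hidden} more (use --json for the full list)`,
  ];
}
```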

Two follow-ups from #32632 review folded in

Vargas — drop PR/future references from comments

The repair-steps.ts module doc, repair.ts module doc, and the STEPS comment all listed the step sequence with (this PR), (next PR), (future) annotations naming what hadn't shipped yet. Rewritten to describe the abstraction (sequence of steps, append by extending the array) without enumerating future entries.
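The "sequence of steps, append by extending the array" abstraction might look roughly like the sketch below. All type and function names are hypothetical, the log format is simplified, the step bodies are stubs, and running every step even after a failure is an assumption rather than confirmed runner behavior.

```typescript
type StepStatus = "ok" | "error";

interface StepResult {
  status: StepStatus;
  summary: string;
  unexpected?: boolean; // set only by the runner's generic fallback
}

interface RepairStep {
  name: string;
  run(): StepResult;
}

// Runs each step in array order and logs a start line and a result line.
// A step that throws is converted into an error result by the generic fallback.
function runSteps(steps: readonly RepairStep[], log: (line: string) => void): StepResult[] {
  const results: StepResult[] = [];
  steps.forEach((step, index) => {
    const prefix = `[${index + 1}/${steps.length}] ${step.name}`;
    log(`${prefix}: starting`);
    let result: StepResult;
    try {
      result = step.run();
    } catch {
      result = { status: "error", summary: "step threw an unexpected error", unexpected: true };
    }
    log(`${prefix}: ${result.status}  ${result.summary}`);
    results.push(result);
  });
  const ok = results.filter((r) => r.status === "ok").length;
  log(`Done. ${results.length} steps ran: ${ok} ok, ${results.length - ok} failed`);
  return results;
}

// Adding a step means appending one more entry here; the runner is unchanged.
const STEPS: RepairStep[] = [
  { name: "integrity-check", run: () => ({ status: "ok", summary: "no corruption detected" }) },
  { name: "conversation-backfill", run: () => ({ status: "ok", summary: "nothing to backfill" }) },
];
```

Because the runner derives the `[i/n]` prefix from the array length, no comment has to enumerate which steps exist or are planned.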

Codified the broader rule in the software-engineering skill's comments.md: future-narration is just as bad as history-narration — both belong in the PR thread, not the code.

Codex P2 — handle DB open failures as repair errors

The integrity-check step's new Database(ctx.dbPath, { readonly: true }) could throw when the file is unreadable, owned by root, or assistant.db is a directory. The throw escaped the inner try around PRAGMA integrity_check, so the runner's generic fallback reported "step threw an unexpected error — this is a bug" instead of an actionable open diagnostic.

Now wrapped in a top-level try/catch that returns a structured status: "error" with summary: "could not open database for integrity check", data.openFailed: true, and the SQLite error message as a detail line. The synthetic-bug fallback is reserved for genuine unexpected throws.
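The shape of that catch can be sketched as below. The opener is injected so the example runs without `bun:sqlite`; `ReadonlyDb`, `IntegrityResult`, `runIntegrityCheck`, and the non-open summary strings are hypothetical. The `"could not open database for integrity check"` summary and `data.openFailed` field are taken from the description.

```typescript
interface IntegrityResult {
  status: "ok" | "error";
  summary: string;
  details: string[];
  data: { openFailed?: boolean };
}

// Hypothetical stand-in for a read-only SQLite handle.
interface ReadonlyDb {
  integrityCheck(): string[]; // rows returned by PRAGMA integrity_check
  close(): void;
}

function runIntegrityCheck(
  dbPath: string,
  open: (path: string) => ReadonlyDb,
): IntegrityResult {
  let db: ReadonlyDb;
  try {
    db = open(dbPath);
  } catch (err) {
    // Open failures (unreadable file, path is a directory, ...) become a
    // structured repair error instead of escaping to the runner.
    const message = err instanceof Error ? err.message : String(err);
    return {
      status: "error",
      summary: "could not open database for integrity check",
      details: [message],
      data: { openFailed: true },
    };
  }

  try {
    const rows = db.integrityCheck();
    // SQLite reports a single row containing "ok" when no problems are found.
    const clean = rows.length === 1 && rows[0] === "ok";
    return clean
      ? { status: "ok", summary: "no corruption detected", details: [], data: {} }
      : { status: "error", summary: "integrity check reported problems", details: rows, data: {} };
  } finally {
    db.close();
  }
}
```

Separating the open from the check keeps the two failure modes distinguishable: only the first sets `openFailed`.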

Tests

16 unit tests in repair.test.ts (11 carried forward, 5 new):

File-is-a-directory open-failure — verifies data.openFailed: true, "could not open" summary, NOT flagged as a bug.
Backfill: disk-only conversation — seeds meta.json + messages.jsonl on disk only, runs repair, verifies the conversation row + message count landed in SQLite via direct query.
Backfill: idempotent — second run reports recovered: 0, skipped: 1.
Backfill: empty conversations dir — "nothing to backfill" ok status.
Backfill: malformed meta.json — surfaced as a warning, step stays ok, skip counter increments.
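The idempotency case above can be illustrated with an in-memory stand-in for the SQLite tables. This is a simplified model, not the PR's test code: `DiskConv`, `BackfillCounts`, and `backfillOnce` are hypothetical names, and a `Map` replaces the `conversations` and `messages` tables.

```typescript
interface DiskConv {
  id: string;
  messages: string[];
}

interface BackfillCounts {
  recovered: number;
  skipped: number;
}

// Inserts conversations that are missing from the store and skips the rest.
// The per-conversation existence check is what makes a second run a no-op.
function backfillOnce(disk: readonly DiskConv[], store: Map<string, string[]>): BackfillCounts {
  let recovered = 0;
  let skipped = 0;
  for (const conv of disk) {
    if (store.has(conv.id)) {
      skipped++;
      continue;
    }
    store.set(conv.id, [...conv.messages]);
    recovered++;
  }
  return { recovered, skipped };
}

// Usage: one disk-only conversation, run twice against the same store.
const store = new Map<string, string[]>();
const onDisk: DiskConv[] = [{ id: "conv-1", messages: ["hello", "world"] }];
const firstRun = backfillOnce(onDisk, store);
const secondRun = backfillOnce(onDisk, store);
```

Under these assumptions `firstRun` counts one recovered conversation and `secondRun` counts one skipped, matching the `recovered: 0, skipped: 1` expectation in the idempotency test.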

Migration 028's 10 existing tests all still pass against the refactored delegator.

Smoke test on the live ~4 GB workspace DB

[1/2] integrity-check — starting
        Walk every database page and verify b-tree consistency
[1/2] integrity-check — ok  no corruption detected  (40.1s)
        scanned 993,829 pages
[2/2] conversation-backfill — starting
        Replay workspace/conversations/<id>/{meta.json,messages.jsonl} into SQLite
[2/2] conversation-backfill — ok  nothing to backfill (773 on-disk conversations already present)  (1.5s)

Done. 2 steps ran: 2 ok, 0 failed
real  0m42.483s

Backfill correctly found 773 on-disk conversations all already present in SQLite (the daemon writes both paths on every conversation), no recovery needed.

Files

assistant/src/workspace/recovery/conversations-from-disk.ts — NEW shared core (~290 lines)
assistant/src/workspace/migrations/028-recover-conversations-from-disk-view.ts — 271 → 46 lines, delegates
assistant/src/cli/commands/db/repair-step-conversation-backfill.ts — NEW step (~135 lines)
assistant/src/cli/commands/db/repair-step-integrity.ts — open-failure catch added
assistant/src/cli/commands/db/repair-steps.ts — module doc cleaned
assistant/src/cli/commands/db/repair.ts — module doc cleaned, conversationBackfillStep added to STEPS
assistant/src/cli/commands/db/__tests__/repair.test.ts — 5 new tests

…ty-step open-failure catch

Adds the second step to the `assistant db repair` sequence: it replays `<workspace>/conversations/<id>/{meta.json,messages.jsonl}` into the SQLite conversations/messages tables so a wiped or restored-from-old-backup database can be rebuilt from the on-disk view.

Architecture: the recovery body lives in a new shared module `workspace/recovery/conversations-from-disk.ts` that takes a drizzle handle + workspace dir and returns `{ recovered, skipped, errors, warnings }`. Two callers consume it:

1. workspace migration 028: runs once at startup against the daemon's global `getDb()` (rewritten from 271 → 46 lines, delegates to the shared function)
2. `db repair` conversation-backfill step: opens its own RW bun:sqlite handle with the same pragmas as the daemon, wraps it in drizzle, calls the shared function

Idempotent: the per-conversation existence check guards both call sites. Malformed `meta.json` / `messages.jsonl` lines are skipped with warnings (capped at 20 in human output, full list in --json up to a 500-entry memory cap).

Two follow-ups from PR #32632 review folded in:

- Vargas: dropped `(this PR)` / `(next PR)` / `(future)` PR-chronology callouts from `repair-steps.ts` and `repair.ts` module docs and from the `STEPS` comment. Rewritten to describe the abstraction (sequence of steps, append by extending the array) rather than the timeline. Codified in the software-engineering skill's `comments.md` as a lesson entry.
- Codex P2: `integrity-check` step now catches `new Database(…)` failures (file-is-a-directory, unreadable file, header so broken SQLite refuses to attach) and surfaces them as a structured `status: "error"` with `data.openFailed: true` rather than letting the runner's generic "this is a bug" fallback eat it.

Tests: 16 unit tests in `repair.test.ts` (11 carried, 5 new: 1 open-failure and 4 backfill). The backfill tests cover: disk-only convo backfills + verifies SQLite rows, idempotency on second run, empty-conversations-dir nothing-to-backfill summary, and malformed meta.json surfaced as a warning without erroring the step. Migration 028's 10 tests all still pass against the refactored delegator.

Smoke-tested on the live ~4 GB workspace DB:

[1/2] integrity-check — ok  no corruption detected  (40.1s)
        scanned 993,829 pages
[2/2] conversation-backfill — ok  nothing to backfill (773 on-disk conversations already present)  (1.5s)

Done. 2 steps ran: 2 ok, 0 failed
real  0m42.483s

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87c1265e12


chatgpt-codex-connector · 2026-05-29T22:38:33Z

 import { getDb } from "../../memory/db-connection.js";
-import { conversations, messages } from "../../memory/schema/conversations.js";
 import { getLogger } from "../../util/logger.js";
+import { recoverConversationsFromDisk } from "../recovery/conversations-from-disk.js";


Keep migration 028 self-contained

assistant/src/workspace/migrations/AGENTS.md requires each workspace migration to be fully self-contained and not import shared modules outside ./types.js, ./utils.js, the logger, and runtime built-ins. Pulling the recovery body from ../recovery/conversations-from-disk.js makes this already-shipped migration depend on mutable repair code, so later fixes to the CLI path can silently change what migration 028 does on fresh installs or reruns, violating the append-only migration contract.


chatgpt-codex-connector · 2026-05-29T22:38:33Z

+    const existing = db
+      .select()
+      .from(conversations)
+      .where(eq(conversations.id, meta.id))
+      .get();


Handle read failures during backfill

When the DB file opens but the conversations table cannot be read (for example, a corrupt assistant.db with on-disk conversation directories present), this existence check throws outside the insert-level try/catch. The step runner then reports "step threw an unexpected error — this is a bug" instead of a structured repair error, so the new repair path gives users a false tool-bug diagnostic in precisely the corruption scenario it is meant to handle.
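One way to address this, sketched under assumptions: move the existence check inside the same per-conversation try/catch as the insert, so a read failure is recorded as a structured error. `ConversationStore`, `BackfillOutcome`, and `backfillGuarded` are hypothetical names, and the store interface stands in for the drizzle queries shown in the diff.

```typescript
// Hypothetical stand-in for the drizzle-backed conversations table.
interface ConversationStore {
  exists(id: string): boolean; // may throw if the table cannot be read
  insert(id: string): void; // may throw on write failure
}

interface BackfillOutcome {
  recovered: number;
  skipped: number;
  errors: string[];
}

// Both the read (existence check) and the write (insert) sit inside one
// try/catch, so neither can escape to the runner's generic fallback.
function backfillGuarded(ids: readonly string[], store: ConversationStore): BackfillOutcome {
  const outcome: BackfillOutcome = { recovered: 0, skipped: 0, errors: [] };
  for (const id of ids) {
    try {
      if (store.exists(id)) {
        outcome.skipped++;
        continue;
      }
      store.insert(id);
      outcome.recovered++;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcome.errors.push(`${id}: ${message}`);
    }
  }
  return outcome;
}
```

Whether a read failure should abort the whole step or only the affected conversation is a design choice this sketch does not settle; here it continues with the next conversation.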


Reverts the `workspace/recovery/conversations-from-disk.ts` shared module + the migration 028 delegator collapse. Migration 028 is back to its original 271-line form (unchanged from main); the repair step gets its own self-contained copy inlined into `repair-step-conversation-backfill.ts`.

Migrations are frozen historical snapshots. Sharing live code between a migration and an evolving CLI command risks changing the migration's behavior on workspaces that have already run it. The two consumers should be free to drift: bug fixes or schema changes in the repair step shouldn't retroactively alter what migration 028 does.

chatgpt-codex-connector Bot reviewed May 29, 2026


dvargasfuertes approved these changes May 30, 2026


dvargasfuertes merged commit 03eac5b into main May 30, 2026
13 checks passed

dvargasfuertes deleted the apollo/assistant-db-repair-backfill branch May 30, 2026 19:46

dvargasfuertes merged 2 commits into main from apollo/assistant-db-repair-backfill
