feat(cli): operator polish — mm doctor, retention crons, backup scripts, docs (#64)#181
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (9)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 Walkthrough<review_stack_artifact> </review_stack_artifact> 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
…ed workflows >30d Migration 006 adds workflows.archived_at + a retention_runs table. runRetentionPass deletes events older than 14d and archives completed workflows older than 30d (events dropped, row + final state + meta_json config snapshot preserved), idempotently, and records every pass (ok/failure) so mm doctor can report recent retention status. SQLite-only — never touches GitHub. Wired into the daemon as a daily bunqueue cron with a startup pass; collectRetentionStatus exposes row counts + last run for doctor. Phase of #64. Closes #66.
…sue, db/retention Extends mm doctor beyond the toolchain checks with four operator checks: - config: loadConfig of global (+ cwd repo) config; malformed TOML is a hard fail - dispatcher: probe /health on the configured port — pass reachable, fail when the pidfile is live but the port is dead (wedged), warn when simply not running - state-issue: re-validate the parser against schemas/state-issue.v1.md by parse → byte-identical render → validate of the canonical fixture - database: SQLite row counts + recent retention-run status (FAILED surfaced as warn) New checks/state-issue.ts holds the pure round-trip check; formatAgo + summarizeRetention are exported pure helpers with unit tests. Any check failing exits non-zero. Phase of #64. Closes #65.
scripts/backup.sh: VACUUM INTO snapshot of the live db (consistent under WAL, no need to stop the dispatcher) + config.toml → timestamped .tar.gz, with a --restore mode that refuses while the dispatcher is up. scripts/reset-db.sh: deletes db.sqlite3 + -wal/-shm, refuses while the dispatcher is running, lists targets and confirms before deleting (--yes to skip). Neither touches GitHub. End-to-end test (db-scripts.test.ts): backup → reset → restore preserves rows + schema; covers the no-db and dispatcher-running guards. Phase of #64. Closes #67.
…strap, skill-enforcement, operator) Authors the docs/ set the spec's Repo layout calls for (dogfooding.md already existed) and corrects README drift: the shipped dispatcher port is 4120, not 8822. Each doc is one Diátaxis mode — architecture/skill-enforcement explain, adapters/bootstrap are reference, operator is how-to — grounded in shipped behavior (adapter interface quoted from core/adapter.ts, CLI surface and mm init dry-run verified against the binary). README 'Going deeper' links the set. Phase of #64. Closes #68.
# Conflicts: # packages/cli/src/commands/doctor.ts # packages/dispatcher/src/index.ts # packages/dispatcher/src/main.ts
- retention: record ok=0 keyed on detail presence, not truthiness, so a failure whose Error.message is "" isn't mis-recorded as a clean run (+ test). - backup.sh / reset-db.sh: resolve the db path so a relocated db_path isn't silently no-op'd — add --db, and fall back to the configured global.dbPath (the loader mm doctor uses) when neither --db nor --home is given (+ test). - backup.sh / reset-db.sh: pin the --help sed range to the comment block so --help no longer prints script code past the header. - migration 007: fix stale '006_retention.sql' header comment after the rename.
thejustinwalsh
left a comment
There was a problem hiding this comment.
Decision-log highlights from planning/issues/64/decisions.md, distilled inline. The full log lives on the branch.
| * dead process → **warn**: the dispatcher simply isn't started, which is normal | ||
| * when an operator runs `mm doctor` before `mm start`. | ||
| */ | ||
| async function checkDispatcher(config: MiddleConfig | null): Promise<Check> { |
There was a problem hiding this comment.
Why warn-vs-fail split here: /health reachable → pass; a live pidfile with an unreachable /health → fail (wedged daemon); no/dead pidfile → warn. Operators routinely run mm doctor before mm start, so a not-running dispatcher must keep exit 0 — only a wedged daemon is a real failure. Distinguishing the two needs the pidfile, not just the probe.
| * (dispatcher never started) → `warn`. A db below the retention schema version → | ||
| * `warn` (start the dispatcher to migrate). A db that can't be opened → `fail`. | ||
| */ | ||
| function checkDatabase(config: MiddleConfig | null): Check { |
There was a problem hiding this comment.
Why this degrades instead of failing: no db file / schema < retention version → warn; a failed last retention run → warn (surfaced as FAILED); only an unreadable db → fail. Doctor must be safe to run anytime. Note existsSync guards before openDb (which has create:true), so doctor never creates the db as a side effect.
| * invariant the dispatcher relies on to edit one section without disturbing | ||
| * others), and assert `validate` passes. Returns the first failure it hits. | ||
| */ | ||
| export function checkStateIssueRoundTrip(body: string, adapters: string[]): StateIssueCheckResult { |
There was a problem hiding this comment.
Why the fixture's own adapter set, not the operator's: this is a self-test of the parse→render→validate machinery against schemas/state-issue.v1.md. Using the operator's configured adapters would make a conforming fixture fail validate (rule 5) on a single-adapter install. Paths resolve from the module location (like the module-index/skills checks) so it inspects middle's own source tree.
| # it). bun is middle's runtime, so it is always present. The connection is | ||
| # read-write because VACUUM INTO needs a writable handle even though it only | ||
| # reads the source; concurrent with a running dispatcher this is safe under WAL. | ||
| snapshot_db() { |
There was a problem hiding this comment.
Why paths go through env, not argv: bun -e '<code>' a b does NOT expose a/b as process.argv[2+] — new Database(undefined) then silently opens an in-memory db, so an argv-based snapshot writes an empty file with no error. VACUUM INTO runs on a read-write handle (readonly → SQLITE_CANTOPEN) and is consistent even against a live WAL, so backup works without stopping the dispatcher.
| const eventsDeleted = db.run("DELETE FROM events WHERE ts < ?", [eventsCutoff]).changes; | ||
| // Drop events of the workflows about to be archived, then stamp them. Both | ||
| // statements share the same predicate so the counts stay consistent. | ||
| const archivePredicate = "state = 'completed' AND updated_at < ? AND archived_at IS NULL"; |
There was a problem hiding this comment.
Why one shared predicate in one transaction: the events-delete and the workflow-archival UPDATE use the identical state='completed' AND updated_at < ? AND archived_at IS NULL predicate inside a single db.transaction, so their counts can't diverge and the pass is idempotent (archived_at IS NULL). The row + its meta_json config snapshot survive; only its events are dropped.
Reviewer's brief — Epic #64 (operator polish)PR #181 ships Phase 11: How to run itbun install
bun run typecheck # clean
bun run lint # clean
bun test # 1096 pass / 0 fail
bun packages/cli/src/index.ts doctor # the new full health checkOperator scripts (safe to try — they guard against a running dispatcher and confirm before destroying): scripts/backup.sh --help
scripts/reset-db.sh --help
# round-trip is covered by packages/cli/test/db-scripts.test.tsWhat to verify
How to review
Fragile / extra eyes
Final review + merge are yours — middle doesn't merge its own work. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
125-125:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate stale port reference in diagram caption.
The diagram alt text still references port
:8822, but the configuration was updated to4120on line 70.Suggested fix
- <img src="docs/diagrams/architecture.svg" alt="middle dispatcher (single process) — bunqueue engine, hook receiver, and watchdog ticker write to SQLite (WAL); Bun.serve fronts HTTP + SSE on :8822 to a read-only dashboard; GitHub on the left is the source of truth; tmux sessions and git worktrees are spawned below." width="720" /> + <img src="docs/diagrams/architecture.svg" alt="middle dispatcher (single process) — bunqueue engine, hook receiver, and watchdog ticker write to SQLite (WAL); Bun.serve fronts HTTP + SSE on :4120 to a read-only dashboard; GitHub on the left is the source of truth; tmux sessions and git worktrees are spawned below." width="720" />🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 125, The image alt text for the architecture diagram still mentions port ":8822" in the string "middle dispatcher (single process) — bunqueue engine, hook receiver, and watchdog ticker write to SQLite (WAL); Bun.serve fronts HTTP + SSE on :8822 to a read-only dashboard; GitHub on the left is the source of truth; tmux sessions and git worktrees are spawned below." — update that alt text to the current port ":4120" (replace ":8822" with ":4120") so the README caption matches the configuration change.
🧹 Nitpick comments (1)
packages/cli/src/commands/doctor.ts (1)
139-166: ⚡ Quick winUse the already-loaded doctor config for adapter binary checks.
runDoctor()now resolves repo-aware config, butcheckAdapterBinaries()reloads global-only config. That can report the wrong enabled adapters when.middle/config.tomlis present in the repo.Suggested patch
-async function checkAdapterBinaries(): Promise<Check[]> { - let config: ReturnType<typeof loadConfig>; - try { - config = loadConfig({ globalPath: process.env.MIDDLE_CONFIG }); - } catch (error) { +async function checkAdapterBinaries(config: MiddleConfig | null): Promise<Check[]> { + if (!config) { return [ { name: "adapters", status: "warn", - detail: `config unreadable: ${(error as Error).message}`, + detail: "config unreadable: adapter checks skipped", }, ]; } const enabled = Object.entries(config.adapters).filter(([, a]) => a.enabled); if (enabled.length === 0) { @@ export async function runDoctor({ fix }: { fix?: boolean } = {}): Promise<number> { const { check: configCheck, config } = loadDoctorConfig(); const stateIssue = checkStateIssue(); const checks: Check[] = [ @@ - ...(await checkAdapterBinaries()), + ...(await checkAdapterBinaries(config)),Also applies to: 418-425
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/src/commands/doctor.ts` around lines 139 - 166, checkAdapterBinaries currently reloads global-only config via loadConfig which ignores the repo-aware config resolved by runDoctor; change checkAdapterBinaries to accept the already-loaded doctor config (e.g., add a parameter like config: ReturnType<typeof loadConfig>) and use that config.adapters when building the enabled list and checking binaries; update all call sites (including the duplicate at the other occurrence) to pass the resolved config from runDoctor instead of letting checkAdapterBinaries call loadConfig itself.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/checks/state-issue.ts`:
- Around line 100-104: The current code calls
readFileSync(STATE_ISSUE_FIXTURE_PATH, "utf8") inside checkStateIssue and will
throw on I/O errors; change it to catch file read errors and return a structured
failure result instead of letting the exception propagate. Specifically, wrap
the readFileSync call and the subsequent checkStateIssueRoundTrip(body,
[...FIXTURE_ADAPTERS]) invocation in a try/catch (or handle the thrown error),
and in the catch return { status: "fail", detail: `Failed to read fixture
${STATE_ISSUE_FIXTURE_PATH}: ${error.message}` } (or a similar descriptive
detail) so checkStateIssue always returns a pass/fail object rather than
throwing. Ensure the fix references the same symbols: readFileSync,
STATE_ISSUE_FIXTURE_PATH, checkStateIssueRoundTrip and preserves existing
successful return shape.
In `@scripts/backup.sh`:
- Around line 119-120: The restore fails if destination parent directories don't
exist; before copying "$tmp/db.sqlite3" to "$DB_PATH" and "$tmp/config.toml" to
"$CONFIG_PATH" ensure the target directories exist by creating their parent
directories (use the dirname of $DB_PATH and $CONFIG_PATH with mkdir -p) prior
to the cp commands so the restore doesn't abort when the destination directory
is missing.
---
Outside diff comments:
In `@README.md`:
- Line 125: The image alt text for the architecture diagram still mentions port
":8822" in the string "middle dispatcher (single process) — bunqueue engine,
hook receiver, and watchdog ticker write to SQLite (WAL); Bun.serve fronts HTTP
+ SSE on :8822 to a read-only dashboard; GitHub on the left is the source of
truth; tmux sessions and git worktrees are spawned below." — update that alt
text to the current port ":4120" (replace ":8822" with ":4120") so the README
caption matches the configuration change.
---
Nitpick comments:
In `@packages/cli/src/commands/doctor.ts`:
- Around line 139-166: checkAdapterBinaries currently reloads global-only config
via loadConfig which ignores the repo-aware config resolved by runDoctor; change
checkAdapterBinaries to accept the already-loaded doctor config (e.g., add a
parameter like config: ReturnType<typeof loadConfig>) and use that
config.adapters when building the enabled list and checking binaries; update all
call sites (including the duplicate at the other occurrence) to pass the
resolved config from runDoctor instead of letting checkAdapterBinaries call
loadConfig itself.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: da5d65a0-cc00-442c-8259-96eb0ec919b5
📒 Files selected for processing (22)
README.mddocs/adapters.mddocs/architecture.mddocs/bootstrap.mddocs/operator.mddocs/skill-enforcement.mdpackages/cli/src/checks/state-issue.tspackages/cli/src/commands/doctor.tspackages/cli/test/db-scripts.test.tspackages/cli/test/doctor.test.tspackages/cli/test/state-issue-check.test.tspackages/dispatcher/src/db/migrations/007_retention.sqlpackages/dispatcher/src/index.tspackages/dispatcher/src/main.tspackages/dispatcher/src/retention-cron.tspackages/dispatcher/src/retention.tspackages/dispatcher/test/db.test.tspackages/dispatcher/test/retention.test.tsplanning/issues/64/decisions.mdplanning/issues/64/plan.mdscripts/backup.shscripts/reset-db.sh
- doctor: checkAdapterBinaries takes the repo-aware config runDoctor already resolved instead of reloading global-only config, so a repo's .middle/config.toml adapter set is honored; exported + unit-tested (null, empty, enabled-from-this-config, missing-binary). - state-issue check: wrap the fixture read in try/catch so checkStateIssue returns a structured fail instead of throwing and aborting `mm doctor`; test via a restored fs spy (no cross-file mock leak). - backup.sh restore: mkdir -p the parent dirs of DB_PATH and CONFIG_PATH before copying, so a relocated --db / MIDDLE_CONFIG under a missing dir no longer aborts restore; covered by a relocated round-trip test. - docs: stale dispatcher port :8822 -> :4120 across the whole live-doc class — README architecture caption, the architecture.svg/excalidraw it renders, and the dogfooding how-to. Frozen planning/spec docs and test fixtures keep 8822 intentionally.
Review round 1 addressed — 94f87ddAll four findings resolved in one push, each class-wide within its blast radius (a test per code fix). Inline (replied in-thread):
From the review body:
Gates: |
- migration filenames: 008→009 (workflows.epic_ref) to clear collision with #181's 007_retention.sql - db.test.ts + db-scripts.test.ts: bump schema-version assertions to 9 (final post-008+009 state) - re-apply rename codemod to references introduced by #185/#186 after the original rename codemod landed
- migration filenames: 008→009 (workflows.epic_ref) to clear collision with #181's 007_retention.sql - db.test.ts + db-scripts.test.ts: bump schema-version assertions to 9 (final post-008+009 state) - re-apply rename codemod to references introduced by #185/#186 after the original rename codemod landed
Summary
Closes #64
Phase 11 operator polish: a real
mm doctorhealth check, retention crons that bound operational state, backup/reset-db operator scripts, and the README +docs/set. The bar for the Epic — a new user can clone,bun install,mm start,mm init <repo>, and reach a working dispatch within 5 minutes — is met by the documented, command-verified quickstart (and proven by dogfooding: this PR was produced by a middle dispatch).What changed
007_retention.sql(workflows.archived_at+retention_runs);retention.ts/retention-cron.tsrun daily, deletingevents>14d and archivingcompletedworkflows >30d (events dropped; row + final state +meta_jsonpreserved), SQLite-only. Wired into the daemon with a startup pass + shutdown teardown.mm doctor— addsconfig(parse),dispatcher(/healthprobe),state-issue(parser re-validation), anddatabase(row counts + retention status) checks; any failure exits non-zero.scripts/backup.sh(consistentVACUUM INTOsnapshot + config →.tar.gz, with--restore) andscripts/reset-db.sh(nuke SQLite, never GitHub). Both honor a configureddb_path, guard against a running dispatcher, and never destroy silently.4120) +docs/: architecture, adapters, bootstrap, skill-enforcement, operator.Status
mm doctorfull health checkAcceptance criteria
mm doctorvalidates environment + config (ghauth,tmux, config parse, dispatcher reachable) — doctor.tsmm doctorre-validates the state-issue parser againstschemas/state-issue.v1.md— checks/state-issue.tsmm doctorflagspackages/skills/↔bootstrap-assets/drift — doctor.tscheckSkillsDriftmm doctorreports SQLite row counts + recent retention-run status — retention.tscollectRetentionStatusmm doctorchecks report clear pass/fail with detail and exit non-zero on any failure — doctor.tsrunDoctoreventsolder than 14 days — retention.tsrunRetentionPassworkflowsolder than 30 days archived (config + final state preserved, events dropped) — retention.tsmm doctor— migration 007scripts/reset-db.shnukes SQLite and does not touch GitHub — reset-db.shREADME.mdcovers what middle is, install, and the quickstart — README.mddocs/contains architecture, adapters, bootstrap, skill-enforcement, dogfooding, operator — docs/mm doctor,mm init --dry-run, CLI help; dispatch path proven by dogfooding) — docs/operator.md4120) — docs/architecture.mdVerification
bun test→ 1096 pass / 0 fail;bun run typecheckclean;bun run lintclean;bun run formatclean.mm doctorrun live: all toolchain + adapter-binary checks pass,state-issueround-trip passes,config/dispatcherpass;databasewarnsschema v5 (pre-retention, needs ≥ v7)because the running daemon predates this migration — it migrates on nextmm start.mm init --dry-runoutput matches docs/bootstrap.md;mm --helpmatches docs/operator.md's command table.Merge strategy
Merged
origin/main(deep interleave — main extractedadapters.ts, addedrecovery.ts/audit/staleness crons, and shipped migration006_pr_divergence_state.sql). Resolved new-work-as-base: unioned the additive import/export/cron-teardown hunks, renumbered my migration006→007to avoid the version collision, and bumped the schema-version assertions to 7. Re-ran the full suite green.mergeablereads MERGEABLE.Decisions
planning/issues/64/decisions.md(highlights posted as inline review comments on this PR).Stumbling points
bun -e '<code>' a bdoes not exposea/basprocess.argv— an argv-based DB snapshot silently opened an in-memory db and wrote an empty file. Fixed by passing paths via env vars.VACUUM INTOneeds a read-write handle (readonly →SQLITE_CANTOPEN).Out of scope
Summary by CodeRabbit
Documentation
New Features
Tools