feat(dispatcher): hooks + watchdog — Phase 2 (Epic #14) by thejustinwalsh · Pull Request #75 · thejustinwalsh/middle

thejustinwalsh · 2026-05-23T17:30:40Z

Summary

Closes #14

Phase 2 ships middle's observability layer: a full token-authenticated hook receiver (the "HMAC" in the issue titles is implemented as a per-session bearer token), the universal hook script, the complete Claude hook-event set, event/heartbeat persistence into SQLite, the watchdog reconciler cron, and reactive rate-limit detection. Hooks are the fast-path notification; the on-disk transcript and SQLite are the durable state the crons reconcile against.

What changed

packages/dispatcher/src/hook-server.ts — full event-name validation, per-session bearer-token auth (X-Middle-Token, timing-safe), and an injected persistence sink; session.started/agent.stopped still drive the SessionGate.
packages/dispatcher/src/hook-store.ts — DbHookStore: events row per hook (payload ≤16KB), heartbeat bump on tool.pre/tool.post, session-field write on session.started, session-correlated, unmatchable sessions dropped.
packages/dispatcher/src/watchdog.ts + watchdog-cron.ts — 30s reconciler: launch-timeout (stuck-launching / prompt-not-accepted), tmux liveness, idle/idle-timeout (skipped while controlled_by='human', transcript cross-check), blocked.json sentinel re-arm, and a companion transcript-drift pass.
packages/dispatcher/src/rate-limits.ts — rate_limit_state read/write + parseResetAt + probe-via-real-work revert.
packages/adapters/claude/src/hooks.ts — full 8-hook .claude/settings.json mapped to the normalized taxonomy.
packages/adapters/claude/src/classify.ts / index.ts — detectRateLimit Stop-hook detector.
packages/core/src/events.ts — NORMALIZED_EVENTS + isNormalizedEvent; hook-script.ts — canonical HOOK_SH.
packages/cli/src/bootstrap-assets/hooks/hook.sh — committed universal script (drift-guarded against HOOK_SH).
packages/dispatcher/src/main.ts — wires DbHookStore + the watchdog cron into the long-running process.

Why these changes

The interactive agent process never exits between turns, so the dispatcher can't read an exit code — hooks POST turn boundaries and the on-disk JSONL transcript is the authoritative state. The hook server is the fast path (authenticated, validated, persisted); the watchdog is the safety net that only acts on staleness and never overrides an in-progress hook decision. Auth is a per-session bearer token, not a body signature, because hook.sh forwards the token as a header — resolved from the workflow row (populated before launch) and compared constant-time. The hook script is single-sourced from one @middle/core constant so the adapter-written copy and the mm init bootstrap asset can never drift.

Verification

bun run typecheck clean; bun test → 173 pass / 0 fail across 22 files.

Build hook server with HMAC-validated /hooks/:event endpoint #15 — packages/dispatcher/test/hook-server.test.ts: valid POST accepted+recorded, bad-HMAC → 401, unknown session → 401, unknown event → 400.
Implement ClaudeAdapter installHooks writing .claude/settings.json #16 — packages/adapters/claude/test/adapter.test.ts: all 8 Claude hook keys + each event→normalized mapping.
Add universal hook.sh curl script #17 — packages/cli/test/bootstrap-hook.test.ts: byte-identity drift test, POSIX shape, executable (git mode 100755), sh -n clean.
Populate events table from hooks and track heartbeats #18 — packages/dispatcher/test/hook-store.test.ts: events rows, heartbeat advance on tool.pre/post only, session.started field write, unmatchable-session drop, 16KB truncation, live HookServer → DbHookStore → SQLite.
Add watchdog cron (tmux liveness, idle detection, sentinel check) #19 — packages/dispatcher/test/watchdog.test.ts: launch-timeout, prompt-not-accepted, tmux liveness (+compensation), idle/idle-timeout, human-controlled skip, transcript cross-check, sentinel re-arm, drift correction.
Add reactive rate-limit detection in classifyStop #20 — packages/dispatcher/test/rate-limits.test.ts + the workflow rate-limit tests: classification → rate_limit_state RATE_LIMITED with parsed reset_at; completed dispatch reverts to AVAILABLE.

Acceptance evidence (Epic #14)

Criterion	Status	Evidence
All sub-issues closed	✅ Met	#15–#20 all CLOSED
Hook events flow into SQLite; `session.started` records `session_id`+`transcript_path`; reconciler keeps state honest	✅ Met	`hook-store.test.ts` (wired HookServer→DbHookStore→SQLite + session.started write); `watchdog.test.ts` `reconcileTranscriptDrift`
Killing a tmux session is caught by the watchdog within 30s	✅ Met	`watchdog.test.ts` "dead session → failed + compensation"; cron cadence `WATCHDOG_INTERVAL_MS = 30_000`
A forced rate-limit error is recorded with the correct `reset_at`	✅ Met	`rate-limits.test.ts` + `implementation-workflow.test.ts` "rate-limited classifyStop → RATE_LIMITED with parsed reset_at"

The three live criteria are proven at the mechanism level by the suite (every link in the path has a test); the spec's Phase 2 manual acceptance — spawn a real Claude agent and watch events flow, kill its tmux session, force a usage-limit message — remains the operator's final hands-on check and needs a live, authenticated Claude CLI.

Status

Build hook server with HMAC-validated /hooks/:event endpoint #15 — Hook server: HMAC-validated POST /hooks/:event + event validation + sink seam
Implement ClaudeAdapter installHooks writing .claude/settings.json #16 — ClaudeAdapter.installHooks full 8-hook .claude/settings.json
Add universal hook.sh curl script #17 — Universal hook.sh bootstrap asset
Populate events table from hooks and track heartbeats #18 — events table + heartbeats from hooks
Add watchdog cron (tmux liveness, idle detection, sentinel check) #19 — Watchdog cron (tmux liveness, idle detection, sentinel check)
Add reactive rate-limit detection in classifyStop #20 — Reactive rate-limit detection in classifyStop

Decisions

planning/issues/14/decisions.md, distilled into inline review comments on this PR (sink seam, bearer-token auth, watchdog-vs-execution boundary, idle-once-per-period, probe-via-real-work revert).

Scope boundaries carried forward (already-planned phases — no new issues)

Cross-run delayed re-enqueue of a rate-limited adapter (read reset_at, delay until reset) → Phase 8 auto-dispatch loop. Phase 2 writes the durable rate_limit_state signal.
Execution cancellation on watchdog failure (so bunqueue compensation unwinds the blocked awaitStop) → Phase 5 waitFor integration. Phase 2 provides the triggerCompensation seam + durable state failure.
Committed .middle/hooks/hook.sh dogfood copy → Phase 3 task 22 (mm init middle into itself). The dispatch hard-rule forbids staging/committing under .middle/ (the running dispatch's operational dir), and #17's substantive deliverable — the reusable universal script — ships in bootstrap-assets/. Flagged for reviewer.

Summary by CodeRabbit

New Features
- Hook-based event delivery with per-session authentication, durable recording, and a committed universal POSIX hook script for adapters
- Adapter rate-limit detection surfaced and persisted with reset times
- Watchdog: recurring monitoring, idle detection, transcript reconciliation, and automated recovery for stuck workflows
Tests
- Expanded coverage for hooks, hook-store, rate-limits, watchdog, and adapter rate-limit detection
Documentation
- Planning and decisions documenting hooks, auth, watchdog, and rate-limit approach

coderabbitai · 2026-05-23T17:30:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 47538ca and e7bcf05.

📒 Files selected for processing (2)

packages/dispatcher/src/watchdog.ts
packages/dispatcher/test/watchdog.test.ts

📝 Walkthrough

Walkthrough

Adds a canonical HOOK_SH and normalized event list, installs hooks for Claude (with detectRateLimit), introduces authenticated SQLite-backed hook persistence (HookStore/DbHookStore), durable per-adapter rate-limit state, a watchdog cron (reconcileTranscriptDrift + runWatchdog), and dispatcher wiring and tests.

Changes

Phase 2: Hooks + watchdog implementation

Layer / File(s)	Summary
Core hook infrastructure & event validation `packages/core/src/hook-script.ts`, `packages/core/src/events.ts`, `packages/core/src/index.ts`	`HOOK_SH` canonical hook script; `NORMALIZED_EVENTS` and `isNormalizedEvent` provide runtime-validated event vocabulary and re-exports.
Hook script bootstrap asset & validation `packages/cli/src/bootstrap-assets/hooks/hook.sh`, `packages/cli/test/bootstrap-hook.test.ts`	Committed bootstrap `hook.sh` matches `HOOK_SH`; tests verify dispatcher wiring, headers, timeout behavior, no-op-on-error, and executable permission.
Claude adapter hook installation & rate-limit detection `packages/adapters/claude/src/hooks.ts`, `packages/adapters/claude/src/classify.ts`, `packages/adapters/claude/src/index.ts`, `packages/adapters/claude/test/adapter.test.ts`	`CLAUDE_EVENT_MAP` and `installHooks` write the universal `HOOK_SH` and full `.claude/settings.json`; `detectRateLimit` parses Stop transcript tails for usage-limit/reset messages and is exported from the adapter; tests validate installation and detection.
Dispatcher package & dispatch wiring `packages/dispatcher/package.json`, `packages/dispatcher/src/dispatch.ts`	Adds `@middle/adapter-claude` workspace dependency and constructs `HookServer` with `DbHookStore(db)` during dispatch.
HookServer auth & SQLite persistence `packages/dispatcher/src/hook-server.ts`, `packages/dispatcher/src/hook-store.ts`, `packages/dispatcher/test/hook-server.test.ts`, `packages/dispatcher/test/hook-store.test.ts`	`HookServer` accepts optional `HookStore` to enable per-session token auth (constant-time compare) and persistence; `DbHookStore` serializes/truncates payloads, records events, touches heartbeat for tool events, and persists session metadata; tests cover token resolution, recording, truncation, and end-to-end server+store behavior.
Workflow record persistence & event helpers `packages/dispatcher/src/workflow-record.ts`	DB helpers: `findActiveWorkflowBySession`, `recordEvent`, `touchHeartbeat`, event-inspection helpers, and idempotent `armWaitForSignal` for wait-signal control.
Durable rate-limit state management `packages/dispatcher/src/rate-limits.ts`, `packages/dispatcher/test/rate-limits.test.ts`	Per-adapter `rate_limit_state` types and helpers: `getRateLimitState`, `setRateLimited`, `markAvailable`, `markAvailableOnSuccess`, and `parseResetAt`; tests verify upsert behavior, transitions, and parsing.
Workflow cleanup: persist rate-limit/availability signals `packages/dispatcher/src/workflows/implementation.ts`, `packages/dispatcher/test/implementation-workflow.test.ts`	Workflow cleanup records `rate_limit_state` when classification is `rate-limited` and marks adapter `AVAILABLE` on completed runs; tests assert persisted state transitions.
Watchdog reconciliation & lifecycle monitoring `packages/dispatcher/src/watchdog.ts`, `packages/dispatcher/src/watchdog-cron.ts`, `packages/dispatcher/test/watchdog.test.ts`	`runWatchdog` enforces launch/prompt timeouts, tmux liveness checks and kills, activity freshness (idle events and idle-timeout), and sentinel re-arming; `reconcileTranscriptDrift` advances `last_heartbeat` from transcripts; `startWatchdog` schedules ticks every 30s. Tests validate scenarios and transcript-drift corrections.
Dispatcher startup, watchdog lifecycle & shutdown `packages/dispatcher/src/main.ts`	Dispatcher constructs `HookServer` with `DbHookStore`, starts watchdog cron on startup, captures stop function, and safely handles watchdog teardown during shutdown.
Phase 2 planning & decisions documentation `planning/issues/14/plan.md`, `planning/issues/14/decisions.md`	Plan and decisions describe Phase 2 scope, sub-issue breakdown, HookStore/DbHookStore separation, token auth semantics, watchdog responsibilities, and idle-event throttling.

Sequence Diagram

sequenceDiagram
  participant Agent as Adapter (Claude)
  participant HookScript as HOOK_SH
  participant Dispatcher as HookServer (+ DbHookStore)
  participant DB as SQLite
  participant Watchdog as Watchdog Cron

  Agent->>HookScript: execute hook with normalized event + stdin
  HookScript->>Dispatcher: POST /hooks/:event (X-Middle-Session, X-Middle-Token)
  Dispatcher->>DB: resolve session token (HookStore)
  Dispatcher->>DB: record event (serialize/truncate) and touch heartbeat for tool events
  Dispatcher-->>HookScript: respond (fire-and-forget)
  Note over Watchdog: every 30s
  Watchdog->>DB: load reconcilable workflows
  Watchdog->>DB: check timeouts, tmux liveness, transcript drift, emit idle/failure events, update rate_limit_state

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Add universal hook.sh curl script #17: Claude hook.sh/script additions match this PR's committed HOOK_SH and bootstrap asset.
Implement ClaudeAdapter installHooks writing .claude/settings.json #16: Claude adapter hook installation and event mapping are directly related to this PR's CLAUDE_EVENT_MAP and settings.json generation.
Add watchdog cron (tmux liveness, idle detection, sentinel check) #19: Watchdog cron and reconciliation logic implemented here align with that issue's watchdog requirements.

Possibly related PRs

thejustinwalsh/middle#73: Shares Claude rate-limit detection work; this PR factors detection into an exported detectRateLimit helper.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main change: implementing Phase 2 hooks and watchdog for the dispatcher, directly addressing issue `#14`.
Linked Issues check	✅ Passed	The PR implements all coding requirements from issue `#14`: hook receiver with HMAC auth, universal hook script, event/heartbeat persistence, watchdog reconciler, and rate-limit detection.
Out of Scope Changes check	✅ Passed	All changes are scoped to Phase 2 deliverables. No unrelated modifications detected outside the defined hook and watchdog implementation objectives.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

thejustinwalsh · 2026-05-23T17:48:02Z

+ * database, and lets the events/heartbeat persistence (task 15 / #18) evolve
+ * without re-touching the server.
+ */
+export interface HookStore {


Decision: persistence sink seam. The HookStore interface is the seam between the transport/auth layer (HookServer) and SQLite. #15 scoped event persistence to #18, but the server still needed token resolution (auth) and the session.started field write — a single injected interface let #15 ship auth + session.started recording and #18 fill in events/heartbeat without re-touching the server. The store is optional on HookServer so the SessionGate mechanics stay unit-testable unauthenticated; the live dispatcher always supplies a DbHookStore.
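The seam can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the member names `resolveToken` and `record`, the `HookReceiver` class, and the plain string compare are hypothetical, since the real `HookStore` members are not shown in this thread (and the real server compares tokens timing-safe).

```typescript
// Hypothetical shape of the injected sink; not the PR's actual interface.
interface HookStore {
  // Returns the expected token for a session, or null when no active workflow matches.
  resolveToken(sessionName: string): string | null;
  // Persists one hook event for the session.
  record(sessionName: string, event: string, payload: unknown): void;
}

// Hypothetical receiver: the store is optional, so gate mechanics can be tested without auth.
class HookReceiver {
  private readonly store: HookStore | undefined;

  constructor(store?: HookStore) {
    this.store = store;
  }

  // Returns the HTTP status the receiver would answer with.
  handle(sessionName: string, event: string, token: string | undefined, payload: unknown): number {
    if (this.store) {
      const expected = this.store.resolveToken(sessionName);
      // Plain compare for brevity only; the PR describes a timing-safe comparison.
      if (expected === null || token !== expected) return 401;
      this.store.record(sessionName, event, payload);
    }
    return 200;
  }
}
```

With no store the receiver accepts every hook and persists nothing, which is the unit-test mode described above; with a store, authentication and persistence both flow through the one interface.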

thejustinwalsh · 2026-05-23T17:48:02Z

+ * fixed-length UUIDs), but the per-byte comparison is not short-circuited, so a
+ * caller can't time their way to the correct token.
+ */
+function tokensMatch(provided: string, expected: string): boolean {


Decision: "HMAC token" = per-session bearer token, timing-safe compared. hook.sh forwards the token as a header, not a signature over the body, so this is bearer-token auth, not a true HMAC. The server resolves the expected session_token from the workflow row (persisted before the session launches, so it is always populated before any hook fires) and compares constant-time to avoid a token-guessing oracle on the localhost endpoint.
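A sketch of one way to get a comparison that does not short-circuit. This is an assumption, not the PR's code: the real `tokensMatch` body is not shown in this thread, and hashing both sides with SHA-256 before calling Node's `crypto.timingSafeEqual` is a swapped-in technique, chosen so the two buffers always have equal length. The `authorize` helper is hypothetical.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical stand-in for tokensMatch; the PR's actual body is not shown here.
function tokensMatch(provided: string, expected: string): boolean {
  // timingSafeEqual throws on unequal lengths, so compare fixed-size digests instead.
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}

// Hypothetical helper mapping a request to the 401 cases named in the PR description.
function authorize(provided: string | undefined, expected: string | null): 200 | 401 {
  if (expected === null) return 401; // no active workflow for this session
  if (provided === undefined) return 401; // X-Middle-Token header absent
  return tokensMatch(provided, expected) ? 200 : 401;
}
```

Comparing digests rather than raw strings means a caller learns nothing from token length either, at the cost of two hashes per request.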

thejustinwalsh · 2026-05-23T17:48:02Z

+  }
+}
+
+function failWorkflow(deps: WatchdogDeps, id: string, reason: string, now: number): void {


Decision: watchdog reconciles durable state, not the live bunqueue execution. It marks the row failed, records watchdog.failed with the reason, and kills the session — it does not cancel the execution blocked in awaitStop. Cancelling a running execution from outside the engine (so compensation unwinds) is the waitFor/cancellation integration scoped to Phase 5; triggerCompensation is the seam where that plugs in. The DB is the source of truth the durable engine reconciles against, so failing the row + killing the session is the complete Phase 2 action.

thejustinwalsh · 2026-05-23T17:48:02Z

+      }
+      if (age >= idleThreshold && latestEventType(deps.db, row.id) !== IDLE_EVENT) {
+        // Mark idle once per idle period (not every 30s tick).
+        recordEvent(deps.db, { workflowId: row.id, ts: now, type: IDLE_EVENT, payloadJson: null });


Decision: idle is marked once per idle period, not every tick. Without the latestEventType !== IDLE_EVENT guard a genuinely-idle agent would accrue an idle event every 30s (120/hour) until the kill threshold, flooding the events table and the dashboard timeline. One marker per idle period is the signal the dashboard needs; the kill threshold handles escalation.
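The guard can be exercised in isolation with an in-memory event log. This is a sketch under stated assumptions: the array replaces the SQLite events table, `markIdleOnce` is a hypothetical wrapper around the two lines quoted above, and the value `"watchdog.idle"` for `IDLE_EVENT` is inferred from the reviewer's brief rather than read from the source.

```typescript
type EventRow = { workflowId: string; ts: number; type: string };

// Assumed value; the brief mentions "one watchdog.idle event".
const IDLE_EVENT = "watchdog.idle";

// In-memory stand-in for the DB-backed latestEventType helper.
function latestEventType(events: EventRow[], workflowId: string): string | undefined {
  let latest: EventRow | undefined;
  for (const e of events) {
    if (e.workflowId === workflowId && (!latest || e.ts >= latest.ts)) latest = e;
  }
  return latest?.type;
}

// Hypothetical wrapper: records an idle marker only if the latest event is not already one.
function markIdleOnce(
  events: EventRow[],
  workflowId: string,
  age: number,
  idleThreshold: number,
  now: number,
): boolean {
  if (age >= idleThreshold && latestEventType(events, workflowId) !== IDLE_EVENT) {
    events.push({ workflowId, ts: now, type: IDLE_EVENT });
    return true;
  }
  return false;
}
```

Any later hook event (for example a `tool.post`) becomes the latest event, so the next idle period gets its own single marker.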

thejustinwalsh · 2026-05-23T17:48:02Z

+ * the adapter was rate-limited, so a normal completion never thrashes the row.
+ * Returns whether it flipped.
+ */
+export function markAvailableOnSuccess(


Decision: probe-via-real-work revert. A dispatch that actually completed proves the adapter is serving again, so markAvailableOnSuccess flips RATE_LIMITED → AVAILABLE. It is a no-op unless the adapter was rate-limited, so a normal completion never thrashes the row. Per the spec there is no proactive probing — the revert is reactive, gated on real work succeeding.
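A minimal in-memory sketch of the revert, assuming a `Map` in place of the `rate_limit_state` table. The function names mirror the ones in this PR, but the bodies and the `resetAt` field shape are hypothetical; the real helpers take a database handle.

```typescript
type RateLimitStatus = "AVAILABLE" | "RATE_LIMITED";

interface RateLimitState {
  status: RateLimitStatus;
  resetAt: number | null; // epoch milliseconds; assumed representation
}

// Stand-in for the per-adapter rate_limit_state table.
const states = new Map<string, RateLimitState>();

function setRateLimited(adapter: string, resetAt: number): void {
  states.set(adapter, { status: "RATE_LIMITED", resetAt });
}

// Flips RATE_LIMITED to AVAILABLE after real work succeeded; returns whether it flipped.
function markAvailableOnSuccess(adapter: string): boolean {
  const current = states.get(adapter);
  if (!current || current.status !== "RATE_LIMITED") return false; // no-op: nothing to revert
  states.set(adapter, { status: "AVAILABLE", resetAt: null });
  return true;
}
```

Because the function returns early unless the adapter was rate-limited, an ordinary completed dispatch performs no write at all.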

thejustinwalsh · 2026-05-23T17:51:41Z

Reviewer's brief — Epic #14 (Phase 2: hooks + watchdog) → PR #75

What this delivers: the full observability layer, made up of a token-authenticated hook receiver (issue #15 calls it HMAC-validated; it is a per-session bearer token), universal hook.sh, the complete Claude hook-event set, event/heartbeat persistence into SQLite, the 30s watchdog reconciler, and reactive rate-limit detection. All six sub-issues (#15–#20) land on one branch / one PR.

How to run it

bun install
bun run typecheck      # clean
bun test               # 173 pass / 0 fail across 22 files
sh -n packages/cli/src/bootstrap-assets/hooks/hook.sh   # POSIX-clean

Targeted suites: bun test packages/dispatcher/test/hook-server.test.ts packages/dispatcher/test/hook-store.test.ts packages/dispatcher/test/watchdog.test.ts packages/dispatcher/test/rate-limits.test.ts packages/adapters/claude/test/adapter.test.ts.

What to verify (and what "correct" looks like)

Auth (hook-server.ts): a hook with a wrong/absent X-Middle-Token, or for a session with no active workflow, is 401 and never persisted; an unknown :event is 400 before auth/persistence. The token is the per-session session_token resolved from the workflow row (written before launch in launchAndDrive), compared timing-safe.
Persistence (hook-store.ts): one events row per hook (payload clipped at 16KB), last_heartbeat advances on tool.pre/tool.post only, session.started writes session_id+transcript_path. The "live HookServer → DbHookStore → SQLite" test exercises the whole wired path.
Watchdog (watchdog.ts): launch-timeout → stuck-launching; running-but-no-turn.started-in-window → prompt-not-accepted; dead tmux session → failed + triggerCompensation; idle ≥5m → one watchdog.idle event (not per-tick), ≥15m → kill + idle-timeout; skipped entirely while controlled_by='human'; staleness cross-checked against the transcript. Cadence is 30s (WATCHDOG_INTERVAL_MS).
Rate limit (rate-limits.ts, classify.ts): a transcript tail containing "Resets at <ISO>." → classifyStop returns rate-limited → rate_limit_state is RATE_LIMITED with the parsed reset_at; a later completed dispatch flips RATE_LIMITED → AVAILABLE (probe-via-real-work).
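For the rate-limit item, the reset-time extraction can be sketched as follows. This is an assumption about behavior, not the PR's implementation: the regular expression, the epoch-milliseconds return type, and the `null` fallback are all hypothetical. Only the "Resets at <ISO>." message shape comes from this thread.

```typescript
// Hypothetical stand-in for parseResetAt: pulls an ISO-8601 timestamp out of a transcript tail.
function parseResetAt(transcriptTail: string): number | null {
  // Match a date-time with Z or a numeric offset; the sentence's trailing period is left out.
  const match = /Resets at\s+(\d{4}-\d{2}-\d{2}T[\d:.]+(?:Z|[+-]\d{2}:\d{2}))/.exec(transcriptTail);
  const iso = match?.[1];
  if (!iso) return null;
  const ms = Date.parse(iso);
  return Number.isNaN(ms) ? null : ms;
}
```

Returning `null` rather than throwing lets a caller still record RATE_LIMITED when the message carries no parseable time.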

How to review

Read planning/issues/14/decisions.md (also distilled into inline review comments) for the load-bearing calls: the optional-store sink seam, bearer-token-not-HMAC, watchdog-reconciles-DB-not-execution, idle-once-per-period, probe-via-real-work. The reconcile logic is pure/injectable — the cron wrapper and main.ts wiring are thin glue.

Fragile / needs extra eyes

bunqueue cron + Engine coexistence in main.ts: the watchdog uses a Bunqueue (queue+worker) .every(30s) alongside the workflow Engine; main.test.ts confirms the process boots and SIGTERM-exits cleanly with both. Worth a sanity read of the shutdown ordering (stopWatchdog → hookServer.stop → engine.close → db.close).
@middle/adapter-claude is now a @middle/dispatcher dependency — only at the main.ts composition root, so the watchdog can read transcripts; dispatch.ts/workflow stay adapter-agnostic via injection.

Scope boundaries (already-planned phases — deliberately not done here)

Cross-run delayed re-enqueue of a rate-limited adapter → Phase 8 (auto-dispatch). Phase 2 writes the durable rate_limit_state.
Execution cancellation so bunqueue compensation unwinds a blocked awaitStop → Phase 5 (waitFor). Phase 2 provides the triggerCompensation seam + durable failure.
Committed .middle/hooks/hook.sh dogfood copy → Phase 3 task 22 (mm init middle into itself). The dispatch hard-rule forbids committing under .middle/; #17's reusable deliverable ships in bootstrap-assets/. This is the one item to confirm you're comfortable deferring.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dispatcher/src/watchdog.ts`:
- Around line 141-149: Wrap each per-workflow tmux interaction so that
exceptions from deps.tmux.status() and deps.tmux.killSession() are caught and
handled without aborting runWatchdog; specifically, around the blocks that call
deps.tmux.status(row.session_name) and deps.tmux.killSession(row.session_name)
(and the similar calls at the other locations) add a try/catch that on error
calls failWorkflow(deps, row.id, `tmux error: ${err.message || err}`, now) (or
equivalent message), increments acted if appropriate, and continues to the next
row so reconciliation proceeds for other workflows. Ensure you catch errors for
both status() and killSession() calls individually so one failing call does not
prevent persisting the failure decision for that workflow.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 7a20cf3 and 326fb3d.

⛔ Files ignored due to path filters (1)

bun.lock is excluded by !**/*.lock

📒 Files selected for processing (26)

packages/adapters/claude/src/classify.ts
packages/adapters/claude/src/hooks.ts
packages/adapters/claude/src/index.ts
packages/adapters/claude/test/adapter.test.ts
packages/cli/src/bootstrap-assets/hooks/hook.sh
packages/cli/test/bootstrap-hook.test.ts
packages/core/src/events.ts
packages/core/src/hook-script.ts
packages/core/src/index.ts
packages/dispatcher/package.json
packages/dispatcher/src/dispatch.ts
packages/dispatcher/src/hook-server.ts
packages/dispatcher/src/hook-store.ts
packages/dispatcher/src/main.ts
packages/dispatcher/src/rate-limits.ts
packages/dispatcher/src/watchdog-cron.ts
packages/dispatcher/src/watchdog.ts
packages/dispatcher/src/workflow-record.ts
packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/hook-server.test.ts
packages/dispatcher/test/hook-store.test.ts
packages/dispatcher/test/implementation-workflow.test.ts
packages/dispatcher/test/rate-limits.test.ts
packages/dispatcher/test/watchdog.test.ts
planning/issues/14/decisions.md
planning/issues/14/plan.md

… + persistence sink

Expand the Phase 1 minimal receiver to the whole normalized event taxonomy: validate :event against NORMALIZED_EVENTS, authenticate every request against the per-session token (X-Middle-Token, timing-safe), and hand bodies to an injected HookStore. session.started records session_id+transcript_path and still signals the SessionGate. DbHookStore correlates by session name and drops unmatchable sessions. Wired into dispatchEpic and main. Closes #15.

Replace the SessionStart+Stop-only settings with the whole taxonomy (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Notification, Stop, SubagentStop, SessionEnd), each mapped to its normalized event and invoking the absolute, quoted hook.sh path. The script body is now single-sourced from a canonical HOOK_SH constant in @middle/core. Closes #16.

packages/cli/src/bootstrap-assets/hooks/hook.sh is what mm init stamps into a target repo. It is generated byte-identically from the canonical HOOK_SH constant and guarded by a drift test; POSIX sh, executable, 3s curl ceiling, failure is a no-op (exit 0). Closes #17.

…nto SQLite

Cover event-row persistence, last_heartbeat advancing only on tool.pre/tool.post, session.started writing session_id+transcript_path, unmatchable-session drop, 16KB payload truncation, terminal-session exclusion, and a live HookServer → DbHookStore → SQLite path. The persistence implementation shipped alongside the sink seam in e5f504e; this is its dedicated coverage. Closes #18.

A 30s bunqueue cron reconciles every launching/running workflow: launch-timeout (stuck-launching), tmux liveness (dead session → failed + compensation hook), activity freshness (idle marker / idle-timeout kill, skipped while controlled_by='human', cross-checked against transcript staleness), and blocked.json sentinel re-arm. A companion pass corrects heartbeat drift from the source-of-truth transcript. The reconcile logic is pure and unit-tested; the cron wrapper is thin scheduling glue wired into the long-running dispatcher. Closes #19.

Add ClaudeAdapter.detectRateLimit (Stop-hook usage-limit detector) and a rate_limit_state read/write module. On a rate-limited Stop classification the implementation workflow persists { status: RATE_LIMITED, reset_at, source: 'transcript' }; a completed dispatch reverts a previously RATE_LIMITED adapter to AVAILABLE (probe-via-real-work). The cross-run delayed re-enqueue that reads reset_at is the Phase 8 auto-dispatch loop's job. Closes #20.

Complete the launch-timeout criterion's second reason: a running workflow that went ready (session.started recorded) but whose driven prompt never landed (no turn.started within the launch window) is failed 'prompt-not-accepted' and its session killed. turn.started (UserPromptSubmit) is the prompt-landed signal; its absence past the window is the failure. Measured from the session.started event ts, not updated_at (which heartbeats bump). Refines #19.
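The prompt-not-accepted rule reduces to a small predicate over a workflow's events. A sketch under assumptions: the `promptNotAccepted` name, the in-memory event array, and the `launchWindowMs` parameter are hypothetical; only the event names and the choice to measure from the session.started timestamp come from the commit message above.

```typescript
type HookEvent = { type: string; ts: number };

// Hypothetical predicate: true when the session went ready but the prompt never landed in time.
function promptNotAccepted(events: HookEvent[], now: number, launchWindowMs: number): boolean {
  const started = events.find((e) => e.type === "session.started");
  if (!started) return false; // never went ready; a different launch-timeout reason applies
  const landed = events.some((e) => e.type === "turn.started" && e.ts >= started.ts);
  // Measure from the session.started event ts, not from a column that heartbeats keep bumping.
  return !landed && now - started.ts > launchWindowMs;
}
```

Anchoring the window to the event timestamp keeps later heartbeats from silently extending the deadline.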

coderabbitai

♻️ Duplicate comments (1)

packages/dispatcher/src/watchdog.ts (1)

141-150: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard tmux calls to prevent reconciliation abort on single session failure.

A thrown error from deps.tmux.status() will abort runWatchdog mid-loop, leaving remaining workflows unreconciled. The same applies to the killSession calls at lines 164 and 179. Wrap each per-workflow tmux interaction in try/catch to ensure one dead/misbehaving session doesn't block reconciliation of the others.

Suggested fix

     // 2. tmux liveness — a dead session under a 'running' workflow.
     if (row.session_name) {
-      const status = await deps.tmux.status(row.session_name);
-      if (!status.alive) {
-        await deps.tmux.killSession(row.session_name);
-        failWorkflow(deps, row.id, "tmux session disappeared", now);
-        acted++;
+      let status: { alive: boolean; paneCount: number };
+      try {
+        status = await deps.tmux.status(row.session_name);
+      } catch (error) {
+        console.error(`[watchdog] tmux.status failed for ${row.session_name}: ${(error as Error).message}`);
+        failWorkflow(deps, row.id, "tmux status check failed", now);
+        acted++;
         continue;
       }
+      if (!status.alive) {
+        try {
+          await deps.tmux.killSession(row.session_name);
+        } catch (error) {
+          console.error(`[watchdog] killSession failed for ${row.session_name}: ${(error as Error).message}`);
+        }
+        failWorkflow(deps, row.id, "tmux session disappeared", now);
+        acted++;
+        continue;
+      }
     }

Apply similar try/catch guards to the killSession calls at lines 164 and 179.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/watchdog.ts` around lines 141 - 150, Wrap each
per-workflow tmux interaction (calls to deps.tmux.status and
deps.tmux.killSession) inside a try/catch within runWatchdog so that exceptions
for one row do not abort the loop; for each row where row.session_name is
truthy, call deps.tmux.status(row.session_name) inside try, handle non-alive by
calling deps.tmux.killSession(row.session_name) also in try/catch, call
failWorkflow(deps, row.id, "tmux session disappeared", now) and increment acted
only when the workflow is handled, and on caught errors log or record the error
and continue to the next row (do the same guard around the other killSession
usages that reference deps.tmux.killSession so a thrown error won't stop
reconciliation).

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 326fb3d and d155f29.

⛔ Files ignored due to path filters (1)

bun.lock is excluded by !**/*.lock

✅ Files skipped from review due to trivial changes (2)

planning/issues/14/plan.md
planning/issues/14/decisions.md

…on pass

A tmux status()/killSession() error previously bubbled out of runWatchdog, aborting the whole pass and leaving the remaining launching/running rows unreconciled. Guard each tmux interaction per-workflow:

- status() errors are inconclusive (not a confirmed-dead signal), so the row is skipped this pass rather than failed. The 30s cron retries, and the activity-freshness rule is the backstop for a genuinely stuck agent.
- killSession() runs only after a failure decision is already made, so a kill error is swallowed and logged via safeKillSession; the failure is still recorded.
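A minimal sketch of the per-workflow guard this commit message describes, not the code in watchdog.ts. The `Tmux`, `Row`, and `livenessPass` shapes are hypothetical, and the calls are synchronous here for brevity (the real tmux calls may be async). Only the name `safeKillSession` and the "tmux session disappeared" reason come from the PR text.

```typescript
// Hypothetical shapes for illustration; the real types in watchdog.ts may differ.
interface TmuxStatus { alive: boolean }
interface Tmux {
  status(session: string): TmuxStatus;  // may throw
  killSession(session: string): void;   // may throw
}
interface Row { id: number; session_name: string | null }
type Log = (msg: string) => void;

// The failure decision is already made when this runs, so a kill error
// is logged and swallowed instead of propagating.
function safeKillSession(tmux: Tmux, session: string, log: Log): void {
  try {
    tmux.killSession(session);
  } catch (err) {
    log(`killSession(${session}) failed: ${String(err)}`);
  }
}

// One liveness pass over all rows. Returns the ids of the rows that were failed.
function livenessPass(
  rows: Row[],
  tmux: Tmux,
  fail: (id: number, reason: string) => void,
  log: Log,
): number[] {
  const failed: number[] = [];
  for (const row of rows) {
    if (!row.session_name) continue;
    let status: TmuxStatus;
    try {
      status = tmux.status(row.session_name);
    } catch (err) {
      // Inconclusive, not a confirmed-dead signal: skip this row, keep the pass going.
      log(`status(${row.session_name}) failed: ${String(err)}`);
      continue;
    }
    if (!status.alive) {
      safeKillSession(tmux, row.session_name, log);
      fail(row.id, "tmux session disappeared");
      failed.push(row.id);
    }
  }
  return failed;
}
```

In this sketch an error on one row costs that row a single pass at most; the remaining rows are still reconciled.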

thejustinwalsh · 2026-05-23T18:53:54Z

@coderabbitai review

coderabbitai · 2026-05-23T18:54:00Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dispatcher/src/watchdog.ts`:
- Around line 154-167: The tmux liveness catch block currently does a "continue"
which aborts all remaining durable checks for that row; instead remove the
continue so the code proceeds to prompt-not-accepted, idle/idle-timeout,
transcript checks and sentinel re-arm, but still log the tmux error; to keep
downstream logic safe, in the catch set a fallback status (e.g., status = {
alive: false, paneCount: 0 }) after logging the error so later checks see a
defined status and reconciliation continues for the workflow referenced by
row.session_name and deps.tmux.status().

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 56b8215b-a2a2-4d42-8973-0f09d61bc2a6

📥 Commits

Reviewing files that changed from the base of the PR and between d155f29 and 47538ca.

📒 Files selected for processing (2)

packages/dispatcher/src/watchdog.ts
packages/dispatcher/test/watchdog.test.ts

…hecks

The previous guard used `continue` on a status() error, which skipped not just the liveness check but also prompt-not-accepted, idle/idle-timeout (rule 3) and sentinel re-arm for that row. That defeated the stated backstop: a row whose status() kept erroring would never idle-time-out and could stay 'running' forever.

Leave status null on error and guard the liveness branch (`if (status && !status.alive)`) so the row falls through to the wall-clock freshness check. Add a regression test asserting a stale row still idle-times-out when status() persistently throws.
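The fall-through described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual watchdog.ts code: `checkRow`, `Verdict`, the `last_activity_ms` field, and the `IDLE_TIMEOUT_MS` value are all hypothetical, and the status call is synchronous for brevity. Only the `if (status && !status.alive)` guard is taken from the PR text.

```typescript
// Hypothetical shapes and threshold for illustration only.
interface TmuxStatus { alive: boolean }
interface Row { id: number; session_name: string; last_activity_ms: number }
type Verdict = "ok" | "tmux-dead" | "idle-timeout";

const IDLE_TIMEOUT_MS = 10 * 60 * 1000; // assumed value, not from the PR

function checkRow(
  row: Row,
  getStatus: (session: string) => TmuxStatus, // may throw
  now: number,
  log: (msg: string) => void,
): Verdict {
  let status: TmuxStatus | null = null;
  try {
    status = getStatus(row.session_name);
  } catch (err) {
    // Leave status null: the liveness result is unknown, but do NOT skip the row.
    log(`status(${row.session_name}) failed: ${String(err)}`);
  }

  // The liveness rule only fires on a confirmed-dead signal.
  if (status && !status.alive) return "tmux-dead";

  // The wall-clock freshness check still runs when status() threw.
  if (now - row.last_activity_ms > IDLE_TIMEOUT_MS) return "idle-timeout";

  return "ok";
}
```

With a null status the liveness branch is simply not taken, so a row whose status() throws on every pass is still caught by the idle timeout once its last activity is stale enough.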

thejustinwalsh · 2026-05-23T19:01:23Z

@coderabbitai review

coderabbitai · 2026-05-23T19:01:27Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

thejustinwalsh commented May 23, 2026

thejustinwalsh marked this pull request as ready for review May 23, 2026 17:51

thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label May 23, 2026

thejustinwalsh mentioned this pull request May 23, 2026

Hooks and watchdog #14

Closed

4 tasks

coderabbitai Bot requested changes May 23, 2026

Comment thread packages/dispatcher/src/watchdog.ts Outdated

thejustinwalsh added 8 commits May 23, 2026 14:44

docs(issue-14): Phase 2 hooks + watchdog plan

16164b4

thejustinwalsh force-pushed the middle-issue-14 branch from 326fb3d to d155f29 Compare May 23, 2026 18:45

coderabbitai Bot reviewed May 23, 2026

coderabbitai Bot requested changes May 23, 2026

Comment thread packages/dispatcher/src/watchdog.ts Outdated

coderabbitai Bot approved these changes May 23, 2026

thejustinwalsh merged commit 2d4a7d2 into main May 23, 2026
1 check passed

thejustinwalsh mentioned this pull request May 29, 2026

Make the CodexAdapter functionally dispatchable against live codex 0.133.0 #177

Closed

9 tasks
