feat(verify): verify-on-stop — run verify.toml gates before a 'done' parks for review by thejustinwalsh · Pull Request #136 · thejustinwalsh/middle

thejustinwalsh · 2026-05-25T04:36:15Z

What you asked for

Force the agent to run the verifications before it can finish, and feed failures back so it auto-fixes — so it can't ship an unverified PR.

Why this is the first live gate run

The verify.toml gates run nowhere in a live dispatch today: the per-phase checkbox-revert reconciler exists and is tested, but its production trigger is unbuilt (#101). So this is the first place gates actually execute against a running agent.

How (workflow-side, by design)

We discussed a literal Claude Stop hook, but that races the workflow's existing agent.stopped turn-boundary (it would classify + kill the session mid-fix). So this lives in the workflow, reusing the nudge machinery:

When the agent's Stop classifies as done (about to park for review), the workflow runs runGates(config.gates, {cwd: worktree}):

pass → done stands → parks for review as before.
fail → nudge the agent in-session with the gate report ("fix these, then finish"), re-await Stop, re-verify — up to verifyRoundCap (default 3).
exhausted → park for a human (waiting-human, worktree kept) — an unverified PR never goes up.
A re-classification that's no longer done (agent asked a question, etc.) is handed back to the normal branch.

Runs after the plan-comment guard; never races the Stop boundary or touches the hook protocol. Opt-in per repo — a missing/malformed verify.toml skips (so only repos with gates, like middle's own, enforce).

Verification

format/lint/typecheck clean; full suite 493 pass, incl. 3 new tests: fail-then-pass→parks-for-review, cap-exhausted→waiting-human (worktree kept), and no-seam→unchanged.

Note

Complements, doesn't replace, #101 (the per-push per-phase trigger). This is the "before the PR goes up" backstop.

Summary by CodeRabbit

New Features
- Verify-on-stop: repository verification gates run automatically when tasks are marked done, returning concise, formatted failure reports.
- In-session nudges show gate failure details so users can fix and retry; verification attempts are capped to avoid infinite retries.
- Verification runs integrate with stop flow and extended timeouts to allow retry/repair during the stop process.
Tests
- Added end-to-end tests for verify-on-stop retries, failure exhaustion, bare-stop behavior, and session cleanup.

coderabbitai · 2026-05-25T04:36:25Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 376b264d-43c1-4d07-bed0-2dfcde4e83ae

📥 Commits

Reviewing files that changed from the base of the PR and between 12c3082 and 2ce9509.

📒 Files selected for processing (3)

packages/dispatcher/src/build-deps.ts
packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/implementation-workflow.test.ts

📝 Walkthrough

Walkthrough

Adds an optional runVerifyGates hook and enforceVerifyOnDone loop: when a workflow outcome is done, the workflow runs repository verify.toml gates, nudges the agent with gate failure reports and re-awaits Stop until gates pass or a retry cap is reached; tests cover pass/fail/bare-stop/absent seam behaviors.

Changes

Verify-on-stop gate enforcement

Layer / File(s)	Summary
Gate verification capability `packages/dispatcher/src/build-deps.ts`	`formatGateFailures` helper truncates and formats gate failure output; new `deps.runVerifyGates(worktree)` loads and executes `verify.toml` gates, logs and skips on load error, and returns formatted failure report on gate failure.
Workflow dependency contract and constants `packages/dispatcher/src/workflows/implementation.ts`	`ImplementationDeps` gains optional `runVerifyGates` and `verifyRoundCap` fields; `DEFAULT_VERIFY_ROUND_CAP` and local `verifyRoundCap` binding added.
Verify-on-done enforcement and integration `packages/dispatcher/src/workflows/implementation.ts`	New `enforceVerifyOnDone` loops: runs gates, returns `done` on pass, otherwise nudges agent and re-awaits Stop; exhausting `verifyRoundCap` returns `nudge-exhausted`. `driveOnce` invokes this when `runVerifyGates` is configured; launch-and-drive timeout expanded to include retry budget.
Verify-on-stop test suite `packages/dispatcher/test/implementation-workflow.test.ts`	E2E tests: verify failing then passing in-session, verify never passing with round cap (parks for human), bare-stop re-stop cannot bypass verification, and seam-absent behavior (done parks unchanged).

Sequence Diagram(s)

sequenceDiagram
  participant driveOnce
  participant enforceVerifyOnDone
  participant runVerifyGates
  participant tmuxSession as tmux session

  driveOnce->>enforceVerifyOnDone: outcome done + runVerifyGates available
  loop verify rounds until pass or cap exhausted
    enforceVerifyOnDone->>runVerifyGates: run gates for worktree
    runVerifyGates-->>enforceVerifyOnDone: { ok, report }
    opt gates failed
      enforceVerifyOnDone->>tmuxSession: nudge agent with failure report
      enforceVerifyOnDone->>enforceVerifyOnDone: re-await Stop, re-classify
    end
  end

  alt gates eventually pass
    enforceVerifyOnDone->>driveOnce: return done (park for review)
  else verifyRoundCap exhausted
    enforceVerifyOnDone->>driveOnce: return nudge-exhausted (park for human)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

thejustinwalsh/middle#99: Introduces loadVerifyConfig and runGates utilities that are consumed by the new runVerifyGates implementation to load and execute verify.toml gates.

Suggested labels

ready-for-review

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main feature: adding verify-on-stop functionality to run verify.toml gates before 'done' outcomes park for review. It is specific, concise, and clearly communicates the primary change.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dispatcher/src/workflows/implementation.ts`:
- Line 391: The step timeout calculation for the launch-and-drive flow currently
only accounts for maxNudges, so additional awaits caused by verifyRoundCap can
make the workflow hit the timeout during verify retries; update the timeout
computation and/or retry loop to include the verifyRoundCap waits (or
multiply/add verifyRoundCap * awaitStop) so verifyRoundCap is reflected in the
overall step timeout budget, and adjust the awaitStop/verify retry logic in the
functions referencing verifyRoundCap and maxNudges (look for verifyRoundCap,
maxNudges, awaitStop and the step timeout/launch-and-drive timeout computation)
to ensure the total estimated wait time includes both maxNudges and
verifyRoundCap contributions.
- Around line 463-468: The current early return of classification after
args.classifyAt(stopPayload) lets a re-classified "bare-stop" bypass done-signal
enforcement; change the branch so that if classification.kind === "bare-stop"
you call the resolver (e.g., deps.resolveBareStop or the existing
resolveBareStop helper) with the stopPayload/classification and return its
result, otherwise keep the existing return for non-"done" kinds; keep using
awaitStop, args.classifyAt and classification to locate the logic and ensure
"bare-stop" flows through resolveBareStop instead of returning raw
classification.

In `@packages/dispatcher/test/implementation-workflow.test.ts`:
- Around line 697-700: The test currently only checks the row reached
"waiting-human" and the worktree was kept; update it to assert the cap-exhausted
terminal semantics by verifying the wait signal was consumed/not re-armed: after
awaitRow(id, "waiting-human") and the expect(verifyCalls).toBe(2) check, add an
assertion that there are no further "waiting-human" signals for this id (e.g.,
query the row state or try a follow-up awaitRow/inspection and assert it is not
"waiting-human"/no new wait created) so the test proves the path is terminal
rather than a reusable park; keep existing checks on listWorktrees and
verifyCalls.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a67f81f5-ddaf-4d11-884d-da503b9ee723

📥 Commits

Reviewing files that changed from the base of the PR and between 4cf69ca and dd7b3d6.

📒 Files selected for processing (3)

packages/dispatcher/src/build-deps.ts
packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/implementation-workflow.test.ts

…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test

thejustinwalsh · 2026-05-25T04:47:16Z

@coderabbitai review

coderabbitai · 2026-05-25T04:47:22Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

♻️ Duplicate comments (1)

packages/dispatcher/src/workflows/implementation.ts (1)

857-862: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Step timeout still undercounts verify→bare-stop retry paths.

Line 857 widens the backstop for verifyRoundCap, but it still only budgets one awaitStop(...) per verify round. That misses the nested resolveBareStop(...) path at Lines 475-483, which can spend up to maxNudges additional stop waits before returning. With maxNudges=3 and verifyRoundCap=3, a done -> verify fail -> bare-stop -> ... -> done cycle can consume 12 nudge waits while this timeout only budgets 6, so bunqueue can still kill the live session mid-retry.

Proposed fix

       .step("launch-and-drive", launchAndDrive, {
         retry: 1,
         // Backstop above the internal waits, widened for both bounded in-session
         // nudge loops — bare-stop done-signal nudges (up to maxNudges) and
-        // verify-on-stop fix nudges (up to verifyRoundCap), each a further
-        // Stop-await — so it can't fire mid-nudge.
+        // verify-on-stop fix nudges. A verify retry can itself fall into the
+        // bare-stop loop, so budget `maxNudges` for the initial stop handling
+        // plus `verifyRoundCap * (1 + maxNudges)` further Stop-awaits.
         timeout:
-          launchTimeout + stopTimeout + (maxNudges + verifyRoundCap) * nudgeStopTimeout + 60_000,
+          launchTimeout +
+          stopTimeout +
+          (maxNudges + verifyRoundCap * (maxNudges + 1)) * nudgeStopTimeout +
+          60_000,
       })

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/workflows/implementation.ts` around lines 857 - 862,
The timeout backstop undercounts nested stop waits because each verify round can
itself trigger up to maxNudges additional awaitStop waits via resolveBareStop;
update the timeout calculation (the timeout property computed from
launchTimeout, stopTimeout, maxNudges, verifyRoundCap, nudgeStopTimeout) so it
budgets maxNudges for the initial nudges plus maxNudges for each verify round —
e.g. replace the (maxNudges + verifyRoundCap) * nudgeStopTimeout term with
maxNudges * (1 + verifyRoundCap) * nudgeStopTimeout so the timeout covers the
nested verify→bare-stop retry paths.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@packages/dispatcher/src/workflows/implementation.ts`:
- Around line 857-862: The timeout backstop undercounts nested stop waits
because each verify round can itself trigger up to maxNudges additional
awaitStop waits via resolveBareStop; update the timeout calculation (the timeout
property computed from launchTimeout, stopTimeout, maxNudges, verifyRoundCap,
nudgeStopTimeout) so it budgets maxNudges for the initial nudges plus maxNudges
for each verify round — e.g. replace the (maxNudges + verifyRoundCap) *
nudgeStopTimeout term with maxNudges * (1 + verifyRoundCap) * nudgeStopTimeout
so the timeout covers the nested verify→bare-stop retry paths.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 091db836-c873-40fa-a73e-5b16f3fe1b4f

📥 Commits

Reviewing files that changed from the base of the PR and between dd7b3d6 and ec6e0e3.

📒 Files selected for processing (2)

packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/implementation-workflow.test.ts

thejustinwalsh · 2026-05-25T04:54:48Z

@coderabbitai review

coderabbitai · 2026-05-25T04:54:55Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

… to fix (verify-on-stop) The agent's verify.toml gates run NOWHERE in a live dispatch today — the checkbox-revert reconciler exists but its production trigger is unbuilt (#101). This adds the first live gate run: when the agent claims 'done' (about to park for review), the workflow runs runGates(config.gates) in the worktree. On failure it nudges the agent IN-SESSION with the gate report ('fix these, then finish') and re-awaits the Stop, up to verifyRoundCap (3). It finalizes 'done' only when gates pass; exhausting the cap parks for a human (worktree kept) so an unverified PR never goes up. A re-classification that's no longer 'done' is handed back to the normal branch. Workflow-side (not a blocking Claude Stop hook) so it reuses the existing nudge machinery and never races the agent.stopped turn-boundary. Opt-in per repo: a missing/malformed verify.toml → skip. Wired in build-deps via loadVerifyConfig + runGates; tests cover fail-then-pass→park, cap-exhausted→waiting-human, and the no-seam no-op.

…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test

…step timeout The launch-and-drive backstop counted one awaitStop per verify round plus maxNudges, but each verify round's re-stop can re-enter resolveBareStop for up to maxNudges further stop waits. Worst case is maxNudges + verifyRoundCap*(1 + maxNudges) (15 with the defaults, not 6), so a done->verify-fail->bare-stop retry cycle could be killed mid-flight. Hoisted as maxNudgeStopWaits.

…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test

thejustinwalsh added the dogfood Work dispatched through middle once dogfooding starts (phases 3+) label May 25, 2026

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

Comment thread packages/dispatcher/src/workflows/implementation.ts

Comment thread packages/dispatcher/src/workflows/implementation.ts

Comment thread packages/dispatcher/test/implementation-workflow.test.ts

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

coderabbitai Bot approved these changes May 25, 2026

View reviewed changes

thejustinwalsh added 3 commits May 25, 2026 00:58

thejustinwalsh force-pushed the feat/verify-on-stop branch from 12c3082 to 2ce9509 Compare May 25, 2026 04:58

thejustinwalsh merged commit 855e222 into main May 25, 2026
1 check was pending

thejustinwalsh deleted the feat/verify-on-stop branch May 25, 2026 04:59

This was referenced May 26, 2026

feat(self-audit): integration-verified requirements + freshness (Epic #143) #163

Merged

fix(dispatcher): self-heal a blocked agent that doesn't exit cleanly #165

Merged

thejustinwalsh mentioned this pull request May 29, 2026

Make the CodexAdapter functionally dispatchable against live codex 0.133.0 #177

Closed

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(verify): verify-on-stop — run verify.toml gates before a 'done' parks for review#136

feat(verify): verify-on-stop — run verify.toml gates before a 'done' parks for review#136
thejustinwalsh merged 3 commits into
mainfrom
feat/verify-on-stop

thejustinwalsh commented May 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thejustinwalsh commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

thejustinwalsh commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thejustinwalsh commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What you asked for

Why this is the first live gate run

How (workflow-side, by design)

Verification

Note

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thejustinwalsh commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

thejustinwalsh commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thejustinwalsh commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading