feat(verify): verify-on-stop — run verify.toml gates before a 'done' parks for review#136
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds an optional runVerifyGates hook and enforceVerifyOnDone loop: when a workflow outcome is ChangesVerify-on-stop gate enforcement
Sequence Diagram(s)sequenceDiagram
participant driveOnce
participant enforceVerifyOnDone
participant runVerifyGates
participant tmuxSession as tmux session
driveOnce->>enforceVerifyOnDone: outcome done + runVerifyGates available
loop verify rounds until pass or cap exhausted
enforceVerifyOnDone->>runVerifyGates: run gates for worktree
runVerifyGates-->>enforceVerifyOnDone: { ok, report }
opt gates failed
enforceVerifyOnDone->>tmuxSession: nudge agent with failure report
enforceVerifyOnDone->>enforceVerifyOnDone: re-await Stop, re-classify
end
end
alt gates eventually pass
enforceVerifyOnDone->>driveOnce: return done (park for review)
else verifyRoundCap exhausted
enforceVerifyOnDone->>driveOnce: return nudge-exhausted (park for human)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/dispatcher/src/workflows/implementation.ts`:
- Line 391: The step timeout calculation for the launch-and-drive flow currently
only accounts for maxNudges, so additional awaits caused by verifyRoundCap can
make the workflow hit the timeout during verify retries; update the timeout
computation and/or retry loop to include the verifyRoundCap waits (or
multiply/add verifyRoundCap * awaitStop) so verifyRoundCap is reflected in the
overall step timeout budget, and adjust the awaitStop/verify retry logic in the
functions referencing verifyRoundCap and maxNudges (look for verifyRoundCap,
maxNudges, awaitStop and the step timeout/launch-and-drive timeout computation)
to ensure the total estimated wait time includes both maxNudges and
verifyRoundCap contributions.
- Around line 463-468: The current early return of classification after
args.classifyAt(stopPayload) lets a re-classified "bare-stop" bypass done-signal
enforcement; change the branch so that if classification.kind === "bare-stop"
you call the resolver (e.g., deps.resolveBareStop or the existing
resolveBareStop helper) with the stopPayload/classification and return its
result, otherwise keep the existing return for non-"done" kinds; keep using
awaitStop, args.classifyAt and classification to locate the logic and ensure
"bare-stop" flows through resolveBareStop instead of returning raw
classification.
In `@packages/dispatcher/test/implementation-workflow.test.ts`:
- Around line 697-700: The test currently only checks the row reached
"waiting-human" and the worktree was kept; update it to assert the cap-exhausted
terminal semantics by verifying the wait signal was consumed/not re-armed: after
awaitRow(id, "waiting-human") and the expect(verifyCalls).toBe(2) check, add an
assertion that there are no further "waiting-human" signals for this id (e.g.,
query the row state or try a follow-up awaitRow/inspection and assert it is not
"waiting-human"/no new wait created) so the test proves the path is terminal
rather than a reusable park; keep existing checks on listWorktrees and
verifyCalls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a67f81f5-ddaf-4d11-884d-da503b9ee723
📒 Files selected for processing (3)
packages/dispatcher/src/build-deps.tspackages/dispatcher/src/workflows/implementation.tspackages/dispatcher/test/implementation-workflow.test.ts
…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/dispatcher/src/workflows/implementation.ts (1)
857-862:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStep timeout still undercounts verify→bare-stop retry paths.
Line 857 widens the backstop for
verifyRoundCap, but it still only budgets oneawaitStop(...)per verify round. That misses the nestedresolveBareStop(...)path at Lines 475-483, which can spend up tomaxNudgesadditional stop waits before returning. WithmaxNudges=3andverifyRoundCap=3, adone -> verify fail -> bare-stop -> ... -> donecycle can consume 12 nudge waits while this timeout only budgets 6, so bunqueue can still kill the live session mid-retry.Proposed fix
.step("launch-and-drive", launchAndDrive, { retry: 1, // Backstop above the internal waits, widened for both bounded in-session // nudge loops — bare-stop done-signal nudges (up to maxNudges) and - // verify-on-stop fix nudges (up to verifyRoundCap), each a further - // Stop-await — so it can't fire mid-nudge. + // verify-on-stop fix nudges. A verify retry can itself fall into the + // bare-stop loop, so budget `maxNudges` for the initial stop handling + // plus `verifyRoundCap * (1 + maxNudges)` further Stop-awaits. timeout: - launchTimeout + stopTimeout + (maxNudges + verifyRoundCap) * nudgeStopTimeout + 60_000, + launchTimeout + + stopTimeout + + (maxNudges + verifyRoundCap * (maxNudges + 1)) * nudgeStopTimeout + + 60_000, })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/workflows/implementation.ts` around lines 857 - 862, The timeout backstop undercounts nested stop waits because each verify round can itself trigger up to maxNudges additional awaitStop waits via resolveBareStop; update the timeout calculation (the timeout property computed from launchTimeout, stopTimeout, maxNudges, verifyRoundCap, nudgeStopTimeout) so it budgets maxNudges for the initial nudges plus maxNudges for each verify round — e.g. replace the (maxNudges + verifyRoundCap) * nudgeStopTimeout term with maxNudges * (1 + verifyRoundCap) * nudgeStopTimeout so the timeout covers the nested verify→bare-stop retry paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@packages/dispatcher/src/workflows/implementation.ts`:
- Around line 857-862: The timeout backstop undercounts nested stop waits
because each verify round can itself trigger up to maxNudges additional
awaitStop waits via resolveBareStop; update the timeout calculation (the timeout
property computed from launchTimeout, stopTimeout, maxNudges, verifyRoundCap,
nudgeStopTimeout) so it budgets maxNudges for the initial nudges plus maxNudges
for each verify round — e.g. replace the (maxNudges + verifyRoundCap) *
nudgeStopTimeout term with maxNudges * (1 + verifyRoundCap) * nudgeStopTimeout
so the timeout covers the nested verify→bare-stop retry paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 091db836-c873-40fa-a73e-5b16f3fe1b4f
📒 Files selected for processing (2)
packages/dispatcher/src/workflows/implementation.tspackages/dispatcher/test/implementation-workflow.test.ts
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
… to fix (verify-on-stop) The agent's verify.toml gates run NOWHERE in a live dispatch today — the checkbox-revert reconciler exists but its production trigger is unbuilt (#101). This adds the first live gate run: when the agent claims 'done' (about to park for review), the workflow runs runGates(config.gates) in the worktree. On failure it nudges the agent IN-SESSION with the gate report ('fix these, then finish') and re-awaits the Stop, up to verifyRoundCap (3). It finalizes 'done' only when gates pass; exhausting the cap parks for a human (worktree kept) so an unverified PR never goes up. A re-classification that's no longer 'done' is handed back to the normal branch. Workflow-side (not a blocking Claude Stop hook) so it reuses the existing nudge machinery and never races the agent.stopped turn-boundary. Opt-in per repo: a missing/malformed verify.toml → skip. Wired in build-deps via loadVerifyConfig + runGates; tests cover fail-then-pass→park, cap-exhausted→waiting-human, and the no-seam no-op.
…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test
…step timeout The launch-and-drive backstop counted one awaitStop per verify round plus maxNudges, but each verify round's re-stop can re-enter resolveBareStop for up to maxNudges further stop waits. Worst case is maxNudges + verifyRoundCap*(1 + maxNudges) (15 with the defaults, not 6), so a done->verify-fail->bare-stop retry cycle could be killed mid-flight. Hoisted as maxNudgeStopWaits.
12c3082 to
2ce9509
Compare
…lassification Addresses CodeRabbit review on #136: - widen launch-and-drive step timeout to include verifyRoundCap nudges - route a re-classified bare-stop in the verify loop through resolveBareStop so it can't bypass the #80 done-signal as legacy completion; loop to re-verify when it clears, hand a parked outcome back otherwise - assert terminal (no armed resume signal) semantics on the cap-exhaustion test, and add a bare-stop-bypass regression test
What you asked for
Force the agent to run the verifications before it can finish, and feed failures back so it auto-fixes — so it can't ship an unverified PR.
Why this is the first live gate run
The
verify.tomlgates run nowhere in a live dispatch today: the per-phase checkbox-revert reconciler exists and is tested, but its production trigger is unbuilt (#101). So this is the first place gates actually execute against a running agent.How (workflow-side, by design)
We discussed a literal Claude
Stophook, but that races the workflow's existingagent.stoppedturn-boundary (it would classify + kill the session mid-fix). So this lives in the workflow, reusing the nudge machinery:When the agent's Stop classifies as
done(about to park for review), the workflow runsrunGates(config.gates, {cwd: worktree}):donestands → parks for review as before.verifyRoundCap(default 3).waiting-human, worktree kept) — an unverified PR never goes up.done(agent asked a question, etc.) is handed back to the normal branch.Runs after the plan-comment guard; never races the Stop boundary or touches the hook protocol. Opt-in per repo — a missing/malformed
verify.tomlskips (so only repos with gates, like middle's own, enforce).Verification
format/lint/typecheckclean; full suite 493 pass, incl. 3 new tests: fail-then-pass→parks-for-review, cap-exhausted→waiting-human (worktree kept), and no-seam→unchanged.Note
Complements, doesn't replace, #101 (the per-push per-phase trigger). This is the "before the PR goes up" backstop.
Summary by CodeRabbit
New Features
Tests