fix(dispatcher): park arms the actual-reason signal even when blocked:<id> is pre-armed #235
…: is pre-armed

parkForResume was guarded by isWaitForArmed: skip arming the actual-reason signal when ANY signal was already armed. The intent was to avoid duplicate rows, justified by the comment 'both names map to the same resume reason'. That comment is wrong:

blocked:<id> → answered-question (poller watches Epic comments)
epic-N-review-resolved → review-changes (poller watches PR reviews)

So when the watchdog's rule-4 sentinel pass armed blocked:<id> from a stale .middle/blocked.json *before* parkForResume ran for an outcome of kind='done' (review-changes), the review-resolved signal was never armed. The poller had no matching arm to fire when CR responded with CHANGES_REQUESTED, so the workflow stayed parked forever.

Real incident: PR #230 / Epic #208 sat in waiting-human for ~11h after CR posted.

Fix: always arm the actual-reason signal in parkForResume. armWaitForSignal is INSERT OR IGNORE keyed on signal_name, so re-arming the same name is a no-op; arming a *different* name leaves both wake paths live, which is the correct semantics when the workflow could legitimately wake on either a human answer or a CR re-review.

Sentinel cleanup (.middle/blocked.json) is also lifted out of the asked-question branch so it runs on every park. Otherwise a stale sentinel left from an earlier phase would still cause the watchdog to re-arm blocked:<id> on the next tick after a done-park, racing the legitimate review-resolved arm and re-introducing the bug class.

Two test updates:
- New regression test confirms a done-park arms epic-N-review-resolved even when blocked:<id> is pre-armed.
- Existing 'no duplicate' test renamed + retargeted to assert BOTH signals are armed (the new correct contract). The old assertion codified the bug.
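To make the "that comment is wrong" point concrete, here is a minimal sketch of the two wake paths as a lookup. The names WakePath, EventSource, and classifySignal are hypothetical and are not taken from the dispatcher source; the sketch also assumes that N in epic-N-review-resolved is a numeric Epic number.

```typescript
// Hypothetical illustration only: none of these names come from the dispatcher.
type ResumeReason = "answered-question" | "review-changes";
type EventSource = "epic-comments" | "pr-reviews";

interface WakePath {
  reason: ResumeReason; // resume reason the signal maps to
  watches: EventSource; // where the poller looks for the waking event
}

// Map a signal name to the wake path the commit message attributes to it.
// Assumption: N in epic-N-review-resolved is a run of digits.
function classifySignal(signalName: string): WakePath | undefined {
  if (signalName.startsWith("blocked:")) {
    return { reason: "answered-question", watches: "epic-comments" };
  }
  if (/^epic-\d+-review-resolved$/.test(signalName)) {
    return { reason: "review-changes", watches: "pr-reviews" };
  }
  return undefined; // unknown signal names have no wake path in this sketch
}

const blockedPath = classifySignal("blocked:example-id");
const reviewPath = classifySignal("epic-208-review-resolved");

// The two names resolve to different reasons and different event sources,
// so one armed signal cannot stand in for the other.
console.log(blockedPath?.reason, reviewPath?.reason);
```

Because the two names resolve to different reasons and event sources, treating them as interchangeable is what left the review path without an arm.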
Fix: parkForResume orphans the workflow when blocked:<id> is pre-armed
Root cause
parkForResume was guarded by isWaitForArmed: skip arming the actual-reason signal when ANY signal was already armed. The intent was to avoid duplicate rows, justified by the comment "both names map to the same resume reason." That comment is wrong:

blocked:<id> → answered-question
epic-N-review-resolved → review-changes

When the watchdog's rule-4 sentinel pass armed blocked:<id> from a stale .middle/blocked.json before parkForResume ran for kind = "done" (review-changes), the epic-N-review-resolved signal was never armed. CR's CHANGES_REQUESTED had no matching arm to fire; the workflow stayed parked.

Real incident: PR #230 / Epic #208 sat in waiting-human ~11h after CR responded. The user's frustration about "another issue sitting in awaiting changes, no movement again" surfaced the class.

Fix

parkForResume always arms the actual-reason signal. armWaitForSignal is INSERT OR IGNORE keyed on signal_name, so re-arming the same name is a no-op; arming a different name leaves both wake paths live. The poller already iterates every armed wait independently (loadPollableWaits doesn't dedupe), so each signal fires its own classify path when its specific event appears.

Sentinel cleanup is lifted out of the asked-question branch. Every park now removes .middle/blocked.json from the worktree. Otherwise a stale sentinel left from an earlier phase would keep causing the watchdog's rule-4 pass to re-arm blocked:<id> on the next tick after a done-park, racing the legitimate epic-N-review-resolved arm and re-introducing the class.

Tests: new regression test confirms the done-park dual-signal case; the existing "no duplicate" test (which codified the bug) is renamed + retargeted to assert BOTH signals end up armed in the asked-question case.
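The difference between the old guard and the fix can be sketched with an in-memory stand-in for the armed-waits table. This is only an illustration under stated assumptions: WaitTable, parkGuarded, parkAlways, and signalFor are hypothetical names, the real rows live in SQLite, and the id used for blocked:<id> is a placeholder.

```typescript
type OutcomeKind = "asked-question" | "done";

// Hypothetical in-memory model of the armed-waits table.
class WaitTable {
  private rows = new Set<string>();

  // Mimics INSERT OR IGNORE keyed on signal_name: returns true only when a
  // new row is added; arming an existing name changes nothing.
  arm(signalName: string): boolean {
    if (this.rows.has(signalName)) return false;
    this.rows.add(signalName);
    return true;
  }

  // Stand-in for the old isWaitForArmed guard: true if ANY signal is armed.
  isAnyArmed(): boolean {
    return this.rows.size > 0;
  }

  armed(): string[] {
    return Array.from(this.rows).sort();
  }
}

// Pick the actual-reason signal for a park outcome (placeholder id and epic).
function signalFor(kind: OutcomeKind, id: string, epic: number): string {
  return kind === "done" ? `epic-${epic}-review-resolved` : `blocked:${id}`;
}

// Old behaviour: skip arming when anything is already armed.
function parkGuarded(table: WaitTable, kind: OutcomeKind, id: string, epic: number): void {
  if (table.isAnyArmed()) return;
  table.arm(signalFor(kind, id, epic));
}

// Fixed behaviour: always arm the actual-reason signal.
function parkAlways(table: WaitTable, kind: OutcomeKind, id: string, epic: number): void {
  table.arm(signalFor(kind, id, epic));
}

// Scenario: the watchdog pre-arms blocked:<id>, then a done-park runs.
const guardedTable = new WaitTable();
guardedTable.arm("blocked:wf-1");
parkGuarded(guardedTable, "done", "wf-1", 208);

const fixedTable = new WaitTable();
fixedTable.arm("blocked:wf-1");
parkAlways(fixedTable, "done", "wf-1", 208);

console.log(guardedTable.armed()); // review-resolved is missing here
console.log(fixedTable.armed()); // both wake paths are armed here
```

In this model the guarded park leaves only blocked:wf-1 armed, which matches the orphaned state described above, while the always-arm park ends with both names present and a repeated arm of either name is ignored.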
Verification
- bun run typecheck: clean
- bun run lint (oxlint --deny-warnings): clean
- bun test packages/dispatcher/test/implementation-workflow.test.ts: 40 / 40 pass (regression test + updated existing test both green)
- bun test (full repo): 1469 / 1469 pass

Operator-side unblock for #230

The fix prevents the orphan going forward, but PR #230 / Epic #208 is already parked (its blocked:<id> was armed without the matching review-resolved). One SQL statement against ~/.middle/db.sqlite3 rearms it so the next poller tick picks it up.

This is a one-off recovery for the already-orphaned row; no future row needs this once the fix lands.
Scope
packages/dispatcher/src/workflows/implementation.ts (parkForResume) + packages/dispatcher/test/implementation-workflow.test.ts. No schema change, no public API change.