Skip to content

fix(dispatcher): park arms the actual-reason signal even when blocked: is pre-armed#235

Merged
thejustinwalsh merged 1 commit into
mainfrom
fix/park-arms-actual-reason
Jun 5, 2026
Merged

fix(dispatcher): park arms the actual-reason signal even when blocked: is pre-armed#235
thejustinwalsh merged 1 commit into
mainfrom
fix/park-arms-actual-reason

Conversation

@thejustinwalsh

@thejustinwalsh thejustinwalsh commented Jun 5, 2026

Copy link
Copy Markdown
Owner

Fix: parkForResume orphans the workflow when blocked:<id> is pre-armed

Root cause

parkForResume was guarded by isWaitForArmed: skip arming the actual-reason signal when ANY signal was already armed. The intent was to avoid duplicate rows, justified by the comment "both names map to the same resume reason." That comment is wrong:

signal name resume reason what wakes it
blocked:<id> answered-question a human comment on the Epic
epic-N-review-resolved review-changes a PR review (APPROVED / CHANGES_REQUESTED)

When the watchdog's rule-4 sentinel pass armed blocked:<id> from a stale .middle/blocked.json before parkForResume ran for kind = "done" (review-changes), the epic-N-review-resolved signal was never armed. CR's CHANGES_REQUESTED had no matching arm to fire; the workflow stayed parked.

Real incident: PR #230 / Epic #208 sat in waiting-human ~11h after CR responded. The user's frustration about "another issue sitting in awaiting changes, no movement again" surfaced the class.

Fix

  1. parkForResume always arms the actual-reason signal. armWaitForSignal is INSERT OR IGNORE keyed on signal_name, so re-arming the same name is a no-op; arming a different name leaves both wake paths live. The poller already iterates every armed wait independently (loadPollableWaits doesn't dedupe), so each signal fires its own classify path when its specific event appears.

  2. Sentinel cleanup is lifted out of the asked-question branch. Every park now removes .middle/blocked.json from the worktree. Otherwise a stale sentinel left from an earlier phase would keep causing the watchdog's rule-4 pass to re-arm blocked:<id> on the next tick after a done-park, racing the legitimate epic-N-review-resolved arm and re-introducing the class.

  3. Tests: new regression test confirms the done-park dual-signal case; the existing "no duplicate" test (which codified the bug) is renamed + retargeted to assert BOTH signals end up armed in the asked-question case.

Verification

  • bun run typecheck — clean
  • bun run lint (oxlint --deny-warnings) — clean
  • bun test packages/dispatcher/test/implementation-workflow.test.ts40 / 40 pass (regression test + updated existing test both green)
  • bun test (full repo) — 1469 / 1469 pass

Operator-side unblock for #230

The fix prevents the orphan going forward, but PR #230 / Epic #208 is already parked (its blocked:<id> was armed without the matching review-resolved). One SQL statement against ~/.middle/db.sqlite3 rearms it so the next poller tick picks it up:

INSERT OR IGNORE INTO waitfor_signals
  (signal_name, workflow_id, created_at, payload_json)
VALUES
  ('epic-208-review-resolved',
   'wf_1780554031185_6h4v8mc5',
   strftime('%s','now') * 1000,
   '{"reason":"review-changes"}');

This is a one-off recovery for the already-orphaned row; no future row needs this once the fix lands.

Scope

Summary by CodeRabbit

  • Bug Fixes

    • Improved workflow signal handling to ensure consistent state transition logic in all scenarios.
    • Fixed race condition where outdated state data could cause unintended workflow triggers.
  • Tests

    • Enhanced test coverage for complex state transition scenarios.
    • Added regression tests for workflow management edge cases.

…: is pre-armed

parkForResume was guarded by isWaitForArmed: skip arming the actual-
reason signal when ANY signal was already armed. The intent was to
avoid duplicate rows, justified by the comment 'both names map to the
same resume reason'. That comment is wrong:

  blocked:<id>              → answered-question  (poller watches Epic comments)
  epic-N-review-resolved    → review-changes     (poller watches PR reviews)

So when the watchdog's rule-4 sentinel pass armed blocked:<id> from a
stale .middle/blocked.json *before* parkForResume ran for an outcome
of kind='done' (review-changes), the review-resolved signal was never
armed. The poller had no matching arm to fire when CR responded with
CHANGES_REQUESTED — the workflow stayed parked forever. Real incident:
PR #230 / Epic #208 sat in waiting-human for ~11h after CR posted.

Fix: always arm the actual-reason signal in parkForResume. armWaitFor-
Signal is INSERT OR IGNORE keyed on signal_name, so re-arming the same
name is a no-op; arming a *different* name leaves both wake paths
live — which is the correct semantics when the workflow could
legitimately wake on either a human answer or a CR re-review.

Sentinel cleanup (.middle/blocked.json) is also lifted out of the
asked-question branch so it runs on every park. Otherwise a stale
sentinel left from an earlier phase would still cause the watchdog
to re-arm blocked:<id> on the next tick after a done-park, racing
the legitimate review-resolved arm and re-introducing the bug class.

Two test updates:
- New regression test confirms a done-park arms epic-N-review-resolved
  even when blocked:<id> is pre-armed.
- Existing 'no duplicate' test renamed + retargeted to assert BOTH
  signals are armed (the new correct contract). The old assertion
  codified the bug.
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2088b7df-5b0f-498f-88fe-51ffe3faf392

📥 Commits

Reviewing files that changed from the base of the PR and between 4f32410 and 6dc41bb.

📒 Files selected for processing (2)
  • packages/dispatcher/src/workflows/implementation.ts
  • packages/dispatcher/test/implementation-workflow.test.ts

📝 Walkthrough

Walkthrough

The PR fixes a race condition in the park/resume signaling logic where stale .middle/blocked.json sentinels can cause incorrect signal arming. parkForResume now unconditionally arms the durable signal for the actual park reason and unconditionally removes the sentinel file on every park, preventing stale sentinels from re-triggering watchdog logic when they map to a different resume signal than the agent's actual stop reason.

Changes

parkForResume dual-signal arming and sentinel cleanup

Layer / File(s) Summary
parkForResume signal and sentinel logic
packages/dispatcher/src/workflows/implementation.ts
Removes isWaitForArmed import and the idempotency guard that conditionally skipped signal arming. parkForResume now unconditionally arms the durable signal for the actual park reason and unconditionally removes .middle/blocked.json on every park outcome, preventing stale sentinels from re-triggering the wrong resume path.
parkForResume dual-signal test coverage
packages/dispatcher/test/implementation-workflow.test.ts
Replaces prior idempotency assertion with new test coverage for the dual-signal scenario: one test asserts that both pre-armed blocked:<id> and the actual-reason signal are armed together, and a regression test simulates the watchdog pre-arming blocked:<id> before the agent finishes with kind = "done", verifying both signals coexist for the workflow.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • thejustinwalsh/middle#206: Both PRs modify parkForResume logic to unconditionally clean up the .middle/blocked.json sentinel to prevent stale sentinel re-triggers.
  • thejustinwalsh/middle#165: PR #165 adds watchdog-driven blocked:<id> self-heal while this PR modifies parkForResume to unconditionally arm the correct durable resume signal and adjust sentinel cleanup to handle races between watchdog and agent signal arming.
  • thejustinwalsh/middle#90: This PR's changes to how .middle/blocked.json is cleared and waitfor_signals is armed in parkForResume directly interact with the retrieved PR's classifyStop logic that reads the sentinel file to drive the asked-question resume behavior.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main bug fix: parkForResume now arms the actual-reason signal even when a blocked signal is pre-armed, which directly addresses the root cause of workflows becoming orphaned.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@thejustinwalsh thejustinwalsh marked this pull request as ready for review June 5, 2026 02:35
@thejustinwalsh thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label Jun 5, 2026
@thejustinwalsh thejustinwalsh merged commit 36a3037 into main Jun 5, 2026
2 checks passed
@thejustinwalsh thejustinwalsh deleted the fix/park-arms-actual-reason branch June 5, 2026 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review All phases done and verified — PR ready for final human review and merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant