Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .agents/scripts/supervisor-helper.sh
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,7 @@ readonly VALID_STATES="queued dispatched running evaluating retrying complete pr
# shellcheck disable=SC2034 # Used by supervisor/state.sh
readonly -a VALID_TRANSITIONS=(
"queued:dispatched"
"queued:blocked"
"queued:cancelled"
"dispatched:running"
"dispatched:failed"
Expand Down
12 changes: 7 additions & 5 deletions .agents/scripts/supervisor/dispatch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -1528,14 +1528,16 @@ check_dispatch_dedup_guard() {
local cooldown_secs="${SUPERVISOR_FAILURE_COOLDOWN_SECS:-600}" # 10 minutes default
local max_consecutive="${SUPERVISOR_MAX_CONSECUTIVE_FAILURES:-2}"

# Rule 2: Block after max_consecutive identical failures
# Rule 2: Cancel after max_consecutive identical failures
# Note: queued->blocked is not a valid transition; use cancelled instead.
# The task can be manually re-queued after investigation.
if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
log_warn " $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "blocked" --error "$block_reason" 2>/dev/null || true
log_warn " $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
send_task_notification "$task_id" "blocked" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
Comment on lines +1531 to +1540
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use cancellation-specific TODO updates to avoid state drift.

The task now transitions to cancelled, but the TODO sync still uses update_todo_on_blocked, which will annotate BLOCKED in TODO.md and diverge from the DB state. Please switch to update_todo_on_cancelled. Also, the note about queued->blocked being invalid is now stale with the new VALID_TRANSITIONS entry—either update it or clarify why cancellation is still preferred here.

🛠️ Suggested fix
-		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
+		update_todo_on_cancelled "$task_id" "$block_reason" 2>/dev/null || true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Rule 2: Cancel after max_consecutive identical failures
# Note: queued->blocked is not a valid transition; use cancelled instead.
# The task can be manually re-queued after investigation.
if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
log_warn " $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "blocked" --error "$block_reason" 2>/dev/null || true
log_warn " $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
send_task_notification "$task_id" "blocked" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
# Rule 2: Cancel after max_consecutive identical failures
# Note: queued->blocked is not a valid transition; use cancelled instead.
# The task can be manually re-queued after investigation.
if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
log_warn " $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
update_todo_on_cancelled "$task_id" "$block_reason" 2>/dev/null || true
send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/dispatch.sh around lines 1531 - 1540, The TODO
sync call uses update_todo_on_blocked after transitioning the task to
"cancelled", causing TODO.md to show BLOCKED instead of CANCELLED; replace the
call to update_todo_on_blocked with update_todo_on_cancelled (referencing
cmd_transition, update_todo_on_cancelled, send_task_notification,
store_failure_pattern, and the local block_reason variable) so the TODO state
matches the DB, and update or remove the stale comment about "queued->blocked is
not a valid transition" to reflect the current VALID_TRANSITIONS or explain why
cancellation is still chosen here.

return 1
Comment on lines +1531 to 1541

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a logic inconsistency here. The PR adds queued:blocked to VALID_TRANSITIONS in supervisor-helper.sh (line 248), but this code switches to cancelled because it claims queued:blocked is invalid. Since the transition is now valid, blocked should be used as it correctly reflects the 'manual intervention required' status mentioned in the logs. Furthermore, the use of 2>/dev/null || true for blanket error suppression violates the repository guidelines (Rule 1), which states that 2>/dev/null should be avoided for blanket suppression to ensure errors remain visible for debugging. Errors should be redirected to ${SUPERVISOR_LOG} to maintain visibility for debugging.

Suggested change
# Rule 2: Cancel after max_consecutive identical failures
# Note: queued->blocked is not a valid transition; use cancelled instead.
# The task can be manually re-queued after investigation.
if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
log_warn " $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "blocked" --error "$block_reason" 2>/dev/null || true
log_warn " $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
send_task_notification "$task_id" "blocked" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
return 1
# Rule 2: Block after max_consecutive identical failures
# Note: queued->blocked is now a valid transition in VALID_TRANSITIONS.
# The task can be manually re-queued after investigation.
if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
local block_reason
block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
log_warn " $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
cmd_transition "$task_id" "blocked" --error "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
update_todo_on_blocked "$task_id" "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
send_task_notification "$task_id" "blocked" "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
return 1
fi
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

fi

Expand Down
8 changes: 4 additions & 4 deletions .agents/scripts/supervisor/todo-sync.sh
Original file line number Diff line number Diff line change
Expand Up @@ -1693,7 +1693,7 @@ cmd_reconcile_db_todo() {
else
log_info "Phase 7b: Transitioning $tid from $tstatus to complete (TODO.md shows [x])"
cmd_transition "$tid" "complete" \
--reason "Reconciled: TODO.md marked [x] but DB was $tstatus (t1001)" \
--error "Reconciled: TODO.md marked [x] but DB was $tstatus (t1001)" \
2>>"${SUPERVISOR_LOG:-/dev/null}" || {
log_warn "Phase 7b: Failed to transition $tid to complete"
continue
Expand Down Expand Up @@ -1874,7 +1874,7 @@ cmd_reconcile_queue_dispatchability() {
else
log_info "Phase 0.6: $tid queued in DB but [x] in TODO.md — transitioning to complete"
cmd_transition "$tid" "complete" \
--reason "Reconciled: TODO.md marked [x] but DB was queued (t1180)" \
--error "Reconciled: TODO.md marked [x] but DB was queued (t1180)" \
2>>"${SUPERVISOR_LOG:-/dev/null}" || {
log_warn "Phase 0.6: Failed to transition $tid to complete"
continue
Expand All @@ -1891,7 +1891,7 @@ cmd_reconcile_queue_dispatchability() {
else
log_info "Phase 0.6: $tid queued in DB but [-] in TODO.md — cancelling"
cmd_transition "$tid" "cancelled" \
--reason "Reconciled: TODO.md marked [-] but DB was queued (t1180)" \
--error "Reconciled: TODO.md marked [-] but DB was queued (t1180)" \
2>>"${SUPERVISOR_LOG:-/dev/null}" || {
log_warn "Phase 0.6: Failed to cancel $tid"
continue
Expand Down Expand Up @@ -1950,7 +1950,7 @@ cmd_reconcile_queue_dispatchability() {
else
log_warn "Phase 0.6: $tid queued in DB but not dispatchable in TODO.md — cancelling phantom entry"
cmd_transition "$tid" "cancelled" \
--reason "Reconciled: queued in DB but TODO.md has no #auto-dispatch tag or Dispatch Queue entry (t1180)" \
--error "Reconciled: queued in DB but TODO.md has no #auto-dispatch tag or Dispatch Queue entry (t1180)" \
2>>"${SUPERVISOR_LOG:-/dev/null}" || {
log_warn "Phase 0.6: Failed to cancel phantom $tid"
continue
Expand Down
Loading