feat: add worker dispatch with worktree isolation to supervisor (t128.2) #377
Add 4 new commands to supervisor-helper.sh for autonomous worker dispatch:
- dispatch: Creates worktree per task (wt/git), starts AI worker in background, tracks PID for monitoring
- pulse: Stateless supervisor cycle - evaluates completed workers, dispatches queued tasks up to concurrency limit
- worker-status: Checks worker process liveness, log signals, PR URLs
- cleanup: Removes worktrees for terminal tasks, cleans stale PIDs
Key features:
- Concurrency semaphore (default 4, configurable via env/batch)
- Tabby tab detection for visual dispatch mode
- Log-based outcome evaluation (FULL_LOOP_COMPLETE, error patterns)
- Automatic retry/block/fail classification
- Mail escalation for blocked tasks
- opencode/claude CLI auto-detection
Zero ShellCheck violations.
Walkthrough
Introduces comprehensive task dispatch and lifecycle management to a supervisor helper script, enabling task worktree creation, parallel worker execution, state transitions, status monitoring, periodic evaluation cycles, and automated cleanup.
Sequence Diagram(s)
sequenceDiagram
actor Supervisor
participant DispatchCmd as cmd_dispatch
participant WorktreeOps as Worktree Mgmt
participant AICliCmd as AI CLI
participant Worker as Background Worker
participant Database as State DB
Supervisor->>DispatchCmd: dispatch task
DispatchCmd->>Database: validate input & check concurrency
DispatchCmd->>WorktreeOps: create/reuse worktree
WorktreeOps-->>DispatchCmd: worktree ready
DispatchCmd->>Database: transition state
DispatchCmd->>AICliCmd: build dispatch command
AICliCmd-->>DispatchCmd: command constructed
DispatchCmd->>Worker: spawn background worker (fork)
Worker->>Database: store PID
Worker-->>Supervisor: backgrounded
sequenceDiagram
actor Supervisor
participant Pulse as cmd_pulse
participant StatusCmd as cmd_worker_status
participant EvalCmd as evaluate_worker
participant Database as State DB
participant WorkerLog as Worker Logs
participant CleanupCmd as cmd_cleanup
Supervisor->>Pulse: periodic pulse cycle
Pulse->>StatusCmd: query active workers
StatusCmd->>Database: fetch PID & metadata
StatusCmd->>WorkerLog: read logs (signals, exit code)
StatusCmd-->>Pulse: worker status report
Pulse->>EvalCmd: evaluate worker outcomes
EvalCmd->>WorkerLog: analyze signals & output
EvalCmd-->>Pulse: outcome (complete/retry/blocked/failed)
Pulse->>CleanupCmd: dispatch new tasks (up to limits)
Pulse->>CleanupCmd: cleanup completed worktrees
CleanupCmd->>Database: purge stale PIDs
CleanupCmd-->>Pulse: cleanup summary
Pulse-->>Supervisor: pulse report
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Fri Feb 6 04:39:51 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @.agent/scripts/supervisor-helper.sh:
- Around line 1287-1292: The task is transitioned to "dispatched" via
cmd_transition "$task_id" "dispatched" before the worker is actually started,
which can leave tasks stuck if subsequent commands fail; move the cmd_transition
call so it happens only after the worker is successfully backgrounded and the
PID file (and any worktree setup like mkdir -p) is created, or alternatively add
a rollback that calls cmd_transition back to the previous state on any failure
after the transition; update references to cmd_transition, "$task_id",
"$worktree_path", "$branch_name", and "$log_file" and ensure the PID/daemon
startup block is the gate for the successful transition.
- Around line 1049-1068: The comment for detect_dispatch_mode mentions
"interactive" but the function never returns it; update detect_dispatch_mode to
handle an "interactive" path by first checking if SUPERVISOR_DISPATCH_MODE ==
"interactive" and returning "interactive", and then if not explicitly set,
detect an interactive tty (e.g., using the shell test -t 1 or [[ -t 1 ]]) and
return "interactive" when stdout is a TTY; keep the existing checks for
"headless" and "tabby" and preserve the default "headless" fallback.
- Around line 1614-1626: The loop silences all cmd_dispatch stderr and ignores
non-concurrency failures; update the dispatch logic in the while reading
next_tasks so that you do not redirect stderr to /dev/null, capture the exit
code from cmd_dispatch (dispatch_exit), and then: if dispatch_exit == 2 keep the
existing log_info "Concurrency limit reached, stopping dispatch" and break;
otherwise increment a failed_count (e.g., failed_count=$((failed_count+1))) and
emit a clear log (use log_error or log_info) including the tid and dispatch_exit
to surface the failure; keep incrementing dispatched_count only on success.
Ensure you reference cmd_dispatch, next_tasks, dispatched_count, failed_count
and dispatch_exit in the change.
- Around line 1555-1559: The task state change is being swallowed because tasks
in 'dispatched' are never allowed to move to 'evaluating' and the failing
cmd_transition call is suppressed; update VALID_TRANSITIONS to include the
'dispatched:evaluating' pair (or, alternatively, before invoking cmd_transition
"$tid" "evaluating" add an explicit cmd_transition "$tid" "running" to normalize
state) so the cmd_transition call for evaluating can succeed; ensure the change
touches the VALID_TRANSITIONS array/variable and leave the cmd_transition "$tid"
"evaluating" call as-is (remove the need for the "|| true" suppression if the
transition will succeed).
- Around line 1574-1583: The retry path currently re-transitions tasks to
"dispatched" causing them to be re-evaluated without a new worker; update the
valid transitions map to include a "retrying:queued" transition (add
"retrying:queued" to the transitions array/definition near where transitions are
declared) and change the retry block that calls cmd_transition "$tid"
"dispatched" to instead call cmd_transition "$tid" "queued" so Phase 2 will
re-dispatch the task and create a fresh worker/log; keep the existing retrying
transition to increment counters and preserve the error handling/log_error
behavior for failed re-queue attempts.
- Around line 1456-1463: The handler that detects FULL_LOOP_COMPLETE currently
emits the literal sentinel "no_pr" into outcome_detail via the pr_url variable
(see variable pr_url and the echo "complete:${pr_url:-no_pr}" in
supervisor-helper.sh), which causes the string "no_pr" to be stored as a PR URL;
change the emission to leave the detail empty when no PR is found (e.g., emit
"complete:" or an empty pr field instead of "no_pr") and update the cmd_pulse
invocation (the call that adds --pr-url) to only include the --pr-url flag when
the pr_url value is non-empty so nothing like "no_pr" is ever passed/stored.
Ensure you update both the pr_url assignment/echo in supervisor-helper.sh and
the conditional around the --pr-url argument in cmd_pulse.
- Around line 1310-1321: The Tabby path currently always starts a background
worker because the printf escape is forced to succeed with "|| true", causing
dual execution; remove the unconditional truthy fallback and make the fallback
conditional on the OSC 1337 escape result: call printf '\e]1337;NewTab=%s\a'
"$tab_cmd" without "|| true", capture its exit status (or use a boolean like
opened_tab) and only execute the background subshell "(cd "$worktree_path" &&
"${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &" when the
printf/Tabby open failed; keep variables dispatch_mode, tab_cmd, cmd_parts,
worktree_path and log_file as the referenced symbols to locate and update the
logic.
🧹 Nitpick comments (3)
.agent/scripts/supervisor-helper.sh (3)
1091-1115: worktree_path and log_file parameters are accepted but never used.
Parameters $2 and $3 are passed by cmd_dispatch (Line 1305) but are not referenced inside this function. This is misleading: future contributors may assume the command incorporates them. Either use them or drop them from the signature.
♻️ Proposed fix: remove unused params
 build_dispatch_cmd() {
     local task_id="$1"
-    local worktree_path="$2"
-    local log_file="$3"
-    local ai_cli="$4"
+    local ai_cli="$2"
And update the call site at Line 1305:
-    done < <(build_dispatch_cmd "$task_id" "$worktree_path" "$log_file" "$ai_cli")
+    done < <(build_dispatch_cmd "$task_id" "$ai_cli")
1480-1494: Error-pattern grep scans the entire log, risking false-positive classification.
Patterns like rate.limit|429 and permission denied|401 are matched anywhere in the log. A transient 429 that was recovered from mid-session would still trigger retry:rate_limited if the worker later exited non-zero without FULL_LOOP_COMPLETE. Consider narrowing the search to the tail of the log (e.g., last 50 lines) to reduce false positives.
♻️ Proposed narrowing
-    if grep -qi 'rate.limit\|429\|too many requests' "$tlog" 2>/dev/null; then
+    if tail -50 "$tlog" | grep -qi 'rate.limit\|429\|too many requests' 2>/dev/null; then
         echo "retry:rate_limited"
         return 0
     fi
-    if grep -qi 'permission denied\|unauthorized\|403\|401' "$tlog" 2>/dev/null; then
+    if tail -50 "$tlog" | grep -qi 'permission denied\|unauthorized\|403\|401' 2>/dev/null; then
         echo "blocked:auth_error"
         return 0
     fi
-    if grep -qi 'merge conflict\|CONFLICT' "$tlog" 2>/dev/null; then
+    if tail -50 "$tlog" | grep -qi 'merge conflict\|CONFLICT' 2>/dev/null; then
         echo "blocked:merge_conflict"
         return 0
     fi
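The false positive described above can be reproduced with a throwaway log. The following is a minimal sketch, not the script's actual evaluator: the function names, the RATE_LIMIT_PATTERN variable, and the log contents are hypothetical, and only the rate-limit pattern is taken from the review.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical pattern variable; mirrors the rate-limit pattern quoted in the review.
RATE_LIMIT_PATTERN='rate.limit|429|too many requests'

# Scan the whole log (the behaviour the review flags).
full_log_matches() {
    grep -qiE "$RATE_LIMIT_PATTERN" "$1"
}

# Scan only the last 50 lines (the proposed narrowing).
tail_matches() {
    tail -n 50 "$1" | grep -qiE "$RATE_LIMIT_PATTERN"
}

# Throwaway log: one early, recovered 429 followed by 60 normal lines
# and an unrelated fatal error at the end.
tlog="$(mktemp)"
trap 'rm -f "$tlog"' EXIT
{
    echo "HTTP 429 too many requests, backing off"
    for i in {1..60}; do echo "step $i ok"; done
    echo "fatal: unrelated failure"
} > "$tlog"

if full_log_matches "$tlog"; then
    echo "full scan: rate_limited"
fi
if tail_matches "$tlog"; then
    echo "tail scan: rate_limited"
else
    echo "tail scan: no rate-limit signal"
fi
```

The sketch uses `tail -n 50`, the POSIX spelling of `tail -50`. It deliberately leaves `pipefail` off: with `grep -q` exiting on the first match, `tail` can receive SIGPIPE and make the pipeline look failed under `pipefail`.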
1236-1257: TOCTOU window in concurrency enforcement.
The concurrency check (cmd_running_count) and the subsequent dispatch + state transition are not atomic. Two concurrent invocations (e.g., parallel cmd_pulse runs or manual + cron overlap) could both pass the check and exceed the limit. Given that the pulse is typically cron-based and single-threaded, this is low risk, but worth noting.
A lightweight mitigation would be a filesystem lock (e.g., flock) around the pulse's dispatch phase.
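One way the flock mitigation could look is sketched below. It assumes the util-linux flock(1) utility is installed; the lock path, the SUPERVISOR_LOCK_FILE variable, the dispatch_phase stand-in, and the exit code 3 are all hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock path; the real script may keep its state elsewhere.
LOCK_FILE="${SUPERVISOR_LOCK_FILE:-${TMPDIR:-/tmp}/supervisor-pulse.lock}"

# Stand-in for the pulse's dispatch phase (concurrency check + dispatch).
dispatch_phase() {
    echo "dispatching"
}

run_locked_dispatch() {
    # The subshell holds the lock file open on fd 9; the lock is released
    # automatically when the subshell exits.
    (
        if ! flock -n 9; then
            echo "another pulse holds the lock, skipping dispatch" >&2
            exit 3
        fi
        dispatch_phase
    ) 9> "$LOCK_FILE"
}

run_locked_dispatch
```

With `-n` the second invocation gives up immediately instead of queueing, which suits a cron-driven pulse: the next scheduled run simply tries again.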
#######################################
# Detect terminal environment for dispatch mode
# Returns: "tabby", "headless", or "interactive"
#######################################
detect_dispatch_mode() {
    if [[ "${SUPERVISOR_DISPATCH_MODE:-}" == "headless" ]]; then
        echo "headless"
        return 0
    fi
    if [[ "${SUPERVISOR_DISPATCH_MODE:-}" == "tabby" ]]; then
        echo "tabby"
        return 0
    fi
    if [[ "${TERM_PROGRAM:-}" == "Tabby" ]]; then
        echo "tabby"
        return 0
    fi
    echo "headless"
    return 0
}
Comment/code mismatch: "interactive" mode is documented but never returned.
Line 1051 states the function returns "tabby", "headless", or "interactive", but no code path ever emits "interactive". Either remove it from the comment or implement the missing return path.
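If the missing return path is implemented rather than removed from the comment, it might look like the sketch below. Treating "interactive" as "stdin is attached to a terminal" is an assumption, not something the PR defines. The sketch tests stdin (`-t 0`) because a caller that captures the result with `$(...)` connects stdout to a pipe, so a `-t 1` test inside the function would always be false there.

```shell
#!/usr/bin/env bash
set -euo pipefail

#######################################
# Detect terminal environment for dispatch mode
# Returns: "tabby", "headless", or "interactive"
#######################################
detect_dispatch_mode() {
    # An explicit override wins over any detection.
    case "${SUPERVISOR_DISPATCH_MODE:-}" in
        headless|tabby|interactive)
            echo "$SUPERVISOR_DISPATCH_MODE"
            return 0
            ;;
    esac
    if [[ "${TERM_PROGRAM:-}" == "Tabby" ]]; then
        echo "tabby"
        return 0
    fi
    # Assumption: "interactive" means stdin is a terminal.
    if [[ -t 0 ]]; then
        echo "interactive"
        return 0
    fi
    echo "headless"
    return 0
}

echo "mode: $(detect_dispatch_mode)"
```

Any caller that branches on the result would also need an "interactive" branch; the dispatch code shown in this review only distinguishes "tabby" from everything else.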
📝 Proposed fix
#######################################
# Detect terminal environment for dispatch mode
-# Returns: "tabby", "headless", or "interactive"
+# Returns: "tabby" or "headless"
#######################################
# Transition to dispatched
cmd_transition "$task_id" "dispatched" \
    --worktree "$worktree_path" \
    --branch "$branch_name" \
    --log-file "$log_file"
State transition to "dispatched" occurs before the worker actually starts.
If any command between lines 1292 and 1317/1320 fails (e.g., mkdir -p on line 1308, though unlikely), set -e aborts the function and the task remains in dispatched with no running worker and no PID file. Consider transitioning to dispatched after the worker is confirmed backgrounded, or adding a rollback on failure.
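The "transition after the worker is backgrounded" option could be sketched as follows. Everything here is illustrative: cmd_transition is replaced by a stub that only remembers its last call, and dispatch_task, LAST_TRANSITION, and the PID-file naming are hypothetical names, not the script's own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real cmd_transition; it only remembers the last call.
LAST_TRANSITION=""
cmd_transition() {
    LAST_TRANSITION="$1:$2"
}

# Hypothetical helper: start the worker, write the PID file, then transition.
dispatch_task() {
    local task_id="$1" log_file="$2"
    shift 2
    local pid_file="${log_file%.log}.pid"

    # Background the worker first and record its exit code in the log.
    (
        rc=0
        "$@" > "$log_file" 2>&1 || rc=$?
        echo "EXIT:$rc" >> "$log_file"
    ) &
    local pid=$!

    # The PID file is the gate: if it cannot be written, stop the worker
    # and leave the task in its previous state.
    if ! echo "$pid" > "$pid_file"; then
        kill "$pid" 2>/dev/null || true
        return 1
    fi

    cmd_transition "$task_id" "dispatched"
}

log_dir="$(mktemp -d)"
dispatch_task "t001" "$log_dir/t001.log" sleep 0
wait
echo "last transition: $LAST_TRANSITION"
```

Ordering the steps this way means a failure before the PID file exists never leaves a task marked dispatched without a worker; the rollback alternative mentioned above would instead keep the original order and undo the transition on error.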
if [[ "$dispatch_mode" == "tabby" ]]; then
    # Tabby: attempt to open in a new tab via OSC 1337 escape sequence
    log_info "Opening Tabby tab for $task_id..."
    local tab_cmd
    tab_cmd="cd '${worktree_path}' && ${cmd_parts[*]} > '${log_file}' 2>&1; echo \"EXIT:\$?\" >> '${log_file}'"
    printf '\e]1337;NewTab=%s\a' "$tab_cmd" 2>/dev/null || true
    # Also start background process as fallback (Tabby may not support OSC 1337)
    (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
else
    # Headless: background process
    (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
fi
Tabby mode launches two competing workers on the same task and log file.
When dispatch_mode == "tabby", line 1315 attempts to open a Tabby tab, then line 1317 unconditionally starts a background process as a "fallback." If the OSC 1337 escape succeeds, both the Tabby tab and the background subshell execute the same command in the same worktree, writing concurrently to the same $log_file. This causes:
- Corrupted/interleaved log output (two concurrent writers).
- Duplicate AI CLI invocations against the same branch, risking conflicting commits.
- Only the background PID (line 1323) is tracked — the Tabby process is unmanaged.
The fallback should be conditional, not unconditional.
🐛 Proposed fix — only fall back if the Tabby escape is unavailable
if [[ "$dispatch_mode" == "tabby" ]]; then
# Tabby: attempt to open in a new tab via OSC 1337 escape sequence
log_info "Opening Tabby tab for $task_id..."
local tab_cmd
tab_cmd="cd '${worktree_path}' && ${cmd_parts[*]} > '${log_file}' 2>&1; echo \"EXIT:\$?\" >> '${log_file}'"
- printf '\e]1337;NewTab=%s\a' "$tab_cmd" 2>/dev/null || true
- # Also start background process as fallback (Tabby may not support OSC 1337)
- (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
+ if ! printf '\e]1337;NewTab=%s\a' "$tab_cmd" 2>/dev/null; then
+ log_warn "Tabby OSC 1337 not available, falling back to headless"
+ (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
+ else
+ # In Tabby mode, we still need a PID to monitor.
+ # Start headless since we can't capture the Tabby tab PID.
+ log_warn "Tabby tab requested but PID tracking requires headless; using headless mode"
+ (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
+ fi
 else
Alternatively, simplify by always using headless and dropping the Tabby code path until there is a reliable mechanism to avoid dual execution.
# Check for FULL_LOOP_COMPLETE (best outcome)
if grep -q 'FULL_LOOP_COMPLETE' "$tlog" 2>/dev/null; then
    # Extract PR URL if present
    local pr_url
    pr_url=$(grep -oE 'https://github\.com/[^/]+/[^/]+/pull/[0-9]+' "$tlog" 2>/dev/null | tail -1 || true)
    echo "complete:${pr_url:-no_pr}"
    return 0
fi
Literal "no_pr" stored as the PR URL in the database.
When no PR URL is found (line 1461), outcome_detail becomes the string "no_pr". In cmd_pulse line 1569, this is passed as --pr-url "no_pr", storing the literal string in the pr_url column. Downstream consumers querying for PR URLs would need to special-case this sentinel value.
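A common bash idiom for "pass the flag only when there is a value" is to build the argument list as an array. The sketch below assumes the evaluator emits an "outcome:detail" string as described above; cmd_transition is a stub that prints its arguments, and complete_task and the example URL are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real cmd_transition: print the arguments it receives.
cmd_transition() {
    printf 'transition:'
    printf ' [%s]' "$@"
    printf '\n'
}

# Hypothetical wrapper: split "outcome:detail" and only pass --pr-url
# when a URL was actually extracted.
complete_task() {
    local tid="$1" outcome="$2"
    local detail="${outcome#*:}"   # everything after the first colon
    local args=("$tid" "complete")
    if [[ -n "$detail" ]]; then
        args+=(--pr-url "$detail")
    fi
    cmd_transition "${args[@]}"
}

complete_task "t001" "complete:https://github.com/example/repo/pull/1"
complete_task "t002" "complete:"
```

Building the array once keeps a single cmd_transition call site, so error handling does not have to be duplicated across two branches.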
📝 Proposed fix — emit empty detail and guard in pulse
- echo "complete:${pr_url:-no_pr}"
+ echo "complete:${pr_url:-}"
And in cmd_pulse (line 1569), only pass --pr-url when the value is non-empty:
- cmd_transition "$tid" "complete" --pr-url "$outcome_detail" 2>/dev/null || true
+ if [[ -n "$outcome_detail" ]]; then
+ cmd_transition "$tid" "complete" --pr-url "$outcome_detail" 2>/dev/null || true
+ else
+ cmd_transition "$tid" "complete" 2>/dev/null || true
+ fi
# Worker is done - evaluate outcome
log_info " $tid: worker finished, evaluating..."

# Transition to evaluating
cmd_transition "$tid" "evaluating" 2>/dev/null || true
Tasks in dispatched state cannot transition to evaluating — silently swallowed error.
Phase 1 queries tasks in ('running', 'dispatched') (line 1529), but the valid transitions only include dispatched:running, dispatched:failed, and dispatched:cancelled — there is no dispatched:evaluating. Line 1559 suppresses the failed transition with || true, so evaluation proceeds against a task whose DB state was never updated. Subsequent transitions from the evaluate outcome (e.g., to complete) will also fail silently since the task is still in dispatched.
Consider either adding dispatched:evaluating to VALID_TRANSITIONS, or inserting a dispatched→running transition before the evaluation step for tasks still in dispatched state.
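To make the effect of the missing pair concrete, here is a minimal sketch of a "from:to" lookup. It assumes VALID_TRANSITIONS is a bash array of "from:to" strings, as the diff context suggests; the is_valid_transition helper is hypothetical, and only the dispatched pairs named in this comment are listed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Subset of the transition table: only the pairs named in this review comment.
VALID_TRANSITIONS=(
    "dispatched:running"
    "dispatched:evaluating"   # the pair the review proposes adding
    "dispatched:failed"
    "dispatched:cancelled"
)

# Hypothetical lookup: succeed only if "from:to" is listed.
is_valid_transition() {
    local from="$1" to="$2" pair
    for pair in "${VALID_TRANSITIONS[@]}"; do
        if [[ "$pair" == "$from:$to" ]]; then
            return 0
        fi
    done
    return 1
}

for target in running evaluating complete; do
    if is_valid_transition "dispatched" "$target"; then
        echo "dispatched -> $target: allowed"
    else
        echo "dispatched -> $target: rejected"
    fi
done
```

Without the added pair, the "evaluating" check would be rejected just like "complete", which is the failure that the `|| true` in the pulse currently hides.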
🐛 Proposed fix — add missing transition
"dispatched:running"
+ "dispatched:evaluating"
"dispatched:failed"
retry)
    log_warn " $tid: RETRY ($outcome_detail)"
    cmd_transition "$tid" "retrying" --error "$outcome_detail" 2>/dev/null || true
    # Re-queue for dispatch
    cmd_transition "$tid" "dispatched" 2>/dev/null || {
        # If transition fails (e.g., max retries), mark failed
        log_error " $tid: could not re-queue, marking failed"
        failed_count=$((failed_count + 1))
    }
    rm -f "$pid_file"
Retry path transitions to dispatched but never starts a new worker — task enters an infinite re-evaluation loop.
The flow is:
- cmd_transition → "retrying" (retry counter incremented).
- cmd_transition → "dispatched" (no worker started, old log file still referenced in DB).
- PID file removed.
- Next pulse: Phase 1 finds this task in dispatched with no PID → dead → evaluates the same old log → same outcome → retry again.
- Repeat until max_retries is hit, but no actual work is ever re-attempted.
Phase 2 only dispatches queued tasks, and dispatched tasks are never picked up. The task is stuck in a retry-evaluation loop burning retries without restarting a worker.
🐛 Proposed fix — retry to `queued` so Phase 2 re-dispatches
Add retrying:queued to the valid transitions array (around line 63):
"retrying:dispatched"
+ "retrying:queued"
"retrying:failed"
Then change the pulse retry block:
retry)
log_warn " $tid: RETRY ($outcome_detail)"
cmd_transition "$tid" "retrying" --error "$outcome_detail" 2>/dev/null || true
- # Re-queue for dispatch
- cmd_transition "$tid" "dispatched" 2>/dev/null || {
+ # Return to queued so Phase 2 re-dispatches with a fresh worker
+ cmd_transition "$tid" "queued" 2>/dev/null || {
# If transition fails (e.g., max retries), mark failed
log_error " $tid: could not re-queue, marking failed"
failed_count=$((failed_count + 1))
}
rm -f "$pid_file"
;;
This lets the next Phase 2 iteration pick up the task, create a fresh worktree and worker, and produce a new log file.
if [[ -n "$next_tasks" ]]; then
    while IFS='|' read -r tid trepo tdesc tmodel; do
        if cmd_dispatch "$tid" --batch "$batch_id" 2>/dev/null; then
            dispatched_count=$((dispatched_count + 1))
        else
            local dispatch_exit=$?
            if [[ "$dispatch_exit" -eq 2 ]]; then
                # Concurrency limit reached
                log_info "Concurrency limit reached, stopping dispatch"
                break
            fi
        fi
    done <<< "$next_tasks"
Non-concurrency dispatch failures are silently swallowed.
cmd_dispatch ... 2>/dev/null suppresses all stderr. When cmd_dispatch returns 1 (e.g., worktree creation failure), the else branch only checks for exit code 2 (concurrency). All other failures are silently ignored — no log, no count, no break. Consider logging the failure or incrementing failed_count for non-concurrency errors.
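A sketch of a dispatch loop that surfaces every failure is shown below. cmd_dispatch is a stub whose exit codes follow the convention described above (2 for the concurrency limit, any other non-zero for a generic failure); the task IDs and the pipe-separated rows are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub for cmd_dispatch: t2 fails generically, t3 hits the concurrency limit.
cmd_dispatch() {
    case "$1" in
        t2) return 1 ;;
        t3) return 2 ;;
        *)  return 0 ;;
    esac
}

dispatched_count=0
failed_count=0
# Made-up rows in the id|repo|description|model layout the loop reads.
next_tasks=$'t1|repo|first|model\nt2|repo|second|model\nt3|repo|third|model\nt4|repo|fourth|model'

while IFS='|' read -r tid _trepo _tdesc _tmodel; do
    dispatch_exit=0
    cmd_dispatch "$tid" || dispatch_exit=$?
    case "$dispatch_exit" in
        0)
            dispatched_count=$((dispatched_count + 1))
            ;;
        2)
            echo "Concurrency limit reached, stopping dispatch"
            break
            ;;
        *)
            # Any other failure is counted and logged instead of being dropped.
            failed_count=$((failed_count + 1))
            echo "$tid: dispatch failed (exit $dispatch_exit)" >&2
            ;;
    esac
done <<< "$next_tasks"

echo "dispatched=$dispatched_count failed=$failed_count"
```

Capturing the status with `|| dispatch_exit=$?` keeps the loop safe under `set -e` and leaves stderr from cmd_dispatch visible rather than redirecting it to /dev/null.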
📝 Proposed fix
else
local dispatch_exit=$?
if [[ "$dispatch_exit" -eq 2 ]]; then
# Concurrency limit reached
log_info "Concurrency limit reached, stopping dispatch"
break
+ else
+ log_warn " $tid: dispatch failed (exit $dispatch_exit)"
fi



Summary
- Add 4 new commands to supervisor-helper.sh for autonomous worker dispatch: dispatch, pulse, worker-status, cleanup
- Worktree per task (wt switch -c feature/tXXX) with AI worker dispatched via opencode run --format json "/full-loop tXXX"
Details
New Commands
- dispatch <task_id>
- pulse [--batch id]
- worker-status <task_id>
- cleanup [--dry-run]
Key Features
- Worktrees at ~/Git/{repo}.feature-{tXXX}/ via wt or git worktree
- Concurrency configurable via SUPERVISOR_MAX_CONCURRENCY env var or batch --concurrency flag
- TERM_PROGRAM=Tabby detection for visual dispatch mode
- mail-helper.sh notification for blocked tasks
- CLI auto-detection: opencode, falls back to claude
Quality
- local var="$1" pattern throughout
Testing
Tested manually:
- help - shows all new commands and options
- init + add + dispatch - full lifecycle
- pulse - evaluates workers and dispatches queued tasks
- worker-status - shows process state and log signals
- cleanup --dry-run - lists worktrees to clean