Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 14 additions & 8 deletions .agents/scripts/supervisor/ai-lifecycle.sh
Original file line number Diff line number Diff line change
Expand Up @@ -52,19 +52,22 @@ gather_task_state() {
escaped_id=$(sql_escape "$task_id")

# DB state
# Note: worker_pid column was removed in schema migration; use session_id instead.
# The query previously referenced worker_pid which caused silent SQLite errors,
# making gather_task_state return empty for ALL tasks and breaking Phase 3 entirely.
local task_row
task_row=$(db -separator '|' "$SUPERVISOR_DB" "
SELECT id, status, pr_url, repo, branch, worktree, error,
rebase_attempts, retries, max_retries, model, worker_pid
rebase_attempts, retries, max_retries, model, session_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the SELECT, but _dispatch_ai_worker later still does UPDATE tasks ... worker_pid = ... (with errors suppressed), which likely keeps worker state updates silently failing on the migrated schema. Consider aligning that path to write session_id (and/or rely on cmd_transition --session) so lifecycle decisions don’t diverge from actual worker state.

Severity: high

Other Locations
  • .agents/scripts/supervisor/ai-lifecycle.sh:684

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

FROM tasks WHERE id = '$escaped_id';
" 2>/dev/null || echo "")

if [[ -z "$task_row" ]]; then
return 1
fi

local tid tstatus tpr trepo tbranch tworktree terror trebase tretries tmax_retries tmodel tpid
IFS='|' read -r tid tstatus tpr trepo tbranch tworktree terror trebase tretries tmax_retries tmodel tpid <<<"$task_row"
local tid tstatus tpr trepo tbranch tworktree terror trebase tretries tmax_retries tmodel tsession
IFS='|' read -r tid tstatus tpr trepo tbranch tworktree terror trebase tretries tmax_retries tmodel tsession <<<"$task_row"

# GitHub PR state (if PR exists)
local pr_state="none" pr_merge_state="none" pr_ci_summary="none"
Expand Down Expand Up @@ -119,13 +122,16 @@ gather_task_state() {
fi
fi

# Worker process state
# Worker process state — session_id replaces worker_pid after schema migration.
# We can check if a worker session is active by looking for the session's
# log file or checking if the status implies an active worker.
local worker_alive="unknown"
if [[ -n "$tpid" && "$tpid" != "0" ]]; then
if kill -0 "$tpid" 2>/dev/null; then
worker_alive="yes"
if [[ -n "$tsession" && "$tsession" != "0" && "$tsession" != "" ]]; then
# Session exists — check if the task is in an active-worker state
if [[ "$tstatus" == "running" || "$tstatus" == "dispatched" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new worker_alive heuristic only checks status + non-empty session_id, so if a worker crashes but status remains running/dispatched the AI may wait indefinitely. If session_id is pid:<n> (or a remote token), validating liveness via the PID file and/or log file (as the comment suggests) would make this more robust.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

worker_alive="yes (session: ${tsession:0:12}...)"
else
worker_alive="no (PID $tpid dead)"
worker_alive="no (session ended)"
fi
else
worker_alive="no worker"
Expand Down
Loading