Merged
64 changes: 64 additions & 0 deletions .agents/scripts/supervisor-helper.sh
@@ -221,6 +221,7 @@ readonly -a VALID_TRANSITIONS=(
	"retrying:failed"
	"retrying:cancelled"
	"blocked:queued"
	"blocked:pr_review"
	"blocked:cancelled"
	"failed:queued"
	# Post-PR lifecycle transitions (t128.8)
@@ -1760,6 +1761,15 @@ CONTEST_SQL
		fi
	fi

	# Migrate: add last_main_sha column to tasks (t1029)
	local has_last_main_sha
	has_last_main_sha=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM pragma_table_info('tasks') WHERE name='last_main_sha';" 2>/dev/null || echo "0")
	if [[ "$has_last_main_sha" -eq 0 ]]; then
		log_info "Migrating tasks table: adding last_main_sha column (t1029)..."
		db "$SUPERVISOR_DB" "ALTER TABLE tasks ADD COLUMN last_main_sha TEXT;" 2>/dev/null || true
		log_success "Added last_main_sha column to tasks (t1029)"
	fi
Comment on lines +1764 to +1771

⚠️ Potential issue | 🟠 Major

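New DBs won’t get last_main_sha (and rebase_attempts), so Phase 3.5 silently disables itself: its SELECT names both columns, and when the query fails it falls back to an empty result (2>/dev/null || echo ""), which looks the same as having no blocked tasks.
ensure_db returns after init_db, so migrations never run on fresh installs. Add these columns to the tasks table in init_db to keep schema parity.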

🔧 Suggested schema update (init_db)
 CREATE TABLE IF NOT EXISTS tasks (
     id              TEXT PRIMARY KEY,
     repo            TEXT NOT NULL,
     description     TEXT,
     status          TEXT NOT NULL DEFAULT 'queued'
                     CHECK(status IN ('queued','dispatched','running','evaluating','retrying','complete','pr_review','review_triage','merging','merged','deploying','deployed','verifying','verified','verify_failed','blocked','failed','cancelled')),
     session_id      TEXT,
     worktree        TEXT,
     branch          TEXT,
     log_file        TEXT,
     retries         INTEGER NOT NULL DEFAULT 0,
+    rebase_attempts INTEGER NOT NULL DEFAULT 0,
     max_retries     INTEGER NOT NULL DEFAULT 3,
     deploying_recovery_attempts INTEGER NOT NULL DEFAULT 0,
     model           TEXT DEFAULT 'anthropic/claude-opus-4-6',
     error           TEXT,
     pr_url          TEXT,
+    last_main_sha   TEXT,
     issue_url       TEXT,
     diagnostic_of   TEXT,
     triage_result   TEXT,
     escalation_depth INTEGER NOT NULL DEFAULT 0,
     max_escalation  INTEGER NOT NULL DEFAULT 2,
     created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ','now')),
     started_at      TEXT,
     completed_at    TEXT,
     updated_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ','now'))
 );

As per coding guidelines, automation scripts should focus on reliability and robustness.
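
The failure mode described above is easy to miss because the Phase 3.5 query discards stderr and substitutes an empty string on error. The following is a minimal sketch of that behavior, not code from the PR: `db_stub` is a hypothetical stand-in for the script's `db` helper, assumed to exit non-zero when a query names a missing column.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script's db helper: it fails the way a
# query does when it names a column the table does not have.
db_stub() {
	echo "Error: no such column: last_main_sha" >&2
	return 1
}

# Same error-handling shape as the Phase 3.5 query: stderr is dropped
# and any failure is replaced by an empty result.
query_blocked_tasks() {
	db_stub "SELECT id, last_main_sha FROM tasks WHERE status = 'blocked';" 2>/dev/null || echo ""
}

blocked_tasks=$(query_blocked_tasks)
if [[ -n "$blocked_tasks" ]]; then
	phase_ran="yes"
else
	# A schema error is indistinguishable from "no blocked tasks".
	phase_ran="no"
fi
echo "phase 3.5 ran: $phase_ran"
```

With the stub always failing, the script prints `phase 3.5 ran: no` and nothing reaches the log, which is why a fresh install would skip the phase without any warning.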

🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 1764 - 1771, ensure_db
returns early after init_db so fresh installs never get later migrations
(missing last_main_sha and rebase_attempts), causing Phase 3.5 to disable
itself; update the initial tasks table creation in init_db to include the
missing columns (add last_main_sha TEXT and rebase_attempts INTEGER/appropriate
type) so new DBs have the same schema as migrated DBs and subsequent logic
referencing last_main_sha and rebase_attempts won't break. Locate the tasks
table CREATE statement in init_db within .agents/scripts/supervisor-helper.sh
and add the two columns with sensible defaults/nullability to match the ALTER
TABLE migrations already applied.
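
One cheap way to keep init_db and the migrations from drifting apart is a parity check that could run in CI. The sketch below is an assumption about how such a check might look: `schema_declares_column`, the schema excerpt, and the `migrated_columns` list are all hypothetical, and the match relies on each column starting its own line as in the suggested schema.

```shell
#!/usr/bin/env bash
# Returns 0 if the given CREATE TABLE text declares the named column.
# Matches the column name as the first word on a line.
schema_declares_column() {
	local schema_sql="$1" column="$2"
	grep -Eq "^[[:space:]]*${column}[[:space:]]" <<<"$schema_sql"
}

# Hypothetical excerpt of an init_db schema, for illustration only.
init_schema=$(cat <<'SQL'
CREATE TABLE IF NOT EXISTS tasks (
    id              TEXT PRIMARY KEY,
    retries         INTEGER NOT NULL DEFAULT 0,
    pr_url          TEXT
);
SQL
)

# Columns that later migrations add with ALTER TABLE.
migrated_columns=(rebase_attempts last_main_sha)

missing=()
for column in "${migrated_columns[@]}"; do
	schema_declares_column "$init_schema" "$column" || missing+=("$column")
done

if ((${#missing[@]} > 0)); then
	echo "init schema is missing: ${missing[*]}"
fi
```

Because the pattern is anchored to the start of the line, a column such as `retries` is not confused with `max_retries`.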


	# Ensure WAL mode for existing databases created before t135.3
	local current_mode
	current_mode=$(db "$SUPERVISOR_DB" "PRAGMA journal_mode;" 2>/dev/null || echo "")
@@ -11891,6 +11901,60 @@ cmd_pulse() {
		log_error "Phase 3b (process_verify_queue) failed — see $SUPERVISOR_LOG for details"
	fi

	# Phase 3.5: Auto-retry blocked merge-conflict tasks (t1029)
	# When a task is blocked with "Merge conflict — auto-rebase failed", periodically
	# re-attempt the rebase after main advances. Other PRs merging often resolve conflicts.
	local blocked_tasks
	blocked_tasks=$(db "$SUPERVISOR_DB" "SELECT id, repo, error, rebase_attempts, last_main_sha FROM tasks WHERE status = 'blocked' AND error LIKE '%Merge conflict%auto-rebase failed%';" 2>/dev/null || echo "")

	if [[ -n "$blocked_tasks" ]]; then
		while IFS='|' read -r blocked_id blocked_repo blocked_error blocked_rebase_attempts blocked_last_main_sha; do
			[[ -z "$blocked_id" ]] && continue

			# Cap at 3 total retry cycles to prevent infinite loops
			local max_retry_cycles=3
			if [[ "${blocked_rebase_attempts:-0}" -ge "$max_retry_cycles" ]]; then
				log_info " Skipping $blocked_id — max retry cycles ($max_retry_cycles) reached"
				continue
			fi

			# Get current main SHA
			local current_main_sha
			current_main_sha=$(git -C "$blocked_repo" rev-parse origin/main 2>/dev/null || echo "")
			if [[ -z "$current_main_sha" ]]; then
				log_warn " Failed to get origin/main SHA for $blocked_id in $blocked_repo"
				continue
			fi

			# Check if main has advanced since last attempt
			if [[ -n "$blocked_last_main_sha" && "$current_main_sha" == "$blocked_last_main_sha" ]]; then
				# Main hasn't advanced — skip retry
				continue
			fi

			# Main has advanced (or this is first retry) — reset counter and retry
			log_info " Main advanced for $blocked_id — retrying rebase (attempt $((blocked_rebase_attempts + 1))/$max_retry_cycles)"

			# Update last_main_sha before attempting rebase
			local escaped_blocked_id
			escaped_blocked_id=$(sql_escape "$blocked_id")
			db "$SUPERVISOR_DB" "UPDATE tasks SET last_main_sha = '$current_main_sha' WHERE id = '$escaped_blocked_id';" 2>/dev/null || true

			# Attempt rebase
			if rebase_sibling_pr "$blocked_id" 2>>"$SUPERVISOR_LOG"; then
				log_success " Auto-rebase retry succeeded for $blocked_id — transitioning to pr_review"
				# Increment rebase_attempts counter
				db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((blocked_rebase_attempts + 1)) WHERE id = '$escaped_blocked_id';" 2>/dev/null || true
				# Transition back to pr_review so CI can run
				cmd_transition "$blocked_id" "pr_review" --error "" 2>>"$SUPERVISOR_LOG" || true
			else
				# Rebase still failed — increment counter and stay blocked
				log_warn " Auto-rebase retry failed for $blocked_id — staying blocked"
				db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((blocked_rebase_attempts + 1)) WHERE id = '$escaped_blocked_id';" 2>/dev/null || true
			fi
Comment on lines +11944 to +11954

medium

There's an opportunity to deduplicate code here. The db command to update rebase_attempts is called in both the if and else blocks. You can move this line to before the if statement to make the code more concise and easier to maintain, since the counter is incremented for every attempt regardless of the outcome.

Suggested change
-			if rebase_sibling_pr "$blocked_id" 2>>"$SUPERVISOR_LOG"; then
-				log_success " Auto-rebase retry succeeded for $blocked_id — transitioning to pr_review"
-				# Increment rebase_attempts counter
-				db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((blocked_rebase_attempts + 1)) WHERE id = '$escaped_blocked_id';" 2>/dev/null || true
-				# Transition back to pr_review so CI can run
-				cmd_transition "$blocked_id" "pr_review" --error "" 2>>"$SUPERVISOR_LOG" || true
-			else
-				# Rebase still failed — increment counter and stay blocked
-				log_warn " Auto-rebase retry failed for $blocked_id — staying blocked"
-				db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((blocked_rebase_attempts + 1)) WHERE id = '$escaped_blocked_id';" 2>/dev/null || true
-			fi
+			# Increment rebase_attempts counter before the attempt.
+			# This is done for both successful and failed rebases.
+			db "$SUPERVISOR_DB" "UPDATE tasks SET rebase_attempts = $((blocked_rebase_attempts + 1)) WHERE id = '$escaped_blocked_id';" 2>/dev/null || true
+			if rebase_sibling_pr "$blocked_id" 2>>"$SUPERVISOR_LOG"; then
+				log_success " Auto-rebase retry succeeded for $blocked_id — transitioning to pr_review"
+				# Transition back to pr_review so CI can run
+				cmd_transition "$blocked_id" "pr_review" --error "" 2>>"$SUPERVISOR_LOG" || true
+			else
+				# Rebase still failed — stay blocked
+				log_warn " Auto-rebase retry failed for $blocked_id — staying blocked"
+			fi
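
The effect of counting the attempt once, before the rebase runs, can be sketched with in-memory stand-ins. Everything here is hypothetical: `attempt_rebase_stub` replaces `rebase_sibling_pr`, and the `attempts` and `status` variables replace the `rebase_attempts` and `status` columns.

```shell
#!/usr/bin/env bash
attempts=0        # stands in for the rebase_attempts column
status="blocked"  # stands in for the task status column

# Hypothetical stub: succeeds only when called with "ok".
attempt_rebase_stub() {
	[[ "$1" == "ok" ]]
}

retry_once() {
	local outcome="$1"
	# Count the attempt up front so both branches share one update.
	attempts=$((attempts + 1))
	if attempt_rebase_stub "$outcome"; then
		status="pr_review"
	else
		status="blocked"
	fi
}

retry_once "conflict"
echo "after failure: attempts=$attempts status=$status"
retry_once "ok"
echo "after success: attempts=$attempts status=$status"
```

One behavioral difference from the original ordering is worth noting: if the process is interrupted in the middle of the rebase, the attempt has already been counted.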

		done <<<"$blocked_tasks"
Comment on lines +11904 to +11955

⚠️ Potential issue | 🟠 Major

Fetch origin/main and guard repo existence before SHA comparison.
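Without a fetch, origin/main can be stale, so the SHA never appears to change and the retry never triggers. Also, git -C with an empty path leaves the working directory unchanged, so a task row with a blank repo would be checked against whatever repository the supervisor is running in; a path that no longer exists simply fails.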

🛠️ Suggested hardening
 		while IFS='|' read -r blocked_id blocked_repo blocked_error blocked_rebase_attempts blocked_last_main_sha; do
 			[[ -z "$blocked_id" ]] && continue
+			if [[ -z "$blocked_repo" || ! -d "$blocked_repo/.git" ]]; then
+				log_warn "  Skipping $blocked_id — repo not found: $blocked_repo"
+				continue
+			fi
+
+			if ! git -C "$blocked_repo" fetch origin main 2>>"$SUPERVISOR_LOG"; then
+				log_warn "  Failed to fetch origin/main for $blocked_id in $blocked_repo"
+				continue
+			fi
 
 			# Get current main SHA
 			local current_main_sha
 			current_main_sha=$(git -C "$blocked_repo" rev-parse origin/main 2>/dev/null || echo "")

As per coding guidelines, automation scripts should focus on reliability and robustness, clear logging and feedback, proper exit codes, and error recovery mechanisms.
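
The guard and fetch could also be folded into one helper so the loop body stays short. This is a sketch under assumptions, not the PR's code: `resolve_main_sha` is a hypothetical name, and it reports errors on stderr instead of using the script's `log_warn`.

```shell
#!/usr/bin/env bash
# Prints the SHA of origin/main for a repo, or returns non-zero.
# resolve_main_sha is a hypothetical helper name, not from the PR.
resolve_main_sha() {
	local repo="$1"

	# An empty path would make `git -C ""` run in the current directory.
	if [[ -z "$repo" || ! -d "$repo" ]]; then
		echo "repo not found: ${repo:-<empty>}" >&2
		return 1
	fi

	# Confirm the directory is actually a git repository.
	if ! git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
		echo "not a git repository: $repo" >&2
		return 1
	fi

	# Refresh the remote-tracking ref so the comparison is not stale.
	if ! git -C "$repo" fetch --quiet origin main; then
		echo "fetch failed for $repo" >&2
		return 1
	fi

	git -C "$repo" rev-parse --verify --quiet origin/main
}
```

The sketch uses `rev-parse --git-dir` instead of testing for a `.git` directory because a linked worktree has a `.git` file, not a directory. Whether the `repo` column can ever point at a worktree is not stated in the PR, so treat that as a precaution only.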

🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 11904 - 11955, The current
logic reads origin/main SHA without ensuring the repo exists or that origin/main
is up-to-date; before calling git -C "$blocked_repo" rev-parse origin/main,
first verify the repo exists (e.g., check "$blocked_repo/.git" or git -C
"$blocked_repo" rev-parse --is-inside-work-tree) and if present run a fetch (git
-C "$blocked_repo" fetch origin main) and guard fetch errors; if the repo is
missing or fetch/rev-parse fails, log via log_warn and continue, otherwise
assign current_main_sha from the freshly-fetched origin/main and proceed to the
existing comparison and rebase_sibling_pr flow, keeping existing db updates and
logging (references: blocked_repo, current_main_sha, blocked_last_main_sha,
rebase_sibling_pr, log_warn, log_info, db).
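
The cap and SHA comparison that gate each retry can be isolated as a pure function, which makes the decision easy to test without a database or a repository. `should_retry_rebase` is a hypothetical name; the checks mirror the ones in the loop under review.

```shell
#!/usr/bin/env bash
# Decides whether a blocked task should get another rebase attempt.
# Args: attempts so far, max cycles, last recorded main SHA, current main SHA.
should_retry_rebase() {
	local attempts="${1:-0}" max_cycles="$2" last_sha="$3" current_sha="$4"

	# Stop once the cap is reached.
	((attempts >= max_cycles)) && return 1
	# Without a current SHA there is nothing to compare against.
	[[ -z "$current_sha" ]] && return 1
	# Main has not moved since the last attempt, so skip this cycle.
	[[ -n "$last_sha" && "$last_sha" == "$current_sha" ]] && return 1
	return 0
}

# First attempt: no SHA recorded yet, so the retry goes ahead.
should_retry_rebase 0 3 "" "abc123" && echo "retry"
# Main unchanged since the last attempt: skip.
should_retry_rebase 1 3 "abc123" "abc123" || echo "skip"
```

Note that the counter is only compared against the cap here. Nothing in the loop resets it when main advances, so three attempts is a lifetime limit per task.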

	fi

	# Phase 4: Worker health checks - detect dead, hung, and orphaned workers
	local worker_timeout_seconds="${SUPERVISOR_WORKER_TIMEOUT:-3600}" # 1 hour default (t314: restored after merge overwrite)
	# Absolute max runtime: kill workers regardless of log activity.