t1183: Fix 3 supervisor pipeline bugs — PR-aware reaping, atomic eval, batch merge by marcusquinn · Pull Request #1790 · marcusquinn/aidevops

marcusquinn · 2026-02-18T21:04:55Z

Summary

Fixes three critical supervisor pipeline bugs that required repeated manual intervention during the PR backlog clearing session.

Bug 1: Phase 1c PR-aware reaping (lines 728-760)

Problem: Phase 1c catches stuck evaluating tasks with dead workers and blindly retries them — even when the task already has a PR. This dispatches a duplicate worker that may create a duplicate PR.
Fix: Before the retry/fail decision, check pr_url in the DB. If a PR exists, transition to pr_review (Phase 3 handles it) instead of retrying.

Bug 2: Atomic eval transitions (lines 474-512)

Problem: cmd_transition("complete") executed after proof-log, budget tracking, and other non-critical operations. If the pulse is killed by the stale lock breaker (600s timeout) between evaluation and transition, the task stays stuck in evaluating forever — re-evaluated every pulse but never transitioning.
Fix: Move cmd_transition + cleanup_worker_processes to execute immediately after the quality gate passes. All subsequent operations (proof-log, patterns, notifications, model labels) are non-critical and can be lost without affecting correctness.

Bug 3: Phase 3 batch merge (lines 2647-2720)

Problem: Each merge moves main forward, dirtying remaining PRs. If rebase fails, the PR is deferred to the next pulse — creating a growing backlog when workers create PRs faster than Phase 3 can merge them.
Fix: After each successful merge, git pull --rebase origin main in the task's repo so subsequent PRs can rebase cleanly in the same pulse. Added SUPERVISOR_MAX_MERGES_PER_PULSE cap (default: 5) to prevent runaway merge loops.

Testing

bash -n syntax check: pass
shellcheck -x -s bash: zero violations
All changes in a single file: .agents/scripts/supervisor/pulse.sh

Closes #1788

Summary by CodeRabbit

Bug Fixes

Fixed potential duplicate pull requests in task workflows
Improved recovery handling for stale task states
Enhanced task completion checkpointing for crash safety
Added merge rate limiting (5 per cycle) for improved stability under heavy workloads
Optimized repository synchronization for cleaner task cascades

…itions, batch merge Bug 1 (Phase 1c): Check pr_url before retrying stuck evaluating tasks. Tasks with PRs go to pr_review instead of retrying, preventing duplicate worker dispatch and duplicate PRs. Bug 2 (Phase 1 eval): cmd_transition('complete') now executes immediately after quality gate passes, before non-critical post-processing (proof-log, patterns, notifications). Prevents lost transitions when pulse is killed by stale lock breaker at 600s timeout. Bug 3 (Phase 3): Pull main after each merge so subsequent PRs can rebase cleanly in the same pulse. Cap at SUPERVISOR_MAX_MERGES_PER_PULSE (default 5) to prevent runaway merge loops when workers create PRs faster than Phase 3 can process them.

gemini-code-assist · 2026-02-18T21:04:59Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-02-18T21:05:11Z

Walkthrough

Enhanced supervisor pipeline pulse script with crash-safe checkpoint transitions after task completion, PR-aware retry routing in Phase 1c/0.7 to avoid duplicate PRs, per-pulse merge cap (default 5) in Phase 3, and post-merge main branch sync for cleaner cascading PR rebases.

Changes

Cohort / File(s)	Summary
Supervisor Pulse Phase Enhancements `.agents/scripts/supervisor/pulse.sh`	Adds immediate checkpoint transitions with state cleanup after task completion; enhances Phase 1c/0.7 stale-evaluating detection to route to pr_review if PR exists, otherwise retry; introduces per-pulse merge cap (max_merges_per_pulse=5) in Phase 3; adds post-merge main branch sync to rebase subsequent PRs against updated main.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

#1488: Extends Phase 3b2 PR-driven reconciliation with merged/closed PR handling and related recovery paths
#917: Implements per-task orphaned-PR detection to avoid unnecessary retries and transitions tasks to pr_review when PRs exist
#1704: Adds PR adoption into Phase 3a pipeline alongside PR-aware retry/transition logic and post-merge sync handling

Poem

🚀 Checkpoints lock in, clean and fast,
Duplicates vanish—a thing of the past,
Phases align with PR-aware might,
Five merges per pulse, rebasing just right! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the three main bugs being fixed (PR-aware reaping, atomic eval, batch merge) and directly relates to the changeset.
Linked Issues check	✅ Passed	The PR successfully implements all three objectives from `#1788`: PR-aware reaping in Phase 1c, atomic evaluation transitions in Phase 1b, and Phase 3 batch merge handling with merge caps.
Out of Scope Changes check	✅ Passed	All changes in the pulse.sh file are directly scoped to fixing the three identified bugs; no unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bugfix/t1183-supervisor-pipeline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-18T21:05:42Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 26 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Wed Feb 18 21:05:38 UTC 2026: Code review monitoring started
Wed Feb 18 21:05:38 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 26

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 26
VULNERABILITIES: 0

Generated on: Wed Feb 18 21:05:41 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-18T21:06:29Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

.agents/scripts/supervisor/pulse.sh (2)
2707-2716: git pull --rebase could fail or cause unintended rebases if the repo isn't on main.

git pull --rebase origin main rebases the current branch onto origin/main. If the repo working directory happens to be checked out on a different branch (e.g., after a worktree operation or interrupted cleanup), this either fails or rebases the wrong branch. The || log_warn prevents breakage, but the update intent is silently lost.

If the goal is simply to ensure subsequent git rebase origin/main commands in worktrees see the latest main, git fetch origin main is sufficient and branch-agnostic.
Proposed: use fetch instead of pull --rebase
 				if [[ -n "$merge_repo" && -d "$merge_repo" ]]; then
-					git -C "$merge_repo" pull --rebase origin main 2>>"$SUPERVISOR_LOG" ||
+					git -C "$merge_repo" fetch origin main 2>>"$SUPERVISOR_LOG" ||
 						log_warn "  $tid: failed to pull main in $merge_repo after merge (non-blocking)"
 				fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 2707 - 2716, The current
command git -C "$merge_repo" pull --rebase origin main can rebase the
checked-out branch or fail if not on main; replace it with a branch-agnostic
fetch: run git -C "$merge_repo" fetch origin main 2>>"$SUPERVISOR_LOG" ||
log_warn "  $tid: failed to fetch main in $merge_repo after merge
(non-blocking)". Update the block using merge_repo, SUPERVISOR_LOG and log_warn
so it calls db/sql_escape/tid unchanged but uses git ... fetch origin main
instead of pull --rebase.
2650-2666: Merge cap is effective, but break also defers no-PR deploy transitions.

The break at line 2665 exits the entire loop, which stops processing all remaining eligible tasks — including complete tasks with no_pr/task_only that would otherwise get a cheap direct-to-deployed transition at line 2673 (no actual git merge involved). Those deferred deploys don't contribute to the cascading rebase problem the cap is designed to prevent.

This is low-impact (they'll process next pulse in ~5 min), but if you want to preserve throughput for no-PR tasks while still capping real merges, a continue with a flag could skip only merge-eligible tasks.
Optional: skip only merge-eligible tasks instead of breaking
-		# t1183 Bug 3: Stop processing if we've hit the merge cap.
-		# Remaining tasks will be picked up in the next pulse cycle.
-		if [[ "$merged_count" -ge "$max_merges_per_pulse" ]]; then
-			log_info "  Phase 3: reached max merges per pulse ($max_merges_per_pulse), deferring rest to next cycle"
-			break
-		fi
+		# t1183 Bug 3: Once merge cap is reached, only allow no-PR fast-path deploys.
+		# Real merges are deferred to the next pulse cycle.
+		local merge_cap_reached=false
+		if [[ "$merged_count" -ge "$max_merges_per_pulse" ]]; then
+			merge_cap_reached=true
+		fi
Then guard the cmd_pr_lifecycle call (around line 2697) with:
if [[ "$merge_cap_reached" == "true" ]]; then
    log_info "  $tid: deferred (merge cap reached)"
    deferred_count=$((deferred_count + 1))
    continue
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 2650 - 2666, The loop
currently uses break when merged_count >= max_merges_per_pulse which prevents
any further task processing (including no-PR/complete deploy-only tasks); change
that to set a flag (e.g. merge_cap_reached="true") and continue so the loop
keeps iterating for non-merge work; then, before attempting any merge work
(identify the merge path around cmd_pr_lifecycle and where sibling/parent checks
use merged_parents and merged_count), guard the merge branch with if [[
"$merge_cap_reached" == "true" ]] then log_info "  $tid: deferred (merge cap
reached)"; deferred_count=$((deferred_count+1)); continue; fi so only
merge-eligible tasks are skipped while no-PR/complete transitions still run.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 739-741: The condition that routes tasks to pr_review should
exclude the "verified_complete" sentinel and also preserve the audit flags
passed in Phase 0.7; update the if-check that uses stuck_pr_url to add a check
that "$stuck_pr_url" != "verified_complete", and change the cmd_transition call
(the invocation of cmd_transition "$stuck_id" "pr_review") to include the same
--pr-url "$stuck_pr_url" and --error "$stuck_error" arguments (and keep the
2>>"$SUPERVISOR_LOG" || true redirection) so the transition and log_info call
maintain parity with Phase 0.7 behavior and existing sentinel handling.
- Around line 749-753: The Phase 1c branch double-increments task retries:
cmd_transition "$stuck_id" "retrying" already causes state.sh to increment
retries, and the following manual db UPDATE (UPDATE tasks SET retries = retries
+ 1 ...) adds a second increment; remove the redundant manual increment line and
keep the cmd_transition call (or replace cmd_transition with an explicit set to
the intended absolute value if you truly need to control retries), i.e. delete
or comment out the db "$SUPERVISOR_DB" "UPDATE tasks SET retries = retries +
1..." statement so retries increase only once when cmd_transition is invoked.

---

Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 2707-2716: The current command git -C "$merge_repo" pull --rebase
origin main can rebase the checked-out branch or fail if not on main; replace it
with a branch-agnostic fetch: run git -C "$merge_repo" fetch origin main
2>>"$SUPERVISOR_LOG" || log_warn "  $tid: failed to fetch main in $merge_repo
after merge (non-blocking)". Update the block using merge_repo, SUPERVISOR_LOG
and log_warn so it calls db/sql_escape/tid unchanged but uses git ... fetch
origin main instead of pull --rebase.
- Around line 2650-2666: The loop currently uses break when merged_count >=
max_merges_per_pulse which prevents any further task processing (including
no-PR/complete deploy-only tasks); change that to set a flag (e.g.
merge_cap_reached="true") and continue so the loop keeps iterating for non-merge
work; then, before attempting any merge work (identify the merge path around
cmd_pr_lifecycle and where sibling/parent checks use merged_parents and
merged_count), guard the merge branch with if [[ "$merge_cap_reached" == "true"
]] then log_info "  $tid: deferred (merge cap reached)";
deferred_count=$((deferred_count+1)); continue; fi so only merge-eligible tasks
are skipped while no-PR/complete transitions still run.

coderabbitai · 2026-02-18T21:10:41Z

.agents/scripts/supervisor/pulse.sh

+			if [[ -n "$stuck_pr_url" && "$stuck_pr_url" != "no_pr" && "$stuck_pr_url" != "task_only" ]]; then
+				cmd_transition "$stuck_id" "pr_review" 2>>"$SUPERVISOR_LOG" || true
+				log_info "  Phase 1c: $stuck_id → pr_review (has PR: $stuck_pr_url, skipping retry)"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, examine the exact lines in pulse.sh at 739-741 echo "=== Lines 739-741 in pulse.sh ===" sed -n '739,741p' .agents/scripts/supervisor/pulse.sh echo "" echo "=== Context around lines 739-741 (735-745) ===" sed -n '735,745p' .agents/scripts/supervisor/pulse.sh

Repository: marcusquinn/aidevops

Length of output: 1077

🏁 Script executed:

# Search for verified_complete sentinel usage across the codebase echo "=== All verified_complete references in pulse.sh ===" rg -n "verified_complete" .agents/scripts/supervisor/pulse.sh -B2 -A2 echo "" echo "=== Compare with no_pr and task_only usage ===" rg -n 'no_pr.*task_only|task_only.*no_pr' .agents/scripts/supervisor/pulse.sh -B1 -A1

Repository: marcusquinn/aidevops

Length of output: 2166

🏁 Script executed:

# Find Phase 0.7 and the transition at line 289 echo "=== Phase 0.7 context (around line 289) ===" sed -n '280,300p' .agents/scripts/supervisor/pulse.sh echo "" echo "=== cmd_transition calls with pr_review in pulse.sh ===" rg -n 'cmd_transition.*pr_review' .agents/scripts/supervisor/pulse.sh -B2 -A2

Repository: marcusquinn/aidevops

Length of output: 3180

🏁 Script executed:

# Check state.sh for cmd_transition definition and flag handling echo "=== cmd_transition implementation in state.sh ===" if [ -f ".agents/scripts/supervisor/state.sh" ]; then rg -n "cmd_transition" .agents/scripts/supervisor/state.sh | head -5 # Look for the function definition sed -n '1,200p' .agents/scripts/supervisor/state.sh | rg -n "cmd_transition|--pr-url|--error" -B2 -A2 else echo "state.sh not found" fi

Repository: marcusquinn/aidevops

Length of output: 862

🏁 Script executed:

# Verify all sentinel checks in conditional statements echo "=== All conditional checks for pr_url sentinels ===" rg -n 'if.*\$\{.*pr.*url.*!=' .agents/scripts/supervisor/pulse.sh -B1 -A1 | head -60

Repository: marcusquinn/aidevops

Length of output: 115

Add missing verified_complete sentinel check and audit trail parity with Phase 0.7.

Line 739 filters no_pr and task_only but omits verified_complete, which is explicitly checked in Phase 3b2 (line 960), Phase 3 (line 2673), and other pr_url sentinel validations (line 1144). If a task has pr_url == "verified_complete", the condition evaluates true and incorrectly routes to pr_review.

Additionally, Phase 0.7 (line 289) passes --pr-url and --error flags to cmd_transition for the same type of transition; Phase 1c (line 740) omits both, losing audit trail and reducing consistency.

Proposed fix

- if [[ -n "$stuck_pr_url" && "$stuck_pr_url" != "no_pr" && "$stuck_pr_url" != "task_only" ]]; then - cmd_transition "$stuck_id" "pr_review" 2>>"$SUPERVISOR_LOG" || true + if [[ -n "$stuck_pr_url" && "$stuck_pr_url" != "no_pr" && "$stuck_pr_url" != "task_only" && "$stuck_pr_url" != "verified_complete" ]]; then + cmd_transition "$stuck_id" "pr_review" --pr-url "$stuck_pr_url" --error "Stuck evaluating recovery (Phase 1c/t1183): worker dead, PR exists" 2>>"$SUPERVISOR_LOG" || true

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ -n "$stuck_pr_url" && "$stuck_pr_url" != "no_pr" && "$stuck_pr_url" != "task_only" ]]; then

cmd_transition "$stuck_id" "pr_review" 2>>"$SUPERVISOR_LOG" || true

log_info " Phase 1c: $stuck_id → pr_review (has PR: $stuck_pr_url, skipping retry)"

if [[ -n "$stuck_pr_url" && "$stuck_pr_url" != "no_pr" && "$stuck_pr_url" != "task_only" && "$stuck_pr_url" != "verified_complete" ]]; then

cmd_transition "$stuck_id" "pr_review" --pr-url "$stuck_pr_url" --error "Stuck evaluating recovery (Phase 1c/t1183): worker dead, PR exists" 2>>"$SUPERVISOR_LOG" || true

log_info " Phase 1c: $stuck_id → pr_review (has PR: $stuck_pr_url, skipping retry)"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 739 - 741, The condition that routes tasks to pr_review should exclude the "verified_complete" sentinel and also preserve the audit flags passed in Phase 0.7; update the if-check that uses stuck_pr_url to add a check that "$stuck_pr_url" != "verified_complete", and change the cmd_transition call (the invocation of cmd_transition "$stuck_id" "pr_review") to include the same --pr-url "$stuck_pr_url" and --error "$stuck_error" arguments (and keep the 2>>"$SUPERVISOR_LOG" || true redirection) so the transition and log_info call maintain parity with Phase 0.7 behavior and existing sentinel handling.

coderabbitai · 2026-02-18T21:10:41Z

.agents/scripts/supervisor/pulse.sh

+				if [[ "$stuck_retries" -lt "$stuck_max_retries" ]]; then
+					# Transition to retrying so it gets re-dispatched
+					cmd_transition "$stuck_id" "retrying" --error "Auto-reaped: stuck in evaluating >10min with dead worker (Phase 1c)" 2>>"$SUPERVISOR_LOG" || true
+					db "$SUPERVISOR_DB" "UPDATE tasks SET retries = retries + 1, updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now') WHERE id = '$(sql_escape "$stuck_id")';" 2>/dev/null || true
+					log_info "  Phase 1c: $stuck_id → retrying (retry $((stuck_retries + 1))/$stuck_max_retries)"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify cmd_transition auto-increment behavior and identify redundant increments echo "=== Checking state.sh for 'retrying' state auto-increment logic ===" fd state.sh --type f | head -5

Repository: marcusquinn/aidevops

Length of output: 167

🏁 Script executed:

#!/bin/bash # Search for retrying state increment pattern rg -n "retrying" --type=sh -B3 -A3 | head -80

Repository: marcusquinn/aidevops

Length of output: 5627

🏁 Script executed:

#!/bin/bash # Find all retry increment patterns in pulse.sh fd pulse.sh --type f

Repository: marcusquinn/aidevops

Length of output: 137

🏁 Script executed:

#!/bin/bash # Get state.sh content around retrying logic cat -n .agents/scripts/supervisor/state.sh | grep -A10 -B10 "retrying"

Repository: marcusquinn/aidevops

Length of output: 5418

🏁 Script executed:

#!/bin/bash # Get pulse.sh content around lines 749-753 sed -n '740,760p' .agents/scripts/supervisor/pulse.sh | cat -n

Repository: marcusquinn/aidevops

Length of output: 1597

🏁 Script executed:

#!/bin/bash # Find cmd_transition definition and check its behavior with retrying rg -n "cmd_transition" .agents/scripts/supervisor/ --type=sh -A15 | head -100

Repository: marcusquinn/aidevops

Length of output: 7774

🏁 Script executed:

#!/bin/bash # Verify Phase 0.7 pattern (lines ~290-310) sed -n '280,320p' .agents/scripts/supervisor/todo-sync.sh | cat -n

Repository: marcusquinn/aidevops

Length of output: 1730

🏁 Script executed:

#!/bin/bash # Search for other manual retry increments in pulse.sh to see if this is a pattern rg -n "retries = retries \+ 1|retries \+= 1" .agents/scripts/supervisor/pulse.sh -C3

Repository: marcusquinn/aidevops

Length of output: 689

🏁 Script executed:

#!/bin/bash # Search for Phase 0.7 in pulse.sh to verify the comparison sed -n '290,310p' .agents/scripts/supervisor/pulse.sh | cat -n

Repository: marcusquinn/aidevops

Length of output: 1583

🏁 Script executed:

#!/bin/bash # Look for all Phase patterns and retry logic in pulse.sh rg -n "Phase 0\.|Phase 1" .agents/scripts/supervisor/pulse.sh -B2 -A8 | head -150

Repository: marcusquinn/aidevops

Length of output: 8105

🏁 Script executed:

#!/bin/bash # Find cmd_transition function definition to verify it executes the update_parts sed -n '1,300p' .agents/scripts/supervisor/state.sh | grep -A30 "^cmd_transition()"

Repository: marcusquinn/aidevops

Length of output: 648

🏁 Script executed:

#!/bin/bash # Check if db function is being called to execute the UPDATE in cmd_transition sed -n '1,250p' .agents/scripts/supervisor/state.sh | tail -100

Repository: marcusquinn/aidevops

Length of output: 3714

Double retry increment — each Phase 1c recovery consumes two retries instead of one.

When cmd_transition "$stuck_id" "retrying" executes (line 751), state.sh appends retries = retries + 1 to the UPDATE statement (lines 182–184). The subsequent manual UPDATE tasks SET retries = retries + 1 on line 752 increments it a second time. Tasks exhaust their retry budget at 2× the intended rate.

Compare with Phase 0.7 (lines 298–301), which correctly sets an absolute value (retries = $new_retries) before calling cmd_transition to "queued" — a state that does not auto-increment retries in cmd_transition.

Proposed fix: remove the redundant increment

if [[ "$stuck_retries" -lt "$stuck_max_retries" ]]; then # Transition to retrying so it gets re-dispatched cmd_transition "$stuck_id" "retrying" --error "Auto-reaped: stuck in evaluating >10min with dead worker (Phase 1c)" 2>>"$SUPERVISOR_LOG" || true - db "$SUPERVISOR_DB" "UPDATE tasks SET retries = retries + 1, updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now') WHERE id = '$(sql_escape "$stuck_id")';" 2>/dev/null || true log_info " Phase 1c: $stuck_id → retrying (retry $((stuck_retries + 1))/$stuck_max_retries)"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ "$stuck_retries" -lt "$stuck_max_retries" ]]; then

# Transition to retrying so it gets re-dispatched

cmd_transition "$stuck_id" "retrying" --error "Auto-reaped: stuck in evaluating >10min with dead worker (Phase 1c)" 2>>"$SUPERVISOR_LOG" || true

db "$SUPERVISOR_DB" "UPDATE tasks SET retries = retries + 1, updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now') WHERE id = '$(sql_escape "$stuck_id")';" 2>/dev/null || true

log_info " Phase 1c: $stuck_id → retrying (retry $((stuck_retries + 1))/$stuck_max_retries)"

if [[ "$stuck_retries" -lt "$stuck_max_retries" ]]; then

# Transition to retrying so it gets re-dispatched

cmd_transition "$stuck_id" "retrying" --error "Auto-reaped: stuck in evaluating >10min with dead worker (Phase 1c)" 2>>"$SUPERVISOR_LOG" || true

log_info " Phase 1c: $stuck_id → retrying (retry $((stuck_retries + 1))/$stuck_max_retries)"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 749 - 753, The Phase 1c branch double-increments task retries: cmd_transition "$stuck_id" "retrying" already causes state.sh to increment retries, and the following manual db UPDATE (UPDATE tasks SET retries = retries + 1 ...) adds a second increment; remove the redundant manual increment line and keep the cmd_transition call (or replace cmd_transition with an explicit set to the intended absolute value if you truly need to control retries), i.e. delete or comment out the db "$SUPERVISOR_DB" "UPDATE tasks SET retries = retries + 1..." statement so retries increase only once when cmd_transition is invoked.

marcusquinn merged commit 921c1f9 into main Feb 18, 2026
10 of 11 checks passed

marcusquinn added a commit that referenced this pull request Feb 18, 2026

chore: mark t1183 complete (PR #1790), fix stashed merge conflict

b4f2e70

coderabbitai bot requested changes Feb 18, 2026

View reviewed changes

marcusquinn deleted the bugfix/t1183-supervisor-pipeline branch February 21, 2026 01:59

This was referenced Feb 21, 2026

refactor: AI-driven task lifecycle engine replacing hardcoded bash heuristics #2107

Merged

fix: skip Phase 4b2 legacy pr_review recovery when AI lifecycle is active #2108

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

t1183: Fix 3 supervisor pipeline bugs — PR-aware reaping, atomic eval, batch merge#1790

t1183: Fix 3 supervisor pipeline bugs — PR-aware reaping, atomic eval, batch merge#1790
marcusquinn merged 1 commit intomainfrom
bugfix/t1183-supervisor-pipeline

marcusquinn commented Feb 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Uh oh!

coderabbitai bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 18, 2026

Uh oh!

sonarqubecloud bot commented Feb 18, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 18, 2026

Uh oh!

coderabbitai bot Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

marcusquinn commented Feb 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug 1: Phase 1c PR-aware reaping (lines 728-760)

Bug 2: Atomic eval transitions (lines 474-512)

Bug 3: Phase 3 batch merge (lines 2647-2720)

Testing

Summary by CodeRabbit

Bug Fixes

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Uh oh!

coderabbitai bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

github-actions bot commented Feb 18, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

sonarqubecloud bot commented Feb 18, 2026

Quality Gate passed

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marcusquinn commented Feb 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 18, 2026 •

edited

Loading