-
Notifications
You must be signed in to change notification settings - Fork 5
fix: cancel stale diagnostic subtasks in pulse Phase 4c #678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Diagnostic tasks (diagnostic_of IS NOT NULL) become stale when their parent task reaches a terminal state (deployed/cancelled/failed) before the diagnostic is dispatched. Phase 4c now detects these and cancels them automatically during pulse, freeing queue slots. Fixes: t175-diag-1 lingering in queue after t175 was deployed.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
WalkthroughAdds Phase 4c to the pulse cycle cleanup routine that identifies and cancels stale diagnostic subtasks whose parent tasks have reached terminal states (deployed, cancelled, failed, complete, or merged). This targeted garbage collection prevents orphaned diagnostic work from lingering in queued or retrying states. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sun Feb 8 22:32:00 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.agents/scripts/supervisor-helper.sh:
- Around line 5484-5496: The SQL that builds stale_diags currently treats parent
tasks with p.status IN ('deployed','cancelled','failed','complete','merged') as
terminal and cancels their diagnostic subtasks; remove 'failed' from that IN
list (or replace it with a stronger explicit condition) so diagnostics for
failed parents are not cancelled by the stale_diags logic in the
supervisor-helper.sh script when evaluating the SUPERVISOR_DB query, preserving
self‑heal behavior for failed tasks.
- Around line 5489-5499: The DB output parsing is brittle because sqlite3's
output separator can be changed by ~/.sqliterc; modify the db invocation that
produces stale_diags so it sets the output separator explicitly (to the '|' used
by the parser) and disable headers/null-related formatting so the while IFS='|'
read -r diag_id parent_id parent_status loop reliably splits fields; update the
call that populates stale_diags (the db "$SUPERVISOR_DB" ... invocation) to pass
the sqlite3 options that force a deterministic '|' separator and consistent row
formatting.
| # Phase 4c: Cancel stale diagnostic subtasks whose parent is already resolved | ||
| # Diagnostic tasks (diagnostic_of != NULL) become stale when the parent task | ||
| # reaches a terminal state (deployed, cancelled, failed) before the diagnostic | ||
| # is dispatched. Cancel them to free queue slots. | ||
| local stale_diags | ||
| stale_diags=$(db "$SUPERVISOR_DB" " | ||
| SELECT d.id, d.diagnostic_of, p.status AS parent_status | ||
| FROM tasks d | ||
| JOIN tasks p ON d.diagnostic_of = p.id | ||
| WHERE d.diagnostic_of IS NOT NULL | ||
| AND d.status IN ('queued', 'retrying') | ||
| AND p.status IN ('deployed', 'cancelled', 'failed', 'complete', 'merged'); | ||
| " 2>/dev/null || echo "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid cancelling diagnostics for failed parents — it breaks self‑heal.
When a task fails, a diagnostic subtask is created to recover it. On Line 5495, including failed in the parent terminal list will cancel queued diagnostics whenever concurrency is tight, effectively disabling self‑heal for failed tasks. Consider excluding failed (or gating on a stronger condition) so diagnostics can still run for failed parents.
🔧 Suggested fix
- AND p.status IN ('deployed', 'cancelled', 'failed', 'complete', 'merged');
+ AND p.status IN ('deployed', 'cancelled', 'complete', 'merged');As per coding guidelines, "Automation scripts - focus on: Reliability and robustness".
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 5484 - 5496, The SQL that
builds stale_diags currently treats parent tasks with p.status IN
('deployed','cancelled','failed','complete','merged') as terminal and cancels
their diagnostic subtasks; remove 'failed' from that IN list (or replace it with
a stronger explicit condition) so diagnostics for failed parents are not
cancelled by the stale_diags logic in the supervisor-helper.sh script when
evaluating the SUPERVISOR_DB query, preserving self‑heal behavior for failed
tasks.
| stale_diags=$(db "$SUPERVISOR_DB" " | ||
| SELECT d.id, d.diagnostic_of, p.status AS parent_status | ||
| FROM tasks d | ||
| JOIN tasks p ON d.diagnostic_of = p.id | ||
| WHERE d.diagnostic_of IS NOT NULL | ||
| AND d.status IN ('queued', 'retrying') | ||
| AND p.status IN ('deployed', 'cancelled', 'failed', 'complete', 'merged'); | ||
| " 2>/dev/null || echo "") | ||
|
|
||
| if [[ -n "$stale_diags" ]]; then | ||
| while IFS='|' read -r diag_id parent_id parent_status; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harden DB output parsing with an explicit separator.
The loop assumes | separators, but sqlite3 output can be influenced by user .sqliterc settings. Line 5489 should explicitly set the separator to keep parsing deterministic.
🔧 Suggested fix
- stale_diags=$(db "$SUPERVISOR_DB" "
+ stale_diags=$(db -separator '|' "$SUPERVISOR_DB" "
SELECT d.id, d.diagnostic_of, p.status AS parent_status
FROM tasks d
JOIN tasks p ON d.diagnostic_of = p.id
WHERE d.diagnostic_of IS NOT NULL
AND d.status IN ('queued', 'retrying')
AND p.status IN ('deployed', 'cancelled', 'failed', 'complete', 'merged');
" 2>/dev/null || echo "")As per coding guidelines, "Automation scripts - focus on: Reliability and robustness".
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 5489 - 5499, The DB output
parsing is brittle because sqlite3's output separator can be changed by
~/.sqliterc; modify the db invocation that produces stale_diags so it sets the
output separator explicitly (to the '|' used by the parser) and disable
headers/null-related formatting so the while IFS='|' read -r diag_id parent_id
parent_status loop reliably splits fields; update the call that populates
stale_diags (the db "$SUPERVISOR_DB" ... invocation) to pass the sqlite3 options
that force a deterministic '|' separator and consistent row formatting.



Summary
diagnostic_of IS NOT NULL) become stale when their parent task reaches a terminal state (deployed/cancelled/failed/complete/merged) before the diagnostic is dispatchedProblem
t175-diag-1was lingering in the queue asqueuedeven though its parentt175had already been deployed. This wastes a concurrency slot and clutters the batch dashboard.Changes
supervisor-helper.sh: New Phase 4c block between Phase 4b (orphan cleanup) and Phase 5 (summary)queued/retryingstate whose parent is in a terminal statecancelledwith a descriptive error messagelocalvariables,IFS='|'parsing, error logging to$SUPERVISOR_LOGTesting
Summary by CodeRabbit