fix: Phase 3 AI lifecycle completely broken: gather_task_state references non-existent worker_pid column #2275
…n: Phase 3 completely broken

The tasks table schema has no worker_pid column (it was removed in a prior migration). The SELECT query in gather_task_state() silently failed for EVERY task, returning empty results and causing the function to return 1. This made Phase 3 (ai-lifecycle) skip all 20+ eligible tasks per pulse: zero evaluated, zero actioned. All PR reviews, merges, blocked-task resolution, and deploy decisions were silently disabled.

Impact: 11 PRs stuck in review indefinitely, 9 blocked tasks never re-evaluated, pipeline throughput at 0% for the review→merge→deploy stages.

Fix: replace worker_pid with session_id (which exists in the schema) and adapt the worker-alive check to use session_id + task status instead of PID probing.
Summary of Changes (Gemini Code Assist, for @marcusquinn): This pull request addresses a critical bug that had completely halted Phase 3 of the AI lifecycle, leading to stalled PRs and zero pipeline throughput. The fix involves updating the database query to reference the correct …
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Wed Feb 25 03:21:37 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request provides a critical fix for the AI lifecycle, which was completely non-operational due to a query referencing a non-existent worker_pid column. The change correctly replaces worker_pid with session_id and adapts the worker-alive check to use the task's status, which is a robust solution.
🤖 Augment PR Summary: Restores Phase 3 AI lifecycle by fixing …
 task_row=$(db -separator '|' "$SUPERVISOR_DB" "
     SELECT id, status, pr_url, repo, branch, worktree, error,
-        rebase_attempts, retries, max_retries, model, worker_pid
+        rebase_attempts, retries, max_retries, model, session_id
This fixes the SELECT, but _dispatch_ai_worker later still does UPDATE tasks ... worker_pid = ... (with errors suppressed), which likely keeps worker state updates silently failing on the migrated schema. Consider aligning that path to write session_id (and/or rely on cmd_transition --session) so lifecycle decisions don’t diverge from actual worker state.
Severity: high
Other Locations
.agents/scripts/supervisor/ai-lifecycle.sh:684
 worker_alive="yes"
 if [[ -n "$tsession" && "$tsession" != "0" && "$tsession" != "" ]]; then
     # Session exists; check if the task is in an active-worker state
     if [[ "$tstatus" == "running" || "$tstatus" == "dispatched" ]]; then
The new worker_alive heuristic only checks status + non-empty session_id, so if a worker crashes but status remains running/dispatched the AI may wait indefinitely. If session_id is pid:<n> (or a remote token), validating liveness via the PID file and/or log file (as the comment suggests) would make this more robust.
Severity: medium
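One way to act on this suggestion is sketched below. It assumes, as the review comment speculates, that a local worker's session_id has the form pid:<n>; worker_is_alive is a hypothetical helper, not part of ai-lifecycle.sh. It keeps the PR's status + non-empty session check and only adds a kill -0 probe when the session id actually carries a PID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: is the worker for this task still alive?
# ASSUMPTION: local workers store session_id as "pid:<n>"; any other
# non-empty value is treated as an opaque (e.g. remote) token that
# cannot be probed locally.
worker_is_alive() {
  local status="$1" session="$2"

  # Only tasks in an active-worker state can have a live worker.
  case "$status" in
    running|dispatched) ;;
    *) return 1 ;;
  esac

  # No session recorded ("" or "0"): nothing to be alive.
  [[ -n "$session" && "$session" != "0" ]] || return 1

  if [[ "$session" =~ ^pid:([0-9]+)$ ]]; then
    # kill -0 delivers no signal; it only checks that the process exists
    # and can be signalled (it also fails with EPERM for another user's PID).
    kill -0 "${BASH_REMATCH[1]}" 2>/dev/null
    return
  fi

  # Opaque token: fall back to the status-based heuristic from this PR.
  return 0
}
```

A caveat on the design: kill -0 cannot tell a recycled PID from the original worker, so checking the worker's log or PID file as well, as the comment suggests, would narrow that gap.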
… issue tag drift detection (t1336) (#2276)

Add three new sanity check sections to detect systemic pipeline failures:
- Section 6: Pipeline phase health. Detects Phase 3 zero-eval streaks, gather_task_state errors, dispatch stalls, and underutilisation.
- Section 7: Schema validation. Verifies that required columns exist in the tasks table via PRAGMA (catches the exact worker_pid bug from PR #2275).
- Section 8: Cross-repo issue tag truthfulness. Compares GH issue labels against DB state for all managed repos.

Update the sanity prompt to instruct the AI about new sections 6-8, with appropriate action recommendations. Add Phase 3 throughput metrics (last eval/actioned, zero-eval rate) to build_health_context() so the AI reasoner has visibility into whether the lifecycle phase is actually working.
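The Section 7 idea can be sketched as a small text filter, assuming the check only needs column names. missing_columns is a hypothetical name: it reads the table's actual column names (one per line) on stdin and prints every required column that is absent, so it can be fed directly from PRAGMA table_info output. The required column list in the usage comment is illustrative, not the supervisor's real set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print each required column missing from a table.
# stdin: actual column names, one per line.
# args:  required column names.
missing_columns() {
  local actual col
  actual=$(cat)
  for col in "$@"; do
    # -x: whole-line match, -F: fixed string, so "session" != "session_id".
    grep -qxF -- "$col" <<<"$actual" || printf '%s\n' "$col"
  done
}

# Usage (sqlite3's default list-mode separator is '|'; column 2 is the name):
#   missing=$(sqlite3 "$SUPERVISOR_DB" "PRAGMA table_info(tasks);" \
#             | cut -d'|' -f2 | missing_columns id status session_id)
#   [[ -z "$missing" ]] || echo "tasks table missing: $missing" >&2
```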
…lumn: Phase 3 stuck on first task (#2278)

The UPDATE in _dispatch_ai_worker (line 687) referenced worker_pid, a column removed in a prior schema migration. SQLite rejected the entire UPDATE (including status='running') and the error was suppressed, so the dispatched task never transitioned out of 'blocked'. On the next pulse, the same task was first in priority order, got actioned again, and failed the same way. The other 19 eligible tasks were never reached.

Same root cause as PR #2275 (gather_task_state), different location.

Fix: replace the worker_pid column with session_id (matching the pattern used by dispatch.sh and deploy.sh).
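The corrected statement might look roughly like this. sql_quote and build_running_update are hypothetical helpers, not code from ai-lifecycle.sh; the only schema facts assumed are the id, status and session_id columns named in this thread, and the pid:<n> session format is an assumption. The usage comment checks sqlite3's exit status so that a failed transition is reported instead of silently leaving the task in 'blocked'.

```shell
#!/usr/bin/env bash
# Escape a value for a single-quoted SQL string literal by doubling any
# embedded single quotes (standard SQL escaping).
sql_quote() {
  local q="'"
  local s=${1//$q/$q$q}
  printf "'%s'" "$s"
}

# Hypothetical helper: build the UPDATE that moves a task to 'running' and
# records its worker in session_id (the removed worker_pid is not touched).
build_running_update() {
  local task_id="$1" session="$2"
  printf "UPDATE tasks SET status = 'running', session_id = %s WHERE id = %s;" \
    "$(sql_quote "$session")" "$(sql_quote "$task_id")"
}

# Usage (do not discard errors; a failed UPDATE must be visible):
#   sqlite3 "$SUPERVISOR_DB" "$(build_running_update "$task_id" "pid:$pid")" \
#     || { echo "status transition failed for $task_id" >&2; return 1; }
```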



Summary
Critical bug: gather_task_state() in ai-lifecycle.sh queries a worker_pid column that doesn't exist in the tasks table schema. The query error is swallowed and turned into an empty result, so the function returns 1 for every single task. Phase 3 (AI lifecycle) has been a complete no-op: zero tasks evaluated, zero actioned.
Impact
Root Cause
The worker_pid column was removed in a prior schema migration, but gather_task_state() was not updated. The query SELECT ... worker_pid FROM tasks fails with a column-not-found error; because it runs with 2>/dev/null || echo "", the error message is discarded and an empty string is substituted. The empty check on line 62 then triggers, and the function returns 1.
Fix
Replace worker_pid with session_id (which exists in the schema)
Evidence
Every pulse cycle. For days.
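The deeper problem in the root cause is the 2>/dev/null || echo "" pattern, which makes a broken query indistinguishable from "no matching row". A minimal sketch of an alternative, assuming callers can handle a third return code: query_row is a hypothetical wrapper that leaves stderr visible and returns 2 when the command fails, 1 when it succeeds with empty output, and 0 with the row printed otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around a query command (e.g. the supervisor's db()).
#   0: printed the row, 1: query ran but returned nothing,
#   2: the query command itself failed (bad column, locked DB, ...).
query_row() {
  local out
  # Assign on its own line so $? reflects the command, not `local`.
  if ! out=$("$@"); then
    echo "query_row: command failed: $*" >&2
    return 2
  fi
  [[ -n "$out" ]] || return 1
  printf '%s\n' "$out"
}

# Usage sketch inside gather_task_state():
#   task_row=$(query_row db -separator '|' "$SUPERVISOR_DB" "SELECT ...")
#   rc=$?   # 2 = schema/SQL problem: log loudly instead of skipping the task
```

With this split, a schema drift like the worker_pid removal would surface as an error on the first pulse instead of as a silent skip.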