fix: Phase 3 AI lifecycle completely broken — gather_task_state references non-existent worker_pid column by marcusquinn · Pull Request #2275 · marcusquinn/aidevops

marcusquinn · 2026-02-25T03:20:57Z

Summary

Critical bug: gather_task_state() in ai-lifecycle.sh queries a worker_pid column that doesn't exist in the tasks table schema. The query fails, but because its error output is suppressed the caller only sees an empty result, so the function returns 1 for every single task. Phase 3 (AI lifecycle) has been a complete no-op: zero tasks evaluated, zero actioned.

Impact

11 PRs stuck in review indefinitely — never merged
9 blocked tasks never re-evaluated for unblocking
Pipeline throughput at 0% for review → merge → deploy stages
Workers dispatch and produce PRs, but nothing happens after that
Concurrency slots wasted waiting for reviews that never complete

Root Cause

The worker_pid column was removed in a prior schema migration but gather_task_state() was not updated. The query SELECT ... worker_pid FROM tasks is rejected by SQLite with a "no such column" error, but the call is wrapped in 2>/dev/null || echo "", so the error is discarded and the result is an empty string. The empty check on line 62 then triggers, and the function returns 1.
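The masking effect described above can be reproduced without a database. The sketch below is illustrative only: `db` is a stand-in function (not the supervisor's real wrapper) that fails the way the sqlite3 CLI does on an unknown column, by writing a message to stderr and exiting non-zero. `gather_masked` and `gather_checked` are hypothetical names, and the return code 2 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Stand-in for the real db wrapper (assumption): fail like sqlite3 does on an
# unknown column, i.e. message on stderr plus a non-zero exit status.
db() {
	echo "Error: no such column: worker_pid" >&2
	return 1
}

# Pattern from the root cause: the error is discarded, so a broken query is
# indistinguishable from "no such task".
gather_masked() {
	local row
	row=$(db "SELECT id, status, worker_pid FROM tasks WHERE id = '$1';" 2>/dev/null || echo "")
	[[ -n "$row" ]] || return 1
	echo "$row"
}

# Alternative: keep the exit status and report the error text instead.
gather_checked() {
	local row
	if ! row=$(db "SELECT id, status, worker_pid FROM tasks WHERE id = '$1';" 2>&1); then
		echo "query failed: $row" >&2
		return 2
	fi
	[[ -n "$row" ]] || return 1
	echo "$row"
}

masked_rc=0
gather_masked t1328 2>/dev/null || masked_rc=$?
checked_rc=0
gather_checked t1328 2>/dev/null || checked_rc=$?
echo "masked=$masked_rc checked=$checked_rc"
```

With the stand-in `db`, this prints `masked=1 checked=2`: the first variant reports "task not found", while the second lets a caller tell a failed query apart from a missing row.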

Fix

Replace worker_pid with session_id (which exists in the schema)
Adapt worker-alive check to use session_id + task status instead of PID probing
Add comment documenting the migration to prevent recurrence

Evidence

[SUPERVISOR] ai-lifecycle: could not gather state for t1327.1
[SUPERVISOR] ai-lifecycle: could not gather state for t1327.2
[SUPERVISOR] ai-lifecycle: could not gather state for t1328
... (13 more)
[SUPERVISOR] ai-lifecycle: evaluated 0 tasks, actioned 0

Every pulse cycle. For days.
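The log lines above suggest a cheap health signal: count how many consecutive pulses ended with "evaluated 0 tasks". The following is a minimal sketch under that assumption, not the supervisor's actual check. `zero_eval_streak` is a hypothetical name, and the sample log is made up apart from the line format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads supervisor log lines on stdin and prints how many
# of the most recent Phase 3 summary lines in a row evaluated zero tasks.
zero_eval_streak() {
	grep -E 'ai-lifecycle: evaluated [0-9]+ tasks' |
		awk '
			/evaluated 0 tasks/ { streak++; next }  # another empty pulse
			{ streak = 0 }                          # a productive pulse resets the count
			END { print streak + 0 }
		'
}

# Made-up sample: one productive pulse followed by two empty ones.
sample_log='[SUPERVISOR] ai-lifecycle: evaluated 4 tasks, actioned 2
[SUPERVISOR] ai-lifecycle: could not gather state for t1328
[SUPERVISOR] ai-lifecycle: evaluated 0 tasks, actioned 0
[SUPERVISOR] ai-lifecycle: could not gather state for t1328
[SUPERVISOR] ai-lifecycle: evaluated 0 tasks, actioned 0'

streak=$(printf '%s\n' "$sample_log" | zero_eval_streak)
echo "consecutive zero-eval pulses: $streak"
```

A monitor could alert once the streak passes a threshold chosen for the pulse interval, rather than waiting for someone to notice stalled PRs.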

…n — Phase 3 completely broken

The tasks table schema has no worker_pid column (removed in a prior migration). The SELECT query in gather_task_state() silently failed for EVERY task, returning empty results and causing the function to return 1. This made Phase 3 (ai-lifecycle) skip all 20+ eligible tasks per pulse: zero evaluated, zero actioned. All PR reviews, merges, blocked-task resolution, and deploy decisions were silently disabled.

Impact: 11 PRs stuck in review indefinitely, 9 blocked tasks never re-evaluated, pipeline throughput at 0% for the review→merge→deploy stages.

Fix: replace worker_pid with session_id (which exists in the schema) and adapt the worker-alive check to use session_id + task status instead of PID probing.

coderabbitai · 2026-02-25T03:21:06Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 16 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 4139e1f and 8095574.

📒 Files selected for processing (1)

.agents/scripts/supervisor/ai-lifecycle.sh

gemini-code-assist · 2026-02-25T03:21:11Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug that had completely halted Phase 3 of the AI lifecycle, leading to stalled PRs and zero pipeline throughput. The fix involves updating the database query to reference the correct session_id column, which replaced a previously removed worker_pid column, and adjusting the worker-alive status check accordingly. This ensures that tasks can be properly evaluated and actioned, restoring the intended functionality of the AI lifecycle.

Highlights

Critical Bug Fix: Resolved a critical bug in ai-lifecycle.sh where gather_task_state() queried a non-existent worker_pid column, causing silent SQLite errors and preventing Phase 3 of the AI lifecycle from functioning.
Database Query Update: Replaced the reference to worker_pid with the correct session_id column in the tasks table query within gather_task_state().
Worker State Check Adaptation: Adapted the worker-alive check logic to utilize session_id and task status (e.g., 'running', 'dispatched') instead of attempting to probe a process ID (PID).
Documentation Added: Included a comment in the code to document the schema migration and prevent similar issues from recurring.

Changelog

.agents/scripts/supervisor/ai-lifecycle.sh
- Updated the database query to select session_id instead of worker_pid.
- Modified the parsing of the database row to use tsession instead of tpid.
- Adjusted the worker process state check to rely on session_id and task status.
- Added a comment explaining the worker_pid to session_id migration.

Activity

No human activity has been recorded on this pull request yet.

github-actions · 2026-02-25T03:21:38Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 56 code smells

[INFO] Recent monitoring activity:
Wed Feb 25 03:21:34 UTC 2026: Code review monitoring started
Wed Feb 25 03:21:34 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 56

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 56
VULNERABILITIES: 0

Generated on: Wed Feb 25 03:21:37 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist

Code Review

This pull request provides a critical fix for the AI lifecycle, which was completely non-operational due to a query referencing a non-existent worker_pid column. The change correctly replaces worker_pid with session_id and adapts the worker-alive check to use the task's status, which is a robust solution.

sonarqubecloud · 2026-02-25T03:22:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

augmentcode · 2026-02-25T03:23:34Z

🤖 Augment PR Summary

Summary: Restores Phase 3 AI lifecycle by fixing gather_task_state() to query session_id instead of the removed worker_pid column.
Changes: Updates DB row parsing and worker-state reporting to be session/status-based, and documents the migration-related failure mode.

augmentcode

Review completed. 2 suggestions posted.

augmentcode · 2026-02-25T03:23:35Z

.agents/scripts/supervisor/ai-lifecycle.sh

 	task_row=$(db -separator '|' "$SUPERVISOR_DB" "
 		SELECT id, status, pr_url, repo, branch, worktree, error,
-		       rebase_attempts, retries, max_retries, model, worker_pid
+		       rebase_attempts, retries, max_retries, model, session_id


This fixes the SELECT, but _dispatch_ai_worker later still does UPDATE tasks ... worker_pid = ... (with errors suppressed), which likely keeps worker state updates silently failing on the migrated schema. Consider aligning that path to write session_id (and/or rely on cmd_transition --session) so lifecycle decisions don’t diverge from actual worker state.

Severity: high

Other Locations

.agents/scripts/supervisor/ai-lifecycle.sh:684

augmentcode · 2026-02-25T03:23:35Z

.agents/scripts/supervisor/ai-lifecycle.sh

-			worker_alive="yes"
+	if [[ -n "$tsession" && "$tsession" != "0" && "$tsession" != "" ]]; then
+		# Session exists — check if the task is in an active-worker state
+		if [[ "$tstatus" == "running" || "$tstatus" == "dispatched" ]]; then


The new worker_alive heuristic only checks status + non-empty session_id, so if a worker crashes but status remains running/dispatched the AI may wait indefinitely. If session_id is pid:<n> (or a remote token), validating liveness via the PID file and/or log file (as the comment suggests) would make this more robust.

Severity: medium
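One way to act on this suggestion is sketched below. It is an illustration, not code from the PR: it assumes, as the reviewer does conditionally, that session_id may take the form pid:<n>, and `worker_alive` is a hypothetical helper name. Tokens that are not local PIDs are reported as "unknown" rather than guessed at.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes", "no", or "unknown" for a task's worker.
# Assumption: session_id may look like "pid:<n>"; other non-empty tokens
# (e.g. remote sessions) cannot be probed locally.
worker_alive() {
	local session="$1" status="$2"

	# Only running/dispatched tasks are expected to have a live worker.
	if [[ "$status" != "running" && "$status" != "dispatched" ]]; then
		echo "no"
		return 0
	fi

	if [[ "$session" =~ ^pid:([1-9][0-9]*)$ ]]; then
		# kill -0 sends no signal; it only tests that the PID exists and can be
		# signalled by this user (a process owned by another user also fails).
		if kill -0 "${BASH_REMATCH[1]}" 2>/dev/null; then
			echo "yes"
		else
			echo "no"
		fi
	elif [[ -n "$session" ]]; then
		echo "unknown"
	else
		echo "no"
	fi
}

# The current shell is certainly alive, so this prints "yes".
worker_alive "pid:$$" running
```

Returning a third "unknown" state keeps the status-only heuristic available as a fallback while letting a crashed local worker be detected instead of waited on indefinitely.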

… issue tag drift detection (t1336) (#2276)

Add three new sanity check sections to detect systemic pipeline failures:
- Section 6: Pipeline phase health. Detects Phase 3 zero-eval streaks, gather_task_state errors, dispatch stalls, underutilisation.
- Section 7: Schema validation. Verifies required columns exist in tasks table via PRAGMA (catches the exact worker_pid bug from PR #2275).
- Section 8: Cross-repo issue tag truthfulness. Compares GH issue labels against DB state for all managed repos.

Update sanity prompt to instruct AI about new sections 6-8 with appropriate action recommendations. Add Phase 3 throughput metrics (last eval/actioned, zero-eval rate) to build_health_context() so the AI reasoner has visibility into whether the lifecycle phase is actually working.
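The Section 7 idea, validating required columns via PRAGMA, might look roughly like the sketch below. It is not the code from #2276: `missing_columns` is a hypothetical helper, and the sample schema is made up so the block runs without a database. It assumes the sqlite3 CLI's default list output for `PRAGMA table_info`, where each row is cid|name|type|notnull|dflt_value|pk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `PRAGMA table_info(<table>)` output on stdin and
# prints each required column (given as arguments) that is absent.
# Returns non-zero if any column is missing.
missing_columns() {
	local existing col missing=0
	existing=$(cut -d'|' -f2) # second field is the column name
	for col in "$@"; do
		if ! grep -qxF -- "$col" <<<"$existing"; then
			echo "$col"
			missing=1
		fi
	done
	return "$missing"
}

# Made-up PRAGMA output standing in for the migrated schema.
sample_schema='0|id|TEXT|0||1
1|status|TEXT|0||0
2|session_id|TEXT|0||0'

missing=$(printf '%s\n' "$sample_schema" | missing_columns id status worker_pid) || true
echo "missing: ${missing:-none}"
```

Against a real database the input would come from something like `sqlite3 "$SUPERVISOR_DB" "PRAGMA table_info(tasks);"`, with the required column list kept next to the queries that depend on it.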

…lumn — Phase 3 stuck on first task (#2278)

The UPDATE in _dispatch_ai_worker (line 687) referenced worker_pid, a column removed in a prior schema migration. SQLite rejected the entire UPDATE (including status='running') and the error was suppressed, so the dispatched task never transitioned out of 'blocked'. On the next pulse the same task was first in priority order, got actioned again, and failed the same way. The other 19 eligible tasks were never reached.

Same root cause as PR #2275 (gather_task_state), different location.

Fix: replace worker_pid column with session_id (matching the pattern used by dispatch.sh and deploy.sh).

marcusquinn merged commit 3b85e96 into main Feb 25, 2026
6 of 8 checks passed

gemini-code-assist bot reviewed Feb 25, 2026

augmentcode bot reviewed Feb 25, 2026

marcusquinn mentioned this pull request Feb 25, 2026

t1336: Supervisor self-diagnosis — pipeline health, schema validation, issue tag drift #2276

Merged

marcusquinn mentioned this pull request Feb 25, 2026

feat: AI supervisor pipeline self-heal via reasoning prompt #2277

Merged

marcusquinn mentioned this pull request Feb 25, 2026

fix: Phase 3 stuck on first task — worker_pid column in _dispatch_ai_worker UPDATE #2278

Merged

marcusquinn deleted the bugfix/gather-task-state-worker-pid branch March 3, 2026 03:23
