t1259: Fix stale evaluating recovery pattern with pre/post-eval heartbeats by marcusquinn · Pull Request #1968 · marcusquinn/aidevops

marcusquinn · 2026-02-19T14:51:20Z

Summary

Root cause: Phase 0.7 was firing on tasks that completed successfully via the fast-path (evaluate_worker() returns a complete:* outcome without calling evaluate_with_ai()). These tasks had no heartbeat protection beyond the cmd_transition('evaluating') timestamp. After 240s, Phase 0.7 fired and routed them to pr_review unnecessarily, adding an extra recovery cycle to every completed task.
Three fixes applied:
1. Pre-evaluation heartbeat: refresh updated_at immediately before evaluate_worker() so the 240s heartbeat window is anchored to evaluation start, not the earlier cmd_transition('evaluating') call
2. Post-evaluation heartbeat: refresh updated_at after evaluate_worker() returns complete:* to extend the window through the quality gate and cmd_transition call
3. Increase fast-path grace from 10s to 30s: evaluate_worker() can take 10-30s for PR discovery via the GitHub API, so the 10s grace caused false positives while a task was still actively being evaluated
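The placement of fixes 1 and 2 around evaluate_worker() can be sketched as below. This is an illustration under stated assumptions, not the pulse.sh code: evaluate_worker and _update_task_heartbeat are stubs, heartbeats go to an in-memory array instead of the supervisor DB, and the wrapper evaluate_with_heartbeats and the FAKE_OUTCOME variable are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: every function here is a stand-in, not the real pulse.sh code.
# Assumption: evaluate_worker prints an outcome string such as "complete:...".

HEARTBEAT_LOG=()

# Stub for _update_task_heartbeat: records which task was touched.
_update_task_heartbeat() {
	HEARTBEAT_LOG+=("$1")
}

# Stub for evaluate_worker. FAKE_OUTCOME (hypothetical) lets the example
# simulate a non-complete result; the real call may spend 10-30s on PR discovery.
evaluate_worker() {
	echo "${FAKE_OUTCOME:-complete:pr_found}"
}

# Hypothetical wrapper showing where the two t1259 heartbeats sit.
evaluate_with_heartbeats() {
	local task_id="$1" outcome

	# Fix 1: anchor the 240s heartbeat window to evaluation start,
	# not to the earlier cmd_transition('evaluating') call.
	_update_task_heartbeat "$task_id"

	outcome=$(evaluate_worker "$task_id")

	case "$outcome" in
	complete:*)
		# Fix 2: extend the window through the quality gate and the
		# cmd_transition call that follows a fast-path completion.
		_update_task_heartbeat "$task_id"
		;;
	esac

	echo "$outcome"
}
```

Calling `evaluate_with_heartbeats t1259` directly (not inside `$(...)`, which would run in a subshell and discard the array update) writes two heartbeats on a complete:* outcome and one otherwise.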

Impact

Eliminates unnecessary Phase 0.7 recovery cycles for fast-path completions
Reduces dispatch latency by removing the extra pr_review cycle
The heartbeat check in _diagnose_stale_root_cause() (240s window) remains the primary protection
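To see why anchoring matters, here is a minimal sketch of a 240s staleness rule. It assumes timestamps are already converted to epoch seconds and uses a hypothetical helper name (is_heartbeat_stale) and override variable (HEARTBEAT_WINDOW_SECONDS, echoing the heartbeat_window mentioned in the commit message); it is not the body of _diagnose_stale_root_cause().

```bash
#!/usr/bin/env bash
# Sketch of a heartbeat-window check; names and epoch-second inputs are
# assumptions, not taken from pulse.sh.

HEARTBEAT_WINDOW_SECONDS="${HEARTBEAT_WINDOW_SECONDS:-240}"

# is_heartbeat_stale UPDATED_AT_EPOCH NOW_EPOCH
# Exit status 0 (stale) when updated_at is older than the window,
# 1 (fresh) otherwise. Recovery would only act on a stale task.
is_heartbeat_stale() {
	local updated_at="$1" now="$2"
	local age=$(( now - updated_at ))
	(( age > HEARTBEAT_WINDOW_SECONDS ))
}
```

With illustrative numbers: if a task entered evaluating at t=0 and evaluate_worker() started at t=200, a check at t=260 sees `is_heartbeat_stale 0 260` as stale, but if the pre-evaluation heartbeat moved updated_at to t=200, `is_heartbeat_stale 200 260` reports fresh.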

Files Changed

.agents/scripts/supervisor/pulse.sh: Pre/post-eval heartbeats + increased fast-path grace period

Ref #1967

Summary by CodeRabbit

Chores
- Improved task evaluation reliability by refreshing heartbeats immediately before and after evaluation runs, ensuring timestamps reflect actual start/end.
- Increased fast-path grace window from 10s to 30s to reduce premature recovery during evaluations.
- Made heartbeat and recovery handling more consistent across evaluation paths to reduce false recoveries and timing-related flakiness.

…artbeats (t1259)

Root cause: Phase 0.7 was firing on tasks that completed successfully via the fast-path (evaluate_worker() returns a complete:* outcome without calling evaluate_with_ai()). These tasks had no heartbeat protection beyond the cmd_transition('evaluating') timestamp. If the pulse was killed between evaluate_worker() returning and cmd_transition('complete'), the task stayed in evaluating with a stale updated_at. After 240s (heartbeat_window), Phase 0.7 fired and routed to pr_review unnecessarily.

Three fixes:
1. Pre-evaluation heartbeat: refresh updated_at immediately before evaluate_worker() so the 240s heartbeat window is anchored to evaluation start, not the earlier cmd_transition('evaluating') call.
2. Post-evaluation heartbeat: refresh updated_at after evaluate_worker() returns complete:* to extend the window through the quality gate and cmd_transition call.
3. Increase fast-path grace from 10s to 30s: evaluate_worker() can take 10-30s for PR discovery via the GitHub API. The 10s grace caused false positives when the task was actively being evaluated but updated_at was 10-30s old.

The heartbeat check in _diagnose_stale_root_cause() (240s window) remains the primary protection; these changes reduce the frequency of unnecessary recovery cycles and improve dispatch latency.

coderabbitai · 2026-02-19T14:51:47Z

Walkthrough

This change updates .agents/scripts/supervisor/pulse.sh to add a new _update_task_heartbeat() helper and perform explicit pre- and post-evaluation heartbeat writes around evaluate_worker() calls, and increases the fast-path grace for evaluating+PR from 10s to 30s across related fast-path and recovery branches.

Changes

Cohort / File(s)	Summary
Supervisor Pulse Heartbeat & Grace Period Updates `.agents/scripts/supervisor/pulse.sh`	Adds `_update_task_heartbeat()` to write `updated_at`; calls it immediately before `evaluate_worker()` and after a successful completion; raises `SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS` from 10 → 30s and applies the larger grace across Phase 1, 0.7/0.8 fast-paths and PR-related branches; updates comments documenting t1259 usage and heartbeat anchoring.
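A runnable sketch of the helper is below. The UPDATE statement mirrors the one quoted later in the CodeRabbit nitpick; db and sql_escape are stubs here, and their behaviour (recording the SQL, doubling single quotes) plus the default SUPERVISOR_DB path are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Sketch of _update_task_heartbeat with stand-in db/sql_escape helpers.

SUPERVISOR_DB="${SUPERVISOR_DB:-/tmp/supervisor.db}"  # hypothetical path

# Stub for the real db wrapper: records the SQL instead of running sqlite3.
db() {
	LAST_SQL="$2"
}

# Stub escaper (assumption): ' -> '' so a task id stays inside its literal.
sql_escape() {
	local q="'"
	printf '%s' "${1//$q/$q$q}"
}

# Refresh updated_at so stale-evaluating checks see recent activity.
# Errors are swallowed: a failed heartbeat should not abort the pulse.
_update_task_heartbeat() {
	local task_id="$1"
	db "$SUPERVISOR_DB" "UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || true
}
```

After `_update_task_heartbeat "t1259"`, LAST_SQL ends with `WHERE id = 't1259';`. Putting the SQL in one helper is what the follow-up commit did to remove the duplicated pre/post-evaluation statements.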

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Supervisor
    participant Database
    participant Evaluator as evaluate_worker()

    Supervisor->>Database: _update_task_heartbeat() (set updated_at = now)
    Supervisor->>Evaluator: start evaluate_worker()
    Evaluator->>Supervisor: running / progress updates (optional)
    Evaluator-->>Supervisor: outcome = "complete"
    Supervisor->>Database: _update_task_heartbeat() (set updated_at = now)
    Supervisor->>Supervisor: apply fast-path grace (30s) / recovery checks
```
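The last step of the diagram, the fast-path grace check, can be sketched as follows. SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS is the variable named in the PR; reading it with a 30s default and the helper name within_fast_path_grace are assumptions, and this is a simplified reading of the check rather than the Phase 1 / 0.7 / 0.8 branches themselves.

```bash
#!/usr/bin/env bash
# Sketch of the evaluating+PR fast-path grace check (simplified, hypothetical).

SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS="${SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS:-30}"

# within_fast_path_grace SECONDS_SINCE_UPDATE
# Exit status 0 while the task is still inside the grace window, meaning
# the pulse should leave it alone instead of treating it as stale.
within_fast_path_grace() {
	local age="$1"
	(( age <= SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS ))
}
```

Under this sketch, a task whose updated_at is 25s old during PR discovery is left alone with the new 30s default, whereas the old 10s value would have flagged it, which is the false positive fix 3 targets.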

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

t1254: Fix stale-evaluating periodic heartbeat — eliminate false recoveries for long evals #1958: Similar supervisor heartbeat and evaluation grace-window adjustments affecting the same pulse/evaluation paths.
t1183: Fix 3 supervisor pipeline bugs — PR-aware reaping, atomic eval, batch merge #1790: Changes Phase 1 evaluation-to-complete transitions and PR-aware state handling in the supervisor pulse flow.
t1245: fix stale evaluating — persist PR URL to DB immediately in evaluate_worker() #1940: Adds PR-persist checkpoint and pulse recovery handling related to stale "evaluating" states—overlaps heartbeat/recovery concerns.

Poem

⏱️ A tiny heartbeat, written twice,
Thirty seconds calm the noisy splice.
Pre and post the eval takes place,
Timestamps steady, races trace—
Supervisor hums, the pulse keeps pace. 💓

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: fixing stale evaluating recovery by adding pre/post-evaluation heartbeats, which aligns perfectly with the changeset's core purpose.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


gemini-code-assist · 2026-02-19T14:51:48Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where tasks completing via the fast-path were erroneously flagged for recovery due to an outdated heartbeat mechanism. By introducing explicit pre- and post-evaluation heartbeats, the system now accurately reflects the active evaluation state, preventing unnecessary recovery cycles. Additionally, the fast-path grace period has been extended to accommodate the time required for PR discovery, further reducing false positives and improving overall dispatch latency.

Highlights

Pre-evaluation Heartbeat: Implemented a heartbeat to refresh the task's updated_at timestamp immediately before evaluate_worker() is called, ensuring the evaluation start time is correctly anchored for fast-path completions.
Post-evaluation Heartbeat: Added a heartbeat to refresh the task's updated_at timestamp after evaluate_worker() returns a complete:* outcome, extending the grace window through the quality gate and final transition.
Increased Fast-Path Grace Period: Raised SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS from 10 seconds to 30 seconds to prevent false recoveries for tasks still undergoing PR discovery via the GitHub API.

Changelog

.agents/scripts/supervisor/pulse.sh
- Updated comments to reflect the t1259 fix for evaluation heartbeats.
- Modified the default SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS from 10 seconds to 30 seconds.
- Introduced a pre-evaluation heartbeat to update the task's updated_at timestamp before evaluate_worker() is called.
- Added a post-evaluation heartbeat to update the task's updated_at timestamp after evaluate_worker() returns a complete:* outcome.


github-actions · 2026-02-19T14:51:55Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 14:51:50 UTC 2026: Code review monitoring started
Thu Feb 19 14:51:51 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 14:51:54 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist

Code Review

This pull request effectively addresses a race condition where a recovery mechanism was incorrectly triggered for successfully completed tasks. The introduction of pre- and post-evaluation heartbeats, along with increasing the fast-path grace period, are solid fixes. The code is well-commented, explaining the rationale behind the changes. My suggestion to refactor a small piece of duplicated code to improve maintainability, as noted in my comment, remains valid and aligns with best practices for shell scripts.

.agents/scripts/supervisor/pulse.sh


github-actions · 2026-02-19T15:58:53Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 15:58:49 UTC 2026: Code review monitoring started
Thu Feb 19 15:58:50 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 15:58:52 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-19T15:59:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


coderabbitai

🧹 Nitpick comments (1)

.agents/scripts/supervisor/pulse.sh (1)
527-530: Add a defensive empty-id guard to _update_task_heartbeat().

If task_id is empty, the query executes WHERE id = '' — a harmless no-op since no task has an empty primary key, but it still incurs an unnecessary DB round-trip. A one-line guard closes the gap and makes the precondition explicit.
🛡️ Proposed defensive guard
```diff
 _update_task_heartbeat() {
 	local task_id="$1"
+	[[ -z "$task_id" ]] && return 0
 	db "$SUPERVISOR_DB" "UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || true
 }
```
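A runnable version of the guarded helper, with a counting db stub (an assumption made so the example can observe round-trips), shows that an empty id never reaches the database. The sql_escape stub is likewise hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the proposed guard; db and sql_escape are stand-ins.

DB_CALLS=0

# Counting stub: each call represents one DB round-trip.
db() {
	DB_CALLS=$(( DB_CALLS + 1 ))
}

# Stub escaper (assumption): doubles single quotes.
sql_escape() {
	local q="'"
	printf '%s' "${1//$q/$q$q}"
}

_update_task_heartbeat() {
	local task_id="$1"
	# Guard: an empty id would only produce a no-op UPDATE ... WHERE id = ''.
	[[ -z "$task_id" ]] && return 0
	db "$SUPERVISOR_DB" "UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || true
}
```

`_update_task_heartbeat ""` returns 0 and leaves DB_CALLS at 0; a non-empty id such as `t1259` makes exactly one call.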
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 527 - 530, The
_update_task_heartbeat function should short-circuit when task_id is empty to
avoid a needless DB round-trip; add a one-line guard at the top of
_update_task_heartbeat that checks if task_id is empty (e.g. test for
zero-length) and returns immediately if so, then proceed with the existing db
call and sql_escape usage unchanged.


marcusquinn mentioned this pull request Feb 19, 2026

[Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502

Closed

gemini-code-assist bot reviewed Feb 19, 2026

.agents/scripts/supervisor/pulse.sh (outdated, resolved)

coderabbitai bot approved these changes Feb 19, 2026


fix: address review feedback for t1259

4a74f9c

Extract duplicate heartbeat SQL into _update_task_heartbeat() helper to satisfy DRY principle flagged by Gemini review on PR #1968. Both pre- and post-evaluation heartbeat calls now use the shared helper.

coderabbitai bot reviewed Feb 19, 2026


marcusquinn merged commit 61560e3 into main Feb 19, 2026
11 checks passed

marcusquinn deleted the feature/t1259 branch February 19, 2026 19:03

marcusquinn added a commit that referenced this pull request Feb 19, 2026

chore: mark t1259 complete in TODO.md (#1968)

182e5bf

marcusquinn mentioned this pull request Feb 19, 2026

t1259: Investigate and mitigate stale evaluating recovery pattern across all recent ... #1967

Closed

coderabbitai bot mentioned this pull request Feb 21, 2026

refactor: eliminate evaluating state race condition in supervisor pulse #2056

Merged

marcusquinn merged 2 commits into main from feature/t1259
