
t1259: Fix stale evaluating recovery pattern with pre/post-eval heartbeats#1968

Merged
marcusquinn merged 2 commits into main from feature/t1259
Feb 19, 2026

Conversation

@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 19, 2026

Summary

  • Root cause: Phase 0.7 was firing on tasks that completed successfully via the fast-path (evaluate_worker() returns complete: without calling evaluate_with_ai()). These tasks had no heartbeat protection beyond the cmd_transition('evaluating') timestamp. After 240s, Phase 0.7 fired and routed to pr_review unnecessarily, adding an extra recovery cycle to every completed task.

  • Three fixes applied:

    1. Pre-evaluation heartbeat: refresh updated_at immediately before evaluate_worker() so the 240s heartbeat window is anchored to evaluation start, not the earlier cmd_transition('evaluating') call
    2. Post-evaluation heartbeat: refresh updated_at after evaluate_worker() returns complete:* to extend the window through the quality gate and cmd_transition call
    3. Increase fast-path grace from 10s to 30s: evaluate_worker() can take 10-30s for PR discovery via GitHub API; the 10s grace caused false positives when the task was actively being evaluated
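
A minimal sketch of how fixes 1 and 2 wrap the evaluation call, assuming a stubbed evaluator and a temp-file event log in place of the real database write. The wrapper name `evaluate_with_heartbeats`, the `EVENT_LOG` file, and the `complete:stub` outcome are hypothetical and only illustrate call ordering; they are not taken from pulse.sh.

```shell
#!/usr/bin/env bash
# Sketch only: a temp file stands in for the tasks table so the ordering of
# heartbeat writes can be observed without a database.
EVENT_LOG="$(mktemp)"                 # hypothetical event log
trap 'rm -f "$EVENT_LOG"' EXIT

# Stand-in for the heartbeat write (the PR's helper issues an SQL UPDATE).
_update_task_heartbeat() {
	local task_id="$1"
	echo "heartbeat ${task_id}" >>"$EVENT_LOG"
}

# Stub evaluator: always takes the fast path and reports completion.
evaluate_worker() {
	local task_id="$1"
	echo "evaluate ${task_id}" >>"$EVENT_LOG"
	echo "complete:stub"              # hypothetical outcome string
}

# Hypothetical wrapper showing where the two heartbeats sit.
evaluate_with_heartbeats() {
	local task_id="$1" outcome

	# Fix 1: anchor the heartbeat window to evaluation start.
	_update_task_heartbeat "$task_id"

	outcome="$(evaluate_worker "$task_id")"

	# Fix 2: on complete:*, refresh again so the window also covers the
	# quality gate and the final transition.
	if [[ "$outcome" == complete:* ]]; then
		_update_task_heartbeat "$task_id"
	fi

	echo "$outcome"
}

result="$(evaluate_with_heartbeats "t1259")"
echo "outcome: ${result}"
cat "$EVENT_LOG"
```

The log shows one heartbeat before the evaluator runs and one after it returns, which is the ordering the two fixes describe.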

Impact

  • Eliminates unnecessary Phase 0.7 recovery cycles for fast-path completions
  • Reduces dispatch latency by removing the extra pr_review cycle
  • The heartbeat check in _diagnose_stale_root_cause() (240s window) remains the primary protection
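
For context, the 240s window check can be pictured roughly as below. This is a sketch under assumptions, not the body of `_diagnose_stale_root_cause()`: the function name `heartbeat_is_fresh`, the use of epoch seconds instead of the stored ISO-8601 strings, and the strict `<` comparison at the boundary are all illustrative choices.

```shell
#!/usr/bin/env bash
HEARTBEAT_WINDOW_SECONDS=240   # window size quoted in the PR description

# Hypothetical helper: returns 0 (fresh) when the last heartbeat is younger
# than the window. Epoch seconds keep the sketch portable across date tools.
heartbeat_is_fresh() {
	local updated_at="$1" now="$2"
	local age=$(( now - updated_at ))
	(( age < HEARTBEAT_WINDOW_SECONDS ))
}

now=1000
if heartbeat_is_fresh 900 "$now"; then
	echo "age 100s: fresh, recovery is skipped"
fi
if ! heartbeat_is_fresh 700 "$now"; then
	echo "age 300s: stale, recovery may fire"
fi
```

Refreshing `updated_at` before and after evaluation moves the first argument forward, so a fast-path task stays on the "fresh" side of this comparison for longer.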

Files Changed

  • .agents/scripts/supervisor/pulse.sh: Pre/post-eval heartbeats + increased fast-path grace period

Ref #1967

Summary by CodeRabbit

  • Chores
    • Improved task evaluation reliability by refreshing heartbeats immediately before and after evaluation runs, ensuring timestamps reflect actual start/end.
    • Increased fast-path grace window from 10s to 30s to reduce premature recovery during evaluations.
    • Made heartbeat and recovery handling more consistent across evaluation paths to reduce false recoveries and timing-related flakiness.

…artbeats (t1259)

Root cause: Phase 0.7 was firing on tasks that completed successfully via the
fast-path (evaluate_worker() returns complete: without calling evaluate_with_ai()).
These tasks had no heartbeat protection beyond the cmd_transition('evaluating')
timestamp. If the pulse was killed between evaluate_worker() returning and
cmd_transition('complete'), the task stayed in evaluating with a stale updated_at.
After 240s (heartbeat_window), Phase 0.7 fired and routed to pr_review unnecessarily.

Three fixes:
1. Pre-evaluation heartbeat: refresh updated_at immediately before evaluate_worker()
   so the 240s heartbeat window is anchored to evaluation start, not the earlier
   cmd_transition('evaluating') call.
2. Post-evaluation heartbeat: refresh updated_at after evaluate_worker() returns
   complete:* to extend the window through the quality gate and cmd_transition call.
3. Increase fast-path grace from 10s to 30s: evaluate_worker() can take 10-30s for
   PR discovery via GitHub API. The 10s grace caused false positives when the task
   was actively being evaluated but updated_at was 10-30s old.

The heartbeat check in _diagnose_stale_root_cause() (240s window) remains the
primary protection; these changes reduce the frequency of unnecessary recovery
cycles and improve dispatch latency.
@coderabbitai
Contributor

coderabbitai bot commented Feb 19, 2026

Walkthrough

This change updates .agents/scripts/supervisor/pulse.sh to add a new _update_task_heartbeat() helper and perform explicit pre- and post-evaluation heartbeat writes around evaluate_worker() calls, and increases the fast-path grace for evaluating+PR from 10s to 30s across related fast-path and recovery branches.

Changes

Cohort / File(s) | Summary
Supervisor Pulse Heartbeat & Grace Period Updates (.agents/scripts/supervisor/pulse.sh) | Adds _update_task_heartbeat() to write updated_at; calls it immediately before evaluate_worker() and after a successful completion; raises SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS from 10s to 30s and applies the larger grace across Phase 1, 0.7/0.8 fast-paths and PR-related branches; updates comments documenting t1259 usage and heartbeat anchoring.
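
To illustrate why the grace increase matters, here is a small sketch assuming the variable is read with an environment-overridable default and compared against the age of `updated_at`. The helper `within_fast_path_grace` and the comparison shape are hypothetical; only the variable name and the 10s and 30s values come from this PR.

```shell
#!/usr/bin/env bash
# Default of 30 comes from this PR; the override-by-environment pattern is an
# assumption about how pulse.sh reads it.
grace="${SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS:-30}"

# Hypothetical helper: returns 0 when the task was touched within the grace
# window and should be left alone.
within_fast_path_grace() {
	local age_seconds="$1" grace_seconds="$2"
	(( age_seconds < grace_seconds ))
}

age=20   # updated_at is 20s old, e.g. PR discovery is still in progress
for g in 10 "$grace"; do
	if within_fast_path_grace "$age" "$g"; then
		echo "grace ${g}s: task age ${age}s is inside the window, left alone"
	else
		echo "grace ${g}s: task age ${age}s is outside the window, fast-path may act"
	fi
done
```

With the old 10s value a task that is 20s into evaluation falls outside the window, which is the false positive described above.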

Sequence Diagram(s)

sequenceDiagram
    participant Supervisor
    participant Database
    participant Evaluator as evaluate_worker()

    Supervisor->>Database: _update_task_heartbeat() (set updated_at = now)
    Supervisor->>Evaluator: start evaluate_worker()
    Evaluator->>Supervisor: running / progress updates (optional)
    Evaluator-->>Supervisor: outcome = "complete"
    Supervisor->>Database: _update_task_heartbeat() (set updated_at = now)
    Supervisor->>Supervisor: apply fast-path grace (30s) / recovery checks

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

⏱️ A tiny heartbeat, written twice,
Thirty seconds calm the noisy splice.
Pre and post the eval takes place,
Timestamps steady, races trace—
Supervisor hums, the pulse keeps pace. 💓

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title directly addresses the main change: fixing stale evaluating recovery by adding pre/post-evaluation heartbeats, which aligns with the changeset's core purpose.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where tasks completing via the fast-path were erroneously flagged for recovery due to an outdated heartbeat mechanism. By introducing explicit pre- and post-evaluation heartbeats, the system now accurately reflects the active evaluation state, preventing unnecessary recovery cycles. Additionally, the fast-path grace period has been extended to accommodate the time required for PR discovery, further reducing false positives and improving overall dispatch latency.

Highlights

  • Pre-evaluation Heartbeat: Implemented a heartbeat to refresh the task's updated_at timestamp immediately before evaluate_worker() is called, ensuring the evaluation start time is correctly anchored for fast-path completions.
  • Post-evaluation Heartbeat: Added a heartbeat to refresh the task's updated_at timestamp after evaluate_worker() returns a complete:* outcome, extending the grace window through the quality gate and final transition.
  • Increased Fast-Path Grace Period: Increased the SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS from 10 seconds to 30 seconds to prevent false recoveries for tasks actively undergoing PR discovery via the GitHub API.

Changelog

  • .agents/scripts/supervisor/pulse.sh
    • Updated comments to reflect the t1259 fix for evaluation heartbeats.
    • Modified the default SUPERVISOR_FAST_PATH_EVALUATING_GRACE_SECONDS from 10 seconds to 30 seconds.
    • Introduced a pre-evaluation heartbeat to update the task's updated_at timestamp before evaluate_worker() is called.
    • Added a post-evaluation heartbeat to update the task's updated_at timestamp after evaluate_worker() returns a complete:* outcome.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 14:51:50 UTC 2026: Code review monitoring started
Thu Feb 19 14:51:51 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 14:51:54 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a race condition where a recovery mechanism was incorrectly triggered for successfully completed tasks. The introduction of pre- and post-evaluation heartbeats, along with increasing the fast-path grace period, are solid fixes. The code is well-commented, explaining the rationale behind the changes. My suggestion to refactor a small piece of duplicated code to improve maintainability, as noted in my comment, remains valid and aligns with best practices for shell scripts.

Extract duplicate heartbeat SQL into _update_task_heartbeat() helper
to satisfy DRY principle flagged by Gemini review on PR #1968.
Both pre- and post-evaluation heartbeat calls now use the shared helper.
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 15:58:49 UTC 2026: Code review monitoring started
Thu Feb 19 15:58:50 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 15:58:52 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
.agents/scripts/supervisor/pulse.sh (1)

527-530: Add a defensive empty-id guard to _update_task_heartbeat().

If task_id is empty, the query executes WHERE id = '' — a harmless no-op since no task has an empty primary key, but it still incurs an unnecessary DB round-trip. A one-line guard closes the gap and makes the precondition explicit.

🛡️ Proposed defensive guard
 _update_task_heartbeat() {
 	local task_id="$1"
+	[[ -z "$task_id" ]] && return 0
 	db "$SUPERVISOR_DB" "UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || true
 }
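
The effect of the proposed guard can be checked in isolation with stubs. The sketch below assumes a counting `db` stub and a simple quote-doubling `sql_escape`; both are hypothetical stand-ins for the real helpers, as are `DB_CALLS`, `LAST_SQL`, and the placeholder database path. The body of `_update_task_heartbeat()` is the version from the suggestion above.

```shell
#!/usr/bin/env bash
DB_CALLS=0                                  # hypothetical round-trip counter
LAST_SQL=""                                 # hypothetical: last statement seen
SUPERVISOR_DB="/tmp/supervisor-sketch.db"   # placeholder path, never opened

# Stub: records the statement instead of executing it.
db() {
	DB_CALLS=$(( DB_CALLS + 1 ))
	LAST_SQL="$2"
}

# Stub: doubles single quotes, the standard SQL string-literal escape.
sql_escape() {
	local q="'"
	printf '%s' "${1//$q/$q$q}"
}

# Helper as proposed in the review, including the empty-id guard.
_update_task_heartbeat() {
	local task_id="$1"
	[[ -z "$task_id" ]] && return 0
	db "$SUPERVISOR_DB" "UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || true
}

_update_task_heartbeat ""        # guard returns before db is reached
_update_task_heartbeat "t1259"   # one stubbed round-trip
echo "db calls: ${DB_CALLS}"     # prints "db calls: 1"
```

Without the guard the first call would also reach `db`, so the counter would read 2.
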
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 527 - 530, The
_update_task_heartbeat function should short-circuit when task_id is empty to
avoid a needless DB round-trip; add a one-line guard at the top of
_update_task_heartbeat that checks if task_id is empty (e.g. test for
zero-length) and returns immediately if so, then proceed with the existing db
call and sql_escape usage unchanged.

@marcusquinn marcusquinn merged commit 61560e3 into main Feb 19, 2026
11 checks passed
@marcusquinn marcusquinn deleted the feature/t1259 branch February 19, 2026 19:03