Skip to content

fix(keepalive): prevent race condition from resetting iteration counter#129

Merged
stranske merged 1 commit intomainfrom
fix/keepalive-iteration-race
Dec 25, 2025
Merged

fix(keepalive): prevent race condition from resetting iteration counter#129
stranske merged 1 commit intomainfrom
fix/keepalive-iteration-race

Conversation

@stranske
Copy link
Copy Markdown
Owner

@stranske stranske commented Dec 25, 2025

Problem

PR #124's iteration counter was showing 1 after two successful Codex runs. The keepalive loop summary showed iteration:0 in state despite multiple successful runs completing.

Root Cause

The updateKeepaliveLoopSummary function used inputs.iteration (from the evaluate job at workflow START) instead of reading the current state's iteration. When multiple runs overlap or a later run's evaluate job runs before an earlier run's summary saves, stale iteration values overwrite newer ones.

In the logs, you could see:

iteration: Number('0') || 0

The evaluate job reads state at the START of the workflow, but by the time the summary job runs, another workflow may have already incremented the iteration.

Solution

The summary job now reads the iteration from the current persisted state (previousState.iteration) before calculating nextIteration, rather than trusting the potentially-stale inputs.iteration.

// OLD (buggy):

// NEW (fixed):
const currentIteration = toNumber(previousState?.iteration ?? iteration, 0);
let nextIteration = currentIteration;

Testing

  • Added new test: "updateKeepaliveLoopSummary uses state iteration when inputs have stale value"
  • All 207 tests pass

Related

Discovered while investigating #124 iteration counter not incrementing

Automated Status Summary

Scope

  • After merging PR chore(codex): bootstrap PR for issue #101 #103 (multi-agent routing infrastructure), we need to:
  • 1. Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
  • 2. Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
  • 3. Streamline the Automated Status Summary to reduce clutter when using CLI agents
  • 4. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Tasks

  • ### Pipeline Validation
  • After PR chore(codex): bootstrap PR for issue #101 #103 merges, create a test PR with agent:codex label
  • Verify task appendix appears in Codex prompt (check workflow logs)
  • Verify Codex works on actual tasks (not random infrastructure work)
  • Verify keepalive comment updates with iteration progress
  • ### GITHUB_STEP_SUMMARY
  • Add step summary output to agents-keepalive-loop.yml after agent run
  • Include: iteration number, tasks completed, files changed, outcome
  • Ensure summary is visible in workflow run UI
  • ### Conditional Status Summary
  • Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept agentType parameter
  • When agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
  • Keep Scope/Tasks/Acceptance checkboxes for all cases
  • Pass agent type from workflow to the update_body job
  • ### Comment Pattern Cleanup
  • For CLI agents (agent:* label):
  • Suppress <!-- gate-summary: --> comment posting (use step summary instead)
  • Suppress <!-- keepalive-round: N --> instruction comments (task appendix replaces this)
  • Update <!-- keepalive-loop-summary --> to be the single source of truth
  • Ensure state marker is embedded in the summary comment (not separate)
  • For UI Codex (no agent:* label):
  • Keep existing comment patterns (instruction comments, connector bot reports)
  • Keep <!-- gate-summary: --> comment
  • Add agent_type output to detect job so downstream workflows know the mode
  • Update agents-pr-meta.yml to conditionally skip gate summary for CLI agent PRs

Acceptance criteria

  • CLI agent receives explicit tasks in prompt and works on them
  • Iteration results visible in Actions workflow run summary
  • PR body shows checkboxes but not workflow clutter when using CLI agents
  • UI Codex path (no agent label) continues to show full status summary
  • CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
  • State tracking is consolidated in the summary comment, not scattered
  • ## Dependencies
  • - Requires PR chore(codex): bootstrap PR for issue #101 #103 to be merged first
  • Head SHA: dbe2ff0
  • Latest Runs: ❔ in progress — Agents PR meta manager
  • Required: gate: ⏸️ not started
  • | Workflow / Job | Result | Logs |
  • |----------------|--------|------|
  • | Agents PR meta manager | ❔ in progress | View run |

Head SHA: e16dbd9
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

Workflow / Job Result Logs
Agents PR meta manager ❔ in progress View run
CI Autofix Loop ✅ success View run
Copilot code review ❔ in progress View run
Gate ✅ success View run
Health 40 Sweep ✅ success View run
Health 44 Gate Branch Protection ✅ success View run
Health 45 Agents Guard ✅ success View run
Health 50 Security Scan ✅ success View run
Maint 52 Validate Workflows ✅ success View run
PR 11 - Minimal invariant CI ✅ success View run
Selftest CI ✅ success View run

The summary job was using inputs.iteration (from evaluate job at workflow
START) instead of reading the current state's iteration. When multiple
runs overlap or a later run's evaluate runs before an earlier run's
summary saves, stale iteration values overwrite newer ones.

Now reads iteration from the current persisted state before calculating
nextIteration, ensuring we never lose iteration progress due to timing.
Copilot AI review requested due to automatic review settings December 25, 2025 02:33
@github-actions
Copy link
Copy Markdown
Contributor

Automated Status Summary

Head SHA: 2afab0f
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

Metric Value
Current 77.97%
Baseline 0.00%
Delta +77.97%
Minimum 70.00%
Status ✅ Pass

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Keepalive Loop Status

PR #129 | Agent: Codex | Iteration 0/5

Current State

Metric Value
Iteration progress [----------] 0/5
Action wait (missing-agent-label)
Gate success
Tasks 0/39 complete
Keepalive ❌ disabled
Autofix ❌ disabled

⚠️ Failure Tracking

| Consecutive failures | 1/3 |
| Reason | missing-agent-label |

@stranske stranske merged commit 948224c into main Dec 25, 2025
133 checks passed
@stranske stranske deleted the fix/keepalive-iteration-race branch December 25, 2025 02:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a race condition in the keepalive loop's iteration counter where stale iteration values from the evaluate job could overwrite newer values in persisted state. When multiple workflow runs overlap, the evaluate job captures state at workflow start, but by the time the summary job runs, another workflow may have already incremented the iteration, causing the counter to reset incorrectly.

Key Changes:

  • Modified updateKeepaliveLoopSummary to read iteration from persisted state instead of trusting potentially stale input values
  • Added comprehensive test coverage for the race condition scenario

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
.github/scripts/keepalive_loop.js Updated updateKeepaliveLoopSummary to prioritize previousState.iteration over inputs.iteration, preventing stale values from overwriting current state
.github/scripts/tests/keepalive-loop.test.js Added test case that simulates race condition with stale inputs but current state, verifying iteration is preserved from state

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants