fix(keepalive): prevent race condition from resetting iteration counter by stranske · Pull Request #129 · stranske/Workflows

stranske · 2025-12-25T02:33:03Z

Problem

PR #124's iteration counter showed 1 after two successful Codex runs, and the keepalive loop summary showed iteration: 0 in state even though multiple runs had completed successfully.

Root Cause

The updateKeepaliveLoopSummary function used inputs.iteration (the value the evaluate job read at workflow START) instead of the iteration in the current state. When multiple runs overlap, or a later run's evaluate job runs before an earlier run's summary job has saved, the stale iteration value overwrites the newer one.

In the logs, you could see:

iteration: Number('0') || 0

The evaluate job reads state at the START of the workflow, but by the time the summary job runs, another workflow may have already incremented the iteration.
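A minimal simulation of that interleaving helps show how two completed runs can leave the counter at 1. Everything in it is hypothetical: `store` stands in for the persisted keepalive state, `evaluate` and `buggySummary` model the two jobs, and the `+ 1` per successful run is an assumption about how nextIteration advances, not the actual code in keepalive_loop.js.

```javascript
// Hypothetical model of the persisted keepalive state shared by all runs.
const store = { iteration: 0 };

// Evaluate job (sketch): snapshots the iteration at workflow start;
// this snapshot is what reaches the summary job as inputs.iteration.
function evaluate() {
  return { iteration: store.iteration };
}

// Old summary logic (sketch): trusts the evaluate-time snapshot.
function buggySummary(inputs) {
  store.iteration = inputs.iteration + 1;
}

// Two overlapping runs, A and B.
const inputsA = evaluate(); // A starts and reads iteration 0
const inputsB = evaluate(); // B starts before A has saved, also reads 0
buggySummary(inputsA);      // A saves iteration 1
buggySummary(inputsB);      // B saves 1 again, so A's increment is lost

console.log(store.iteration); // 1, although two runs completed
```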

Solution

The summary job now reads the iteration from the current persisted state (previousState.iteration) before calculating nextIteration, rather than trusting the potentially-stale inputs.iteration.

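// OLD (buggy): nextIteration was seeded from inputs.iteration, the value
// the evaluate job read at workflow start.

// NEW (fixed): prefer the iteration from the current persisted state.
const currentIteration = toNumber(previousState?.iteration ?? iteration, 0);
let nextIteration = currentIteration;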
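The fallback order in the fixed line can be checked in isolation. This is a sketch under assumptions: `toNumber` here is a hypothetical stand-in for the repository's helper (its real implementation is not shown in this PR), and `currentIterationFrom` is an illustrative wrapper around the fixed expression.

```javascript
// Hypothetical stand-in for the repository's toNumber helper:
// coerce to a finite number, otherwise return the fallback.
function toNumber(value, fallback) {
  const n = Number(value);
  return Number.isFinite(n) ? n : fallback;
}

// Mirrors the fixed line: the persisted state wins, and the evaluate-time
// input is used only when the state has no iteration (null or undefined).
function currentIterationFrom(previousState, iteration) {
  return toNumber(previousState?.iteration ?? iteration, 0);
}

console.log(currentIterationFrom({ iteration: 2 }, '0')); // 2: stale input ignored
console.log(currentIterationFrom({ iteration: 0 }, '3')); // 0: ?? keeps a stored 0
console.log(currentIterationFrom({}, '3'));               // 3: falls back to the input
console.log(currentIterationFrom(undefined, 'x'));        // 0: non-numeric, default used
```

Using ?? rather than || matters here: with ||, a persisted iteration of 0 would be treated as missing and the evaluate-time input would win again.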

Testing

Added new test: "updateKeepaliveLoopSummary uses state iteration when inputs have stale value"
All 207 tests pass

Automated Status Summary

Scope

After merging PR #103 (chore(codex): bootstrap PR for issue #101), which added the multi-agent routing infrastructure, we need to:
1. Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
2. Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
3. Streamline the Automated Status Summary to reduce clutter when using CLI agents
4. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments
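For item 2, GitHub Actions exposes the path of the job summary file in the GITHUB_STEP_SUMMARY environment variable, and Markdown appended to that file is rendered on the run's summary page. A minimal sketch, assuming a hypothetical `writeIterationSummary` helper and field names chosen for illustration:

```javascript
const fs = require('node:fs');

// Hypothetical helper: append a small Markdown table for one keepalive
// iteration to the job summary file. GitHub Actions puts that file's path
// in GITHUB_STEP_SUMMARY; the optional second argument eases local testing.
function writeIterationSummary(info, summaryPath = process.env.GITHUB_STEP_SUMMARY) {
  if (!summaryPath) {
    return false; // not running inside a GitHub Actions step
  }
  const { pr, agent, iteration, maxIterations, action } = info;
  const lines = [
    `### Keepalive iteration for PR #${pr}`,
    '',
    '| Metric | Value |',
    '| --- | --- |',
    `| Agent | ${agent} |`,
    `| Iteration | ${iteration}/${maxIterations} |`,
    `| Action | ${action} |`,
  ];
  fs.appendFileSync(summaryPath, lines.join('\n') + '\n');
  return true;
}
```

Appending (rather than overwriting) lets several steps in the same job contribute to one summary. Inside actions/github-script, the `core.summary` API from @actions/core is an alternative that writes to the same file.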

Tasks

Acceptance criteria

Head SHA: e16dbd9
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

Workflow / Job	Result	Logs
Agents PR meta manager	❔ in progress	View run
CI Autofix Loop	✅ success	View run
Copilot code review	❔ in progress	View run
Gate	✅ success	View run
Health 40 Sweep	✅ success	View run
Health 44 Gate Branch Protection	✅ success	View run
Health 45 Agents Guard	✅ success	View run
Health 50 Security Scan	✅ success	View run
Maint 52 Validate Workflows	✅ success	View run
PR 11 - Minimal invariant CI	✅ success	View run
Selftest CI	✅ success	View run

The summary job used inputs.iteration (from the evaluate job at workflow START) instead of reading the current state's iteration. When multiple runs overlap, or a later run's evaluate job runs before an earlier run's summary saves, stale iteration values overwrite newer ones. The job now reads the iteration from the current persisted state before calculating nextIteration, so a stale input value can no longer roll back iteration progress.

github-actions · 2025-12-25T02:34:30Z

Automated Status Summary

Head SHA: 2afab0f
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	77.97%
Baseline	0.00%
Delta	+77.97%
Minimum	70.00%
Status	✅ Pass
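The Delta and Status rows follow from the other values: 77.97% − 0.00% = +77.97%, and 77.97% is at or above the 70.00% minimum. A sketch of that arithmetic, where `coverageTrend` and the failure label are hypothetical (the bot's actual formula and wording are not shown here):

```javascript
// Hypothetical reconstruction of the Coverage Trend rows: Delta is
// current minus baseline, and Status passes when current >= minimum.
function coverageTrend(current, baseline, minimum) {
  const delta = current - baseline;
  const sign = delta >= 0 ? '+' : '';
  return {
    delta: `${sign}${delta.toFixed(2)}%`,
    status: current >= minimum ? '✅ Pass' : '❌ Fail',
  };
}

console.log(coverageTrend(77.97, 0, 70)); // { delta: '+77.97%', status: '✅ Pass' }
```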

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

github-actions · 2025-12-25T02:34:54Z

🤖 Keepalive Loop Status

PR #129 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Gate	success
Tasks	0/39 complete
Keepalive	❌ disabled
Autofix	❌ disabled
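The "Iteration progress" cell renders a ten-cell bar next to the iteration count. A sketch of such a renderer, assuming a hypothetical `progressBar` function and `#` as the filled-cell character (only the empty state is visible above):

```javascript
// Hypothetical renderer for the "Iteration progress" cell: a fixed-width
// bar of `width` cells, filled in proportion to iteration / max (clamped).
function progressBar(iteration, max, width = 10) {
  const ratio = max > 0 ? Math.min(Math.max(iteration / max, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  return `[${'#'.repeat(filled)}${'-'.repeat(width - filled)}] ${iteration}/${max}`;
}

console.log(progressBar(0, 5)); // [----------] 0/5
console.log(progressBar(2, 5)); // [####------] 2/5
```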

⚠️ Failure Tracking

Copilot

Pull request overview

This PR fixes a race condition in the keepalive loop's iteration counter where stale iteration values from the evaluate job could overwrite newer values in persisted state. When multiple workflow runs overlap, the evaluate job captures state at workflow start, but by the time the summary job runs, another workflow may have already incremented the iteration, causing the counter to reset incorrectly.

Key Changes:

Modified updateKeepaliveLoopSummary to read iteration from persisted state instead of trusting potentially stale input values
Added a test case covering the race condition scenario

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
.github/scripts/keepalive_loop.js	Updated `updateKeepaliveLoopSummary` to prioritize `previousState.iteration` over `inputs.iteration`, preventing stale values from overwriting current state
.github/scripts/tests/keepalive-loop.test.js	Added test case that simulates race condition with stale inputs but current state, verifying iteration is preserved from state


Copilot AI review requested due to automatic review settings December 25, 2025 02:33

stranske temporarily deployed to agent-standard December 25, 2025 02:33 with GitHub Actions (inactive)

Copilot started reviewing on behalf of stranske December 25, 2025 02:33

stranske merged commit 948224c into main Dec 25, 2025
133 checks passed

stranske deleted the fix/keepalive-iteration-race branch December 25, 2025 02:35

Copilot AI reviewed Dec 25, 2025


stranske mentioned this pull request Jan 2, 2026 in #459 (feat: LangChain-enhanced task completion detection for keepalive; merged, 60 tasks)

stranske merged 1 commit into main from fix/keepalive-iteration-race

stranske commented Dec 25, 2025 (edited by agents-workflows-bot)
