Automated sync from stranske/Workflows
Template hash: bc1620a3c7c4
Changes synced from sync-manifest.yml
🤖 Keepalive Loop Status
PR #272 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification | Error type: infrastructure
Pull request overview
This PR syncs workflow templates from the stranske/Workflows repository, updating the auto-pilot orchestration and keepalive loop logic. The changes enhance the reliability of belt dispatcher workflows by adding retry logic with verification, implementing automatic re-dispatch when dispatchers fail, and improving keepalive loop behavior after review actions.
Changes:
- Enhanced belt dispatcher with 3-attempt retry loop and run verification to handle GitHub Actions silent cancellations (observed in issue #34)
- Added automatic belt dispatcher re-dispatch in branch-check backoff logic when no active dispatcher is found
- Reset rounds_without_task_completion counter to 0 after review actions to give agents a chance to act on feedback
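The first change above (dispatch, wait, verify, retry up to three times) can be sketched roughly as follows. This is a minimal illustration, not the code in agents-auto-pilot.yml: `dispatchWithVerification`, `dispatch`, `verify`, `sleep`, and `log` are hypothetical names injected so the loop can be exercised without GitHub, while the 3 attempts and 15s wait come from the PR description.

```javascript
// Sketch only: every identifier here except maxDispatchAttempts is
// hypothetical and does not come from agents-auto-pilot.yml.
async function dispatchWithVerification({
  dispatch, // async () => void; triggers the dispatcher workflow
  verify, // async () => boolean; true once a dispatcher run is visible
  maxDispatchAttempts = 3,
  waitMs = 15000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  log = console.log,
}) {
  for (let attempt = 1; attempt <= maxDispatchAttempts; attempt++) {
    const hasMore = attempt < maxDispatchAttempts;
    await dispatch();
    await sleep(waitMs); // give the run time to register before checking
    try {
      if (await verify()) {
        log(`Dispatcher run confirmed on attempt ${attempt}`);
        return { ok: true, attempts: attempt };
      }
      log(
        `No dispatcher run found after attempt ${attempt}; ` +
          (hasMore ? 'will retry.' : 'no more attempts.')
      );
    } catch (checkError) {
      // A failed check is treated as "status unknown", not as success.
      log(
        `Could not verify dispatcher run after attempt ${attempt}: ` +
          `${checkError?.message}; status unknown, ` +
          (hasMore ? 'will retry.' : 'no more attempts.')
      );
    }
  }
  return { ok: false, attempts: maxDispatchAttempts };
}
```

Passing `sleep` and `log` as parameters is a design choice for this sketch only; it keeps the loop testable with a zero-length wait.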
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/agents-auto-pilot.yml | Added belt dispatcher retry logic with verification (3 attempts, 15s wait + verification per attempt) and re-dispatch logic in branch-check backoff to handle missing/cancelled dispatchers |
| .github/scripts/keepalive_loop.js | Added special case to reset rounds_without_task_completion counter after review actions, matching the existing force_retry pattern |
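The counter reset described for keepalive_loop.js might look something like the sketch below. The function name, the `action` and `tasksCompleted` parameters, the `'review'` action value, and the rule that a completed task also resets the counter are all assumptions for illustration; only `rounds_without_task_completion` and `force_retry` appear in this PR.

```javascript
// Hypothetical sketch: the real state shape and action names in
// keepalive_loop.js are not shown in this PR excerpt.
function nextRoundsWithoutTaskCompletion(state, { action, tasksCompleted }) {
  // Assumed: any completed task clears the stall counter.
  if (tasksCompleted > 0) return 0;
  // A review action resets the counter the same way force_retry does,
  // so the agent gets a fresh round to act on the feedback.
  if (action === 'force_retry' || action === 'review') return 0;
  return (state.rounds_without_task_completion || 0) + 1;
}
```

With this shape, a stalled loop at 2 rounds goes back to 0 after a review action and to 3 after an ordinary run with no completed tasks.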
// Re-dispatch the belt if no recent dispatcher run is active
// for this issue. Only consider runs created in the last 30
// minutes to avoid matching stale runs for other issues.
let redispatched = false;
try {
  const { data: runs } = await withRetry((client) =>
    client.rest.actions.listWorkflowRuns({
      owner: context.repo.owner,
      repo: context.repo.repo,
      workflow_id: 'agents-71-codex-belt-dispatcher.yml',
      per_page: 10,
    })
  );
  const cutoff = new Date(Date.now() - 30 * 60 * 1000);
  const recentRuns = runs.workflow_runs.filter(
    r => new Date(r.created_at) >= cutoff &&
      r.event === 'workflow_dispatch'
  );
  const alive = recentRuns.find(
    r => r.status === 'queued' || r.status === 'in_progress'
  );
  if (!alive) {
    core.info(
      `No active belt dispatcher run in last 30m ` +
      `(${recentRuns.length} recent runs checked); re-dispatching`
    );
    const { data: repoInfo } = await withRetry((client) =>
      client.rest.repos.get({
        owner: context.repo.owner,
        repo: context.repo.repo,
      })
    );
    const dispatchRef = repoInfo.default_branch || 'main';
    await withRetry((client) =>
      client.rest.actions.createWorkflowDispatch({
        owner: context.repo.owner,
        repo: context.repo.repo,
        workflow_id: 'agents-71-codex-belt-dispatcher.yml',
        ref: dispatchRef,
        inputs: {
          agent_key: agentKey,
          force_issue: String(issueNumber),
          dry_run: 'false',
        },
      })
    );
    redispatched = true;
    core.info(`Re-dispatched belt for issue #${issueNumber}`);
  } else {
    core.info(
      `Belt dispatcher run ${alive.id} still ${alive.status}; skipping re-dispatch`
    );
  }
The re-dispatch logic checks for any active belt dispatcher run in the last 30 minutes but doesn't verify it's for the current issue. The comment at line 2376 states "Re-dispatch the belt if no recent dispatcher run is active for this issue", but the code at lines 2393-2395 checks for any run with status 'queued' or 'in_progress' without filtering by the issue number.
This could cause the re-dispatch to be skipped when another issue's belt dispatcher is running, even though the current issue might need its own dispatcher to be re-dispatched. Consider filtering the recentRuns to only include runs for the current issue, possibly by checking the workflow run's inputs or associated pull request.
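One possible shape for that filter is sketched below. It rests on an assumption that is not confirmed by this PR: that the dispatcher workflow sets a `run-name` embedding the issue number (for example "Belt dispatch #34"), which would then appear as `display_title` on each run. As far as I know, the list-runs response does not include the `workflow_dispatch` inputs themselves, so some such marker would be needed. `findActiveRunForIssue` is a hypothetical helper name.

```javascript
// Assumption: each dispatcher run's display_title contains "#<issue>".
// The dispatcher workflow would need a matching run-name for this to work.
function findActiveRunForIssue(workflowRuns, issueNumber, now = Date.now()) {
  const cutoff = new Date(now - 30 * 60 * 1000); // same 30-minute window
  // \b keeps "#34" from matching "#345".
  const issueTag = new RegExp(`#${issueNumber}\\b`);
  return workflowRuns.find(
    (r) =>
      new Date(r.created_at) >= cutoff &&
      r.event === 'workflow_dispatch' &&
      (r.status === 'queued' || r.status === 'in_progress') &&
      issueTag.test(r.display_title || '')
  );
}
```

With a helper like this, a dispatcher that is still running for another issue would no longer suppress the re-dispatch for the current one.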
} catch (checkError) {
  core.warning(
    `Could not verify dispatcher run after attempt ${attempt}: ` +
    `${checkError?.message}; status unknown, will retry.`
The error message at line 1965 always says "will retry" regardless of which attempt it is. On the final attempt (attempt 3), this message is misleading because there won't be another retry. Consider making this message conditional like the one at line 1960, showing either "will retry" or "no more attempts" based on whether attempt is less than maxDispatchAttempts.
Suggested change:
- `${checkError?.message}; status unknown, will retry.`
+ `${checkError?.message}; ` +
+ (attempt < maxDispatchAttempts ? 'status unknown, will retry.' : 'status unknown, no more attempts.')
Sync Summary
Files Updated
Files Skipped
Review Checklist
Source: stranske/Workflows
Manifest: .github/sync-manifest.yml