Follow-up: address sync PR review feedback #1666
GitHub Actions ::warning:: commands truncate/mangle multi-line content. Emit a short annotation message and print full npm stderr in a collapsible ::group:: instead, so logs stay readable. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
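A minimal sketch of the annotation pattern described above, written in JavaScript for illustration. The helper name `formatNpmFailure` and the 200-character cap are assumptions, not taken from the repository; only the `::warning::`, `::group::` and `::endgroup::` workflow commands are standard GitHub Actions syntax.

```javascript
// Hypothetical helper: build the log lines for a failed npm command.
// The annotation stays on one short line; the full stderr goes inside a
// collapsible group so multi-line content is not mangled.
function formatNpmFailure(summary, stderr) {
  // Collapse whitespace (including newlines) and cap the length (assumed limit).
  const short = summary.replace(/\s+/g, ' ').trim().slice(0, 200);
  return [
    `::warning::${short}`,
    '::group::npm stderr (full output)',
    stderr.trimEnd(),
    '::endgroup::',
  ].join('\n');
}

console.log(
  formatNpmFailure(
    'npm install failed (attempt 1/3)',
    'npm ERR! code ERESOLVE\nnpm ERR! could not resolve dependency'
  )
);
```

Keeping the annotation to a single line is the point of the change: the detail is still in the log, just folded away.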
…ation fixes
Mirror the main setup-api-client changes into the consumer-repo template to prevent template drift:
- Exponential backoff retry (3 attempts, 5s/10s) for transient npm errors
- --legacy-peer-deps fallback on first failure
- Short ::warning:: annotations with full stderr in collapsible ::group::
- Pin lru-cache@10.4.3 (was ^10.0.0)
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
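The retry schedule above (3 attempts with 5s and 10s waits) could be sketched roughly as follows. The real implementation presumably lives in shell inside the action; `backoffDelayMs` and `retryWithBackoff` are hypothetical names, and the injectable `sleep` is an assumption made so the loop can be exercised without real waiting.

```javascript
// Delay before retry n (1-based): 5s, then 10s, doubling each time.
function backoffDelayMs(retryNumber, baseMs = 5000) {
  return baseMs * 2 ** (retryNumber - 1);
}

// Run `task` up to `maxAttempts` times, sleeping between failed attempts.
// `sleep` is injectable; by default it waits with setTimeout.
async function retryWithBackoff(task, { maxAttempts = 3, baseMs = 5000, sleep } = {}) {
  const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; just surface the error.
      if (attempt < maxAttempts) await wait(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

With the defaults, a task that keeps failing is tried three times, with waits of 5000 ms and 10000 ms in between.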
…feguard
Three changes to reusable-codex-run.yml to prevent work loss on timeout:
1. Pre-timeout watchdog: A background timer fires 5 minutes before max_runtime_minutes, committing and pushing any uncommitted work so it survives the job cancellation. Killed automatically if Codex finishes before the timer fires.
2. Robust parser import: Replace sys.path-based import of codex_jsonl_parser with importlib.util.spec_from_file_location. Consumer repos (e.g. Counter_Risk) have their own tools/ package with __init__.py that shadows the Workflows tools/ on sys.path, causing "No module named 'tools.codex_jsonl_parser'".
3. Commit step always runs: Add if: always() to the "Commit and push changes" step so uncommitted work is captured even on non-zero exit codes (the watchdog handles timeout, this handles failures).
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
parseCheckboxStates() and mergeCheckboxStates() only matched top-level checkboxes (^- \[), ignoring indented sub-tasks ( - \[). When PR Meta regenerated the PR body from the issue, auto-reconciled sub-task checkboxes were silently reverted to unchecked. This caused the keepalive loop to stall with rounds_without_task_completion: 8 despite the agent completing real work — PR #256 had 5 tasks auto-checked then immediately un-checked on every push. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
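To illustrate the fix, here is a sketch of checkbox parsing and merging that accepts any indentation. The function names match those in the commit message, but the bodies, the `Map` return type and the text-keyed lookup are assumptions for illustration, not the repository's actual code.

```javascript
// Matches "- [ ] text" and "- [x] text" at any indentation level,
// so indented sub-tasks are no longer skipped.
const CHECKBOX_RE = /^(\s*)- \[( |x|X)\]\s+(.*)$/;

// Collect checkbox text -> checked state from a PR or issue body.
function parseCheckboxStates(body) {
  const states = new Map();
  for (const line of body.split('\n')) {
    const m = line.match(CHECKBOX_RE);
    if (m) states.set(m[3].trim(), m[2].toLowerCase() === 'x');
  }
  return states;
}

// Re-apply previously checked states onto a regenerated body,
// preserving each line's original indentation.
function mergeCheckboxStates(newBody, previous) {
  return newBody
    .split('\n')
    .map(line => {
      const m = line.match(CHECKBOX_RE);
      if (m && m[2] === ' ' && previous.get(m[3].trim()) === true) {
        return `${m[1]}- [x] ${m[3]}`;
      }
      return line;
    })
    .join('\n');
}
```

A pattern anchored on `^- \[` would skip the indented lines entirely, which is how checked sub-tasks ended up reverted on regeneration.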
- P1: Add fetch/rebase before watchdog push to avoid non-fast-forward rejection when another workflow updates the branch during the run. Includes one retry with re-fetch/rebase and merge fallback.
- P2: Export watchdog-saved in on.workflow_call.outputs so callers of the reusable workflow can observe the signal.
- Copilot: Add git fetch before checking FETCH_HEAD to ensure it exists and is current (actions/checkout doesn't set FETCH_HEAD).
- Copilot: Initialize watchdog-saved=false before background subshell so downstream consumers always get a defined value.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Update WORKFLOW_OUTPUTS.md to include the new watchdog-saved output from reusable-codex-run.yml, fixing the test_reusable_workflow_outputs_documented test. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
The body scan in extractIssueNumberFromPull was treating patterns like "Run #2615" as issue references, causing the Upsert PR body sections check to fail with a 404 when trying to fetch non-existent issues. Add a preceding-word filter to skip #NNN when preceded by common non-issue words (run, attempt, step, job, check, task, version, v). Add 12 unit tests covering the extraction logic. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
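The preceding-word filter might look something like the sketch below. The word list is the one given in this commit message; the function name `extractIssueNumberFromBody`, the regular expression and the `null` return value are assumptions, since the real `extractIssueNumberFromPull` is not shown here.

```javascript
// Words that, per the commit message, mark "#NNN" as a non-issue reference.
const NON_ISSUE_PREFIXES = ['run', 'attempt', 'step', 'job', 'check', 'task', 'version', 'v'];

// Return the first "#NNN" in `body` that is not preceded by a skip word,
// or null when no candidate remains.
function extractIssueNumberFromBody(body, skipWords = NON_ISSUE_PREFIXES) {
  const skip = new Set(skipWords.map(w => w.toLowerCase()));
  // Optionally capture the word directly before the "#".
  const re = /(?:([A-Za-z]+)\s*)?#(\d+)/g;
  let m;
  while ((m = re.exec(body)) !== null) {
    const word = (m[1] || '').toLowerCase();
    if (!skip.has(word)) return Number(m[2]);
  }
  return null;
}

console.log(extractIssueNumberFromBody('Run #2615 failed; Fixes #42')); // 42
```

Passing the word list as a parameter keeps the filter easy to adjust if a word turns out to hide genuine issue references.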
… to Claude runner
Closes the three remaining feature gaps between the Claude and Codex runners identified in issue #1646:
1. **Session analysis (LLM-powered)**: Reuses analyze_codex_session.py which auto-detects Claude's plain-text session log (data_source=summary) and feeds it through the same LLM analysis pipeline for structured task completion assessment. Outputs feed into the keepalive loop.
2. **Completion checkpoint comment**: Posts a PR comment summarizing completed tasks and acceptance criteria using the shared post_completion_comment.js script. Supports both claude-prompt*.md and codex-prompt*.md file names.
3. **Error diagnostics**: Adds GITHUB_STEP_SUMMARY with error table, creates a diagnostics artifact (JSON + agent output), and posts a structured PR comment on non-transient failures with recovery guidance and log links. Uses a distinct <!-- claude-failure-notification --> marker.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Claude runner (reusable-claude-run.yml):
- Fix shell quoting of completed-tasks JSON by using env vars instead
of inline ${{ }} expansion which breaks on apostrophes in task names
- Declare OPENAI_API_KEY and CLAUDE_API_STRANSKE in workflow_call.secrets
so callers can pass them (matches Codex runner)
- Use printf instead of echo when writing PR body to disk to avoid
mangling of -n/-e prefixes or backslashes
- Add info log when falling back to codex-prompt file
Codex runner (reusable-codex-run.yml):
- Gate watchdog-saved=true on actual push success instead of emitting
it unconditionally after push attempts that may have both failed
- Use a fired-flag file so the watchdog kill only terminates the
background process if it's still sleeping (hasn't started its
commit/push work yet)
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
All four conflicts were in reusable-codex-run.yml watchdog code where our branch has the fired-flag and push-success-gating improvements vs the unchanged main version. Kept our (HEAD) version for all. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
- Remove "task" from the non-issue prefix filter in extractIssueNumberFromPull so "Task #123" is correctly treated as an issue reference (flagged by Codex on PAEM sync PR)
- Make --legacy-peer-deps retry conditional on ERESOLVE/peer-dep errors instead of only firing on the first attempt (flagged by Copilot on TMP sync PR)
- Add test for "Task #N" being treated as a valid issue ref
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
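The conditional `--legacy-peer-deps` retry can be pictured as a check on the previous attempt's stderr. This is a sketch only: the actual step is presumably shell, the names `needsLegacyPeerDeps` and `npmArgsForRetry` are hypothetical, and the "peer dep" pattern is an assumed approximation of npm's peer-dependency conflict messages.

```javascript
// Decide from stderr whether the failure looks like a peer-dependency
// resolution error (npm reports these with the ERESOLVE code).
function needsLegacyPeerDeps(stderr) {
  return /\bERESOLVE\b/.test(stderr) || /peer dep/i.test(stderr);
}

// Build the argument list for the next attempt; the flag is added only
// when the previous failure warrants it, regardless of attempt number.
function npmArgsForRetry(baseArgs, previousStderr) {
  return needsLegacyPeerDeps(previousStderr || '')
    ? [...baseArgs, '--legacy-peer-deps']
    : [...baseArgs];
}
```

Keying the fallback on the error text rather than the attempt number means a transient network failure on the first attempt does not change how dependencies are resolved.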
The label sync workflow (maint-69-sync-labels.yml) has been failing since Feb 2 because npm install -g js-yaml installs to the global prefix which actions/github-script can't resolve. Install locally so Node's module resolution finds it in node_modules/. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Two changes to prevent the issue stranske/Counter_Risk#34 scenario where a single cancelled belt dispatcher run strands an issue:
1. Capability-check step: dispatch the belt up to 3 times with a 15s verification window after each attempt. If the dispatched run is not queued/in_progress, retry. This catches silent cancellations before the auto-pilot moves on.
2. Branch-check loop: on the 2nd+ backoff iteration, check whether any belt dispatcher run is still active. If not, re-dispatch the belt before sleeping. This makes the loop self-healing instead of passively waiting for a run that was already cancelled.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
After a progress-review action, rounds_without_task_completion was never reset. The next keepalive trigger would re-evaluate, find the counter still above threshold, enter review again, increment the counter, and repeat, permanently trapping the loop in review mode with no agent work ever running again. This affected all 4 agent PRs (#266, #267, #268, #269), each of which stalled at progress-review-N with uncompleted tasks.
Fix:
1. keepalive_loop.js summary function: reset rounds_without_task_completion to 0 after a review action, so the next evaluate triggers a run instead of another review. The review already provided course-correction feedback; the agent needs a chance to act on it.
2. agents-keepalive-loop.yml: add progress-review as a dependency of the summary job so the state update waits for the review to complete before persisting the reset counter.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
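A small sketch of the counter logic shows why the reset breaks the cycle. Everything here is illustrative: the function names, the `tasksCompleted` input and the threshold value are assumptions, and the real logic in keepalive_loop.js may be organised differently.

```javascript
// Hypothetical evaluate step: choose review once the stall counter
// reaches the threshold, otherwise let the agent run.
function decideAction(roundsWithoutCompletion, threshold) {
  return roundsWithoutCompletion >= threshold ? 'review' : 'run';
}

// Hypothetical summary step: compute the counter to persist.
function nextRoundsWithoutCompletion(previous, { action, tasksCompleted }) {
  if (action === 'review') return 0; // review gave feedback; agent runs next
  if (tasksCompleted > 0) return 0;  // assumed: real progress also resets
  return previous + 1;               // another round with nothing checked off
}

// Without the first branch, a counter at 8 stays at or above the threshold
// forever, so every evaluate picks 'review' again.
let rounds = 8;
const action = decideAction(rounds, 4);
rounds = nextRoundsWithoutCompletion(rounds, { action, tasksCompleted: 0 });
console.log(action, rounds, decideAction(rounds, 4)); // review 0 run
```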
Fixes from inline code review on PR #1665:
1. Scope dispatcher run verification to the current dispatch by filtering runs created after the dispatch timestamp (dispatchedAt). Previously an old successful run for a different issue would falsely satisfy the check, causing retries to stop early.
2. Verify all dispatch attempts including the final one. Previously the last attempt assumed success without checking, creating a false-positive path when the last run was also cancelled.
3. On verification errors (catch block), continue to the next retry attempt instead of optimistically breaking out of the loop. Transient API errors no longer mask failed dispatches.
4. Scope the branch-check re-dispatch to recent runs (last 30 minutes) instead of any active run. An unrelated dispatcher run for a different issue no longer suppresses re-dispatch.
5. Apply all auto-pilot changes to both .github/workflows/ and templates/consumer-repo/.github/workflows/ per sync conventions.
6. Use --no-save --no-package-lock for npm install in maint-69-sync-labels.yml per repo conventions.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Keep --no-save --no-package-lock flags on npm install per repo conventions. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
1. Remove progress-review from summary job needs. The dependency was unnecessary (summary doesn't consume progress-review outputs) and risked breaking state updates on non-review runs despite always(). The counter reset in keepalive_loop.js already handles the review case based on action === 'review' from the evaluate step.
2. Add event === 'workflow_dispatch' filter to dispatch verification and branch-check re-dispatch run matching. This prevents matching runs triggered by schedule, issues, or other events that happen to use the same workflow file.
Applied to both .github/workflows/ and templates/consumer-repo/.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
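Putting the dispatch-verification rules from these commits together, the loop might be sketched as below. The `dispatch`, `listRuns` and `sleep` callbacks are hypothetical stand-ins for the real API calls in the github-script step, and the result shape is an assumption. The run fields `status`, `event` and `created_at` are the ones used in the workflow snippets quoted in the review.

```javascript
function isAlive(run) {
  return run.status === 'queued' || run.status === 'in_progress';
}

// A run counts for this dispatch only if it was manually dispatched
// and created at or after the dispatch timestamp.
function isCurrentDispatch(run, dispatchedAt) {
  return run.event === 'workflow_dispatch' && new Date(run.created_at) >= dispatchedAt;
}

// Dispatch up to `maxAttempts` times, verifying every attempt (including
// the last) after a fixed window. Verification errors count as "not
// verified" and fall through to the next attempt.
async function dispatchWithVerification({ dispatch, listRuns, sleep, maxAttempts = 3, windowMs = 15000 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const dispatchedAt = new Date();
    await dispatch();
    await sleep(windowMs);
    try {
      const runs = await listRuns();
      const run = runs.find(r => isCurrentDispatch(r, dispatchedAt) && isAlive(r));
      if (run) return { ok: true, attempt, run };
    } catch (err) {
      // Do not assume success on a transient API error; retry instead.
    }
  }
  return { ok: false, attempt: maxAttempts };
}
```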
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac098ab397
    const alive = recentRuns.find(
      r => r.status === 'queued' || r.status === 'in_progress'
    );
Match belt runs to the current issue before skipping re-dispatch
In the branch-backoff re-dispatch path, this alive check treats any recent workflow_dispatch run of agents-71-codex-belt-dispatcher.yml as proof that the current issue already has an active dispatcher run, but it never verifies that run was dispatched with this issue’s force_issue value. Since auto-pilot runs are concurrent per issue, an unrelated queued/in-progress belt run can cause this issue to skip re-dispatch repeatedly and eventually hit the stall failure path even though its own dispatch never started.
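One possible way to address this, sketched under a loud assumption: the dispatcher workflow would need to expose the issue number on each run, for example through a `run-name` such as "Belt dispatcher (issue #34)", which the API then reports as `display_title`. Nothing in this PR confirms that such a run name exists. Both it and the helper names are hypothetical.

```javascript
// ASSUMPTION: the run's display_title contains "#<issue number>".
// This is not confirmed by the workflow under review.
function isRunForIssue(run, issueNumber) {
  const title = run.display_title || '';
  return new RegExp(`#${issueNumber}\\b`).test(title);
}

// Only treat a run as "alive for this issue" when both the status and
// the issue number match.
function findAliveRunForIssue(recentRuns, issueNumber) {
  return recentRuns.find(
    r =>
      (r.status === 'queued' || r.status === 'in_progress') &&
      isRunForIssue(r, issueNumber)
  );
}
```

The word boundary in the pattern keeps issue 3 from matching a run titled for issue 35.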
    const recentRuns = runs.workflow_runs.filter(
      r => new Date(r.created_at) >= dispatchedAt &&
        r.event === 'workflow_dispatch'
Use a buffered cutoff when filtering runs by created_at
The dispatch verification filter compares new Date(r.created_at) against dispatchedAt captured with millisecond precision before the dispatch call; GitHub workflow created_at timestamps are second-granularity, so a run created in the same second can be dropped as “older” even when it is the run just dispatched. That yields false negatives, triggering unnecessary retry dispatches and duplicate belt runs for the same issue.
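A buffered cutoff along the lines suggested could look like this sketch. The 5-second buffer and the helper names are assumptions; the filter condition itself mirrors the snippet under review.

```javascript
// Floor the dispatch time to whole seconds (matching created_at's
// granularity) and subtract a small safety buffer (assumed: 5s).
function dispatchCutoff(dispatchedAt, bufferMs = 5000) {
  const flooredMs = Math.floor(dispatchedAt.getTime() / 1000) * 1000;
  return new Date(flooredMs - bufferMs);
}

function filterRecentDispatchRuns(workflowRuns, dispatchedAt, bufferMs = 5000) {
  const cutoff = dispatchCutoff(dispatchedAt, bufferMs);
  return workflowRuns.filter(
    r => new Date(r.created_at) >= cutoff && r.event === 'workflow_dispatch'
  );
}
```

With `dispatchedAt` at 00:00:00.750, a run stamped 00:00:00 is now kept, whereas the unbuffered comparison would have dropped it. The trade-off is that a larger buffer widens the window in which an unrelated earlier run could match.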
Pull request overview
This PR addresses inline code review feedback from sync PR #271 in Counter_Risk, following PR #1665 which added belt dispatcher retry logic. The changes add event type filtering to dispatch verification logic and handle review action state resets in the keepalive loop, plus a minor npm install convention fix.
Changes:
- Add `event === 'workflow_dispatch'` filter to belt dispatcher run verification in capability-check and re-dispatch logic to avoid false matches with workflow_call or scheduled runs
- Reset `rounds_without_task_completion` to 0 after review actions so the next iteration runs the agent instead of triggering another review
- Update npm install command in maint-69-sync-labels.yml to follow `--no-save --no-package-lock` convention
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/workflows/agents-auto-pilot.yml | Added event filtering to capability-check dispatch verification (line 1968) and branch-check re-dispatch logic (line 2467) |
| templates/consumer-repo/.github/workflows/agents-auto-pilot.yml | Identical changes to main workflow file for template sync (lines 1938, 2391) |
| .github/scripts/keepalive_loop.js | Added review action handling to reset rounds_without_task_completion counter (lines 2981-2986) |
| templates/consumer-repo/.github/scripts/keepalive_loop.js | Identical changes to main script for template sync |
| .github/workflows/maint-69-sync-labels.yml | Updated npm install to use --no-save --no-package-lock flags per codebase convention (line 43) |
🤖 Keepalive Loop Status
PR #1666 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
Summary
Follow-up to PR #1665 addressing inline review comments from the sync-generated PR (#271 in Counter_Risk).
- Remove `progress-review` from `summary` job `needs` to avoid breaking state updates on non-review runs
- Add `event === 'workflow_dispatch'` filter to dispatch verification and re-dispatch checks
Context
After #1665 was merged and synced to Counter_Risk, the sync PR (#271) received inline code review that identified:
- The `summary` job's new `progress-review` dependency could prevent state updates on normal (non-review) keepalive runs
- Dispatch verification and re-dispatch run matching needed to filter on `event` to avoid false matches against schedule/issues-triggered runs
Test plan
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6