chore(codex): bootstrap PR for issue #266 by stranske · Pull Request #267 · stranske/Workflows

stranske · 2025-12-29T03:47:08Z

Automated Status Summary

Scope

Testing the autofix system on Manager-Database#84 (test: autofix system validation with intentional failures) revealed two issues that prevent the system from fully resolving CI failures:
1. Bug: Safe sweep pattern matching fails when repo has Python at root
2. Gap: Non-agent PRs never get Codex fallback, even when quick autofix partially succeeds
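
To make the first issue concrete, here is a minimal sketch of how a `./`-prefixed glob can miss root-level files and how normalizing both sides avoids that. It is an illustration only: the PR does not show the actual safe sweep code, so the helper names (`strip_dot_slash`, `matches`) are hypothetical, and Python's `fnmatch` stands in for whatever matcher the workflow really uses.

```python
from fnmatch import fnmatchcase


def strip_dot_slash(value: str) -> str:
    """Drop leading "./" segments so "./src/a.py" and "src/a.py" compare equal."""
    while value.startswith("./"):
        value = value[2:]
    return value


def matches(path: str, pattern: str) -> bool:
    """Match a repo-relative path against a glob after normalizing both sides.

    Hypothetical helper. Note that fnmatch's "*" also matches "/", so "**"
    here simply means "anything"; a real glob library may treat it differently.
    """
    return fnmatchcase(strip_dot_slash(path), strip_dot_slash(pattern))


# Without normalization, a root-level file never matches a "./"-prefixed pattern.
print(fnmatchcase("app.py", "./**"))  # False
print(matches("app.py", "./**"))      # True
```

A unit test for the `./**` vs `**` task could assert exactly this pair of results.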

Tasks

Fix safe sweep pattern for root directory Python projects
Add unit test for ./** vs ** pattern matching
Modify agents-autofix-loop to check for partial autofix success
Add autofix:escalated label definition
Limit Codex attempts for auto-escalated PRs (1-2 max)
Update documentation on autofix behavior

Acceptance criteria

Autofix can push fixes when repo has Python at root
Human PRs with partial autofix success get Codex dispatch
Auto-escalated PRs have limited retry count
autofix:escalated label applied when Codex auto-dispatched
PRs with autofix: false still opt out completely
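
The acceptance criteria above can be read as one decision rule. The sketch below is an assumption about how that rule might look, not the workflow's implementation: the field names, the `PullRequestState` type, and the cap of 2 attempts (the task list only says "1-2 max") are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_ESCALATED_ATTEMPTS = 2  # assumed cap; the task list only says "1-2 max"
ESCALATED_LABEL = "autofix:escalated"


@dataclass
class PullRequestState:
    """Hypothetical snapshot of the inputs the escalation decision needs."""
    is_agent_pr: bool           # True when an agent already drives the PR
    autofix_enabled: bool       # False when the PR opts out with "autofix: false"
    autofix_pushed_fixes: bool  # quick autofix fixed at least something
    ci_still_failing: bool      # failures remain after quick autofix
    escalation_attempts: int    # Codex runs already spent on this escalation


def should_escalate_to_codex(pr: PullRequestState) -> bool:
    """Return True when a human PR with partial autofix success should get Codex."""
    if not pr.autofix_enabled:
        return False  # opt-out always wins
    if pr.is_agent_pr:
        return False  # assumed: agent PRs keep their existing path
    partial_success = pr.autofix_pushed_fixes and pr.ci_still_failing
    return partial_success and pr.escalation_attempts < MAX_ESCALATED_ATTEMPTS


def labels_to_add(pr: PullRequestState) -> List[str]:
    """Apply the escalation label only when Codex is auto-dispatched."""
    return [ESCALATED_LABEL] if should_escalate_to_codex(pr) else []
```

Keeping the opt-out check first means no later condition can accidentally re-enable automation for a PR that declined it.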

Test PR

Manager-Database#84 (test: autofix system validation with intentional failures): contains intentional lint, mypy, and test failures for validation

github-actions · 2025-12-29T03:47:26Z

🤖 Keepalive Loop Status

PR #267 | Agent: Codex | Iteration 2/5

Current State

Metric	Value
Iteration progress	[####------] 2/5
Action	run (ready)
Gate	success
Tasks	1/13 complete
Keepalive	✅ enabled
Autofix	❌ disabled

Last Codex Run

Result	Value
Status	✅ Success
Changes	⚪ No changes

Codex output:

I reviewed the latest commit 4d40bf2 (changes in .github/scripts/merge_manager.js, .github/scripts/__tests__/merge-manager.test.js, and .workflows-lib). Those changes don’t map to any of the current PR tasks, so there are no task checkboxes to mark complete based on that commit. Blockers: -...

Copilot

Pull request overview

This PR creates a bootstrap file for the Codex agent to track work on issue #266. The change follows the established pattern used throughout the repository's agent automation system.

Creates a new bootstrap markdown file in the agents directory
Follows the naming convention codex-{issue-number}.md used by other agent tracking files
Contains standard bootstrap comment format used to initialize agent work on GitHub issues

github-actions · 2025-12-29T03:54:29Z

✅ Codex Completion Checkpoint

Iteration: 0
Commit: 4d40bf2
Recorded: 2025-12-29T03:54:29.437Z

No new completions recorded this round.

About this comment

This comment is automatically generated to track task completions.
The Automated Status Summary reads these checkboxes to update PR progress.
Do not edit this comment manually.

stranske · 2025-12-29T04:59:54Z

Closing this PR - Codex modified the wrong files (merge_manager.js instead of the autofix workflows).

Stashed content:

Patch saved locally
Keepalive findings added to issue #266 (fix(autofix): Safe sweep pattern bug + auto-dispatch Codex for partial fixes); see the comment there

Next steps:

Address keepalive weaknesses identified in issue comment
Re-run Codex on issue #266 (fix(autofix): Safe sweep pattern bug + auto-dispatch Codex for partial fixes) with improved file targeting hints
The merge_manager.js changes may be useful separately - they normalize ./** patterns in the automerge allowlist system

After a progress-review action, rounds_without_task_completion was never reset. The next keepalive trigger would re-evaluate, find the counter still above threshold, enter review again, increment the counter, and repeat, permanently trapping the loop in review mode with no agent work ever running again. This affected all 4 agent PRs (#266, #267, #268, #269), which each stalled at progress-review-N with uncompleted tasks.

Fix:
1. keepalive_loop.js summary function: reset rounds_without_task_completion to 0 after a review action, so the next evaluate triggers a run instead of another review. The review already provided course-correction feedback; the agent needs a chance to act on it.
2. agents-keepalive-loop.yml: add progress-review as a dependency of the summary job so the state update waits for the review to complete before persisting the reset counter.

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
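
The stall and the counter reset can be reproduced in a few lines. This is a simplified model, not the code in keepalive_loop.js: the threshold value, the function names, and the `reset_after_review` switch are assumptions made so the old and new behavior can be compared side by side.

```python
REVIEW_THRESHOLD = 3  # hypothetical threshold; the real value is not given above


def evaluate(state: dict) -> str:
    """Pick the next action from the stall counter."""
    stalled = state["rounds_without_task_completion"] >= REVIEW_THRESHOLD
    return "review" if stalled else "run"


def summarize(state: dict, action: str, tasks_completed: int,
              reset_after_review: bool = True) -> dict:
    """Return the state to persist after one keepalive round."""
    rounds = state["rounds_without_task_completion"]
    if action == "review" and reset_after_review:
        rounds = 0   # the fix: give the agent a chance to act on the review
    elif tasks_completed > 0:
        rounds = 0   # real progress clears the stall counter
    else:
        rounds += 1  # old behavior after a review: the counter keeps growing
    return {**state, "rounds_without_task_completion": rounds}


def simulate(triggers: int, reset_after_review: bool) -> list:
    """Run several keepalive triggers in which the agent completes no tasks."""
    state = {"rounds_without_task_completion": REVIEW_THRESHOLD}
    actions = []
    for _ in range(triggers):
        action = evaluate(state)
        actions.append(action)
        state = summarize(state, action, 0, reset_after_review)
    return actions


print(simulate(4, reset_after_review=False))  # ['review', 'review', 'review', 'review']
print(simulate(5, reset_after_review=True))   # ['review', 'run', 'run', 'run', 'review']
```

With the reset in place, a review is followed by run rounds, and another review only happens if the agent stalls again.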

* fix: avoid multi-line stderr in workflow annotations
  GitHub Actions ::warning:: commands truncate/mangle multi-line content. Emit a short annotation message and print full npm stderr in a collapsible ::group:: instead, so logs stay readable.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: sync template setup-api-client with retry-with-backoff and annotation fixes
  Mirror the main setup-api-client changes into the consumer-repo template to prevent template drift:
  - Exponential backoff retry (3 attempts, 5s/10s) for transient npm errors
  - --legacy-peer-deps fallback on first failure
  - Short ::warning:: annotations with full stderr in collapsible ::group::
  - Pin lru-cache@10.4.3 (was ^10.0.0)
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: pre-timeout watchdog, robust parser import, and always-commit safeguard
  Three changes to reusable-codex-run.yml to prevent work loss on timeout:
  1. Pre-timeout watchdog: A background timer fires 5 minutes before max_runtime_minutes, committing and pushing any uncommitted work so it survives the job cancellation. Killed automatically if Codex finishes before the timer fires.
  2. Robust parser import: Replace sys.path-based import of codex_jsonl_parser with importlib.util.spec_from_file_location. Consumer repos (e.g. Counter_Risk) have their own tools/ package with __init__.py that shadows the Workflows tools/ on sys.path, causing "No module named 'tools.codex_jsonl_parser'".
  3. Commit step always runs: Add if: always() to the "Commit and push changes" step so uncommitted work is captured even on non-zero exit codes (the watchdog handles timeout, this handles failures).
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: preserve indented checkbox states in PR Meta body sync
  parseCheckboxStates() and mergeCheckboxStates() only matched top-level checkboxes (^- \[), ignoring indented sub-tasks ( - \[). When PR Meta regenerated the PR body from the issue, auto-reconciled sub-task checkboxes were silently reverted to unchecked. This caused the keepalive loop to stall with rounds_without_task_completion: 8 despite the agent completing real work: PR #256 had 5 tasks auto-checked then immediately un-checked on every push.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* chore: sync template scripts

* fix: address review comments on watchdog pre-timeout mechanism
  - P1: Add fetch/rebase before watchdog push to avoid non-fast-forward rejection when another workflow updates the branch during the run. Includes one retry with re-fetch/rebase and merge fallback.
  - P2: Export watchdog-saved in on.workflow_call.outputs so callers of the reusable workflow can observe the signal.
  - Copilot: Add git fetch before checking FETCH_HEAD to ensure it exists and is current (actions/checkout doesn't set FETCH_HEAD).
  - Copilot: Initialize watchdog-saved=false before background subshell so downstream consumers always get a defined value.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* docs: add watchdog-saved to workflow outputs reference
  Update WORKFLOW_OUTPUTS.md to include the new watchdog-saved output from reusable-codex-run.yml, fixing the test_reusable_workflow_outputs_documented test.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: skip non-issue refs like "Run #NNN" in extractIssueNumberFromPull
  The body scan in extractIssueNumberFromPull was treating patterns like "Run #2615" as issue references, causing the Upsert PR body sections check to fail with a 404 when trying to fetch non-existent issues. Add a preceding-word filter to skip #NNN when preceded by common non-issue words (run, attempt, step, job, check, task, version, v). Add 12 unit tests covering the extraction logic.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* feat: add session analysis, completion comment, and error diagnostics to Claude runner
  Closes the three remaining feature gaps between the Claude and Codex runners identified in issue #1646:
  1. **Session analysis (LLM-powered)**: Reuses analyze_codex_session.py which auto-detects Claude's plain-text session log (data_source=summary) and feeds it through the same LLM analysis pipeline for structured task completion assessment. Outputs feed into the keepalive loop.
  2. **Completion checkpoint comment**: Posts a PR comment summarizing completed tasks and acceptance criteria using the shared post_completion_comment.js script. Supports both claude-prompt*.md and codex-prompt*.md file names.
  3. **Error diagnostics**: Adds GITHUB_STEP_SUMMARY with error table, creates a diagnostics artifact (JSON + agent output), and posts a structured PR comment on non-transient failures with recovery guidance and log links. Uses a distinct marker.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: address code review feedback from Codex and Copilot
  Claude runner (reusable-claude-run.yml):
  - Fix shell quoting of completed-tasks JSON by using env vars instead of inline ${{ }} expansion which breaks on apostrophes in task names
  - Declare OPENAI_API_KEY and CLAUDE_API_STRANSKE in workflow_call.secrets so callers can pass them (matches Codex runner)
  - Use printf instead of echo when writing PR body to disk to avoid mangling of -n/-e prefixes or backslashes
  - Add info log when falling back to codex-prompt file
  Codex runner (reusable-codex-run.yml):
  - Gate watchdog-saved=true on actual push success instead of emitting it unconditionally after push attempts that may have both failed
  - Use a fired-flag file so the watchdog kill only terminates the background process if it's still sleeping (hasn't started its commit/push work yet)
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: address sync PR review feedback from coding agents
  - Remove "task" from the non-issue prefix filter in extractIssueNumberFromPull so "Task #123" is correctly treated as an issue reference (flagged by Codex on PAEM sync PR)
  - Make --legacy-peer-deps retry conditional on ERESOLVE/peer-dep errors instead of only firing on the first attempt (flagged by Copilot on TMP sync PR)
  - Add test for "Task #N" being treated as a valid issue ref
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* fix: install js-yaml locally instead of globally in label sync workflow
  The label sync workflow (maint-69-sync-labels.yml) has been failing since Feb 2 because npm install -g js-yaml installs to the global prefix which actions/github-script can't resolve. Install locally so Node's module resolution finds it in node_modules/.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* Fix auto-pilot stall when belt dispatcher run is cancelled
  Two changes to prevent the issue stranske/Counter_Risk#34 scenario where a single cancelled belt dispatcher run strands an issue:
  1. Capability-check step: dispatch the belt up to 3 times with a 15s verification window after each attempt. If the dispatched run is not queued/in_progress, retry. This catches silent cancellations before the auto-pilot moves on.
  2. Branch-check loop: on the 2nd+ backoff iteration, check whether any belt dispatcher run is still active. If not, re-dispatch the belt before sleeping. This makes the loop self-healing instead of passively waiting for a run that was already cancelled.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* Fix keepalive loop stuck in perpetual review after progress stall
  After a progress-review action, rounds_without_task_completion was never reset. The next keepalive trigger would re-evaluate, find the counter still above threshold, enter review again, increment the counter, and repeat, permanently trapping the loop in review mode with no agent work ever running again. This affected all 4 agent PRs (#266, #267, #268, #269), which each stalled at progress-review-N with uncompleted tasks.
  Fix:
  1. keepalive_loop.js summary function: reset rounds_without_task_completion to 0 after a review action, so the next evaluate triggers a run instead of another review. The review already provided course-correction feedback; the agent needs a chance to act on it.
  2. agents-keepalive-loop.yml: add progress-review as a dependency of the summary job so the state update waits for the review to complete before persisting the reset counter.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

* Address review feedback on belt dispatch verification
  Fixes from inline code review on PR #1665:
  1. Scope dispatcher run verification to the current dispatch by filtering runs created after the dispatch timestamp (dispatchedAt). Previously an old successful run for a different issue would falsely satisfy the check, causing retries to stop early.
  2. Verify all dispatch attempts including the final one. Previously the last attempt assumed success without checking, creating a false-positive path when the last run was also cancelled.
  3. On verification errors (catch block), continue to the next retry attempt instead of optimistically breaking out of the loop. Transient API errors no longer mask failed dispatches.
  4. Scope the branch-check re-dispatch to recent runs (last 30 minutes) instead of any active run. An unrelated dispatcher run for a different issue no longer suppresses re-dispatch.
  5. Apply all auto-pilot changes to both .github/workflows/ and templates/consumer-repo/.github/workflows/ per sync conventions.
  6. Use --no-save --no-package-lock for npm install in maint-69-sync-labels.yml per repo conventions.
  https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
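
The "robust parser import" change in the commit list above relies on loading a module from an explicit file path instead of through sys.path. A minimal sketch of that technique follows, using the standard importlib.util API. The helper name `load_module_from_path` and the stand-in module body are hypothetical; the real codex_jsonl_parser is not shown in this PR.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name: str, path: Path):
    """Load a module from an explicit file, ignoring packages on sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file for illustration; its contents are invented for this demo.
    parser_file = Path(tmp) / "codex_jsonl_parser.py"
    parser_file.write_text("def parse(line):\n    return line.strip()\n")
    parser = load_module_from_path("codex_jsonl_parser", parser_file)

print(parser.parse("  ok \n"))  # ok
```

Because the file location is given directly, a consumer repo's own tools/ package cannot shadow the module, which is the failure described for Counter_Risk.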

chore(codex): bootstrap PR for issue #266

b36d171

stranske added the agent:codex label (Agent-created issues from Codex) Dec 29, 2025

Copilot AI review requested due to automatic review settings December 29, 2025 03:47

stranske added the agents:keepalive (Use to initiate keepalive functionality with agents) and autofix (Opt-in automated formatting & lint remediation) labels Dec 29, 2025

stranske temporarily deployed to agent-standard December 29, 2025 03:47 — with GitHub Actions Inactive

Copilot started reviewing on behalf of stranske December 29, 2025 03:47

Copilot AI reviewed Dec 29, 2025

chore(codex-keepalive): apply updates (PR #267)

4d40bf2

agents-workflows-bot bot temporarily deployed to agent-standard December 29, 2025 03:54 Inactive

stranske mentioned this pull request Dec 29, 2025

fix(autofix): Safe sweep pattern bug + auto-dispatch Codex for partial fixes #266

Closed

11 tasks

stranske closed this Dec 29, 2025

stranske temporarily deployed to agent-standard December 29, 2025 05:06 — with GitHub Actions Inactive

stranske wants to merge 2 commits into main from codex/issue-266
