feat: close Claude runner feature gaps vs Codex runner#1661
Conversation
GitHub Actions ::warning:: commands truncate/mangle multi-line content. Emit a short annotation message and print full npm stderr in a collapsible ::group:: instead, so logs stay readable. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
…ation fixes Mirror the main setup-api-client changes into the consumer-repo template to prevent template drift: - Exponential backoff retry (3 attempts, 5s/10s) for transient npm errors - --legacy-peer-deps fallback on first failure - Short ::warning:: annotations with full stderr in collapsible ::group:: - Pin lru-cache@10.4.3 (was ^10.0.0) https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
…feguard Three changes to reusable-codex-run.yml to prevent work loss on timeout: 1. Pre-timeout watchdog: A background timer fires 5 minutes before max_runtime_minutes, committing and pushing any uncommitted work so it survives the job cancellation. Killed automatically if Codex finishes before the timer fires. 2. Robust parser import: Replace sys.path-based import of codex_jsonl_parser with importlib.util.spec_from_file_location. Consumer repos (e.g. Counter_Risk) have their own tools/ package with __init__.py that shadows the Workflows tools/ on sys.path, causing "No module named 'tools.codex_jsonl_parser'". 3. Commit step always runs: Add if: always() to the "Commit and push changes" step so uncommitted work is captured even on non-zero exit codes (the watchdog handles timeout, this handles failures). https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
parseCheckboxStates() and mergeCheckboxStates() only matched top-level checkboxes (^- \[), ignoring indented sub-tasks ( - \[). When PR Meta regenerated the PR body from the issue, auto-reconciled sub-task checkboxes were silently reverted to unchecked. This caused the keepalive loop to stall with rounds_without_task_completion: 8 despite the agent completing real work — PR #256 had 5 tasks auto-checked then immediately un-checked on every push. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
- P1: Add fetch/rebase before watchdog push to avoid non-fast-forward rejection when another workflow updates the branch during the run. Includes one retry with re-fetch/rebase and merge fallback. - P2: Export watchdog-saved in on.workflow_call.outputs so callers of the reusable workflow can observe the signal. - Copilot: Add git fetch before checking FETCH_HEAD to ensure it exists and is current (actions/checkout doesn't set FETCH_HEAD). - Copilot: Initialize watchdog-saved=false before background subshell so downstream consumers always get a defined value. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Update WORKFLOW_OUTPUTS.md to include the new watchdog-saved output from reusable-codex-run.yml, fixing the test_reusable_workflow_outputs_documented test. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
The body scan in extractIssueNumberFromPull was treating patterns like "Run #2615" as issue references, causing the Upsert PR body sections check to fail with a 404 when trying to fetch non-existent issues. Add a preceding-word filter to skip #NNN when preceded by common non-issue words (run, attempt, step, job, check, task, version, v). Add 12 unit tests covering the extraction logic. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
… to Claude runner Closes the three remaining feature gaps between the Claude and Codex runners identified in issue #1646: 1. **Session analysis (LLM-powered)**: Reuses analyze_codex_session.py which auto-detects Claude's plain-text session log (data_source=summary) and feeds it through the same LLM analysis pipeline for structured task completion assessment. Outputs feed into the keepalive loop. 2. **Completion checkpoint comment**: Posts a PR comment summarizing completed tasks and acceptance criteria using the shared post_completion_comment.js script. Supports both claude-prompt*.md and codex-prompt*.md file names. 3. **Error diagnostics**: Adds GITHUB_STEP_SUMMARY with error table, creates a diagnostics artifact (JSON + agent output), and posts a structured PR comment on non-transient failures with recovery guidance and log links. Uses a distinct <!-- claude-failure-notification --> marker. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Automated Status SummaryHead SHA: fb5628a
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions. Keepalive checklistScopePR #1645 aligned the Claude runner ( Context for AgentRelated Issues/PRsTasks
Acceptance criteria
|
🤖 Keepalive Loop StatusPR #1661 | Agent: Codex | Iteration 0/5 Current State
🔍 Failure Classification| Error type | infrastructure | |
Keepalive Work Log (click to expand)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96e5f6ec2e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Aligns the Claude reusable runner with the Codex runner by adding session/LLM analysis outputs, completion checkpoint comments, and richer error diagnostics; also includes several runner/PR-meta robustness fixes shared across templates and the main repo.
Changes:
- Add Claude session detection + LLM task-completion analysis, plus completion checkpoint PR comments and failure diagnostics/commenting.
- Improve resilience and observability: npm install retry/backoff + grouped stderr, Codex pre-timeout watchdog output, and PR body checkbox/issue parsing fixes.
- Update PR-meta scripts (checkbox state merge + issue number extraction) and add targeted tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/reusable-claude-run.yml |
Adds Claude session analysis + LLM analysis, completion checkpoint comment, and failure diagnostics/PR comment flow. |
.github/workflows/reusable-codex-run.yml |
Adds watchdog output and watchdog lifecycle handling; makes commit/push step run unconditionally. |
.github/actions/setup-api-client/action.yml |
Improves npm install resilience with retries/backoff and grouped stderr output. |
templates/consumer-repo/.github/actions/setup-api-client/action.yml |
Mirrors setup-api-client retry/backoff + grouped stderr for template sync. |
.github/scripts/agents_pr_meta_keepalive.js |
Fixes issue-number extraction to avoid false positives like “Run #123”. |
templates/consumer-repo/.github/scripts/agents_pr_meta_keepalive.js |
Mirrors issue-number extraction fix for consumer templates. |
.github/scripts/__tests__/agents-pr-meta-keepalive.test.js |
Adds unit tests for extractIssueNumberFromPull behavior and regressions. |
.github/scripts/agents_pr_meta_update_body.js |
Preserves checkbox indentation and detects indented checkboxes when merging PR body states. |
templates/consumer-repo/.github/scripts/agents_pr_meta_update_body.js |
Mirrors checkbox parsing/merge improvements for consumer templates. |
docs/ci/WORKFLOW_OUTPUTS.md |
Documents the new Codex workflow output watchdog-saved. |
| GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
| CLAUDE_API_STRANSKE: ${{ secrets.CLAUDE_API_STRANSKE }} | ||
| run: | |
There was a problem hiding this comment.
This step references secrets.OPENAI_API_KEY and secrets.CLAUDE_API_STRANSKE, but reusable-claude-run.yml does not declare these under on.workflow_call.secrets. In reusable workflows, undeclared secrets cannot be passed by callers, so these will be empty (or the workflow may fail validation depending on the runner). Add both secrets to the workflow_call.secrets block (matching reusable-codex-run.yml) or remove these env vars if they are intentionally unsupported here.
| fi | ||
|
|
||
| # Save PR body to temp file | ||
| echo "$PR_BODY" > pr_body.md |
There was a problem hiding this comment.
Writing the PR body to disk via echo "$PR_BODY" > pr_body.md is not robust: echo can treat leading -n / -e as options and may mangle backslashes/newlines depending on shell implementation. Use printf '%s' "$PR_BODY" > pr_body.md (or a heredoc) to preserve the body exactly for downstream parsing.
| echo "$PR_BODY" > pr_body.md | |
| printf '%s' "$PR_BODY" > pr_body.md |
|
|
||
| const result = await postCompletionComment({ | ||
| github, context, core, | ||
| inputs: { | ||
| pr_number: process.env.PR_NUMBER, | ||
| commit_sha: process.env.COMMIT_SHA, | ||
| iteration: process.env.ITERATION, | ||
| prompt_file: promptFile, | ||
| }, | ||
| }); |
There was a problem hiding this comment.
postCompletionComment currently has Codex-specific logic that prefers codex-prompt-${prNumber}.md when it exists, even if inputs.prompt_file points at a Claude prompt. If both Claude + Codex prompt artifacts are present in the workspace, this can post a checkpoint comment based on the wrong prompt file. Consider updating post_completion_comment.js to respect the provided prompt_file (or to resolve PR-specific variants based on that filename prefix), so Claude runs cannot be overridden by stale Codex prompt files.
Claude runner (reusable-claude-run.yml):
- Fix shell quoting of completed-tasks JSON by using env vars instead
of inline ${{ }} expansion which breaks on apostrophes in task names
- Declare OPENAI_API_KEY and CLAUDE_API_STRANSKE in workflow_call.secrets
so callers can pass them (matches Codex runner)
- Use printf instead of echo when writing PR body to disk to avoid
mangling of -n/-e prefixes or backslashes
- Add info log when falling back to codex-prompt file
Codex runner (reusable-codex-run.yml):
- Gate watchdog-saved=true on actual push success instead of emitting
it unconditionally after push attempts that may have both failed
- Use a fired-flag file so the watchdog kill only terminates the
background process if it's still sleeping (hasn't started its
commit/push work yet)
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
All four conflicts were in reusable-codex-run.yml watchdog code where our branch has the fired-flag and push-success-gating improvements vs the unchanged main version. Kept our (HEAD) version for all. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Automated Status Summary
Scope
PR #1645 aligned the Claude runner (
reusable-claude-run.yml) with the Codex runner (reusable-codex-run.yml) on critical items: CLI invocation, artifact exclusion, and agent bootstrap filtering. The following functional gaps remain and should be addressed incrementally.Context for Agent
Related Issues/PRs
Tasks
agent-error,needs-review) for triageAcceptance criteria
docs/guides/AGENT_RUNNER_IMPLEMENTATION.md: Technical guide for runner patternsHead SHA: f8737e8
Latest Runs: ✅ success — Gate
Required: gate: ✅ success