feat: close Claude runner feature gaps vs Codex runner by stranske · Pull Request #1661 · stranske/Workflows

stranske · 2026-02-26T03:05:10Z

Source: Issue #1646

Automated Status Summary

Scope

PR #1645 aligned the Claude runner (reusable-claude-run.yml) with the Codex runner (reusable-codex-run.yml) on critical items: CLI invocation, artifact exclusion, and agent bootstrap filtering. The following functional gaps remain and should be addressed incrementally.

Context for Agent

Related Issues/PRs

#1645
#1643

Tasks

Classifies failures by category (auth, timeout, sandbox, rate-limit, etc.)
Posts a structured PR comment with the error summary, suggested fixes, and a link to logs
Adds labels (agent-error, needs-review) for triage
Includes last 50 lines of session output in the comment for quick debugging

Acceptance criteria

PR fix: Claude runner — detect agent-made commits and add diagnostics #1643: Initial Claude runner alignment
PR fix: Claude runner --output-file crash, PR_REF unbound, and Codex alignment #1645: CLI invocation fix + bootstrap/artifact alignment
docs/guides/AGENT_RUNNER_IMPLEMENTATION.md: Technical guide for runner patterns

Head SHA: f8737e8
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

Workflow / Job	Result	Logs
.github/workflows/autofix.yml	❌ failure	View run
Agents PR meta manager	❔ in progress	View run
Gate	✅ success	View run
Health 40 Sweep	✅ success	View run
Health 44 Gate Branch Protection	✅ success	View run
Health 45 Agents Guard	✅ success	View run
Health 50 Security Scan	✅ success	View run
Health 73 Template Completeness	✅ success	View run
Keepalive E2E	✅ success	View run
Maint 52 Validate Workflows	✅ success	View run
PR 11 - Minimal invariant CI	✅ success	View run
Selftest CI	✅ success	View run
Validate Sync Manifest	✅ success	View run

GitHub Actions ::warning:: commands truncate/mangle multi-line content. Emit a short annotation message and print full npm stderr in a collapsible ::group:: instead, so logs stay readable. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

…ation fixes

Mirror the main setup-api-client changes into the consumer-repo template to prevent template drift:

- Exponential backoff retry (3 attempts, 5s/10s) for transient npm errors
- --legacy-peer-deps fallback on first failure
- Short ::warning:: annotations with full stderr in collapsible ::group::
- Pin lru-cache@10.4.3 (was ^10.0.0)

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
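The retry policy in this commit (3 attempts with 5s and 10s waits, plus a --legacy-peer-deps fallback after the first failure) can be sketched roughly as follows. This is a Python illustration only; the real logic lives in a shell step of action.yml that is not shown here, and the names `run_with_backoff` and `npm_install_args` are hypothetical.

```python
import time
from typing import Callable, List, Sequence


def npm_install_args(attempt: int) -> List[str]:
    """Build the npm command for a given attempt (hypothetical helper).

    Assumption: the fallback flag is added on every attempt after the first.
    """
    args = ["npm", "install"]
    if attempt > 1:
        args.append("--legacy-peer-deps")
    return args


def run_with_backoff(
    attempt_fn: Callable[[int], bool],
    delays: Sequence[float] = (5.0, 10.0),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt_fn(n) up to len(delays) + 1 times.

    After a failed attempt n, wait delays[n - 1] seconds before retrying.
    Returns True on the first success, False if every attempt fails.
    """
    total_attempts = len(delays) + 1
    for n in range(1, total_attempts + 1):
        if attempt_fn(n):
            return True
        if n < total_attempts:
            sleep(delays[n - 1])
    return False
```

In a real step, `attempt_fn` would run `npm_install_args(n)` and report whether the exit code was zero; injecting `sleep` keeps the sketch easy to exercise without waiting.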

…feguard

Three changes to reusable-codex-run.yml to prevent work loss on timeout:

1. Pre-timeout watchdog: A background timer fires 5 minutes before max_runtime_minutes, committing and pushing any uncommitted work so it survives the job cancellation. Killed automatically if Codex finishes before the timer fires.
2. Robust parser import: Replace sys.path-based import of codex_jsonl_parser with importlib.util.spec_from_file_location. Consumer repos (e.g. Counter_Risk) have their own tools/ package with __init__.py that shadows the Workflows tools/ on sys.path, causing "No module named 'tools.codex_jsonl_parser'".
3. Commit step always runs: Add if: always() to the "Commit and push changes" step so uncommitted work is captured even on non-zero exit codes (the watchdog handles timeout, this handles failures).

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
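The path-based import described under "Robust parser import" can be sketched as below. It is a minimal illustration, not the workflow's actual code: the helper name `load_module_from_path` and the module name passed to it are assumptions. Loading by file location bypasses package resolution on sys.path, so a consumer repo's own tools/ package cannot shadow the file.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Load a single Python file as a module without consulting sys.path.

    `name` is only the module's __name__; it does not need to match any
    package on sys.path, so a same-named local package cannot interfere.
    """
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Execute the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

A caller would pass the absolute path of the parser file inside the Workflows checkout; where that checkout lives is not stated in this PR, so the path is left to the caller.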

parseCheckboxStates() and mergeCheckboxStates() only matched top-level checkboxes (^- \[), ignoring indented sub-tasks ( - \[). When PR Meta regenerated the PR body from the issue, auto-reconciled sub-task checkboxes were silently reverted to unchecked. This caused the keepalive loop to stall with rounds_without_task_completion: 8 despite the agent completing real work — PR #256 had 5 tasks auto-checked then immediately un-checked on every push. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
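The fix can be illustrated with a short Python sketch of the same idea: allow leading whitespace before `- [`, and keep that indentation when restoring a checked state. The real parseCheckboxStates() and mergeCheckboxStates() are JavaScript and their code is not shown here, so the regex and function bodies below are assumptions.

```python
import re
from typing import Dict

# Leading whitespace is captured so indented sub-tasks match too.
CHECKBOX_RE = re.compile(r"^(\s*)- \[( |x|X)\] (.+)$")


def parse_checkbox_states(body: str) -> Dict[str, bool]:
    """Map each checkbox label to True (checked) or False (unchecked)."""
    states: Dict[str, bool] = {}
    for line in body.splitlines():
        match = CHECKBOX_RE.match(line)
        if match:
            states[match.group(3).strip()] = match.group(2).lower() == "x"
    return states


def merge_checkbox_states(new_body: str, previous: Dict[str, bool]) -> str:
    """Re-check boxes in new_body that were checked in the previous body.

    Indentation is preserved, and boxes already checked in new_body
    are left untouched.
    """
    merged = []
    for line in new_body.splitlines():
        match = CHECKBOX_RE.match(line)
        if match and previous.get(match.group(3).strip()):
            line = f"{match.group(1)}- [x] {match.group(3)}"
        merged.append(line)
    return "\n".join(merged)
```

With a top-level-only pattern, the indented sub-task would never enter `previous`, which is how a regenerated body could silently revert it to unchecked.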

- P1: Add fetch/rebase before watchdog push to avoid non-fast-forward rejection when another workflow updates the branch during the run. Includes one retry with re-fetch/rebase and merge fallback.
- P2: Export watchdog-saved in on.workflow_call.outputs so callers of the reusable workflow can observe the signal.
- Copilot: Add git fetch before checking FETCH_HEAD to ensure it exists and is current (actions/checkout doesn't set FETCH_HEAD).
- Copilot: Initialize watchdog-saved=false before background subshell so downstream consumers always get a defined value.

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

Update WORKFLOW_OUTPUTS.md to include the new watchdog-saved output from reusable-codex-run.yml, fixing the test_reusable_workflow_outputs_documented test. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

The body scan in extractIssueNumberFromPull was treating patterns like "Run #2615" as issue references, causing the Upsert PR body sections check to fail with a 404 when trying to fetch non-existent issues. Add a preceding-word filter to skip #NNN when preceded by common non-issue words (run, attempt, step, job, check, task, version, v). Add 12 unit tests covering the extraction logic. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
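A preceding-word filter of this kind might look like the sketch below. It is written in Python for illustration; the actual extractIssueNumberFromPull is JavaScript and its pattern is not shown in this PR, so the regex and the function name `extract_issue_number` are assumptions. Only the word list comes from the commit message.

```python
import re
from typing import Optional

# Words that commonly precede a non-issue "#NNN" (from the commit message).
NON_ISSUE_WORDS = {"run", "attempt", "step", "job", "check", "task", "version", "v"}

# Optionally capture the word directly before "#NNN".
REF_RE = re.compile(r"(?:([A-Za-z]+)\s*)?#(\d+)")


def extract_issue_number(body: str) -> Optional[int]:
    """Return the first "#NNN" that is not preceded by a non-issue word."""
    for match in REF_RE.finditer(body):
        preceding = (match.group(1) or "").lower()
        if preceding in NON_ISSUE_WORDS:
            continue  # e.g. "Run #2615" is a workflow run, not an issue
        return int(match.group(2))
    return None
```

Returning `None` rather than a guessed number lets the caller skip the issue fetch entirely, which avoids the 404 described above.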

… to Claude runner

Closes the three remaining feature gaps between the Claude and Codex runners identified in issue #1646:

1. **Session analysis (LLM-powered)**: Reuses analyze_codex_session.py which auto-detects Claude's plain-text session log (data_source=summary) and feeds it through the same LLM analysis pipeline for structured task completion assessment. Outputs feed into the keepalive loop.
2. **Completion checkpoint comment**: Posts a PR comment summarizing completed tasks and acceptance criteria using the shared post_completion_comment.js script. Supports both claude-prompt*.md and codex-prompt*.md file names.
3. **Error diagnostics**: Adds GITHUB_STEP_SUMMARY with error table, creates a diagnostics artifact (JSON + agent output), and posts a structured PR comment on non-transient failures with recovery guidance and log links. Uses a distinct marker.

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

agents-workflows-bot · 2026-02-26T03:07:28Z

Automated Status Summary

Head SHA: fb5628a
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / guard
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	93.12%
Baseline	85.00%
Delta	+8.12%
Minimum	70.00%
Status	✅ Pass

Top Coverage Hotspots (lowest coverage)

File	Coverage	Missing
`src/cli_parser.py`	81.8%	4
`src/percentile_calculator.py`	95.0%	1
`src/aggregator.py`	95.0%	2
`src/__init__.py`	100.0%	0
`src/ndjson_parser.py`	100.0%	0

Updated automatically; will refresh on subsequent CI/Docker completions.

agents-workflows-bot · 2026-02-26T03:08:11Z

🤖 Keepalive Loop Status

PR #1661 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	0/7 complete
Timeout	45 min (default)
Timeout usage	3m elapsed (7%, 42m remaining)
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

agents-workflows-bot · 2026-02-26T03:08:12Z

Keepalive Work Log

#	Time (UTC)	Agent	Action	Result	Files	Tasks	Progress	Commit	Gate
0	2026-02-26 03:08:12	Codex	wait (missing-agent-label-transient)	skipped	—	0	0/7	—	success
0	2026-02-26 03:35:31	Codex	wait (missing-agent-label-transient)	skipped	—	0	0/7	—	success

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96e5f6ec2e

.github/workflows/reusable-claude-run.yml

.github/workflows/reusable-codex-run.yml

Copilot

Pull request overview

Aligns the Claude reusable runner with the Codex runner by adding session/LLM analysis outputs, completion checkpoint comments, and richer error diagnostics; also includes several runner/PR-meta robustness fixes shared across templates and the main repo.

Changes:

Add Claude session detection + LLM task-completion analysis, plus completion checkpoint PR comments and failure diagnostics/commenting.
Improve resilience and observability: npm install retry/backoff + grouped stderr, Codex pre-timeout watchdog output, and PR body checkbox/issue parsing fixes.
Update PR-meta scripts (checkbox state merge + issue number extraction) and add targeted tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

File	Description
`.github/workflows/reusable-claude-run.yml`	Adds Claude session analysis + LLM analysis, completion checkpoint comment, and failure diagnostics/PR comment flow.
`.github/workflows/reusable-codex-run.yml`	Adds watchdog output and watchdog lifecycle handling; makes commit/push step run unconditionally.
`.github/actions/setup-api-client/action.yml`	Improves npm install resilience with retries/backoff and grouped stderr output.
`templates/consumer-repo/.github/actions/setup-api-client/action.yml`	Mirrors setup-api-client retry/backoff + grouped stderr for template sync.
`.github/scripts/agents_pr_meta_keepalive.js`	Fixes issue-number extraction to avoid false positives like “Run #123”.
`templates/consumer-repo/.github/scripts/agents_pr_meta_keepalive.js`	Mirrors issue-number extraction fix for consumer templates.
`.github/scripts/__tests__/agents-pr-meta-keepalive.test.js`	Adds unit tests for `extractIssueNumberFromPull` behavior and regressions.
`.github/scripts/agents_pr_meta_update_body.js`	Preserves checkbox indentation and detects indented checkboxes when merging PR body states.
`templates/consumer-repo/.github/scripts/agents_pr_meta_update_body.js`	Mirrors checkbox parsing/merge improvements for consumer templates.
`docs/ci/WORKFLOW_OUTPUTS.md`	Documents the new Codex workflow output `watchdog-saved`.

Copilot · 2026-02-26T03:13:04Z

.github/workflows/reusable-claude-run.yml

+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          CLAUDE_API_STRANSKE: ${{ secrets.CLAUDE_API_STRANSKE }}
+        run: |


This step references secrets.OPENAI_API_KEY and secrets.CLAUDE_API_STRANSKE, but reusable-claude-run.yml does not declare these under on.workflow_call.secrets. In reusable workflows, undeclared secrets cannot be passed by callers, so these will be empty (or the workflow may fail validation depending on the runner). Add both secrets to the workflow_call.secrets block (matching reusable-codex-run.yml) or remove these env vars if they are intentionally unsupported here.

Copilot · 2026-02-26T03:13:05Z

.github/workflows/reusable-claude-run.yml

+          fi
+
+          # Save PR body to temp file
+          echo "$PR_BODY" > pr_body.md


Writing the PR body to disk via echo "$PR_BODY" > pr_body.md is not robust: echo can treat leading -n / -e as options and may mangle backslashes/newlines depending on shell implementation. Use printf '%s' "$PR_BODY" > pr_body.md (or a heredoc) to preserve the body exactly for downstream parsing.

Suggested change

echo "$PR_BODY" > pr_body.md

printf '%s' "$PR_BODY" > pr_body.md

Copilot · 2026-02-26T03:13:05Z

.github/workflows/reusable-claude-run.yml

+
+            const result = await postCompletionComment({
+              github, context, core,
+              inputs: {
+                pr_number: process.env.PR_NUMBER,
+                commit_sha: process.env.COMMIT_SHA,
+                iteration: process.env.ITERATION,
+                prompt_file: promptFile,
+              },
+            });


postCompletionComment currently has Codex-specific logic that prefers codex-prompt-${prNumber}.md when it exists, even if inputs.prompt_file points at a Claude prompt. If both Claude + Codex prompt artifacts are present in the workspace, this can post a checkpoint comment based on the wrong prompt file. Consider updating post_completion_comment.js to respect the provided prompt_file (or to resolve PR-specific variants based on that filename prefix), so Claude runs cannot be overridden by stale Codex prompt files.

.github/workflows/reusable-codex-run.yml

Claude runner (reusable-claude-run.yml):

- Fix shell quoting of completed-tasks JSON by using env vars instead of inline ${{ }} expansion which breaks on apostrophes in task names
- Declare OPENAI_API_KEY and CLAUDE_API_STRANSKE in workflow_call.secrets so callers can pass them (matches Codex runner)
- Use printf instead of echo when writing PR body to disk to avoid mangling of -n/-e prefixes or backslashes
- Add info log when falling back to codex-prompt file

Codex runner (reusable-codex-run.yml):

- Gate watchdog-saved=true on actual push success instead of emitting it unconditionally after push attempts that may have both failed
- Use a fired-flag file so the watchdog kill only terminates the background process if it's still sleeping (hasn't started its commit/push work yet)

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

All four conflicts were in reusable-codex-run.yml watchdog code where our branch has the fired-flag and push-success-gating improvements vs the unchanged main version. Kept our (HEAD) version for all. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

claude and others added 9 commits February 25, 2026 21:13

chore: sync template scripts

6145c5e

docs: add watchdog-saved to workflow outputs reference

0b9a46c

Copilot AI review requested due to automatic review settings February 26, 2026 03:05

stranske temporarily deployed to agent-high-privilege February 26, 2026 03:05 — with GitHub Actions Inactive

Copilot started reviewing on behalf of stranske February 26, 2026 03:05 View session

chatgpt-codex-connector bot reviewed Feb 26, 2026

Copilot AI reviewed Feb 26, 2026

claude added 2 commits February 26, 2026 03:21

stranske temporarily deployed to agent-high-privilege February 26, 2026 03:32 — with GitHub Actions Inactive

stranske merged commit 018850c into main Feb 26, 2026
175 of 178 checks passed

stranske deleted the claude/fix-task-completion-concerns-I1gRT branch February 26, 2026 03:39

stranske merged 11 commits into main from claude/fix-task-completion-concerns-I1gRT

3 participants
