Fix/codex log issues by stranske · Pull Request #1484 · stranske/Workflows

stranske · 2026-02-12T15:24:18Z

No description provided.

Essential fixes:
- Reporter sparse-checkout: add .github/actions to the checkout so the setup-api-client action is available (was failing 100% of the time on the Workflows repo)
- Belt Worker: re-install the API client after the branch checkout wipes node_modules (was causing @octokit/rest import failures and degraded token rotation)

High-value fixes:
- LLM analysis outputs: use print(..., end='') so the Python extraction no longer emits a trailing newline (confidence values carried a '\n' suffix, e.g. '0.63\n')
- Repo variables fetch: downgrade from core.info to core.debug, since the token permission limitation is known and the fallback to defaults works correctly

Medium fixes:
- Health 75 API Rate Diagnostic: pass secrets to the 4 setup-api-client calls that were missing the input, which caused 'No tokens were exported' warnings
- datetime.utcnow(): replace deprecated calls with a timezone-aware alternative in both Belt Worker ledger functions

Low-salience fixes:
- error_classifier: gate the entry log behind RUNNER_DEBUG to reduce log noise
- Non-artifact commit warning: downgrade from warning to notice, since it is expected behavior when Codex produces only workflow artifacts
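For the datetime.utcnow() item, the commit message does not name the replacement. A minimal sketch, assuming the usual standard-library substitute `datetime.now(timezone.utc)`; the helper name `ledger_timestamp` and the trailing-`Z` format are hypothetical choices, not taken from the Belt Worker code:

```python
from datetime import datetime, timezone


def ledger_timestamp() -> str:
    """Return the current time as an ISO-8601 UTC string (hypothetical helper)."""
    # datetime.utcnow() returns a *naive* datetime and is deprecated since
    # Python 3.12; datetime.now(timezone.utc) returns a timezone-aware one.
    now = datetime.now(timezone.utc)
    # Assumed format: whole seconds, with "Z" instead of "+00:00".
    return now.isoformat(timespec="seconds").replace("+00:00", "Z")


if __name__ == "__main__":
    print(ledger_timestamp())
```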

1. Use the .belt-tools action path instead of ./ for setup-api-client after the branch checkout, so the action runs from trusted Workflows code rather than the untrusted issue branch (security fix).
2. Pass GH_BELT_TOKEN || github.token as the github_token input to preserve the belt token selection, instead of overriding GITHUB_TOKEN/GH_TOKEN with the default workflow token.

…eshold

Two independent fixes for broken automation flows:

1. capability_check.py: The bare \bsecrets?\b regex matched negative mentions like 'no secrets' in issue constraint text, causing _requires_admin_access() to return true and the fallback classifier to BLOCK tasks that merely *describe* a no-secrets constraint. Replace it with specific verb+secrets patterns (manage/configure/set/create/update/delete/add/modify/rotate secrets). Root cause of the PAEM #1403 false-positive BLOCKED.
2. verdict_policy.py: CONCERNS_NEEDS_HUMAN_THRESHOLD lowered from 0.85 to 0.50. The old threshold meant any split verdict (PASS + CONCERNS) with <85% confidence on the concerns side triggered needs_human, blocking automatic follow-up issue creation. A 72% confidence concerns verdict (TMP #4894) is well above chance and should produce a follow-up rather than require manual triage.

Both template and main copies updated; new regression tests added.
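A minimal sketch of the false positive described in item 1, assuming case-insensitive matching and a plain whitespace gap between verb and noun. The real patterns in capability_check.py may differ; `BARE_SECRETS`, `VERB_SECRETS`, and `mentions_secret_management` are hypothetical names:

```python
import re

# Original, over-broad check: any mention of "secret"/"secrets" matches.
BARE_SECRETS = re.compile(r"\bsecrets?\b", re.IGNORECASE)

# Narrower check: a management verb followed by whitespace, then "secret(s)".
# Verb list from the commit message; the real pattern may differ.
VERB_SECRETS = re.compile(
    r"\b(?:manage|configure|set|create|update|delete|add|modify|rotate)\s+secrets?\b",
    re.IGNORECASE,
)


def mentions_secret_management(text: str) -> bool:
    """Return True only when the text asks for secrets to be changed."""
    return VERB_SECRETS.search(text) is not None


if __name__ == "__main__":
    constraint = "Constraints: no secrets, no new dependencies."
    print(bool(BARE_SECRETS.search(constraint)))  # True (false positive)
    print(mentions_secret_management(constraint))  # False
    print(mentions_secret_management("Rotate secrets for the deploy key"))  # True
    print(mentions_secret_management("Set repository secret TOKEN"))  # False
```

The last line shows the limitation of a pure `\s+` gap: an intervening word such as "repository" defeats the match.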

Three-layer fix for the systemic issue where setup-api-client's npm install overwrites the vendored minimatch package.json, and git add -A captures the modification into bootstrap/autofix commits.

Layer 1 (source fix): setup-api-client/action.yml
- Snapshot vendored package.json files before npm install
- Restore them after npm install completes
- Applied to both .github/actions/ and templates/consumer-repo/

Layer 2 (targeted staging): reusable-agents-issue-bridge.yml
- Replace 'git add -A' with targeted 'git add agents/${AGENT}-${ISSUE}.md'
- Only the bootstrap file gets staged, not npm side effects

Layer 3 (safety net): reusable-18-autofix.yml
- Add 'git reset HEAD -- .github/scripts/node_modules ...' after git add -A
- Matches the existing pattern in reusable-codex-run.yml line 1184
- Applied to both push-commit and patch-commit paths

Also fixes test assertions that referenced the old CONCERNS_NEEDS_HUMAN_THRESHOLD (was 0.85, now 0.50); confidence values in tests updated accordingly.

Fixes: Copilot review finding on PAEM PR #1417 (minimatch vendoring cycle)
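Layer 1 is a shell step in action.yml, which the PR does not quote. The same snapshot/restore idea can be sketched in Python as a context manager; `preserve_files` is a hypothetical helper, and the demo uses a temporary file as a stand-in for the vendored minimatch package.json:

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator, Sequence


@contextlib.contextmanager
def preserve_files(paths: Sequence[Path]) -> Iterator[None]:
    """Snapshot files on entry and restore them byte-for-byte on exit.

    Paths that do not exist when the snapshot is taken are ignored.
    """
    snapshot = {path: path.read_bytes() for path in paths if path.is_file()}
    try:
        yield
    finally:
        # Runs even if the wrapped command (e.g. npm install) raised.
        for path, content in snapshot.items():
            if not path.is_file() or path.read_bytes() != content:
                path.parent.mkdir(parents=True, exist_ok=True)
                path.write_bytes(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "minimatch" / "package.json"
        pkg.parent.mkdir()
        pkg.write_text('{"version": "vendored"}')
        with preserve_files([pkg]):
            # Stand-in for npm install rewriting the vendored metadata.
            pkg.write_text('{"version": "from-npm"}')
        print(pkg.read_text())  # {"version": "vendored"}
```

In a real step the body of the `with` block would run the install, for example `subprocess.run(["npm", "install"], check=True)`; restoring in `finally` also covers a failed install.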

The needs_human gate was backwards: it fired when the CONCERNS provider had LOW confidence (the LLM is unsure there's a problem) instead of HIGH confidence (the LLM is confident there's a real problem). Confidence reflects the LLM's certainty in its own evaluation, not a measure of code quality. Low-confidence CONCERNS is a weak signal that shouldn't block follow-up automation; high-confidence CONCERNS is the stronger signal warranting human review.

Changed: confidence_value < threshold → confidence_value >= threshold

Threshold set to 0.85 (a high bar: a human is already in the loop, and depth-of-rounds provides an independent guard against runaway automation).
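The corrected comparison can be sketched as follows. Only `CONCERNS_NEEDS_HUMAN_THRESHOLD` and its 0.85 value come from the PR; the function name and its list-of-verdicts input are assumptions for illustration, and non-split verdicts are deliberately out of scope:

```python
# Constant name and value from the PR; everything else is a hypothetical sketch.
CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85


def split_verdict_needs_human(verdicts: list[str], concerns_confidence: float) -> bool:
    """Escalate a PASS + CONCERNS split only when CONCERNS is confident."""
    is_split = "PASS" in verdicts and "CONCERNS" in verdicts
    if not is_split:
        return False  # non-split verdicts are outside this sketch
    # Corrected direction: high confidence in CONCERNS is the strong signal.
    # (The old gate used `<`, escalating exactly the weak, uncertain cases.)
    return concerns_confidence >= CONCERNS_NEEDS_HUMAN_THRESHOLD


if __name__ == "__main__":
    for confidence in (0.72, 0.85, 0.93):
        print(confidence, split_verdict_needs_human(["PASS", "CONCERNS"], confidence))
```

With this direction, a 0.72 CONCERNS verdict such as the one from TMP #4894 does not trigger needs_human and falls through to follow-up issue creation.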

- Relax the verb-to-secret regex from \s+ to .{0,30} so phrases like 'Set repository secret TOKEN' and 'Update GitHub Actions secret FOO' are correctly blocked even with intervening words (addresses Codex inline review on capability_check.py L165)
- Add 2 regression tests for the above patterns
- Resolve merge conflicts in 4 test files (keep the >= 0.85 threshold logic; main had lowered it to 0.50 with the < direction)
- Restore CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85 (the auto-merge picked up main's 0.50 value but kept our >= comparison direction)

All 1907 tests pass.
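A sketch contrasting the two regex forms, assuming the verb list from the earlier commit and case-insensitive matching (`STRICT` and `RELAXED` are illustrative names, not the identifiers in capability_check.py):

```python
import re

VERBS = r"(?:manage|configure|set|create|update|delete|add|modify|rotate)"

# Earlier form: only whitespace may separate the verb from "secret(s)".
STRICT = re.compile(rf"\b{VERBS}\s+secrets?\b", re.IGNORECASE)
# Relaxed form: up to 30 characters (on the same line) may sit in between.
RELAXED = re.compile(rf"\b{VERBS}\b.{{0,30}}\bsecrets?\b", re.IGNORECASE)

if __name__ == "__main__":
    for phrase in (
        "Set repository secret TOKEN",
        "Update GitHub Actions secret FOO",
        "Constraints: no secrets, no new dependencies.",
    ):
        strict = bool(STRICT.search(phrase))
        relaxed = bool(RELAXED.search(phrase))
        print(f"{phrase!r}: strict={strict} relaxed={relaxed}")
```

Because `.` does not match a newline by default, the 30-character window stays within a single line of issue text. A wider window trades precision for recall, so negated phrasing such as "do not add secrets" still matches, exactly as it did with `\s+`.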

…s, task-focused prompts, PR meta debounce

- ledger_migrate_base.py: skip corrupt YAML files instead of blocking all belt worker runs (root cause of the issue #1418 stall)
- agents-autofix-loop: reduce max_attempts 3→2 (standard) and 2→1 (escalated) to cut the autofix churn observed in PR #4906
- agents-72-codex-belt-worker: emit a task_title output and include a task-focused directive in the activation comment for a higher first-commit success rate
- agents-pr-meta: add PR-number concurrency grouping with cancel-in-progress for pull_request events to debounce redundant runs
- All template counterparts updated in sync
- 2 new tests for corrupt ledger handling
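The ledger change (skip corrupt files, keep going, report what was skipped) can be sketched independently of the parser. This is a hypothetical helper, not the code in ledger_migrate_base.py: with PyYAML you would pass `yaml.safe_load` and `(yaml.YAMLError,)`, while the demo uses `json` so it runs with the standard library only:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable


def load_ledgers(
    paths: Iterable[Path],
    load: Callable[[str], Any],
    errors: tuple[type[Exception], ...],
) -> tuple[dict[Path, Any], list[tuple[Path, str]]]:
    """Parse every ledger; record corrupt ones instead of aborting the run."""
    loaded: dict[Path, Any] = {}
    skipped: list[tuple[Path, str]] = []
    for path in paths:
        try:
            loaded[path] = load(path.read_text(encoding="utf-8"))
        except errors as exc:
            # One bad file no longer blocks the remaining ledgers.
            skipped.append((path, str(exc)))
    return loaded, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.json"
        bad = Path(tmp) / "bad.json"
        good.write_text('{"tasks": []}', encoding="utf-8")
        bad.write_text('{"tasks": [', encoding="utf-8")
        loaded, skipped = load_ledgers([good, bad], json.loads, (json.JSONDecodeError,))
        print(f"loaded {len(loaded)} ledger(s), skipped {len(skipped)}")
        for path, reason in skipped:
            print(f"skipped {path.name}: {reason}")
```

Only the listed error types are swallowed; anything else (a missing file, a permission error) still propagates, so genuine environment problems are not silently hidden.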

stranske-keepalive · 2026-02-12T15:24:40Z

⚠️ Action Required: Unable to determine source issue for PR #1484. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

stranske-keepalive · 2026-02-12T15:27:09Z

Automated Status Summary

Head SHA: e896d7e
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / guard
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	93.12%
Baseline	85.00%
Delta	+8.12%
Minimum	70.00%
Status	✅ Pass

Top Coverage Hotspots (lowest coverage)

File	Coverage	Missing
`src/cli_parser.py`	81.8%	4
`src/percentile_calculator.py`	95.0%	1
`src/aggregator.py`	95.0%	2
`src/__init__.py`	100.0%	0
`src/ndjson_parser.py`	100.0%	0

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

agents-workflows-bot · 2026-02-12T15:28:00Z

🤖 Keepalive Loop Status

PR #1484 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	0/0 complete
Timeout	45 min (default)
Timeout usage	3m elapsed (8%, 42m remaining)
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 371370e04b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

.github/workflows/reusable-18-autofix.yml

.github/workflows/agents-72-codex-belt-worker.yml

Copilot

Pull request overview

Updates automation/workflow behavior to reduce noisy vendored node_modules diffs, tighten “needs human” gating to high-confidence split verdicts, and harden capability/admin checks and ledger migration behavior.

Changes:

Reworked verdict policy so split PASS+CONCERNS only triggers needs_human when CONCERNS confidence is high (>= 0.85), and updated tests accordingly.
Improved admin capability fallback detection for “set/update … secret(s)” phrasing and added regression tests.
Reduced unintended node_modules staging/commits in automation workflows; added ledger migration resilience for corrupt YAML ledgers; refined keepalive/meta workflow concurrency behavior.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.


File	Description
scripts/langchain/verdict_policy.py	Switch split-verdict `needs_human` gating to high-confidence threshold (>= 0.85).
tests/test_verdict_policy.py	Align unit tests and messaging with updated threshold semantics.
tests/test_verdict_policy_integration.py	Align integration parity tests for workflow vs follow-up verdict handling.
tests/test_verdict_extract.py	Update expectations for `needs_human` with higher CONCERNS confidence.
tests/test_followup_issue_generator.py	Update follow-up labeling test to reflect high-confidence `needs-human`.
scripts/langchain/capability_check.py	Broaden admin-required regexes to catch “verb … secret(s)” with intervening words.
templates/consumer-repo/scripts/langchain/capability_check.py	Mirror capability-check regex changes in consumer template.
tests/scripts/test_capability_check.py	Add regression tests for “set repository secret” / “update actions secret” fallback blocking.
scripts/ledger_migrate_base.py	Skip corrupt ledgers (YAML/Migration errors) while continuing processing; summarize skips.
tests/scripts/test_ledger_migrate_base.py	Add coverage for skipping corrupt ledgers in normal and `--check` modes.
.github/actions/setup-api-client/action.yml	Snapshot/restore vendored `node_modules` package metadata around npm install.
templates/consumer-repo/.github/actions/setup-api-client/action.yml	Mirror setup-api-client vendored metadata snapshot/restore in template.
.github/workflows/reusable-agents-issue-bridge.yml	Avoid staging unintended files by adding only the bootstrap markdown file.
.github/workflows/reusable-18-autofix.yml	Unstage vendored `node_modules` paths before committing autofix results.
.github/workflows/agents-pr-meta-v4.yml	Concurrency group now distinguishes PR events; cancel-in-progress for PR runs.
templates/consumer-repo/.github/workflows/agents-pr-meta.yml	Same concurrency refinement for the consumer thin-caller workflow.
.github/workflows/agents-autofix-loop.yml	Reduce max attempts; cap escalated PR attempts to 1.
templates/consumer-repo/.github/workflows/agents-autofix-loop.yml	Mirror autofix-loop attempt reductions in template.
.github/workflows/agents-72-codex-belt-worker.yml	Emit `task_title` output and include a narrowed “focus task” directive in activation comment.
templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml	Mirror belt-worker `task_title` output and activation comment directive in template.

Copilot · 2026-02-12T15:29:37Z

.github/workflows/agents-72-codex-belt-worker.yml

              with open(gh_output, 'a', encoding='utf-8') as handle:
                  handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
+                  handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
                  handle.write(f"task_status={start_info['task']['current_status'] if start_info['task'] else ''}\n")


Writing task_title to $GITHUB_OUTPUT as a single key=value line will break output parsing if a ledger task title contains a newline (or other characters that require the multiline output format). Consider sanitizing task_title (e.g., replace newlines) or emitting it using the official multiline <<DELIM syntax so any title is safe.

Suggested change

-              with open(gh_output, 'a', encoding='utf-8') as handle:
-                  handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
-                  handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
-                  handle.write(f"task_status={start_info['task']['current_status'] if start_info['task'] else ''}\n")
+              # Prepare safe, single-line task fields for GitHub outputs
+              task = start_info['task'] or {}
+              task_id = task.get('id', '') if task else ''
+              task_title = task.get('title', '') if task else ''
+              # Sanitize title to avoid breaking GITHUB_OUTPUT parsing
+              task_title = task_title.replace('\r', ' ').replace('\n', ' ')
+              task_status = task.get('current_status', '') if task else ''
+              with open(gh_output, 'a', encoding='utf-8') as handle:
+                  handle.write(f"task_id={task_id}\n")
+                  handle.write(f"task_title={task_title}\n")
+                  handle.write(f"task_status={task_status}\n")

Copilot · 2026-02-12T15:29:38Z

templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml

          if gh_output:
              with open(gh_output, 'a', encoding='utf-8') as handle:
                  handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
+                  handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")


Writing task_title to $GITHUB_OUTPUT as a single key=value line will break output parsing if a ledger task title contains a newline (or other characters that require the multiline output format). Consider sanitizing task_title (e.g., replace newlines) or emitting it using the official multiline <<DELIM syntax so any title is safe.

Suggested change

-                  handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
+                  title = start_info['task']['title'] if start_info['task'] else ''
+                  handle.write("task_title<<TASK_TITLE_EOF\n")
+                  handle.write(f"{title}\n")
+                  handle.write("TASK_TITLE_EOF\n")
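The follow-up commit took the sanitizing route instead, but the multiline form from this second suggestion has one catch: a fixed delimiter such as TASK_TITLE_EOF breaks if a title contains that word on its own line. A sketch of a general-purpose writer, assuming a random per-value delimiter is acceptable (`write_github_output` is a hypothetical name):

```python
import os
import tempfile
import uuid


def write_github_output(path: str, name: str, value: str) -> None:
    """Append one step output to a $GITHUB_OUTPUT-style file (hypothetical helper)."""
    # Normalize CRLF / lone CR so every line break is a plain "\n".
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    with open(path, "a", encoding="utf-8") as handle:
        if "\n" not in value:
            handle.write(f"{name}={value}\n")
            return
        # A random delimiter cannot plausibly appear in the value, unlike a
        # fixed word such as TASK_TITLE_EOF; guard anyway.
        delimiter = f"EOF_{uuid.uuid4().hex}"
        if delimiter in value:
            raise ValueError("value contains the output delimiter")
        handle.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "github_output")
        write_github_output(out, "task_id", "T-1")
        write_github_output(out, "task_title", "Fix parser\nand add tests")
        with open(out, encoding="utf-8") as handle:
            print(handle.read())
```

Inside a workflow step the path would come from `os.environ["GITHUB_OUTPUT"]`; the temporary file here only stands in for it.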

scripts/ledger_migrate_base.py

github-actions · 2026-02-12T15:31:38Z

Autofix updated these files:

tests/scripts/test_ledger_migrate_base.py

…rkflows into fix/codex-log-issues

…tations

Address inline review feedback on PR #1484:
- Sanitize task_title by replacing newlines/carriage returns with spaces before writing to $GITHUB_OUTPUT (prevents broken output parsing)
- Normalize yaml.YAMLError messages to a single line in ::warning:: annotations (prevents malformed GitHub Actions annotations)
- Both belt-worker copies updated in sync
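The annotation fix can be sketched two ways. `single_line` mirrors the normalization described above (the commit does not show its exact code); `escape_command_data` is the alternative of escaping `%`, CR, and LF as `%25`, `%0D`, and `%0A`, which workflow commands accept in message data. All three function names, and the sample parser message, are hypothetical:

```python
def single_line(message: str) -> str:
    """Collapse all whitespace, including line breaks, into single spaces."""
    return " ".join(message.split())


def escape_command_data(message: str) -> str:
    """Escape %, CR and LF so a multi-line message survives a workflow command."""
    # "%" must be escaped first, or the later escapes would be double-encoded.
    return message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def warning_annotation(message: str) -> str:
    """Build a ::warning:: command with a flattened, escaped message."""
    return f"::warning::{escape_command_data(single_line(message))}"


if __name__ == "__main__":
    yaml_error = 'while parsing a block mapping\n  in "ledger.yml", line 3, column 1\nexpected <block end>'
    print(warning_annotation(yaml_error))
```

Flattening keeps each annotation on one line in the log; escaping instead would preserve the original line breaks in the rendered annotation.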

stranske added 7 commits February 12, 2026 00:59

Copilot AI review requested due to automatic review settings February 12, 2026 15:24

stranske temporarily deployed to agent-high-privilege February 12, 2026 15:24 — with GitHub Actions Inactive

Copilot started reviewing on behalf of stranske February 12, 2026 15:25

stranske added the autofix:escalated label Feb 12, 2026

stranske temporarily deployed to agent-high-privilege February 12, 2026 15:27 — with GitHub Actions Inactive

stranske temporarily deployed to agent-standard February 12, 2026 15:28 — with GitHub Actions Inactive

chatgpt-codex-connector bot reviewed Feb 12, 2026

.github/workflows/reusable-18-autofix.yml (resolved)

.github/workflows/agents-72-codex-belt-worker.yml (outdated, resolved)

Copilot AI reviewed Feb 12, 2026


github-actions bot added the autofix label (Opt-in automated formatting & lint remediation) Feb 12, 2026

chore(autofix): formatting/lint

6f5e110

github-actions bot added the autofix:patch label Feb 12, 2026

agents-workflows-bot bot temporarily deployed to agent-high-privilege February 12, 2026 15:31 Inactive

github-actions bot removed the autofix:patch label Feb 12, 2026

github-actions bot added 2 commits February 12, 2026 15:35

chore(codex-autofix): apply updates (PR #1484)

99b3d52

Merge branch 'fix/codex-log-issues' of https://github.com/stranske/Wo… …rkflows into fix/codex-log-issues

b09ca72

stranske added autofix:escalated and removed autofix:escalated labels Feb 12, 2026

stranske temporarily deployed to agent-high-privilege February 12, 2026 15:37 — with GitHub Actions Inactive

stranske temporarily deployed to agent-standard February 12, 2026 15:37 — with GitHub Actions Inactive

stranske temporarily deployed to agent-standard February 12, 2026 15:38 — with GitHub Actions Inactive

github-actions bot added 2 commits February 12, 2026 15:40

chore(codex-autofix): apply updates (PR #1484)

52bd354

chore: sync template scripts

ffda552

stranske removed the autofix:escalated label Feb 12, 2026

stranske temporarily deployed to agent-high-privilege February 12, 2026 16:04 — with GitHub Actions Inactive

stranske merged commit b4d5b1a into main Feb 12, 2026
37 checks passed

stranske deleted the fix/codex-log-issues branch February 12, 2026 16:09

stranske merged 13 commits into main from fix/codex-log-issues


2 participants

