Fix/codex log issues by stranske · Pull Request #1483 · stranske/Workflows

stranske · 2026-02-12T12:59:39Z

No description provided.

Essential fixes:
- Reporter sparse-checkout: add .github/actions to checkout so the setup-api-client action is available (was failing 100% on Workflows repo)
- Belt Worker: re-install API client after branch checkout wipes node_modules (was causing @octokit/rest import failures and degraded token rotation)

High-value fixes:
- LLM analysis outputs: use print(..., end='') to strip trailing newlines from python extraction (confidence values had '\n' suffix e.g. '0.63\n')
- Repo variables fetch: downgrade from core.info to core.debug since the token permission limitation is known and the fallback to defaults works correctly

Medium fixes:
- Health 75 API Rate Diagnostic: pass secrets to 4 setup-api-client calls that were missing the input, causing 'No tokens were exported' warnings
- datetime.utcnow(): replace deprecated calls with timezone-aware alternative in both Belt Worker ledger functions

Low-salience fixes:
- error_classifier: gate entry log behind RUNNER_DEBUG to reduce log noise
- Non-artifact commit warning: downgrade from warning to notice since it is expected behavior when Codex produces only workflow artifacts
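A minimal Python sketch of the two Python-side fixes listed above: newline-free output and a timezone-aware UTC timestamp. The function names (`utc_timestamp`, `emit_field`, `capture`) and the timestamp format are assumptions for illustration; the actual ledger functions and extraction snippets in the workflows are not shown in this PR page.

```python
import io
from contextlib import redirect_stdout
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    # datetime.now(timezone.utc) returns an aware datetime,
    # whereas utcnow() returned a naive one.
    # The output format here is an assumption, not the ledger's real format.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def emit_field(value: object) -> None:
    """Print a value with no trailing newline so captured output stays clean."""
    print(value, end="")


def capture(func, *args) -> str:
    """Helper for the demo: collect whatever func writes to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()
```

With a plain `print(0.63)` the captured text is `'0.63\n'`; with `emit_field(0.63)` it is `'0.63'`, which is the behavior the fix describes for confidence values.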

1. Use .belt-tools action path instead of ./ for setup-api-client after branch checkout, so the action runs from trusted Workflows code rather than the untrusted issue branch (security fix).
2. Pass GH_BELT_TOKEN || github.token as github_token input to preserve the belt token selection instead of overriding GITHUB_TOKEN/GH_TOKEN with the default workflow token.

…eshold

Two independent fixes for broken automation flows:

1. capability_check.py: The bare \bsecrets?\b regex matched negative mentions like 'no secrets' in issue constraint text, causing _requires_admin_access() to return true and the fallback classifier to BLOCK tasks that merely *describe* a no-secrets constraint. Replace with specific verb+secrets patterns (manage/configure/set/create/update/delete/add/modify/rotate secrets). Root cause of PAEM #1403 false-positive BLOCKED.
2. verdict_policy.py: CONCERNS_NEEDS_HUMAN_THRESHOLD lowered from 0.85 to 0.50. The old threshold meant any split verdict (PASS + CONCERNS) with <85% confidence on the concerns side triggered needs_human, blocking automatic follow-up issue creation. A 72% confidence concerns verdict (TMP #4894) is well above chance and should produce a follow-up rather than require manual triage.

Both template and main copies updated; new regression tests added.
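The false positive in item 1 can be reproduced with a short sketch. The bare pattern is quoted from the commit message; the verb+secrets pattern, the `re.IGNORECASE` flag, and the sample strings are assumptions built from the verbs listed above, not the actual code in capability_check.py.

```python
import re

# The original bare pattern, as quoted in the commit message.
BARE_SECRETS = re.compile(r"\bsecrets?\b", re.IGNORECASE)

# Hypothetical verb+secrets pattern assembled from the verbs the commit lists.
ADMIN_VERBS = "manage|configure|set|create|update|delete|add|modify|rotate"
VERB_SECRETS = re.compile(rf"\b(?:{ADMIN_VERBS})\s+secrets?\b", re.IGNORECASE)

# Illustrative inputs (not taken from the real issues).
constraint = "Constraints: no secrets, no workflow edits"
request = "Rotate secrets for the deploy environment"

bare_hits_constraint = BARE_SECRETS.search(constraint) is not None
verb_hits_constraint = VERB_SECRETS.search(constraint) is not None
verb_hits_request = VERB_SECRETS.search(request) is not None
```

The bare pattern matches the constraint text even though it only describes a restriction, while the verb-anchored pattern ignores it and still matches the explicit secrets-management request.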

Three-layer fix for the systemic issue where setup-api-client's npm install overwrites vendored minimatch package.json, and git add -A captures the modification into bootstrap/autofix commits.

Layer 1 (source fix): setup-api-client/action.yml
- Snapshot vendored package.json files before npm install
- Restore them after npm install completes
- Applied to both .github/actions/ and templates/consumer-repo/

Layer 2 (targeted staging): reusable-agents-issue-bridge.yml
- Replace 'git add -A' with targeted 'git add agents/${AGENT}-${ISSUE}.md'
- Only the bootstrap file gets staged, not npm side-effects

Layer 3 (safety net): reusable-18-autofix.yml
- Add 'git reset HEAD -- .github/scripts/node_modules ...' after git add -A
- Matches existing pattern in reusable-codex-run.yml line 1184
- Applied to both push-commit and patch-commit paths

Also fixes test assertions that referenced the old CONCERNS_NEEDS_HUMAN_THRESHOLD (was 0.85, now 0.50); confidence values in tests updated accordingly.

Fixes: Copilot review finding on PAEM PR #1417 (minimatch vendoring cycle)
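The Layer 1 idea (snapshot before the install, restore afterwards) can be sketched in Python. This is only an illustration of the pattern: the real fix lives in the action.yml steps, and `run_preserving` and its arguments are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_preserving(paths: Iterable[Path], step: Callable[[], None]) -> None:
    """Run `step`, then restore the given files to their pre-step contents."""
    # Snapshot: remember the bytes of every file that exists right now.
    snapshot: Dict[Path, bytes] = {p: p.read_bytes() for p in paths if p.is_file()}
    try:
        step()  # e.g. a function that shells out to the package install
    finally:
        # Restore: write the original bytes back, even if the step failed.
        for path, content in snapshot.items():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
```

Restoring in a `finally` block is a design choice of this sketch, so a failed install does not leave modified vendored files behind; whether the action does the same is not stated in the commit message.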

The needs_human gate was backwards: it fired when the CONCERNS provider had LOW confidence (LLM unsure there's a problem) instead of HIGH confidence (LLM confident there's a real problem).

Confidence reflects the LLM's certainty in its own evaluation, not a measure of code quality. Low-confidence CONCERNS is a weak signal that shouldn't block follow-up automation. High-confidence CONCERNS is the stronger signal warranting human review.

Changed: confidence_value < threshold → confidence_value >= threshold

Threshold set to 0.85 (high bar: a human is already in the loop and depth-of-rounds provides an independent guard against runaway automation).
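The direction change can be shown with a small comparison. The constant name and value come from the commit messages; the two function names are hypothetical and stand in for the split-verdict check in verdict_policy.py, whose full logic is not shown here.

```python
CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85


def needs_human_before(confidence_value: float,
                       threshold: float = CONCERNS_NEEDS_HUMAN_THRESHOLD) -> bool:
    """Old gate: fired when the CONCERNS side was *unsure*."""
    return confidence_value < threshold


def needs_human_after(confidence_value: float,
                      threshold: float = CONCERNS_NEEDS_HUMAN_THRESHOLD) -> bool:
    """New gate: fires only when the CONCERNS side is *confident*."""
    return confidence_value >= threshold
```

Under this sketch a 0.72-confidence CONCERNS verdict no longer triggers needs_human, while one at 0.85 or above does.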

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 089bc23fa9


scripts/langchain/capability_check.py

Copilot

Pull request overview

Updates the Codex/keepalive workflow ecosystem to reduce noisy logs and prevent accidental staging of vendored node_modules, while aligning verdict-policy and capability-check behavior with updated automation expectations.

Changes:

Flip split-verdict needs_human logic to trigger only on high-confidence CONCERNS (>= 0.85), and update/extend tests accordingly.
Refine capability fallback detection to avoid blocking on negated “no secrets” mentions while still blocking explicit secrets-management actions.
Reduce workflow/script log noise and harden workflows against npm install side effects (newline-free outputs, targeted staging, and post-checkout API-client reinstall).

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tests/test_verdict_policy_integration.py	Updates integration expectations for split-verdict `needs_human` boundary and low-confidence behavior.
tests/test_verdict_policy.py	Adds/updates unit tests for the new `needs_human` threshold semantics and reasoning text.
tests/test_verdict_extract.py	Aligns extract output test to a high-confidence CONCERNS case that now triggers `needs_human`.
tests/test_followup_issue_generator.py	Updates follow-up labeling test to require high-confidence CONCERNS for needs-human labeling.
tests/scripts/test_capability_check.py	Adds regression tests for “no secrets” false positives and explicit secrets-management blocking.
scripts/langchain/verdict_policy.py	Changes split-verdict `needs_human` condition to `>=` threshold (high-confidence concerns).
scripts/langchain/capability_check.py	Narrows admin-access “secrets” regex matching to explicit secrets-management language.
templates/consumer-repo/scripts/langchain/capability_check.py	Mirrors capability-check regex changes for consumer templates.
.github/workflows/reusable-codex-run.yml	Prevents trailing newlines in extracted JSON/fields and downgrades an artifact-only commit message to `notice`.
.github/workflows/reusable-agents-issue-bridge.yml	Avoids `git add -A` to prevent staging unintended `node_modules` changes.
.github/workflows/reusable-18-autofix.yml	Unstages vendored `node_modules` paths before committing autofix results.
.github/workflows/health-75-api-rate-diagnostic.yml	Passes `secrets`/`github_token` into `setup-api-client` consistently across jobs.
.github/workflows/agents-keepalive-loop-reporter.yml	Expands sparse checkout to include `.github/actions` so the composite action is available.
.github/workflows/agents-72-codex-belt-worker.yml	Reinstalls API client deps after branch checkout; modernizes Python UTC timestamp formatting.
templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml	Mirrors belt-worker step + timestamp changes for consumer templates.
.github/scripts/keepalive_loop.js	Downgrades repo-variable access failure logging from `info` to `debug`.
templates/consumer-repo/.github/scripts/keepalive_loop.js	Mirrors keepalive logging change for consumer templates.
.github/scripts/error_classifier.js	Gates verbose classification log behind `RUNNER_DEBUG=1`.
templates/consumer-repo/.github/scripts/error_classifier.js	Mirrors error-classifier debug gating for consumer templates.
.github/actions/setup-api-client/action.yml	Snapshots/restores vendored package metadata around `npm install` to reduce vendored drift.
templates/consumer-repo/.github/actions/setup-api-client/action.yml	Mirrors setup action changes for consumer templates.

stranske · 2026-02-12T13:30:08Z

Re: Codex review on capability_check.py line 165 — Valid finding. Relaxed the verb-to-secret regex from \s+ (whitespace-only) to .{0,30} (up to 30 chars gap) so phrases like "Set repository secret TOKEN" and "Update GitHub Actions secret FOO" are still correctly blocked. Added two regression tests for these exact phrases. The negated-mention test (no secrets, no workflow edits) still passes since it contains no admin verbs.

- Relax verb-to-secret regex from \s+ to .{0,30} so phrases like 'Set repository secret TOKEN' and 'Update GitHub Actions secret FOO' are correctly blocked even with intervening words (addresses Codex inline review on capability_check.py L165)
- Add 2 regression tests for the above patterns
- Resolve merge conflicts in 4 test files (keep >= 0.85 threshold logic; main had lowered to 0.50 with < direction)
- Restore CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85 (auto-merge picked up main's 0.50 value but our >= comparison direction)

All 1907 tests pass.
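A sketch of the gap relaxation described in the first bullet, under the assumption that the pattern is a single alternation of the admin verbs followed by the gap and the word "secret(s)". The exact pattern in capability_check.py is not shown on this page, so `STRICT`, `RELAXED`, and the `re.IGNORECASE` flag are illustrative.

```python
import re

VERBS = "manage|configure|set|create|update|delete|add|modify|rotate"

# Whitespace-only gap: the verb must be immediately followed by "secret(s)".
STRICT = re.compile(rf"\b(?:{VERBS})\s+secrets?\b", re.IGNORECASE)

# Relaxed gap: up to 30 arbitrary characters between the verb and "secret(s)".
RELAXED = re.compile(rf"\b(?:{VERBS})\b.{{0,30}}\bsecrets?\b", re.IGNORECASE)

phrases = [
    "Set repository secret TOKEN",
    "Update GitHub Actions secret FOO",
    "no secrets, no workflow edits",
]

strict_results = [STRICT.search(p) is not None for p in phrases]
relaxed_results = [RELAXED.search(p) is not None for p in phrases]
```

In this sketch the strict pattern misses both admin phrases, while the relaxed one catches them and still ignores the negated mention because it contains no admin verb. A wider gap also makes it easier to match a verb and an unrelated "secrets" later in the same sentence, which is the trade-off the 30-character cap limits.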

stranske-keepalive · 2026-02-12T13:35:04Z

⚠️ Action Required: Unable to determine source issue for PR #1483. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

stranske-keepalive · 2026-02-12T13:37:09Z

Automated Status Summary

Head SHA: 605b2db
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / guard
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	93.12%
Baseline	85.00%
Delta	+8.12%
Minimum	70.00%
Status	✅ Pass

Top Coverage Hotspots (lowest coverage)

File	Coverage	Missing
`src/cli_parser.py`	81.8%	4
`src/percentile_calculator.py`	95.0%	1
`src/aggregator.py`	95.0%	2
`src/__init__.py`	100.0%	0
`src/ndjson_parser.py`	100.0%	0

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

agents-workflows-bot · 2026-02-12T13:37:51Z

🤖 Keepalive Loop Status

PR #1483 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	0/0 complete
Timeout	45 min (default)
Timeout usage	3m elapsed (7%, 42m remaining)
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

stranske added 5 commits February 12, 2026 00:59

Copilot AI review requested due to automatic review settings February 12, 2026 12:59

Copilot started reviewing on behalf of stranske February 12, 2026 13:00

chatgpt-codex-connector bot reviewed Feb 12, 2026

Copilot AI reviewed Feb 12, 2026

stranske temporarily deployed to agent-high-privilege February 12, 2026 13:35 — with GitHub Actions Inactive

stranske added the autofix:escalated label Feb 12, 2026

stranske temporarily deployed to agent-high-privilege February 12, 2026 13:37 — with GitHub Actions Inactive

stranske temporarily deployed to agent-standard February 12, 2026 13:37 — with GitHub Actions Inactive

stranske temporarily deployed to agent-standard February 12, 2026 13:38 — with GitHub Actions Inactive

chore(codex-autofix): apply updates (PR #1483)

c36728e

agents-workflows-bot bot temporarily deployed to agent-high-privilege February 12, 2026 13:41 Inactive

stranske removed the autofix:escalated label Feb 12, 2026

stranske merged commit 7902785 into main Feb 12, 2026
35 checks passed

stranske deleted the fix/codex-log-issues branch February 12, 2026 14:03

stranske merged 7 commits into main from fix/codex-log-issues


2 participants
