Fix/codex log issues #1483

Merged
stranske merged 7 commits into main from fix/codex-log-issues
Feb 12, 2026

@stranske
Owner

No description provided.

Essential fixes:
- Reporter sparse-checkout: add .github/actions to checkout so setup-api-client
  action is available (was failing 100% on Workflows repo)
- Belt Worker: re-install API client after branch checkout wipes node_modules
  (was causing @octokit/rest import failures and degraded token rotation)

High-value fixes:
- LLM analysis outputs: use print(..., end='') to strip trailing newlines from
  python extraction (confidence values had '\n' suffix e.g. '0.63\n')
- Repo variables fetch: downgrade from core.info to core.debug since the token
  permission limitation is known and the fallback to defaults works correctly

Medium fixes:
- Health 75 API Rate Diagnostic: pass secrets to 4 setup-api-client calls that
  were missing the input, causing 'No tokens were exported' warnings
- datetime.utcnow(): replace deprecated calls with timezone-aware alternative
  in both Belt Worker ledger functions

Low-salience fixes:
- error_classifier: gate entry log behind RUNNER_DEBUG to reduce log noise
- Non-artifact commit warning: downgrade from warning to notice since it is
  expected behavior when Codex produces only workflow artifacts
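
A minimal sketch of two of the Python-side fixes listed above: the trailing-newline fix for extracted values and the timezone-aware replacement for `datetime.utcnow()`. The helper names `emit` and `utc_timestamp` and the timestamp format are assumptions for illustration; the workflow's actual extraction code is not shown in this PR conversation.

```python
import io
from contextlib import redirect_stdout
from datetime import datetime, timezone


def emit(value: str, strip_newline: bool) -> str:
    """Return exactly what print() writes to stdout for `value`."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        if strip_newline:
            print(value, end="")  # nothing is appended after the value
        else:
            print(value)  # default end="\n" leaks into the captured output
    return buf.getvalue()


def utc_timestamp() -> str:
    """Timezone-aware stand-in for the deprecated datetime.utcnow().

    The ISO-like format string is an assumption, not the ledger's real format.
    """
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


raw = emit("0.63", strip_newline=False)
clean = emit("0.63", strip_newline=True)
print(repr(raw))    # '0.63\n'
print(repr(clean))  # '0.63'
print(utc_timestamp())
```

Anything that captures stdout verbatim, such as a step output, sees the newline in the first case, which matches the `'0.63\n'` symptom described in the commit message.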

1. Use .belt-tools action path instead of ./ for setup-api-client
   after branch checkout, so the action runs from trusted Workflows
   code rather than the untrusted issue branch (security fix).

2. Pass GH_BELT_TOKEN || github.token as github_token input to
   preserve the belt token selection instead of overriding
   GITHUB_TOKEN/GH_TOKEN with the default workflow token.

…eshold

Two independent fixes for broken automation flows:

1. capability_check.py: The bare \bsecrets?\b regex matched negative
   mentions like 'no secrets' in issue constraint text, causing
   _requires_admin_access() to return true and the fallback classifier
   to BLOCK tasks that merely *describe* a no-secrets constraint.
   Replace with specific verb+secrets patterns (manage/configure/set/
   create/update/delete/add/modify/rotate secrets).
   Root cause of PAEM #1403 false-positive BLOCKED.

2. verdict_policy.py: CONCERNS_NEEDS_HUMAN_THRESHOLD lowered from 0.85
   to 0.50.  The old threshold meant any split verdict (PASS + CONCERNS)
   with <85% confidence on the concerns side triggered needs_human,
   blocking automatic follow-up issue creation.  A 72% confidence
   concerns verdict (TMP #4894) is well above chance and should produce
   a follow-up rather than require manual triage.

Both template and main copies updated; new regression tests added.
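
The false positive in item 1 can be reproduced with a short sketch. The bare pattern is quoted from the commit message; `VERB_SECRETS` is a hypothetical reconstruction of the narrowed pattern built from the verb list above, and `requires_admin_access` here is a simplified stand-in for `_requires_admin_access()`, not the code in `capability_check.py`.

```python
import re

# Over-broad pattern quoted in the commit message.
BARE_SECRETS = re.compile(r"\bsecrets?\b", re.IGNORECASE)

# Hypothetical reconstruction: an admin verb, whitespace, then "secret(s)".
ADMIN_VERBS = "manage|configure|set|create|update|delete|add|modify|rotate"
VERB_SECRETS = re.compile(rf"\b(?:{ADMIN_VERBS})\s+secrets?\b", re.IGNORECASE)


def requires_admin_access(text: str, pattern: re.Pattern) -> bool:
    """Simplified stand-in: True when the pattern appears anywhere in text."""
    return pattern.search(text) is not None


constraint = "Constraints: no secrets, no workflow edits"
task = "Rotate secrets for the deploy environment"

print(requires_admin_access(constraint, BARE_SECRETS))  # True (false positive)
print(requires_admin_access(constraint, VERB_SECRETS))  # False
print(requires_admin_access(task, VERB_SECRETS))        # True
```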

Three-layer fix for the systemic issue where setup-api-client's npm install
overwrites vendored minimatch package.json, and git add -A captures the
modification into bootstrap/autofix commits.

Layer 1 (source fix): setup-api-client/action.yml
  - Snapshot vendored package.json files before npm install
  - Restore them after npm install completes
  - Applied to both .github/actions/ and templates/consumer-repo/

Layer 2 (targeted staging): reusable-agents-issue-bridge.yml
  - Replace 'git add -A' with targeted 'git add agents/${AGENT}-${ISSUE}.md'
  - Only the bootstrap file gets staged, not npm side-effects

Layer 3 (safety net): reusable-18-autofix.yml
  - Add 'git reset HEAD -- .github/scripts/node_modules ...' after git add -A
  - Matches existing pattern in reusable-codex-run.yml line 1184
  - Applied to both push-commit and patch-commit paths

Also fixes test assertions that referenced the old CONCERNS_NEEDS_HUMAN_THRESHOLD
(was 0.85, now 0.50) — confidence values in tests updated accordingly.

Fixes: Copilot review finding on PAEM PR #1417 (minimatch vendoring cycle)
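
The Layer 1 idea (snapshot before install, restore after) can be sketched in Python. The real fix lives in the composite action's `action.yml` and is not shown here; `run_preserving`, `fake_npm_install`, and the file contents below are hypothetical and only illustrate the snapshot/restore pattern.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_preserving(files: Iterable[Path], action: Callable[[], None]) -> None:
    """Run `action`, then restore each existing file to its prior contents."""
    existing = [p for p in files if p.is_file()]
    with tempfile.TemporaryDirectory() as tmp:
        backups = []
        for index, path in enumerate(existing):
            backup = Path(tmp) / f"{index}.bak"
            shutil.copy2(path, backup)  # snapshot
            backups.append((path, backup))
        try:
            action()  # stand-in for `npm install`
        finally:
            for path, backup in backups:
                path.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(backup, path)  # restore


def demo() -> str:
    with tempfile.TemporaryDirectory() as root:
        manifest = Path(root) / "node_modules" / "minimatch" / "package.json"
        manifest.parent.mkdir(parents=True)
        manifest.write_text('{"name": "minimatch", "vendored": true}')

        def fake_npm_install() -> None:
            # Simulates npm rewriting the vendored manifest.
            manifest.write_text('{"name": "minimatch", "_resolved": "registry"}')

        run_preserving([manifest], fake_npm_install)
        return manifest.read_text()


print(demo())  # {"name": "minimatch", "vendored": true}
```

Restoring in a `finally` block means the vendored file is put back even if the install step raises, so a later `git add -A` would not see the modification.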

The needs_human gate was backwards: it fired when the CONCERNS provider
had LOW confidence (LLM unsure there's a problem) instead of HIGH
confidence (LLM confident there's a real problem).

Confidence reflects the LLM's certainty in its own evaluation, not a
measure of code quality. Low-confidence CONCERNS is a weak signal that
shouldn't block follow-up automation. High-confidence CONCERNS is the
stronger signal warranting human review.

Changed: confidence_value < threshold  →  confidence_value >= threshold
Threshold set to 0.85 (high bar — a human is already in the loop and
depth-of-rounds provides an independent guard against runaway automation).
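
A minimal sketch of the corrected gate, assuming a split verdict means at least one PASS and one CONCERNS provider. The threshold value and the `>=` comparison come from the commit message; `ProviderVerdict`, its field names, and the use of the highest CONCERNS confidence are assumptions, not the code in `verdict_policy.py`.

```python
from dataclasses import dataclass

CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85  # value stated in the commit message


@dataclass(frozen=True)
class ProviderVerdict:
    provider: str      # hypothetical field names
    verdict: str       # "PASS" or "CONCERNS"
    confidence: float  # the LLM's certainty in its own evaluation, 0.0 to 1.0


def needs_human(verdicts: list[ProviderVerdict]) -> bool:
    """True only for a split verdict with high-confidence CONCERNS."""
    kinds = {v.verdict for v in verdicts}
    if kinds != {"PASS", "CONCERNS"}:
        return False  # not a split verdict; other policy is out of scope here
    concerns_confidence = max(
        v.confidence for v in verdicts if v.verdict == "CONCERNS"
    )
    # Old (backwards) gate: concerns_confidence < threshold
    return concerns_confidence >= CONCERNS_NEEDS_HUMAN_THRESHOLD


weak = [ProviderVerdict("a", "PASS", 0.90), ProviderVerdict("b", "CONCERNS", 0.72)]
strong = [ProviderVerdict("a", "PASS", 0.90), ProviderVerdict("b", "CONCERNS", 0.85)]

print(needs_human(weak))    # False: follow-up automation can proceed
print(needs_human(strong))  # True: high-confidence concerns go to a human
```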
Copilot AI review requested due to automatic review settings February 12, 2026 12:59

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 089bc23fa9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

Copilot AI left a comment

Pull request overview

Updates the Codex/keepalive workflow ecosystem to reduce noisy logs and prevent accidental staging of vendored node_modules, while aligning verdict-policy and capability-check behavior with updated automation expectations.

Changes:

  • Flip split-verdict needs_human logic to trigger only on high-confidence CONCERNS (>= 0.85), and update/extend tests accordingly.
  • Refine capability fallback detection to avoid blocking on negated “no secrets” mentions while still blocking explicit secrets-management actions.
  • Reduce workflow/script log noise and harden workflows against npm install side effects (newline-free outputs, targeted staging, and post-checkout API-client reinstall).

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_verdict_policy_integration.py | Updates integration expectations for split-verdict needs_human boundary and low-confidence behavior. |
| tests/test_verdict_policy.py | Adds/updates unit tests for the new needs_human threshold semantics and reasoning text. |
| tests/test_verdict_extract.py | Aligns extract output test to a high-confidence CONCERNS case that now triggers needs_human. |
| tests/test_followup_issue_generator.py | Updates follow-up labeling test to require high-confidence CONCERNS for needs-human labeling. |
| tests/scripts/test_capability_check.py | Adds regression tests for “no secrets” false positives and explicit secrets-management blocking. |
| scripts/langchain/verdict_policy.py | Changes split-verdict needs_human condition to >= threshold (high-confidence concerns). |
| scripts/langchain/capability_check.py | Narrows admin-access “secrets” regex matching to explicit secrets-management language. |
| templates/consumer-repo/scripts/langchain/capability_check.py | Mirrors capability-check regex changes for consumer templates. |
| .github/workflows/reusable-codex-run.yml | Prevents trailing newlines in extracted JSON/fields and downgrades an artifact-only commit message to notice. |
| .github/workflows/reusable-agents-issue-bridge.yml | Avoids git add -A to prevent staging unintended node_modules changes. |
| .github/workflows/reusable-18-autofix.yml | Unstages vendored node_modules paths before committing autofix results. |
| .github/workflows/health-75-api-rate-diagnostic.yml | Passes secrets/github_token into setup-api-client consistently across jobs. |
| .github/workflows/agents-keepalive-loop-reporter.yml | Expands sparse checkout to include .github/actions so the composite action is available. |
| .github/workflows/agents-72-codex-belt-worker.yml | Reinstalls API client deps after branch checkout; modernizes Python UTC timestamp formatting. |
| templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml | Mirrors belt-worker step + timestamp changes for consumer templates. |
| .github/scripts/keepalive_loop.js | Downgrades repo-variable access failure logging from info to debug. |
| templates/consumer-repo/.github/scripts/keepalive_loop.js | Mirrors keepalive logging change for consumer templates. |
| .github/scripts/error_classifier.js | Gates verbose classification log behind RUNNER_DEBUG=1. |
| templates/consumer-repo/.github/scripts/error_classifier.js | Mirrors error-classifier debug gating for consumer templates. |
| .github/actions/setup-api-client/action.yml | Snapshots/restores vendored package metadata around npm install to reduce vendored drift. |
| templates/consumer-repo/.github/actions/setup-api-client/action.yml | Mirrors setup action changes for consumer templates. |

Owner Author

Re: Codex review on capability_check.py line 165 — Valid finding. Relaxed the verb-to-secret regex from \s+ (whitespace-only) to .{0,30} (up to 30 chars gap) so phrases like "Set repository secret TOKEN" and "Update GitHub Actions secret FOO" are still correctly blocked. Added two regression tests for these exact phrases. The negated-mention test (no secrets, no workflow edits) still passes since it contains no admin verbs.

- Relax verb-to-secret regex from \s+ to .{0,30} so phrases like
  'Set repository secret TOKEN' and 'Update GitHub Actions secret FOO'
  are correctly blocked even with intervening words (addresses Codex
  inline review on capability_check.py L165)
- Add 2 regression tests for the above patterns
- Resolve merge conflicts in 4 test files (keep >= 0.85 threshold
  logic; main had lowered to 0.50 with < direction)
- Restore CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85 (auto-merge picked up
  main's 0.50 value but our >= comparison direction)

All 1907 tests pass.
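
The effect of relaxing `\s+` to `.{0,30}` can be checked against the phrases quoted above. Both patterns below are hypothetical reconstructions from the descriptions in this conversation, not the patterns in `capability_check.py`.

```python
import re

ADMIN_VERBS = "manage|configure|set|create|update|delete|add|modify|rotate"

# Whitespace-only gap between the verb and "secret(s)".
STRICT = re.compile(rf"\b(?:{ADMIN_VERBS})\s+secrets?\b", re.IGNORECASE)

# Up to 30 arbitrary characters between the verb and "secret(s)".
RELAXED = re.compile(
    rf"\b(?:{ADMIN_VERBS})\b.{{0,30}}\bsecrets?\b", re.IGNORECASE
)

phrases = [
    "Set repository secret TOKEN",
    "Update GitHub Actions secret FOO",
    "no secrets, no workflow edits",
]

for phrase in phrases:
    strict_hit = STRICT.search(phrase) is not None
    relaxed_hit = RELAXED.search(phrase) is not None
    print(f"{phrase!r}: strict={strict_hit} relaxed={relaxed_hit}")
# 'Set repository secret TOKEN': strict=False relaxed=True
# 'Update GitHub Actions secret FOO': strict=False relaxed=True
# 'no secrets, no workflow edits': strict=False relaxed=False
```

One trade-off of a bounded wildcard gap in this sketch: a sentence such as "Update the README; no secrets involved" also matches `RELAXED`, because an admin verb appears within 30 characters of a negated mention.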
@stranske-keepalive
Contributor

⚠️ Action Required: Unable to determine source issue for PR #1483. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

@stranske stranske temporarily deployed to agent-high-privilege February 12, 2026 13:35 — with GitHub Actions Inactive
@stranske-keepalive
Contributor

stranske-keepalive bot commented Feb 12, 2026

Automated Status Summary

Head SHA: 605b2db
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / guard
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 93.12% |
| Baseline | 85.00% |
| Delta | +8.12% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| src/cli_parser.py | 81.8% | 4 |
| src/percentile_calculator.py | 95.0% | 1 |
| src/aggregator.py | 95.0% | 2 |
| src/__init__.py | 100.0% | 0 |
| src/ndjson_parser.py | 100.0% | 0 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@agents-workflows-bot
Contributor

agents-workflows-bot bot commented Feb 12, 2026

🤖 Keepalive Loop Status

PR #1483 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/0 complete |
| Timeout | 45 min (default) |
| Timeout usage | 3m elapsed (7%, 42m remaining) |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske stranske merged commit 7902785 into main Feb 12, 2026
35 checks passed
@stranske stranske deleted the fix/codex-log-issues branch February 12, 2026 14:03
2 participants