Essential fixes:
- Reporter sparse-checkout: add .github/actions to checkout so setup-api-client action is available (was failing 100% on Workflows repo)
- Belt Worker: re-install API client after branch checkout wipes node_modules (was causing @octokit/rest import failures and degraded token rotation)
High-value fixes:
- LLM analysis outputs: use print(..., end='') to strip trailing newlines from python extraction (confidence values had '\n' suffix e.g. '0.63\n')
- Repo variables fetch: downgrade from core.info to core.debug since the token permission limitation is known and the fallback to defaults works correctly
Medium fixes:
- Health 75 API Rate Diagnostic: pass secrets to 4 setup-api-client calls that were missing the input, causing 'No tokens were exported' warnings
- datetime.utcnow(): replace deprecated calls with timezone-aware alternative in both Belt Worker ledger functions
Low-salience fixes:
- error_classifier: gate entry log behind RUNNER_DEBUG to reduce log noise
- Non-artifact commit warning: downgrade from warning to notice since it is expected behavior when Codex produces only workflow artifacts
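A minimal sketch of the two Python-side changes described above: suppressing the trailing newline on extracted values and replacing `datetime.utcnow()` with a timezone-aware call. The helper names `capture` and `utc_timestamp` and the timestamp format are assumptions for illustration, not the workflow's actual code.

```python
import io
from contextlib import redirect_stdout
from datetime import datetime, timezone


def capture(fn) -> str:
    """Run fn and return everything it wrote to stdout (hypothetical helper)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        fn()
    return buffer.getvalue()


confidence = 0.63

# Default print() appends "\n", which ends up inside the captured output value.
with_newline = capture(lambda: print(confidence))

# end="" suppresses the newline so the consumer sees a clean "0.63".
without_newline = capture(lambda: print(confidence, end=""))


def utc_timestamp() -> str:
    """Timezone-aware replacement for the deprecated datetime.utcnow().

    The output format is an assumption; the real ledger format may differ.
    """
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Here `with_newline` carries the stray `'\n'` suffix that the commit message describes, while `without_newline` does not.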
1. Use .belt-tools action path instead of ./ for setup-api-client after branch checkout, so the action runs from trusted Workflows code rather than the untrusted issue branch (security fix).
2. Pass GH_BELT_TOKEN || github.token as github_token input to preserve the belt token selection instead of overriding GITHUB_TOKEN/GH_TOKEN with the default workflow token.
…eshold
Two independent fixes for broken automation flows:
1. capability_check.py: The bare \bsecrets?\b regex matched negative mentions like 'no secrets' in issue constraint text, causing _requires_admin_access() to return true and the fallback classifier to BLOCK tasks that merely *describe* a no-secrets constraint. Replace with specific verb+secrets patterns (manage/configure/set/create/update/delete/add/modify/rotate secrets). Root cause of PAEM #1403 false-positive BLOCKED.
2. verdict_policy.py: CONCERNS_NEEDS_HUMAN_THRESHOLD lowered from 0.85 to 0.50. The old threshold meant any split verdict (PASS + CONCERNS) with <85% confidence on the concerns side triggered needs_human, blocking automatic follow-up issue creation. A 72% confidence concerns verdict (TMP #4894) is well above chance and should produce a follow-up rather than require manual triage.
Both template and main copies updated; new regression tests added.
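To illustrate the capability_check.py change, here is a rough sketch of how a bare `\bsecrets?\b` pattern differs from a verb+secrets pattern. The constant names and sample sentences are assumptions; the real patterns in capability_check.py may be structured differently.

```python
import re

# Old behaviour: any mention of "secret" or "secrets" counts as admin access.
BARE_SECRETS = re.compile(r"\bsecrets?\b", re.IGNORECASE)

# New behaviour (sketch): only an action verb directly followed by "secret(s)".
VERB_SECRETS = re.compile(
    r"\b(?:manage|configure|set|create|update|delete|add|modify|rotate)\s+secrets?\b",
    re.IGNORECASE,
)

constraint_text = "Constraint: no secrets may be used by this task"
admin_text = "Rotate secrets for the deploy job"

# The bare pattern flags the negated constraint; the verb pattern does not.
bare_flags_constraint = BARE_SECRETS.search(constraint_text) is not None
verb_flags_constraint = VERB_SECRETS.search(constraint_text) is not None
verb_flags_admin = VERB_SECRETS.search(admin_text) is not None
```

Under these assumptions, only the explicit secrets-management sentence is flagged by the narrower pattern.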
Three-layer fix for the systemic issue where setup-api-client's npm install
overwrites vendored minimatch package.json, and git add -A captures the
modification into bootstrap/autofix commits.
Layer 1 (source fix): setup-api-client/action.yml
- Snapshot vendored package.json files before npm install
- Restore them after npm install completes
- Applied to both .github/actions/ and templates/consumer-repo/
Layer 2 (targeted staging): reusable-agents-issue-bridge.yml
- Replace 'git add -A' with targeted 'git add agents/${AGENT}-${ISSUE}.md'
- Only the bootstrap file gets staged, not npm side-effects
Layer 3 (safety net): reusable-18-autofix.yml
- Add 'git reset HEAD -- .github/scripts/node_modules ...' after git add -A
- Matches existing pattern in reusable-codex-run.yml line 1184
- Applied to both push-commit and patch-commit paths
Also fixes test assertions that referenced the old CONCERNS_NEEDS_HUMAN_THRESHOLD
(was 0.85, now 0.50) — confidence values in tests updated accordingly.
Fixes: Copilot review finding on PAEM PR #1417 (minimatch vendoring cycle)
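The Layer 1 idea (snapshot vendored files, run the install, restore them) can be sketched in Python as follows. The actual fix lives in the composite action's shell steps; `run_preserving`, `demo`, and the sample file contents are hypothetical and only illustrate the pattern.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_preserving(files: Iterable[Path], action: Callable[[], None]) -> None:
    """Snapshot the given files, run action, then restore the snapshots."""
    with tempfile.TemporaryDirectory() as snapshot_dir:
        backups = []
        for index, original in enumerate(files):
            if original.is_file():
                backup = Path(snapshot_dir) / f"{index}.bak"
                shutil.copy2(original, backup)
                backups.append((original, backup))
        try:
            action()  # stands in for `npm install`
        finally:
            # Restore even if the action fails, so vendored files never drift.
            for original, backup in backups:
                shutil.copy2(backup, original)


def demo() -> str:
    """Simulate an install step that overwrites a vendored package.json."""
    with tempfile.TemporaryDirectory() as repo:
        manifest = Path(repo) / "package.json"
        manifest.write_text('{"name": "minimatch", "vendored": true}')
        run_preserving(
            [manifest],
            lambda: manifest.write_text('{"name": "minimatch"}'),
        )
        return manifest.read_text()
```

Restoring in a `finally` block is a design choice of this sketch: it keeps the working tree clean for any later `git add -A`, even when the install step errors out.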
The needs_human gate was backwards: it fired when the CONCERNS provider had LOW confidence (LLM unsure there's a problem) instead of HIGH confidence (LLM confident there's a real problem).
Confidence reflects the LLM's certainty in its own evaluation, not a measure of code quality. Low-confidence CONCERNS is a weak signal that shouldn't block follow-up automation. High-confidence CONCERNS is the stronger signal warranting human review.
Changed: confidence_value < threshold → confidence_value >= threshold
Threshold set to 0.85 (high bar: a human is already in the loop, and depth-of-rounds provides an independent guard against runaway automation).
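A minimal sketch of the flipped gate, assuming verdicts arrive as `(label, confidence)` pairs. The function name `split_verdict_needs_human` and the input shape are hypothetical; only the threshold constant and the `>=` direction come from the description above.

```python
from typing import List, Tuple

CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85


def split_verdict_needs_human(verdicts: List[Tuple[str, float]]) -> bool:
    """Return True only for a PASS + CONCERNS split with high-confidence CONCERNS."""
    labels = {label for label, _ in verdicts}
    if labels != {"PASS", "CONCERNS"}:
        return False  # not a split verdict, so this gate does not apply
    concerns_confidence = max(
        confidence for label, confidence in verdicts if label == "CONCERNS"
    )
    # Flipped comparison: high confidence in CONCERNS escalates to a human.
    return concerns_confidence >= CONCERNS_NEEDS_HUMAN_THRESHOLD


low = split_verdict_needs_human([("PASS", 0.90), ("CONCERNS", 0.72)])
boundary = split_verdict_needs_human([("PASS", 0.90), ("CONCERNS", 0.85)])
unanimous = split_verdict_needs_human([("PASS", 0.90), ("PASS", 0.80)])
```

With this sketch, a 0.72-confidence CONCERNS verdict no longer escalates, while a verdict exactly at the 0.85 boundary does.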
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 089bc23fa9
Pull request overview
Updates the Codex/keepalive workflow ecosystem to reduce noisy logs and prevent accidental staging of vendored node_modules, while aligning verdict-policy and capability-check behavior with updated automation expectations.
Changes:
- Flip split-verdict needs_human logic to trigger only on high-confidence CONCERNS (>= 0.85), and update/extend tests accordingly.
- Refine capability fallback detection to avoid blocking on negated “no secrets” mentions while still blocking explicit secrets-management actions.
- Reduce workflow/script log noise and harden workflows against npm install side effects (newline-free outputs, targeted staging, and post-checkout API-client reinstall).
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_verdict_policy_integration.py | Updates integration expectations for split-verdict needs_human boundary and low-confidence behavior. |
| tests/test_verdict_policy.py | Adds/updates unit tests for the new needs_human threshold semantics and reasoning text. |
| tests/test_verdict_extract.py | Aligns extract output test to a high-confidence CONCERNS case that now triggers needs_human. |
| tests/test_followup_issue_generator.py | Updates follow-up labeling test to require high-confidence CONCERNS for needs-human labeling. |
| tests/scripts/test_capability_check.py | Adds regression tests for “no secrets” false positives and explicit secrets-management blocking. |
| scripts/langchain/verdict_policy.py | Changes split-verdict needs_human condition to >= threshold (high-confidence concerns). |
| scripts/langchain/capability_check.py | Narrows admin-access “secrets” regex matching to explicit secrets-management language. |
| templates/consumer-repo/scripts/langchain/capability_check.py | Mirrors capability-check regex changes for consumer templates. |
| .github/workflows/reusable-codex-run.yml | Prevents trailing newlines in extracted JSON/fields and downgrades an artifact-only commit message to notice. |
| .github/workflows/reusable-agents-issue-bridge.yml | Avoids git add -A to prevent staging unintended node_modules changes. |
| .github/workflows/reusable-18-autofix.yml | Unstages vendored node_modules paths before committing autofix results. |
| .github/workflows/health-75-api-rate-diagnostic.yml | Passes secrets/github_token into setup-api-client consistently across jobs. |
| .github/workflows/agents-keepalive-loop-reporter.yml | Expands sparse checkout to include .github/actions so the composite action is available. |
| .github/workflows/agents-72-codex-belt-worker.yml | Reinstalls API client deps after branch checkout; modernizes Python UTC timestamp formatting. |
| templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml | Mirrors belt-worker step + timestamp changes for consumer templates. |
| .github/scripts/keepalive_loop.js | Downgrades repo-variable access failure logging from info to debug. |
| templates/consumer-repo/.github/scripts/keepalive_loop.js | Mirrors keepalive logging change for consumer templates. |
| .github/scripts/error_classifier.js | Gates verbose classification log behind RUNNER_DEBUG=1. |
| templates/consumer-repo/.github/scripts/error_classifier.js | Mirrors error-classifier debug gating for consumer templates. |
| .github/actions/setup-api-client/action.yml | Snapshots/restores vendored package metadata around npm install to reduce vendored drift. |
| templates/consumer-repo/.github/actions/setup-api-client/action.yml | Mirrors setup action changes for consumer templates. |
Re: Codex review on |
- Relax verb-to-secret regex from \s+ to .{0,30} so phrases like
'Set repository secret TOKEN' and 'Update GitHub Actions secret FOO'
are correctly blocked even with intervening words (addresses Codex
inline review on capability_check.py L165)
- Add 2 regression tests for the above patterns
- Resolve merge conflicts in 4 test files (keep >= 0.85 threshold
logic; main had lowered to 0.50 with < direction)
- Restore CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85 (auto-merge picked up
main's 0.50 value but our >= comparison direction)
All 1907 tests pass.
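The effect of relaxing the verb-to-secret gap from `\s+` to `.{0,30}` can be sketched as below. The constant names and the exact pattern layout are assumptions; the two sample phrases are the ones quoted in the commit message.

```python
import re

VERBS = r"(?:manage|configure|set|create|update|delete|add|modify|rotate)"

# Strict form: the verb must be followed immediately by "secret(s)".
STRICT = re.compile(rf"\b{VERBS}\s+secrets?\b", re.IGNORECASE)

# Relaxed form: up to 30 arbitrary characters may sit between verb and "secret(s)".
RELAXED = re.compile(rf"\b{VERBS}\b.{{0,30}}\bsecrets?\b", re.IGNORECASE)

phrases = [
    "Set repository secret TOKEN",
    "Update GitHub Actions secret FOO",
]

strict_hits = [STRICT.search(p) is not None for p in phrases]
relaxed_hits = [RELAXED.search(p) is not None for p in phrases]
negated_hit = RELAXED.search("Constraint: no secrets in this repo") is not None
```

One trade-off in this sketch: a wider gap can match a verb and a later, unrelated mention of secrets in the same sentence, so the window size is a balance between recall and false positives.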
Automated Status Summary
Head SHA: 605b2db
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist
Scope: No scope information available
🤖 Keepalive Loop Status
PR #1483 | Agent: Codex | Iteration 0/5
🔍 Failure Classification
Error type: infrastructure
No description provided.