Essential fixes:
- Reporter sparse-checkout: add .github/actions to checkout so setup-api-client
  action is available (was failing 100% on Workflows repo)
- Belt Worker: re-install API client after branch checkout wipes node_modules
  (was causing @octokit/rest import failures and degraded token rotation)
High-value fixes:
- LLM analysis outputs: use print(..., end='') to strip trailing newlines from
  python extraction (confidence values had '\n' suffix e.g. '0.63\n')
- Repo variables fetch: downgrade from core.info to core.debug since the token
  permission limitation is known and the fallback to defaults works correctly
Medium fixes:
- Health 75 API Rate Diagnostic: pass secrets to 4 setup-api-client calls that
  were missing the input, causing 'No tokens were exported' warnings
- datetime.utcnow(): replace deprecated calls with timezone-aware alternative
  in both Belt Worker ledger functions
Low-salience fixes:
- error_classifier: gate entry log behind RUNNER_DEBUG to reduce log noise
- Non-artifact commit warning: downgrade from warning to notice since it is
  expected behavior when Codex produces only workflow artifacts
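Two of the fixes above can be shown in isolation. The sketch below is illustrative only: `emit_confidence` and `utc_timestamp` are hypothetical names, not functions from the workflows, and `datetime.now(timezone.utc)` is assumed as the timezone-aware alternative because the commit message does not name one.

```python
import io
from contextlib import redirect_stdout
from datetime import datetime, timezone


def emit_confidence(value: float) -> str:
    """Return exactly what the script writes to stdout (hypothetical helper)."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        # end='' suppresses the newline that print() appends by default,
        # so the captured value is '0.63' rather than '0.63\n'.
        print(value, end='')
    return buf.getvalue()


def utc_timestamp() -> str:
    """Timezone-aware stand-in for the deprecated datetime.utcnow()."""
    # The result carries an explicit UTC offset instead of being naive.
    return datetime.now(timezone.utc).isoformat()
```

Unlike `datetime.utcnow()`, the aware value serializes with a `+00:00` suffix, so any consumer that parses ledger timestamps may need to accept the offset.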
1. Use .belt-tools action path instead of ./ for setup-api-client after branch
   checkout, so the action runs from trusted Workflows code rather than the
   untrusted issue branch (security fix).
2. Pass GH_BELT_TOKEN || github.token as github_token input to preserve the
   belt token selection instead of overriding GITHUB_TOKEN/GH_TOKEN with the
   default workflow token.
…eshold
Two independent fixes for broken automation flows:
1. capability_check.py: The bare \bsecrets?\b regex matched negative mentions
   like 'no secrets' in issue constraint text, causing _requires_admin_access()
   to return true and the fallback classifier to BLOCK tasks that merely
   *describe* a no-secrets constraint. Replace with specific verb+secrets
   patterns (manage/configure/set/create/update/delete/add/modify/rotate
   secrets). Root cause of PAEM #1403 false-positive BLOCKED.
2. verdict_policy.py: CONCERNS_NEEDS_HUMAN_THRESHOLD lowered from 0.85 to 0.50.
   The old threshold meant any split verdict (PASS + CONCERNS) with <85%
   confidence on the concerns side triggered needs_human, blocking automatic
   follow-up issue creation. A 72% confidence concerns verdict (TMP #4894) is
   well above chance and should produce a follow-up rather than require manual
   triage.
Both template and main copies updated; new regression tests added.
Three-layer fix for the systemic issue where setup-api-client's npm install
overwrites vendored minimatch package.json, and git add -A captures the
modification into bootstrap/autofix commits.
Layer 1 (source fix): setup-api-client/action.yml
- Snapshot vendored package.json files before npm install
- Restore them after npm install completes
- Applied to both .github/actions/ and templates/consumer-repo/
Layer 2 (targeted staging): reusable-agents-issue-bridge.yml
- Replace 'git add -A' with targeted 'git add agents/${AGENT}-${ISSUE}.md'
- Only the bootstrap file gets staged, not npm side-effects
Layer 3 (safety net): reusable-18-autofix.yml
- Add 'git reset HEAD -- .github/scripts/node_modules ...' after git add -A
- Matches existing pattern in reusable-codex-run.yml line 1184
- Applied to both push-commit and patch-commit paths
Also fixes test assertions that referenced the old CONCERNS_NEEDS_HUMAN_THRESHOLD
(was 0.85, now 0.50) — confidence values in tests updated accordingly.
Fixes: Copilot review finding on PAEM PR #1417 (minimatch vendoring cycle)
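The Layer 1 snapshot/restore idea can be sketched in Python as follows. This is an assumption-laden illustration, not the contents of action.yml (which is not shown here): `run_preserving` is a hypothetical helper, and `step` stands in for the `npm install` call.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_preserving(files: Iterable[Path], step: Callable[[], object]) -> None:
    """Run `step`, then restore `files` to their pre-step contents."""
    existing = [f for f in files if f.is_file()]
    with tempfile.TemporaryDirectory() as tmp:
        backups = []
        for index, original in enumerate(existing):
            # Snapshot each vendored file before the step can touch it.
            backup = Path(tmp) / f"{index}.bak"
            shutil.copy2(original, backup)
            backups.append((original, backup))
        try:
            step()  # stand-in for `npm install`
        finally:
            # Restore even if the step fails, so a later `git add -A`
            # never sees a modified vendored file.
            for original, backup in backups:
                shutil.copy2(backup, original)
```

Restoring in a `finally` block is a design choice of this sketch: it keeps the working tree clean even when the install step raises.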
The needs_human gate was backwards: it fired when the CONCERNS provider had LOW
confidence (LLM unsure there's a problem) instead of HIGH confidence (LLM
confident there's a real problem).

Confidence reflects the LLM's certainty in its own evaluation, not a measure of
code quality. Low-confidence CONCERNS is a weak signal that shouldn't block
follow-up automation. High-confidence CONCERNS is the stronger signal
warranting human review.

Changed: confidence_value < threshold → confidence_value >= threshold
Threshold set to 0.85 (high bar: a human is already in the loop and
depth-of-rounds provides an independent guard against runaway automation).
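A minimal sketch of the corrected gate, assuming a split verdict is represented as a collection of verdict strings. `split_verdict_needs_human` is a hypothetical name, and the real verdict_policy.py may treat missing confidence or other verdict combinations differently.

```python
from typing import Iterable, Optional

CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85


def split_verdict_needs_human(
    verdicts: Iterable[str],
    concerns_confidence: Optional[float],
    threshold: float = CONCERNS_NEEDS_HUMAN_THRESHOLD,
) -> bool:
    """Return True only for a PASS+CONCERNS split with high-confidence CONCERNS."""
    is_split = {"PASS", "CONCERNS"} <= set(verdicts)
    if not is_split or concerns_confidence is None:
        return False
    # Corrected direction: fire on HIGH confidence (>=), not LOW (<).
    return concerns_confidence >= threshold
```

Under this sketch the 72% CONCERNS verdict cited for TMP #4894 no longer requires manual triage, while a CONCERNS verdict at 0.85 or above still does.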
- Relax verb-to-secret regex from \s+ to .{0,30} so phrases like
'Set repository secret TOKEN' and 'Update GitHub Actions secret FOO'
are correctly blocked even with intervening words (addresses Codex
inline review on capability_check.py L165)
- Add 2 regression tests for the above patterns
- Resolve merge conflicts in 4 test files (keep >= 0.85 threshold
logic; main had lowered to 0.50 with < direction)
- Restore CONCERNS_NEEDS_HUMAN_THRESHOLD = 0.85 (auto-merge picked up
main's 0.50 value but our >= comparison direction)
All 1907 tests pass.
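The relaxed verb-to-secret pattern can be approximated as below. The verb list and the `.{0,30}` window come from the commit messages; the exact expression in capability_check.py is not shown here, so `mentions_secret_management` and the way the pattern is assembled are assumptions.

```python
import re

# Verbs listed in the commit message for the verb+secrets patterns.
_VERBS = "manage|configure|set|create|update|delete|add|modify|rotate"

# A verb, then up to 30 arbitrary characters, then "secret" or "secrets".
_SECRET_ACTION = re.compile(
    r"\b(?:" + _VERBS + r")\b.{0,30}\bsecrets?\b",
    re.IGNORECASE,
)


def mentions_secret_management(text: str) -> bool:
    """Return True if the text asks to act on secrets (hypothetical helper)."""
    return _SECRET_ACTION.search(text) is not None
```

A bare mention such as 'no secrets' contains none of the listed verbs, so it no longer matches, while 'Set repository secret TOKEN' does.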
…s, task-focused prompts, PR meta debounce
- ledger_migrate_base.py: skip corrupt YAML files instead of blocking all
  belt worker runs (root cause of issue #1418 stall)
- agents-autofix-loop: reduce max_attempts 3→2 (standard) and 2→1 (escalated)
  to cut autofix churn observed in PR #4906
- agents-72-codex-belt-worker: emit task_title output and include
  task-focused directive in activation comment for higher first-commit
  success rate
- agents-pr-meta: add PR-number concurrency grouping with cancel-in-progress
  for pull_request events to debounce redundant runs
- All template counterparts updated in sync
- 2 new tests for corrupt ledger handling
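The skip-and-continue behaviour for corrupt ledgers can be sketched generically. This is not the code in ledger_migrate_base.py: `migrate_all` is a hypothetical function, and the parser and its error type are passed in so the example runs without a YAML dependency (in the real script these would presumably be a YAML loader and `yaml.YAMLError`).

```python
from typing import Callable, Iterable, List, Tuple, Type


def migrate_all(
    ledgers: Iterable[str],
    load: Callable[[str], object],
    errors: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Parse every ledger, skipping unparseable ones instead of aborting."""
    migrated: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for ledger in ledgers:
        try:
            load(ledger)
        except errors as exc:
            # Record the failure as a single line and keep going, so one
            # corrupt ledger cannot block the remaining ones.
            skipped.append((ledger, " ".join(str(exc).split())))
            continue
        migrated.append(ledger)
    return migrated, skipped
```

Returning the skipped entries alongside the migrated ones lets the caller print a summary of skips at the end of the run.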
Automated Status Summary
Head SHA: e896d7e
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist: no scope information available.

🤖 Keepalive Loop Status
PR #1484 | Agent: Codex | Iteration 0/5

🔍 Failure Classification
| Error type | infrastructure |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 371370e04b
Pull request overview
Updates automation/workflow behavior to reduce noisy vendored node_modules diffs, tighten “needs human” gating to high-confidence split verdicts, and harden capability/admin checks and ledger migration behavior.
Changes:
- Reworked verdict policy so split PASS+CONCERNS only triggers needs_human when CONCERNS confidence is high (>= 0.85), and updated tests accordingly.
- Improved admin capability fallback detection for “set/update … secret(s)” phrasing and added regression tests.
- Reduced unintended node_modules staging/commits in automation workflows; added ledger migration resilience for corrupt YAML ledgers; refined keepalive/meta workflow concurrency behavior.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/langchain/verdict_policy.py | Switch split-verdict needs_human gating to high-confidence threshold (>= 0.85). |
| tests/test_verdict_policy.py | Align unit tests and messaging with updated threshold semantics. |
| tests/test_verdict_policy_integration.py | Align integration parity tests for workflow vs follow-up verdict handling. |
| tests/test_verdict_extract.py | Update expectations for needs_human with higher CONCERNS confidence. |
| tests/test_followup_issue_generator.py | Update follow-up labeling test to reflect high-confidence needs-human. |
| scripts/langchain/capability_check.py | Broaden admin-required regexes to catch “verb … secret(s)” with intervening words. |
| templates/consumer-repo/scripts/langchain/capability_check.py | Mirror capability-check regex changes in consumer template. |
| tests/scripts/test_capability_check.py | Add regression tests for “set repository secret” / “update actions secret” fallback blocking. |
| scripts/ledger_migrate_base.py | Skip corrupt ledgers (YAML/Migration errors) while continuing processing; summarize skips. |
| tests/scripts/test_ledger_migrate_base.py | Add coverage for skipping corrupt ledgers in normal and --check modes. |
| .github/actions/setup-api-client/action.yml | Snapshot/restore vendored node_modules package metadata around npm install. |
| templates/consumer-repo/.github/actions/setup-api-client/action.yml | Mirror setup-api-client vendored metadata snapshot/restore in template. |
| .github/workflows/reusable-agents-issue-bridge.yml | Avoid staging unintended files by adding only the bootstrap markdown file. |
| .github/workflows/reusable-18-autofix.yml | Unstage vendored node_modules paths before committing autofix results. |
| .github/workflows/agents-pr-meta-v4.yml | Concurrency group now distinguishes PR events; cancel-in-progress for PR runs. |
| templates/consumer-repo/.github/workflows/agents-pr-meta.yml | Same concurrency refinement for the consumer thin-caller workflow. |
| .github/workflows/agents-autofix-loop.yml | Reduce max attempts; cap escalated PR attempts to 1. |
| templates/consumer-repo/.github/workflows/agents-autofix-loop.yml | Mirror autofix-loop attempt reductions in template. |
| .github/workflows/agents-72-codex-belt-worker.yml | Emit task_title output and include a narrowed “focus task” directive in activation comment. |
| templates/consumer-repo/.github/workflows/agents-72-codex-belt-worker.yml | Mirror belt-worker task_title output and activation comment directive in template. |
    with open(gh_output, 'a', encoding='utf-8') as handle:
        handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
        handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
        handle.write(f"task_status={start_info['task']['current_status'] if start_info['task'] else ''}\n")
Writing task_title to $GITHUB_OUTPUT as a single key=value line will break output parsing if a ledger task title contains a newline (or other characters that require the multiline output format). Consider sanitizing task_title (e.g., replace newlines) or emitting it using the official multiline <<DELIM syntax so any title is safe.
    -with open(gh_output, 'a', encoding='utf-8') as handle:
    -    handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
    -    handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
    -    handle.write(f"task_status={start_info['task']['current_status'] if start_info['task'] else ''}\n")
    +# Prepare safe, single-line task fields for GitHub outputs
    +task = start_info['task'] or {}
    +task_id = task.get('id', '') if task else ''
    +task_title = task.get('title', '') if task else ''
    +# Sanitize title to avoid breaking GITHUB_OUTPUT parsing
    +task_title = task_title.replace('\r', ' ').replace('\n', ' ')
    +task_status = task.get('current_status', '') if task else ''
    +with open(gh_output, 'a', encoding='utf-8') as handle:
    +    handle.write(f"task_id={task_id}\n")
    +    handle.write(f"task_title={task_title}\n")
    +    handle.write(f"task_status={task_status}\n")
    if gh_output:
        with open(gh_output, 'a', encoding='utf-8') as handle:
            handle.write(f"task_id={start_info['task']['id'] if start_info['task'] else ''}\n")
            handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
Writing task_title to $GITHUB_OUTPUT as a single key=value line will break output parsing if a ledger task title contains a newline (or other characters that require the multiline output format). Consider sanitizing task_title (e.g., replace newlines) or emitting it using the official multiline <<DELIM syntax so any title is safe.
    -handle.write(f"task_title={start_info['task']['title'] if start_info['task'] else ''}\n")
    +title = start_info['task']['title'] if start_info['task'] else ''
    +handle.write("task_title<<TASK_TITLE_EOF\n")
    +handle.write(f"{title}\n")
    +handle.write("TASK_TITLE_EOF\n")
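The multiline form suggested above uses a fixed delimiter, which would itself break if a title ever contained the line TASK_TITLE_EOF. One possible hardening, offered as a sketch rather than as what the PR adopted, is to generate a random delimiter per write; `write_output` is a hypothetical helper.

```python
import uuid


def write_output(path: str, name: str, value: str) -> None:
    """Append one output using the multiline `name<<DELIM` form."""
    delimiter = f"ghadelimiter_{uuid.uuid4().hex}"
    # A random delimiter makes a collision very unlikely, but check anyway
    # because the value may come from untrusted ledger text.
    if delimiter in name or delimiter in value:
        raise ValueError("delimiter collides with output content")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")
```

In a workflow step this would be called with the path from the `GITHUB_OUTPUT` environment variable, for example `write_output(os.environ["GITHUB_OUTPUT"], "task_title", title)`.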
…tations
Address inline review feedback on PR #1484:
- Sanitize task_title by replacing newlines/carriage returns with spaces
  before writing to $GITHUB_OUTPUT (prevents broken output parsing)
- Normalize yaml.YAMLError messages to single-line in ::warning::
  annotations (prevents malformed GitHub Actions annotations)
- Both belt-worker copies updated in sync
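The single-line normalization described above might look like the following. The helper names are hypothetical, and collapsing whitespace is only one way to keep an annotation on one line; the commit itself does not show which approach was used.

```python
def single_line(message: object) -> str:
    """Collapse every whitespace run, including newlines, into one space."""
    return " ".join(str(message).split())


def warning_annotation(message: object) -> str:
    """Format a one-line GitHub Actions ::warning:: workflow command."""
    return f"::warning::{single_line(message)}"
```

Because a workflow command ends at the first newline, a multi-line parser error printed verbatim would be truncated in the annotation and the remainder would appear as ordinary log output.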