fix(keepalive): remove preflight dependency from run-codex job #109

Merged: stranske merged 2 commits into main from fix-keepalive-codex-job-v2 on Dec 24, 2025
stranske commented Dec 24, 2025

Problem

Two issues were preventing Codex from running:

1. Keepalive Loop: run-codex job not appearing

The run-codex job (calling reusable-codex-run.yml) was not appearing in workflow runs even when action == 'run' and secrets_ok == 'true'.

Root Cause: PR #107 added a preflight job dependency with an output-based condition:

if: needs.evaluate.outputs.action == 'run' && needs.preflight.outputs.secrets_ok == 'true'

This caused GitHub Actions to not create the reusable workflow job at all.

Evidence:

2. Autofix Loop: startup_failure on every run

The autofix loop has had startup_failure on every single run since it was created.

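Root Cause: Permission mismatch. The autofix loop granted only actions: read but calls reusable-codex-run.yml, which declares actions: write. A called workflow cannot receive more GITHUB_TOKEN permissions than its caller grants, so GitHub treats the workflow as invalid and the run fails at startup.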

Solutions

Keepalive Loop Fix

Remove the preflight dependency from run-codex:

  • Keep preflight job running (useful for debugging/logging)
  • Let run-codex depend only on evaluate
  • Secret validation happens inside the reusable workflow itself
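A minimal sketch of what the resulting job definition might look like. Only the `needs` and `if` lines come from this PR's diff; the workflow path and `secrets: inherit` are assumptions added for illustration.

```yaml
jobs:
  # The evaluate and preflight jobs are defined elsewhere in the workflow (omitted here).
  run-codex:
    # Depend only on evaluate; preflight still runs but no longer gates this job.
    needs:
      - evaluate
    if: needs.evaluate.outputs.action == 'run'
    # Path is an assumption; the PR only names the file reusable-codex-run.yml.
    uses: ./.github/workflows/reusable-codex-run.yml
    # Assumption: secrets are forwarded so the reusable workflow can validate them itself.
    secrets: inherit
```

With this shape, a missing secret surfaces as a failure inside the reusable workflow rather than as a skipped job.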

Autofix Loop Fix

Change actions: read to actions: write to match the reusable workflow.
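As a rough sketch, the change amounts to one line in the caller's permissions block. Whether that block sits at workflow level or job level in the autofix loop is an assumption here, and any other scopes it grants are omitted.

```yaml
# Caller workflow (autofix loop). Placement at workflow level is assumed.
permissions:
  # Was `actions: read`. The caller must grant at least what
  # reusable-codex-run.yml declares, because a called workflow
  # cannot elevate permissions beyond its caller's.
  actions: write
```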

Testing

After merge:

  1. Trigger the keepalive loop on PR #103, "chore(codex): bootstrap PR for issue #101" (which has the agent:codex label). The Keepalive next task job should now appear.
  2. Trigger Gate completion on any PR. The autofix loop should no longer report startup_failure.

Automated Status Summary

Scope

  • Scope section missing from source issue.

Tasks

  • Restrict triggers:
      • do not run agent workflows on forked PRs
      • avoid pull_request_target unless absolutely necessary
  • Ensure prompts are repo-owned:
      • use prompt-file from .github/codex/prompts/
      • build a small “context appendix” file that includes sanitized task text
  • Add allowlists:
      • allow-users / allow-bots in codex-action config
      • only repo collaborators can trigger
  • Add denylist behaviors:
      • Codex should not edit .github/workflows/** unless a special environment-approved mode is enabled
      • Codex should not touch secrets or tokens (explicit instruction + sandbox limits)
  • Add logging + red flags:
      • if prompt contains “ignore previous”, HTML comments, base64 blobs, etc, stop and require human
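The trigger restrictions and collaborator allowlist in the tasks above could be sketched as a job-level guard. This is an illustration under stated assumptions, not the repository's actual configuration: the workflow name, job name, and event types are hypothetical, and the codex-action allow-users / allow-bots settings are not shown.

```yaml
name: agent-example            # hypothetical workflow name
on:
  pull_request:                # used instead of pull_request_target
    types: [labeled, synchronize]

jobs:
  agent:                       # hypothetical job name
    # Run only for same-repo branches (no forks) opened by trusted authors.
    if: >-
      github.event.pull_request.head.repo.full_name == github.repository &&
      contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.pull_request.author_association)
    runs-on: ubuntu-latest
    steps:
      - run: echo "Trusted same-repo PR; agent steps would go here."
```

The fork check compares the head repository with the base repository, so pull requests from forks are skipped before any agent step runs.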

Acceptance criteria

  • Malicious-looking issue text does not get passed verbatim into Codex execution.
  • Agent workflows only run for trusted actors and trusted events.

Head SHA: 58da3fb
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

Head SHA: ea11578
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

The run-codex job was not appearing in workflow runs because adding
preflight as a dependency with an output-based condition caused GitHub
Actions to not create the reusable workflow job.

This is a GitHub Actions behavior: reusable workflow jobs with complex
dependency chains involving job outputs may not be created at all.

Solution: Remove the preflight dependency from run-codex while keeping
preflight for informational purposes in the summary job. Secret
validation happens inside the reusable workflow itself.
Copilot AI review requested due to automatic review settings December 24, 2025 13:12
@stranske stranske temporarily deployed to agent-high-privilege December 24, 2025 13:13 — with GitHub Actions Inactive
github-actions bot commented Dec 24, 2025

Automated Status Summary

Head SHA: 36e9250
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 77.97% |
| Baseline | 0.00% |
| Delta | +77.97% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

  • Scope section missing from source issue.

Tasks

  • Restrict triggers:
      • do not run agent workflows on forked PRs
      • avoid pull_request_target unless absolutely necessary
  • Ensure prompts are repo-owned:
      • use prompt-file from .github/codex/prompts/
      • build a small “context appendix” file that includes sanitized task text
  • Add allowlists:
      • allow-users / allow-bots in codex-action config
      • only repo collaborators can trigger
  • Add denylist behaviors:
      • Codex should not edit .github/workflows/** unless a special environment-approved mode is enabled
      • Codex should not touch secrets or tokens (explicit instruction + sandbox limits)
  • Add logging + red flags:
      • if prompt contains “ignore previous”, HTML comments, base64 blobs, etc, stop and require human

Acceptance criteria

  • Malicious-looking issue text does not get passed verbatim into Codex execution.
  • Agent workflows only run for trusted actors and trusted events.

The autofix loop workflow had 'actions: read' permission but the
reusable-codex-run.yml workflow declares 'actions: write'. This
permission mismatch may have been causing the workflow to fail with
startup_failure.

Changed to 'actions: write' to match the reusable workflow and the
keepalive loop which works correctly.
@stranske stranske temporarily deployed to agent-high-privilege December 24, 2025 13:15 — with GitHub Actions Inactive
Copilot AI left a comment

Pull request overview

This PR fixes an issue where the run-codex job (calling the reusable-codex-run.yml workflow) was not appearing in workflow runs due to a complex dependency chain introduced in PR #107. The fix removes the preflight job from the run-codex dependencies and simplifies the conditional to only check the evaluate job's output.

Key changes:

  • Removes preflight from run-codex job dependencies
  • Simplifies the conditional from checking both evaluate.outputs.action == 'run' AND preflight.outputs.secrets_ok == 'true' to only checking evaluate.outputs.action == 'run'
  • Aligns the keepalive workflow pattern with the autofix workflow, which also calls reusable-codex-run.yml directly without a preflight check

    - evaluate
-   - preflight
- if: needs.evaluate.outputs.action == 'run' && needs.preflight.outputs.secrets_ok == 'true'
+ if: needs.evaluate.outputs.action == 'run'
Copilot AI Dec 24, 2025

There's a validation inconsistency to be aware of: the preflight job validates that either CODEX_AUTH_JSON or WORKFLOWS_APP_ID is present (line 142 in the full file), but the reusable workflow strictly requires CODEX_AUTH_JSON and will fail if it's missing (see reusable-codex-run.yml:198-201).

This means if only WORKFLOWS_APP_ID credentials are configured, the preflight job would pass, but run-codex will fail later with "CODEX_AUTH_JSON secret is not set or empty."

Since this change removes the preflight dependency, the job will now run and fail during execution rather than being skipped upfront. Consider either:

  1. Updating preflight validation logic to match the actual requirements (require CODEX_AUTH_JSON)
  2. Removing the preflight job entirely if it's no longer serving a useful purpose

Note: This is a pre-existing inconsistency that becomes more visible with this change, not a new issue introduced by this PR.
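For option 1, a preflight check that requires CODEX_AUTH_JSON specifically might look roughly like the sketch below. The step id and job layout are hypothetical; only the secret name, the secrets_ok output name, and the error message come from this thread.

```yaml
jobs:
  preflight:
    runs-on: ubuntu-latest
    outputs:
      secrets_ok: ${{ steps.check.outputs.secrets_ok }}
    steps:
      - id: check              # hypothetical step id
        env:
          CODEX_AUTH_JSON: ${{ secrets.CODEX_AUTH_JSON }}
        run: |
          # Require CODEX_AUTH_JSON itself, matching what the reusable workflow enforces.
          if [ -n "$CODEX_AUTH_JSON" ]; then
            echo "secrets_ok=true" >> "$GITHUB_OUTPUT"
          else
            echo "secrets_ok=false" >> "$GITHUB_OUTPUT"
            echo "::warning::CODEX_AUTH_JSON secret is not set or empty."
          fi
```

Since run-codex no longer depends on preflight after this PR, such a check would be informational only: it reports the problem in the preflight log but does not prevent the later failure.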

@stranske stranske merged commit d3cb940 into main Dec 24, 2025
310 checks passed
@stranske stranske deleted the fix-keepalive-codex-job-v2 branch December 24, 2025 13:22