fix(autofix): prevent clean mode label race condition #274

Merged
stranske merged 1 commit into main from fix/autofix-clean-label-race
Dec 29, 2025

@stranske
Owner

@stranske stranske commented Dec 29, 2025

Automated Status Summary

Scope

  • Context / problem:
      - Current orchestration depends on PATs and/or mixed identities, which is fragile and painful to maintain.
      - GitHub Actions has recursion protection: pushes/labels/comments made with GITHUB_TOKEN generally will NOT trigger other workflows.
      - A GitHub App installation token is the cleanest way to get predictable “workflow triggers workflow” behavior without tying everything to a human PAT.
  • Goal:
      - Create a GitHub App (single org/user app) that can be installed on your repos.
      - Mint short-lived installation tokens inside workflows.
      - Replace all PAT usage in orchestrator + keepalive + dispatch workflows with the App token.

Tasks

  • Create GitHub App (UI, not code): name it "agents-workflows-bot" (or similar)
  • Set App permissions (minimal but sufficient):
      - Contents: Read & write
      - Pull requests: Read & write
      - Issues: Read & write
      - Actions: Read & write (for dispatching / reading runs)
      - Metadata: Read-only
  • Install the App on: Workflows, Workflows-Integration-Tests, Travel-Plan-Permission, Portable-Alpha-Extension-Model, Trend_Model_Project
  • Add secrets to Workflows repo (or org secrets):
      - WORKFLOWS_APP_ID
      - WORKFLOWS_APP_PRIVATE_KEY (the PEM contents)
  • Update all workflows that currently use PATs to:
      - mint app token
      - export GH_TOKEN to that token
      - (optional) checkout using that token so git push is clean
  • Add a “compat mode” fallback (temporarily) so you can flip back to PAT if needed during rollout
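
The “compat mode” fallback in the last task could be sketched roughly as below. This is only an illustration under assumptions: the helper name `select_token`, the `compat_mode` flag, and passing both credentials as arguments are hypothetical and not taken from the Workflows repo.

```python
from typing import Optional


def select_token(app_token: Optional[str], pat: Optional[str],
                 compat_mode: bool = False) -> str:
    """Pick the credential to export as GH_TOKEN (hypothetical helper)."""
    if compat_mode:
        # Rollback path: explicitly prefer the legacy PAT during rollout.
        if pat:
            return pat
        raise ValueError("compat mode requested but no PAT is configured")
    if app_token:
        # Normal path: use the short-lived App installation token.
        return app_token
    raise ValueError("no App installation token available")
```

Failing loudly when the expected credential is missing, rather than silently switching identities, keeps it obvious which identity a workflow run actually used.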

Acceptance criteria

  • No workflow in Workflows repo requires a PAT for:
      - labeling PRs/issues
      - creating comments
      - pushing commits to PR branches
      - dispatching workflows
  • A commit pushed by the bot identity reliably triggers the Gate workflow (no “dead loop”).
  • Secrets inventory is reduced: only App ID + private key (and OPENAI_API_KEY) are required for the automation system.
  • Rollout / safety:
      - Roll out in Workflows-Integration-Tests first, then Workflows, then consumer repos.
      - Add CODEOWNERS for .github/workflows/** and .github/scripts/** so this can’t get silently corrupted later.

Head SHA: cee2332
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Copilot code review | ❔ in progress | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ❌ failure | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

Head SHA: 12aca2c
Latest Runs: ⏳ queued — Gate
Required: gate: ⏳ queued

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Copilot code review | ❔ in progress | View run |
| Gate | ⏳ queued | View run |
| Health 40 Sweep | ❔ in progress | View run |
| Health 44 Gate Branch Protection | ❔ in progress | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ❔ in progress | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ❔ in progress | View run |
| Validate Sync Manifest | ✅ success | View run |

The 'Ensure autofix label present' step was adding the opt_in_label
(default: autofix:clean) unconditionally. Since this label is the same as
clean_label, it caused ALL autofix runs to be in clean mode, even for PRs
with non-cosmetic failures like mypy errors or test failures.

Changes:
1. Skip adding opt_in_label when it equals clean_label (the clean mode flag
   should only be added by gate workflow when cosmetic-only failure detected)
2. Change default opt_in_label from 'autofix:clean' to 'autofix' to make
   the distinction clear between trigger labels and mode flags

This fixes the race condition where autofix runs in clean mode before gate
can determine if the failure is truly cosmetic.

Fixes: autofix:clean label being added before CI completes
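
The guard described in change 1 can be sketched as a small decision function. This is a minimal illustration, not the workflow's actual step: the function name `label_to_add` and the set-of-labels input are assumptions, while the default values mirror the ones named in this description.

```python
from typing import Optional, Set


def label_to_add(current_labels: Set[str],
                 opt_in_label: str = "autofix",
                 clean_label: str = "autofix:clean") -> Optional[str]:
    """Return the opt-in label to add, or None if nothing should be added."""
    if opt_in_label == clean_label:
        # The clean mode flag belongs to the gate workflow, which sets it
        # only after detecting a cosmetic-only failure. Never add it here.
        return None
    if opt_in_label in current_labels:
        # Label already present; nothing to do.
        return None
    return opt_in_label
```

With the old default (`opt_in_label` equal to `autofix:clean`) the function returns `None`, so the step can no longer force clean mode before gate has classified the failure.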
Copilot AI review requested due to automatic review settings December 29, 2025 05:48
@stranske stranske temporarily deployed to agent-high-privilege December 29, 2025 05:49 — with GitHub Actions Inactive
@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #274. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).
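
A lookup of this kind might work roughly as sketched below. This is an assumption-laden illustration: the function name `find_issue_number`, the search order, and the regular expressions are hypothetical, and the hidden marker is left out because its format is not shown here.

```python
import re
from typing import Optional

# Patterns for the two visible conventions named in the bot message.
_HASH_REF = re.compile(r"#(\d+)")
_BRANCH_REF = re.compile(r"\bissue-(\d+)\b")


def find_issue_number(title: str, branch: str, body: str) -> Optional[int]:
    """Return the first issue number found in title, branch, or body."""
    match = _HASH_REF.search(title)
    if match:
        return int(match.group(1))
    match = _BRANCH_REF.search(branch)
    if match:
        return int(match.group(1))
    match = _HASH_REF.search(body)
    if match:
        return int(match.group(1))
    return None
```

For this PR's title and branch name neither pattern matches, so a lookup like this would return `None` unless the body carries a reference.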

@stranske stranske enabled auto-merge (squash) December 29, 2025 05:50
@github-actions
Contributor

Automated Status Summary

Head SHA: 3a9c55b
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 78.63% |
| Baseline | 0.00% |
| Delta | +78.63% |
| Minimum | 70.00% |
| Status | ✅ Pass |
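
The figures above are consistent with a simple calculation: delta is current minus baseline, and the status passes when current coverage meets the minimum. The sketch below assumes that rule; the function name `coverage_status` is hypothetical and the real summary script may compute these values differently.

```python
from typing import Tuple


def coverage_status(current: float, baseline: float,
                    minimum: float) -> Tuple[float, bool]:
    """Return (delta, passed) for coverage percentages."""
    # Round to two decimals to match the precision shown in the summary.
    delta = round(current - baseline, 2)
    # Assumed rule: pass when current coverage is at least the minimum.
    passed = current >= minimum
    return delta, passed
```

Note that a baseline of 0.00% with a single history entry makes the delta equal to the current value, so the first run's delta says little about the trend.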

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@stranske stranske merged commit ed543da into main Dec 29, 2025
55 checks passed
@stranske stranske deleted the fix/autofix-clean-label-race branch December 29, 2025 05:50
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a race condition where the autofix workflow was incorrectly entering clean mode for all runs, including those with non-cosmetic failures like mypy errors or test failures. The root cause was that the workflow was adding the autofix:clean label (which triggers clean mode) before the gate workflow could determine if the failure was actually cosmetic-only.

Key Changes:

  • Changed default opt_in_label from autofix:clean to autofix to distinguish the opt-in trigger label from the clean mode flag
  • Added conditional logic to skip adding the opt-in label when it equals the clean mode label, preventing premature clean mode activation
  • Updated input description to clarify that opt_in_label should differ from clean_label
