fix(workflows): pr-labels-ci.yml bidirectional CI ↔ label transitions (#378)#410
Conversation
Closes the stale-label trap where a PR sitting at `CI Failed` whose follow-up push made CI go green silently no-op'd and stayed at `CI Failed`. Root cause: `on-ci-pass`'s outer guard `if echo "$LABELS" | grep -q "^Awaiting CI$"` only fired on `Awaiting CI`, so the `CI Failed → Ready for QA` arc was missing. Symmetric fix on `on-ci-fail`: a PR at `Ready for QA` whose CI was re-run (manual re-trigger after a flake) and turned red kept the green label, hiding the regression from QA. `on-ci-fail` now also fires when `Ready for QA` is the current label, removing it and applying `CI Failed`. Both jobs gain an explicit short-circuit on the in-flight review labels `QA Active` / `Ready for QA Signoff` / `QA Approved` so the broadened triggers do not disrupt review-machine state — review advances independently of CI re-runs, and a re-run's success or failure on those states is visible via the check itself rather than a label transition. `Dev Active` short-circuit preserved unchanged. No new contributor-controlled inputs introduced. Label list still read via `gh pr view --json labels` (repo-owned strings, not fork-controlled). Anchored grep patterns (`^Label$`) stay anchored. Existing env-routing of `HEAD_BRANCH` / `RUN_ID` / `PR` / `REPO` (hardened in #332/#333) unchanged. Closes #378. Verification is intrinsically post-merge — `workflow_run` triggers always run from the default branch, so changes to `pr-labels-ci.yml` cannot be exercised on the PR that introduces them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
cmeans
left a comment
There was a problem hiding this comment.
[QA] Round 1 — Ready for QA Signoff. Zero findings.
Workflow-YAML-only change closing #378. 2 files / +55/-9. All 17 status checks green.
State-machine verification
I walked the pr-labels-ci.yml head-to-toe against the PR-body trace table — every row is correct. Summary:
| Pre-state | CI=success | CI=failure |
|---|---|---|
Awaiting CI |
→ Ready for QA (REMOVE="Awaiting CI") | → CI Failed (REMOVE="Awaiting CI") |
CI Failed |
→ Ready for QA (REMOVE="CI Failed") NEW — closes #378 | stays CI Failed (REMOVE empty, no-op) |
Ready for QA |
stays Ready for QA (REMOVE empty, no-op) | → CI Failed (REMOVE="Ready for QA") NEW — symmetric fix |
Awaiting CI + CI Failed |
→ Ready for QA (REMOVE="Awaiting CI,CI Failed") | (n/a) |
Awaiting CI + Ready for QA |
(n/a) | → CI Failed (REMOVE="Awaiting CI,Ready for QA") |
Dev Active |
exits at first guard | exits at first guard |
QA Active / Ready for QA Signoff / QA Approved |
exits at new for-loop | exits at new for-loop |
| no labels / unrelated only | REMOVE empty → no-op | REMOVE empty → no-op |
The if [ -n "$REMOVE" ] wrapper correctly ties the remove + add together — no spurious Ready for QA/CI Failed adds when there's nothing to transition from.
Other checks
| Area | Verdict |
|---|---|
Anchored grep (^Label$) preserved everywhere |
✓ |
Env-routing of HEAD_BRANCH / RUN_ID / PR / REPO unchanged |
✓ |
| No new contributor-controlled inputs interpolated into shell | ✓ |
Workflow trigger remains workflow_run (base-branch context) |
✓ |
Dev Active short-circuit preserved unchanged |
✓ |
| CHANGELOG order: Changed → Fixed → Security (KaC v1.1.0) | ✓ |
Closes #378 linked correctly in both PR body and CHANGELOG entry |
✓ |
Idempotency on already-present Ready for QA (when Awaiting CI + Ready for QA co-exist on success) |
✓ — gh pr edit --add-label is idempotent |
Manual tests
- Workflow YAML parses cleanly. No parse-error annotations on
pr-labels-ci.yml. ✓ - Diff matches state-machine trace. Verified end-to-end above. ✓
- #409 migration live-validation — verified directly on this PR's status rollup:
pr-labels.ymlon-pushSUCCESS at 19:56 (run 25074527724) —Awaiting CIauto-applied to this PR on opening, no manual intervention.pr-labels.ymlon-labelSUCCESS — fired whenDev Activeremoved.pr-labels.ymlon-unlabelSUCCESS at 20:01 — fired in this current label transition cycle.qa-gate.ymlqa-approvedSUCCESS at 19:56 (3 invocations as labels evolved);QA Gatestatus context =PENDING(correct — current label isReady for QA, notQA Approved).- These end-to-end exercise the
pull_request_targettriggers from main, confirming #409's migration works on a non-bootstrap PR. ✓
- Post-merge bug-fix validation — deferred.
workflow_runalways runs from default branch, so this PR's changes cannot exercise themselves. Natural validation is the next CI-fail-then-pass cycle after merge.
Applying Ready for QA Signoff as the final act.
|
[QA] Applying Ready for QA Signoff — all 17 checks pass, zero findings on Round 1. State-machine trace walked end-to-end against |
Summary
Closes #378. Two stale-label traps in
pr-labels-ci.ymlfixed symmetrically; both rooted in narrow outer guards that only fired onAwaiting CI, missing the post-CI Failedrecovery arc and theReady for QA → CI Failedregression arc.on-ci-passAwaiting CIis presentAwaiting CIORCI Failedis presenton-ci-failCI Failedonly whenAwaiting CIis presentCI FailedwhenAwaiting CIORReady for QAis presentBug 1 —
CI Failed → CI passsilently no-ops (issue #378)Reproduction trail in #377 (2026-04-22): a lint-failing push moved labels to
CI Failed; the fix-up push made CI go green;on-ci-passfired and ran, but its outerif echo "$LABELS" | grep -q "^Awaiting CI$"was false (onlyCI Failedwas present), so it silently no-op'd. PR sat atCI Failedwhile CI was actually green. Required a manualgh pr edit --remove-label "CI Failed" --add-label "Ready for QA"to unstick.Bug 2 —
Ready for QA → CI re-failkeeps the green label (symmetric)Mirror trap on
on-ci-fail: a CI re-run on a PR sitting atReady for QA(e.g., manual re-trigger after a flake, or a workflow change forcing a re-run) that turns red leaves the PR labelledReady for QAbecause the outerif echo "$LABELS" | grep -q "^Awaiting CI$"is false. The status check goes red but the label still says ready — QA might pick it up assuming CI is green.Review-state preservation
Broadening the triggers introduces a new risk: if a
QA Active/Ready for QA Signoff/QA Approvedlabel coexists with a CI label (race, or manual mistake), the broader trigger could overwrite review-machine state withReady for QA(on pass) orCI Failed(on fail). To prevent that, both jobs now short-circuit explicitly when any of those three labels is present:Rationale: review state advances independently of CI re-runs. A passing or failing CI re-run on a PR that's already in QA review is visible via the check itself; the label transition would be redundant on success and destructive on failure.
Dev Activeshort-circuit preserved unchanged.Safety
workflow_run— base-branch context, immune to PR-branch edit attacks (same protection class as thepull_request_targetmigration in chore(sec): pr-labels.yml + qa-gate.yml → pull_request_target #409).gh pr view --json labels(repo-owned strings, not fork-controlled).^Label$) so labels likeAwaiting CI Failed(if one ever existed) cannot accidentally satisfy a^Awaiting CI$check.HEAD_BRANCH/RUN_ID/PR/REPO(hardened in bug: pr-labels-ci.yml is pre-hardening (shell-injection risk) and lacks the PR-#28 comment escape #332/fix(workflows): harden pr-labels-ci.yml against fork-PR shell injection (#332) #333) is unchanged. Nothing I add interpolates new contributor-controlled values into shell.State-machine trace (full)
Pre-state → CI conclusion → resulting transition (✓ = covered, ✗ = no-op, * = new):
Awaiting CIReady for QA✓CI Failed✓CI FailedReady for QA✓*CI Failed✓Ready for QAReady for QA✓CI Failed✓*Dev ActiveQA ActiveReady for QA SignoffQA ApprovedThe * entries are new in this PR. The
Dev Activeand "no pre-state" cases were already correct.Test plan
Workflow YAML only. No tests to add.
QA
Prerequisites
Manual tests
pr-labels-ci.yml..github/workflows/pr-labels-ci.ymlhead-to-toe; for each row of the trace, confirm the corresponding code path emits the expected transition (or skip).mainsince thepr-labels.yml/qa-gate.ymlmigration topull_request_target. Confirm:pr-labels.ymlon-pushfired on opening:Awaiting CIwas applied automatically (no manual addition required this time).qa-gate.ymlposted aQA Gatestatus on this PR's head SHA from app15368(GitHub Actions). Visible in the status-check rollup.workflow_runtriggers always run from the default branch (per theLIMITATIONcomment at the top ofpr-labels-ci.yml), so this PR's changes do not run on this PR. The natural validation is the next CI-fail-then-pass PR after this lands — when that happens, the PR should auto-promoteCI Failed → Ready for QAwithout manual intervention. Reviewer should add a follow-up note here (or in the awareness milestone for this PR) once that natural validation occurs.Out-of-scope follow-ups (not for this PR)
dismiss_stale_reviews_on_pushsetting interacts with these transitions in subtle ways (review approvals get auto-dismissed on push, then CI re-runs). No change proposed; just flagging for awareness.QA Invalidatedstyle label for the case where CI re-fails on a PR in QA review, but doing so requires designing the QA recovery path. Out of scope for pr-labels-ci.yml: handle CI Failed → CI pass transition (stale label trap) #378.