
QVAC-18612 infra: gate every secret-bearing workflow with label-gate (re-land) #2023

Merged: Proletter merged 4 commits into main from feat/QVAC-18612-label-gate-reland on May 14, 2026

Proletter (Collaborator) commented May 13, 2026

What problem does this PR solve?

The QVAC monorepo has ~60 workflows that are triggered by PR events and consume secrets. Without an explicit label gate, anyone who can open a PR can trigger them. This PR re-lands the label-gate fan-out from QVAC-18612: before any secret-bearing job runs, a trusted team member must apply the 'verified' label.

The first attempt (PR #1997, reverted in #2019 on 2026-05-13) embedded the label-gate job inside the reusable callees, with pull-requests: write. GitHub caps a reusable workflow's permissions at whatever the caller grants (the caller-cap rule), and almost every on-pr-*.yml caller scopes the call to pull-requests: read|none. The affected workflow files therefore failed validation before any job could run or touch secrets.
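A minimal sketch of the #1997 failure mode, using hypothetical file and job names (on-pr-example.yml, integration-test-example.yml) rather than real QVAC workflows. The caller grants the reusable call pull-requests: read, while the callee's embedded gate job asks for pull-requests: write. Because a callee can only keep or reduce what the caller grants, GitHub should reject the caller's workflow at validation time instead of running anything:

```yaml
# Hypothetical caller: on-pr-example.yml
# The job-level permissions below are the ceiling for everything in the callee.
name: on-pr-example
on:
  pull_request:
jobs:
  integration:
    permissions:
      contents: read
      pull-requests: read
    uses: ./.github/workflows/integration-test-example.yml
    secrets: inherit
---
# Hypothetical callee in the #1997 style: integration-test-example.yml
# The embedded gate job requests more than the caller granted, so the
# caller's workflow file fails validation before any job starts.
name: integration-test-example
on:
  workflow_call:
jobs:
  label-gate:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write   # exceeds the caller's pull-requests: read
    steps:
      - run: echo "gate would run here"
```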

This PR re-architects the gating so that this validation failure cannot recur.

How does it solve it?

  • Caller-gates-callee architecture. Reusable workflows (workflow_call invokees: integration-test-*.yml, cpp-lint.yaml, create-github-release-*.yml, etc.) are NOT modified. The label-gate job lives in the caller, where it gets the pull-requests: write scope it needs and where the caller-cap rule never applies (in-workflow jobs aren't caller-capped). The caller adds an if: to the uses: invocation of every reusable that consumes secrets. When the gate denies, the entire reusable call is skipped: the callee runner never starts and no secrets are exposed.
  • Standalone secret-bearing jobs (which consume secrets directly rather than through a reusable) get the same treatment in the same workflow: needs: [..., label-gate] plus if: needs.label-gate.outputs.authorised == 'true' && (existing condition).
  • Tanstack-class supply-chain mitigation preserved. Every label-gate job pins actions/checkout to the default branch and uses a sparse checkout of only .github/actions/label-gate/, so a malicious PR cannot rewrite gate.mjs on its own branch to bypass the gate.
  • Exempt files left byte-identical to main. approval-worker.yml and approval-check-worker.yml are unchanged — gating them would create a deadlock (they ARE the approval mechanism). Composite actions (.github/actions/) and packages/ are also untouched. Pre-existing authorize-pr jobs are preserved alongside label-gate as belt-and-suspenders; their removal is a follow-up after a soak period.
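The caller-gates-callee layout described above might look roughly like this in a single caller. This is a sketch under assumptions: the file, job, script, and secret names are placeholders, the label-gate composite action is assumed to expose an authorised output, and any inputs the real action takes are omitted:

```yaml
# Hypothetical caller: on-pr-example.yml
name: on-pr-example
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled]

permissions:
  contents: read

jobs:
  label-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # lets the gate strip a 'verified' label applied by an untrusted user
    outputs:
      authorised: ${{ steps.gate.outputs.authorised }}
    steps:
      # Check out the gate from the default branch, not the PR head, so a PR
      # cannot edit the code that decides whether it is authorised.
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.repository.default_branch }}
          sparse-checkout: .github/actions/label-gate
      - id: gate
        uses: ./.github/actions/label-gate

  # Reusable call: skipped as a whole when the gate denies.
  # The callee needs no pull-requests: write, so read stays within the cap.
  integration-test:
    needs: [label-gate]
    if: needs.label-gate.outputs.authorised == 'true'
    permissions:
      contents: read
      pull-requests: read
    uses: ./.github/workflows/integration-test-example.yml
    secrets: inherit

  # Standalone secret-bearing job: gate output ANDed with its existing condition.
  publish-preview:
    needs: [label-gate]
    if: needs.label-gate.outputs.authorised == 'true' && github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/publish-preview.sh   # hypothetical script
        env:
          EXAMPLE_TOKEN: ${{ secrets.EXAMPLE_TOKEN }}
```

Because the if: sits on the uses: job itself, a denied gate skips the entire reusable call, so the callee runner never starts and no secrets reach it. The pull-requests: write grant stays on the in-workflow label-gate job, which is not subject to the caller cap.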

Release-environment audit (covers full release-env surface)

77 workflows reference environment: release. Verified breakdown:

| Category | Count | Status |
| --- | --- | --- |
| Pure callers with PR trigger | 34 | All have inline label-gate job |
| workflow_call-only callees, called by gated callers | 7 | Gated by caller's if: |
| Mixed (workflow_call + workflow_dispatch/push/etc., no PR trigger) | 30 | Protected: workflow_dispatch requires repo-write; workflow_call gated by caller |
| Mixed with PR trigger + own label-gate | 5 | Self-gated |
| on-pr-close-* → public-delete-npm-versions.yml | 11 | Now gated in commit 96031000 (this commit) |

Commit 96031000 adds label-gate to the 10 remaining on-pr-close-*.yml workflows that fire on pull_request: types: [closed] and call public-delete-npm-versions.yml. (on-pr-close-translation-nmtcpp.yml is intentionally untouched: it has only a workflow_dispatch trigger, no pull_request trigger.) Without this gating, dropping the release-env approval would have left an unauthorised path from closing any internal PR to npm version deletion in the release env.
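Applied to an on-pr-close-* workflow, the same pattern could be sketched as follows. The workflow name is hypothetical; delete-npm-versions-trigger and public-delete-npm-versions.yml are the names this PR uses, but any inputs the real reusable expects are not shown. packages: write is granted on the call because the callee is described as needing it:

```yaml
# Hypothetical on-pr-close-example.yml. Only the gating structure is shown.
name: on-pr-close-example
on:
  pull_request:
    types: [closed]

permissions:
  contents: read

jobs:
  label-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    outputs:
      authorised: ${{ steps.gate.outputs.authorised }}
    steps:
      # Gate code comes from the default branch, never from the PR head.
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.repository.default_branch }}
          sparse-checkout: .github/actions/label-gate
      - id: gate
        uses: ./.github/actions/label-gate

  delete-npm-versions-trigger:
    needs: [label-gate]
    # Without this condition, closing any internal PR would reach the
    # release environment used inside the callee.
    if: needs.label-gate.outputs.authorised == 'true'
    permissions:
      contents: read
      packages: write   # the callee is described as needing packages: write
    uses: ./.github/workflows/public-delete-npm-versions.yml
    secrets: inherit
```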

How was it tested?

Local invariants verified post-rewrite:

69 workflow files modified — only .github/workflows/, no actions, no packages
55 reusable callees checked: 0 contain a label-gate job (caller-cap risk = 0)
77/77 release-env workflows: every PR-reachable path is now gated
.github/workflows/approval-worker.yml         IDENTICAL to main
.github/workflows/approval-check-worker.yml   IDENTICAL to main
yamlfmt v0.17.0 -formatter retain_line_breaks_single=true on changed workflow files: tree clean

End-to-end validation ran against real org teams in tetherto/qvac-internal#12 (qvac-internal-{dev,merge,release} teams) and the public mirror at Olutest/qvac-tests. The validation harness mirrors every pattern in this PR, most importantly the pattern that broke #1997:

| # | Scenario | Run | Result |
| --- | --- | --- | --- |
| 1 | pull_request opened, no verified label (pattern A, self-contained) | 25792556959 | authorised=false; gated job skipped |
| 2 | pull_request opened, no verified label (pattern B, caller-gates-callee) | 25792557252 | authorised=false; both caller AND callee skipped; no caller-cap error |
| 3 | verified added by trusted team member (pattern A) | 25792615233 | authorised=true (member of 'tetherto/qvac-internal-dev'); marker emitted |
| 4 | verified added by trusted team member (pattern B, the #1997 failure mode) | 25792615205 | Caller authorised; callee gated if: ran with no pull-requests: write; NO caller-cap validation error; marker emitted from inside the reusable |
| 5 | verified added by untrusted user (strip behaviour) | 25792768466 | authorised=false (non-trusted 'Proletter' applied 'verified'; label stripped); PR labels = [] post-run |
| 6 | workflow_dispatch (trusted event), pattern A | 25792562364 | authorised=true (trusted event source) |
| 7 | workflow_dispatch, pattern B | 25792564424 | Caller authorised + callee ran |
| 8 | push to branch | 25792554793 | authorised=true (push) |

What's in the diff

  • 2 commits:
    • ec975f81 QVAC-18612 infra: gate every secret-bearing workflow with label-gate (59 files)
    • 96031000 QVAC-18612 infra: gate on-pr-close-* workflows with label-gate (10 files)
  • 69 files total, all under .github/workflows/
  • Zero changes to .github/actions/
  • Zero changes to packages/
  • approval-worker.yml + approval-check-worker.yml unchanged

Known CI red

sanity-checks (invoked by the ONNX/whispercpp on-pr workflows) runs yamlfmt v0.17.0 across the whole repo and fails the build if anything is dirty. main itself currently has yamlfmt drift across 21 composite-action files and 4 packages/transcription-whispercpp/ config files. The drift never trips on main because sanity-checks only runs when a PR touches an ONNX path. This PR does not introduce that drift, but the check sees it on every PR that triggers the ONNX/whispercpp sanity-checks. Cleaning it up requires either a separate "yamlfmt main" PR or a .yamlfmt config to scope the formatter; per review feedback, both are deliberately out of scope for this PR.

Refs: QVAC-18612.

kinsta Bot commented May 13, 2026

Preview deployments for qvac-docs-staging ⚡️

| Status | Branch preview | Commit preview |
| --- | --- | --- |
| ✅ Ready | Visit preview | Visit preview |

Commit: 3c3c72e8089b3e98478674bdc7a183a5ff4b6242

Deployment ID: 7e424908-7bab-4efb-98b7-023d1e6e14dd

Static site name: qvac-docs-staging-fazwv

github-actions Bot commented May 13, 2026 (Contributor)

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (2/1)

**Bypass rule:** Triggered (2+ Team Lead approvals (Tier 1 exception)). This PR is approved regardless of tier.

---
*This comment is automatically updated when reviews change.*

Comment thread .github/actions/run-lint-and-unit-tests/action.yaml Fixed
Proletter and others added 3 commits May 13, 2026 15:02
Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit 919850c). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR #1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR #1971 / #1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR #1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports no new issues
    after the migration: 1832 shellcheck (pre-existing), 9 expression
    (pre-existing), 5 action (down from 7 pre-existing)

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>

Closes a release-env exposure surfaced when auditing #2023:
public-delete-npm-versions.yml (environment: release, packages: write)
is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had
label-gate. Ten of the remaining 11 fire on `pull_request: types: [closed]`
and reach the release env without authorisation.

This is currently held back only by the manual approval on the release
environment. Once that approval is dropped (the goal of QVAC-18612), the
label-gate becomes the sole control. This commit makes label-gate that
control everywhere.

Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this
branch): inline label-gate job (caller side) + needs/if on the
delete-npm-versions-trigger reusable call. Reusable callee
(public-delete-npm-versions.yml) is unchanged.

on-pr-close-translation-nmtcpp.yml deliberately not modified: it has
only workflow_dispatch (no pull_request trigger) and is intrinsically
gated by repo-write access.

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 8375d28

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 8375d28

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 8f25f3e

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 8f25f3e

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 9db6f98

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: 9db6f98

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: bff03c6

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: bff03c6

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - iOS

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: iOS
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: fe9ad87

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/

github-actions Bot commented (Contributor)

❌ E2E Mobile Test Results - Android

Overall Status: FAILED
Device Farm Result: UNKNOWN
Platform: Android
Addon: @qvac/translation-nmtcpp
PR: #2023
Commit: fe9ad87

Test Summary

| Metric | Count |
| --- | --- |
| Total Tests | 0 |
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |

Automated E2E mobile testing powered by AWS Device Farm
Tests located in: test/mobile/
