QVAC-18612 infra: repurpose vulkaninfo as label-gate safety canary#1971

Closed
Proletter wants to merge 4 commits into main from feat/QVAC-18612-label-gate-safety-test-vulkaninfo

Conversation

@Proletter

Collaborator

🎯 What problem does this PR solve?

  • QVAC-18612 will fan out the now-merged label-gate composite action (QVAC-18608, #1968) across ~75 secret-bearing workflows in tetherto/qvac. A bad rollout would red-X every PR overnight, so we need real-CI evidence that the action loads, gates correctly on PR events, and fails closed before touching the rest of the repo.

  • The local node --test suite + e2e smokes from #1968 (QVAC-18608) verify the action's logic in isolation, but they don't prove the action actually loads under using: node20 on a real GitHub-hosted runner with a real PR-event payload. This PR closes that gap.

📝 How does it solve it?

  • Temporarily repurposes the smallest, most isolated workflow in the repo as a single-PR canary. vulkaninfo.yml is manual-only (workflow_dispatch), runs only on a Windows GPU runner, holds no secrets, and has no production criticality — repurposing it cannot break anything else.

  • Adds a top-of-graph label-gate job and a downstream would-run-with-secrets stand-in that gates on if: needs.label-gate.outputs.authorised == 'true'. The stand-in just echoes the gate's outputs; no real secret access happens during the canary.

  • Configures the gate with users: Proletter, teams: "". This exercises the action end-to-end without requiring read:org (no team-membership API calls). It deliberately decouples the safety test from the PAT_TOKEN-scope prerequisite that's outstanding for the production rollout.

  • Trigger is pull_request (not pull_request_target). For pull_request_target GitHub uses the workflow file from the base branch at event time, which would mean our modified file never actually runs from this PR. pull_request uses the PR head's file. Long-term label-gate is intended for pull_request_target; this safety test only proves out the action mechanics.

  • paths: filter scopes the trigger to vulkaninfo.yml and label-gate/** so the canary doesn't fire on unrelated PRs.

  • Original vulkaninfo content is preserved verbatim in a comment block at the bottom of the file for trivial restoration. This PR is intended to be closed without merging once the test completes.

  • Base is main (the action is on main as of #1968, QVAC-18608).

🧪 How was it tested?

The action's logic is already covered by 44/44 node:test unit tests + e2e smokes in PR #1968 (CodeQL Actions security scan green there too). This PR adds live-CI observation. Each scenario will be checked off in a follow-up comment:

  1. Open this PR (no verified label) → expect gate-job green, authorised=false with reason "'verified' label is not currently applied", downstream-job skipped, zero GitHub API calls.

  2. workflow_dispatch (manual run from the Actions tab) → expect gate-job green, authorised=true with reason "trusted event source (workflow_dispatch)", downstream-job green.

  3. Apply verified label as @Proletter → expect gate-job green, authorised=true with reason "label applier 'Proletter' is trusted (in users allowlist)", downstream-job green, zero team-membership API calls (allowlist short-circuit).

  4. Push another commit while still labeled (synchronize) → expect authorised=true, downstream-job green.

  5. Remove the label, push another commit → expect authorised=false ("not currently applied"), downstream-job skipped. This validates the unlabeled-bypass guard from QVAC-18608 commit 8998aa57.

If any scenario diverges from expected behaviour, this PR will not be merged and the rollout PR for QVAC-18612 will not be opened until the discrepancy is understood and fixed in PR #1968.
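
The five scenarios above reduce to a small decision table. The sketch below is a hypothetical illustration, not the label-gate source: the `decide` function, its parameter names, and the final fallback reason string are assumptions, while the other reason strings are the ones quoted in the test plan.

```javascript
// Hypothetical decision sketch for the canary scenarios; the real gate logic
// lives in the label-gate action and may differ.
const GATE_LABEL = 'verified';

function decide({ eventName, labels = [], applier = null, users = [] }) {
  // Manually dispatched runs are treated as a trusted event source.
  if (eventName === 'workflow_dispatch') {
    return { authorised: true, reason: 'trusted event source (workflow_dispatch)' };
  }
  // Gate label not on the PR right now: deny without any lookup.
  if (!labels.includes(GATE_LABEL)) {
    return { authorised: false, reason: `'${GATE_LABEL}' label is not currently applied` };
  }
  // Allowlist short-circuit: no team-membership lookup is needed.
  if (applier !== null && users.includes(applier)) {
    return { authorised: true, reason: `label applier '${applier}' is trusted (in users allowlist)` };
  }
  // Fallback wording is illustrative only.
  return { authorised: false, reason: `label applier '${applier}' is not trusted` };
}

const users = ['Proletter'];
const scenarios = [
  { eventName: 'pull_request', labels: [] },                                  // scenario 1
  { eventName: 'workflow_dispatch' },                                         // scenario 2
  { eventName: 'pull_request', labels: ['verified'], applier: 'Proletter' }, // scenarios 3 and 4
  { eventName: 'pull_request', labels: [], applier: 'Proletter' },           // scenario 5
];
for (const s of scenarios) {
  console.log(decide({ ...s, users }));
}
```

Checking the current label set before consulting the applier is what lets scenario 5 deny even though the last applier was trusted.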

🛡️ Permissions changes

  • Scope: top-level (vulkaninfo.yml) and job label-gate
  • Before: permissions: {} (no permissions)
  • After: top-level contents: read, pull-requests: write (job inherits)
  • Justification:
    • contents: read so actions/checkout can make the local ./.github/actions/label-gate composite action available to the runner.
    • pull-requests: write so label-gate can strip the gate label on synchronize from non-trusted actors (defense-in-depth path). This is the same scope the existing authorize-pr action requires; only exercised in scenario 5 of the test plan.

@Proletter

Collaborator Author

Canary did its job on the first run. Scenario 1 ran (PR opened, no verified label) and the gate-job hard-failed with required input 'github-token' is missing — even though github-token: ${{ secrets.GITHUB_TOKEN }} is set in the workflow. Run: https://github.com/tetherto/qvac/actions/runs/25672483584/job/75361185955

Root cause: getInput() in the action was uppercasing+hyphen-to-underscore (INPUT_GITHUB_TOKEN), but the runner preserves hyphens (INPUT_GITHUB-TOKEN). My local pre-merge smoke set INPUT_GITHUB_TOKEN (matching the buggy lookup), so both sides were wrong in the same direction and the smoke 'passed'. This is exactly the failure mode the canary was designed to surface — without it, the QVAC-18612 fan-out across 75 workflows would have red-X'd every PR.

Hotfix: #1973 — one-line fix to match the runner / @actions/core convention, plus 9 regression tests pinning the env-var-name resolution. node --test 53/53 pass; e2e smoke against the runner-correct env-var name returns exit 0 / authorised=false.

Once #1973 merges I'll rebase this PR onto the new main and re-walk the test plan.
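
The naming mismatch described above can be shown with a minimal sketch. It assumes only what the hotfix describes (spaces become underscores, hyphens are preserved, the name is uppercased and prefixed with `INPUT_`). The helper names `envKeyFor` and `buggyEnvKeyFor`, the option names, and the sample token value are illustrative and not taken from `src/index.mjs`.

```javascript
// Hypothetical sketch of the env-var lookup; not the actual src/index.mjs.

// Runner / @actions/core convention: spaces -> underscores, then uppercase.
// Hyphens are preserved.
function envKeyFor(name) {
  return `INPUT_${name.replace(/ /g, '_').toUpperCase()}`;
}

// The pre-fix behaviour described above: hyphens were rewritten as well.
function buggyEnvKeyFor(name) {
  return `INPUT_${name.replace(/[ -]/g, '_').toUpperCase()}`;
}

// `env` is injectable so the convention can be tested without process.env.
function getInput(name, { required = false, env = process.env } = {}) {
  const value = (env[envKeyFor(name)] ?? '').trim();
  if (required && value === '') {
    throw new Error(`required input '${name}' is missing`);
  }
  return value;
}

// What the runner sets for `github-token: ...` on a workflow step.
const runnerEnv = { 'INPUT_GITHUB-TOKEN': ' token-value ' };

console.log(envKeyFor('github-token'));      // INPUT_GITHUB-TOKEN
console.log(buggyEnvKeyFor('github-token')); // INPUT_GITHUB_TOKEN
console.log(getInput('github-token', { required: true, env: runnerEnv })); // token-value
```

A smoke test that exports `INPUT_GITHUB_TOKEN` satisfies the buggy lookup but not the runner convention, which is why the local run passed while the live run failed.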

@github-actions

github-actions Bot commented May 11, 2026

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

Proletter added a commit that referenced this pull request May 11, 2026
…1973)

The QVAC-18612 canary (PR #1971, run id 25672483584) hard-failed with
"required input 'github-token' is missing" even though the workflow
clearly passed `github-token: ${{ secrets.GITHUB_TOKEN }}`.

Root cause: `getInput` in src/index.mjs was uppercasing the input
name AND replacing hyphens with underscores, looking up
`INPUT_GITHUB_TOKEN`. The GitHub Actions runner (and @actions/core)
preserve hyphens — only spaces are replaced — so the runner sets
`INPUT_GITHUB-TOKEN`. The action never found the token and threw a
missing-input error.

The local smoke test that "passed" before merge set
`INPUT_GITHUB_TOKEN=...` (matching the buggy lookup) so both sides
were wrong in the same direction. This is exactly the failure mode
the canary was meant to surface; without it, the gate would have
failed across all 75 secret-bearing workflows on first PR after the
QVAC-18612 fan-out.

Fix:
  - getInput now uses `name.replace(/ /g, '_').toUpperCase()` —
    matching the runner / @actions/core convention exactly.
  - getInput is exported from src/index.mjs (with an injectable env
    arg) so the convention can be unit-tested.
  - Top-level main() is gated on `import.meta.url === argv[1]` so
    importing index.mjs from tests no longer triggers a real run.

Tests:
  - 9 new tests in test/index.test.mjs pin the env-var-name resolution:
      * INPUT_GITHUB-TOKEN (hyphen preserved) -> resolves
      * INPUT_GITHUB_TOKEN (hyphen replaced) -> does NOT resolve
        (locks the contract against accidental "helpful" rewrite)
      * spaces are still replaced with underscores
      * trim, missing-required, defaults-to-process.env
  - Total: 53/53 pass via `node --test`.
  - End-to-end smoke against the runner-correct env-var name
    (INPUT_GITHUB-TOKEN=...) confirms exit 0 and authorised=false
    on the no-label deny path.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Related: #1971

Co-authored-by: Cursor <cursoragent@cursor.com>
@Proletter Proletter force-pushed the feat/QVAC-18612-label-gate-safety-test-vulkaninfo branch from a86a783 to bcf5321 May 11, 2026 14:46
@Proletter Proletter marked this pull request as ready for review May 11, 2026 14:53
@Proletter Proletter requested review from a team as code owners May 11, 2026 14:53
@Proletter Proletter added and removed the verified label (Authorize secrets / label-gate in PR workflows) May 11, 2026
@NamelsKing NamelsKing added the verified label (Authorize secrets / label-gate in PR workflows) May 11, 2026
@Proletter Proletter added and removed the verified label (Authorize secrets / label-gate in PR workflows) May 11, 2026
@github-actions github-actions Bot removed the verified label (Authorize secrets / label-gate in PR workflows) May 11, 2026
Proletter added a commit that referenced this pull request May 12, 2026
… it (#1978)

Currently, when a non-trusted user adds the `verified` label to a PR,
the action denies (correct) but leaves the misleading label sitting
on the PR (incorrect). The visible PR state ("verified") then no
longer matches the security state ("unverified"), creating both a
confusing UX and a minor social-engineering vector ("look, this PR
is verified -- merge it!"). The label only gets cleaned up later if
a non-trusted actor happens to push a commit (the existing
synchronize-strip path).

Confirmed in production by Olu on PR #1971 (canary):
  - Apply `verified` from a non-allowlisted user -> authorised=false
  - Label remains on the PR until a synchronize from a non-trusted
    actor cleans it up.

This commit makes the strip symmetric with the synchronize path:
when the labeled event is for our gate label AND the applier (=
sender, since they just clicked the label) is non-trusted, the
action denies AND removes the label in the same run.

Strip is intentionally NOT performed for non-labeled deny paths
(opened/reopened/edited/labeled-with-different-label/...) because
the historical applier resolved from the timeline may have been
trusted at apply time -- aggressive removal would penalise legit
labels applied by users whose trust status changed later (e.g.
former team members). The synchronize path will clean those up on
the next push from a non-trusted actor.

Tests:
  - Updated `labeled by non-member` and `labeled by bot account` to
    assert the new strip side effect (stripped: true, reason mentions
    stripped, exactly one stripLabel call to the right PR/label).
  - 4 new tests pinning the policy boundaries:
      * REGRESSION: labeled with a DIFFERENT label by non-trusted
        user while gate label still applied -> NO strip
      * idempotent strip API result is propagated through the
        decision (true on 200/204/404)
      * NOT performed when applier is trusted (no strip on success)
      * NOT performed for non-labeled deny paths (opened/reopened/...)
  - `node --test .github/actions/label-gate/test/*.test.mjs` ->
    57/57 pass (was 53; +4).

README:
  - Updated trust-model table to call out the strip on both the
    `labeled` and `synchronize` deny paths.
  - New "Strip policy" section that explicitly documents both strip
    triggers and the deliberate non-strip on historical-applier deny.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Verified live in: #1971

Co-authored-by: Cursor <cursoragent@cursor.com>
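
The strip policy in the commit message above can be summarised as a small predicate. This is a hedged sketch under stated assumptions: `shouldStrip`, `stripSucceeded`, and their field names are hypothetical and do not come from the action's source; only the policy itself (strip on `labeled` with the gate label by a non-trusted sender, strip on `synchronize` from a non-trusted actor, no strip otherwise, success on 200/204/404) is taken from the text.

```javascript
// Hypothetical sketch of the strip policy; names are illustrative.
const GATE_LABEL = 'verified';

function shouldStrip({ action, eventLabel = null, labels = [], actorTrusted }) {
  // labeled: strip only when the label just added IS the gate label and the
  // sender (who is also the applier) is not trusted.
  if (action === 'labeled') {
    return eventLabel === GATE_LABEL && !actorTrusted;
  }
  // synchronize: a push from a non-trusted actor removes an existing gate label.
  if (action === 'synchronize') {
    return labels.includes(GATE_LABEL) && !actorTrusted;
  }
  // opened / reopened / edited / ...: never strip, because the historical
  // applier may have been trusted when the label was applied.
  return false;
}

// The strip call is idempotent: 404 (label already gone) counts as success.
function stripSucceeded(status) {
  return status === 200 || status === 204 || status === 404;
}

console.log(shouldStrip({ action: 'labeled', eventLabel: 'verified', labels: ['verified'], actorTrusted: false }));
console.log(shouldStrip({ action: 'labeled', eventLabel: 'bug', labels: ['verified', 'bug'], actorTrusted: false }));
console.log(shouldStrip({ action: 'reopened', labels: ['verified'], actorTrusted: false }));
```

The second call mirrors the regression test: a different label added by a non-trusted user must not remove a gate label that is already in place.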
Proletter and others added 2 commits May 12, 2026 10:05
Temporarily repurposes the GPU-diagnostic vulkaninfo.yml workflow as a
canary to validate the new ./.github/actions/label-gate composite
action in real GitHub Actions before fanning the gate out to all 75
secret-bearing workflows in the repo (the actual QVAC-18612 work).

Why this file: vulkaninfo is the smallest, most isolated workflow in
the repo (no PR triggers, no secrets, no production criticality, runs
only on manual dispatch on a Windows GPU runner). Repurposing it
cannot break anything else.

Trigger choice: pull_request, NOT pull_request_target. For
pull_request_target GitHub uses the workflow file from the BASE
branch at event time, which would mean our modified file never
actually runs during this canary PR. pull_request uses the PR head's
file. Long-term label-gate is intended for pull_request_target; this
safety test only proves out the action mechanics.

Token choice: secrets.GITHUB_TOKEN (NOT PAT_TOKEN). The canary uses
a `users` allowlist (just my login) instead of `teams`, so no team-
membership API calls are made and `read:org` is not required. This
deliberately decouples the safety test from the PAT_TOKEN scope
question, which is a separate prerequisite for the production rollout.

Test plan (each scenario observed live on the PR):

  1. Open this PR (no label) -> deny, downstream skips, no API calls.
  2. workflow_dispatch -> trusted event -> downstream-job runs.
  3. Apply `verified` as Proletter -> users-allowlist hit -> authorise.
  4. Push commit while still labeled (synchronize) -> still authorise.
  5. Remove label, push commit -> deny via the unlabeled-bypass guard
     from QVAC-18608 commit 8998aa5; downstream skips.

This PR is intended to be CLOSED WITHOUT MERGING once the safety test
completes. The original vulkaninfo workflow is preserved verbatim in
the comment block at the bottom of the file for trivial restoration.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233099
Co-authored-by: Cursor <cursoragent@cursor.com>
Trivial newline change to fire the pull_request synchronize event so
the label-gate composite action is exercised on a labeled PR.
@Proletter Proletter force-pushed the feat/QVAC-18612-label-gate-safety-test-vulkaninfo branch from b2a4020 to dc23e64 May 12, 2026 09:05
Pins actions/checkout to the repository default branch so the
label-gate action code is always loaded from the trusted base, not
from the PR's merge commit. Mirrors the same fix landed on the
fan-out PR (#1997) -- see that commit message for the full threat
model.

Tanstack-class bypass for `pull_request` triggers: same-repo
branch PRs whose checkout would otherwise pull a tampered
`gate.mjs` from the PR's tree and short-circuit authorisation.

Also switches the canary to sparse-checkout (action only), matching
the production fan-out shape.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Proletter Proletter closed this May 13, 2026
Proletter added a commit that referenced this pull request May 13, 2026
Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit 919850c). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR #1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR #1971 / #1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR #1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue
    counts before and after the migration: 1832 shellcheck (pre-existing),
    9 expression (pre-existing), 5 action (down from 7 pre-existing)

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>
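
The caller-cap rule that motivated the re-architecture can be sketched as a comparison of permission levels. This is an illustrative assumption-laden sketch, not the repository's audit script: `callerCapViolations` and the sample permission maps are hypothetical, and it assumes the caller declares an explicit permissions block so that unlisted scopes count as `none`.

```javascript
// Hypothetical sketch of the caller-cap rule; not the repository's audit script.
const LEVEL = { none: 0, read: 1, write: 2 };

// Returns the scopes where a callee job requests more than the caller grants.
// Assumption: scopes the caller does not list are treated as 'none'.
function callerCapViolations(callerGrants, calleeRequests) {
  return Object.entries(calleeRequests)
    .filter(([scope, wanted]) => LEVEL[wanted] > LEVEL[callerGrants[scope] ?? 'none'])
    .map(([scope, wanted]) => ({ scope, wanted, granted: callerGrants[scope] ?? 'none' }));
}

// Caller scopes the reusable call to read-only pull-requests access.
const callerGrants = { contents: 'read', 'pull-requests': 'read' };

// PR #1997 shape: the gate job lives inside the callee and needs write access.
console.log(callerCapViolations(callerGrants, { contents: 'read', 'pull-requests': 'write' }));

// Re-land shape: the gate job lives in the caller, so the callee asks for nothing new.
console.log(callerCapViolations(callerGrants, { contents: 'read' }));
```

Moving the gate job to the caller keeps `pull-requests: write` on the caller's side of the boundary, so the callee's requests stay within what it is granted.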
Proletter added a commit that referenced this pull request May 14, 2026
…(re-land) (#2023)

* QVAC-18612 infra: gate every secret-bearing workflow with label-gate

Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit 919850c). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR #1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR #1971 / #1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR #1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue
    counts before and after the migration: 1832 shellcheck (pre-existing),
    9 expression (pre-existing), 5 action (down from 7 pre-existing)

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>

* QVAC-18612 infra: gate on-pr-close-* workflows with label-gate

Closes a release-env exposure surfaced when auditing #2023:
public-delete-npm-versions.yml (environment: release, packages: write)
is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had
label-gate. The other 10 fire on `pull_request: types: [closed]` and
reach the release env without authorisation.

This is currently held back only by the manual approval on the release
environment. Once that approval is dropped (the goal of QVAC-18612), the
label-gate becomes the sole control. This commit makes label-gate that
control everywhere.

Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this
branch): inline label-gate job (caller side) + needs/if on the
delete-npm-versions-trigger reusable call. Reusable callee
(public-delete-npm-versions.yml) is unchanged.

on-pr-close-translation-nmtcpp.yml deliberately not modified - it has
only workflow_dispatch (no pull_request trigger) and is intrinsically
gated by repo-write access.
Proletter added a commit that referenced this pull request May 24, 2026
…(re-land) (#2023)

* QVAC-18612 infra: gate every secret-bearing workflow with label-gate

Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit c9b6856). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR #1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR #1971 / #1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR #1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue
    counts before and after the migration: 1832 shellcheck (pre-existing),
    9 expression (pre-existing), 5 action (down from 7 pre-existing)

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>

* QVAC-18612 infra: gate on-pr-close-* workflows with label-gate

Closes a release-env exposure surfaced when auditing #2023:
public-delete-npm-versions.yml (environment: release, packages: write)
is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had
label-gate. The other 10 fire on `pull_request: types: [closed]` and
reach the release env without authorisation.

This is currently held back only by the manual approval on the release
environment. Once that approval is dropped (the goal of QVAC-18612), the
label-gate becomes the sole control. This commit makes label-gate that
control everywhere.

Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this
branch): inline label-gate job (caller side) + needs/if on the
delete-npm-versions-trigger reusable call. Reusable callee
(public-delete-npm-versions.yml) is unchanged.

on-pr-close-translation-nmtcpp.yml deliberately not modified - it has
only workflow_dispatch (no pull_request trigger) and is intrinsically
gated by repo-write access.
3 participants