QVAC-18608 fix(label-gate): preserve hyphens in input env-var names #1973

Merged
Proletter merged 3 commits into main from fix/QVAC-18608-label-gate-input-name-resolution
May 11, 2026

@Proletter

Collaborator

🎯 What problem does this PR solve?

The QVAC-18612 canary (PR #1971, run id 25672483584) hard-failed with "required input 'github-token' is missing" even though the workflow passed `github-token: ${{ secrets.GITHUB_TOKEN }}`.

📝 How does it solve it?

  • Root cause: getInput() in src/index.mjs was doing name.toUpperCase().replace(/-/g, '_'), looking up INPUT_GITHUB_TOKEN. The GitHub Actions runner — and @actions/core — only replace spaces with underscores, not hyphens. The runner sets INPUT_GITHUB-TOKEN (hyphen preserved, technically non-POSIX but Node exposes it via process.env regardless). My implementation never found the value and threw the misleading "missing input" error.

  • Why local smoke tests didn't catch it: my pre-merge smoke set INPUT_GITHUB_TOKEN=... (matching the buggy lookup). Both sides were wrong in the same direction, so the smoke "passed". Real-CI exposes this immediately — exactly the failure the canary was designed to find.

  • Fix: getInput() now uses name.replace(/ /g, '_').toUpperCase() — matching the runner / @actions/core convention exactly.

  • Testability: getInput() is now exported (with an injectable env arg) so the env-var-name resolution can be unit-tested. Top-level main() is gated on import.meta.url === argv[1] so importing index.mjs from tests no longer triggers a real run.
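
To make the resolution rule above concrete, here is a minimal sketch of a `getInput()` that follows the runner convention (spaces become underscores, hyphens are preserved, the name is uppercased). It is not the code from src/index.mjs: the options-object signature, the `required` flag, and the exact error text are assumptions for illustration, and the real module would additionally `export` the function.

```javascript
// Illustrative sketch only; the real src/index.mjs may differ.
// Assumed signature: getInput(name, { required, env }).
function getInput(name, { required = false, env = process.env } = {}) {
  // Runner convention: only spaces are replaced; hyphens stay as-is.
  const key = `INPUT_${name.replace(/ /g, '_').toUpperCase()}`;
  const value = (env[key] ?? '').trim();
  if (required && value === '') {
    throw new Error(`required input '${name}' is missing`);
  }
  return value;
}

// The injectable env makes the contract checkable without touching process.env.
const runnerEnv = { 'INPUT_GITHUB-TOKEN': ' fake-token ' };
const buggyEnv = { INPUT_GITHUB_TOKEN: 'fake-token' };

console.log(getInput('github-token', { env: runnerEnv })); // resolves, trimmed
console.log(getInput('github-token', { env: buggyEnv }) === ''); // old name is ignored
```

Passing `env` explicitly is what lets a unit test pin both directions of the contract: the hyphenated name resolves, and the underscore-substituted name does not.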

🧪 How was it tested?

  • 9 new regression tests in test/index.test.mjs pin the env-var resolution against the runner contract:
    • INPUT_GITHUB-TOKEN (hyphen preserved) → resolves correctly
    • INPUT_GITHUB_TOKEN (hyphen-replaced, the old bug) → does not resolve (locks the contract against any accidental "helpful" reintroduction of the substitution)
    • Spaces still replaced with underscores (per @actions/core)
    • Trim, missing-required, defaults-to-process.env paths
  • Full suite: node --test .github/actions/label-gate/test/*.test.mjs → 53/53 pass (was 44 before, +9 new).
  • End-to-end smoke against the runner-correct env-var name:
    env "INPUT_GITHUB-TOKEN=fake-token" INPUT_LABEL=verified INPUT_USERS=Proletter \
        GITHUB_EVENT_NAME=pull_request GITHUB_EVENT_PATH=... \
        GITHUB_OUTPUT=... GITHUB_REPOSITORY=tetherto/qvac \
        node .github/actions/label-gate/src/index.mjs
    
    → exit 0, authorised=false, notice "'verified' label is not currently applied to PR #1971".
  • Live verification: once this merges I'll trigger a fresh CI run on #1971 (QVAC-18612 infra: repurpose vulkaninfo as label-gate safety canary) and confirm scenario 1 of its test plan now produces authorised=false (gate-job green, downstream skipped) instead of a red gate-job.

The QVAC-18612 canary (PR #1971, run id 25672483584) hard-failed with
"required input 'github-token' is missing" even though the workflow
clearly passed `github-token: ${{ secrets.GITHUB_TOKEN }}`.

Root cause: `getInput` in src/index.mjs was uppercasing the input
name AND replacing hyphens with underscores, looking up
`INPUT_GITHUB_TOKEN`. The GitHub Actions runner (and @actions/core)
preserve hyphens — only spaces are replaced — so the runner sets
`INPUT_GITHUB-TOKEN`. The action never found the token and threw a
missing-input error.

The local smoke test that "passed" before merge set
`INPUT_GITHUB_TOKEN=...` (matching the buggy lookup) so both sides
were wrong in the same direction. This is exactly the failure mode
the canary was meant to surface; without it, the gate would have
failed across all 75 secret-bearing workflows on first PR after the
QVAC-18612 fan-out.

Fix:
  - getInput now uses `name.replace(/ /g, '_').toUpperCase()` —
    matching the runner / @actions/core convention exactly.
  - getInput is exported from src/index.mjs (with an injectable env
    arg) so the convention can be unit-tested.
  - Top-level main() is gated on `import.meta.url === argv[1]` so
    importing index.mjs from tests no longer triggers a real run.

Tests:
  - 9 new tests in test/index.test.mjs pin the env-var-name resolution:
      * INPUT_GITHUB-TOKEN (hyphen preserved) -> resolves
      * INPUT_GITHUB_TOKEN (hyphen replaced) -> does NOT resolve
        (locks the contract against accidental "helpful" rewrite)
      * spaces are still replaced with underscores
      * trim, missing-required, defaults-to-process.env
  - Total: 53/53 pass via `node --test`.
  - End-to-end smoke against the runner-correct env-var name
    (INPUT_GITHUB-TOKEN=...) confirms exit 0 and authorised=false
    on the no-label deny path.

Refs: https://app.asana.com/1/45238840754660/project/1214153063536860/task/1214612672233087
Related: #1971
Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 11, 2026

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (3/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@Proletter

Collaborator Author

/review

Proletter merged commit 39e61d0 into main on May 11, 2026
7 checks passed
Proletter deleted the fix/QVAC-18608-label-gate-input-name-resolution branch on May 11, 2026 14:44
Proletter added a commit that referenced this pull request May 13, 2026
Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit 919850c). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.

Architecture: caller-gates-callee
  - Reusable workflows (workflow_call invokees) are NOT modified. PR #1997
    embedded a label-gate job inside each reusable callee with
    `pull-requests: write`, which violates the caller-cap rule for any
    caller that scopes the call to `pull-requests: read|none`. GitHub
    enforces this at parse time; the affected workflow files won't even
    load.
  - Callers get a label-gate job at the top of `jobs:` with
    `pull-requests: write` (which never crosses a caller-cap boundary).
    Each `uses:` invocation that targets a secret-bearing reusable, plus
    every standalone secret-bearing job in the same workflow, gains
    `needs: [..., label-gate]` and an `if:` prepended with
    `needs.label-gate.outputs.authorised == 'true'`.
  - When the gate denies on a `uses:` job, the entire reusable invocation
    is skipped — the callee runner never starts, no secrets are exposed,
    and no caller-cap validation can fire because the workflow_call
    payload is never sent.
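
The caller-side change described above is mechanical, so it can be sketched as a small transformation over a parsed job object. This is a hypothetical illustration, not the tooling used for the migration: `gateJob`, the plain-object job shape, and the assumption that `if:` holds a bare expression (no `${{ }}` wrapper) are all invented for the example.

```javascript
// Hypothetical sketch of the caller-side rewrite; not the actual migration tooling.
const GATE_JOB = 'label-gate';
const GATE_EXPR = "needs.label-gate.outputs.authorised == 'true'";

function gateJob(job) {
  // `needs` may be absent, a single string, or a list in workflow YAML.
  const existingNeeds = job.needs === undefined ? [] : [].concat(job.needs);
  const needs = existingNeeds.includes(GATE_JOB)
    ? existingNeeds
    : [...existingNeeds, GATE_JOB];

  // Prepend the gate to any existing condition; leave already-gated jobs alone.
  let condition = GATE_EXPR;
  if (job.if && job.if.includes(GATE_EXPR)) {
    condition = job.if;
  } else if (job.if) {
    condition = `${GATE_EXPR} && (${job.if})`;
  }
  return { ...job, needs, if: condition };
}

// Example: a `uses:` invocation of a reusable workflow (path is a placeholder).
console.log(gateJob({
  uses: './.github/workflows/example-reusable.yml',
  needs: 'build',
  if: 'github.event.pull_request.draft == false',
}));
```

Because the gate lives on the calling job, a denied gate skips the whole `uses:` invocation and the callee's own permissions block is never evaluated.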

The label-gate action checks out from the default branch via sparse
checkout, which is the same Tanstack-class supply-chain mitigation
landed in the canary fix on PR #1971 / #1973.

Workflow-by-workflow stats:
  - 59 caller workflows migrated (label-gate + needs/if updates)
  - 56 reusable callees, exempt workflows, and no-secret workflows
    intentionally left UNCHANGED on disk
  - Pre-existing `authorize-pr` peer jobs preserved (belt-and-suspenders;
    removal is a follow-up after a soak period)
  - approval-worker.yml and approval-check-worker.yml exempt (gating them
    creates a deadlock; we explicitly do not touch them)

Pre-flight verification before push:
  - `python3 .github/scripts/audit-workflow-permissions.py` -> 0 hard
    violations across 162 caller-callee edges (vs. 21 hard violations
    after the naive PR #1997-style migration; the audit was added in the
    previous commit precisely to catch this regression class)
  - `actionlint .github/workflows/*.{yml,yaml}` reports identical issue
    counts before and after the migration: 1832 shellcheck (pre-existing),
    9 expression (pre-existing), 5 action (down from 7 pre-existing)
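
The kind of check the permissions audit performs on each caller-callee edge can be sketched as follows. This is not the repository's audit-workflow-permissions.py (which is a Python script whose internals are not shown here); it is a hypothetical JavaScript illustration of the caller-cap rule, assuming both sides declare explicit permission maps and that a scope missing from the caller's map counts as `none`.

```javascript
// Hypothetical illustration of the caller-cap rule; not the repo's audit script.
const LEVELS = { none: 0, read: 1, write: 2 };

// Returns every scope where the callee asks for more than the caller grants.
function findCapViolations(callerPerms, calleePerms) {
  const violations = [];
  for (const [scope, requested] of Object.entries(calleePerms)) {
    const granted = callerPerms[scope] ?? 'none';
    if (LEVELS[requested] > LEVELS[granted]) {
      violations.push({ scope, requested, granted });
    }
  }
  return violations;
}

// PR #1997-style edge: gate embedded in the callee, caller scoped to read.
console.log(findCapViolations(
  { contents: 'read', 'pull-requests': 'read' },
  { contents: 'read', 'pull-requests': 'write' },
));

// Caller-gates-callee edge: the callee no longer asks for write.
console.log(findCapViolations(
  { contents: 'read', 'pull-requests': 'read' },
  { contents: 'read' },
));
```

The first call reports one violation on `pull-requests`; the second reports none, which is the state the re-land aims for on every edge.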

End-to-end validated in the qvac-internal sandbox with real org teams:
  - tetherto/qvac-internal#12 (caller-gates-callee + standalone gating
    against the actual qvac-internal-{dev,merge,release} teams)
  - Olutest/qvac-tests (public mirror; same harness, single-user
    allowlist)
  - Validation matrix: 9/9 scenarios pass, including the strip-on-
    non-trusted-apply case

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Proletter <40578159+Proletter@users.noreply.github.com>
Proletter added a commit that referenced this pull request May 14, 2026
…(re-land) (#2023)

* QVAC-18612 infra: gate every secret-bearing workflow with label-gate

* QVAC-18612 infra: gate on-pr-close-* workflows with label-gate

Closes a release-env exposure surfaced when auditing #2023:
public-delete-npm-versions.yml (environment: release, packages: write)
is invoked by 12 on-pr-close-* workflows, but only embed-llamacpp had
label-gate. Of the other 11, 10 fire on `pull_request: types: [closed]`
and reach the release env without authorisation.

This is currently held back only by the manual approval on the release
environment. Once that approval is dropped (the goal of QVAC-18612), the
label-gate becomes the sole control. This commit makes label-gate that
control everywhere.

Pattern is identical to on-pr-close-embed-llamacpp.yml (already on this
branch): inline label-gate job (caller side) + needs/if on the
delete-npm-versions-trigger reusable call. Reusable callee
(public-delete-npm-versions.yml) is unchanged.

on-pr-close-translation-nmtcpp.yml deliberately not modified - it has
only workflow_dispatch (no pull_request trigger) and is intrinsically
gated by repo-write access.
Proletter added a commit that referenced this pull request May 24, 2026
…1973)

Proletter added a commit that referenced this pull request May 24, 2026
…(re-land) (#2023)

* QVAC-18612 infra: gate every secret-bearing workflow with label-gate

Re-land of the label-gate fan-out after PR #1997 was reverted on
2026-05-13 (commit c9b6856). Re-architected to fix the caller-cap
permissions violation that broke 30+ on-pr-* workflows the moment a
verified label was applied.


5 participants