ci: route HF_TOKEN-using jobs through approved-workflow environment#1473

Merged
adobrzyn merged 1 commit into vllm-project:main from adobrzyn:feat/protected-env-for-hf-token
May 21, 2026

@adobrzyn
Collaborator

Summary

Adds environment: approved-workflow to every job that consumes secrets.HF_TOKEN across the three CI workflows. Together with the existing approval gate in pre-merge-trigger.yaml (environment: pre-merge-approval, added in #1471), this completes the two-layer protection model:

PR opened
  -> pre-merge-trigger `gate` job: pauses for required reviewer (approval #1)
  -> on approval, pre-merge.yaml is dispatched
  -> downstream secret-using jobs resolve HF_TOKEN from the
     `approved-workflow` environment (no second per-job approval)
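
The two layers correspond to two `environment:` keys on different jobs. The following is a minimal sketch under simplifying assumptions, not the real workflow files: the `ubuntu-latest` runner label and the placeholder `echo` steps are illustrative, and the mechanism the real `gate` job uses to dispatch pre-merge.yaml is not shown.

```yaml
# Layer 1 (pre-merge-trigger.yaml): simplified sketch, not the real file.
jobs:
  gate:
    runs-on: ubuntu-latest          # assumed runner label
    # Required reviewers are configured on this environment in repo settings,
    # so the job waits here until a maintainer approves (approval #1).
    environment: pre-merge-approval
    steps:
      - run: echo "approved; pre-merge.yaml is dispatched from here"  # placeholder
---
# Layer 2 (pre-merge.yaml): simplified sketch, not the real file.
jobs:
  e2e:
    runs-on: ubuntu-latest          # assumed; the real job runs on a discovered runner
    # No required reviewers on this environment, so there is no second prompt;
    # it only controls which refs may receive HF_TOKEN.
    environment: approved-workflow
    steps:
      - env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: echo "e2e tests run here"  # placeholder
```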

Why

With HF_TOKEN previously at repo-secret scope, any matrix entry of any e2e/test job had direct access the moment CI started. The recent malicious fork PR exfiltrated it via an auto-discovered run_* function. After this change, the token is only released from a GitHub Environment that a maintainer-controlled deployment-branch rule restricts to main / releases/**, and only after the upstream gate has approved the dispatch.

We deliberately add the environment only on jobs that actually use the secret (15 jobs). Helper jobs (gatekeeper, discover_*, retrieve_*, pre-commit, post-comment, cleanup_*, build_nixl_dockerfile, check_dockerfile_changes, prepare-release-branch, summarize_and_notify, setup_and_build, store_last_stable_vllm_commit) do not touch HF_TOKEN and are not modified, to avoid pointless extra gate evaluations.

Affected jobs (15)

  • pre-merge.yaml: hpu_unit_tests, hpu_pd_tests, hpu_perf_tests, hpu_dp_tests, e2e, calibration_tests
  • hourly-ci.yaml: run_unit_tests, e2e, run_data_parallel_test, run_pd_disaggregate_test
  • create-release-branch.yaml: run_unit_tests, e2e, run_data_parallel_test, run_pd_disaggregate_test, run_hpu_perf_tests

Diff

+15 lines, 0 deletions. Each touched job gets exactly one new line: environment: approved-workflow, inserted immediately after runs-on:.
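
For one job, the change is a single added line. The context below comes from the pre-merge.yaml excerpt in the first review comment on this page (lines 361 to 365). The job key is omitted because that excerpt does not show it.

```yaml
# Context from the pre-merge.yaml review excerpt; the job key is not shown there.
needs: [pre_merge_hpu_test_build, discover_runner, retrieve_head_sha]
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow   # the one line this PR adds to each HF_TOKEN job
timeout-minutes: 720
```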

Required repo configuration (before this PR can be merged safely)

  1. Settings → Environments → create environment approved-workflow.
  2. Add HF_TOKEN as an environment secret (the rotated value).
  3. No required reviewers on this environment (the upstream pre-merge-approval gate already enforces approval; adding reviewers here would prompt once per job).
  4. Deployment branches and tags: Selected branches → main, releases/**. Prevents a fork PR from claiming the environment from a non-trusted ref.
  5. Delete HF_TOKEN from repository-level secrets so the environment value is the only source.
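
Steps 2 and 5 depend on how GitHub resolves secrets. A job that declares an environment sees that environment's secrets, and these take precedence over a repository secret with the same name. A job that declares no environment sees only repository (and organization) secrets, and an unset secret evaluates to an empty string. The hypothetical check job below makes step 5 observable. It assumes there is no organization-level HF_TOKEN, and its job name and `ubuntu-latest` runner are illustrative.

```yaml
# Hypothetical helper-style job: it declares no environment, so it can only
# see repository/organization secrets. After step 5 deletes the repo-level
# HF_TOKEN, the expression below expands to an empty string.
jobs:
  check_hf_token_scope:               # hypothetical job name
    runs-on: ubuntu-latest            # assumed runner label
    steps:
      - name: Fail if HF_TOKEN is still visible outside the environment
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          if [ -n "$HF_TOKEN" ]; then
            echo "HF_TOKEN is still set at repository or organization scope" >&2
            exit 1
          fi
          echo "HF_TOKEN is only available via the approved-workflow environment"
```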

Testing

Validated end-to-end on bmyrcha/vllm-gaudi first, using a benign fork PR. With the two environments configured as above, the gate paused as expected, jobs received the secret after approval without a second prompt, and a deliberately mis-authored downstream PR could not reach the secret.

Close-cross-ref: builds on #1471.

Adds 'environment: approved-workflow' to every job that consumes
secrets.HF_TOKEN across pre-merge.yaml, hourly-ci.yaml, and
create-release-branch.yaml.

Together with the existing approval gate in pre-merge-trigger.yaml
(environment: pre-merge-approval), this completes the two-layer model:

  PR opened
    -> pre-merge-trigger 'gate' job: pauses for required reviewer
    -> on approval, pre-merge.yaml is dispatched
    -> downstream secret-using jobs resolve HF_TOKEN from the
       'approved-workflow' environment (no second per-job approval)

Affected jobs (15):
  pre-merge.yaml:           hpu_unit_tests, hpu_pd_tests, hpu_perf_tests,
                            hpu_dp_tests, e2e, calibration_tests
  hourly-ci.yaml:           run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test
  create-release-branch:    run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test, run_hpu_perf_tests

Requires (repo settings, separate from this PR):
  - GitHub Environment 'approved-workflow' must exist with HF_TOKEN
    secret stored in it.
  - Environment should have NO required reviewers (gating happens
    upstream); deployment-branch rule limited to main + releases/**
    is recommended.
  - HF_TOKEN must be removed from repository-level secrets so the
    environment value is the only source.

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Contributor

Copilot AI left a comment


Pull request overview

Routes all GitHub Actions jobs that consume secrets.HF_TOKEN through the approved-workflow GitHub Environment, so the token can be scoped to an environment protected by deployment-branch rules and released only after the upstream approval gate dispatches CI.

Changes:

  • Added environment: approved-workflow to each HF_TOKEN-using job in pre-merge.yaml.
  • Added environment: approved-workflow to each HF_TOKEN-using job in hourly-ci.yaml.
  • Added environment: approved-workflow to each HF_TOKEN-using job in create-release-branch.yaml.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • .github/workflows/pre-merge.yaml: adds the approved-workflow environment to all pre-merge jobs that pass secrets.HF_TOKEN into containers.
  • .github/workflows/hourly-ci.yaml: adds the approved-workflow environment to hourly CI jobs that pass secrets.HF_TOKEN.
  • .github/workflows/create-release-branch.yaml: adds the approved-workflow environment to release-branch workflow jobs that pass secrets.HF_TOKEN.

Comment on lines 361 to 365
needs: [pre_merge_hpu_test_build, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
Comment on lines 100 to +104
# <-- UPDATED: Now needs 'setup_and_build' AND 'discover_runner'
needs: [setup_and_build, discover_runner]
# <-- UPDATED: Runs on the specific runner
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
Comment on lines 164 to 168
needs: [prepare-release-branch, setup_and_build, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
adobrzyn added a commit to adobrzyn/vllm-gaudi-bmyrcha that referenced this pull request May 21, 2026
Adds 'step-security/harden-runner@v2.19.3' (SHA-pinned) as the first
step of every CI job that consumes secrets.HF_TOKEN, configured with
'egress-policy: block' and a curated allow-list of endpoints that the
current build + test pipeline actually needs.

Allow-list (derived from reading .github/Dockerfile.ci, the workflow
files, and tests/full_tests/ci_e2e_discoverable_tests.sh):

  GitHub Actions infrastructure:
    api.github.com, github.com, codeload.github.com,
    objects.githubusercontent.com, raw.githubusercontent.com,
    release-assets.githubusercontent.com,
    *.actions.githubusercontent.com,
    results-receiver.actions.githubusercontent.com,
    ghcr.io, pkg-containers.githubusercontent.com,
    *.blob.core.windows.net  (cache / artifacts)

  Docker base image (build phase):
    vault.habana.ai  (Habana Gaudi base)

  Python packages (build + test phase):
    pypi.org, files.pythonhosted.org,
    download.pytorch.org  (torchaudio CPU wheel)

  Model weights (test phase):
    huggingface.co, cdn-lfs.huggingface.co, cdn-lfs.hf.co,
    cdn-lfs-us-1.hf.co, cas-bridge.xethub.hf.co, xet-lfs-us-1.hf.co

Because every test container is launched with '--network=host', the
host-level eBPF filter installed by harden-runner sees and enforces
policy on the container's traffic, so no per-container instrumentation
is needed.

This is defense-in-depth, layered on top of:
  - pre-merge-trigger approval gate (vllm-project#1471)
  - approved-workflow environment for HF_TOKEN (vllm-project#1473)

Together these three changes mean a planted payload in a PR cannot:
  1. run at all without maintainer approval        (vllm-project#1471)
  2. receive HF_TOKEN without environment approval (vllm-project#1473)
  3. exfiltrate to an attacker-controlled host     (this PR)

If anything legitimate gets blocked, the harden-runner check run
will identify the host that was denied; we add it to the allow-list
in a follow-up.

Affected jobs (15 - same set as vllm-project#1473):
  pre-merge.yaml:           hpu_unit_tests, hpu_pd_tests, hpu_perf_tests,
                            hpu_dp_tests, e2e, calibration_tests
  hourly-ci.yaml:           run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test
  create-release-branch:    run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test, run_hpu_perf_tests

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
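
The harden-runner setup described in that commit might look roughly like this. It is a sketch, not the committed file. The commit pins v2.19.3 by full commit SHA, which is not reproduced here, so the tag below is only illustrative. The `:443` ports, the `IMAGE` value, and `./run_tests.sh` are assumptions.

```yaml
steps:
  # First step of the job, so the egress filter is active before anything else runs.
  - name: Harden runner
    uses: step-security/harden-runner@v2   # the real commit pins v2.19.3 by SHA
    with:
      egress-policy: block
      # Hosts from the commit's allow-list; port 443 is assumed for all of them.
      allowed-endpoints: >
        api.github.com:443
        github.com:443
        codeload.github.com:443
        objects.githubusercontent.com:443
        raw.githubusercontent.com:443
        release-assets.githubusercontent.com:443
        *.actions.githubusercontent.com:443
        results-receiver.actions.githubusercontent.com:443
        ghcr.io:443
        pkg-containers.githubusercontent.com:443
        *.blob.core.windows.net:443
        vault.habana.ai:443
        pypi.org:443
        files.pythonhosted.org:443
        download.pytorch.org:443
        huggingface.co:443
        cdn-lfs.huggingface.co:443
        cdn-lfs.hf.co:443
        cdn-lfs-us-1.hf.co:443
        cas-bridge.xethub.hf.co:443
        xet-lfs-us-1.hf.co:443

  - name: Run tests in container         # hypothetical step
    env:
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      IMAGE: my-ci-image:latest          # hypothetical image tag
    run: |
      # --network=host shares the runner's network namespace, so the
      # host-level egress filter also sees the container's traffic.
      # "-e HF_TOKEN" without a value forwards the variable from this step's env.
      docker run --rm --network=host -e HF_TOKEN "$IMAGE" ./run_tests.sh
```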
@adobrzyn adobrzyn merged commit ca2d952 into vllm-project:main May 21, 2026
1 of 2 checks passed
12010486 pushed a commit to 12010486/vllm-gaudi that referenced this pull request May 21, 2026
…llm-project#1473)

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: 12010486 <silvia.colabrese@intel.com>
mgawarkiewicz-intel pushed a commit that referenced this pull request May 25, 2026
#1491)

…#1473)

(cherry picked from commit ca2d952)

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>