ci: enforce egress allow-list on jobs that consume HF_TOKEN #1475

Closed
adobrzyn wants to merge 1 commit into vllm-project:main from adobrzyn:feat/harden-runner-egress-block

adobrzyn (Collaborator) commented May 21, 2026

Summary

Adds step-security/harden-runner@v2.19.3 (SHA-pinned) as the first step of every CI job that consumes secrets.HF_TOKEN, configured with egress-policy: block and a curated allow-list of endpoints that the current build + test pipeline actually needs.

Reopen of #1474 on a clearer branch name (feat/harden-runner-egress-block). No content change.

Why

Even with #1471 (pre-merge approval gate) and #1473 (approved-workflow environment for HF_TOKEN), a planted payload that activates inside a trusted job today has unrestricted egress.

Layered together, these three PRs mean a planted payload in a PR cannot:

  1. Run at all without maintainer approval — Add pre-merge-approval for execute_pre_merge #1471 (merged)
  2. Receive HF_TOKEN without environment approval — ci: route HF_TOKEN-using jobs through approved-workflow environment #1473
  3. Exfiltrate to an attacker-controlled host — this PR

Allow-list (derived from code, not collected from runs)

Walked through .github/Dockerfile.ci, the three workflow YAMLs, and tests/full_tests/ci_e2e_discoverable_tests.sh:

| Purpose | Endpoints |
| --- | --- |
| GitHub Actions infra | api.github.com, github.com, codeload.github.com, objects.githubusercontent.com, raw.githubusercontent.com, release-assets.githubusercontent.com, *.actions.githubusercontent.com, results-receiver.actions.githubusercontent.com, ghcr.io, pkg-containers.githubusercontent.com, *.blob.core.windows.net |
| Docker base image (build phase) | vault.habana.ai |
| Python packages (build + test) | pypi.org, files.pythonhosted.org, download.pytorch.org |
| Model weights (test phase) | huggingface.co, cdn-lfs.huggingface.co, cdn-lfs.hf.co, cdn-lfs-us-1.hf.co, cas-bridge.xethub.hf.co, xet-lfs-us-1.hf.co |

If something legitimate gets blocked, the harden-runner check-run identifies the denied host and we add it in a follow-up one-liner.
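
When a host is reported as denied, it can help to check it against the list before opening the follow-up. The sketch below is only an approximation: it assumes that `*` in an entry behaves like a shell-style glob over the leading labels, which may not be exactly how harden-runner matches. `ALLOWED_HOSTS` and `is_allowed` are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase

# Hosts copied from the allow-list table in this PR (every entry uses port 443).
ALLOWED_HOSTS = [
    "api.github.com", "github.com", "codeload.github.com",
    "objects.githubusercontent.com", "raw.githubusercontent.com",
    "release-assets.githubusercontent.com", "*.actions.githubusercontent.com",
    "results-receiver.actions.githubusercontent.com", "ghcr.io",
    "pkg-containers.githubusercontent.com", "*.blob.core.windows.net",
    "vault.habana.ai",
    "pypi.org", "files.pythonhosted.org", "download.pytorch.org",
    "huggingface.co", "cdn-lfs.huggingface.co", "cdn-lfs.hf.co",
    "cdn-lfs-us-1.hf.co", "cas-bridge.xethub.hf.co", "xet-lfs-us-1.hf.co",
]


def is_allowed(host, allowed=ALLOWED_HOSTS):
    """Return True if `host` matches an allow-list entry.

    Assumption: wildcard entries are treated as shell-style globs. This is an
    approximation and not necessarily harden-runner's own matching logic.
    """
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    return any(fnmatchcase(host, pattern) for pattern in allowed)
```

For example, `is_allowed("files.pythonhosted.org")` is true and `is_allowed("attacker.example")` is false. Under the same glob assumption, any storage account name under `blob.core.windows.net` also matches the wildcard entry, so that entry is much broader than the exact-host entries.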

How it covers the docker containers

Every test container is launched with --network=host, so the eBPF filter installed by harden-runner on the runner host sees and enforces on the container's outbound traffic — no per-container instrumentation needed.
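
Because this coverage depends on every container sharing the host network namespace, a small lint over the test scripts could flag any `docker run` invocation that omits the flag. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the scan is purely textual, and it only recognises the `--network=host`, `--network host`, and `--net=host` spellings.

```python
import re


def docker_runs_missing_host_network(script_text):
    """Return each `docker run` command that lacks a host-network flag."""
    # Join backslash-continued lines so each command is a single string.
    joined = re.sub(r"\\\n\s*", " ", script_text)
    missing = []
    for line in joined.splitlines():
        command = line.strip()
        # Skip shell comments and lines that do not launch a container.
        if command.startswith("#") or not re.search(r"\bdocker\s+run\b", command):
            continue
        if not re.search(r"--(?:network|net)[= ]host\b", command):
            missing.append(command)
    return missing
```

Running it over the contents of a script such as tests/full_tests/ci_e2e_discoverable_tests.sh should return an empty list if the statement above holds for that file; any command it returns would be a container whose traffic may not pass through the host-level filter.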

Affected jobs (15 — same set as #1473)

| Workflow | Jobs |
| --- | --- |
| pre-merge.yaml | hpu_unit_tests, hpu_pd_tests, hpu_perf_tests, hpu_dp_tests, e2e, calibration_tests |
| hourly-ci.yaml | run_unit_tests, e2e, run_data_parallel_test, run_pd_disaggregate_test |
| create-release-branch.yaml | run_unit_tests, e2e, run_data_parallel_test, run_pd_disaggregate_test, run_hpu_perf_tests |
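
To keep this set from drifting as jobs are added, one could check that every job referencing `secrets.HF_TOKEN` starts with the harden-runner step. The sketch below assumes the workflow has already been parsed into a plain dictionary (for example with PyYAML's `yaml.safe_load`, which is an assumption and not part of this PR). The function name and the sample steps are hypothetical; only the job names and the pinned action reference come from this PR.

```python
HARDEN_PREFIX = "step-security/harden-runner@"


def jobs_missing_harden_runner(workflow):
    """List jobs that reference secrets.HF_TOKEN but do not start with harden-runner."""
    missing = []
    for job_name, job in workflow.get("jobs", {}).items():
        # A textual search over the whole job catches the secret in env, with, or run.
        if "secrets.HF_TOKEN" not in repr(job):
            continue
        steps = job.get("steps") or []
        first_uses = steps[0].get("uses", "") if steps else ""
        if not first_uses.startswith(HARDEN_PREFIX):
            missing.append(job_name)
    return missing


# Hypothetical, heavily trimmed workflow used only to exercise the check.
SAMPLE_WORKFLOW = {
    "jobs": {
        "hpu_unit_tests": {
            "steps": [
                {"uses": HARDEN_PREFIX + "ab7a9404c0f3da075243ca237b5fac12c98deaa5"},
                {"run": "pytest", "env": {"HF_TOKEN": "${{ secrets.HF_TOKEN }}"}},
            ]
        },
        "e2e": {
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"run": "pytest", "env": {"HF_TOKEN": "${{ secrets.HF_TOKEN }}"}},
            ]
        },
        "lint": {"steps": [{"run": "echo ok"}]},
    }
}
```

In this sample, `e2e` consumes the token but runs another action first, so it is the only job reported; `lint` is ignored because it never references the secret.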

Snippet inserted (identical in every job)

      - name: Harden runner (egress block)
        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
        with:
          egress-policy: block
          disable-sudo: false
          allowed-endpoints: >
            api.github.com:443
            github.com:443
            codeload.github.com:443
            objects.githubusercontent.com:443
            raw.githubusercontent.com:443
            release-assets.githubusercontent.com:443
            *.actions.githubusercontent.com:443
            results-receiver.actions.githubusercontent.com:443
            ghcr.io:443
            pkg-containers.githubusercontent.com:443
            *.blob.core.windows.net:443
            vault.habana.ai:443
            pypi.org:443
            files.pythonhosted.org:443
            download.pytorch.org:443
            huggingface.co:443
            cdn-lfs.huggingface.co:443
            cdn-lfs.hf.co:443
            cdn-lfs-us-1.hf.co:443
            cas-bridge.xethub.hf.co:443
            xet-lfs-us-1.hf.co:443
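
Since the same block is pasted into 15 jobs, a quick sanity check of the `allowed-endpoints` value may be useful before copying it around. The sketch below splits the folded value into host and port pairs and reports exact hosts that a wildcard entry on the same port already covers. The function names are hypothetical, and treating `*` as a shell-style glob is an assumption about harden-runner's matching, not confirmed behaviour.

```python
from fnmatch import fnmatchcase

# The allowed-endpoints value from the snippet above, as whitespace-separated entries.
ALLOWED_ENDPOINTS = """
api.github.com:443 github.com:443 codeload.github.com:443
objects.githubusercontent.com:443 raw.githubusercontent.com:443
release-assets.githubusercontent.com:443 *.actions.githubusercontent.com:443
results-receiver.actions.githubusercontent.com:443 ghcr.io:443
pkg-containers.githubusercontent.com:443 *.blob.core.windows.net:443
vault.habana.ai:443 pypi.org:443 files.pythonhosted.org:443
download.pytorch.org:443 huggingface.co:443 cdn-lfs.huggingface.co:443
cdn-lfs.hf.co:443 cdn-lfs-us-1.hf.co:443 cas-bridge.xethub.hf.co:443
xet-lfs-us-1.hf.co:443
"""


def parse_endpoints(value):
    """Split a whitespace-separated endpoint list into (host, port) pairs."""
    pairs = []
    for entry in value.split():
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def redundant_entries(pairs):
    """Return exact hosts that a wildcard entry on the same port also matches."""
    wildcards = [(host, port) for host, port in pairs if "*" in host]
    return [
        (host, port)
        for host, port in pairs
        if "*" not in host
        and any(port == w_port and fnmatchcase(host, w_host) for w_host, w_port in wildcards)
    ]


pairs = parse_endpoints(ALLOWED_ENDPOINTS)
overlaps = redundant_entries(pairs)
```

Under the glob assumption, the only overlap is `results-receiver.actions.githubusercontent.com`, which `*.actions.githubusercontent.com` would also match. Keeping the explicit entry is harmless; the check simply makes such overlaps visible.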

Self-hosted runner notes

  • harden-runner installs a small monitoring agent on the runner host. Requires sudo (already available on pr-ci / hourly-ci pools).
  • disable-sudo: false is kept because some CI steps need docker via group/sudo.
  • The --privileged flag on test containers means a sufficiently sophisticated payload could try to tamper with the host firewall from inside the container. This is a residual risk; closing it would require moving the harden-runner step inside the container or dropping --privileged. Out of scope here.

Adds 'step-security/harden-runner@v2.19.3' (SHA-pinned) as the first
step of every CI job that consumes secrets.HF_TOKEN, configured with
'egress-policy: block' and a curated allow-list of endpoints that the
current build + test pipeline actually needs.

Allow-list (derived from reading .github/Dockerfile.ci, the workflow
files, and tests/full_tests/ci_e2e_discoverable_tests.sh):

  GitHub Actions infrastructure:
    api.github.com, github.com, codeload.github.com,
    objects.githubusercontent.com, raw.githubusercontent.com,
    release-assets.githubusercontent.com,
    *.actions.githubusercontent.com,
    results-receiver.actions.githubusercontent.com,
    ghcr.io, pkg-containers.githubusercontent.com,
    *.blob.core.windows.net  (cache / artifacts)

  Docker base image (build phase):
    vault.habana.ai  (Habana Gaudi base)

  Python packages (build + test phase):
    pypi.org, files.pythonhosted.org,
    download.pytorch.org  (torchaudio CPU wheel)

  Model weights (test phase):
    huggingface.co, cdn-lfs.huggingface.co, cdn-lfs.hf.co,
    cdn-lfs-us-1.hf.co, cas-bridge.xethub.hf.co, xet-lfs-us-1.hf.co

Because every test container is launched with '--network=host', the
host-level eBPF filter installed by harden-runner sees and enforces
on the container's traffic — no per-container instrumentation needed.

This is defense-in-depth, layered on top of:
  - pre-merge-trigger approval gate (vllm-project#1471)
  - approved-workflow environment for HF_TOKEN (vllm-project#1473)

Together these three changes mean a planted payload in a PR cannot:
  1. run at all without maintainer approval        (vllm-project#1471)
  2. receive HF_TOKEN without environment approval (vllm-project#1473)
  3. exfiltrate to an attacker-controlled host     (this PR)

If anything legitimate gets blocked, the harden-runner check run
will identify the host that was denied; we add it to the allow-list
in a follow-up.

Affected jobs (15 - same set as vllm-project#1473):
  pre-merge.yaml:           hpu_unit_tests, hpu_pd_tests, hpu_perf_tests,
                            hpu_dp_tests, e2e, calibration_tests
  hourly-ci.yaml:           run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test
  create-release-branch:    run_unit_tests, e2e, run_data_parallel_test,
                            run_pd_disaggregate_test, run_hpu_perf_tests

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

Copilot AI left a comment

Pull request overview

This PR hardens CI jobs that consume secrets.HF_TOKEN by adding step-security/harden-runner as the first step in each such job, enforcing an egress-deny-by-default policy with an explicit endpoint allow-list to reduce secret exfiltration risk.

Changes:

  • Add step-security/harden-runner@v2.19.3 (SHA-pinned) to HF_TOKEN-consuming jobs in CI workflows.
  • Configure egress-policy: block with an endpoint allow-list covering GitHub Actions infra, package installs, and Hugging Face model downloads.
  • Apply the same hardening pattern across pre-merge, hourly CI, and release-branch CI jobs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/pre-merge.yaml | Adds harden-runner egress blocking + allow-list to HF_TOKEN-consuming pre-merge test jobs. |
| .github/workflows/hourly-ci.yaml | Adds harden-runner egress blocking + allow-list to HF_TOKEN-consuming hourly test jobs. |
| .github/workflows/create-release-branch.yaml | Adds harden-runner egress blocking + allow-list to HF_TOKEN-consuming release-branch test jobs. |

            results-receiver.actions.githubusercontent.com:443
            ghcr.io:443
            pkg-containers.githubusercontent.com:443
            *.blob.core.windows.net:443

Comment on lines +366 to +372

      - name: Harden runner (egress block)
        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
        with:
          egress-policy: block
          disable-sudo: false
          allowed-endpoints: >
            api.github.com:443

            results-receiver.actions.githubusercontent.com:443
            ghcr.io:443
            pkg-containers.githubusercontent.com:443
            *.blob.core.windows.net:443

Comment on lines +105 to +111

      - name: Harden runner (egress block)
        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
        with:
          egress-policy: block
          disable-sudo: false
          allowed-endpoints: >
            api.github.com:443

            results-receiver.actions.githubusercontent.com:443
            ghcr.io:443
            pkg-containers.githubusercontent.com:443
            *.blob.core.windows.net:443

Comment on lines +168 to +174

      - name: Harden runner (egress block)
        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
        with:
          egress-policy: block
          disable-sudo: false
          allowed-endpoints: >
            api.github.com:443