[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile#37930

Merged
gshtras merged 9 commits into vllm-project:main from ROCm:akaratza_feat_lockfile
Mar 26, 2026

Conversation

@AndreasKaratzas
Collaborator

Adds a proper uv pip compile workflow for generating rocm-test.txt from rocm-test.in, matching the existing CUDA test.in to test.txt pattern.

Changes

  • Added a pip-compile rocm-test hook in .pre-commit-config.yaml that compiles rocm-test.in to rocm-test.txt, using -c requirements/rocm.txt as a constraint file (since --torch-backend rocm doesn't exist in uv). CUDA/NVIDIA packages are excluded via --no-emit-package because ROCm torch is installed separately via rocm.txt.
  • Added a verification step that asserts torch.version.hip is not None to catch accidental CUDA torch installation during the test image build.
  • Regenerated rocm-test.txt as a proper uv-compiled lockfile (346 pinned packages, zero CUDA/NVIDIA packages).
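
The verification step in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the check added by the PR: the function name `assert_rocm_torch` and the error message are hypothetical, and the module is passed in as an argument so the logic can be exercised without a ROCm install. It relies only on `torch.version.hip`, which is `None` on PyTorch builds without ROCm support.

```python
from types import SimpleNamespace


def assert_rocm_torch(torch_module) -> str:
    """Return the HIP version string, or raise if torch is not a ROCm build.

    `torch_module` must expose `version.hip`, as the real `torch` module does.
    The function name and error message are hypothetical.
    """
    hip = getattr(torch_module.version, "hip", None)
    if hip is None:
        raise RuntimeError(
            "torch.version.hip is None: a non-ROCm (likely CUDA) torch build "
            "appears to be installed"
        )
    return hip


# Stand-ins for a ROCm build and a CUDA build of torch (illustrative values).
fake_rocm_torch = SimpleNamespace(version=SimpleNamespace(hip="7.0.0", cuda=None))
fake_cuda_torch = SimpleNamespace(version=SimpleNamespace(hip=None, cuda="12.9"))
```

In an image build, the equivalent call would be `import torch; assert_rocm_torch(torch)`, which fails the build step when a CUDA wheel has been pulled in by accident.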

Key design decisions

  • Uses -c requirements/rocm.txt constraint instead of --torch-backend (no ROCm backend in uv)
  • Excludes torch/torchvision/torchaudio/triton and all nvidia-/cuda- packages from the lockfile output since they are managed by rocm.txt and Dockerfile.rocm_base
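
To make the two decisions above concrete, here is a small sketch of how the corresponding `uv pip compile` command line could be assembled. It is an assumption-laden illustration rather than the hook from the PR: the helper name `build_rocm_compile_args` is hypothetical, the exclusion list is abbreviated, and any other flags the real hook passes are omitted. Only the `-c`, `-o`, and `--no-emit-package` options named in this description are used.

```python
def build_rocm_compile_args(excluded_packages):
    """Build an argv list for compiling rocm-test.in into rocm-test.txt.

    Hypothetical helper: each excluded name becomes its own
    `--no-emit-package <name>` pair, since exact names are listed one by one.
    """
    args = [
        "uv", "pip", "compile",
        "requirements/rocm-test.in",
        "-c", "requirements/rocm.txt",       # constraint file instead of --torch-backend
        "-o", "requirements/rocm-test.txt",  # lockfile output
    ]
    for name in sorted(excluded_packages):
        args += ["--no-emit-package", name]
    return args


# Abbreviated, illustrative subset of the packages kept out of the lockfile.
EXCLUDED = [
    "torch", "torchvision", "torchaudio", "triton",
    "nvidia-cublas", "cuda-bindings",
]

rocm_compile_args = build_rocm_compile_args(EXCLUDED)
```

The resulting list could be handed to `subprocess.run(rocm_compile_args, check=True)` from the repository root, assuming `uv` is on the PATH.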

Will run a full build on ROCm CI to verify no new regressions are introduced.

cc @kenroche

…ration

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas AndreasKaratzas added the ready and rocm labels Mar 23, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 23, 2026
@mergify mergify bot added the ci/build label Mar 23, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a uv pip compile workflow for ROCm test dependencies, aligning it with the existing CUDA setup. The changes include a new pre-commit hook, a verification step in the Dockerfile to ensure the correct PyTorch build is used, and regenerated lockfiles. My review focuses on improving the maintainability of the pre-commit configuration by simplifying the package exclusion list.

Comment thread .pre-commit-config.yaml
Comment on lines +57 to +75
--no-emit-package, cuda-bindings,
--no-emit-package, cuda-pathfinder,
--no-emit-package, cuda-toolkit,
--no-emit-package, cupy-cuda12x,
--no-emit-package, nvidia-cublas,
--no-emit-package, nvidia-cuda-cupti,
--no-emit-package, nvidia-cuda-nvrtc,
--no-emit-package, nvidia-cuda-runtime,
--no-emit-package, nvidia-cudnn-cu13,
--no-emit-package, nvidia-cufft,
--no-emit-package, nvidia-cufile,
--no-emit-package, nvidia-curand,
--no-emit-package, nvidia-cusolver,
--no-emit-package, nvidia-cusparse,
--no-emit-package, nvidia-cusparselt-cu13,
--no-emit-package, nvidia-nccl-cu13,
--no-emit-package, nvidia-nvjitlink,
--no-emit-package, nvidia-nvshmem-cu13,
--no-emit-package, nvidia-nvtx,
Contributor

high

The extensive list of --no-emit-package arguments for CUDA and NVIDIA packages can be significantly simplified. uv supports glob patterns for this option, which would make this configuration more concise and easier to maintain. You can replace the individual nvidia-*, cuda-*, and cupy-cuda* package exclusions with wildcard patterns.

        --no-emit-package, 'cuda-*',
        --no-emit-package, 'cupy-cuda*',
        --no-emit-package, 'nvidia-*',

Collaborator Author

Specificity is better I think.

Member

Wildcards would likely be more robust. As it is now every CUDA dependency change would break the ROCm build, which is annoying for CUDA developers and ROCm developers.

Collaborator Author

@hmellor So apparently uv does not support glob/wildcard patterns in --no-emit-package. I tested it and got:

error: invalid value 'nvidia-*' for '--no-emit-package <NO_EMIT_PACKAGE>': Not a valid package or extra name: "nvidia-*". Names must start and end with a letter or digit and may only contain -, _, ., and alphanumeric characters.

The --no-emit-package flag requires exact package names.

However, to address this concern I implemented a post-processing routine that logs a warning, and moved the pattern check into it.
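
A post-processing routine of the kind described here might look like the sketch below. It is not the script from this PR, which does not appear in the conversation: the function name, logger name, and sample versions are all hypothetical. The patterns are the three globs proposed in the review suggestion, matched against pinned package names with the standard-library `fnmatch` module.

```python
import fnmatch
import logging

logger = logging.getLogger("rocm_lockfile_filter")  # hypothetical logger name

# Glob patterns from the review suggestion above.
FORBIDDEN_PATTERNS = ("nvidia-*", "cuda-*", "cupy-cuda*")


def filter_lockfile_lines(lines, patterns=FORBIDDEN_PATTERNS):
    """Drop `name==version` pins whose name matches a pattern.

    Indented `# via ...` annotations that follow a dropped pin are removed
    with it; every other line passes through unchanged.
    """
    kept = []
    dropping = False  # True while skipping annotations of a dropped pin
    for line in lines:
        stripped = line.strip()
        is_annotation = line[:1].isspace() and stripped.startswith("#")
        if dropping and is_annotation:
            continue
        dropping = False
        if "==" in stripped and not stripped.startswith("#"):
            name = stripped.split("==", 1)[0].strip().lower()
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                logger.warning("Dropping %s from the ROCm lockfile", name)
                dropping = True
                continue
        kept.append(line)
    return kept


# Illustrative lockfile fragment; the versions are made up.
SAMPLE = [
    "numpy==2.2.0",
    "    # via -r requirements/rocm-test.in",
    "nvidia-cublas==13.0.0.19",
    "    # via torch",
    "cupy-cuda12x==13.6.0",
    "requests==2.32.3",
]

filtered = filter_lockfile_lines(SAMPLE)
```

The trade-off discussed in this thread applies to any such routine: it tolerates new CUDA dependencies without a config change, at the cost of maintaining a custom script alongside the uv hook.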

@AndreasKaratzas AndreasKaratzas marked this pull request as ready for review March 24, 2026 07:52
…mic pattern filtering for ROCm builds

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…mic pattern filtering for ROCm builds

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify
Contributor

mergify bot commented Mar 24, 2026

Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

…mic pattern filtering for ROCm builds

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Member

@hmellor hmellor left a comment

not a fan of the custom script

@hmellor
Member

hmellor commented Mar 25, 2026

Updating the uv-pre-commit version allows us to specify rocm7.0 (which based on the image build logs is what we are using), so we could have:

- repo: https://github.com/astral-sh/uv-pre-commit
  rev: 0.11.1
  hooks:
    - id: pip-compile
      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match, --torch-backend, cu129, --python-platform, x86_64-manylinux_2_28, --python-version, "3.12"]
      files: ^requirements/test\.(in|txt)$
    - id: pip-compile
      name: pip-compile-rocm
      args: [requirements/rocm-test.in, -c, requirements/rocm.txt, -o, requirements/rocm-test.txt, --index-strategy, unsafe-best-match, --torch-backend, rocm7.0, --python-platform, x86_64-manylinux_2_28, --python-version, "3.12"]
      files: ^requirements/rocm-test\.(in|txt)$

This almost works but there are conflicts in the provided requirements:

pip-compile-rocm.........................................................Failed
- hook id: pip-compile
- exit code: 1

  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of triton-rocm{sys_platform
      == 'linux'}==3.6.0 and torch==2.10.0+rocm7.0 depends on
      triton-rocm{sys_platform == 'linux'}==3.6.0, we can conclude that
      torch==2.10.0+rocm7.0 cannot be used.
      And because bitsandbytes==0.49.2 depends on torch>=2.3 and only the
      following versions of torch are available:
          torch<2.8.0
          torch==2.10.0+rocm7.0
      we can conclude that bitsandbytes==0.49.2 depends on torch>=2.3,<2.8.0.
      And because instanttensor>=0.1.5 depends on torch>=2.8.0 and only the
      following versions of instanttensor are available:
          instanttensor<=0.1.5
          instanttensor==0.1.6
          instanttensor==0.1.7
      we can conclude that bitsandbytes==0.49.2 and instanttensor>=0.1.5 are
      incompatible.
      And because you require bitsandbytes==0.49.2 and instanttensor>=0.1.5,
      we can conclude that your requirements are unsatisfiable.
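
The resolver's chain of reasoning can be reproduced with a few lines of arithmetic on version tuples. The sketch below is only an illustration of that logic, not how uv resolves dependencies: the helper names are hypothetical, and `2.7.1` is an assumed stand-in for "some torch release below 2.8.0". Marking `2.10.0+rocm7.0` as unusable mirrors the missing `triton-rocm==3.6.0` pin in the output above.

```python
def parse_version(text):
    """Parse 'X.Y.Z' or 'X.Y.Z+local' into a comparable tuple of ints."""
    public = text.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def usable_torch(candidates, lower_bounds, unusable=()):
    """Return candidates that meet every lower bound and are not excluded."""
    floor = max(parse_version(bound) for bound in lower_bounds)
    return [
        c for c in candidates
        if c not in unusable and parse_version(c) >= floor
    ]


# Illustrative candidate set: one release below 2.8.0 plus the ROCm build.
CANDIDATES = ["2.7.1", "2.10.0+rocm7.0"]
# Lower bounds quoted in the resolver output (bitsandbytes, instanttensor).
LOWER_BOUNDS = ["2.3", "2.8.0"]

# With the ROCm build ruled out, nothing satisfies torch>=2.8.0.
blocked = usable_torch(CANDIDATES, LOWER_BOUNDS, unusable={"2.10.0+rocm7.0"})
# If the ROCm build were usable, it would satisfy both bounds.
unblocked = usable_torch(CANDIDATES, LOWER_BOUNDS)
```

Under these assumptions the conflict reported between bitsandbytes and instanttensor is a consequence of the ROCm torch build being excluded, rather than a direct incompatibility between the two packages.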

@AndreasKaratzas
Collaborator Author

not a fan of the custom script

I reverted the custom script :)

@AndreasKaratzas
Collaborator Author

Investigated using --torch-backend rocm7.0 as you suggested. It resolves cleanly when paired with an override for triton-rocm. However, --torch-backend requires prebuilt PyTorch wheels on download.pytorch.org/whl/, and our ROCm Docker builds compile torch from source. If we target a ROCm version that PyTorch hasn't published wheels for yet, the hook would fail to resolve. Keeping the explicit --no-emit-package list for now, but applied your naming feedback (pip-compile-rocm with alias). Happy to revisit once PyTorch ROCm wheels are reliably published.

Member

@hmellor hmellor left a comment

LGTM, it should be fairly easy to update the --no-emit-package rules and once proper ROCm wheels are distributed we should be able to use --torch-backend instead

@gshtras gshtras merged commit db01535 into vllm-project:main Mar 26, 2026
13 of 14 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 26, 2026
@gshtras gshtras deleted the akaratza_feat_lockfile branch March 26, 2026 17:44
Labels

ci/build, ready, rocm

Projects

Status: Done

3 participants