[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile#37930
[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile#37930gshtras merged 9 commits intovllm-project:mainfrom
Conversation
…ration Signed-off-by: Andreas Karatzas <akaratza@amd.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a uv pip compile workflow for ROCm test dependencies, aligning it with the existing CUDA setup. The changes include a new pre-commit hook, a verification step in the Dockerfile to ensure the correct PyTorch build is used, and regenerated lockfiles. My review focuses on improving the maintainability of the pre-commit configuration by simplifying the package exclusion list.
| --no-emit-package, cuda-bindings, | ||
| --no-emit-package, cuda-pathfinder, | ||
| --no-emit-package, cuda-toolkit, | ||
| --no-emit-package, cupy-cuda12x, | ||
| --no-emit-package, nvidia-cublas, | ||
| --no-emit-package, nvidia-cuda-cupti, | ||
| --no-emit-package, nvidia-cuda-nvrtc, | ||
| --no-emit-package, nvidia-cuda-runtime, | ||
| --no-emit-package, nvidia-cudnn-cu13, | ||
| --no-emit-package, nvidia-cufft, | ||
| --no-emit-package, nvidia-cufile, | ||
| --no-emit-package, nvidia-curand, | ||
| --no-emit-package, nvidia-cusolver, | ||
| --no-emit-package, nvidia-cusparse, | ||
| --no-emit-package, nvidia-cusparselt-cu13, | ||
| --no-emit-package, nvidia-nccl-cu13, | ||
| --no-emit-package, nvidia-nvjitlink, | ||
| --no-emit-package, nvidia-nvshmem-cu13, | ||
| --no-emit-package, nvidia-nvtx, |
There was a problem hiding this comment.
The extensive list of --no-emit-package arguments for CUDA and NVIDIA packages can be significantly simplified. uv supports glob patterns for this option, which would make this configuration more concise and easier to maintain. You can replace the individual nvidia-*, cuda-*, and cupy-cuda* package exclusions with wildcard patterns.
--no-emit-package, 'cuda-*',
--no-emit-package, 'cupy-cuda*',
--no-emit-package, 'nvidia-*',There was a problem hiding this comment.
Specificity is better I think.
There was a problem hiding this comment.
Wildcards would likely be more robust. As it is now every CUDA dependency change would break the ROCm build, which is annoying for CUDA developers and ROCm developers.
There was a problem hiding this comment.
@hmellor So apparently uv does not support glob/wildcard patterns in --no-emit-package. I tested it and got:
error: invalid value 'nvidia-' for '--no-emit-package <NO_EMIT_PACKAGE>':
Not a valid package or extra name: "nvidia-". Names must start and end with
a letter or digit and may only contain -, _, ., and alphanumeric characters.
The --no-emit-package flag requires exact package names
But, I implemented a post processing routine with warning log to account for this concern. I moved the pattern check under there.
…mic pattern filtering for ROCm builds Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…mic pattern filtering for ROCm builds Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
Hi @AndreasKaratzas, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
…mic pattern filtering for ROCm builds Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
Hi @AndreasKaratzas, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
…mic pattern filtering for ROCm builds Signed-off-by: Andreas Karatzas <akaratza@amd.com>
hmellor
left a comment
There was a problem hiding this comment.
not a fan of the custom script
|
Updating the - repo: https://github.com/astral-sh/uv-pre-commit
rev: 0.11.1
hooks:
- id: pip-compile
args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match, --torch-backend, cu129, --python-platform, x86_64-manylinux_2_28, --python-version, "3.12"]
files: ^requirements/test\.(in|txt)$
- id: pip-compile
name: pip-compile-rocm
args: [requirements/rocm-test.in, -c, requirements/rocm.txt, -o, requirements/rocm-test.txt, --index-strategy, unsafe-best-match, --torch-backend, rocm7.0, --python-platform, x86_64-manylinux_2_28, --python-version, "3.12"]
files: ^requirements/rocm-test\.(in|txt)$This almost works but there are conflicts in the provided requirements: |
…to pip-compile-rocm Signed-off-by: Andreas Karatzas <akaratza@amd.com>
I reverted the custom script :) |
Investigated using --torch-backend rocm7.0 as you suggested. It resolves cleanly when paired with an override for triton-rocm. However, --torch-backend requires prebuilt PyTorch wheels on download.pytorch.org/whl/, and our ROCm Docker builds compile torch from source. If we target a ROCm version that PyTorch hasn't published wheels for yet, the hook would fail to resolve. Keeping the explicit --no-emit-package list for now, but applied your naming feedback (pip-compile-rocm with alias). Happy to revisit once PyTorch ROCm wheels are reliably published. |
hmellor
left a comment
There was a problem hiding this comment.
LGTM, it should be fairly easy to update the --no-emit-package rules and once proper ROCm wheels are distributed we should be able to use --torch-backend instead
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: iamvastava <iamvastava@gmail.com>
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Rishi Puri <riship@nvidia.com>
…lm-project#37930) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Adds a proper
uv pip compileworkflow for generatingrocm-test.txtfromrocm-test.in, matching the existing CUDAtest.inin totest.txtpattern.Changes
pip-compile rocm-testhook in.pre-commit-config.yamlthat compilesrocm-test.inin torocm-test.txtusing-c requirements/rocm.txtas a constraint file (since--torch-backend rocmdoesn't exist in uv). CUDA/NVIDIA packages are excluded via--no-emit-packagesince ROCm torch is installed separately viarocm.txt.torch.version.hip is not Noneto catch accidental CUDA torch installation during the test image build.Key design decisions
-c requirements/rocm.txtconstraint instead of--torch-backend(no ROCm backend in uv)rocm.txtandDockerfile.rocm_baseWill run a full build on ROCm CI to verify no new regressions are introduced.
cc @kenroche