[WIP][XPU] upgrade torch-xpu to 2.12 #42262
Documentation preview: https://vllm--42262.org.readthedocs.build/en/42262/
Code Review
This pull request updates XPU-related dependencies, bumping torch to 2.12.0+xpu and triton-xpu to 3.7.1, while also modifying the pre-commit configuration to exclude various torch and NVIDIA/CUDA packages from generated requirement files. Feedback highlights that the XPU installation documentation was mistakenly emptied and should be updated with the new version requirements. Additionally, there are concerns regarding the use of a test index URL and the lack of version pinning for torchaudio and torchvision, which may lead to environment instability.
I am having trouble creating individual review comments. Click here to see my feedback.
docs/getting_started/installation/gpu.xpu.inc.md (1-98)
The XPU installation guide content has been completely removed. This file should be updated to reflect the new version requirements (e.g., Torch 2.12 and Triton-XPU 3.7.1) rather than being emptied, as it is essential for users setting up the XPU backend.
requirements/xpu.txt (13-16)
The use of the test/xpu index and the lack of version pinning for torchaudio and torchvision introduce potential instability. Since torch is pinned to 2.12.0+xpu, torchaudio and torchvision should also be pinned to their matching XPU versions to ensure a consistent and reproducible environment. Furthermore, the index should be switched back to the stable one once the upgrade is finalized.
bf640ae to e9478f8
e9478f8 to dae6271
# Exclude torch and CUDA/NVIDIA packages
--no-emit-package, torch,
--no-emit-package, torchvision,
--no-emit-package, torchaudio,
--no-emit-package, triton,
--no-emit-package, cuda-bindings,
--no-emit-package, cuda-pathfinder,
--no-emit-package, cuda-toolkit,
--no-emit-package, cupy-cuda12x,
# nvidia packages (unsuffixed / unified naming)
--no-emit-package, nvidia-cublas,
--no-emit-package, nvidia-cuda-cupti,
--no-emit-package, nvidia-cuda-nvrtc,
--no-emit-package, nvidia-cuda-runtime,
--no-emit-package, nvidia-cudnn,
--no-emit-package, nvidia-cufft,
--no-emit-package, nvidia-cufile,
--no-emit-package, nvidia-curand,
--no-emit-package, nvidia-cusolver,
--no-emit-package, nvidia-cusparse,
--no-emit-package, nvidia-cusparselt,
--no-emit-package, nvidia-nccl,
--no-emit-package, nvidia-nvjitlink,
--no-emit-package, nvidia-nvshmem,
--no-emit-package, nvidia-nvtx,
# nvidia cu12 packages
--no-emit-package, nvidia-cublas-cu12,
--no-emit-package, nvidia-cuda-cupti-cu12,
--no-emit-package, nvidia-cuda-nvrtc-cu12,
--no-emit-package, nvidia-cuda-runtime-cu12,
--no-emit-package, nvidia-cudnn-cu12,
--no-emit-package, nvidia-cufft-cu12,
--no-emit-package, nvidia-cufile-cu12,
--no-emit-package, nvidia-curand-cu12,
--no-emit-package, nvidia-cusolver-cu12,
--no-emit-package, nvidia-cusparse-cu12,
--no-emit-package, nvidia-cusparselt-cu12,
--no-emit-package, nvidia-nccl-cu12,
--no-emit-package, nvidia-nvjitlink-cu12,
--no-emit-package, nvidia-nvshmem-cu12,
--no-emit-package, nvidia-nvtx-cu12,
# nvidia cu13 packages
--no-emit-package, nvidia-cublas-cu13,
--no-emit-package, nvidia-cuda-cupti-cu13,
--no-emit-package, nvidia-cuda-nvrtc-cu13,
--no-emit-package, nvidia-cuda-runtime-cu13,
--no-emit-package, nvidia-cudnn-cu13,
--no-emit-package, nvidia-cufft-cu13,
--no-emit-package, nvidia-cufile-cu13,
--no-emit-package, nvidia-curand-cu13,
--no-emit-package, nvidia-cusolver-cu13,
--no-emit-package, nvidia-cusparse-cu13,
--no-emit-package, nvidia-cusparselt-cu13,
--no-emit-package, nvidia-nccl-cu13,
--no-emit-package, nvidia-nvjitlink-cu13,
--no-emit-package, nvidia-nvshmem-cu13,
--no-emit-package, nvidia-nvtx-cu13,
Instead of duplicating this block, please use YAML anchors like we do for mypy:
Lines 151 to 163 in 4200f62
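For illustration, the anchor approach could look roughly like the sketch below. It is not the repository's actual configuration: the hook ids, names, and `entry` script are hypothetical placeholders, and the package list is shortened to two entries. The anchor `&no_emit_args` defines the shared argument list once, and the alias `*no_emit_args` reuses it in the second hook.

```yaml
# Hypothetical sketch: hook ids, names, and the entry script are placeholders,
# not the ones used in this repository.
repos:
  - repo: local
    hooks:
      - id: compile-requirements-a          # hypothetical hook id
        name: compile requirements (variant A)
        entry: ./tools/compile_requirements.sh   # hypothetical script
        language: system
        # The anchor defines the shared exclusion list once.
        args: &no_emit_args
          - --no-emit-package
          - torch
          - --no-emit-package
          - torchvision
      - id: compile-requirements-b          # hypothetical hook id
        name: compile requirements (variant B)
        entry: ./tools/compile_requirements.sh   # hypothetical script
        language: system
        # The alias reuses the exact same list node.
        args: *no_emit_args
```

One caveat: an alias reuses the whole node, and plain YAML has no way to splice an aliased sequence into a longer one. If the hooks need different arguments besides the shared exclusions, anchoring only the `args` list will not be enough, and the shared part would have to be factored differently.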
Another question would be, why have we stopped using --torch-backend?
I will revert it. I tried this before the torch 2.12 release, using the torch test channel (https://download.pytorch.org/whl/test/xpu), which caused some compatibility issues, so I followed what ROCm did. I think it will no longer be an issue now that torch 2.12 is released. Thanks for your review!
60c7bbd to 42aa309
This pull request has merge conflicts that must be resolved before it can be
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
42aa309 to a6beabf
Purpose
Test Plan
Test Result