[Release 2.10] Update to Torch 2.10 - final release #30525
vllm-bot merged 19 commits into vllm-project:main
Code Review
This pull request updates various dependencies to test the upcoming Torch 2.10 release candidate, primarily by bumping the version of torch and related packages like torchaudio, torchvision, and torchao across multiple configuration files. While the version updates are consistent, I've found a critical issue with how version-specific workarounds are handled. The logic for applying monkey-patches for PyTorch 2.9 bugs has been changed to apply to all future versions (>=2.9.0), which poses a significant forward-compatibility risk. My review provides suggestions to scope these patches to a more limited version range to prevent them from causing issues in future PyTorch releases where the original bugs may be fixed.
vllm/env_override.py
@@ -363,7 +363,7 @@ def _update_scheduler_patched(self) -> None:
 self.scheduler = Scheduler(self.operations)

-if is_torch_equal("2.9.0"):
+if is_torch_equal_or_newer("2.9.0"):
The condition is_torch_equal_or_newer("2.9.0") is too broad. These monkey-patches are workarounds for specific bugs in PyTorch 2.9.0. While they might be necessary for 2.10.0, applying them to all future versions is risky and can lead to conflicts when the bugs are fixed upstream. It's better to scope this to the versions where the patch is known to be needed. A safer approach would be to specify an upper bound, for example, to include versions 2.9.x and 2.10.x but not 2.11.x and newer.
Suggested change:
-if is_torch_equal_or_newer("2.9.0"):
+if is_torch_equal_or_newer("2.9.0") and not is_torch_equal_or_newer("2.11.0"):
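The bounded check suggested above can be sketched as a small standalone helper. This is only an illustration: `release_tuple` and `version_in_range` are hypothetical names, not vLLM or PyTorch APIs, and the sketch assumes that pre-release and local builds (for example `2.11.0.dev...` or `2.10.0+cu129`) should be bucketed with their base release, which is a design choice the comment above does not specify.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. "2.10.0+cu129" -> (2, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def version_in_range(current: str, lower: str, upper: str) -> bool:
    """Return True when lower <= current < upper, comparing release numbers only.

    Hypothetical helper: suffixes such as ".dev20260101", "rc1" or "+cu129"
    are ignored, so a 2.11 nightly is treated as 2.11.0 and falls outside
    the range [2.9.0, 2.11.0).
    """
    return release_tuple(lower) <= release_tuple(current) < release_tuple(upper)


if __name__ == "__main__":
    # In real code the first argument would be torch.__version__.
    for candidate in ("2.9.0", "2.10.0+cu129", "2.11.0.dev20260101", "2.11.0"):
        print(candidate, version_in_range(candidate, "2.9.0", "2.11.0"))
```

Comparing only the numeric release is deliberate here: with a plain PEP 440 comparison, a `2.11.0.dev` build sorts before `2.11.0`, so a guard written as "not newer than or equal to 2.11.0" would still apply the patch to 2.11 nightlies.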
vllm/model_executor/layers/conv.py
@@ -251,6 +251,6 @@ def forward_cuda(self, x: torch.Tensor) -> torch.Tensor:
 # See: https://github.com/vllm-project/vllm/issues/27406
 # and https://github.com/pytorch/pytorch/issues/166122
 # By default, we use CUDNN's convolution ops with optimization.
-if self.enable_linear and is_torch_equal("2.9.0"):
+if self.enable_linear and is_torch_equal_or_newer("2.9.0"):
The condition is_torch_equal_or_newer("2.9.0") is too broad. This workaround is for a performance regression in PyTorch 2.9.0. Applying it to all future versions is risky, as the fix might be included in a future release, and this workaround could become detrimental or incorrect. It's better to scope this to the versions where the workaround is known to be needed. A safer approach would be to specify an upper bound, for example, to include versions 2.9.x and 2.10.x but not 2.11.x and newer.
Suggested change:
-if self.enable_linear and is_torch_equal_or_newer("2.9.0"):
+if self.enable_linear and is_torch_equal_or_newer("2.9.0") and not is_torch_equal_or_newer("2.11.0"):
Hi @atalman, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
This pull request has merge conflicts that must be resolved before it can be merged.
Let's go for it, I can't find any clearly failing tests from the logs.
After upgrading to [...]
It's not blocking for me (I don't use GPT-OSS), but this is worth looking into and fixing.
I think it was moved in that version; the error sounds familiar from when DGX Spark launched. In any case, Triton 3.6.0 has important fixes for DGX Spark.
This is a problem for Hopper, since these are the best kernels for gpt-oss.
@DarkLight1337 I just tried running it myself on an H200 with a fresh environment, and it worked fine using the triton backend. I think your install might have had some old state if you upgraded in place, such as not rebuilding triton_kernels in our cmake.
Yes, works well in my builds too with Triton 3.6.0 and triton-kernels built from the release branch. No errors.
OK, let me try rebuilding from scratch.
I verified running gpt-oss-20b on B200 and H100 as well. Didn't run into any issues, and eval scores were good.
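Since the thread points to stale state from an in-place upgrade as the likely cause, a quick sanity check of the installed package versions may help before rebuilding. The following is a minimal sketch, not part of vLLM: `installed_versions`, `mismatches`, and the `EXPECTED` table are hypothetical, with the expected versions taken from this PR (torch 2.10.0, triton 3.6.0). It only inspects package metadata and cannot detect a stale triton_kernels build.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed targets, taken from the versions mentioned in this PR.
EXPECTED = {"torch": "2.10.0", "triton": "3.6.0"}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(found, expected):
    """Return {name: (installed, wanted)} for packages that differ from expected.

    Local build tags such as "+cu129" are ignored in the comparison.
    """
    bad = {}
    for name, wanted in expected.items():
        have = found.get(name)
        if have is None or have.split("+")[0] != wanted:
            bad[name] = (have, wanted)
    return bad


if __name__ == "__main__":
    report = mismatches(installed_versions(EXPECTED), EXPECTED)
    print(report if report else "installed versions match the expected set")
```

An empty result only means the package metadata matches; compiled extensions built against an older torch or triton would still need a clean rebuild, as suggested above.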
Note
Upgrade to PyTorch 2.10.0 across the project
- torch/torchaudio/torchvision to 2.10.0 (and torchvision 0.25.0), updates CMake supported torch versions to 2.10.0, and refreshes related deps (triton 3.6.0, nvidia-nvshmem-cu12 3.4.5).
- .../whl/test, enable --prerelease=allow for installs, and plumb test indexes in CUDA and CPU images; add extra-index usage to the python-only compile test.
- pip-compile now uses test cu129 index; Prime-RL script force-reinstalls torch/vision from test cu129.
- ... 2.10.0.dev to 2.10.0 and update decorators/env checks accordingly.
Written by Cursor Bugbot for commit 37af14c52963e6d528866dded7f51985a1fcef7e.
FIX #29595
FIX #33888