Revert "[release 2.11] Update to torch 2.11" (#34644) #39300

Closed
vllm-agent wants to merge 1 commit into vllm-project:main from vllm-agent:auto-revert/pr-34644

Conversation

@vllm-agent

Revert of #34644

This reverts the merge commit 2111997 from PR #34644.

Reason

5 new CI test failures detected in nightly build #60295:

  • MoE Refactor Integration Test (B200 - TEMPORARY) — Server failed to start in time for Llama-4-Scout-BF16-fi-cutlass
  • Language Models Test (Extended Generation) — Logprobs divergence for bigcode/starcoder2-3b
  • Language Models Test (Extended Pooling) — test_set_max_model_len_illegal did not raise expected ValueError
  • Language Models Test (MTEB) — Nomic embedding accuracy assertion failure
  • Quantization — Server failed to start in time for cpu_offload_compressed_tensors

Changes reverted

This PR updated torch from 2.10 to 2.11, torchao from 0.14.1 to 0.17.0, CUDA from 12.9 to 13.0, and modified moe_runner_base.py, quantization CI config, and test dependencies. The revert rolls back all of these changes.


Auto-generated by CI failure analyzer | Build #60295

@mergify
Contributor

mergify bot commented Apr 8, 2026

Documentation preview: https://vllm--39300.org.readthedocs.build/en/39300/

@mergify mergify bot added the documentation, ci/build, nvidia, and rocm labels Apr 8, 2026
@mergify mergify bot added the cpu Related to CPU backends label Apr 8, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Apr 8, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request downgrades the supported versions of PyTorch (from 2.11.0 to 2.10.0), CUDA (from 13.0.0 to 12.9.1), and associated dependencies such as torchvision, torchaudio, and torchao across the build system, Dockerfiles, and CI configurations. Key changes include unskipping tests previously incompatible with PyTorch 2.11 and simplifying the MoE layer name resolution. Review feedback identifies a critical shell syntax error in the Dockerfile caused by comments within a multi-line command and an inconsistency in RISC-V platform support within the CPU build requirements.

Comment on lines +556 to +559
libcublas-${CUDA_VERSION_DASH} \
# Fixes nccl_allocator requiring nccl.h at runtime
# https://github.com/vllm-project/vllm/blob/1336a1ea244fa8bfd7e72751cabbdb5b68a0c11a/vllm/distributed/device_communicators/pynccl_allocator.py#L22
libnccl-dev && \
Contributor

critical

The inclusion of comments within the multi-line apt-get install command breaks the shell execution. The backslash on line 556 causes the shell to continue the command to line 557, which is a comment. This effectively terminates the apt-get install command arguments, and line 559 will be interpreted as a new (and invalid) command, causing the build to fail. Please move these comments outside of the command or place them after the && operator on a separate line.

        libcublas-${CUDA_VERSION_DASH} \
        libnccl-dev && \
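
The continuation behaviour described in this comment can be checked outside Docker. The following is a minimal Python sketch that feeds the same shape of text (a backslash continuation followed by a comment line) to a plain POSIX `sh`. The `echo` command stands in for `apt-get install`, and the helper name `run_in_plain_sh` is hypothetical. It only shows what a plain shell does with this text. Docker's Dockerfile reference states that comment lines are removed before an instruction is executed, so a `RUN` instruction may not behave the same way, and this sketch does not test that.

```python
import subprocess

# Same shape as the reviewed lines: a backslash continuation followed by a
# comment line. "echo" stands in for "apt-get install" so nothing is installed.
SCRIPT = (
    "echo libcublas \\\n"
    "# Fixes nccl_allocator requiring nccl.h at runtime\n"
    "libnccl-dev\n"
)


def run_in_plain_sh(script: str) -> subprocess.CompletedProcess:
    """Run the text with a plain POSIX sh (not through a Dockerfile parser)."""
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


result = run_in_plain_sh(SCRIPT)
print("stdout:", result.stdout.strip())
print("exit code:", result.returncode)
```

In a plain `sh`, the backslash joins the comment onto the `echo` line, the comment swallows the rest of that joined line, and `libnccl-dev` is then tried as a separate command.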

Comment on lines +6 to +7
torch==2.10.0+cpu; platform_machine == "x86_64" or platform_machine == "s390x"
torch==2.10.0; platform_machine == "aarch64" or platform_system == "Darwin" or platform_machine == "ppc64le"
Contributor

high

The riscv64 platform is missing from the torch requirements in cpu-build.txt, creating an inconsistency with requirements/cpu.txt where it is present. This will cause build failures on RISC-V systems as the torch dependency will not be satisfied during the build phase.

torch==2.10.0+cpu; platform_machine == "x86_64" or platform_machine == "s390x"
torch==2.10.0; platform_machine == "aarch64" or platform_system == "Darwin" or platform_machine == "ppc64le" or platform_machine == "riscv64"
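
To illustrate the gap this comment describes, here is a rough stdlib-only Python sketch that checks which `torch` pin a given platform would select from the two versions of these lines. The helper names `marker_matches` and `selected` are hypothetical, and evaluating markers with `eval` is a simplification that only works because these particular markers use `==` and `or`, which are also valid Python. Real installers evaluate markers with a proper parser, so treat this as an approximation.

```python
# Requirement lines copied from the review comment (current vs. suggested).
CURRENT = [
    'torch==2.10.0+cpu; platform_machine == "x86_64" or platform_machine == "s390x"',
    'torch==2.10.0; platform_machine == "aarch64" or platform_system == "Darwin" '
    'or platform_machine == "ppc64le"',
]
SUGGESTED = [
    CURRENT[0],
    CURRENT[1] + ' or platform_machine == "riscv64"',
]


def marker_matches(marker: str, env: dict) -> bool:
    """Approximate marker evaluation: treat the marker as a Python expression."""
    return bool(eval(marker, {"__builtins__": {}}, dict(env)))


def selected(lines: list, env: dict) -> list:
    """Return the requirement specifiers whose marker matches the environment."""
    chosen = []
    for line in lines:
        requirement, _, marker = line.partition(";")
        if not marker.strip() or marker_matches(marker.strip(), env):
            chosen.append(requirement.strip())
    return chosen


# Assumed marker values for a RISC-V Linux host.
riscv = {"platform_machine": "riscv64", "platform_system": "Linux"}
print("current:  ", selected(CURRENT, riscv))
print("suggested:", selected(SUGGESTED, riscv))
```

With the current lines, neither marker matches a `riscv64` machine, so no `torch` pin is selected; with the suggested line, the unsuffixed `torch==2.10.0` pin is.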

@noooop noooop closed this Apr 8, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Apr 8, 2026
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Apr 8, 2026

Labels

ci/build, cpu, documentation, nvidia, rocm

Projects

Status: Done
Status: Done

2 participants