[WIP][Core] Update PyTorch to 2.12.0, torchvision to 0.27.0, triton to 3.7.0 by atalman · Pull Request #40077 · vllm-project/vllm

atalman · 2026-04-16T22:58:09Z

Update PyTorch ecosystem versions:

torch: 2.11.0 → 2.12.0
torchvision: 0.26.0 → 0.27.0
triton: 3.6.0 → 3.7.0
torchaudio: stays at 2.11.0

Use the PyTorch test index (download.pytorch.org/whl/test/) for CUDA and CPU packages, since torch 2.12.0 is currently on the test channel.

Co-authored-by: Claude

Purpose

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

gemini-code-assist

Code Review

This pull request updates the project's PyTorch dependency from version 2.11.0 to 2.12.0 across multiple configuration files, Dockerfiles, and requirement specifications, while also switching to the PyTorch 'test' index URLs. Feedback highlights critical version mismatches where torchaudio remains at 2.11.0, which could cause ABI compatibility issues. Additionally, the ROCm index URL may need to be updated to the 'test' channel to ensure the new packages are found.

gemini-code-assist · 2026-04-16T23:00:24Z

 # Common dependencies
 -r common.txt

 numba == 0.61.2 # Required for N-gram speculative decoding

 # Dependencies for NVIDIA GPUs
-torch==2.11.0
+torch==2.12.0


The torch version is updated to 2.12.0, but torchaudio (line 9) remains at 2.11.0. PyTorch ecosystem packages (torch, torchvision, torchaudio) are tightly coupled and typically require matching minor versions. A mismatch here will likely lead to dependency resolution failures or runtime ABI compatibility issues. Please ensure torchaudio is also updated to 2.12.0 to match the rest of the ecosystem.

atalman · 2026-04-20

No, this is expected: torchaudio stays on 2.11.
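The kind of check the reviewer describes can be sketched in a few lines of Python. This is only an illustration, not part of the PR: `mismatched_pins` and its `allowed` parameter are hypothetical names, and the rule "torchaudio should share torch's major.minor" is the reviewer's stated expectation, which the author explicitly overrides here.

```python
import re

# Matches "name==MAJOR.MINOR..." and captures the name and the two numbers.
_PIN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*(\d+)\.(\d+)")


def mismatched_pins(lines, family=("torchaudio",), allowed=frozenset()):
    """Return packages in `family` whose major.minor differs from torch's.

    `allowed` lists packages that are pinned to a different version on
    purpose (hypothetical escape hatch for cases like this PR).
    """
    pins = {}
    for line in lines:
        match = _PIN.match(line)
        if match:
            pins[match.group(1).lower()] = (int(match.group(2)), int(match.group(3)))

    torch_version = pins.get("torch")
    if torch_version is None:
        return []  # nothing to compare against

    return [
        name
        for name in family
        if name in pins and pins[name] != torch_version and name not in allowed
    ]


requirements = ["torch==2.12.0", "torchaudio==2.11.0", "torchvision==0.27.0"]
print(mismatched_pins(requirements))
print(mismatched_pins(requirements, allowed={"torchaudio"}))
```

With the pins from this PR, the first call reports torchaudio and the second, which marks the pin as intentional, reports nothing.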

gemini-code-assist · 2026-04-16T23:00:24Z

@@ -3,10 +3,10 @@


 --extra-index-url https://download.pytorch.org/whl/rocm7.1
-torch==2.11.0
-torchvision==0.26.0
+torch==2.12.0


There are two potential issues here:

Version Mismatch: torch is updated to 2.12.0, but torchaudio (line 8) remains at 2.11.0. These should be synchronized to avoid compatibility issues.

Index URL: The index URL on line 5 points to the standard ROCm channel. If torch 2.12.0 is only available on the test channel (as indicated in the PR description for CUDA/CPU), the ROCm build will fail to find the package. Consider updating the index to https://download.pytorch.org/whl/test/rocm7.1 if the package is not in the main channel.
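As a rough illustration of the URL change suggested above, the helper below derives a test-channel index from a stable one by inserting a `test` path segment. The function name `to_test_channel` is hypothetical, and whether a given backend (such as rocm7.1) is actually published on the test channel is not something this sketch can confirm.

```python
from urllib.parse import urlsplit, urlunsplit


def to_test_channel(index_url: str) -> str:
    """Turn .../whl/<backend> into .../whl/test/<backend>.

    URLs that do not have exactly the whl/<backend> shape (including ones
    already on the test channel) keep their path segments unchanged.
    """
    parts = urlsplit(index_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) == 2 and segments[0] == "whl":
        segments.insert(1, "test")
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))


print(to_test_channel("https://download.pytorch.org/whl/rocm7.1"))
```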

gemini-code-assist · 2026-04-16T23:00:24Z

@@ -1219,7 +1219,7 @@ tomli==2.2.1
    # via schemathesis
 tomli-w==1.2.0
    # via schemathesis
-torch==2.11.0+cu130
+torch==2.12.0+cu130


The lockfile now contains torch==2.12.0+cu130 but torchaudio (line 1248) remains at 2.11.0+cu130. This mismatch is highly likely to cause runtime errors due to ABI incompatibility between different PyTorch minor versions. Both should be updated to 2.12.0 to ensure a stable test environment.

atalman · 2026-04-20

No, this is expected: torchaudio stays on 2.11.

johnnynunez · 2026-04-24T14:40:07Z

Thanks for pushing this, @atalman. I can confirm the AGX Thor triton issues are fixed with 3.7.0:
https://gist.github.com/johnnynunez/622a31294e372c5911d7633c552c4b2e

uv pip install -U vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly/cu130
uv pip install --prerelease=allow --force-reinstall triton --index-url https://download.pytorch.org/whl/test/cu132

cc @mgoin @shakh

## Summary

Adds a Claude Code skill that automates end-to-end triage of failing vLLM Buildkite CI runs for PyTorch version-bump PRs. Derived from the multi-week triage of vLLM PR vllm-project/vllm#40077 (torch 2.12.0 + triton 3.7.0), which produced umbrella issue pytorch/pytorch#180899 with 25+ tracked sub-issues over a series of daily runs.

## What the skill does

- Pulls the failing build's job list from Buildkite REST API and filters to **true hard failures** (excluding `soft_failed=True`, `waiting_failed`, and infra-aborted jobs).
- Compares each failing job against recent main `Full CI run - nightly/daily` builds to drop **pre-existing failures**, with the caveat that infra-killed main jobs are not a valid baseline (must be retried first).
- Pulls and ANSI/timestamp-strips logs for the survivors and matches them against a curated set of **root-cause signature regexes** (Inductor MetaProxy, triton PassManager, AOT cache pickling, custom-op fake-kernel stride mismatch, GPU contention, FP8 / quantized accuracy drift, etc.).
- Routes each root cause to the right repo: pytorch/pytorch (torch / triton / Inductor / Dynamo / AOTAutograd) vs. vllm-project/vllm (multimodal model assertions, custom-op fake-kernel bugs, response APIs).
- Drafts upstream issues with reproducibility tables, environment blocks, and tracebacks, with a strict **draft→confirm→post** protocol, and links them under the umbrella.
- Manages umbrella checklist hygiene (mark closed, reopen on regressions, retract on false positives like the recent #182549 retraction).

## Notable lessons baked in

- `state=failed` + `soft_failed=True` is non-blocking: always filter both.
- `Engine core initialization failed. See root cause above.` is a red herring: the actual exception is several lines up in the EngineCore worker output.
- Custom-op `assert_size_stride` failures on `torch.ops.vllm.<X>.default` are almost always **vLLM-side fake-kernel bugs**, not torch regressions; inspect the `direct_register_custom_op(... fake_impl=...)` registration first.
- Bulk B200 `exit_status=125` + `nvidia-container-cli: device error / driver rpc error: timed out` is agent infra, not a regression; recommend rerun.
- When the same B200 infra cluster wipes out *both* the test-PR and the main-build coverage of a job, the comparison is **inconclusive**: ALWAYS request a main rerun before filing. Filing without that baseline produced a wrongful issue (#182549, retracted).
- Buildkite REST API rate-limit is 400/min. Token must be in a shell variable before parallel curl in `while read` (inline `$(cat …)` silently produces 0-byte log files).
- Title convention: `[vllm] [<sub-area tag>] <concise root cause>`. Always include `[vllm]` from the start (post-hoc edits are noisy).

## Test plan

This skill is invoked manually by Claude Code when the user points at a failing Buildkite build. Validation has been the actual triage of vLLM #40077 over 16+ daily runs since 2026-04-20:

- 25+ pytorch/pytorch issues filed under umbrella #180899, with reproducibility tables and environment blocks.
- Caught real regressions: AsyncTP correctness (#182124), Fullgraph Smoke Test (#182125), Batch Invariance B200 (#181248, fixed by Lucas Kabela's PR), MetaProxy in FP8 fusion (#180906), aten::bmm double-registration (#180905), and others.
- Caught the gpt-oss MoE custom-op stride mismatch as a vLLM-side bug (vllm-project/vllm#41645 / #41646), correctly routing it away from pytorch.
- Caught and retracted a false positive (#182549) once the main nightly was retried, demonstrating the infra-baseline lesson.

No automated test harness exists for Claude skills today; the skill is exercised through the live triage workflow.

## File added

- `.claude/skills/vllm-pytorch-ci-triage/SKILL.md` (379 lines)

This follows the same convention as the four existing skills under `.claude/skills/` (each is a single-file `SKILL.md` with YAML frontmatter).
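The "true hard failures" filter described above might look roughly like the following. This is a sketch under stated assumptions, not the skill's actual code: the field names (`state`, `soft_failed`, `exit_status`) are the ones quoted in the description, the sample jobs are invented, and treating every `exit_status=125` as infra is a simplification (the description pairs it with an `nvidia-container-cli` error in the log).

```python
# Assumption: exit status 125 marks an agent/infra abort, per the B200 lesson above.
INFRA_EXIT_STATUS = 125


def hard_failures(jobs):
    """Keep only jobs that failed for real and block the build."""
    kept = []
    for job in jobs:
        if job.get("state") != "failed":
            continue  # drops passed, waiting_failed, canceled, ...
        if job.get("soft_failed"):
            continue  # failed but non-blocking
        if job.get("exit_status") == INFRA_EXIT_STATUS:
            continue  # likely infra abort; rerun instead of filing
        kept.append(job)
    return kept


# Invented sample data, shaped like a Buildkite job list.
sample_jobs = [
    {"name": "Fullgraph Smoke Test", "state": "failed", "soft_failed": False, "exit_status": 1},
    {"name": "Optional lint", "state": "failed", "soft_failed": True, "exit_status": 1},
    {"name": "Downstream step", "state": "waiting_failed", "soft_failed": False, "exit_status": None},
    {"name": "B200 kernels", "state": "failed", "soft_failed": False, "exit_status": 125},
]

print([job["name"] for job in hard_failures(sample_jobs)])
```

Only the first sample job survives. Anything dropped by the `exit_status` check still needs a log inspection and a rerun before it can be ruled out as a regression.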

Update PyTorch ecosystem versions:

- torch: 2.11.0 → 2.12.0
- torchvision: 0.26.0 → 0.27.0
- triton: 3.6.0 → 3.7.0
- torchaudio: stays at 2.11.0

Use PyTorch test index (download.pytorch.org/whl/test/) for CUDA and CPU packages since torch 2.12.0 is currently on the test channel.

Co-authored-by: Claude
Signed-off-by: atalman <atalman@meta.com>

Update nvidia-cudnn-cu13 (9.19.0.56 -> 9.20.0.48), nvidia-cusparselt-cu13 (0.8.0 -> 0.8.1), and nvidia-nccl-cu13 (2.28.9 -> 2.29.7) to resolve the dependency conflict with torch==2.12.0+cu130.

Co-authored-by: Claude <noreply@anthropic.com>

The `vllm-test-deps` stage copies requirements/test/cuda.in to cpu.in, but cuda.in's top-line `--extra-index-url` points at `https://download.pytorch.org/whl/test/cu130`, which only has CUDA 13 wheels. The old command also passed `--torch-backend cpu`, which pins torch lookups to the stable CPU channel (`whl/cpu`). Neither index has torch==2.12.0 yet, so uv pip compile fails with "No solution found... no version of torch==2.12.0".

Rewrite the extra-index-url in the seeded cpu.in to `whl/test/cpu` (which has torch-2.12.0+cpu wheels) and drop `--torch-backend cpu` so uv uses that index directly.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>
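The index rewrite this commit describes can be illustrated with a small Python function. This is a sketch only: the commit does not show how the Dockerfile performs the rewrite, and `seed_cpu_requirements` is a hypothetical name.

```python
import re

# Matches an --extra-index-url line that points at any PyTorch test-channel backend.
_INDEX_LINE = re.compile(
    r"^(--extra-index-url\s+https://download\.pytorch\.org/whl/test/)\S+\s*$"
)


def seed_cpu_requirements(cuda_in_text: str) -> str:
    """Copy cuda.in content, pointing the test-channel index at the cpu backend."""
    rewritten = []
    for line in cuda_in_text.splitlines():
        match = _INDEX_LINE.match(line)
        rewritten.append(match.group(1) + "cpu" if match else line)
    return "\n".join(rewritten) + "\n"


cuda_in = "--extra-index-url https://download.pytorch.org/whl/test/cu130\ntorch==2.12.0\n"
print(seed_cpu_requirements(cuda_in), end="")
```

All other lines pass through untouched, so the pinned versions in the seeded file stay identical to the CUDA input.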

…DE compat tests

Fixes CPU-Compatibility Tests on torch 2.12. The SDE-emulation tests previously set TORCH_COMPILE_DISABLE=1 to skip torch.compile (slow under SDE). On torch 2.11 this turned every torch.compile call site into a silent no-op. On torch 2.12, call sites that pass fullgraph=True now raise:

RuntimeError: Worker failed with error 'torch.compile with fullgraph=True found no compiled frames. The frame was likely skipped (...).'

Engine init goes through vLLM's piecewise-compile path, which uses fullgraph=True, so init crashes inside determine_available_memory.

Use vLLM's canonical --enforce-eager engine flag instead, which never constructs a torch.compile wrapper at all. Same speedup, no contract violation, works on both torch 2.11 and 2.12.

Tracked upstream as pytorch/pytorch#181247 (under umbrella pytorch/pytorch#180899).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>
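To make the switch concrete, here is a minimal sketch of how a test harness could build its invocation: drop `TORCH_COMPILE_DISABLE` from the environment and pass `--enforce-eager` on the command line instead. The helper name `sde_test_invocation` and the model name are placeholders; the real test script is not shown in this PR page, so its exact command is an assumption.

```python
import os
from typing import Dict, List, Optional, Tuple


def sde_test_invocation(
    model: str, base_env: Optional[Dict[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for an eager-mode vLLM run."""
    env = dict(os.environ if base_env is None else base_env)
    # The old approach: disable torch.compile globally via the environment.
    env.pop("TORCH_COMPILE_DISABLE", None)
    # The new approach: ask vLLM itself to skip compilation.
    cmd = ["vllm", "serve", model, "--enforce-eager"]
    return cmd, env


# "facebook/opt-125m" is only a placeholder model name for this sketch.
cmd, env = sde_test_invocation("facebook/opt-125m", {"TORCH_COMPILE_DISABLE": "1"})
print(cmd)
print("TORCH_COMPILE_DISABLE" in env)
```

The returned pair can be handed to `subprocess.run(cmd, env=env)`.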

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

…d-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)" This reverts commit 20dcd98.

…ad-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)" This reverts commit 4ee7407.

Update PyTorch ecosystem versions:

- torch: 2.11.0 -> 2.12.0
- torchvision: 0.26.0 -> 0.27.0
- triton: 3.6.0 -> 3.7.0
- torchaudio: stays at 2.11.0

Bump CUDA 13 deps to match torch 2.12.0+cu130:

- nvidia-cudnn-cu13: 9.19.0.56 -> 9.20.0.48
- nvidia-cusparselt-cu13: 0.8.0 -> 0.8.1
- nvidia-nccl-cu13: 2.28.9 -> 2.29.7

Use --enforce-eager instead of TORCH_COMPILE_DISABLE=1 in the CPU SDE compat test. On torch 2.11 TORCH_COMPILE_DISABLE turned torch.compile call sites into silent no-ops; on torch 2.12 sites that pass fullgraph=True now raise "found no compiled frames", which crashes engine init via vLLM's piecewise-compile path. --enforce-eager skips the wrapper entirely on both versions.

Supersedes vllm-project#40077 (release wheels are now published, so the download.pytorch.org/whl/test/ indexes are no longer needed).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>

mergify · 2026-05-23T08:49:24Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @atalman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Harry-Chen · 2026-05-27T01:51:08Z

Superseded by #42848.

atalman requested review from LucasWilkinson, bigPYJ1151, tlrmchlsmth and xuechendi as code owners April 16, 2026 22:58

mergify Bot added the ci/build, nvidia, and cpu (Related to CPU backends) labels Apr 16, 2026

github-project-automation Bot added this to NVIDIA Apr 16, 2026

gemini-code-assist Bot reviewed Apr 16, 2026

atalman marked this pull request as draft April 16, 2026 23:01

atalman force-pushed the release_212_tests branch 2 times, most recently from d78656a to 9ecbbd3 Compare April 22, 2026 12:19

atalman force-pushed the release_212_tests branch from ea825d8 to e2a9354 Compare April 22, 2026 23:40

atalman force-pushed the release_212_tests branch from e2a9354 to b3b475f Compare April 24, 2026 11:51

atalman mentioned this pull request Apr 24, 2026

[vllm] [2.12 regression] Qwen2-VL vision-tower-only LoRA generation diverges from golden output pytorch/pytorch#181409

Open

atalman force-pushed the release_212_tests branch from 54590d5 to 62574c0 Compare May 6, 2026 01:04

atalman mentioned this pull request May 6, 2026

[vllm] [2.12 regression][FLASH_ATTN] test_cascade_attention divergence: extra "you" in generated Fibonacci output pytorch/pytorch#182700

Closed

mergify Bot added the multi-modality (Related to multi-modality (#4194)) label May 6, 2026

atalman force-pushed the release_212_tests branch from 3321179 to 4ee7407 Compare May 7, 2026 00:12

yzong-rh mentioned this pull request May 7, 2026

[Bugfix] Fix RuntimeError: Already borrowed by adding thread-safe Hugging Face fast-tokenizer wrappers #41181

Merged

4 tasks

atalman force-pushed the release_212_tests branch from 4ee7407 to 6f07d2f Compare May 7, 2026 13:34

atalman and others added 7 commits May 7, 2026 15:05

Fix B200 batch determinism in fused_moe logic

4c66c02

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

Revert "[Bugfix] Fix RuntimeError: Already borrowed by adding threa…

3261fe1

…d-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)" This reverts commit 20dcd98.

Reapply "[Bugfix] Fix RuntimeError: Already borrowed by adding thre…

26f5b0b

…ad-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)" This reverts commit 4ee7407.

atalman force-pushed the release_212_tests branch from 6f07d2f to 26f5b0b Compare May 7, 2026 22:05

ovidiusm mentioned this pull request May 15, 2026

NIXL EP wheels: Add support for multiple CUDA and PyTorch versions ai-dynamo/nixl#1646

Closed

atalman mentioned this pull request May 16, 2026

[Core] Update PyTorch to 2.12.0, torchvision to 0.27.0, triton to 3.7.0 #42848

Draft

4 tasks

atalman mentioned this pull request May 21, 2026

[claude-skills] Add PyTorch/Triton version-bump CI triage skill vllm-project/ci-infra#360

Merged

mergify Bot added the needs-rebase label May 23, 2026

Harry-Chen closed this May 27, 2026

github-project-automation Bot moved this to Done in NVIDIA May 27, 2026

atalman wants to merge 7 commits into vllm-project:main from atalman:release_212_tests
