
[WIP][Core] Update PyTorch to 2.12.0, torchvision to 0.27.0, triton to 3.7.0 #40077

Closed
atalman wants to merge 7 commits into vllm-project:main from atalman:release_212_tests

Conversation

@atalman (Contributor) commented Apr 16, 2026

Update PyTorch ecosystem versions:

  • torch: 2.11.0 → 2.12.0
  • torchvision: 0.26.0 → 0.27.0
  • triton: 3.6.0 → 3.7.0
  • torchaudio: stays at 2.11.0

Use PyTorch test index (download.pytorch.org/whl/test/) for CUDA and CPU packages since torch 2.12.0 is currently on the test channel.
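The version changes above can be checked mechanically by diffing the old and new pins. This is a minimal sketch, assuming requirement files that use simple `name==version` pins; the helpers `parse_pins` and `diff_pins` are hypothetical (not part of vLLM's tooling), and the `OLD`/`NEW` strings are illustrative sample input rather than the real requirement files. Note that the intentionally unchanged torchaudio pin simply does not show up as a change.

```python
import re

# Matches simple "name==version" pins. Lines such as "-r common.txt" or
# "--extra-index-url ..." never contain "==" right after the name, so they are skipped.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s#;]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: version} for every simple '==' pin in a requirements file."""
    pins = {}
    for line in text.splitlines():
        m = PIN_RE.match(line)
        if m:
            pins[m.group(1).lower()] = m.group(2)
    return pins


def diff_pins(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {package: (old_version, new_version)} only for pins that changed."""
    changed = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


# Illustrative sample input, not copied from the repository.
OLD = """
torch==2.11.0
torchvision==0.26.0
torchaudio==2.11.0
triton==3.6.0
"""
NEW = """
--extra-index-url https://download.pytorch.org/whl/test/cu130
torch==2.12.0
torchvision==0.27.0
torchaudio==2.11.0
triton==3.7.0
"""

if __name__ == "__main__":
    for name, (before, after) in diff_pins(parse_pins(OLD), parse_pins(NEW)).items():
        print(f"{name}: {before} -> {after}")
```

Running it prints one line each for torch, torchvision and triton; torchaudio is absent because its pin did not move.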

Co-authored-by: Claude

Purpose

Test Plan

Test Result


@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the project's PyTorch dependency from version 2.11.0 to 2.12.0 across multiple configuration files, Dockerfiles, and requirement specifications, while also switching to the PyTorch 'test' index URLs. Feedback highlights critical version mismatches where torchaudio remains at 2.11.0, which could cause ABI compatibility issues. Additionally, the ROCm index URL may need to be updated to the 'test' channel to ensure the new packages are found.

Comment thread requirements/cuda.txt
```diff
 # Common dependencies
 -r common.txt
 
 numba == 0.61.2 # Required for N-gram speculative decoding
 
 # Dependencies for NVIDIA GPUs
-torch==2.11.0
+torch==2.12.0
```
Contributor

high

The torch version is updated to 2.12.0, but torchaudio (line 9) remains at 2.11.0. PyTorch ecosystem packages (torch, torchvision, torchaudio) are tightly coupled and typically require matching minor versions. A mismatch here will likely lead to dependency resolution failures or runtime ABI compatibility issues. Please ensure torchaudio is also updated to 2.12.0 to match the rest of the ecosystem.

Contributor Author

No, this is expected: torchaudio stays on 2.11.

```diff
@@ -3,10 +3,10 @@
 
 
 --extra-index-url https://download.pytorch.org/whl/rocm7.1
-torch==2.11.0
-torchvision==0.26.0
+torch==2.12.0
```
Contributor

high

There are two potential issues here:

  1. Version Mismatch: torch is updated to 2.12.0, but torchaudio (line 8) remains at 2.11.0. These should be synchronized to avoid compatibility issues.
  2. Index URL: The index URL on line 5 points to the standard ROCm channel. If torch 2.12.0 is only available on the test channel (as indicated in the PR description for CUDA/CPU), the ROCm build will fail to find the package. Consider updating the index to https://download.pytorch.org/whl/test/rocm7.1 if the package is not in the main channel.

```diff
@@ -1219,7 +1219,7 @@ tomli==2.2.1
 # via schemathesis
 tomli-w==1.2.0
 # via schemathesis
-torch==2.11.0+cu130
+torch==2.12.0+cu130
```
Contributor

high

The lockfile now contains torch==2.12.0+cu130 but torchaudio (line 1248) remains at 2.11.0+cu130. This mismatch is highly likely to cause runtime errors due to ABI incompatibility between different PyTorch minor versions. Both should be updated to 2.12.0 to ensure a stable test environment.

Contributor Author

No, this is expected: torchaudio stays on 2.11.

@atalman marked this pull request as draft April 16, 2026 23:01
@atalman force-pushed the release_212_tests branch 2 times, most recently from d78656a to 9ecbbd3, April 22, 2026 12:19
@atalman force-pushed the release_212_tests branch from ea825d8 to e2a9354, April 22, 2026 23:40
@atalman force-pushed the release_212_tests branch from e2a9354 to b3b475f, April 24, 2026 11:51
@johnnynunez (Contributor) commented Apr 24, 2026

Thanks for pushing this, @atalman. I can confirm the AGX Thor Triton issues were fixed with 3.7.0:
https://gist.github.com/johnnynunez/622a31294e372c5911d7633c552c4b2e

```bash
uv pip install -U vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly/cu130
uv pip install --prerelease=allow --force-reinstall triton --index-url https://download.pytorch.org/whl/test/cu132
```

cc @mgoin @shakh

@atalman force-pushed the release_212_tests branch from 54590d5 to 62574c0, May 6, 2026 01:04
atalman added a commit to pytorch/test-infra that referenced this pull request May 6, 2026
## Summary

Adds a Claude Code skill that automates end-to-end triage of failing
vLLM Buildkite CI runs for PyTorch version-bump PRs.

Derived from the multi-week triage of vLLM PR vllm-project/vllm#40077
(torch 2.12.0 + triton 3.7.0), which produced umbrella issue
pytorch/pytorch#180899 with 25+ tracked sub-issues over a series of
daily runs.

## What the skill does

- Pulls the failing build's job list from Buildkite REST API and filters
to **true hard failures** (excluding `soft_failed=True`,
`waiting_failed`, and infra-aborted jobs).
- Compares each failing job against recent main `Full CI run -
nightly/daily` builds to drop **pre-existing failures**, with the caveat
that infra-killed main jobs are not a valid baseline (must be retried
first).
- Pulls and ANSI/timestamp-strips logs for the survivors and matches
them against a curated set of **root-cause signature regexes** (Inductor
MetaProxy, triton PassManager, AOT cache pickling, custom-op fake-kernel
stride mismatch, GPU contention, FP8 / quantized accuracy drift, etc.).
- Routes each root cause to the right repo: pytorch/pytorch (torch /
triton / Inductor / Dynamo / AOTAutograd) vs. vllm-project/vllm
(multimodal model assertions, custom-op fake-kernel bugs, response
APIs).
- Drafts upstream issues with reproducibility tables, environment
blocks, and tracebacks, with a strict **draft→confirm→post** protocol,
and links them under the umbrella.
- Manages umbrella checklist hygiene (mark closed, reopen on
regressions, retract on false positives like the recent #182549
retraction).
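The first two steps (keep only true hard failures, then drop pre-existing ones against a main nightly baseline) can be sketched as below. This is a minimal illustration, not the skill's actual code: it assumes jobs are dicts shaped like Buildkite REST API job objects with `name`, `state` and `soft_failed` fields, and `infra_killed` is a hypothetical flag standing in for whatever heuristic detects infra-aborted jobs. Fetching the job lists over HTTP is left out.

```python
from typing import Any

Job = dict[str, Any]


def true_hard_failures(jobs: list[Job]) -> list[Job]:
    """Keep only blocking failures: state 'failed', not soft-failed, not infra-killed."""
    return [
        j for j in jobs
        if j.get("state") == "failed"           # drops 'waiting_failed', 'passed', ...
        and not j.get("soft_failed", False)     # soft failures are non-blocking
        and not j.get("infra_killed", False)    # hypothetical flag from an infra heuristic
    ]


def classify_against_main(pr_failures: list[Job], main_jobs: list[Job]) -> dict[str, str]:
    """Label each PR failure 'new', 'pre-existing', or 'inconclusive' against a main build."""
    main_by_name = {j["name"]: j for j in main_jobs}
    verdicts = {}
    for job in pr_failures:
        base = main_by_name.get(job["name"])
        if base is None or base.get("infra_killed", False):
            # No valid baseline: the main job must be retried before anything is filed.
            verdicts[job["name"]] = "inconclusive"
        elif base.get("state") == "failed":
            # Main fails the same job, so the version bump is not the cause.
            verdicts[job["name"]] = "pre-existing"
        else:
            verdicts[job["name"]] = "new"
    return verdicts
```

Only jobs labeled `new` would move on to log analysis; `inconclusive` maps to the "request a main rerun before filing" rule.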

## Notable lessons baked in

- `state=failed` + `soft_failed=True` is non-blocking — always filter
both.
- `Engine core initialization failed. See root cause above.` is a red
herring — the actual exception is several lines up in the EngineCore
worker output.
- Custom-op `assert_size_stride` failures on
`torch.ops.vllm.<X>.default` are almost always **vLLM-side fake-kernel
bugs**, not torch regressions — inspect the
`direct_register_custom_op(... fake_impl=...)` registration first.
- Bulk B200 `exit_status=125` + `nvidia-container-cli: device error /
driver rpc error: timed out` is agent infra, not a regression —
recommend rerun.
- When the same B200 infra cluster wipes out *both* the test-PR and the
main-build coverage of a job, the comparison is **inconclusive** —
ALWAYS request a main rerun before filing. Filing without that baseline
produced a wrongful issue (#182549, retracted).
- The Buildkite REST API rate limit is 400 requests/min. The token must be
stored in a shell variable before running parallel curl inside a `while read`
loop (an inline `$(cat …)` silently produces 0-byte log files).
- Title convention: `[vllm] [<sub-area tag>] <concise root cause>`.
Always include `[vllm]` from the start (post-hoc edits are noisy).
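Several of these lessons reduce to pattern rules applied to cleaned logs. The sketch below is a hypothetical illustration, not the skill's curated signature set: it assumes raw Buildkite logs contain ANSI color codes plus `ESC _bk;t=<millis> BEL` timestamp markers (an assumption about the log format), and its three-entry `SIGNATURES` table is invented from the lessons above. It also encodes the issue-title convention.

```python
import re

# Assumed log noise: Buildkite timestamp markers and ANSI color/style escape codes.
BK_TS_RE = re.compile(r"\x1b_bk;t=\d+\x07")
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
RED_HERRING = "Engine core initialization failed. See root cause above."

# Hypothetical signature table: (name, pattern, target repo, sub-area tag).
SIGNATURES = [
    # Agent infra: recommend a rerun, never file an issue.
    ("gpu-infra", re.compile(r"nvidia-container-cli: device error|driver rpc error: timed out"),
     None, None),
    # Custom-op stride asserts on torch.ops.vllm.* usually mean a vLLM fake-kernel bug.
    ("fake-kernel-stride", re.compile(r"assert_size_stride[\s\S]*?torch\.ops\.vllm\.\w+\.default"),
     "vllm-project/vllm", "custom-op"),
    ("inductor-metaproxy", re.compile(r"MetaProxy"), "pytorch/pytorch", "inductor"),
]


def clean_log(raw: str) -> str:
    """Strip timestamp markers and ANSI codes, then drop the red-herring engine line."""
    text = ANSI_RE.sub("", BK_TS_RE.sub("", raw))
    return "\n".join(line for line in text.splitlines() if RED_HERRING not in line)


def classify(raw_log: str) -> tuple:
    """Return (signature, repo, sub_area) for the first matching signature, else unknown."""
    text = clean_log(raw_log)
    for name, pattern, repo, area in SIGNATURES:
        if pattern.search(text):
            return name, repo, area
    return "unknown", None, None


def issue_title(area: str, root_cause: str) -> str:
    """Title convention from the skill: always lead with [vllm]."""
    return f"[vllm] [{area}] {root_cause}"
```

Order matters in the table: the infra check runs first, so a job killed by the container runtime is never routed to either repo.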

## Test plan

This skill is invoked manually by Claude Code when the user points at a
failing Buildkite build. Validation has been the actual triage of vLLM
#40077 over 16+ daily runs since 2026-04-20:

- 25+ pytorch/pytorch issues filed under umbrella #180899, with
reproducibility tables and environment blocks.
- Caught real regressions: AsyncTP correctness (#182124), Fullgraph
Smoke Test (#182125), Batch Invariance B200 (#181248, fixed by Lucas
Kabela's PR), MetaProxy in FP8 fusion (#180906), aten::bmm
double-registration (#180905), and others.
- Caught the gpt-oss MoE custom-op stride mismatch as a vLLM-side bug
(vllm-project/vllm#41645 / #41646), correctly routing it away from
pytorch.
- Caught and retracted a false positive (#182549) once the main nightly
was retried, demonstrating the infra-baseline lesson.

No automated test harness exists for Claude skills today; the skill is
exercised through the live triage workflow.

## File added

- `.claude/skills/vllm-pytorch-ci-triage/SKILL.md` (379 lines)

This follows the same convention as the four existing skills under
`.claude/skills/` (each is a single-file `SKILL.md` with YAML
frontmatter).
@mergify (bot) added the multi-modality label May 6, 2026
@atalman force-pushed the release_212_tests branch from 3321179 to 4ee7407, May 7, 2026 00:12
@atalman force-pushed the release_212_tests branch from 4ee7407 to 6f07d2f, May 7, 2026 13:34
atalman and others added 7 commits May 7, 2026 15:05
Update PyTorch ecosystem versions:
- torch: 2.11.0 → 2.12.0
- torchvision: 0.26.0 → 0.27.0
- triton: 3.6.0 → 3.7.0
- torchaudio: stays at 2.11.0

Use PyTorch test index (download.pytorch.org/whl/test/) for CUDA
and CPU packages since torch 2.12.0 is currently on the test channel.

Co-authored-by: Claude
Signed-off-by: atalman <atalman@meta.com>

Update nvidia-cudnn-cu13 (9.19.0.56 -> 9.20.0.48),
nvidia-cusparselt-cu13 (0.8.0 -> 0.8.1), and
nvidia-nccl-cu13 (2.28.9 -> 2.29.7) to resolve the dependency
conflict with torch==2.12.0+cu130.

Co-authored-by: Claude <noreply@anthropic.com>

The `vllm-test-deps` stage copies requirements/test/cuda.in to cpu.in,
but cuda.in's top-line `--extra-index-url` points at
`https://download.pytorch.org/whl/test/cu130` which only has CUDA 13
wheels. The old command also passed `--torch-backend cpu`, which pins
torch lookups to the stable CPU channel (`whl/cpu`) — neither index
has torch==2.12.0 yet, so uv pip compile fails with
"No solution found... no version of torch==2.12.0".

Rewrite the extra-index-url in the seeded cpu.in to
`whl/test/cpu` (which has torch-2.12.0+cpu wheels) and drop
`--torch-backend cpu` so uv uses that index directly.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>
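The index rewrite described in this commit amounts to a text substitution on the seeded `cpu.in`. The real change lives in the `vllm-test-deps` Docker stage; the Python below is only an illustrative equivalent under that assumption, and `seed_cpu_in` is a hypothetical name. The commit's second half (dropping `--torch-backend cpu` from the compile step) is not modeled here.

```python
CUDA_TEST_INDEX = "https://download.pytorch.org/whl/test/cu130"
CPU_TEST_INDEX = "https://download.pytorch.org/whl/test/cpu"


def seed_cpu_in(cuda_in: str) -> str:
    """Copy cuda.in content, pointing its CUDA test-index line at the CPU test index."""
    out = []
    for line in cuda_in.splitlines():
        # Only rewrite the --extra-index-url line that targets the CUDA 13 test channel.
        if line.strip().startswith("--extra-index-url") and CUDA_TEST_INDEX in line:
            line = line.replace(CUDA_TEST_INDEX, CPU_TEST_INDEX)
        out.append(line)
    return "\n".join(out) + "\n"
```

Every other line is copied unchanged, so the CPU resolve sees the same pins as the CUDA one, only served from `whl/test/cpu`.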
…DE compat tests

Fixes CPU-Compatibility Tests on torch 2.12.

The SDE-emulation tests previously set TORCH_COMPILE_DISABLE=1 to skip
torch.compile (slow under SDE). On torch 2.11 this turned every torch.compile
call site into a silent no-op. On torch 2.12, call sites that pass
fullgraph=True now raise:

  RuntimeError: Worker failed with error 'torch.compile with fullgraph=True
  found no compiled frames. The frame was likely skipped (...).'

Engine init goes through vLLM's piecewise-compile path, which uses
fullgraph=True, so init crashes inside determine_available_memory.

Use vLLM's canonical --enforce-eager engine flag instead, which never
constructs a torch.compile wrapper at all. Same speedup, no contract
violation, works on both torch 2.11 and 2.12.

Tracked upstream as pytorch/pytorch#181247 (under umbrella pytorch/pytorch#180899).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
…d-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)"

This reverts commit 20dcd98.

…ad-safe Hugging Face fast-tokenizer wrappers (vllm-project#41181)"

This reverts commit 4ee7407.
@atalman force-pushed the release_212_tests branch from 6f07d2f to 26f5b0b, May 7, 2026 22:05
atalman added a commit to atalman/vllm that referenced this pull request May 19, 2026
Update PyTorch ecosystem versions:
- torch: 2.11.0 -> 2.12.0
- torchvision: 0.26.0 -> 0.27.0
- triton: 3.6.0 -> 3.7.0
- torchaudio: stays at 2.11.0

Bump CUDA 13 deps to match torch 2.12.0+cu130:
- nvidia-cudnn-cu13: 9.19.0.56 -> 9.20.0.48
- nvidia-cusparselt-cu13: 0.8.0 -> 0.8.1
- nvidia-nccl-cu13: 2.28.9 -> 2.29.7

Use --enforce-eager instead of TORCH_COMPILE_DISABLE=1 in the
CPU SDE compat test. On torch 2.11 TORCH_COMPILE_DISABLE turned
torch.compile call sites into silent no-ops; on torch 2.12 sites
that pass fullgraph=True now raise "found no compiled frames",
which crashes engine init via vLLM's piecewise-compile path.
--enforce-eager skips the wrapper entirely on both versions.

Supersedes vllm-project#40077 (release wheels are now published, so the
download.pytorch.org/whl/test/ indexes are no longer needed).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: atalman <atalman@meta.com>
atalman added a commit to atalman/vllm that referenced this pull request May 20, 2026
atalman added a commit to atalman/vllm that referenced this pull request May 21, 2026
@mergify (bot, Contributor) commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @atalman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify (bot) added the needs-rebase label May 23, 2026
@Harry-Chen (Member)

Superseded by #42848.

@Harry-Chen closed this May 27, 2026
@github-project-automation (bot) moved this to Done in NVIDIA May 27, 2026

Labels

  • ci/build
  • cpu: Related to CPU backends
  • multi-modality: Related to multi-modality (#4194)
  • needs-rebase
  • nvidia

Projects

Status: Done


4 participants