[Bugfix] Remove nested torch.compile in GDN rearrange_mixed_qkv causing CUDA graph capture failure #42070
Conversation
Remove the nested `@torch.compile(fullgraph=True)` decorator that triggered Triton autotuning (`torch.cuda.synchronize`) during CUDA graph capture. The method is already compiled by the outer AOT compilation pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Code Review
This pull request removes the `@torch.compile(fullgraph=True)` decorator from the `rearrange_mixed_qkv` method in `vllm/model_executor/layers/mamba/gdn_linear_attn.py`. I have no feedback to provide as there were no review comments to evaluate.
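To make the failure mode concrete, here is a minimal, torch-free sketch of why the nested decorator breaks capture: device synchronization is illegal while a CUDA graph is being captured, and a nested compile that autotunes on first call synchronizes at exactly that moment. All names below (`CaptureState`, `autotuned`, `rearrange`) are hypothetical stand-ins for illustration, not vLLM or PyTorch APIs.

```python
class CaptureState:
    """Stand-in for the CUDA-graph capture flag on the current stream."""
    capturing = False

def synchronize():
    # Real torch.cuda.synchronize() errors out during graph capture.
    if CaptureState.capturing:
        raise RuntimeError("operation not permitted during CUDA graph capture")

def autotuned(fn):
    """Stand-in for a nested torch.compile: autotunes (and therefore
    synchronizes) on the first call, then reuses the tuned kernel."""
    tuned = {"done": False}
    def wrapper(*args):
        if not tuned["done"]:
            synchronize()        # Triton autotuning benchmarks kernels here
            tuned["done"] = True
        return fn(*args)
    return wrapper

@autotuned
def rearrange(x):
    return list(reversed(x))

# First call happens inside capture -> autotune -> synchronize -> failure.
CaptureState.capturing = True
try:
    rearrange([1, 2, 3])
except RuntimeError as err:
    print("capture failed:", err)

# Warming up (autotuning) outside capture first makes later captures safe,
# which is the idea behind the warmup-path fix referenced below.
CaptureState.capturing = False
rearrange([1, 2, 3])
CaptureState.capturing = True
print(rearrange([4, 5, 6]))      # [6, 5, 4]
```

Removing the inner decorator sidesteps the problem entirely: the outer AOT compilation pass already compiles the method, so the nested compile only added a second, capture-unsafe tuning step.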
@tdoublep, a question: the PR that introduced this seems to be able to run Qwen3.5 on B200 (https://buildkite.com/vllm/ci/builds/65043/canvas?jid=019e05df-d38b-4432-b813-5f66b42e419a&tab=output). If I understand correctly, the
Sorry about this, and you have my spiritual approval. When this was introduced, there were some noticeable additional fusions from torch.compile (with an older vLLM, though, and before some other GDN changes), so there might be some lost perf, but of course this should be done to fix the breakage.
Reproduced. And it only happens with spec decoding.
I think the best way is to change the kernel to accept non-contiguous input.
That's why the CI also didn't catch this issue.
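The suggestion above is about kernel input layout: a transposed or sliced tensor is a view with non-unit strides, so a kernel that indexes the underlying buffer linearly either needs a `.contiguous()` copy first or must take strides as arguments. A minimal sketch in NumPy (the same stride semantics apply to torch tensors):

```python
import numpy as np

# A 2x3 row-major array; its transpose is a strided view, not a copy.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = a.T                              # shape (3, 2), strides swapped

print(t.flags["C_CONTIGUOUS"])       # False: linear indexing would misread it

# The easy fix (what callers often do today): copy into row-major layout.
c = np.ascontiguousarray(t)
print(c.flags["C_CONTIGUOUS"])       # True
print(np.array_equal(t, c))          # True: same values, new memory layout
```

Teaching the kernel to accept strides directly avoids that extra copy on the hot path, which is the trade-off being proposed.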
@tjtanaa No test group; I was just deploying the model with MTP, similar to the above example. Surprised it is not caught by tests, though.
There is no Qwen3.5 test on CI, it seems. There is only one test group, lm-eval Qwen3.5 (B200), which I had to trigger manually.
Should we rebase this on latest main and rerun the relevant Buildkite jobs? Thanks. |
Could a maintainer please rerun the two failing Buildkite jobs? The PR change is limited to removing one nested `@torch.compile(fullgraph=True)` decorator. Thanks.
[Bugfix] Remove nested torch.compile in GDN rearrange_mixed_qkv causing CUDA graph capture failure (vllm-project#42070)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Summary
- Removes `@torch.compile(fullgraph=True)` from `rearrange_mixed_qkv`, which triggers Triton autotuning (`torch.cuda.synchronize()`) during CUDA graph capture

Reproduction
`vllm serve Qwen/Qwen3.5-35B-A3B --language-model-only --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'`

Fails with:
Related
- Pre-compiling `causal_conv1d` Triton kernels via warmup path

Test plan
- `Qwen/Qwen3.5-35B-A3B` with MTP spec decode starts successfully on GB200

🤖 Generated with Claude Code