[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs#36876
tdoublep merged 5 commits into vllm-project:main
Conversation
|
Sorry about this. I think we still need this because the FlashInfer kernel is a JIT kernel. |
|
@ZJY0516 OK, then we just need to fix the return types. Let me update the PR. |
Force-pushed 99d804b to 927f39c
Code Review
This pull request correctly addresses a crash during the Gated Delta Net (GDN) layer warmup on SM90 GPUs. The fix involves skipping the warmup, which is intended for Triton autotuning, on SM90 architectures as they use the FlashInfer backend and do not require this step. The change is implemented by adding a conditional check for CUDA and SM90 device capability. While the fix is correct, I've added a comment regarding code duplication that could be addressed to improve long-term maintainability.
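As a hedged illustration of the guard this review describes (the PR was later revised to fix the return types instead of skipping the warmup entirely), the backend decision boils down to checking for a CUDA device with SM90 compute capability. The function name and signature below are hypothetical, not from the PR:

```python
def should_skip_gdn_warmup(device_type: str, capability: tuple[int, int]) -> bool:
    """Return True when the GDN warmup (Triton autotuning) can be skipped.

    On CUDA devices with SM90 compute capability (H100/H200), the FlashInfer
    backend is used, so the Triton autotune warmup step is unnecessary.
    """
    return device_type == "cuda" and capability[0] == 9

# In practice the inputs would come from torch.cuda.get_device_capability().
```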
vllm/model_executor/models/qwen3_next.py (682-683)
While this check correctly fixes the issue, it duplicates the logic from ChunkGatedDeltaRule.__init__ which is used to determine whether to use the FlashInfer backend. This could lead to future maintenance issues if the backend selection logic changes but this check is not updated in tandem.
To improve maintainability and avoid this duplication, consider centralizing the backend choice logic. For example, you could add a property to the ChunkGatedDeltaRule class to indicate which backend is in use:
```python
# In ChunkGatedDeltaRule
@property
def uses_flashinfer(self) -> bool:
    return self._forward_method == self.forward_cuda
```

Then, you could use this property here to make the decision, ensuring the warmup logic always stays in sync with the actual backend being used:

```python
# In _warmup_prefill_kernels
if self.chunk_gated_delta_rule.uses_flashinfer:
    return
```

This would make the code more robust to future changes.
FlashInfer's chunk_gated_delta_rule returns a single tensor when output_final_state=False, but the wrapper always unpacked two values. This caused a ValueError during GDN kernel warmup (added in vllm-project#36599) on SM90 GPUs (H100/H200). Handle the return value based on output_final_state: unpack the tuple when True, use the single tensor when False. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
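The unpacking fix described in this commit message can be sketched as follows. The wrapper name mirrors `fi_chunk_gated_delta_rule` from the summary, but the kernel here is a stand-in passed as an argument, so treat this as an illustration of the return-value handling rather than vLLM's actual code:

```python
def fi_chunk_gated_delta_rule(kernel, *args, output_final_state: bool = False):
    """Normalize the kernel's return value for downstream callers.

    The kernel returns (core_out, final_state) when output_final_state=True,
    but only core_out when it is False; unconditionally unpacking two values
    raises a ValueError in the latter case.
    """
    result = kernel(*args, output_final_state=output_final_state)
    if output_final_state:
        core_attn_out, final_state = result
    else:
        core_attn_out, final_state = result, None
    return core_attn_out, final_state
```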
Force-pushed 927f39c to 5fea689
|
@ZJY0516 Please take another look. We now fix the error more explicitly, and still allow the warmup phase to happen when using FI kernels. |
|
Hi @tdoublep, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
|
|
Could we merge this asap? |
|
Can this fix get merged? |
…ect#36876) Signed-off-by: whycoming <120623296@qq.com>
…ect#36876) Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
…ect#36876) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
Summary
`chunk_gated_delta_rule` returns a single tensor when `output_final_state=False`, but `fi_chunk_gated_delta_rule` always unpacked two values, causing a `ValueError`. Handle the return value based on `output_final_state`: unpack the tuple when `True`, use the single tensor when `False`.
Error before fix
This error repeats for T=16, T=32, and T=64 for each GDN layer.
Test plan
`tests/v1/e2e/test_mamba_prefix_cache.py::test_mamba_prefix_cache` on H100 (SM90) with all caches cleared: passes without the `ValueError` warning.
🤖 Generated with Claude Code