[ROCm][Critical] Fix the GDN import bug #43486

Merged
DarkLight1337 merged 1 commit into vllm-project:main from EmbeddedLLM:bugfixrocmgdn on May 23, 2026
@tjtanaa tjtanaa commented May 23, 2026

Purpose

#41126 changed the import path of the GDN module, which now causes an import error. The error is critical because it crashes the server whenever VLLM_ROCM_USE_AITER=1 is set:

(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     return self._compiled_callable.aot_compile((args, kwargs))
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in aot_compile
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     return aot_compile_fullgraph(
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]            ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 239, in aot_compile_fullgraph
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     compiled_fn = backend(
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]                   ^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2509, in __call__
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     return func(*args, **kwds)
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1163, in __call__
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     self.configure_post_pass()
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 948, in configure_post_pass
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     self.pass_manager.configure(self.vllm_config)
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/passes/pass_manager.py", line 163, in configure
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     RocmAiterRMSNormQuantFusionPass(config),
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/passes/inductor_pass.py", line 139, in fn_new
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     result = fn(*args, **kwargs)
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]              ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/passes/fusion/rocm_aiter_fusion.py", line 563, in __init__
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165]     from vllm.model_executor.layers.mamba.gdn_linear_attn import (
(EngineCore pid=2818) ERROR 05-23 13:50:24 [core.py:1165] ModuleNotFoundError: No module named 'vllm.model_executor.layers.mamba.gdn_linear_attn'

Test Plan

  1. Fix the unit test: pytest -svvvv tests/compile/passes/test_fusion.py::test_aiter_fusion_rmsnorm_gated_quant

  2. Verify that the following command works and that the model produces correct end-to-end accuracy:

#!/bin/bash

rm -rf ~/.cache/vllm
export VLLM_ROCM_USE_AITER=1

vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["-rms_norm","-silu_and_mul","+quant_fp8"],"pass_config":{"fuse_norm_quant":true}}'
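The --compilation-config value above is a JSON string, which is easy to break with shell quoting. As a minimal sketch (not part of this PR), the same argument can be generated from a Python dict. The helper name build_serve_command is hypothetical; the model name, keys, and values are copied from the script above.

```python
import json
import shlex

# Compilation settings copied from the test-plan script above.
compilation_config = {
    "cudagraph_mode": "FULL_AND_PIECEWISE",
    "custom_ops": ["-rms_norm", "-silu_and_mul", "+quant_fp8"],
    "pass_config": {"fuse_norm_quant": True},
}


def build_serve_command(model: str, config: dict) -> list[str]:
    """Hypothetical helper: build the argv list for `vllm serve`."""
    # Compact separators reproduce the whitespace-free JSON used in the script.
    config_json = json.dumps(config, separators=(",", ":"))
    return ["vllm", "serve", model, "--compilation-config", config_json]


command = build_serve_command(
    "Qwen/Qwen3-Next-80B-A3B-Instruct-FP8", compilation_config
)

# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(command))
```

The environment variable VLLM_ROCM_USE_AITER=1 still has to be exported separately, as in the script.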

Test Result

  1. pytest -svvvv tests/compile/passes/test_fusion.py::test_aiter_fusion_rmsnorm_gated_quant
======================= 2 passed, 21 warnings in 11.54s ========================
  2. Accuracy of Qwen/Qwen3-Next-80B-A3B-Instruct-FP8
local-completions ({'model': 'Qwen/Qwen3-Next-80B-A3B-Instruct-FP8', 'base_url': 'http://0.0.0.0:8000/v1/completions', 'num_concurrent': 256, 'max_retries': 10, 'max_gen_toks': 2048, 'max_length': 1048576, 'timeout': 60000}), gen_kwargs: ({}), limit: None, num_fewshot: 8, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9507|±  |0.0060|
|     |       |strict-match    |     8|exact_match|↑  |0.9431|±  |0.0064|
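To compare these scores against another run, the Value and Stderr columns can be turned into approximate confidence intervals. The sketch below assumes a normal approximation with z = 1.96 (roughly 95%), which is an assumption of this example and not something the evaluation output states. The function name is hypothetical; the numbers are copied from the table above.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for value +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# Values and standard errors copied from the gsm8k table above.
results = {
    "flexible-extract": (0.9507, 0.0060),
    "strict-match": (0.9431, 0.0064),
}

for filter_name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{filter_name}: {value:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

A baseline score that falls inside the interval is within noise under this approximation; one that falls outside may deserve a closer look.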

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify Bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels May 23, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD May 23, 2026

tjtanaa commented May 23, 2026

@tpopp Please check whether your optimization in #40710 still works after the Mamba refactoring in #41126.

This PR is needed as a quick, critical fix because it affects all models when AITER is used.

@tjtanaa tjtanaa added the ready (ONLY add when PR is ready to merge/full CI is needed) label May 23, 2026
@tjtanaa tjtanaa requested a review from DarkLight1337 May 23, 2026 15:20

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the import path for GatedDeltaNetAttention to its new location in vllm.model_executor.layers.mamba.gdn.base within the ROCm Aiter fusion pass and the fusion test suite. It also adds a type ignore annotation to handle type-checking issues during layer discovery. I have no feedback to provide as the existing review comments were purely explanatory and did not identify any issues.
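The failure mode here is a hard-coded module path that went stale after a refactor. According to the summary above, the PR itself simply points the import at the new location. As a separate illustration only, the sketch below shows a defensive pattern that tries several candidate paths in order. The helper names import_first_available and load_gated_delta_net_attention are hypothetical; the two module paths come from the traceback (old) and the review summary (new), and the sketch assumes the class is exposed under the same name in whichever module loads.

```python
import importlib
from types import ModuleType


def import_first_available(candidates: list[str]) -> ModuleType:
    """Return the first module in `candidates` that can be imported."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Note: this also catches a missing dependency inside the module.
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError(
        "none of the candidate modules could be imported:\n" + "\n".join(errors)
    )


# New path first (from the review summary), old path second (from the traceback).
GDN_MODULE_CANDIDATES = [
    "vllm.model_executor.layers.mamba.gdn.base",
    "vllm.model_executor.layers.mamba.gdn_linear_attn",
]


def load_gated_delta_net_attention():
    """Hypothetical helper: look up GatedDeltaNetAttention lazily."""
    module = import_first_available(GDN_MODULE_CANDIDATES)
    # Assumes the class name is unchanged across both locations.
    return module.GatedDeltaNetAttention
```

A fallback like this trades a crash at startup for extra indirection, so a direct import of the new path, as done in this PR, is the simpler choice once the old path is gone.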

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 23, 2026 15:37
@DarkLight1337 DarkLight1337 merged commit 46f95b2 into vllm-project:main May 23, 2026
65 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 23, 2026
lrioxh pushed a commit to lrioxh/vllm-dev that referenced this pull request May 24, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 26, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Liuweixiong0118 pushed a commit to Liuweixiong0118/vllm that referenced this pull request Jun 1, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Liuweixiong0118 <lwx34158427@gmail.com>

Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done

3 participants