[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model#27165

Merged
ProExpertProg merged 29 commits into vllm-project:main from izhuhaoran:fuse-qknorm-rope-compile
Nov 11, 2025
Conversation

@izhuhaoran
Contributor

@izhuhaoran izhuhaoran commented Oct 19, 2025

Purpose

Inspired by TensorRT-LLM. This PR is a follow-up to #27018: it fuses QNorm, KNorm, and RoPE into a single CUDA kernel for the Qwen3 model, improving inference performance. The fusion is implemented as a custom torch.compile pass; users can enable it with:

 --compilation-config='{"use_inductor": 1,  "pass_config": {"enable_qk_norm_rope_fusion": 1}}'

See #27018 for more details.
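For example, when serving a Qwen3 model (the model name below is illustrative; only the `--compilation-config` flag comes from this PR):

```shell
# Enable the QK Norm + RoPE fusion pass (requires Inductor)
vllm serve Qwen/Qwen3-8B \
  --compilation-config='{"use_inductor": 1, "pass_config": {"enable_qk_norm_rope_fusion": 1}}'
```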

Result GPU Trace

  • Main - No Inductor, all custom ops: (GPU trace screenshot)
  • Main - with Inductor: (GPU trace screenshot)
  • This PR: (GPU trace screenshot)
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a fused CUDA kernel for QK Normalization and RoPE for the Qwen model, aiming to improve inference performance. The fusion is implemented as a torch.compile pass. The changes include the CUDA kernel, its PyTorch bindings, the fusion pass logic, and integration into the model and build system. A new test is also added to verify the fusion.

The overall approach is solid and follows existing patterns in the codebase for custom ops and fusions. However, I've found a critical issue in the fusion pass implementation that causes the fusion to produce incorrect results. The output of the fused operation is not correctly propagated in the graph, making the fusion effectively a no-op. Please see the detailed comment for the fix.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@ZJY0516
Member

ZJY0516 commented Oct 19, 2025

call_function  split_with_sizes        aten.split_with_sizes.default       (mm_3, [4096, 1024, 1024], -1)                                   {}
call_function  getitem_6               <built-in function getitem>         (split_with_sizes, 0)                                            {}
call_function  getitem_7               <built-in function getitem>         (split_with_sizes, 1)                                            {}
call_function  getitem_8               <built-in function getitem>         (split_with_sizes, 2)                                            {}
call_function  empty                   aten.empty.memory_format            ([arg1_1, 32, 128],)                                             {'dtype': torch.bfloat16, 'layout': torch.strided, 'device': device(type='cuda', index=0), 'pin_memory': False}
call_function  permute_4               aten.permute.default                (empty, [0, 1, 2])                                               {}
call_function  view_1                  aten.reshape.default                (getitem_6, [arg1_1, 32, 128])                                   {}
call_function  clone                   aten.clone.default                  (view_1,)                                                        {'memory_format': torch.contiguous_format}
call_function  auto_functionalized_2   auto_functionalized                 (<OpOverload(op='_C.rms_norm', overload='default')>,)            {'result': permute_4, 'input': clone, 'weight': arg9_1, 'epsilon': 1e-06}
call_function  getitem_10              <built-in function getitem>         (auto_functionalized_2, 1)                                       {}
call_function  empty_1                 aten.empty.memory_format            ([arg1_1, 8, 128],)                                              {'dtype': torch.bfloat16, 'layout': torch.strided, 'device': device(type='cuda', index=0), 'pin_memory': False}
call_function  permute_5               aten.permute.default                (empty_1, [0, 1, 2])                                             {}
call_function  view_3                  aten.reshape.default                (getitem_7, [arg1_1, 8, 128])                                    {}
call_function  clone_1                 aten.clone.default                  (view_3,)                                                        {'memory_format': torch.contiguous_format}
call_function  auto_functionalized_3   auto_functionalized                 (<OpOverload(op='_C.rms_norm', overload='default')>,)            {'result': permute_5, 'input': clone_1, 'weight': arg10_1, 'epsilon': 1e-06}
call_function  getitem_12              <built-in function getitem>         (auto_functionalized_3, 1)                                       {}
call_function  view_5                  aten.reshape.default                (getitem_10, [arg1_1, 4096])                                     {}
call_function  view_6                  aten.reshape.default                (getitem_12, [arg1_1, 1024])                                     {}
call_function  auto_functionalized_4   auto_functionalized                 (<OpOverload(op='_C.rotary_embedding', overload='default')>,)    {'positions': arg11_1, 'query': view_5, 'key': view_6, 'head_size': 128, 'cos_sin_cache': arg13_1, 'is_neox': True}

The target graph for replacement is quite large. Using pattern matching here, as we do in other passes, may not scale effectively and could become a maintenance burden.
Do you have any suggestions? @ProExpertProg
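For reference, the graph above pins down what the fused kernel must compute per token: per-head RMSNorm on Q and K (eps 1e-6), then neox-style rotary embedding (head_size 128, 32 query heads, 8 key heads). A hedged NumPy sketch of that unfused reference semantics (shapes taken from the graph dump; function names are mine, not vLLM's):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize each head vector by its root-mean-square, then scale by weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def rope_neox(x, cos, sin):
    # Neox-style rotation: split the head dim into halves and rotate pairs.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def qk_norm_rope_reference(q, k, q_weight, k_weight, cos, sin, eps=1e-6):
    # Unfused reference for what the single fused kernel launch computes.
    q = rope_neox(rms_norm(q, q_weight, eps), cos, sin)
    k = rope_neox(rms_norm(k, k_weight, eps), cos, sin)
    return q, k
```

This is only a semantic reference, not the kernel; the fused CUDA implementation does all four steps in one launch without materializing the intermediate normed tensors.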

Collaborator

@ProExpertProg ProExpertProg left a comment


Two nits in the kernel, otherwise LGTM!

@ProExpertProg ProExpertProg added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Nov 7, 2025
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@izhuhaoran izhuhaoran force-pushed the fuse-qknorm-rope-compile branch from b9cee22 to 32e0171 on November 10, 2025 16:50
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@izhuhaoran izhuhaoran force-pushed the fuse-qknorm-rope-compile branch from 32e0171 to b23467c on November 10, 2025 16:51
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 11, 2025
@ProExpertProg ProExpertProg merged commit 68c09ef into vllm-project:main Nov 11, 2025
92 checks passed
@github-project-automation github-project-automation bot moved this from To triage to Done in torch.compile integration Nov 11, 2025
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 11, 2025
@JartX
Contributor

JartX commented Nov 12, 2025

Hi @izhuhaoran, this PR breaks model loading for Qwen3VL MoE and Dense on ROCm RDNA3:

vllm1-1  | (Worker_TP0_EP0 pid=39) ERROR 11-12 09:04:28 [multiproc_executor.py:639] WorkerProc failed to start.
vllm2-1  | (Worker_TP0_EP0 pid=39) ERROR 11-12 09:04:28 [multiproc_executor.py:639] WorkerProc failed to start.
vllm2-1  | (Worker_TP1_EP1 pid=40) ERROR 11-12 09:04:28 [multiproc_executor.py:639] WorkerProc failed to start.
(identical traceback from each worker:)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 613, in worker_main
    worker = WorkerProc(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 468, in __init__
    self.worker.load_model()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 267, in load_model
    self.model_runner.load_model(eep_scale_up=eep_scale_up)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3064, in load_model
    self.model = model_loader.load_model(
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
    model = initialize_model(
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 384, in __init__
    self.language_model = Qwen3MoeLLMForCausalLM(
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 330, in __init__
    self.model = Qwen3MoeLLMModel(
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 276, in __init__
    old_init(self, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 79, in __init__
    super().__init__(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 293, in __init__
    TorchCompileWrapperWithCustomDispatcher.__init__(
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 44, in __init__
    backend = vllm_config.compilation_config.init_backend(vllm_config)
  File "/usr/local/lib/python3.12/dist-packages/vllm/config/compilation.py", line 791, in init_backend
    from vllm.compilation.backends import VllmBackend
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 40, in <module>
    from .pass_manager import PostGradPassManager
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/pass_manager.py", line 20, in <module>
    from .qk_norm_rope_fusion import QKNormRoPEFusionPass
  File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/qk_norm_rope_fusion.py", line 24, in <module>
    FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1364, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'fused_qk_norm_rope'
vllm1-1  | (Worker_TP0_EP0 pid=39) INFO 11-12 09:04:28 [multiproc_executor.py:600] Parent process exited, terminating worker
vllm1-1  | (Worker_TP1_EP1 pid=40) INFO 11-12 09:04:28 [multiproc_executor.py:600] Parent process exited, terminating worker
vllm2-1  | (Worker_TP0_EP0 pid=39) INFO 11-12 09:04:28 [multiproc_executor.py:600] Parent process exited, terminating worker
vllm2-1  | (Worker_TP1_EP1 pid=40) INFO 11-12 09:04:28 [multiproc_executor.py:600] Parent process exited, terminating worker
vllm1-1  | [rank0]:[W1112 09:04:29.656155644 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
vllm2-1  | [rank0]:[W1112 09:04:29.706185782 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
vllm1-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855] EngineCore failed to start.
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855] EngineCore failed to start.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 619, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 103, in __init__
    self.model_executor = executor_class(vllm_config)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 147, in _init_executor
    self.workers = WorkerProc.wait_for_ready(unready_workers)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 551, in wait_for_ready
vllm1-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]     raise e from None
vllm1-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
vllm1-1  | (EngineCore_DP0 pid=29) Process EngineCore_DP0:
vllm1-1  | (EngineCore_DP0 pid=29) Traceback (most recent call last):
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]     self._init_executor()
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 147, in _init_executor
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]     self.workers = WorkerProc.wait_for_ready(unready_workers)
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 551, in wait_for_ready
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855]     raise e from None
vllm2-1  | (EngineCore_DP0 pid=29) ERROR 11-12 09:04:30 [core.py:855] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
vllm2-1  | (EngineCore_DP0 pid=29) Process EngineCore_DP0:
vllm2-1  | (EngineCore_DP0 pid=29) Traceback (most recent call last):
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
vllm1-1  | (EngineCore_DP0 pid=29)     self.run()
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
vllm1-1  | (EngineCore_DP0 pid=29)     self._target(*self._args, **self._kwargs)
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 859, in run_engine_core
vllm1-1  | (EngineCore_DP0 pid=29)     raise e
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
vllm1-1  | (EngineCore_DP0 pid=29)     engine_core = EngineCoreProc(*args, **kwargs)
vllm1-1  | (EngineCore_DP0 pid=29)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 619, in __init__
vllm1-1  | (EngineCore_DP0 pid=29)     super().__init__(
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 103, in __init__
vllm1-1  | (EngineCore_DP0 pid=29)     self.model_executor = executor_class(vllm_config)
vllm1-1  | (EngineCore_DP0 pid=29)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm1-1  | (EngineCore_DP0 pid=29)     self._init_executor()
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 147, in _init_executor
vllm1-1  | (EngineCore_DP0 pid=29)     self.workers = WorkerProc.wait_for_ready(unready_workers)
vllm2-1  | (EngineCore_DP0 pid=29)     self.run()
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
vllm2-1  | (EngineCore_DP0 pid=29)     self._target(*self._args, **self._kwargs)
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 859, in run_engine_core
vllm2-1  | (EngineCore_DP0 pid=29)     raise e
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
vllm2-1  | (EngineCore_DP0 pid=29)     engine_core = EngineCoreProc(*args, **kwargs)
vllm2-1  | (EngineCore_DP0 pid=29)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 619, in __init__
vllm2-1  | (EngineCore_DP0 pid=29)     super().__init__(
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 103, in __init__
vllm2-1  | (EngineCore_DP0 pid=29)     self.model_executor = executor_class(vllm_config)
vllm2-1  | (EngineCore_DP0 pid=29)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm1-1  | (EngineCore_DP0 pid=29)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 551, in wait_for_ready
vllm1-1  | (EngineCore_DP0 pid=29)     raise e from None
vllm1-1  | (EngineCore_DP0 pid=29) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
vllm2-1  | (EngineCore_DP0 pid=29)     self._init_executor()
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 147, in _init_executor
vllm2-1  | (EngineCore_DP0 pid=29)     self.workers = WorkerProc.wait_for_ready(unready_workers)
vllm2-1  | (EngineCore_DP0 pid=29)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (EngineCore_DP0 pid=29)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 551, in wait_for_ready
vllm2-1  | (EngineCore_DP0 pid=29)     raise e from None
vllm2-1  | (EngineCore_DP0 pid=29) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
vllm1-1  | (APIServer pid=1) Traceback (most recent call last):
vllm1-1  | (APIServer pid=1)   File "/usr/local/bin/vllm", line 7, in <module>
vllm2-1  | (APIServer pid=1) Traceback (most recent call last):
vllm2-1  | (APIServer pid=1)   File "/usr/local/bin/vllm", line 7, in <module>
vllm2-1  | (APIServer pid=1)     sys.exit(main())
vllm2-1  | (APIServer pid=1)              ^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
vllm2-1  | (APIServer pid=1)     args.dispatch_function(args)
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
vllm2-1  | (APIServer pid=1)     uvloop.run(run_server(args))
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
vllm2-1  | (APIServer pid=1)     return __asyncio.run(
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
vllm2-1  | (APIServer pid=1)     return runner.run(main)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
vllm2-1  | (APIServer pid=1)     return self._loop.run_until_complete(task)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
vllm2-1  | (APIServer pid=1)     return await main
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1944, in run_server
vllm2-1  | (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1963, in run_server_worker
vllm2-1  | (APIServer pid=1)     async with build_async_engine_client(
vllm2-1  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm2-1  | (APIServer pid=1)     return await anext(self.gen)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client
vllm1-1  | (APIServer pid=1)     sys.exit(main())
vllm1-1  | (APIServer pid=1)              ^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
vllm1-1  | (APIServer pid=1)     args.dispatch_function(args)
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
vllm1-1  | (APIServer pid=1)     uvloop.run(run_server(args))
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
vllm1-1  | (APIServer pid=1)     return __asyncio.run(
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
vllm1-1  | (APIServer pid=1)     return runner.run(main)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
vllm1-1  | (APIServer pid=1)     return self._loop.run_until_complete(task)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
vllm1-1  | (APIServer pid=1)     return await main
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1944, in run_server
vllm1-1  | (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1963, in run_server_worker
vllm1-1  | (APIServer pid=1)     async with build_async_engine_client(
vllm1-1  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm1-1  | (APIServer pid=1)     return await anext(self.gen)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client
vllm1-1  | (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
vllm1-1  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm1-1  | (APIServer pid=1)     return await anext(self.gen)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
vllm1-1  | (APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
vllm2-1  | (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
vllm2-1  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm2-1  | (APIServer pid=1)     return await anext(self.gen)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
vllm2-1  | (APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
vllm2-1  | (APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/func_utils.py", line 116, in inner
vllm2-1  | (APIServer pid=1)     return fn(*args, **kwargs)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
vllm2-1  | (APIServer pid=1)     return cls(
vllm2-1  | (APIServer pid=1)            ^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
vllm2-1  | (APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
vllm2-1  | (APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
vllm2-1  | (APIServer pid=1)     return AsyncMPClient(*client_args)
vllm2-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/func_utils.py", line 116, in inner
vllm1-1  | (APIServer pid=1)     return fn(*args, **kwargs)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
vllm1-1  | (APIServer pid=1)     return cls(
vllm1-1  | (APIServer pid=1)            ^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
vllm1-1  | (APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
vllm1-1  | (APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
vllm1-1  | (APIServer pid=1)     return AsyncMPClient(*client_args)
vllm1-1  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 808, in __init__
vllm1-1  | (APIServer pid=1)     super().__init__(
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 469, in __init__
vllm1-1  | (APIServer pid=1)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
vllm1-1  | (APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
vllm1-1  | (APIServer pid=1)     next(self.gen)
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 898, in launch_core_engines
vllm1-1  | (APIServer pid=1)     wait_for_engine_startup(
vllm1-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 955, in wait_for_engine_startup
vllm1-1  | (APIServer pid=1)     raise RuntimeError(
vllm1-1  | (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 808, in __init__
vllm2-1  | (APIServer pid=1)     super().__init__(
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 469, in __init__
vllm2-1  | (APIServer pid=1)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
vllm2-1  | (APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm2-1  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
vllm2-1  | (APIServer pid=1)     next(self.gen)
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 898, in launch_core_engines
vllm2-1  | (APIServer pid=1)     wait_for_engine_startup(
vllm2-1  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 955, in wait_for_engine_startup
vllm2-1  | (APIServer pid=1)     raise RuntimeError(
vllm2-1  | (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
vllm2-1  | /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
vllm2-1  |   warnings.warn('resource_tracker: There appear to be %d '
vllm1-1  | /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
vllm1-1  |   warnings.warn('resource_tracker: There appear to be %d '
vllm1-1 exited with code 0

@tjtanaa @DarkLight1337 Can you check it please?

@ZJY0516
Member

ZJY0516 commented Nov 12, 2025

@JartX Could you please try this: #28500?

@JartX
Contributor

JartX commented Nov 12, 2025

@ZJY0516 @izhuhaoran @tjtanaa Yes, this PR solves the problem: #28500

liuzijing2014 pushed a commit to liuzijing2014/vllm that referenced this pull request Nov 13, 2025
Summary:
vllm-project#27165 introduced an issue where, when running on AMD hardware, we would try to load `FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default`, which is CUDA-only, and get the following error:

```
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639] WorkerProc failed to start.
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639] WorkerProc failed to start.
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639] Traceback (most recent call last):
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639] Traceback (most recent call last):
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 613, in worker_main
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 613, in worker_main
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     worker = WorkerProc(*args, **kwargs)
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     worker = WorkerProc(*args, **kwargs)
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 468, in __init__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 468, in __init__
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.worker.load_model()
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.worker.load_model()
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_worker.py", line 266, in load_model
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_worker.py", line 266, in load_model
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_model_runner.py", line 3033, in load_model
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_model_runner.py", line 3033, in load_model
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model = model_loader.load_model(
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model = model_loader.load_model(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     model = initialize_model(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     model = initialize_model(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/models/deepseek_v2.py", line 1349, in __init__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model = DeepseekV2Model(
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/models/deepseek_v2.py", line 1349, in __init__
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/decorators.py", line 293, in __init__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     self.model = DeepseekV2Model(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     TorchCompileWrapperWithCustomDispatcher.__init__(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/wrapper.py", line 42, in __init__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/decorators.py", line 293, in __init__
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     backend = vllm_config.compilation_config.init_backend(vllm_config)
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     TorchCompileWrapperWithCustomDispatcher.__init__(
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/config/compilation.py", line 770, in init_backend
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from vllm.compilation.backends import VllmBackend
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/wrapper.py", line 42, in __init__
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/backends.py", line 40, in <module>
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     backend = vllm_config.compilation_config.init_backend(vllm_config)
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from .pass_manager import PostGradPassManager
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/pass_manager.py", line 20, in <module>
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/config/compilation.py", line 770, in init_backend
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from .qk_norm_rope_fusion import QKNormRoPEFusionPass
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from vllm.compilation.backends import VllmBackend
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/qk_norm_rope_fusion.py", line 24, in <module>
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/backends.py", line 40, in <module>
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from .pass_manager import PostGradPassManager
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/torch/_ops.py", line 1361, in __getattr__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/pass_manager.py", line 20, in <module>
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     raise AttributeError(
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     from .qk_norm_rope_fusion import QKNormRoPEFusionPass
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639] AttributeError: '_OpNamespace' '_C' object has no attribute 'fused_qk_norm_rope'
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/qk_norm_rope_fusion.py", line 24, in <module>
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]   File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/torch/_ops.py", line 1361, in __getattr__
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639]     raise AttributeError(
(Worker_TP6 pid=4059) ERROR 11-11 18:07:51 [multiproc_executor.py:639] AttributeError: '_OpNamespace' '_C' object has no attribute 'fused_qk_norm_rope'
```

We should only import `QKNormRoPEFusionPass` when `is_cuda` is true, instead of `is_cuda_alike`, which also includes ROCm.

Test Plan:
Patched the change and was able to start vLLM properly on AMD (DeepSeek):
```
Ran 500/500 requests in 144.51s
Success rate:        100.00%
QPS:                 3.46
Avg latency:         4.489s
Avg TTFT (client):   161.38ms
P50 TTFT (client):   143.11ms
P99 TTFT (client):   266.50ms
Avg TTIT (client):   28.85ms
P50 TTIT (client):   28.94ms
P99 TTIT (client):   29.38ms
Avg TTFT (server):   224.00ms
Avg TTIT (server):   28.62ms
Avg prefill len:     3293.05 tokens
P50 prefill len:     3293.00 tokens
P99 prefill len:     3335.00 tokens
Avg decode len:      150.00 tokens
P50 decode len:      150.00 tokens
P99 decode len:      150.00 tokens
Peak TPGS: 66.375
```

```
[2025-11-12 16:54:55,483] [rank 0] [INFO] Evaluation results on task gsm8k.8_shot.1_gen: em: 0.960576 | f1: 0.960576 | em_maj1@1: 0.960576 | f1_maj1@1: 0.960576
```

Differential Revision: D86838348
khluu pushed a commit that referenced this pull request Nov 16, 2025
…del (#27165)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
(cherry picked from commit 68c09ef)
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…del (vllm-project#27165)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@ProExpertProg
Collaborator

I think this is broken: #33295


Labels: ci/build, nvidia, qwen, ready, torch.compile
