Fix where we load CUDA only kernel when running on AMD hardware #28605

Closed

liuzijing2014 wants to merge 1 commit into vllm-project:main from liuzijing2014:export-D86838348

Conversation

@liuzijing2014 (Collaborator)

Summary:
#27165 introduced an issue where, when running on AMD hardware, we try to load `FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default`, which is CUDA-only, and hit this error:

Both workers (TP5 and TP6) fail with the same traceback:

```
(Worker_TP5 pid=4058) ERROR 11-11 18:07:51 [multiproc_executor.py:639] WorkerProc failed to start.
Traceback (most recent call last):
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 613, in worker_main
    worker = WorkerProc(*args, **kwargs)
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/executor/multiproc_executor.py", line 468, in __init__
    self.worker.load_model()
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_worker.py", line 266, in load_model
    self.model_runner.load_model(eep_scale_up=eep_scale_up)
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/v1/worker/gpu_model_runner.py", line 3033, in load_model
    self.model = model_loader.load_model(
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
    model = initialize_model(
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/model_executor/models/deepseek_v2.py", line 1349, in __init__
    self.model = DeepseekV2Model(
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/decorators.py", line 293, in __init__
    TorchCompileWrapperWithCustomDispatcher.__init__(
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/wrapper.py", line 42, in __init__
    backend = vllm_config.compilation_config.init_backend(vllm_config)
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/config/compilation.py", line 770, in init_backend
    from vllm.compilation.backends import VllmBackend
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/backends.py", line 40, in <module>
    from .pass_manager import PostGradPassManager
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/pass_manager.py", line 20, in <module>
    from .qk_norm_rope_fusion import QKNormRoPEFusionPass
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/vllm/compilation/qk_norm_rope_fusion.py", line 24, in <module>
    FUSED_QK_ROPE_OP = torch.ops._C.fused_qk_norm_rope.default
  File "/dev/shm/uid-99/83e08bb2-seed-nspid4026555323_cgpid2465342-ns-4026555243/torch/_ops.py", line 1361, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'fused_qk_norm_rope'
```

We should only import `QKNormRoPEFusionPass` when `is_cuda`, instead of `is_cuda_alike`, which also includes ROCm.

Test Plan:
Patched the change in and vLLM starts properly on AMD (DeepSeek):

```
Ran 500/500 requests in 144.51s
Success rate:        100.00%
QPS:                 3.46
Avg latency:         4.489s
Avg TTFT (client):   161.38ms
P50 TTFT (client):   143.11ms
P99 TTFT (client):   266.50ms
Avg TTIT (client):   28.85ms
P50 TTIT (client):   28.94ms
P99 TTIT (client):   29.38ms
Avg TTFT (server):   224.00ms
Avg TTIT (server):   28.62ms
Avg prefill len:     3293.05 tokens
P50 prefill len:     3293.00 tokens
P99 prefill len:     3335.00 tokens
Avg decode len:      150.00 tokens
P50 decode len:      150.00 tokens
P99 decode len:      150.00 tokens
Peak TPGS: 66.375
```

```
[2025-11-12 16:54:55,483] [rank 0] [INFO] Evaluation results on task gsm8k.8_shot.1_gen: em: 0.960576 | f1: 0.960576 | em_maj1@1: 0.960576 | f1_maj1@1: 0.960576
```

Differential Revision: D86838348

@gemini-code-assist (bot) left a comment

Code Review

This pull request correctly addresses a crash on AMD hardware by ensuring that the CUDA-only fused_qk_norm_rope kernel and its associated fusion passes are only loaded on CUDA platforms. The changes in csrc/ops.h, vllm/compilation/fix_functionalization.py, and vllm/compilation/pass_manager.py are logical and effectively solve the reported issue. However, I've identified a related inconsistency in the configuration validation logic that was not updated as part of this change. This could lead to a NameError on ROCm systems if a user enables the corresponding fusion pass, as detailed in my comment.

Comment on lines -21 to +22:

```diff
-from .qk_norm_rope_fusion import QKNormRoPEFusionPass
+if current_platform.is_cuda():
+    from .qk_norm_rope_fusion import QKNormRoPEFusionPass
```
Severity: high

While moving this import under is_cuda() is correct to fix the crash on ROCm, it introduces a potential inconsistency. The configuration validation for this fusion in vllm/config/compilation.py still uses is_cuda_alike().

This means a user on a ROCm platform could set enable_qk_norm_rope_fusion=True, and it would pass the configuration check. However, this would lead to a NameError here, as QKNormRoPEFusionPass would not be imported.

To fix this, please also update the check in vllm/config/compilation.py (line 187) to use current_platform.is_cuda() instead of current_platform.is_cuda_alike().
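The mismatch described above can be reproduced in a toy model of the two checks (predicate and class names mirror vLLM's for readability, but this is an illustration, not vLLM code):

```python
def is_cuda():
    return False        # e.g. a ROCm machine

def is_cuda_alike():
    return True         # ROCm counts as "CUDA-alike"

# Import site: guarded by the strict predicate, so the pass is never defined.
if is_cuda():
    class QKNormRoPEFusionPass:
        pass

def build_passes(enable_qk_norm_rope_fusion):
    # Validation site: guarded by the loose predicate, so the flag survives
    # on ROCm and execution reaches the undefined name.
    if enable_qk_norm_rope_fusion and is_cuda_alike():
        return [QKNormRoPEFusionPass()]
    return []

try:
    build_passes(True)
except NameError as err:
    print(f"mismatch reproduced: {err}")
```

Aligning both sites on the same predicate removes this failure mode.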

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +95 to +101:

```cpp
#ifndef USE_ROCM
void fused_qk_norm_rope(torch::Tensor& qkv, int64_t num_heads_q,
                        int64_t num_heads_k, int64_t num_heads_v,
                        int64_t head_dim, double eps, torch::Tensor& q_weight,
                        torch::Tensor& k_weight, torch::Tensor& cos_sin_cache,
                        bool is_neox, torch::Tensor& position_ids);
#endif
```

P1: Guard removes declaration but binding still references op

The new `#ifndef USE_ROCM` guard in csrc/ops.h (lines 95-101) removes the declaration of `fused_qk_norm_rope` on ROCm builds, but csrc/torch_bindings.cpp still unconditionally registers the custom op at lines 178-184 via `ops.impl("fused_qk_norm_rope", torch::kCUDA, &fused_qk_norm_rope);`. When building with `USE_ROCM` defined, the compiler no longer sees any declaration of that symbol before it is used, so torch_bindings.cpp fails to compile on AMD/ROCm even though the function definition still exists in fused_qknorm_rope_kernel.cu. Either the declaration needs to remain available or the binding needs to be wrapped in the same guard; otherwise every ROCm build is broken.
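A Python analogue of the guard mismatch (illustration only; the real issue is a C++ compile error, and `register_ops` here is a hypothetical stand-in for the torch_bindings.cpp registration):

```python
USE_ROCM = True  # pretend we are building for ROCm

# Definition guarded, mirroring the new #ifndef USE_ROCM in ops.h.
if not USE_ROCM:
    def fused_qk_norm_rope():
        return "cuda op"

def register_ops(registry, use_rocm):
    # Fix the review suggests: the binding sits under the same guard as the
    # declaration, so it never references an undefined symbol on ROCm.
    if not use_rocm:
        registry["fused_qk_norm_rope"] = fused_qk_norm_rope
    return registry

print(register_ops({}, USE_ROCM))  # {} — nothing registered on ROCm
```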


@ZJY0516 (Member) commented Nov 13, 2025:

I think this error has been solved by #28500


Labels: nvidia, rocm (Related to AMD ROCm)

Projects: Status: Done

2 participants