
[Bugfix] [ROCm] [DSV4] Bugfix cutlass import DSV4 #43735

Closed
tjtanaa wants to merge 1 commit into vllm-project:main from EmbeddedLLM:bugfixdsv4-2

tjtanaa commented May 27, 2026 (Member)

Purpose

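This fixes the `ModuleNotFoundError: No module named 'cutlass'` import error introduced in PR #43584. On ROCm, the DeepSeek-V4 compressor calls `_compress_kv_sparse_attn_cutedsl`, which lazily imports `sparse_attn_compress_cutedsl`. That module runs `import cutlass` at the top level, and `cutlass` is not importable in this environment, so the worker fails with: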

(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]   File "/app/dsv4tilelang/bugfixdsv4-2/vllm/models/deepseek_v4/compressor.py", line 383, in forward
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]     self._compress_kernel(
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]   File "/app/dsv4tilelang/bugfixdsv4-2/vllm/models/deepseek_v4/common/ops/fused_compress_quant_cache.py", line 52, in _compress_kv_sparse_attn_cutedsl
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]     return _get_sparse_attn_cutedsl_impls()[0](*args, **kwargs)
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]   File "/app/dsv4tilelang/bugfixdsv4-2/vllm/models/deepseek_v4/common/ops/fused_compress_quant_cache.py", line 37, in _get_sparse_attn_cutedsl_impls
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]     from .sparse_attn_compress_cutedsl import (
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]   File "/app/dsv4tilelang/bugfixdsv4-2/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 12, in <module>
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962]     import cutlass
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962] ModuleNotFoundError: No module named 'cutlass'
(Worker_TP0 pid=177406) ERROR 05-27 00:12:09 [multiproc_executor.py:962] 
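The traceback shows that the lazy import inside `_get_sparse_attn_cutedsl_impls` does not help once the CuTe DSL path has been selected on a platform without `cutlass`. A minimal sketch of one way to guard such an optional dependency, assuming the caller has some non-CuTe DSL kernel to fall back to, is to probe for the package with `importlib.util.find_spec` before importing the module that needs it. `has_module`, `select_compress_impl`, and the fallback kernel are hypothetical names for illustration; this is not the change made in this PR or in #43710.

```python
import importlib.util
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=None)
def has_module(name: str) -> bool:
    """Return True if top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def select_compress_impl(
    load_cutedsl_impl: Callable[[], Callable],
    fallback_impl: Callable,
) -> Callable:
    """Return the CuTe DSL kernel only when the `cutlass` package is importable.

    Hypothetical helper: `load_cutedsl_impl` is called only after the probe
    succeeds, so a module that runs `import cutlass` at top level is never
    imported on a platform (e.g. a ROCm build) where the package is missing.
    """
    if has_module("cutlass"):
        return load_cutedsl_impl()
    return fallback_impl
```

Because `find_spec` only locates the module and does not import `cutlass`, the probe itself does not trigger the `ModuleNotFoundError` shown above. On a machine without `cutlass`, `select_compress_impl(load_kernel, reference_kernel)` returns `reference_kernel` without ever calling `load_kernel`.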

Test Plan

#!/bin/bash

rm -rf /root/.cache/vllm

VLLM_ROCM_USE_AITER=1 \
vllm serve deepseek-ai/DeepSeek-V4-Pro \
  --host localhost \
  --port 8001 \
  --dtype auto \
  --tensor-parallel-size 8 \
  --max-num-seqs 256 \
  --distributed-executor-backend mp \
  --trust-remote-code \
  --gpu-memory-utilization 0.6 \
  --moe-backend triton_unfused \
  --tokenizer-mode deepseek_v4 \
  --reasoning-parser deepseek_v4 \
  --kv-cache-dtype fp8_e4m3 \
  --compilation-config '{"mode":3,"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --speculative_config '{"method":"mtp","num_speculative_tokens":2}'

Test Result

local-completions ({'model': 'deepseek-ai/DeepSeek-V4-Pro', 'base_url': 'http://0.0.0.0:8001/v1/completions', 'num_concurrent': 256, 'max_retries': 10, 'max_gen_toks': 2048, 'max_length': 1048576, 'timeout': 60000}), gen_kwargs: ({}), limit: None, num_fewshot: 20, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|    20|exact_match|↑  |0.9500|±  | 0.006|
|     |       |strict-match    |    20|exact_match|↑  |0.9507|±  | 0.006|


Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
tjtanaa requested a review from zyongye as a code owner May 27, 2026 02:43
tjtanaa added the rocm and ready labels May 27, 2026
github-project-automation (bot) moved this to Todo in AMD May 27, 2026
mergify (bot) added the nvidia and bug labels May 27, 2026
tjtanaa added this to the v0.22.0 milestone May 27, 2026
mergify (bot) commented May 27, 2026 (Contributor)

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify (bot) added the needs-rebase label May 27, 2026
tjtanaa commented May 27, 2026 (Member, Author)

Addressed by #43710.

tjtanaa closed this May 27, 2026
github-project-automation (bot) moved this to Done in NVIDIA May 27, 2026
github-project-automation (bot) moved this from Todo to Done in AMD May 27, 2026
khluu removed this from the v0.22.0 milestone May 27, 2026

Labels

bug, needs-rebase, nvidia, ready, rocm

Projects

AMD: Status: Done
NVIDIA: Status: Done


2 participants