[ROCm] [DeepSeekV4] Preliminary Enablement of DeepSeekV4 on ROCm by tjtanaavllm · Pull Request #40931 · vllm-project/vllm

tjtanaavllm · 2026-04-26T17:38:42Z

Purpose

This PR supersedes #40871 because it is rebased onto #40860. (On the CUDA side, PR #40760 supersedes PR #40871.)

Test Plan

Test Result

Server command:

max_num_seqs=64
max_num_batched_tokens=131072
tensor_parallel_size=4
export HF_HOME=/data/huggingface-cache
export VLLM_ROCM_USE_AITER=1

unset FLATMM_HIP_CLANG_PATH
MODEL=deepseek-ai/DeepSeek-V4-Flash
vllm serve ${MODEL} \
    --host localhost \
    --port 8000 \
    --dtype auto \
    --tensor-parallel-size ${tensor_parallel_size} \
    --max-num-seqs ${max_num_seqs} \
    --distributed-executor-backend mp \
    --trust-remote-code \
    --gpu-memory-utilization 0.40 \
    --moe-backend "triton_unfused" \
    --enforce-eager \
    --tokenizer-mode "deepseek_v4" \
    --async-scheduling
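The evaluation that follows needs the server to be accepting requests before it starts. A minimal sketch of a readiness wait is shown here. `wait_until` is a hypothetical helper written for illustration and is not part of vLLM. Probing `/health` on the host and port from the command above is an assumption about how readiness would be checked.

```shell
# Hypothetical helper (not part of vLLM): retry a probe command until it
# succeeds or the attempt budget is exhausted.
# usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() (
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the probe quietly; success ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            exit 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    # Every attempt failed.
    exit 1
)
```

For example, `wait_until 120 5 curl -sf http://localhost:8000/health` would poll for up to roughly ten minutes before the lm_eval run is launched.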

Preliminary tests: lm_eval score (gsm8k, 5-shot, limited to 16 samples)

MODEL=deepseek-ai/DeepSeek-V4-Flash
lm_eval --model local-completions --model_args model=$MODEL,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=8,max_retries=10,max_gen_toks=2048 --batch_size auto --tasks gsm8k --num_fewshot 5 --limit 16  --output_path . 2>&1 | tee -a eval-oldforward.log

local-completions ({'model': 'deepseek-ai/DeepSeek-V4-Flash', 'base_url': 'http://0.0.0.0:8000/v1/completions', 'num_concurrent': 8, 'max_retries': 10, 'max_gen_toks': 2048}), gen_kwargs: ({}), limit: 16.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|_  |0.875|_  |0.0854|
|     |       |strict-match    |     5|exact_match|_  |0.875|_  |0.0854|
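With `--limit 16`, an exact_match of 0.875 corresponds to 14 of 16 problems answered correctly, so the reported standard error is large. The sketch below recomputes it, assuming the standard error is the sample standard deviation (n - 1 in the denominator) divided by sqrt(n). That formula is an assumption, chosen because it reproduces the 0.0854 in the table. `stderr_of_accuracy` is a hypothetical helper name.

```shell
# Hypothetical helper: standard error of an accuracy estimate from
# k correct answers out of n samples, using the sample variance (n - 1).
stderr_of_accuracy() {
    awk -v k="$1" -v n="$2" 'BEGIN {
        p = k / n                              # observed accuracy
        printf "%.4f\n", sqrt(p * (1 - p) / (n - 1))
    }'
}

# 14 of 16 correct, matching the 0.875 score above.
stderr_of_accuracy 14 16
```

An uncertainty of about 0.085 on 16 samples means this run is a smoke test of correctness rather than a precise accuracy measurement.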


Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>


mergify · 2026-04-27T04:53:26Z

Documentation preview: https://vllm--40931.org.readthedocs.build/en/40931/

mergify · 2026-04-27T04:54:02Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaavllm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


zyongye and others added 30 commits April 25, 2026 20:01

chore: pass mypy

908ab01

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix: update cuda requirements

cf3e417

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix: config

c75c382

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

Integrate MegaMoE kernel

5e3525c

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

free up unused weights and support dummy weights

9abe2bd

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

[Bugfix] Flatten DeepSeek V32 indexer next_n on non-SM100 archs

f704cf3

Signed-off-by: qizixi <zixi@inferact.ai>

chore: fix pre-commit

b353527

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix (ci): interface mismatches

618e3b6

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

Add model information

6fac86c

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

fix (ci): misc api mismatches

d95a973

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

[Bugfix][CI] Run mooncake HMA worker tests on GPU lane (#241)

36992a0

Co-authored-by: Zhewen Li <zhewenli@inferact.ai>

Merge branch 'main' into feat/dsv4-support

6a2e1ed

CI Failure for deep_gemm and layernorm_fp8_quant

e5108f7

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

make kernel compatible with rocm platform

f39bfb1

Signed-off-by: ganyi <ygan@amd.com> Made-with: Cursor

resolve merge conflict

a2c69e8

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove merge conflict

51ed059

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

resolve merge conflict

9251267

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove merge conflict

11dcc15

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove redundant codes; add _old_use_hadamard

42a05fa

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

FIX

f35845c

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Merge branch 'main' into feat/dsv4-support

6f95a90

fix (ci): an e2e OOM issue and a MTP model registery issue

9e5f0da

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

chore: pin tilelang version

f21fcc1

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix (ci): pre-commit happy

5cd8311

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

FIX

999637f

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

FIX

ab72e57

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Merge branch 'main' into feat/dsv4-support

fe61cd4

fix bug

08d3aeb

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

fix compilation bug and syntax error

f288c43

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

tjtanaavllm reopened this Apr 27, 2026

github-project-automation Bot moved this from Done to To Triage in gpt-oss Issues & Enhancements Apr 27, 2026

mergify Bot added the needs-rebase label Apr 27, 2026

zyongye and others added 19 commits April 27, 2026 06:24

Integrate MegaMoE kernel

1a39688

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

chore: fix pre-commit

1905afe

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix (ci): interface mismatches

092a532

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

CI Failure for deep_gemm and layernorm_fp8_quant

2d33f21

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

make kernel compatible with rocm platform

bc3c9c9

Signed-off-by: ganyi <ygan@amd.com> Made-with: Cursor

resolve merge conflict

b9b872d

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove merge conflict

c795d22

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

resolve merge conflict

4ed6311

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove merge conflict

f43a82c

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

remove redundant codes; add _old_use_hadamard

5316ce9

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix bug

2244a3d

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

fix compilation bug and syntax error

e6a44cd

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

FIX

fc0ad8f

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

fix (ci): pre-commit happy

a510c30

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

resolve merge conflict

bd14dff

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

fix layernorm quant for rocm and add unfused_triton

ff53d47

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

sync upstream

599e5c8

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

resolve merge conflict

12ed03e

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

mergify Bot removed the needs-rebase label Apr 27, 2026

tjtanaavllm and others added 3 commits April 27, 2026 14:26

resolve merge conflict

2b4420e

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

support topk_softplus for all number of experts

821018e

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

fix formatting

7b2a867

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

tjtanaavllm closed this Apr 29, 2026

github-project-automation Bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Apr 29, 2026

tjtanaavllm wants to merge 53 commits into vllm-project:main from ROCm:tj/dsv4prrebase

9 participants
