
[ROCm] [DeepSeekV4] Preliminary Enablement of DeepSeekV4 on ROCm #40931

Closed
tjtanaavllm wants to merge 53 commits into vllm-project:main from ROCm:tj/dsv4prrebase

tjtanaavllm commented Apr 26, 2026

Purpose

This PR supersedes #40871 because it is rebased onto #40860. (On the CUDA side, PR #40760 supersedes PR #40871.)

Test Plan

Test Result

Server command:

max_num_seqs=64
max_num_batched_tokens=131072
tensor_parallel_size=4
export HF_HOME=/data/huggingface-cache
export VLLM_ROCM_USE_AITER=1

unset FLATMM_HIP_CLANG_PATH
MODEL=deepseek-ai/DeepSeek-V4-Flash
vllm serve ${MODEL} \
    --host localhost \
    --port 8000 \
    --dtype auto \
    --tensor-parallel-size ${tensor_parallel_size} \
    --max-num-seqs ${max_num_seqs} \
    --distributed-executor-backend mp \
    --trust-remote-code \
    --gpu-memory-utilization 0.40 \
    --moe-backend "triton_unfused" \
    --enforce-eager \
    --tokenizer-mode "deepseek_v4" \
    --async-scheduling
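
Since the evaluation below talks to this server over HTTP, it can help to wait until the server actually answers before starting it. The following is a minimal sketch, not part of this PR: `wait_for_server` is a hypothetical helper, and it assumes `curl` is installed and that the server exposes the `/health` route of vLLM's OpenAI-compatible server on the host and port used above.

```shell
# Hypothetical helper (not part of vLLM or this PR): poll a URL until it
# answers with a successful HTTP status, or give up after a deadline.
#   $1 = URL to probe
#   $2 = overall timeout in seconds (default 600)
#   $3 = pause between probes in seconds (default 5)
wait_for_server() {
    url="$1"
    timeout_s="${2:-600}"
    interval_s="${3:-5}"
    deadline=$(( $(date +%s) + timeout_s ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # -f: fail on HTTP errors, -s: silent, --max-time: bound each probe
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        sleep "$interval_s"
    done
    return 1
}

# Example (assumed host/port taken from the serve command above):
#   wait_for_server "http://localhost:8000/health" 1800 && echo "server is up"
```

The helper returns a non-zero status on timeout, so it can gate a follow-up command with `&&`.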

Preliminary tests
lm_eval score (gsm8k, 5-shot, limited to 16 samples)

MODEL=deepseek-ai/DeepSeek-V4-Flash
lm_eval --model local-completions --model_args model=$MODEL,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=8,max_retries=10,max_gen_toks=2048 --batch_size auto --tasks gsm8k --num_fewshot 5 --limit 16  --output_path . 2>&1 | tee -a eval-oldforward.log
local-completions ({'model': 'deepseek-ai/DeepSeek-V4-Flash', 'base_url': 'http://0.0.0.0:8000/v1/completions', 'num_concurrent': 8, 'max_retries': 10, 'max_gen_toks': 2048}), gen_kwargs: ({}), limit: 16.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.875|±  |0.0854|
|     |       |strict-match    |     5|exact_match|↑  |0.875|±  |0.0854|
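
With `--limit 16`, a score of 0.875 corresponds to 14 of 16 problems answered correctly, which is why the reported standard error is large. As a rough check, the sketch below recomputes the standard error of the mean for binary scores using the sample standard deviation (n - 1 denominator). That this is how the harness derives the Stderr column is an assumption; `binary_stderr` is a hypothetical helper name, and the formula does reproduce the reported 0.0854.

```shell
# Hypothetical helper: standard error of the mean for n binary (0/1) scores
# of which k are 1, using the sample standard deviation (n - 1 denominator):
#   stderr = sqrt(p * (1 - p) / (n - 1)),  p = k / n
binary_stderr() {
    LC_ALL=C awk -v k="$1" -v n="$2" 'BEGIN {
        p = k / n
        printf "%.4f\n", sqrt(p * (1 - p) / (n - 1))
    }'
}

binary_stderr 14 16   # 14 of 16 correct (exact_match = 0.875); prints 0.0854
```

A 16-sample run is therefore only a smoke test: one standard error on either side of the score already spans roughly 0.79 to 0.96.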

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

zyongye and others added 30 commits April 25, 2026 20:01
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: ganyi <ygan@amd.com>
Made-with: Cursor
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
tjtanaavllm reopened this Apr 27, 2026
github-project-automation Bot moved this from Done to To Triage in gpt-oss Issues & Enhancements Apr 27, 2026

mergify Bot commented Apr 27, 2026

Documentation preview: https://vllm--40931.org.readthedocs.build/en/40931/

mergify Bot commented Apr 27, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaavllm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the needs-rebase label Apr 27, 2026
zyongye and others added 19 commits April 27, 2026 06:24
mergify Bot removed the needs-rebase label Apr 27, 2026
tjtanaavllm and others added 3 commits April 27, 2026 14:26
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

Labels

ci/build, cpu (Related to CPU backends), deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), kv-connector, new-model (Requests to new models), nvidia, rocm (Related to AMD ROCm), speculative-decoding, tool-calling, v1

Projects

Status: Done


9 participants