Combine DeepSeek V4 recipe and tokenizer changes by alec-flowers · Pull Request #71 · NVIDIA/srt-slurm

alec-flowers · 2026-04-24T15:51:31Z

Summary

- Combine the DeepSeek V4 GB200 recipe branch from PR #67 (vLLM DSv4 GB200 PD 042326) with the DeepSeek V4 tokenizer support from PR #68 (add dsv4 tokenizer in sa-bench).
- Set the DeepSeek V4 SA-Bench recipes to use `custom_tokenizer: "deepseek_v4"`.
- Disable `use_chat_template` for those recipes because DeepSeek V4 does not ship a Jinja chat template.

Validation

- `UV_CACHE_DIR="$PWD/.uv-cache" uv run ruff check src/srtctl/benchmarks/scripts/sa-bench/backend_request_func.py`
- Parsed all four `recipes/vllm/deepseek-v4-pro/8k1k/*.yaml` files with PyYAML.
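The PyYAML parse step could be extended to also confirm the two recipe settings named in the summary. The following is a minimal sketch, not the check that was actually run: the helper names (`find_key`, `check_recipe`, `check_recipe_dir`) are hypothetical, and because this PR does not show where `custom_tokenizer` and `use_chat_template` sit inside a recipe, the sketch searches the whole parsed document for them. It also assumes that a missing `use_chat_template` key counts as disabled.

```python
from pathlib import Path
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested dict/list."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def check_recipe(recipe: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look right."""
    problems = []
    tokenizers = list(find_key(recipe, "custom_tokenizer"))
    if not tokenizers or any(t != "deepseek_v4" for t in tokenizers):
        problems.append(f"custom_tokenizer is {tokenizers!r}, expected 'deepseek_v4'")
    # Assumption: an absent use_chat_template key is treated as disabled.
    if any(flag is not False for flag in find_key(recipe, "use_chat_template")):
        problems.append("use_chat_template should be false")
    return problems


def check_recipe_dir(directory: str) -> dict[str, list[str]]:
    """Parse every *.yaml file in `directory` and collect problems per file."""
    import yaml  # PyYAML, imported lazily so check_recipe works without it

    results = {}
    for path in sorted(Path(directory).glob("*.yaml")):
        with path.open() as fh:
            results[path.name] = check_recipe(yaml.safe_load(fh))
    return results
```

Calling `check_recipe_dir("recipes/vllm/deepseek-v4-pro/8k1k")` from the repository root would then return a mapping from file name to any problems found; a parse error from `yaml.safe_load` still surfaces as an exception, as in the plain parse check.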

ywang96 · 2026-04-24T16:37:42Z

@alec-flowers FYI we've pushed another round of changes in my branch

codecov-commenter · 2026-04-24T16:59:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (sa-submission-q2-2026@a10acd3). Learn more about missing BASE report.

Additional details and impacted files

@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #71   +/-   ##
========================================================
  Coverage                         ?   61.41%           
========================================================
  Files                            ?       48           
  Lines                            ?     4139           
  Branches                         ?        0           
========================================================
  Hits                             ?     2542           
  Misses                           ?     1597           
  Partials                         ?        0


PR 1142's first real sweep hit "ModuleNotFoundError: No module named 'vllm.inputs.data'" on all three multinode jobs. Same error as PR 1129 on GB200.

Root cause: ai-dynamo 1.0.1 (installed by NVIDIA/srt-slurm@sa-submission-q2-2026 via `dynamo: { version: 1.0.1 }`) imports vllm.inputs.data.TokensPrompt, a path removed in the DSV4 vLLM wheel. Dynamo workers crash during import before any vLLM flag matters.

Fix, mirroring PR 1129:

- launch_h100-dgxc-slurm.sh: override srt-slurm clone URL/ref via SRT_SLURM_REPO_URL and SRT_SLURM_REF env vars, set to alec-flowers/srt-slurm@d60e3f1c (head of NVIDIA/srt-slurm#71) for dynamo-vllm+dsv4. All other frameworks/models keep NVIDIA upstream.
- Recipes: replace `dynamo.version: 1.0.1` with `dynamo.hash: 6a159fedd8e4a1563aa647c31f622aedbf254b5b`. The fork's schema accepts `hash:` for pinning a specific ai-dynamo/dynamo commit. That commit has the matching vllm.inputs import path.
- Recipes: adopt DSV4-specific flags PR 1129 proved necessary for startup: `enforce-eager: true` (prefill only), `enable-sleep-mode: true`, `no-disable-hybrid-kv-cache-manager: true`, explicit `kv-transfer-config` (NixlConnector kv_both), env vars VLLM_SERVER_DEV_MODE=1 and TILELANG_CLEANUP_TEMP_FILES=1.
- Recipes: drop `data-parallel-hybrid-lb` and `async-scheduling` (DSR1 patterns that PR 1129 omitted on DSV4; keep minimal delta from DSV4 H200 single-node).

Kept H100-specific knobs: VLLM_MOE_DP_CHUNK_SIZE=192, deepep_{high_throughput, low_latency} all2all backends, VLLM_USE_DEEP_GEMM. Skipped GB200-only flags (NCCL_MNNVL_ENABLE, NCCL_NVLS_ENABLE, VLLM_USE_NCCL_SYMM_MEM).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
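The recipe change from `dynamo.version` to `dynamo.hash` can be illustrated on an already-parsed recipe. This is a minimal sketch under stated assumptions: the function name `pin_dynamo_to_hash` is hypothetical, the top-level `dynamo` mapping follows the `dynamo: { version: 1.0.1 }` form quoted above, and the requirement of a full 40-character SHA is a choice made for this sketch, not something the fork's schema is known to enforce.

```python
import copy
import re

# Assumption of this sketch: only full 40-character lowercase SHAs are accepted.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pin_dynamo_to_hash(recipe: dict, commit: str) -> dict:
    """Return a copy of `recipe` whose dynamo section pins `commit` via `hash`."""
    if not _FULL_SHA.fullmatch(commit):
        raise ValueError(f"expected a full 40-character commit SHA, got {commit!r}")
    updated = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    dynamo = dict(updated.get("dynamo") or {})
    dynamo.pop("version", None)  # drop the release pin so only one pin remains
    dynamo["hash"] = commit
    updated["dynamo"] = dynamo
    return updated


# Example with the commit named in the message above.
before = {"dynamo": {"version": "1.0.1"}}
after = pin_dynamo_to_hash(before, "6a159fedd8e4a1563aa647c31f622aedbf254b5b")
```

Removing `version` while setting `hash` keeps a single source of truth for which ai-dynamo build a recipe installs; whether the schema would reject a recipe carrying both keys is not stated in this PR.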

ywang96 and others added 11 commits April 23, 2026 21:48

add

6da81c4

update

d3b958b

fix dynamo

af67085

1p1d

951248d

fix

daa1e4a

add

1bf11fb

add

f330771

add

466bb99

add

6f4e65c

add dsv4 tokenizer

6573922

Set DeepSeek V4 SA-Bench tokenizer

cc50dc3

alec-flowers changed the base branch from main to sa-submission-q2-2026 April 24, 2026 15:52

Add DeepSeek V4 tokenizer mode for SA-Bench

e6eecd5

Refresh DeepSeek V4 offload recipes

12e0b61

YAMY1234 mentioned this pull request Apr 24, 2026

feat(sa-bench): add sglang DeepSeek-V4 tokenizer (depends on #71) #72

Closed

4 tasks

update

61ec64e

YAMY1234 mentioned this pull request Apr 24, 2026

feat(sa-bench): add sglang DeepSeek-V4 tokenizer #73

Merged

3 tasks

Pin Dynamo commit for DeepSeek V4 recipes

d60e3f1

Oseltamivir mentioned this pull request Apr 25, 2026

Day 0 DeepSeek V4 Pro FP4 GB200 disaggregated vLLM benchmarks SemiAnalysisAI/InferenceX#1129

Merged

5 tasks

ywang96 mentioned this pull request Apr 25, 2026

vLLM DSv4 GB200 PD 042326 #67

Closed

alec-flowers closed this Apr 25, 2026

alec-flowers mentioned this pull request Apr 25, 2026

Add DeepSeek V4 GB200 recipes #77

Open

alec-flowers wants to merge 15 commits into NVIDIA:sa-submission-q2-2026 from alec-flowers:aflowers/dsv4-pr67-pr68


4 participants
