
Combine DeepSeek V4 recipe and tokenizer changes#71

Closed
alec-flowers wants to merge 15 commits into NVIDIA:sa-submission-q2-2026 from alec-flowers:aflowers/dsv4-pr67-pr68

Conversation

@alec-flowers
Collaborator

Summary

  • Combine the DeepSeek V4 GB200 recipe branch from PR #67 (vLLM DSv4 GB200 PD 042326) with the DeepSeek V4 tokenizer support from PR #68 (add dsv4 tokenizer in sa-bench)
  • Set the DeepSeek V4 SA-Bench recipes to use custom_tokenizer: "deepseek_v4"
  • Disable use_chat_template for those recipes, because DeepSeek V4 does not ship a Jinja chat template
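The two recipe settings above could look roughly like this in one of the 8k1k YAML files; everything except the two keys themselves is an illustrative assumption about the recipe schema:

```yaml
# Illustrative excerpt from a recipes/vllm/deepseek-v4-pro/8k1k/*.yaml recipe.
# Only the two keys below come from this PR; surrounding structure is assumed.
custom_tokenizer: "deepseek_v4"   # DSv4 tokenizer support added in PR #68
use_chat_template: false          # DSv4 ships no Jinja chat template
```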

Validation

  • UV_CACHE_DIR="$PWD/.uv-cache" uv run ruff check src/srtctl/benchmarks/scripts/sa-bench/backend_request_func.py
  • Parsed all four recipes/vllm/deepseek-v4-pro/8k1k/*.yaml files with PyYAML
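The PyYAML parse check above can be sketched as a small script; the path pattern comes from the PR description, while the function name `parse_recipes` is mine:

```python
# Sketch of the validation step: load every recipe under
# recipes/vllm/deepseek-v4-pro/8k1k/ and fail loudly on invalid YAML.
import glob

import yaml


def parse_recipes(pattern="recipes/vllm/deepseek-v4-pro/8k1k/*.yaml"):
    """Return {path: parsed_document} for every file matching the pattern."""
    parsed = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # safe_load raises yaml.YAMLError on malformed YAML
            parsed[path] = yaml.safe_load(f)
    return parsed
```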

@alec-flowers alec-flowers changed the base branch from main to sa-submission-q2-2026 on April 24, 2026 at 15:52
@ywang96

ywang96 commented Apr 24, 2026

@alec-flowers FYI we've pushed another round of changes in my branch

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (sa-submission-q2-2026@a10acd3). Learn more about missing BASE report.

Additional details and impacted files
@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #71   +/-   ##
========================================================
  Coverage                         ?   61.41%           
========================================================
  Files                            ?       48           
  Lines                            ?     4139           
  Branches                         ?        0           
========================================================
  Hits                             ?     2542           
  Misses                           ?     1597           
  Partials                         ?        0           


Oseltamivir added a commit to SemiAnalysisAI/InferenceX that referenced this pull request Apr 24, 2026
PR 1142's first real sweep hit "ModuleNotFoundError: No module named
'vllm.inputs.data'" on all three multinode jobs. Same error as PR 1129
on GB200.

Root cause: ai-dynamo 1.0.1 (installed by NVIDIA/srt-slurm@sa-submission-q2-2026
via `dynamo: { version: 1.0.1 }`) imports vllm.inputs.data.TokensPrompt,
a path removed in the DSV4 vLLM wheel. Dynamo workers crash during
import before any vLLM flag matters.

Fix, mirroring PR 1129:
- launch_h100-dgxc-slurm.sh: override srt-slurm clone URL/ref via
  SRT_SLURM_REPO_URL and SRT_SLURM_REF env vars, set to
  alec-flowers/srt-slurm@d60e3f1c (head of NVIDIA/srt-slurm#71) for
  dynamo-vllm+dsv4. All other frameworks/models keep NVIDIA upstream.
- Recipes: replace `dynamo.version: 1.0.1` with `dynamo.hash:
  6a159fedd8e4a1563aa647c31f622aedbf254b5b`. The fork's schema accepts
  `hash:` for pinning a specific ai-dynamo/dynamo commit. That commit
  has the matching vllm.inputs import path.
- Recipes: adopt DSV4-specific flags PR 1129 proved necessary for
  startup: `enforce-eager: true` (prefill only), `enable-sleep-mode: true`,
  `no-disable-hybrid-kv-cache-manager: true`, explicit
  `kv-transfer-config` (NixlConnector kv_both), env vars
  VLLM_SERVER_DEV_MODE=1 and TILELANG_CLEANUP_TEMP_FILES=1.
- Recipes: drop `data-parallel-hybrid-lb` and `async-scheduling` (DSR1
  patterns that PR 1129 omitted on DSV4; keep minimal delta from DSV4
  H200 single-node).

Kept H100-specific knobs: VLLM_MOE_DP_CHUNK_SIZE=192, deepep_{high_throughput,
low_latency} all2all backends, VLLM_USE_DEEP_GEMM. Skipped GB200-only
flags (NCCL_MNNVL_ENABLE, NCCL_NVLS_ENABLE, VLLM_USE_NCCL_SYMM_MEM).
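A hedged sketch of the recipe delta this commit message describes; the values come from the text above, but the exact nesting and key spelling in the fork's schema are assumptions:

```yaml
# Illustrative recipe delta only; nesting is not taken from the fork's docs.
dynamo:
  hash: 6a159fedd8e4a1563aa647c31f622aedbf254b5b  # replaces version: 1.0.1
vllm_flags:
  enforce-eager: true                          # prefill workers only
  enable-sleep-mode: true
  no-disable-hybrid-kv-cache-manager: true
  kv-transfer-config:
    kv_connector: NixlConnector
    kv_role: kv_both
env:
  VLLM_SERVER_DEV_MODE: "1"
  TILELANG_CLEANUP_TEMP_FILES: "1"
  VLLM_MOE_DP_CHUNK_SIZE: "192"                # H100-specific knob
```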

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ywang96 ywang96 mentioned this pull request Apr 25, 2026
