fix(sa-bench): auto-fallback when tokenizer has no chat template by YAMY1234 · Pull Request #74 · NVIDIA/srt-slurm

YAMY1234 · 2026-04-25T00:55:36Z

Summary

Some models (e.g. DeepSeek-V4) ship no Hugging Face chat template; their rendering happens entirely inside the engine via a hard-coded encoder. With the project-wide default use_chat_template: true introduced in #20, recipes that don't also set custom_tokenizer end up calling tokenizer.apply_chat_template(...) directly and crash with:

ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set ...

Reported by @ishandhanani while running the dsv4 sa-bench recipes against srt-slurm without the new flags from #73.

Fix

In benchmark_serving.main(), right after get_tokenizer returns, detect the no-template case:

if use_chat_template is on and no custom_tokenizer plugin is configured and the tokenizer exposes neither chat_template nor default_chat_template,
emit a loud warnings.warn(...) pointing at custom_tokenizer (e.g. SGLangDeepseekV4Tokenizer added in feat(sa-bench): add sglang DeepSeek-V4 tokenizer #73),
and force args.use_chat_template = False so the run completes against the raw-text path.

This way:

DSv4 recipe without the plugin -> no longer crashes; runs in raw-text mode (input_tokens may diverge from server #new-token, but the user is told why).
DSv4 recipe with custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer -> unchanged: real DSML rendering, exact parity with sglang server.
Models with HF chat templates (DSR1, K2, Qwen, GLM, ...) -> unchanged.
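
The check described under Fix can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `has_chat_template` and `maybe_disable_chat_template` are hypothetical, and `args` is assumed to be a namespace carrying `use_chat_template` and `custom_tokenizer` attributes as the description implies.

```python
import warnings


def has_chat_template(tokenizer) -> bool:
    """Return True if the tokenizer exposes a non-empty chat template."""
    # getattr with a default covers tokenizers that lack either attribute.
    return bool(
        getattr(tokenizer, "chat_template", None)
        or getattr(tokenizer, "default_chat_template", None)
    )


def maybe_disable_chat_template(args, tokenizer) -> bool:
    """Switch to the raw-text path when no chat template is available.

    Hypothetical helper: returns True if the fallback was applied.
    """
    if not args.use_chat_template:
        return False  # chat templating already off, nothing to do
    if getattr(args, "custom_tokenizer", None):
        return False  # a plugin handles rendering, leave settings alone
    if has_chat_template(tokenizer):
        return False  # normal Hugging Face chat-template path
    warnings.warn(
        "Tokenizer has no chat template; falling back to raw-text mode. "
        "Client-side input token counts may diverge from the server. "
        "Set custom_tokenizer in the recipe for exact parity.",
        stacklevel=2,
    )
    args.use_chat_template = False
    return True
```

In this sketch the helper would be called once, right after the tokenizer is loaded, so that every later code path sees the already-adjusted `args.use_chat_template`.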

Test plan

python3 -m py_compile benchmark_serving.py
Re-run dsv4 recipe without custom_tokenizer/use_chat_template overrides; confirm warning + successful sa-bench completion.
Re-run an existing dsv4 recipe (e.g. recipes/gb300-fp4/1k1k-dsv4/agg-low-latency-chat.yaml) with custom_tokenizer set; confirm no behavior change vs. main.
Re-run a non-DSv4 recipe (DSR1 / Kimi-K2.6) and confirm chat template path is unaffected.

Made with Cursor

Models like DeepSeek-V4 ship no Hugging Face chat template; rendering lives entirely inside the engine. With the default `use_chat_template: true` (introduced in #20) and no `custom_tokenizer` plugin, sa-bench called `tokenizer.apply_chat_template(...)` directly and crashed with `ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set ...`. Detect this case in `main()` after `get_tokenizer` returns: if `use_chat_template` is on, no `custom_tokenizer` plugin is configured, and the tokenizer exposes neither `chat_template` nor `default_chat_template`, emit a loud warning and fall back to the raw-text path so the run completes. Users who care about exact token-count parity with the server are pointed at `custom_tokenizer` (e.g. SGLangDeepseekV4Tokenizer added in #73). Recipes that already set `custom_tokenizer` are unaffected.

YAMY1234 · 2026-04-25T03:29:17Z

Superseded by #76, which takes a fail-fast approach with an actionable error instead of a silent auto-fallback. Auto-fallback risks producing benchmark numbers from a different code path than the recipe intends (the client/server token counts can diverge, as this PR's own warning notes). #76 also fixes a separate silent bug where bench.sh warmup was missing CHAT_TEMPLATE_ARGS.

YAMY1234 requested review from alec-flowers, csahithi, ishandhanani and nlevin-ui as code owners April 25, 2026 00:55

YAMY1234 closed this Apr 25, 2026

YAMY1234 wants to merge 1 commit into main from fix/dsv4-no-chat-template-fallback
