
fix(sa-bench): auto-fallback when tokenizer has no chat template#74

Closed
YAMY1234 wants to merge 1 commit into main from fix/dsv4-no-chat-template-fallback

Conversation

@YAMY1234
Collaborator

Summary

Some models (e.g. DeepSeek-V4) ship no Hugging Face chat template; their rendering happens entirely inside the engine via a hard-coded encoder. With the project-wide default use_chat_template: true introduced in #20, recipes that don't also set custom_tokenizer end up calling tokenizer.apply_chat_template(...) directly and crash with:

ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set ...

Reported by @ishandhanani while running the dsv4 sa-bench recipes against srt-slurm without the new flags from #73.

Fix

In benchmark_serving.main(), right after get_tokenizer returns, detect the no-template case:

  • if use_chat_template is on and no custom_tokenizer plugin is configured and the tokenizer exposes neither chat_template nor default_chat_template,
  • emit a loud warnings.warn(...) pointing at custom_tokenizer (e.g. SGLangDeepseekV4Tokenizer added in feat(sa-bench): add sglang DeepSeek-V4 tokenizer #73),
  • and force args.use_chat_template = False so the run completes against the raw-text path.

This way:

  • DSv4 recipe without the plugin -> no longer crashes; runs in raw-text mode (input_tokens may diverge from the server's #new-token count, but the user is told why).
  • DSv4 recipe with custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer -> unchanged: real DSML rendering, exact parity with sglang server.
  • Models with HF chat templates (DSR1, K2, Qwen, GLM, ...) -> unchanged.

Test plan

  • python3 -m py_compile benchmark_serving.py
  • Re-run dsv4 recipe without custom_tokenizer/use_chat_template overrides; confirm warning + successful sa-bench completion.
  • Re-run an existing dsv4 recipe (e.g. recipes/gb300-fp4/1k1k-dsv4/agg-low-latency-chat.yaml) with custom_tokenizer set; confirm no behavior change vs. main.
  • Re-run a non-DSv4 recipe (DSR1 / Kimi-K2.6) and confirm chat template path is unaffected.


Models like DeepSeek-V4 ship no Hugging Face chat template; rendering
lives entirely inside the engine. With the default `use_chat_template:
true` (introduced in #20) and no `custom_tokenizer` plugin, sa-bench
called `tokenizer.apply_chat_template(...)` directly and crashed with
`ValueError: ... has no chat template`.

Detect this case in `main()` after `get_tokenizer` returns: if
`use_chat_template` is on but the tokenizer exposes neither
`chat_template` nor `default_chat_template`, emit a loud warning and
fall back to the raw-text path so the run completes. Users who care
about exact token-count parity with the server are pointed at
`custom_tokenizer` (e.g. SGLangDeepseekV4Tokenizer added in #73).

Recipes that already set `custom_tokenizer` are unaffected.
@YAMY1234
Collaborator Author

Superseded by #76, which takes a fail-fast approach with an actionable error instead of a silent auto-fallback. Auto-fallback risks producing benchmark numbers from a different code path than the recipe intends (the client/server token counts can diverge, as this PR's own warning notes). #76 also fixes a separate silent bug where bench.sh warmup was missing CHAT_TEMPLATE_ARGS.

@YAMY1234 YAMY1234 closed this Apr 25, 2026
