
fix(sa-bench): actionable error + warmup parity for use_chat_template (#76)

Merged
ishandhanani merged 1 commit into NVIDIA:main from YAMY1234:fix/sa-bench-chat-template-actionable-error
Apr 26, 2026

Conversation

@YAMY1234
Collaborator

Summary

Two small, related fixes in sa-bench for models that don't ship a jinja chat template (notably DeepSeek-V4-Pro / DSV4-Pro).

1. benchmark_serving.py: fail fast with an actionable error

When --use-chat-template is passed but the loaded tokenizer has neither a jinja chat_template nor an overridden apply_chat_template method, raise a ValueError immediately after the tokenizer is loaded, with a message that tells the user exactly how to fix their recipe:

Without this, runs crash hundreds of lines later inside sample_random_requests with a generic transformers error (Cannot use chat template functions because tokenizer.chat_template is not set) that gives no hint about the actual recipe-level fix. This has hit at least two users onboarding to DSV4 benchmarking.
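The check itself can be sketched as follows. This is a minimal, transformers-free illustration, not the PR's verbatim code: `BaseTokenizer` stands in for transformers' `PreTrainedTokenizerBase`, and the function name is hypothetical.

```python
# Minimal sketch of the fail-fast check. BaseTokenizer stands in for
# transformers' PreTrainedTokenizerBase; names here are illustrative.
class BaseTokenizer:
    chat_template = None  # models without a jinja template leave this unset

    def apply_chat_template(self, messages):
        raise ValueError(
            "Cannot use chat template functions because "
            "tokenizer.chat_template is not set"
        )


def check_chat_template_support(tokenizer, use_chat_template, base=BaseTokenizer):
    """Raise right after tokenizer load, instead of deep inside request sampling."""
    if not use_chat_template:
        return
    has_jinja = getattr(tokenizer, "chat_template", None) is not None
    # A custom tokenizer (e.g. SGLangDeepseekV4Tokenizer) may override
    # apply_chat_template without shipping a jinja template, so compare
    # the method against the base-class implementation.
    overrides_apply = (
        type(tokenizer).apply_chat_template is not base.apply_chat_template
    )
    if not (has_jinja or overrides_apply):
        raise ValueError(
            f"--use-chat-template was set, but the loaded tokenizer "
            f"({type(tokenizer).__name__}) has no jinja chat_template and "
            f"does not override apply_chat_template()."
        )
```

Comparing the bound method against the base class is what lets a custom tokenizer that overrides `apply_chat_template()` pass the check even though it has no jinja `chat_template` attribute.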

Repro (before this PR), with a recipe that has benchmark.use_chat_template: true (or omits it — the schema default is true) and no custom_tokenizer:

Traceback (most recent call last):
  File "/srtctl-benchmarks/sa-bench/benchmark_serving.py", line 346, in sample_random_requests
    chat_template_dummy = tokenizer.apply_chat_template(...)
  ...
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set
and no template argument was passed!

After this PR:

ValueError: --use-chat-template was set, but the loaded tokenizer
(LlamaTokenizerFast) has no jinja chat_template and does not override
apply_chat_template().

For DeepSeek-V4 / DSV4-Pro, set in your recipe:
  benchmark:
    custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer

Or, to skip chat-template formatting entirely (random tokens sent raw,
mirrors pre-DSV4 behavior):
  benchmark:
    use_chat_template: false

2. bench.sh: warmup parity + early notice

  • Warmup runs were missing ${CHAT_TEMPLATE_ARGS[@]}, so warmup always ran without chat template even when the main measurement run had it enabled. This is a silent bug: the warmup cache state doesn't match what the measured run will actually exercise.
  • Adds a one-time [sa-bench] notice echo when use_chat_template=true but no custom_tokenizer is configured, mirroring the Python-side guidance. This surfaces the issue in the first lines of benchmark.out, before the tokenizer crash.
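The bench.sh side boils down to expanding the same `CHAT_TEMPLATE_ARGS` array in the warmup invocation that the measured run already uses. A sketch of the pattern (variable and flag names other than `CHAT_TEMPLATE_ARGS` are illustrative, and the real commands are replaced by `echo` here):

```shell
#!/usr/bin/env bash
# Sketch of the bench.sh pattern; only CHAT_TEMPLATE_ARGS comes from the PR.
set -euo pipefail

USE_CHAT_TEMPLATE=true
CUSTOM_TOKENIZER=""

CHAT_TEMPLATE_ARGS=()
if [[ "${USE_CHAT_TEMPLATE}" == "true" ]]; then
    CHAT_TEMPLATE_ARGS+=(--use-chat-template)
    if [[ -z "${CUSTOM_TOKENIZER}" ]]; then
        # One-time notice; lands in the first lines of benchmark.out.
        echo "[sa-bench] notice: use_chat_template=true but no custom_tokenizer" \
             "configured; models without a jinja chat_template will fail at" \
             "tokenizer load."
    fi
fi

# Before the fix, the warmup line omitted "${CHAT_TEMPLATE_ARGS[@]}",
# so warmup never exercised the chat-template path the main run uses.
echo warmup-cmd benchmark_serving.py "${CHAT_TEMPLATE_ARGS[@]}"
echo main-cmd   benchmark_serving.py "${CHAT_TEMPLATE_ARGS[@]}"
```

Keeping the flags in a single array that both invocations expand is what guarantees warmup and measurement stay in parity if more flags are added later.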

Test plan

  • DSV4-Pro recipe with use_chat_template: true and no custom_tokenizer → fails immediately at tokenizer load with the new actionable message (no spurious warmup output before).
  • DSV4-Pro recipe with use_chat_template: true + custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer → runs end-to-end (warmup + main both with chat template applied).
  • DSV4-Pro recipe with use_chat_template: false → runs end-to-end with raw random tokens (existing behavior, regression check).
  • Llama / model that does ship a jinja chat template → unchanged behavior.

Notes

Made with Cursor

… lacks chat_template

Two related fixes for sa-bench when running models without a jinja chat
template (e.g. DeepSeek-V4-Pro):

1. benchmark_serving.py: when --use-chat-template is set but the loaded
   tokenizer has neither a jinja chat_template nor an overridden
   apply_chat_template method, fail fast with a clear message pointing
   to either SGLangDeepseekV4Tokenizer (NVIDIA#73) or use_chat_template: false.
   Previously this crashed deep inside transformers with a generic
   ValueError that gave no hint how to fix the recipe.

2. bench.sh: warmup runs were missing CHAT_TEMPLATE_ARGS, so warmup
   always ran without chat template even when the main run had it
   enabled -- leading to mismatched cache state between warmup and
   measurement. Also adds an early-exit notice when use_chat_template
   is true but no custom_tokenizer is configured.
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@2cbbede). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #76   +/-   ##
=======================================
  Coverage        ?   70.35%           
=======================================
  Files           ?       59           
  Lines           ?     6270           
  Branches        ?        0           
=======================================
  Hits            ?     4411           
  Misses          ?     1859           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
@YAMY1234 YAMY1234 marked this pull request as ready for review April 25, 2026 03:29
@ishandhanani ishandhanani merged commit 154a750 into NVIDIA:main Apr 26, 2026
6 checks passed


3 participants