
fix(sa-bench): actionable error + warmup parity for use_chat_template (#76)

Merged
ishandhanani merged 1 commit into NVIDIA:main from YAMY1234:fix/sa-bench-chat-template-actionable-error
Apr 26, 2026

Conversation

@YAMY1234
Collaborator

Summary

Two small, related fixes in sa-bench for models that don't ship a jinja chat template (notably DeepSeek-V4-Pro / DSV4-Pro).

1. benchmark_serving.py: fail fast with an actionable error

When --use-chat-template is passed but the loaded tokenizer has neither a jinja chat_template nor an overridden apply_chat_template method, raise a ValueError immediately after the tokenizer is loaded, with a message that tells the user exactly how to fix their recipe:

Without this, runs crash hundreds of lines later inside sample_random_requests with a generic transformers error (Cannot use chat template functions because tokenizer.chat_template is not set) that gives no hint about the actual recipe-level fix. This has hit at least two users onboarding to DSV4 benchmarking.
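The check itself can be sketched as follows. This is a minimal, transformers-free illustration, not the PR's verbatim code: `BaseTokenizer` stands in for transformers' `PreTrainedTokenizerBase`, and the function name is hypothetical.

```python
# Minimal sketch of the fail-fast check. BaseTokenizer stands in for
# transformers' PreTrainedTokenizerBase; names here are illustrative.
class BaseTokenizer:
    chat_template = None  # models without a jinja template leave this unset

    def apply_chat_template(self, messages):
        raise ValueError(
            "Cannot use chat template functions because "
            "tokenizer.chat_template is not set"
        )


def check_chat_template_support(tokenizer, use_chat_template, base=BaseTokenizer):
    """Raise right after tokenizer load, instead of deep inside request sampling."""
    if not use_chat_template:
        return
    has_jinja = getattr(tokenizer, "chat_template", None) is not None
    # A custom tokenizer (e.g. SGLangDeepseekV4Tokenizer) may override
    # apply_chat_template without shipping a jinja template, so compare
    # the method against the base-class implementation.
    overrides_apply = (
        type(tokenizer).apply_chat_template is not base.apply_chat_template
    )
    if not (has_jinja or overrides_apply):
        raise ValueError(
            f"--use-chat-template was set, but the loaded tokenizer "
            f"({type(tokenizer).__name__}) has no jinja chat_template and "
            f"does not override apply_chat_template()."
        )
```

Comparing the bound method against the base class is what lets a custom tokenizer that overrides `apply_chat_template()` pass the check even though it has no jinja `chat_template` attribute.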

Repro (before this PR), with a recipe that has benchmark.use_chat_template: true (or omits it — the schema default is true) and no custom_tokenizer:

Traceback (most recent call last):
  File "/srtctl-benchmarks/sa-bench/benchmark_serving.py", line 346, in sample_random_requests
    chat_template_dummy = tokenizer.apply_chat_template(...)
  ...
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set
and no template argument was passed!

After this PR:

ValueError: --use-chat-template was set, but the loaded tokenizer
(LlamaTokenizerFast) has no jinja chat_template and does not override
apply_chat_template().

For DeepSeek-V4 / DSV4-Pro, set in your recipe:
  benchmark:
    custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer

Or, to skip chat-template formatting entirely (random tokens sent raw,
mirrors pre-DSV4 behavior):
  benchmark:
    use_chat_template: false

2. bench.sh: warmup parity + early notice

  • Warmup runs were missing ${CHAT_TEMPLATE_ARGS[@]}, so warmup always ran without chat template even when the main measurement run had it enabled. This is a silent bug: the warmup cache state doesn't match what the measured run will actually exercise.
  • Adds a one-time [sa-bench] notice echo when use_chat_template=true but no custom_tokenizer is configured, mirroring the Python-side guidance. This surfaces the issue in the first lines of benchmark.out, before the tokenizer crash.
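The bench.sh side boils down to expanding the same `CHAT_TEMPLATE_ARGS` array in the warmup invocation that the measured run already uses. A sketch of the pattern (variable and flag names other than `CHAT_TEMPLATE_ARGS` are illustrative, and the real commands are replaced by `echo` here):

```shell
#!/usr/bin/env bash
# Sketch of the bench.sh pattern; only CHAT_TEMPLATE_ARGS comes from the PR.
set -euo pipefail

USE_CHAT_TEMPLATE=true
CUSTOM_TOKENIZER=""

CHAT_TEMPLATE_ARGS=()
if [[ "${USE_CHAT_TEMPLATE}" == "true" ]]; then
    CHAT_TEMPLATE_ARGS+=(--use-chat-template)
    if [[ -z "${CUSTOM_TOKENIZER}" ]]; then
        # One-time notice; lands in the first lines of benchmark.out.
        echo "[sa-bench] notice: use_chat_template=true but no custom_tokenizer" \
             "configured; models without a jinja chat_template will fail at" \
             "tokenizer load."
    fi
fi

# Before the fix, the warmup line omitted "${CHAT_TEMPLATE_ARGS[@]}",
# so warmup never exercised the chat-template path the main run uses.
echo warmup-cmd benchmark_serving.py "${CHAT_TEMPLATE_ARGS[@]}"
echo main-cmd   benchmark_serving.py "${CHAT_TEMPLATE_ARGS[@]}"
```

Keeping the flags in a single array that both invocations expand is what guarantees warmup and measurement stay in parity if more flags are added later.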

Test plan

  • DSV4-Pro recipe with use_chat_template: true and no custom_tokenizer → fails immediately at tokenizer load with the new actionable message (no spurious warmup output before).
  • DSV4-Pro recipe with use_chat_template: true + custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer → runs end-to-end (warmup + main both with chat template applied).
  • DSV4-Pro recipe with use_chat_template: false → runs end-to-end with raw random tokens (existing behavior, regression check).
  • Llama / model that does ship a jinja chat template → unchanged behavior.

Notes

Made with Cursor

… lacks chat_template

Two related fixes for sa-bench when running models without a jinja chat
template (e.g. DeepSeek-V4-Pro):

1. benchmark_serving.py: when --use-chat-template is set but the loaded
   tokenizer has neither a jinja chat_template nor an overridden
   apply_chat_template method, fail fast with a clear message pointing
   to either SGLangDeepseekV4Tokenizer (NVIDIA#73) or use_chat_template: false.
   Previously this crashed deep inside transformers with a generic
   ValueError that gave no hint how to fix the recipe.

2. bench.sh: warmup runs were missing CHAT_TEMPLATE_ARGS, so warmup
   always ran without chat template even when the main run had it
   enabled -- leading to mismatched cache state between warmup and
   measurement. Also adds an early-exit notice when use_chat_template
   is true but no custom_tokenizer is configured.
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@2cbbede). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #76   +/-   ##
=======================================
  Coverage        ?   70.35%           
=======================================
  Files           ?       59           
  Lines           ?     6270           
  Branches        ?        0           
=======================================
  Hits            ?     4411           
  Misses          ?     1859           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
@YAMY1234 YAMY1234 marked this pull request as ready for review April 25, 2026 03:29
@ishandhanani ishandhanani merged commit 154a750 into NVIDIA:main Apr 26, 2026
6 checks passed


3 participants