fix(sa-bench): actionable error + warmup parity for use_chat_template #76
Merged
ishandhanani merged 1 commit on Apr 26, 2026
Conversation
… lacks chat_template

Two related fixes for sa-bench when running models without a jinja chat template (e.g. DeepSeek-V4-Pro):

1. `benchmark_serving.py`: when `--use-chat-template` is set but the loaded tokenizer has neither a jinja `chat_template` nor an overridden `apply_chat_template` method, fail fast with a clear message pointing to either `SGLangDeepseekV4Tokenizer` (NVIDIA#73) or `use_chat_template: false`. Previously this crashed deep inside transformers with a generic `ValueError` that gave no hint how to fix the recipe.
2. `bench.sh`: warmup runs were missing `CHAT_TEMPLATE_ARGS`, so warmup always ran without the chat template even when the main run had it enabled, leading to mismatched cache state between warmup and measurement. Also adds an early-exit notice when `use_chat_template` is true but no `custom_tokenizer` is configured.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #76 +/- ##
=======================================
Coverage ? 70.35%
=======================================
Files ? 59
Lines ? 6270
Branches ? 0
=======================================
Hits ? 4411
Misses ? 1859
Partials ? 0

☔ View full report in Codecov by Sentry.
Summary
Two small, related fixes in `sa-bench` for models that don't ship a jinja chat template (notably DeepSeek-V4-Pro / DSV4-Pro).

1. `benchmark_serving.py`: fail fast with an actionable error

When `--use-chat-template` is passed but the loaded tokenizer has neither a jinja `chat_template` nor an overridden `apply_chat_template` method, raise a `ValueError` immediately after the tokenizer is loaded, with a message that tells the user exactly how to fix their recipe: either configure a `custom_tokenizer` (e.g. `SGLangDeepseekV4Tokenizer`, NVIDIA#73) or set `benchmark.use_chat_template: false`.

Without this, runs crash hundreds of lines later inside `sample_random_requests` with a generic transformers error (`Cannot use chat template functions because tokenizer.chat_template is not set`) that gives no hint about the actual recipe-level fix. This has hit at least two users onboarding to DSV4 benchmarking.

Repro (before this PR), with a recipe that has `benchmark.use_chat_template: true` (or omits it; the schema default is `true`) and no `custom_tokenizer`:

After this PR:
2. `bench.sh`: warmup parity + early notice

Warmup runs were missing `${CHAT_TEMPLATE_ARGS[@]}`, so warmup always ran without the chat template even when the main measurement run had it enabled. This is a silent bug: the warmup cache state doesn't match what the measured run will actually exercise.

Also adds a `[sa-bench] notice:` echo when `use_chat_template=true` but no `custom_tokenizer` is configured, mirroring the Python-side guidance. This is useful for surfacing the issue in the first lines of `benchmark.out` before the tokenizer crash.

Test plan
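A minimal sketch of the bench.sh shape after the fix. Only `CHAT_TEMPLATE_ARGS`, `use_chat_template`, and `custom_tokenizer` come from the description; the other variable names, the `run_bench` helper, and the exact flag wiring are assumptions, not the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt; names beyond CHAT_TEMPLATE_ARGS are illustrative.
set -euo pipefail

USE_CHAT_TEMPLATE="${USE_CHAT_TEMPLATE:-true}"
CUSTOM_TOKENIZER="${CUSTOM_TOKENIZER:-}"

CHAT_TEMPLATE_ARGS=()
if [[ "${USE_CHAT_TEMPLATE}" == "true" ]]; then
  CHAT_TEMPLATE_ARGS+=(--use-chat-template)
  if [[ -z "${CUSTOM_TOKENIZER}" ]]; then
    # Early notice mirroring the Python-side guidance, so the problem
    # shows up in the first lines of benchmark.out.
    echo "[sa-bench] notice: use_chat_template=true but no custom_tokenizer is configured"
  fi
fi

run_bench() {
  # Both warmup and measurement receive the same chat-template flags,
  # otherwise the warmed cache state differs from what is measured.
  echo "benchmark_serving.py $* ${CHAT_TEMPLATE_ARGS[*]:-}"
}

run_bench --warmup
run_bench --measure
```

The point of the fix is simply that the same `CHAT_TEMPLATE_ARGS` expansion appears in both the warmup and the measured invocation, instead of only the latter.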
- `use_chat_template: true` and no `custom_tokenizer` → fails immediately at tokenizer load with the new actionable message (no spurious warmup output before).
- `use_chat_template: true` + `custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer` → runs end-to-end (warmup + main both with chat template applied).
- `use_chat_template: false` → runs end-to-end with raw random tokens (existing behavior, regression check).

Notes
Made with Cursor