fix: Handle multiple named chat templates in HuggingFace tokenizers by JustinTong0323 · Pull Request #17236 · sgl-project/sglang

JustinTong0323 · 2026-01-16T22:20:00Z

Motivation

Fix #17231
SGLang crashes with TypeError: expected string or bytes-like object, got 'dict' when loading models whose HuggingFace tokenizers have multiple named chat templates (e.g., CohereLabs/c4ai-command-r7b-12-2024).

HuggingFace tokenizers can store chat templates in two formats:

Single template: tokenizer.chat_template = "..."
Multiple named templates: tokenizer.chat_template = {"default": "...", "tool_use": "...", "rag": "..."}

The current implementation only handles the single template format, causing crashes when encountering the dict format.

Error trace:

File "/sglang/python/sglang/srt/managers/template_manager.py", line 94, in _detect_reasoning_pattern
    has_reasoning = re.search(force_reasoning_pattern, template) is not None
TypeError: expected string or bytes-like object, got 'dict'
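
The failure can be reproduced outside SGLang. The sketch below is illustrative only: the pattern and template strings are hypothetical stand-ins, not the values used in template_manager.py. It shows that re.search rejects a dict, and that resolving the dict to a single template string first avoids the error.

```python
import re

# Hypothetical pattern and templates, for illustration only.
force_reasoning_pattern = r"<think>"
named_templates = {
    "default": "{{ messages }}",
    "tool_use": "{{ messages }}<think>",
}

# Passing the dict straight to re.search raises TypeError,
# which matches the crash in the trace above.
try:
    re.search(force_reasoning_pattern, named_templates)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Selecting one named template first yields a plain string,
# which re.search accepts.
selected = named_templates["tool_use"]
has_reasoning = re.search(force_reasoning_pattern, selected) is not None
```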

Usage

python -m sglang.launch_server \
    --model-path CohereLabs/c4ai-command-r7b-12-2024 \
    --trust-remote-code \
    --hf-chat-template-name tool_use

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

gemini-code-assist · 2026-01-16T22:20:18Z

Summary of Changes

Hello @JustinTong0323, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system's compatibility with HuggingFace tokenizers by addressing a critical bug where models using dictionary-based chat templates would cause a TypeError. It introduces robust logic to correctly identify and load these multi-template configurations, providing users with the flexibility to specify a preferred template via a new command-line argument, --hf-chat-template-name. This ensures smoother model loading and operation for a wider range of HuggingFace models.

Highlights

Bug Fix for Chat Templates: Resolves a TypeError that occurred when HuggingFace tokenizers provided chat templates as a dictionary of named templates instead of a single string.
Support for Multiple Named Templates: Enables the system to correctly parse and utilize HuggingFace tokenizers that define multiple chat templates (e.g., 'default', 'tool_use', 'rag').
New CLI Argument: Introduces --hf-chat-template-name to allow users to explicitly select a specific named chat template from those available in the tokenizer.
Default Template Selection: If --hf-chat-template-name is not specified, the system will now automatically use the first available chat template found in the tokenizer's dictionary.
Documentation Update: The server_arguments.md documentation has been updated to reflect the new --hf-chat-template-name argument and its usage.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

JustinTong0323 · 2026-01-16T22:20:40Z

/tag-and-rerun-ci

gemini-code-assist

Code Review

This pull request effectively addresses a crash that occurs when loading HuggingFace tokenizers with multiple named chat templates. The introduction of the --hf-chat-template-name argument is a good solution, and the implementation is clean, including necessary documentation updates. My feedback focuses on enhancing the default behavior to be more predictable by prioritizing the 'default' template and ensuring the documentation is fully consistent with this improvement.

gemini-code-assist · 2026-01-16T22:21:35Z

python/sglang/srt/managers/template_manager.py

+        # Fallback: Use first available template
+        first_name = available_names[0]
+        logger.info(f"Using first available template: '{first_name}'")
+        return templates[first_name]


The current fallback logic in _select_named_template uses the first available template. However, the help text for --hf-chat-template-name in server_args.py specifies that it should prefer the 'default' template if it exists. It's good practice to align the implementation with the documentation. Prioritizing the 'default' template is a more predictable and user-friendly behavior.

Suggested change

-        # Fallback: Use first available template
-        first_name = available_names[0]
-        logger.info(f"Using first available template: '{first_name}'")
-        return templates[first_name]
+        # Fallback: Use 'default' if available, otherwise the first available template
+        if "default" in templates:
+            logger.info("Using 'default' chat template.")
+            return templates["default"]
+        first_name = available_names[0]
+        logger.info(f"No 'default' template found. Using first available template: '{first_name}'")
+        return templates[first_name]
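
For reference, the selection order discussed in this review can be sketched as a standalone function. This is a minimal sketch and not the merged implementation: the function name select_named_template, its signature, and the choice to raise ValueError for an unknown or missing name are assumptions made for illustration.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def select_named_template(
    templates: Dict[str, str], requested: Optional[str] = None
) -> str:
    """Pick one template string from a dict of named chat templates.

    Order: explicitly requested name, then 'default', then the first entry.
    Raising ValueError on bad input is an assumption of this sketch.
    """
    if not templates:
        raise ValueError("Tokenizer provides no named chat templates.")

    available_names = list(templates.keys())

    # An explicit request (e.g. from --hf-chat-template-name) wins.
    if requested is not None:
        if requested not in templates:
            raise ValueError(
                f"Chat template '{requested}' not found. "
                f"Available: {available_names}"
            )
        return templates[requested]

    # Fallback: use 'default' if available, otherwise the first template.
    if "default" in templates:
        logger.info("Using 'default' chat template.")
        return templates["default"]

    first_name = available_names[0]
    logger.info(
        f"No 'default' template found. Using first available template: '{first_name}'"
    )
    return templates[first_name]
```

Because Python dicts preserve insertion order, "first available" here means the first key as stored in the tokenizer's template dict.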

docs/advanced_features/server_arguments.md

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

ljw-mc · 2026-01-16T22:53:58Z

Just tested on the original issue, the commands work!

Thank you so much! @JustinTong0323

JustinTong0323 · 2026-01-18T04:13:29Z

/tag-and-rerun-ci


fix: Handle multiple named chat templates in HuggingFace tokenizers

dfc03db

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners January 16, 2026 22:20

github-actions bot added the documentation Improvements or additions to documentation label Jan 16, 2026

github-actions bot added the run-ci label Jan 16, 2026

gemini-code-assist bot reviewed Jan 16, 2026

upd

a3b08b9

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 mentioned this pull request Jan 16, 2026

[new-model] Add support for Cohere2ForCausalLM behind Command-A and Command-R Models #16927

Merged

5 tasks

Fridge003 approved these changes Jan 18, 2026

Fridge003 merged commit 2069050 into sgl-project:main Jan 18, 2026
395 of 425 checks passed

Fridge003 merged 2 commits into sgl-project:main from JustinTong0323:fix-multiple-templates-hf

3 participants
