fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue by richardhuo-nv · Pull Request #20 · NVIDIA/srt-slurm

richardhuo-nv · 2026-04-09T19:46:53Z

TRT-LLM is still on Transformers v4, while the GLM-5 model was built with Transformers v5. As a result, the GLM-5 tokenizer cannot be loaded directly with AutoTokenizer in Transformers v4.

Our current workaround is adapted from TensorRT-LLM’s glm_moe_dsa tokenizer implementation:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py

This workaround uses the Rust tokenizer library to load tokenizer.json, and then initializes a Transformers v4 AutoTokenizer with appropriately translated settings from tokenizer_config.json.

At the moment, this workaround does not support chat_template, so we need to disable chat templating for now.

benchmark:
  custom_tokenizer: "glm_moe_dsa"
  use_chat_template: false

fix

codecov-commenter · 2026-04-09T19:48:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@e93856b). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #20   +/-   ##
=======================================
  Coverage        ?   60.13%           
=======================================
  Files           ?       48           
  Lines           ?     4079           
  Branches        ?        0           
=======================================
  Hits            ?     2453           
  Misses          ?     1626           
  Partials        ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…#47) * fix tokenizer for glm5 (#20) fix * add nvidia pre-release url (#22)

fix for glm5

c93aa8d

fix

ishandhanani approved these changes Apr 9, 2026

View reviewed changes

ishandhanani merged commit 129c6fc into NVIDIA:main Apr 9, 2026
5 checks passed

richardhuo-nv added a commit that referenced this pull request Apr 20, 2026

fix: add glm5 dynamo trtllm benchmark support to sa submission branch (…

10f4ac9

…#47) * fix tokenizer for glm5 (#20) fix * add nvidia pre-release url (#22)

YAMY1234 mentioned this pull request Apr 25, 2026

fix(sa-bench): auto-fallback when tokenizer has no chat template #74

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue#20

fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue#20
ishandhanani merged 1 commit intoNVIDIA:mainfrom
richardhuo-nv:rihuo/fix_glm5_tokenizer_2

richardhuo-nv commented Apr 9, 2026

Uh oh!

codecov-commenter commented Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

richardhuo-nv commented Apr 9, 2026

Uh oh!

codecov-commenter commented Apr 9, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants