Add ChatCompletionRequest-style support to /v1/tokenize #23981
ishandhanani merged 4 commits into sgl-project:main from
Conversation
/tag-and-rerun-ci
Great job! We are using this API with the dynamo kvindexer for precise cache-aware routing.
Nice, this is helpful for cache-aware routing in general. On the sglang side, we also have KV cache event emission that can work together with this tokenize API for prefix-matching based routing. Good to see the integration with dynamo kvindexer moving forward!
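To make the routing idea concrete, here is an illustrative sketch (not dynamo kvindexer or sglang code; all names are hypothetical) of prefix-matching based routing: given the token IDs returned by the tokenize API and each worker's cached token sequences, send the request to the worker with the longest shared prefix.

```python
# Hypothetical sketch of prefix-matching routing; not actual router code.

def shared_prefix_len(a, b):
    """Length of the common prefix of two token-ID sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_worker(request_tokens, worker_caches):
    """worker_caches maps worker_id -> list of cached token-ID sequences.

    Returns the worker with the longest prefix match and that match length.
    """
    best_worker, best_len = None, -1
    for worker, seqs in worker_caches.items():
        match = max((shared_prefix_len(request_tokens, s) for s in seqs),
                    default=0)
        if match > best_len:
            best_worker, best_len = worker, match
    return best_worker, best_len

caches = {"w0": [[1, 2, 3, 4]], "w1": [[1, 2, 9]]}
print(pick_worker([1, 2, 3, 7], caches))  # ('w0', 3)
```

In practice the cached sequences would come from KV cache events rather than a static dict, but the longest-prefix selection is the core of the routing decision.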
ShangmingCai
left a comment
Looks good.
CC: @CatherineSue, do you have time to review this PR?
LGTM!
I don't see any CI failures from this PR. OK if I merge, @ShangmingCai?
/rerun-failed-ci
Should we add a test for this API?
/rerun-failed-ci

(4 similar comments)
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/model_executor/model_runner.py
Motivation
/v1/tokenize previously only accepted raw string prompts, which made it difficult to inspect the actual token sequence used by /v1/chat/completions. To build a cache-aware system on top of KV events, we need the actual token sequence that results from rendering the model's chat template.
This change allows /v1/tokenize to accept ChatCompletion-style messages input and return token IDs consistent with the chat completion path.

Qwen3.5-9B
Without tools
With tools
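As a rough illustration of the new input shape, the sketch below builds a ChatCompletionRequest-style payload for /v1/tokenize. The exact field names in the response (and the example model name and port) are assumptions for illustration, not taken from the PR diff.

```python
# Hypothetical sketch: request shape for the messages-based /v1/tokenize.
import json

def build_tokenize_request(messages, model):
    """Build a ChatCompletionRequest-style payload for /v1/tokenize."""
    return {"model": model, "messages": messages}

payload = build_tokenize_request(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    model="Qwen/Qwen2.5-7B-Instruct",  # example model, an assumption
)
body = json.dumps(payload)

# The payload would then be POSTed to the running server, e.g.:
#   resp = requests.post("http://localhost:30000/v1/tokenize", json=payload)
# and the response would carry the token IDs produced by rendering the
# chat template (response field names are assumptions here).
```

Because the server renders the chat template itself, the returned token IDs should match what /v1/chat/completions would actually prefill, which is what the KV-event-based routing above needs.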
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci