
tokenizer: Add fastokens support #41741

Merged
DarkLight1337 merged 3 commits into vllm-project:main from AlonKejzman:akejzman/fastokens on May 7, 2026

Conversation

@AlonKejzman AlonKejzman (Contributor) commented May 5, 2026

Purpose

Adds a new --tokenizer-backend argument that selects the engine powering the HuggingFace tokenizer. Two values are supported:

  • huggingface (default) - the standard tokenizers library, current behavior.
  • fastokens - the Rust-based BPE backend provided by the fastokens package.
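
For illustration, the same opt-in from the offline API might look like the sketch below (whether LLM() forwards tokenizer_backend like other engine arguments is an assumption on my part, not something this PR confirms):

from vllm import LLM

# Hypothetical plumbing: tokenizer_backend passed through engine args
# to the HF tokenizer loader.
llm = LLM(model="Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")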

tokenizer_backend is orthogonal to tokenizer_mode: it only takes effect when the resolved mode is "hf". Non-HF modes (mistral, deepseek_v32, etc.) ignore it and continue to use their own tokenizer engines.

The fastokens package is imported lazily; if it isn't installed, a clear ImportError is raised only when the user opts in.
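
A minimal sketch of the selection plus lazy-import behavior described above (the helper name and the fastokens patch entry point are hypothetical, not this PR's actual code):

def _load_hf_tokenizer(model_id: str, tokenizer_backend: str = "huggingface"):
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer_backend == "fastokens":
        try:
            # Imported lazily: users who never opt in don't need the package.
            import fastokens
        except ImportError as e:
            raise ImportError(
                "The 'fastokens' package is required for "
                "--tokenizer-backend fastokens; install it first."
            ) from e
        # Hypothetical hook; the real monkey-patch entry point may differ.
        tokenizer = fastokens.patch_tokenizer(tokenizer)
    return tokenizer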

No existing issue — opening this as a feature addition. Searched open PRs for tokenizer-backend / fastokens and found no duplicate work.

Test Plan

# 1. Default backend unchanged (regression check)
python3 -m pytest tests/tokenizers_/ -v

# 2. Smoke test: load a BPE model with each backend and compare outputs
from vllm.tokenizers import get_tokenizer

prompt = "The quick brown fox jumps over the lazy dog."
hf = get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="huggingface")
fk = get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")
assert hf.encode(prompt) == fk.encode(prompt), "encode mismatch"
assert hf.decode(hf.encode(prompt)) == fk.decode(fk.encode(prompt))
print("OK")

# 3. End-to-end serve check
vllm serve Qwen/Qwen3.5-35B-A3B --enable-prefix-caching --tokenizer-backend fastokens
# then run GSM8K (using lm_eval) + performance (using vllm bench), and compare results

# 4. Confirm non-HF modes ignore the flag
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
    --tokenizer-mode mistral --tokenizer-backend fastokens
# (should load the mistral_common tokenizer, no fastokens patch applied)

# 5. Missing-package error path
uv pip uninstall fastokens
vllm serve Qwen/Qwen3.5-35B-A3B --tokenizer-backend fastokens
# expect: ImportError: The 'fastokens' package is required ...
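
Step 5 can also be exercised in-process without uninstalling anything. A pytest sketch (the get_tokenizer import path is copied from the smoke test above; the masking trick relies on Python raising ImportError for any module whose sys.modules entry is None):

import sys

import pytest

from vllm.tokenizers import get_tokenizer  # path from the smoke test above

def test_missing_fastokens_raises(monkeypatch):
    # Mask the package even if it is installed, to hit the error path.
    monkeypatch.setitem(sys.modules, "fastokens", None)
    with pytest.raises(ImportError, match="fastokens"):
        get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")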

Test Result

1 - All tests passed except those behind gated repos
2 - OK
3 - Same GSM8K scores (~0.86); ~10% reduction in TTFT on a 32K-token prompt with a 30K-token shared prefix
4 - OK
5 - OK

Essential Elements of an Effective PR Description Checklist
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [x] The test plan, such as providing test command.
  • [x] The test results, such as pasting the results comparison before and after, or e2e results.
  • [x] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot commented May 5, 2026

Documentation preview: https://vllm--41741.org.readthedocs.build/en/41741/

@mergify mergify Bot added documentation Improvements or additions to documentation v1 labels May 5, 2026
@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new tokenizer_backend configuration option, allowing users to choose between the default Hugging Face tokenizers library and the fastokens Rust backend for BPE tokenizers. The implementation includes documentation updates, CLI and API argument additions, and logic to apply fastokens monkey-patches when enabled. I have no feedback to provide.

Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
@AlonKejzman AlonKejzman force-pushed the akejzman/fastokens branch from a3e6c01 to 376ee65 on May 5, 2026, 14:35
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label May 5, 2026
Review thread on docs/configuration/optimization.md (outdated)
Review thread on vllm/config/model.py (outdated)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
@AlonKejzman AlonKejzman force-pushed the akejzman/fastokens branch from d2e121c to 51c9ac4 on May 6, 2026, 10:22
@DarkLight1337 DarkLight1337 merged commit 2a16ece into vllm-project:main May 7, 2026
61 checks passed
@tjtanaa tjtanaa (Collaborator) commented May 7, 2026

@AlonKejzman when I launch gpt-oss with fastokens, I get this error.

(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 87, in renderer_from_config
(APIServer pid=8195)     return RENDERER_REGISTRY.load_renderer(renderer_mode, config, tokenizer)
(APIServer pid=8195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 68, in load_renderer
(APIServer pid=8195)     renderer_cls = self.load_renderer_cls(renderer_mode)
(APIServer pid=8195)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 55, in load_renderer_cls
(APIServer pid=8195)     raise ValueError(f"No renderer registered for {renderer_mode=!r}.")
(APIServer pid=8195) ValueError: No renderer registered for renderer_mode='fastokens'.
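
For context, the failing call is essentially a dict lookup keyed by renderer_mode. A paraphrase of that behavior (not the actual vllm/renderers/registry.py source), showing why an unregistered key like 'fastokens' raises:

class RendererRegistry:
    def __init__(self) -> None:
        self._renderers: dict[str, type] = {}

    def register(self, renderer_mode: str, renderer_cls: type) -> None:
        self._renderers[renderer_mode] = renderer_cls

    def load_renderer_cls(self, renderer_mode: str) -> type:
        # Only registered modes resolve; a tokenizer backend name such as
        # "fastokens" leaking in as a renderer mode has no entry here.
        if renderer_mode not in self._renderers:
            raise ValueError(f"No renderer registered for {renderer_mode=!r}.")
        return self._renderers[renderer_mode]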

libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Libin Tang <libin.tang@intel.com>

Labels

  • documentation: Improvements or additions to documentation
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • v1
