
tokenizer: Add fastokens support #41741

Merged
DarkLight1337 merged 3 commits into vllm-project:main from AlonKejzman:akejzman/fastokens on May 7, 2026

Conversation

@AlonKejzman AlonKejzman (Contributor) commented May 5, 2026

Purpose

Adds a new --tokenizer-backend argument that selects the engine powering the HuggingFace tokenizer. Two values are supported:

  • huggingface (default) - the standard tokenizers library, current behavior.
  • fastokens - the Rust-based BPE backend provided by the fastokens package.
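
For illustration, the same opt-in from the offline API might look like the sketch below (whether LLM() forwards tokenizer_backend like other engine arguments is an assumption on my part, not something this PR confirms):

from vllm import LLM

# Hypothetical plumbing: tokenizer_backend passed through engine args
# to the HF tokenizer loader.
llm = LLM(model="Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")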

tokenizer_backend is orthogonal to tokenizer_mode: it only takes effect when the resolved mode is "hf". Non-HF modes (mistral, deepseek_v32, etc.) ignore it and continue to use their own tokenizer engines.

The fastokens package is imported lazily; if it isn't installed, a clear ImportError is raised only when the user opts in.
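
A minimal sketch of the selection plus lazy-import behavior described above (the helper name and the fastokens patch entry point are hypothetical, not this PR's actual code):

def _load_hf_tokenizer(model_id: str, tokenizer_backend: str = "huggingface"):
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer_backend == "fastokens":
        try:
            # Imported lazily: users who never opt in don't need the package.
            import fastokens
        except ImportError as e:
            raise ImportError(
                "The 'fastokens' package is required for "
                "--tokenizer-backend fastokens; install it first."
            ) from e
        # Hypothetical hook; the real monkey-patch entry point may differ.
        tokenizer = fastokens.patch_tokenizer(tokenizer)
    return tokenizer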

No existing issue — opening this as a feature addition. Searched open PRs for tokenizer-backend / fastokens and found no duplicate work.

Test Plan

# 1. Default backend unchanged (regression check)
python3 -m pytest tests/tokenizers_/ -v

# 2. Smoke test: load a BPE model with each backend and compare outputs
from vllm.tokenizers import get_tokenizer

prompt = "The quick brown fox jumps over the lazy dog."
hf = get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="huggingface")
fk = get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")
assert hf.encode(prompt) == fk.encode(prompt), "encode mismatch"
assert hf.decode(hf.encode(prompt)) == fk.decode(fk.encode(prompt))
print("OK")

# 3. End-to-end serve check
vllm serve Qwen/Qwen3.5-35B-A3B --enable-prefix-caching --tokenizer-backend fastokens
# then run GSM8K (using lm_eval) + performance (using vllm bench), and compare results

# 4. Confirm non-HF modes ignore the flag
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
    --tokenizer-mode mistral --tokenizer-backend fastokens
# (should load the mistral_common tokenizer, no fastokens patch applied)

# 5. Missing-package error path
uv pip uninstall fastokens
vllm serve Qwen/Qwen3.5-35B-A3B --tokenizer-backend fastokens
# expect: ImportError: The 'fastokens' package is required ...
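
Step 5 can also be exercised in-process without uninstalling anything. A pytest sketch (the get_tokenizer import path is copied from the smoke test above; the masking trick relies on Python raising ImportError for any module whose sys.modules entry is None):

import sys

import pytest

from vllm.tokenizers import get_tokenizer  # path from the smoke test above

def test_missing_fastokens_raises(monkeypatch):
    # Mask the package even if it is installed, to hit the error path.
    monkeypatch.setitem(sys.modules, "fastokens", None)
    with pytest.raises(ImportError, match="fastokens"):
        get_tokenizer("Qwen/Qwen3-0.6B", tokenizer_backend="fastokens")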

Test Result

1 - All tests passed except those behind gated repos
2 - OK
3 - Same GSM8K scores (~0.86); ~10% reduction in TTFT on a 32K-token prompt with a 30K-token shared prefix
4 - OK
5 - OK

Essential Elements of an Effective PR Description Checklist
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [x] The test plan, such as providing test command.
  • [x] The test results, such as pasting the results comparison before and after, or e2e results.
  • [x] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot commented May 5, 2026

Documentation preview: https://vllm--41741.org.readthedocs.build/en/41741/

@mergify mergify Bot added documentation Improvements or additions to documentation v1 labels May 5, 2026
@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new tokenizer_backend configuration option, allowing users to choose between the default Hugging Face tokenizers library and the fastokens Rust backend for BPE tokenizers. The implementation includes documentation updates, CLI and API argument additions, and logic to apply fastokens monkey-patches when enabled. I have no feedback to provide.

Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
@AlonKejzman AlonKejzman force-pushed the akejzman/fastokens branch from a3e6c01 to 376ee65 on May 5, 2026, 14:35
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label May 5, 2026
Review thread on docs/configuration/optimization.md (outdated)
Review thread on vllm/config/model.py (outdated)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
@AlonKejzman AlonKejzman force-pushed the akejzman/fastokens branch from d2e121c to 51c9ac4 on May 6, 2026, 10:22
@DarkLight1337 DarkLight1337 merged commit 2a16ece into vllm-project:main May 7, 2026
61 checks passed
@tjtanaa tjtanaa (Collaborator) commented May 7, 2026

@AlonKejzman when I launch gpt-oss with fastokens, I get this error.

(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 87, in renderer_from_config
(APIServer pid=8195)     return RENDERER_REGISTRY.load_renderer(renderer_mode, config, tokenizer)
(APIServer pid=8195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 68, in load_renderer
(APIServer pid=8195)     renderer_cls = self.load_renderer_cls(renderer_mode)
(APIServer pid=8195)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8195)   File "/app/vllmaitercheck/vllmnew/vllm/renderers/registry.py", line 55, in load_renderer_cls
(APIServer pid=8195)     raise ValueError(f"No renderer registered for {renderer_mode=!r}.")
(APIServer pid=8195) ValueError: No renderer registered for renderer_mode='fastokens'.
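
For context, the failing call is essentially a dict lookup keyed by renderer_mode. A paraphrase of that behavior (not the actual vllm/renderers/registry.py source), showing why an unregistered key like 'fastokens' raises:

class RendererRegistry:
    def __init__(self) -> None:
        self._renderers: dict[str, type] = {}

    def register(self, renderer_mode: str, renderer_cls: type) -> None:
        self._renderers[renderer_mode] = renderer_cls

    def load_renderer_cls(self, renderer_mode: str) -> type:
        # Only registered modes resolve; a tokenizer backend name such as
        # "fastokens" leaking in as a renderer mode has no entry here.
        if renderer_mode not in self._renderers:
            raise ValueError(f"No renderer registered for {renderer_mode=!r}.")
        return self._renderers[renderer_mode]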

libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Libin Tang <libin.tang@intel.com>

Labels

  • documentation: Improvements or additions to documentation
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • v1
