[Bugfix]: Fix TokenizerLike interface#30009

Merged
vllm-bot merged 8 commits into vllm-project:main from ROCm:fix_mixtral_tokenizer
Dec 6, 2025
Conversation

@Rohan138 Rohan138 (Contributor) commented Dec 4, 2025

Purpose

Corrected and rebased version of #22121 to fix #22013. Note that the new num_special_tokens_to_add method is only called from RandomDataset.sample, which calls tokenizer.encode(prompt, add_special_tokens=False) (https://github.com/vllm-project/vllm/blob/main/vllm/benchmarks/datasets.py#L407). Hence, the tokenizer only adds the bos token, not eos, <INST>, or </INST>.

Changes:

  • Add num_special_tokens_to_add to MistralTokenizer
  • Update type hints and function signatures from PretrainedTokenizerBase -> TokenizerLike across all files in vllm/benchmarks/*.py
  • Fix RandomDatasetForReranking.sample() and specify allowed_tokens
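The interaction described above can be sketched as follows. This is a hedged illustration, not the actual vLLM code: `StubTokenizer` and `compute_real_input_len` are hypothetical stand-ins for MistralTokenizer and the arithmetic inside RandomDataset.sample.

```python
# Hypothetical sketch of why RandomDataset.sample needs
# num_special_tokens_to_add: it subtracts the special tokens the
# tokenizer will prepend so the final prompt hits the requested length.

class StubTokenizer:
    """Stand-in for MistralTokenizer; encode() prepends a bos token."""

    BOS_ID = 1

    def encode(self, text: str, add_special_tokens: bool = True) -> list[int]:
        bos = [self.BOS_ID] if add_special_tokens else []
        return bos + [ord(c) for c in text]  # toy tokenization: one id per char

    def num_special_tokens_to_add(self) -> int:
        # Encoding an empty prompt leaves only the special tokens.
        return len(self.encode(""))


def compute_real_input_len(tokenizer, requested_input_len: int) -> int:
    """Hypothetical stand-in for the call site at datasets.py line 483."""
    num_special = int(tokenizer.num_special_tokens_to_add())
    return max(requested_input_len - num_special, 0)


print(compute_real_input_len(StubTokenizer(), 128))  # 127
```

With --random-input-len 128 and one bos token, the sampled length becomes 127, which matches the "Sampling input_len from [127, 127]" log line in the test results.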

Test Plan

Currently, with vllm/vllm-openai:nightly:

vllm bench serve --model mistralai/Mixtral-8x22B-Instruct-v0.1 --percentile-metrics tpot,itl,e2el --dataset-name random --ignore-eos --max-concurrency 1 --num-prompts 10 --random-input-len 128 --random-output-len 128 --trust-remote-code --save-result --result-filename pyt_vllm_mixtral-8x22b_serving.json

Test Result

Namespace(subparser='bench', bench_type='serve', dispatch_function=<function BenchmarkServingSubcommand.cmd at 0x7fd11a144860>, seed=0, num_prompts=10, dataset_name='random', no_stream=False, dataset_path=None, no_oversample=False, skip_chat_template=False, disable_shuffle=False, custom_output_len=256, spec_bench_output_len=256, spec_bench_category=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, blazedit_min_distance=0.0, blazedit_max_distance=1.0, random_input_len=128, random_output_len=128, random_range_ratio=0.0, random_prefix_len=0, random_batch_size=1, no_reranker=False, random_mm_base_items_per_request=1, random_mm_num_mm_items_range_ratio=0.0, random_mm_limit_mm_per_prompt={'image': 255, 'video': 1}, random_mm_bucket_config={(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}, hf_subset=None, hf_split=None, hf_name=None, hf_output_len=None, prefix_repetition_prefix_len=256, prefix_repetition_suffix_len=256, prefix_repetition_num_prefixes=10, prefix_repetition_output_len=128, label=None, backend='openai', base_url=None, host='127.0.0.1', port=8000, endpoint='/v1/completions', header=None, max_concurrency=1, model='mistralai/Mixtral-8x22B-Instruct-v0.1', tokenizer=None, use_beam_search=False, logprobs=None, request_rate=inf, burstiness=1.0, trust_remote_code=True, disable_tqdm=False, num_warmups=0, profile=False, save_result=True, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename='pyt_vllm_mixtral-8x22b_serving.json', ignore_eos=True, percentile_metrics='tpot,itl,e2el', metric_percentiles='99', goodput=None, request_id_prefix='bench-d2d6d8f0-', top_p=None, top_k=None, min_p=None, temperature=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None, ramp_up_strategy=None, ramp_up_start_rps=None, ramp_up_end_rps=None, ready_check_timeout_sec=600, extra_body=None)
[2025-12-03 15:55:01] WARNING utils.py:121: Multiple valid tokenizer files found. Using tokenizer.model.v3.
tokenizer.model.v3: 100%|██████████████████████████████████████████████████████████████████████████| 587k/587k [00:00<00:00, 711kB/s]
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/benchmark/serve.py", line 21, in cmd
    main(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 1299, in main
    return asyncio.run(main_async(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 1373, in main_async
    input_requests = get_samples(args, tokenizer)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 1910, in get_samples
    input_requests = dataset_mapping[args.dataset_name]()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 1845, in <lambda>
    ).sample(
      ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 483, in sample
    num_special = int(tokenizer.num_special_tokens_to_add())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'MistralTokenizer' object has no attribute 'num_special_tokens_to_add'

With this PR:

INFO 12-03 23:53:51 [datasets.py:603] Sampling input_len from [127, 127] and output_len from [128, 128]
Starting initial single prompt test run...
Waiting for endpoint to become up in 600 seconds

Minimal example of the bos token being added to an empty prompt:

>>> from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer
>>> tokenizer = MistralTokenizer.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")
Multiple tokenizer files found for model ID: mistralai/Mixtral-8x22B-Instruct-v0.1. Using tokenizer.model.v3.
>>> tokenizer.encode("")
[1]

Note that before #29693, vLLM would incorrectly default to the HF tokenizer even if mistral_common was installed, which hid this issue unless you explicitly passed --tokenizer-mode mistral during benchmarking. On recent nightly builds, however, the benchmark breaks because the tokenizer now correctly defaults to the MistralTokenizer backend under --tokenizer-mode auto.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request fixes a crash that occurs when using MistralTokenizer in benchmarks by adding the missing num_special_tokens_to_add method. My review focuses on making the implementation of this new method more robust and maintainable. I've suggested deriving the return value dynamically rather than hardcoding it, which will prevent silent bugs if the tokenizer's encoding behavior changes in the future.
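The reviewer's suggestion — deriving the return value from actual encoding behavior rather than hardcoding it — could look roughly like this. The class body is a hypothetical sketch for illustration, not the code merged in this PR:

```python
# Sketch of a dynamically derived num_special_tokens_to_add. Instead of
# hardcoding `return 1` (the bos token), the count is measured from what
# encode() actually produces for an empty prompt, so it stays correct if
# the encoding behavior ever changes.

class MistralTokenizerSketch:
    """Hypothetical stand-in; the real MistralTokenizer wraps mistral_common."""

    def __init__(self, bos_token_id: int = 1) -> None:
        self._bos = bos_token_id

    def encode(self, text: str) -> list[int]:
        # Mirrors the observed behavior: encode("") -> [1] (bos only).
        return [self._bos] + [ord(c) for c in text]

    def num_special_tokens_to_add(self) -> int:
        # Derived, not hardcoded: only special tokens survive an empty prompt.
        return len(self.encode(""))


print(MistralTokenizerSketch().num_special_tokens_to_add())  # 1
```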

@Rohan138 Rohan138 changed the title [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer, update PretrainedTokenizerBase -> TokenizerLike Dec 4, 2025
@Rohan138 Rohan138 changed the title [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer, update PretrainedTokenizerBase -> TokenizerLike [Bugfix]: Fix TokenizerLike interface Dec 5, 2025
@DarkLight1337 DarkLight1337 (Member) commented:

Please fix pre-commit

mergify bot commented Dec 5, 2025

Hi @Rohan138, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 6, 2025 02:44
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 6, 2025
@vllm-bot vllm-bot merged commit 40a046c into vllm-project:main Dec 6, 2025
46 of 50 checks passed
@gshtras gshtras deleted the fix_mixtral_tokenizer branch December 9, 2025 22:38
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

performance (Performance-related issues), ready (ONLY add when PR is ready to merge/full CI is needed)

Development

Successfully merging this pull request may close these issues.

[Bug]: Random Dataset Serve Benchmark throws AttributeError when using MistralTokenizer
