[Bugfix]: Fix TokenizerLike interface#30009

Merged
vllm-bot merged 8 commits into vllm-project:main from ROCm:fix_mixtral_tokenizer
Dec 6, 2025
Conversation

@Rohan138 Rohan138 (Contributor) commented Dec 4, 2025

Purpose

Corrected and rebased version of #22121 to fix #22013. Note that the new num_special_tokens_to_add method is only called from RandomDataset.sample, which calls tokenizer.encode(prompt, add_special_tokens=False) (https://github.com/vllm-project/vllm/blob/main/vllm/benchmarks/datasets.py#L407). Hence, the tokenizer only adds the bos token, not eos, <INST>, or </INST>.

Changes:

  • Add num_special_tokens_to_add to MistralTokenizer
  • Update type hints and function signatures from PretrainedTokenizerBase -> TokenizerLike across all files in vllm/benchmarks/*.py
  • Fix RandomDatasetForReranking.sample() and specify allowed_tokens
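The interaction described above can be sketched as follows. This is a hedged illustration, not the actual vLLM code: `StubTokenizer` and `compute_real_input_len` are hypothetical stand-ins for MistralTokenizer and the arithmetic inside RandomDataset.sample.

```python
# Hypothetical sketch of why RandomDataset.sample needs
# num_special_tokens_to_add: it subtracts the special tokens the
# tokenizer will prepend so the final prompt hits the requested length.

class StubTokenizer:
    """Stand-in for MistralTokenizer; encode() prepends a bos token."""

    BOS_ID = 1

    def encode(self, text: str, add_special_tokens: bool = True) -> list[int]:
        bos = [self.BOS_ID] if add_special_tokens else []
        return bos + [ord(c) for c in text]  # toy tokenization: one id per char

    def num_special_tokens_to_add(self) -> int:
        # Encoding an empty prompt leaves only the special tokens.
        return len(self.encode(""))


def compute_real_input_len(tokenizer, requested_input_len: int) -> int:
    """Hypothetical stand-in for the call site at datasets.py line 483."""
    num_special = int(tokenizer.num_special_tokens_to_add())
    return max(requested_input_len - num_special, 0)


print(compute_real_input_len(StubTokenizer(), 128))  # 127
```

With --random-input-len 128 and one bos token, the sampled length becomes 127, which matches the "Sampling input_len from [127, 127]" log line in the test results.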

Test Plan

Currently, with vllm/vllm-openai:nightly:

vllm bench serve --model mistralai/Mixtral-8x22B-Instruct-v0.1 --percentile-metrics tpot,itl,e2el --dataset-name random --ignore-eos --max-concurrency 1 --num-prompts 10 --random-input-len 128 --random-output-len 128 --trust-remote-code --save-result --result-filename pyt_vllm_mixtral-8x22b_serving.json

Test Result

Namespace(subparser='bench', bench_type='serve', dispatch_function=<function BenchmarkServingSubcommand.cmd at 0x7fd11a144860>, seed=0, num_prompts=10, dataset_name='random', no_stream=False, dataset_path=None, no_oversample=False, skip_chat_template=False, disable_shuffle=False, custom_output_len=256, spec_bench_output_len=256, spec_bench_category=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, blazedit_min_distance=0.0, blazedit_max_distance=1.0, random_input_len=128, random_output_len=128, random_range_ratio=0.0, random_prefix_len=0, random_batch_size=1, no_reranker=False, random_mm_base_items_per_request=1, random_mm_num_mm_items_range_ratio=0.0, random_mm_limit_mm_per_prompt={'image': 255, 'video': 1}, random_mm_bucket_config={(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}, hf_subset=None, hf_split=None, hf_name=None, hf_output_len=None, prefix_repetition_prefix_len=256, prefix_repetition_suffix_len=256, prefix_repetition_num_prefixes=10, prefix_repetition_output_len=128, label=None, backend='openai', base_url=None, host='127.0.0.1', port=8000, endpoint='/v1/completions', header=None, max_concurrency=1, model='mistralai/Mixtral-8x22B-Instruct-v0.1', tokenizer=None, use_beam_search=False, logprobs=None, request_rate=inf, burstiness=1.0, trust_remote_code=True, disable_tqdm=False, num_warmups=0, profile=False, save_result=True, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename='pyt_vllm_mixtral-8x22b_serving.json', ignore_eos=True, percentile_metrics='tpot,itl,e2el', metric_percentiles='99', goodput=None, request_id_prefix='bench-d2d6d8f0-', top_p=None, top_k=None, min_p=None, temperature=None, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None, ramp_up_strategy=None, ramp_up_start_rps=None, ramp_up_end_rps=None, ready_check_timeout_sec=600, extra_body=None)
[2025-12-03 15:55:01] WARNING utils.py:121: Multiple valid tokenizer files found. Using tokenizer.model.v3.
tokenizer.model.v3: 100%|██████████████████████████████████████████████████████████████████████████| 587k/587k [00:00<00:00, 711kB/s]
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/benchmark/serve.py", line 21, in cmd
    main(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 1299, in main
    return asyncio.run(main_async(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 1373, in main_async
    input_requests = get_samples(args, tokenizer)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 1910, in get_samples
    input_requests = dataset_mapping[args.dataset_name]()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 1845, in <lambda>
    ).sample(
      ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/datasets.py", line 483, in sample
    num_special = int(tokenizer.num_special_tokens_to_add())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'MistralTokenizer' object has no attribute 'num_special_tokens_to_add'

With this PR:

INFO 12-03 23:53:51 [datasets.py:603] Sampling input_len from [127, 127] and output_len from [128, 128]
Starting initial single prompt test run...
Waiting for endpoint to become up in 600 seconds

Minimal example of the bos token being added to an empty prompt:

>>> from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer
>>> tokenizer = MistralTokenizer.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")
Multiple tokenizer files found for model ID: mistralai/Mixtral-8x22B-Instruct-v0.1. Using tokenizer.model.v3.
>>> tokenizer.encode("")
[1]

Note that before #29693, vLLM would incorrectly default to the HF tokenizer even if mistral_common was installed, which hid this issue unless you explicitly passed --tokenizer-mode mistral during benchmarking. On recent nightly builds, however, the benchmark breaks because the tokenizer now correctly defaults to the MistralTokenizer backend under --tokenizer-mode auto.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request fixes a crash that occurs when using MistralTokenizer in benchmarks by adding the missing num_special_tokens_to_add method. My review focuses on making the implementation of this new method more robust and maintainable. I've suggested deriving the return value dynamically rather than hardcoding it, which will prevent silent bugs if the tokenizer's encoding behavior changes in the future.
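The reviewer's suggestion — deriving the return value from actual encoding behavior rather than hardcoding it — could look roughly like this. The class body is a hypothetical sketch for illustration, not the code merged in this PR:

```python
# Sketch of a dynamically derived num_special_tokens_to_add. Instead of
# hardcoding `return 1` (the bos token), the count is measured from what
# encode() actually produces for an empty prompt, so it stays correct if
# the encoding behavior ever changes.

class MistralTokenizerSketch:
    """Hypothetical stand-in; the real MistralTokenizer wraps mistral_common."""

    def __init__(self, bos_token_id: int = 1) -> None:
        self._bos = bos_token_id

    def encode(self, text: str) -> list[int]:
        # Mirrors the observed behavior: encode("") -> [1] (bos only).
        return [self._bos] + [ord(c) for c in text]

    def num_special_tokens_to_add(self) -> int:
        # Derived, not hardcoded: only special tokens survive an empty prompt.
        return len(self.encode(""))


print(MistralTokenizerSketch().num_special_tokens_to_add())  # 1
```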

@Rohan138 Rohan138 changed the title [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer, update PretrainedTokenizerBase -> TokenizerLike Dec 4, 2025
@Rohan138 Rohan138 changed the title [Bugfix]: Add num_special_tokens_to_add to MistralTokenizer, update PretrainedTokenizerBase -> TokenizerLike [Bugfix]: Fix TokenizerLike interface Dec 5, 2025
@DarkLight1337 DarkLight1337 (Member) commented:

Please fix pre-commit

mergify bot commented Dec 5, 2025

Hi @Rohan138, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 6, 2025 02:44
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 6, 2025
@vllm-bot vllm-bot merged commit 40a046c into vllm-project:main Dec 6, 2025
46 of 50 checks passed
@gshtras gshtras deleted the fix_mixtral_tokenizer branch December 9, 2025 22:38
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

performance (Performance-related issues), ready (ONLY add when PR is ready to merge/full CI is needed)

Development

Successfully merging this pull request may close these issues.

[Bug]: Random Dataset Serve Benchmark throws AttributeError when using MistralTokenizer
