
Remove all_special_tokens_extended from tokenizer code#29686

Merged
DarkLight1337 merged 1 commit into vllm-project:main from hmellor:remove-tokenizer-field
Nov 28, 2025

Conversation

@hmellor (Member) commented Nov 28, 2025

This attribute is:

Should resolve the failures in https://buildkite.com/vllm/ci/builds/41020/steps/canvas?sid=019ac943-dca1-43d6-ad64-ca0187f645bc

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor force-pushed the remove-tokenizer-field branch from 4449139 to 56f2482 on November 28, 2025, 15:47
@hmellor added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Nov 28, 2025
@chatgpt-codex-connector

💡 Codex Review

https://github.com/vllm-project/vllm/blob/44491395c266dc41b892fd8a98f25fa50b83d7ee/vllm/transformers_utils/config.py#L460-L461
P1: Handle legacy rope_theta on transformers>=5

In patch_rope_parameters the transformers>=5 branch now no-ops, so legacy configs that still expose config.rope_theta (common for custom transformers v4 models) are no longer converted into rope_parameters or validated. When running with transformers v5, such configs keep rope_parameters unset, so later consumers (e.g., prepare_hf_config and max length scaling) drop the provided RoPE scaling/θ information and may initialize RoPE with incorrect defaults. Please keep the standardization/validation here for v5 the same way the v4 branch still does.
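The conversion the review asks to keep can be sketched as follows. This is a hypothetical, minimal sketch only: the class `LegacyConfig` and the function `standardize_rope` are made-up stand-ins, not the actual vLLM `patch_rope_parameters` implementation; only the field names (`rope_theta`, `rope_scaling`, `rope_parameters`) follow the review's wording.

```python
# Hypothetical sketch -- NOT the real vLLM implementation.
# Shows folding a legacy rope_theta / rope_scaling pair into the
# newer rope_parameters dict, so v5 consumers don't lose RoPE info.

class LegacyConfig:
    """Stand-in for a custom transformers v4-style model config."""
    def __init__(self, rope_theta=None, rope_scaling=None):
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        self.rope_parameters = None  # new-style field, unset by default

def standardize_rope(config):
    """Fold legacy rope_theta / rope_scaling into rope_parameters."""
    if getattr(config, "rope_parameters", None) is not None:
        return  # already new-style; nothing to do
    theta = getattr(config, "rope_theta", None)
    if theta is None:
        return  # no legacy RoPE info to convert
    # Start from any legacy scaling dict, then fold in theta.
    params = dict(getattr(config, "rope_scaling", None) or {})
    params.setdefault("rope_type", "default")
    params["rope_theta"] = theta
    config.rope_parameters = params

cfg = LegacyConfig(rope_theta=10000.0,
                   rope_scaling={"rope_type": "linear", "factor": 2.0})
standardize_rope(cfg)
print(cfg.rope_parameters)
```

Under these assumptions, the branch taken for transformers>=5 would run the same standardization as v4 instead of no-op'ing, which is what the review requests.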


@DarkLight1337 (Member) left a comment


I was planning on removing it as well since it's unused in vLLM, thanks for doing it!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 28, 2025 15:59
@hmellor (Member, Author) commented Nov 28, 2025

I've enabled the Transformers nightly tests to verify that the errors relating to this attribute are gone: https://buildkite.com/vllm/ci/builds/41093/steps/canvas?sid=019acb27-ad2f-4514-ab1e-68141500585c

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly removes the all_special_tokens_extended attribute, which simplifies the tokenizer-related code and improves forward compatibility with Transformers v5. The changes are applied consistently across tests and implementations. However, the pull request also includes undocumented changes to RoPE parameter handling in vllm/transformers_utils/config.py and vllm/transformers_utils/configs/qwen3_next.py. Specifically, a block of code for Transformers v5 compatibility has been commented out. While this might be intentional, leaving commented-out code reduces maintainability. I've added a suggestion to remove it. It would be beneficial to update the pull request description to cover all changes for better clarity.

@DarkLight1337 DarkLight1337 merged commit fecae12 into vllm-project:main Nov 28, 2025
52 checks passed
@hmellor hmellor deleted the remove-tokenizer-field branch November 28, 2025 20:31
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
…t#29686)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
amd-hhashemi pushed a commit to amd-hhashemi/vllm that referenced this pull request Dec 2, 2025
…t#29686)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…t#29686)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels: ready · 3 participants