[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) #30200
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of RendererLike protocol and RendererRegistry provides a clean interface and lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.
I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.
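The protocol-plus-lazy-registry pattern the review praises can be sketched roughly as follows. Everything except the names `RendererLike` and `RENDERER_REGISTRY` is an illustrative assumption, not vLLM's actual code:

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class RendererLike(Protocol):
    """Illustrative stand-in for the renderer interface."""

    def render_messages(self, messages: list[dict]) -> list[int]: ...


class RendererRegistry:
    """Maps renderer names to zero-arg factories resolved on first use,
    so registering a renderer never imports its implementation module
    (which is what breaks circular-import cycles)."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], type]] = {}

    def register_lazy(self, name: str, factory: Callable[[], type]) -> None:
        self._factories[name] = factory

    def load(self, name: str) -> type:
        return self._factories[name]()


RENDERER_REGISTRY = RendererRegistry()


# In vLLM the factory would presumably do a deferred import
# (e.g. importlib.import_module(...).HfRenderer); a local stand-in
# keeps this sketch self-contained.
class FakeHfRenderer:
    def render_messages(self, messages: list[dict]) -> list[int]:
        return [len(m.get("content", "")) for m in messages]


RENDERER_REGISTRY.register_lazy("hf", lambda: FakeHfRenderer)

renderer = RENDERER_REGISTRY.load("hf")()
assert isinstance(renderer, RendererLike)
```

The key point is that `register_lazy` stores a callable rather than a class, so import of the implementation is deferred until `load` is first called.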
💡 Codex Review
vllm/vllm/tokenizers/registry.py
Lines 135 to 140 in f73bafb
cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
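The reported mismatch can be reduced to a small self-contained sketch; the signatures are paraphrased from the review comment, not copied from vLLM, and the "fixed" variant is only one possible shape of a fix:

```python
# Sketch of the mismatch Codex reports, with paraphrased signatures
# (this is not vLLM's actual code; names mirror the review comment).


def get_tokenizer(tokenizer_cls, tokenizer_name, **kwargs):
    # New-style signature per the review: a tokenizer class comes first,
    # and there is no tokenizer_mode keyword any more.
    return (tokenizer_cls, tokenizer_name)


def cached_tokenizer_from_config_old(model_config):
    # Old-style call site: forwards only the name plus tokenizer_mode,
    # so it now fails with a missing-positional-argument TypeError.
    return get_tokenizer(model_config["tokenizer"],
                         tokenizer_mode=model_config["tokenizer_mode"])


def cached_tokenizer_from_config_fixed(model_config, tokenizer_cls=str):
    # One possible shape of a fix: resolve the tokenizer class up front
    # and forward exactly what the new signature expects.
    return get_tokenizer(tokenizer_cls, model_config["tokenizer"])


config = {"tokenizer": "example-org/example-model", "tokenizer_mode": "auto"}
try:
    cached_tokenizer_from_config_old(config)
except TypeError as exc:
    print(f"old call site fails: {exc}")
print(cached_tokenizer_from_config_fixed(config))
```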
Hi @DarkLight1337, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Awesome to hear it's landing!
@DarkLight1337 Hi :) I think this PR introduced a failure in the Transformers Nightly Model test group: https://buildkite.com/vllm/ci/builds/48034/steps/canvas?jid=019be4d2-f92f-443c-a608-6881e01b533b
Hi @DarkLight1337, after further investigation, bisection shows that this PR works on Transformers 4.57.3 but causes an error on Transformers 5 tip-of-tree (and thus breaks the Transformers Nightly Models test group). I'm not sure whether we maintain compatibility with Transformers Nightly Models, given that by its nature it breaks often. Would like to know your thoughts on this. Thanks!
Follow-up: I updated my existing transformers PR to resolve the latest transformers regression:
For posterity, I'm currently the person looking at Transformers nightly tests. During the v4 → v5 transition these tests have almost always been broken because there are lots of breaking (but much needed) changes in Transformers. After v5 the release cadence will be increased and the
Ready for review.
Purpose
- Introduce a new abstraction, `vllm.renderers.RendererLike`, to process chat messages into engine inputs.
- Add `RENDERER_REGISTRY`, which lazily registers renderers to avoid circular import problems in `vllm.renderers`.
- Construct the renderer inside `InputPreprocessor`, replacing the tokenizer initialization inside `LLMEngine` and `AsyncLLM`.
- Replace `EngineClient.get_tokenizer()` with `EngineClient.renderer.get_tokenizer()` to avoid unnecessary async.
- Add a new test suite `tests/renderers` that is run under `Async Engine, Inputs, Utils, Worker, Config Test (CPU)`.

Towards #22880 and #23873
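The "avoid unnecessary async" point about moving from `EngineClient.get_tokenizer()` to `renderer.get_tokenizer()` can be illustrated with a minimal sketch; the class bodies here are stand-ins, not vLLM's implementation:

```python
import asyncio


class FakeTokenizer:
    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]


class Renderer:
    """Stand-in renderer that owns the tokenizer locally."""

    def __init__(self) -> None:
        self._tokenizer = FakeTokenizer()

    def get_tokenizer(self) -> FakeTokenizer:  # plain synchronous call
        return self._tokenizer


class EngineClient:
    def __init__(self) -> None:
        self.renderer = Renderer()

    async def get_tokenizer(self) -> FakeTokenizer:
        # Old shape: async even though nothing here needs to await
        # once the renderer holds the tokenizer.
        return self.renderer.get_tokenizer()


client = EngineClient()
tok_old = asyncio.run(client.get_tokenizer())  # before: await required
tok_new = client.renderer.get_tokenizer()      # after: direct access
assert tok_old is tok_new
```

Once the renderer (built from `ModelConfig`) owns the tokenizer, callers no longer need to be coroutines just to look it up.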
Future work:
- Merge `CompletionRenderer` with this implementation of `Renderer`.
- Move `apply_chat_template` from tokenizer into renderer.
- Deprecate `TokenizerRegistry` in favor of `RENDERER_REGISTRY` and rename `tokenizer_mode` to `renderer_mode`.
- `ModelConfig`.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- (If applicable) Update `supported_models.md` and `examples` for a new model.
Note
Centralizes prompt rendering via a new `vllm.renderers` layer and removes direct tokenizer access from serving paths.
- Add `RendererLike`, a registry, and implementations (`HfRenderer`, `MistralRenderer`, `DeepseekV32Renderer`, `Grok2Renderer`); move HF/Mistral chat-template resolution, kwargs filtering, and format detection into renderers with `safe_apply_chat_template`
- Replace `EngineClient.get_tokenizer()` with `engine_client.renderer.get_tokenizer()`; expose `renderer` on engine/models; update `InputPreprocessor` to build the renderer from `ModelConfig`
- Refine error messages for `skip_tokenizer_init=True`
- Add `parse_chat_messages_async` and `ChatTemplateContentFormat`; minor import path fixes (e.g., `MultiModalUUIDDict`)
- Add `tests/renderers/*` and remove/migrate legacy tests; run the new suite in Buildkite; include `vllm/renderers` in mypy

Written by Cursor Bugbot for commit f41d339. This will update automatically on new commits.
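One way to read the `safe_apply_chat_template` / kwargs-filtering idea mentioned in the summary, as a hedged sketch: drop keyword arguments the underlying `apply_chat_template` does not accept instead of crashing. The helper name matches the summary, but the body is illustrative, not vLLM's implementation:

```python
import inspect


def safe_apply_chat_template(tokenizer, messages, **kwargs):
    # Inspect the underlying method's signature and silently drop any
    # kwargs it cannot accept, unless it takes **kwargs itself.
    sig = inspect.signature(tokenizer.apply_chat_template)
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in sig.parameters.values()
    )
    if not accepts_var_kw:
        kwargs = {k: v for k, v in kwargs.items() if k in sig.parameters}
    return tokenizer.apply_chat_template(messages, **kwargs)


class ToyTokenizer:
    """Minimal stand-in for a tokenizer with a chat template method."""

    def apply_chat_template(self, messages, add_generation_prompt=False):
        text = "\n".join(m["content"] for m in messages)
        return text + ("\nassistant:" if add_generation_prompt else "")


out = safe_apply_chat_template(
    ToyTokenizer(),
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    unknown_flag=123,  # filtered out rather than raising TypeError
)
print(out)
```

This kind of filtering matters because different tokenizer classes accept different template kwargs, and a shared serving path should not have to know which is which.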