
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig)#30200

Merged
DarkLight1337 merged 94 commits into vllm-project:main from DarkLight1337:init-renderer-model on Jan 22, 2026
Conversation

@DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Dec 7, 2025

Ready for review.

Purpose

  • Prototype an interface, vllm.renderers.RendererLike, to process chat messages into engine inputs.
  • Introduce RENDERER_REGISTRY, which lazily registers renderers to avoid circular import problems.
  • Move implementation-specific chat utils to the corresponding renderer in vllm.renderers.
  • Initialize the renderer in InputPreprocessor, replacing the tokenizer initialization inside LLMEngine and AsyncLLM.
  • Replace EngineClient.get_tokenizer() with EngineClient.renderer.get_tokenizer() to avoid unnecessary async.
  • Update tests accordingly, and move some tests into a new directory tests/renderers that is run under Async Engine, Inputs, Utils, Worker, Config Test (CPU).

Towards #22880 and #23873

Future work:

  • Merge CompletionRenderer with this implementation of Renderer.
  • Move apply_chat_template from tokenizer into renderer.
  • Move microbatch tokenizer into renderer.
  • Since each renderer uses a specific tokenizer, we can deprecate TokenizerRegistry in favor of RENDERER_REGISTRY and rename tokenizer_mode to renderer_mode.
  • Split out renderer-specific fields from ModelConfig.
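The interface and lazy registry described above can be sketched roughly as follows. This is an illustrative sketch using hypothetical names modeled on the PR description (RendererLike, RENDERER_REGISTRY), not the actual vLLM implementation; the key idea is that the registry stores zero-argument factories so renderer modules are only imported on first use, which is what avoids the circular import problem.

```python
from collections.abc import Callable
from typing import Protocol


class RendererLike(Protocol):
    """Anything that can turn chat messages into engine inputs."""

    def render_messages(self, messages: list[dict]) -> list[int]: ...


class RendererRegistry:
    def __init__(self) -> None:
        # Store zero-arg factories instead of instances so that the module
        # defining a renderer is only imported when it is first requested.
        self._factories: dict[str, Callable[[], RendererLike]] = {}

    def register(self, name: str, factory: Callable[[], RendererLike]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> RendererLike:
        try:
            return self._factories[name]()
        except KeyError:
            raise ValueError(f"Unknown renderer: {name!r}") from None


RENDERER_REGISTRY = RendererRegistry()


class EchoRenderer:
    """Toy renderer that 'tokenizes' by character codes."""

    def render_messages(self, messages: list[dict]) -> list[int]:
        return [ord(c) for m in messages for c in m["content"]]


RENDERER_REGISTRY.register("echo", EchoRenderer)

renderer = RENDERER_REGISTRY.get("echo")
print(renderer.render_messages([{"role": "user", "content": "hi"}]))  # [104, 105]
```

In the real PR the factories would be import-path strings resolved at lookup time, but the lazy-construction shape is the same.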

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Centralizes prompt rendering via a new vllm.renderers layer and removes direct tokenizer access from serving paths.

  • Add RendererLike, registry, and implementations (HfRenderer, MistralRenderer, DeepseekV32Renderer, Grok2Renderer); move HF/Mistral chat-template resolution, kwargs filtering, and format detection into renderers with safe_apply_chat_template
  • Replace EngineClient.get_tokenizer() with engine_client.renderer.get_tokenizer(); expose renderer on engine/models; update InputPreprocessor to build renderer from ModelConfig
  • Refactor OpenAI endpoints (chat, completions, responses, pooling, tokenize) to render prompts via renderer; add warmup; simplify tokenization flows; adjust error when skip_tokenizer_init=True
  • Add parse_chat_messages_async, ChatTemplateContentFormat; minor import path fixes (e.g., MultiModalUUIDDict)
  • Tests/CI: add tests/renderers/*, remove/migrate legacy tests; run new suite in Buildkite; include vllm/renderers in mypy

Written by Cursor Bugbot for commit f41d339. This will update automatically on new commits. Configure here.
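The tokenizer-access change (replacing EngineClient.get_tokenizer() with engine_client.renderer.get_tokenizer()) can be illustrated with simplified stand-ins. These classes are hypothetical, not the real vLLM types; the point is that tokenizer access becomes a plain synchronous attribute lookup instead of an awaited call.

```python
class FakeTokenizer:
    """Stand-in tokenizer that encodes by character codes."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]


class FakeRenderer:
    """Stand-in renderer that owns its tokenizer."""

    def __init__(self) -> None:
        self._tokenizer = FakeTokenizer()

    def get_tokenizer(self) -> FakeTokenizer:
        return self._tokenizer


class FakeEngineClient:
    """Stand-in for EngineClient exposing a renderer attribute."""

    def __init__(self) -> None:
        self.renderer = FakeRenderer()


engine_client = FakeEngineClient()
# Before: tokenizer = await engine_client.get_tokenizer()
# After:  plain attribute access, no event loop required.
tokenizer = engine_client.renderer.get_tokenizer()
print(tokenizer.encode("ok"))  # [111, 107]
```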

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of RendererLike protocol and RendererRegistry provides a clean interface and lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.

I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

def cached_tokenizer_from_config(model_config: "ModelConfig", **kwargs):
    return cached_get_tokenizer(
        model_config.tokenizer,
        tokenizer_mode=model_config.tokenizer_mode,
        revision=model_config.tokenizer_revision,
        trust_remote_code=model_config.trust_remote_code,
P0: cached_tokenizer_from_config uses an outdated get_tokenizer signature

cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
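The failure mode described can be reproduced with simplified stand-ins (these are illustrative functions, not the real vLLM code): if get_tokenizer now takes (tokenizer_cls, tokenizer_name, ...) and no longer accepts tokenizer_mode, an old-style call site raises a TypeError at call time rather than loading a tokenizer.

```python
def get_tokenizer(tokenizer_cls, tokenizer_name, *, revision=None,
                  trust_remote_code=False):
    """New-style signature sketched from the review comment."""
    return (tokenizer_cls, tokenizer_name)


def old_style_helper(tokenizer_name):
    # Old call shape: one positional name plus the removed tokenizer_mode kwarg.
    # tokenizer_name lands in tokenizer_cls, tokenizer_name is missing, and
    # tokenizer_mode is an unknown keyword.
    return get_tokenizer(tokenizer_name, tokenizer_mode="auto")


try:
    old_style_helper("my-model")
except TypeError as exc:
    print(f"TypeError: {exc}")
```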

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mergify

mergify bot commented Dec 7, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@mergify

mergify bot commented Jan 19, 2026

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 changed the title [DO NOT MERGE] Introduce Renderer for processing chat messages (using ModelConfig) [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) Jan 21, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Jan 22, 2026
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 22, 2026 08:29
@DarkLight1337 DarkLight1337 merged commit d117a4d into vllm-project:main Jan 22, 2026
54 checks passed
@DarkLight1337 DarkLight1337 deleted the init-renderer-model branch January 22, 2026 12:44
@noooop
Collaborator

noooop commented Jan 22, 2026

Awesome to hear landing!

@AndreasKaratzas
Collaborator

@DarkLight1337 Hi :) I think this PR introduced a failure in the Transformers Nightly Model test group: https://buildkite.com/vllm/ci/builds/48034/steps/canvas?jid=019be4d2-f92f-443c-a608-6881e01b533b#019be4d2-f92f-443c-a608-6881e01b533b

@mawong-amd
Contributor

@DarkLight1337 Hi :) I think this PR introduced a failure in the Transformers Nightly Model test group: https://buildkite.com/vllm/ci/builds/48034/steps/canvas?jid=019be4d2-f92f-443c-a608-6881e01b533b#019be4d2-f92f-443c-a608-6881e01b533b

Hi @DarkLight1337, after further investigation, bisection does show that this PR works on Transformers 4.57.3, but causes an error on Transformers 5 tip-of-tree (and thus breaks the Transformers Nightly Models test group).
This can be seen by running python3 examples/offline_inference/basic/chat.py.

I'm not sure whether we should keep maintaining the Transformers Nightly Models group given that, by its nature, it breaks often. Would like to know your thoughts on this. Thanks!

@AndreasKaratzas
Collaborator

Follow-up:

I updated my existing transformers PR to resolve the latest transformers regression:
#31849

monajafi-amd pushed a commit to monajafi-amd/vllm that referenced this pull request Jan 23, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 陈建华 <1647430658@qq.com>
lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@hmellor
Member

hmellor commented Feb 3, 2026

For posterity, I'm currently the person looking at Transformers nightly tests. During the v4 to v5 transition these tests have almost always been broken because there are lots of breaking (but much needed) changes in Transformers.

After v5 the release cadence will be increased and the main branch should be more stable. This means that the Transformers nightly test group should break less often and will be a valuable signal for keeping vLLM up to date with the latest Transformers releases.

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Labels

  • ci/build
  • documentation (Improvements or additions to documentation)
  • frontend
  • gpt-oss (Related to GPT-OSS models)
  • multi-modality (Related to multi-modality (#4194))
  • performance (Performance-related issues)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • structured-output
  • tool-calling
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants