
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig)#30200

Merged
DarkLight1337 merged 94 commits into vllm-project:main from DarkLight1337:init-renderer-model on Jan 22, 2026
Conversation

@DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Dec 7, 2025

Ready for review.

Purpose

  • Prototype an interface, vllm.renderers.RendererLike, to process chat messages into engine inputs.
  • Introduce RENDERER_REGISTRY, which lazily registers renderers to avoid circular import problems.
  • Move implementation-specific chat utils to the corresponding renderer in vllm.renderers.
  • Initialize the renderer in InputPreprocessor, replacing the tokenizer initialization inside LLMEngine and AsyncLLM.
  • Replace EngineClient.get_tokenizer() with EngineClient.renderer.get_tokenizer() to avoid unnecessary async.
  • Update tests accordingly, and move some tests into a new directory tests/renderers that is run under Async Engine, Inputs, Utils, Worker, Config Test (CPU).

Towards #22880 and #23873

Future work:

  • Merge CompletionRenderer with this implementation of Renderer.
  • Move apply_chat_template from tokenizer into renderer.
  • Move microbatch tokenizer into renderer.
  • Since each renderer uses a specific tokenizer, we can deprecate TokenizerRegistry in favor of RENDERER_REGISTRY and rename tokenizer_mode to renderer_mode.
  • Split out renderer-specific fields from ModelConfig.
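The interface and lazy registry described above can be sketched roughly as follows. This is an illustrative sketch using hypothetical names modeled on the PR description (RendererLike, RENDERER_REGISTRY), not the actual vLLM implementation; the key idea is that the registry stores zero-argument factories so renderer modules are only imported on first use, which is what avoids the circular import problem.

```python
from collections.abc import Callable
from typing import Protocol


class RendererLike(Protocol):
    """Anything that can turn chat messages into engine inputs."""

    def render_messages(self, messages: list[dict]) -> list[int]: ...


class RendererRegistry:
    def __init__(self) -> None:
        # Store zero-arg factories instead of instances so that the module
        # defining a renderer is only imported when it is first requested.
        self._factories: dict[str, Callable[[], RendererLike]] = {}

    def register(self, name: str, factory: Callable[[], RendererLike]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> RendererLike:
        try:
            return self._factories[name]()
        except KeyError:
            raise ValueError(f"Unknown renderer: {name!r}") from None


RENDERER_REGISTRY = RendererRegistry()


class EchoRenderer:
    """Toy renderer that 'tokenizes' by character codes."""

    def render_messages(self, messages: list[dict]) -> list[int]:
        return [ord(c) for m in messages for c in m["content"]]


RENDERER_REGISTRY.register("echo", EchoRenderer)

renderer = RENDERER_REGISTRY.get("echo")
print(renderer.render_messages([{"role": "user", "content": "hi"}]))  # [104, 105]
```

In the real PR the factories would be import-path strings resolved at lookup time, but the lazy-construction shape is the same.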

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Centralizes prompt rendering via a new vllm.renderers layer and removes direct tokenizer access from serving paths.

  • Add RendererLike, registry, and implementations (HfRenderer, MistralRenderer, DeepseekV32Renderer, Grok2Renderer); move HF/Mistral chat-template resolution, kwargs filtering, and format detection into renderers with safe_apply_chat_template
  • Replace EngineClient.get_tokenizer() with engine_client.renderer.get_tokenizer(); expose renderer on engine/models; update InputPreprocessor to build renderer from ModelConfig
  • Refactor OpenAI endpoints (chat, completions, responses, pooling, tokenize) to render prompts via renderer; add warmup; simplify tokenization flows; adjust error when skip_tokenizer_init=True
  • Add parse_chat_messages_async, ChatTemplateContentFormat; minor import path fixes (e.g., MultiModalUUIDDict)
  • Tests/CI: add tests/renderers/*, remove/migrate legacy tests; run new suite in Buildkite; include vllm/renderers in mypy

Written by Cursor Bugbot for commit f41d339. This will update automatically on new commits. Configure here.
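The tokenizer-access change (replacing EngineClient.get_tokenizer() with engine_client.renderer.get_tokenizer()) can be illustrated with simplified stand-ins. These classes are hypothetical, not the real vLLM types; the point is that tokenizer access becomes a plain synchronous attribute lookup instead of an awaited call.

```python
class FakeTokenizer:
    """Stand-in tokenizer that encodes by character codes."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]


class FakeRenderer:
    """Stand-in renderer that owns its tokenizer."""

    def __init__(self) -> None:
        self._tokenizer = FakeTokenizer()

    def get_tokenizer(self) -> FakeTokenizer:
        return self._tokenizer


class FakeEngineClient:
    """Stand-in for EngineClient exposing a renderer attribute."""

    def __init__(self) -> None:
        self.renderer = FakeRenderer()


engine_client = FakeEngineClient()
# Before: tokenizer = await engine_client.get_tokenizer()
# After:  plain attribute access, no event loop required.
tokenizer = engine_client.renderer.get_tokenizer()
print(tokenizer.encode("ok"))  # [111, 107]
```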

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of RendererLike protocol and RendererRegistry provides a clean interface and lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.

I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

def cached_tokenizer_from_config(model_config: "ModelConfig", **kwargs):
    return cached_get_tokenizer(
        model_config.tokenizer,
        tokenizer_mode=model_config.tokenizer_mode,
        revision=model_config.tokenizer_revision,
        trust_remote_code=model_config.trust_remote_code,
P0: cached_tokenizer_from_config uses an outdated get_tokenizer signature

cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
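The failure mode described can be reproduced with simplified stand-ins (these are illustrative functions, not the real vLLM code): if get_tokenizer now takes (tokenizer_cls, tokenizer_name, ...) and no longer accepts tokenizer_mode, an old-style call site raises a TypeError at call time rather than loading a tokenizer.

```python
def get_tokenizer(tokenizer_cls, tokenizer_name, *, revision=None,
                  trust_remote_code=False):
    """New-style signature sketched from the review comment."""
    return (tokenizer_cls, tokenizer_name)


def old_style_helper(tokenizer_name):
    # Old call shape: one positional name plus the removed tokenizer_mode kwarg.
    # tokenizer_name lands in tokenizer_cls, tokenizer_name is missing, and
    # tokenizer_mode is an unknown keyword.
    return get_tokenizer(tokenizer_name, tokenizer_mode="auto")


try:
    old_style_helper("my-model")
except TypeError as exc:
    print(f"TypeError: {exc}")
```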

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mergify

mergify bot commented Dec 7, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@mergify

mergify bot commented Jan 19, 2026

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 changed the title [DO NOT MERGE] Introduce Renderer for processing chat messages (using ModelConfig) [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) Jan 21, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Jan 22, 2026
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 22, 2026 08:29
@DarkLight1337 DarkLight1337 merged commit d117a4d into vllm-project:main Jan 22, 2026
54 checks passed
@DarkLight1337 DarkLight1337 deleted the init-renderer-model branch January 22, 2026 12:44
@noooop
Collaborator

noooop commented Jan 22, 2026

Awesome to hear landing!

@AndreasKaratzas
Collaborator

@DarkLight1337 Hi :) I think this PR introduced a failure in the Transformers Nightly Model test group: https://buildkite.com/vllm/ci/builds/48034/steps/canvas?jid=019be4d2-f92f-443c-a608-6881e01b533b#019be4d2-f92f-443c-a608-6881e01b533b

@mawong-amd
Contributor

@DarkLight1337 Hi :) I think this PR introduced a failure in the Transformers Nightly Model test group: https://buildkite.com/vllm/ci/builds/48034/steps/canvas?jid=019be4d2-f92f-443c-a608-6881e01b533b#019be4d2-f92f-443c-a608-6881e01b533b

Hi @DarkLight1337, after further investigation, bisection does show that this PR works on Transformers 4.57.3, but causes an error on Transformers 5 tip-of-tree (and thus breaks the Transformers Nightly Models test group).
This can be seen by running python3 examples/offline_inference/basic/chat.py.

I'm not sure whether we should keep maintaining the Transformers Nightly Models group given that, by its nature, it breaks often. Would like to know your thoughts on this. Thanks!

@AndreasKaratzas
Collaborator

Follow-up:

I updated my existing transformers PR to resolve the latest transformers regression:
#31849

monajafi-amd pushed a commit to monajafi-amd/vllm that referenced this pull request Jan 23, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 陈建华 <1647430658@qq.com>
lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@hmellor
Member

hmellor commented Feb 3, 2026

For posterity, I'm currently the person looking at Transformers nightly tests. During the v4 to v5 transition these tests have almost always been broken because there are lots of breaking (but much needed) changes in Transformers.

After v5 the release cadence will be increased and the main branch should be more stable. This means that the Transformers nightly test group should break less often and will be a valuable signal for keeping vLLM up to date with the latest Transformers releases.

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…delConfig`) (vllm-project#30200)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Labels

  • ci/build
  • documentation (Improvements or additions to documentation)
  • frontend
  • gpt-oss (Related to GPT-OSS models)
  • multi-modality (Related to multi-modality (#4194))
  • performance (Performance-related issues)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • structured-output
  • tool-calling
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants