[Renderer] Introduce Renderer for processing chat messages (using RendererConfig)#30198
DarkLight1337 wants to merge 1 commit into vllm-project:main

Conversation
Code Review
This pull request introduces a Renderer abstraction to encapsulate chat template processing and tokenization logic. This is a significant and positive architectural refactoring, moving model-specific rendering logic out of the core engine and into dedicated renderer classes. The changes are extensive, touching many parts of the codebase to replace direct tokenizer usage with the new renderer interface. The implementation appears mostly correct and consistent. However, I found a bug in the MistralToolParser where an instance variable is not initialized, which will lead to an AttributeError.
```diff
@@ -112,6 +114,8 @@ def __init__(self, tokenizer: TokenizerLike):
                 "the tokenizer!"
             )

+        self.prev_tool_call_arr: list[dict[str, Any]]
```
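To see why the flagged line is a bug: in Python, a bare annotation on an instance attribute is not an assignment, so the attribute never comes into existence. A minimal sketch (a hypothetical class, not the actual `MistralToolParser`):

```python
from typing import Any


class ToolParser:
    def __init__(self) -> None:
        # Annotation only: no value is ever bound to the instance, so the
        # attribute does not exist after __init__ returns.
        self.prev_tool_call_arr: list[dict[str, Any]]


parser = ToolParser()
try:
    parser.prev_tool_call_arr
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

The fix is to initialize the attribute, e.g. `self.prev_tool_call_arr: list[dict[str, Any]] = []`.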
```diff
@@ -106,7 +104,7 @@ async def _preprocess(
             ctx.engine_prompts = []
             return None

-        renderer = self._get_renderer(ctx.tokenizer)
+        renderer = self._get_renderer(self.renderer.tokenizer)
```
Note that vllm.renderers.Renderer (self.renderer) is currently for chat only, and is not to be confused with the Renderer inside vllm.entrypoints.renderer.CompletionRenderer (the result of self._get_renderer). The two implementations will be merged in a later PR.
```diff
@@ -281,16 +267,13 @@ def __init__(
         self.request_logger = request_logger
         self.return_tokens_as_token_ids = return_tokens_as_token_ids
-        self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)
```
This has been moved to the Mistral renderer.
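The pattern being relocated here is worth spelling out: tokenization is CPU-bound, so it is run on a thread pool rather than blocking the event loop, and `max_workers=1` serializes access since many tokenizers are not thread-safe. A hedged sketch of that pattern (class and method names are illustrative, not vLLM's actual API):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class MistralRenderer:
    def __init__(self) -> None:
        # A single worker serializes tokenizer access.
        self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)

    def _encode(self, text: str) -> list[int]:
        # Placeholder for the real tokenizer call.
        return [ord(c) for c in text]

    async def encode_async(self, text: str) -> list[int]:
        # Offload the CPU-bound call so the event loop stays responsive.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self._tokenizer_executor, self._encode, text
        )


async def main() -> None:
    renderer = MistralRenderer()
    print(await renderer.encode_async("hi"))


asyncio.run(main())
```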
```python
request: RequestT
raw_request: Request | None = None
model_name: str
request_id: str
created_time: int = field(default_factory=lambda: int(time.time()))
lora_request: LoRARequest | None = None

# Shared across most requests
```
Prefer using self.renderer to simplify the code.
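One detail in the fields above that trips people up is `field(default_factory=...)`: mutable or time-dependent defaults must be deferred to instantiation time, otherwise every instance would share the value computed at class-definition time. A minimal, self-contained sketch (the `RequestContext` name and the reduced field set are illustrative):

```python
import time
from dataclasses import dataclass, field


@dataclass
class RequestContext:
    model_name: str
    request_id: str
    # default_factory runs at instantiation, so each request records
    # its own creation timestamp rather than a shared module-load time.
    created_time: int = field(default_factory=lambda: int(time.time()))


ctx = RequestContext(model_name="m", request_id="r1")
print(ctx.created_time)
```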
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Force-pushed from 0db05b1 to 47c1f05
Superseded by #30200
Purpose
- Introduce `vllm.renderers.RendererLike` to process chat messages into engine inputs.
- Add `RendererRegistry`, which lazily registers renderers to avoid the circular import problem in `vllm.renderers`.
- Rename `tokenizer_mode` to `renderer_mode`, and use a specific tokenizer implementation for each renderer, deprecating `TokenizerRegistry` in the process.

Towards #22880 and #23873
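The lazy-registration idea mentioned above is a standard way to break import cycles: the registry stores import paths as strings and only imports a module when its entry is first resolved. A hedged sketch in that spirit (this is an illustration, not vLLM's actual `RendererRegistry`):

```python
import importlib


class LazyRegistry:
    def __init__(self) -> None:
        # Maps a registry key to (module path, attribute name); nothing
        # is imported at registration time, so no cycle is triggered.
        self._paths: dict[str, tuple[str, str]] = {}

    def register(self, name: str, module: str, qualname: str) -> None:
        self._paths[name] = (module, qualname)

    def resolve(self, name: str):
        # The import happens here, on first use.
        module, qualname = self._paths[name]
        return getattr(importlib.import_module(module), qualname)


registry = LazyRegistry()
# Register a stand-in class by its import path rather than the class itself.
registry.register("ordered", "collections", "OrderedDict")
print(registry.resolve("ordered").__name__)
```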
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.