[Frontend] Complete OpenAI render delegation #37287
DarkLight1337 merged 8 commits into vllm-project:main
Conversation
…ngRender Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…ingResponses Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Code Review
This pull request completes the delegation of rendering logic to OpenAIServingRender by moving several preprocessing methods out of the base OpenAIServing class. The changes are well-structured and consistently applied across multiple components, including ServingTokens, OpenAIServingPooling, and OpenAIServingResponses. The refactoring centralizes the rendering logic, improving code organization and separation of concerns. The moved methods and their call sites have been updated correctly. Overall, this is a solid refactoring effort with no apparent issues.
@@ -47,6 +48,7 @@ def __init__(
    self,
    engine_client: EngineClient,
Can the "OpenAI" and "openai_" prefixes of OpenAIServingRender be removed? I think this has nothing to do with OpenAI.
Yeah let's do that in the next PR
We can discuss scope, but it does have something to do with OpenAI — the main entrypoints accept CompletionRequest and ChatCompletionRequest, which are OpenAI API types.
In my understanding, the renderer is the preprocessing layer of vLLM and is vendor-independent, as it will be used by the entrypoints of all vendors.
From my understanding that might make more sense for the renderer itself? class LLM seems to use it directly via its own API, without going through much of the logic in OpenAIServingRender.
vLLM top commit 1b6cb920e6ebcac57154e6154578c39d4892a16c has some diffs from vllm-omni (vllm-project/vllm#32104, vllm-project/vllm#32951, vllm-project/vllm#37287, vllm-project/vllm#36483); this just modifies vllm-omni to work. Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Completes the disaggregated frontend work from #36166 by delegating all remaining OpenAIServing preprocessing methods to OpenAIServingRender — the canonical GPU-less render layer — and removing the now-duplicate copies from the base class.

Changes
- ServingTokens — routes preprocess_completion through openai_serving_render
- OpenAIServingPooling — routes preprocess_completion, preprocess_cmpl, preprocess_chat, and validate_chat_template through openai_serving_render
- OpenAIServingResponses — pulls down _render_next_turn and _generate_with_builtin_tools from the base class (exclusively used here)
- OpenAIServing — 5 methods and their associated imports removed

Test Plan
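The change list above follows one pattern: serving classes hold a reference to a shared render object and forward preprocessing calls to it, rather than inheriting those methods from a base class. A minimal sketch of that pattern, with hypothetical stand-in classes (`RenderLayer`, `ServingTokens` here are simplified, not the real vLLM signatures):

```python
# Hypothetical sketch of the delegation pattern described above; the real
# vLLM classes (OpenAIServing, OpenAIServingRender, ServingTokens) have
# different constructors and signatures.
class RenderLayer:
    """Owns all preprocessing; serving classes hold a reference to it."""

    def preprocess_completion(self, prompt: str) -> list[int]:
        # Stand-in tokenization: one token ID per whitespace-separated word.
        return list(range(len(prompt.split())))


class ServingTokens:
    """A frontend endpoint that delegates instead of inheriting preprocessing."""

    def __init__(self, render: RenderLayer):
        self._render = render

    def handle(self, prompt: str) -> list[int]:
        # Route preprocessing through the shared render layer, so the
        # serving class itself stays free of tokenization logic.
        return self._render.preprocess_completion(prompt)


serving = ServingTokens(RenderLayer())
token_ids = serving.handle("count these four tokens")
```

Composition over inheritance here means the base serving class can shrink (the "5 methods removed" item above) while every endpoint shares one preprocessing implementation.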
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.