[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + OpenAIServingModels#36536
Conversation
|
Hi @sagearc, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
There was a problem hiding this comment.
Code Review
The code refactors the OpenAI API entrypoints by introducing an OpenAIModelRegistry class to centralize base model management and checks, with OpenAIServingModels now inheriting from it to handle LoRA-specific logic. This change streamlines model validation and listing across various API components, including api_server, engine/serving, generate/api_router, and render/serving, by delegating these operations to the new model_registry object. A review comment pointed out a potential IndexError in OpenAIModelRegistry.model_name if base_model_paths is empty, suggesting an explicit check for improved robustness.
| ) | ||
|
|
||
|
|
||
| class OpenAIServingModels(OpenAIModelRegistry): |
There was a problem hiding this comment.
I prefer composition over inheritance here
There was a problem hiding this comment.
Since OpenAIServingModels requires an engine client, OpenAIServingRender relies on the base OpenAIModelRegistry to stay engine-free. Inheritance allows the overridden check_model to still pick up loras automatically during serving. How would you recommend structuring this dependency with composition?
There was a problem hiding this comment.
I mean that OpenAIServingModels can contain an instance of OpenAIModelRegistry. There is no need to change OpenAIServingRender itself.
There was a problem hiding this comment.
I understand using composition there. My main hesitation is how that interacts with the renderer. If my understanding is correct, passing the contained OpenAIModelRegistry to OpenAIServingRender would mean the renderer only checks for base models.
I originally structured it this way to delegate both preprocessing and the model check from the serving layer to the renderer, to avoid having two separate entry points for preprocessing (one with the check, and one without, ref). I might be missing a cleaner way to wire this up though
There was a problem hiding this comment.
LoRA doesn't affect Renderer itself, though I understand from the client's perspective they should be able to use the LoRA model for both components. Perhaps we need to integrate this logic even in the engine-less case then.
There was a problem hiding this comment.
After taking a further look, choosing composition here won't reduce the need for a duplicate check_model in both OpenAIServingRender and the serving layers, since check_model can dynamically load LoRAs.
What if, instead of the current approach, we revert the changes introduced in this PR and allow the engine client to be None in OpenAIServingModels?
There was a problem hiding this comment.
Can you open a new PR to show what that looks like?
There was a problem hiding this comment.
Something like that?
#36655
@DarkLight1337
|
Hi @sagearc, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
1 similar comment
|
Hi @sagearc, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
This pull request has merge conflicts that must be resolved before it can be |
48a8a69 to
d230fc0
Compare
4a5083e to
02dc788
Compare
|
@DarkLight1337 kept changes minimal, registry handles the base models while keeping lora related logic untouched |
…IServingModels Introduce OpenAIModelRegistry as a lightweight, engine-free base class for model verification (check_model, show_available_models), suitable for CPU-only / render-only contexts with no LoRA support. OpenAIServingModels composes an OpenAIModelRegistry and layers LoRA adapter CRUD on top. OpenAIServing._check_model retains the full LoRA-aware verification logic (static + runtime resolution). OpenAIServingRender now accepts model_registry: OpenAIModelRegistry instead of served_model_names: list[str], removing duplicated model checking and show_available_models code. Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
02dc788 to
aa2d318
Compare
DarkLight1337
left a comment
There was a problem hiding this comment.
This looks better, thanks
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Purpose
After #36166,
OpenAIServingRender(a CPU-only, engine-free handler) was receiving a barelist[str]of model names and reimplementing model lookup logic (_check_model,_is_model_supported,show_available_models).This PR extracts
OpenAIModelRegistry— a lightweight, engine-free class for base-model verification — and wires it into the render path via composition.Changes
OpenAIModelRegistry(new): read-only base-model registry with no engine/LoRA dependency. Providescheck_model,show_available_models,is_base_model,model_name.OpenAIServingModels: composesOpenAIModelRegistryviaself.registry; delegates base-model ops to it, layers LoRA adapter CRUD on top.OpenAIServing._check_model: unchanged — retains the full LoRA-aware verification logic (static + runtime resolution).OpenAIServingRender: acceptsmodel_registry: OpenAIModelRegistryinstead ofserved_model_names: list[str]; removes duplicate_check_model,_is_model_supported, andshow_available_models./v1/modelsendpoint: return type widened toOpenAIModelRegistry | OpenAIServingModels.init_render_app_state: constructsOpenAIModelRegistrydirectly for the render-only server.Test Plan
Existing tests cover the affected code paths.
Test Result
Pre-commit checks pass.
Essential Elements of an Effective PR Description Checklist