[Frontend] Allow engine_client=None in OpenAIServingModels#36655
sagearc wants to merge 2 commits into vllm-project:main
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
async def load_lora_adapter(
    self, request: LoadLoRAAdapterRequest, base_model_name: str | None = None
) -> ErrorResponse | str:
    if self.engine_client is None:
I think this somewhat goes against #36536 (comment)?
@DarkLight1337 Oh that wasn't my intention, I tried to unify the engine-free path with the default one, since the model check in the original openai models can dynamically load loras...
What I'm trying to achieve here and in the previous PR is to have the OpenAI render path share the same OpenAI models object as the rest of the serving classes. Since the full model check can include engine operations, I came up with inheritance/a shared object for both paths, since composition would leave the renderer with an incomplete model check. There's also a chance I didn't fully get your intentions; in that case I'd love a clarification here :)
Thanks for taking the time to review these
I would like to achieve a cleaner separation between the code paths with vs. without engine client. In that case I prefer the previous PR #36536
gambletan left a comment
Clean refactor moving model-checking logic from OpenAIServingEngine._check_model into OpenAIServingModels.check_model, and allowing engine_client=None for render-only mode. The separation of concerns is good.
A few observations:
- `_is_model_supported` behavior change in `engine/serving.py`: The original code returned `True` when `model_name` is `None` (falsy), but now that check has been moved to `OpenAIServingModels.is_base_model`. The remaining `_is_model_supported` in `engine/serving.py` (line ~1173) now delegates directly to `self.models.is_base_model(model_name)` without the early `None` guard. Since `is_base_model` now handles `None` correctly, this works, but `_is_model_supported` is still called in other places, so it is worth verifying all call sites still get the expected `None` → `True` behavior.
- `create_error_response` in `models/serving.py`: The `param` keyword argument was added to the module-level `create_error_response` function. Since `ErrorInfo` may not have a `param` field in all versions, it would be good to confirm this field exists on `ErrorInfo`. If `ErrorInfo` uses a strict Pydantic model, passing an unexpected field could raise a validation error.
- `pooling/base/serving.py`: The change from `models.renderer` to `engine_client.renderer` makes sense given that `renderer` was removed from `OpenAIServingModels`, but this means `engine_client` can never be `None` when pooling is used. The type signature of `OpenAIServingModels.__init__` now accepts `engine_client: EngineClient | None`, but the pooling code would crash with `AttributeError` if `None` were passed. A defensive check or a comment noting this constraint would be helpful.
- LoRA guard duplication: Both `load_lora_adapter` and `resolve_lora` now have identical `if self.engine_client is None: return create_error_response(...)` blocks. Consider extracting this into a small helper like `_require_engine_client()` to reduce duplication.
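The strict-model concern in the second bullet can be illustrated with a plain dataclass, which rejects unknown constructor keywords much like a strict Pydantic model would (the `ErrorInfoSketch` name is hypothetical; vLLM's actual `ErrorInfo` is not shown here):

```python
from dataclasses import dataclass

@dataclass
class ErrorInfoSketch:
    """Hypothetical stand-in: like a strict model, a dataclass
    rejects constructor fields it does not declare."""
    message: str
    type: str

# Constructing with only declared fields works fine.
ok = ErrorInfoSketch(message="model not found", type="NotFoundError")

# Passing an undeclared `param` field fails at construction time,
# which is the failure mode to check for when adding `param`.
try:
    ErrorInfoSketch(message="model not found", type="NotFoundError", param="model")
except TypeError as exc:
    print("rejected:", type(exc).__name__)  # rejected: TypeError
```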
Code Review
This pull request refactors the OpenAIServingModels class to allow it to be instantiated without an engine_client, which is a key change for supporting a render-only mode. The logic for checking model validity, including LoRA resolution, has been consolidated into OpenAIServingModels, simplifying other parts of the codebase like OpenAIServing and OpenAIServingRender which now delegate these checks.
The changes are well-structured and improve the modularity of the code. I've added one suggestion to further improve maintainability by avoiding duplicated code for a critical check that guards LoRA operations in the new render-only mode.
Note: Security Review did not run due to the size of the PR.
if self.engine_client is None:
    return create_error_response(
        message="LoRA adapters are not supported in render-only mode.",
        err_type="BadRequestError",
        status_code=HTTPStatus.BAD_REQUEST,
    )
To avoid duplicating this check for engine_client is None in both load_lora_adapter and resolve_lora, consider extracting it into a helper method. This improves maintainability and ensures consistency if more LoRA-related methods are added in the future. This is a critical guard for the new render-only mode, and centralizing it reduces the risk of errors.
For example, you could add a private method:
def _check_lora_supported(self) -> ErrorResponse | None:
    """Return an error if LoRA adapters are not supported, else None."""
    if self.engine_client is None:
        return create_error_response(
            message="LoRA adapters are not supported in render-only mode.",
            err_type="BadRequestError",
            status_code=HTTPStatus.BAD_REQUEST,
        )
    return None

And then call it from both `load_lora_adapter` and `resolve_lora`:

if (error := self._check_lora_supported()) is not None:
    return error
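A standalone illustration of the suggested helper pattern (the class and `create_error_response` stand-in below are toys, not vLLM's actual code):

```python
from http import HTTPStatus

def create_error_response(message: str, err_type: str,
                          status_code: HTTPStatus) -> dict:
    # Toy stand-in for vLLM's create_error_response helper.
    return {"message": message, "type": err_type, "code": int(status_code)}

class ServingSketch:
    def __init__(self, engine_client=None):
        self.engine_client = engine_client  # None => render-only mode

    def _check_lora_supported(self):
        """Return an error if LoRA adapters are not supported, else None."""
        if self.engine_client is None:
            return create_error_response(
                message="LoRA adapters are not supported in render-only mode.",
                err_type="BadRequestError",
                status_code=HTTPStatus.BAD_REQUEST,
            )
        return None

    def load_lora_adapter(self, name: str):
        # Every LoRA entrypoint shares the same guard via the walrus pattern.
        if (error := self._check_lora_supported()) is not None:
            return error
        return f"loaded {name}"

    def resolve_lora(self, name: str):
        if (error := self._check_lora_supported()) is not None:
            return error
        return name

print(ServingSketch().load_lora_adapter("a")["type"])  # BadRequestError
```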
This pull request has merge conflicts that must be resolved before it can be merged.

@gambletan Appreciate the review! I'll take a look

Closed in favor of #36536
ref: #36536 (comment)
Purpose
Removes the need for a separate model-checking path in render-only mode by allowing `OpenAIServingModels` to be constructed without an engine client.

- `model_config` must be supplied explicitly in that case.
- `check_model` (including runtime LoRA resolution) is consolidated into `OpenAIServingModels`, so both the engine-backed and render-only paths share a single entrypoint.
- `OpenAIServing._check_model` and `_is_model_supported` become simple delegates.
- `OpenAIServingRender` now accepts `models: OpenAIServingModels` instead of `served_model_names: list[str]`.
- `init_render_app_state` constructs `OpenAIServingModels(engine_client=None, ...)` directly.

Test Plan
Test Result