[Frontend] Allow engine_client=None in OpenAIServingModels#36655
sagearc wants to merge 2 commits into vllm-project:main
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
async def load_lora_adapter(
    self, request: LoadLoRAAdapterRequest, base_model_name: str | None = None
) -> ErrorResponse | str:
    if self.engine_client is None:
I think this somewhat goes against #36536 (comment)?
@DarkLight1337 Oh that wasn't my intention, I tried to unify the engine-free path with the default one, since the model check in the original openai models can dynamically load loras...
What I'm trying to achieve here and in the previous PR is to have the OpenAI render path share the same OpenAI models object as the rest of the serving classes. Since the full model check can include engine operations, I came up with inheritance/a shared object for both paths, since composition would leave the renderer with an incomplete model check. There's also a chance I didn't fully get your intentions; in that case I'd love a clarification here :)
Thanks for taking the time to review these
I would like to achieve a cleaner separation between the code paths with vs. without engine client. In that case I prefer the previous PR #36536
gambletan left a comment
Clean refactor moving model-checking logic from OpenAIServingEngine._check_model into OpenAIServingModels.check_model, and allowing engine_client=None for render-only mode. The separation of concerns is good.
A few observations:
- `_is_model_supported` behavior change in `engine/serving.py`: The original code returned `True` when `model_name` is `None` (falsy), but now that check has been moved to `OpenAIServingModels.is_base_model`. The remaining `_is_model_supported` in `engine/serving.py` (line ~1173) now delegates directly to `self.models.is_base_model(model_name)` without the early `None` guard. Since `is_base_model` now handles `None` correctly, this works, but `_is_model_supported` is still called in other places, so it is worth verifying all call sites still get the expected `None` → `True` behavior.
- `create_error_response` in `models/serving.py`: The `param` keyword argument was added to the module-level `create_error_response` function. Since `ErrorInfo` may not have a `param` field in all versions, it would be good to confirm this field exists on `ErrorInfo`. If `ErrorInfo` uses a strict Pydantic model, passing an unexpected field could raise a validation error.
- `pooling/base/serving.py`: The change from `models.renderer` to `engine_client.renderer` makes sense given that `renderer` was removed from `OpenAIServingModels`, but this means `engine_client` can never be `None` when pooling is used. The type signature of `OpenAIServingModels.__init__` now accepts `engine_client: EngineClient | None`, but the pooling code would crash with `AttributeError` if `None` were passed. A defensive check or a comment noting this constraint would be helpful.
- LoRA guard duplication: Both `load_lora_adapter` and `resolve_lora` now have identical `if self.engine_client is None: return create_error_response(...)` blocks. Consider extracting this into a small helper like `_require_engine_client()` to reduce duplication.
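The strict-model concern in the second bullet can be illustrated with a plain dataclass, which rejects unknown constructor keywords much like a strict Pydantic model would (the `ErrorInfoSketch` name is hypothetical; vLLM's actual `ErrorInfo` is not shown here):

```python
from dataclasses import dataclass

@dataclass
class ErrorInfoSketch:
    """Hypothetical stand-in: like a strict model, a dataclass
    rejects constructor fields it does not declare."""
    message: str
    type: str

# Constructing with only declared fields works fine.
ok = ErrorInfoSketch(message="model not found", type="NotFoundError")

# Passing an undeclared `param` field fails at construction time,
# which is the failure mode to check for when adding `param`.
try:
    ErrorInfoSketch(message="model not found", type="NotFoundError", param="model")
except TypeError as exc:
    print("rejected:", type(exc).__name__)  # rejected: TypeError
```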
Code Review
This pull request refactors the OpenAIServingModels class to allow it to be instantiated without an engine_client, which is a key change for supporting a render-only mode. The logic for checking model validity, including LoRA resolution, has been consolidated into OpenAIServingModels, simplifying other parts of the codebase like OpenAIServing and OpenAIServingRender which now delegate these checks.
The changes are well-structured and improve the modularity of the code. I've added one suggestion to further improve maintainability by avoiding duplicated code for a critical check that guards LoRA operations in the new render-only mode.
Note: Security Review did not run due to the size of the PR.
if self.engine_client is None:
    return create_error_response(
        message="LoRA adapters are not supported in render-only mode.",
        err_type="BadRequestError",
        status_code=HTTPStatus.BAD_REQUEST,
    )
To avoid duplicating this check for engine_client is None in both load_lora_adapter and resolve_lora, consider extracting it into a helper method. This improves maintainability and ensures consistency if more LoRA-related methods are added in the future. This is a critical guard for the new render-only mode, and centralizing it reduces the risk of errors.
For example, you could add a private method:
def _check_lora_supported(self) -> ErrorResponse | None:
    """Return an error if LoRA adapters are not supported, else None."""
    if self.engine_client is None:
        return create_error_response(
            message="LoRA adapters are not supported in render-only mode.",
            err_type="BadRequestError",
            status_code=HTTPStatus.BAD_REQUEST,
        )
    return None

And then call it from both `load_lora_adapter` and `resolve_lora`:

if (error := self._check_lora_supported()) is not None:
    return error
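A standalone illustration of the suggested helper pattern (the class and `create_error_response` stand-in below are toys, not vLLM's actual code):

```python
from http import HTTPStatus

def create_error_response(message: str, err_type: str,
                          status_code: HTTPStatus) -> dict:
    # Toy stand-in for vLLM's create_error_response helper.
    return {"message": message, "type": err_type, "code": int(status_code)}

class ServingSketch:
    def __init__(self, engine_client=None):
        self.engine_client = engine_client  # None => render-only mode

    def _check_lora_supported(self):
        """Return an error if LoRA adapters are not supported, else None."""
        if self.engine_client is None:
            return create_error_response(
                message="LoRA adapters are not supported in render-only mode.",
                err_type="BadRequestError",
                status_code=HTTPStatus.BAD_REQUEST,
            )
        return None

    def load_lora_adapter(self, name: str):
        # Every LoRA entrypoint shares the same guard via the walrus pattern.
        if (error := self._check_lora_supported()) is not None:
            return error
        return f"loaded {name}"

    def resolve_lora(self, name: str):
        if (error := self._check_lora_supported()) is not None:
            return error
        return name

print(ServingSketch().load_lora_adapter("a")["type"])  # BadRequestError
```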
This pull request has merge conflicts that must be resolved before it can be merged.

@gambletan Appreciate the review! I'll take a look

Closed in favor of #36536
ref: #36536 (comment)
Purpose
Removes the need for a separate model-checking path in render-only mode by allowing `OpenAIServingModels` to be constructed without an engine client.

- `model_config` must be supplied explicitly in that case.
- `check_model` (including runtime LoRA resolution) is consolidated into `OpenAIServingModels`, so both the engine-backed and render-only paths share a single entrypoint.
- `OpenAIServing._check_model` and `_is_model_supported` become simple delegates.
- `OpenAIServingRender` now accepts `models: OpenAIServingModels` instead of `served_model_names: list[str]`.
- `init_render_app_state` constructs `OpenAIServingModels(engine_client=None, ...)` directly.

Test Plan
Test Result