[Frontend] Allow engine_client=None in OpenAIServingModels#36655

Closed
sagearc wants to merge 2 commits into vllm-project:main from sagearc:engine-free-openai-serving-models

Conversation

@sagearc
Contributor

@sagearc sagearc commented Mar 10, 2026

ref: #36536 (comment)

Purpose

Removes the need for a separate model-checking path in render-only mode by allowing OpenAIServingModels to be constructed without an engine client. model_config must be supplied explicitly in that case.

check_model (including runtime LoRA resolution) is consolidated into OpenAIServingModels, so both the engine-backed and render-only paths share a single entrypoint. OpenAIServing._check_model and _is_model_supported become simple delegates.

  • OpenAIServingRender now accepts models: OpenAIServingModels instead of served_model_names: list[str]
  • init_render_app_state constructs OpenAIServingModels(engine_client=None, ...) directly

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

sagearc added 2 commits March 10, 2026 15:52
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc
Contributor Author

sagearc commented Mar 10, 2026

cc @DarkLight1337

    async def load_lora_adapter(
        self, request: LoadLoRAAdapterRequest, base_model_name: str | None = None
    ) -> ErrorResponse | str:
        if self.engine_client is None:
Member

I think this somewhat goes against #36536 (comment)?

Contributor Author

@DarkLight1337 Oh, that wasn't my intention. I tried to unify the engine-free path with the default one, since the model check in the original openai models can dynamically load LoRAs...

What I'm trying to achieve here and in the previous PR is for openai render to share the same openai models as the rest of the serving classes. Since the full model check can include engine operations, I came up with inheritance/a shared object for both paths, since composition would leave the renderer with an incomplete model check. There's also a chance I didn't fully get your intentions; in that case I'd love a clarification here :)

Thanks for taking the time to review these

Member

@DarkLight1337 DarkLight1337 Mar 11, 2026

I would like to achieve a cleaner separation between the code paths with vs. without an engine client. In that case, I prefer the previous PR #36536.

Contributor

@gambletan gambletan left a comment

Clean refactor moving model-checking logic from OpenAIServing._check_model into OpenAIServingModels.check_model, and allowing engine_client=None for render-only mode. The separation of concerns is good.

A few observations:

  1. _is_model_supported behavior change in engine/serving.py: The original code returned True when model_name is None (falsy), but now that check has been moved to OpenAIServingModels.is_base_model. The remaining _is_model_supported in engine/serving.py (line ~1173) now delegates directly to self.models.is_base_model(model_name) without the early None guard. Since is_base_model now handles None correctly, this works, but _is_model_supported is still called in other places, so it is worth verifying that all call sites still get the expected None → True behavior.

  2. create_error_response in models/serving.py: The param keyword argument was added to the module-level create_error_response function. Since ErrorInfo may not have a param field in all versions, it would be good to confirm this field exists on ErrorInfo. If ErrorInfo uses a strict Pydantic model, passing an unexpected field could raise a validation error.

  3. pooling/base/serving.py: The change from models.renderer to engine_client.renderer makes sense given that renderer was removed from OpenAIServingModels, but this means engine_client can never be None when pooling is used. The type signature of OpenAIServingModels.__init__ now accepts engine_client: EngineClient | None, but pooling code would crash with AttributeError if None were passed. A defensive check or a comment noting this constraint would be helpful.

  4. LoRA guard duplication: Both load_lora_adapter and resolve_lora now have identical if self.engine_client is None: return create_error_response(...) blocks. Consider extracting this into a small helper like _require_engine_client() to reduce duplication.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the OpenAIServingModels class to allow it to be instantiated without an engine_client, which is a key change for supporting a render-only mode. The logic for checking model validity, including LoRA resolution, has been consolidated into OpenAIServingModels, simplifying other parts of the codebase like OpenAIServing and OpenAIServingRender which now delegate these checks.

The changes are well-structured and improve the modularity of the code. I've added one suggestion to further improve maintainability by avoiding duplicated code for a critical check that guards LoRA operations in the new render-only mode.

Note: Security Review did not run due to the size of the PR.

Comment on lines +163 to +168

    if self.engine_client is None:
        return create_error_response(
            message="LoRA adapters are not supported in render-only mode.",
            err_type="BadRequestError",
            status_code=HTTPStatus.BAD_REQUEST,
        )
Contributor

high

To avoid duplicating this check for engine_client is None in both load_lora_adapter and resolve_lora, consider extracting it into a helper method. This improves maintainability and ensures consistency if more LoRA-related methods are added in the future. This is a critical guard for the new render-only mode, and centralizing it reduces the risk of errors.

For example, you could add a private method:

    def _check_lora_supported(self) -> ErrorResponse | None:
        """Return an error if LoRA adapters are not supported, else None."""
        if self.engine_client is None:
            return create_error_response(
                message="LoRA adapters are not supported in render-only mode.",
                err_type="BadRequestError",
                status_code=HTTPStatus.BAD_REQUEST,
            )
        return None

And then call it from both load_lora_adapter and resolve_lora:

        if (error := self._check_lora_supported()) is not None:
            return error

@mergify

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sagearc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2026
@sagearc
Contributor Author

sagearc commented Mar 11, 2026

@gambletan Appreciate the review! I'll take a look

@sagearc
Contributor Author

sagearc commented Mar 11, 2026

Closed in favor of #36536

@sagearc sagearc closed this Mar 11, 2026