[Frontend] Split `OpenAIServingModels` into `OpenAIModelRegistry` + `OpenAIServingModels` by sagearc · Pull Request #36536 · vllm-project/vllm

sagearc · 2026-03-09T20:20:38Z

Purpose

After #36166, OpenAIServingRender (a CPU-only, engine-free handler) was receiving a bare list[str] of model names and reimplementing model lookup logic (_check_model, _is_model_supported, show_available_models).

This PR extracts OpenAIModelRegistry — a lightweight, engine-free class for base-model verification — and wires it into the render path via composition.

Changes

OpenAIModelRegistry (new): read-only base-model registry with no engine/LoRA dependency. Provides check_model, show_available_models, is_base_model, model_name.
OpenAIServingModels: composes OpenAIModelRegistry via self.registry; delegates base-model ops to it, layers LoRA adapter CRUD on top.
OpenAIServing._check_model: unchanged — retains the full LoRA-aware verification logic (static + runtime resolution).
OpenAIServingRender: accepts model_registry: OpenAIModelRegistry instead of served_model_names: list[str]; removes duplicate _check_model, _is_model_supported, and show_available_models.
/v1/models endpoint: return type widened to OpenAIModelRegistry | OpenAIServingModels.
init_render_app_state: constructs OpenAIModelRegistry directly for the render-only server.
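The composition described in the Changes list can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the code from the PR: the class and method names come from the description above, but the constructor arguments, return types, error text, the `BaseModelPath` stand-in, and the `load_lora_adapter` helper are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BaseModelPath:
    # Hypothetical stand-in for a base-model descriptor (assumption).
    name: str
    model_path: str


class OpenAIModelRegistry:
    """Read-only base-model registry with no engine or LoRA dependency."""

    def __init__(self, base_model_paths: list[BaseModelPath]) -> None:
        self.base_model_paths = list(base_model_paths)

    def is_base_model(self, model_name: str) -> bool:
        return any(m.name == model_name for m in self.base_model_paths)

    def model_name(self) -> str:
        # Assumes at least one base model is registered.
        return self.base_model_paths[0].name

    def check_model(self, model_name: str | None) -> str | None:
        # Returns an error message, or None when the name is acceptable.
        # The return convention is an assumption for this sketch.
        if model_name is None or self.is_base_model(model_name):
            return None
        return f"The model `{model_name}` does not exist."

    def show_available_models(self) -> list[str]:
        return [m.name for m in self.base_model_paths]


class OpenAIServingModels:
    """Composes a registry and layers LoRA adapter bookkeeping on top."""

    def __init__(self, engine_client: object, registry: OpenAIModelRegistry) -> None:
        self.engine_client = engine_client
        self.registry = registry
        self.lora_adapters: dict[str, str] = {}

    def is_base_model(self, model_name: str) -> bool:
        # Base-model operations are delegated to the registry.
        return self.registry.is_base_model(model_name)

    def load_lora_adapter(self, name: str, path: str) -> None:
        # Hypothetical, heavily simplified adapter registration.
        self.lora_adapters[name] = path

    def show_available_models(self) -> list[str]:
        return self.registry.show_available_models() + sorted(self.lora_adapters)


registry = OpenAIModelRegistry([BaseModelPath("base", "/models/base")])
serving = OpenAIServingModels(engine_client=object(), registry=registry)
serving.load_lora_adapter("my-lora", "/adapters/my-lora")
```

In this sketch a render-only server would hold only `registry`, so it sees base models and nothing else, while the full server holds `serving` and can also list the adapter.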

Test Plan

Existing tests cover the affected code paths.

Test Result

Pre-commit checks pass.

mergify · 2026-03-09T20:25:06Z

Hi @sagearc, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

gemini-code-assist

Code Review

The code refactors the OpenAI API entrypoints by introducing an OpenAIModelRegistry class to centralize base model management and checks, with OpenAIServingModels now inheriting from it to handle LoRA-specific logic. This change streamlines model validation and listing across various API components, including api_server, engine/serving, generate/api_router, and render/serving, by delegating these operations to the new model_registry object. A review comment pointed out a potential IndexError in OpenAIModelRegistry.model_name if base_model_paths is empty, suggesting an explicit check for improved robustness.
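The IndexError concern raised in the review can be illustrated with a small sketch. It assumes `model_name` returns the first registered base model; the class below is a hypothetical stand-in, and the thread does not show whether the merged code adopted a guard like this one.

```python
class RegistrySketch:
    """Hypothetical stand-in for the registry, holding only model names."""

    def __init__(self, base_model_names: list[str]) -> None:
        self.base_model_names = list(base_model_names)

    def model_name(self) -> str:
        # An explicit check yields a descriptive error instead of a bare
        # IndexError when no base model has been registered.
        if not self.base_model_names:
            raise ValueError("No base models are registered.")
        return self.base_model_names[0]
```

Failing with a named error at this point makes a misconfigured server easier to diagnose than an index error surfacing from a request handler.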

vllm/entrypoints/openai/models/serving.py

DarkLight1337 · 2026-03-10T03:34:51Z

vllm/entrypoints/openai/models/serving.py

+        )
+
+
+class OpenAIServingModels(OpenAIModelRegistry):


I prefer composition over inheritance here

Since OpenAIServingModels requires an engine client, OpenAIServingRender relies on the base OpenAIModelRegistry to stay engine-free. Inheritance lets the overridden check_model still pick up LoRAs automatically during serving. How would you recommend structuring this dependency with composition?

I mean that OpenAIServingModels can contain an instance of OpenAIModelRegistry. There is no need to change OpenAIServingRender itself.

I understand using composition there. My main hesitation is how that interacts with the renderer. If my understanding is correct, passing the contained OpenAIModelRegistry to OpenAIServingRender would mean the renderer only checks for base models.

I originally structured it this way to delegate both preprocessing and the model check from the serving layer to the renderer, to avoid having two separate entry points for preprocessing (one with the check and one without, ref). I might be missing a cleaner way to wire this up, though.

LoRA doesn't affect Renderer itself, though I understand from the client's perspective they should be able to use the LoRA model for both components. Perhaps we need to integrate this logic even in the engine-less case then.

After taking a further look, choosing composition here won't reduce the need for a duplicate check_model in both OpenAIServingRender and the serving layers, since check_model can dynamically load LoRAs.
What if, instead of the current approach, we revert the changes introduced in this PR and allow the engine client to be None in OpenAIServingModels?

Can you open a new PR to show what that looks like?

Something like that? #36655 @DarkLight1337
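The alternative floated in this thread, keeping a single class and allowing the engine client to be None, might look roughly like the sketch below. It does not reproduce #36655: the class name, method signatures, and error messages are assumptions for illustration only.

```python
from __future__ import annotations


class ServingModelsSketch:
    """Hypothetical single-class variant with an optional engine client."""

    def __init__(
        self,
        base_model_names: list[str],
        engine_client: object | None = None,
    ) -> None:
        self.base_model_names = list(base_model_names)
        self.engine_client = engine_client
        self.lora_adapters: dict[str, str] = {}

    def check_model(self, model_name: str) -> str | None:
        # Returns an error message, or None when the model is known.
        if model_name in self.base_model_names or model_name in self.lora_adapters:
            return None
        return f"The model `{model_name}` does not exist."

    def load_lora_adapter(self, name: str, path: str) -> None:
        # Without an engine there is nothing to load the adapter into,
        # so the engine-free (render-only) case rejects the request.
        if self.engine_client is None:
            raise RuntimeError("LoRA adapters require an engine client.")
        self.lora_adapters[name] = path
```

The trade-off against the registry split is that every engine-dependent method needs its own None check, whereas a separate engine-free class expresses the same restriction through its type.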

mergify · 2026-03-10T11:13:35Z

Hi @sagearc, the pre-commit checks have failed.

mergify · 2026-03-10T11:38:27Z

Hi @sagearc, the pre-commit checks have failed.

mergify · 2026-03-11T03:43:59Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sagearc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

sagearc · 2026-03-11T19:14:08Z

@DarkLight1337 I kept the changes minimal: the registry handles the base models, and the LoRA-related logic is left untouched.

…IServingModels

Introduce OpenAIModelRegistry as a lightweight, engine-free base class for model verification (check_model, show_available_models), suitable for CPU-only / render-only contexts with no LoRA support.

OpenAIServingModels composes an OpenAIModelRegistry and layers LoRA adapter CRUD on top. OpenAIServing._check_model retains the full LoRA-aware verification logic (static + runtime resolution).

OpenAIServingRender now accepts model_registry: OpenAIModelRegistry instead of served_model_names: list[str], removing duplicated model checking and show_available_models code.

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

DarkLight1337

This looks better, thanks

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>

sagearc requested review from DarkLight1337, aarnphm, chaunceyjiang, njhill and russellb as code owners March 9, 2026 20:20

mergify bot added the frontend label Mar 9, 2026

sagearc mentioned this pull request Mar 9, 2026

[Frontend] Delegate preprocessing to OpenAIServingRender #36483

Merged

5 tasks

gemini-code-assist bot reviewed Mar 9, 2026


DarkLight1337 reviewed Mar 10, 2026

sagearc mentioned this pull request Mar 10, 2026

[Frontend] Allow engine_client=None in OpenAIServingModels #36655

Closed

5 tasks

mergify bot added the needs-rebase label Mar 11, 2026

sagearc force-pushed the split-openai-serving-models branch from 48a8a69 to d230fc0 Compare March 11, 2026 18:58

mergify bot removed the needs-rebase label Mar 11, 2026

sagearc force-pushed the split-openai-serving-models branch 3 times, most recently from 4a5083e to 02dc788 Compare March 11, 2026 19:10

sagearc requested a review from DarkLight1337 March 11, 2026 19:14

sagearc added 3 commits March 11, 2026 21:57

restore init_static_loras to its original position

a98ca7a

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

type hint

aa2d318

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

sagearc force-pushed the split-openai-serving-models branch from 02dc788 to aa2d318 Compare March 11, 2026 19:57

Merge branch 'main' into split-openai-serving-models

3d92970

DarkLight1337 approved these changes Mar 12, 2026

DarkLight1337 enabled auto-merge (squash) March 12, 2026 06:40

github-actions bot added the ready label Mar 12, 2026

Merge branch 'main' into split-openai-serving-models

78e81c3

vllm-bot merged commit 06e0bc2 into vllm-project:main Mar 12, 2026
42 of 45 checks passed

sagearc deleted the split-openai-serving-models branch March 12, 2026 10:56

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026

[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + `…

4fb635f

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026

[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + `…

0b0ad2e

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + `…

e74a1fb

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + `…

3e4e3b9

…OpenAIServingModels` (vllm-project#36536) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>


3 participants
