
[R3] Add routed experts to openai entrypoint #38939

Open
hao-aaron wants to merge 3 commits into vllm-project:main from hao-aaron:r3-entrypoint

Conversation

@hao-aaron (Contributor) commented on Apr 3, 2026

Purpose

Adds the routed experts output introduced in #28284 to the OpenAI entrypoint.
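For illustration only, a rough client-side sketch of how the new field might be consumed; the model name, port, and exact JSON layout here are assumptions, and the --enable-return-routed-experts flag referenced below is the one described later in the review:

```python
# Hypothetical usage sketch: read routed experts from a chat completion served
# by a vLLM server assumed to be started with --enable-return-routed-experts.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-moe-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8,
    },
)
choice = resp.json()["choices"][0]
# Expected to be absent/None unless the server flag is enabled.
print(choice.get("routed_experts"))
```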

Test Plan

new unit tests

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@mergify (bot) added the frontend label on Apr 3, 2026
@hao-aaron marked this pull request as ready for review on April 3, 2026, 19:26
@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the functionality to return routed expert indices in OpenAI-compatible chat and completion responses. It adds a routed_experts field to the response protocols and updates the serving logic to populate this field from the model output when the --enable-return-routed-experts flag is enabled. Additionally, a new test suite is included to verify the correct shape and values of the returned expert data. I have no feedback to provide.
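For readers unfamiliar with the change, here is a minimal sketch of what the described protocol and serving changes could look like; the class and function names and field types are assumptions for illustration, not the actual vLLM code:

```python
# Rough sketch (assumed names/types, not the actual vLLM implementation).
from typing import Optional

from pydantic import BaseModel


class ChatCompletionResponseChoiceSketch(BaseModel):
    index: int
    # One entry per generated token; each entry holds the expert indices the
    # token was routed to across the model's MoE layers.
    routed_experts: Optional[list[list[int]]] = None


def maybe_attach_routed_experts(
    choice: ChatCompletionResponseChoiceSketch,
    output_routed_experts: Optional[list[list[int]]],
    enable_return_routed_experts: bool,
) -> None:
    # Populate the field only when the server was started with
    # --enable-return-routed-experts and the engine produced routing data.
    if enable_return_routed_experts and output_routed_experts is not None:
        choice.routed_experts = output_routed_experts
```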

@SumanthRH (Contributor) left a comment


What is really needed here is support in the tokens-in-tokens-out /inference/v1/generate endpoint.

Can you replicate the modifications here:

class ServingTokens(OpenAIServing):

class GenerateResponseChoice(BaseModel):

and add a test?
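A hedged sketch of what the requested replication and test might look like; the class fields and test below are assumptions for illustration, not the actual vLLM code:

```python
# Hypothetical sketch of the reviewer's request: mirror the routed_experts
# field on the tokens-in-tokens-out /inference/v1/generate response choice,
# plus a unit-test-style shape check.
from typing import Optional

from pydantic import BaseModel


class GenerateResponseChoiceSketch(BaseModel):
    index: int
    token_ids: list[int]
    # Per-token routed expert indices; only set when the server runs with
    # --enable-return-routed-experts.
    routed_experts: Optional[list[list[int]]] = None


def test_routed_experts_shape_matches_tokens():
    # Two generated tokens, each routed to two experts in a single MoE layer.
    choice = GenerateResponseChoiceSketch(
        index=0,
        token_ids=[11, 42],
        routed_experts=[[3, 7], [1, 5]],
    )
    assert choice.routed_experts is not None
    assert len(choice.routed_experts) == len(choice.token_ids)
```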

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron requested a review from njhill as a code owner on April 7, 2026, 17:43
@SumanthRH (Contributor) left a comment


LGTM
