
Conversation


zerofishnoodles (Collaborator) commented Oct 27, 2025

Add basic integration path for semantic router

Related to #519 #734
Related to vllm-project/semantic-router#295

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s when running git commit.
  • Classify PRs so the type of change is easy to understand, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please classify PRs so the type of change is easy to understand. Prefix the PR title appropriately to indicate the type of change, using one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@gemini-code-assist

Summary of Changes

Hello @zerofishnoodles, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for integrating a semantic router into the vLLM Production Stack. The changes encompass significant updates to deployment configurations, a shift in Python dependency management tooling, and enhancements to the application's lifecycle management for routing components. The primary goal is to enable more intelligent request routing and provide clear documentation for users to leverage this new capability.

Highlights

  • Semantic Router Integration: Introduced the basic integration path for the vLLM Semantic Router, enabling advanced routing capabilities based on request semantics.
  • Helm Chart Updates: Modified several Helm value files (values-06-session-routing.yaml, values-07-prefix-routing.yaml, values-08-roundrobin-routing.yaml, values-09-kvaware-routing.yaml, values-10-disagg-prefill.yaml, values-17-kv-aware.yaml) to align with the new semantic routing setup, including updated model tags, resource requests, and lmcacheConfig parameters. A new values-23-SR.yaml was added for the semantic router tutorial.
  • Dependency Management Migration: The Dockerfile was updated to switch from pip to uv for dependency installation, aiming for faster and more reliable builds. The ENTRYPOINT was also adjusted to reflect this change.
  • Routing Logic Lifecycle Management: Implemented cleanup_routing_logic to ensure proper shutdown and cleanup of routing logic instances during application lifecycle events, preventing resource leaks. The KvawareRouter now includes a close method for graceful termination of its cluster monitor task.
  • Request Header Sanitization: Added logic to sanitize request headers by removing hop-by-hop headers before forwarding requests to backend services, improving proxy behavior and preventing potential issues.
  • New Tutorial Document: A comprehensive tutorial (tutorials/24-semantic-router-integration.md) was added to guide users through deploying and testing the vLLM Semantic Router integration with the vLLM Production Stack.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/router-e2e-test.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.



gemini-code-assist bot left a comment


Code Review

This pull request introduces an integration path for the vLLM Semantic Router, which is a significant new feature. The changes include extensive updates to Helm chart values across various routing configurations to support new lmcache features, and a new tutorial document has been added to guide users through the integration. Key code changes involve refactoring the routing logic management for more robust cleanup and reconfiguration, improving request proxying by sanitizing hop-by-hop headers, and modernizing the Docker build process by switching to uv. My review has identified a critical issue in the service_discovery module where a blocking call is used within an async function, which will freeze the event loop. I've also provided suggestions to improve the clarity of the new tutorial and the robustness of the Dockerfile. Overall, these are great additions, but the critical issue needs to be addressed.
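For context on the critical issue flagged above (a blocking call inside an async function freezes the event loop), a common remedy is to offload the blocking work to a worker thread. This is a hedged sketch of the pattern only; `resolve_endpoints` is a hypothetical stand-in for the blocking call, not the PR's actual service_discovery code.

```python
import asyncio
import time

def resolve_endpoints() -> list[str]:
    # Placeholder for a blocking operation (e.g. a synchronous API call).
    time.sleep(0.1)
    return ["pod-a:8000", "pod-b:8000"]

async def discover() -> list[str]:
    # Wrong: calling resolve_endpoints() directly here would block the
    # event loop for the full 0.1 s, stalling every other coroutine.
    # Right: offload the blocking work to a thread and await the result.
    return await asyncio.to_thread(resolve_endpoints)

print(asyncio.run(discover()))  # → ['pod-a:8000', 'pod-b:8000']
```

`asyncio.to_thread` (Python 3.9+) wraps `loop.run_in_executor` and keeps the event loop free to serve other requests while the blocking call runs.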

@Shaoting-Feng

/gemini review


gemini-code-assist bot left a comment


Code Review

This pull request introduces integration with the vLLM Semantic Router, adding a new tutorial and making related code adjustments. The modifications to the Helm chart and the request service are beneficial. The new tutorial is thorough, but I've provided some suggestions to enhance its clarity and reliability. Additionally, I've pointed out a potential issue in the sample values.yaml that might lead to deployment problems, along with a suggestion for a performance enhancement in the request handling code.

@Shaoting-Feng

Shaoting-Feng commented Oct 28, 2025

Please check whether the Gemini reviews need any fixes. Resolve all conversations that you believe are fine. Once everything is resolved, I’ll approve this PR.

Shaoting-Feng enabled auto-merge (squash) October 28, 2025 19:29
Signed-off-by: Rui Zhang <[email protected]>
@zerofishnoodles

@Shaoting-Feng gemini fixed


Shaoting-Feng left a comment


LGTM

Shaoting-Feng merged commit 528f6d8 into vllm-project:main Oct 28, 2025
14 checks passed
@nithin8702 mentioned this pull request Oct 29, 2025