
Conversation


zerofishnoodles (Collaborator) commented Oct 27, 2025

Add basic integration path for semantic router

Related to #519 #734
Related to vllm-project/semantic-router#295

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s when running git commit.
  • Classify PRs so the type of change is easy to understand, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please classify PRs so the type of change is easy to understand. Prefix the PR title appropriately to indicate the type of change, using one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@gemini-code-assist

Summary of Changes

Hello @zerofishnoodles, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for integrating a semantic router into the vLLM Production Stack. The changes encompass significant updates to deployment configurations, a shift in Python dependency management tooling, and enhancements to the application's lifecycle management for routing components. The primary goal is to enable more intelligent request routing and provide clear documentation for users to leverage this new capability.

Highlights

  • Semantic Router Integration: Introduced the basic integration path for the vLLM Semantic Router, enabling advanced routing capabilities based on request semantics.
  • Helm Chart Updates: Modified several Helm value files (values-06-session-routing.yaml, values-07-prefix-routing.yaml, values-08-roundrobin-routing.yaml, values-09-kvaware-routing.yaml, values-10-disagg-prefill.yaml, values-17-kv-aware.yaml) to align with the new semantic routing setup, including updated model tags, resource requests, and lmcacheConfig parameters. A new values-23-SR.yaml was added for the semantic router tutorial.
  • Dependency Management Migration: The Dockerfile was updated to switch from pip to uv for dependency installation, aiming for faster and more reliable builds. The ENTRYPOINT was also adjusted to reflect this change.
  • Routing Logic Lifecycle Management: Implemented cleanup_routing_logic to ensure proper shutdown and cleanup of routing logic instances during application lifecycle events, preventing resource leaks. The KvawareRouter now includes a close method for graceful termination of its cluster monitor task.
  • Request Header Sanitization: Added logic to sanitize request headers by removing hop-by-hop headers before forwarding requests to backend services, improving proxy behavior and preventing potential issues.
  • New Tutorial Document: A comprehensive tutorial (tutorials/24-semantic-router-integration.md) was added to guide users through deploying and testing the vLLM Semantic Router integration with the vLLM Production Stack.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/router-e2e-test.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.



gemini-code-assist bot left a comment


Code Review

This pull request introduces an integration path for the vLLM Semantic Router, which is a significant new feature. The changes include extensive updates to Helm chart values across various routing configurations to support new lmcache features, and a new tutorial document has been added to guide users through the integration. Key code changes involve refactoring the routing logic management for more robust cleanup and reconfiguration, improving request proxying by sanitizing hop-by-hop headers, and modernizing the Docker build process by switching to uv. My review has identified a critical issue in the service_discovery module where a blocking call is used within an async function, which will freeze the event loop. I've also provided suggestions to improve the clarity of the new tutorial and the robustness of the Dockerfile. Overall, these are great additions, but the critical issue needs to be addressed.
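For context on the critical issue flagged above (a blocking call inside an async function freezes the event loop), a common remedy is to offload the blocking work to a worker thread. This is a hedged sketch of the pattern only; `resolve_endpoints` is a hypothetical stand-in for the blocking call, not the PR's actual service_discovery code.

```python
import asyncio
import time

def resolve_endpoints() -> list[str]:
    # Placeholder for a blocking operation (e.g. a synchronous API call).
    time.sleep(0.1)
    return ["pod-a:8000", "pod-b:8000"]

async def discover() -> list[str]:
    # Wrong: calling resolve_endpoints() directly here would block the
    # event loop for the full 0.1 s, stalling every other coroutine.
    # Right: offload the blocking work to a thread and await the result.
    return await asyncio.to_thread(resolve_endpoints)

print(asyncio.run(discover()))  # → ['pod-a:8000', 'pod-b:8000']
```

`asyncio.to_thread` (Python 3.9+) wraps `loop.run_in_executor` and keeps the event loop free to serve other requests while the blocking call runs.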

@Shaoting-Feng

/gemini review


gemini-code-assist bot left a comment


Code Review

This pull request introduces integration with the vLLM Semantic Router, adding a new tutorial and making related code adjustments. The modifications to the Helm chart and the request service are beneficial. The new tutorial is thorough, but I've provided some suggestions to enhance its clarity and reliability. Additionally, I've pointed out a potential issue in the sample values.yaml that might lead to deployment problems, along with a suggestion for a performance enhancement in the request handling code.

@Shaoting-Feng

Shaoting-Feng commented Oct 28, 2025

Please check whether the Gemini reviews need any fixes. Resolve all conversations that you believe are fine. Once everything is resolved, I’ll approve this PR.

Shaoting-Feng enabled auto-merge (squash) October 28, 2025 19:29
Signed-off-by: Rui Zhang <[email protected]>
@zerofishnoodles

@Shaoting-Feng gemini fixed


Shaoting-Feng left a comment


LGTM

Shaoting-Feng merged commit 528f6d8 into vllm-project:main Oct 28, 2025
14 checks passed
@nithin8702 mentioned this pull request Oct 29, 2025