Is your feature request related to a problem? Please describe.
Currently, vLLM Semantic Router operates as a standalone intelligent routing layer that can route requests to various LLM endpoints. However, it lacks deep integration with the official vLLM Production Stack, which is the reference system for production vLLM deployments. This creates several challenges:
- Deployment Complexity: Users must manually configure and deploy semantic router alongside their vLLM production stack, leading to complex multi-service orchestration
- Configuration Duplication: Model configurations, endpoints, and scaling parameters must be maintained separately in both systems
- Monitoring Fragmentation: Metrics and observability are split between the semantic router and vLLM production stack monitoring systems
- Resource Inefficiency: Lack of coordinated resource management between intelligent routing decisions and vLLM's auto-scaling capabilities
- Operational Overhead: Separate lifecycle management, updates, and troubleshooting for two distinct but related systems
Teams that want to leverage both intelligent semantic routing and production-grade vLLM inference therefore face significant integration challenges, preventing them from realizing the full benefits of either system.
Describe the solution you'd like
It would be great to integrate vLLM Semantic Router directly into the vLLM Production Stack as an optional component, providing intelligent routing capabilities within the production-ready vLLM ecosystem.