feat(selection): implement advanced model selection methods #1089
|
@asaadbalum nice work! Elo is a really cool metric. I would imagine that we can use the feedback classifier to complement user voting and automate the ranking process. Would you like to follow up with a PR to add Prometheus metrics for explainability and traceability, so we can infer the evolution of the selector over time? |
Pull request overview
This pull request implements advanced model selection algorithms for intelligent LLM routing, enabling the semantic router to choose the best model from multiple candidates based on learned preferences, query similarity, and cost-quality optimization. The implementation adds five selection methods: Static (baseline), Elo rating, RouterDC (dual-contrastive), AutoMix (POMDP-based), and Hybrid (combined approach).
Changes:
- New pkg/selection/ package with core selection interfaces and multiple algorithm implementations
- Integration into the routing pipeline via req_filter_classification.go
- Configuration structs added to support the new selection methods
- Comprehensive test coverage and demo application
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| pkg/selection/selector.go | Core interfaces and types for selection framework |
| pkg/selection/elo.go | Elo rating system using Bradley-Terry model |
| pkg/selection/router_dc.go | Dual-contrastive query-to-model matching |
| pkg/selection/automix.go | POMDP-based cost-quality optimization |
| pkg/selection/hybrid.go | Combines multiple methods with weighted scores |
| pkg/selection/static.go | Baseline static selection (backwards compatible) |
| pkg/selection/factory.go | Factory pattern for creating selectors |
| pkg/selection/metrics.go | Prometheus metrics for observability |
| pkg/selection/selector_test.go | Comprehensive unit tests |
| pkg/extproc/router.go | Initialize selection registry |
| pkg/extproc/req_filter_classification.go | Integration with routing pipeline |
| pkg/config/config.go | Configuration structs for selection methods |
| cmd/selection-demo/main.go | Demo application |
| config/intelligent-routing/in-tree/model_selection_demo.yaml | Example configuration |
| // contains checks if a string contains a substring (case-insensitive) | ||
| func contains(s, substr string) bool { | ||
| for i := 0; i <= len(s)-len(substr); i++ { | ||
| if equalFold(s[i:i+len(substr)], substr) { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } | ||
|
|
||
| // equalFold compares two strings case-insensitively | ||
| func equalFold(a, b string) bool { | ||
| if len(a) != len(b) { | ||
| return false | ||
| } | ||
| for i := range a { | ||
| ca, cb := a[i], b[i] | ||
| if ca >= 'A' && ca <= 'Z' { | ||
| ca += 'a' - 'A' | ||
| } | ||
| if cb >= 'A' && cb <= 'Z' { | ||
| cb += 'a' - 'A' | ||
| } | ||
| if ca != cb { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
The contains and equalFold functions reimplement functionality available in Go's standard library. Use strings.Contains with strings.ToLower for the contains check, or strings.EqualFold for case-insensitive comparison. This reduces code complexity and uses well-tested standard library functions.
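A minimal sketch of the standard-library replacement suggested in this comment. The helper name containsFold is hypothetical and not part of the PR; for ASCII model names, lowercasing both sides should behave like the hand-rolled byte-wise folding in the snippet under review.

```go
package main

import (
	"fmt"
	"strings"
)

// containsFold reports whether substr occurs in s, ignoring case.
// Hypothetical helper: it lowercases both operands and defers to strings.Contains.
func containsFold(s, substr string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

func main() {
	fmt.Println(containsFold("Llama-3.2-3B-Instruct", "3b")) // true
	fmt.Println(containsFold("phi4", "7b"))                  // false

	// For whole-string comparison, strings.EqualFold replaces equalFold directly.
	fmt.Println(strings.EqualFold("RouterDC", "routerdc")) // true
}
```

Unlike the reviewed code, strings.ToLower and strings.EqualFold also fold non-ASCII letters, which may or may not matter for model identifiers.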
| h.config.EloWeight, h.config.RouterDCWeight, h.config.AutoMixWeight, h.config.CostWeight) | ||
|
|
||
| if len(parts) > 0 { | ||
| return fmt.Sprintf("Hybrid combination: %s, %s", parts, weightsStr) |
The parts variable is a []string slice being used with %s format specifier, which will print the slice representation (e.g., [Elo=0.500 RouterDC=0.600]) rather than a formatted string. Use strings.Join(parts, ", ") to properly format the component scores as a comma-separated string.
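The difference between the two formatting approaches can be seen in a small sketch. The function name explain and the weights string are placeholders, and the branch for an empty parts slice is an assumption, since the reviewed snippet does not show it.

```go
package main

import (
	"fmt"
	"strings"
)

// explain builds the reasoning string from component scores and a weights summary.
// Hypothetical stand-in for the method under review.
func explain(parts []string, weightsStr string) string {
	if len(parts) == 0 {
		// Assumed fallback; the original else-branch is not shown in the diff.
		return "Hybrid combination: " + weightsStr
	}
	return fmt.Sprintf("Hybrid combination: %s, %s", strings.Join(parts, ", "), weightsStr)
}

func main() {
	parts := []string{"Elo=0.500", "RouterDC=0.600"}

	// With %s applied to the slice itself, fmt prints the bracketed slice form.
	fmt.Printf("%s\n", parts) // [Elo=0.500 RouterDC=0.600]

	// With strings.Join, the components read as a comma-separated list.
	fmt.Println(explain(parts, "weights(...)"))
}
```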
| {"0.5b", 0.5}, | ||
| } | ||
|
|
||
| modelLower := model |
Variable modelLower is assigned the value of model but never converted to lowercase, despite the function name and comment indicating case-insensitive matching. Add modelLower = strings.ToLower(model) or use the standard library functions as suggested in Comment 1.
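A sketch of what the fixed lookup might look like. Only the "0.5b" entry appears in the diff; the other table entries, the type name sizePattern, and the function name estimateSize are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// sizePattern maps a model-name fragment to a parameter count in billions.
type sizePattern struct {
	pattern string
	size    float64
}

// Only "0.5b" is taken from the reviewed code; the rest are assumed examples.
var sizePatterns = []sizePattern{
	{"70b", 70},
	{"7b", 7},
	{"3b", 3},
	{"0.5b", 0.5},
}

// estimateSize returns the first matching size, comparing case-insensitively.
func estimateSize(model string) (float64, bool) {
	modelLower := strings.ToLower(model) // the conversion missing in the reviewed line
	for _, p := range sizePatterns {
		if strings.Contains(modelLower, p.pattern) {
			return p.size, true
		}
	}
	return 0, false
}

func main() {
	fmt.Println(estimateSize("Qwen2.5-0.5B-Instruct")) // 0.5 true
	fmt.Println(estimateSize("llama3.2:3b"))           // 3 true
	fmt.Println(estimateSize("phi4"))                  // 0 false
}
```

Because this is substring matching, table order matters once overlapping patterns are present (for example, a "13b" entry would need to precede "3b").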
| eloCfg := cfg.IntelligentRouting.ModelSelection.Elo | ||
| modelSelectionCfg.Elo = &selection.EloConfig{ | ||
| InitialRating: eloCfg.InitialRating, | ||
| KFactor: eloCfg.KFactor, | ||
| CategoryWeighted: eloCfg.CategoryWeighted, | ||
| DecayFactor: eloCfg.DecayFactor, | ||
| MinComparisons: eloCfg.MinComparisons, | ||
| CostScalingFactor: eloCfg.CostScalingFactor, | ||
| } | ||
|
|
||
| // Copy RouterDC config | ||
| routerDCCfg := cfg.IntelligentRouting.ModelSelection.RouterDC | ||
| modelSelectionCfg.RouterDC = &selection.RouterDCConfig{ | ||
| Temperature: routerDCCfg.Temperature, | ||
| DimensionSize: routerDCCfg.DimensionSize, | ||
| MinSimilarity: routerDCCfg.MinSimilarity, | ||
| UseQueryContrastive: routerDCCfg.UseQueryContrastive, | ||
| UseModelContrastive: routerDCCfg.UseModelContrastive, | ||
| } | ||
|
|
||
| // Copy AutoMix config | ||
| autoMixCfg := cfg.IntelligentRouting.ModelSelection.AutoMix | ||
| modelSelectionCfg.AutoMix = &selection.AutoMixConfig{ | ||
| VerificationThreshold: autoMixCfg.VerificationThreshold, | ||
| MaxEscalations: autoMixCfg.MaxEscalations, | ||
| CostAwareRouting: autoMixCfg.CostAwareRouting, | ||
| CostQualityTradeoff: autoMixCfg.CostQualityTradeoff, | ||
| DiscountFactor: autoMixCfg.DiscountFactor, | ||
| UseLogprobVerification: autoMixCfg.UseLogprobVerification, | ||
| } | ||
|
|
||
| // Copy Hybrid config | ||
| hybridCfg := cfg.IntelligentRouting.ModelSelection.Hybrid | ||
| modelSelectionCfg.Hybrid = &selection.HybridConfig{ | ||
| EloWeight: hybridCfg.EloWeight, | ||
| RouterDCWeight: hybridCfg.RouterDCWeight, | ||
| AutoMixWeight: hybridCfg.AutoMixWeight, | ||
| CostWeight: hybridCfg.CostWeight, | ||
| QualityGapThreshold: hybridCfg.QualityGapThreshold, | ||
| NormalizeScores: hybridCfg.NormalizeScores, | ||
| } |
The manual field-by-field copying of configuration structs is verbose and error-prone. Consider either reusing the same struct types between packages (if appropriate) or implementing helper conversion methods to reduce duplication and make maintenance easier when fields are added or modified.
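One way to act on this suggestion, sketched under assumptions: ConfigElo and SelectionElo below are stand-ins for the config-side struct and selection.EloConfig, with field names taken from the diff and field types guessed. YAML tag names other than k_factor and category_weighted are also assumed. When two struct types have identical field names and types, Go permits a direct conversion (struct tags are ignored), so the mapping can live in one small method.

```go
package main

import "fmt"

// ConfigElo stands in for the YAML-facing Elo struct in pkg/config (types assumed).
type ConfigElo struct {
	InitialRating     float64 `yaml:"initial_rating"`
	KFactor           float64 `yaml:"k_factor"`
	CategoryWeighted  bool    `yaml:"category_weighted"`
	DecayFactor       float64 `yaml:"decay_factor"`
	MinComparisons    int     `yaml:"min_comparisons"`
	CostScalingFactor float64 `yaml:"cost_scaling_factor"`
}

// SelectionElo stands in for selection.EloConfig.
type SelectionElo struct {
	InitialRating     float64
	KFactor           float64
	CategoryWeighted  bool
	DecayFactor       float64
	MinComparisons    int
	CostScalingFactor float64
}

// ToSelection converts in one step. The conversion stops compiling if the two
// structs drift apart, which surfaces a forgotten field at build time.
func (c ConfigElo) ToSelection() *SelectionElo {
	out := SelectionElo(c)
	return &out
}

func main() {
	cfg := ConfigElo{InitialRating: 1500, KFactor: 32, CategoryWeighted: true}
	fmt.Printf("%+v\n", *cfg.ToSelection())
}
```

If the two packages need to diverge later, the method body can fall back to explicit field assignment without touching call sites.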
| // Convert static scores to Elo ratings (scale 0-1 -> 1000-2000) | ||
| rating := 1000.0 + (ms.Score * 1000.0) |
The magic numbers 1000.0 should be defined as named constants (e.g., MinEloRatingFromScore and EloRatingRange) to clarify the conversion formula and make it easier to adjust if needed.
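The conversion with the constant names proposed in this comment could look like the sketch below. The function name scoreToElo is hypothetical; the formula is the one in the reviewed line.

```go
package main

import "fmt"

const (
	// MinEloRatingFromScore is the rating assigned to a static score of 0.
	MinEloRatingFromScore = 1000.0
	// EloRatingRange is the span covered by static scores in [0, 1].
	EloRatingRange = 1000.0
)

// scoreToElo maps a static score in [0, 1] onto the 1000-2000 Elo scale.
func scoreToElo(score float64) float64 {
	return MinEloRatingFromScore + score*EloRatingRange
}

func main() {
	for _, s := range []float64{0, 0.5, 1} {
		fmt.Printf("score %.1f -> rating %.0f\n", s, scoreToElo(s))
	}
}
```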
| // Build selection context with cost/quality weights from config | ||
| costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff | ||
| qualityWeight := 1.0 - costWeight // Quality is complement of cost |
The cost weight is being read from the AutoMix configuration even when using other selection methods (e.g., Hybrid, Elo). Consider using a top-level or context-specific cost weight configuration that applies across all methods, or document why AutoMix's cost weight is used globally.
| // Build selection context with cost/quality weights from config | |
| costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff | |
| qualityWeight := 1.0 - costWeight // Quality is complement of cost | |
| // Build selection context with cost/quality weights | |
| // For AutoMix, use the AutoMix-specific cost/quality tradeoff from config. | |
| // For other methods, use neutral default weights to avoid coupling them to AutoMix config. | |
| var costWeight float64 | |
| var qualityWeight float64 | |
| if method == selection.MethodAutoMix { | |
| costWeight = r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff | |
| if costWeight < 0.0 { | |
| costWeight = 0.0 | |
| } else if costWeight > 1.0 { | |
| costWeight = 1.0 | |
| } | |
| qualityWeight = 1.0 - costWeight | |
| } else { | |
| // Default to equal weighting when method does not define its own cost/quality config | |
| costWeight = 0.5 | |
| qualityWeight = 0.5 | |
| } |
| defer a.valueMu.Unlock() | ||
|
|
||
| // Simple value iteration: V(s) = R(s) + γ * max_a E[V(s')] | ||
| for model, cap := range a.capabilities { |
The updateValueFunction method locks valueMu but iterates over a.capabilities without holding capMu. This creates a potential race condition if capabilities are modified concurrently. Either hold both locks or ensure capabilities are not modified after initialization.
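One possible fix, sketched with assumed types: take a snapshot of the capabilities under capMu, release it, and only then take valueMu. The two locks are never held at the same time, so no lock-ordering rule is needed. The struct layout, the capability fields, and the reward expression are placeholders, not the PR's actual POMDP update.

```go
package main

import (
	"fmt"
	"sync"
)

// capability is a placeholder for the per-model capability record.
type capability struct{ quality, cost float64 }

type autoMix struct {
	capMu        sync.RWMutex
	capabilities map[string]capability

	valueMu sync.Mutex
	values  map[string]float64
	gamma   float64 // discount factor
}

func newAutoMix(gamma float64) *autoMix {
	return &autoMix{capabilities: map[string]capability{}, values: map[string]float64{}, gamma: gamma}
}

func (a *autoMix) setCapability(model string, c capability) {
	a.capMu.Lock()
	a.capabilities[model] = c
	a.capMu.Unlock()
}

// updateValueFunction copies capabilities under capMu, then updates values under valueMu.
func (a *autoMix) updateValueFunction() {
	a.capMu.RLock()
	snapshot := make(map[string]capability, len(a.capabilities))
	for m, c := range a.capabilities {
		snapshot[m] = c
	}
	a.capMu.RUnlock()

	a.valueMu.Lock()
	defer a.valueMu.Unlock()
	best := 0.0
	for _, v := range a.values {
		if v > best {
			best = v
		}
	}
	for model, c := range snapshot {
		// Placeholder reward: quality minus cost, plus discounted best previous value.
		a.values[model] = (c.quality - c.cost) + a.gamma*best
	}
}

// value reads one entry under valueMu.
func (a *autoMix) value(model string) (float64, bool) {
	a.valueMu.Lock()
	defer a.valueMu.Unlock()
	v, ok := a.values[model]
	return v, ok
}

func main() {
	a := newAutoMix(0.5)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func(i int) {
			defer wg.Done()
			a.setCapability(fmt.Sprintf("model-%d", i), capability{quality: 1.0, cost: 0.25})
		}(i)
		go func() {
			defer wg.Done()
			a.updateValueFunction()
		}()
	}
	wg.Wait()
	a.updateValueFunction()
	_, ok := a.value("model-0")
	fmt.Println("model-0 has a value:", ok)
}
```

Running a test like this with go test -race is a reasonable way to confirm the reviewed race is gone. The alternative named in the comment, treating capabilities as immutable after initialization, avoids the copy but has to be documented and enforced.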
|
There are some cosmetic changes requested by Copilot. In addition, it would be great if you could add more unit tests for the selector to cover more meaningful cases with multi-turn Elo evolution. |
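A multi-turn evolution case of the kind requested here could be built on the textbook Elo update, sketched below. This is the standard formula with the conventional 400-point scale, not necessarily what pkg/selection/elo.go does; the function names and the starting rating of 1500 are assumptions, while K=32 matches the k_factor used in the PR's example config.

```go
package main

import (
	"fmt"
	"math"
)

// expectedScore is the modeled probability that a player rated ra beats one rated rb.
func expectedScore(ra, rb float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, (rb-ra)/400.0))
}

// updateElo applies one win for `winner` over `loser` and returns the new ratings.
func updateElo(winner, loser, k float64) (float64, float64) {
	delta := k * (1.0 - expectedScore(winner, loser))
	return winner + delta, loser - delta
}

func main() {
	a, b := 1500.0, 1500.0
	// Model A is preferred five turns in a row; each win is worth less than the
	// previous one because A's expected score keeps rising.
	for turn := 1; turn <= 5; turn++ {
		a, b = updateElo(a, b, 32)
		fmt.Printf("turn %d: A=%.1f B=%.1f\n", turn, a, b)
	}
}
```

Properties worth asserting in such a test: the first win between equally rated models moves each rating by K/2, the rating sum stays constant, and the gain per win shrinks as the gap grows.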
|
Thanks for the review and approval! 🙏 Re: Prometheus metrics for evolution tracking The proposed metrics include:
Re: Multi-turn Elo tests
These verify the algorithm works correctly when users switch from the default (static) to Elo-based selection. Re: Copilot suggestions
The cosmetic/design items (#4, #6) are noted for future consideration. |
|
@asaadbalum can you align the API design with the discussion? |
Like in looper, we put the model selection in the algorithm per decision. |
| _, _ = logging.InitLoggerFromEnv() | ||
| } | ||
|
|
||
| func main() { |
I think we need to unify the code into extproc rather than having a main.go here?
|
@Xunzhuo I see, agreed. Will restructure to move algorithm config per-decision, aligning with looper's pattern. Regarding |
Done. |
| @@ -0,0 +1,270 @@ | |||
| /* | |||
| Selection Demo - Demonstrates advanced model selection methods | |||
looks like it is still in cmd/select-demo/main.go?
|
|
||
| # Global Model Selection (fallback for decisions without algorithm) | ||
| # Options: "static", "elo", "router_dc", "automix", "hybrid" | ||
| model_selection: |
My question is why we need a two-part configuration: one in root, one per decision? Can we make this per decision only? This will make the config clearer.
Let me know and I'll update accordingly. |
|
Thanks, I prefer A) Move to examples/selection/main.go |
Add pluggable model selection algorithms for intelligent routing:
- Elo rating system with Bradley-Terry model for preference-based selection
- RouterDC for query-to-model embedding matching
- AutoMix for POMDP-based cost-quality optimization
- Hybrid selector combining multiple methods with configurable weights
- Static selector for backwards compatibility
Integration:
- OpenAIRouter initializes selection registry on startup
- req_filter_classification uses configured selector instead of hardcoded first model
- Prometheus metrics for selection tracking
Signed-off-by: asaadbalum <asaad.balum@gmail.com>
|
@Xunzhuo Done! Changes made per your feedback:
1. Config: Removed global
decisions:
  - name: tech
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
    algorithm:
      type: "elo"
      elo:
        k_factor: 32
        category_weighted: true
2. Demo: Moved to
Let me know if anything else needs adjustment. |
|
|
||
| // ModelSelection configures the algorithm used for model selection | ||
| // Supported methods: "static", "elo", "router_dc", "automix", "hybrid" | ||
| ModelSelection ModelSelectionConfig `yaml:"model_selection,omitempty"` |
nit: should we remove this as a follow-up?
|
thanks! some follow-up: P0:
P1:
|
|
Thanks @Xunzhuo for the detailed follow-up! Re: Your questions:
Follow-up Plan (2 issues to keep it focused):
Also noting #1093 tracks metrics, and #38 (Dynamic Scoring) comes after per your guidance. Shall I proceed with creating these now? |
|
Created follow-up issues as discussed:
Also noting #1093 for Prometheus metrics is already tracked. Will start on #1102 after the current items. Thanks for the guidance! 🚀 |
… config inline (#1100) * fix: Refactor Redis and Milvus Cache Config into config.go, Update CacheOption initializer to handle the new configuration approach Signed-off-by: Scanf-s <sullung2yo@gmail.com> * fix: Add fallback logic when proper redis or milvus configuration does not given Signed-off-by: Scanf-s <sullung2yo@gmail.com> * docs: Add sample inline redis configuration example Signed-off-by: Scanf-s <sullung2yo@gmail.com> * docs: Update cache configuration examples Signed-off-by: Scanf-s <sullung2yo@gmail.com> * fix: Update HybridCache Milvus configuration Signed-off-by: Scanf-s <sullung2yo@gmail.com> * chore: Apply code linter Signed-off-by: Scanf-s <sullung2yo@gmail.com> * Feat(selection): implement advanced model selection methods (#1089) Add pluggable model selection algorithms for intelligent routing: - Elo rating system with Bradley-Terry model for preference-based selection - RouterDC for query-to-model embedding matching - AutoMix for POMDP-based cost-quality optimization - Hybrid selector combining multiple methods with configurable weights - Static selector for backwards compatibility Integration: - OpenAIRouter initializes selection registry on startup - req_filter_classification uses configured selector instead of hardcoded first model - Prometheus metrics for selection tracking Signed-off-by: asaadbalum <asaad.balum@gmail.com> Signed-off-by: Scanf-s <sullung2yo@gmail.com> * feat: Add inline cache configuration unit tests Signed-off-by: Scanf-s <sullung2yo@gmail.com> * feat: Add cache unit tests Signed-off-by: Scanf-s <sullung2yo@gmail.com> --------- Signed-off-by: Scanf-s <sullung2yo@gmail.com> Signed-off-by: asaadbalum <asaad.balum@gmail.com> Co-authored-by: asaadbalum <154635253+asaadbalum@users.noreply.github.com>

Advanced Model Selection Methods
Summary
Implement advanced model selection algorithms for intelligent routing, enabling the semantic router to choose the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.
Fixes #987
What Changed
New Package: pkg/selection/
- selector.go: Selector, SelectionContext, SelectionResult
- elo.go
- router_dc.go
- automix.go
- hybrid.go
- static.go
- factory.go
- metrics.go
Modified Files
- pkg/config/config.go: AlgorithmConfig with selection types
- pkg/extproc/router.go
- pkg/extproc/req_filter_classification.go
Configuration (Per-Decision Only - Aligned with Looper Pattern)
Each decision specifies its own algorithm:
Default Behavior (Backwards Compatible)
algorithm → uses static selection (first model)
Testing
- go test ./pkg/selection/...
- go build ./...
- cd src/semantic-router && go run ./examples/selection/main.go
Production Logging
VSR logs show selection decisions for every request:
Appendix
A. Demo Output
Elo Rating Selection
AutoMix Selection
RouterDC Selection
Hybrid Selection
B. Running the Demo
cd src/semantic-router
go run ./examples/selection/main.go
Tweaking Parameters
In demo script (examples/selection/main.go):
- costQualityTradeoff (~line 160): 0.0=quality, 1.0=cost
In config:
Edit config/intelligent-routing/in-tree/model_selection_demo.yaml and restart VSR.
Demo Script for Future Enhancements
The demo is extensible:
C. Future Enhancements (Not in This PR)
- UpdateFeedback() method ready
- SetModelEmbedding() API
D. Reference Papers