feat(selection): implement advanced model selection methods by asaadbalum · Pull Request #1089 · vllm-project/semantic-router

asaadbalum · 2026-01-15T13:12:12Z

Advanced Model Selection Methods

Summary

Implement advanced model selection algorithms for intelligent routing, enabling the semantic router to choose the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.

Fixes #987

What Changed

New Package: `pkg/selection/`

| File | Purpose |
| --- | --- |
| `selector.go` | Core interfaces: `Selector`, `SelectionContext`, `SelectionResult` |
| `elo.go` | Elo rating with Bradley-Terry model |
| `router_dc.go` | Dual-contrastive query-to-model matching |
| `automix.go` | POMDP-based cost-quality optimization |
| `hybrid.go` | Combines all methods with configurable weights |
| `static.go` | Default behavior (backwards compatible) |
| `factory.go` | Creates selectors from configuration |
| `metrics.go` | Prometheus metrics |

Modified Files

| File | Change |
| --- | --- |
| `pkg/config/config.go` | Extended `AlgorithmConfig` with selection types |
| `pkg/extproc/router.go` | Initialize selection registry |
| `pkg/extproc/req_filter_classification.go` | Per-decision algorithm support |

Configuration (Per-Decision Only - Aligned with Looper Pattern)

Each decision specifies its own algorithm:

decisions:
  - name: tech
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
      - model: "gemma3:27b"
    algorithm:
      type: "elo"
      elo:
        k_factor: 32
        category_weighted: true

  - name: finance
    algorithm:
      type: "automix"
      automix:
        cost_quality_tradeoff: 0.4

  - name: general
    algorithm:
      type: "hybrid"
      hybrid:
        elo_weight: 0.3
        router_dc_weight: 0.3
        automix_weight: 0.2
        cost_weight: 0.2

Default Behavior (Backwards Compatible)

If a decision has no `algorithm` block → static selection is used (first model in `modelRefs`)
No action required for existing deployments
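
The fallback described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `Selector` interface is reduced to a single hypothetical method, and `newSelector` is an assumed factory name.

```go
package main

import "fmt"

// Selector is a hypothetical, simplified stand-in for the PR's Selector interface.
type Selector interface {
	Select(candidates []string) (string, error)
}

// staticSelector always returns the first candidate model.
type staticSelector struct{}

func (staticSelector) Select(candidates []string) (string, error) {
	if len(candidates) == 0 {
		return "", fmt.Errorf("no candidate models")
	}
	return candidates[0], nil
}

// newSelector is a hypothetical factory. An empty or unrecognized
// algorithm type falls back to static selection.
func newSelector(algorithmType string) Selector {
	switch algorithmType {
	// Real selectors ("elo", "automix", ...) would be registered here.
	default:
		return staticSelector{}
	}
}

func main() {
	sel := newSelector("") // a decision without an algorithm block
	model, err := sel.Select([]string{"llama3.2:3b", "phi4", "gemma3:27b"})
	fmt.Println(model, err)
}
```

Keeping the fallback in the factory means existing configs need no changes to keep their current behavior.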

Testing

✅ Unit tests: All pass (`go test ./pkg/selection/...`)
✅ Build: Clean (`go build ./...`)
✅ Integration: Selection wired into extproc routing path
✅ Demo: `cd src/semantic-router && go run ./examples/selection/main.go`

Production Logging

VSR logs show selection decisions for every request:

[EloSelector] gemma3:27b: rating=1531.3 (W:2 L:0 T:0)
[AutoMix] llama3.2:3b: cost=$0.05, quality=0.70, value=0.6990
[HybridSelector] gemma3:27b: elo=0.3947, dc=0.3333, am=0.9400 → combined=0.5080

Appendix

A. Demo Output

Elo Rating Selection

Query: "Explain quantum computing"
Elo Ratings: llama3.2:3b=1468, phi4=1501, gemma3:27b=1531
→ SELECTED: gemma3:27b (highest rating)

AutoMix Selection

Query: "What is 2+2?" (simple query)
With cost_quality_tradeoff=0.8: → gemma3:27b (quality gap 0.70→0.95 still dominates)
With cost_quality_tradeoff=0.2: → gemma3:27b (quality preferred)
Note: With closer quality scores, cost preference would flip selection.

RouterDC Selection

Query: "Debug this Go function" (code query)
Similarity: phi4=0.334 (best for code)
→ SELECTED: phi4
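
The similarity-matching step can be illustrated with a plain cosine-similarity argmax. This is only a sketch under assumptions: the embeddings are made up, `cosine` and `bestModel` are hypothetical names, and nothing here covers the dual-contrastive training part of RouterDC.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if the lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// bestModel picks the model whose embedding is most similar to the query.
func bestModel(query []float64, models map[string][]float64) (string, float64) {
	best, bestSim := "", math.Inf(-1)
	for name, emb := range models {
		if sim := cosine(query, emb); sim > bestSim {
			best, bestSim = name, sim
		}
	}
	return best, bestSim
}

func main() {
	// Hypothetical 2-dimensional embeddings, for illustration only.
	models := map[string][]float64{
		"phi4":        {0.9, 0.1},
		"llama3.2:3b": {0.2, 0.8},
	}
	name, sim := bestModel([]float64{1, 0}, models)
	fmt.Printf("selected %s (similarity %.3f)\n", name, sim)
}
```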

Hybrid Selection

Query: "Write an efficient sorting algorithm"
Combined: elo=0.395, dc=0.333, am=0.940 → score=0.508
→ SELECTED: gemma3:27b
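
The combined score reads as a weighted sum of per-method scores. The sketch below assumes that form and uses the weights from the `hybrid` config example; the type and function names are hypothetical, and the cost score passed in `main` is a made-up value because the demo output does not show it.

```go
package main

import "fmt"

// hybridWeights mirrors the four weights in the hybrid config example.
type hybridWeights struct {
	Elo, RouterDC, AutoMix, Cost float64
}

// componentScores holds one normalized score per method (assumed to be in [0, 1]).
type componentScores struct {
	Elo, RouterDC, AutoMix, Cost float64
}

// combine returns the weighted sum of the component scores.
func combine(w hybridWeights, s componentScores) float64 {
	return w.Elo*s.Elo + w.RouterDC*s.RouterDC + w.AutoMix*s.AutoMix + w.Cost*s.Cost
}

func main() {
	w := hybridWeights{Elo: 0.3, RouterDC: 0.3, AutoMix: 0.2, Cost: 0.2}
	// elo, dc, and am are taken from the demo output; Cost is a hypothetical placeholder.
	s := componentScores{Elo: 0.395, RouterDC: 0.333, AutoMix: 0.940, Cost: 0.5}
	fmt.Printf("combined=%.3f\n", combine(w, s))
}
```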

B. Running the Demo

cd src/semantic-router
go run ./examples/selection/main.go

Tweaking Parameters

In demo script (examples/selection/main.go):

costQualityTradeoff (~line 160): 0.0=quality, 1.0=cost
Model costs (~line 70): Change pricing
Hybrid weights (~line 220): Adjust method balance

In config:
Edit config/intelligent-routing/in-tree/model_selection_demo.yaml and restart VSR.

Demo Script for Future Enhancements

The demo is extensible:

| Enhancement | How to Extend |
| --- | --- |
| Feedback REST API | Add section calling endpoint, verify Elo updates |
| Model Embeddings Config | Load from YAML instead of hardcoding |
| RouterDC Training | Add training loop section |

C. Future Enhancements (Not in This PR)

| Enhancement | What We Provide Instead |
| --- | --- |
| Feedback REST API | `UpdateFeedback()` method ready |
| Model Embeddings Config | `SetModelEmbedding()` API |
| Quality Score Config | Default=0.8, configurable via code |
| True AutoMix Cascading | Pre-selection based on POMDP values |
| RouterDC Training | Similarity matching with provided embeddings |
| Prometheus Metrics | See follow-up issue #1093 |

D. Reference Papers

netlify · 2026-01-15T13:12:18Z

✅ Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `e5a4135` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/696b63b982d0550009f0c950 |
| 😎 Deploy Preview | https://deploy-preview-1089--vllm-semantic-router.netlify.app |

github-actions · 2026-01-15T13:12:25Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `config`

Owners: @rootfs, @Xunzhuo
Files changed:

config/intelligent-routing/in-tree/model_selection_demo.yaml

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/examples/selection/main.go
src/semantic-router/pkg/config/config.go
src/semantic-router/pkg/extproc/req_filter_classification.go
src/semantic-router/pkg/extproc/router.go
src/semantic-router/pkg/selection/automix.go
src/semantic-router/pkg/selection/elo.go
src/semantic-router/pkg/selection/factory.go
src/semantic-router/pkg/selection/hybrid.go
src/semantic-router/pkg/selection/metrics.go
src/semantic-router/pkg/selection/router_dc.go
src/semantic-router/pkg/selection/selector.go
src/semantic-router/pkg/selection/selector_test.go
src/semantic-router/pkg/selection/static.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2026-01-15T16:04:42Z

@asaadbalum nice work! Elo is a really cool metric. I would imagine that we can use the feedback classifier to complement user voting and automate the ranking process.

Would you like to follow up with a PR that adds Prometheus metrics for explainability and traceability, so we can track how the selector evolves over time?

Copilot

Pull request overview

This pull request implements advanced model selection algorithms for intelligent LLM routing, enabling the semantic router to choose the best model from multiple candidates based on learned preferences, query similarity, and cost-quality optimization. The implementation adds five selection methods: Static (baseline), Elo rating, RouterDC (dual-contrastive), AutoMix (POMDP-based), and Hybrid (combined approach).

Changes:

New pkg/selection/ package with core selection interfaces and multiple algorithm implementations
Integration into the routing pipeline via req_filter_classification.go
Configuration structs added to support the new selection methods
Comprehensive test coverage and demo application

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `pkg/selection/selector.go` | Core interfaces and types for selection framework |
| `pkg/selection/elo.go` | Elo rating system using Bradley-Terry model |
| `pkg/selection/router_dc.go` | Dual-contrastive query-to-model matching |
| `pkg/selection/automix.go` | POMDP-based cost-quality optimization |
| `pkg/selection/hybrid.go` | Combines multiple methods with weighted scores |
| `pkg/selection/static.go` | Baseline static selection (backwards compatible) |
| `pkg/selection/factory.go` | Factory pattern for creating selectors |
| `pkg/selection/metrics.go` | Prometheus metrics for observability |
| `pkg/selection/selector_test.go` | Comprehensive unit tests |
| `pkg/extproc/router.go` | Initialize selection registry |
| `pkg/extproc/req_filter_classification.go` | Integration with routing pipeline |
| `pkg/config/config.go` | Configuration structs for selection methods |
| `cmd/selection-demo/main.go` | Demo application |
| `config/intelligent-routing/in-tree/model_selection_demo.yaml` | Example configuration |

Copilot · 2026-01-15T16:11:55Z

+// contains checks if a string contains a substring (case-insensitive)
+func contains(s, substr string) bool {
+	for i := 0; i <= len(s)-len(substr); i++ {
+		if equalFold(s[i:i+len(substr)], substr) {
+			return true
+		}
+	}
+	return false
+}
+
+// equalFold compares two strings case-insensitively
+func equalFold(a, b string) bool {
+	if len(a) != len(b) {
+		return false
+	}
+	for i := range a {
+		ca, cb := a[i], b[i]
+		if ca >= 'A' && ca <= 'Z' {
+			ca += 'a' - 'A'
+		}
+		if cb >= 'A' && cb <= 'Z' {
+			cb += 'a' - 'A'
+		}
+		if ca != cb {
+			return false
+		}
+	}
+	return true
+}


The contains and equalFold functions reimplement functionality available in Go's standard library. Use strings.Contains with strings.ToLower for the contains check, or strings.EqualFold for case-insensitive comparison. This reduces code complexity and uses well-tested standard library functions.
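
A standard-library version of the two helpers could look like the sketch below. The name `containsFold` is hypothetical. Unlike the hand-rolled ASCII-only loop, `strings.ToLower` and `strings.EqualFold` also handle non-ASCII letters.

```go
package main

import (
	"fmt"
	"strings"
)

// containsFold reports whether substr occurs in s, ignoring case.
func containsFold(s, substr string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

func main() {
	fmt.Println(containsFold("Qwen3-Next-80B-A3B-Instruct", "80b"))
	// strings.EqualFold replaces the custom equalFold directly.
	fmt.Println(strings.EqualFold("PHI4", "phi4"))
}
```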

Copilot · 2026-01-15T16:11:55Z

+		h.config.EloWeight, h.config.RouterDCWeight, h.config.AutoMixWeight, h.config.CostWeight)
+
+	if len(parts) > 0 {
+		return fmt.Sprintf("Hybrid combination: %s, %s", parts, weightsStr)


The parts variable is a []string slice being used with %s format specifier, which will print the slice representation (e.g., [Elo=0.500 RouterDC=0.600]) rather than a formatted string. Use strings.Join(parts, ", ") to properly format the component scores as a comma-separated string.
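
A small sketch of the suggested fix follows. The function name `describe` and the shape of the weights string are assumptions, since the thread only shows a fragment of the original method.

```go
package main

import (
	"fmt"
	"strings"
)

// describe joins the component scores with ", " before formatting,
// so the output does not contain Go's bracketed slice representation.
func describe(parts []string, weightsStr string) string {
	return fmt.Sprintf("Hybrid combination: %s, %s", strings.Join(parts, ", "), weightsStr)
}

func main() {
	parts := []string{"Elo=0.500", "RouterDC=0.600"}
	fmt.Println(describe(parts, "weights(elo=0.3, dc=0.3)"))
}
```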

Copilot · 2026-01-15T16:11:55Z

+		{"0.5b", 0.5},
+	}
+
+	modelLower := model


Variable modelLower is assigned the value of model but never converted to lowercase, despite the function name and comment indicating case-insensitive matching. Add modelLower = strings.ToLower(model) or use the standard library functions as suggested in Comment 1.

Copilot · 2026-01-15T16:11:56Z

+	eloCfg := cfg.IntelligentRouting.ModelSelection.Elo
+	modelSelectionCfg.Elo = &selection.EloConfig{
+		InitialRating:     eloCfg.InitialRating,
+		KFactor:           eloCfg.KFactor,
+		CategoryWeighted:  eloCfg.CategoryWeighted,
+		DecayFactor:       eloCfg.DecayFactor,
+		MinComparisons:    eloCfg.MinComparisons,
+		CostScalingFactor: eloCfg.CostScalingFactor,
+	}
+
+	// Copy RouterDC config
+	routerDCCfg := cfg.IntelligentRouting.ModelSelection.RouterDC
+	modelSelectionCfg.RouterDC = &selection.RouterDCConfig{
+		Temperature:         routerDCCfg.Temperature,
+		DimensionSize:       routerDCCfg.DimensionSize,
+		MinSimilarity:       routerDCCfg.MinSimilarity,
+		UseQueryContrastive: routerDCCfg.UseQueryContrastive,
+		UseModelContrastive: routerDCCfg.UseModelContrastive,
+	}
+
+	// Copy AutoMix config
+	autoMixCfg := cfg.IntelligentRouting.ModelSelection.AutoMix
+	modelSelectionCfg.AutoMix = &selection.AutoMixConfig{
+		VerificationThreshold:  autoMixCfg.VerificationThreshold,
+		MaxEscalations:         autoMixCfg.MaxEscalations,
+		CostAwareRouting:       autoMixCfg.CostAwareRouting,
+		CostQualityTradeoff:    autoMixCfg.CostQualityTradeoff,
+		DiscountFactor:         autoMixCfg.DiscountFactor,
+		UseLogprobVerification: autoMixCfg.UseLogprobVerification,
+	}
+
+	// Copy Hybrid config
+	hybridCfg := cfg.IntelligentRouting.ModelSelection.Hybrid
+	modelSelectionCfg.Hybrid = &selection.HybridConfig{
+		EloWeight:           hybridCfg.EloWeight,
+		RouterDCWeight:      hybridCfg.RouterDCWeight,
+		AutoMixWeight:       hybridCfg.AutoMixWeight,
+		CostWeight:          hybridCfg.CostWeight,
+		QualityGapThreshold: hybridCfg.QualityGapThreshold,
+		NormalizeScores:     hybridCfg.NormalizeScores,
+	}


The manual field-by-field copying of configuration structs is verbose and error-prone. Consider either reusing the same struct types between packages (if appropriate) or implementing helper conversion methods to reduce duplication and make maintenance easier when fields are added or modified.
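
One way to cut the copying, sketched under assumptions: if the config-side and selection-side structs have identical field names, types, and order, Go permits a direct type conversion between them and ignores struct tags when doing so. The two types below are hypothetical, trimmed stand-ins for the real ones, and the `initial_rating` tag is a guess.

```go
package main

import "fmt"

// configElo stands in for the YAML-facing struct in the config package.
type configElo struct {
	InitialRating    float64 `yaml:"initial_rating"`
	KFactor          float64 `yaml:"k_factor"`
	CategoryWeighted bool    `yaml:"category_weighted"`
}

// selectionElo stands in for the selection package's own config struct.
type selectionElo struct {
	InitialRating    float64
	KFactor          float64
	CategoryWeighted bool
}

// toSelectionElo converts in one step. The conversion compiles only while
// both structs keep the same fields in the same order, so drift between
// the two definitions is caught at build time.
func toSelectionElo(c configElo) *selectionElo {
	s := selectionElo(c)
	return &s
}

func main() {
	cfg := configElo{InitialRating: 1500, KFactor: 32, CategoryWeighted: true}
	fmt.Printf("%+v\n", *toSelectionElo(cfg))
}
```

If the two structs are expected to diverge, an explicit conversion method per type is the more flexible choice.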

Copilot · 2026-01-15T16:11:56Z

+				// Convert static scores to Elo ratings (scale 0-1 -> 1000-2000)
+				rating := 1000.0 + (ms.Score * 1000.0)


The magic numbers 1000.0 should be defined as named constants (e.g., MinEloRatingFromScore and EloRatingRange) to clarify the conversion formula and make it easier to adjust if needed.
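
With the constant names proposed in the comment, the conversion might read as follows. The helper name `scoreToElo` is hypothetical.

```go
package main

import "fmt"

const (
	// MinEloRatingFromScore is the rating assigned to a static score of 0.
	MinEloRatingFromScore = 1000.0
	// EloRatingRange is the width of the rating band covered by scores in [0, 1].
	EloRatingRange = 1000.0
)

// scoreToElo maps a static score in [0, 1] onto the 1000-2000 rating band.
func scoreToElo(score float64) float64 {
	return MinEloRatingFromScore + score*EloRatingRange
}

func main() {
	fmt.Println(scoreToElo(0.0), scoreToElo(0.5), scoreToElo(1.0))
}
```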

Copilot · 2026-01-15T16:11:56Z

+	// Build selection context with cost/quality weights from config
+	costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
+	qualityWeight := 1.0 - costWeight // Quality is complement of cost


The cost weight is being read from the AutoMix configuration even when using other selection methods (e.g., Hybrid, Elo). Consider using a top-level or context-specific cost weight configuration that applies across all methods, or document why AutoMix's cost weight is used globally.

Suggested change

-	// Build selection context with cost/quality weights from config
-	costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
-	qualityWeight := 1.0 - costWeight // Quality is complement of cost
+	// Build selection context with cost/quality weights
+	// For AutoMix, use the AutoMix-specific cost/quality tradeoff from config.
+	// For other methods, use neutral default weights to avoid coupling them to AutoMix config.
+	var costWeight float64
+	var qualityWeight float64
+	if method == selection.MethodAutoMix {
+		costWeight = r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
+		if costWeight < 0.0 {
+			costWeight = 0.0
+		} else if costWeight > 1.0 {
+			costWeight = 1.0
+		}
+		qualityWeight = 1.0 - costWeight
+	} else {
+		// Default to equal weighting when method does not define its own cost/quality config
+		costWeight = 0.5
+		qualityWeight = 0.5
+	}

Copilot · 2026-01-15T16:11:57Z

+	defer a.valueMu.Unlock()
+
+	// Simple value iteration: V(s) = R(s) + γ * max_a E[V(s')]
+	for model, cap := range a.capabilities {


The updateValueFunction method locks valueMu but iterates over a.capabilities without holding capMu. This creates a potential race condition if capabilities are modified concurrently. Either hold both locks or ensure capabilities are not modified after initialization.
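
One possible fix is to snapshot the capabilities under their own lock before touching the value table, which avoids holding two locks at once. The sketch below is an assumption-laden illustration: the struct fields, the `capability` type, and the use of `sync.RWMutex` for `capMu` are guesses, and the value update is a placeholder rather than the real value iteration.

```go
package main

import (
	"fmt"
	"sync"
)

// capability is a hypothetical stand-in for the per-model capability record.
type capability struct {
	Quality float64
}

type autoMix struct {
	capMu        sync.RWMutex
	capabilities map[string]capability

	valueMu sync.Mutex
	values  map[string]float64
}

// updateValueFunction copies capabilities under capMu, then updates values
// under valueMu, so neither map is read or written without its lock.
func (a *autoMix) updateValueFunction() {
	a.capMu.RLock()
	snapshot := make(map[string]capability, len(a.capabilities))
	for model, c := range a.capabilities {
		snapshot[model] = c
	}
	a.capMu.RUnlock()

	a.valueMu.Lock()
	defer a.valueMu.Unlock()
	for model, c := range snapshot {
		a.values[model] = c.Quality // placeholder for the real update rule
	}
}

func main() {
	a := &autoMix{
		capabilities: map[string]capability{"phi4": {Quality: 0.8}},
		values:       make(map[string]float64),
	}
	a.updateValueFunction()
	fmt.Println(a.values)
}
```

Holding both locks for the whole loop would also work, provided every code path acquires them in the same order.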

rootfs · 2026-01-15T16:20:58Z

There are some cosmetic changes requested by Copilot. In addition, it would be great if you could add more unit tests for the selector, covering more meaningful cases such as multi-turn Elo evolution.

asaadbalum · 2026-01-15T16:50:20Z

Thanks for the review and approval! 🙏

Re: Prometheus metrics for evolution tracking
Great suggestion! I've created a follow-up issue, #1093. Please review it so we can sync on expectations.

The proposed metrics include:

llm_model_elo_rating - Track rating evolution over time
llm_model_feedback_total - Win/loss/tie counts
llm_model_rating_change - Distribution of rating changes

Re: Multi-turn Elo tests
Added three new tests in this update:

TestEloSelector_MultiTurnEvolution - 10 rounds of feedback showing rating convergence
TestEloSelector_TieHandling - Verifies tie feedback handling
TestEloSelector_SelectionFollowsRatings - Confirms selection respects Elo rankings

These verify the algorithm works correctly when users switch from the default (static) to Elo-based selection.
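
The multi-turn behavior these tests exercise can be sketched with the textbook Elo update. This is not the PR's `EloSelector`: the function names are hypothetical, and the starting rating of 1500 is an assumption. Only `k_factor: 32` comes from the config example.

```go
package main

import (
	"fmt"
	"math"
)

// expectedScore is the standard Elo win probability of A against B.
func expectedScore(ra, rb float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, (rb-ra)/400.0))
}

// update applies one round of feedback. scoreA is 1 for a win by A,
// 0 for a loss, and 0.5 for a tie. The rating total is conserved.
func update(ra, rb, scoreA, k float64) (float64, float64) {
	delta := k * (scoreA - expectedScore(ra, rb))
	return ra + delta, rb - delta
}

func main() {
	a, b := 1500.0, 1500.0 // assumed initial ratings
	for round := 1; round <= 10; round++ {
		a, b = update(a, b, 1.0, 32) // A wins every round
		fmt.Printf("round %2d: A=%.1f B=%.1f\n", round, a, b)
	}
}
```

Each successive win moves the ratings less, because the expected score of the leading model rises as the gap widens.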

Re: Copilot suggestions
Addressed all bug-level issues:

Fixed format string slice handling (strings.Join)
Fixed case-insensitive comparison (strings.ToLower)
Fixed race condition in updateValueFunction
Added constants for magic numbers

The cosmetic/design items (#4, #6) are noted for future consideration.

Xunzhuo · 2026-01-16T02:31:08Z

@asaadbalum can you align the API design with the discussion?

asaadbalum · 2026-01-16T06:57:02Z

@Xunzhuo Sure - are you referring to the Feedback API design for Elo rating updates? (As mentioned in #1093)

If so, what aspects would you like to align on?

Xunzhuo · 2026-01-16T06:59:00Z

decisions:
  - name: "general_route"
    description: "Default fallback route for general queries"
    priority: 50
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "domain"
          name: "math"
        - type: "domain"
          name: "other"
        - type: "domain"
          name: "computer science"
    modelRefs:
      - model: "qwen-flash"
        use_reasoning: false
      - model: "qwen3-next-80b-a3b-instruct"
        use_reasoning: true
    algorithm:
      type: "confidence"
      confidence:
        confidence_method: "avg_logprob"
        threshold: 0.92      # Escalate if confidence < threshold
        on_error: "skip"    # Skip failed models, try next
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: false
          similarity_threshold: 0.85

  - name: "ratings_route"
    description: "Route using ratings algorithm for model selection"
    priority: 60
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "domain"
          name: "other"
    modelRefs:
      - model: "qwen-flash"
        use_reasoning: false
      - model: "qwen3-next-80b-a3b-instruct"
        use_reasoning: true
    algorithm:
      type: "ratings"
      ratings:
        on_error: "skip"    # Skip failed models, try next
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: false

Like in looper, we put the model selection in the per-decision `algorithm` block.

Xunzhuo · 2026-01-16T07:00:44Z

+	_, _ = logging.InitLoggerFromEnv()
+}
+
+func main() {


I think we need to unify this code into extproc, rather than having a separate main.go here?

asaadbalum · 2026-01-16T07:29:14Z

@Xunzhuo I see, agreed.

Will restructure to move algorithm config per-decision, aligning with looper's pattern.

Regarding cmd/selection-demo/main.go - this is a standalone utility for testing/demonstrating the selection algorithms, not production code. Happy to move it to examples/ or remove it if you prefer not to have demo binaries in the repo. Let me know your preference.

asaadbalum · 2026-01-16T08:41:39Z

@Xunzhuo I see, agreed.

Will restructure to move algorithm config per-decision, aligning with looper's pattern.

Regarding cmd/selection-demo/main.go - this is a standalone utility for testing/demonstrating the selection algorithms, not production code. Happy to move it to examples/ or remove it if you prefer not to have demo binaries in the repo. Let me know your preference.

Done.

Xunzhuo · 2026-01-16T08:46:19Z

@@ -0,0 +1,270 @@
+/*
+Selection Demo - Demonstrates advanced model selection methods


Looks like it is still in cmd/selection-demo/main.go?

Xunzhuo · 2026-01-16T08:47:30Z

+
+# Global Model Selection (fallback for decisions without algorithm)
+# Options: "static", "elo", "router_dc", "automix", "hybrid"
+model_selection:


My question is: why do we need configuration in two places, one at the root and one per decision? Can we make this per-decision only? That would make the config clearer.

asaadbalum · 2026-01-16T09:00:41Z

@Xunzhuo

Config: Agreed - will remove global model_selection: section. Per-decision algorithm: will be the only configuration.
Demo file: For cmd/selection-demo/ - would you prefer:
- A) Move to examples/selection-demo/
- B) Remove entirely
- C) Integrate as test cases in pkg/selection/

Let me know and I'll update accordingly.

Xunzhuo · 2026-01-16T09:26:40Z

Thanks, i prefer A) Move to examples/selection/main.go

Add pluggable model selection algorithms for intelligent routing:
- Elo rating system with Bradley-Terry model for preference-based selection
- RouterDC for query-to-model embedding matching
- AutoMix for POMDP-based cost-quality optimization
- Hybrid selector combining multiple methods with configurable weights
- Static selector for backwards compatibility

Integration:
- OpenAIRouter initializes selection registry on startup
- req_filter_classification uses configured selector instead of hardcoded first model
- Prometheus metrics for selection tracking

Signed-off-by: asaadbalum <asaad.balum@gmail.com>

asaadbalum · 2026-01-17T10:40:09Z

@Xunzhuo Done! Changes made per your feedback:

1. Config: Removed global model_selection: section. Only per-decision algorithm: is now used:

decisions:
  - name: tech
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
    algorithm:
      type: "elo"
      elo:
        k_factor: 32
        category_weighted: true

2. Demo: Moved to examples/selection/main.go

Let me know if anything else needs adjustment.

Xunzhuo · 2026-01-17T15:06:04Z


+	// ModelSelection configures the algorithm used for model selection
+	// Supported methods: "static", "elo", "router_dc", "automix", "hybrid"
+	ModelSelection ModelSelectionConfig `yaml:"model_selection,omitempty"`


nit: should we remove this as a follow-up?

Xunzhuo · 2026-01-17T15:06:57Z

Thanks! Some follow-ups:

P0:

For the Elo algorithm, we need to store the ratings in persistent storage.
For the RouterDC algorithm, I think we are missing a description field in modelConfig for each model, which is needed to initialize the embeddings. Am I missing anything?
For AutoMix, how do we escalate to another model when the current one is uncertain? Should this be combined with the existing confidence routing algorithm (which is size-based selection, not cost-based)? Can you merge the two into one?

P1:

Add Python CLI translation logic to support these new algorithm fields in vllm-sr serve.
Add e2e tests for each algorithm to verify that it really works.
Add website docs describing each algorithm in detail.

asaadbalum · 2026-01-17T17:29:43Z

Thanks @Xunzhuo for the detailed follow-up!

Re: Your questions:

RouterDC embeddings - You're correct, we don't have description fields in ModelConfig yet. Will add this to enable real model embeddings.
AutoMix escalation - Good point. Current AutoMix is pre-selection only. Will integrate with looper's confidence cascading for cost-aware escalation.

Follow-up Plan (2 issues to keep it focused):

| Issue | Scope |
| --- | --- |
| P0: Core Algorithm Enhancements | Elo storage, RouterDC embeddings, AutoMix+Confidence |
| P1: CLI, E2E & Documentation | vllm-sr support, E2E tests, website docs |

Also noting #1093 tracks metrics, and #38 (Dynamic Scoring) comes after per your guidance.

Shall I proceed with creating these now?

asaadbalum · 2026-01-18T07:11:39Z

Created follow-up issues as discussed:

#1102 - [P0] Core Algorithm Enhancements (Elo storage, RouterDC embeddings, AutoMix cascading)
#1103 - [P1] CLI Support, E2E Tests & Documentation

Also noting #1093 for Prometheus metrics is already tracked.

Will start on #1102 after the current items. Thanks for the guidance! 🚀


… config inline (#1100)

* fix: Refactor Redis and Milvus Cache Config into config.go, Update CacheOption initializer to handle the new configuration approach

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* fix: Add fallback logic when proper redis or milvus configuration does not given

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* docs: Add sample inline redis configuration example

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* docs: Update cache configuration examples

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* fix: Update HybridCache Milvus configuration

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* chore: Apply code linter

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* Feat(selection): implement advanced model selection methods (#1089)

Add pluggable model selection algorithms for intelligent routing:
- Elo rating system with Bradley-Terry model for preference-based selection
- RouterDC for query-to-model embedding matching
- AutoMix for POMDP-based cost-quality optimization
- Hybrid selector combining multiple methods with configurable weights
- Static selector for backwards compatibility

Integration:
- OpenAIRouter initializes selection registry on startup
- req_filter_classification uses configured selector instead of hardcoded first model
- Prometheus metrics for selection tracking

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* feat: Add inline cache configuration unit tests

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* feat: Add cache unit tests

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

---------

Signed-off-by: Scanf-s <sullung2yo@gmail.com>
Signed-off-by: asaadbalum <asaad.balum@gmail.com>
Co-authored-by: asaadbalum <154635253+asaadbalum@users.noreply.github.com>


github-actions Bot assigned rootfs, wangchen615 and Xunzhuo Jan 15, 2026

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from 2dbb18f to 8b2fb3d Compare January 15, 2026 13:20

asaadbalum requested a review from Xunzhuo January 15, 2026 13:22

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from 2bfb968 to ee4e664 Compare January 15, 2026 13:40

github-actions Bot assigned JaredforReal and yuluo-yx Jan 15, 2026

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from ee4e664 to ee8f655 Compare January 15, 2026 13:40

rootfs previously approved these changes Jan 15, 2026

rootfs requested a review from Copilot January 15, 2026 16:05

Copilot started reviewing on behalf of rootfs January 15, 2026 16:05

Copilot AI reviewed Jan 15, 2026

asaadbalum mentioned this pull request Jan 15, 2026

[v0.2-Athena]: Add Prometheus metrics for model selection evolution tracking #1093

Closed

3 tasks

asaadbalum dismissed rootfs’s stale review via 16a9735 January 15, 2026 16:49

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from ee8f655 to 16a9735 Compare January 15, 2026 16:49

Xunzhuo reviewed Jan 16, 2026

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from ce5ee96 to e295a35 Compare January 16, 2026 08:28

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from e295a35 to cc54c10 Compare January 16, 2026 08:34

Xunzhuo reviewed Jan 16, 2026

asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from cc54c10 to e5a4135 Compare January 17, 2026 10:25

asaadbalum requested a review from Xunzhuo January 17, 2026 10:49

Xunzhuo reviewed Jan 17, 2026

Xunzhuo approved these changes Jan 17, 2026

Xunzhuo merged commit 9ceb3fa into vllm-project:main Jan 17, 2026
33 checks passed

This was referenced Jan 18, 2026

[v0.2-Athena]: Model Selection: Core Algorithm Enhancements (Elo Storage, RouterDC Embeddings, AutoMix Cascading) #1102

Closed

[v0.2-Athena]: Model Selection: CLI Support, E2E Tests & Documentation #1103

Closed

asaadbalum mentioned this pull request Jan 18, 2026

feat(selection): add Elo storage, RouterDC embeddings, and AutoMix cascading #1104

Merged

6 tasks

asaadbalum mentioned this pull request Jan 20, 2026

[v0.2-Athena]: Model Selection: Dashboard Rating/Feedback UI #1128

Closed

asaadbalum mentioned this pull request Jan 21, 2026

feat(selection): Add Prometheus metrics for model selection evolution tracking #1124

Merged

4 tasks
