
feat(selection): implement advanced model selection methods #1089

Merged
Xunzhuo merged 1 commit into vllm-project:main from asaadbalum:feat/issue-987-advanced-model-selection
Jan 17, 2026

Conversation

@asaadbalum
Collaborator

@asaadbalum asaadbalum commented Jan 15, 2026

Advanced Model Selection Methods

Summary

Implement advanced model selection algorithms for intelligent routing, enabling the semantic router to choose the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.

Fixes #987

What Changed

New Package: pkg/selection/

| File | Purpose |
| --- | --- |
| selector.go | Core interfaces: Selector, SelectionContext, SelectionResult |
| elo.go | Elo rating with Bradley-Terry model |
| router_dc.go | Dual-contrastive query-to-model matching |
| automix.go | POMDP-based cost-quality optimization |
| hybrid.go | Combines all methods with configurable weights |
| static.go | Default behavior (backwards compatible) |
| factory.go | Creates selectors from configuration |
| metrics.go | Prometheus metrics |
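
To make the table above more concrete, here is a minimal sketch of how the three core types in selector.go might relate. The field names, the method signature, and the `staticSelector` type are assumptions made for illustration; the PR description only names the types, not their contents.

```go
package main

import (
	"errors"
	"fmt"
)

// SelectionContext carries the per-request inputs a selector may use.
// Field names are hypothetical.
type SelectionContext struct {
	Query      string
	Candidates []string // model names taken from the decision's modelRefs
}

// SelectionResult reports the chosen model and why it was chosen.
type SelectionResult struct {
	Model  string
	Score  float64
	Reason string
}

// Selector is the contract each algorithm would satisfy in this sketch.
type Selector interface {
	Select(ctx SelectionContext) (SelectionResult, error)
}

// staticSelector mirrors the documented default: pick the first candidate.
type staticSelector struct{}

func (staticSelector) Select(ctx SelectionContext) (SelectionResult, error) {
	if len(ctx.Candidates) == 0 {
		return SelectionResult{}, errors.New("no candidate models")
	}
	return SelectionResult{Model: ctx.Candidates[0], Score: 1.0, Reason: "static: first model"}, nil
}

func main() {
	var s Selector = staticSelector{}
	res, err := s.Select(SelectionContext{Query: "hello", Candidates: []string{"llama3.2:3b", "phi4"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s (%s)\n", res.Model, res.Reason)
}
```

Keeping every algorithm behind one small interface is what would let a factory swap Elo, RouterDC, AutoMix, or Hybrid in per decision without touching the routing path.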

Modified Files

| File | Change |
| --- | --- |
| pkg/config/config.go | Extended AlgorithmConfig with selection types |
| pkg/extproc/router.go | Initialize selection registry |
| pkg/extproc/req_filter_classification.go | Per-decision algorithm support |

Configuration (Per-Decision Only - Aligned with Looper Pattern)

Each decision specifies its own algorithm:

decisions:
  - name: tech
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
      - model: "gemma3:27b"
    algorithm:
      type: "elo"
      elo:
        k_factor: 32
        category_weighted: true

  - name: finance
    algorithm:
      type: "automix"
      automix:
        cost_quality_tradeoff: 0.4

  - name: general
    algorithm:
      type: "hybrid"
      hybrid:
        elo_weight: 0.3
        router_dc_weight: 0.3
        automix_weight: 0.2
        cost_weight: 0.2

Default Behavior (Backwards Compatible)

  • If a decision has no algorithm block, static selection is used (first model)
  • No action required for existing deployments
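
The fallback rule can be sketched as follows. `Decision`, `AlgorithmConfig`, and `methodFor` are hypothetical, trimmed-down stand-ins and not the structs in pkg/config/config.go; the sketch only shows the idea that a missing algorithm block resolves to static selection.

```go
package main

import "fmt"

// AlgorithmConfig is a hypothetical view of a decision's `algorithm:` block;
// only the type string matters for this sketch.
type AlgorithmConfig struct {
	Type string
}

// Decision is a hypothetical stand-in for a routing decision.
type Decision struct {
	Name      string
	ModelRefs []string
	Algorithm *AlgorithmConfig // nil when the decision has no algorithm block
}

// methodFor returns the selection method name for a decision,
// falling back to "static" when no algorithm is configured.
func methodFor(d Decision) string {
	if d.Algorithm == nil || d.Algorithm.Type == "" {
		return "static"
	}
	return d.Algorithm.Type
}

func main() {
	legacy := Decision{Name: "legacy", ModelRefs: []string{"phi4"}}
	tech := Decision{
		Name:      "tech",
		ModelRefs: []string{"llama3.2:3b", "phi4"},
		Algorithm: &AlgorithmConfig{Type: "elo"},
	}
	fmt.Println(legacy.Name, "->", methodFor(legacy))
	fmt.Println(tech.Name, "->", methodFor(tech))
}
```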

Testing

  • ✅ Unit tests: All pass (go test ./pkg/selection/...)
  • ✅ Build: Clean (go build ./...)
  • ✅ Integration: Selection wired into extproc routing path
  • ✅ Demo: cd src/semantic-router && go run ./examples/selection/main.go

Production Logging

VSR logs show selection decisions for every request:

[EloSelector] gemma3:27b: rating=1531.3 (W:2 L:0 T:0)
[AutoMix] llama3.2:3b: cost=$0.05, quality=0.70, value=0.6990
[HybridSelector] gemma3:27b: elo=0.3947, dc=0.3333, am=0.9400 → combined=0.5080

Appendix

A. Demo Output

Elo Rating Selection

Query: "Explain quantum computing"
Elo Ratings: llama3.2:3b=1468, phi4=1501, gemma3:27b=1531
→ SELECTED: gemma3:27b (highest rating)

AutoMix Selection

Query: "What is 2+2?" (simple query)
With cost_quality_tradeoff=0.8: → gemma3:27b (quality gap 0.70→0.95 still dominates)
With cost_quality_tradeoff=0.2: → gemma3:27b (quality preferred)
Note: With closer quality scores, cost preference would flip selection.
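
The note about closer quality scores flipping the selection can be illustrated with a toy scoring rule. The linear formula, the `candidate` type, and the numbers below are all assumptions chosen for the illustration; they are not the POMDP value computation in automix.go and do not reproduce the logged values.

```go
package main

import "fmt"

// candidate holds hypothetical per-model estimates.
type candidate struct {
	Name    string
	Quality float64 // estimated answer quality in [0, 1]
	Cost    float64 // cost normalized to [0, 1] across candidates
}

// value scores a candidate; tradeoff=0 weighs only quality, tradeoff=1 only cost.
// This linear form is an assumption, not the formula used by the PR.
func value(c candidate, tradeoff float64) float64 {
	return (1-tradeoff)*c.Quality - tradeoff*c.Cost
}

// pick returns the candidate with the highest value.
// It assumes cands is non-empty.
func pick(cands []candidate, tradeoff float64) candidate {
	best := cands[0]
	for _, c := range cands[1:] {
		if value(c, tradeoff) > value(best, tradeoff) {
			best = c
		}
	}
	return best
}

func main() {
	// Two models whose quality scores are close but whose costs differ a lot.
	cands := []candidate{
		{Name: "small", Quality: 0.88, Cost: 0.1},
		{Name: "large", Quality: 0.95, Cost: 1.0},
	}
	fmt.Println("tradeoff=0.0:", pick(cands, 0.0).Name)
	fmt.Println("tradeoff=0.8:", pick(cands, 0.8).Name)
}
```

With a quality-only tradeoff the larger model wins on its higher score; once cost dominates, the small gap in quality no longer pays for the large gap in cost.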

RouterDC Selection

Query: "Debug this Go function" (code query)
Similarity: phi4=0.334 (best for code)
→ SELECTED: phi4
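
Query-to-model matching of this kind can be sketched as a nearest-embedding lookup. The cosine metric, the toy vectors, and the function names are assumptions; the dual-contrastive training that would produce real embeddings is outside this sketch, and the PR itself lists RouterDC training as future work.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero norm.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// bestModel returns the model whose embedding is most similar to the query.
func bestModel(query []float64, models map[string][]float64) (string, float64) {
	bestName, bestSim := "", math.Inf(-1)
	for name, emb := range models {
		if sim := cosine(query, emb); sim > bestSim {
			bestName, bestSim = name, sim
		}
	}
	return bestName, bestSim
}

func main() {
	// Toy 3-dimensional embeddings; real ones would come from an encoder.
	models := map[string][]float64{
		"phi4":       {0.9, 0.1, 0.0},
		"gemma3:27b": {0.1, 0.9, 0.2},
	}
	name, sim := bestModel([]float64{1.0, 0.2, 0.0}, models)
	fmt.Printf("%s (similarity %.3f)\n", name, sim)
}
```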

Hybrid Selection

Query: "Write an efficient sorting algorithm"
Combined: elo=0.395, dc=0.333, am=0.940 → score=0.508
→ SELECTED: gemma3:27b
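
One reading that is consistent with the numbers above is a weighted average of the three component scores using the example weights (elo 0.3, dc 0.3, am 0.2), normalized by the sum of those weights. The sketch below checks that arithmetic; leaving the cost weight out of the average is an assumption made to match the logged value, not something the PR states.

```go
package main

import "fmt"

// component pairs a normalized score with its configured weight.
type component struct {
	Score, Weight float64
}

// weightedAverage returns sum(score*weight) / sum(weight),
// or 0 if all weights are 0.
func weightedAverage(parts []component) float64 {
	var num, den float64
	for _, p := range parts {
		num += p.Score * p.Weight
		den += p.Weight
	}
	if den == 0 {
		return 0
	}
	return num / den
}

func main() {
	combined := weightedAverage([]component{
		{Score: 0.3947, Weight: 0.3}, // elo
		{Score: 0.3333, Weight: 0.3}, // dc (RouterDC)
		{Score: 0.9400, Weight: 0.2}, // am (AutoMix)
	})
	fmt.Printf("combined=%.4f\n", combined) // prints combined=0.5080
}
```

Written out, the sum is 0.11841 + 0.09999 + 0.18800 = 0.40640, and dividing by the total weight of 0.8 gives 0.5080.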
B. Running the Demo

cd src/semantic-router
go run ./examples/selection/main.go

Tweaking Parameters

In demo script (examples/selection/main.go):

  • costQualityTradeoff (~line 160): 0.0=quality, 1.0=cost
  • Model costs (~line 70): Change pricing
  • Hybrid weights (~line 220): Adjust method balance

In config:
Edit config/intelligent-routing/in-tree/model_selection_demo.yaml and restart VSR.

Demo Script for Future Enhancements

The demo is extensible:

| Enhancement | How to Extend |
| --- | --- |
| Feedback REST API | Add section calling endpoint, verify Elo updates |
| Model Embeddings Config | Load from YAML instead of hardcoding |
| RouterDC Training | Add training loop section |

C. Future Enhancements (Not in This PR)

| Enhancement | What We Provide Instead |
| --- | --- |
| Feedback REST API | UpdateFeedback() method ready |
| Model Embeddings Config | SetModelEmbedding() API |
| Quality Score Config | Default=0.8, configurable via code |
| True AutoMix Cascading | Pre-selection based on POMDP values |
| RouterDC Training | Similarity matching with provided embeddings |
| Prometheus Metrics | See follow-up issue #1093 |

D. Reference Papers

@netlify

netlify Bot commented Jan 15, 2026

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | e5a4135 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/696b63b982d0550009f0c950 |
| 😎 Deploy Preview | https://deploy-preview-1089--vllm-semantic-router.netlify.app |

@github-actions
Contributor

github-actions Bot commented Jan 15, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/intelligent-routing/in-tree/model_selection_demo.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/examples/selection/main.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/extproc/req_filter_classification.go
  • src/semantic-router/pkg/extproc/router.go
  • src/semantic-router/pkg/selection/automix.go
  • src/semantic-router/pkg/selection/elo.go
  • src/semantic-router/pkg/selection/factory.go
  • src/semantic-router/pkg/selection/hybrid.go
  • src/semantic-router/pkg/selection/metrics.go
  • src/semantic-router/pkg/selection/router_dc.go
  • src/semantic-router/pkg/selection/selector.go
  • src/semantic-router/pkg/selection/selector_test.go
  • src/semantic-router/pkg/selection/static.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from 2dbb18f to 8b2fb3d Compare January 15, 2026 13:20
@asaadbalum asaadbalum requested a review from Xunzhuo January 15, 2026 13:22
@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from 2bfb968 to ee4e664 Compare January 15, 2026 13:40
@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from ee4e664 to ee8f655 Compare January 15, 2026 13:40
@rootfs
Collaborator

rootfs commented Jan 15, 2026

@asaadbalum nice work! Elo is a really cool metric. I would imagine that we can use the feedback classifier to complement user voting and automate the ranking process.

Would you like to follow up with a PR to add Prometheus metrics for explainability and traceability, so we can infer the evolution of the selector over time?

rootfs
rootfs previously approved these changes Jan 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

This pull request implements advanced model selection algorithms for intelligent LLM routing, enabling the semantic router to choose the best model from multiple candidates based on learned preferences, query similarity, and cost-quality optimization. The implementation adds five selection methods: Static (baseline), Elo rating, RouterDC (dual-contrastive), AutoMix (POMDP-based), and Hybrid (combined approach).

Changes:

  • New pkg/selection/ package with core selection interfaces and multiple algorithm implementations
  • Integration into the routing pipeline via req_filter_classification.go
  • Configuration structs added to support the new selection methods
  • Comprehensive test coverage and demo application

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| pkg/selection/selector.go | Core interfaces and types for selection framework |
| pkg/selection/elo.go | Elo rating system using Bradley-Terry model |
| pkg/selection/router_dc.go | Dual-contrastive query-to-model matching |
| pkg/selection/automix.go | POMDP-based cost-quality optimization |
| pkg/selection/hybrid.go | Combines multiple methods with weighted scores |
| pkg/selection/static.go | Baseline static selection (backwards compatible) |
| pkg/selection/factory.go | Factory pattern for creating selectors |
| pkg/selection/metrics.go | Prometheus metrics for observability |
| pkg/selection/selector_test.go | Comprehensive unit tests |
| pkg/extproc/router.go | Initialize selection registry |
| pkg/extproc/req_filter_classification.go | Integration with routing pipeline |
| pkg/config/config.go | Configuration structs for selection methods |
| cmd/selection-demo/main.go | Demo application |
| config/intelligent-routing/in-tree/model_selection_demo.yaml | Example configuration |

Comment on lines +461 to +489
// contains checks if a string contains a substring (case-insensitive)
func contains(s, substr string) bool {
	for i := 0; i <= len(s)-len(substr); i++ {
		if equalFold(s[i:i+len(substr)], substr) {
			return true
		}
	}
	return false
}

// equalFold compares two strings case-insensitively
func equalFold(a, b string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		ca, cb := a[i], b[i]
		if ca >= 'A' && ca <= 'Z' {
			ca += 'a' - 'A'
		}
		if cb >= 'A' && cb <= 'Z' {
			cb += 'a' - 'A'
		}
		if ca != cb {
			return false
		}
	}
	return true
}

Copilot AI Jan 15, 2026

The contains and equalFold functions reimplement functionality available in Go's standard library. Use strings.Contains with strings.ToLower for the contains check, or strings.EqualFold for case-insensitive comparison. This reduces code complexity and uses well-tested standard library functions.

h.config.EloWeight, h.config.RouterDCWeight, h.config.AutoMixWeight, h.config.CostWeight)

if len(parts) > 0 {
return fmt.Sprintf("Hybrid combination: %s, %s", parts, weightsStr)

Copilot AI Jan 15, 2026

The parts variable is a []string slice being used with %s format specifier, which will print the slice representation (e.g., [Elo=0.500 RouterDC=0.600]) rather than a formatted string. Use strings.Join(parts, ", ") to properly format the component scores as a comma-separated string.

{"0.5b", 0.5},
}

modelLower := model

Copilot AI Jan 15, 2026

Variable modelLower is assigned the value of model but never converted to lowercase, despite the function name and comment indicating case-insensitive matching. Add modelLower = strings.ToLower(model) or use the standard library functions as suggested in Comment 1.

Comment on lines +209 to +249
eloCfg := cfg.IntelligentRouting.ModelSelection.Elo
modelSelectionCfg.Elo = &selection.EloConfig{
	InitialRating:     eloCfg.InitialRating,
	KFactor:           eloCfg.KFactor,
	CategoryWeighted:  eloCfg.CategoryWeighted,
	DecayFactor:       eloCfg.DecayFactor,
	MinComparisons:    eloCfg.MinComparisons,
	CostScalingFactor: eloCfg.CostScalingFactor,
}

// Copy RouterDC config
routerDCCfg := cfg.IntelligentRouting.ModelSelection.RouterDC
modelSelectionCfg.RouterDC = &selection.RouterDCConfig{
	Temperature:         routerDCCfg.Temperature,
	DimensionSize:       routerDCCfg.DimensionSize,
	MinSimilarity:       routerDCCfg.MinSimilarity,
	UseQueryContrastive: routerDCCfg.UseQueryContrastive,
	UseModelContrastive: routerDCCfg.UseModelContrastive,
}

// Copy AutoMix config
autoMixCfg := cfg.IntelligentRouting.ModelSelection.AutoMix
modelSelectionCfg.AutoMix = &selection.AutoMixConfig{
	VerificationThreshold:  autoMixCfg.VerificationThreshold,
	MaxEscalations:         autoMixCfg.MaxEscalations,
	CostAwareRouting:       autoMixCfg.CostAwareRouting,
	CostQualityTradeoff:    autoMixCfg.CostQualityTradeoff,
	DiscountFactor:         autoMixCfg.DiscountFactor,
	UseLogprobVerification: autoMixCfg.UseLogprobVerification,
}

// Copy Hybrid config
hybridCfg := cfg.IntelligentRouting.ModelSelection.Hybrid
modelSelectionCfg.Hybrid = &selection.HybridConfig{
	EloWeight:           hybridCfg.EloWeight,
	RouterDCWeight:      hybridCfg.RouterDCWeight,
	AutoMixWeight:       hybridCfg.AutoMixWeight,
	CostWeight:          hybridCfg.CostWeight,
	QualityGapThreshold: hybridCfg.QualityGapThreshold,
	NormalizeScores:     hybridCfg.NormalizeScores,
}

Copilot AI Jan 15, 2026

The manual field-by-field copying of configuration structs is verbose and error-prone. Consider either reusing the same struct types between packages (if appropriate) or implementing helper conversion methods to reduce duplication and make maintenance easier when fields are added or modified.

Comment on lines +155 to +156
// Convert static scores to Elo ratings (scale 0-1 -> 1000-2000)
rating := 1000.0 + (ms.Score * 1000.0)

Copilot AI Jan 15, 2026

The magic numbers 1000.0 should be defined as named constants (e.g., MinEloRatingFromScore and EloRatingRange) to clarify the conversion formula and make it easier to adjust if needed.
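
A sketch of the suggested refactor: the constant names follow the reviewer's examples and the values come from the quoted line, while the `scoreToRating` helper is hypothetical.

```go
package main

import "fmt"

const (
	// MinEloRatingFromScore is the rating assigned to a static score of 0.
	MinEloRatingFromScore = 1000.0
	// EloRatingRange is the width of the rating scale covered by scores in [0, 1].
	EloRatingRange = 1000.0
)

// scoreToRating maps a static score in [0, 1] onto the 1000-2000 rating scale.
func scoreToRating(score float64) float64 {
	return MinEloRatingFromScore + score*EloRatingRange
}

func main() {
	fmt.Println(scoreToRating(0.5))
}
```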

Comment on lines +204 to +206
// Build selection context with cost/quality weights from config
costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
qualityWeight := 1.0 - costWeight // Quality is complement of cost

Copilot AI Jan 15, 2026

The cost weight is being read from the AutoMix configuration even when using other selection methods (e.g., Hybrid, Elo). Consider using a top-level or context-specific cost weight configuration that applies across all methods, or document why AutoMix's cost weight is used globally.

Suggested change
- // Build selection context with cost/quality weights from config
- costWeight := r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
- qualityWeight := 1.0 - costWeight // Quality is complement of cost
+ // Build selection context with cost/quality weights
+ // For AutoMix, use the AutoMix-specific cost/quality tradeoff from config.
+ // For other methods, use neutral default weights to avoid coupling them to AutoMix config.
+ var costWeight float64
+ var qualityWeight float64
+ if method == selection.MethodAutoMix {
+ 	costWeight = r.Config.IntelligentRouting.ModelSelection.AutoMix.CostQualityTradeoff
+ 	if costWeight < 0.0 {
+ 		costWeight = 0.0
+ 	} else if costWeight > 1.0 {
+ 		costWeight = 1.0
+ 	}
+ 	qualityWeight = 1.0 - costWeight
+ } else {
+ 	// Default to equal weighting when method does not define its own cost/quality config
+ 	costWeight = 0.5
+ 	qualityWeight = 0.5
+ }

defer a.valueMu.Unlock()

// Simple value iteration: V(s) = R(s) + γ * max_a E[V(s')]
for model, cap := range a.capabilities {

Copilot AI Jan 15, 2026

The updateValueFunction method locks valueMu but iterates over a.capabilities without holding capMu. This creates a potential race condition if capabilities are modified concurrently. Either hold both locks or ensure capabilities are not modified after initialization.
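
One way to address this, sketched under assumptions: copy the capabilities under a read lock first, then update the value map under its own lock. The `autoMix` struct, its field types, and the placeholder update formula below are hypothetical and much simpler than the code in automix.go; only the locking pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// autoMix is a hypothetical, trimmed-down selector with two guarded maps.
type autoMix struct {
	capMu        sync.RWMutex
	capabilities map[string]float64 // model -> estimated quality

	valueMu sync.Mutex
	values  map[string]float64 // model -> value estimate
}

// updateValues snapshots capabilities under capMu, then writes values under
// valueMu. The locks are taken one after the other, never nested, which
// avoids lock-ordering problems.
func (a *autoMix) updateValues(discount float64) {
	a.capMu.RLock()
	snapshot := make(map[string]float64, len(a.capabilities))
	for model, q := range a.capabilities {
		snapshot[model] = q
	}
	a.capMu.RUnlock()

	a.valueMu.Lock()
	defer a.valueMu.Unlock()
	for model, q := range snapshot {
		// Placeholder update, not the PR's value iteration.
		a.values[model] = q + discount*a.values[model]
	}
}

// setCapability may be called concurrently with updateValues.
func (a *autoMix) setCapability(model string, q float64) {
	a.capMu.Lock()
	defer a.capMu.Unlock()
	a.capabilities[model] = q
}

func main() {
	a := &autoMix{capabilities: map[string]float64{}, values: map[string]float64{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); a.setCapability("phi4", 0.8) }()
		go func() { defer wg.Done(); a.updateValues(0.9) }()
	}
	wg.Wait()
	fmt.Println("models with a value estimate:", len(a.values))
}
```

Running a program like this with `go run -race` is a quick way to confirm that the concurrent readers and writers no longer conflict.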

@rootfs
Collaborator

rootfs commented Jan 15, 2026

There are some cosmetic changes requested by Copilot. In addition, it would be great if you could add more unit tests for the selector to cover more meaningful cases, such as multi-turn Elo evolution.

@asaadbalum
Collaborator Author

Thanks for the review and approval! 🙏

Re: Prometheus metrics for evolution tracking
Great suggestion! I've created a follow-up issue, #1093. Please review it so we can sync on expectations.

The proposed metrics include:

  • llm_model_elo_rating - Track rating evolution over time
  • llm_model_feedback_total - Win/loss/tie counts
  • llm_model_rating_change - Distribution of rating changes

Re: Multi-turn Elo tests
Added three new tests in this update:

  • TestEloSelector_MultiTurnEvolution - 10 rounds of feedback showing rating convergence
  • TestEloSelector_TieHandling - Verifies tie feedback handling
  • TestEloSelector_SelectionFollowsRatings - Confirms selection respects Elo rankings

These verify the algorithm works correctly when users switch from the default (static) to Elo-based selection.

Re: Copilot suggestions
Addressed all bug-level issues:

  • Fixed format string slice handling (strings.Join)
  • Fixed case-insensitive comparison (strings.ToLower)
  • Fixed race condition in updateValueFunction
  • Added constants for magic numbers

The cosmetic/design items (Copilot comments 4 and 6) are noted for future consideration.

@Xunzhuo
Member

Xunzhuo commented Jan 16, 2026

@asaadbalum can you align the API design with the discussion?

@asaadbalum
Collaborator Author

@Xunzhuo Sure - are you referring to the Feedback API design for Elo rating updates? (As mentioned in #1093)

If so, what aspects would you like to align on?

@Xunzhuo
Member

Xunzhuo commented Jan 16, 2026

decisions:
  - name: "general_route"
    description: "Default fallback route for general queries"
    priority: 50
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "domain"
          name: "math"
        - type: "domain"
          name: "other"
        - type: "domain"
          name: "computer science"
    modelRefs:
      - model: "qwen-flash"
        use_reasoning: false
      - model: "qwen3-next-80b-a3b-instruct"
        use_reasoning: true
    algorithm:
      type: "confidence"
      confidence:
        confidence_method: "avg_logprob"
        threshold: 0.92      # Escalate if confidence is below this threshold
        on_error: "skip"    # Skip failed models, try next
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: false
          similarity_threshold: 0.85

  - name: "ratings_route"
    description: "Route using ratings algorithm for model selection"
    priority: 60
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "domain"
          name: "other"
    modelRefs:
      - model: "qwen-flash"
        use_reasoning: false
      - model: "qwen3-next-80b-a3b-instruct"
        use_reasoning: true
    algorithm:
      type: "ratings"
      ratings:
        on_error: "skip"    # Skip failed models, try next
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: false

As in looper, we put the model selection under the per-decision algorithm.

_, _ = logging.InitLoggerFromEnv()
}

func main() {
Member

I think we need to unify the code into extproc, rather than having a main.go here?

@asaadbalum
Collaborator Author

@Xunzhuo I see, agreed.

Will restructure to move algorithm config per-decision, aligning with looper's pattern.

Regarding cmd/selection-demo/main.go - this is a standalone utility for testing/demonstrating the selection algorithms, not production code. Happy to move it to examples/ or remove it if you prefer not to have demo binaries in the repo. Let me know your preference.

@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch 2 times, most recently from ce5ee96 to e295a35 Compare January 16, 2026 08:28
@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from e295a35 to cc54c10 Compare January 16, 2026 08:34
@asaadbalum
Collaborator Author

> @Xunzhuo I see, agreed.
>
> Will restructure to move algorithm config per-decision, aligning with looper's pattern.
>
> Regarding cmd/selection-demo/main.go - this is a standalone utility for testing/demonstrating the selection algorithms, not production code. Happy to move it to examples/ or remove it if you prefer not to have demo binaries in the repo. Let me know your preference.

Done.

@@ -0,0 +1,270 @@
/*
Selection Demo - Demonstrates advanced model selection methods
Member

Looks like it is still in cmd/selection-demo/main.go?


# Global Model Selection (fallback for decisions without algorithm)
# Options: "static", "elo", "router_dc", "automix", "hybrid"
model_selection:
Member

@Xunzhuo Xunzhuo Jan 16, 2026

My question is: why do we need a two-part configuration, one at the root and one per decision? Can we make this per-decision only? That would make the config clearer.

@asaadbalum
Collaborator Author

@Xunzhuo

  1. Config: Agreed - will remove global model_selection: section. Per-decision algorithm: will be the only configuration.

  2. Demo file: For cmd/selection-demo/ - would you prefer:

    • A) Move to examples/selection-demo/
    • B) Remove entirely
    • C) Integrate as test cases in pkg/selection/

Let me know and I'll update accordingly.

@Xunzhuo
Member

Xunzhuo commented Jan 16, 2026

Thanks, I prefer A): move it to examples/selection/main.go

Add pluggable model selection algorithms for intelligent routing:
- Elo rating system with Bradley-Terry model for preference-based selection
- RouterDC for query-to-model embedding matching
- AutoMix for POMDP-based cost-quality optimization
- Hybrid selector combining multiple methods with configurable weights
- Static selector for backwards compatibility

Integration:
- OpenAIRouter initializes selection registry on startup
- req_filter_classification uses configured selector instead of hardcoded first model
- Prometheus metrics for selection tracking

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
@asaadbalum asaadbalum force-pushed the feat/issue-987-advanced-model-selection branch from cc54c10 to e5a4135 Compare January 17, 2026 10:25
@asaadbalum
Collaborator Author

asaadbalum commented Jan 17, 2026

@Xunzhuo Done! Changes made per your feedback:

1. Config: Removed global model_selection: section. Only per-decision algorithm: is now used:

decisions:
  - name: tech
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
    algorithm:
      type: "elo"
      elo:
        k_factor: 32
        category_weighted: true

2. Demo: Moved to examples/selection/main.go

Let me know if anything else needs adjustment.

@asaadbalum asaadbalum requested a review from Xunzhuo January 17, 2026 10:49

// ModelSelection configures the algorithm used for model selection
// Supported methods: "static", "elo", "router_dc", "automix", "hybrid"
ModelSelection ModelSelectionConfig `yaml:"model_selection,omitempty"`
Member

nit: should we remove this as a follow-up?

@Xunzhuo
Member

Xunzhuo commented Jan 17, 2026

Thanks! Some follow-ups:

P0:

  1. For the Elo algorithm, we need to store the ratings in persistent storage.
  2. For the RouterDC algorithm, I think we are missing fields in modelConfig for each model's description, which are needed to initialize the embeddings. Am I missing anything?
  3. For AutoMix, how do we escalate to another model when it is uncertain? Should this be combined with the existing confidence routing algorithm (which is size-based selection, not cost-based)? Can you combine the two into one?

P1:

  1. Add Python CLI translation logic to support these new algorithm fields in vllm-sr serve.
  2. Add E2E tests for each algorithm to verify that it really works.
  3. Add website docs describing each algorithm in detail.

@Xunzhuo Xunzhuo merged commit 9ceb3fa into vllm-project:main Jan 17, 2026
33 checks passed
@asaadbalum
Collaborator Author

Thanks @Xunzhuo for the detailed follow-up!

Re: Your questions:

  1. RouterDC embeddings - You're correct, we don't have description fields in ModelConfig yet. Will add this to enable real model embeddings.

  2. AutoMix escalation - Good point. Current AutoMix is pre-selection only. Will integrate with looper's confidence cascading for cost-aware escalation.

Follow-up Plan (2 issues to keep it focused):

| Issue | Scope |
| --- | --- |
| P0: Core Algorithm Enhancements | Elo storage, RouterDC embeddings, AutoMix+Confidence |
| P1: CLI, E2E & Documentation | vllm-sr support, E2E tests, website docs |

Also noting #1093 tracks metrics, and #38 (Dynamic Scoring) comes after per your guidance.

Shall I proceed with creating these now?

@asaadbalum
Collaborator Author

Created follow-up issues as discussed:

Also noting #1093 for Prometheus metrics is already tracked.

Will start on #1102 after the current items. Thanks for the guidance! 🚀

Scanf-s pushed a commit to Scanf-s/semantic-router that referenced this pull request Jan 18, 2026
…ject#1089)

Scanf-s pushed a commit to Scanf-s/semantic-router that referenced this pull request Jan 18, 2026
…ject#1089)

Xunzhuo pushed a commit that referenced this pull request Jan 19, 2026
… config inline (#1100)

* fix: Refactor Redis and Milvus Cache Config into config.go, Update CacheOption initializer to handle the new configuration approach

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* fix: Add fallback logic when proper redis or milvus configuration does not given

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* docs: Add sample inline redis configuration example

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* docs: Update cache configuration examples

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* fix: Update HybridCache Milvus configuration

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* chore: Apply code linter

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* Feat(selection): implement advanced model selection methods (#1089)

Add pluggable model selection algorithms for intelligent routing:
- Elo rating system with Bradley-Terry model for preference-based selection
- RouterDC for query-to-model embedding matching
- AutoMix for POMDP-based cost-quality optimization
- Hybrid selector combining multiple methods with configurable weights
- Static selector for backwards compatibility

Integration:
- OpenAIRouter initializes selection registry on startup
- req_filter_classification uses configured selector instead of hardcoded first model
- Prometheus metrics for selection tracking

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* feat: Add inline cache configuration unit tests

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

* feat: Add cache unit tests

Signed-off-by: Scanf-s <sullung2yo@gmail.com>

---------

Signed-off-by: Scanf-s <sullung2yo@gmail.com>
Signed-off-by: asaadbalum <asaad.balum@gmail.com>
Co-authored-by: asaadbalum <154635253+asaadbalum@users.noreply.github.com>
henschwartz pushed a commit to henschwartz/semantic-router that referenced this pull request Jan 21, 2026
…ject#1089)

henschwartz pushed a commit to henschwartz/semantic-router that referenced this pull request Jan 21, 2026
… config inline (vllm-project#1100)


Development

Successfully merging this pull request may close these issues.

[v0.2-Athena]: Implement advanced model selection methods

7 participants