Issue 234 by srini-abhiram · Pull Request #1 · srini-abhiram/semantic-router

srini-abhiram · 2025-12-18T08:10:29Z

FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (link existing issues this PR will resolve)

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE

Make sure the code changes pass the pre-commit checks.
Sign off your commit by passing -s to git commit.
Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].

Detailed Checklist

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

[Bugfix] for bug fixes.
[CI/Build] for build or continuous integration improvements.
[Doc] for documentation fixes and improvements.
[Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
[Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
[Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
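The prefix convention above can be checked mechanically. The following is a minimal sketch, assuming prefixes appear as bracketed tags at the very start of the title; the names `allowedPrefixes`, `leadingPrefixes`, and `validTitle` are hypothetical and not part of the repository. Only the prefixes from the detailed list are accepted here.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPrefixes mirrors the prefixes listed in the checklist above.
var allowedPrefixes = map[string]bool{
	"Bugfix": true, "CI/Build": true, "Doc": true,
	"Feat": true, "Router": true, "Misc": true,
}

// leadingPrefixes returns the bracketed tags at the start of a title,
// e.g. "[Feat][Doc] x" yields ["Feat", "Doc"].
func leadingPrefixes(title string) []string {
	var tags []string
	rest := strings.TrimSpace(title)
	for strings.HasPrefix(rest, "[") {
		end := strings.Index(rest, "]")
		if end < 0 {
			break // unterminated bracket: stop parsing
		}
		tags = append(tags, rest[1:end])
		rest = strings.TrimSpace(rest[end+1:])
	}
	return tags
}

// validTitle reports whether a title starts with at least one prefix and
// every leading prefix is in the allowed set.
func validTitle(title string) bool {
	tags := leadingPrefixes(title)
	if len(tags) == 0 {
		return false
	}
	for _, t := range tags {
		if !allowedPrefixes[t] {
			return false
		}
	}
	return true
}

func main() {
	for _, title := range []string{"[Feat][Doc] Add VSR CLI", "Issue 234"} {
		fmt.Printf("%q valid=%v\n", title, validTitle(title))
	}
}
```

Under this sketch, a title such as "Issue 234" (the title of this PR) would be rejected because it carries no prefix.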

Code Quality

The PR needs to meet the following code quality standards:

Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
The code needs to be well-documented so that future contributors can easily understand it.
Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
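As an illustration of what the sign-off requirement means for a commit message, here is a minimal sketch that looks for a `Signed-off-by: Name <email>` trailer line. The function name `hasSignOff` is hypothetical, and the check is deliberately loose; it is not the project's actual DCO check.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSignOff reports whether any line of the commit message looks like a
// "Signed-off-by: Name <email>" trailer. This is a loose textual check only.
func hasSignOff(message string) bool {
	for _, line := range strings.Split(message, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "Signed-off-by: ") && strings.HasSuffix(line, ">") {
			return true
		}
	}
	return false
}

func main() {
	signed := "feat: add example\n\nSigned-off-by: Jane Doe <jane@example.com>"
	fmt.Println(hasSignOff(signed))              // message with a trailer
	fmt.Println(hasSignOff("feat: add example")) // message without one
}
```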

What to Expect for the Reviews

* feat: initial model manager with model caching, downloading, verification, configuration
  Signed-off-by: samzong <samzong.lu@gmail.com>
* refactor: delegate model downloading to a Python manager and externalize model configurations to YAML files.
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(linter): add 'ot' to codespell ignorewords list
  Signed-off-by: samzong <samzong.lu@gmail.com>
* refactor: move model_manager tests to tests directory
  Signed-off-by: samzong <samzong.lu@gmail.com>

---------

Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>

…project#830)

* move model_manager configs to config/models_manager/
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
* add README for model_manager module
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
* update CI
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
* add hr_transfer to requirement
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
* accept reviews in README
  Signed-off-by: JaredforReal <w13431838023@gmail.com>

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared <w13431838023@gmail.com>

* e2e: add Response API basic operations tests

  Add E2E tests for Response API basic operations:
  - POST /v1/responses - Create a new response
  - GET /v1/responses/{id} - Retrieve a response
  - DELETE /v1/responses/{id} - Delete a response
  - GET /v1/responses/{id}/input_items - List input items

  Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

* ci: add Response API tests to Docker Compose CI

  - Enable Response API in config/config.yaml
  - Add Response API test steps to integration-test-docker.yml:
    - POST /v1/responses (create)
    - GET /v1/responses/{id} (retrieve)
    - GET /v1/responses/{id}/input_items (list input items)
    - DELETE /v1/responses/{id} (delete and verify 404)

  Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

---------

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
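The create / retrieve / list / delete sequence described in this commit can be sketched against an in-memory stand-in. Everything below other than the four endpoint paths is an assumption made for illustration: `newFakeAPI` is a hypothetical fake handler (not the router's implementation), the fixed ID `resp_1` and the 200 status codes are choices of the fake, and only the final "404 after delete" expectation comes from the commit message.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newFakeAPI returns a hypothetical in-memory handler that mimics the shape
// of the four Response API endpoints. It is not the semantic-router code.
func newFakeAPI() http.Handler {
	stored := map[string]bool{}
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/responses", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		stored["resp_1"] = true // fixed ID keeps the sketch deterministic
		fmt.Fprint(w, "resp_1")
	})
	mux.HandleFunc("/v1/responses/", func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, "/v1/responses/")
		id = strings.TrimSuffix(id, "/input_items")
		if !stored[id] {
			http.NotFound(w, r)
			return
		}
		if r.Method == http.MethodDelete {
			delete(stored, id)
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// runLifecycle issues the requests in order and records each status code.
func runLifecycle(api http.Handler) []int {
	steps := []struct{ method, path string }{
		{http.MethodPost, "/v1/responses"},
		{http.MethodGet, "/v1/responses/resp_1"},
		{http.MethodGet, "/v1/responses/resp_1/input_items"},
		{http.MethodDelete, "/v1/responses/resp_1"},
		{http.MethodGet, "/v1/responses/resp_1"}, // should now be gone
	}
	codes := make([]int, 0, len(steps))
	for _, s := range steps {
		rec := httptest.NewRecorder()
		api.ServeHTTP(rec, httptest.NewRequest(s.method, s.path, nil))
		codes = append(codes, rec.Code)
	}
	return codes
}

func main() {
	fmt.Println(runLifecycle(newFakeAPI()))
}
```

Using `httptest.NewRecorder` keeps the sketch free of real network traffic; an actual E2E test would target the running router instead.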

…roject#840)

* add tests for pii and tls module
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
* try fix integration test dynamic config test install kubectl fail
  Signed-off-by: JaredforReal <w13431838023@gmail.com>

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>

…roject#854) Remove the unused MappingPath field from FactCheckModelConfig as the fact-check classifier loads label mappings directly from the model's config.json file, consistent with hallucination and NLI models. Signed-off-by: bitliu <bitliu@tencent.com>

* feat: All-in-One Docker image for single-container
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 📝 docs(all-in-one): add all-in-one deployment README.md
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🤖 ci: add workflow for all-in-one multi-arch Docker build
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(docker): simplify base images and remove Chinese mirrors from Dockerfile.all-in-one
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 💄 style(deploy): add shellcheck disable comment for envsubst line
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(ci): improve disk space cleanup in docker workflow using action
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(ci): optimize docker-all-in-one workflow performance and caching
  Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(docker-stack): change code name from all-in-one to stack.
  Signed-off-by: samzong <samzong.lu@gmail.com>

---------

Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>

github-actions · 2025-12-18T08:15:55Z

Performance Benchmark Results

Component benchmarks completed successfully.

Summary

Classification benchmarks: ✅
Decision engine benchmarks: ✅
Cache benchmarks: ✅

Details

See attached benchmark artifacts for detailed results and profiles.

Performance testing powered by vLLM Semantic Router

* feat(hf-playground): replace dissatisfaction models with new feedback and tool call models

  - Remove Dissatisfaction Detector and Dissatisfaction Explainer models
  - Add Feedback Detector model for unified user satisfaction analysis
  - Add Tool Call Sentinel for prompt injection detection
  - Add Tool Call Verifier for token-level tool call verification

  The new models provide better coverage for conversational AI feedback analysis and enhanced security for tool-calling LLM agents.

  Signed-off-by: bitliu <bitliu@tencent.com>

* fix(hf-playground): correct label mappings for feedback-detector model

  Update feedback-detector labels to match the actual model config:
  - 0: NEED_CLARIFICATION (was SAT)
  - 1: SAT (was NEED_CLARIFICATION)
  - 2: WANT_DIFFERENT (was WRONG_ANSWER)
  - 3: WRONG_ANSWER (was WANT_DIFFERENT)

  Signed-off-by: bitliu <bitliu@tencent.com>

* feat(hf-playground): add support for simple token classification

  Add classify_tokens_simple function and UI support for toolcall-verifier model which uses simple AUTHORIZED/UNAUTHORIZED labels instead of BIO format. This enables proper token-level verification of tool calls to detect unauthorized actions in LLM agent systems.

  Signed-off-by: bitliu <bitliu@tencent.com>

* feat(hf-playground): implement proper toolcall-verifier support

  Add specialized classify_toolcall_verifier function that formats input as '[USER] {intent} [TOOL] {call}' per model requirements. Update UI to provide separate input fields for user intent and tool call JSON. Display unauthorized tokens with detailed visualization showing flagged tokens and token-level classification results.

  Signed-off-by: bitliu <bitliu@tencent.com>

---------

Signed-off-by: bitliu <bitliu@tencent.com>
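The corrected label mapping and the verifier input format described in these commits can be written down compactly. This is a sketch only: the playground code referenced by the commits is not shown on this page, and the Go names `feedbackLabels` and `formatToolCallInput` are hypothetical. The label IDs and the `[USER] ... [TOOL] ...` template are taken from the commit messages.

```go
package main

import "fmt"

// feedbackLabels is the corrected ID-to-label mapping stated in the commit.
var feedbackLabels = map[int]string{
	0: "NEED_CLARIFICATION",
	1: "SAT",
	2: "WANT_DIFFERENT",
	3: "WRONG_ANSWER",
}

// formatToolCallInput builds the '[USER] {intent} [TOOL] {call}' string that
// the commit says the toolcall-verifier model expects.
func formatToolCallInput(intent, call string) string {
	return fmt.Sprintf("[USER] %s [TOOL] %s", intent, call)
}

func main() {
	fmt.Println(feedbackLabels[1])
	// The intent and tool call below are made-up sample inputs.
	fmt.Println(formatToolCallInput("check the weather", `{"name":"get_weather"}`))
}
```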

vllm-project#848)

* Created comprehensive test coverage in req_filter_tools_test.go with 27 test cases covering

  Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: resolve variable shadowing lint errors in tool selection tests

  - Fixed err shadowing in extproc_test.go by renaming to loadErr
  - Fixed router shadowing in req_filter_tools_test.go by using testRouter
  - All pre-commit checks now pass

  Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: use correct BERT model ID in test config to match initialization

  - Changed CreateTestConfig to use all-MiniLM-L6-v2 instead of L12-v2
  - This matches the model initialized in tool selection test BeforeEach
  - Prevents singleton initialization conflicts that cause embedding generation to fail

  Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: update test expectation to match corrected BERT model ID

  Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: recreate router after modifying similarity threshold in tests

  The 'Similarity Threshold Filtering' tests were modifying cfg.ToolSelection.Tools.SimilarityThreshold after the router was already created in BeforeEach. This caused the tests to use the default threshold (0.2) instead of the test-specific thresholds (0.7, 0.99, 0.5).

  The fix recreates the router after setting the new threshold value in each test, ensuring the ToolsDatabase is initialized with the correct threshold.

  This fixes the failing test:
  - 'should return empty list when no tools meet high threshold'

  And prevents similar issues in three other tests in the same describe block.

  Signed-off-by: Senan Zedan <szedan@redhat.com>

---------

Signed-off-by: Senan Zedan <szedan@redhat.com>
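The threshold bug in the last commit is an instance of a general pattern: a component copies a config value at construction, so later changes to the config are not seen. The sketch below illustrates that pattern with hypothetical types (`Config`, `ToolsDB`, `NewToolsDB`); whether the real ToolsDatabase captures the value in exactly this way is an assumption, and the scores are made-up sample data.

```go
package main

import (
	"fmt"
	"sort"
)

// Config is a hypothetical stand-in for the tool selection settings.
type Config struct{ SimilarityThreshold float64 }

// ToolsDB copies the threshold once, at construction time.
type ToolsDB struct{ threshold float64 }

// NewToolsDB snapshots the threshold from the config it is given.
func NewToolsDB(cfg *Config) *ToolsDB {
	return &ToolsDB{threshold: cfg.SimilarityThreshold}
}

// Filter returns the names whose score meets the stored threshold, sorted.
func (db *ToolsDB) Filter(scores map[string]float64) []string {
	var names []string
	for name, score := range scores {
		if score >= db.threshold {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	scores := map[string]float64{"weather": 0.8, "calculator": 0.3}

	cfg := &Config{SimilarityThreshold: 0.2}
	stale := NewToolsDB(cfg)       // built before the test changes the config
	cfg.SimilarityThreshold = 0.99 // the change is invisible to `stale`
	fresh := NewToolsDB(cfg)       // recreated, so it sees the new value

	fmt.Println(len(stale.Filter(scores)), len(fresh.Filter(scores)))
}
```

Recreating the component after changing the config, as the commit does with the router, is the simplest remedy when the value is snapshotted like this.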

This PR introduces the VSR (vLLM Semantic Router) CLI tool - a comprehensive command-line interface that reduces setup time from hours to minutes and provides a unified interface for deployment, monitoring, and troubleshooting across multiple environments.

## Key Features

### Core Commands

- **deploy**: Multi-environment deployment (Local, Docker Compose, Kubernetes, Helm)
- **config**: Configuration management with validation and templates
- **model**: Model lifecycle management (download, list, validate, remove, inspect)
- **status**: Health monitoring and status checks
- **debug**: Interactive debugging and diagnostic tools
- **dashboard**: Dashboard management and access
- **test**: Prompt testing and validation
- **upgrade**: Seamless version upgrades
- **get**: Resource inspection (logs, pods, services)

### Implementation Details

- Built with Go and Cobra framework for robust CLI experience
- Comprehensive test coverage with 3,400+ lines of test code
- Support for multiple deployment targets with environment detection
- Process lifecycle management with orphan process prevention
- Integrated health checking and diagnostics
- Shell completion support (bash, zsh, fish, powershell)

### Documentation

- Complete CLI documentation with quickstart guide
- Command reference with examples
- Troubleshooting guide
- Integration with existing semantic-router documentation

### Files Changed

- 49 files changed, 11,531 insertions(+), 16 deletions(-)
- New CLI implementation in src/semantic-router/cmd/vsr/
- New CLI packages in src/semantic-router/pkg/cli/
- Documentation in website/docs/cli/
- Build system integration in tools/make/build-run-test.mk

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>

This commit fixes CI test-and-build failures for PR vllm-project#824 by addressing two root causes:

1. **CGO Dependency Chain**: CLI tests have transitive CGO dependencies through pkg/cli/model → pkg/classification → candle-binding, requiring the Rust shared library (libcandle_semantic_router.so) at compile time.
2. **Test Execution Strategy**: Split test targets to avoid duplication and ensure proper build dependencies are met.

Changes:

- tools/make/build-run-test.mk:
  * Added test-cli target that depends on build-router and sets LD_LIBRARY_PATH
  * Modified test-semantic-router to exclude CLI tests using grep -v '/cmd/vsr'
  * Updated main test target to run both test-cli and test-semantic-router
  * Enhanced build-cli to inject version, commit hash, and build date
- Fixed 6 CLI test bugs found during validation:
  * model.go: Fixed Short description text mismatch
  * model_test.go: Added missing output flag initialization
  * completion.go: Added OnlyValidArgs validator for shell argument
  * config_test.go: Use Name() instead of Use for subcommand matching
  * debug.go: Fixed Short description text
  * signal_handling_test.go: Added cmd.Wait() after Process.Kill()

All CLI tests now pass successfully with proper CGO setup. Local CI validation confirms critical tests pass.

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>
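The signal_handling_test.go fix (calling cmd.Wait() after Process.Kill()) reflects how os/exec works: Kill only sends the signal, and Wait is what reaps the child and populates its final state. A minimal sketch of that pattern follows, assuming a Unix-like system with a `sleep` binary on PATH; the helper name `killAndReap` is hypothetical and the actual test code is not shown on this page.

```go
package main

import (
	"fmt"
	"os/exec"
)

// killAndReap starts a long-running child, kills it, then waits for it so
// the OS can release the process entry instead of leaving a zombie behind.
func killAndReap() (reaped bool, err error) {
	cmd := exec.Command("sleep", "30") // assumes a Unix-like `sleep` binary
	if err := cmd.Start(); err != nil {
		return false, err
	}
	if err := cmd.Process.Kill(); err != nil {
		return false, err
	}
	// Wait returns a non-nil error for a killed process; that is expected
	// here, so the error is ignored and only the reaped state is checked.
	_ = cmd.Wait()
	return cmd.ProcessState != nil, nil
}

func main() {
	reaped, err := killAndReap()
	fmt.Println(reaped, err)
}
```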

samzong and others added 20 commits December 15, 2025 01:14

Add hybrid routing tests, Keyword → Embedding → BERT → MCP (vllm-proj…

41c75c2

…ect#829) Signed-off-by: Senan Zedan <szedan@redhat.com>

Add Entropy testing for reasoning decision according to probabiliti…

317dfcf

…es calculation (vllm-project#833) Signed-off-by: Senan Zedan <szedan@redhat.com>

Disable the performance comparison against baseline, keep just the per…

c33bafe

…ofrmance bench mark testing per PR (vllm-project#836)

fix(dashboard): proxy Jaeger /dependencies route (vllm-project#839)

f5073be

Signed-off-by: samzong <samzong.lu@gmail.com> Co-authored-by: Jared <w13431838023@gmail.com>

Adding new tests for reasoning filter (vllm-project#843)

5b18a79

Signed-off-by: Senan Zedan <szedan@redhat.com>

Sponsor: Add AMD Partnership (vllm-project#847)

532d668

Signed-off-by: bitliu <bitliu@tencent.com>

feat: add hallucination bench (vllm-project#838)

a6ec94d

* feat: add hallucination bench
  Signed-off-by: Huamin Chen <hchen@redhat.com>
* review feedback
  Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

fix: regenerate package-lock.json with official npm registry (vllm-pr…

ec4a659

…oject#846) Signed-off-by: samzong <samzong.lu@gmail.com>

add fin benchmark (vllm-project#851)

8167837

Signed-off-by: Sophie8 <sw3237@nyu.edu>

feat: add configurable port support for Open WebUI iframe (vllm-proje…

063216d

…ct#844)

feat: add upstream request span and trace context propagation for dis…

787144d

…tributed tracing (vllm-project#852) Signed-off-by: Fang Han <fhan0520@gmail.com>

fix: StatefulSet readiness detection and add Dynamo demo video (vllm-…

381e1a8

…project#856) Signed-off-by: abdallahsamabd <abdallahsamabd@gmail.com>

Fix Domain Classifier Returns Empty or Wrong Classifications (vllm-pr…

211b46c

…oject#827) Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb> Co-authored-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>

Fix Playground tab to use backend proxy route instead of direct URL (v…

439a18b

…llm-project#850)

samzong and others added 7 commits December 18, 2025 21:13

🔧 chore(ci): disable arm64 build in docker-stack workflow due to buil…

90e6975

…dx limitations (vllm-project#859) Signed-off-by: samzong <samzong.lu@gmail.com> Co-authored-by: Jared <w13431838023@gmail.com>

feat: Add dashboard checks and CI workflow (vllm-project#861)

a984df3

Signed-off-by: samzong <samzong.lu@gmail.com>

fix: refactor documentation and improve clarity across multiple doc f…

5694873

…iles (vllm-project#865) Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>

srini-abhiram force-pushed the issue-234 branch from cd6f33d to 4719518 December 18, 2025 15:14

srini-abhiram closed this Dec 20, 2025

srini-abhiram wants to merge 27 commits into main from issue-234

13 participants
