Issue 234#1

Closed
srini-abhiram wants to merge 27 commits into main from issue-234

@srini-abhiram
Owner

FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (link existing issues this PR will resolve)

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

samzong and others added 20 commits December 15, 2025 01:14
* feat: initial model manager with model caching, downloading, verification, configuration

Signed-off-by: samzong <samzong.lu@gmail.com>

* refactor: delegate model downloading to a Python manager and externalize model configurations to YAML files.

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🔧 chore(linter): add 'ot' to codespell ignorewords list

Signed-off-by: samzong <samzong.lu@gmail.com>

* refactor: move model_manager tests to tests directory

Signed-off-by: samzong <samzong.lu@gmail.com>

---------

Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
…es calculation (vllm-project#833)

Signed-off-by: Senan Zedan <szedan@redhat.com>
…project#830)

* move model_manager configs to config/models_manager/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* add README for model_manager module

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* update CI

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* add hr_transfer to requirement

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* accept reviews in README

Signed-off-by: JaredforReal <w13431838023@gmail.com>

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared <w13431838023@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
Signed-off-by: Senan Zedan <szedan@redhat.com>
* e2e: add Response API basic operations tests

Add E2E tests for Response API basic operations:
- POST /v1/responses - Create a new response
- GET /v1/responses/{id} - Retrieve a response
- DELETE /v1/responses/{id} - Delete a response
- GET /v1/responses/{id}/input_items - List input items

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

* ci: add Response API tests to Docker Compose CI

- Enable Response API in config/config.yaml
- Add Response API test steps to integration-test-docker.yml:
  - POST /v1/responses (create)
  - GET /v1/responses/{id} (retrieve)
  - GET /v1/responses/{id}/input_items (list input items)
  - DELETE /v1/responses/{id} (delete and verify 404)

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

---------

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Signed-off-by: bitliu <bitliu@tencent.com>
* feat: add hallucination bench

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>
…roject#840)

* add tests for pii and tls module

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix integration test dynamic config test install kubectl fail

Signed-off-by: JaredforReal <w13431838023@gmail.com>

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Sophie8 <sw3237@nyu.edu>
…tributed tracing (vllm-project#852)

Signed-off-by: Fang Han <fhan0520@gmail.com>
…roject#854)

Remove the unused MappingPath field from FactCheckModelConfig as the fact-check classifier loads label mappings directly from the model's config.json file, consistent with hallucination and NLI models.

Signed-off-by: bitliu <bitliu@tencent.com>
…project#856)

Signed-off-by: abdallahsamabd <abdallahsamabd@gmail.com>
…oject#827)

Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>
Co-authored-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>
* feat: All-in-One Docker image for single-container

Signed-off-by: samzong <samzong.lu@gmail.com>

* 📝 docs(all-in-one): add all-in-one deployment README.md

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🤖 ci: add workflow for all-in-one multi-arch Docker build

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🔧 chore(docker): simplify base images and remove Chinese mirrors from Dockerfile.all-in-one

Signed-off-by: samzong <samzong.lu@gmail.com>

* 💄 style(deploy): add shellcheck disable comment for envsubst line

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🔧 chore(ci): improve disk space cleanup in docker workflow using action

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🔧 chore(ci): optimize docker-all-in-one workflow performance and caching

Signed-off-by: samzong <samzong.lu@gmail.com>

* 🔧 chore(docker-stack): change code name from all-in-one to stack.

Signed-off-by: samzong <samzong.lu@gmail.com>

---------

Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
@github-actions

github-actions Bot commented Dec 18, 2025

Performance Benchmark Results

Component benchmarks completed successfully.

Summary

  • Classification benchmarks: ✅
  • Decision engine benchmarks: ✅
  • Cache benchmarks: ✅

Details

See attached benchmark artifacts for detailed results and profiles.


Performance testing powered by vLLM Semantic Router

samzong and others added 7 commits December 18, 2025 21:13
…dx limitations (vllm-project#859)

Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com>
…iles (vllm-project#865)

Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
* feat(hf-playground): replace dissatisfaction models with new feedback and tool call models

- Remove Dissatisfaction Detector and Dissatisfaction Explainer models
- Add Feedback Detector model for unified user satisfaction analysis
- Add Tool Call Sentinel for prompt injection detection
- Add Tool Call Verifier for token-level tool call verification

The new models provide better coverage for conversational AI feedback
analysis and enhanced security for tool-calling LLM agents.

Signed-off-by: bitliu <bitliu@tencent.com>

* fix(hf-playground): correct label mappings for feedback-detector model

Update feedback-detector labels to match the actual model config:
- 0: NEED_CLARIFICATION (was SAT)
- 1: SAT (was NEED_CLARIFICATION)
- 2: WANT_DIFFERENT (was WRONG_ANSWER)
- 3: WRONG_ANSWER (was WANT_DIFFERENT)
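
The corrected order above can be written down as a small lookup table. This is only an illustrative sketch in Go: the playground's real code is not shown in this PR page, and `feedbackLabels` and `labelFor` are hypothetical names.

```go
package main

import "fmt"

// feedbackLabels mirrors the corrected id-to-label order from the commit
// message. The variable name is hypothetical.
var feedbackLabels = map[int]string{
	0: "NEED_CLARIFICATION",
	1: "SAT",
	2: "WANT_DIFFERENT",
	3: "WRONG_ANSWER",
}

// labelFor is a hypothetical helper that returns the label for a class id,
// or an error when the id is outside the known range.
func labelFor(id int) (string, error) {
	label, ok := feedbackLabels[id]
	if !ok {
		return "", fmt.Errorf("unknown class id %d", id)
	}
	return label, nil
}

func main() {
	for id := 0; id < len(feedbackLabels); id++ {
		label, _ := labelFor(id)
		fmt.Printf("%d -> %s\n", id, label)
	}
}
```

Returning an error for unknown ids, rather than an empty string, makes a future mismatch between the table and the model config visible instead of silent.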

Signed-off-by: bitliu <bitliu@tencent.com>

* feat(hf-playground): add support for simple token classification

Add classify_tokens_simple function and UI support for toolcall-verifier
model which uses simple AUTHORIZED/UNAUTHORIZED labels instead of BIO
format. This enables proper token-level verification of tool calls to
detect unauthorized actions in LLM agent systems.

Signed-off-by: bitliu <bitliu@tencent.com>

* feat(hf-playground): implement proper toolcall-verifier support

Add specialized classify_toolcall_verifier function that formats input
as '[USER] {intent} [TOOL] {call}' per model requirements. Update UI
to provide separate input fields for user intent and tool call JSON.
Display unauthorized tokens with detailed visualization showing flagged
tokens and token-level classification results.
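
As a rough illustration of the input format and the flagged-token display described above, here is a minimal Go sketch. It is not the playground's implementation: `formatVerifierInput`, `flaggedTokens`, and `tokenResult` are hypothetical names, trimming whitespace is an assumption, and the classification results are hand-written rather than model output.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenResult is a hypothetical per-token classification result.
type tokenResult struct {
	Token string
	Label string // "AUTHORIZED" or "UNAUTHORIZED"
}

// formatVerifierInput builds the '[USER] {intent} [TOOL] {call}' string.
// Trimming surrounding whitespace is an assumption of this sketch.
func formatVerifierInput(intent, toolCall string) string {
	return fmt.Sprintf("[USER] %s [TOOL] %s",
		strings.TrimSpace(intent), strings.TrimSpace(toolCall))
}

// flaggedTokens returns the tokens labeled UNAUTHORIZED, in input order.
func flaggedTokens(results []tokenResult) []string {
	var flagged []string
	for _, r := range results {
		if r.Label == "UNAUTHORIZED" {
			flagged = append(flagged, r.Token)
		}
	}
	return flagged
}

func main() {
	input := formatVerifierInput(
		"Summarize my inbox",
		`{"name": "send_email", "arguments": {"to": "a@example.com"}}`,
	)
	fmt.Println(input)

	// Hand-written results standing in for real model output.
	results := []tokenResult{
		{Token: "send_email", Label: "UNAUTHORIZED"},
		{Token: "to", Label: "AUTHORIZED"},
	}
	fmt.Println("flagged:", flaggedTokens(results))
}
```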

Signed-off-by: bitliu <bitliu@tencent.com>

---------

Signed-off-by: bitliu <bitliu@tencent.com>
vllm-project#848)

* Created comprehensive test coverage in req_filter_tools_test.go with 27 test cases covering

Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: resolve variable shadowing lint errors in tool selection tests

- Fixed err shadowing in extproc_test.go by renaming to loadErr
- Fixed router shadowing in req_filter_tools_test.go by using testRouter
- All pre-commit checks now pass
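
For readers unfamiliar with this class of lint error, the sketch below shows how `:=` inside a block can shadow an outer `err`, and how renaming the inner variable (as the commit does with `loadErr` and `testRouter`) avoids it. The functions here are hypothetical and use `strconv.Atoi` only as a stand-in for any call that returns an error; they are not taken from the test files.

```go
package main

import (
	"fmt"
	"strconv"
)

// checkShadowed shows the bug: ':=' inside the block declares a new err,
// so the outer err returned at the end is never assigned.
func checkShadowed(s string) error {
	var err error
	if s != "" {
		_, err := strconv.Atoi(s) // shadows the outer err
		if err != nil {
			fmt.Println("parse failed:", err)
		}
	}
	return err // outer err, still nil
}

// checkFixed uses a distinct name for the inner error and assigns it to
// the outer variable explicitly, so nothing is shadowed.
func checkFixed(s string) error {
	var err error
	if s != "" {
		_, parseErr := strconv.Atoi(s)
		if parseErr != nil {
			err = parseErr
		}
	}
	return err
}

func main() {
	fmt.Println("shadowed returns nil:", checkShadowed("abc") == nil)
	fmt.Println("fixed returns nil:", checkFixed("abc") == nil)
}
```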

Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: use correct BERT model ID in test config to match initialization

- Changed CreateTestConfig to use all-MiniLM-L6-v2 instead of L12-v2
- This matches the model initialized in tool selection test BeforeEach
- Prevents singleton initialization conflicts that cause embedding generation to fail

Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: update test expectation to match corrected BERT model ID

Signed-off-by: Senan Zedan <szedan@redhat.com>

* fix: recreate router after modifying similarity threshold in tests

The 'Similarity Threshold Filtering' tests were modifying
cfg.ToolSelection.Tools.SimilarityThreshold after the router was already
created in BeforeEach. This caused the tests to use the default threshold
(0.2) instead of the test-specific thresholds (0.7, 0.99, 0.5).

The fix recreates the router after setting the new threshold value in each
test, ensuring the ToolsDatabase is initialized with the correct threshold.

This fixes the failing test:
- 'should return empty list when no tools meet high threshold'

And prevents similar issues in three other tests in the same describe block.
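
The failure mode described above can be reproduced in miniature. The sketch below assumes, as the commit message implies, that the threshold is copied when the router is constructed; `config`, `router`, `newRouter`, and `selectTools` are hypothetical stand-ins, not the project's real types.

```go
package main

import (
	"fmt"
	"sort"
)

// config and router are hypothetical stand-ins for the real types.
type config struct{ SimilarityThreshold float64 }

type router struct{ threshold float64 }

// newRouter copies the threshold at construction time, so later changes
// to cfg do not reach an existing router.
func newRouter(cfg *config) *router {
	return &router{threshold: cfg.SimilarityThreshold}
}

// selectTools returns the names whose score meets the threshold, sorted.
func (r *router) selectTools(scores map[string]float64) []string {
	selected := []string{}
	for name, score := range scores {
		if score >= r.threshold {
			selected = append(selected, name)
		}
	}
	sort.Strings(selected)
	return selected
}

// staleAndFresh builds a router with the default threshold (0.2), changes
// the config, and compares the old router with a recreated one.
func staleAndFresh(scores map[string]float64, threshold float64) (stale, fresh []string) {
	cfg := &config{SimilarityThreshold: 0.2}
	r := newRouter(cfg)
	cfg.SimilarityThreshold = threshold // too late for r
	stale = r.selectTools(scores)
	r = newRouter(cfg) // the fix: recreate after changing the config
	fresh = r.selectTools(scores)
	return stale, fresh
}

func main() {
	scores := map[string]float64{"weather": 0.55, "calendar": 0.30}
	stale, fresh := staleAndFresh(scores, 0.99)
	fmt.Println("stale router:", stale)
	fmt.Println("recreated router:", fresh)
}
```

With a threshold of 0.99, the stale router still returns both tools because it filters at 0.2, while the recreated router returns an empty list, which is what the high-threshold test expects.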

Signed-off-by: Senan Zedan <szedan@redhat.com>

---------

Signed-off-by: Senan Zedan <szedan@redhat.com>
This PR introduces the VSR (vLLM Semantic Router) CLI tool - a comprehensive
command-line interface that reduces setup time from hours to minutes and provides
a unified interface for deployment, monitoring, and troubleshooting across multiple
environments.

## Key Features

### Core Commands
- **deploy**: Multi-environment deployment (Local, Docker Compose, Kubernetes, Helm)
- **config**: Configuration management with validation and templates
- **model**: Model lifecycle management (download, list, validate, remove, inspect)
- **status**: Health monitoring and status checks
- **debug**: Interactive debugging and diagnostic tools
- **dashboard**: Dashboard management and access
- **test**: Prompt testing and validation
- **upgrade**: Seamless version upgrades
- **get**: Resource inspection (logs, pods, services)

### Implementation Details
- Built with Go and Cobra framework for robust CLI experience
- Comprehensive test coverage with 3,400+ lines of test code
- Support for multiple deployment targets with environment detection
- Process lifecycle management with orphan process prevention
- Integrated health checking and diagnostics
- Shell completion support (bash, zsh, fish, powershell)

### Documentation
- Complete CLI documentation with quickstart guide
- Command reference with examples
- Troubleshooting guide
- Integration with existing semantic-router documentation

### Files Changed
- 49 files changed, 11,531 insertions(+), 16 deletions(-)
- New CLI implementation in src/semantic-router/cmd/vsr/
- New CLI packages in src/semantic-router/pkg/cli/
- Documentation in website/docs/cli/
- Build system integration in tools/make/build-run-test.mk

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>
This commit fixes CI test-and-build failures for PR vllm-project#824 by addressing
two root causes:

1. **CGO Dependency Chain**: CLI tests have transitive CGO dependencies
   through pkg/cli/model → pkg/classification → candle-binding, requiring
   the Rust shared library (libcandle_semantic_router.so) at compile time.

2. **Test Execution Strategy**: Split test targets to avoid duplication
   and ensure proper build dependencies are met.

Changes:
- tools/make/build-run-test.mk:
  * Added test-cli target that depends on build-router and sets LD_LIBRARY_PATH
  * Modified test-semantic-router to exclude CLI tests using grep -v '/cmd/vsr'
  * Updated main test target to run both test-cli and test-semantic-router
  * Enhanced build-cli to inject version, commit hash, and build date

- Fixed 6 CLI test bugs found during validation:
  * model.go: Fixed Short description text mismatch
  * model_test.go: Added missing output flag initialization
  * completion.go: Added OnlyValidArgs validator for shell argument
  * config_test.go: Use Name() instead of Use for subcommand matching
  * debug.go: Fixed Short description text
  * signal_handling_test.go: Added cmd.Wait() after Process.Kill()

All CLI tests now pass successfully with proper CGO setup. Local CI
validation confirms critical tests pass.

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>