Issue 234#1
Closed
srini-abhiram wants to merge 27 commits into
* feat: initial model manager with model caching, downloading, verification, configuration
Signed-off-by: samzong <samzong.lu@gmail.com>
* refactor: delegate model downloading to a Python manager and externalize model configurations to YAML files.
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(linter): add 'ot' to codespell ignorewords list
Signed-off-by: samzong <samzong.lu@gmail.com>
* refactor: move model_manager tests to tests directory
Signed-off-by: samzong <samzong.lu@gmail.com>
---------
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
…ect#829) Signed-off-by: Senan Zedan <szedan@redhat.com>
…es calculation (vllm-project#833) Signed-off-by: Senan Zedan <szedan@redhat.com>
…ormance benchmark testing per PR (vllm-project#836)
…project#830)
* move model_manager configs to config/models_manager/
Signed-off-by: JaredforReal <w13431838023@gmail.com>
* add README for model_manager module
Signed-off-by: JaredforReal <w13431838023@gmail.com>
* update CI
Signed-off-by: JaredforReal <w13431838023@gmail.com>
* add hr_transfer to requirement
Signed-off-by: JaredforReal <w13431838023@gmail.com>
* accept reviews in README
Signed-off-by: JaredforReal <w13431838023@gmail.com>
---------
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared <w13431838023@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com> Co-authored-by: Jared <w13431838023@gmail.com>
Signed-off-by: Senan Zedan <szedan@redhat.com>
* e2e: add Response API basic operations tests
Add E2E tests for Response API basic operations:
- POST /v1/responses - Create a new response
- GET /v1/responses/{id} - Retrieve a response
- DELETE /v1/responses/{id} - Delete a response
- GET /v1/responses/{id}/input_items - List input items
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
* ci: add Response API tests to Docker Compose CI
- Enable Response API in config/config.yaml
- Add Response API test steps to integration-test-docker.yml:
- POST /v1/responses (create)
- GET /v1/responses/{id} (retrieve)
- GET /v1/responses/{id}/input_items (list input items)
- DELETE /v1/responses/{id} (delete and verify 404)
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
---------
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Signed-off-by: bitliu <bitliu@tencent.com>
* feat: add hallucination bench
Signed-off-by: Huamin Chen <hchen@redhat.com>
* review feedback
Signed-off-by: Huamin Chen <hchen@redhat.com>
---------
Signed-off-by: Huamin Chen <hchen@redhat.com>
…roject#840)
* add tests for pii and tls module
Signed-off-by: JaredforReal <w13431838023@gmail.com>
* try fix integration test dynamic config test install kubectl fail
Signed-off-by: JaredforReal <w13431838023@gmail.com>
---------
Signed-off-by: JaredforReal <w13431838023@gmail.com>
…oject#846) Signed-off-by: samzong <samzong.lu@gmail.com>
Signed-off-by: Sophie8 <sw3237@nyu.edu>
…tributed tracing (vllm-project#852) Signed-off-by: Fang Han <fhan0520@gmail.com>
…roject#854) Remove the unused MappingPath field from FactCheckModelConfig as the fact-check classifier loads label mappings directly from the model's config.json file, consistent with hallucination and NLI models. Signed-off-by: bitliu <bitliu@tencent.com>
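Loading label mappings straight from a model's `config.json` usually means reading its `id2label` object. The Go sketch below shows one way to turn that object into a slice indexed by class id. The struct, the function name `labelsFromConfig`, and the sample labels are hypothetical; the router's actual loader (which may live in the Rust candle-binding rather than Go) is not shown in this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelConfig decodes only the id2label field of a Hugging Face-style
// config.json. encoding/json converts the string object keys ("0", "1", ...)
// into int map keys.
type modelConfig struct {
	ID2Label map[int]string `json:"id2label"`
}

// labelsFromConfig returns labels as a slice indexed by class id. Because
// every id must fall in [0, n), n distinct ids cannot leave gaps.
func labelsFromConfig(raw []byte) ([]string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	if len(cfg.ID2Label) == 0 {
		return nil, errors.New("config has no id2label entries")
	}
	labels := make([]string, len(cfg.ID2Label))
	for id, label := range cfg.ID2Label {
		if id < 0 || id >= len(labels) {
			return nil, fmt.Errorf("label id %d out of range 0..%d", id, len(labels)-1)
		}
		labels[id] = label
	}
	return labels, nil
}

func main() {
	raw := []byte(`{"id2label": {"1": "LABEL_1", "0": "LABEL_0"}}`)
	labels, err := labelsFromConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(labels) // [LABEL_0 LABEL_1]
}
```

Keeping the model's own config as the single source of truth is what makes a separate mapping-path field redundant.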
…project#856) Signed-off-by: abdallahsamabd <abdallahsamabd@gmail.com>
…oject#827) Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb> Co-authored-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>
* feat: All-in-One Docker image for single-container
Signed-off-by: samzong <samzong.lu@gmail.com>
* 📝 docs(all-in-one): add all-in-one deployment README.md
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🤖 ci: add workflow for all-in-one multi-arch Docker build
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(docker): simplify base images and remove Chinese mirrors from Dockerfile.all-in-one
Signed-off-by: samzong <samzong.lu@gmail.com>
* 💄 style(deploy): add shellcheck disable comment for envsubst line
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(ci): improve disk space cleanup in docker workflow using action
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(ci): optimize docker-all-in-one workflow performance and caching
Signed-off-by: samzong <samzong.lu@gmail.com>
* 🔧 chore(docker-stack): change code name from all-in-one to stack.
Signed-off-by: samzong <samzong.lu@gmail.com>
---------
Signed-off-by: samzong <samzong.lu@gmail.com>
Co-authored-by: Jared <w13431838023@gmail.com>
Performance Benchmark Results
Component benchmarks completed successfully.
Summary
Details
See attached benchmark artifacts for detailed results and profiles.
Performance testing powered by vLLM Semantic Router
…dx limitations (vllm-project#859) Signed-off-by: samzong <samzong.lu@gmail.com> Co-authored-by: Jared <w13431838023@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com>
…iles (vllm-project#865) Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
* feat(hf-playground): replace dissatisfaction models with new feedback and tool call models
- Remove Dissatisfaction Detector and Dissatisfaction Explainer models
- Add Feedback Detector model for unified user satisfaction analysis
- Add Tool Call Sentinel for prompt injection detection
- Add Tool Call Verifier for token-level tool call verification
The new models provide better coverage for conversational AI feedback
analysis and enhanced security for tool-calling LLM agents.
Signed-off-by: bitliu <bitliu@tencent.com>
* fix(hf-playground): correct label mappings for feedback-detector model
Update feedback-detector labels to match the actual model config:
- 0: NEED_CLARIFICATION (was SAT)
- 1: SAT (was NEED_CLARIFICATION)
- 2: WANT_DIFFERENT (was WRONG_ANSWER)
- 3: WRONG_ANSWER (was WANT_DIFFERENT)
Signed-off-by: bitliu <bitliu@tencent.com>
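A swapped id-to-label table silently mislabels every prediction, because the model's output index is only meaningful through that table. The Go sketch below encodes the corrected mapping and picks the label with an argmax over logits. The playground itself is not Go; this is an illustrative transliteration, and `predictLabel` and the sample logits are hypothetical.

```go
package main

import "fmt"

// feedbackLabels holds the corrected feedback-detector mapping listed above
// (slice index = class id). Illustrative only.
var feedbackLabels = []string{
	"NEED_CLARIFICATION", // 0
	"SAT",                // 1
	"WANT_DIFFERENT",     // 2
	"WRONG_ANSWER",       // 3
}

// predictLabel returns the label of the highest-scoring class.
func predictLabel(logits []float64) (string, error) {
	if len(logits) != len(feedbackLabels) {
		return "", fmt.Errorf("expected %d logits, got %d", len(feedbackLabels), len(logits))
	}
	best := 0
	for i, v := range logits {
		if v > logits[best] {
			best = i
		}
	}
	return feedbackLabels[best], nil
}

func main() {
	label, err := predictLabel([]float64{0.1, 2.3, -0.4, 0.7})
	if err != nil {
		panic(err)
	}
	fmt.Println(label) // SAT
}
```

With the old table, the same logits (class 1 highest) would have been displayed as NEED_CLARIFICATION.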
* feat(hf-playground): add support for simple token classification
Add classify_tokens_simple function and UI support for toolcall-verifier
model which uses simple AUTHORIZED/UNAUTHORIZED labels instead of BIO
format. This enables proper token-level verification of tool calls to
detect unauthorized actions in LLM agent systems.
Signed-off-by: bitliu <bitliu@tencent.com>
* feat(hf-playground): implement proper toolcall-verifier support
Add specialized classify_toolcall_verifier function that formats input
as '[USER] {intent} [TOOL] {call}' per model requirements. Update UI
to provide separate input fields for user intent and tool call JSON.
Display unauthorized tokens with detailed visualization showing flagged
tokens and token-level classification results.
Signed-off-by: bitliu <bitliu@tencent.com>
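The verifier consumes one string combining both UI fields in the form `[USER] {intent} [TOOL] {call}`. The Go sketch below builds that string. The function name is hypothetical, and rejecting a tool call that is not valid JSON is an added assumption, reflecting only that the UI collects the call as JSON.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// formatVerifierInput builds "[USER] {intent} [TOOL] {call}". The JSON
// validity check is an assumption, not a documented model requirement.
func formatVerifierInput(intent, toolCall string) (string, error) {
	if !json.Valid([]byte(toolCall)) {
		return "", errors.New("tool call is not valid JSON")
	}
	return fmt.Sprintf("[USER] %s [TOOL] %s", intent, toolCall), nil
}

func main() {
	in, err := formatVerifierInput("Summarize my inbox",
		`{"name": "read_emails", "arguments": {"limit": 10}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(in)
	// [USER] Summarize my inbox [TOOL] {"name": "read_emails", "arguments": {"limit": 10}}
}
```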
---------
Signed-off-by: bitliu <bitliu@tencent.com>
vllm-project#848)
* Created comprehensive test coverage in req_filter_tools_test.go with 27 test cases covering
Signed-off-by: Senan Zedan <szedan@redhat.com>
* fix: resolve variable shadowing lint errors in tool selection tests
- Fixed err shadowing in extproc_test.go by renaming to loadErr
- Fixed router shadowing in req_filter_tools_test.go by using testRouter
- All pre-commit checks now pass
Signed-off-by: Senan Zedan <szedan@redhat.com>
* fix: use correct BERT model ID in test config to match initialization
- Changed CreateTestConfig to use all-MiniLM-L6-v2 instead of L12-v2
- This matches the model initialized in tool selection test BeforeEach
- Prevents singleton initialization conflicts that cause embedding generation to fail
Signed-off-by: Senan Zedan <szedan@redhat.com>
* fix: update test expectation to match corrected BERT model ID
Signed-off-by: Senan Zedan <szedan@redhat.com>
* fix: recreate router after modifying similarity threshold in tests
The 'Similarity Threshold Filtering' tests were modifying cfg.ToolSelection.Tools.SimilarityThreshold after the router was already created in BeforeEach. This caused the tests to use the default threshold (0.2) instead of the test-specific thresholds (0.7, 0.99, 0.5).
The fix recreates the router after setting the new threshold value in each test, ensuring the ToolsDatabase is initialized with the correct threshold. This fixes the failing test:
- 'should return empty list when no tools meet high threshold'
It also prevents similar issues in three other tests in the same describe block.
Signed-off-by: Senan Zedan <szedan@redhat.com>
---------
Signed-off-by: Senan Zedan <szedan@redhat.com>
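The threshold bug above is a value-copied-at-construction problem: once the router has built its tools database, later edits to the config are not seen. The Go sketch below reproduces that shape with heavily simplified, hypothetical types (`config`, `toolsDatabase`, `router`, `filter`). The `>=` comparison is an assumption; the real filter may use a strict comparison.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real config and router types.
type config struct{ SimilarityThreshold float32 }

type toolsDatabase struct{ threshold float32 }

type router struct{ tools *toolsDatabase }

// newRouter copies the threshold into the tools database; later changes
// to cfg are not observed by this router.
func newRouter(cfg *config) *router {
	return &router{tools: &toolsDatabase{threshold: cfg.SimilarityThreshold}}
}

// filter keeps similarity scores at or above the router's threshold.
func (r *router) filter(scores []float32) []float32 {
	var kept []float32
	for _, s := range scores {
		if s >= r.tools.threshold {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	scores := []float32{0.3, 0.6, 0.8}
	cfg := &config{SimilarityThreshold: 0.2} // default, as in BeforeEach
	testRouter := newRouter(cfg)

	cfg.SimilarityThreshold = 0.99              // test-specific threshold
	fmt.Println(len(testRouter.filter(scores))) // 3: stale router still uses 0.2

	testRouter = newRouter(cfg)                 // the fix: rebuild after changing cfg
	fmt.Println(len(testRouter.filter(scores))) // 0
}
```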
This PR introduces the VSR (vLLM Semantic Router) CLI tool, a comprehensive command-line interface that reduces setup time from hours to minutes and provides a unified interface for deployment, monitoring, and troubleshooting across multiple environments.

## Key Features

### Core Commands

- **deploy**: Multi-environment deployment (Local, Docker Compose, Kubernetes, Helm)
- **config**: Configuration management with validation and templates
- **model**: Model lifecycle management (download, list, validate, remove, inspect)
- **status**: Health monitoring and status checks
- **debug**: Interactive debugging and diagnostic tools
- **dashboard**: Dashboard management and access
- **test**: Prompt testing and validation
- **upgrade**: Seamless version upgrades
- **get**: Resource inspection (logs, pods, services)

### Implementation Details

- Built with Go and the Cobra framework for a robust CLI experience
- Comprehensive test coverage with 3,400+ lines of test code
- Support for multiple deployment targets with environment detection
- Process lifecycle management with orphan process prevention
- Integrated health checking and diagnostics
- Shell completion support (bash, zsh, fish, powershell)

### Documentation

- Complete CLI documentation with quickstart guide
- Command reference with examples
- Troubleshooting guide
- Integration with existing semantic-router documentation

### Files Changed

- 49 files changed, 11,531 insertions(+), 16 deletions(-)
- New CLI implementation in `src/semantic-router/cmd/vsr/`
- New CLI packages in `src/semantic-router/pkg/cli/`
- Documentation in `website/docs/cli/`
- Build system integration in `tools/make/build-run-test.mk`

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>
This commit fixes CI test-and-build failures for PR vllm-project#824 by addressing two root causes:

1. **CGO Dependency Chain**: CLI tests have transitive CGO dependencies through `pkg/cli/model` → `pkg/classification` → `candle-binding`, so the Rust shared library (`libcandle_semantic_router.so`) must exist when the tests are linked and must be findable (via `LD_LIBRARY_PATH`) when they run.
2. **Test Execution Strategy**: Split test targets to avoid duplication and ensure proper build dependencies are met.

Changes:
- `tools/make/build-run-test.mk`:
  * Added `test-cli` target that depends on `build-router` and sets `LD_LIBRARY_PATH`
  * Modified `test-semantic-router` to exclude CLI tests using `grep -v '/cmd/vsr'`
  * Updated main `test` target to run both `test-cli` and `test-semantic-router`
  * Enhanced `build-cli` to inject version, commit hash, and build date
- Fixed 6 CLI test bugs found during validation:
  * `model.go`: Fixed Short description text mismatch
  * `model_test.go`: Added missing output flag initialization
  * `completion.go`: Added `OnlyValidArgs` validator for shell argument
  * `config_test.go`: Use `Name()` instead of `Use` for subcommand matching
  * `debug.go`: Fixed Short description text
  * `signal_handling_test.go`: Added `cmd.Wait()` after `Process.Kill()`

All CLI tests now pass with the CGO setup in place, and local CI validation confirms the critical tests pass.

Resolves vllm-project#234

Signed-off-by: Srinivas A <56465971+srini-abhiram@users.noreply.github.com>
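Injecting version, commit hash, and build date is typically done with the Go linker's `-X` flag, which overwrites package-level string variables at link time. The sketch below shows the receiving side. The variable names, the `versionString` helper, and the build command in the comment are assumptions; the exact flags `build-cli` passes are not shown here.

```go
package main

import "fmt"

// Overridden at link time, for example (hypothetical invocation):
//
//	go build -ldflags "-X main.version=v0.1.0 -X main.commit=$(git rev-parse --short HEAD) \
//	  -X main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)" ./cmd/vsr
//
// -X only works on package-level string variables, not constants.
var (
	version   = "dev"
	commit    = "unknown"
	buildDate = "unknown"
)

// versionString formats the build metadata for a version command.
func versionString() string {
	return fmt.Sprintf("vsr %s (commit %s, built %s)", version, commit, buildDate)
}

func main() {
	fmt.Println(versionString()) // without -ldflags: vsr dev (commit unknown, built unknown)
}
```

Using non-empty defaults such as `dev` makes an un-injected local build easy to recognize.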
cd6f33d to 4719518
FILL IN THE PR DESCRIPTION HERE
FIX #xxxx (link existing issues this PR will resolve)
BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
… `-s` when doing `git commit` … `[Bugfix]`, `[Feat]`, and `[CI]`.
Detailed Checklist
Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Bugfix]` for bug fixes.
- `[CI/Build]` for build or continuous integration improvements.
- `[Doc]` for documentation fixes and improvements.
- `[Feat]` for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
- `[Router]` for changes to the `vllm_router` (e.g., routing algorithm, router observability, etc.).
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
Use `pre-commit` to format your code. See `README.md` for installation.
DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header which certifies agreement with the terms of the DCO. Using `-s` with `git commit` will automatically add this header.
What to Expect for the Reviews