✨ feat(e2e/llm-d): add LLM-D profile and test cases #705
base: main
✅ Deploy Preview for vllm-semantic-router ready!
👥 vLLM Semantic Team Notification: The following members have been identified for the changed files in this PR and have been automatically assigned:
74d5686 to 137d03f
Signed-off-by: samzong <[email protected]>
@Xunzhuo It's ready for review.
Cool, a quick question: are there any test cases that combine VSR features with llm-d?
@Xunzhuo Yes
/gemini summary
My question: the test cases focus on llm-d functionality, which is good, but the integration tests should add more cases around VSR + llm-d, such as verifying model auto-selection with llm-d, ideally reusing the shared test cases.
@Xunzhuo Thanks for the review. I will add VSR + LLM-D integration tests, reuse the existing VSR auto-routing cases, verify that the selected model is consistent with the LLM-D backend, and include the failover path.

FIX #658
LLM-D Profile Test Case
Objectives
What's Changed
- LLM-D Profile implementation: `e2e/profiles/llm-d/`, including installation, configuration, health checks, and cleanup, with support for `E2E_PROFILE=llm-d` selection.
- LLM-D Profile in E2E main program: Profile binding completed in `e2e/cmd/e2e/main.go` for unified entry management.
- `e2e/testcases/llmd_*.go`, covering auto-routing, failover recovery, distributed inference, health/readiness, and performance baseline.
- `llm-d` to test matrix in `.github/workflows/integration-test-k8s.yml` to ensure PR and main branch stability; no additional dependencies required.
- `e2e/README.md`; this file provides detailed test documentation for easy PR reference.
- `tools/make/e2e.mk` for unified command entry.
Module Boundaries
- Profile: Responsible for environment lifecycle (installation, configuration, health checks, cleanup).
- Testcases: Responsible for behavior verification (routing, degradation, observability, etc.).
- Helper: Encapsulates common low-level operations (K8s resource queries, HTTP calls, retry and timeout).
Prerequisites
- `kubectl`, `make`, and necessary access credentials (not recorded in repository).
Environment Setup and Stabilization Period
- `make e2e-setup E2E_PROFILE=llm-d` to complete installation and configuration.
- `/healthz` returns 200, necessary dependencies (e.g., upstream model services) reachable.
Test Case Overview
- `x-inference-pod` header present
llmd-auto-routing (Auto Model Selection)
- `auto` to enable model auto-selection.
- `model=auto` and capture response headers.
- `x-vsr-selected-model` or `x-selected-model`; fallback by inferring from `x-inference-pod` prefix.
- `phi4-mini`, networking → `llama3-8b`).
- `x-inference-pod` present and non-empty; HTTP 200.
llmd-failover-recovery (Failure and Recovery)
llmd-distributed-inference (Multi-Replica Serving)
- `vllm-llama3-8b-instruct` and `phi4-mini` with ≥ 2 ready replicas.
- `x-inference-pod` per response.
llmd-health-check (Health and Readiness)
- `/healthz`).
- `Ready=True`, probes pass.
llmd-performance-baseline (Moderate Concurrency Baseline)
- `model=auto`.
Execution Methods
- `make e2e-test E2E_PROFILE=llm-d`
- `make e2e-test E2E_PROFILE=llm-d E2E_VERBOSE=1`
- `make e2e-setup E2E_PROFILE=llm-d`