
Conversation

@samzong (Contributor) commented Nov 20, 2025

Fixes #658

LLM-D Profile Test Cases

Objectives

  • Verify the correctness, fault tolerance, and observability of LLM-based decision routing in standard K8s/Kind environments.
  • Maintain repeatable results, clean environments, and explicit assertions to facilitate CI integration and issue diagnosis.

What's Changed

  • Added LLM-D Profile implementation: e2e/profiles/llm-d/, including installation, configuration, health checks, and cleanup, with support for E2E_PROFILE=llm-d selection.
  • Registered LLM-D Profile in E2E main program: Profile binding completed in e2e/cmd/e2e/main.go for unified entry management.
  • Added end-to-end test cases: e2e/testcases/llmd_*.go, covering auto-routing, failover recovery, distributed inference, health/readiness, and performance baseline.
  • CI/CD integration: Added llm-d to test matrix in .github/workflows/integration-test-k8s.yml to ensure PR and main branch stability; no additional dependencies required.
  • Documentation updates: Added LLM-D usage section and troubleshooting guide in e2e/README.md; this file provides detailed test documentation for easy PR reference.
  • Make targets: Added help text and example usage in tools/make/e2e.mk for unified command entry.
  • Health and stability: Added a 30–60 second stabilization period with explicit health check assertions; failure paths provide observable labels and reason codes, and the cleanup phase ensures no environment residue.

Module Boundaries

  • Profile: Responsible for the environment lifecycle (installation, configuration, health checks, cleanup); a minimal interface sketch follows this list.
  • Testcases: Responsible for behavior verification (routing, degradation, observability, etc.).
  • Helper: Encapsulates common low-level operations (K8s resource queries, HTTP calls, retry and timeout).
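
The split above can be captured as a small Go interface. The sketch below is illustrative only; the method set and package name are assumptions, not copied from e2e/profiles:

```go
// A minimal sketch of the Profile boundary; the exact interface in the
// repository may differ.
package profiles

import "context"

// Profile owns the environment lifecycle for one E2E target (here: llm-d).
type Profile interface {
	// Setup installs and configures the stack (manifests, values, RBAC).
	Setup(ctx context.Context) error
	// HealthCheck blocks until the stack is Ready or ctx expires.
	HealthCheck(ctx context.Context) error
	// Cleanup tears the stack down so no residue leaks into the next run.
	Cleanup(ctx context.Context) error
}
```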

Prerequisites

  • Available Kind/K8s cluster (consistent with CI default version).
  • Built or pullable Semantic Router image (with LLM-D decision capability).
  • Basic tools: kubectl, make, and necessary access credentials (not recorded in repository).

Environment Setup and Stabilization Period

  • Execute make e2e-setup E2E_PROFILE=llm-d to complete installation and configuration.
  • Wait 30–60 seconds after installation for stabilization, so that related Pods reach the Ready state and dependencies finish initializing.
  • Health checks include: namespace resources Ready, the service endpoint /healthz returning 200, and necessary dependencies (e.g., upstream model services) reachable; a minimal wait-loop sketch follows.
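
A hedged Go sketch of the stabilization wait, assuming the router service is reachable at baseURL (e.g., via port-forward); the function name, package name, and 5-second poll interval are assumptions, while /healthz comes from the health-check description above:

```go
package llmd

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// waitForHealthy polls /healthz until it returns 200 or ctx expires.
func waitForHealthy(ctx context.Context, baseURL string) error {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		resp, err := http.Get(baseURL + "/healthz")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // stack is up and answering
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("not healthy before deadline: %w", ctx.Err())
		case <-ticker.C:
			// poll again
		}
	}
}
```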

Test Case Overview

| Test Case | Purpose | Key Assertions | Approx Duration |
| --- | --- | --- | --- |
| llmd-auto-routing | Auto model selection correctness | Selected model matches expectation; 200 OK; x-inference-pod header present | Seconds |
| llmd-failover-recovery | Traffic survives backend pod loss | Success rate ≥ 0.95; no routing to deleted pod; endpoints cleaned | Seconds |
| llmd-distributed-inference | Multi-replica backends serve and balance load | Success rate ≥ 0.98; hits on ≥ 2 pods; hit imbalance ratio ≤ 2.0 | Seconds |
| llmd-health-check | Component readiness and basic chat | Required CRDs present; deployments Ready; chat returns 200 | Seconds |
| llmd-performance-baseline | Baseline under moderate concurrency | Stage success rate ≥ 0.95; record p50/p95 latency per stage | Seconds |

Note: Actual duration depends on environment and network; CI prioritizes stability first, then speed.


llmd-auto-routing (Auto Model Selection)

  • Purpose: Given prompts from different domains, the router auto-selects the correct backend model.
  • Input:
    • Prompts targeting math vs networking (e.g., "What is 2+2?", "Explain TCP three-way handshake").
    • Mode auto to enable model auto-selection.
  • Steps:
    • Call chat with model=auto and capture response headers.
    • Derive the selected model from x-vsr-selected-model or x-selected-model; fall back to inferring it from the x-inference-pod prefix (sketched below).
  • Assertions:
    • Selected model equals expectation (e.g., math → phi4-mini, networking → llama3-8b).
    • x-inference-pod present and non-empty; HTTP 200.
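
A hedged Go sketch of the header-derivation step; the function and package names and the pod-name stripping rule are assumptions, while the header names and fallback order come from the steps above:

```go
package llmd

import (
	"net/http"
	"strings"
)

// selectedModel derives the routed model from response headers, in the
// fallback order described above.
func selectedModel(h http.Header) string {
	if m := h.Get("x-vsr-selected-model"); m != "" {
		return m
	}
	if m := h.Get("x-selected-model"); m != "" {
		return m
	}
	// Fallback: strip the replicaset hash and pod suffix from the pod name,
	// e.g. "phi4-mini-5b6d7c8d9f-x2k4p" -> "phi4-mini".
	parts := strings.Split(h.Get("x-inference-pod"), "-")
	if len(parts) > 2 {
		return strings.Join(parts[:len(parts)-2], "-")
	}
	return ""
}
```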

llmd-failover-recovery (Failure and Recovery)

  • Purpose: When the upstream model is unavailable, times out, or has low confidence, trigger an explicit degradation path rather than silently succeeding.
  • Input:
    • Pod deletion to simulate backend loss; service continues to handle traffic.
    • Configuration: Ensure endpoints reflect changes; routing avoids deleted pod.
  • Steps:
    • Delete one replica; send requests during a recovery window; record the success rate and hit distribution (accounting sketched below).
  • Assertions:
    • Success rate ≥ 0.95; no hits routed to deleted pod; endpoints no longer include deleted pod.
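
A hedged Go sketch of the recovery-window accounting; all names here are hypothetical, and send stands in for whatever helper issues one chat request and returns the serving pod (taken from x-inference-pod):

```go
package llmd

import "context"

// failoverStats replays the recovery window and reports the success rate
// and whether any request was routed to the deleted pod.
func failoverStats(ctx context.Context, n int, deletedPod string,
	send func(context.Context) (string, error)) (successRate float64, hitDeleted bool) {
	if n == 0 {
		return 0, false
	}
	ok := 0
	for i := 0; i < n; i++ {
		pod, err := send(ctx)
		if err != nil {
			continue // counts against the >= 0.95 success-rate assertion
		}
		ok++
		if pod == deletedPod {
			hitDeleted = true // violates "no routing to deleted pod"
		}
	}
	return float64(ok) / float64(n), hitDeleted
}
```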

llmd-distributed-inference (Multi-Replica Serving)

  • Purpose: Ensure multi-replica backends serve requests fairly and reliably.
  • Input:
    • Deployments vllm-llama3-8b-instruct and phi4-mini with ≥ 2 ready replicas.
  • Steps:
    • Send concurrent requests and record x-inference-pod per response.
    • Compute the success rate and per-pod hit counts (balance check sketched below).
  • Assertions:
    • Success rate ≥ 0.98; hits appear on ≥ 2 pods; max/min hit ratio ≤ 2.0.
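
A hedged Go sketch of the balance assertion; the function name and map shape are hypothetical, while the thresholds come from the assertions above:

```go
package llmd

import "math"

// balanced checks the fairness assertion: hits maps pod name (from
// x-inference-pod) to the number of responses served by that pod.
func balanced(hits map[string]int) bool {
	if len(hits) < 2 {
		return false // hits must land on at least 2 pods
	}
	min, max := math.MaxInt, 0
	for _, n := range hits {
		if n < min {
			min = n
		}
		if n > max {
			max = n
		}
	}
	// max/min hit ratio must stay within 2.0
	return min > 0 && float64(max)/float64(min) <= 2.0
}
```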

llmd-health-check (Health and Readiness)

  • Purpose: After deployment, component health and readiness probes pass and service endpoints are reachable.
  • Input:
    • K8s resource status and service endpoints (e.g., /healthz).
  • Steps:
    • After the stabilization period, check Pod status, container logs, and probe results (readiness check sketched below).
    • Perform lightweight requests to verify service endpoint responses.
  • Assertions:
    • All related Pods Ready=True, probes pass.
    • Service endpoint returns 200, basic paths are available.
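
A hedged Go sketch of the Pod readiness assertion using client-go; the function name, package name, and namespace handling are assumptions, with clientset construction left to the shared helpers:

```go
package llmd

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// allPodsReady reports whether every pod in the namespace is Ready=True.
func allPodsReady(ctx context.Context, cs kubernetes.Interface, ns string) (bool, error) {
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, p := range pods.Items {
		ready := false
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			return false, nil // at least one pod is not Ready=True
		}
	}
	return true, nil
}
```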

llmd-performance-baseline (Moderate Concurrency Baseline)

  • Purpose: Measure success rate and latency percentiles under staged concurrency.
  • Input:
    • Concurrency stages such as 15/30/60 with model=auto.
  • Steps:
    • Run timed stages; collect per-request durations; perform a single retry on transient failures (percentile computation sketched below).
  • Assertions:
    • Stage success rate ≥ 0.95; compute p50/p95 latency per stage.
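
A hedged Go sketch of the per-stage percentile computation using the nearest-rank method; the suite's exact interpolation may differ, and the function and package names are hypothetical:

```go
package llmd

import (
	"sort"
	"time"
)

// percentiles computes p50/p95 over one stage's per-request latencies.
func percentiles(durations []time.Duration) (p50, p95 time.Duration) {
	if len(durations) == 0 {
		return 0, 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := func(q float64) int { return int(q * float64(len(sorted)-1)) }
	return sorted[idx(0.50)], sorted[idx(0.95)]
}
```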

Execution Methods

  • Run all LLM-D tests: make e2e-test E2E_PROFILE=llm-d
  • Print verbose logs: make e2e-test E2E_PROFILE=llm-d E2E_VERBOSE=1
  • Setup environment only (for debugging): make e2e-setup E2E_PROFILE=llm-d

netlify bot commented Nov 20, 2025

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 14ed091 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/6924678adce3cb0008d857e5 |
| 😎 Deploy Preview | https://deploy-preview-705--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions bot commented Nov 20, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/llm-d/manifests/httproute-services.yaml
  • e2e/profiles/llm-d/manifests/inference-sim.yaml
  • e2e/profiles/llm-d/manifests/rbac.yaml
  • e2e/profiles/llm-d/profile.go
  • e2e/profiles/llm-d/values.yaml
  • e2e/testcases/llmd_auto_routing.go
  • e2e/testcases/llmd_distributed_inference.go
  • e2e/testcases/llmd_failover_recovery.go
  • e2e/testcases/llmd_health_check.go
  • e2e/testcases/llmd_helpers.go
  • e2e/testcases/llmd_performance_baseline.go
  • e2e/README.md
  • e2e/cmd/e2e/main.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-k8s.yml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/e2e.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@samzong samzong changed the title ✨ feat(e2e/llm-d): add LLM-D profile and test cases [DO-NOT-MERGED] ✨ feat(e2e/llm-d): add LLM-D profile and test cases Nov 20, 2025
@samzong samzong force-pushed the e2e-llmd branch 3 times, most recently from 74d5686 to 137d03f on November 21, 2025 08:30
@samzong samzong changed the title [DO-NOT-MERGED] ✨ feat(e2e/llm-d): add LLM-D profile and test cases ✨ feat(e2e/llm-d): add LLM-D profile and test cases Nov 21, 2025
@samzong samzong marked this pull request as ready for review November 21, 2025 18:16
@samzong (Contributor, Author) commented Nov 21, 2025

@Xunzhuo It's ready for review.

@Xunzhuo (Member) commented Nov 22, 2025

Cool, a quick question: are there any test cases that combine VSR features and llm-d?

@samzong (Contributor, Author) commented Nov 22, 2025

@Xunzhuo Yes:

  • VSR => Auto routing: e2e/testcases/llmd_auto_routing.go
  • LLM-d => Distributed inference: e2e/testcases/llmd_distributed_inference.go

@samzong (Contributor, Author) commented Nov 22, 2025

/gemini summary

@samzong (Contributor, Author) commented Nov 25, 2025

@Xunzhuo @rootfs

Hi, I've added details under "What's Changed" and "Test Cases".

Could you please review it and let me know if there are any other issues?

@Xunzhuo (Member) commented Nov 25, 2025

Looking at the test cases, they cover the functionality of llm-d, which is good, but the integration test should add more cases around VSR + llm-d, such as verifying model auto-selection together with llm-d; ideally it should reuse the shared test cases.

@samzong (Contributor, Author) commented Nov 26, 2025

@Xunzhuo Thanks for the review.

I will add VSR + LLM-D integration tests, reuse existing VSR auto-routing cases, verify selected-model consistency with the LLM-D backend, and include the failover path.

@samzong samzong marked this pull request as draft November 26, 2025 01:34

Development

Successfully merging this pull request may close these issues.

[E2E] Add llm-d profile for E2E testing framework
