
Conversation

@samzong (Contributor) commented Nov 20, 2025

Fixes #658

LLM-D Profile Test Cases

Objectives

  • Verify the correctness, fault tolerance, and observability of LLM-based decision routing in standard K8s/Kind environments.
  • Maintain repeatable results, clean environments, and explicit assertions to facilitate CI integration and issue diagnosis.

What's Changed

  • Added LLM-D Profile implementation: e2e/profiles/llm-d/, including installation, configuration, health checks, and cleanup, with support for E2E_PROFILE=llm-d selection.
  • Registered LLM-D Profile in E2E main program: Profile binding completed in e2e/cmd/e2e/main.go for unified entry management.
  • Added end-to-end test cases: e2e/testcases/llmd_*.go, covering auto-routing, failover recovery, distributed inference, health/readiness, and performance baseline.
  • CI/CD integration: Added llm-d to test matrix in .github/workflows/integration-test-k8s.yml to ensure PR and main branch stability; no additional dependencies required.
  • Documentation updates: Added LLM-D usage section and troubleshooting guide in e2e/README.md; this file provides detailed test documentation for easy PR reference.
  • Make targets: Added help text and example usage in tools/make/e2e.mk for unified command entry.
  • Health and stability: Added a 30–60 second stabilization period with explicit health check assertions; failure paths provide observable labels and reason codes, and the cleanup phase ensures no environment residue.

Module Boundaries

  • Profile: Responsible for the environment lifecycle (installation, configuration, health checks, cleanup); a minimal interface sketch follows this list.
  • Testcases: Responsible for behavior verification (routing, degradation, observability, etc.).
  • Helper: Encapsulates common low-level operations (K8s resource queries, HTTP calls, retry and timeout).
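
The split above can be captured as a small Go interface. The sketch below is illustrative only; the method set and package name are assumptions, not copied from e2e/profiles:

```go
// A minimal sketch of the Profile boundary; the exact interface in the
// repository may differ.
package profiles

import "context"

// Profile owns the environment lifecycle for one E2E target (here: llm-d).
type Profile interface {
	// Setup installs and configures the stack (manifests, values, RBAC).
	Setup(ctx context.Context) error
	// HealthCheck blocks until the stack is Ready or ctx expires.
	HealthCheck(ctx context.Context) error
	// Cleanup tears the stack down so no residue leaks into the next run.
	Cleanup(ctx context.Context) error
}
```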

Prerequisites

  • Available Kind/K8s cluster (consistent with CI default version).
  • Built or pullable Semantic Router image (with LLM-D decision capability).
  • Basic tools: kubectl, make, and necessary access credentials (not recorded in repository).

Environment Setup and Stabilization Period

  • Execute make e2e-setup E2E_PROFILE=llm-d to complete installation and configuration.
  • Wait 30–60 seconds after installation for stabilization, so that related Pods reach the Ready state and dependencies finish initializing.
  • Health checks include: namespace resources Ready, the service endpoint /healthz returning 200, and necessary dependencies (e.g., upstream model services) reachable; a minimal wait-loop sketch follows.
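
A hedged Go sketch of the stabilization wait, assuming the router service is reachable at baseURL (e.g., via port-forward); the function name, package name, and 5-second poll interval are assumptions, while /healthz comes from the health-check description above:

```go
package llmd

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// waitForHealthy polls /healthz until it returns 200 or ctx expires.
func waitForHealthy(ctx context.Context, baseURL string) error {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		resp, err := http.Get(baseURL + "/healthz")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // stack is up and answering
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("not healthy before deadline: %w", ctx.Err())
		case <-ticker.C:
			// poll again
		}
	}
}
```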

Test Case Overview

| Test Case | Purpose | Key Assertions | Approx Duration |
| --- | --- | --- | --- |
| llmd-auto-routing | Auto model selection correctness | Selected model matches expectation; 200 OK; x-inference-pod header present | Seconds |
| llmd-failover-recovery | Traffic survives backend pod loss | Success rate ≥ 0.95; no routing to deleted pod; endpoints cleaned | Seconds |
| llmd-distributed-inference | Multi-replica backends serve and balance load | Success rate ≥ 0.98; hits on ≥ 2 pods; hit imbalance ratio ≤ 2.0 | Seconds |
| llmd-health-check | Component readiness and basic chat | Required CRDs present; deployments Ready; chat returns 200 | Seconds |
| llmd-performance-baseline | Baseline under moderate concurrency | Stage success rate ≥ 0.95; record p50/p95 latency per stage | Seconds |

Note: Actual duration depends on environment and network; CI prioritizes stability first, then speed.


llmd-auto-routing (Auto Model Selection)

  • Purpose: Given prompts from different domains, the router auto-selects the correct backend model.
  • Input:
    • Prompts targeting math vs networking (e.g., "What is 2+2?", "Explain TCP three-way handshake").
    • Mode auto to enable model auto-selection.
  • Steps:
    • Call chat with model=auto and capture response headers.
    • Derive the selected model from x-vsr-selected-model or x-selected-model; fall back to inferring it from the x-inference-pod prefix (sketched below).
  • Assertions:
    • Selected model equals expectation (e.g., math → phi4-mini, networking → llama3-8b).
    • x-inference-pod present and non-empty; HTTP 200.
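
A hedged Go sketch of the header-derivation step; the function and package names and the pod-name stripping rule are assumptions, while the header names and fallback order come from the steps above:

```go
package llmd

import (
	"net/http"
	"strings"
)

// selectedModel derives the routed model from response headers, in the
// fallback order described above.
func selectedModel(h http.Header) string {
	if m := h.Get("x-vsr-selected-model"); m != "" {
		return m
	}
	if m := h.Get("x-selected-model"); m != "" {
		return m
	}
	// Fallback: strip the replicaset hash and pod suffix from the pod name,
	// e.g. "phi4-mini-5b6d7c8d9f-x2k4p" -> "phi4-mini".
	parts := strings.Split(h.Get("x-inference-pod"), "-")
	if len(parts) > 2 {
		return strings.Join(parts[:len(parts)-2], "-")
	}
	return ""
}
```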

llmd-failover-recovery (Failure and Recovery)

  • Purpose: When the upstream model is unavailable, times out, or has low confidence, trigger an explicit degradation path rather than silently succeeding.
  • Input:
    • Pod deletion to simulate backend loss; service continues to handle traffic.
    • Configuration: Ensure endpoints reflect changes; routing avoids deleted pod.
  • Steps:
    • Delete one replica; send requests during a recovery window; record the success rate and hit distribution (accounting sketched below).
  • Assertions:
    • Success rate ≥ 0.95; no hits routed to deleted pod; endpoints no longer include deleted pod.
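
A hedged Go sketch of the recovery-window accounting; all names here are hypothetical, and send stands in for whatever helper issues one chat request and returns the serving pod (taken from x-inference-pod):

```go
package llmd

import "context"

// failoverStats replays the recovery window and reports the success rate
// and whether any request was routed to the deleted pod.
func failoverStats(ctx context.Context, n int, deletedPod string,
	send func(context.Context) (string, error)) (successRate float64, hitDeleted bool) {
	if n == 0 {
		return 0, false
	}
	ok := 0
	for i := 0; i < n; i++ {
		pod, err := send(ctx)
		if err != nil {
			continue // counts against the >= 0.95 success-rate assertion
		}
		ok++
		if pod == deletedPod {
			hitDeleted = true // violates "no routing to deleted pod"
		}
	}
	return float64(ok) / float64(n), hitDeleted
}
```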

llmd-distributed-inference (Multi-Replica Serving)

  • Purpose: Ensure multi-replica backends serve requests fairly and reliably.
  • Input:
    • Deployments vllm-llama3-8b-instruct and phi4-mini with ≥ 2 ready replicas.
  • Steps:
    • Send concurrent requests and record x-inference-pod per response.
    • Compute the success rate and per-pod hit counts (balance check sketched below).
  • Assertions:
    • Success rate ≥ 0.98; hits appear on ≥ 2 pods; max/min hit ratio ≤ 2.0.
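
A hedged Go sketch of the balance assertion; the function name and map shape are hypothetical, while the thresholds come from the assertions above:

```go
package llmd

import "math"

// balanced checks the fairness assertion: hits maps pod name (from
// x-inference-pod) to the number of responses served by that pod.
func balanced(hits map[string]int) bool {
	if len(hits) < 2 {
		return false // hits must land on at least 2 pods
	}
	min, max := math.MaxInt, 0
	for _, n := range hits {
		if n < min {
			min = n
		}
		if n > max {
			max = n
		}
	}
	// max/min hit ratio must stay within 2.0
	return min > 0 && float64(max)/float64(min) <= 2.0
}
```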

llmd-health-check (Health and Readiness)

  • Purpose: After deployment, component health and readiness probes pass and service endpoints are reachable.
  • Input:
    • K8s resource status and service endpoints (e.g., /healthz).
  • Steps:
    • After the stabilization period, check Pod status, container logs, and probe results (readiness check sketched below).
    • Perform lightweight requests to verify service endpoint responses.
  • Assertions:
    • All related Pods Ready=True, probes pass.
    • Service endpoint returns 200, basic paths are available.
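
A hedged Go sketch of the Pod readiness assertion using client-go; the function name, package name, and namespace handling are assumptions, with clientset construction left to the shared helpers:

```go
package llmd

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// allPodsReady reports whether every pod in the namespace is Ready=True.
func allPodsReady(ctx context.Context, cs kubernetes.Interface, ns string) (bool, error) {
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, p := range pods.Items {
		ready := false
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			return false, nil // at least one pod is not Ready=True
		}
	}
	return true, nil
}
```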

llmd-performance-baseline (Moderate Concurrency Baseline)

  • Purpose: Measure success rate and latency percentiles under staged concurrency.
  • Input:
    • Concurrency stages such as 15/30/60 with model=auto.
  • Steps:
    • Run timed stages; collect per-request durations; perform a single retry on transient failures (percentile computation sketched below).
  • Assertions:
    • Stage success rate ≥ 0.95; compute p50/p95 latency per stage.
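
A hedged Go sketch of the per-stage percentile computation using the nearest-rank method; the suite's exact interpolation may differ, and the function and package names are hypothetical:

```go
package llmd

import (
	"sort"
	"time"
)

// percentiles computes p50/p95 over one stage's per-request latencies.
func percentiles(durations []time.Duration) (p50, p95 time.Duration) {
	if len(durations) == 0 {
		return 0, 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := func(q float64) int { return int(q * float64(len(sorted)-1)) }
	return sorted[idx(0.50)], sorted[idx(0.95)]
}
```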

Execution Methods

  • Run all LLM-D tests: make e2e-test E2E_PROFILE=llm-d
  • Print verbose logs: make e2e-test E2E_PROFILE=llm-d E2E_VERBOSE=1
  • Setup environment only (for debugging): make e2e-setup E2E_PROFILE=llm-d

netlify bot commented Nov 20, 2025

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 14ed091 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/6924678adce3cb0008d857e5 |
| 😎 Deploy Preview | https://deploy-preview-705--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions bot commented Nov 20, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/llm-d/manifests/httproute-services.yaml
  • e2e/profiles/llm-d/manifests/inference-sim.yaml
  • e2e/profiles/llm-d/manifests/rbac.yaml
  • e2e/profiles/llm-d/profile.go
  • e2e/profiles/llm-d/values.yaml
  • e2e/testcases/llmd_auto_routing.go
  • e2e/testcases/llmd_distributed_inference.go
  • e2e/testcases/llmd_failover_recovery.go
  • e2e/testcases/llmd_health_check.go
  • e2e/testcases/llmd_helpers.go
  • e2e/testcases/llmd_performance_baseline.go
  • e2e/README.md
  • e2e/cmd/e2e/main.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-k8s.yml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/e2e.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@samzong samzong changed the title ✨ feat(e2e/llm-d): add LLM-D profile and test cases [DO-NOT-MERGED] ✨ feat(e2e/llm-d): add LLM-D profile and test cases Nov 20, 2025
@samzong samzong force-pushed the e2e-llmd branch 3 times, most recently from 74d5686 to 137d03f on November 21, 2025 08:30
@samzong samzong changed the title [DO-NOT-MERGED] ✨ feat(e2e/llm-d): add LLM-D profile and test cases ✨ feat(e2e/llm-d): add LLM-D profile and test cases Nov 21, 2025
@samzong samzong marked this pull request as ready for review November 21, 2025 18:16
@samzong (Contributor, Author) commented Nov 21, 2025

@Xunzhuo It's ready for review.

@Xunzhuo (Member) commented Nov 22, 2025

Cool, a quick question: are there any test cases that combine VSR features and llm-d?

@samzong (Contributor, Author) commented Nov 22, 2025

@Xunzhuo Yes:

  • VSR => Auto routing: e2e/testcases/llmd_auto_routing.go
  • LLM-d => Distributed inference: e2e/testcases/llmd_distributed_inference.go

@samzong (Contributor, Author) commented Nov 22, 2025

/gemini summary

@samzong (Contributor, Author) commented Nov 25, 2025

@Xunzhuo @rootfs

Hi, I've added details under "What's Changed" and "Test Cases".

Could you please review it and let me know if there are any other issues?

@Xunzhuo (Member) commented Nov 25, 2025

Looking at the test cases, they cover the functionality of llm-d, which is good, but the integration test should add more cases around VSR + llm-d, such as verifying model auto-selection together with llm-d; ideally it should reuse the shared test cases.

@samzong (Contributor, Author) commented Nov 26, 2025

@Xunzhuo Thanks for the review.

I will add VSR + LLM-D integration tests, reuse existing VSR auto-routing cases, verify selected-model consistency with the LLM-D backend, and include the failover path.

@samzong samzong marked this pull request as draft November 26, 2025 01:34

Development

Successfully merging this pull request may close these issues.

[E2E] Add llm-d profile for E2E testing framework
