feat(llm-katan): Add Kubernetes deployment support by noalimoy · Pull Request #710 · vllm-project/semantic-router

noalimoy · 2025-11-20T23:42:47Z

Summary

This PR adds comprehensive Kubernetes deployment support for llm-katan, enabling multi-instance deployments with model aliasing capabilities.

Kubernetes Manifests (Kustomize-based)

Base deployment with security contexts and health probes
PersistentVolumeClaim (5Gi) for efficient model caching
Service (ClusterIP) exposing port 8000
Namespace isolation (llm-katan-system)

Multi-Instance Support (Overlays)

gpt35 overlay: Serves gpt-3.5-turbo alias
claude overlay: Serves claude-3-haiku-20240307 alias
Isolated PVCs per instance (prevents ReadWriteOnce conflicts)
Common labels component for consistent resource labeling

Model Caching Optimization

InitContainer (model-downloader) pre-downloads models to PVC
Smart caching: Skips download if model exists
Uses python:3.11-slim + hf download for ~45MB lightweight init
Main container starts instantly with cached model

Bug Fix (config.py)

Added YLLM_SERVED_MODEL_NAME environment variable support
Previously only worked via CLI arguments
Now enables Kubernetes env-based configuration

Documentation

Comprehensive deployment guide (deploy/docs/README.md)
Architecture explanation (Pod structure, storage, networking)
Kind cluster setup examples
Troubleshooting section with common issues

Test Results

Deployment Validation (Kind Cluster)

Resources Created:

Namespace: llm-katan-system
Deployments: llm-katan-gpt35, llm-katan-claude (both 1/1 Running)
Services: llm-katan-gpt35, llm-katan-claude (ClusterIP, port 8000)
PVCs: llm-katan-models-gpt35, llm-katan-models-claude (both 5Gi Bound)

API Validation:

GPT35 instance
$ curl http://llm-katan-gpt35:8000/v1/models
{"data":[{"id":"gpt-3.5-turbo",...}]}
Claude instance
$ curl http://llm-katan-claude:8000/v1/models
{"data":[{"id":"claude-3-haiku-20240307",...}]}

Motivation

This implementation addresses the need for:

Cloud-native deployments: Production-ready Kubernetes manifests
Multi-instance testing: Run multiple model aliases simultaneously
Efficient resource usage: Model caching prevents redundant downloads
Testing flexibility: Easy overlay creation for new model aliases

The Kustomize structure enables:

Consistent base configuration
Environment-specific customization via overlays
Easy addition of new model aliases without base changes

Related issue: #278

netlify · 2025-11-20T23:42:52Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`b2fc5c8`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/693aefb9e9dce90008086faa
😎 Deploy Preview	https://deploy-preview-710--vllm-semantic-router.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-11-20T23:49:37Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `deploy`

Owners: @rootfs, @Xunzhuo
Files changed:

deploy/kubernetes/llm-katan/README.md
deploy/kubernetes/llm-katan/base/deployment.yaml
deploy/kubernetes/llm-katan/base/kustomization.yaml
deploy/kubernetes/llm-katan/base/namespace.yaml
deploy/kubernetes/llm-katan/base/pvc.yaml
deploy/kubernetes/llm-katan/base/service.yaml
deploy/kubernetes/llm-katan/components/common/kustomization.yaml
deploy/kubernetes/llm-katan/overlays/claude/kustomization.yaml
deploy/kubernetes/llm-katan/overlays/gpt35/kustomization.yaml
deploy/kubernetes/llm-katan/verify-deployment.sh

📁 `e2e-tests`

Owners: @yossiovadia
Files changed:

e2e-tests/llm-katan/README.md
e2e-tests/llm-katan/llm_katan/config.py
e2e-tests/llm-katan/pyproject.toml

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/make/kube.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

yossiovadia · 2025-11-23T22:01:05Z

Hey @noalimoy , i'll try catching u during the week to talks about this one

noalimoy · 2025-11-23T22:09:33Z

Hi @Xunzhuo
I'm working on this issue and need some clarification about the documentation requirements mentioned in it:

Examples for common use cases (CI/CD testing, multi-model setups)
Integration patterns with the main Semantic Router

Could you share more details on what exactly you're expecting for these sections? I want to be sure I'm implementing the intended scope.

Specifically:
Should this include adding a test-llm-katan make target to tools/make/kube.mk that will run in the CI workflow or sth else?

Thanks!

Xunzhuo · 2025-11-25T11:10:20Z

@noalimoy ideally we should replace all the base-model.yaml with qwen0.6B https://github.com/vllm-project/semantic-router/blob/main/deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml

rootfs · 2025-12-06T22:47:44Z

@yossiovadia PTAL, thanks

- Add comprehensive Kustomize manifests (base + overlays for gpt35/claude) - Implement initContainer for efficient model caching using PVC - Fix config.py to read YLLM_SERVED_MODEL_NAME from environment variables - Add deployment documentation with examples for Kind cluster / Minikube This enables running multiple llm-katan instances in Kubernetes, each serving different model aliases while sharing the same underlying model. The overlays (gpt35, claude) demonstrate multi-instance deployments where each instance exposes a different served model name (e.g., gpt-3.5-turbo, claude-3-haiku-20240307) via the API. The served model name now works via environment variables, enabling Kubernetes deployments to expose diffrent model name via the API. Signed-off-by: Noa Limoy <nlimoy@nlimoy-thinkpadp1gen7.raanaii.csb> Signed-off-by: noalimoy <nlimoy@redhat.com>

noalimoy · 2025-12-11T13:24:10Z

Hi @yossiovadia and @Xunzhuo

I'm summarizing the in-depth discussion @Yossi and I had yesterday regarding Issue #278.

Decision: Deployment Path Location

After thorough review, we determined that the correct location for llm-katan's Kubernetes deployment code is deploy/kubernetes/llm-katan/.

Rationale:

While LLM Katan is designed for testing purposes and could logically fit under e2e-tests/, we decided on deploy/kubernetes/ for the following reasons:

Project Alignment: This path aligns with the existing project structure where Kubernetes cluster deployment configurations are located under deploy/kubernetes/ folder
Future Integration: This decision connects with Yossi's vision to add Kubernetes support for a full Semantic Router deployment with LLM Katan integration. This will ensure proper integration with the current implementation of Issue Add Kubernetes Support for LLM Katan #278 (Add Kubernetes Support for LLM Katan)
Consistency: Currently, full deployment options exist via Helm, Docker Compose, and OpenShift, but a pure Kubernetes manifest-based deployment with LLM Katan integration is missing. This PR lays the foundation for that future work.

Additional Context:

This creates a clear separation: deploy/kubernetes/llm-katan/ provides Kubernetes manifests for standalone llm-katan deployment, while future work can add integrated deployments
I've thoroughly tested the deployment with multi-instance configurations (GPT-3.5 and Claude overlays) and verified all endpoints work correctly
A follow-up issue will be created to add full Semantic Router with llm-kata integrated Kubernetes deployment, building on this foundation

yossiovadia · 2025-12-11T16:12:31Z

I see some failures, i think those are not related but some environmental, lets see if those are getting resolved first.

github-actions bot assigned yossiovadia Nov 20, 2025

noalimoy force-pushed the feat/add_kubernetes_support_for_llm_katan/278 branch from 04e7542 to 62e5d54 Compare November 23, 2025 21:56

noalimoy force-pushed the feat/add_kubernetes_support_for_llm_katan/278 branch from 1d38d8e to 82e4a5b Compare December 2, 2025 11:47

github-actions bot assigned rootfs and Xunzhuo Dec 2, 2025

noalimoy force-pushed the feat/add_kubernetes_support_for_llm_katan/278 branch from 82e4a5b to 64b700c Compare December 3, 2025 12:02

rootfs requested a review from yossiovadia December 3, 2025 15:26

samzong mentioned this pull request Dec 4, 2025

Milestone v0.1 — Progress Report #769

Closed

noalimoy force-pushed the feat/add_kubernetes_support_for_llm_katan/278 branch from 64b700c to 7ba97ca Compare December 4, 2025 11:15

noalimoy marked this pull request as ready for review December 6, 2025 22:13

noalimoy requested review from Xunzhuo and rootfs as code owners December 6, 2025 22:13

github-actions bot assigned yuluo-yx Dec 8, 2025

noalimoy force-pushed the feat/add_kubernetes_support_for_llm_katan/278 branch from 9705981 to eef9b68 Compare December 11, 2025 13:13

Merge branch 'main' into feat/add_kubernetes_support_for_llm_katan/278

b2fc5c8

rootfs merged commit c23d5b4 into vllm-project:main Dec 11, 2025
39 of 40 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(llm-katan): Add Kubernetes deployment support#710

feat(llm-katan): Add Kubernetes deployment support#710
rootfs merged 2 commits intovllm-project:mainfrom
noalimoy:feat/add_kubernetes_support_for_llm_katan/278

noalimoy commented Nov 20, 2025

Uh oh!

netlify bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

yossiovadia commented Nov 23, 2025

Uh oh!

noalimoy commented Nov 23, 2025

Uh oh!

Xunzhuo commented Nov 25, 2025

Uh oh!

rootfs commented Dec 6, 2025

Uh oh!

noalimoy commented Dec 11, 2025 •

edited

Loading

Uh oh!

yossiovadia commented Dec 11, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

noalimoy commented Nov 20, 2025

Summary

Kubernetes Manifests (Kustomize-based)

Multi-Instance Support (Overlays)

Model Caching Optimization

Bug Fix (config.py)

Documentation

Test Results

Deployment Validation (Kind Cluster)

Motivation

Uh oh!

netlify bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for vllm-semantic-router ready!

Uh oh!

github-actions bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

👥 vLLM Semantic Team Notification

📁 deploy

📁 e2e-tests

📁 tools

🎉 Thanks for your contributions!

Uh oh!

yossiovadia commented Nov 23, 2025

Uh oh!

noalimoy commented Nov 23, 2025

Uh oh!

Xunzhuo commented Nov 25, 2025

Uh oh!

rootfs commented Dec 6, 2025

Uh oh!

noalimoy commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yossiovadia commented Dec 11, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

netlify bot commented Nov 20, 2025 •

edited

Loading

github-actions bot commented Nov 20, 2025 •

edited

Loading

📁 `deploy`

📁 `e2e-tests`

📁 `tools`

noalimoy commented Dec 11, 2025 •

edited

Loading