# feat: add helm chart for predefined-models #223
**Open** · lizzy-0323 wants to merge 2 commits into `sgl-project:main` from `lizzy-0323:update-helm-chart`
## Chart.yaml (new file)

```yaml
apiVersion: v2
name: ome-predefined-models
description: OME Predefined Models and Serving Runtimes
type: application
version: 0.1.0
appVersion: "1.16.0"
```
## README.md (new file)
# ome-predefined-models

OME Predefined Models and Serving Runtimes

## Description

This Helm chart provides a collection of predefined models and serving runtimes for OME (Open Model Engine). Instead of manually managing these resources through kustomize, users can now deploy them natively using Helm, with fine-grained control over which models and runtimes to enable.
## Features

- **Predefined Models**: Deploy popular models from various vendors (Meta, DeepSeek, Intfloat, Microsoft, Moonshot AI, NVIDIA)
- **Serving Runtimes**: Support for both vLLM and SRT (SGLang Runtime) configurations
- **Selective Deployment**: Enable or disable specific models and runtimes through values configuration
- **Production Ready**: Includes resource limits, health checks, and monitoring configurations

## Installation

### Prerequisites

- Kubernetes cluster with GPU nodes
- OME CRDs already installed (`ome-crd` chart)
- OME controller running (`ome-resources` chart)

### Install the chart
```bash
helm repo add ome https://sgl-project.github.io/ome
helm repo update

# Install with default values
helm install ome-predefined-models ome/ome-predefined-models

# Or install from a local checkout
helm install ome-predefined-models ./charts/ome-predefined-models
```

### Custom Configuration

Create a `custom-values.yaml` file to choose which models and runtimes to enable:
```yaml
# Set to true to deploy every predefined resource
global:
  enableAll: false

# Enable specific models
models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    llama_4_maverick_17b_128e_instruct_fp8:
      enabled: false

  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true
    deepseek_r1:
      enabled: false

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

# Enable specific runtimes
runtimes:
  vllm:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```

Then install with your custom values:

```bash
helm install ome-predefined-models ./charts/ome-predefined-models -f custom-values.yaml
```
## Supported Models

### Meta/Llama Models

- `llama-3-3-70b-instruct` - Llama 3.3 70B Instruct model
- `llama-4-maverick-17b-128e-instruct-fp8` - Llama 4 Maverick 17B model (FP8)
- `llama-4-scout-17b-16e-instruct` - Llama 4 Scout 17B model

### DeepSeek Models

- `deepseek-v3` - DeepSeek V3 model
- `deepseek-r1` - DeepSeek R1 model

### Intfloat Models

- `e5-mistral-7b-instruct` - E5 Mistral 7B Instruct model

### Microsoft Models

- `phi-3-vision-128k-instruct` - Phi-3 Vision 128K Instruct model

### Moonshot AI Models

- `kimi-k2-instruct` - Kimi K2 Instruct model

### NVIDIA Models

- `llama-3-1-nemotron-ultra-253b-v1` - Llama 3.1 Nemotron Ultra 253B
- `llama-3-3-nemotron-super-49b-v1` - Llama 3.3 Nemotron Super 49B
- `llama-3-1-nemotron-nano-8b-v1` - Llama 3.1 Nemotron Nano 8B

## Supported Runtimes
### vLLM Runtimes

- Optimized for inference workloads
- Built-in OpenAI-compatible API server
- Efficient memory utilization

### SRT (SGLang Runtime) Runtimes

- Advanced serving capabilities
- Support for complex multi-node deployments
- RDMA support for high-performance networking

## Configuration Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `global.enableAll` | bool | `false` | Enable all predefined resources |
| `models.meta.enabled` | bool | `true` | Enable Meta/Llama models |
| `models.deepseek.enabled` | bool | `true` | Enable DeepSeek models |
| `models.intfloat.enabled` | bool | `true` | Enable Intfloat models |
| `models.microsoft.enabled` | bool | `false` | Enable Microsoft models |
| `models.moonshotai.enabled` | bool | `false` | Enable Moonshot AI models |
| `models.nvidia.enabled` | bool | `false` | Enable NVIDIA models |
| `runtimes.vllm.enabled` | bool | `true` | Enable vLLM runtimes |
| `runtimes.srt.enabled` | bool | `true` | Enable SRT runtimes |
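Note that `global.enableAll` takes precedence over the per-resource flags: each template condition is written as `or .Values.global.enableAll <flag>`, so when it is `true` every model and runtime is rendered regardless of the individual `enabled` settings. A minimal values sketch for that case:

```yaml
# Deploy every predefined model and serving runtime.
# Individual enabled flags below this point are ignored, because each
# template guard is "or .Values.global.enableAll <flag>".
global:
  enableAll: true
```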
## Usage Examples

### Deploy Only Essential Models

```yaml
global:
  enableAll: false

models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

runtimes:
  vllm:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```
### High-Performance Setup with RDMA

```yaml
models:
  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true

runtimes:
  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
```
## Contributing

To add new models or runtimes:

1. Add the configuration to the appropriate template file
2. Update `values.yaml` with the new configuration options
3. Update this README with the new resource information
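For step 1, a new entry can follow the guard-and-manifest pattern used by the existing templates. The vendor `acme` and model `my_new_model` below are placeholders for illustration, not real entries:

```yaml
{{- if or .Values.global.enableAll .Values.models.acme.enabled }}
{{- if or .Values.global.enableAll .Values.models.acme.my_new_model.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: my-new-model
spec:
  vendor: acme
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://acme/My-New-Model
    path: /raid/models/acme/my-new-model
{{- end }}
{{- end }}
```

The matching `values.yaml` addition (step 2) would then declare both flags:

```yaml
models:
  acme:
    enabled: false
    my_new_model:
      enabled: false
```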
## Predefined model templates (new file)
```yaml
{{- if or .Values.global.enableAll .Values.models.meta.enabled }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_3_3_70b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-70b-instruct
spec:
  disabled: false
  displayName: meta.llama-3.3-70b-instruct
  storage:
    storageUri: hf://meta-llama/Llama-3.3-70B-Instruct
    path: /raid/models/meta/llama-3-3-70b-instruct
    key: "hf-token"
  vendor: meta
  version: "1.0.0"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_maverick_17b_128e_instruct_fp8.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-maverick-17b-128e-instruct-fp8
spec:
  vendor: meta
  disabled: false
  displayName: meta.llama-4-maverick-17b-128e-instruct-fp8
  version: "1.0.0"
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.51.0.dev0"
  modelType: llama
  modelArchitecture: Llama4ForConditionalGeneration
  storage:
    storageUri: hf://meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    path: /raid/models/meta/llama-4-maverick-17b-128e-instruct-fp8
    key: "hf-token"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_scout_17b_16e_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-scout-17b-16e-instruct
spec:
  disabled: false
  displayName: meta.llama-4-scout-17b-16e-instruct
  vendor: meta
  version: "1.0.0"
  storage:
    storageUri: hf://meta-llama/Llama-4-Scout-17B-16E-Instruct
    path: /raid/models/meta/llama-4-scout-17b-16e-instruct
    key: "hf-token"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.deepseek.enabled }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_v3.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-v3
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-V3
    path: /raid/models/deepseek-ai/deepseek-v3
{{- end }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_r1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-r1
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-R1
    path: /raid/models/deepseek-ai/deepseek-r1
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.intfloat.enabled }}
{{- if or .Values.global.enableAll .Values.models.intfloat.e5_mistral_7b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: e5-mistral-7b-instruct
spec:
  disabled: false
  displayName: intfloat.e5-mistral-7b-instruct
  storage:
    storageUri: hf://intfloat/e5-mistral-7b-instruct
    path: /raid/models/intfloat/e5-mistral-7b-instruct
  vendor: intfloat
  version: "0.0"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.microsoft.enabled }}
{{- if or .Values.global.enableAll .Values.models.microsoft.phi_3_vision_128k_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: phi-3-vision-128k-instruct
spec:
  disabled: false
  displayName: microsoft.phi-3-vision-128k-instruct
  storage:
    storageUri: hf://microsoft/Phi-3-vision-128k-instruct
    path: /raid/models/microsoft/phi-3-vision-128k-instruct
  vendor: microsoft
  version: "0.1"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.moonshotai.enabled }}
{{- if or .Values.global.enableAll .Values.models.moonshotai.kimi_k2_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: kimi-k2-instruct
spec:
  vendor: moonshotai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://moonshotai/Kimi-K2-Instruct
    path: /raid/models/moonshotai/Kimi-K2-Instruct
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.nvidia.enabled }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_ultra_253b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-ultra-253b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-70B-Instruct
    path: /raid/models/nvidia/llama-3-1-nemotron-ultra-253b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_3_nemotron_super_49b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-nemotron-super-49b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.3-Nemotron-Super-49B-v1
    path: /raid/models/nvidia/llama-3-3-nemotron-super-49b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_nano_8b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-nano-8b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1
    path: /raid/models/nvidia/llama-3-1-nemotron-nano-8b-v1
{{- end }}
{{- end }}
```
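Each manifest above sits behind two nested guards: a vendor-level condition and a model-level condition, each of which `global.enableAll` can override. As a rough truth-table sketch of that rendering condition (in Python, purely illustrative, not part of the chart):

```python
def model_rendered(enable_all: bool, vendor_enabled: bool, model_enabled: bool) -> bool:
    """Mirror the nested Helm guards:
      {{- if or .Values.global.enableAll .Values.models.<vendor>.enabled }}
        {{- if or .Values.global.enableAll .Values.models.<vendor>.<model>.enabled }}
    A manifest is emitted only when both conditions hold.
    """
    return (enable_all or vendor_enabled) and (enable_all or model_enabled)

# global.enableAll overrides both per-resource flags:
print(model_rendered(True, False, False))   # True
# vendor enabled but model opted out -> not rendered:
print(model_rendered(False, True, False))   # False
# both vendor and model enabled -> rendered:
print(model_rendered(False, True, True))    # True
```

One consequence of this structure: enabling a model without its vendor flag (or vice versa) renders nothing, so both levels must be set in `values.yaml`.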
**Review comment:**

> The storage path `/raid/models` is hardcoded. This reduces the chart's flexibility, as users might have different storage layouts or permissions. It would be better to make the base path configurable via `values.yaml`.