6 changes: 6 additions & 0 deletions charts/ome-predefined-models/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
name: ome-predefined-models
description: OME Predefined Models and Serving Runtimes
type: application
version: 0.1.0
appVersion: "1.16.0"
201 changes: 201 additions & 0 deletions charts/ome-predefined-models/README.md
@@ -0,0 +1,201 @@
# ome-predefined-models

![Version: 0.1.0](https://img.shields.io/badge/Version-0.1.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.16.0](https://img.shields.io/badge/AppVersion-1.16.0-informational?style=flat-square)

OME Predefined Models and Serving Runtimes

## Description

This Helm chart provides a collection of predefined models and serving runtimes for OME (Open Model Engine). Instead of manually managing these resources through kustomize, users can now deploy them natively using Helm with fine-grained control over which models and runtimes to enable.

## Features

- **Predefined Models**: Deploy popular models from various vendors (Meta, DeepSeek, Intfloat, Microsoft, Moonshot AI, NVIDIA)
- **Serving Runtimes**: Support for both vLLM and SRT (SGLang Runtime) configurations
- **Selective Deployment**: Enable/disable specific models and runtimes through values configuration
- **Production Ready**: Includes proper resource limits, health checks, and monitoring configurations

## Installation

### Prerequisites

- Kubernetes cluster with GPU nodes
- OME CRDs already installed (`ome-crd` chart)
- OME controller running (`ome-resources` chart)
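
To confirm these prerequisites are in place, you can check for the OME CRDs and the controller. This is a minimal sketch: the `ome.io` API group comes from this chart's templates, while the `ome` namespace is an assumption; adjust it to wherever `ome-resources` was installed.

```bash
# Confirm the OME CRDs are registered (API group taken from this chart's templates)
kubectl get crd | grep ome.io

# Confirm the OME controller is running (namespace "ome" is an assumption)
kubectl get pods -n ome
```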

### Install the chart

```bash
helm repo add ome https://sgl-project.github.io/ome
helm repo update

# Install with default values
helm install ome-predefined-models ome/ome-predefined-models

# Or install from local chart
helm install ome-predefined-models ./charts/ome-predefined-models
```
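
After installation, you can verify that the chart created its resources. The `clusterbasemodels.ome.io` name follows from the `ClusterBaseModel` kind in this chart's templates; the runtime resource name below is an assumption and may differ in your OME version.

```bash
# List the predefined models deployed by the chart
kubectl get clusterbasemodels.ome.io

# List the serving runtimes (resource name assumed; run `kubectl api-resources | grep ome` if it differs)
kubectl get clusterservingruntimes.ome.io
```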

### Custom Configuration

Create a `custom-values.yaml` file to customize which models and runtimes to enable:

```yaml
# Global switch: leave this off to opt in to resources individually
global:
  enableAll: false

# Enable specific models
models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    llama_4_maverick_17b_128e_instruct_fp8:
      enabled: false

  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true
    deepseek_r1:
      enabled: false

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

# Enable specific runtimes
runtimes:
  vllm:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```

Then install with your custom values:

```bash
helm install ome-predefined-models ./charts/ome-predefined-models -f custom-values.yaml
```
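
To preview exactly which resources a given values file will produce before installing anything, a standard `helm template` dry run works well:

```bash
# Render the chart locally and list the resulting resource kinds and names
helm template ome-predefined-models ./charts/ome-predefined-models \
  -f custom-values.yaml | grep -E '^kind:|^  name:'
```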

## Supported Models

### Meta/Llama Models

- `llama-3-3-70b-instruct` - Llama 3.3 70B Instruct model
- `llama-4-maverick-17b-128e-instruct-fp8` - Llama 4 Maverick 17B model (FP8)
- `llama-4-scout-17b-16e-instruct` - Llama 4 Scout 17B model

### DeepSeek Models

- `deepseek-v3` - DeepSeek V3 model
- `deepseek-r1` - DeepSeek R1 model

### Intfloat Models

- `e5-mistral-7b-instruct` - E5 Mistral 7B Instruct model

### Microsoft Models

- `phi-3-vision-128k-instruct` - Phi-3 Vision 128K Instruct model

### Moonshot AI Models

- `kimi-k2-instruct` - Kimi K2 Instruct model

### NVIDIA Models

- `llama-3-1-nemotron-ultra-253b-v1` - Llama 3.1 Nemotron Ultra 253B
- `llama-3-3-nemotron-super-49b-v1` - Llama 3.3 Nemotron Super 49B
- `llama-3-1-nemotron-nano-8b-v1` - Llama 3.1 Nemotron Nano 8B

## Supported Runtimes

### vLLM Runtimes

- Optimized for inference workloads
- Built-in OpenAI-compatible API server
- Efficient memory utilization

### SRT (SGLang Runtime) Runtimes

- Advanced serving capabilities
- Support for complex multi-node deployments
- RDMA support for high-performance networking

## Configuration Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| global.enableAll | bool | `false` | Enable all predefined resources |
| models.meta.enabled | bool | `true` | Enable Meta/Llama models |
| models.deepseek.enabled | bool | `true` | Enable DeepSeek models |
| models.intfloat.enabled | bool | `true` | Enable Intfloat models |
| models.microsoft.enabled | bool | `false` | Enable Microsoft models |
| models.moonshotai.enabled | bool | `false` | Enable Moonshot AI models |
| models.nvidia.enabled | bool | `false` | Enable NVIDIA models |
| runtimes.vllm.enabled | bool | `true` | Enable vLLM runtimes |
| runtimes.srt.enabled | bool | `true` | Enable SRT runtimes |
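
These values can also be set on the command line. Because every template condition has the form `or .Values.global.enableAll <specific flag>`, setting `global.enableAll=true` deploys everything regardless of the per-resource toggles:

```bash
# Deploy every predefined model and serving runtime at once
helm install ome-predefined-models ./charts/ome-predefined-models \
  --set global.enableAll=true
```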

## Usage Examples

### Deploy Only Essential Models

```yaml
global:
  enableAll: false

models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

runtimes:
  vllm:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```

### High-Performance Setup with RDMA

```yaml
models:
  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true

runtimes:
  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
```

## Contributing

To add new models or runtimes:

1. Add the configuration to the appropriate template file
2. Update the `values.yaml` with the new configuration options
3. Update this README with the new resource information
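
Before opening a pull request, it helps to lint the chart and render all templates to catch YAML or templating errors (standard Helm commands):

```bash
# Static checks on the chart structure and values
helm lint ./charts/ome-predefined-models

# Render every resource to make sure new templates parse cleanly
helm template ./charts/ome-predefined-models --set global.enableAll=true > /dev/null
```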
187 changes: 187 additions & 0 deletions charts/ome-predefined-models/templates/models.yaml
@@ -0,0 +1,187 @@
{{- if or .Values.global.enableAll .Values.models.meta.enabled }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_3_3_70b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-70b-instruct
spec:
  disabled: false
  displayName: meta.llama-3.3-70b-instruct
  storage:
    storageUri: hf://meta-llama/Llama-3.3-70B-Instruct
    path: /raid/models/meta/llama-3-3-70b-instruct

> **Review comment (Contributor, severity: high):** The storage path `/raid/models` is hardcoded. This reduces the chart's flexibility, as users might have different storage layouts or permissions. It would be better to make the base path configurable via `values.yaml`:
>
>     path: {{ .Values.global.storageBasePath | default "/raid/models" }}/meta/llama-3-3-70b-instruct

    key: "hf-token"
  vendor: meta
  version: "1.0.0"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_maverick_17b_128e_instruct_fp8.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-maverick-17b-128e-instruct-fp8
spec:
  vendor: meta
  disabled: false
  displayName: meta.llama-4-maverick-17b-128e-instruct-fp8
  version: "1.0.0"
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.51.0.dev0"
  modelType: llama
  modelArchitecture: Llama4ForConditionalGeneration
  storage:
    storageUri: hf://meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    path: /raid/models/meta/llama-4-maverick-17b-128e-instruct-fp8
    key: "hf-token"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_scout_17b_16e_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-scout-17b-16e-instruct
spec:
  disabled: false
  displayName: meta.llama-4-scout-17b-16e-instruct
  vendor: meta
  version: "1.0.0"
  storage:
    storageUri: hf://meta-llama/Llama-4-Scout-17B-16E-Instruct
    path: /raid/models/meta/llama-4-scout-17b-16e-instruct
    key: "hf-token"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.deepseek.enabled }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_v3.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-v3
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-V3
    path: /raid/models/deepseek-ai/deepseek-v3
{{- end }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_r1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-r1
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-R1
    path: /raid/models/deepseek-ai/deepseek-r1
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.intfloat.enabled }}
{{- if or .Values.global.enableAll .Values.models.intfloat.e5_mistral_7b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: e5-mistral-7b-instruct
spec:
  disabled: false
  displayName: intfloat.e5-mistral-7b-instruct
  storage:
    storageUri: hf://intfloat/e5-mistral-7b-instruct
    path: /raid/models/intfloat/e5-mistral-7b-instruct
  vendor: intfloat
  version: "0.0"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.microsoft.enabled }}
{{- if or .Values.global.enableAll .Values.models.microsoft.phi_3_vision_128k_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: phi-3-vision-128k-instruct
spec:
  disabled: false
  displayName: microsoft.phi-3-vision-128k-instruct
  storage:
    storageUri: hf://microsoft/Phi-3-vision-128k-instruct
    path: /raid/models/microsoft/phi-3-vision-128k-instruct
  vendor: microsoft
  version: "0.1"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.moonshotai.enabled }}
{{- if or .Values.global.enableAll .Values.models.moonshotai.kimi_k2_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: kimi-k2-instruct
spec:
  vendor: moonshotai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://moonshotai/Kimi-K2-Instruct
    path: /raid/models/moonshotai/Kimi-K2-Instruct
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.nvidia.enabled }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_ultra_253b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-ultra-253b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-70B-Instruct

> **Review comment (Contributor, severity: high):** The storageUri appears to be incorrect for this model. The model is named `llama-3-1-nemotron-ultra-253b-v1`, which suggests a 253B-parameter model, but the storageUri points to `Llama-3.1-Nemotron-70B-Instruct`. Please verify and correct the URI to ensure the correct model is downloaded.

    path: /raid/models/nvidia/llama-3-1-nemotron-ultra-253b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_3_nemotron_super_49b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-nemotron-super-49b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.3-Nemotron-Super-49B-v1
    path: /raid/models/nvidia/llama-3-3-nemotron-super-49b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_nano_8b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-nano-8b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1
    path: /raid/models/nvidia/llama-3-1-nemotron-nano-8b-v1
{{- end }}
{{- end }}