# feat: add helm chart for predefined-models #223
**Open** · lizzy-0323 wants to merge 2 commits into `sgl-project:main` from `lizzy-0323:update-helm-chart`
## Chart.yaml (new file)

```yaml
apiVersion: v2
name: ome-predefined-models
description: OME Predefined Models and Serving Runtimes
type: application
version: 0.1.0
appVersion: "1.16.0"
```
## README.md (new file)
# ome-predefined-models

OME Predefined Models and Serving Runtimes

## Description

This Helm chart provides a collection of predefined models and serving runtimes for OME (Open Model Engine). Instead of manually managing these resources through kustomize, users can now deploy them natively using Helm, with fine-grained control over which models and runtimes to enable.
## Features

- **Predefined Models**: Deploy popular models from various vendors (Meta, DeepSeek, Intfloat, Microsoft, Moonshot AI, NVIDIA)
- **Serving Runtimes**: Support for both vLLM and SRT (SGLang Runtime) configurations
- **Selective Deployment**: Enable or disable specific models and runtimes through values configuration
- **Production Ready**: Includes resource limits, health checks, and monitoring configurations

## Installation

### Prerequisites

- Kubernetes cluster with GPU nodes
- OME CRDs already installed (`ome-crd` chart)
- OME controller running (`ome-resources` chart)

### Install the chart
```bash
helm repo add ome https://sgl-project.github.io/ome
helm repo update

# Install with default values
helm install ome-predefined-models ome/ome-predefined-models

# Or install from a local checkout
helm install ome-predefined-models ./charts/ome-predefined-models
```

### Custom Configuration

Create a `custom-values.yaml` file to choose which models and runtimes to enable:
```yaml
# Set to true to deploy every predefined resource
global:
  enableAll: false

# Enable specific models
models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    llama_4_maverick_17b_128e_instruct_fp8:
      enabled: false

  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true
    deepseek_r1:
      enabled: false

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

# Enable specific runtimes
runtimes:
  vllm:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```

Then install with your custom values:

```bash
helm install ome-predefined-models ./charts/ome-predefined-models -f custom-values.yaml
```
## Supported Models

### Meta/Llama Models

- `llama-3-3-70b-instruct` - Llama 3.3 70B Instruct model
- `llama-4-maverick-17b-128e-instruct-fp8` - Llama 4 Maverick 17B model (FP8)
- `llama-4-scout-17b-16e-instruct` - Llama 4 Scout 17B model

### DeepSeek Models

- `deepseek-v3` - DeepSeek V3 model
- `deepseek-r1` - DeepSeek R1 model

### Intfloat Models

- `e5-mistral-7b-instruct` - E5 Mistral 7B Instruct model

### Microsoft Models

- `phi-3-vision-128k-instruct` - Phi-3 Vision 128K Instruct model

### Moonshot AI Models

- `kimi-k2-instruct` - Kimi K2 Instruct model

### NVIDIA Models

- `llama-3-1-nemotron-ultra-253b-v1` - Llama 3.1 Nemotron Ultra 253B
- `llama-3-3-nemotron-super-49b-v1` - Llama 3.3 Nemotron Super 49B
- `llama-3-1-nemotron-nano-8b-v1` - Llama 3.1 Nemotron Nano 8B

## Supported Runtimes
### vLLM Runtimes

- Optimized for inference workloads
- Built-in OpenAI-compatible API server
- Efficient memory utilization

### SRT (SGLang Runtime) Runtimes

- Advanced serving capabilities
- Support for complex multi-node deployments
- RDMA support for high-performance networking

## Configuration Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `global.enableAll` | bool | `false` | Enable all predefined resources |
| `models.meta.enabled` | bool | `true` | Enable Meta/Llama models |
| `models.deepseek.enabled` | bool | `true` | Enable DeepSeek models |
| `models.intfloat.enabled` | bool | `true` | Enable Intfloat models |
| `models.microsoft.enabled` | bool | `false` | Enable Microsoft models |
| `models.moonshotai.enabled` | bool | `false` | Enable Moonshot AI models |
| `models.nvidia.enabled` | bool | `false` | Enable NVIDIA models |
| `runtimes.vllm.enabled` | bool | `true` | Enable vLLM runtimes |
| `runtimes.srt.enabled` | bool | `true` | Enable SRT runtimes |
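Note that `global.enableAll` takes precedence over the per-resource flags: each template condition is written as `or .Values.global.enableAll <flag>`, so when it is `true` every model and runtime is rendered regardless of the individual `enabled` settings. A minimal values sketch for that case:

```yaml
# Deploy every predefined model and serving runtime.
# Individual enabled flags below this point are ignored, because each
# template guard is "or .Values.global.enableAll <flag>".
global:
  enableAll: true
```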
## Usage Examples

### Deploy Only Essential Models

```yaml
global:
  enableAll: false

models:
  meta:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true

  intfloat:
    enabled: true
    e5_mistral_7b_instruct:
      enabled: true

runtimes:
  vllm:
    enabled: true
    llama_3_3_70b_instruct:
      enabled: true
    e5_mistral_7b_instruct:
      enabled: true
```
### High-Performance Setup with RDMA

```yaml
models:
  deepseek:
    enabled: true
    deepseek_v3:
      enabled: true

runtimes:
  srt:
    enabled: true
    deepseek_rdma:
      enabled: true
```
## Contributing

To add new models or runtimes:

1. Add the configuration to the appropriate template file
2. Update `values.yaml` with the new configuration options
3. Update this README with the new resource information
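For step 1, a new entry can follow the guard-and-manifest pattern used by the existing templates. The vendor `acme` and model `my_new_model` below are placeholders for illustration, not real entries:

```yaml
{{- if or .Values.global.enableAll .Values.models.acme.enabled }}
{{- if or .Values.global.enableAll .Values.models.acme.my_new_model.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: my-new-model
spec:
  vendor: acme
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://acme/My-New-Model
    path: /raid/models/acme/my-new-model
{{- end }}
{{- end }}
```

The matching `values.yaml` addition (step 2) would then declare both flags:

```yaml
models:
  acme:
    enabled: false
    my_new_model:
      enabled: false
```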
## Predefined model templates (new file)
```yaml
{{- if or .Values.global.enableAll .Values.models.meta.enabled }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_3_3_70b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-70b-instruct
spec:
  disabled: false
  displayName: meta.llama-3.3-70b-instruct
  storage:
    storageUri: hf://meta-llama/Llama-3.3-70B-Instruct
    path: /raid/models/meta/llama-3-3-70b-instruct
    key: "hf-token"
  vendor: meta
  version: "1.0.0"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_maverick_17b_128e_instruct_fp8.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-maverick-17b-128e-instruct-fp8
spec:
  vendor: meta
  disabled: false
  displayName: meta.llama-4-maverick-17b-128e-instruct-fp8
  version: "1.0.0"
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.51.0.dev0"
  modelType: llama
  modelArchitecture: Llama4ForConditionalGeneration
  storage:
    storageUri: hf://meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    path: /raid/models/meta/llama-4-maverick-17b-128e-instruct-fp8
    key: "hf-token"
{{- end }}
{{- if or .Values.global.enableAll .Values.models.meta.llama_4_scout_17b_16e_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-4-scout-17b-16e-instruct
spec:
  disabled: false
  displayName: meta.llama-4-scout-17b-16e-instruct
  vendor: meta
  version: "1.0.0"
  storage:
    storageUri: hf://meta-llama/Llama-4-Scout-17B-16E-Instruct
    path: /raid/models/meta/llama-4-scout-17b-16e-instruct
    key: "hf-token"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.deepseek.enabled }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_v3.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-v3
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-V3
    path: /raid/models/deepseek-ai/deepseek-v3
{{- end }}
{{- if or .Values.global.enableAll .Values.models.deepseek.deepseek_r1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: deepseek-r1
spec:
  vendor: deepseek-ai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://deepseek-ai/DeepSeek-R1
    path: /raid/models/deepseek-ai/deepseek-r1
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.intfloat.enabled }}
{{- if or .Values.global.enableAll .Values.models.intfloat.e5_mistral_7b_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: e5-mistral-7b-instruct
spec:
  disabled: false
  displayName: intfloat.e5-mistral-7b-instruct
  storage:
    storageUri: hf://intfloat/e5-mistral-7b-instruct
    path: /raid/models/intfloat/e5-mistral-7b-instruct
  vendor: intfloat
  version: "0.0"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.microsoft.enabled }}
{{- if or .Values.global.enableAll .Values.models.microsoft.phi_3_vision_128k_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: phi-3-vision-128k-instruct
spec:
  disabled: false
  displayName: microsoft.phi-3-vision-128k-instruct
  storage:
    storageUri: hf://microsoft/Phi-3-vision-128k-instruct
    path: /raid/models/microsoft/phi-3-vision-128k-instruct
  vendor: microsoft
  version: "0.1"
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.moonshotai.enabled }}
{{- if or .Values.global.enableAll .Values.models.moonshotai.kimi_k2_instruct.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: kimi-k2-instruct
spec:
  vendor: moonshotai
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://moonshotai/Kimi-K2-Instruct
    path: /raid/models/moonshotai/Kimi-K2-Instruct
{{- end }}
{{- end }}

{{- if or .Values.global.enableAll .Values.models.nvidia.enabled }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_ultra_253b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-ultra-253b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-70B-Instruct
    path: /raid/models/nvidia/llama-3-1-nemotron-ultra-253b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_3_nemotron_super_49b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-3-nemotron-super-49b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.3-Nemotron-Super-49B-v1
    path: /raid/models/nvidia/llama-3-3-nemotron-super-49b-v1
{{- end }}
{{- if or .Values.global.enableAll .Values.models.nvidia.llama_3_1_nemotron_nano_8b_v1.enabled }}
---
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-1-nemotron-nano-8b-v1
spec:
  vendor: nvidia
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1
    path: /raid/models/nvidia/llama-3-1-nemotron-nano-8b-v1
{{- end }}
{{- end }}
```
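Each manifest above sits behind two nested guards: a vendor-level condition and a model-level condition, each of which `global.enableAll` can override. As a rough truth-table sketch of that rendering condition (in Python, purely illustrative, not part of the chart):

```python
def model_rendered(enable_all: bool, vendor_enabled: bool, model_enabled: bool) -> bool:
    """Mirror the nested Helm guards:
      {{- if or .Values.global.enableAll .Values.models.<vendor>.enabled }}
        {{- if or .Values.global.enableAll .Values.models.<vendor>.<model>.enabled }}
    A manifest is emitted only when both conditions hold.
    """
    return (enable_all or vendor_enabled) and (enable_all or model_enabled)

# global.enableAll overrides both per-resource flags:
print(model_rendered(True, False, False))   # True
# vendor enabled but model opted out -> not rendered:
print(model_rendered(False, True, False))   # False
# both vendor and model enabled -> rendered:
print(model_rendered(False, True, True))    # True
```

One consequence of this structure: enabling a model without its vendor flag (or vice versa) renders nothing, so both levels must be set in `values.yaml`.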
**Review comment:**

> The storage path `/raid/models` is hardcoded. This reduces the chart's flexibility, as users might have different storage layouts or permissions. It would be better to make the base path configurable via `values.yaml`.