102 changes: 102 additions & 0 deletions components/models/README.md
@@ -0,0 +1,102 @@
# DynamoModel Definitions

This directory contains pre-configured `DynamoModel` resources for commonly used models.

## Available Models

### Qwen 3 0.6B
**File:** `qwen3-0.6b.yaml`
- **Size:** ~2GB
- **Use Case:** Testing, development, lightweight inference
- **Public:** Yes (no authentication required)

```bash
kubectl apply -f qwen3-0.6b.yaml -n your-namespace
```

### Llama 3.3 70B Instruct
**File:** `llama-3-70b.yaml`
- **Size:** ~140GB
- **Use Case:** Production inference, high-quality responses
- **Public:** No (gated; requires a Hugging Face token)

```bash
# Create secret first
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="your-token" \
  -n your-namespace

kubectl apply -f llama-3-70b.yaml -n your-namespace
```

## Usage

### 1. Deploy Model

```bash
kubectl apply -f <model-file>.yaml -n your-namespace
```

### 2. Check Status

```bash
# Watch model download progress
kubectl get dynamomodel -n your-namespace -w

# Check detailed status
kubectl describe dynamomodel qwen3-0.6b -n your-namespace

# View download logs
kubectl logs job/qwen3-0.6b-download -n your-namespace -f
```
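The Job name above is hard-coded. The CRD in this change also records the Job under `status.downloadJobName`, so a script can look it up instead of guessing. The following is a minimal sketch under that assumption; `model_status_field` and `download_logs` are hypothetical helper names, not part of the operator.

```bash
# Hypothetical helpers (not shipped with the operator).

# Print one field of a DynamoModel's status, e.g. state, pvcName, downloadJobName.
# Usage: model_status_field <model-name> <namespace> <field>
model_status_field() {
  kubectl get dynamomodel "$1" -n "$2" -o "jsonpath={.status.$3}"
}

# Follow the logs of the download Job recorded in status.downloadJobName.
# Usage: download_logs <model-name> <namespace>
download_logs() {
  job="$(model_status_field "$1" "$2" downloadJobName)"
  if [ -z "$job" ]; then
    echo "no download job recorded for $1" >&2
    return 1
  fi
  kubectl logs "job/$job" -n "$2" -f
}
```

For example, `model_status_field qwen3-0.6b your-namespace state` prints the current lifecycle state, and `download_logs qwen3-0.6b your-namespace` follows the download logs.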

### 3. Reference in Deployment

Once the model state is "Ready", reference it in your `DynamoGraphDeployment`:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
spec:
  modelRef: qwen3-0.6b  # Reference the model by name
  backendFramework: vllm
  services:
    VllmWorker:
      replicas: 1
      resources:
        limits:
          nvidia.com/gpu: "1"
```
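In automation, the wait for "Ready" can be scripted rather than watched by hand. This sketch assumes kubectl 1.23 or newer (for `kubectl wait --for=jsonpath`) and that the controller sets `.status.state` to `Ready` as described in the CRD; `wait_for_model` is a hypothetical helper name.

```bash
# Hypothetical helper: block until the DynamoModel reports state "Ready",
# or fail when the timeout expires (default 30m).
# Usage: wait_for_model <model-name> <namespace> [timeout]
wait_for_model() {
  kubectl wait "dynamomodel/$1" \
    --for=jsonpath='{.status.state}'=Ready \
    -n "$2" \
    --timeout="${3:-30m}"
}
```

A typical sequence would be `wait_for_model qwen3-0.6b your-namespace && kubectl apply -f my-deployment.yaml -n your-namespace`, where `my-deployment.yaml` is a placeholder for the manifest above.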

## Customization

Update the following fields based on your cluster:

- **`pvc.storageClass`**: Use a storage class available in your cluster
- **`pvc.size`**: Size the volume to fit the model, with some headroom
- **`version`**: Pin to a specific commit SHA for production
- **`secretRef`**: Add if the model requires authentication
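
To pin `version`, the commit SHA can be read from the model repository itself. The sketch below assumes the Hugging Face repo is reachable over HTTPS as a Git remote and that `git` is installed; gated repos will additionally need Git credentials. `resolve_hf_sha` is a hypothetical helper name.

```bash
# Hypothetical helper: print the commit SHA that a branch of a
# Hugging Face model repo currently points to (branch defaults to main).
# Usage: resolve_hf_sha <organization/model-name> [branch]
resolve_hf_sha() {
  # git ls-remote prints "<sha><TAB><ref>"; keep only the SHA column.
  git ls-remote "https://huggingface.co/$1" "refs/heads/${2:-main}" | cut -f1
}
```

For example, `resolve_hf_sha Qwen/Qwen3-0.6B` prints a SHA that can be pasted into `version`. The storage classes available for `pvc.storageClass` can be listed with `kubectl get storageclass`.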

## Adding New Models

Create a new YAML file following this template:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-model
spec:
  name: organization/model-name
  version: commit-sha  # Optional
  sourceURL: hf://organization/model-name
  secretRef: secret-name  # Optional
  pvc:
    create: true
    storageClass: your-storage-class
    size: XXGi
    volumeAccessMode: ReadWriteMany
```
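
The CRD also describes `pvc.create: false` together with `pvc.name` as the way to reuse an existing PVC instead of creating one. The sketch below applies such a variant of the template from a script; it assumes the controller honors these fields as the schema describes, and `apply_model_with_existing_pvc` is a hypothetical helper name.

```bash
# Hypothetical helper: apply a DynamoModel that reuses an existing PVC.
# Usage: apply_model_with_existing_pvc <namespace> <existing-pvc-name>
apply_model_with_existing_pvc() {
  kubectl apply -n "$1" -f - <<EOF
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-model
spec:
  name: organization/model-name
  sourceURL: hf://organization/model-name
  pvc:
    create: false   # reuse an existing claim instead of creating one
    name: $2
EOF
}
```

Because `storageClass` and `size` are documented as required only when `create` is true, they are omitted here.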

28 changes: 28 additions & 0 deletions components/models/llama-3-70b.yaml
@@ -0,0 +1,28 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# DynamoModel for Llama 3.3 70B Instruct - production-ready large model
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-3-70b-instruct
spec:
  # Canonical model name from HuggingFace
  name: meta-llama/Llama-3.3-70B-Instruct

  # Version pin - "main" tracks the branch head; use a specific commit SHA for production
  version: main

  # Source URL - HuggingFace Hub
  sourceURL: hf://meta-llama/Llama-3.3-70B-Instruct

  # Secret reference for authentication (required for gated models)
  secretRef: hf-token-secret

  # PVC configuration for model storage
  pvc:
    create: true
    storageClass: standard  # Update with your storage class
    size: 200Gi  # Large model requires significant storage
    volumeAccessMode: ReadWriteMany  # Required for multi-replica deployments

28 changes: 28 additions & 0 deletions components/models/qwen3-0.6b.yaml
@@ -0,0 +1,28 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# DynamoModel for Qwen 3 0.6B - lightweight model for testing and development
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: qwen3-0.6b
spec:
  # Canonical model name from HuggingFace
  name: Qwen/Qwen3-0.6B

  # Version pin (optional) - use a specific commit SHA for reproducibility
  # version: main

  # Source URL - HuggingFace Hub
  sourceURL: hf://Qwen/Qwen3-0.6B

  # Secret reference for authentication (optional for public models)
  # secretRef: hf-token-secret

  # PVC configuration for model storage
  pvc:
    create: true
    storageClass: standard  # Update with your storage class
    size: 10Gi  # Small model, only needs ~2GB but allocate extra space
    volumeAccessMode: ReadWriteMany  # Required for multi-replica deployments

218 changes: 218 additions & 0 deletions deploy/cloud/helm/crds/templates/nvidia.com_dynamomodels.yaml
@@ -0,0 +1,218 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.16.4
    helm.sh/resource-policy: keep
  name: dynamomodels.nvidia.com
spec:
  group: nvidia.com
  names:
    kind: DynamoModel
    listKind: DynamoModelList
    plural: dynamomodels
    shortNames:
    - dm
    singular: dynamomodel
  scope: Namespaced
  versions:
  - name: v1alpha1
    additionalPrinterColumns:
    - jsonPath: .status.state
      name: State
      type: string
    - jsonPath: .spec.name
      name: Model
      type: string
    - jsonPath: .spec.version
      name: Version
      type: string
    - jsonPath: .status.pvcName
      name: PVC
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    schema:
      openAPIV3Schema:
        description: |-
          DynamoModel is the Schema for the dynamomodels API.
          It provides a high-level abstraction for managing model artifacts cached in PVCs in the cluster.
          All jobs referencing the same DynamoModel are guaranteed to use the same artifact,
          preventing drift and simplifying maintenance.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: Spec defines the desired state for this model.
            properties:
              downloaderRef:
                description: |-
                  DownloaderRef is an optional reference to a custom downloader or workflow
                  (e.g., MLFlow or internal tools). Provides extensibility for specialized workflows
                  (internal or third-party).
                type: string
              name:
                description: |-
                  Name is the canonical model name (matches external model repo, e.g. HuggingFace, NGC).
                  Example: "meta-llama/Llama-3.3-70B-Instruct"
                type: string
              pvc:
                description: PVC defines the persistent volume claim configuration for storing the model.
                properties:
                  create:
                    default: true
                    description: Create indicates whether to create a new PVC or use an existing one.
                    type: boolean
                  name:
                    description: Name is the name of the PVC. If not specified, defaults to the DynamoModel name.
                    type: string
                  size:
                    anyOf:
                    - type: integer
                    - type: string
                    description: Size of the volume, used during PVC creation. Required when create is true.
                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                    x-kubernetes-int-or-string: true
                  storageClass:
                    description: StorageClass to be used for PVC creation. Required when create is true.
                    type: string
                  volumeAccessMode:
                    default: ReadWriteMany
                    description: VolumeAccessMode is the volume access mode of the PVC. Defaults to ReadWriteMany.
                    type: string
                required:
                - create
                type: object
              secretRef:
                description: |-
                  SecretRef is an optional reference to a secret needed for accessing the source URL
                  (private repo, S3 credentials, etc.)
                type: string
              sourceURL:
                description: |-
                  SourceURL is the source location of model weights (can be HF, S3, NGC).
                  Ensures flexibility in downstream storage strategies; permits flexible source management and credential injection.
                  Examples: "hf://meta-llama/Llama-3.3-70B-Instruct", "s3://bucket/path/to/model", "ngc://nvidia/model"
                type: string
              version:
                description: |-
                  Version is a version pin (e.g., SHA or tag from source repository).
                  This solves version drift by pinning deployments and benchmarking jobs to the same model artifact.
                type: string
            required:
            - name
            - pvc
            - sourceURL
            type: object
          status:
            description: Status reflects the current observed state of this model.
            properties:
              conditions:
                description: Conditions contains the latest observed conditions of the model.
                items:
                  description: Condition contains details for one aspect of the current state of this API Resource.
                  properties:
                    lastTransitionTime:
                      description: |-
                        lastTransitionTime is the last time the condition transitioned from one status to another.
                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: |-
                        message is a human readable message indicating details about the transition.
                        This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: |-
                        observedGeneration represents the .metadata.generation that the condition was set based upon.
                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                        with respect to the current state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: |-
                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
                        Producers of specific condition types may define expected values and meanings for this field,
                        and whether the values are considered a guaranteed API.
                        The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
              downloadJobName:
                description: DownloadJobName is the name of the Job created to download the model.
                type: string
              lastDownloadTime:
                description: LastDownloadTime is the timestamp of the last successful download.
                format: date-time
                type: string
              pvcName:
                description: PVCName is the name of the PVC created or used for this model.
                type: string
              state:
                description: |-
                  State is a high-level textual status of the model lifecycle.
                  Possible values: "Pending", "Downloading", "Ready", "Failed"
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
8 changes: 8 additions & 0 deletions deploy/cloud/operator/PROJECT
@@ -24,4 +24,12 @@ resources:
  kind: DynamoGraphDeployment
  path: github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/v1alpha1
  version: v1alpha1
- api:
    crdVersion: v1
    namespaced: true
  controller: true
  domain: nvidia.com
  kind: DynamoModel
  path: github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/v1alpha1
  version: v1alpha1
version: "3"