Conversation


@hhzhang16 hhzhang16 commented Oct 9, 2025

Overview:

This PR adds a DynamoModel resource, following this enhancement proposal.

Details:

  • Adds the DynamoModel CRD
  • Adds a DynamoModel controller that downloads models to a cache
  • Updates the DGD controller to reference the downloaded model and wait for it to become "ready" before proceeding (see the status sketch below)
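
For context, here is a minimal sketch of the readiness signal the DGD controller could gate on, assuming a standard Kubernetes Conditions pattern on the DynamoModel status; the actual field names are defined by the enhancement proposal and may differ:

# Hypothetical DynamoModel status after a successful download (field names are assumptions)
status:
  conditions:
    - type: Ready                 # the DGD controller waits for this condition
      status: "True"
      reason: ModelDownloaded
      lastTransitionTime: "2025-10-09T17:30:00Z"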

Still TODO:

  • Make PVC creation optional; without a PVC, models download to the node-local filesystem (possible race conditions?)
  • Automatic model path argument injection for all backends (see the sketch after this list)
  • Testing
  • Update examples
  • Add a support matrix -- currently HuggingFace sources only, RWX PVCs only
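
For illustration, the intended argument injection for a vLLM worker could end up looking roughly like the sketch below; the cache mount path and the exact flags the controller injects are assumptions, not the final design (vLLM does accept a --model argument, but how the controller wires it up is still TODO):

# Hypothetical container spec fragment the controller might inject into VllmWorker
containers:
  - name: vllm-worker
    args:
      - "--model"
      - "/model-cache/meta-llama/Llama-3.3-70B-Instruct"  # assumed layout inside the model cache PVC
    volumeMounts:
      - name: model-cache
        mountPath: /model-cache
        readOnly: true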

Example CRDs:

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Basic DynamoModel example with HuggingFace public model
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-3-70b-instruct-v1
  namespace: dynamo-cloud
spec:
  # Canonical model name from HuggingFace
  name: meta-llama/Llama-3.3-70B-Instruct
  
  # Version pin (optional but recommended for production)
  # This ensures all deployments use the exact same model artifact
  version: main
  
  # Source URL - supports hf://, s3://, ngc://, or https://
  sourceURL: hf://meta-llama/Llama-3.3-70B-Instruct
  
  # Secret reference for authentication (required for private models)
  secretRef: hf-token-secret
  
  # PVC configuration for model storage
  pvc:
    create: true
    storageClass: standard  # Update with your storage class
    size: 200Gi  # Adjust based on model size
    volumeAccessMode: ReadWriteMany  # Required for multi-replica deployments

---
# DynamoGraphDeployment that uses the model
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-with-model
  namespace: dynamo-cloud
spec:
  # Reference the DynamoModel at top-level (applies to all services)
  modelRef: llama-3-70b-instruct-v1
  backendFramework: vllm
  services:
    VllmWorker:
      replicas: 2
      resources:
        limits:
          nvidia.com/gpu: "2"
      # Model arguments will be auto-injected by the controller
    Frontend:
      replicas: 1
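
The secretRef: hf-token-secret above assumes a Secret in the same namespace holding a HuggingFace access token. A minimal sketch follows; the key name (HF_TOKEN here) is an assumption about what the controller reads:

apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
  namespace: dynamo-cloud
type: Opaque
stringData:
  HF_TOKEN: <huggingface-access-token>  # key name is an assumption, not confirmed in this PR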

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@hhzhang16 hhzhang16 requested a review from biswapanda October 9, 2025 17:30
@hhzhang16 hhzhang16 self-assigned this Oct 9, 2025
@github-actions github-actions bot added the feat label Oct 9, 2025