NIM KServe Playground

This repository hosts example projects for exploring KServe and NVIDIA NIM, with the goal of integrating NVIDIA NIM into Red Hat OpenShift AI.

  • The pocs folder hosts the various POC scenarios, designed with Kustomize.
  • The builds folder hosts the built manifests of the above POCs, for easy access.

All POC executions require Red Hat OpenShift AI.

POCs

Deployment Types

KServe supports three deployment types; we explored two of them: Serverless and Raw.

Serverless Deployment

Serverless Deployment, the default deployment type for KServe, leverages Knative.

  • Model Used: kserve-sklearnserver
  • POC Instructions: Click here
  • Built Manifests: Click here

Key Takeaways

  • The storageUri specification from the InferenceService triggers KServe's storage initializer container, which downloads the model before the runtime starts (see the sketch below).
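
A minimal sketch of such an InferenceService (the name and storageUri are hypothetical, not taken from the POC):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-example                      # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # a storageUri makes KServe inject the storage initializer init
      # container, which downloads the model before the runtime starts
      storageUri: s3://example-bucket/models/sklearn   # hypothetical location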

Raw Deployment

With Raw Deployment, KServe leverages core Kubernetes resources (Deployment, Service, HPA) instead of Knative.

  • Model Used: kserve-sklearnserver
  • POC Instructions: Click here
  • Built Manifests: Click here

Key Takeaways

  • The storageUri specification from the InferenceService triggers KServe's storage initializer container, which downloads the model before the runtime starts.
  • Annotating the InferenceService with serving.kserve.io/deploymentMode: RawDeployment triggers a Raw Deployment (see the sketch below).
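
A minimal sketch of the annotation in place (the name and storageUri are hypothetical):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-raw                          # hypothetical name
  annotations:
    # switches KServe from Knative to plain Kubernetes resources
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://example-bucket/models/sklearn   # hypothetical location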

Persistence and Caching

Prerequisites!

Before proceeding, grab your NGC API Key and create the following two secret data files (both are git-ignored). The files live in the no-cache POC folder but are used by all scenarios in this context:

# the following will be used in an opaque secret mounted into the runtime
echo "NGC_API_KEY=ngcapikeygoeshere" > pocs/persistence-and-caching/no-cache/ngc.env

# the following will be used as the image pull secret for the underlying runtime deployment
cat << 'EOF' > pocs/persistence-and-caching/no-cache/ngcdockerconfig.json
{
  "auths": {
    "nvcr.io": {
      "username": "$oauthtoken",
      "password": "ngcapikeygoeshere"
    }
  }
}
EOF

No Caching or Persistence

In this scenario, NVIDIA NIM is in charge of downloading the required models; however, the target volume is not persistent, so the download runs for every Pod created, which is reflected in scaling time.

  • Model Used: nvidia-nim-llama3-8b-instruct
  • POC Instructions: Click here
  • Built Manifests: Click here

Key Takeaways

  • The storageUri specification from the InferenceService is NOT required.
  • The NIM_CACHE_PATH environment variable is set to /mnt/models (an emptyDir), as shown in the sketch below.
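
A trimmed sketch of the relevant ServingRuntime bits (the image tag and resource name are hypothetical, not taken from the POC):

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: nvidia-nim-llama3-8b-instruct        # hypothetical name
spec:
  containers:
    - name: kserve-container
      image: nvcr.io/nim/meta/llama3-8b-instruct:latest   # hypothetical tag
      env:
        # with no storageUri, KServe provides an emptyDir at /mnt/models;
        # pointing the NIM cache there means every new Pod re-downloads
        - name: NIM_CACHE_PATH
          value: /mnt/models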

Knative PVC Feature

In this scenario, NVIDIA NIM is in charge of downloading the required models, and the download target is a PVC. Because Serverless Deployment runs on Knative, writing to a PVC requires enabling the following Knative feature flags:

kubernetes.podspec-persistent-volume-claim: "enabled"
kubernetes.podspec-persistent-volume-write: "enabled"

  • Model Used: nvidia-nim-llama3-8b-instruct
  • POC Instructions: Click here
  • Built Manifests: Click here

Key Takeaways

  • The storageUri specification from the InferenceService is NOT required.
  • We added a PVC using OpenShift's default gp3-csi storage class.
  • We added a Volume to the ServingRuntime backed by the above-mentioned PVC.
  • We added a VolumeMount to the ServingRuntime mounting that Volume at /mnt/nim/models.
  • The NIM_CACHE_PATH environment variable is set to the above-mentioned /mnt/nim/models (see the sketch after this list).
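
The flags and the cache PVC could look roughly like the following sketch; the KnativeServing CR shape assumes OpenShift Serverless, and the PVC name, size, and trimmed ServingRuntime are hypothetical (the POC instructions hold the authoritative manifests):

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    # rendered into Knative's config-features ConfigMap
    features:
      kubernetes.podspec-persistent-volume-claim: "enabled"
      kubernetes.podspec-persistent-volume-write: "enabled"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nim-cache                            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-csi                  # OpenShift's default storage class
  resources:
    requests:
      storage: 50Gi                          # hypothetical size
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: nvidia-nim-llama3-8b-instruct        # hypothetical name
spec:
  containers:
    - name: kserve-container
      env:
        - name: NIM_CACHE_PATH
          value: /mnt/nim/models             # NIM caches models on the PVC
      volumeMounts:
        - name: nim-cache
          mountPath: /mnt/nim/models
  volumes:
    - name: nim-cache
      persistentVolumeClaim:
        claimName: nim-cache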

Kserve Raw NIM Deployment

In this scenario, NVIDIA NIM is in charge of downloading the required models, and the download target is a PVC. With KServe's Raw Deployment, writable PVCs work without the Knative feature flags described above, since Knative is not involved.

  • Model Used: nvidia-nim-llama3-8b-instruct
  • POC Instructions: Click here
  • Built Manifests: Click here

Key Takeaways

  • The storageUri specification from the InferenceService is NOT required.
  • We added a PVC using OpenShift's default gp3-csi storage class.
  • We added a Volume to the ServingRuntime backed by the above-mentioned PVC.
  • We added a VolumeMount to the ServingRuntime mounting that Volume at /mnt/nim/models.
  • The NIM_CACHE_PATH environment variable is set to the above-mentioned /mnt/nim/models.
  • Annotating the InferenceService with serving.kserve.io/deploymentMode: RawDeployment triggers a Raw Deployment.
  • We added maxReplicas to the Predictor, which is required for using an HPA (see the sketch after this list).
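
Putting the Raw-specific pieces together, the InferenceService could look roughly like this (names and replica counts are hypothetical):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b-instruct                   # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    minReplicas: 1                           # hypothetical value
    maxReplicas: 2                           # hypothetical value; required for the HPA
    model:
      runtime: nvidia-nim-llama3-8b-instruct # hypothetical runtime name
      modelFormat:
        name: nvidia-nim-llama3-8b-instruct  # hypothetical format name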
