Add configurable runtimeClassName for gpu-agent (#55)
* Add configurable runtimeClassName for gpu-agent

* Fix helm chart syntax for runtimeClassName

* fix: gpu-agent custom runtime class

---------

Co-authored-by: Wojtek Czekalski <[email protected]>
Telemaco019 and wokalski authored Apr 21, 2024
1 parent b96e507 commit 80f7b39
Showing 4 changed files with 7 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/en/docs/helm-charts/nos/README.md
@@ -35,6 +35,7 @@ The open-source platform for running AI workloads on k8s in an optimized way, bo
| gpuPartitioner.gpuAgent.logLevel | int | `0` | The level of log of the GPU Agent. Zero corresponds to `info`, while values greater or equal than 1 corresponds to higher debug levels. **Must be >= 0**. |
| gpuPartitioner.gpuAgent.reportConfigIntervalSeconds | int | `10` | Interval at which the mig-agent will report to k8s status of the GPUs of the Node |
| gpuPartitioner.gpuAgent.resources | object | `{"limits":{"cpu":"100m","memory":"128Mi"}}` | Sets the resource requests and limits of the GPU Agent container. |
| gpuPartitioner.gpuAgent.runtimeClassName | string | `nil` | The container runtime class name to use for the GPU Agent container. |
| gpuPartitioner.gpuAgent.tolerations | list | `[{"effect":"NoSchedule","key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot"}]` | Sets the tolerations of the GPU Agent Pod. |
| gpuPartitioner.image.pullPolicy | string | `"IfNotPresent"` | Sets the GPU Partitioner Docker image pull policy. |
| gpuPartitioner.image.repository | string | `"ghcr.io/nebuly-ai/nos-gpu-partitioner"` | Sets the GPU Partitioner Docker image. |
1 change: 1 addition & 0 deletions helm-charts/nos/README.md
@@ -35,6 +35,7 @@ The open-source platform for running AI workloads on k8s in an optimized way, bo
| gpuPartitioner.gpuAgent.logLevel | int | `0` | The level of log of the GPU Agent. Zero corresponds to `info`, while values greater or equal than 1 corresponds to higher debug levels. **Must be >= 0**. |
| gpuPartitioner.gpuAgent.reportConfigIntervalSeconds | int | `10` | Interval at which the mig-agent will report to k8s status of the GPUs of the Node |
| gpuPartitioner.gpuAgent.resources | object | `{"limits":{"cpu":"100m","memory":"128Mi"}}` | Sets the resource requests and limits of the GPU Agent container. |
| gpuPartitioner.gpuAgent.runtimeClassName | string | `nil` | The container runtime class name to use for the GPU Agent container. |
| gpuPartitioner.gpuAgent.tolerations | list | `[{"effect":"NoSchedule","key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot"}]` | Sets the tolerations of the GPU Agent Pod. |
| gpuPartitioner.image.pullPolicy | string | `"IfNotPresent"` | Sets the GPU Partitioner Docker image pull policy. |
| gpuPartitioner.image.repository | string | `"ghcr.io/nebuly-ai/nos-gpu-partitioner"` | Sets the GPU Partitioner Docker image. |
@@ -22,6 +22,9 @@ spec:
nos.nebuly.com/gpu-partitioning: mps
priorityClassName: system-node-critical
terminationGracePeriodSeconds: 20
{{- if .Values.gpuPartitioner.gpuAgent.runtimeClassName }}
runtimeClassName: {{ .Values.gpuPartitioner.gpuAgent.runtimeClassName }}
{{- end }}
containers:
- image: "{{ .Values.gpuPartitioner.gpuAgent.image.repository }}:{{ .Values.gpuPartitioner.gpuAgent.image.tag | default .Chart.AppVersion }}"
name: {{ include "gpuAgent.fullname" . }}
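The conditional added to the GPU Agent pod spec relies on Go's `text/template` treating a nil or empty value as false. Because `values.yaml` ships `runtimeClassName:` with no value (YAML null), the field is omitted by default and only rendered when a user sets it. The sketch below reproduces just that logic with the standard library rather than Helm itself; the trimmed-down pod spec is a simplification, and the `nvidia` RuntimeClass name is a hypothetical example that would have to exist in the target cluster.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// podSpecTmpl is a simplified, hypothetical excerpt of the GPU Agent pod spec.
// It keeps only the conditional block from this commit. The "{{-" markers trim
// the preceding newline, so no blank line is left when the field is omitted.
const podSpecTmpl = `priorityClassName: system-node-critical
terminationGracePeriodSeconds: 20
{{- if .Values.gpuPartitioner.gpuAgent.runtimeClassName }}
runtimeClassName: {{ .Values.gpuPartitioner.gpuAgent.runtimeClassName }}
{{- end }}
`

// render executes the template against a values tree shaped like the chart's
// .Values, with runtimeClassName set to the given value (nil mimics the default).
func render(runtimeClassName any) (string, error) {
	tmpl, err := template.New("gpu-agent").Parse(podSpecTmpl)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Values": map[string]any{
			"gpuPartitioner": map[string]any{
				"gpuAgent": map[string]any{
					"runtimeClassName": runtimeClassName,
				},
			},
		},
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// nil (the chart default) omits the field; a non-empty string emits it.
	for _, rc := range []any{nil, "nvidia"} {
		out, err := render(rc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("runtimeClassName=%v:\n%s\n", rc, out)
	}
}
```

An empty string behaves like nil here, since `text/template` treats zero-length strings as false, so leaving the value blank in `values.yaml` or setting it to `""` both keep the pod on the node's default runtime handler.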
2 changes: 2 additions & 0 deletions helm-charts/nos/values.yaml
@@ -232,6 +232,8 @@ gpuPartitioner:
# Zero corresponds to `info`, while values greater or equal than 1 corresponds to higher debug levels.
# **Must be >= 0**.
logLevel: 0
# -- The container runtime class name to use for the GPU Agent container.
runtimeClassName:
image:
# -- Sets the GPU Agent Docker image.
repository: ghcr.io/nebuly-ai/nos-gpu-agent