Incompatible strategy detected auto No devices found. Waiting indefinitely #1134

smartlocus · 2025-01-23T17:07:32Z

Hello everyone, I have the k8s-device-plugin running on my second Kubernetes master node, which has a GeForce RTX 2060 GPU. The GPU works fine, and I can run my machine learning training jobs on it using Docker. I don't understand why the k8s-device-plugin container in my Kubernetes cluster cannot see the GPU. I am using CRI-O as the container runtime and have also made it the default using --set-as-default. I would appreciate it if someone could assist me. Below I have included screenshots of the device plugin container issue, along with the output of nvidia-smi, which shows that my GPU is working fine, and confirmation that my crio service is running.

```
I0123 16:54:07.204118 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
d475b2c
commit: d475b2c

I0123 16:54:07.204992 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I0123 16:54:07.205170 1 main.go:245] Starting OS watcher.
I0123 16:54:07.205480 1 main.go:260] Starting Plugins.
I0123 16:54:07.205513 1 main.go:317] Loading configuration.
I0123 16:54:07.206134 1 main.go:342] Updating config with default resource matching patterns.
I0123 16:54:07.206267 1 main.go:353]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": false,
    "mpsRoot": "",
    "nvidiaDriverRoot": "/",
    "nvidiaDevRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "useNodeFeatureAPI": null,
    "deviceDiscoveryStrategy": "auto",
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  },
  "imex": {}
}
I0123 16:54:07.206276 1 main.go:356] Retrieving plugins.
E0123 16:54:07.206553 1 factory.go:112] Incompatible strategy detected auto
E0123 16:54:07.206570 1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E0123 16:54:07.206574 1 factory.go:114] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E0123 16:54:07.206577 1 factory.go:115] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E0123 16:54:07.206580 1 factory.go:116] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
I0123 16:54:07.206583 1 main.go:381] No devices found. Waiting indefinitely.
```
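
For anyone skimming the log above, the lines that matter are the ones whose first character is `E`, which is how klog-style output marks error severity. Below is a minimal sketch of pulling those lines out of a saved log. The function name `error_lines` and the `SAMPLE` excerpt are illustrative assumptions, not part of the device plugin, and the regular expression only covers the line layout visible in this log.

```python
import re

# Matches lines shaped like the ones in the log above, e.g.
#   E0123 16:54:07.206553 1 factory.go:112] message
# Assumption: severity letter, MMDD, time, thread/process id, file:line, message.
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] ?(?P<msg>.*)$"
)


def error_lines(log_text):
    """Return (source_location, message) for each error or fatal line."""
    found = []
    for raw in log_text.splitlines():
        match = KLOG_LINE.match(raw.strip())
        if match and match.group("sev") in ("E", "F"):
            location = f"{match.group('file')}:{match.group('line')}"
            found.append((location, match.group("msg")))
    return found


# A short excerpt copied from the log in this issue.
SAMPLE = """\
I0123 16:54:07.206276 1 main.go:356] Retrieving plugins.
E0123 16:54:07.206553 1 factory.go:112] Incompatible strategy detected auto
E0123 16:54:07.206570 1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
I0123 16:54:07.206583 1 main.go:381] No devices found. Waiting indefinitely.
"""

if __name__ == "__main__":
    for where, message in error_lines(SAMPLE):
        print(where, "->", message)
```

Lines that do not match the pattern, such as the multi-line config dump, are simply skipped.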


elezar · 2025-02-03T12:33:06Z

@smartlocus could you exec into the device plugin container and confirm that you can run nvidia-smi there? If that works, the device plugin should be detecting the available devices. If it does not, then the injection of the driver and devices from the host into the container is not working as expected.

What is your current crio config?
How is the NVIDIA Container Toolkit installed?
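
The checks requested above can be sketched as a small helper that assembles the relevant commands. This is only a sketch under assumptions: the pod name is a hypothetical placeholder, the `kube-system` namespace is a guess at where the plugin is deployed, and `build_checks` / `run_checks` are illustrative names. The `crio config` and `nvidia-ctk --version` commands are meant to be run on the GPU node itself, not inside the pod.

```python
import shutil
import subprocess


def build_checks(pod, namespace="kube-system"):
    """Return the diagnostic commands as argv lists.

    Assumption: `pod` is the device plugin pod scheduled on the GPU node.
    """
    return [
        # 1. Can the plugin container see the driver? Run nvidia-smi inside it.
        ["kubectl", "exec", "-n", namespace, pod, "--", "nvidia-smi"],
        # 2. Print the CRI-O configuration (run on the GPU node).
        ["crio", "config"],
        # 3. Confirm the NVIDIA Container Toolkit CLI is present (run on the GPU node).
        ["nvidia-ctk", "--version"],
    ]


def run_checks(pod, namespace="kube-system", dry_run=True):
    """Print each command; execute it only when dry_run is False and the tool exists."""
    for argv in build_checks(pod, namespace):
        print("$", " ".join(argv))
        if dry_run or shutil.which(argv[0]) is None:
            continue
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    # Hypothetical pod name; look up the real one with `kubectl get pods -n <namespace>`.
    run_checks("nvidia-device-plugin-daemonset-xxxxx")
```

By default the helper only prints the commands, so they can be copied and run by hand on the right machine.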
