We've seen a couple of instances where the `nvidia-device-plugin` pod fails when the apiserver is briefly unavailable.

In this situation, any Pods that request the GPU as a resource cannot be scheduled, since the plugin no longer advertises it to the kubelet.

We see the following in the logs a couple of times, but nothing interesting before or after these messages:
```
/build/cmd/config-manager/main.go:287: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.28.0.1:443/api/v1/nodes?fieldSelector=metadata.name%3Dlab-mtv-qa03&resourceVersion=13505357": net/http: TLS handshake timeout
/build/cmd/config-manager/main.go:287: failed to list *v1.Node: Get "https://172.28.0.1:443/api/v1/nodes?fieldSelector=metadata.name%3Dlab-mtv-qa03&resourceVersion=13505357": net/http: TLS handshake timeout
Trace[1419135208]: "Reflector ListAndWatch" name:/build/cmd/config-manager/main.go:287 (05-Jan-2025 07:08:27.960) (total time: 32816ms):
Trace[1419135208]: ---"Objects listed" error:Get "https://172.28.0.1:443/api/v1/nodes?fieldSelector=metadata.name%3Dlab-mtv-qa03&resourceVersion=13505357": net/http: TLS handshake timeout 32513ms (07:09:00.473)
Trace[1419135208]: [32.816731696s] [32.816731696s] END
/build/cmd/config-manager/main.go:287: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.28.0.1:443/api/v1/nodes?fieldSelector=metadata.name%3Dlab-mtv-qa03&resourceVersion=13505357": net/http: TLS handshake timeout
```
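For context, these messages come from a client-go reflector listing/watching the local Node object with a `fieldSelector` on the node name. The following is only a minimal sketch of what we assume the config-manager is doing based on the log (not the actual plugin code); the node name and resync period are illustrative. A reflector in this setup normally logs the "Failed to watch *v1.Node" error and retries with backoff when the apiserver is unreachable, rather than exiting the process.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config; requests go to the service IP seen in the log (https://172.28.0.1:443).
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodeName := "lab-mtv-qa03" // in the real plugin this would be injected, e.g. via the downward API

	// Watch only this node, mirroring fieldSelector=metadata.name%3D<node> from the failing List request.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("metadata.name", nodeName).String()
		}),
	)

	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			node := newObj.(*corev1.Node)
			fmt.Printf("node %s updated\n", node.Name)
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)

	// While the apiserver is unreachable, the reflector logs list/watch failures
	// (like the TLS handshake timeouts above) and keeps retrying with backoff.
	cache.WaitForCacheSync(stopCh, nodeInformer.HasSynced)
	<-stopCh
}
```

Given that the list/watch failures are transient and normally retried, the open question for us is why the plugin stops advertising the GPU resource to the kubelet after this short outage.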