gpu-operator-nfd-worker fails to read net interface attribute speed #658
Comments
@blackliner thanks for reporting this. This was also reported to NFD upstream: kubernetes-sigs/node-feature-discovery#1556, and a fix has been merged: kubernetes-sigs/node-feature-discovery#1557. The fix is not in a released version of NFD yet, but we will pick it up as soon as it is out.
Getting this exact issue, but on the loopback device instead.
Any updates on when a fix for this might be merged, or whether there are any workarounds? It would be massively appreciated 🥲🙏
I am getting the same error.
Same error on GKE.
Same error.
Same error on a k3s agent; the master node works fine.
Use the latest version (0.17 or 0.16.*) to fix this problem.
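For anyone stuck on an affected release, a possible workaround is upgrading to a GPU Operator release that bundles a fixed NFD. A minimal sketch, assuming a Helm-based install with release name gpu-operator in the gpu-operator namespace (both names are assumptions; adjust to your cluster):

```sh
# Upgrade the GPU Operator to a release that ships a fixed NFD (chart version shown is illustrative).
helm repo update
helm upgrade gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --version v24.6.1
```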
@blackliner can you verify if this issue is fixed in the latest GPU Operator version? GPU Operator 24.6.1 uses NFD v0.16.3, which should contain the fix for this issue.
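A quick way to confirm which NFD image the operator-managed workers are actually running; the DaemonSet name and namespace below are assumptions based on a default Helm release named gpu-operator:

```sh
# Print the image used by the NFD worker DaemonSet deployed alongside the GPU Operator.
kubectl get ds gpu-operator-node-feature-discovery-worker \
  --namespace gpu-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```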
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
1. Quick Debug Information
2. Issue or feature description
It looks like these USB ethernet gadget devices do not support reading out the speed property:
On the host
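A minimal sketch of the failing read, assuming a hypothetical gadget interface name usb0; reading the sysfs speed attribute returns EINVAL on interfaces with no negotiated link speed:

```sh
# Reading the speed attribute of a USB Ethernet gadget interface fails
# (the interface name usb0 is an assumption).
cat /sys/class/net/usb0/speed
# cat: /sys/class/net/usb0/speed: Invalid argument
```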
Network details
Ethtool also doesn't show any relevant info:
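An illustrative check, with the interface name again being an assumption; the driver does not report usable speed/duplex settings for these devices:

```sh
# Query link settings via ethtool (interface name is an assumption).
ethtool usb0
```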
Relevant links:
3. Steps to reproduce the issue
Get one of the newest products from Supermicro with this "feature", install the GPU Operator, and watch NFD fail to register any labels on that node.
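For reference, a sketch of the install step using NVIDIA's public Helm repository (namespace and release name are assumptions):

```sh
# Install the GPU Operator via Helm; NFD is deployed by default as part of the chart.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```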
4. Information to attach (optional if deemed irrelevant)
kubectl get pods -n OPERATOR_NAMESPACE
kubectl get ds -n OPERATOR_NAMESPACE
kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME
kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers
nvidia-smi
From the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi
journalctl -u containerd > containerd.log
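When attaching logs, a focused sketch for pulling the specific failure out of the NFD worker logs; the namespace and label selector are assumptions based on a default install:

```sh
# Grep the nfd-worker logs for the failed 'speed' attribute read.
kubectl logs -l app.kubernetes.io/name=node-feature-discovery \
  --namespace gpu-operator --all-containers | grep -i speed
```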
Collecting full debug bundle (optional):
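A sketch of collecting the bundle with the repository's must-gather script (the raw URL and path are assumptions; check the repository for the authoritative location):

```sh
# Download and run the must-gather script from the gpu-operator repository.
curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh
chmod +x must-gather.sh
./must-gather.sh
```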
NOTE: please refer to the must-gather script for debug data collected.
This bundle can be submitted to us via email: [email protected]