-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
removing unnecessary condition #29
Conversation
Signed-off-by: julian3xl <[email protected]>
@St0rmingBr4in or @y2kenny-amd, I'm asking you to review this pull request because you are the latest ones with activity on this repository that seems to be really abandoned :( |
My understanding is the NFD is an add-on. Do you have any documentation to indicate otherwise? |
Yeah, the problem is that if I don't want to install the included NFD because I have one already running on my cluster (that's gonna be the 99,9% of the use cases), I have to set nfd.enabled to false, and because this condition on the chart I'm not able to setup node selectors anymore, and they are really important because otherwise I can't avoid the amd gpu device plugin to be deployed into all my machines on the cluster |
what I'm trying to say is that node selectors is something related with Kubernetes and they should not be dependant on if I'm installing and add-on or not |
I don't understand. If you use NFD on your cluster, you set nfd.enabled to on here to use the node selector. This device plugin does not install NFD for you. |
Ah ok, may be that's the real issue. My intention was for that dependencies to be optional (or may be I can make that dependencies 'softer' some how.) |
well, that's fine. I've seen that in other charts and it's fine to get it installed in case you don't want to do it by yourself but anyway kubernetes standard fields should be always settable in my opinion. |
Because the documentation says that the node selector is set to feature.node.kubernetes.io/pci-0300_1002.present: "true" by default and that's totally fine and acceptable if someone might want to keep it empty, it just needs to set Values.node_selector to "" so this chart would be fitting the standard. |
Um.... I will need to think about this. Not necessarily disagreeing here just need to context switch my brain back into this because I don't work with Helm every day. |
Will node selector issue be resolved anytime soon? Kustomizing helm charts for basic features is a pain. |
To be honest, I don't see the maintainers of this repo really interested on fix this... 🤷 |
Is this still an issue now that the helm dependency uses >= instead of = for nfd when nfd.enabled is true? |
nodeSelector is a standard helm chart property, it exists on every chart so this discussion doesn't make sense, regardless if you're installing NFD or not you should be able to set node selectors, man! that's the charts standard. Go artifacthub.io, check some charts and you will see that almost all of them have the option to set nodeSelectors. I see your point, you want to ensure:
ok, put some helper to merge that selector to .Values.node_selector so users could add their custom nodeSelectors too without having to install nfd from your chart. |
Can you update this pull request to reflect what you mean by this? |
I agree, this is an issue. The situation is that this chart installs NFD, if nfd.enabled=true : However the default value is nfd.enabled=false, and in scenarios discussed here, which would be the case for me too, it must remain false, because I need to install and configure NFD separately myself, not using this chart. That being the case, with nfd disabled, it unnecessarily blocks the use of nodeselectors: https://github.com/ROCm/k8s-device-plugin/blob/master/helm/amd-gpu/templates/deviceplugin-daemonset.yaml#L19 However nodeselectors are a normal built-in k8s feature that should generally always be configurable by chart users if they want. Nodeselectors do not depend on NFD per se, though may rely on labels set by NFD. Unfortunately this chart overloads the IIUC, the change proposed here would result in the default behaviour changing from: An alternative approach would be to add a node_selectors_enabled variable (default false). People who were relying on the previous behaviour of setting nfd.enabled=true in their values would need to also set node_selectors_enabled: true in order to maintain the same behaviour as before. If they upgrade without reading the release notes the failure scenario would be that the device plugin would run on all nodes instead of only the selected nodes. It seems to me the safer and more conservative approach is to avoid unintentionally running on undesired nodes? All things considered, if I have understood the situation correctly, IMHO the simplest and safest way to fix this would be to merge this PR and add a release note to the chart docs saying "if you want to keep the default behaviour of running the plugin on all nodes set @y2kenny what do you think? |
@rptaylor, thanks for the detailed explanation. I think others have wrote similar thing but it didn't 'click' in my head until I read what you wrote so that is much appreciated. As to the solution to this problem, I actually don't think running the plugin on every single node is an issue, because:
If the choice is between running the plugin on all nodes vs no nodes as the default behaviour, I would prefer all because the user is installing the Helm Chart for a reason. Ideally I would have a context sensitive default value but sounds like that is also not welcome because nodeselector is well used/customized by cluster admins. The remaining issue is how to communicate the |
Meaning that it will remain in interruptible sleep indefinitely?
I am not sure how the chart could detect the right default, it depends on what the admin wants to do and if NFD is installed independently. Anyway I'll make a MR.
I think the values of node_selector could just remain the same. People can modify it if they need. |
Resolved by #58 |
This condition is supposed to be necessary if you install the node feature discovery chart included on this chart, but in the real world a production cluster already have that stuff set up and running, so having the proper node selector is mandatory and this condition on the chart avoids it.
Signed-off-by: julian3xl [email protected]