AKS issue for taint removal #2934
Comments
Hi pedroamaralm, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
This issue also interferes with deploying and scaling AKS clusters that use Cilium for networking, as Cilium (and likely other similar solutions) needs a taint on fresh nodes to prevent pods from being scheduled there until Cilium has configured the node.
Triage required from @Azure/aks-pm
Hi @pedroamaralm, instead of setting taints through the AKS API and then changing them on the node, you can use an admission controller to add taints to new node objects.
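A minimal sketch of that admission-time approach, using Kyverno (which later comments in this thread also reach for); the policy name and taint key are placeholders, and by default Kyverno's resourceFilters exclude Node objects, so its ConfigMap has to be adjusted before it will act on them:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-startup-taint            # hypothetical policy name
spec:
  background: false                  # request.* variables only exist at admission time
  rules:
    - name: taint-new-nodes
      match:
        any:
          - resources:
              kinds:
                - Node
      preconditions:
        all:
          - key: "{{ request.operation }}"
            operator: Equals
            value: CREATE
      mutate:
        patchStrategicMerge:
          spec:
            taints:
              - key: example.com/node-not-ready   # hypothetical startup taint key
                value: "true"
                effect: NoSchedule
```

Because a taint added this way is not configured on the AKS node pool, removing it later with `kubectl taint nodes <node> example.com/node-not-ready-` is not rejected by the aks-node-validating-webhook.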
Hi @allyford, until yesterday the taint removal worked like a charm. As of today our cluster is broken (pods can't be scheduled any more). Our AKS cluster is deployed to "West Europe". How is this supposed to work in the future? Do we need affinity rules for that? Can we remove taints again? Cheers!
Until yesterday it was working fine. I have a
Hi, the AKS cluster is now unusable for us. All our test environments run on spot instances, so we are blocked. Was this change announced anywhere?
Hi all, thank you for all of your questions and feedback. A couple of solutions for the issues above:
This behavior change was listed in the 2022-04-24 AKS release notes.
This is a breaking change for us too. Clusters where we run Cilium are now unusable. Cilium uses the taint to ensure the Cilium agent pods are deployed before any other workload and that all other pods are started with the Cilium CNI:
Extremely disappointing.
What is the purpose of this feature? It takes away key functionality of k8s. I don't understand what direction you are trying to take this and how taking this away benefits users of AKS. Are there any plans to make it so that taints can be controlled down to the individual node? Managing taints at the nodepool level is a terrible idea.
I just found out about this change when our production deployment failed because Cilium could no longer remove the taint. The cluster was provisioned using Terraform and the taint was set in the same way. Cilium was chosen because it supports something neither of the NetworkPolicy solutions MS offers does: DNS-based egress policies. In this day and age I consider IP-based network policies practically useless for anything residing outside the cluster.
This change is producing issues for us too. We use an initial taint on nodes in certain pools to prevent scheduling of pods on new nodes until the required provisioning, done by a daemon set on the node, has completed and the taint is removed. The mutating admission webhook sounds like a very complicated and brittle workaround for what used to be a simple, Kubernetes-native way of dealing with this.
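For anyone reading along, here is a minimal sketch of that daemonset pattern. It assumes the startup taint was applied by something other than the AKS nodepool API (for example the admission policy sketched earlier), since taints configured on the node pool can no longer be removed this way; the names, taint key and image are placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-prep                        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-prep
  template:
    metadata:
      labels:
        app: node-prep
    spec:
      serviceAccountName: node-prep      # needs RBAC allowing "patch" on nodes
      tolerations:
        - key: example.com/node-not-ready    # tolerate the startup taint so this pod can run
          operator: Exists
      containers:
        - name: prep
          image: bitnami/kubectl:latest      # any image that ships kubectl works
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command: ["/bin/sh", "-c"]
          args:
            - |
              # ... run the node provisioning steps here ...
              # then remove the startup taint from this node (all effects for that key)
              kubectl taint nodes "$NODE_NAME" example.com/node-not-ready- || true
              # keep the pod running so the DaemonSet stays Ready
              sleep infinity
```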
Hello @hacst,
I tested this on k8s version 1.21.7.
@gara-MI thanks. Unfortunately it is not really applicable to my use case, though it might be an option for people specifically wanting to get rid of the spot instance taint throughout the cluster. For us, as the node pool scales up/down we would have to add/remove these mutation resources dynamically for each node as it becomes ready/goes away to replicate what we had. Also, we are not running OPA in our cluster.
Hi @hacst, I see your use case is different; I think this is more useful for others who are using a daemonset to remove taints on spot nodes.
@gara-MI I think this will not work if you have Azure Policy enabled, because Azure manages Gatekeeper and no "mutation" resources are deployed.
I managed to fix our Cilium clusters, here's a brief explanation of how:
I think a simpler workaround for most cases is to delete the aks-node-validating-webhook:
But if you are more conservative, you can just disable it:
To guarantee that the modification is permanent, I have also created a job that re-applies the deactivation every minute and removes all spot node taints:
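A minimal sketch of what such a job could look like, as a CronJob running kubectl every minute; the name, namespace, service account and image are placeholders, and it relies on the kubernetes.azure.com/scalesetpriority=spot label that AKS puts on spot nodes. As the next reply points out, the whole approach also depends on the system validating webhook staying disabled, which stops being possible in later releases:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: remove-spot-taints               # hypothetical name
  namespace: kube-system
spec:
  schedule: "* * * * *"                  # every minute, as described above
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: taint-remover    # needs RBAC allowing list/patch on nodes
          restartPolicy: Never
          containers:
            - name: untaint
              image: bitnami/kubectl:latest    # any image that ships kubectl works
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # strip the spot taint from every spot node; ignore nodes that no longer have it
                  for n in $(kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot -o name); do
                    kubectl taint "$n" kubernetes.azure.com/scalesetpriority- || true
                  done
```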
@camposdelima Excellent find! Had I known about this I wouldn't have bothered with an admission controller.
FYI, this is going to start failing in future releases. You cannot disable system validating webhooks.
What functionality of k8s is this removing? You're not forced to manage taints at the nodepool level, though I'd love to know more about why it's a terrible idea. You can add any taint via the k8s API to any individual node, and also remove it. What you can't do is add nodepool taints via the AKS API and then remove them from individual nodes via the k8s API.
If it was a native k8s way, you could do it all via the k8s API, no? :) But in this case you were using the AKS API to set them and the k8s API to remove them, right? This causes conflicts at reconciliation time for the service.
Listening to the feedback on this thread, it seems all the use cases that relied on the prior behavior are along the lines of node initialization. We can add startup taints, passed to the kubelet, in order to meet this use case. They will not be enforced by AKS or exposed on the AKS API, but set in the kubelet custom settings, and you can then remove them via a k8s API call because they won't be reconciled by the API. Does anyone have a use case this wouldn't meet? Otherwise we can provide this ASAP.
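For context, the kubelet already has a mechanism like this: taints passed at registration time exist only on the Node object and are not reconciled by any cloud API, so they can later be removed with a plain `kubectl taint` call. A minimal sketch of the upstream KubeletConfiguration field (the taint key is a placeholder; how AKS would actually expose this was still being designed at this point in the thread):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Taints applied by the kubelet when it registers its Node object.
# They are not enforced through the AKS API, so a later
# `kubectl taint nodes <name> example.com/node-not-ready-` is not rejected.
registerWithTaints:
  - key: example.com/node-not-ready     # hypothetical startup taint key
    value: "true"
    effect: NoSchedule
```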
To summarize and be more constructive about this, since it's a major blocker for an important project I've been working on at work: for my project requirements I have the interesting and fun challenge of making it work on all the top providers. I have 4 so far, which all work the same way. Recently the project and other projects using it have had much more incentive to use AKS, so we have started working on supporting it, and so far it looks promising and 95% working. But we are blocked on one small part.

Basically we package our solution in a Helm chart, and it has daemonsets that will configure any current and new nodes joining the cluster that match certain labels, so they are ready to pull a variety of large gameserver images in an optimized way. The only super reliable, cross-provider and very simple way we found, after trying many different things in live environments, was to add a temporary taint. This works very well on AWS, GCP, Rancher, Linode and Tencent so far, but on AKS we get that error saying we cannot remove our own taint using the k8s API. (Not sure if the AKS autoscaler has that logic though.) I'm pretty sure any project that wants to set up special nodes is using this method too, which those providers allow for exactly these node setup purposes.

So yeah, here's my rough suggestion, at the AKS nodepool API level:
This is actually very simple, but it requires a certain respect for the customer. Microsoft, don't put arbitrary taints on nodepools, ever. If you think a nodepool should have a taint as a best practice - note it in the documentation, or if it's so critical, put a warning somewhere during provisioning.
Are there any updates on this? I'm looking to utilize spot instances for my customers' dev and test environments. Unfortunately, I rely heavily on resources like Consul, which perform numerous side tasks. To configure them, I currently have to fork the original Helm chart, which isn't ideal. A big thank you to @7wingfly. Your workaround was a lifesaver! However, it would be great if Microsoft could address this issue, as it presents a significant limitation for some of us.
Looking at the next "obvious" alternative to work around this behavior difference between AKS and all the other major and medium k8s cluster providers: does anyone know if we would be able to create a core Node mutating webhook to achieve this, or is there another system in either k8s or AKS that will block that too? Regards,
See #3464 for the startup taints feature currently in development. I'll be posting updates with release timelines on that issue and am happy to answer additional questions there.
Startup taints passed to the kubelet? As in, create a cluster, then edit or patch some configmap somewhere and somehow target a nodepool by what, labels set using the API? While everywhere else you just create a cluster and pass the relevant options to achieve this? It sounds like an AKS-specific workaround, to be honest :/
In the end, what would be absolutely amazing is doing this through the API and the various toolings around it, like Terraform or Pulumi: creating clusters with taints and labels that can be fully managed, deleted or modified using the k8s API later, without this AKS-specific piece of logic that blocks k8s actions on them, contrary to the other 6 providers I'm working with every week. Doing otherwise means, at least for our project and its users, that we and they would need to start modifying their cluster creation flow with something well outside the usual boxes, a new extra flow specifically for AKS. Also, from what I'm seeing in recent GCP updates on this subject and their API, they might be going the route of two sets of options for both labels and taints: one that can't be changed later and one that can. Not an unreasonable way to do it, IMO.
Positive update: a Node mutating webhook on AKS and AWS is so far able to intercept the node create event and mutate/patch its labels and/or taints. It sets the bar high for something that is so trivial elsewhere, but at least it seems to work.
This issue is neither stale nor solved. @allyford, maybe you can give an update on the implementation progress? We are still waiting for a fix for this regression, and some of the workarounds have an ongoing cost for us.
Any update on this issue? How can the taint be removed at the nodepool level?
That solves a subset of the issues discussed in this thread. What I, @hacst and many others want is the ability to assign a taint to a node pool such that new nodes start with that taint, and to then be able to remove the taint from individual nodes at a later point (for example as part of a
The ability to set taints at the nodepool level has the potential to create scale-up loops: if the taints are not removed quickly enough by the user, the autoscaler keeps trying to start new nodes because pods could not be scheduled. The autoscaler provides a feature to prevent such a scenario, using
My application is slightly different from others in this thread, but I have had success with @megakid's approach of using Kyverno and suspect it can generalise quite well. I am using NVIDIA's GPU Operator to configure GPU drivers and MIGs (multi-instance GPUs), but the GPU Operator requires a node reboot before MIG can be used. The problem arose because the scheduler wouldn't notice that MIG was not configured and would therefore schedule jobs on the node just before it rebooted (this caused an elevated rate of job failures). I worked around this by adding 2 Kyverno
Note: I had to configure the Kyverno deployment to be able to manage
This issue has been automatically marked as stale because it has not had any activity for 21 days. It will be closed if no further activity occurs within 7 days of this comment.
This issue will now be closed because it hasn't had any activity for 7 days after going stale. pedroamaralm, feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question, issue or suggestion.
Just found this closed issue. As @pranaysahith said, this still doesn't work: taints are not removed from spot pools. I want to use those pools just like any other pools, without forcing tolerations or using OPA/Kyverno to modify every single deployment; please make the taint optional.
What happened:
Hi, until last month I was able to remove the taint from my nodes, but since then I can't anymore. I haven't made any changes, either to the nodes or to the policies; I just can't remove it.
The following problem occurs:
p@p:~$ kubectl taint nodes aks-lxnode1-xxxxxxx-vmss00000y kubernetes.azure.com/scalesetpriority=spot:NoSchedule-
Error from server: admission webhook "aks-node-validating-webhook.azmk8s.io" denied the request: (UID: XXXXXX-XXXXX-XXXX-XXXX) Taint delete request "kubernetes.azure.com/scalesetpriority=spot:NoSchedule" refused. User is attempting to delete a taint configured on aks node pool "lxnode1".
What you expected to happen:
If I use this command on a cluster I have elsewhere, it works normally:
k8s@node1:~$ kubectl taint nodes node2 kubernetes.azure.com/scalesetpriority=spot:NoSchedule
node/node2 tainted
k8s@node1:~$ kubectl taint nodes node2 kubernetes.azure.com/scalesetpriority=spot:NoSchedule-
node/node2 untainted
How to reproduce it (as minimally and precisely as possible):
kubectl taint nodes aks-lxnode1-xxxxxxxx-vmss00000y kubernetes.azure.com/scalesetpriority=spot:NoSchedule-
Anything else we need to know?:
I use spot.io to manage spot instances, and when an instance is created it is born with the taint. In the schedule we have a job that removed this taint, but now it doesn't remove it anymore, and as a user we can't remove it either. I also have Rancher installed on this cluster, where I'm an administrator, but it still won't let me remove the taint. This problem is not related to spot.io or to the machines being spot, because I brought up a new cluster from scratch with on-demand machines, and after adding the taint I couldn't remove it either.