We have clusters in EKS where the worker nodes are managed by Karpenter and run as spot instances. The cluster is therefore quite dynamic: nodes appear and disappear every few minutes.
Running helm controller on these nodes is risky, because if a helm install operation runs for a long time, the helm controller pod performing it may be interrupted.
It would be great if helm controller waited for in-flight operations before shutting down (which may still be an issue, since a spot node can be terminated within a 2-minute window), or otherwise ensured that the affected Helm release does not stay stuck in progress.
Another idea would be to run each individual Helm operation in a Job or Pod instead of doing everything in the controller's main loop.
tl;dr: I would like to run helm controller on short-lived nodes without manual cleanups. Right now I run it on Fargate, which is quite expensive compared to spot instances.
See #149 (comment). In combination with a sensible retry configuration, this should ensure that, starting with the next release, releases terminate gracefully (by being marked as "failed") and are then retried once the controller is running on a new node.