How do I gain access to the cluster autoscaler on GKE? #966
You're correct. On GKE, Cluster Autoscaler is always configured automatically. If you run your own cluster on GCE and have access to the master machine, you can change these settings in the Cluster Autoscaler pod's manifest.
Thanks!
It would be good to be able to do some configuration of CA in GKE; as in the referenced issue, I'd like to reduce the --scale-down-unneeded-time value.
I wonder if emptying a node completely helps the autoscaler to quickly remove it. For interactive analytic applications, the default value of 10 minutes for --scale-down-unneeded-time seems too large.
It helps by eliminating drain time, and it also increases throughput by allowing bulk deletes. As for the default 10-minute wait, it's a compromise of sorts: we don't want users to wait for nodes to be added because we removed them too quickly between jobs. That said, we haven't revised this value for a while, so if you have any feedback regarding this behavior, especially production experience with it, please let us know.
Thanks for the reply. At the moment, we are still implementing a new service and don't have any production-level experience with it yet (but will publish the results when they are ready).
We spin up expensive high-memory instances on demand as slaves for our integration tests. The load is intermittent, so that extra 10 minutes for 10-30 instances, multiple times a day, gets quite expensive.
I wonder if there is any update on the default value of --scale-down-unneeded-time. I think the default of 10 minutes is fine, but I hope GKE will allow users to change the value for their own clusters; anyone who sets --scale-down-unneeded-time to a new value should understand what it actually means. For us, we would like to implement autoscaling logic for an analytics system based on Apache Hive, and we would like to remove nodes as soon as possible once that logic decides to retire them.
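On a self-managed cluster-autoscaler this is just a flag change on the Deployment; GKE's managed autoscaler does not expose it, which is the point of this issue. A minimal sketch with the kubernetes Python client, assuming the common upstream layout (a cluster-autoscaler Deployment in kube-system with its flags under command; some manifests put them under args instead):

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Read the (assumed) self-managed cluster-autoscaler Deployment.
dep = apps.read_namespaced_deployment("cluster-autoscaler", "kube-system")
container = dep.spec.template.spec.containers[0]

# Replace any existing --scale-down-unneeded-time flag with a shorter value.
flags = container.command or []
flags = [f for f in flags if not f.startswith("--scale-down-unneeded-time")]
flags.append("--scale-down-unneeded-time=2m")
container.command = flags

# Push the change; the Deployment controller restarts the autoscaler pod.
apps.patch_namespaced_deployment("cluster-autoscaler", "kube-system", dep)
```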
It would be nice to be able to configure things like skip-nodes-with-system-pods or skip-nodes-with-local-storage; there's a ton of config that we can't touch.
You can now choose a predefined config for more aggressive scale-down: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles (this doesn't help with the flags you listed, but it is what was requested in the comments above).
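For anyone landing here, switching an existing cluster to that profile is a single gcloud call; a small sketch driving it from Python, with placeholder cluster name and zone:

```python
import subprocess

# Switch the (placeholder) cluster to the more aggressive scale-down profile
# documented at the link above. Requires an installed, authenticated gcloud;
# older gcloud releases exposed this under the beta track.
subprocess.run(
    [
        "gcloud", "container", "clusters", "update", "my-cluster",
        "--zone", "us-central1-a",
        "--autoscaling-profile", "optimize-utilization",
    ],
    check=True,
)
```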
I've been using the ... The current workaround I've had to resort to is this: #3322. The other solution I've been considering, so we don't have to maintain a fork of the autoscaler, is building some kind of admission controller to add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to our pods. Being able to just configure the GKE autoscaler would completely solve this, though. Perhaps configuration could be exposed in a ...
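For existing pods, something much lighter than an admission controller also works as a stopgap; a one-shot sketch with the kubernetes Python client that patches the safe-to-evict annotation onto every pod in a hypothetical namespace. It doesn't cover pods created later, which is what the webhook approach solves:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

NAMESPACE = "ci"  # hypothetical namespace holding the pods that block scale-down

# Mark every pod as safe to evict so the autoscaler will drain their nodes.
for pod in core.list_namespaced_pod(NAMESPACE).items:
    core.patch_namespaced_pod(
        pod.metadata.name,
        NAMESPACE,
        {
            "metadata": {
                "annotations": {
                    "cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
                }
            }
        },
    )
```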
Also, I'd prefer to be able to monitor the cluster autoscaler using Prometheus.
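A self-managed cluster-autoscaler does expose Prometheus metrics (the upstream binary serves them on port 8085 by default); GKE's managed one is not reachable, which is the limitation here. A tiny sketch that assumes a cluster-autoscaler Service exists in kube-system:

```python
import requests

# Pull the autoscaler's metrics endpoint (default --address is :8085) and
# print only the cluster_autoscaler_* series. The Service name is an assumption.
resp = requests.get(
    "http://cluster-autoscaler.kube-system.svc.cluster.local:8085/metrics",
    timeout=5,
)
for line in resp.text.splitlines():
    if line.startswith("cluster_autoscaler_"):
        print(line)
```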
Not sure why this is closed; I guess Google doesn't prioritize features that save people money.
@seeruk Curious: did you solve this? Did you end up writing that admission controller? Funny, I was thinking of writing the same thing.
Yeah, it was a really simple one in the end and it's still working to this day! Unfortunately it's closed source currently. |
In 1.22+, GKE no longer blocks scale-down on pods with local storage (https://cloud.google.com/kubernetes-engine/docs/release-notes#October_27_2021), so an admission controller may no longer be needed.
I just open sourced our pod labeler in case it is useful for anyone. You can use it to add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation: https://github.com/troop-dev/k8s-pod-labeler
I use a custom cluster-autoscaler on GKE to test my PRs. If anyone's interested, I wrote a blog post about how to deploy your own cluster-autoscaler on GKE: https://vadasambar.com/post/kubernetes/how-to-deploy-custom-ca-on-gcp/ If you don't want to jump to the blog post, here's a summarized version:
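The general idea is to run the upstream cluster-autoscaler image yourself as a Deployment, pointed at the node pools' managed instance groups via the GCE cloud provider. A minimal sketch with the kubernetes Python client; the image tag, MIG name, and service account are placeholders, it assumes GKE autoscaling is disabled on those node pools, and the pod still needs GCP compute permissions (e.g. via Workload Identity). The blog post covers the full setup:

```python
from kubernetes import client, config

config.load_kube_config()

MIG_NAME = "gke-my-cluster-default-pool-abc123-grp"  # placeholder MIG name

container = client.V1Container(
    name="cluster-autoscaler",
    image="registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0",  # or your own build
    command=[
        "./cluster-autoscaler",
        "--cloud-provider=gce",
        f"--nodes=1:10:{MIG_NAME}",       # min:max:MIG to manage
        "--scale-down-unneeded-time=2m",  # now fully configurable
    ],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="custom-cluster-autoscaler", namespace="kube-system"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "custom-cluster-autoscaler"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "custom-cluster-autoscaler"}),
            spec=client.V1PodSpec(
                service_account_name="cluster-autoscaler",  # bound to a GCP SA with compute access
                containers=[container],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment("kube-system", deployment)
```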
@vadasambar Do your cluster-autoscaler logs show that it manages only one instance group and ignores the others?
@Nickmman, not sure if this answers your question, but I have used the custom cluster-autoscaler (multiple times) to manage only one instance group, and it works fine for me. If you check this comment, you will see I have two instance-group-backed node pools called ...
I am looking to modify some of the autoscaling options, but this does not seem to be possible on GKE.
It's not clear where to set the 'flags' mentioned in the FAQ, or which component these command-line flags need to be passed to.
A similar issue is brought up here:
https://stackoverflow.com/questions/48963625/where-to-config-the-kubernetes-cluster-autoscaler-on-google-cloud