
ROLLER_ORIGINAL_DESIRED_ON_TAG is not compatible with cluster-autoscaler #43

Open
TwiN opened this issue Apr 24, 2020 · 7 comments


TwiN commented Apr 24, 2020

aws-asg-roller fights with cluster-autoscaler when the latter tries to scale up and the former tries to return to the original desired count.

This should not have been enabled by default, and should've been configurable through an environment variable.

After looking at how this was implemented, making it configurable will not be a simple task at all.


TwiN commented Apr 24, 2020

Honestly, I'm a bit stumped. I can't think of any way to make this configurable cleanly other than reverting the entire change, as it is now used in every core function.

On one hand, it provides a way to survive application restarts; on the other hand, if new nodes are scheduled by cluster-autoscaler while aws-asg-roller tries to return to the original desired count, it will cause availability issues, which is not a risk I can take.

@deitch @outofcoffee do you have any suggestion?


TwiN commented Apr 24, 2020

Yeah that's not going to work.

The problem is that it's not really possible for aws-asg-roller to know which nodes were scaled up as a result of its own change to the desired count of the ASG and which were the result of cluster-autoscaler's change -- nodes are not scaled up instantly; launching takes some time, during which either actor may have adjusted the count.

As a result, it's not possible to update the "real" original desired count with the new nodes from cluster-autoscaler, because they can't be differentiated from the ones spun up because of aws-asg-roller.
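A toy sketch of that attribution problem (all numbers hypothetical): both actors bump the desired count while instances are still launching, and the ASG only reports totals, never which actor requested which instance.

```go
package main

import "fmt"

func main() {
	desired := int64(5) // original ASG desired count
	desired++           // aws-asg-roller adds a surplus node for the roll
	desired += 2        // cluster-autoscaler scales up for pending pods

	// By the time the instances finish launching, all the ASG reports is
	// the total; nothing ties any individual instance to either actor.
	launched := desired - 5
	fmt.Println(launched, "new instances, none attributable")
}
```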

Technically, we could listen to the events created by cluster-autoscaler, but the problem is that cluster-autoscaler actually takes the nodes spun up by aws-asg-roller into consideration, meaning it may send an event that was in fact caused by aws-asg-roller.

What was nice with the previous way of calculating the "original" desired count was that it was recalculated on each run, thus taking cluster-autoscaler's changes into consideration.
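A minimal sketch of that recalculate-each-run idea (names hypothetical, not the actual implementation): deriving the target from the live ASG state each run means any scale-up by cluster-autoscaler raises the current desired count, and the recalculated target follows it automatically.

```go
package main

import "fmt"

// recalcOriginalDesired derives the roll target from the live ASG state on
// every run instead of persisting a value from startup. surplus is the number
// of extra instances the roller itself added for the current replacement step.
func recalcOriginalDesired(currentDesired, surplus int64) int64 {
	return currentDesired - surplus
}

func main() {
	// Roller added 1 surplus node to a 5-node group: target stays 5.
	fmt.Println(recalcOriginalDesired(6, 1)) // 5
	// cluster-autoscaler then scales up by 2: the recalculated target
	// follows it to 7 instead of fighting back down to 5.
	fmt.Println(recalcOriginalDesired(8, 1)) // 7
}
```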


deitch commented Apr 26, 2020

Exactly. Two distinct systems fighting each other. This works fine when each operates in isolation - not every rolled ASG is a k8s cluster, so not everyone has a separate autoscaler working in parallel (or, more correctly, against it) - but when you put the two together, it is a problem.

They each need to find a way to do one of:

  • assume that they are "not alone in the universe" and coordinate (back-end database, messages, something)
  • have their calculations take into account the other

The problem is that each just operates on its own.

Do you recall how cluster-autoscaler does its own calculations, if it stores the "original" and how?


TwiN commented Apr 26, 2020

@deitch As far as I know, cluster-autoscaler publishes its current status in a ConfigMap named cluster-autoscaler-status; however, I don't think it reads from that ConfigMap - it only writes to it.

To summarize, CA does this every scan-interval (defaults to 10 seconds):

  • Checks all events in the cluster
  • If there are any events about pods that cannot be scheduled due to lack of resources, it will look for what ASG matches what that pod needs (node selector, taint, etc.) and scales that ASG up
  • Checks if any pod could be rescheduled onto a different node while keeping the average utilization of all nodes that match that pod's requirements below scale-down-utilization-threshold (default: 0.5, meaning 50% of resources requested)
  • Checks if any node has been unneeded for scale-down-unneeded-time (default: 10 minutes)
  • If a node has been unneeded for scale-down-unneeded-time, CA begins the scaling down process (node becomes unschedulable, pods get rescheduled in different nodes)

The process above varies based on the parameters passed, but that's the gist of it.

If CA restarts, the entire process starts over, which is acceptable because the worst-case scenario is that there are too many nodes (scaling up is instant, while scaling down requires not needing a node for scale-down-unneeded-time). That's completely fine because scaling down is not as critical as scaling up: too many nodes = extra cost, while too few nodes = availability issues.

As of #37, this project is no longer compatible with cluster-autoscaler, because it no longer scales down only old nodes: it can also scale down new nodes in order to return to its original desired count.

aws-asg-roller's primary directive should be only to replace old nodes by new nodes, not decide the final number of nodes.


outofcoffee commented Apr 26, 2020

Hi folks, the original version of the #37 PR had the new behaviour hidden behind a configuration option and retained the original behaviour to preserve existing semantics.

See the references to ROLLER_ORIGINAL_DESIRED_ON_TAG in the PR description for details.

If it would help in the interim whilst the approach is figured out, we could do the same again.


TwiN commented Apr 27, 2020

@outofcoffee Ah, I had seen c1e8e98 but assumed that the configurable portion of it had been removed due to the added complexity (specifically for testing).

@TwiN TwiN changed the title Return to original desired causes several issues with scalability Return to original desired is not compatible with cluster-autoscaler Apr 27, 2020
@TwiN TwiN changed the title Return to original desired is not compatible with cluster-autoscaler ROLLER_ORIGINAL_DESIRED_ON_TAG is not compatible with cluster-autoscaler Apr 27, 2020

TwiN commented Jun 10, 2020

Bit of an update on this: reintroducing ROLLER_ORIGINAL_DESIRED_ON_TAG just ended up surfacing new problems when I leveraged it on my clusters, and I ended up making a new ASG roller that handles almost everything differently. So while this is no longer a problem for me, I encourage others to still keep an eye out for the aforementioned problems.
