
Conversation

@benwtrent
Member

Backports the following commits to 7.x:

This provides an autoscaling service integration for machine learning.

The underlying logic is fairly straightforward, with a couple of caveats:
- When considering whether to scale up or down, ML automatically translates between node size and the memory the node will provide for ML once the scaling plan is implemented.
- If our knowledge of job sizes is out of date, we make a best-effort check for scaling up. If the decision cannot be made from the current view of job memory, we trigger a refresh and return a no_scale event.
- If the automatic memory percent calculation is in use, we assume all nodes have the same JVM size.
- For scale down, we keep the time of our last scale-down calculation in memory. Consequently, if the master node changes in the meantime, the scale-down delay resets.
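The node-size-to-ML-memory translation described in the first bullet can be sketched roughly as follows. The class name, method signature, and the percentage-based heuristic are illustrative assumptions for this sketch, not the actual Elasticsearch implementation:

```java
// Hypothetical sketch: deriving the memory a node of a given size could
// offer to ML, and inverting that to pick a node size for a required
// amount of ML memory. The linear percent model is an assumption.
public class MlMemorySketch {

    // Memory ML could use on a node, given the configured max ML memory percent.
    static long mlMemoryFromNodeSize(long nodeBytes, int maxMlMemoryPercent) {
        return nodeBytes * maxMlMemoryPercent / 100;
    }

    // Inverse: the smallest node size whose ML share covers the requirement.
    // Rounds up so the returned node always provides at least requiredMlBytes.
    static long nodeSizeForMlMemory(long requiredMlBytes, int maxMlMemoryPercent) {
        return (requiredMlBytes * 100 + maxMlMemoryPercent - 1) / maxMlMemoryPercent;
    }

    public static void main(String[] args) {
        long node = 8L * 1024 * 1024 * 1024; // an 8 GiB node
        long forMl = mlMemoryFromNodeSize(node, 30);
        System.out.println("ML memory on 8 GiB node at 30%: " + forMl);
        System.out.println("Node size needed for that much ML memory: "
                + nodeSizeForMlMemory(forMl, 30));
    }
}
```

Round-tripping through both directions matters for the decider: a scale-up request is phrased in required ML memory, but the plan is executed in node sizes, so the translation must not under-provision after rounding.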
@benwtrent benwtrent added :ml Machine learning backport labels Nov 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent merged commit caf35c9 into elastic:7.x Nov 17, 2020
@benwtrent benwtrent deleted the backport/7.x/pr-59309 branch November 17, 2020 17:49