@benwtrent
Member

Backports the following commits to 8.4:

… after being allocated to node (elastic#88945)

When a model is starting, it has on rare occasions been observed to lock up while trying to restore the model objects to the native process.

This manifests as a trained model stuck in "starting" even though it is assigned to a node. A native process has been started and a task is available on the assigned nodes, but the model state never leaves "starting".
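
The symptom described above can be illustrated with a small, generic sketch. It is not the change made in elastic#88945 and does not use any Elasticsearch API: `start_deployment`, `restore_model`, `timeout_s`, and the state dictionary are all hypothetical names chosen for illustration. The sketch assumes the state only advances once the restore step returns, so a restore that blocks forever would leave the state at "starting" unless something bounds the wait, which is done here with a timeout.

```python
import threading


def start_deployment(restore_model, timeout_s):
    """Hypothetical sketch: run a possibly blocking restore step with a bounded wait.

    Returns a dict describing the final (illustrative) deployment state.
    """
    # The native process and task exist, but the model is not yet usable.
    state = {"routing_state": "starting", "process_started": True}
    done = threading.Event()

    def worker():
        # In the failure mode described above, this call never returns.
        restore_model()
        done.set()

    # Daemon thread so a permanently blocked restore cannot keep the program alive.
    threading.Thread(target=worker, daemon=True).start()

    # Without a bound on this wait, the state would remain "starting" indefinitely.
    if done.wait(timeout_s):
        state["routing_state"] = "started"
    else:
        state["routing_state"] = "failed"
    return state
```

Calling `start_deployment(lambda: None, 1.0)` models a restore that completes, while passing a callable that never returns models the lock-up; in that case the bounded wait surfaces the problem as a failure instead of an indefinite "starting" state.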
@benwtrent benwtrent added :ml Machine learning >bug auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport cloud-deploy Publish cloud docker image for Cloud-First-Testing Team:ML Meta label for the ML team labels Aug 1, 2022
@elasticsearchmachine elasticsearchmachine merged commit 188d56a into elastic:8.4 Aug 1, 2022
@benwtrent benwtrent deleted the backport/8.4/pr-88945 branch August 1, 2022 14:45
@mark-vieira mark-vieira added v8.4.0 and removed v8.4.1 labels Aug 15, 2022

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >bug cloud-deploy Publish cloud docker image for Cloud-First-Testing :ml Machine learning Team:ML Meta label for the ML team v8.4.0


3 participants