[ML] Don't allocate persistent tasks for failed or closing jobs #30006

@elasticmachine

Original comment by @droberts195:

In LINK REDACTED it was noted that failed jobs would incorrectly reopen if the node they had been running on was restarted, and a fix for the problem was made in LINK REDACTED. The fix was to make the nodeOperation a no-op for failed or closing jobs.

However, @bleskes pointed out that a better fix might be to prevent allocation of persistent tasks for ML jobs if the status is failed or closing. This sounds like a cleaner approach, but research and testing are required to find out what the implications and side effects are.
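
To make the difference between the two approaches concrete, here is a minimal sketch. Every name in it (`JobState`, `JobAssignmentSketch`, `shouldRunOnNode`, `selectNode`) is hypothetical and is not the actual Elasticsearch persistent tasks API; it only illustrates where the failed/closing check would sit in each case.

```java
import java.util.Optional;

// Hypothetical job states for illustration only; not the real ML job state enum.
enum JobState { OPENING, OPENED, CLOSING, CLOSED, FAILED }

// Hypothetical sketch class; not part of Elasticsearch.
final class JobAssignmentSketch {

    // Existing fix as described above: the task is still allocated to a node,
    // but the node-side operation does nothing for failed or closing jobs.
    static boolean shouldRunOnNode(JobState state) {
        return state != JobState.FAILED && state != JobState.CLOSING;
    }

    // Proposed alternative: refuse to choose a node at all, so the
    // persistent task stays unassigned while the job is failed or closing.
    static Optional<String> selectNode(JobState state, String candidateNode) {
        if (state == JobState.FAILED || state == JobState.CLOSING) {
            return Optional.empty();
        }
        return Optional.of(candidateNode);
    }
}
```

In the first variant the task still occupies a node and the check runs on that node; in the second the check runs at allocation time, which is why the side effects of leaving a task unassigned would need to be investigated.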

Labels: :ml (Machine learning), >bug