Original comment by @droberts195:
In LINK REDACTED it was noted that failed jobs would incorrectly reopen if the node they had been running on was restarted. LINK REDACTED fixed this by making `nodeOperation` a no-op for failed or closing jobs.
However, @bleskes pointed out that a better fix might be to prevent the persistent tasks for ML jobs from being allocated at all if the job status is failed or closing. This sounds like a better idea, but some research and testing are required to find out what all the implications and side effects are.
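To make the difference between the two approaches concrete, here is a minimal Python sketch. It does not use the real Elasticsearch persistent tasks API: `JobState`, `run_node_operation` and `select_node` are hypothetical names, and picking the first candidate node is a placeholder assumption rather than how allocation actually works.

```python
from enum import Enum
from typing import Optional, Sequence


class JobState(Enum):
    # Hypothetical set of ML job states, for illustration only.
    OPENING = "opening"
    OPENED = "opened"
    CLOSING = "closing"
    CLOSED = "closed"
    FAILED = "failed"


# States in which the job must not be (re)started after a node restart.
DO_NOT_RUN = {JobState.FAILED, JobState.CLOSING}


def run_node_operation(state: JobState) -> bool:
    """Current fix (hypothetical model): the task is still assigned to a
    node, but the node operation does nothing for failed/closing jobs.
    Returns True if the job would actually be opened on the node."""
    if state in DO_NOT_RUN:
        return False  # no-op: the node holds the task but ignores it
    return True


def select_node(state: JobState, candidates: Sequence[str]) -> Optional[str]:
    """Proposed fix (hypothetical model): refuse to allocate the task.
    Returns the chosen node, or None to leave the task unassigned."""
    if state in DO_NOT_RUN:
        return None  # no node ever receives the task
    # Placeholder policy: take the first candidate, if any.
    return candidates[0] if candidates else None
```

The difference is where the check lives: in the first approach the task is still assigned and every node that receives it has to know to ignore it, whereas in the second the task stays unassigned, so no node does any work for it.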