Handle pod deletes outside workflow lifecycle #1414

logicfox · 2019-06-11T09:39:31Z

This may be a suggestion/feature request. I ran into an issue similar to #893 when Kured restarted a node on which a pod was executing a workflow step. This triggered the handling mechanism here, which marked the workflow step as failed with the message "pod deleted".

Could this scenario be improved with a pre-stop hook injected into the pod spec that notifies the workflow-controller, so that it can better handle cases where a pod has been deleted outside of the workflow lifecycle?

https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
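To make the proposal concrete, here is a rough sketch of what an injected pre-stop hook could look like in a pod spec. The `lifecycle.preStop` field is standard Kubernetes, but the pod name, the image, and especially the notification URL are hypothetical: the thread does not describe any such workflow-controller endpoint, so treat this purely as an illustration of the idea.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workflow-step-example        # hypothetical name, for illustration only
spec:
  terminationGracePeriodSeconds: 30  # the hook has to finish within this window
  containers:
    - name: main
      image: curlimages/curl         # assumption: any image that ships curl would do
      command: ["sh", "-c", "sleep 3600"]
      lifecycle:
        preStop:
          exec:
            # Hypothetical endpoint: no such workflow-controller API is confirmed
            # in this thread. "|| true" keeps a failed notification from mattering.
            command:
              - sh
              - -c
              - "curl -fsS -X POST http://workflow-controller.example/pod-stopping || true"
```

One caveat with this approach: a pre-stop hook only runs on a graceful termination, so a node that disappears abruptly would probably not trigger it.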


audriusrudalevicius · 2020-02-21T13:33:58Z

I ran into the same issue with v2.6.0-rc1. To reproduce, just delete a node running one or several workflow pods. The whole workflow gets stuck in the Running state because some pods had "pod deleted" and were never retried. The expected behaviour is that the pod is rescheduled with a retry. I found that it works with 2.4.3.

whynowy · 2020-02-28T19:07:17Z

@audriusrudalevicius - Could you give more detail about your case? Argo can handle a pod being deleted outside of the workflow lifecycle: in general, if the pod is deleted, the step will be retried (if a retry strategy is in your spec) or marked as failed. I want to see whether a bug is preventing this from working in your case.

audriusrudalevicius · 2020-03-02T16:27:16Z

I found the issue. The problem was in my workflow: after I upgraded Argo from 2.4.3 to 2.6.0, I didn't change the retry parameters. Before, it was {retryStrategy: limit: 10} (the only option supported by that version). After the upgrade I got the message: level=info msg="Node not set to be retried after status: Error". My fault, I didn't notice this message in the workflow-controller logs. The fix was to change the workflow to {retryStrategy: limit: 10, retryPolicy: Always, backoff: ...}. I added backoff because the limit of 10 can be reached quickly. Now it works. Thanks! Maybe this information will help others.
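For anyone landing here later, a minimal sketch of the fix written out as a full workflow template. Only `limit: 10` and `retryPolicy: Always` come from the comment above. The backoff values were not given in the thread, so the ones shown are placeholders, and the workflow name, image, and command are made up for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-on-pod-delete-   # hypothetical name
spec:
  entrypoint: main
  templates:
    - name: main
      retryStrategy:
        limit: 10
        # Retry on both Failed and Error node statuses; a deleted pod is
        # reported as Error, per the log message quoted above.
        retryPolicy: "Always"
        backoff:
          # Placeholder values, not from the thread: tune them so the
          # limit of 10 is not used up while a node is still restarting.
          duration: "30s"
          factor: 2
          maxDuration: "1h"
      container:
        image: alpine:3.19             # illustrative image
        command: [sh, -c, "echo hello"]
```

With only `limit` set, the controller in 2.6.0 logged "Node not set to be retried after status: Error", as quoted above. Setting `retryPolicy` explicitly is what made the deleted pod eligible for retry again.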

simster7 · 2020-03-02T16:29:22Z

Closing, feel free to reopen if necessary


sarabala1979 added the type/feature Feature request label Jul 15, 2019

alexec added type/bug and removed type/feature Feature request labels Feb 21, 2020

alexec added this to the Backlog milestone Feb 21, 2020

whynowy self-assigned this Feb 28, 2020

simster7 closed this as completed Mar 2, 2020
