Daemon containers not getting destroyed when enabling POD_NAMES as v2 #8692

Closed
3 tasks done
KesavanKing opened this issue May 10, 2022 · 5 comments · Fixed by #8708
Labels
area/controller Controller issues, panics type/bug


Checklist

  • Double-checked my configuration.
  • Tested using the latest version.
  • Used the Emissary executor.

Summary

What happened/what you expected to happen?

With POD_NAMES=v2
I started the workflow controller with the environment variable POD_NAMES=v2 and submitted a simple workflow with a daemon container. The workflow succeeded, but the daemon container kept running. The controller logs show that no kill signal was sent to the daemon pod.

Without POD_NAMES=v2
When I remove the environment variable, the kill signal is sent to the daemon pod as expected. I suspect the daemon pod name is not being constructed correctly under v2.
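One way to see how the name can come out wrong: in the v2 logs the controller created the pod test-zkw9z-daemoned-431270118, but queued shutdownPod for argo/test-zkw9z-431270118, which is the node ID. The Go sketch below is a simplified, hypothetical model of the two naming schemes, not Argo's source. The helpers nodeID and podName are illustrative, and it assumes both names use an FNV-32a hash of the node name, which is consistent with the shared 431270118 suffix in the logs.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PodNameVersion models the two POD_NAMES settings (assumed values "v1" and "v2").
type PodNameVersion string

const (
	PodNameV1 PodNameVersion = "v1"
	PodNameV2 PodNameVersion = "v2"
)

// fnvHash returns the FNV-32a hash of s (assumed hashing scheme).
func fnvHash(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// nodeID builds "<workflow>-<hash(nodeName)>" (hypothetical helper).
func nodeID(workflowName, nodeName string) string {
	if nodeName == workflowName {
		return workflowName
	}
	return fmt.Sprintf("%s-%d", workflowName, fnvHash(nodeName))
}

// podName returns the pod name for a node (hypothetical helper).
// v1: the pod name is simply the node ID.
// v2: the template name is inserted: "<workflow>-<template>-<hash(nodeName)>".
func podName(workflowName, nodeName, templateName, id string, version PodNameVersion) string {
	if version == PodNameV1 {
		return id
	}
	if nodeName == workflowName {
		return workflowName
	}
	prefix := workflowName
	if templateName != "" {
		prefix = prefix + "-" + templateName
	}
	return fmt.Sprintf("%s-%d", prefix, fnvHash(nodeName))
}

func main() {
	wf, node, tmpl := "test-zkw9z", "test-zkw9z.daemoned", "daemoned"
	id := nodeID(wf, node)
	fmt.Println("node ID:    ", id)
	fmt.Println("v1 pod name:", podName(wf, node, tmpl, id, PodNameV1))
	fmt.Println("v2 pod name:", podName(wf, node, tmpl, id, PodNameV2))
}
```

Under v1 the node ID doubles as the pod name, so any code that addresses pods by node ID keeps working; under v2 it silently points at a pod that does not exist.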

What version are you running? 3.3.5

Diagnostics


apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: daemon-check
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: daemoned
            template: daemoned
          - name: daemon-dependent
            dependencies: [daemoned]
            template: daemon-dependent
    - name: daemoned
      daemon: true
      container:
        image: argoproj/argosay:v2
        command: ["sleep"]
        args: ["600"]
    - name: daemon-dependent
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["hello world"]
Logs with POD_NAMES=v2
time="2022-05-10T03:39:05.853Z" level=info msg="Processing workflow" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.858Z" level=info msg="Get configmaps 200"
time="2022-05-10T03:39:05.859Z" level=info msg="resolved artifact repository" artifactRepositoryRef="argo/#"
time="2022-05-10T03:39:05.859Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.859Z" level=info msg="DAG node test-zkw9z initialized Running" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.859Z" level=info msg="All of node test-zkw9z.daemoned dependencies [] completed" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.859Z" level=info msg="Pod node test-zkw9z-431270118 initialized Pending" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.864Z" level=info msg="Create events 201"
time="2022-05-10T03:39:05.890Z" level=info msg="Create pods 201"
time="2022-05-10T03:39:05.891Z" level=info msg="Created pod: test-zkw9z.daemoned (test-zkw9z-daemoned-431270118)" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.892Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.892Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:05.902Z" level=info msg="Update workflows 200"
time="2022-05-10T03:39:05.903Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=205931334 workflow=test-zkw9z
time="2022-05-10T03:39:05.911Z" level=info msg="Create events 201"
time="2022-05-10T03:39:10.634Z" level=info msg="Get leases 200"
time="2022-05-10T03:39:10.639Z" level=info msg="Update leases 200"
time="2022-05-10T03:39:15.648Z" level=info msg="Get leases 200"
time="2022-05-10T03:39:15.654Z" level=info msg="Update leases 200"
time="2022-05-10T03:39:15.896Z" level=info msg="Processing workflow" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.897Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=test-zkw9z
time="2022-05-10T03:39:15.897Z" level=info msg="Node became daemoned" namespace=argo nodeId=test-zkw9z-431270118 workflow=test-zkw9z
time="2022-05-10T03:39:15.897Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=test-zkw9z-431270118 old.message= old.phase=Pending old.progress=0/1
time="2022-05-10T03:39:15.897Z" level=info msg="All of node test-zkw9z.daemon-dependent dependencies [daemoned] completed" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.897Z" level=info msg="Pod node test-zkw9z-1814496085 initialized Pending" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.931Z" level=info msg="Create pods 201"
time="2022-05-10T03:39:15.932Z" level=info msg="Created pod: test-zkw9z.daemon-dependent (test-zkw9z-daemon-dependent-1814496085)" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.933Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.933Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:15.946Z" level=info msg="Update workflows 200"
time="2022-05-10T03:39:15.947Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=205931403 workflow=test-zkw9z
time="2022-05-10T03:39:15.955Z" level=info msg="Create events 201"
time="2022-05-10T03:39:20.661Z" level=info msg="Get leases 200"
time="2022-05-10T03:39:20.666Z" level=info msg="Update leases 200"
time="2022-05-10T03:39:25.675Z" level=info msg="Get leases 200"
time="2022-05-10T03:39:25.681Z" level=info msg="Update leases 200"
time="2022-05-10T03:39:25.937Z" level=info msg="Processing workflow" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.937Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="node unchanged" nodeID=test-zkw9z-431270118
time="2022-05-10T03:39:25.938Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="node changed" new.message= new.phase=Succeeded new.progress=0/1 nodeID=test-zkw9z-1814496085 old.message= old.phase=Pending old.progress=0/1
time="2022-05-10T03:39:25.938Z" level=info msg="Outbound nodes of test-zkw9z set to [test-zkw9z-1814496085]" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="node test-zkw9z phase Running -> Succeeded" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="node test-zkw9z finished: 2022-05-10 03:39:25.93878082 +0000 UTC" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="Checking daemoned children of test-zkw9z" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="Marking workflow completed" namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.938Z" level=info msg="Checking daemoned children of " namespace=argo workflow=test-zkw9z
time="2022-05-10T03:39:25.944Z" level=info msg="cleaning up pod" action=deletePod key=argo/test-zkw9z-1340600742-agent/deletePod
time="2022-05-10T03:39:25.944Z" level=info msg="cleaning up pod" action=shutdownPod key=argo/test-zkw9z-431270118/shutdownPod
time="2022-05-10T03:39:25.945Z" level=info msg="Create events 201"
time="2022-05-10T03:39:25.948Z" level=info msg="Update workflows 200"
time="2022-05-10T03:39:25.949Z" level=info msg="Delete pods 404"
time="2022-05-10T03:39:25.949Z" level=info msg="Workflow update successful" namespace=argo phase=Succeeded resourceVersion=205931470 workflow=test-zkw9z
time="2022-05-10T03:39:25.954Z" level=info msg="Create events 201"
time="2022-05-10T03:39:25.954Z" level=info msg="DeleteCollection workflowtaskresults 200"
time="2022-05-10T03:39:25.959Z" level=info msg="Create events 201"
time="2022-05-10T03:39:25.959Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/test-zkw9z-daemoned-431270118/labelPodCompleted
time="2022-05-10T03:39:25.959Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/test-zkw9z-daemon-dependent-1814496085/labelPodCompleted
time="2022-05-10T03:39:25.966Z" level=info msg="Create events 201"

Logs without POD_NAMES=v2
time="2022-05-10T03:30:39.975Z" level=info msg="Processing workflow" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:39.975Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=test-27xpp
time="2022-05-10T03:30:39.976Z" level=info msg="Node became daemoned" namespace=argo nodeId=test-27xpp-2065021140 workflow=test-27xpp
time="2022-05-10T03:30:39.976Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=test-27xpp-2065021140 old.message= old.phase=Pending old.progress=0/1
time="2022-05-10T03:30:39.976Z" level=info msg="All of node test-27xpp.daemon-dependent dependencies [daemoned] completed" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:39.976Z" level=info msg="Pod node test-27xpp-2674366387 initialized Pending" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:40.012Z" level=info msg="Create pods 201"
time="2022-05-10T03:30:40.016Z" level=info msg="Created pod: test-27xpp.daemon-dependent (test-27xpp-2674366387)" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:40.017Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:40.017Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:40.024Z" level=info msg="Update workflows 200"
time="2022-05-10T03:30:40.025Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=205928263 workflow=test-27xpp
time="2022-05-10T03:30:40.030Z" level=info msg="Create events 201"
time="2022-05-10T03:30:40.873Z" level=info msg="Get leases 200"
time="2022-05-10T03:30:40.878Z" level=info msg="Update leases 200"
time="2022-05-10T03:30:45.884Z" level=info msg="Get leases 200"
time="2022-05-10T03:30:45.890Z" level=info msg="Update leases 200"
time="2022-05-10T03:30:50.017Z" level=info msg="Processing workflow" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.017Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=test-27xpp
time="2022-05-10T03:30:50.017Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.018Z" level=info msg="node changed" new.message= new.phase=Succeeded new.progress=0/1 nodeID=test-27xpp-2674366387 old.message= old.phase=Pending old.progress=0/1
time="2022-05-10T03:30:50.018Z" level=info msg="node unchanged" nodeID=test-27xpp-2065021140
time="2022-05-10T03:30:50.019Z" level=info msg="Outbound nodes of test-27xpp set to [test-27xpp-2674366387]" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.019Z" level=info msg="node test-27xpp phase Running -> Succeeded" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.019Z" level=info msg="node test-27xpp finished: 2022-05-10 03:30:50.019392447 +0000 UTC" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.019Z" level=info msg="Checking daemoned children of test-27xpp" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.020Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.020Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.020Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.020Z" level=info msg="Marking workflow completed" namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.020Z" level=info msg="Checking daemoned children of " namespace=argo workflow=test-27xpp
time="2022-05-10T03:30:50.024Z" level=info msg="cleaning up pod" action=shutdownPod key=argo/test-27xpp-2065021140/shutdownPod
time="2022-05-10T03:30:50.025Z" level=info msg="https://xxxx.com:443/api/v1/namespaces/argo/pods/test-27xpp-2065021140/exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+%24%28pidof+argoexec%29&container=wait&stderr=true&stdout=true&tty=false"
time="2022-05-10T03:30:50.026Z" level=info msg="cleaning up pod" action=deletePod key=argo/test-27xpp-1340600742-agent/deletePod
time="2022-05-10T03:30:50.028Z" level=info msg="Create events 201"
time="2022-05-10T03:30:50.029Z" level=info msg="Update workflows 200"
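The mismatch can also be spotted mechanically. This Go sketch is a hypothetical diagnostic, not an Argo tool: it scans controller log lines, collects pod names from "Created pod: <node> (<pod>)" messages, and reports shutdownPod targets that match none of them. The regular expressions assume the exact message formats shown in the logs above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches: msg="Created pod: <node> (<pod>)" and captures <pod>.
	createdRe = regexp.MustCompile(`Created pod: \S+ \(([^)]+)\)`)
	// Matches: action=shutdownPod key=<namespace>/<pod>/shutdownPod and captures <pod>.
	shutdownRe = regexp.MustCompile(`action=shutdownPod key=[^/]+/([^/\s]+)/shutdownPod`)
)

// orphanedShutdowns returns shutdownPod targets that do not match any pod
// the controller reported creating in the same log excerpt.
func orphanedShutdowns(lines []string) []string {
	created := map[string]bool{}
	var targets []string
	for _, l := range lines {
		if m := createdRe.FindStringSubmatch(l); m != nil {
			created[m[1]] = true
		}
		if m := shutdownRe.FindStringSubmatch(l); m != nil {
			targets = append(targets, m[1])
		}
	}
	var orphans []string
	for _, t := range targets {
		if !created[t] {
			orphans = append(orphans, t)
		}
	}
	return orphans
}

func main() {
	lines := []string{
		`time="2022-05-10T03:39:05.891Z" level=info msg="Created pod: test-zkw9z.daemoned (test-zkw9z-daemoned-431270118)" namespace=argo workflow=test-zkw9z`,
		`time="2022-05-10T03:39:25.944Z" level=info msg="cleaning up pod" action=shutdownPod key=argo/test-zkw9z-431270118/shutdownPod`,
	}
	fmt.Println(orphanedShutdowns(lines)) // prints [test-zkw9z-431270118]
}
```

The check is only meaningful when the excerpt includes the daemon pod's creation line; the "without" excerpt starts after the daemon pod was created, so it would need the earlier log lines to avoid a false positive.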



alexec commented May 10, 2022

@JPZ13 could you take a look please?


alexec commented May 11, 2022

The bug is in this line of code:

woc.controller.queuePodForCleanup(woc.wf.Namespace, childNode.ID, terminateContainers)

It should be something like this:

// Under POD_NAMES=v2 the pod name is no longer the node ID, so derive it explicitly.
podName := util.PodName(woc.wf.Name, childNode.Name, childNode.TemplateName, childNode.ID, util.GetWorkflowPodNameVersion(woc.wf))
woc.controller.queuePodForCleanup(woc.wf.Namespace, podName, terminateContainers)

@KesavanKing, would you like to submit a PR to fix it?

alexec added the area/controller label and removed the triage label May 11, 2022

KesavanKing commented May 11, 2022

Sure, I'll raise a PR for this.


JPZ13 commented May 11, 2022

Sorry, I didn't see this earlier. Happy to review your PR, @KesavanKing.

KesavanKing commented

@JPZ13 @alexec, could one of you please review #8708?

terrytangyuan pushed a commit that referenced this issue May 11, 2022
This was referenced Jun 20, 2022
sarabala1979 mentioned this issue Jul 30, 2022
sarabala1979 pushed a commit that referenced this issue Aug 8, 2022
Signed-off-by: i342464
Signed-off-by: Saravanan Balasubramanian