UI hangs when attempting to view Sensor logs if pod has more than one container #9459

Closed
3 tasks done
jsvk opened this issue Aug 28, 2022 · 0 comments · Fixed by #9438
Comments

jsvk (Contributor) commented Aug 28, 2022

Checklist

  • Double-checked my configuration.
  • Tested using the latest version.
  • Used the Emissary executor.

Summary

What happened/what you expected to happen?

We're using a Kubernetes mutating webhook to inject a fluent-bit container into certain running pods in order to collect their logs and ship them to Splunk. When attempting to view Sensor logs in the UI, the Argo Workflows API hangs and eventually returns a 504 Gateway Timeout.

This seems to happen because the API hits this error when querying the container logs:

time="2022-08-25T14:30:21.960Z" level=error msg="a container name must be specified for pod adobe-platform--[snip], choose one of: [main fluent-bit]" namespace=ns-team-adobe-platform--[snip] podName=adobe-platform--[snip]

The API already supports a podLogOptions.container parameter: by appending podLogOptions.container=main to the API call, I confirmed the call returns instantly with the logs we expect. The fix is therefore to pass the container name from the UI.
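For illustration, a sketch of the workaround at the API level. The host, token, namespace, sensor name, and the exact endpoint path are assumptions/placeholders; the relevant part is the podLogOptions.container=main query parameter:

# Hypothetical call against the Argo Server sensor log endpoint; $ARGO_SERVER,
# $ARGO_TOKEN, $NAMESPACE, and $SENSOR_NAME are placeholders, and the endpoint
# path is assumed. Without podLogOptions.container the request hangs; with it,
# the logs stream back immediately.
curl -s -H "Authorization: Bearer $ARGO_TOKEN" \
  "https://$ARGO_SERVER/api/v1/stream/sensors/$NAMESPACE/logs?name=$SENSOR_NAME&podLogOptions.container=main"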

The container name main is set here: https://github.com/argoproj/argo-events/blob/56196143ecbf8b451d5ecb02a0ca13835e20954c/controllers/sensor/resource.go#L267

A potential fix is here: #9438

What version are you running?

Argo Workflows v3.3.8

Diagnostics

Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.

This happens with any workflow, as long as the Sensor pod has more than one container. A quick check for the multi-container condition is sketched below the diagnostics commands.
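To confirm the multi-container condition, something like the following lists the containers of the Sensor pod (the sensor-name label selector is an assumption; adjust it to however your sensor pods are labelled):

# List container names in the Sensor pod; expect more than one, e.g. "main fluent-bit".
# The sensor-name label selector is an assumption; substitute your own values.
kubectl get pod -n $NAMESPACE -l sensor-name=$SENSOR_NAME \
  -o jsonpath='{.items[*].spec.containers[*].name}'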

# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow} 

# If the workflow's pods have not been created, you can skip the rest of the diagnostics.

# The workflow's pods that are problematic:
kubectl get pod -o yaml -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

# Logs from in your workflow's wait container, something like:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

jsvk added a commit to jsvk/argo-workflows that referenced this issue Aug 29, 2022
jsvk added a commit to jsvk/argo-workflows that referenced this issue Aug 29, 2022
jsvk added a commit to jsvk/argo-workflows that referenced this issue Aug 29, 2022
@alexec alexec added the area/ui label Sep 5, 2022
alexec pushed a commit that referenced this issue Sep 5, 2022
juchaosong pushed a commit to juchaosong/argo-workflows that referenced this issue Nov 3, 2022