We're using a Kubernetes mutating webhook to inject a fluent-bit container into certain running pods to collect logs and send them to Splunk. When attempting to view Sensor logs in the UI, the Argo Workflows API hangs and eventually returns a 504 Gateway Timeout.
This seems to happen because the API encounters this message when attempting to query container logs:
time="2022-08-25T14:30:21.960Z" level=error msg="a container name must be specified for pod adobe-platform--[snip], choose one of: [main fluent-bit]" namespace=ns-team-adobe-platform--[snip] podName=adobe-platform--[snip]
The API does support parsing podLogOptions.container: by appending podLogOptions.container=main to the API call, I confirmed the call returns instantly with the logs we expect, so this is a matter of passing the container name at the UI level.
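As a sketch of the workaround, the request below shows how naming the container disambiguates a multi-container pod. The endpoint path and parameter names are illustrative assumptions based on the behaviour described above, not the exact Argo Workflows route:

```python
from urllib.parse import urlencode

def sensor_log_url(base_url: str, namespace: str, pod_name: str,
                   container: str = "main") -> str:
    """Build a log-stream URL that names a container explicitly.

    The path below is a hypothetical example; the key point is the
    podLogOptions.container query parameter, which disambiguates pods
    that have more than one container (e.g. main + fluent-bit).
    """
    params = urlencode({
        "podName": pod_name,
        "podLogOptions.container": container,
    })
    return f"{base_url}/api/v1/stream/sensors/{namespace}/logs?{params}"

print(sensor_log_url("https://argo.example.com", "argo-events", "my-sensor-abc123"))
```

Without the container parameter, the Kubernetes API refuses to pick between `main` and the injected `fluent-bit` sidecar, which is the error shown above.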
Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.
This happens with any workflow so long as the Sensor pod has more than 1 container.
# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

# If the workflow's pods have not been created, you can skip the rest of the diagnostics.

# The workflow's pods that are problematic:
kubectl get pod -o yaml -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

# Logs from your workflow's wait container, something like:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
The container name of main can be seen here: https://github.com/argoproj/argo-events/blob/56196143ecbf8b451d5ecb02a0ca13835e20954c/controllers/sensor/resource.go#L267

A potential fix to this is here: #9438
What version are you running?
Argo Workflows v3.3.8