Linkerd viz prometheus attempts to scrape metrics from completed argo workflow pods #13346

Open
bwmetcalf opened this issue Nov 19, 2024 · 0 comments

What is the issue?

If Argo workflow pods are injected with linkerd-proxy, the linkerd-viz Prometheus still attempts to scrape metrics from them after they reach the Completed state, resulting in a high rate of 504s:

{"caller":"scrape.go:1400","component":"scrape manager","err":"server returned HTTP status 504 Gateway Timeout","level":"debug","msg":"Scrape failed","scrape_pool":"linkerd-proxy","target":"http://10.3.136.62:4191/metrics","ts":"2024-11-19T01:47:24.385Z"}
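To gauge how many completed pods are affected, debug lines like the one above can be tallied per target. The following is a minimal Python sketch, assuming the Prometheus logs are emitted in the JSON format shown here; the helper name `count_failed_scrapes` is hypothetical and not part of any Linkerd or Prometheus tooling.

```python
import json
from collections import Counter


def count_failed_scrapes(lines, scrape_pool="linkerd-proxy"):
    """Return a Counter mapping target URL -> number of failed scrapes."""
    failures = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore blank or non-JSON lines
        if not isinstance(entry, dict):
            continue  # ignore JSON values that are not log objects
        if entry.get("msg") == "Scrape failed" and entry.get("scrape_pool") == scrape_pool:
            failures[entry.get("target", "<unknown>")] += 1
    return failures


# The log line quoted in this issue, used as sample input.
sample = (
    '{"caller":"scrape.go:1400","component":"scrape manager",'
    '"err":"server returned HTTP status 504 Gateway Timeout","level":"debug",'
    '"msg":"Scrape failed","scrape_pool":"linkerd-proxy",'
    '"target":"http://10.3.136.62:4191/metrics","ts":"2024-11-19T01:47:24.385Z"}'
)

# Print failure counts, most frequently failing target first.
for target, count in count_failed_scrapes([sample]).most_common():
    print(count, target)
```

Any iterable of log lines works as input, for example an open file containing saved Prometheus logs.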

The linkerd-viz Prometheus should not attempt to scrape metrics from completed pods. Argo Server can keep a configurable number of completed workflow pods before deleting them, which is desirable for troubleshooting, for example.
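One possible direction, not a confirmed fix: Prometheus's Kubernetes pod service discovery attaches a `__meta_kubernetes_pod_phase` label (Pending, Running, Succeeded, Failed or Unknown) to each discovered target, so a `drop` relabel rule on that label could exclude finished pods. The Python sketch below only models that decision; `should_scrape` and the chosen regex are illustrative assumptions and are not taken from the linkerd-viz configuration.

```python
import re

# Assumption: pods in these phases have terminated and cannot serve metrics.
# A pod that kubectl shows as "Completed" has phase "Succeeded".
TERMINAL_PHASES = re.compile(r"Succeeded|Failed")


def should_scrape(target_labels):
    """Model a relabel 'drop' rule keyed on the pod phase meta label.

    Prometheus anchors relabel regexes at both ends, so fullmatch is used.
    """
    phase = target_labels.get("__meta_kubernetes_pod_phase", "")
    return TERMINAL_PHASES.fullmatch(phase) is None


# Hypothetical discovered targets, reduced to the one label of interest.
targets = [
    {"__meta_kubernetes_pod_phase": "Running"},
    {"__meta_kubernetes_pod_phase": "Succeeded"},
    {"__meta_kubernetes_pod_phase": "Failed"},
]

kept = [t for t in targets if should_scrape(t)]
print(len(kept), "of", len(targets), "targets would still be scraped")
```

Whether such a rule belongs in the `linkerd-proxy` scrape job by default, or should be opt-in, is a design question for the maintainers.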

How can it be reproduced?

Create a meshed Argo workflow pod. Once it completes, Prometheus keeps trying to scrape metrics from the now-unresponsive pod and logs a 504.

Logs, error output, etc

See above.

output of linkerd check -o short

% linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    unsupported version channel: stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane and cli versions match
    control plane running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies and cli versions match
    linkerd-destination-5ddc58f9bc-5x9nh running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.14/checks/#l5d-injection-disabled for hints

linkerd-viz
-----------
‼ viz extension proxies and cli versions match
    metrics-api-5789bcc5d-2zdck running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

Server Version: v1.29.8-eks-a737599

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

bwmetcalf added the bug label Nov 19, 2024