Linkerd viz prometheus attempts to scrape metrics from completed argo workflow pods #13346

bwmetcalf · 2024-11-19T03:20:40Z

What is the issue?

If Argo Workflow pods are injected with linkerd-proxy, then once they reach a completed state, the viz Prometheus still attempts to scrape metrics from them, resulting in a high rate of 504s:

{"caller":"scrape.go:1400","component":"scrape manager","err":"server returned HTTP status 504 Gateway Timeout","level":"debug","msg":"Scrape failed","scrape_pool":"linkerd-proxy","target":"http://10.3.136.62:4191/metrics","ts":"2024-11-19T01:47:24.385Z"}
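To gauge how many targets are affected, the failures can be tallied per target. This is a minimal sketch, assuming the Prometheus debug logs are JSON lines shaped like the entry above; the helper name `failed_scrape_targets` is hypothetical and not part of any Linkerd or Prometheus tooling.

```python
import json
from collections import Counter


def failed_scrape_targets(lines, status="504"):
    """Count 'Scrape failed' log entries per target whose error mentions `status`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue  # ignore JSON values that are not log objects
        if entry.get("msg") == "Scrape failed" and status in entry.get("err", ""):
            counts[entry.get("target", "<unknown>")] += 1
    return counts


# The log entry quoted in this issue, used as sample input.
sample = (
    '{"caller":"scrape.go:1400","component":"scrape manager",'
    '"err":"server returned HTTP status 504 Gateway Timeout","level":"debug",'
    '"msg":"Scrape failed","scrape_pool":"linkerd-proxy",'
    '"target":"http://10.3.136.62:4191/metrics","ts":"2024-11-19T01:47:24.385Z"}'
)

for target, n in failed_scrape_targets([sample]).items():
    print(target, n)
```

In practice the input would be the Prometheus container's log stream rather than a single hard-coded line.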

The Linkerd viz Prometheus should not attempt to scrape metrics from completed pods. Argo server can keep a configurable number of completed workflow pods before they are deleted, which is desirable, for example, for troubleshooting.
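The requested behavior amounts to filtering scrape targets by pod phase. Kubernetes reports pods whose containers have all terminated with phase `Succeeded` or `Failed`, and Prometheus's Kubernetes service discovery exposes that phase as the `__meta_kubernetes_pod_phase` label. The sketch below only models the filtering logic on plain dictionaries; it is not how Linkerd or Prometheus implement target selection, and `scrapeable_targets`, the sample pods, and the `10.3.136.10` address are hypothetical.

```python
# Phases Kubernetes reports for pods whose containers have all terminated.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def scrapeable_targets(pods, port=4191):
    """Return metrics URLs only for pods that have an IP and are not in a terminal phase."""
    targets = []
    for pod in pods:
        status = pod.get("status", {})
        ip = status.get("podIP")
        if status.get("phase") in TERMINAL_PHASES or not ip:
            continue  # completed pods (or pods without an IP) are skipped
        targets.append(f"http://{ip}:{port}/metrics")
    return targets


pods = [
    {"status": {"phase": "Running", "podIP": "10.3.136.10"}},    # hypothetical live pod
    {"status": {"phase": "Succeeded", "podIP": "10.3.136.62"}},  # completed workflow pod
]
print(scrapeable_targets(pods))
```

Port 4191 is taken from the scrape target in the log entry in this issue. In a real Prometheus configuration the equivalent would presumably be a relabeling rule on the pod phase label rather than application code.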

How can it be reproduced?

Create a meshed Argo workflow pod. When it completes, Prometheus will try to scrape metrics from the now-unresponsive pod and log a 504.

Logs, error output, etc

See above.

output of `linkerd check -o short`

% linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    unsupported version channel: stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane and cli versions match
    control plane running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies and cli versions match
    linkerd-destination-5ddc58f9bc-5x9nh running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.14/checks/#l5d-injection-disabled for hints

linkerd-viz
-----------
‼ viz extension proxies and cli versions match
    metrics-api-5789bcc5d-2zdck running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

Server Version: v1.29.8-eks-a737599

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

bwmetcalf added the bug label Nov 19, 2024
