Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add best-effort validation for prometheus scrape interval #11376

Merged
merged 4 commits into from
Sep 26, 2023

Conversation

adleong
Copy link
Member

@adleong adleong commented Sep 16, 2023

If a Linkerd-viz metrics query has a time window which is shorter than the Prometheus scrape interval, no data will be returned. This can be confusing and unexpected.

It is difficult to validate that the time window is at least as long as the scrape interval if an external prometheus is used. Therefore, we do a best-effort validation where we validate the scrape interval if we can find it in the prometheus config in the default location.

@adleong adleong requested a review from a team as a code owner September 16, 2023 00:01
@alpeb
Copy link
Member

alpeb commented Sep 18, 2023

Prometheus exposes a config API that contains the scrape config. A more robust approach could consist on having the metrics-api controller consult that to validate the requested time window before triggering the prom query.

@adleong
Copy link
Member Author

adleong commented Sep 18, 2023

@alpeb oh nice, I'll check out the config API 👍 thanks

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just one nit 👍

}

if t < scrape_interval {
return fmt.Errorf("Time window (%s) must be at least as long as the Prometheus scrape interval (%s)", window, scrape_interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error strings should not be capitalized

Suggested change
return fmt.Errorf("Time window (%s) must be at least as long as the Prometheus scrape interval (%s)", window, scrape_interval)
return fmt.Errorf("time window (%s) must be at least as long as the Prometheus scrape interval (%s)", window, scrape_interval)

Signed-off-by: Alex Leong <[email protected]>
@adleong adleong merged commit b567bb1 into main Sep 26, 2023
34 checks passed
@adleong adleong deleted the alex/validate-scrape-interval branch September 26, 2023 23:06
olix0r added a commit that referenced this pull request Sep 28, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11406]: #11406
@olix0r olix0r mentioned this pull request Sep 28, 2023
olix0r added a commit that referenced this pull request Sep 28, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11406]: #11406
olix0r added a commit that referenced this pull request Sep 29, 2023
This edge release makes Linkerd even better.

* Added a controlPlaneVersion override to the `linkerd-control-plane` Helm chart
  to support including SHA256 image digests in Linkerd manifests (thanks
  @cromulentbanana!) ([#11406])
* Improved `linkerd viz check` to attempt to validate that the Prometheus scrape
  interval will work well with the CLI and Web query parameters ([#11376])
* Improved CLI error handling to print differentiated error information when
  versioncheck.linkerd.io cannot be resolved (thanks @dtaskai) ([#11377])
* Fixed an issue where the destination controller would not update pod metadata
  for profile resolutions for a pod accessed via the host network (e.g.
  HostPort endpoints) ([#11334]).
* Added a validating webhook config for httproutes.gateway.networking.k8s.io
  resources (thanks @mikutas!) ([#11150])
* Introduced a new `multicluster check --timeout` flag to limit the time
  allowed for Kubernetes API calls (thanks @moki1202) ([#11420])

[#11150]: #11150
[#11334]: #11334
[#11376]: #11376
[#11377]: #11377
[#11406]: #11406
[#11420]: #11420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants