Changes interval on grafana dashboards to match scrape interval #1669

Merged


joshleecreates
Contributor

Changes

Changes the interval on the Grafana dashboards to match the scrape interval, which fixes the broken visualizations for RED metrics.

(The changes to the axisBorderShow property appear to be from the updated version of Grafana)

@joshleecreates joshleecreates requested a review from a team as a code owner July 14, 2024 20:19
@github-actions github-actions bot added the helm-update-required Requires an update to the Helm chart when released label Jul 14, 2024
@julianocosta89
Member

hey @joshleecreates, I was still not able to see the RED metrics in Grafana.
All charts show No Data.

@joshleecreates
Contributor Author

> hey @joshleecreates, I was still not able to see the RED metrics in Grafana. All charts show No Data.

It worked once for me, but I am now seeing the same thing. I suspect that when it worked I was on the branch with the span filtering rules that reduce cardinality. I'll test again after that is merged.

@puckpuck puckpuck linked an issue Jul 16, 2024 that may be closed by this pull request
@joshleecreates
Contributor Author

I'm still seeing issues with Prometheus, even with the cardinality fix merged:

prometheus  | ts=2024-07-16T15:58:30.576Z caller=manager.go:163 level=info component="rule manager" msg="Starting rule manager..."

prometheus  | ts=2024-07-16T16:21:51.834Z caller=write_handler.go:134 level=error component=web msg="Out of order sample from remote write" err="out of order sample" series="{__name__=\"target_info\", container_id=\"3e0bddb9d06cf94fb84a2d772930cc64db46193213180c29951a1ee887391103\", docker_cli_cobra_command_path=\"docker compose\", host_arch=\"aarch64\", host_name=\"3e0bddb9d06c\", job=\"quoteservice\", os_description=\"6.6.32-linuxkit\", os_name=\"Linux\", os_type=\"linux\", os_version=\"#1 SMP Thu Jun 13 14:13:01 UTC 2024\", process_command=\"public/index.php\", process_command_args=\"[\\\"public/index.php\\\"]\", process_executable_path=\"/usr/local/bin/php\", process_owner=\"www-data\", process_pid=\"7\", process_runtime_name=\"cli\", process_runtime_version=\"8.3.9\", service_version=\"1.0.0+no-version-set\", telemetry_distro_name=\"opentelemetry-php-instrumentation\", telemetry_distro_version=\"1.0.3\", telemetry_sdk_language=\"php\", telemetry_sdk_name=\"opentelemetry\", telemetry_sdk_version=\"1.0.8\"}" timestamp=1721146894925

There's a 23-minute gap between the rule manager starting and the error appearing. I checked at the beginning of that window and didn't see any span metrics (or Grafana errors).
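As a small aid for reading the two log lines above, the following sketch computes the gap between their `ts=` values and converts the `timestamp=` field of the rejected sample to UTC. It assumes the `ts=` values are UTC with millisecond precision and that `timestamp=` is milliseconds since the Unix epoch; the helper names are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_log_ts(value: str) -> datetime:
    """Parse a log `ts=` value (assumed to be UTC with a trailing Z)."""
    return datetime.strptime(value, LOG_TS_FORMAT).replace(tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> datetime:
    """Convert a `timestamp=` value (assumed ms since the Unix epoch) to UTC."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)


# Values copied from the two log lines quoted in this comment.
started = parse_log_ts("2024-07-16T15:58:30.576Z")
errored = parse_log_ts("2024-07-16T16:21:51.834Z")
sample_time = from_epoch_millis(1721146894925)

gap = errored - started      # time between the two log lines
lag = errored - sample_time  # age of the sample when it was rejected

print(f"gap between log lines: {gap}")
print(f"rejected sample timestamp: {sample_time.isoformat()}")
print(f"sample age when rejected: {lag}")
```

Under these assumptions the sample's own timestamp can be compared directly with the time at which the error was logged.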

I think the changes in this PR are necessary for Grafana but there is still something else going on with Prometheus.

@puckpuck
Contributor

That second issue is a separate Prometheus problem, and it existed before this PR. #1622 is the related issue for it.

I pushed a couple of fixes to your branch for that Prometheus issue, and switched to a 2m interval instead of 1m. My guess is that with metrics arriving once per minute, a 1m window often holds only one sample, and rate needs at least 2 samples to work.
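The reasoning about needing two samples can be illustrated with a deliberately simplified rate calculation. This is a sketch, not the PromQL implementation: `simple_rate` is a hypothetical helper, the sample data is invented, and counter resets and extrapolation are ignored.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix time in seconds, cumulative counter value)


def simple_rate(samples: List[Sample], now: float, window: float) -> Optional[float]:
    """Per-second increase over the samples inside (now - window, now].

    Simplified on purpose: no counter-reset handling and no extrapolation,
    both of which the real PromQL rate() performs.
    """
    in_window = [(t, v) for t, v in samples if now - window < t <= now]
    if len(in_window) < 2:
        return None  # a single sample has no slope, so there is no result
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter: one sample every 60 s, 30 more calls each time.
samples = [(0.0, 0.0), (60.0, 30.0), (120.0, 60.0), (180.0, 90.0)]

one_minute = simple_rate(samples, now=185.0, window=60.0)
two_minutes = simple_rate(samples, now=185.0, window=120.0)
print(one_minute, two_minutes)
```

With samples 60 seconds apart, the 1m window evaluated at t=185 contains only the sample at t=180, so no rate can be computed, while the 2m window contains two samples.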

@puckpuck puckpuck linked an issue Jul 19, 2024 that may be closed by this pull request
@puckpuck
Contributor

I think that Prometheus change fixed the out-of-order samples, but now we are getting out-of-order exemplars.

prometheus  | ts=2024-07-19T13:05:38.103Z caller=write_handler.go:175 level=warn component=web msg="Error on ingesting out-of-order exemplars" num_dropped=1

Some searching suggests this particular problem is not yet solved in Prometheus; there is an open issue for it.
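To keep an eye on this warning while it remains unresolved, the log line can be split into its key/value fields. The sketch below assumes the `docker compose` output format shown above (a service prefix, a `|`, then logfmt-style pairs); `parse_compose_log_line` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import shlex
from typing import Dict


def parse_compose_log_line(line: str) -> Dict[str, str]:
    """Split a `docker compose` log line into logfmt-style key/value pairs."""
    # Drop the "prometheus  | " service prefix if it is present.
    prefix, sep, rest = line.partition("|")
    payload = rest if sep else prefix
    fields: Dict[str, str] = {}
    # shlex handles double-quoted values, including escaped quotes inside them.
    for token in shlex.split(payload):
        key, eq, value = token.partition("=")
        if eq:
            fields[key] = value
    return fields


# Log line copied from this comment.
line = (
    'prometheus  | ts=2024-07-19T13:05:38.103Z caller=write_handler.go:175 '
    'level=warn component=web msg="Error on ingesting out-of-order exemplars" '
    'num_dropped=1'
)
fields = parse_compose_log_line(line)
print(fields["level"], fields["msg"], fields["num_dropped"])
```

Summing `num_dropped` over such lines would give a rough count of dropped exemplars, assuming every warning carries that field.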

@puckpuck
Contributor

I've had this branch running for 4+ days, and Prometheus is stable, with the dashboards working as intended.

We still have an error in Prometheus for out-of-order exemplars, but that is a known Prometheus issue, which we should track separately.

@julianocosta89 can you take another look to see if this works for you?

@julianocosta89
Member

🥳 thanks @joshleecreates and @puckpuck!
It seems we have charts working again

@julianocosta89 julianocosta89 merged commit d7a21a7 into open-telemetry:main Jul 23, 2024
28 checks passed
@joshleecreates
Contributor Author

Thanks @puckpuck for wrapping this up!

ahealy-newr pushed a commit to ahealy-newr/opentelemetry-demo-ahealy that referenced this pull request Jul 24, 2024
…-telemetry#1669)

* Changes interval on grafana dashboards to match scrape interval

* fix out of order sample

* use 2m interval for spanmetrics

* use 30m for out of order samples

---------

Co-authored-by: Juliano Costa <[email protected]>
Co-authored-by: Pierre Tessier <[email protected]>
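The "use 30m for out of order samples" bullet in the commit message above presumably refers to a tolerance window for late samples; Prometheus has a `storage.tsdb.out_of_order_time_window` setting of that kind, but this thread does not show the actual change, so treat that as an assumption. The toy model below only illustrates the idea of such a window. `OutOfOrderWindow` is a hypothetical class and does not reproduce Prometheus's TSDB logic.

```python
from typing import Dict


class OutOfOrderWindow:
    """Toy per-series acceptance check, not Prometheus's actual TSDB logic."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.newest_ms: Dict[str, int] = {}  # newest accepted timestamp per series

    def accept(self, series: str, timestamp_ms: int) -> bool:
        newest = self.newest_ms.get(series)
        if newest is not None and timestamp_ms < newest - self.window_ms:
            return False  # older than the tolerated window: rejected
        if newest is None or timestamp_ms > newest:
            self.newest_ms[series] = timestamp_ms
        return True


strict = OutOfOrderWindow(window_ms=0)
tolerant = OutOfOrderWindow(window_ms=30 * 60 * 1000)  # 30m, as in the commit bullet

# Hypothetical arrival order: a newer sample first, then one that is 17 s older.
arrivals = [1721146894925, 1721146894925 - 17_000]
strict_results = [strict.accept("target_info", ts) for ts in arrivals]
tolerant_results = [tolerant.accept("target_info", ts) for ts in arrivals]
print(strict_results, tolerant_results)
```

In this model a zero-width window rejects the late sample, while the 30m window accepts it.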
Successfully merging this pull request may close these issues.

Prometheus out of order sample from remote write
Spanmetrics panel always returns no results