HDDS-7576. Prometheus metrics do not remove stale metrics until restart #4057
@kerneltime Most methods from … This class should only be testing … Also, if you try to publish the metrics more than once, you get … I think we should fix any issues, clean up and refactor …
@smengcl This is the PR with the test changes we were discussing.
@xBis7 I have not tried out the change against a UI, so I have a question. Let's say no more objects are being created in a bucket. Will this stop reporting the object count as a metric after a flush? What does a dashboard such as Grafana report for the metric?
@kerneltime No, it won't. Although a metric might not have an updated value, it will still be pushed to the sink and therefore presented. This PR removes only the metrics that get unregistered. It wouldn't make sense to present only the metrics whose values get updated, because we would end up with a different set of metrics every time and it would be really hard to track changes. This issue was discovered in #3781, where, after some operations, we would unregister certain metrics but …
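To illustrate the answer above, here is a minimal, hypothetical model of one snapshot period; the class and method names are invented for this sketch and are not Ozone or Hadoop APIs. It assumes every registered metric is pushed to the sink each period regardless of whether its value changed, and that the sink keeps only what was pushed since the last flush. An unchanged bucket object count therefore stays visible, while an unregistered metric disappears.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a registry of metric sources plus a simplified sink.
class SnapshotCycleSketch {
  // Registered metrics and their current values.
  private final Map<String, Long> registry = new LinkedHashMap<>();
  // What the sink currently exposes to scrapers.
  private Map<String, Long> exposed = new HashMap<>();
  // What has been pushed to the sink since the last flush.
  private Map<String, Long> pushedSinceFlush = new HashMap<>();

  void register(String name, long value) { registry.put(name, value); }

  void unregister(String name) { registry.remove(name); }

  // Sink side: expose the pushed value and remember that the metric is alive.
  private void push(String name, long value) {
    exposed.put(name, value);
    pushedSinceFlush.put(name, value);
  }

  // One snapshot period: push every registered metric (changed or not),
  // then flush so that anything not pushed this period is dropped.
  void runCycle() {
    for (Map.Entry<String, Long> e : registry.entrySet()) {
      push(e.getKey(), e.getValue());
    }
    exposed = pushedSinceFlush;
    pushedSinceFlush = new HashMap<>();
  }

  Map<String, Long> exposed() { return new HashMap<>(exposed); }

  public static void main(String[] args) {
    SnapshotCycleSketch s = new SnapshotCycleSketch();
    s.register("bucket_object_count", 42L);
    s.register("temporary_metric", 7L);
    s.runCycle();
    System.out.println(s.exposed()); // both metrics are exposed
    s.unregister("temporary_metric");
    s.runCycle(); // the object count did not change, but it is still pushed
    System.out.println(s.exposed()); // {bucket_object_count=42}
  }
}
```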
As part of a separate review, do you want to look into whether …
@kerneltime Thanks for reviewing this. I made all reads and writes synchronized for thread-safety. |
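A minimal sketch of what "all reads and writes synchronized" can look like, assuming a sink that holds its metrics in two maps and swaps them on flush. The class and method names are hypothetical, and the output is a simplified `name value` listing without the `# HELP`/`# TYPE` lines of the real Prometheus text format. Because every method that touches the maps locks the same monitor, a scrape cannot observe a map while `flush()` is replacing it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the Ozone class: all access to the maps is
// synchronized on the sink instance.
class SynchronizedSinkSketch {
  private Map<String, Double> current = new HashMap<>();
  private Map<String, Double> next = new HashMap<>();

  synchronized void putMetric(String name, double value) {
    current.put(name, value);
    next.put(name, value);
  }

  synchronized void flush() {
    current = next;
    next = new HashMap<>();
  }

  // Writes "name value" lines sorted by name so the output is deterministic.
  synchronized void writeMetrics(Writer writer) throws IOException {
    for (Map.Entry<String, Double> e : new TreeMap<>(current).entrySet()) {
      writer.write(e.getKey() + " " + e.getValue() + "\n");
    }
  }

  public static void main(String[] args) throws Exception {
    SynchronizedSinkSketch sink = new SynchronizedSinkSketch();
    Thread[] writers = new Thread[4];
    for (int t = 0; t < writers.length; t++) {
      final int id = t;
      writers[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          sink.putMetric("metric_" + id, i);
        }
      });
      writers[t].start();
    }
    for (Thread w : writers) {
      w.join();
    }
    StringWriter out = new StringWriter();
    sink.writeMetrics(out);
    System.out.print(out); // metric_0 999.0 through metric_3 999.0
  }
}
```

Using a single lock keeps the sketch simple; a concurrent map alone would not be enough here, because the swap in `flush()` replaces two references that readers must see consistently.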
Thank you @xBis7 for your contribution! |
What changes were proposed in this pull request?
For Prometheus, if a metric is unregistered and no longer pushed to the sink, it still exists in the map in `PrometheusMetricsSink` and is presented to the user. In this PR, we store all the metrics pushed to the sink in a second map, which is cleared every time we call `flush()`. This way, if a metric is stale and not pushed to the sink, it will not be presented to the user. This implementation follows the same approach used for Hadoop in this PR. The other issues described in the Hadoop PR seem to have already been resolved in Ozone.

This problem was uncovered and discussed here: #3781 (comment)
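The two-map idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the PR: the class, the field names `metricLines` and `nextMetricLines`, and the `isExposed` helper are hypothetical, and each metric is keyed by a plain string. Every pushed metric goes into both maps. `flush()` then rebuilds the exposed map from the second map and clears it, so a metric that was not pushed during the last period disappears.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the approach; names are illustrative only.
class TwoMapSinkSketch {
  // Metrics exposed to Prometheus scrapes.
  private final Map<String, String> metricLines = new HashMap<>();
  // Metrics pushed since the last flush(); cleared on every flush().
  private final Map<String, String> nextMetricLines = new HashMap<>();

  synchronized void putMetric(String key, String line) {
    metricLines.put(key, line);      // fresh values are visible immediately
    nextMetricLines.put(key, line);  // record that this metric is still alive
  }

  synchronized void flush() {
    // Keep only the metrics pushed since the previous flush; stale ones vanish.
    metricLines.clear();
    metricLines.putAll(nextMetricLines);
    nextMetricLines.clear();
  }

  synchronized boolean isExposed(String key) {
    return metricLines.containsKey(key);
  }

  public static void main(String[] args) {
    TwoMapSinkSketch sink = new TwoMapSinkSketch();
    sink.putMetric("live_metric", "live_metric 1");
    sink.putMetric("stale_metric", "stale_metric 5");
    sink.flush();
    // Next period: stale_metric's source was unregistered, so only live_metric is pushed.
    sink.putMetric("live_metric", "live_metric 2");
    System.out.println(sink.isExposed("stale_metric")); // true (until the next flush)
    sink.flush();
    System.out.println(sink.isExposed("stale_metric")); // false
    System.out.println(sink.isExposed("live_metric"));  // true
  }
}
```

A consequence of this design is that a stale metric stays visible for up to one more flush period, which is consistent with stale metrics disappearing only after a short delay rather than immediately.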
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7576
How was this patch tested?
This patch was tested with a new unit test. It was also tested manually in Docker for the case discussed in #3781: the metrics that become stale are no longer available after a short period of time.