-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-17804. Expose prometheus metrics only after a flush and dedupe with tag values #3369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17804. Expose prometheus metrics only after a flush and dedupe with tag values #3369
Conversation
|
💔 -1 overall
This message was automatically generated. |
8e80be8 to
9bee65b
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
@Kimahriman |
|
LGTM + 1. |
Ahh didn't see that one, should be fixed now. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@Kimahriman Thank you for fixing the checkstyle issues. Would you fix trailing whitespace? I'm +1 if that is addressed. |
ea51dbb to
255223c
Compare
|
Clearly don't have my dev env setup to tell me these things yet. Third try is the charm |
|
🎊 +1 overall
This message was automatically generated. |
aajisaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @Kimahriman
...oject/hadoop-common/src/main/java/org/apache/hadoop/metrics2/sink/PrometheusMetricsSink.java
Show resolved
Hide resolved
|
💔 -1 overall
This message was automatically generated. |
|
Seems to be a race condition in the test itself, investigating... |
|
🎊 +1 overall
This message was automatically generated. |
aajisaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Thank you for the update.
… with tag values (#3369) Signed-off-by: Akira Ajisaka <[email protected]> (cherry picked from commit 4ced012)
… with tag values (apache#3369) Signed-off-by: Akira Ajisaka <[email protected]>
Description of PR
Fixes a bug with the Prometheus metrics sink where metrics were deduped on their name alone, and didn't include the tag values for deduplication purposes. Prometheus metrics are uniquely identified by their name and labels, so several metrics were just getting dropped. Specifically things like RPC metrics were only including one of the servers/ports per metric type, and Yarn queue metrics only included metrics for one queue.
Additionally, because of the "push" nature of Hadoop metrics, this would end up creating a lot of extra metrics for things where the tags can change over time but they still actually mean the same thing. For example, the
hastateof namenode metrics can change, but you really only want the most recent one. To address this, I changed it to only expose metrics after aflushcall, and to start fresh after eachflushcall. This prevents old metrics from hanging around and constantly being exposed until the service is restarted.There are still some "bad" tags that are exposed which can lead to multiple Prometheus series being created when really they are the same thing. However, these can be dealt with on the Prometheus side, ignoring certain labels, rather than trying to hard code all the bad tags on the Hadoop side.
I don't think there should be any threading/race conditions with publishing metrics, since the publish metrics methods are synchronized.
Also adds the help line to the output.
How was this patch tested?
New unit tests.
For code changes:
Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?If applicable, have you updated theLICENSE,LICENSE-binary,NOTICE-binaryfiles?