HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric #3878
CC @duongkame
@kerneltime any updates on this?
Thanks @xBis7 for the patch.
I still think this should instead be a change in
@xBis7 I understand the need, but I think there was intent in the original metric, and this comes from being able to diagnose which application/user is seeing what metric. Is this for
@kerneltime @duongkame Thanks for taking a look at the patch.
I agree, that's why there are no functional changes. The metric that gets passed back and forth is exactly the same; we are only filtering how the endpoint presents it, like here with the RocksDB metrics.
It was used with
That was my first thought, but it seems too complex for what we need. We don't want to change the metric or reduce the number of times it gets registered. We need the name to be consistent so that it's easier to track with a wildcard search or similar, while still being able to see the user for every registry.
This applies to both, @kerneltime. This Prometheus metric,
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/DecayRpcSchedulerUtil.java
Thanks @xBis7 for this important patch. Thanks @duongkame and @kerneltime for reviewing this PR and for your comments.
What changes were proposed in this pull request?
On the Prometheus endpoint for the OM, in the DecayRpcScheduler summary for users, the username is exposed in the metric name. This makes it almost impossible to monitor these values, because every time a new user shows up we need to register a new metric name.
The metric name `org_apache_hadoop_ipc_decay_rpc_scheduler_volume` becomes `org_apache_hadoop_ipc_decay_rpc_scheduler_caller_hadoop_volume` for a user with the `hadoop` username. The proposed solution is to filter the metrics, removing the username from the metric name and adding it as a `username` tag.
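To make the proposed rename concrete, here is a minimal sketch of the transformation from the per-user name to a stable name plus a `username` label. It is hypothetical and not the code from this PR: the class `CallerMetricRewriteSketch` and its methods are invented for illustration, and it assumes the username sits between `_caller_` and the last underscore-separated token, which is taken to be the metric type (for example `volume`).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the code from this PR): move the caller username
 * out of a flattened DecayRpcScheduler metric name and into a label.
 */
public final class CallerMetricRewriteSketch {

  // Assumption: the username sits between "_caller_" and the last
  // underscore-separated token, which is the metric type (e.g. "volume").
  // The greedy (.+) lets usernames themselves contain underscores.
  private static final Pattern CALLER_METRIC = Pattern.compile(
      "^(org_apache_hadoop_ipc_decay_rpc_scheduler)_caller_(.+)_([a-z0-9]+)$");

  /** A stable metric name plus the username that was removed from it. */
  public static final class Rewritten {
    private final String name;
    private final String username;

    Rewritten(String name, String username) {
      this.name = name;
      this.username = username;
    }

    public String name() { return name; }

    public String username() { return username; }

    /** Renders one sample in the Prometheus text exposition format. */
    public String toExpositionLine(double value) {
      // Label values must escape backslash (first), double quote and newline.
      String escaped = username.replace("\\", "\\\\")
          .replace("\"", "\\\"")
          .replace("\n", "\\n");
      return name + "{username=\"" + escaped + "\"} " + value;
    }
  }

  /** Returns the rewritten metric, or empty if the name is not per-caller. */
  public static Optional<Rewritten> rewrite(String flattenedName) {
    Matcher m = CALLER_METRIC.matcher(flattenedName);
    if (!m.matches()) {
      return Optional.empty(); // leave non-caller metrics untouched
    }
    return Optional.of(new Rewritten(m.group(1) + "_" + m.group(3), m.group(2)));
  }

  public static void main(String[] args) {
    rewrite("org_apache_hadoop_ipc_decay_rpc_scheduler_caller_hadoop_volume")
        .ifPresent(r -> System.out.println(r.toExpositionLine(3.0)));
    // prints: org_apache_hadoop_ipc_decay_rpc_scheduler_volume{username="hadoop"} 3.0
  }
}
```

With this shape, every user's samples share the single name `org_apache_hadoop_ipc_decay_rpc_scheduler_volume`, so a dashboard can select them all at once and still split by `username`. Parsing the already-flattened name is ambiguous if a metric type itself contains underscores; working on the raw `Caller(username).MetricType` record name before it is flattened would avoid that, at the cost of hooking in earlier in the sink.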
This metric comes from `hadoop-common-3.3.4.jar/DecayRpcScheduler`, and more specifically
The name is in the format `Caller(username).MetricType`, e.g. `Caller(hadoop).Volume`. The username may be in the metric name intentionally, so we don't want to change how the metric works, only how it's presented. The cleanest way to handle this seems to be to filter the metric in `PrometheusMetricsSink`. We are not making any functional changes; we are only filtering what is presented to the user.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7394
How was this patch tested?
New unit tests were added for this patch. It was also tested manually with a Docker cluster and the OM `/prom` endpoint.
To test it in a Docker environment:
In `compose/ozone`, add the following in `docker-config`,
then
in your browser go to http://localhost:9874/prom and you should see