@xichen01
Contributor

What changes were proposed in this pull request?

  • Use AggregatedMetrics to replace the MutableRate S3G metrics in S3GatewayMetrics.
  • AggregatedMetrics includes MutableQuantiles, MutableMinMax, and MutableStat metrics; the MutableStat is the same as the original MutableRate metric.
  • Replacing MutableRate with AggregatedMetrics keeps the original MutableRate output unchanged and adds configurable MutableQuantiles and MutableMinMax metrics.
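To illustrate the idea of feeding one latency sample into several aggregations at once, here is a minimal, dependency-free Java sketch. It keeps a cumulative operation count (roughly the `num_ops` series) plus per-interval state that is cleared on each snapshot (roughly `avg_time`, `i_min_time`, `i_max_time` and the percentile series). The class and method names (`AggregatedLatencySketch`, `add`, `snapshot`) are hypothetical and are not the AggregatedMetrics API from this PR, and the reset semantics are a simplification that may not match Hadoop's MutableStat snapshot behavior exactly. Hadoop's MutableQuantiles uses an approximate streaming estimator; this sketch instead keeps every sample of the interval and computes exact nearest-rank percentiles, so its memory grows with the number of operations per interval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the Ozone AggregatedMetrics or Hadoop metrics2 API):
 * one latency metric aggregated several ways.
 */
public class AggregatedLatencySketch {
  // Cumulative operation count, never reset (roughly the num_ops series).
  private long numOps;

  // Per-interval state, cleared by snapshot() (roughly avg_time, i_min_time, i_max_time, percentiles).
  private long intervalTotalNanos;
  private long intervalMin = Long.MAX_VALUE;
  private long intervalMax = Long.MIN_VALUE;
  // Every sample of the interval is kept, so memory grows with operations per interval.
  private final List<Long> intervalSamples = new ArrayList<>();

  /** Records one operation latency in nanoseconds. */
  public synchronized void add(long latencyNanos) {
    numOps++;
    intervalTotalNanos += latencyNanos;
    intervalMin = Math.min(intervalMin, latencyNanos);
    intervalMax = Math.max(intervalMax, latencyNanos);
    intervalSamples.add(latencyNanos);
  }

  public synchronized long numOps() {
    return numOps;
  }

  /** Mean latency of the current interval, 0.0 if it has no samples. */
  public synchronized double avgTime() {
    return intervalSamples.isEmpty() ? 0.0 : (double) intervalTotalNanos / intervalSamples.size();
  }

  /** Exact nearest-rank percentile (0 < p <= 100) over the current interval; 0 if empty. */
  public synchronized long percentile(double p) {
    if (intervalSamples.isEmpty()) {
      return 0;
    }
    long[] sorted = intervalSamples.stream().mapToLong(Long::longValue).toArray();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }

  public synchronized long intervalMin() {
    return intervalMin;
  }

  public synchronized long intervalMax() {
    return intervalMax;
  }

  /** Ends the interval: clears per-interval state, keeps the cumulative count. */
  public synchronized void snapshot() {
    intervalTotalNanos = 0;
    intervalMin = Long.MAX_VALUE;
    intervalMax = Long.MIN_VALUE;
    intervalSamples.clear();
  }

  public static void main(String[] args) {
    AggregatedLatencySketch m = new AggregatedLatencySketch();
    for (long v = 1; v <= 100; v++) {
      m.add(v * 1_000); // 1 us .. 100 us
    }
    System.out.println("num_ops=" + m.numOps() + " avg_time=" + m.avgTime());
    System.out.println("p50=" + m.percentile(50) + " p99=" + m.percentile(99));
    System.out.println("i_min=" + m.intervalMin() + " i_max=" + m.intervalMax());
  }
}
```

Keeping all samples makes the percentile math easy to follow, but it is exactly the kind of per-interval memory cost a streaming estimator is meant to bound.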

Before

[root@VM-8-3-centos ~/community/ozone]$ curl -s http://localhost:9878/prom | grep s3_gateway_metrics | grep -v '#'
s3_gateway_metrics_abort_multipart_upload_failure{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_avg_time{hostname="VM-8-3-centos"} 0.0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_num_ops{hostname="VM-8-3-centos"} 0

After

Default (adds MutableMinMax metrics):

[root@VM-8-3-centos ~/community/ozone]$ curl -s http://localhost:9878/prom | grep s3_gateway_metrics | grep -v '#'
s3_gateway_metrics_abort_multipart_upload_failure{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_avg_time{hostname="VM-8-3-centos"} 0.0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_i_max_time{hostname="VM-8-3-centos"} 1.401298464324817E-45
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_i_min_time{hostname="VM-8-3-centos"} 3.4028234663852886E38
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_num_ops{hostname="VM-8-3-centos"} 0
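The `i_max_time` and `i_min_time` values above are not measured latencies. `1.401298464324817E-45` is Java's `Float.MIN_VALUE` (the smallest positive `float`, not the most negative one) and `3.4028234663852886E38` is `Float.MAX_VALUE`, both widened to `double`; they appear to be the placeholders Hadoop's min/max tracker reports for an interval with no samples. The small check below confirms the numeric identity. `MinMaxSentinels` and `isEmptyInterval` are hypothetical names, shown only as one way to filter these placeholders out of a dashboard.

```java
/** Hypothetical helper: recognizes the placeholder min/max values seen in the /prom output. */
public class MinMaxSentinels {
  // Float.MAX_VALUE widened to double: reported as i_min_time when the interval had no samples.
  public static final double EMPTY_MIN = Float.MAX_VALUE;
  // Float.MIN_VALUE (smallest positive float) widened to double: reported as i_max_time.
  public static final double EMPTY_MAX = Float.MIN_VALUE;

  /** Returns true if the min/max pair is the "no samples" placeholder rather than real latencies. */
  public static boolean isEmptyInterval(double iMinTime, double iMaxTime) {
    return iMinTime == EMPTY_MIN && iMaxTime == EMPTY_MAX;
  }

  public static void main(String[] args) {
    // Values copied from the sample output above.
    System.out.println(isEmptyInterval(3.4028234663852886E38, 1.401298464324817E-45));
  }
}
```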

Enable quantile metrics:

  <property>
    <name>ozone.s3g.metrics.percentiles.intervals.seconds</name>
    <value>15</value>
  </property>
[root@VM-8-3-centos ~/community/ozone]$ curl -s http://localhost:9878/prom | grep s3_gateway_metrics | grep -v '#'
s3_gateway_metrics_abort_multipart_upload_failure{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s50th_percentile_time{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s75th_percentile_time{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s90th_percentile_time{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s95th_percentile_time{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s99th_percentile_time{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns15s_num_ops{hostname="VM-8-3-centos"} 0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_avg_time{hostname="VM-8-3-centos"} 0.0
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_i_max_time{hostname="VM-8-3-centos"} 1.401298464324817E-45
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_i_min_time{hostname="VM-8-3-centos"} 3.4028234663852886E38
s3_gateway_metrics_abort_multipart_upload_failure_latency_ns_num_ops{hostname="VM-8-3-centos"} 0
//...
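The quantile series in the output above follow a visible naming pattern: the latency metric name, then the configured interval with an `s` suffix, then either `<N>th_percentile_time` or `_num_ops`. Below is a minimal sketch of that pattern, which could help when building dashboard queries. It is inferred only from this sample output; the helper name `QuantileSeriesNames` and the fixed percentile list are assumptions, not an Ozone API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: rebuilds the quantile series names observed in the /prom output.
 * The pattern is inferred from the sample output only.
 */
public class QuantileSeriesNames {
  // Percentiles that appear in the sample output above.
  static final int[] PERCENTILES = {50, 75, 90, 95, 99};

  /** Returns the percentile series names followed by the per-interval num_ops series name. */
  static List<String> seriesNames(String latencyMetric, int intervalSeconds) {
    List<String> names = new ArrayList<>();
    String prefix = latencyMetric + intervalSeconds + "s";
    for (int p : PERCENTILES) {
      names.add(prefix + p + "th_percentile_time");
    }
    names.add(prefix + "_num_ops");
    return names;
  }

  public static void main(String[] args) {
    seriesNames("s3_gateway_metrics_abort_multipart_upload_failure_latency_ns", 15)
        .forEach(System.out::println);
  }
}
```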

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9717

How was this patch tested?

Manually tested.

*
* @param source the metrics source
* @param registry the metrics registry
* @param intervals the intervals for quantiles computation
Contributor

Please add some documentation on the memory consumption implications of the intervals.

Contributor Author

Done

@kerneltime
Contributor

cc @tanvipenumudy @muskan1012

@kerneltime kerneltime changed the title Add P99 quantiles and Min/Max Metrics for S3G Performance Metrics HDDS-9717. Add P99 quantiles and Min/Max Metrics for S3G Performance Metrics Nov 20, 2023
@kerneltime
Contributor

Can you look into the findbugs failures?
Also, do you use dashboarding software to track metrics? It would be nice to see the dashboards you use shared in the repo as well. Ref: https://github.com/apache/ozone/tree/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards

@adoroszlai
Contributor

> Can you look into the findbugs failures?

These are the same false positives we already encountered in two other PRs.

@xichen01
Contributor Author

The failed test TestSecureOzoneRpcClient.testValidateBlockLengthWithCommitKey appears to be flaky and is not related to this change.

@adoroszlai
Contributor

> The failed test TestSecureOzoneRpcClient.testValidateBlockLengthWithCommitKey appears to be flaky and is not related to this change.

Created HDDS-9758 for it.

@kerneltime
Contributor

The change is in good shape. Please address the review comment about the defaults and rebase the branch.

pony.chen and others added 4 commits January 6, 2024 07:11
@adoroszlai
Contributor

@kerneltime The conflicts are resolved; please take another look.
