HDDS-8147. Introduce latency metrics for S3 Gateway operations #4383
cc @SaketaChalamchala @xBis7 can you please take a look?
@tanvipenumudy Thanks for working on this. It looks good so far. How are you planning to test it?
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
Hi @xBis7, as for testing: my plan is to run Prometheus and Grafana locally and monitor the newly introduced S3G latency metrics there. Thanks.
hadoop-hdds/common/src/main/java/org/apache/hadoop/util/GenericCheckedSupplier.java
adoroszlai
left a comment
Thanks @tanvipenumudy for working on this.
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/BucketEndpoint.java
...zone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/metrics/S3GatewayLatencyMetrics.java
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/BucketEndpoint.java
@tanvipenumudy thanks for updating the patch. Please check acceptance test failures. https://github.com/tanvipenumudy/ozone/actions/runs/4675843133
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/BucketEndpoint.java
adoroszlai
left a comment
Thanks @tanvipenumudy for updating the patch, LGTM.
@duongkame, @jojochuang could you please take another look at this?
Thanks @tanvipenumudy for the patch, @duongkame, @jojochuang, @xBis7 for the review.
What changes were proposed in this pull request?
We are introducing latency metrics for various S3 operations in Apache Ozone to help monitor and improve performance. The metrics are defined in the `S3GatewayLatencyMetrics` class and recorded in their respective endpoints:
- ObjectEndpoint
- BucketEndpoint
- RootEndpoint
Each metric measures the latency of a specific S3 operation in nanoseconds. The metrics are intended to help identify performance bottlenecks and improve efficiency.
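For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of how a per-operation latency in nanoseconds can be captured around a call that may throw a checked exception. It is not the code from this patch: `LatencySketch`, `CheckedSupplier`, `LatencyStat`, and `timed` are hypothetical names chosen for illustration, and the real `S3GatewayLatencyMetrics` class may be structured quite differently.

```java
import java.util.concurrent.atomic.LongAdder;

public class LatencySketch {

  // Hypothetical supplier that may throw a checked exception.
  @FunctionalInterface
  interface CheckedSupplier<T, E extends Exception> {
    T get() throws E;
  }

  // Hypothetical accumulator: sample count plus total elapsed nanoseconds.
  static final class LatencyStat {
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder samples = new LongAdder();

    void add(long nanos) {
      totalNanos.add(nanos);
      samples.increment();
    }

    long numSamples() {
      return samples.sum();
    }

    double avgNanos() {
      long n = samples.sum();
      return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }
  }

  // Runs the operation and records its elapsed time, even when it throws.
  static <T, E extends Exception> T timed(LatencyStat stat,
      CheckedSupplier<T, E> op) throws E {
    long start = System.nanoTime();
    try {
      return op.get();
    } finally {
      stat.add(System.nanoTime() - start);
    }
  }

  public static void main(String[] args) throws Exception {
    LatencyStat createBucketLatencyNs = new LatencyStat();
    // Stand-in for an endpoint operation such as bucket creation.
    String result = timed(createBucketLatencyNs, () -> {
      Thread.sleep(5);
      return "bucket-created";
    });
    System.out.println(result + ", samples="
        + createBucketLatencyNs.numSamples()
        + ", avg ns=" + createBucketLatencyNs.avgNanos());
  }
}
```

Recording in a `finally` block means failed requests also contribute a sample, which may or may not be what a given metric should report.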
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8147
How was this patch tested?
The patch was tested on a cluster running the Ozone services to observe the S3 Gateway latency metrics. A sample screenshot of the Prometheus UI is attached for reference, capturing `s3_gateway_latency_metrics_create_bucket_latency_ns_avg_time` (in nanoseconds).