@tanvipenumudy tanvipenumudy commented Mar 13, 2023

What changes were proposed in this pull request?

We are introducing latency metrics for various S3 operations in Apache Ozone to help monitor and improve S3 Gateway performance. The following metrics have been added to the S3GatewayLatencyMetrics class, grouped by endpoint:

ObjectEndpoint

  • initMultipartUploadLatencyNs
  • createMultipartKeyLatencyNs
  • completeMultipartUploadLatencyNs
  • abortMultipartUploadLatencyNs
  • copyObjectLatencyNs
  • listPartsLatencyNs
  • createKeyLatencyNs
  • getKeyLatencyNs
  • headKeyLatencyNs
  • deleteKeyLatencyNs

BucketEndpoint

  • getAclLatencyNs
  • putAclLatencyNs
  • createBucketLatencyNs
  • getBucketLatencyNs
  • headBucketLatencyNs
  • deleteBucketLatencyNs
  • listMultipartUploadsLatencyNs

RootEndpoint

  • listS3BucketsLatencyNs

Each metric measures the latency of a specific S3 operation in nanoseconds. These metrics can be used to identify performance bottlenecks and improve efficiency.
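To illustrate what a per-operation latency metric records, here is a minimal sketch in plain Java. It is not the code from this patch: `LatencyStat` and its methods are hypothetical stand-ins built only on the JDK, and the actual S3GatewayLatencyMetrics class may be structured differently. The sketch only shows the general pattern of timing an operation with `System.nanoTime()` and keeping a running count and average in nanoseconds.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical stand-in for one latency metric such as createBucketLatencyNs. */
final class LatencyStat {
  private final LongAdder samples = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  /** Records one observed latency, in nanoseconds. */
  void add(long nanos) {
    samples.increment();
    totalNanos.add(nanos);
  }

  /** Number of operations recorded so far. */
  long numOps() {
    return samples.sum();
  }

  /** Average latency in nanoseconds, or 0 if nothing was recorded. */
  double avgTimeNs() {
    long n = samples.sum();
    return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
  }

  /** Runs the operation and records its elapsed time, even if it throws. */
  <T> T time(Supplier<T> operation) {
    long start = System.nanoTime();
    try {
      return operation.get();
    } finally {
      add(System.nanoTime() - start);
    }
  }
}

final class LatencySketchDemo {
  public static void main(String[] args) {
    // The variable name mirrors a metric from the list above; the wiring is assumed.
    LatencyStat createBucketLatencyNs = new LatencyStat();
    String result = createBucketLatencyNs.time(() -> "bucket-created");
    System.out.println(result
        + " ops=" + createBucketLatencyNs.numOps()
        + " avgNs=" + createBucketLatencyNs.avgTimeNs());
  }
}
```

Recording in a `finally` block means failed requests are measured too; whether the patch does the same is not stated here.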

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8147

How was this patch tested?

The patch was tested on a cluster running the Ozone services to observe the S3 Gateway latency metrics. A sample screenshot of the Prometheus UI is attached for reference, showing s3_gateway_latency_metrics_create_bucket_latency_ns_avg_time (in nanoseconds):

[Screenshot: Prometheus UI graph of s3_gateway_latency_metrics_create_bucket_latency_ns_avg_time]
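The Prometheus series name above looks like the metrics class name, the metric name, and an `AvgTime` suffix, each converted from camel case to snake case and joined with underscores. The sketch below reproduces that mapping for illustration. It is an assumption inferred from the single name shown in this description, not the conversion code Ozone or Hadoop actually uses, and `MetricNameSketch` and its methods are hypothetical.

```java
/** Hypothetical helper that rebuilds the Prometheus-style name seen above. */
final class MetricNameSketch {

  /** Converts camelCase or PascalCase to snake_case (assumed convention). */
  static String toSnakeCase(String camel) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < camel.length(); i++) {
      char c = camel.charAt(i);
      // Start a new word at an upper-case letter that follows a non-upper-case character.
      if (Character.isUpperCase(c) && i > 0
          && !Character.isUpperCase(camel.charAt(i - 1))) {
        out.append('_');
      }
      out.append(Character.toLowerCase(c));
    }
    return out.toString();
  }

  /** Joins source, metric, and statistic suffix into one series name. */
  static String seriesName(String source, String metric, String suffix) {
    return toSnakeCase(source) + "_" + toSnakeCase(metric) + "_" + toSnakeCase(suffix);
  }

  public static void main(String[] args) {
    System.out.println(
        seriesName("S3GatewayLatencyMetrics", "createBucketLatencyNs", "AvgTime"));
  }
}
```

Under this assumed convention, the other metrics in the list would appear under analogous names, for example with `get_key_latency_ns` in place of `create_bucket_latency_ns`.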

@kerneltime
Contributor

cc @SaketaChalamchala @xBis7, can you please take a look?

@xBis7
Contributor

xBis7 commented Mar 13, 2023

@tanvipenumudy Thanks for working on this. It looks good so far. How are you planning to test it?

@tanvipenumudy
Contributor Author

tanvipenumudy commented Mar 14, 2023

Hi @xBis7, my plan for testing is to run monitoring add-ons such as Prometheus/Grafana locally and closely watch the newly introduced S3G latency metrics. Thanks.

@tanvipenumudy tanvipenumudy marked this pull request as ready for review March 16, 2023 12:03
@adoroszlai adoroszlai left a comment

Thanks @tanvipenumudy for working on this.

@adoroszlai
Contributor

@tanvipenumudy thanks for updating the patch. Please check acceptance test failures.

https://github.com/tanvipenumudy/ozone/actions/runs/4675843133

@tanvipenumudy
Contributor Author

@adoroszlai adoroszlai left a comment

Thanks @tanvipenumudy for updating the patch, LGTM.

@adoroszlai adoroszlai requested a review from duongkame April 13, 2023 06:10
@adoroszlai adoroszlai requested a review from jojochuang April 13, 2023 06:10
@tanvipenumudy
Contributor Author

@duongkame, @jojochuang could you please take another look at this?

@adoroszlai adoroszlai merged commit 600bbd5 into apache:master Apr 18, 2023
@adoroszlai
Contributor

Thanks @tanvipenumudy for the patch, @duongkame, @jojochuang, @xBis7 for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Apr 20, 2023
* master: (440 commits)
  HDDS-8445. Move PlacementPolicy back to SCM (apache#4588)
  HDDS-8335. ReplicationManager: EC Mis and Under replication handlers should handle overloaded exceptions (apache#4593)
  HDDS-8355. Intermittent failure in TestOMRatisSnapshots#testInstallSnapshot (apache#4592)
  HDDS-8444. Increase timeout of CI build (apache#4586)
  HDDS-8446. Selective checks: handle change in ci.yaml (apache#4587)
  HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket. (apache#4582)
  HDDS-7309. Enable by default GRPC between S3G and OM (apache#3820)
  HDDS-8458. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8385. Ozone can't process snapshot when service UID > 2097151 (apache#4580)
  HDDS-8424: Preserve legacy bucket getKeyInfo behavior (apache#4576)
  HDDS-8453. Mark TestDirectoryDeletingServiceWithFSO#testDirDeletedTableCleanUpForSnapshot as flaky
  HDDS-8137. [Snapshot] SnapDiff to use tombstone entries in SST files (apache#4376)
  HDDS-8270. Measure checkAccess latency for Ozone objects (apache#4467)
  HDDS-8109. Seperate Ratis and EC MisReplication Handling (apache#4577)
  HDDS-8429. Checkpoint is not closed properly in OMDBCheckpointServlet (apache#4575)
  HDDS-8253. Set ozone.metadata.dirs to temporary dir if not defined in S3 Gateway (apache#4455)
  HDDS-8400. Expose rocksdb last sequence number through metrics (apache#4557)
  HDDS-8333. ReplicationManager: Allow partial EC reconstruction if insufficient nodes available (apache#4579)
  HDDS-8147. Introduce latency metrics for S3 Gateway operations (apache#4383)
  HDDS-7908. Support OM Metadata operation Generator in `Ozone freon` (apache#4251)
  ...