HDDS-7309. Enable by default GRPC between S3G and OM #3820
I would like to see peak OM performance numbers for Hadoop RPC vs. gRPC. I understand that per-client performance is better with gRPC, but we need to check server-side performance as well. This should be a quick test to run.
cc @duongkame
@kerneltime As we said at ApacheCon, we will look at getting these numbers when we have a bit more time. However, this should not be a prerequisite for enabling the feature by default. We have already demonstrated that it increases the number of operations S3G can handle, and users can still disable it if they want. Enabling it by default will let more people try it and help ensure it doesn't break with future PRs.
+1. I also think the full gRPC scalability test is not required to enable the protocol on the server side by default. It may be needed once we actually start using gRPC as the default protocol on the client side, e.g. in S3G or OFS.
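A minimal sketch of the server-side vs. client-side distinction, assuming two independent settings: an OM-side flag that starts the gRPC server (assumed here to be `ozone.om.s3.grpc.server_enabled`) and an S3G-side setting that picks the OM transport factory (assumed here to be `ozone.om.transport.class`). The key names, the factory class name, the `false` default, and the `EffectiveProtocol` helper are illustrative assumptions, not Ozone code.

```java
import java.util.Map;

/**
 * Hypothetical sketch: which protocol S3G ends up using to talk to OM, given an
 * OM-side server flag and an S3G-side transport setting. Key names, the
 * factory class name, and the "false" default are assumptions.
 */
class EffectiveProtocol {
  // Assumed OM-side key that starts the gRPC server.
  static final String OM_GRPC_SERVER_ENABLED = "ozone.om.s3.grpc.server_enabled";
  // Assumed S3G-side key that selects the client transport factory.
  static final String OM_TRANSPORT_CLASS = "ozone.om.transport.class";
  // Assumed name of the gRPC transport factory class.
  static final String GRPC_FACTORY =
      "org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory";

  static String effectiveProtocol(Map<String, String> omConf, Map<String, String> s3gConf) {
    boolean serverEnabled =
        Boolean.parseBoolean(omConf.getOrDefault(OM_GRPC_SERVER_ENABLED, "false"));
    boolean clientWantsGrpc = GRPC_FACTORY.equals(s3gConf.get(OM_TRANSPORT_CLASS));
    if (clientWantsGrpc && !serverEnabled) {
      // The client would try to reach a gRPC endpoint that is not running.
      return "misconfigured";
    }
    return clientWantsGrpc ? "gRPC" : "Hadoop RPC";
  }

  public static void main(String[] args) {
    // Server enabled on OM only: S3G keeps using Hadoop RPC.
    System.out.println(effectiveProtocol(Map.of(OM_GRPC_SERVER_ENABLED, "true"), Map.of()));
    // Both sides enabled: S3G talks to OM over gRPC.
    System.out.println(effectiveProtocol(
        Map.of(OM_GRPC_SERVER_ENABLED, "true"), Map.of(OM_TRANSPORT_CLASS, GRPC_FACTORY)));
  }
}
```

With only the OM flag set, the first call returns `Hadoop RPC`. In this model, gRPC is used only once the S3G side selects it too.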
...rc/test/resources/META-INF/services/org.apache.hadoop.ozone.om.protocolPB.OmTransportFactory
Makes sense. I think we should rename the Jira to indicate that this only starts the gRPC server on the OM and does not switch the S3G client to use it.
Maybe I'm missing something, but it seems to me this patch does switch S3 Gateway to use gRPC for talking to OM by default.
Line 15 in 46e58a6: […]
[…] if the value configured for […] (`ozone/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java`, lines 341 to 343 in 46e58a6) […]
Prior to this patch, the default value of […] (lines 48 to 54 in 46e58a6) […]
So […] From […]: the first log message is one I added in […].
Also, enabling it only in OM, but not in S3 Gateway, would not have the following benefit:
@adoroszlai is right. We are enabling gRPC for S3G here.
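The transport selection referenced in the comments above (a `META-INF/services` entry for `OmTransportFactory`, consulted depending on the configured value) can be sketched with the JDK's `java.util.ServiceLoader`. Only `ServiceLoader` is real API here. `TransportFactory`, `HadoopRpcTransportFactory`, `TransportSelector`, and the key name are hypothetical stand-ins, not a copy of Ozone's implementation. As a design simplification, this sketch only checks whether the configured value differs from the default and then takes the first registered provider; it does not load the named class itself.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical SPI, standing in for a transport factory such as OmTransportFactory. */
interface TransportFactory {
  String protocol();
}

/** Hypothetical default implementation: plain Hadoop RPC. */
class HadoopRpcTransportFactory implements TransportFactory {
  @Override
  public String protocol() {
    return "Hadoop RPC";
  }
}

class TransportSelector {
  // Assumed key name and default value, for illustration only.
  static final String TRANSPORT_CLASS_KEY = "ozone.om.transport.class";
  static final String TRANSPORT_CLASS_DEFAULT = HadoopRpcTransportFactory.class.getName();

  static TransportFactory select(Map<String, String> conf) {
    String configured = conf.getOrDefault(TRANSPORT_CLASS_KEY, TRANSPORT_CLASS_DEFAULT);
    // Compare string contents with equals(), not reference equality (==).
    if (!TRANSPORT_CLASS_DEFAULT.equals(configured)) {
      // Non-default value: take the first provider listed in a
      // META-INF/services/TransportFactory file on the classpath, if any.
      Iterator<TransportFactory> providers =
          ServiceLoader.load(TransportFactory.class).iterator();
      if (providers.hasNext()) {
        return providers.next();
      }
    }
    // Default value, or no registered provider: fall back to Hadoop RPC.
    return new HadoopRpcTransportFactory();
  }

  public static void main(String[] args) {
    System.out.println(select(Map.of()).protocol());
    System.out.println(
        select(Map.of(TRANSPORT_CLASS_KEY, "example.GrpcTransportFactory")).protocol());
  }
}
```

Run as-is, with no provider file on the classpath, both calls print `Hadoop RPC`. Adding a `META-INF/services/TransportFactory` file that names a gRPC implementation would make the second call pick that provider up.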
Thank you @adoroszlai. @xBis7, would it be possible to get peak OM performance numbers? I would like to validate that peak OM performance stays the same or improves with gRPC. I understand that each client sees better performance, but it would be prudent to check peak performance on the OM as well.
...one/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
Hi @adoroszlai @smengcl, according to the discussion on the dev mailing list, this PR is required for Ozone 1.3, and the Ozone 1.3 release is currently blocked on it. Could you help continue reviewing this PR? cc @neils-dev
@captainzmc, there is no need for further review of this PR; it is good for […]. I don't see any dev communication saying that this is blocking the release, only a request to include it in the release if possible. I'm not in favor of making such a change so late in the 1.3 release cycle. Enabling OM gRPC is a simple config change that anyone can make manually. I think it makes sense to first ship it disabled by default, let users enable it as needed and provide feedback, and then enable it by default in the next release based on that feedback.
@adoroszlai I agree with you. Users can change the configuration to decide whether to use gRPC. I will remove this issue from the 1.3 blocker list, and then we can start preparing 1.3.0-rc0.
Thanks @captainzmc for following up on this PR for the 1.3.0-rc branch, and @adoroszlai for the comments on that. I did request that this PR be included in 1.3.0-rc, but @adoroszlai's suggestion is great: merge it into master as the default S3G config and proceed with the 1.3.0-rc preparations without it, …
Hello guys~ it looks like this PR is ready to be merged. Maybe we could go ahead and merge it?
@DaveTeng0 We are waiting for one piece of information: we need to measure peak OM performance with Hadoop RPC vs. gRPC, and this can be merged if there is no regression.
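For the peak OM comparison, a minimal closed-loop throughput harness could look like the sketch below: a fixed number of concurrent clients each issue back-to-back requests, and the run reports operations per second. The `Op` stub, the client and op counts, and the 5% tolerance used with `isRegression` are illustrative assumptions. A real comparison would point `Op` at an OM through each transport (or use Ozone's `freon` load generator) and repeat the run with increasing client counts, taking the maximum as the peak.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class ThroughputHarness {
  /** Hypothetical stand-in for one client request, e.g. a metadata call to OM. */
  interface Op {
    void run() throws Exception;
  }

  /** Outcome of one run: completed operations and elapsed wall-clock time. */
  static final class Result {
    final long ops;
    final long nanos;

    Result(long ops, long nanos) {
      this.ops = ops;
      this.nanos = nanos;
    }

    double opsPerSec() {
      return ops * 1e9 / Math.max(nanos, 1L);
    }
  }

  /** Runs `clients` threads, each issuing `opsPerClient` requests back to back. */
  static Result measure(Op op, int clients, int opsPerClient) {
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    LongAdder completed = new LongAdder();
    long start = System.nanoTime();
    for (int c = 0; c < clients; c++) {
      pool.execute(() -> {
        for (int i = 0; i < opsPerClient; i++) {
          try {
            op.run();
            completed.increment(); // only successful requests count
          } catch (Exception e) {
            // failed requests are excluded from throughput
          }
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new Result(completed.sum(), System.nanoTime() - start);
  }

  /** True if the candidate is more than `tolerance` (e.g. 0.05 = 5%) below the baseline. */
  static boolean isRegression(double baselineOpsPerSec, double candidateOpsPerSec,
      double tolerance) {
    return candidateOpsPerSec < baselineOpsPerSec * (1.0 - tolerance);
  }

  public static void main(String[] args) {
    Op stub = () -> Math.sqrt(42.0); // placeholder work instead of a real OM call
    Result r = measure(stub, 8, 10_000);
    System.out.printf("%d ops, %.0f ops/s%n", r.ops, r.opsPerSec());
  }
}
```

Measuring Hadoop RPC as the baseline and gRPC as the candidate, `isRegression(baseline, candidate, 0.05)` returning `false` would correspond to "no regression" under this assumed 5% threshold.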
@xBis7 We did some initial tests, and peak OM performance is about the same for the two.
@xBis7 This test error seems to be related; it failed in both the fork and PR runs: https://github.com/xBis7/ozone/actions/runs/4534441887/jobs/7988627717#step:5:3707
@adoroszlai Thanks for updating the PR and pointing out the test failure. I'll take a look.
We added this file […]. I've removed it and successfully run a workflow on a dummy branch.
Thanks @xBis7 for the patch, and @duongkame, @kerneltime, and @smengcl for the reviews.
* master: (440 commits)
  HDDS-8445. Move PlacementPolicy back to SCM (apache#4588)
  HDDS-8335. ReplicationManager: EC Mis and Under replication handlers should handle overloaded exceptions (apache#4593)
  HDDS-8355. Intermittent failure in TestOMRatisSnapshots#testInstallSnapshot (apache#4592)
  HDDS-8444. Increase timeout of CI build (apache#4586)
  HDDS-8446. Selective checks: handle change in ci.yaml (apache#4587)
  HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket. (apache#4582)
  HDDS-7309. Enable by default GRPC between S3G and OM (apache#3820)
  HDDS-8458. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8385. Ozone can't process snapshot when service UID > 2097151 (apache#4580)
  HDDS-8424: Preserve legacy bucket getKeyInfo behavior (apache#4576)
  HDDS-8453. Mark TestDirectoryDeletingServiceWithFSO#testDirDeletedTableCleanUpForSnapshot as flaky
  HDDS-8137. [Snapshot] SnapDiff to use tombstone entries in SST files (apache#4376)
  HDDS-8270. Measure checkAccess latency for Ozone objects (apache#4467)
  HDDS-8109. Seperate Ratis and EC MisReplication Handling (apache#4577)
  HDDS-8429. Checkpoint is not closed properly in OMDBCheckpointServlet (apache#4575)
  HDDS-8253. Set ozone.metadata.dirs to temporary dir if not defined in S3 Gateway (apache#4455)
  HDDS-8400. Expose rocksdb last sequence number through metrics (apache#4557)
  HDDS-8333. ReplicationManager: Allow partial EC reconstruction if insufficient nodes available (apache#4579)
  HDDS-8147. Introduce latency metrics for S3 Gateway operations (apache#4383)
  HDDS-7908. Support OM Metadata operation Generator in `Ozone freon` (apache#4251)
  ...
What changes were proposed in this pull request?
After discussion and testing, this patch enables gRPC by default for communication between S3G and OM, to improve performance.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7309
How was this patch tested?
This patch was tested manually.