HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g #1031

runzhiwang · 2020-06-06T06:00:53Z

What changes were proposed in this pull request?

What's the problem ?
I start a ozone cluster with 1000 datanodes and 10 s3gateway, and run two weeks with heavy workload, and perf om and scm.

From om perf, getServiceList cost 63.75% cpu.

2.From scm perf, queryNode come from om::getServiceList cost 33.20% cpu

What's the reason ?
Now s3g create a client for each request (we should improve it in the future). when create each RpcClient, s3g will call ozoneManagerClient.getServiceInfo(), getServiceInfo will call getServiceList. getServiceList is a very heavy call, because it need to get DatanodeDetails of 1000 Datanodes, so om and scm are very busy with getServiceList.

But s3g does not use the List(ServiceInfo) which got from getServiceList at all.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3745

How was this patch tested?

Existed tests.

hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java

runzhiwang · 2020-06-10T01:27:13Z

@elek Could you help review this patch ? Thank you very much.

elek · 2020-06-10T13:30:06Z

Thanks the patch @runzhiwang

It looks correct me, but it's also a question about the long-term usage of getServiceInfo. Originally it was introduced (as far as I remember) to get the address of the SCM. But over the time the client is improved to avoid all the direct calls to the SCM.

I agree, that long-term we should use proxy users for S3 and pooling the connections.

Short-term this patch looks good to me, but why don't we use the getServiceInfo in case of secure clusters? Do we need to replace it with something simple.

I would be interested about the opinion of @nandakumar131. As far as I remember he worked on the original implementation.

Personally I think a generic getServiceClient can be useful. For example active storage-class-es can be downloaded from the server at the beginning of the connection. But that's a long term plan and this patch can help short term.

Let's wait for more opinions.

nandakumar131 · 2020-06-11T15:45:51Z

Some background
gerServiceInfo call was introduced as part of Service Discovery API (HDFS-12868).
The idea behind Service Discovery API was to not worry about setting multiple properties at client to connect to Ozone.
Point the client to any one of the OzoneManager (even in case of HA), the client should be able to get all the required configuration and be able to connect to Ozone.
This will make the client configuration simple.

When gerServiceInfo was introduced, Ozone client just needed OM's IP and port to connect, nothing else. Over time we made lot of code change and never revisited Service Discovery API.
After OM HA, it is no longer true that we can discover other services and required configurations just with single OM's IP/Port. We need ozone-site.xml at client and set of properties that will allow us to interact with OM HA.

Now I'm not even sure where and why gerServiceInfo is getting used.

bharatviswa504 · 2020-06-11T16:39:00Z

Some background
gerServiceInfo call was introduced as part of Service Discovery API (HDFS-12868).
The idea behind Service Discovery API was to not worry about setting multiple properties at client to connect to Ozone.
Point the client to any one of the OzoneManager (even in case of HA), the client should be able to get all the required configuration and be able to connect to Ozone.
This will make the client configuration simple.

When gerServiceInfo was introduced, Ozone client just needed OM's IP and port to connect, nothing else. Over time we made lot of code change and never revisited Service Discovery API.
After OM HA, it is no longer true that we can discover other services and required configurations just with single OM's IP/Port. We need ozone-site.xml at client and set of properties that will allow us to interact with OM HA.

Now I'm not even sure where and why gerServiceInfo is getting used.

For OM HA, the reason for not using the getServiceInfo was if client knows one of the OzoneManager addresses, and if that is down, client will not be able to talk to the ozone cluster.
Even right now getServiceInfo, will now return quorum of OM address with each role, but this information cannot be used for the above reason, so the client still needs a quorum of OM address.

nandakumar131 · 2020-06-11T17:03:27Z

@bharatviswa504 I understand the reason behind it.

We should revisit Service Discovery API and make a decision on whether we can make it work or should we remove it completely.

bharatviswa504 · 2020-06-11T17:08:25Z

From the code point of I see the usage of getServiceInfo used to get CaCert and used by XceiverClientRatis and Grpc.

    ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
    String caCertPem = null;
    if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
      caCertPem = serviceInfoEx.getCaCertificate();
    }

Looking into the PR code now it has eliminated using this getServiceInfo, and added a new API for getcaCert when security is enabled.

bharatviswa504 · 2020-06-11T17:10:30Z

@bharatviswa504 I understand the reason behind it.

We should revisit Service Discovery API and make a decision on whether we can make it work or should we remove it completely.

After OM HA, it is no longer true that we can discover other services and required >configurations just with single OM's IP/Port. We need ozone-site.xml at client and set of .properties that will allow us to interact with OM HA.

Agreed. I just want to give you a context for the above statement mentioned.

runzhiwang · 2020-06-11T23:49:03Z

@elek @nandakumar131 @bharatviswa504 Hi, Thanks for review. The problem of getServiceInfo is that it contains all the datanodes' information which cost too much if the cluster is big. If it only contain OM address, there is no need to replace it with getCaCertificate.

xiaoyuyao · 2020-06-12T00:10:46Z

@runzhiwang the reason the serviceinfo request retrieve all the DNs is for the DN REST service address. After the Ozone REST support was removed a while back, those leftover are not cleaned up completely. Maybe a time to clean that up to make it lighter without introducing a new RPC?

runzhiwang · 2020-06-12T00:25:15Z

@xiaoyuyao Thanks for explanation, I agree and I will clean that up.

bharatviswa504 · 2020-06-12T00:29:28Z

@elek @nandakumar131 @bharatviswa504 Hi, Thanks for review. The problem of getServiceInfo is that it contains all the datanodes' information which cost too much if the cluster is big. If it only contain OM address, there is no need to replace it with getCaCertificate.

Thank You @runzhiwang for the explanation of the issue. I like @xiaoyuyao proposal, if DN information is no more required, we can clean that up, and use getServiceInfo instead of a new RPC call.

…ormation to s3g

codecov-commenter · 2020-06-15T02:03:14Z

Codecov Report

Merging #1031 into master will decrease coverage by 0.05%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #1031      +/-   ##
============================================
- Coverage     70.38%   70.33%   -0.06%     
+ Complexity     9234     9227       -7     
============================================
  Files           961      961              
  Lines         48130    48115      -15     
  Branches       4676     4674       -2     
============================================
- Hits          33877    33842      -35     
- Misses        12001    12020      +19     
- Partials       2252     2253       +1

Impacted Files	Coverage Δ	Complexity Δ
.../java/org/apache/hadoop/ozone/om/OzoneManager.java	`63.94% <ø> (+0.04%)`	`184.00 <0.00> (-1.00)`	⬆️
.../apache/hadoop/hdds/scm/node/StaleNodeHandler.java	`88.88% <0.00%> (-11.12%)`	`4.00% <0.00%> (ø%)`
...er/common/transport/server/GrpcXceiverService.java	`70.00% <0.00%> (-10.00%)`	`3.00% <0.00%> (ø%)`
...ntainerLocationProtocolServerSideTranslatorPB.java	`38.31% <0.00%> (-5.15%)`	`17.00% <0.00%> (-2.00%)`
...ntainerLocationProtocolClientSideTranslatorPB.java	`41.17% <0.00%> (-3.93%)`	`20.00% <0.00%> (-2.00%)`
...adoop/hdds/scm/server/SCMClientProtocolServer.java	`47.65% <0.00%> (-3.52%)`	`23.00% <0.00%> (-4.00%)`
...g/apache/hadoop/hdds/protocol/DatanodeDetails.java	`88.23% <0.00%> (-1.69%)`	`29.00% <0.00%> (-1.00%)`
.../common/states/endpoint/HeartbeatEndpointTask.java	`69.81% <0.00%> (-1.26%)`	`25.00% <0.00%> (-1.00%)`
...p/ozone/container/keyvalue/helpers/ChunkUtils.java	`85.45% <0.00%> (-0.91%)`	`30.00% <0.00%> (-1.00%)`
...iner/common/statemachine/DatanodeStateMachine.java	`82.87% <0.00%> (-0.56%)`	`29.00% <0.00%> (-1.00%)`
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23034fb...13fb742. Read the comment docs.

runzhiwang · 2020-06-15T02:24:46Z

@xiaoyuyao @bharatviswa504 Thanks for review. I have updated the patch.

bharatviswa504 · 2020-06-15T04:21:22Z

Thank You for the update.
Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?

elek · 2020-06-15T13:25:14Z

Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?

I have some fear if the secre/unsecure cluster would have such difference. For example, I should run all the performance tests on secure cluster. We should be careful and execute all the tests on both secure / unsecure cluster. Seems to be safer to keep it as is. But I can be convinced.

xiaoyuyao · 2020-06-16T17:32:30Z

The getServiceInfo could be used for non-secure configuration from the server in future. I agree with @elek to leave it as-is.

bharatviswa504 · 2020-06-16T17:48:30Z

Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?

I have some fear if the secre/unsecure cluster would have such difference. For example, I should run all the performance tests on secure cluster. We should be careful and execute all the tests on both secure / unsecure cluster. Seems to be safer to keep it as is. But I can be convinced.

From code, I see that we use getServiceInfo value only when security is enabled. So, from S3G perspective, for each request, we can save one Rpc Call.

    ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
    String caCertPem = null;
    if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
      caCertPem = serviceInfoEx.getCaCertificate();
    }

I have not understood, what is the problem mentioned here, as from code we can see the value from getServiceInfo is only used when security is enabled.

If someone has mistakenly used some of the information from getServiceInfo mistakenly with out initializing it, it will be caught immediately as it will throw NPE. And also we are not initializing serviceinfo, to any class parameter, it is locally used in the RpcClient constructor only.

xiaoyuyao · 2020-06-16T18:04:38Z

I have not understood, what is the problem mentioned here, as from code we can see the value from getServiceInfo is only used when security is enabled.

getServiceInfo is the service discovery API for OM, which is not for security only. If we go with the other route, i.e., a dedicated API for getCaCertificate(), I'm OK with including check for security enabled.

bharatviswa504 · 2020-06-17T17:48:07Z

But right now getServiceInfo is used only to getCaCertificate, other than that in current code there is no usage of it. As now the client does not need an SCM address, for OM, we get a quorum of address from config, and also we don't need DN address which this patch has eliminated.

If we can skip getServiceInfo call in a non-secure cluster where we can save one RPC Call. I am fine with reusing this API or with totally a new API.

elek · 2020-06-22T11:29:55Z

So what's next here? I propose to merge this as is (clean simple patch, just remove the datanode part).

If getServiceInfo() is not required anymore: we can remove it fully in a separated patch. (Not just the usage, but we can deprecate and/or remove the server side parts.

@bharatviswa504, @xiaoyuyao Are you ok with this approach?

bharatviswa504 · 2020-06-24T00:13:38Z

Fine with me. In this Jira let's focus on removing datanode part.

I will open a new Jira for removing the usage of getServiceInfo.

+1 LGTM.

bharatviswa504 · 2020-06-24T00:15:13Z

Thank You @runzhiwang for the contribution and @elek and @xiaoyuyao for the review.

* upstream/master: (56 commits) HDDS-3264. Fix TestCSMMetrics.java. (apache#1120) HDDS-3858. Remove support to start Ozone and HDFS datanodes in the same JVM (apache#1117) HDDS-3704. Update all the documentation to use ozonefs-hadoop2/3 instead of legacy/current (apache#1099) HDDS-3773. Add OMDBDefinition to define structure of om.db. (apache#1076) Revert "HDDS-3263. Fix TestCloseContainerByPipeline.java. (apache#1119)" (apache#1126) HDDS-3821. Disable Ozone SPNEGO should not fall back to hadoop.http.a… (apache#1101) HDDS-3819. OzoneManager#listVolumeByUser ignores userName parameter when ACL is enabled (apache#1087) HDDS-3779. Add csi interface documents to show how to use ozone csi (apache#1059) HDDS-3857. Datanode in compose/ozonescripts can't be started (apache#1116) HDDS-3430. Enable TestWatchForCommit test cases. (apache#1114) HDDS-3263. Fix TestCloseContainerByPipeline.java. (apache#1119) HDDS-3512. s3g multi-part-upload saved incorrect content using streaming (apache#1092) HDDS-3836. Modify ContainerPlacementPolicyFactory JavaDoc (apache#1097) HDDS-3780. Replace the imagePullPolicy from always to IfNotPresent (apache#1055) HDDS-3847. Change OMNotLeaderException logging to DEBUG (apache#1118) HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g (apache#1031) HDDS-3286. BasicOzoneFileSystem support batchDelete. (apache#814) HDDS-3850. Update the admin document to let user know how to show the status of all rules. (apache#1109) HDDS-3848. Add ratis.thirdparty.version in main pom.xml (apache#1108) HDDS-3815. Avoid buffer copy in ContainerCommandRequestProto. (apache#1085) ...

runzhiwang force-pushed the getServiceList branch 2 times, most recently from 5757fd1 to 5e11bb4 Compare June 6, 2020 07:13

runzhiwang changed the title ~~HDDS-3745. Improve OM and SCM performance with 50% by avoid call getServiceInfo in s3g~~ HDDS-3745. Improve OM and SCM performance with 63% by avoid call getServiceInfo in s3g Jun 6, 2020

runzhiwang changed the title ~~HDDS-3745. Improve OM and SCM performance with 63% by avoid call getServiceInfo in s3g~~ HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g Jun 6, 2020

GlenGeng-awx reviewed Jun 8, 2020

View reviewed changes

hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java Outdated Show resolved Hide resolved

runzhiwang force-pushed the getServiceList branch from 5e11bb4 to 3325a78 Compare June 8, 2020 11:07

runzhiwang force-pushed the getServiceList branch from 5e78b07 to a002f5f Compare June 14, 2020 10:17

runzhiwang changed the title ~~HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g~~ HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information s3g Jun 14, 2020

runzhiwang changed the title ~~HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information s3g~~ HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g Jun 14, 2020

runzhiwang force-pushed the getServiceList branch 3 times, most recently from f333972 to 13fb742 Compare June 15, 2020 00:56

Improve OM and SCM performance with 64% by avoid collect datanode inf…

13fb742

…ormation to s3g

bharatviswa504 merged commit 3ca7f5c into apache:master Jun 24, 2020

HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g #1031

HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g #1031

Uh oh!

Conversation

runzhiwang commented Jun 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

Uh oh!

runzhiwang commented Jun 10, 2020

Uh oh!

elek commented Jun 10, 2020

Uh oh!

nandakumar131 commented Jun 11, 2020

Uh oh!

bharatviswa504 commented Jun 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nandakumar131 commented Jun 11, 2020

Uh oh!

bharatviswa504 commented Jun 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bharatviswa504 commented Jun 11, 2020

Uh oh!

runzhiwang commented Jun 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiaoyuyao commented Jun 12, 2020

Uh oh!

runzhiwang commented Jun 12, 2020

Uh oh!

bharatviswa504 commented Jun 12, 2020

Uh oh!

codecov-commenter commented Jun 15, 2020

Codecov Report

Uh oh!

runzhiwang commented Jun 15, 2020

Uh oh!

bharatviswa504 commented Jun 15, 2020

Uh oh!

elek commented Jun 15, 2020

Uh oh!

xiaoyuyao commented Jun 16, 2020

Uh oh!

bharatviswa504 commented Jun 16, 2020

Uh oh!

xiaoyuyao commented Jun 16, 2020 • edited by elek Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bharatviswa504 commented Jun 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elek commented Jun 22, 2020

Uh oh!

bharatviswa504 commented Jun 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bharatviswa504 commented Jun 24, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

runzhiwang commented Jun 6, 2020 •

edited

Loading

bharatviswa504 commented Jun 11, 2020 •

edited

Loading

bharatviswa504 commented Jun 11, 2020 •

edited

Loading

runzhiwang commented Jun 11, 2020 •

edited

Loading

xiaoyuyao commented Jun 16, 2020 •

edited by elek

Loading

bharatviswa504 commented Jun 17, 2020 •

edited

Loading

bharatviswa504 commented Jun 24, 2020 •

edited

Loading