
Conversation

@bharatviswa504
Contributor

What changes were proposed in this pull request?

Make container report processing multi-threaded.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5341

How was this patch tested?

Added UT.
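The change can be pictured as handing each incoming full container report (FCR) to a fixed-size executor instead of a single handler thread. A minimal sketch with hypothetical names, not the actual Ozone classes:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: dispatch each datanode's container report to a
// fixed-size pool instead of processing all reports on one thread.
class ContainerReportDispatcher {
  private final ExecutorService pool;
  private final AtomicInteger processed = new AtomicInteger();

  ContainerReportDispatcher(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Each report is handled independently; a real handler must also think
  // about per-datanode ordering (e.g. hashing a datanode to one executor).
  void onReport(String datanodeId, List<Long> containerIds) {
    pool.execute(() -> {
      // Placeholder for the replica-map updates done by the report handler.
      processed.incrementAndGet();
    });
  }

  // Drain the pool and return how many reports were processed.
  int awaitProcessed() {
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return processed.get();
  }
}
```

The sketch deliberately ignores ordering: if two reports from the same datanode can run concurrently, the newer replica state could be overwritten by the older one, which is why per-datanode serialization matters in a real SCM.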

@GlenGeng-awx
Contributor

cc @GlenGeng @JacksonYao287 @ChenSammi

Contributor

@JacksonYao287 left a comment


Thanks @bharatviswa504 for this work; it does make sense to use a multi-threaded handler for container reports.

@bharatviswa504
Contributor Author

Thank you @JacksonYao287 for the review. I have addressed the review comments.

@sodonnel
Contributor

I've got a couple of questions on this topic:

  1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of them backing up, but it's worth considering / checking if we can.
  2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
  3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimised in that flow?

@JacksonYao287
Contributor

JacksonYao287 commented Jun 23, 2021

@sodonnel

  1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of them backing up, but it's worth considering / checking if we can.

As far as I know, when a heartbeat from a datanode arrives at SCM, it is queued for processing with the timestamp of when it arrived. There is a heartbeat processing thread inside SCM that runs at a specified interval. So I think the point is how many reports are queued at SCM within that interval and how fast the report handler can deal with them. The total number of Incremental Reports in the default interval (3s) is not very large, but it makes sense to promote it to a thread pool if needed in the future.
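The queue-and-drain flow described above can be sketched as follows (names are illustrative, not the actual SCM code): heartbeats land in a concurrent queue stamped with their arrival time, and a single processing thread drains the queue once per interval.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the heartbeat flow: arrivals are queued with a
// timestamp and drained periodically, so the backlog depends on the
// arrival rate versus the handler's throughput per interval.
class HeartbeatQueue {
  static final class Heartbeat {
    final String datanodeId;
    final long arrivalMillis;

    Heartbeat(String datanodeId, long arrivalMillis) {
      this.datanodeId = datanodeId;
      this.arrivalMillis = arrivalMillis;
    }
  }

  private final ConcurrentLinkedQueue<Heartbeat> queue =
      new ConcurrentLinkedQueue<>();

  // Called on the RPC path when a heartbeat arrives.
  void enqueue(String datanodeId) {
    queue.add(new Heartbeat(datanodeId, System.currentTimeMillis()));
  }

  // Called by the processing thread each interval: drain whatever queued.
  int drainAndProcess() {
    int count = 0;
    Heartbeat hb;
    while ((hb = queue.poll()) != null) {
      count++; // a real handler would dispatch the reports carried by hb
    }
    return count;
  }
}
```

With this shape, thread-pooling the drain step only helps when the per-interval backlog outgrows what one thread can process, which matches the observation that small, frequent ICRs may not need it yet.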

  2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?

Yeah, I think this makes sense. Too many FCRs will increase the burden on SCM.

@bharatviswa504
Contributor Author

bharatviswa504 commented Jun 23, 2021

I've got a couple of questions on this topic:

  1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of them backing up, but it's worth considering / checking if we can.

Previously with ICRs, we used to send a full container report; that is fixed by HDDS-5111. We shall test this PR together with HDDS-5111 using huge container reports from each DN. If we observe an issue with ICRs, we can add a thread pool for them as well.

  2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?

During startup/registration we need to send a full container report, as the ContainerSafeMode rule depends on it for validation. We also fire the container report event, where we process container reports and build the container replica set.

But I completely agree that we can change the full container report interval to a larger value. We don't need a value as large as HDFS's, since our container report size should be much smaller compared to HDFS.

From our scale testing: with 9 DNs, each filled with 500 TB of data, we have seen around 350K containers in the cluster, so a total of about 1 million replicas will be reported from all DNs. (Compared with HDFS, our container report is far smaller.)
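As a sanity check on the figures above, assuming the default 3x replication factor: 350K containers give roughly 1.05M replicas, and 9 x 500 TB of raw capacity implies an average logical container size of about 4 GB, consistent with containers filled close to Ozone's typical 5 GB size. The helper below is purely illustrative:

```java
// Back-of-the-envelope check of the scale-test numbers quoted above,
// assuming the default replication factor of three.
class ScaleEstimate {
  static final long REPLICATION_FACTOR = 3;

  // Total replicas reported across all datanodes.
  static long totalReplicas(long containers) {
    return containers * REPLICATION_FACTOR;
  }

  // Average logical container size in GB implied by the quoted numbers:
  // raw capacity / replication factor / container count.
  static long avgContainerSizeGb(long datanodes, long tbPerDatanode,
      long containers) {
    long rawGb = datanodes * tbPerDatanode * 1024;
    return rawGb / REPLICATION_FACTOR / containers;
  }
}
```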

  3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimized in that flow?

We have not profiled at that level; we shall look into this in future testing.

Contributor

@bshashikant left a comment


Looks good.

@bharatviswa504
Contributor Author

Thank you @JacksonYao287, @bshashikant, and @sodonnel for the review/comments.
I will commit this shortly.

@sodonnel In our scale testing we have not observed an issue with ICR processing; if it is needed in the future, we can add that. As of now, this PR provides the general framework needed to support multi-threaded processing.

@bharatviswa504 merged commit 450c375 into apache:master Jun 25, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Jul 7, 2021
* master: (28 commits)
  HDDS-5332. Add a new column family and a service provider in Recon DB for Namespace Summaries (apache#2366)
  HDDS-5405. Refactor pom files for HadoopRpc and Grpc/Ratis compilation properties. (apache#2386)
  HDDS-5406. add proto version to all the proto files. (apache#2385)
  HDDS-5398. Avoid object creation in ReplicationManger debug log statements (apache#2379)
  HDDS-5396. Fix negligence issue conditional expressions in MockCRLStore.java (apache#2380)
  HDDS-5395. Avoid unnecessary numKeyOps.incr() call in OMMetrics (apache#2374)
  HDDS-5389. Include ozoneserviceid in fs.defaultFS when configuring o3fs (apache#2370)
  HDDS-5383. Eliminate expensive string creation in debug log messages (apache#2372)
  HDDS-5380. Get more accurate space info for DedicatedDiskSpaceUsage. (apache#2365)
  HDDS-5341. Container report processing is single threaded (apache#2338)
  HDDS-5387. ProfileServlet to move the default output location to an ozone specific directory (apache#2368)
  HDDS-5289. Update container's deleteTransactionId on creation of the transaction in SCM. (apache#2361)
  HDDS-5369. Cleanup unused configuration related to SCM HA (apache#2359)
  HDDS-5381. SCM terminated with exit status 1: null. (apache#2362)
  HDDS-5353. Avoid unnecessary executeBatch call in insertAudits (apache#2342)
  HDDS-5350 :  Add allocate block support in MockOmTransport (apache#2341). Contributed by Uma Maheswara Rao G.
  HDDS-4926. Support start/stop for container balancer via command line (apache#2278)
  HDDS-5269. Datandoe with low ratis log volume space should not be considered for new pipeline allocation. (apache#2344)
  HDDS-5367. Update modification time when updating quota/storageType/versioning (apache#2355)
  HDDS-5352. java.lang.ClassNotFoundException: org/eclipse/jetty/alpn/ALPN (apache#2347)
  ...
