HDDS-5341. Container report processing is single threaded #2338
Conversation
cc @GlenGeng @JacksonYao287 @ChenSammi
JacksonYao287 left a comment
Thanks @bharatviswa504 for this work; it does make sense to use a multi-threaded handler for the container report handler.
...ds/framework/src/main/java/org/apache/hadoop/hdds/server/events/FixedThreadPoolExecutor.java (review thread, outdated, resolved)
...ds/framework/src/main/java/org/apache/hadoop/hdds/server/events/FixedThreadPoolExecutor.java (review thread, outdated, resolved)
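For context, here is a minimal, self-contained sketch of the general pattern behind such an executor: each report event is handed to a fixed-size thread pool instead of being processed on the caller's thread. The class name ReportExecutorSketch and the BiConsumer-based handler signature are illustrative assumptions, not the actual EventExecutor/FixedThreadPoolExecutor API introduced by this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// Sketch only: shows the fixed-thread-pool dispatch idea, not the real
// FixedThreadPoolExecutor added by this PR.
public final class ReportExecutorSketch<P> implements AutoCloseable {

  private final ExecutorService pool;

  public ReportExecutorSketch(int poolSize) {
    // A bounded number of worker threads lets several container reports
    // (e.g. from different datanodes) be processed concurrently.
    this.pool = Executors.newFixedThreadPool(poolSize);
  }

  /** Hand one event to the pool; the handler runs off the caller's thread. */
  public void onMessage(BiConsumer<P, String> handler, P payload, String source) {
    pool.submit(() -> handler.accept(payload, source));
  }

  @Override
  public void close() {
    pool.shutdown();
  }

  // Tiny usage example with a hypothetical String payload.
  public static void main(String[] args) throws InterruptedException {
    try (ReportExecutorSketch<String> exec = new ReportExecutorSketch<>(4)) {
      exec.onMessage(
          (report, from) -> System.out.println("processed " + report + " from " + from),
          "containerReport-1", "dn-1");
      Thread.sleep(100); // give the pool thread a moment before shutdown
    }
  }
}
```

Presumably the pool size in the real executor is driven by the new ScmConfigKeys setting reviewed further down, so the degree of parallelism can be tuned per deployment.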
Force-pushed from 1f57435 to 847f800.
Thank you @JacksonYao287 for the review. I have addressed the review comments.
Force-pushed from ae89bfd to 8fedd12.
I've got a couple of questions on this topic:
As far as I know, when a heartbeat from a datanode arrives at SCM, it is queued for processing with the timestamp of when it arrived. There is a heartbeat processing thread inside SCM that runs at a specified interval. So I think the point is how many reports are queued at SCM within that interval and how fast the report handler can deal with them. The total number of incremental reports in a given interval (default 3s) is not very large, but it makes sense to promote that handling to a thread pool if needed in the future.
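To make the queuing behaviour described above concrete, here is a small illustrative sketch: heartbeats are enqueued with their arrival time and a single scheduled thread drains the queue at a fixed interval, so throughput depends on how much accumulates per interval and how fast the handler drains it. The class and method names and the 3-second interval are assumptions for illustration, not the actual SCM heartbeat dispatcher code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: models "queue heartbeats with a timestamp, process them
// on a fixed-interval thread", not the real SCM implementation.
public final class HeartbeatQueueSketch {

  record QueuedReport(String datanode, long receivedAtMillis) { }

  private final BlockingQueue<QueuedReport> queue = new LinkedBlockingQueue<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Called when a heartbeat arrives: enqueue it with its arrival time. */
  public void onHeartbeat(String datanode) {
    queue.add(new QueuedReport(datanode, System.currentTimeMillis()));
  }

  /** Drain whatever accumulated during the interval (assumed 3s here). */
  public void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(() -> {
      QueuedReport r;
      while ((r = queue.poll()) != null) {
        // How fast this loop (or the handlers it feeds) runs bounds how many
        // queued reports can be cleared per interval.
        System.out.println("processing heartbeat from " + r.datanode()
            + " received at " + r.receivedAtMillis());
      }
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  public static void main(String[] args) {
    HeartbeatQueueSketch sketch = new HeartbeatQueueSketch();
    sketch.onHeartbeat("dn-1");
    sketch.onHeartbeat("dn-2");
    sketch.start(3); // runs until stopped; a real component would expose a stop()
  }
}
```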
Yes, I think this makes sense. Too many FCRs would increase the burden on SCM.
Previously, with ICRs we used to send a full container report; that is fixed by HDDS-5111. We shall be testing with this PR and HDDS-5111 with huge container reports from each DN. If we observe an issue with ICRs, we can add a thread pool for ICR processing as well.
During startup/registration we need to send a full container report, as the ContainerSafeMode rule depends on it to validate its rule. We also fire the container report event, where we process container reports and build the container replica set. But I completely agree with you that we can change the full container report interval to a larger value. I don't think we need a value as large as in HDFS; compared to HDFS, our container report size should be much smaller. From our scale testing
We have not debugged at this level; in future testing, we shall look into this.
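On the point above about raising the full container report interval, here is a hedged sketch of how such an interval could be adjusted through configuration. The key name hdds.container.report.interval is my assumption of the relevant datanode-side setting (verify against HddsConfigKeys before relying on it), and the 60m value is only an example; this is not something the PR itself changes.

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

// Sketch only: the key name below is an assumption and should be verified
// against HddsConfigKeys; it is not modified by this PR.
public class ReportIntervalSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // A larger interval means fewer full container reports for SCM to process.
    conf.set("hdds.container.report.interval", "60m");
    System.out.println(conf.get("hdds.container.report.interval"));
  }
}
```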
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java (review thread, resolved)
bshashikant left a comment
Looks good.
Thank you @JacksonYao287 @bshashikant and @sodonnel for the reviews/comments. @sodonnel In our scale testing, we have not observed an issue with ICR processing; in the future, if it is needed, we can add that. As of now, with this PR we have the general framework needed to support multi-threaded processing.
* master: (28 commits)
  HDDS-5332. Add a new column family and a service provider in Recon DB for Namespace Summaries (apache#2366)
  HDDS-5405. Refactor pom files for HadoopRpc and Grpc/Ratis compilation properties. (apache#2386)
  HDDS-5406. add proto version to all the proto files. (apache#2385)
  HDDS-5398. Avoid object creation in ReplicationManger debug log statements (apache#2379)
  HDDS-5396. Fix negligence issue conditional expressions in MockCRLStore.java (apache#2380)
  HDDS-5395. Avoid unnecessary numKeyOps.incr() call in OMMetrics (apache#2374)
  HDDS-5389. Include ozoneserviceid in fs.defaultFS when configuring o3fs (apache#2370)
  HDDS-5383. Eliminate expensive string creation in debug log messages (apache#2372)
  HDDS-5380. Get more accurate space info for DedicatedDiskSpaceUsage. (apache#2365)
  HDDS-5341. Container report processing is single threaded (apache#2338)
  HDDS-5387. ProfileServlet to move the default output location to an ozone specific directory (apache#2368)
  HDDS-5289. Update container's deleteTransactionId on creation of the transaction in SCM. (apache#2361)
  HDDS-5369. Cleanup unused configuration related to SCM HA (apache#2359)
  HDDS-5381. SCM terminated with exit status 1: null. (apache#2362)
  HDDS-5353. Avoid unnecessary executeBatch call in insertAudits (apache#2342)
  HDDS-5350 : Add allocate block support in MockOmTransport (apache#2341). Contributed by Uma Maheswara Rao G.
  HDDS-4926. Support start/stop for container balancer via command line (apache#2278)
  HDDS-5269. Datandoe with low ratis log volume space should not be considered for new pipeline allocation. (apache#2344)
  HDDS-5367. Update modification time when updating quota/storageType/versioning (apache#2355)
  HDDS-5352. java.lang.ClassNotFoundException: org/eclipse/jetty/alpn/ALPN (apache#2347)
  ...
What changes were proposed in this pull request?
Make container report processing multi-threaded.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5341
How was this patch tested?
Added UT.