HDFS-16891 Avoid the overhead of copy-on-write exception list while loading inodes sub sections in parallel #5300
Conversation
💔 -1 overall

This message was automatically generated.
@jojochuang @sodonnel could you please review this PR?
cnauroth
left a comment
Hello @virajjasani. This looks like a good idea. Can the code be simplified to this?
final List<IOException> exceptions = Collections.synchronizedList(new ArrayList<>());
Then, we wouldn't need to manage a separate lock object.
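A minimal sketch of the suggested simplification (class and message names here are illustrative, not from the patch): `Collections.synchronizedList` wraps every list method in the wrapper's own monitor, so concurrent loader threads can call `add` safely without the caller managing a separate lock object.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListSketch {
    public static void main(String[] args) throws InterruptedException {
        // The suggested simplification: one synchronized wrapper,
        // no separately managed lock object.
        final List<IOException> exceptions =
            Collections.synchronizedList(new ArrayList<>());

        // Simulate several loader threads each reporting a failure.
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() ->
                exceptions.add(new IOException("sub-section " + id + " failed")));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(exceptions.size()); // all 4 adds are visible
    }
}
```

Note that compound operations (such as iterating the wrapped list) would still need external synchronization, but in this code path the list is only read after all loader threads have finished.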
CC @sodonnel and @jojochuang, who originally authored/reviewed this code in HDFS-14617 / #1028, just in case they had some reason I haven't considered to prefer
I don't recall my reason for using a copy-on-write list, but the list is only used in the case of an exception, which will result in the image failing to load and the NN aborting, so it's an exception that we really don't expect to happen. Therefore, as it stands, the CopyOnWrite list has basically zero overhead. Even if there are exceptions, the total number of entries is equal to the number of parallel loading threads, so low tens of entries at the most. Using
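The copy-on-write behavior being discussed can be seen in a small sketch (the list contents here are made up for illustration): `CopyOnWriteArrayList.add` replaces the backing array with a fresh copy on every call, which is why an iterator taken earlier never observes later additions.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteSketch {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> errors = new CopyOnWriteArrayList<>();
        errors.add("first");

        // A snapshot iterator: it reflects the backing array
        // as it existed when iterator() was called.
        Iterator<String> snapshot = errors.iterator();

        // Each add() allocates a fresh copy of the backing array.
        errors.add("second");

        int seen = 0;
        while (snapshot.hasNext()) {
            snapshot.next();
            seen++;
        }
        System.out.println(seen + " vs " + errors.size()); // prints "1 vs 2"
    }
}
```

This snapshot semantics is what makes copy-on-write attractive when reads vastly outnumber writes; in this code path the list is never iterated concurrently with writes, so the per-add copy buys nothing.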
Thanks for the reviews @cnauroth @sodonnel.
That is correct; as such, this is going to lead to failure eventually. The only reason I came across this some time back was profiling of a test that purposely asserts a failure. We would like to use this parallel inode loading as part of Hadoop 3 upgrades (the majority of our clusters still run Hadoop 2), and hence we are running some tests around this.
Sounds good, thanks.
cnauroth
left a comment
+1, pending a new CI run. @virajjasani , thank you for incorporating the code review feedback.
💔 -1 overall
This message was automatically generated.
…oading inodes sub sections in parallel (#5300)
Reviewed-by: Stephen O'Donnell <[email protected]>
Signed-off-by: Chris Nauroth <[email protected]>
(cherry picked from commit 04f3573)
I have committed this to trunk and branch-3.3. I did not commit to branch-3.2, because the original HDFS-14617 changes for parallel fsimage loading are not present in branch-3.2. @virajjasani, thank you for contributing this improvement. @sodonnel, thank you for the help with code review.
If we enable parallel loading and persisting of inodes from/to the fsimage, we get the benefit of improved performance. However, while loading the INODE_DIR_SUB and INODE_SUB sub-sections, if we encounter any errors, we use a copy-on-write list to maintain the list of exceptions. Since our use case never iterates over this list while executor threads are adding new elements to it, a copy-on-write list is a bit of an overhead here.
It would be better to synchronize adding new elements to the list rather than having the list copy all of its elements every time a new element is added.
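The pattern described above can be sketched as follows (this is a simplified stand-in for the sub-section loader, not the actual HDFS code; class names and the simulated failures are hypothetical): worker threads append to a synchronized list only on failure, and the list is inspected once, after all workers have finished.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoadSketch {
    public static void main(String[] args) throws InterruptedException {
        // Worker threads only append on failure; the list is read once,
        // after all loads complete, so snapshot iteration is never needed.
        final List<IOException> exceptions =
            Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int section = i;
            pool.submit(() -> {
                try {
                    if (section % 4 == 0) { // simulate an occasional failure
                        throw new IOException(
                            "failed to load sub-section " + section);
                    }
                } catch (IOException e) {
                    // Synchronized add: no array copy per element.
                    exceptions.add(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Only after all workers finish do we inspect the list.
        if (!exceptions.isEmpty()) {
            System.out.println(exceptions.size() + " sub-sections failed");
        }
    }
}
```

As the reviewers note, the practical difference is small because the list is only populated on a failure path that aborts loading anyway; the change simply avoids a per-add array copy that buys nothing for this access pattern.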