Skip to content

Conversation

@virajjasani
Copy link
Contributor

If we enable parallel loading and persisting of inodes from/to fs image, we get the benefit of improved performance. However, while loading sub-sections INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use copy-on-write list to maintain the list of exceptions. Since our usecase is not to iterate over this list while executor threads are adding new elements to the list, using copy-on-write is bit of an overhead for this usecase.

It would be better to synchronize adding new elements to the list rather than having the list copy all elements over every time new element is added to the list.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 46s trunk passed
+1 💚 compile 1m 29s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 22s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 7s trunk passed
+1 💚 mvnsite 1m 30s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 27s trunk passed
+1 💚 shadedclient 25m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 24s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 51s the patch passed
+1 💚 mvnsite 1m 20s the patch passed
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 15s the patch passed
+1 💚 shadedclient 25m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 298m 12s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
415m 57s
Reason Tests
Failed junit tests hadoop.hdfs.TestLeaseRecovery2
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/1/artifact/out/Dockerfile
GITHUB PR #5300
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 604e84c5cdea 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8d6faad
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/1/testReport/
Max. process+thread count 3151 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Copy link
Contributor Author

@jojochuang @sodonnel could you please review this PR?

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @virajjasani . This looks like a good idea. Can the code be simplifed to this?

final List<IOException> exceptions = Collections.synchronizedList(new ArrayList<>());

Then, we wouldn't need to manage a separate lock object.

@cnauroth
Copy link
Contributor

CC @sodonnel and @jojochuang who originally authored/reviewed this code in HDFS-14617 / #1028 , just in case they had some reason I haven't considered to prefer CopyOnWriteArrayList for this.

@sodonnel
Copy link
Contributor

I don't recall my reason for using a copyOnWrite list, but the list is only used in the case of an exception, which will result in the image failing to load and the NN aborting, so its an exception that we really don't expect to happen. Therefore as it stands, the CopyOnWrite list has basically zero overhead. Even if there are exceptions, the total number of entries is equal to the parallel loading threads, so low tens of entries at the most.

Using Collections.synchronizedList does seem simpler or synchronizing on the exceptions object rather than having a separate lock object probably makes sense to simplify this change further.

@virajjasani
Copy link
Contributor Author

Thanks for the reviews @cnauroth @sodonnel.

which will result in the image failing to load and the NN aborting, so its an exception that we really don't expect to happen.

That is correct. As such this is going to lead to failure eventually. The only reason I came across this sometime back was due to profiling of a purposeful failure asserting test. We would like to use this parallelism of inodes loading with hadoop 3 upgrades (still running hadoop 2 for majority clusters), and hence running some tests around this.

Can the code be simplifed to this?
final List exceptions = Collections.synchronizedList(new ArrayList<>());

Using Collections.synchronizedList does seem simpler or synchronizing on the exceptions object rather than having a separate lock object probably makes sense to simplify this change further.

Sounds good, thanks.

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, pending a new CI run. @virajjasani , thank you for incorporating the code review feedback.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 49s trunk passed
+1 💚 compile 1m 26s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 19s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 7s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 javadoc 1m 7s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 23s trunk passed
+1 💚 shadedclient 25m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 51s the patch passed
+1 💚 mvnsite 1m 19s the patch passed
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 26s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 15s the patch passed
+1 💚 shadedclient 25m 36s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 306m 56s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
424m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/2/artifact/out/Dockerfile
GITHUB PR #5300
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 2e2dd3ae5836 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f205782
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/2/testReport/
Max. process+thread count 3066 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5300/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cnauroth cnauroth merged commit 04f3573 into apache:trunk Jan 18, 2023
cnauroth pushed a commit that referenced this pull request Jan 18, 2023
…oading inodes sub sections in parallel (#5300)

Reviewed-by: Stephen O'Donnell <[email protected]>
Signed-off-by: Chris Nauroth <[email protected]>
(cherry picked from commit 04f3573)
@cnauroth
Copy link
Contributor

I have committed this to trunk and branch-3.3. I did not commit to branch-3.2, because the original HDFS-14617 changes for parallel fsimage loading are not present in branch-3.2.

@virajjasani , thank you for contributing this improvement. @sodonnel , thank you for the help with code review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants