@sodonnel (Contributor) commented Apr 1, 2021

This is a relatively simple change that reduces the memory used by the Directory Scanner and simplifies the logic in the ScanInfo object.

This change ensures that a single File object per directory is reused by all the blocks in that directory. Previously, each block file carried its own full path, so most of the path (the directory prefix) was duplicated for every block.
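A minimal sketch of the idea, assuming simplified hypothetical classes (`FullPathEntry`, `SharedDirEntry`, `DirScanSketch`) rather than the actual Hadoop `ScanInfo` code. In the "before" shape every entry owns a `File` holding the full path; in the "after" shape all entries from one directory point at the same `File` and keep only the file name. The `blk_<id>` names are sample data.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Before (hypothetical): every entry owns a File whose path repeats the directory prefix. */
final class FullPathEntry {
  final File blockFile;

  FullPathEntry(File blockFile) {
    this.blockFile = blockFile;
  }
}

/** After (hypothetical): entries from one directory share a File and keep only the name. */
final class SharedDirEntry {
  final File dir;     // one instance shared by all blocks in the directory
  final String name;  // just the file name, e.g. "blk_1001"

  SharedDirEntry(File dir, String name) {
    this.dir = dir;
    this.name = name;
  }

  /** Rebuilds the full path on demand instead of storing it. */
  File toFile() {
    return new File(dir, name);
  }
}

final class DirScanSketch {
  static List<FullPathEntry> scanFullPaths(String dirPath, String[] names) {
    List<FullPathEntry> out = new ArrayList<>();
    for (String name : names) {
      // Each File builds and stores its own "dirPath/name" path string.
      out.add(new FullPathEntry(new File(dirPath, name)));
    }
    return out;
  }

  static List<SharedDirEntry> scanShared(String dirPath, String[] names) {
    File dir = new File(dirPath);  // allocated once per directory
    List<SharedDirEntry> out = new ArrayList<>();
    for (String name : names) {
      out.add(new SharedDirEntry(dir, name));
    }
    return out;
  }
}
```

The saving comes from not materializing a fresh full-path string per block: the shared version stores one directory path plus the short names, and only reconstructs a full `File` when a caller actually needs it.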

Aside from that, the logic of the directory scanner remains the same.

Comparing heap dumps, the memory used by 100K blocks drops from ~35MB to 19MB, or roughly from 350MB to 190MB per 1M blocks. This is a reduction of about 46%.
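The figures above can be checked with a little arithmetic. This sketch assumes decimal megabytes (1 MB = 1,000,000 bytes) and that heap usage scales linearly with the number of blocks; `HeapSavings` is a hypothetical helper, not part of Hadoop.

```java
/** Hypothetical helper reproducing the memory-reduction arithmetic from the PR description. */
final class HeapSavings {
  /** Percentage reduction going from beforeMb to afterMb. */
  static double reductionPercent(double beforeMb, double afterMb) {
    return (beforeMb - afterMb) / beforeMb * 100.0;
  }

  /** Approximate bytes per block, assuming 1 MB = 1,000,000 bytes. */
  static double bytesPerBlock(double heapMb, long blocks) {
    return heapMb * 1_000_000.0 / blocks;
  }

  public static void main(String[] args) {
    System.out.printf("reduction: %.1f%%%n", reductionPercent(35, 19));
    System.out.printf("before: %.0f bytes/block, after: %.0f bytes/block%n",
        bytesPerBlock(35, 100_000), bytesPerBlock(19, 100_000));
  }
}
```

Running `main` prints a reduction of 45.7% and 350 versus 190 bytes per block, which is where the "about 46%" and the per-1M-block figures come from.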

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 37s trunk passed
+1 💚 compile 1m 20s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 12s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 1m 2s trunk passed
+1 💚 mvnsite 1m 22s trunk passed
+1 💚 javadoc 0m 53s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 26s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 2s trunk passed
+1 💚 shadedclient 16m 0s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 16s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 13s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 51s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 92 unchanged - 0 fixed = 95 total (was 92)
+1 💚 mvnsite 1m 13s the patch passed
+1 💚 javadoc 0m 45s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
-1 ❌ spotbugs 3m 17s /new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 16m 44s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 421m 50s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
508m 3s
Reason Tests
SpotBugs module:hadoop-hdfs-project/hadoop-hdfs
Redundant nullcheck of file, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.compileReport(File, File, Collection, DirectoryScanner$ReportCompiler) Redundant null check at FsVolumeImpl.java:[line 1477]
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.TestViewDistributedFileSystemContract
hadoop.hdfs.TestSnapshotCommands
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.hdfs.server.namenode.TestNamenodeStorageDirectives
hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.server.namenode.TestFileTruncate
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
hadoop.hdfs.server.datanode.TestBlockRecovery
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.web.TestWebHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/1/artifact/out/Dockerfile
GITHUB PR #2849
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux b96fd57f0d68 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d189e66
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/1/testReport/
Max. process+thread count 2494 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 35m 4s trunk passed
+1 💚 compile 1m 22s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 12s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 1m 1s trunk passed
+1 💚 mvnsite 1m 21s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 23s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 20s trunk passed
+1 💚 shadedclient 18m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 14s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 6s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 55s the patch passed
+1 💚 mvnsite 1m 14s the patch passed
+1 💚 javadoc 0m 45s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 15s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 18m 59s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 349m 5s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
442m 11s
Reason Tests
Failed junit tests hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.balancer.TestBalancer
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.TestRollingUpgrade
hadoop.hdfs.server.namenode.TestFileTruncate
hadoop.hdfs.TestViewDistributedFileSystemWithMountLinks
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.namenode.TestDecommissioningStatus
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/2/artifact/out/Dockerfile
GITHUB PR #2849
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux ca1c9f2ea511 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e71a460
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/2/testReport/
Max. process+thread count 1858 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

@jojochuang (Contributor)

The spotbugs warning looks like a false positive to me.
Redundant nullcheck of file, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.compileReport(File, File, Collection, DirectoryScanner$ReportCompiler) Redundant null check at FsVolumeImpl.java:[line 1477]
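The code at FsVolumeImpl.java line 1477 is not shown in this thread, so the following is only a hypothetical illustration of the kind of pattern SpotBugs reports with this message: a null check on a reference the analysis can prove is non-null, such as the result of a constructor call. `RedundantNullCheckDemo` and its methods are invented for the example.

```java
import java.io.File;

/** Hypothetical example of code that SpotBugs typically reports as a redundant null check. */
final class RedundantNullCheckDemo {
  static String describe(File dir, String name) {
    File file = new File(dir, name);  // a constructor call never yields null
    // The analysis can prove 'file' is non-null here, so the fallback return below
    // is unreachable; SpotBugs typically reports this as
    // "Redundant nullcheck of file, which is known to be non-null".
    if (file != null) {
      return file.getName();
    }
    return "<none>";
  }

  /** Same behaviour with the redundant check removed. */
  static String describeFixed(File dir, String name) {
    return new File(dir, name).getName();
  }
}
```

When a check is provably dead, as here, removing it is the usual way to clear the warning; when the value's nullness depends on something the analysis cannot see, the report may instead be a false positive, as suggested above.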

@jojochuang (Contributor) left a comment

Patch looks reasonable to me. Thanks for working on this, @sodonnel!
Just a small suggestion for the javadoc:

*
* @param blockId the block ID
* @param blockFile the path to the block data file
* @param metaFile the path to the block meta-data file

This needs a @param for basePath. Also mention that metaFile stores only the suffix.
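A sketch of a constructor javadoc that addresses both review points, under stated assumptions: `ScanRecord`, its fields, and the `blockSuffix`/`metaSuffix` parameter names are hypothetical, not the names used in the merged patch, and "suffix" is taken to mean the file name relative to `basePath`. The sample meta file name follows the usual HDFS `blk_<blockId>_<genStamp>.meta` pattern.

```java
import java.io.File;

/** Hypothetical scan record; names are illustrative and not taken from Hadoop. */
final class ScanRecord {
  private final long blockId;
  private final File basePath;
  private final String blockSuffix;
  private final String metaSuffix;

  /**
   * Create a scan record for a single block.
   *
   * @param blockId the block ID
   * @param basePath the directory containing the block files; one instance
   *     is shared by every record created for that directory
   * @param blockSuffix the block data file name relative to {@code basePath},
   *     or {@code null} if the data file is missing
   * @param metaSuffix the block meta-data file name relative to
   *     {@code basePath}; only this suffix is stored, not the full path.
   *     May be {@code null} if the meta file is missing
   */
  ScanRecord(long blockId, File basePath, String blockSuffix, String metaSuffix) {
    this.blockId = blockId;
    this.basePath = basePath;
    this.blockSuffix = blockSuffix;
    this.metaSuffix = metaSuffix;
  }

  long getBlockId() {
    return blockId;
  }

  /** @return the block data file rebuilt from basePath, or {@code null} if there is none */
  File getBlockFile() {
    return blockSuffix == null ? null : new File(basePath, blockSuffix);
  }

  /** @return the meta-data file rebuilt from basePath, or {@code null} if there is none */
  File getMetaFile() {
    return metaSuffix == null ? null : new File(basePath, metaSuffix);
  }
}
```

Documenting that only the suffix is stored tells callers that the full path is reconstructed on each call, so they should not rely on the returned `File` being the same instance across calls.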

@sodonnel (Contributor, Author)

@jojochuang Thanks for the review. I have pushed a new commit with the change; please have a look when you get a chance.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 47s trunk passed
+1 💚 compile 1m 17s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 14s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 0m 59s trunk passed
+1 💚 mvnsite 1m 19s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 21s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 9s trunk passed
+1 💚 shadedclient 16m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 9s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 10s the patch passed
+1 💚 compile 1m 6s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 51s the patch passed
+1 💚 mvnsite 1m 13s the patch passed
+1 💚 javadoc 0m 43s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 18s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 10s the patch passed
+1 💚 shadedclient 16m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 408m 41s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
497m 13s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.TestViewDistributedFileSystemContract
hadoop.hdfs.TestSnapshotCommands
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.TestBlocksScheduledCounter
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.TestViewDistributedFileSystem
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.web.TestWebHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/3/artifact/out/Dockerfile
GITHUB PR #2849
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux f836a2fe8698 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 74f4eb9
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/3/testReport/
Max. process+thread count 2714 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2849/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

@jojochuang (Contributor) left a comment

LGTM, thanks!

@sodonnel merged commit 605ed85 into apache:trunk on Apr 26, 2021
asfgit pushed a commit that referenced this pull request Apr 26, 2021
…Contributed by Stephen O'Donnell

(cherry picked from commit 605ed85)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
asfgit pushed a commit that referenced this pull request Apr 26, 2021
…Contributed by Stephen O'Donnell

(cherry picked from commit 605ed85)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java

(cherry picked from commit f6efb58)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
asfgit pushed a commit that referenced this pull request Apr 26, 2021
…Contributed by Stephen O'Donnell

(cherry picked from commit 605ed85)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java

(cherry picked from commit f6efb58)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java

(cherry picked from commit 7a81e50)
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
). Contributed by Stephen O'Donnell

(cherry picked from commit 605ed85)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java

(cherry picked from commit f6efb58)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java

(cherry picked from commit 7a81e50)
(cherry picked from commit 71a9885)
Change-Id: I0974bdb6d05f8bd7741071b855ffce4389be969a
(cherry picked from commit 97cfe8d)

3 participants