HDFS-15945. DataNodes with zero capacity and zero blocks should be decommissioned immediately. #2854
Conversation
💔 -1 overall
This message was automatically generated.
Seems the failure of
virajjasani
left a comment
One minor comment, else looks good.
...ct/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
Based on a discussion with @virajjasani (#2854 (comment)), I realized it doesn't matter what the capacity is. If a DataNode doesn't have any blocks, it can be decommissioned safely. Updated the PR accordingly.
virajjasani
left a comment
+1 (non-binding)
This reverts commit 0aa3649.
On second thought, the last commit has a problem. Just after a restart, the NameNode hasn't received any block reports from any DataNode, so it sees all DataNodes as having zero blocks. Therefore, if the NameNode is restarted while a DataNode is being decommissioned, that DataNode becomes decommissioned immediately, before its blocks are replicated. After all, I think we need to consider whether the DataNode has zero capacity or not. If the capacity is zero, it means the DataNode has a problem with its storage, and we can decommission it safely.
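The reasoning above can be shown with a minimal, self-contained sketch. Everything here is illustrative: the class and method names are made-up stand-ins, not the actual BlockManager/DatanodeDescriptor API. The point is that a blocks-only check is unsafe right after a NameNode restart, while requiring zero capacity as well makes the check safe.

```java
// Toy model of the decommission-readiness check discussed above.
// All names are hypothetical; this is not Hadoop code.
class DatanodeState {
    final long capacity;   // total capacity reported by the node
    final long numBlocks;  // blocks the NameNode believes the node holds

    DatanodeState(long capacity, long numBlocks) {
        this.capacity = capacity;
        this.numBlocks = numBlocks;
    }

    // Safe only when capacity is zero (all volumes failed) AND the node
    // holds no blocks. Checking numBlocks alone is unsafe: right after a
    // NameNode restart every node appears to have zero blocks until its
    // first block report arrives.
    boolean safeToDecommissionImmediately() {
        return capacity == 0 && numBlocks == 0;
    }
}
```

With this combined check, a healthy node that merely hasn't sent its first block report yet is not decommissioned prematurely.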
Oh I see, yeah this is a possibility. I agree that we should bring back zero capacity check.
Nice
@virajjasani Thanks for your confirmation. Reverted the last commit and added more comments.
The failed tests succeeded locally.
    if (!node.checkBlockReportReceived()) {
      LOG.info("Node {} hasn't sent its first block report.", node);
      return false;
    }
    if (node.getCapacity() == 0 && node.getNumBlocks() == 0) {
DatanodeDescriptor#getNumBlocks() returns the field numBlocks, but that field is only set during initialization.
Instead, I suspect we want to use DatanodeDescriptor#numBlocks(), where the number is computed by aggregating over all existing storage volumes.
        capacities[i][j] = 0;
      }
    }
    getCluster().startDataNodes(getConf(), 1, null, true, null, null, null,
IMO a more complete repro of the scenario should include:
- start DN with volumes, update config to tolerate volume failures.
- intentionally corrupt the volumes (delete VERSION file, for example)
- trigger volume scanner, wait for the DN to drop the volume
Maybe we don't need a very faithful repro, but I am worried this test doesn't cover the real scenario.
Thanks for your detailed reviews, @jojochuang. I will try to reproduce it with a unit test.
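The repro steps suggested above could be modeled, very loosely, with a toy simulation. No real HDFS is involved and every name here is made up; it only shows the shape of the scenario: volumes fail one by one (e.g. after a deleted VERSION file is detected by the volume scanner), capacity drops to zero, and the DataNode is expected to shut down once failures exceed the tolerated count.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a DataNode with failing volumes; not Hadoop code.
class ToyDataNode {
    private final List<Long> volumeCapacities;
    private final int toleratedFailures;
    private int failedVolumes = 0;
    private boolean shutdown = false;

    ToyDataNode(List<Long> capacities, int toleratedFailures) {
        this.volumeCapacities = new ArrayList<>(capacities);
        this.toleratedFailures = toleratedFailures;
    }

    // Simulates the volume scanner detecting a corrupt volume and the
    // DataNode dropping it: its capacity no longer counts.
    void failVolume(int index) {
        volumeCapacities.set(index, 0L);
        failedVolumes++;
        // Expected behavior: shut down once failures exceed the
        // tolerated count.
        if (failedVolumes > toleratedFailures) {
            shutdown = true;
        }
    }

    long capacity() {
        long total = 0;
        for (long c : volumeCapacities) {
            total += c;
        }
        return total;
    }

    boolean isShutdown() { return shutdown; }
}
```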
@jojochuang Sorry for being very late. We found the root cause of this problem. There is a bug in hadoop-3.3.0 where the DataNode doesn't shut down even if the number of failed volumes is greater than …
As I said in the last comment, this is not a problem anymore after HDFS-15963. I'm closing this PR.
Great! Glad to find out.
JIRA: https://issues.apache.org/jira/browse/HDFS-15945