Conversation

@tasanuma
Member

@tasanuma tasanuma commented Apr 2, 2021

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 22s trunk passed
+1 💚 compile 1m 23s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 19s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 1m 4s trunk passed
+1 💚 mvnsite 1m 28s trunk passed
+1 💚 javadoc 0m 58s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 16s trunk passed
+1 💚 shadedclient 17m 47s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 19s the patch passed
+1 💚 compile 1m 20s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 20s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 54s the patch passed
+1 💚 mvnsite 1m 14s the patch passed
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 21s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 17m 34s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 406m 19s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
498m 12s
Reason Tests
Failed junit tests hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.TestDecommission
hadoop.hdfs.TestWriteConfigurationToDFS
hadoop.hdfs.web.TestWebHdfsFileSystemContract
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
hadoop.hdfs.TestBlocksScheduledCounter
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.hdfs.TestDecommissionWithBackoffMonitor
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.TestDFSShell
hadoop.hdfs.TestSnapshotCommands
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.TestDistributedFileSystem
hadoop.hdfs.server.datanode.TestBlockRecovery
hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/1/artifact/out/Dockerfile
GITHUB PR #2854
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 866a680d575c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 755de61
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/1/testReport/
Max. process+thread count 2879 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@tasanuma
Member Author

tasanuma commented Apr 5, 2021

The failure of TestDecommission.testDecommissionWithNamenodeRestart seems to be related to this change. I will investigate it.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 12s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 2s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 3s trunk passed
+1 💚 compile 1m 22s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 10s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 0m 59s trunk passed
+1 💚 mvnsite 1m 21s trunk passed
+1 💚 javadoc 0m 53s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 21s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 15s trunk passed
+1 💚 shadedclient 18m 32s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 16s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 16s the patch passed
+1 💚 compile 1m 8s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 8s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 53s the patch passed
+1 💚 mvnsite 1m 15s the patch passed
+1 💚 javadoc 0m 47s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 24s the patch passed
+1 💚 shadedclient 18m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 332m 20s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
426m 33s
Reason Tests
Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockReportLease
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.server.datanode.TestBlockRecovery
hadoop.hdfs.server.mover.TestMover
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.server.namenode.TestDecommissioningStatus
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/2/artifact/out/Dockerfile
GITHUB PR #2854
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 9774e98c56cc 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ca9cbcd
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/2/testReport/
Max. process+thread count 2296 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Contributor

@virajjasani virajjasani left a comment

One minor comment, else looks good.

@tasanuma
Member Author

tasanuma commented Apr 6, 2021

Based on a discussion with @virajjasani (#2854 (comment)), I realized that the capacity doesn't matter. If a DataNode doesn't have any blocks, it can be decommissioned safely. I updated the PR accordingly.

Contributor

@virajjasani virajjasani left a comment

+1 (non-binding)

@tasanuma
Member Author

tasanuma commented Apr 6, 2021

On second thought, the last commit has a problem. Just after a NameNode restart, the NameNode hasn't received block reports from any DataNode, so it sees every DataNode as having zero blocks. Therefore, if the NameNode is restarted while a DataNode is being decommissioned, the DataNode becomes decommissioned immediately, before its blocks are replicated. In fact, TestDecommission#testDecommissionWithNamenodeRestart() covers this case, and it fails for 0aa3649.

After all, I think we need to consider whether the DataNode has zero capacity. If the capacity is zero, it means the DataNode has a problem with its storage, and we can decommission it safely.
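
To make the reasoning above concrete, here is a minimal, self-contained sketch of the gate being discussed. It is not the code in this PR: `NodeSnapshot`, `DecommissionPrecheck`, and `passesBlockReportGate` are hypothetical names, and the real check in HDFS does more than this (for example, it still has to verify that blocks are sufficiently replicated). The sketch only assumes that a zero block count cannot be trusted until the first block report arrives, unless the node also reports zero capacity.

```java
/** Hypothetical snapshot of what the NameNode knows about one DataNode. */
final class NodeSnapshot {
  final long capacity;               // total capacity reported by the DataNode
  final int numBlocks;               // blocks the NameNode currently attributes to it
  final boolean blockReportReceived; // whether the first block report has arrived

  NodeSnapshot(long capacity, int numBlocks, boolean blockReportReceived) {
    this.capacity = capacity;
    this.numBlocks = numBlocks;
    this.blockReportReceived = blockReportReceived;
  }
}

/** Hypothetical helper; models only the block-report precondition. */
final class DecommissionPrecheck {
  private DecommissionPrecheck() {}

  static boolean passesBlockReportGate(NodeSnapshot node) {
    // Assumption from the discussion: zero capacity means the node has no
    // usable storage, so there is nothing to wait for.
    if (node.capacity == 0 && node.numBlocks == 0) {
      return true;
    }
    // Otherwise a block count of zero may simply mean "not reported yet",
    // which is the state right after a NameNode restart.
    return node.blockReportReceived;
  }
}
```

With this shape, a healthy DataNode that has not yet sent its block report after a NameNode restart is held back, while a DataNode with zero capacity and zero blocks is let through.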

@virajjasani
Contributor

Oh I see, yeah, this is a possibility. I agree that we should bring back the zero-capacity check.

Actually TestDecommission#testDecommissionWithNamenodeRestart() covers this case

Nice

@tasanuma
Member Author

tasanuma commented Apr 6, 2021

@virajjasani Thanks for your confirmation. I reverted the last commit and added more comments.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 35m 1s trunk passed
+1 💚 compile 1m 21s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 13s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 0m 59s trunk passed
+1 💚 mvnsite 1m 21s trunk passed
+1 💚 javadoc 0m 52s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 21s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 18s trunk passed
+1 💚 shadedclient 18m 59s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 11s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 16s the patch passed
+1 💚 compile 1m 8s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 8s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 53s the patch passed
+1 💚 mvnsite 1m 14s the patch passed
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 19m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 354m 18s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
448m 10s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
hadoop.hdfs.server.namenode.TestFileTruncate
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.hdfs.TestDecommissionWithBackoffMonitor
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.server.namenode.TestDecommissioningStatus
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/3/artifact/out/Dockerfile
GITHUB PR #2854
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 06b9281e83ee 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0aa3649
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/3/testReport/
Max. process+thread count 1890 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 3s trunk passed
+1 💚 compile 1m 33s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 28s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 1m 8s trunk passed
+1 💚 mvnsite 1m 38s trunk passed
+1 💚 javadoc 1m 6s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 43s trunk passed
+1 💚 shadedclient 21m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 28s the patch passed
+1 💚 compile 1m 32s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 32s the patch passed
+1 💚 compile 1m 22s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 1m 22s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 1s the patch passed
+1 💚 mvnsite 1m 32s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 24s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 42s the patch passed
+1 💚 shadedclient 21m 35s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 349m 49s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
455m 9s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestBlockRecovery2
hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.server.namenode.TestFileTruncate
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.server.namenode.TestDecommissioningStatus
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/4/artifact/out/Dockerfile
GITHUB PR #2854
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux eee729e1d2bb 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 61e8a90
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/4/testReport/
Max. process+thread count 1875 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2854/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@tasanuma
Member Author

tasanuma commented Apr 7, 2021

The failed tests succeeded locally.

if (!node.checkBlockReportReceived()) {
LOG.info("Node {} hasn't sent its first block report.", node);
return false;
if (node.getCapacity() == 0 && node.getNumBlocks() == 0) {
Contributor

DatanodeDescriptor#getNumBlocks() returns the variable numBlocks.
However, it is only set during initialization.

Instead, I suspect we want to use DatanodeDescriptor#numBlocks(), where the number is computed by aggregating over all existing storage volumes.
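
The difference between the two accessors can be illustrated with a small stand-alone model. This is only a sketch of the concern raised here, not the Hadoop classes themselves: `StorageModel` and `NodeModel` are hypothetical, and the only assumption carried over from the comment is that one accessor returns a field set at initialization while the other sums over the storages.

```java
import java.util.List;

/** Hypothetical stand-in for one storage volume on a DataNode. */
final class StorageModel {
  final int blocks;

  StorageModel(int blocks) {
    this.blocks = blocks;
  }
}

/** Hypothetical stand-in for the node descriptor; not the Hadoop class. */
final class NodeModel {
  private final int numBlocksAtInit;        // set once, never refreshed
  private final List<StorageModel> storages;

  NodeModel(int numBlocksAtInit, List<StorageModel> storages) {
    this.numBlocksAtInit = numBlocksAtInit;
    this.storages = storages;
  }

  /** Mirrors the concern: returns the value captured at initialization. */
  int getNumBlocks() {
    return numBlocksAtInit;
  }

  /** Computes the count by aggregating over all storages. */
  int numBlocks() {
    int total = 0;
    for (StorageModel s : storages) {
      total += s.blocks;
    }
    return total;
  }
}
```

In this model a node initialized with a count of zero keeps returning zero from `getNumBlocks()` even when its storages hold blocks, which is why a check built on that accessor could pass when it should not.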

capacities[i][j] = 0;
}
}
getCluster().startDataNodes(getConf(), 1, null, true, null, null, null,
Contributor

IMO a more complete repro of the scenario should include:

  1. start DN with volumes, update config to tolerate volume failures.
  2. intentionally corrupt the volumes (delete VERSION file, for example)
  3. trigger volume scanner, wait for the DN to drop the volume

Maybe we don't need a very faithful repro, but I am worried this test doesn't cover the real scenario.

Member Author

Thanks for your detailed reviews, @jojochuang. I will try to reproduce it with a unit test.

@tasanuma
Member Author

@jojochuang Sorry for the very late reply. We found the root cause of this problem. There is a bug in hadoop-3.3.0 where the DataNode doesn't shut down even if the number of failed volumes is greater than dfs.datanode.failed.volumes.tolerated. As a result, the capacity of a DataNode can be zero. The bug was recently fixed by HDFS-15963. After HDFS-15963, the capacity of a DataNode can't be 0 (dfs.datanode.failed.volumes.tolerated is limited to storageNum-1 at most).
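
The invariant described above can be sketched as follows. This is an illustration under stated assumptions, not the HDFS-15963 change: `VolumeFailurePolicy`, `validate`, and `shouldShutDown` are hypothetical names, and the sketch ignores any special configuration values the real DataNode may accept. It only encodes the two rules from the comment: the tolerated count is at most storageNum-1, and the node shuts down once failures exceed it.

```java
/** Hypothetical model of the volume-failure rules described in the comment. */
final class VolumeFailurePolicy {
  private VolumeFailurePolicy() {}

  /** Rejects a tolerated-failures setting that would allow all volumes to fail. */
  static void validate(int tolerated, int configuredVolumes) {
    if (tolerated < 0 || tolerated > configuredVolumes - 1) {
      throw new IllegalArgumentException(
          "tolerated failures must be between 0 and " + (configuredVolumes - 1));
    }
  }

  /** The node should stop once failed volumes exceed the tolerated count. */
  static boolean shouldShutDown(int failedVolumes, int tolerated) {
    return failedVolumes > tolerated;
  }
}
```

Because the tolerated count can never reach the number of configured volumes, losing every volume always exceeds it, so a running DataNode with zero capacity should not occur under these rules.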

@tasanuma
Member Author

As I said in the last comment, this is not a problem anymore after HDFS-15963. I'm closing this PR.
Thanks for your kind reviews, @virajjasani and @jojochuang.

@tasanuma tasanuma closed this May 17, 2021
@jojochuang
Contributor

Great! Glad to find out.
