-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16333. fix balancer bug when transfer an EC block #3679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
3daa567 to
46dbec8
Compare
|
@liubingxing can you extend an UT for your scenario |
|
💔 -1 overall
This message was automatically generated. |
46dbec8 to
a6d0012
Compare
|
💔 -1 overall
This message was automatically generated. |
a6d0012 to
e999e87
Compare
|
💔 -1 overall
This message was automatically generated. |
@hemanthboyina sorry for the late reply, I add a UT to simulate the balancer with EC file, the excluded node is to simulate the decommissioning node. |
|
@hemanthboyina Please take a look at this and give some advice. Thanks a lot |
tasanuma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liubingxing Thanks for reporting the issues and submitting the PR. I reviewed it and left some comments. Please confirm them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding another void runBalancer method
private void runBalancer(Configuration conf, long totalUsedSpace,
long totalCapacity, BalancerParameters p, int excludedNodes)
throws Exception {
+ runBalancer(conf, totalUsedSpace, totalCapacity, p, excludedNodes, false);
+ }
+
+ private void runBalancer(Configuration conf, long totalUsedSpace,
+ long totalCapacity, BalancerParameters p, int excludedNodes, boolean checkFailedNum)
+ throws Exception {
waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster);and just calling it from the unit test?
| final int run = runBalancer(namenodes, pBuilder.build(), conf, true); | |
| if (conf.getInt( | |
| DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY, | |
| DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT) | |
| == 0) { | |
| assertEquals(ExitStatus.NO_MOVE_PROGRESS.getExitCode(), run); | |
| } else { | |
| assertEquals(ExitStatus.SUCCESS.getExitCode(), run); | |
| } | |
| waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); | |
| runBalancer(namenodes, pBuilder.build(), conf, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the UT and calling runBalancer from the unit test.
And I also add a parameter boolean checkExcludeNodesUtilization in waitForBalancer to determine whether to check the nodeUtilization of excluded datanode
DatanodeInfo[] datanodeReport =
client.getDatanodeReport(DatanodeReportType.ALL);
assertEquals(datanodeReport.length, cluster.getDataNodes().size());
balanced = true;
int actualExcludedNodeCount = 0;
for (DatanodeInfo datanode : datanodeReport) {
double nodeUtilization = ((double)datanode.getDfsUsed())
/ datanode.getCapacity();
if (Dispatcher.Util.isExcluded(p.getExcludedNodes(), datanode)) {
if (checkExcludeNodesUtilization) {
assertTrue(nodeUtilization == 0);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please provide more detailed comments on when the locations could be updated and why we need to adjust indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fix the code and add more comments like this.
if (!adjustList.isEmpty()) {
// block.locations mismatch with block.indices
// adjust indices to get correct internalBlock for Datanode in #getInternalBlock
((DBlockStriped) block).adjustIndices(adjustList);
Preconditions.checkArgument(((DBlockStriped) block).indices.length
== block.locations.size());
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide comments on what this method does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the comments like this.
/**
* Adjust EC block indices,it will remove the element of adjustList from indices.
* @param adjustList the list will be removed from indices
*/
public void adjustIndices(List<Integer> adjustList) {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the unit test multiple times, and sometimes the length of the locatedBlocks is larger than groupSize, and the verification failed. Could you check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because waitForBalancer not waiting for namenode to delete extra replicas.
I fix the code and check the total block counts before StripedFileTestUtil.verifyLocatedStripedBlocks like this.
// check total blocks, max wait time 60s
long startTime = Time.monotonicNow();
int count = 0;
while (count < 20) {
count++;
DatanodeInfo[] datanodeReport1 = client.getDatanodeReport(DatanodeReportType.ALL);
long totalBlocksAfterBalancer = 0;
for (DatanodeInfo dn : datanodeReport1) {
totalBlocksAfterBalancer += dn.getNumBlocks();
}
if (totalBlocks == totalBlocksAfterBalancer) {
System.out.println("wait " + (Time.monotonicNow() - startTime) + "ms to check blocks, count " + count);
break;
}
cluster.triggerHeartbeats();
Thread.sleep(3000L);
}
// verify locations of striped blocks
locatedBlocks = client.getBlockLocations(fileName, 0, fileLen);
StripedFileTestUtil.verifyLocatedStripedBlocks(locatedBlocks, groupSize);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. How about using GenericTestUtils#waitFor for checking it?
|
@tasanuma Thank you for your review and comments. I run the UT Therefore, it is not good to use |
|
One possible solution is to add Preconditions.checkArgument that checks that the length of the indices is equal to the size of the block location, and to check that ExitStatus is SUCCESS in the unit test. if (blkLocs instanceof StripedBlockWithLocations) {
// adjust indices if locations has been updated
((DBlockStriped) block).adjustIndices(adjustList);
Preconditions.checkArgument(((DBlockStriped) block).indices.length
== block.locations.size());
} |
e999e87 to
1d6f9e6
Compare
|
@tasanuma Thank you for your advice and I fix the code and according to your suggestion. Please take a look. |
|
🎊 +1 overall
This message was automatically generated. |
1d6f9e6 to
ed3a77c
Compare
tasanuma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liubingxing Thanks for updating PR. I left some comments. Would you please confirm them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move testBalancerWithExcludeListWithStripedFile() and doTestBalancerWithExcludeListWithStripedFile () after doTestBalancerWithStripedFile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. How about using GenericTestUtils#waitFor for checking it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Preconditions.checkArgument() can throw IllegalArgumentException, getBlockList() should declares it, and dispatchBlocks() should catch the exception.
- private long getBlockList() throws IOException {
+ private long getBlockList() throws IOException, IllegalArgumentException { try {
final long received = getBlockList();
if (received == 0) {
return;
}
blocksToReceive -= received;
continue;
- } catch (IOException e) {
+ } catch (IOException|IllegalArgumentException e) {
LOG.warn("Exception while getting reportedBlock list", e);
return;
}|
💔 -1 overall
This message was automatically generated. |
ed3a77c to
15a3fbd
Compare
|
Thanks for updating it. +1, pending Jenkins. |
|
@tasanuma Thanks for your review. |
|
💔 -1 overall
This message was automatically generated. |
|
The failed unit tests is not related to this PR |
|
Merged into trunk. Thanks for your contribution, @liubingxing! |
(cherry picked from commit 35556ea)
(cherry picked from commit 35556ea)
|
@tasanuma Thanks for your review and merge. |
|
After cherry-picking into branch-3.3, the build of branch-3.3 fails with the following error. So I reverted it for now. @liubingxing It seems TestBalancer doesn't import |
(cherry picked from commit 35556ea)

JIRA: HDFS-16333
We set the EC policy to (6+3) and we also have nodes that were decommissioning when we executed balancer.
With the balancer running, we find many error logs as follow.

Node A wants to transfer an EC block to node B, but we found that the block is not on node A. The FSCK command to show the block status as follow

In the dispatcher. getBlockList function

Assume that the location of the an EC block in storageGroupMap look like this
indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]
after decommission operation, the internal block on indices[1] were decommission to another node.
indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]
the location of indices[1] change from node b to node j.
When the balancer get the block location and check it with the location in storageGroupMap.
If a node is not found in storageGroupMap, it will not be add to block locations.
In this case, node j will not be added to the block locations, while the indices is not updated.
Finally, the block location may look like this,
indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]
the location of the nodes does not match their indices
Solution:
we should update the indices and match with the nodes
indices:[0, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]