Revert "HDFS-16776 Erasure Coding: The length of targets should be checked when DN gets a reconstruction task" #6964
…ecked wh…" This reverts commit 9a29075.
💔 -1 overall
This message was automatically generated. |
I have an idea: we can limit BlockPlacementPolicyDefault to choosing EC targets one by one, instead of choosing targets for all storage types at once, so it will not break the DN checkArgument.
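A rough sketch of the one-by-one idea, not the actual BlockPlacementPolicyDefault code. All names here (`OneByOneChooser`, `chooseTargets`, the host strings) are hypothetical and only illustrate the intent: pick a single target per internal block that needs rebuilding, so the number of targets handed to the DN never exceeds the number of blocks to reconstruct.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Hadoop. */
public class OneByOneChooser {

    /**
     * Picks targets one at a time from the candidate list and stops as soon
     * as there is one target per block to rebuild. Duplicate candidates are
     * skipped so a node is never chosen twice.
     */
    public static List<String> chooseTargets(List<String> candidates,
                                             int blocksToRebuild) {
        List<String> targets = new ArrayList<>();
        for (String node : candidates) {
            if (targets.size() == blocksToRebuild) {
                break; // enough targets: never hand out more than needed
            }
            if (!targets.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }
}
```

With this shape, a length check on the DN side (assumed here to compare the target count against the number of blocks to rebuild) could only fail because too few candidates were available, not because the NN chose extra targets.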
cc @zhangshuyan0 and @haiyang1987 |
Thanks for your comment. Optimize the |
We will try to solve this problem from the NN side, so I am closing this PR. Thank you.
@Hexiaoqiao @tomscut @zhangshuyan0 @haiyang1987 I think the problem is that the EC code depends too heavily on the original process, which was designed for replicating contiguous blocks. Many bugs come from over-reliance on this process and from unnecessary parameter passing. I have submitted HDFS-17542, which tries to rearrange this code. After that PR, I think there is no need to check the length of targets. Could you please review HDFS-17542? As for maintenance, I think the issue is not limited to maintenance. The calculation of EC replica state has some imperfections, mainly because the uniqueness of internal blocks is not taken into account when computing NumberReplicas. Some issues have attempted to fix these problems (for example, HDFS-14920 fixed the inaccurate DECOMMISSIONING count), but the fixes are not thorough. HDFS-17542 introduces NumberReplicasStriped, which I think is a better approach.
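To illustrate the uniqueness point, here is a minimal sketch, assuming a simplified model of a striped block group. The class `StripedLiveCount` and its `Replica` type are hypothetical and are not the NumberReplicas or NumberReplicasStriped code from Hadoop. It contrasts counting every live stored copy with counting each internal block index at most once.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; not part of Hadoop. */
public class StripedLiveCount {

    /** One stored copy of an internal block in a block group. */
    public static final class Replica {
        final int blockIndex; // index of the internal block within the group
        final boolean live;   // whether this copy counts as live

        public Replica(int blockIndex, boolean live) {
            this.blockIndex = blockIndex;
            this.live = live;
        }
    }

    /** Counts every live copy, so two copies of the same index count twice. */
    public static int naiveLive(List<Replica> replicas) {
        int count = 0;
        for (Replica r : replicas) {
            if (r.live) {
                count++;
            }
        }
        return count;
    }

    /** Counts each internal block index at most once. */
    public static int uniqueLive(List<Replica> replicas, int totalBlocks) {
        BitSet seen = new BitSet(totalBlocks);
        for (Replica r : replicas) {
            if (r.live) {
                seen.set(r.blockIndex);
            }
        }
        return seen.cardinality();
    }
}
```

If internal block 0 has two live copies and block 1 has one, the naive count reports three live blocks while only two distinct internal blocks are actually available, which is the kind of overcount the comment describes.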
Reverts #4901
As a result of this change, maintenance can get stuck in two ways:
Here is a more complex example. We recently put a batch of nodes into maintenance, including host4 and host8.
Configuration:
hdfs fsck -fs -blockId blk_-9223372035217210640
Datanode log:
In this block group, one block (blk_-9223372035217210639) was written to SSD.
During maintenance, two blocks need to be added: one to migrate the block on SSD to HDD (to satisfy the storage policy), and one to ensure that at least 7 blocks remain available during maintenance.
The maintenance process then gets stuck.