HDDS-6957. EC: ReplicationManager - priortise under replicated containers #3574
```diff
@@ -77,10 +77,24 @@ public static class HealthyResult extends ContainerHealthResult {
   public static class UnderReplicatedHealthResult
       extends ContainerHealthResult {
 
+    // For under replicated containers, the best remaining redundancy we can
+    // have is 3 for EC-10-4, 2 for EC-6-3, 1 for EC-3-2 and 2 for Ratis.
+    // A container which is under-replicated due to decommission will have one
+    // more, ie 4, 3, 2, 3 respectively. Ideally we want to sort decommission
+    // only under-replication after all other under-replicated containers.
+    // It may also make sense to allow under-replicated containers a chance to
+    // retry once before processing the decommission only under replication.
+    // Therefore we should adjust the weighted remaining redundancy of
+    // decommission only under-replicated containers to a floor of 5 so they
+    // sort after an under-replicated container with 3 remaining replicas (
+    // EC-10-4) and plus one retry.
+    private static final int DECOMMISSION_REDUNDANCY = 5;
```
Contributor:
Basically, if the idea is to keep the decommission items at a lower priority than under-replication, will that cause decommission to take a very long time if there are a lot of under-replication items in that cluster?

Contributor (Author):
Yeah, but that is how it should be. Decommission is less important than repairing containers which are at risk of data loss.

Contributor:
I am not sure we should block decommission tasks. Decommission tasks (replicate commands) are lighter weight compared to reconstruction tasks. If the cluster has too many reconstruction tasks (maybe due to a rack going down or similar), decommission may take a very long time. I just looked at HDFS, and it looks like there is no separate queue for decommission there. Probably let's move ahead with the current plan and revisit based on how this goes with decommission in practice. I am wondering whether there will be complaints about decommission taking longer in practice.
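To make the floor-of-5 arithmetic from the code comment concrete, here is a minimal standalone sketch (not part of the PR; the class name and printed comparison are purely illustrative, with the values taken from the comment above). Even after one requeue, the healthiest genuinely under-replicated container (EC-10-4, remaining redundancy 3) still carries a lower weight than a decommission-only container:

```java
public class WeightedRedundancyArithmetic {
  // Best possible remaining redundancy for a genuinely under-replicated
  // container is 3 (an EC-10-4 container missing a single replica).
  private static final int BEST_UNDER_REPLICATED_REDUNDANCY = 3;
  // Floor used for decommission-only under-replication, per the comment above.
  private static final int DECOMMISSION_REDUNDANCY = 5;

  public static void main(String[] args) {
    int requeues = 1;  // the under-replicated container already failed once
    int underReplicatedWeight = BEST_UNDER_REPLICATED_REDUNDANCY + requeues; // 4
    int decommissionOnlyWeight = DECOMMISSION_REDUNDANCY;                    // 5

    // A lower weight means a higher priority, so the container that has
    // really lost a replica is still processed before the decommission-only
    // one, even after its one retry.
    System.out.println(underReplicatedWeight < decommissionOnlyWeight);      // true
  }
}
```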
```diff
     private final int remainingRedundancy;
     private final boolean dueToDecommission;
     private final boolean sufficientlyReplicatedAfterPending;
     private final boolean unrecoverable;
+    private int requeueCount = 0;
 
     UnderReplicatedHealthResult(ContainerInfo containerInfo,
         int remainingRedundancy, boolean dueToDecommission,
@@ -93,7 +107,7 @@ public static class UnderReplicatedHealthResult
     }
 
     /**
-     * How many more replicas can be lost before the the container is
+     * How many more replicas can be lost before the container is
      * unreadable. For containers which are under-replicated due to decommission
      * or maintenance only, the remaining redundancy will include those
      * decommissioning or maintenance replicas, as they are technically still
@@ -104,6 +118,41 @@ public int getRemainingRedundancy() {
       return remainingRedundancy;
     }
 
+    /**
+     * The weightedRedundancy, is the remaining redundancy + the requeue count.
+     * When this value is used for ordering in a priority queue it ensures the
+     * priority is reduced each time it is requeued, to prevent it from blocking
+     * other containers from being processed.
+     * Additionally, so that decommission and maintenance replicas are not
+     * ordered ahead of under-replicated replicas, a redundancy of
+     * DECOMMISSION_REDUNDANCY is used for the decommission redundancy rather
+     * than its real redundancy.
+     * @return The weightedRedundancy of this result.
+     */
+    public int getWeightedRedundancy() {
+      int result = requeueCount;
+      if (dueToDecommission) {
+        result += DECOMMISSION_REDUNDANCY;
+      } else {
+        result += remainingRedundancy;
+      }
+      return result;
+    }
+
+    /**
+     * If there is an attempt to process this under-replicated result, and it
+     * fails and has to be requeued, this method should be called to increment
+     * the requeue count to ensure the result is not placed back at the head
+     * of the queue.
+     */
+    public void incrementRequeueCount() {
+      ++requeueCount;
+    }
+
+    public int getRequeueCount() {
+      return requeueCount;
+    }
+
     /**
      * Indicates whether the under-replication is caused only by replicas
      * being decommissioned or entering maintenance. Ie, there are not replicas
```
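The sketch below shows roughly how these accessors could drive a priority queue. It is a simplified stand-in, not the actual ReplicationManager code: the `Result` class, its constructor and the queue wiring are invented for illustration, only the shape of `getWeightedRedundancy()` mirrors the diff above. Results are ordered by weighted redundancy, and a result that fails processing is requeued with an incremented count so it drops down the queue instead of blocking it:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Simplified stand-in for UnderReplicatedHealthResult, for illustration only.
class Result {
  private static final int DECOMMISSION_REDUNDANCY = 5;
  private final int remainingRedundancy;
  private final boolean dueToDecommission;
  private int requeueCount = 0;

  Result(int remainingRedundancy, boolean dueToDecommission) {
    this.remainingRedundancy = remainingRedundancy;
    this.dueToDecommission = dueToDecommission;
  }

  int getWeightedRedundancy() {
    // Same shape as the method in the diff: requeue count plus either the
    // real redundancy or the decommission floor.
    return requeueCount
        + (dueToDecommission ? DECOMMISSION_REDUNDANCY : remainingRedundancy);
  }

  void incrementRequeueCount() {
    ++requeueCount;
  }
}

public class UnderReplicatedQueueExample {
  public static void main(String[] args) {
    // Smallest weighted redundancy first, i.e. the most at-risk containers first.
    PriorityQueue<Result> queue = new PriorityQueue<>(
        Comparator.comparingInt(Result::getWeightedRedundancy));

    queue.add(new Result(1, false)); // EC-3-2 missing a replica: weight 1
    queue.add(new Result(3, false)); // EC-10-4 missing a replica: weight 3
    queue.add(new Result(3, true));  // decommission only: weight 5

    Result next = queue.poll();      // the weight-1 result is handled first
    // If processing fails, bump the requeue count before re-adding so the
    // result does not go straight back to the head of the queue.
    next.incrementRequeueCount();    // its weight becomes 2
    queue.add(next);
  }
}
```

The design point this illustrates is the one from the javadoc: each failed attempt lowers a result's priority by one step, so a repeatedly failing container cannot starve other under-replicated containers with the same real redundancy.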
Comment: one more means weight, right?

Reply: Under-replicated due to decommission is not missing any replicas, so its remaining redundancy is still the same as if it was not under-replicated at all.
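A hypothetical worked example of that point (the numbers are illustrative, this is not Ozone code): an EC-6-3 container with all nine replicas readable, one of them sitting on a decommissioning node, still has the full parity redundancy of 3, which is exactly why the weighted value substitutes the floor of 5 for queue ordering:

```java
public class DecommissionOnlyExample {
  public static void main(String[] args) {
    // Hypothetical EC-6-3 container: 6 data + 3 parity replicas, all readable,
    // but one replica lives on a decommissioning datanode.
    int parityReplicas = 3;
    int lostReplicas = 0; // decommission alone does not lose any replicas

    // Decommissioning replicas still count, so the remaining redundancy is
    // the same as for a perfectly healthy EC-6-3 container.
    int remainingRedundancy = parityReplicas - lostReplicas; // 3

    // For queue ordering the decommission floor of 5 is used instead, so this
    // container sorts behind anything that has actually lost replicas.
    int decommissionWeight = 5; // DECOMMISSION_REDUNDANCY
    System.out.println(remainingRedundancy + " real vs "
        + decommissionWeight + " weighted");
  }
}
```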