HDDS-6957. EC: ReplicationManager - priortise under replicated containers #3574
Conversation
umamaheswararao left a comment
Thanks @sodonnel for working on this patch. I have dropped a few comments. PTAL!
// For under replicated containers, the best remaining redundancy we can
// have is 3 for EC-10-4, 2 for EC-6-3, 1 for EC-3-2 and 2 for Ratis.
// A container which is under-replicated due to decommission will have one
// more, ie 4, 3, 2, 3 respectively. Ideally we want to sort decommission
"one more" means the weight, right?
Under-replicated due to decommission is not missing any replicas - so its remaining redundancy is still the same as if it was not under-replicated at all.
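To make this concrete, here is a minimal sketch (the class and method below are hypothetical, not the actual ECContainerReplicaCount code) of how remaining redundancy differs between a genuinely lost replica index and one that is only sitting on a decommissioning node:

// Illustration only: "remaining redundancy" is how many more replicas can
// be lost before the container data becomes unreadable.
public final class RemainingRedundancySketch {

  // data/parity describe the EC group (e.g. 10 and 4 for EC-10-4);
  // healthyIndexes counts distinct indexes with an available replica;
  // decommissionOnly is true when the only "missing" replicas are still
  // readable on decommissioning nodes.
  static int remainingRedundancy(int data, int parity, int healthyIndexes,
      boolean decommissionOnly) {
    if (decommissionOnly) {
      // Nothing is actually lost yet, so redundancy stays at the maximum.
      return parity;
    }
    // Each genuinely missing index reduces redundancy by one.
    return healthyIndexes - data;
  }

  public static void main(String[] args) {
    // EC-10-4 with one index truly missing: 13 healthy -> redundancy 3.
    System.out.println(remainingRedundancy(10, 4, 13, false));
    // EC-10-4 under-replicated only due to decommission: still 4.
    System.out.println(remainingRedundancy(10, 4, 14, true));
  }
}

So for EC-10-4 the two cases come out as 3 versus 4, which is the "one more" the code comment above is describing.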
// decommission only under-replicated containers to a floor of 5 so they
// sort after an under-replicated container with 3 remaining replicas (
// EC-10-4) and plus one retry.
private static final int DECOMMISSION_REDUNDANCY = 5;
Basically, if the idea is to give decommission items lower priority than under-replication, won't that cause decommission to take a very long time if there are a lot of under-replicated items in the cluster?
Yea, but that is how it should be. Decommission is less important than repairing containers which are at risk of data loss.
I am not sure we should block decom tasks. Decommission tasks (replicate commands) are lighter weight compared to reconstruction tasks. If the cluster has too many reconstruction tasks (maybe due to a rack being down or similar), decommission may take a very long time. I just looked at HDFS, and it looks like there is no separate queue for decom there either. Probably let's move ahead with the current plan and revisit based on how decom goes in practice. I am wondering whether there may be complaints about decom taking a long time in practice.
...m/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java (outdated comment, resolved)
replicationManager.processContainer(underRep1, underRep, overRep,
    repReport);
replicationManager.processContainer(underRep0, underRep, overRep,
    repReport);
Can we use process all?
...m/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java (resolved)
...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java (outdated comment, resolved)
umamaheswararao left a comment
LGTM
* master: (46 commits)
  HDDS-6901. Configure HDDS volume reserved as percentage of the volume space. (apache#3532)
  HDDS-6978. EC: Cleanup RECOVERING container on DN restarts (apache#3585)
  HDDS-6982. EC: Attempt to cleanup the RECOVERING container when reconstruction failed at coordinator. (apache#3583)
  HDDS-6968. Addendum: [Multi-Tenant] Fix USER_MISMATCH error even on correct user. (apache#3578)
  HDDS-6794. EC: Analyze and add putBlock even on non writing node in the case of partial single stripe. (apache#3514)
  HDDS-6900. Propagate TimeoutException for all SCM HA Ratis calls. (apache#3564)
  HDDS-6938. handle NPE when removing prefixAcl (apache#3568)
  HDDS-6960. EC: Implement the Over-replication Handler (apache#3572)
  HDDS-6979. Remove unused plexus dependency declaration (apache#3579)
  HDDS-6957. EC: ReplicationManager - priortise under replicated containers (apache#3574)
  HDDS-6723. Close Rocks objects properly in OzoneManager (apache#3400)
  HDDS-6942. Ozone Buckets/Objects created via S3 should not allow group access (apache#3553)
  HDDS-6965. Increase timeout for basic check (apache#3563)
  HDDS-6969. Add link to compose directory in smoketest README (apache#3567)
  HDDS-6970. EC: Ensure DatanodeAdminMonitor can handle EC containers during decommission (apache#3573)
  HDDS-6977. EC: Remove references to ContainerReplicaPendingOps in TestECContainerReplicaCount (apache#3575)
  HDDS-6217. Cleanup XceiverClientGrpc TODOs, and document how the client works and should be used. (apache#3012)
  HDDS-6773. Cleanup TestRDBTableStore (apache#3434) - fix checkstyle
  HDDS-6773. Cleanup TestRDBTableStore (apache#3434)
  HDDS-6676. KeyValueContainerData#getProtoBufMessage() should set block count (apache#3371)
  ...

Conflicts:
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/SCMUpgradeFinalizer.java
HDDS-6957. EC: ReplicationManager - priortise under replicated containers (apache#3574)
(cherry picked from commit 03cd7c4)
Change-Id: I524bb79b44ead9432fb752ccb82ae0b6e168e1a5
What changes were proposed in this pull request?
After the under / over replicated containers are collected in HDDS-6699, they need to be prioritised and placed on a queue for the next stage of RM to pick up and process.
This change adds a priority queue for the under replicated containers, where they are prioritised by their remaining redundancy and made available for processing by the next stage of the replication process.
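For illustration, here is a simplified sketch of the ordering this introduces (the class below is a stand-in, not the real ReplicationManager queue or its health-result types): containers are ordered by remaining redundancy, and results that are under-replicated only due to decommission are floored to 5 so they sort last.

import java.util.Comparator;
import java.util.PriorityQueue;

// A sketch only; the real logic lives in ReplicationManager and its handlers.
public class UnderReplicatedQueueSketch {

  // Simplified stand-in for an under-replicated container health result.
  static class UnderReplicated {
    final String containerId;
    final int remainingRedundancy;
    final boolean dueToDecommissionOnly;

    UnderReplicated(String containerId, int remainingRedundancy,
        boolean dueToDecommissionOnly) {
      this.containerId = containerId;
      this.remainingRedundancy = remainingRedundancy;
      this.dueToDecommissionOnly = dueToDecommissionOnly;
    }

    // Decommission-only results are floored to 5 so they sort after any
    // container that is genuinely missing data (the worst EC case is 3
    // remaining replicas for EC-10-4, plus one for a retry).
    int weight() {
      return dueToDecommissionOnly ? 5 : remainingRedundancy;
    }
  }

  public static void main(String[] args) {
    PriorityQueue<UnderReplicated> queue = new PriorityQueue<>(
        Comparator.comparingInt(UnderReplicated::weight));

    queue.add(new UnderReplicated("c1", 3, false)); // EC-10-4, one index lost
    queue.add(new UnderReplicated("c2", 4, true));  // decommission only
    queue.add(new UnderReplicated("c3", 0, false)); // one replica from data loss

    // Prints c3, then c1, then c2: the most at-risk containers come off first.
    while (!queue.isEmpty()) {
      System.out.println(queue.poll().containerId);
    }
  }
}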
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6957
How was this patch tested?
New unit tests