Conversation

@sodonnel
Contributor

What changes were proposed in this pull request?

After the under / over replicated containers are collected in HDDS-6699, they need to be prioritised and placed on a queue for the next stage of the Replication Manager (RM) to pick up and process.

This change adds a priority queue for the under-replicated containers, where they are prioritised by their remaining redundancy and made available for processing by the next stage of the replication process.
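
A minimal sketch of the queueing idea, using illustrative class and field names rather than the actual Ozone classes: containers with the least remaining redundancy (closest to data loss) are polled first.

import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative only: not the patch's real classes.
class UnderReplicatedQueueSketch {

  static class Entry {
    final long containerId;
    final int remainingRedundancy;

    Entry(long containerId, int remainingRedundancy) {
      this.containerId = containerId;
      this.remainingRedundancy = remainingRedundancy;
    }
  }

  // Lower remaining redundancy sorts first, so the most at-risk container
  // is handed to the next stage of RM before healthier ones.
  private final PriorityQueue<Entry> queue =
      new PriorityQueue<>(
          Comparator.comparingInt((Entry e) -> e.remainingRedundancy));

  void enqueue(Entry e) {
    queue.add(e);
  }

  Entry dequeue() {
    // Returns the most at-risk container, or null if the queue is empty.
    return queue.poll();
  }
}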

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-6957

How was this patch tested?

New unit tests

Contributor

@umamaheswararao umamaheswararao left a comment


Thanks @sodonnel for working on this patch. I have dropped a few comments. PTAL!

// For under replicated containers, the best remaining redundancy we can
// have is 3 for EC-10-4, 2 for EC-6-3, 1 for EC-3-2 and 2 for Ratis.
// A container which is under-replicated due to decommission will have one
// more, ie 4, 3, 2, 3 respectively. Ideally we want to sort decommission
Contributor


"One more" means the weight, right?

Contributor Author


A container that is under-replicated only due to decommission is not missing any replicas, so its remaining redundancy is the same as if it were not under-replicated at all.

// decommission only under-replicated containers to a floor of 5 so they
// sort after an under-replicated container with 3 remaining replicas (
// EC-10-4) and plus one retry.
private static final int DECOMMISSION_REDUNDANCY = 5;
Contributor


Basically, if the idea is to give the decommission elements lower priority than under-replication, will that cause decommission to take a very long time if there are a lot of under-replication items in the cluster?

Contributor Author


Yes, but that is how it should be. Decommission is less important than repairing containers which are at risk of data loss.

Contributor


I am not sure we should block decommission tasks. Decommission tasks (replicate commands) are lighter weight compared to reconstruction tasks. If the cluster has too many reconstruction tasks (maybe due to a rack being down or similar), decommission could take a very long time. I just looked at HDFS, and it seems there is no separate queue for decommission there. Let's move ahead with the current plan and revisit based on how decommission behaves in practice. I am wondering whether there may be complaints about decommission taking longer in practice.
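
For reference, a rough sketch of the ordering being discussed; the DECOMMISSION_REDUNDANCY value comes from the quoted code, but the helper name and surrounding structure are assumptions for illustration only.

// Illustrative helper, not the patch's actual code.
class DecommissionWeightSketch {
  // Floor for containers that are under-replicated only because their
  // replicas sit on decommissioning nodes (value from the quoted code).
  private static final int DECOMMISSION_REDUNDANCY = 5;

  // Lower weight = higher priority in the under-replication queue.
  static int queueWeight(int remainingRedundancy, boolean decommissionOnly) {
    // At full health the remaining redundancy is 3 for EC-10-4, 2 for
    // EC-6-3, 1 for EC-3-2 and 2 for Ratis; lost replicas reduce it further.
    // A decommission-only container has lost nothing, so it is floored at 5
    // to keep it behind every genuine repair.
    return decommissionOnly
        ? Math.max(remainingRedundancy, DECOMMISSION_REDUNDANCY)
        : remainingRedundancy;
  }
}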

replicationManager.processContainer(underRep1, underRep, overRep,
repReport);
replicationManager.processContainer(underRep0, underRep, overRep,
repReport);
Contributor


Can we use process all?
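
As a hedged illustration of the suggestion only (the loop below and the element type are assumptions layered on the quoted test snippet, not the patch's actual code), the two calls could be driven from a single loop, or replaced by a processAll-style helper if the test harness exposes one:

import java.util.Arrays;

// Fragment reusing the fixtures from the quoted test snippet above;
// ContainerInfo is assumed to be the container type used there.
for (ContainerInfo container : Arrays.asList(underRep1, underRep0)) {
  replicationManager.processContainer(container, underRep, overRep,
      repReport);
}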

@adoroszlai adoroszlai added the EC label Jul 1, 2022
Contributor

@umamaheswararao umamaheswararao left a comment


LGTM

@umamaheswararao umamaheswararao merged commit 03cd7c4 into apache:master Jul 2, 2022
errose28 added a commit to errose28/ozone that referenced this pull request Jul 12, 2022
* master: (46 commits)
  HDDS-6901. Configure HDDS volume reserved as percentage of the volume space. (apache#3532)
  HDDS-6978. EC: Cleanup RECOVERING container on DN restarts (apache#3585)
  HDDS-6982. EC: Attempt to cleanup the RECOVERING container when reconstruction failed at coordinator. (apache#3583)
  HDDS-6968. Addendum: [Multi-Tenant] Fix USER_MISMATCH error even on correct user. (apache#3578)
  HDDS-6794. EC: Analyze and add putBlock even on non writing node in the case of partial single stripe. (apache#3514)
  HDDS-6900. Propagate TimeoutException for all SCM HA Ratis calls. (apache#3564)
  HDDS-6938. handle NPE when removing prefixAcl (apache#3568)
  HDDS-6960. EC: Implement the Over-replication Handler (apache#3572)
  HDDS-6979. Remove unused plexus dependency declaration (apache#3579)
  HDDS-6957. EC: ReplicationManager - priortise under replicated containers (apache#3574)
  HDDS-6723. Close Rocks objects properly in OzoneManager (apache#3400)
  HDDS-6942. Ozone Buckets/Objects created via S3 should not allow group access (apache#3553)
  HDDS-6965. Increase timeout for basic check (apache#3563)
  HDDS-6969. Add link to compose directory in smoketest README (apache#3567)
  HDDS-6970. EC: Ensure DatanodeAdminMonitor can handle EC containers during decommission (apache#3573)
  HDDS-6977. EC: Remove references to ContainerReplicaPendingOps in TestECContainerReplicaCount (apache#3575)
  HDDS-6217. Cleanup XceiverClientGrpc TODOs, and document how the client works and should be used. (apache#3012)
  HDDS-6773. Cleanup TestRDBTableStore (apache#3434) - fix checkstyle
  HDDS-6773. Cleanup TestRDBTableStore (apache#3434)
  HDDS-6676. KeyValueContainerData#getProtoBufMessage() should set block count (apache#3371)
  ...

Conflicts:
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/SCMUpgradeFinalizer.java
duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022
HDDS-6957. EC: ReplicationManager - priortise under replicated containers (apache#3574)

(cherry picked from commit 03cd7c4)
Change-Id: I524bb79b44ead9432fb752ccb82ae0b6e168e1a5