HDDS-8616. Underreplication not fixed if all replicas start decommissioning #4711
Merged
… getMaintenanceCount()
sodonnel
approved these changes
May 16, 2023
Contributor
sodonnel
left a comment
LGTM. Thanks for splitting the refactoring changes into separate commits - makes it a lot easier to review.
Contributor
Author
Thanks @sodonnel for the review.
errose28 added a commit to errose28/ozone that referenced this pull request
May 17, 2023
* master: (78 commits)
  HDDS-8575. Intermittent failure in TestCloseContainerEventHandler.testCloseContainerWithDelayByLeaseManager (apache#4688)
  HDDS-7241. EC: Reconstruction could fail with orphan blocks. (apache#4718)
  HDDS-8577. [Snapshot] Disable compaction log when loading metadata for snapshot (apache#4697)
  HDDS-7080. EC: Offline reconstruction needs better logging (apache#4719)
  HDDS-8626. Config thread pool in ReplicationServer (apache#4715)
  HDDS-8616. Underreplication not fixed if all replicas start decommissioning (apache#4711)
  HDDS-8254. Close containers when volume reaches utilisation threshold (apache#4583)
  HDDS-8254. Close containers when volume reaches utilisation threshold (apache#4583)
  HDDS-8615. Explicitly show EC block type in 'ozone debug chunkinfo' command output (apache#4706)
  HDDS-8623. Delete duplicate getBucketInfo in OMKeyCommitRequest (apache#4712)
  HDDS-8339. Recon Show the number of keys marked for Deletion in Recon UI. (apache#4519)
  HDDS-8572. Support CodecBuffer for protobuf v3 codecs. (apache#4693)
  HDDS-8010. Improve DN warning message when getBlock does not find the block. (apache#4698)
  HDDS-8621. IOException is never thrown in SCMRatisServer.getRatisRoles(). (apache#4710)
  HDDS-8463. S3 key uniqueness in deletedTable (apache#4660)
  HDDS-8584. Hadoop client write slowly when stream enabled (apache#4703)
  HDDS-7732. EC: Verify block deletion from missing EC containers (apache#4705)
  HDDS-8581. Avoid random ports in integration tests (apache#4699)
  HDDS-8504. ReplicationManager: Pass used and excluded node separately for Under and Mis-Replication (apache#4694)
  HDDS-8576. Close RocksDB instance in RDBStore if RDBStore's initialization fails after RocksDB instance creation (apache#4692)
  ...
What changes were proposed in this pull request?
TestDecommissionAndMaintenance#testContainerIsReplicatedWhenAllNodesGotoMaintenance fails with the new replication manager (i.e. if legacy is disabled). If all replicas are starting maintenance, underreplication is not fixed. RatisReplicationCheckHandler skips the container because there are no healthy replicas, and RatisUnhealthyReplicationCheckHandler skips it because there are no unhealthy ones either. Decommissioning and maintenance replicas are counted separately, and we lose the information regarding their health.

This change fixes the problem by counting healthy/unhealthy decom/maint replicas separately, and including them in the total healthy/unhealthy counts (getHealthyReplicaCount() and getUnhealthyReplicaCount()).

It also includes some refactoring as separate commits, reducing code duplication and duplicate calculation of some values.
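
To illustrate the counting problem described above, here is a minimal, self-contained sketch. It is not the code from this patch: the class ReplicaCountSketch, its enums, the Replica type, and the methods count() and anyHandlerActs() are hypothetical names made up for illustration, and the "a handler only acts if its count is non-zero" rule is a simplifying assumption about how the two check handlers behave.

```java
import java.util.Arrays;
import java.util.List;

class ReplicaCountSketch {
    // Hypothetical stand-ins for illustration; not Ozone's actual types.
    enum OpState { IN_SERVICE, DECOMMISSIONING, ENTERING_MAINTENANCE }
    enum Health { HEALTHY, UNHEALTHY }

    static final class Replica {
        final OpState opState;
        final Health health;

        Replica(OpState opState, Health health) {
            this.opState = opState;
            this.health = health;
        }
    }

    /**
     * Counts replicas with the given health. When includeOutOfService is
     * false, decommissioning and maintenance replicas are ignored, which
     * models the situation described as the problem. When it is true, they
     * are included in the total, which models the idea of the fix.
     */
    static int count(List<Replica> replicas, Health health,
                     boolean includeOutOfService) {
        int n = 0;
        for (Replica r : replicas) {
            boolean inService = r.opState == OpState.IN_SERVICE;
            if (r.health == health && (inService || includeOutOfService)) {
                n++;
            }
        }
        return n;
    }

    /** Simplified assumption: a handler acts only if its own count is non-zero. */
    static boolean anyHandlerActs(List<Replica> replicas,
                                  boolean includeOutOfService) {
        return count(replicas, Health.HEALTHY, includeOutOfService) > 0
            || count(replicas, Health.UNHEALTHY, includeOutOfService) > 0;
    }

    public static void main(String[] args) {
        // All three replicas are healthy but entering maintenance.
        List<Replica> replicas = Arrays.asList(
            new Replica(OpState.ENTERING_MAINTENANCE, Health.HEALTHY),
            new Replica(OpState.ENTERING_MAINTENANCE, Health.HEALTHY),
            new Replica(OpState.ENTERING_MAINTENANCE, Health.HEALTHY));

        System.out.println("in-service only: "
            + anyHandlerActs(replicas, false));
        System.out.println("including decom/maint: "
            + anyHandlerActs(replicas, true));
    }
}
```

In this sketch, when only in-service replicas are counted, both the healthy and the unhealthy totals are zero for a container whose replicas are all entering maintenance, so neither handler picks it up. Once the decom/maint replicas contribute to the totals, the healthy count becomes non-zero and the container is no longer skipped.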
https://issues.apache.org/jira/browse/HDDS-8616
How was this patch tested?
A new unit test is added to reproduce the problem.
The legacy replication manager is disabled in TestDecommissionAndMaintenance, since the test now passes with the new one.
CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4972183233