HDDS-1207. Refactor Container Report Processing logic and plugin new Replication Manager. #662
…Replication Manager.
🎊 +1 overall
This message was automatically generated.
LOG.info("Starting Replication Monitor Thread.");
running = true;
replicationMonitor.start();
CompletableFuture.runAsync(() -> {
Don't use the ForkJoinPool common pool. It has very few threads and is easily exhausted; we saw this frequently in unit tests.
Instead, use the runAsync overload that accepts an Executor.
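For illustration, a minimal sketch of the suggested change, using the `CompletableFuture.runAsync(Runnable, Executor)` overload. The class name, the helper methods, and the thread name "ReplicationMonitor" are assumptions made up for this example and are not taken from the patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class DedicatedExecutorExample {

  /** Hypothetical helper: a single-thread executor with a named daemon thread. */
  static ExecutorService newMonitorExecutor() {
    return Executors.newSingleThreadExecutor(runnable -> {
      Thread thread = new Thread(runnable, "ReplicationMonitor");
      thread.setDaemon(true);
      return thread;
    });
  }

  /** Runs a task on the given executor and returns the name of the thread that ran it. */
  static String runOnExecutor(ExecutorService executor) {
    AtomicReference<String> threadName = new AtomicReference<>();
    // The two-argument overload keeps the task off ForkJoinPool.commonPool().
    CompletableFuture<Void> future = CompletableFuture.runAsync(
        () -> threadName.set(Thread.currentThread().getName()), executor);
    future.join();
    return threadName.get();
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = newMonitorExecutor();
    try {
      System.out.println("Task ran on: " + runOnExecutor(executor));
    } finally {
      executor.shutdown();
      executor.awaitTermination(5, TimeUnit.SECONDS);
    }
  }
}
```

Owning the executor also means the component can shut it down explicitly when it stops, which the shared common pool does not allow.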
    " in {} milliseconds.", delay);
  Thread.sleep(delay);
} catch (InterruptedException ignored) {
  // InterruptedException is ignored.
Set the interrupted flag here, i.e. call Thread.currentThread().interrupt(), instead of ignoring the exception.
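A small sketch of what restoring the flag could look like. The class and method names are hypothetical; only `Thread.sleep`, `Thread.currentThread().interrupt()`, and `isInterrupted()` are standard JDK calls.

```java
final class InterruptFlagExample {

  /**
   * Sleeps for the given delay. Returns true if the sleep completed,
   * false if it was interrupted. On interruption the flag is restored.
   */
  static boolean sleepQuietly(long delayMillis) {
    try {
      Thread.sleep(delayMillis);
      return true;
    } catch (InterruptedException e) {
      // Thread.sleep clears the interrupted status when it throws,
      // so set it again for code further up the call stack.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    // Simulate a pending interrupt before the sleep starts.
    Thread.currentThread().interrupt();
    boolean completed = sleepQuietly(1000);
    System.out.println("completed=" + completed
        + ", interrupted=" + Thread.currentThread().isInterrupted());
  }
}
```

With the flag restored, a surrounding loop can check `Thread.currentThread().isInterrupted()` and exit instead of continuing to run after a shutdown request.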
  }
}
final Set<ContainerID> missingReplicas = new HashSet<>(containersInSCM);
missingReplicas.removeAll(containersInDn);
What are the last few lines doing? I didn't quite get it.
Here we have to identify the replicas that are missing from the reporting datanode.
SCM (NodeManager) has the list of container replicas that are expected on a given datanode. We identify the missing replicas by taking the difference between the list maintained in NodeManager and the list of replicas reported by the datanode.
Once we have identified the missing replicas, we update the replica map in ContainerManager.
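The delta described above is a plain set difference. A self-contained sketch, using `Long` values as stand-ins for `ContainerID` (the class and method names here are illustrative assumptions, not code from the patch):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MissingReplicasExample {

  /**
   * Returns the elements expected by SCM that the datanode did not report.
   * The input sets are left unmodified.
   */
  static <T> Set<T> missingReplicas(Set<T> expectedInScm, Set<T> reportedByDn) {
    Set<T> missing = new HashSet<>(expectedInScm);
    missing.removeAll(reportedByDn);
    return missing;
  }

  public static void main(String[] args) {
    Set<Long> expectedInScm = new HashSet<>(Arrays.asList(1L, 2L, 3L));
    Set<Long> reportedByDn = new HashSet<>(Arrays.asList(2L, 3L, 4L));
    // Container 1 is expected by SCM but absent from the report.
    System.out.println(missingReplicas(expectedInScm, reportedByDn)); // prints [1]
  }
}
```

Container 4 in this example is reported but not expected; it does not appear in the result because the difference is taken in one direction only.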
 * @param replicas List of ContainerReplicaProto
 * @param publisher EventPublisher reference
 */
private void updateDeleteTransaction(final DatanodeDetails datanodeDetails,
Had a few questions on this function. I am not familiar with this part.
This is for updating the block deletion transaction ID reported by the container replica. If the container replica is lagging behind in block deletion, we trigger a PENDING_DELETE_STATUS event so that SCMBlockDeletingService can resend block deletion commands to the datanode.
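As a rough sketch of the "lagging behind" check only: the snippet below compares the delete transaction ID reported for each replica against the one SCM holds for the same container. All names are hypothetical, `Long` keys stand in for container IDs, and the actual event publishing (the PENDING_DELETE_STATUS event and its payload) is deliberately left out because its shape is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DeleteTransactionLagExample {

  /**
   * Returns the IDs of containers whose reported delete transaction ID
   * is lower than the one SCM has recorded for that container.
   */
  static List<Long> laggingContainers(Map<Long, Long> scmDeleteTxIds,
                                      Map<Long, Long> reportedDeleteTxIds) {
    List<Long> lagging = new ArrayList<>();
    for (Map.Entry<Long, Long> reported : reportedDeleteTxIds.entrySet()) {
      Long scmTxId = scmDeleteTxIds.get(reported.getKey());
      // Containers unknown to SCM are skipped in this sketch.
      if (scmTxId != null && reported.getValue() < scmTxId) {
        lagging.add(reported.getKey());
      }
    }
    return lagging;
  }

  public static void main(String[] args) {
    Map<Long, Long> scm = new LinkedHashMap<>();
    scm.put(1L, 10L);
    scm.put(2L, 7L);
    Map<Long, Long> reported = new LinkedHashMap<>();
    reported.put(1L, 8L);  // behind SCM: deletion commands would be resent
    reported.put(2L, 7L);  // up to date
    System.out.println("Lagging containers: " + laggingContainers(scm, reported));
  }
}
```

In the flow described above, the containers returned by such a check would be the ones for which the event is fired so that deletion commands get resent.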
🎊 +1 overall
This message was automatically generated.
arp7
left a comment
+1 lgtm!
This Request is a copy of apache#647(got garbled). This PR already addresses all the comments brought up in the other request. Author: Boris S <[email protected]> Author: Boris S <[email protected]> Author: Boris Shkolnik <[email protected]> Reviewers: Shanthoosh Venkatraman <[email protected]>, Prateek Maheshwari <[email protected]> Closes apache#662 from sborya/NewConsumerAdmin2
PR #620 brings in the new ReplicationManager. This change refactors the ContainerReportProcessing logic in SCM so that it complements ReplicationManager, and plugs in the new ReplicationManager code.