-
Notifications
You must be signed in to change notification settings - Fork 593
HDDS-7853. Add support for RemoveSCM in SCMRatisServer. #4358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @duongkame this should be related to management of secret keys. |
4eee4e2 to
58db613
Compare
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAManager.java
Outdated
Show resolved
Hide resolved
|
|
||
| // TODO: Remove the certificate from certificate store. | ||
| return scmHAManager.removeSCM(request); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This removePeerFromHARing function is not used in the patch. We can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the server-side code Sammi. There will be a follow-up change to introduce client-side code that will call this method.
| primarySCMHAManager.addSCM(request); | ||
| Assertions.assertEquals(2, primarySCMHAManager.getRatisServer() | ||
| .getDivision().getGroup().getPeers().size()); | ||
| scm2.getScmHAManager().getRatisServer().stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use try to capture the possible exception from addSCM, and close the ratis server in this case.
| primarySCMHAManager.removeSCM(removeSCMRequest); | ||
| Assertions.assertEquals(1, primarySCMHAManager.getRatisServer() | ||
| .getDivision().getGroup().getPeers().size()); | ||
| scm2.getScmHAManager().getRatisServer().stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion as above.
|
@nandakumar131 , the overall patch looks good. |
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAManagerImpl.java
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMRatisServerImpl.java
Outdated
Show resolved
Hide resolved
| public boolean removePeerFromHARing(RemoveSCMRequest request) | ||
| throws IOException { | ||
| // We cannot remove a node if it's currently leader. | ||
| if (scmContext.isLeader() && request.getScmId().equals(getScmId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q. If we need to perform maintenance on the current leader SCM node, how do we decommission it?
// We cannot remove a node if it's currently leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, we have to stop the SCM service that we want to decommission. This will be part of SCM decommissioning documentation.
| throw new IOException("Removal of primordial node is not supported."); | ||
| } | ||
|
|
||
| // TODO: Remove the certificate from certificate store. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nandakumar131 for the comment on certificate removal for node removed from ratis ring. What needs to be done to remove certificate? Are we blocked/waiting on something needed for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a new sub-task under HDDS-7852 for removing the SCM certificate form the database.
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/ha/TestSCMHAManagerImpl.java
Outdated
Show resolved
Hide resolved
|
thanks @nandakumar131 . +1 |
* master: (43 commits) HDDS-8148. Improve log for Pipeline creation failure (apache#4385) HDDS-7853. Add support for RemoveSCM in SCMRatisServer. (apache#4358) HDDS-8042. Display certificate issuer in cert list command. (apache#4429) HDDS-8189. [Snapshot] renamedKeyTable should only track keys in buckets that has at least one active snapshot. (apache#4436) HDDS-8154. Perf: Reuse Mac instances in S3 token validation (apache#4433) HDDS-8245. Info log for keyDeletingService when nonzero number of keys are deleted. (apache#4451) HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers (apache#4447) HDDS-8220. [Ozone-Streaming] Trigger volume check on IOException in StreamDataChannelBase (apache#4428) HDDS-8173. Fix to remove enrties from RocksDB after container gets deleted. (apache#4445) HDDS-7975. Rebalance acceptance tests (apache#4437) HDDS-8152. Reduce S3 acceptance test setup time (apache#4393) HDDS-8172. ECUnderReplicationHandler should consider commands already sent when processing the container (apache#4435) HDDS-7883. [Snapshot] Accommodate FSO, key renames and implement OMSnapshotPurgeRequest for SnapshotDeletingService (apache#4407) HDDS-8168. Make deadlines inside MoveManager for move commands configurable (apache#4415) HDDS-7918. EC: ECBlockReconstructedStripeInputStream should check for spare replicas before failing an index (apache#4441) HDDS-8222. EndpointBase#getBucket should handle BUCKET_NOT_FOUND (apache#4431) HDDS-8068. Fix Exception: JMXJsonServlet, getting attribute RatisRoles of Hadoop:service=OzoneManager. (apache#4352) HDDS-8139. Datanodes should not drop block delete transactions based on transaction ID (apache#4384) HDDS-8216. EC: OzoneClientConfig is overwritten in ECKeyOutputStream (apache#4425) HDDS-8054. Fix NPE in metrics for failed volume (apache#4340) ...
What changes were proposed in this pull request?
This change introduces support for
removeSCMinSCMRatisServer.As part of SCM decommissioning, the Ratis set configuration should be executed with a new peer list to remove the decommissioning SCM node from the Raft Ring.
SCMRatisServershould expose the method to perform remove node which should call Ratis set configuration with a new peer list.notifyConfigurationChangedonSCMStateMachineshould be idempotent as this can be called during SCM restart while applying the Raft log.What is the link to the Apache JIRA
HDDS-7853
How was this patch tested?
Unit tests added