-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-4330. Bootstrap new OM node #1494
Conversation
7cdf67c
to
c44fba2
Compare
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OzoneIllegalArgumentException.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
Outdated
Show resolved
Hide resolved
...ne/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java
Outdated
Show resolved
Hide resolved
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach LGTM, I have few comments/questions.
Pending test changes review will post comments soon for test changes.
Can we also open a new doc jIra to document details to how to convert single node ratis to 3 node ratis
And also how we can add new nodes to Ratis ring.
Because for a single node to 3 node, end user should use
serviceid as omServiceIdDefault
And nodeID as storage version file omID
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OmTransport.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/ozonefs-hadoop2/src/main/java/org/apache/hadoop/fs/ozone/Hadoop27RpcTransport.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public boolean bootstrap(List<OMNodeInfo> omNodeInfos) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are directly using proto classes, can we have helper classes like how we have done for all other requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Using OMNodeDetails instead.
getConfiguration(), getOMServiceId(), newOMNodeId); | ||
if (newOMNodeDetails == null) { | ||
// Load new configuration object to read in new peer information | ||
setConfiguration(new OzoneConfiguration()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is newly bootstrapped OM, so when it starts it has loaded new config, why do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is run on all OMs (old OMs also). OzoneManager#addNewOMNode() is called when Ratis sends notifyConfigurationChanged(). And since the old OMs are not necessarily restarted, we need to reload the configuration.
newOMNodes.stream().forEach(newOMNode -> | ||
newOMNodeIdsBuilder.append(", ").append(newOMNode.getOMNodeId())); | ||
String newOMNodeIds = newOMNodeIdsBuilder.toString().substring(2); | ||
LOG.info("Bootstrapping new OM(s): {}", newOMNodeIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not using httpAddrress, httpsAddress, rpcPort, serviceID which are sent thru bootStraprequest
And in notifyConfigurationChanged finally reading from the current OM config. So, I see there is no real use in sending these information only required thing need looks like nodeID. Not sure If I am missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Not propagating the unused data anymore.
|
||
// If the OM is started in bootstrap mode, do not add it to the ratis ring. | ||
// It will be added later using SetConfiguration from the leader OM. | ||
if (isBootstrapping) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question:
If OM is started normally, and prepareBootStrapRequest is performed will it cause any issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could possibly cause a different ring to form with the new OMs.
The most likely scenario is that the new OMs would shutdown. But we need to find a mechanism to find this erroneous scenario and address it.
I think it should be done in a different Jira though.
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/admin/om/AddOMSubcommand.java
Outdated
Show resolved
Hide resolved
RATIS_BOOTSTRAP_ERROR = 3; | ||
} | ||
|
||
message BootstrapRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this AddOMRequest, as this is an admin command submitted to add new OM's to the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
Outdated
Show resolved
Hide resolved
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
Show resolved
Hide resolved
Thank You @hanishakoneru for the PR. |
881cb6e
to
4e9e1e8
Compare
Thanks @swagle, @fapifta and @bharatviswa504 for the reviews. I have addressed the review comments in the new commit (or answered inline). |
791d470
to
f9244cb
Compare
@hanishakoneru , ca you please rebase?? |
Thanks @hanishakoneru for working on this. We have a plan to enable OM HA in our production cluster. This feature is definitely we need. |
https://issues.apache.org/jira/browse/RATIS-1326 is blocker for this. |
@hanishakoneru Are you planning to update this? |
@elek , yes I am working on it currently. Will upload a new patch in a day or two. |
f9244cb
to
3897837
Compare
Updated the patch to remove the extra "addOM" command. Now when a new OM bootstraps, it sends an RPC request to existing OM ring to add itself to the ring. |
@bharatviswa504 can you please take a look when you get a chance. Thanks. |
c1b64ed
to
f39d8d2
Compare
Both these scenarios will not work. But I think it is reasonable to ask users to first upgrade (without any config changes) and then follow the steps to convert non-HA to HA.
Correct. |
I am okay with it if this is the way we want to pursue it. Now I got more clarity on the current PR deployment approach and limitations. We can revisit the decision if there is a requirement to update the config before the upgrade. (This kind of requirement update config only after start, might be a little tricky in deployment setups like Docker/Kubernetes.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
Few minor comments/questions.
...ne/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java
Outdated
Show resolved
Hide resolved
...one-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
Show resolved
Hide resolved
@bharatviswa504, can you please take a look when you get a chance. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank You @hanishakoneru for the update.
Overall LGTM. Few comments, once addressed good to commit.
And also looks like it needs a rebase.
public void updatePeerList(List<String> omNodeIds) { | ||
List<String> ratisServerPeerIdsList = omRatisServer.getPeerIds(); | ||
for (String omNodeId : omNodeIds) { | ||
if (getOMNodeId().equals(omNodeId)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In normal OM start node, ratisServer peer list has local + peer.
if (!isBootstrapping) {
// On regular startup, add all OMs to Ratis ring
raftPeers.add(localRaftPeer);
In bootstrap case, local node is not added to ratis server peer list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reading more, I have observed that before this change, we have not added local OM node to peer node list before(raftPeers). Should we have the same behavior?. So, should we remove adding local node to peer list?
// On regular startup, add all OMs to Ratis ring
raftPeers.add(localRaftPeer);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In bootstrap case, local node is not added to ratis server peer list.
Yes. This is because the local node is not part of the ratis ring yet. When the SetConfiguration is executed, this node would be added to the peer list.
While reading more, I have observed that before this change, we have not added local OM node to peer node list before(raftPeers). Should we have the same behavior?. So, should we remove adding local node to peer list?
Before this change also the local OM node was added to the peer list in OMRatisServer. Are you referring to any other class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newOMRatisServer
for (OMNodeDetails peerInfo : peerNodes) {
String peerNodeId = peerInfo.getNodeId();
RaftPeerId raftPeerId = RaftPeerId.valueOf(peerNodeId);
RaftPeer raftPeer;
if (peerInfo.isHostUnresolved()) {
raftPeer = RaftPeer.newBuilder()
.setId(raftPeerId)
.setAddress(peerInfo.getRatisHostPortStr())
.build();
} else {
InetSocketAddress peerRatisAddr = new InetSocketAddress(
peerInfo.getInetAddress(), peerInfo.getRatisPort());
raftPeer = RaftPeer.newBuilder()
.setId(raftPeerId)
.setAddress(peerRatisAddr)
.build();
}
// Add other OM nodes belonging to the same OM service to the Ratis ring
raftPeers.add(raftPeer);
}
return new OzoneManagerRatisServer(ozoneConf, omProtocol, omServiceId,
localRaftPeerId, ratisAddr, raftPeers, secConfig, certClient);
Looks like previously we don't save raftPeers in OzoneManagerRatisServer.
But now with this PR, on old node all nodes peer+local will be in peer list.
if (!isBootstrapping) {
// On regular startup, add all OMs to Ratis ring
raftPeers.add(localRaftPeer);
Where as for bootstrap due to this skip, we don't add
if (getOMNodeId().equals(omNodeId)) {
continue;
}
Due to this on bootstrap node peerlist does not include local node
There is a difference in raftPeers list in bootstrap and old node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference in raftPeers list in bootstrap and old node.
Correct. On a bootstrapping node, we do not add the local node to raftPeer list. This is done later when a setConfiguration request is executed and it calls OMRatisServer#addRaftPeer().
/**
* Add given node to list of RaftPeers.
*/
public void addRaftPeer(OMNodeDetails omNodeDetails) {
InetSocketAddress newOMRatisAddr = new InetSocketAddress(
omNodeDetails.getHostAddress(), omNodeDetails.getRatisPort());
raftPeers.add(RaftPeer.newBuilder()
.setId(RaftPeerId.valueOf(omNodeDetails.getNodeId()))
.setAddress(newOMRatisAddr)
.build());
LOG.info("Added OM {} to Ratis Peers list.", omNodeDetails.getNodeId());
}
As for the old code, the local node was added to raftPeers there too (OzoneManagerRatisServer Line#348 in old code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done later when a setConfiguration request is executed and it calls
OMRatisServer#addRaftPeer().
Yes, but notifyConfigurationChanged calls updatePeerList
@Override
public void notifyConfigurationChanged(long term, long index,
RaftProtos.RaftConfigurationProto newRaftConfiguration) {
List<RaftProtos.RaftPeerProto> newPeers =
newRaftConfiguration.getPeersList();
LOG.info("Received Configuration change notification from Ratis. New Peer" +
" list:\n{}", newPeers);
List<String> newPeerIds = new ArrayList<>();
for (RaftProtos.RaftPeerProto raftPeerProto : newPeers) {
newPeerIds.add(RaftPeerId.valueOf(raftPeerProto.getId()).toString());
}
// Check and update the peer list in OzoneManager
ozoneManager.updatePeerList(newPeerIds);
}
And in updatePeerList we have
public void updatePeerList(List<String> omNodeIds) {
List<String> ratisServerPeerIdsList = omRatisServer.getPeerIds();
for (String omNodeId : omNodeIds) {
if (getOMNodeId().equals(omNodeId)) {
continue;
}
Which skips calling addRaftPeer, when it is matching with localNode ID. Let me know if I am missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bharat. I get it now. Should not skip the checks for local node. I have updated the patch. Thanks for catching this.
...ne/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM, pending CI.
Thank You @hanishakoneru for the update.
Thank you @bharatviswa504. I had to make a minor change to fix the unit tests. Can you please take a look at the last commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
Thank You @hanishakoneru for the work and for patiently addressing questions/review comments |
Thank you @bharatviswa504 for diligently reviewing all the iterations. |
* master: (48 commits) HDDS-5514. Skip check for UNHEALTHY containers for datanode finalize. (apache#2469) HDDS-5279. OFS mkdir -p does not work when Volume is not pre-created (apache#2412) HDDS-5328. Remove delete container command from admin CLI (apache#2456) HDDS-5382. Increase default container report interval to 60 mins (apache#2363) HDDS-5378 Add APIs to retrieve Namespace Summary from Recon (apache#2417) HDDS-5466. Refactor BlockOutputStream. (apache#2442) HDDS-5465. Delete redundant code when set、add and remove bucket acl (apache#2439) HDDS-5184. Use separate DB profile for Datanodes. (apache#2214) HDDS-5494. Reduce retry in Kubernetes test (apache#2461) HDDS-5414. Data buffers incorrectly filtered for Ozone Insight (apache#2387) HDDS-5450. Avoid refresh pipeline for S3 headObject (apache#2431) HDDS-5500. New k3s version breaks kubernetes test (apache#2464) HDDS-5489. Install OS-specific flekszible (apache#2462) Multi-raft style placement with permutations for offline data generator (apache#2434) HDDS-5484. Intermittent failure in TestReplicationManager#testMovePrerequisites (apache#2454) HDDS-5443 Create and then recreate a bucket with a randomized name (apache#2436) HDDS-5492. Disable failing kubernetes test (apache#2459) HDDS-4330. Bootstrap new OM node (apache#1494) HDDS-5418. Let Recon send reregisterCommand to Datanodes if DatanodeDetails changed (apache#2392) HDDS-5479. s3g bucket list failed when there is non-english key name. (apache#2450) ...
What changes were proposed in this pull request?
In a ratis enabled OM cluster, add support to bootstrap a new OM node and add it to OM ratis ring.
First step would be to update the ozone-site.xml with the configs (nodeId, address, ports etc.) for the new OM.
The new node(s) should be started in BOOTSTRAP mode using the following command. This command will also initialize the OM. Hence, no need to run om init command before this command.
ozone om --bootstrap
When this command is run, it starts the OM in bootstrap mode. The new OM starts up its Ratis server and sends a request to existing OMs to perform a SetConfiguration operation. After the SetConfiguration operation is successful, the new OM is bootstrapped and part of the OM Ratis ring.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4330
How was this patch tested?
Added unit tests