Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-4330. Bootstrap new OM node #1494

Merged
merged 15 commits into from
Jul 23, 2021
Merged

Conversation

hanishakoneru
Copy link
Contributor

@hanishakoneru hanishakoneru commented Oct 13, 2020

What changes were proposed in this pull request?

In a ratis enabled OM cluster, add support to bootstrap a new OM node and add it to OM ratis ring. 

First step would be to update the ozone-site.xml with the configs (nodeId, address, ports etc.) for the new OM.
The new node(s) should be started in BOOTSTRAP mode using the following command. This command will also initialize the OM. Hence, no need to run om init command before this command.
ozone om --bootstrap
When this command is run, it starts the OM in bootstrap mode. The new OM starts up its Ratis server and sends a request to existing OMs to perform a SetConfiguration operation. After the SetConfiguration operation is successful, the new OM is bootstrapped and part of the OM Ratis ring.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4330

How was this patch tested?

Added unit tests

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach LGTM, I have few comments/questions.

Pending test changes review will post comments soon for test changes.

Can we also open a new doc jIra to document details to how to convert single node ratis to 3 node ratis
And also how we can add new nodes to Ratis ring.

Because for a single node to 3 node, end user should use
serviceid as omServiceIdDefault
And nodeID as storage version file omID

}

@Override
public boolean bootstrap(List<OMNodeInfo> omNodeInfos) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are directly using proto classes, can we have helper classes like how we have done for all other requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Using OMNodeDetails instead.

getConfiguration(), getOMServiceId(), newOMNodeId);
if (newOMNodeDetails == null) {
// Load new configuration object to read in new peer information
setConfiguration(new OzoneConfiguration());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is newly bootstrapped OM, so when it starts it has loaded new config, why do we need to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is run on all OMs (old OMs also). OzoneManager#addNewOMNode() is called when Ratis sends notifyConfigurationChanged(). And since the old OMs are not necessarily restarted, we need to reload the configuration.

newOMNodes.stream().forEach(newOMNode ->
newOMNodeIdsBuilder.append(", ").append(newOMNode.getOMNodeId()));
String newOMNodeIds = newOMNodeIdsBuilder.toString().substring(2);
LOG.info("Bootstrapping new OM(s): {}", newOMNodeIds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using httpAddrress, httpsAddress, rpcPort, serviceID which are sent thru bootStraprequest

And in notifyConfigurationChanged finally reading from the current OM config. So, I see there is no real use in sending these information only required thing need looks like nodeID. Not sure If I am missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Not propagating the unused data anymore.


// If the OM is started in bootstrap mode, do not add it to the ratis ring.
// It will be added later using SetConfiguration from the leader OM.
if (isBootstrapping) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question:
If OM is started normally, and prepareBootStrapRequest is performed will it cause any issues?

Copy link
Contributor Author

@hanishakoneru hanishakoneru Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could possibly cause a different ring to form with the new OMs.
The most likely scenario is that the new OMs would shutdown. But we need to find a mechanism to find this erroneous scenario and address it.
I think it should be done in a different Jira though.

RATIS_BOOTSTRAP_ERROR = 3;
}

message BootstrapRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this AddOMRequest, as this is an admin command submitted to add new OM's to the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bharatviswa504
Copy link
Contributor

Thank You @hanishakoneru for the PR.
PR also needs a rebase.

@hanishakoneru hanishakoneru force-pushed the HDDS-4330 branch 2 times, most recently from 881cb6e to 4e9e1e8 Compare December 17, 2020 01:26
@hanishakoneru
Copy link
Contributor Author

Thanks @swagle, @fapifta and @bharatviswa504 for the reviews. I have addressed the review comments in the new commit (or answered inline).

@bshashikant
Copy link
Contributor

@hanishakoneru , ca you please rebase??

@ChenSammi
Copy link
Contributor

Thanks @hanishakoneru for working on this. We have a plan to enable OM HA in our production cluster. This feature is definitely we need.

@bshashikant
Copy link
Contributor

https://issues.apache.org/jira/browse/RATIS-1326 is blocker for this.

@elek
Copy link
Member

elek commented Apr 19, 2021

@hanishakoneru Are you planning to update this?

@hanishakoneru
Copy link
Contributor Author

@hanishakoneru Are you planning to update this?

@elek , yes I am working on it currently. Will upload a new patch in a day or two.

@hanishakoneru
Copy link
Contributor Author

Updated the patch to remove the extra "addOM" command. Now when a new OM bootstraps, it sends an RPC request to existing OM ring to add itself to the ring.

@hanishakoneru
Copy link
Contributor Author

@bharatviswa504 can you please take a look when you get a chance. Thanks.

@hanishakoneru hanishakoneru force-pushed the HDDS-4330 branch 2 times, most recently from c1b64ed to f39d8d2 Compare April 23, 2021 22:35
@hanishakoneru
Copy link
Contributor Author

hanishakoneru commented Jun 15, 2021

Questions:

  1. On an upgraded cluster, config updated before the start and old OM start normally and new OM's started with bootstrap will it work?
  2. The same scenario config updated before and one OM started normally, and the other two bootstrap ways on the fresh install cluster will not work, as discussed in the above comments?

Both these scenarios will not work. But I think it is reasonable to ask users to first upgrade (without any config changes) and then follow the steps to convert non-HA to HA.

This is one way to handle this, as we cannot remove the old way of starting up OM's. So, if I understandand for the normal start we shall follow the existing way of start-up, and when new OM needs to be added then only we shall use bootstrap way?

Correct.

@bharatviswa504
Copy link
Contributor

bharatviswa504 commented Jun 21, 2021

Questions:

  1. On an upgraded cluster, config updated before the start and old OM start normally and new OM's started with bootstrap will it work?
  2. The same scenario config updated before and one OM started normally, and the other two bootstrap ways on the fresh install cluster will not work, as discussed in the above comments?
    Both these scenarios will not work. But I think it is reasonable to ask users to first upgrade (without any config changes) >>and then follow the steps to convert non-HA to HA.

I am okay with it if this is the way we want to pursue it. Now I got more clarity on the current PR deployment approach and limitations. We can revisit the decision if there is a requirement to update the config before the upgrade. (This kind of requirement update config only after start, might be a little tricky in deployment setups like Docker/Kubernetes.)

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.
Few minor comments/questions.

@hanishakoneru
Copy link
Contributor Author

@bharatviswa504, can you please take a look when you get a chance. Thanks.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank You @hanishakoneru for the update.
Overall LGTM. Few comments, once addressed good to commit.

And also looks like it needs a rebase.

public void updatePeerList(List<String> omNodeIds) {
List<String> ratisServerPeerIdsList = omRatisServer.getPeerIds();
for (String omNodeId : omNodeIds) {
if (getOMNodeId().equals(omNodeId)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In normal OM start node, ratisServer peer list has local + peer.

 if (!isBootstrapping) {
      // On regular startup, add all OMs to Ratis ring
      raftPeers.add(localRaftPeer);

In bootstrap case, local node is not added to ratis server peer list.

Copy link
Contributor

@bharatviswa504 bharatviswa504 Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reading more, I have observed that before this change, we have not added local OM node to peer node list before(raftPeers). Should we have the same behavior?. So, should we remove adding local node to peer list?

      // On regular startup, add all OMs to Ratis ring
      raftPeers.add(localRaftPeer);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bootstrap case, local node is not added to ratis server peer list.

Yes. This is because the local node is not part of the ratis ring yet. When the SetConfiguration is executed, this node would be added to the peer list.

While reading more, I have observed that before this change, we have not added local OM node to peer node list before(raftPeers). Should we have the same behavior?. So, should we remove adding local node to peer list?

Before this change also the local OM node was added to the peer list in OMRatisServer. Are you referring to any other class?

Copy link
Contributor

@bharatviswa504 bharatviswa504 Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newOMRatisServer

    for (OMNodeDetails peerInfo : peerNodes) {
      String peerNodeId = peerInfo.getNodeId();
      RaftPeerId raftPeerId = RaftPeerId.valueOf(peerNodeId);
      RaftPeer raftPeer;
      if (peerInfo.isHostUnresolved()) {
        raftPeer = RaftPeer.newBuilder()
            .setId(raftPeerId)
            .setAddress(peerInfo.getRatisHostPortStr())
            .build();
      } else {
        InetSocketAddress peerRatisAddr = new InetSocketAddress(
            peerInfo.getInetAddress(), peerInfo.getRatisPort());
        raftPeer = RaftPeer.newBuilder()
            .setId(raftPeerId)
            .setAddress(peerRatisAddr)
            .build();
      }

      // Add other OM nodes belonging to the same OM service to the Ratis ring
      raftPeers.add(raftPeer);
    }

    return new OzoneManagerRatisServer(ozoneConf, omProtocol, omServiceId,
        localRaftPeerId, ratisAddr, raftPeers, secConfig, certClient);

Looks like previously we don't save raftPeers in OzoneManagerRatisServer.

But now with this PR, on old node all nodes peer+local will be in peer list.


if (!isBootstrapping) { 
// On regular startup, add all OMs to Ratis ring 
raftPeers.add(localRaftPeer);

Where as for bootstrap due to this skip, we don't add

if (getOMNodeId().equals(omNodeId)) {
continue;
}

Due to this on bootstrap node peerlist does not include local node

There is a difference in raftPeers list in bootstrap and old node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference in raftPeers list in bootstrap and old node.

Correct. On a bootstrapping node, we do not add the local node to raftPeer list. This is done later when a setConfiguration request is executed and it calls OMRatisServer#addRaftPeer().

  /**
   * Add given node to list of RaftPeers.
   */
  public void addRaftPeer(OMNodeDetails omNodeDetails) {
    InetSocketAddress newOMRatisAddr = new InetSocketAddress(
        omNodeDetails.getHostAddress(), omNodeDetails.getRatisPort());

    raftPeers.add(RaftPeer.newBuilder()
        .setId(RaftPeerId.valueOf(omNodeDetails.getNodeId()))
        .setAddress(newOMRatisAddr)
        .build());

    LOG.info("Added OM {} to Ratis Peers list.", omNodeDetails.getNodeId());
  }

As for the old code, the local node was added to raftPeers there too (OzoneManagerRatisServer Line#348 in old code).

Copy link
Contributor

@bharatviswa504 bharatviswa504 Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done later when a setConfiguration request is executed and it calls

OMRatisServer#addRaftPeer().

Yes, but notifyConfigurationChanged calls updatePeerList

@Override
  public void notifyConfigurationChanged(long term, long index,
      RaftProtos.RaftConfigurationProto newRaftConfiguration) {
    List<RaftProtos.RaftPeerProto> newPeers =
        newRaftConfiguration.getPeersList();
    LOG.info("Received Configuration change notification from Ratis. New Peer" +
        " list:\n{}", newPeers);

    List<String> newPeerIds = new ArrayList<>();
    for (RaftProtos.RaftPeerProto raftPeerProto : newPeers) {
      newPeerIds.add(RaftPeerId.valueOf(raftPeerProto.getId()).toString());
    }
    // Check and update the peer list in OzoneManager
    ozoneManager.updatePeerList(newPeerIds);
  }

And in updatePeerList we have

 public void updatePeerList(List<String> omNodeIds) {
    List<String> ratisServerPeerIdsList = omRatisServer.getPeerIds();
    for (String omNodeId : omNodeIds) {
      if (getOMNodeId().equals(omNodeId)) {
        continue;
      }

Which skips calling addRaftPeer, when it is matching with localNode ID. Let me know if I am missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bharat. I get it now. Should not skip the checks for local node. I have updated the patch. Thanks for catching this.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM, pending CI.
Thank You @hanishakoneru for the update.

@hanishakoneru
Copy link
Contributor Author

Thank you @bharatviswa504. I had to make a minor change to fix the unit tests. Can you please take a look at the last commit.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@bharatviswa504
Copy link
Contributor

Thank You @hanishakoneru for the work and for patiently addressing questions/review comments

@hanishakoneru
Copy link
Contributor Author

Thank you @bharatviswa504 for diligently reviewing all the iterations.

@hanishakoneru hanishakoneru merged commit 839fc94 into apache:master Jul 23, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2021
* master: (48 commits)
  HDDS-5514. Skip check for UNHEALTHY containers for datanode finalize. (apache#2469)
  HDDS-5279. OFS mkdir -p does not work when Volume is not pre-created (apache#2412)
  HDDS-5328. Remove delete container command from admin CLI (apache#2456)
  HDDS-5382. Increase default container report interval to 60 mins (apache#2363)
  HDDS-5378 Add APIs to retrieve Namespace Summary from Recon (apache#2417)
  HDDS-5466. Refactor BlockOutputStream. (apache#2442)
  HDDS-5465. Delete redundant code when set、add and remove bucket acl (apache#2439)
  HDDS-5184. Use separate DB profile for Datanodes. (apache#2214)
  HDDS-5494. Reduce retry in Kubernetes test (apache#2461)
  HDDS-5414. Data buffers incorrectly filtered for Ozone Insight (apache#2387)
  HDDS-5450. Avoid refresh pipeline for S3 headObject (apache#2431)
  HDDS-5500. New k3s version breaks kubernetes test (apache#2464)
  HDDS-5489. Install OS-specific flekszible (apache#2462)
  Multi-raft style placement with permutations for offline data generator (apache#2434)
  HDDS-5484. Intermittent failure in TestReplicationManager#testMovePrerequisites (apache#2454)
  HDDS-5443 Create and then recreate a bucket with a randomized name (apache#2436)
  HDDS-5492. Disable failing kubernetes test (apache#2459)
  HDDS-4330. Bootstrap new OM node (apache#1494)
  HDDS-5418. Let Recon send reregisterCommand to Datanodes if DatanodeDetails changed (apache#2392)
  HDDS-5479. s3g bucket list failed when there is non-english key name. (apache#2450)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants