Skip to content

Conversation

@siddhantsangwan
Copy link
Contributor

@siddhantsangwan siddhantsangwan commented May 16, 2022

What changes were proposed in this pull request?

Make Container Balancer highly available by persisting its configurations (state) through RocksDB and replicating through Ratis. This change introduces a new protobuf message, ContainerBalancerConfiguration, containing configurations that need to be persisted. On leader change or restart, ContainerBalancer checks if it needs to start by reading this persisted ContainerBalancerConfiguration proto in the new leader.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6280

How was this patch tested?

Introduced a new test TestFailoverWithSCMHA#testContainerBalancerPersistsConfigurationInAllSCMs()

…tartBalancer.

This ensures that balancer starts with the 0th iteration when ContainerBalancer#startBalancer is called.
Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for working on this! I have a few comments inline.

@siddhantsangwan
Copy link
Contributor Author

I think we need to somehow clearly distinguish between the ContainerBalancer#startBalancer and SCMService#start methods. startBalancer is a command for persisting configuration and starting balancer. start should be used when there's been a leader change or restart - to read configuration and start balancer.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @siddhantsangwan for this work! i have a few comments inline, PTAL!

// balancer has been stopped in another thread
if (!isBalancerRunning()) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a balancer running check currentThread = null inside stopBalancer, so that the two operations will be protected by a single lock. if we first call isBalancerRunning() and then call stopBalancer() , there might be a case that between the two operation in a single thread, stopbalancer is called from another thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopBalancer calls validateState(true), which checks if balancer is currently running. Is this what you're looking for?

@Override
public boolean shouldRun() {
return false;
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check the status and the delay here?

  public boolean shouldRun() {
    serviceLock.lock();
    try {
      // If safe mode is off, then this SCMService starts to run with a delay.
      return serviceStatus == ServiceStatus.RUNNING &&
          clock.millis() - lastTimeToBeReadyInMillis >= waitTimeInMillis;
    } finally {
      serviceLock.unlock();
    }
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already reading the persisted status for telling whether balancer should run, we probably don't need to check and update ServiceStatus. The persisted status is the most reliable information in balancer's case since it's not a background service.

lock.lock();
try {
if (!scmContext.isLeader() || scmContext.isInSafeMode()) {
if (isBalancerRunning()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check and set the ServiceStatus here? for example

      if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
        if (serviceStatus != ServiceStatus.RUNNING) {
          LOG.info("Service {} transitions to RUNNING.", getServiceName());
          serviceStatus = ServiceStatus.RUNNING;
          lastTimeToBeReadyInMillis = clock.millis();
        }
      } else {
        if (serviceStatus != ServiceStatus.PAUSING) {
          LOG.info("Service {} transitions to PAUSING.", getServiceName());
          serviceStatus = ServiceStatus.PAUSING;
        }
      }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my previous reply, I think holding state in ServiceStatus along with persisting it in RocksDB would make the logic a bit complex. We'd then have three ways of checking state - ServiceStatus, RocksDB, and checking the current thread for null. What do you think @JacksonYao287 ?

@siddhantsangwan
Copy link
Contributor Author

@lokeshj1703 @JacksonYao287 thanks for reviewing! I've updated the PR.

@siddhantsangwan
Copy link
Contributor Author

During manual testing in a cluster, I discovered that if a Storage Container Manager process is stopped, it would lead to balancer also stopping because of containerBalancer.stopBalancer(); being called in StorageContainerManager#stop. This is undesirable since we want balancer to start running in the new leader. I'm now testing if calling containerBalancer.stop() instead has the desired effect.

@siddhantsangwan
Copy link
Contributor Author

@JacksonYao287 @lokeshj1703 @nandakumar131 After the latest changes, balancer seems to be working as expected.

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for updating the PR! I have few minor comments.

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for updating the PR! The changes look good to me. +1.

/**
* A dummy codec that serializes a ByteString object to ByteString.
*/
public class ByteStringCodec implements Codec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a ByteStringCodec class. Is it possible to reuse it?

@siddhantsangwan siddhantsangwan merged commit e8abd0f into apache:master Jun 1, 2022
@siddhantsangwan
Copy link
Contributor Author

Thanks for the reviews! I've merged this PR to master.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 7, 2022
* master: (87 commits)
  HDDS-6686. Do Leadship check before SASL token verification. (apache#3382)
  HDDS-4364: [FSO]List FileStatus : startKey can be a non-existed path (apache#3481)
  HDDS-6091. Add file checksum to OmKeyInfo (apache#3201)
  HDDS-6706. Exposing Volume Information Metrics to the DataNode UI (apache#3478)
  HDDS-6759: Add listblock API in MockDatanodeStorage (apache#3452)
  HDDS-5821 Container cache management for closing RockDB  (apache#3426)
  HDDS-6683. Refactor OM server bucket layout configuration usage (apache#3477)
  HDDS-6824. Revert changes made in proto.lock by HDDS-6768. (apache#3480)
  HDDS-6811. Bucket create message with layout type (apache#3479)
  HDDS-6810. Add a optional flag to trigger listStatus as part of listKeys for FSO buckets. (apache#3461)
  HDDS-6828. Revert RockDB version pending leak fixes (apache#3475)
  HDDS-6764: EC: DN ability to create RECOVERING containers for EC reconstruction. (apache#3458)
  HDDS-6795: EC: PipelineStateMap#addPipeline should not have precondition checks post db updates (apache#3453)
  HDDS-6823. Intermittent failure in TestOzoneECClient#testExcludeOnDNMixed (apache#3476)
  HDDS-6820. Bucket Layout Post-Finalization Validators for ACL Requests. (apache#3472)
  HDDS-6819. Add LEGACY to AllowedBucketLayouts in CreateBucketHandler (apache#3473)
  HDDS-4859. [FSO]ListKeys: seek all the files/dirs from startKey to keyPrefix (apache#3466)
  HDDS-6705 Add metrics for volume statistics including disk capacity, usage, Reserved (apache#3430)
  HDDS-6474. Add test to cover the FSO bucket list status with beyond batch boundary and cache. (apache#3379). Contributed by aswinshakil
  HDDS-6280. Support Container Balancer HA (apache#3423)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants