@GlenGeng-awx
Contributor

What changes were proposed in this pull request?

We want to provide SCMContext, which would be a single source of truth for some key information that is shared across all components within SCM.

SCMContext holds two kinds of key information:

  • RaftServer-related info: isLeader, term.
  • SafeMode-related info: inSafeMode, preCheckComplete.
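
As a rough illustration of the shape described above, here is a minimal sketch of such a context object. The class name `SCMContextSketch`, its method names, and the choice of a read-write lock are assumptions made for illustration; they are not taken from the patch itself.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: a single holder for Raft and safe-mode state. */
final class SCMContextSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // RaftServer-related info.
  private boolean isLeader;
  private long term;

  // SafeMode-related info.
  private boolean inSafeMode;
  private boolean preCheckComplete;

  /** Updates leadership and term together so readers never see a mix. */
  void updateLeaderAndTerm(boolean leader, long newTerm) {
    lock.writeLock().lock();
    try {
      isLeader = leader;
      term = newTerm;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Updates both safe-mode flags together. */
  void updateSafeMode(boolean safeMode, boolean preCheckDone) {
    lock.writeLock().lock();
    try {
      inSafeMode = safeMode;
      preCheckComplete = preCheckDone;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isLeader() {
    lock.readLock().lock();
    try {
      return isLeader;
    } finally {
      lock.readLock().unlock();
    }
  }

  long getTerm() {
    lock.readLock().lock();
    try {
      return term;
    } finally {
      lock.readLock().unlock();
    }
  }

  boolean isInSafeMode() {
    lock.readLock().lock();
    try {
      return inSafeMode;
    } finally {
      lock.readLock().unlock();
    }
  }

  boolean isPreCheckComplete() {
    lock.readLock().lock();
    try {
      return preCheckComplete;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Components would read from one such object instead of each tracking leadership or safe-mode state on their own.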

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4568

How was this patch tested?

CI

@GlenGeng-awx changed the title from "HDDS-4568. add SCMContext to SCM HA" to "HDDS-4568. Add SCMContext to SCM HA" on Dec 25, 2020
Contributor

@amaliujia amaliujia left a comment

I think some of the changes related to the new Ratis API are already in the master branch. It might be better to merge master into the HDDS-2823 branch and then rebase this PR to remove those changes (e.g. the changes in XceiverServerRatis).

Contributor

This part of the change is already in master (#1728).

Contributor

I see. So this comes from the commit cherry-picked from master.

Contributor

Would it make sense to use SCMContext to get the current term?

Contributor Author

We use SCMContext to encapsulate Raft-related info, so that components in SCM don't need to hold a reference to SCMHAManager or SCMRatisServer.

For non-HA mode or unit tests, we just need an empty SCMContext instead of a mocked SCMHAManager or a mocked SCMRatisServer.
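
To make the testing benefit concrete, here is a small sketch of a component that depends only on a context-like view. All names (`LeaderView`, `EmptyLeaderView`, `DeletionScanner`) are hypothetical, and the values returned by the "empty" view (leader, term 0) are an assumption for illustration rather than the behaviour of the real empty SCMContext.

```java
/** Hypothetical read-only view that SCM components depend on. */
interface LeaderView {
  boolean isLeader();

  long getTerm();
}

/**
 * Hypothetical stand-in for an "empty" context: fixed values and no Raft
 * server behind it. The chosen defaults are an assumption of this sketch.
 */
final class EmptyLeaderView implements LeaderView {
  @Override
  public boolean isLeader() {
    return true;
  }

  @Override
  public long getTerm() {
    return 0L;
  }
}

/** Hypothetical component that only needs leadership info. */
final class DeletionScanner {
  private final LeaderView context;

  DeletionScanner(LeaderView context) {
    this.context = context;
  }

  /** Returns a short description of what one scan round did. */
  String runOnce() {
    if (!context.isLeader()) {
      return "skipped: not leader";
    }
    return "scanned in term " + context.getTerm();
  }
}
```

A unit test can then build `new DeletionScanner(new EmptyLeaderView())` directly, with no HA manager or Ratis server to mock.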

Contributor

@linyiqun linyiqun left a comment

@GlenGeng, I left one minor comment below.

I see we now use the term index value as the SCM leader check. This is used across SCM internal components. Will this also cover client request behaviour?

For example, suppose a client is configured with a single SCM address that belongs to a follower, and then sends a request. Will it update the SCM metadata, such as pipelines and containers? I see there was an isLeader check before, but it was removed in HDDS-4551. Do you know the context for this?

Contributor

Could we also catch NotLeaderException here, as other places do? There are some other places where we don't catch this exception, so it is swallowed by the IOException handler.

        } catch (NotLeaderException nle) {
          // Handle the not-leader case separately from other I/O failures.
        } catch (IOException e) {
          // We may tolerate a number of failures for sometime
          // but if it continues to fail, at some point we need to raise
          // an exception and probably fail the SCM ? At present, it simply
          // continues to retry the scanning.
          LOG.error("Failed to get block deletion transactions from delTX log",
              e);
          return EmptyTaskResult.newResult();
        }
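
The ordering concern can be shown in isolation. The sketch below uses a hypothetical `NotLeaderSketchException` that extends `IOException` (as the comment above implies the real exception does); the class and method names are illustrative only.

```java
import java.io.IOException;

/** Hypothetical stand-in for a not-leader error; it is an IOException. */
class NotLeaderSketchException extends IOException {
  NotLeaderSketchException(String message) {
    super(message);
  }
}

final class CatchOrderSketch {
  /** A task that may fail with any IOException. */
  interface Task {
    void run() throws IOException;
  }

  static String runTask(Task task) {
    try {
      task.run();
      return "ok";
    } catch (NotLeaderSketchException nle) {
      // The more specific type must come first; otherwise the generic
      // IOException handler below would absorb it.
      return "not leader";
    } catch (IOException e) {
      return "io failure";
    }
  }
}
```

Without the first catch clause, a not-leader condition would be logged and handled like any other I/O error.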

Contributor Author

Sure, I will fix them in the next patch.

@GlenGeng-awx
Contributor Author

Hey Yiqun, thanks for the review!

> For example, suppose a client is configured with a single SCM address that belongs to a follower, and then sends a request.

For now, the client needs to know all the SCM instances that participate in the SCM Raft cluster. If the client sends a request to a follower SCM, it gets a NotLeaderException and fails over to the next SCM instance.
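
The failover behaviour described here might look roughly like the following sketch. It is not the actual client code: `FailoverSketch`, `FollowerRejectedException`, and `ScmCall` are hypothetical names, and SCM instances are represented by plain address strings.

```java
import java.io.IOException;
import java.util.List;

final class FailoverSketch {
  /** Hypothetical stand-in for the not-leader error returned by a follower. */
  static class FollowerRejectedException extends IOException {
    FollowerRejectedException(String address) {
      super(address + " is not the leader");
    }
  }

  /** A request that is sent to one SCM address. */
  interface ScmCall<T> {
    T invoke(String scmAddress) throws IOException;
  }

  /** Tries each known SCM in turn until one accepts the request. */
  static <T> T callWithFailover(List<String> scmAddresses, ScmCall<T> call)
      throws IOException {
    IOException last = null;
    for (String address : scmAddresses) {
      try {
        return call.invoke(address);
      } catch (FollowerRejectedException e) {
        // This instance is a follower; remember the error and try the next.
        last = e;
      }
    }
    throw new IOException("No SCM leader found among " + scmAddresses, last);
  }
}
```

Only the not-leader error triggers failover in this sketch; any other IOException propagates to the caller unchanged.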

> Will it update the SCM metadata, like pipeline, containers?

We removed the leader check in HDDS-4551 because every metadata update that is persisted to RocksDB goes through Ratis as a RaftClientRequest. If the underlying Raft server is not the leader, the returned RaftClientReply carries a NotLeaderException.

Please check SCMHAInvocationHandler to see how we implement this. For now, the following operations go through Ratis: add container, remove container, update container state, add pipeline, remove pipeline, and update pipeline state.
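
For readers unfamiliar with the pattern, the sketch below shows how a `java.lang.reflect.Proxy` with an invocation handler can route selected methods through a replication step that only a leader accepts. It is a generic illustration under stated assumptions, not a description of SCMHAInvocationHandler: the `@Replicated` annotation, `ContainerOps`, `LocalContainerOps`, and `FakeLog` are all hypothetical, and `FakeLog` merely stands in for a consensus layer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class ReplicationProxySketch {
  /** Hypothetical marker for methods whose effects must be replicated. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Replicated { }

  /** Hypothetical manager with one replicated write and one local read. */
  interface ContainerOps {
    @Replicated
    void addContainer(String id);

    int containerCount();
  }

  static final class LocalContainerOps implements ContainerOps {
    private final List<String> ids = new ArrayList<>();

    @Override
    public void addContainer(String id) {
      ids.add(id);
    }

    @Override
    public int containerCount() {
      return ids.size();
    }
  }

  /** Stand-in for the consensus layer: only a leader accepts writes. */
  static final class FakeLog {
    volatile boolean leader;

    void submit(Runnable apply) {
      if (!leader) {
        throw new IllegalStateException("not leader");
      }
      apply.run();
    }
  }

  /** Wraps the local implementation so annotated calls go through the log. */
  static ContainerOps wrap(ContainerOps local, FakeLog log) {
    InvocationHandler handler = (proxy, method, args) -> {
      if (method.isAnnotationPresent(Replicated.class)) {
        log.submit(() -> invokeLocal(local, method, args));
        return null; // the only replicated method in this sketch is void
      }
      return invokeLocal(local, method, args);
    };
    return (ContainerOps) Proxy.newProxyInstance(
        ContainerOps.class.getClassLoader(),
        new Class<?>[] {ContainerOps.class},
        handler);
  }

  private static Object invokeLocal(Object target, Method method, Object[] args) {
    try {
      return method.invoke(target, args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With this arrangement the caller needs no explicit leader check: a write issued on a non-leader fails inside the replication step, while reads are served locally.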

If needed, we can schedule a Zoom meeting to discuss the current SCM HA design.

@linyiqun
Contributor

linyiqun commented Jan 4, 2021

@GlenGeng, thanks for the detailed explanation!

@amaliujia
Contributor

Thanks, Glen, for rebasing the PR! I will try to give it another pass.

Contributor

@amaliujia amaliujia left a comment

Overall LGTM. Left a comment.

Contributor

@runzhiwang runzhiwang left a comment

LGTM

@bshashikant
Contributor

Thanks @GlenGeng for the detailed explanation. The changes look good to me except for one minor comment. Thanks for the effort.

@ChenSammi ChenSammi merged commit bb9c68f into apache:HDDS-2823 Jan 19, 2021