Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Aug 25, 2020

What changes were proposed in this pull request?

CLI command to show current SCM leader and follower status. E.g. ozone admin scmha listratisstatus

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4115

How was this patch tested?

Unit Test

Command line output

[180.3.14.5:2333, 180.3.14.21:2333, 180.3.14.145:2333]

@amaliujia
Copy link
Contributor Author

amaliujia commented Aug 25, 2020

R: @timmylicheng

This is still WIP. Can you give some suggestions? Just want to make sure I am on the right track.

@elek elek marked this pull request as draft August 25, 2020 08:20
@timmylicheng
Copy link
Contributor

Could you attach your CLI command output?

@amaliujia
Copy link
Contributor Author

amaliujia commented Aug 27, 2020

@timmylicheng

The output is empty array. In current test setup, scm.getScmHAManager().getRatisServer().getRaftPeers() returns empty list. My current guess is I haven't enabled SCM HA in the test setup in a correct way.

I expect it will print something like [127.0.1.1:1313, 127.0.1.1:1314, 127.0.1.1:1315]. Also I guess this JIRA requires a format of, e.g., [leader: 127.0.1.1:1313, follower: 127.0.1.1:1314, follower: 127.0.1.1:1315]?

So if my approach looks reasonable, I can continue working on this version to have a valid test somehow to make sure it prints desired formats .

@amaliujia
Copy link
Contributor Author

Ok I think I can work to have an example output first.

@amaliujia
Copy link
Contributor Author

@timmylicheng

I found the reason why in my current test empty array was printed. It is because the test is using MockSCMHaManager: https://github.com/apache/hadoop-ozone/blob/HDDS-2823/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/ha/MockSCMHAManager.java#L191

I manually add a peer locally and get the output print: [localhost:7238], which is expected.

Any suggestion how should I proceed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name seems a bit verbose. Let me find a syntax to align with OM HA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both this one and OM HA getserviceroles can be improved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai

What kind of improvement OM HA getserviceroles you are thinking of? I am happy to make a separate PR for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status or roles would probably be enough to indicate the goal of the subcommand, something like

ozone admin (om|scm) status

Copy link
Contributor Author

@amaliujia amaliujia Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will adopt ozone admin scm status in this PR and I will send another PR for ozone admin om status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ozone admin om getserviceroles -id=<>

This is what OM does. I have my +1 on ozone admin (om|scm) roles. Status is more like health check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with both.

@adoroszlai what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on whether you want to keep this command specific to roles, or may extend the same command in the future with other status info. Probably "roles" is better now, and "status" can be either a separate command or another alias later.

Copy link
Contributor Author

@amaliujia amaliujia Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense. Indeed status sounds more like a health check and can carry much more information.

Consider OM is also actually getting, essentially, roles. We can start from roles

@amaliujia amaliujia changed the title [WIP] HDDS-4115. CLI command to show current SCM leader and follower status. HDDS-4115. CLI command to show current SCM leader and follower status. Sep 1, 2020
@amaliujia amaliujia marked this pull request as ready for review September 1, 2020 19:14
@amaliujia
Copy link
Contributor Author

Addressed the following comments

  1. Merge getRatisStatus with GetScmInfo
  2. Adopt command syntax ozone admin scm status
  3. added an acceptance test

@timmylicheng I am not sure how to test an acceptance test. Can you share a way to run it locally?

@amaliujia
Copy link
Contributor Author

Uploaded one commit to

  1. use ozone admin scm roles
  2. fix the acceptance test.

@amaliujia
Copy link
Contributor Author

Also R: @nandakumar131 can you please take a look?

@amaliujia
Copy link
Contributor Author

amaliujia commented Sep 9, 2020

R @timmylicheng @nandakumar131

I am thinking maybe we can first merge this PR and create a JIRA to track left work. Right now per feedback this command could print more information about Ratis peers, e.g. leader/follower roles, leader term, etc.

I took a look at how does OM HA does: https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java#L2535

Basically it seems such request will directly hit the leader OM, then get status will be much easier. Currently in SCM HA we haven't reached to the point with a robust Ratis setup.

@Override
public Void call() throws Exception {
ScmClient scmClient = parent.createScmClient();
List<String> status = scmClient.getScmRatisStatus();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to align the naming here. Either we use roles or status for all external and internal interfaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have switched all status to roles

conf = new OzoneConfiguration();

// Init HA cluster
omServiceId = "om-service-test1";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean SCM here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I meant OM. It turns out that I cannot ignore omServiceId to start the cluster (there is a check for this service id).


@Override
public List<String> getRatisStatus() {
return Arrays.asList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the default SCM RatisServer port. Check ratisBindPort in SCMHAConfiguration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to use the default port. In the future I will think about how to better construct MockSCMHAManager for testing purpose (as the code is evolving we can keep updating MockSCMHAManager)

@timmylicheng
Copy link
Contributor

Thanks @amaliujia for the contribution. The patch looks good overall. Just a few comments inline.

@amaliujia
Copy link
Contributor Author

@timmylicheng comments addressed. Can you take another look?

@bshashikant
Copy link
Contributor

bshashikant commented Sep 29, 2020

I feel the CLI should be common for both OM and SCM and probably extended to Datanodes as well.

@amaliujia
Copy link
Contributor Author

Re @bshashikant

Agreed. Right now the command itself is unified (for both OM and SCM, we name this command as roles). Then we should unify the behavior fo both commands (and if there is DN command, that should be the same).

@timmylicheng
Copy link
Contributor

+1. Thanks for Rui's contribution.
Merging

@timmylicheng timmylicheng merged commit 1c2a950 into apache:HDDS-2823 Sep 29, 2020
@amaliujia amaliujia deleted the HDDS-2823-patch branch September 29, 2020 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants