HDDS-4115. CLI command to show current SCM leader and follower status. #1346
This is still WIP. Can you give some suggestions? Just want to make sure I am on the right track. |
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
...mmon/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/shell/TestScmAdminHA.java
|
Could you attach your CLI command output? |
|
The output is an empty array. In the current test setup, I expect it to print something like So if my approach looks reasonable, I can continue working on this version and add a valid test to make sure it prints the desired format. |
|
Ok I think I can work to have an example output first. |
|
I found out why my current test printed an empty array. It is because the test is using I manually added a peer locally and got the output: Any suggestions on how I should proceed? |
.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
Name seems a bit verbose. Let me find a syntax to align with OM HA
I think both this one and OM HA getserviceroles can be improved.
What kind of improvement to OM HA getserviceroles are you thinking of? I am happy to make a separate PR for that.
status or roles would probably be enough to indicate the goal of the subcommand, something like
ozone admin (om|scm) status
Thanks. I will adopt ozone admin scm status in this PR and I will send another PR for ozone admin om status
ozone admin om getserviceroles -id=<>
This is what OM does. I have my +1 on ozone admin (om|scm) roles. Status is more like health check.
I am ok with both.
@adoroszlai what do you think?
I think it depends on whether you want to keep this command specific to roles, or may extend the same command in the future with other status info. Probably "roles" is better now, and "status" can be either a separate command or another alias later.
That makes sense. Indeed, status sounds more like a health check and can carry much more information.
Considering that OM is also essentially returning roles, we can start with roles.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/shell/TestScmAdminHA.java
8cf3741 to 0269d6c
|
Addressed the following comments
@timmylicheng I am not sure how to run an acceptance test. Can you share a way to run it locally? |
|
Uploaded one commit to
|
|
Also R: @nandakumar131 can you please take a look? |
|
R @timmylicheng @nandakumar131 I am thinking maybe we can first merge this PR and create a JIRA to track the remaining work. Right now, per feedback, this command could print more information about Ratis peers, e.g. leader/follower roles, leader term, etc. I took a look at how OM HA does it: https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java#L2535 Basically, it seems such a request directly hits the leader OM, so getting the status is much easier. Currently in SCM HA we haven't reached the point of having a robust Ratis setup. |
@Override
public Void call() throws Exception {
  ScmClient scmClient = parent.createScmClient();
  List<String> status = scmClient.getScmRatisStatus();
We probably want to align the naming here. Either we use roles or status for all external and internal interfaces.
Good point. I have switched all status to roles
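To illustrate the naming alignment discussed here, the subcommand's call path might look roughly like the sketch below. This is not the code from the PR: `RolesClient`, `RolesSubcommand`, and `getScmRatisRoles` are hypothetical stand-ins, and the real subcommand is wired through Ozone's CLI framework rather than constructed by hand.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for the SCM client; it exposes only the one call
// this sketch needs. The method name is an assumption, not the PR's API.
interface RolesClient {
  List<String> getScmRatisRoles();
}

// Sketch of a "roles" subcommand: fetch the peer list from the client and
// print one entry per line. "roles" is used consistently for the class,
// the client call, and the local variable.
final class RolesSubcommand implements Callable<Void> {
  private final RolesClient client;
  private final PrintStream out;

  RolesSubcommand(RolesClient client, PrintStream out) {
    this.client = client;
    this.out = out;
  }

  @Override
  public Void call() {
    List<String> roles = client.getScmRatisRoles();
    for (String role : roles) {
      out.println(role);
    }
    return null;
  }
}
```

Passing the output stream in explicitly is only a convenience for testing the sketch; it lets a test capture what would otherwise go to the console.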
conf = new OzoneConfiguration();

// Init HA cluster
omServiceId = "om-service-test1";
You mean SCM here?
Actually I meant OM. It turns out that I cannot omit omServiceId when starting the cluster (there is a check for this service id).
@Override
public List<String> getRatisStatus() {
  return Arrays.asList(
You can use the default SCM RatisServer port. Check ratisBindPort in SCMHAConfiguration
Switched to use the default port. In the future I will think about how to better construct MockSCMHAManager for testing purposes (as the code evolves we can keep updating MockSCMHAManager).
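As a rough illustration of the mock described in this thread, a test double could build its peer list from a host list and a single port value instead of hard-coding each address. Everything in the sketch is hypothetical: `MockHaManagerSketch` is not the real MockSCMHAManager, and `ASSUMED_RATIS_PORT` is a placeholder value, not the actual default of ratisBindPort in SCMHAConfiguration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical test double; the real MockSCMHAManager differs.
final class MockHaManagerSketch {
  // Placeholder only. A real test would read the default from
  // SCMHAConfiguration (ratisBindPort) rather than hard-code a number.
  static final int ASSUMED_RATIS_PORT = 9999;

  private final List<String> hosts;
  private final int port;

  MockHaManagerSketch(List<String> hosts, int port) {
    this.hosts = List.copyOf(hosts);
    this.port = port;
  }

  // Returns one "host:port" entry per configured peer, in input order.
  List<String> getRatisRoles() {
    return hosts.stream()
        .map(host -> host + ":" + port)
        .collect(Collectors.toList());
  }
}
```

With this shape, changing the port in one place updates every peer entry the mock reports.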
|
Thanks @amaliujia for the contribution. The patch looks good overall. Just a few comments inline. |
|
@timmylicheng comments addressed. Can you take another look? |
|
I feel the CLI should be common for both OM and SCM and probably extended to Datanodes as well. |
|
Re @bshashikant Agreed. Right now the command itself is unified (for both OM and SCM, we name this command as |
|
+1. Thanks for Rui's contribution. |
What changes were proposed in this pull request?
CLI command to show current SCM leader and follower status. E.g.
ozone admin scmha listratisstatus
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4115
How was this patch tested?
Unit Test
Command line output