@hanishakoneru
Contributor

@hanishakoneru hanishakoneru commented Oct 13, 2019

A command line tool (ozone omha) to get information related to OM HA.
This Jira proposes adding a getServiceState option for OM HA, which lists all the OMs in the service and their corresponding Ratis server roles (LEADER/FOLLOWER).
We can later add more options to this tool.

Migrated from apache/hadoop#1586

@hanishakoneru
Contributor Author

@anuengineer, @elek, @dineshchitlangia,
I have addressed your review comments from apache/hadoop#1586. Can you please review the changes?

@dineshchitlangia
Contributor

@hanishakoneru Thank you very much for the update.
The integration test failure is unrelated to the patch; I verified this in my env.
Could you please address the new findbugs violation?

@hanishakoneru
Contributor Author

@anuengineer, @elek, can you please take a look at the new patch?

Contributor

So today, when a client connects to the OM, it first reads the ServiceList; without this API we cannot communicate with Ozone. So why would this info be an independent API? In my mind, clients already need this info to communicate with the OM, since the leader has to be discovered.

If that is true, then the admin command can use the same API to discover the state, right? Do we need a new API in the RPC layer? Perhaps I am missing something here. Thanks.

Contributor Author

@hanishakoneru hanishakoneru Oct 18, 2019

The client connects to an OM and submits a getServiceList request. If this OM is not the leader, then the client tries a different OM. This all happens through OMFailoverProxyProvider.
Currently, the getServiceList API is only used to discover the SCM address. It does not return the list of OMs in the cluster.

But based on which OM serves the getServiceList request, we can presume it to be the leader. We do not need a new request type.
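
For readers following the thread, here is a minimal sketch of the idea in the comment above: try each OM in turn and presume that the first one to serve the request is the leader. It is not the patch's code. The class name, `findLeader`, `rolesFor`, and the `servesRequest` predicate (a stand-in for a real getServiceList call going through OMFailoverProxyProvider) are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

public class LeaderDiscoverySketch {

  /** Role names as they appear in the tool's output in this thread. */
  enum Role { LEADER, FOLLOWER }

  /**
   * Tries each OM in order and returns the first one that serves the
   * request. servesRequest is a hypothetical stand-in for the RPC.
   */
  static Optional<String> findLeader(List<String> omNodeIds,
      Predicate<String> servesRequest) {
    for (String nodeId : omNodeIds) {
      if (servesRequest.test(nodeId)) {
        return Optional.of(nodeId);
      }
      // This OM declined the request, so fail over to the next one.
    }
    return Optional.empty();
  }

  /**
   * Labels the serving OM as LEADER and every other OM as FOLLOWER.
   * If no OM answered, this sketch reports every node as FOLLOWER;
   * a real tool would more likely report an error.
   */
  static Map<String, Role> rolesFor(List<String> omNodeIds,
      Predicate<String> servesRequest) {
    Optional<String> leader = findLeader(omNodeIds, servesRequest);
    Map<String, Role> roles = new LinkedHashMap<>();
    for (String nodeId : omNodeIds) {
      boolean isLeader = leader.isPresent() && leader.get().equals(nodeId);
      roles.put(nodeId, isLeader ? Role.LEADER : Role.FOLLOWER);
    }
    return roles;
  }

  public static void main(String[] args) {
    List<String> oms = List.of("om1", "om2", "om3");
    // Pretend that only om3 serves the request.
    rolesFor(oms, "om3"::equals)
        .forEach((id, role) -> System.out.println(id + " : " + role));
  }
}
```

The sketch infers roles purely from which node answered, which is the presumption described in the comment; it does not query Ratis for the actual server role.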

@hanishakoneru
Contributor Author

Updated the patch to remove the new OMRequest.
The GetServiceList API is now used to provide the leader role info as well.

@anuengineer
Contributor

There is a compile failure, @hanishakoneru please take a look when you get a chance, thanks

@hanishakoneru
Contributor Author

/retest

@hanishakoneru
Contributor Author

@anuengineer, I fixed the compile failure. Please take a look when you get a chance. Thanks.

@smengcl
Contributor

smengcl commented Oct 24, 2019

  1. Compilation passed with mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true
  2. Checked ./hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/bin/ozone, admin command shows up:
     Client Commands:
       admin         Ozone admin tool
  3. Ran ozone admin om getserviceroles --om-service-id=id1 in a docker-compose cluster with profile compose/ozone-om-ha/. It succeeded on om1 but throws this scary protobuf.ServiceException:
bash-4.2$ ozone admin om getserviceroles --om-service-id=id1
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.NotLeaderException): OM om2 is not the leader. Suggested leader is om3
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createNotLeaderException(OzoneManagerProtocolServerSideTranslatorPB.java:192)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:178)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:110)
	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
	at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
, while invoking $Proxy16.submitRequest over nodeId=om2,nodeAddress=om2:9862 after 1 failover attempts. Trying to failover immediately.
om1 : FOLLOWER
om2 : FOLLOWER
om3 : LEADER

The same exception shows up when I run the same command on om2 or om3. It looks like the client always tries to contact om2 first, which is not the leader in our case. Can we suppress this failover message unless the command fails?
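
To illustrate the behavior requested above, here is a minimal sketch that collects failover notices while retrying and shows them only when every attempt fails. This is an illustration of the request, not how the patch handles it. `QuietFailoverSketch`, `Result`, `callWithFailover`, and `render` are hypothetical names, and the `call` function is assumed to return a non-null value on success and throw on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class QuietFailoverSketch {

  /** Outcome of a call that may have failed over between nodes. */
  static final class Result {
    final String value;          // null when every attempt failed
    final List<String> notices;  // failover notices gathered on the way

    Result(String value, List<String> notices) {
      this.value = value;
      this.notices = notices;
    }

    boolean succeeded() {
      return value != null;
    }
  }

  /** Tries each node in order, recording a notice for each failed attempt. */
  static Result callWithFailover(List<String> nodeIds,
      Function<String, String> call) {
    List<String> notices = new ArrayList<>();
    for (String nodeId : nodeIds) {
      try {
        return new Result(call.apply(nodeId), notices);
      } catch (RuntimeException e) {
        // Remember the failure instead of printing it right away.
        notices.add("Failover from " + nodeId + ": " + e.getMessage());
      }
    }
    return new Result(null, notices);
  }

  /** Returns the value on success; the collected notices only on failure. */
  static String render(Result result) {
    return result.succeeded()
        ? result.value
        : String.join(System.lineSeparator(), result.notices);
  }

  public static void main(String[] args) {
    // Hypothetical call: only om3 answers, every other node throws.
    Function<String, String> call = nodeId -> {
      if (!nodeId.equals("om3")) {
        throw new IllegalStateException("OM " + nodeId + " is not the leader");
      }
      return "om1 : FOLLOWER, om2 : FOLLOWER, om3 : LEADER";
    };
    System.out.println(render(callWithFailover(List.of("om2", "om3"), call)));
  }
}
```

With this shape, a successful run prints only the roles, and the failover details remain available in `notices` for debugging.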

Contributor

Just a question: why are these log levels being set here, and why is it set to INFO, and NativeCodeLoader to ERROR?

Contributor Author

These are the settings that were used for the SCM CLI. I will update the root logger to WARN. We can probably keep NativeCodeLoader at ERROR so that the "Unable to load native-hadoop library for your platform" warning is not displayed every time.

Contributor

@bharatviswa504 bharatviswa504 left a comment

Overall LGTM. One minor comment in place.

Contributor

Can we name this --service-id instead of --om-service-id, since the command is already invoked as ozone admin om getserviceroles?

Contributor Author

Sure, will update it.

Contributor

New line missing, for newly added files.

Contributor Author

Sorry, I did not understand this.

Contributor

@dineshchitlangia dineshchitlangia Oct 29, 2019

From the latest CI run, this doesn't seem to be an issue.
@hanishakoneru He meant that one of the checkstyle rules is that every source file must end with a newline as its last line.

Contributor

@bharatviswa504 bharatviswa504 Oct 29, 2019

Yes. Newline at end of the file. I think there is no checkstyle rule for this.

Contributor

@bharatviswa504 I just downloaded the patch and confirmed the new line at EOF is there. So no changes are needed.
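
For anyone who wants to check this locally without relying on a style checker, here is a minimal sketch that tests whether a file's last byte is a newline. The class and method names are hypothetical and it is not part of the patch or of any Ozone tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TrailingNewlineSketch {

  /** Returns true when the content is non-empty and its last byte is '\n'. */
  static boolean endsWithNewline(byte[] content) {
    return content.length > 0 && content[content.length - 1] == '\n';
  }

  public static void main(String[] args) throws IOException {
    // Each argument is treated as a path to a file to check.
    for (String path : args) {
      byte[] bytes = Files.readAllBytes(Paths.get(path));
      System.out.println(path + ": "
          + (endsWithNewline(bytes) ? "ok" : "missing newline at EOF"));
    }
  }
}
```

An empty file is reported as missing the newline in this sketch; whether that should count as a violation is a policy choice.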

@hanishakoneru
Contributor Author

hanishakoneru commented Oct 29, 2019

Thanks for the reviews @smengcl and @bharatviswa504 .
I addressed your comments in the latest patch.

@smengcl
Contributor

smengcl commented Oct 29, 2019

Thanks @hanishakoneru . The command is working as expected now. lgtm +1. Pending @bharatviswa504 's comments.

bash-4.2$ ozone admin om getserviceroles
Missing required option '--service-id=<omServiceId>'
Usage: ozone admin om getserviceroles [-hV] -id=<omServiceId>
List all OMs and their respective Ratis server roles
  -h, --help      Show this help message and exit.
      -id, --service-id=<omServiceId>
                  OM Service ID
  -V, --version   Print version information and exit.

bash-4.2$ ozone admin om getserviceroles --service-id=id1
om1 : FOLLOWER
om3 : FOLLOWER
om2 : LEADER

bash-4.2$ ozone admin om getserviceroles -id=id1
om1 : FOLLOWER
om3 : FOLLOWER
om2 : LEADER

@anuengineer
Contributor

Hi all, thank you for the extensive code review of this patch. If we are all in sync and there are no outstanding issues, is it okay for me to commit this?

I am +1 on this patch, and it looks very good to me, but I will wait for a signal from the others. Thanks

Contributor

@dineshchitlangia dineshchitlangia left a comment

+1 for merge. Thanks @hanishakoneru for the work and everyone for the reviews.

@arp7
Contributor

arp7 commented Oct 29, 2019

Thank you all for carefully reviewing multiple iterations of this patch! 🙂

Contributor

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM (with one minor comment of adding a new line at end of newly added files). This can be taken care of during commit.

@bharatviswa504
Contributor

bharatviswa504 commented Oct 29, 2019

Thank you, everyone, for the reviews, and @hanishakoneru for the contribution.

@bharatviswa504 bharatviswa504 merged commit 7b043e9 into apache:master Oct 29, 2019
@hanishakoneru
Contributor Author

Thank you all for the reviews.

@hanishakoneru hanishakoneru deleted the HDDS-2240 branch December 1, 2020 21:28