

@xBis7 (Contributor) commented Feb 22, 2023

What changes were proposed in this pull request?

Patch #4140 added OM HA metrics, along with logic in OM to unregister them when the leader is unknown. However, the unregistration happens in the wrong if branch: it should be done when the leader is null, not when the leader's ID is null (the ID is required to be non-null, so that branch is never taken).
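To make the branch mix-up concrete, here is a minimal Java sketch. It is not the actual OzoneManager code: Peer, HaMetrics and HaMetricsUpdater are hypothetical stand-ins, and updateBuggy is only an assumed shape of the problem described above (an unregister call guarded by a null check on an ID that can never be null).

```java
import java.util.Objects;

/** Hypothetical stand-in for a Ratis peer: its ID is required to be non-null. */
final class Peer {
  private final String id;

  Peer(String id) {
    this.id = Objects.requireNonNull(id, "id must be non-null");
  }

  String getId() {
    return id;
  }
}

/** Hypothetical stand-in for OMHAMetrics registration state. */
final class HaMetrics {
  private String leaderId; // null means "not registered"

  void register(String id) { leaderId = id; }
  void unregister() { leaderId = null; }
  boolean isRegistered() { return leaderId != null; }
}

final class HaMetricsUpdater {
  /** Assumed bug shape: unregister sits under a null check on the ID, which is never null. */
  static void updateBuggy(HaMetrics metrics, Peer leader) {
    if (leader != null) {
      if (leader.getId() == null) {
        metrics.unregister(); // unreachable: Peer rejects a null ID
      } else {
        metrics.register(leader.getId());
      }
    }
    // leader == null falls through here, so stale metrics stay registered
  }

  /** Fixed shape: unregister when the leader itself is unknown. */
  static void updateFixed(HaMetrics metrics, Peer leader) {
    if (leader == null) {
      metrics.unregister();
    } else {
      metrics.register(leader.getId());
    }
  }
}
```

With the buggy shape, a null leader simply falls through and the previously registered metrics stay in place; the fixed shape unregisters them.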

Related discussion: #4140 (comment)

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8009

How was this patch tested?

This patch was tested manually in the Docker clusters under

/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone-ha
and
/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone.

Also, after taking another look at OzoneManager.updatePeerList(), where we register OMHAMetrics, I can see that this method is only called after a leader has been elected. In any scenario where there is no leader, this method won't be called at all, so we don't have to worry about the Ratis server's leader being null. It should be safe to remove the leader check altogether.
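If the leader check is removed, one option is to state the invariant explicitly instead of branching on it. The sketch below assumes, as argued above, that the configuration-change callback only fires after an election. OmSketch, onLeaderElected and the string-based peer IDs are hypothetical and only loosely model OzoneManager.updatePeerList() and OzoneManagerStateMachine.notifyConfigurationChanged(..); they are not the real signatures.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified model of the OM side of the callback chain. */
final class OmSketch {
  private String leaderId;         // current Raft leader, null until an election completes
  private String metricsLeaderId;  // leader ID the HA metrics were registered with
  private List<String> peers = List.of();

  /** Called by the (simulated) Raft layer when an election completes. */
  void onLeaderElected(String newLeaderId) {
    leaderId = Objects.requireNonNull(newLeaderId);
  }

  /** Models notifyConfigurationChanged(..): assumed to run only once a leader exists. */
  void notifyConfigurationChanged(List<String> newPeers) {
    updatePeerList(newPeers);
  }

  /** No leader branch: the invariant is asserted instead of checked with an if. */
  private void updatePeerList(List<String> newPeers) {
    peers = List.copyOf(newPeers);
    metricsLeaderId = Objects.requireNonNull(leaderId,
        "updatePeerList() is expected to run only after a leader election");
  }

  String metricsLeaderId() { return metricsLeaderId; }

  List<String> peers() { return peers; }
}
```

Using Objects.requireNonNull here is a design choice: if the assumption about call order ever breaks, the code fails fast with a clear message instead of silently skipping metrics registration.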

OzoneManager.updatePeerList() is used only in OzoneManagerStateMachine.notifyConfigurationChanged(..). Searching the logs confirms that it only gets called after a new leader election: in the om2 log below, the "Received Configuration change notification from Ratis" line comes after the "change Leader from null to om3" line.

2023-02-22 15:05:04,798 [grpc-default-executor-1] INFO server.RaftServer$Division: om2@group-D66704EFC61C replies to ELECTION vote request: om3<-om2#0:OK-t2. Peer's state: om2@group-D66704EFC61C:t2, leader=null, voted=om3, raftlog=Memoized:om2@group-D66704EFC61C-SegmentedRaftLog:OPENED:c0, conf=0: peers:[om1|rpc:om1:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om3|rpc:om3:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om2|rpc:om2:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
2023-02-22 15:05:04,837 [om2-server-thread2] INFO server.RaftServer$Division: om2@group-D66704EFC61C: change Leader from null to om3 at term 2 for appendEntries, leader elected after 42ms
2023-02-22 15:05:04,844 [om2-server-thread1] INFO server.RaftServer$Division: om2@group-D66704EFC61C: set configuration 1: peers:[om1|rpc:om1:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om3|rpc:om3:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om2|rpc:om2:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
2023-02-22 15:05:04,845 [om2-server-thread1] INFO segmented.SegmentedRaftLogWorker: om2@group-D66704EFC61C-SegmentedRaftLogWorker: Rolling segment log-0_0 to index:0
2023-02-22 15:05:04,846 [om2@group-D66704EFC61C-SegmentedRaftLogWorker] INFO segmented.SegmentedRaftLogWorker: om2@group-D66704EFC61C-SegmentedRaftLogWorker: Rolled log segment from /data/metadata/ratis/5cb24680-b9e7-3c90-a862-d66704efc61c/current/log_inprogress_0 to /data/metadata/ratis/5cb24680-b9e7-3c90-a862-d66704efc61c/current/log_0-0
2023-02-22 15:05:04,855 [om2@group-D66704EFC61C-SegmentedRaftLogWorker] INFO segmented.SegmentedRaftLogWorker: om2@group-D66704EFC61C-SegmentedRaftLogWorker: created new log segment /data/metadata/ratis/5cb24680-b9e7-3c90-a862-d66704efc61c/current/log_inprogress_1
2023-02-22 15:05:04,865 [om2@group-D66704EFC61C-StateMachineUpdater] INFO ratis.OzoneManagerStateMachine: Received Configuration change notification from Ratis. New Peer list:
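The ordering check done by eye on the log excerpt above could also be scripted. This is a small, hypothetical Java helper, not part of Ozone; it assumes a single OM's log lines are in chronological order and matches on the two message fragments seen in the excerpt.

```java
import java.util.List;

/** Hypothetical helper: checks that a leader election is logged before the first config-change notification. */
final class LogOrderCheck {
  static final String LEADER_ELECTED = "change Leader from null to";
  static final String CONF_CHANGED = "Received Configuration change notification from Ratis";

  /** Returns the index of the first line containing marker, or -1 if none does. */
  static int firstIndexOf(List<String> lines, String marker) {
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).contains(marker)) {
        return i;
      }
    }
    return -1;
  }

  /** True if no notification was logged, or if a leader election was logged before the first one. */
  static boolean electedBeforeConfChange(List<String> lines) {
    int elected = firstIndexOf(lines, LEADER_ELECTED);
    int conf = firstIndexOf(lines, CONF_CHANGED);
    return conf < 0 || (elected >= 0 && elected < conf);
  }
}
```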

@xBis7 (Contributor, Author) commented Feb 22, 2023

@adoroszlai Can you please take a look at this PR?

@neils-dev (Contributor) left a comment


Thanks @xBis7 for the fix. Thanks @adoroszlai for filing the Jira.

@adoroszlai merged commit c592d9c into apache:master Feb 23, 2023

@adoroszlai (Contributor) commented

Thanks @xBis7 for the patch, @neils-dev for the review.

> it only gets called after new leader election

Yes, it seems so.
