HDDS-11618. Enable HA mode for OM and SCM #20
ptlrs
left a comment
Thank you for the PR @Tejaskriya and @pyttel.
The implementation looks good to me.
I mainly have some questions around the upgrade flow.
Some of the other suggestions can be done in followup PRs.
ptlrs
left a comment
Thanks @Tejaskriya for the updates to the PR.
I have posted my concerns as discussed about the two jobs. We can address those in future PRs.
Rest of the changes LGTM.
@ptlrs thanks for the feedback. The suggestions make sense, and as you have mentioned, I'd like to take care of these in followup tasks. Thanks for your approval on the PR! @kerneltime would you like to take a look?
rakeshadr
left a comment
@Tejaskriya thanks for the efforts in testing and maintaining this PR. I added a few comments; please go through them.
… retry to the OM bootstrap script, Added RPC port to OM headless service, made Ratis ports conditional
@Tejaskriya Thanks for putting up the changes. I could see a few open comments, but I don't think these are blockers. Please create a followup task under HDDS-14382 to address them, then proceed further. +1 LGTM
Thanks for the reviews @rakeshadr @ptlrs and the co-contribution @pyttel
What changes were proposed in this pull request?
Co-authored by: @pyttel
HA for OM and SCM is driven by the replica count. In helpers.tpl, if the replica count is greater than 1, the configs needed to enable HA are set.
Further, the ozone.om.bootstrap.nodes and ozone.om.decommissioned.nodes lists are maintained to keep track of which OMs are bootstrapped and decommissioned, respectively. The required pods are exposed for each service.
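The replica-count rule above can be illustrated with a minimal sketch. This is not the chart's helpers.tpl logic, only a Python rendering of the same idea for OM. The function name, the default service ID `cluster1`, the StatefulSet name `ozone-om`, and the headless service name `ozone-om-headless` are all assumptions made for illustration; the property keys follow the standard Ozone OM HA naming (`ozone.om.service.ids`, `ozone.om.nodes.<serviceId>`, `ozone.om.address.<serviceId>.<nodeId>`).

```python
def om_ha_properties(replicas, service_id="cluster1",
                     statefulset="ozone-om",
                     headless_service="ozone-om-headless"):
    """Return the OM HA properties for a given replica count.

    All default names are hypothetical; they are not taken from the chart.
    """
    # With a single replica there is nothing to configure for HA.
    if replicas <= 1:
        return {}

    # Assumption: node IDs mirror StatefulSet pod names (<name>-0, <name>-1, ...).
    nodes = [f"{statefulset}-{i}" for i in range(replicas)]

    props = {
        "ozone.om.service.ids": service_id,
        f"ozone.om.nodes.{service_id}": ",".join(nodes),
    }
    # Assumption: each pod is reachable through the headless service.
    for node in nodes:
        props[f"ozone.om.address.{service_id}.{node}"] = f"{node}.{headless_service}"
    return props


if __name__ == "__main__":
    for key, value in om_ha_properties(3).items():
        print(f"{key}={value}")
```

Calling `om_ha_properties(1)` returns an empty dict, which mirrors the non-HA case where no extra configs are set. The same pattern would apply to SCM with the `ozone.scm.*` keys.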
Ref: the original stale PR, #10
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-11618
How was this patch tested?
Green CI run; tested manually with leader transfer and basic Ozone commands.