[DPE-3684] Reinitialise raft #611
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #611 +/- ##
==========================================
+ Coverage 71.82% 72.58% +0.75%
==========================================
Files 13 13
Lines 3219 3392 +173
Branches 477 525 +48
==========================================
+ Hits 2312 2462 +150
- Misses 791 806 +15
- Partials 116 124 +8
self.update_config()
self._patroni.start_patroni()
Restarting the non-primary units.
for unit in self._peers.units:
    self._add_to_members_ips(self._get_unit_ip(unit))
self._add_to_members_ips(self._get_unit_ip(self.unit))
If the primary and the leader are different units, the cluster will be unable to reconfigure, since the leader's Patroni is down and outside the cluster, so we have to keep the list here.
@taurus-forever gave me a charm branch ( However, it's not working for me. Could you please look at my steps and tell me whether I'm doing something wrong or whether the patch could be incomplete?
Steps
Both are expected.
^^^ This should return
And the raft membership has a problem.
^^^
Dear @nobuto-m, thank you for testing and sharing your feedback with us! Please check the PR description: The reported test case stopped both the Leader and the Sync_Standby simultaneously. We discussed this case on the last sync call:
We are going to improve the UX and Juju statuses to better inform users when only a Replica is left in the cluster.
This PR addresses case #5. Can you please check it from your side?
Based on the discussion, I have re-checked the cluster installation speed and case 4:
but for some reason the raft re-initialization didn't happen in full. @dragomirp, can you please check case 4 in #611 (comment) and make sure the same logic is applied here as in case 5? Thanks!
The Syncobj RAFT implementation, used as a standalone DCS for Patroni, cannot elect a leader if the cluster loses quorum and becomes read-only. This prevents Patroni from switching over automatically, even when sync_standbys are available in the cluster and could take over as primary.
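For illustration only, the quorum condition can be sketched with the generic Raft majority rule. This is not code from the PR; the function names `raft_majority` and `has_quorum` are hypothetical.

```python
def raft_majority(total_members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if total_members < 1:
        raise ValueError("a cluster needs at least one member")
    return total_members // 2 + 1


def has_quorum(reachable_members: int, total_members: int) -> bool:
    """True if enough members are reachable to elect a leader."""
    return reachable_members >= raft_majority(total_members)
```

Under this rule, a three-member cluster that loses two members at once (for example the leader and the sync_standby) is left with one reachable member out of three, which is below the majority of two, so no leader can be elected.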
The PR adds logic to detect when the RAFT cluster becomes read-only and to reinitialise it if a sync_standby is available to become the primary.
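A minimal sketch of that decision, assuming a simplified view of cluster members, might look like the following. The `Member` class, its fields, and `pick_reinit_candidate` are hypothetical names for illustration and do not reflect how the charm implements the check.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Member:
    name: str
    role: str  # assumed values: "leader", "sync_standby", "replica"
    reachable: bool


def pick_reinit_candidate(
    members: Iterable[Member], raft_has_quorum: bool
) -> Optional[str]:
    """Return the member to promote after a raft reinitialisation, if any.

    Nothing is returned while raft still has quorum, or when no reachable
    sync_standby exists (for example when only a replica is left).
    """
    if raft_has_quorum:
        return None
    for member in members:
        if member.reachable and member.role == "sync_standby":
            return member.name
    return None
```

In this sketch, a cluster where only an asynchronous replica survives yields no candidate, which matches the case discussed above where both the Leader and the Sync_Standby were stopped simultaneously.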