
[DPE-3684] Reinitialise raft #611

Open

dragomirp wants to merge 102 commits into main from dpe-3684-reinitialise-raft

Conversation

Contributor

@dragomirp commented Sep 3, 2024

The Syncobj RAFT implementation, used as a standalone DCS for Patroni, cannot elect a leader if the cluster loses quorum and becomes read-only. This prevents Patroni from automatically failing over, even in cases where sync_standbys are available in the cluster and could take over as primary.

The PR adds logic to detect when the RAFT cluster becomes read only and to reinitialise it, if a sync_standby is available to become a primary.
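
As an illustration, here is a minimal sketch of one way such a read-only (stuck) raft state could be detected by querying pysyncobj's syncobj_admin utility; the helper name and decision logic are assumptions for this sketch, not the charm's actual implementation:

```python
# Illustrative sketch: detect a stuck (read-only) raft cluster by parsing the output
# of pysyncobj's syncobj_admin utility. The helper name and the exact decision logic
# are assumptions for this example, not the charm's implementation.
import subprocess


def raft_is_stuck(host: str = "127.0.0.1", port: int = 2222, password: str = "") -> bool:
    """Return True if the local raft node reports no quorum and no leader."""
    cmd = ["syncobj_admin", "-conn", f"{host}:{port}", "-status"]
    if password:
        cmd += ["-pass", password]
    try:
        output = subprocess.check_output(cmd, text=True, timeout=10)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
        # No answer from the raft port: this sketch treats it as "not provably stuck".
        return False

    status = {}
    for line in output.splitlines():
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()

    # A healthy cluster reports has_quorum: True and a leader address (compare the
    # syncobj_admin output later in this conversation).
    return status.get("has_quorum") == "False" and status.get("leader") in ("None", "")
```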

codecov bot commented Sep 3, 2024

Codecov Report

Attention: Patch coverage is 82.18391% with 31 lines in your changes missing coverage. Please review.

Project coverage is 72.58%. Comparing base (5c8949f) to head (d155abf).

Files with missing lines    Patch %    Lines
src/charm.py                78.37%     15 Missing and 9 partials ⚠️
src/cluster.py              88.88%     4 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #611      +/-   ##
==========================================
+ Coverage   71.82%   72.58%   +0.75%     
==========================================
  Files          13       13              
  Lines        3219     3392     +173     
  Branches      477      525      +48     
==========================================
+ Hits         2312     2462     +150     
- Misses        791      806      +15     
- Partials      116      124       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dragomirp force-pushed the dpe-3684-reinitialise-raft branch from 4c7e201 to 29aaf18 on September 3, 2024 at 19:36
@dragomirp force-pushed the dpe-3684-reinitialise-raft branch from e69800c to adbf7eb on September 10, 2024 at 23:56
Comment on lines +638 to +639
self.update_config()
self._patroni.start_patroni()
Contributor Author

Restarting the non-primary units.

Comment on lines +562 to +564
for unit in self._peers.units:
self._add_to_members_ips(self._get_unit_ip(unit))
self._add_to_members_ips(self._get_unit_ip(self.unit))
Contributor Author

If the primary and the leader are different units, the cluster will be unable to reconfigure itself, since the leader's Patroni is down and outside the cluster, so we have to keep the full list of member IPs here.

@nobuto-m

nobuto-m commented Oct 24, 2024

@taurus-forever gave me a charm branch (14/edge/pr611) for testing this in the context of the following issue:
#571

However, it's not working for me. Could you please take a look at my steps and check whether I'm doing something wrong or whether the patch could be incomplete?

postgresql_debug.log

unit-postgresql-2: 08:25:06 INFO juju.worker.uniter found queued "leader-elected" hook
unit-postgresql-2: 08:25:10 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:15 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:20 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:26 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:31 WARNING unit.postgresql/2.juju-log Failed to connect to PostgreSQL.
unit-postgresql-2: 08:25:31 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:05 WARNING unit.postgresql/2.juju-log database-peers:0: Remove raft member: Stuck raft cluster detected
unit-postgresql-2: 08:27:05 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft not yet detected on all units
unit-postgresql-2: 08:27:10 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:15 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:20 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:25 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:30 WARNING unit.postgresql/2.juju-log database-peers:0: Failed to connect to PostgreSQL.
unit-postgresql-2: 08:27:30 INFO juju.worker.uniter.operation ran "database-peers-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:31 INFO juju.worker.uniter.operation ran "restart-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:32 INFO juju.worker.uniter.operation ran "upgrade-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:33 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft not yet detected on all units
unit-postgresql-2: 08:27:33 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:27:49 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:32:12 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:35 INFO juju.worker.uniter.operation ran "database-peers-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:36 INFO juju.worker.uniter.operation ran "restart-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:37 INFO juju.worker.uniter.operation ran "upgrade-relation-departed" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 08:36:49 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

Steps

  1. juju deploy postgresql -n 3 --channel 14/edge/pr611 --base [email protected]
  2. check the topology
$ sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
+ Cluster: postgresql (7429252738803380683) -+-----------+----+-----------+
| Member         | Host       | Role         | State     | TL | Lag in MB |
+----------------+------------+--------------+-----------+----+-----------+
| postgresql-1   | 10.0.9.118 | Leader       | running   |  1 |           |
| + postgresql-0 | 10.0.9.67  | Sync Standby | streaming |  1 |         0 |
| + postgresql-2 | 10.0.9.77  | Replica      | streaming |  1 |         0 |
+----------------+------------+--------------+-----------+----+-----------+
  3. take down two nodes including the leader
lxc stop -f juju-c3fd60-1
lxc stop -f juju-c3fd60-0
  4. check the status
$ juju status
Model        Controller  Cloud/Region         Version  SLA          Timestamp
postgres-pr  localhost   localhost/localhost  3.5.4    unsupported  08:24:32Z

App         Version  Status  Scale  Charm       Channel        Rev  Exposed  Message
postgresql  14.13    active    1/3  postgresql  14/edge/pr611  504  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   unknown   lost   0        10.0.9.67       5432/tcp  agent lost, see 'juju show-status-log postgresql/0'
postgresql/1*  unknown   lost   1        10.0.9.118      5432/tcp  agent lost, see 'juju show-status-log postgresql/1'
postgresql/2   active    idle   2        10.0.9.77       5432/tcp  

Machine  State    Address     Inst id        Base          AZ  Message
0        down     10.0.9.67   juju-c3fd60-0  [email protected]      Running
1        down     10.0.9.118  juju-c3fd60-1  [email protected]      Running
2        started  10.0.9.77   juju-c3fd60-2  [email protected]      Running
$ sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
2024-10-24 08:26:12,698 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-10-24 08:26:17,946 - INFO - waiting on raft
2024-10-24 08:26:22,947 - INFO - waiting on raft
2024-10-24 08:26:27,948 - INFO - waiting on raft
^C
Aborted!

Both are expected.

  5. Remove units one by one
$ juju remove-unit postgresql/0 --force
WARNING This command will perform the following actions:
will remove unit postgresql/0
- will remove storage pgdata/0

Continue [y/N]? y
$ juju remove-unit postgresql/1 --force
WARNING This command will perform the following actions:
will remove unit postgresql/1
- will remove storage pgdata/1

Continue [y/N]? y
  6. check the status once again
$ juju status
Model        Controller  Cloud/Region         Version  SLA          Timestamp
postgres-pr  localhost   localhost/localhost  3.5.4    unsupported  08:45:42Z

App         Version  Status  Scale  Charm       Channel        Rev  Exposed  Message
postgresql  14.13    active      1  postgresql  14/edge/pr611  504  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/2*  active    idle   2        10.0.9.77       5432/tcp  

Machine  State    Address    Inst id        Base          AZ  Message
2        started  10.0.9.77  juju-c3fd60-2  [email protected]      Running
$ juju run postgresql/leader get-primary
Running operation 3 with 1 task
  - task 4 on unit-postgresql-2

Waiting for task 4...
primary: postgresql/1

^^^ This should return postgresql/2 since that's the only member remaining in the model.

And the raft membership has a problem.

$ sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml topology
2024-10-24 08:39:04,529 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-10-24 08:39:09,759 - INFO - waiting on raft
2024-10-24 08:39:14,759 - INFO - waiting on raft
2024-10-24 08:39:19,760 - INFO - waiting on raft
^C
Aborted!
$ syncobj_admin -conn localhost:2222 -status -pass Eu9IEl4ef4SqV6iL
commit_idx: 269
enabled_code_version: 0
has_quorum: False
last_applied: 269
leader: None
leader_commit_idx: 269
log_len: 2
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_10.0.9.118:2222: 0
partner_node_status_server_10.0.9.67:2222: 0
partner_nodes_count: 2
raft_term: 108
readonly_nodes_count: 0
revision: deprecated
self: 10.0.9.77:2222
self_code_version: 0
state: 1
uptime: 3343
version: 0.3.12

^^^ partner_nodes_count: 2
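
The stale membership can also be confirmed by counting the partner_node_status_server_* entries in the -status output; a small sketch (field names taken from the output above, the helper itself is hypothetical):

```python
# Hypothetical helper: list raft partners that are still registered but unreachable
# (status 0) in `syncobj_admin -status` output, as in the transcript above.
def stale_partners(status_output: str) -> list[str]:
    stale = []
    for line in status_output.splitlines():
        if line.startswith("partner_node_status_server_"):
            node, _, state = line.rpartition(":")
            if state.strip() == "0":
                stale.append(node.removeprefix("partner_node_status_server_"))
    return stale


# For the output above this returns ["10.0.9.118:2222", "10.0.9.67:2222"]: both
# removed units are still raft members, which matches partner_nodes_count: 2.
```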

@taurus-forever
Contributor

Dear @nobuto-m, thank you for testing and sharing your feedback!

Please check the PR description: The PR adds logic to detect when the RAFT cluster becomes read only and to reinitialise it, **if a sync_standby is available** to become a primary.

The reported test case stops both the Leader and the Sync_Standby simultaneously.
Automated promotion in this case is not triggered by design, as we cannot guarantee consistency for a Replica-type node.

We discussed this case on the last sync call:

  • (recommended) the user restores one of the offline nodes (raft unlocks and the old Primary/Sync_Standby will be elected as the new Primary)
  • the Patroni/Raft cluster should be re-initialised (manually for now). We will create a new force-something action (separate PR). It should be executed as a last resort, if the user accepts the risk of losing the last transaction(s).
  • in a three-node setup, always keep Primary + 2*Sync_Standby => noticeable write performance impact.
    There are no other options on the table.

We are going to improve the UX and Juju statuses to better inform users when only a Replica is left in the cluster.
Do you see any other improvements/fixes necessary here? Thank you!

@taurus-forever
Contributor

L # Primary  Sync-Standby    Replica
* 0                                   # All OK: RW+RO
* 1                             X     # Degraded mode RW+RO: no changes.
  2               X                   # Degraded mode RW+RO: Replica -> Sync-Standby
  3   X                               # Degraded mode RW+RO: Sync-Standby -> Primary, Replica -> Sync-Standby
* 4               X             X     # Disaster mode: re-add/restore units ASAP
  5   X                         X     # Disaster mode: Sync-Standby -> Primary AFTER juju remove-units (PR #611)
  6   X           X                   # Disaster mode: requires manual confirmation of Replica -> Primary. NEW action.
  7   X           X             X     # Disaster mode: reinstall and restore from backup or reuse Primary/SyncStandby disk

This PR addresses case 5 above; can you please check it from your side?
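
For reference, the same matrix restated as data (a sketch summarising the table above; this mapping is not code from the PR):

```python
# The failure matrix above restated as data; True marks a lost unit.
# Keys: (primary_lost, sync_standby_lost, replica_lost) -> expected behaviour.
FAILURE_MODES = {
    (False, False, False): "All OK: RW+RO",
    (False, False, True):  "Degraded mode RW+RO: no changes",
    (False, True,  False): "Degraded mode RW+RO: Replica -> Sync-Standby",
    (True,  False, False): "Degraded mode RW+RO: Sync-Standby -> Primary, Replica -> Sync-Standby",
    (False, True,  True):  "Disaster mode: re-add/restore units ASAP",
    (True,  False, True):  "Disaster mode: Sync-Standby -> Primary after juju remove-unit (this PR)",
    (True,  True,  False): "Disaster mode: manual confirmation of Replica -> Primary (new action)",
    (True,  True,  True):  "Disaster mode: reinstall and restore from backup or reuse the old disk",
}
```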

@taurus-forever
Contributor

taurus-forever commented Oct 24, 2024

Based on the discussion, I have re-checked the cluster installation speed and case 4:

  • A fresh 3-node cluster installation on my laptop takes 00:05:12.
    The snap download time is close to 1 minute on each unit.

  • Re-tested case 4: the raft issue is detected after juju remove-unit --force:

unit-postgresql-2: 19:07:29 WARNING unit.postgresql/2.juju-log database-peers:0: Stuck raft has no candidate                         
unit-postgresql-2: 19:07:29 DEBUG unit.postgresql/2.juju-log database-peers:0: Early exit on_peer_relation_changed: stuck raft recovery  

but for some reason the raft re-initialisation did not complete.
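
A rough sketch of the guard those two log lines suggest (the function shape, flag name, and parameters are assumptions for the example, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)


# Illustrative only: an early-exit guard for peer-relation-changed while a stuck
# raft cluster is being recovered. The flag name and parameters are assumptions.
def handle_peer_relation_changed(peer_data: dict, candidate_available: bool) -> bool:
    """Return True if normal peer-relation processing should continue."""
    if peer_data.get("raft_stuck") == "True":
        if not candidate_available:
            logger.warning("Stuck raft has no candidate")
        logger.debug("Early exit on_peer_relation_changed: stuck raft recovery")
        return False
    return True
```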

@dragomirp, can you please check case 4 in #611 (comment) and make sure the same logic is applied there as in case 5? Thanks!

  • As for case 6: we will approve a dedicated Juju action name and prepare a separate PR/commit under the same GH issue.
