Handle etcd compacted revisions #208

lukasertl · 2024-02-22T14:33:32Z

vip-manager currently doesn't seem to handle etcd compacted revisions gracefully:

Feb 22 15:17:03 host vip-manager[3872968]: 2024/02/22 15:17:03 IP address 10.x.y.z/16 state is true, desired true
Feb 22 15:17:13 host vip-manager[3872968]: 2024/02/22 15:17:13 IP address 10.x.y.z/16 state is true, desired true
Feb 22 15:17:21 host vip-manager[3872968]: 2024/02/22 15:17:21 etcd watcher returned error: etcdserver: mvcc: required revision has been compacted
Feb 22 15:17:21 host vip-manager[3872968]: 2024/02/22 15:17:21 IP address 10.x.y.z/16 state is true, desired false
Feb 22 15:17:21 host vip-manager[3872968]: 2024/02/22 15:17:21 Removing address 10.x.y.z/16 on ens192

Restarting vip-manager fixes this, but of course the database is not accessible until then.


cfredericksen · 2024-03-19T02:38:08Z

I noticed this too when I was testing failover. The new leader requires a restart of vip-manager. I'm trying to figure out a way to automatically restart the vip-manager daemon with monit on failover.

pashagolub · 2024-03-19T17:34:33Z

Any chance you guys can throw me a link to get an idea of compacted revisions without chatting with AI or Google? :)

Thanks in advance!

cfredericksen · 2024-03-19T18:20:41Z

I guess I am less aware of "compacted revisions" and more referring to auto-recovery.

I am using vip-manager as part of https://github.com/vitabaks/postgresql_cluster. If I hard stop (power off the VM, simulating a hardware failure) the leader of the cluster, vip-manager never recovers until I restart vip-manager on the new leader. Sorry for the confusion.

SDV109 · 2024-04-05T12:19:49Z

Hi!
I also encountered this problem when one etcd node was unavailable for a short period of time. In my case, etcd 3.5.11 and vip-manager 2.3.0 are used. What helped was restarting vip-manager on the server that became the master in the Patroni cluster.

@pashagolub, I used the auto compaction section of the official etcd documentation (https://etcd.io/docs/v3.5/op-guide/maintenance/) when I added a PR to the https://github.com/vitabaks/postgresql_cluster repository that enables compaction of etcd's internal database.

vitabaks · 2024-04-15T09:14:01Z

@pashagolub please help.

etcdserver: mvcc: required revision has been compacted

Maybe it would be enough to simply retry here and re-read the latest value of the key?
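A minimal sketch of that retry idea, using only the Go standard library. It is not vip-manager's actual code and does not use the real etcd client: the `store` interface, `fakeStore`, `nextLeader`, and the `errCompacted` value are all hypothetical stand-ins for illustration. The point is only that a "compacted" error from the watcher triggers a fresh read of the key instead of being treated as "the desired state is false".

```go
package main

import (
	"errors"
	"fmt"
)

// errCompacted is a stand-in for the etcd "required revision has been
// compacted" error; the real client defines its own error value.
var errCompacted = errors.New("etcdserver: mvcc: required revision has been compacted")

// store is a hypothetical, minimal view of the key-value backend.
type store interface {
	// Get returns the current value of the key and the revision it was read at.
	Get() (value string, rev int64, err error)
	// WatchFrom returns the next change after fromRev, or an error.
	WatchFrom(fromRev int64) (value string, rev int64, err error)
}

// nextLeader returns the next known leader value and the revision to resume
// watching from. On a compaction error it re-reads the key rather than
// reporting that there is no leader.
func nextLeader(s store, lastRev int64) (string, int64, error) {
	value, rev, err := s.WatchFrom(lastRev)
	if errors.Is(err, errCompacted) {
		// The requested revision no longer exists; fall back to the latest value.
		return s.Get()
	}
	return value, rev, err
}

// fakeStore simulates a backend whose history below compactedAt was discarded.
type fakeStore struct {
	value       string
	rev         int64
	compactedAt int64
}

func (f *fakeStore) Get() (string, int64, error) { return f.value, f.rev, nil }

func (f *fakeStore) WatchFrom(fromRev int64) (string, int64, error) {
	if fromRev < f.compactedAt {
		return "", 0, errCompacted
	}
	return f.value, f.rev, nil
}

func main() {
	s := &fakeStore{value: "pgnode01", rev: 120, compactedAt: 100}
	// Revision 42 is older than the compaction point, so the watch fails
	// and nextLeader falls back to reading the current value.
	leader, rev, err := nextLeader(s, 42)
	fmt.Println(leader, rev, err)
}
```

With this shape, the caller keeps the last known state on any other watch error and only changes the desired state when a value was actually read.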

pashagolub · 2024-04-15T14:41:14Z

I'd rather fix this error. But I cannot find how I can reproduce it in a simple way

vitabaks · 2024-04-15T14:50:02Z

"I cannot find how I can reproduce it in a simple way"

Add the ETCD_AUTO_COMPACTION_RETENTION="1" option to etcd.conf.

Example:

ETCD_NAME="pgnode01"
ETCD_LISTEN_CLIENT_URLS="http://192.168.150.141:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.150.141:2379"
ETCD_LISTEN_PEER_URLS="http://192.168.150.141:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.150.141:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-postgres-cluster"
ETCD_INITIAL_CLUSTER="pgnode01=http://192.168.150.141:2380,pgnode02=http://192.168.150.142:2380,pgnode03=http://192.168.150.143:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"
ETCD_INITIAL_ELECTION_TICK_ADVANCE="false"
ETCD_AUTO_COMPACTION_RETENTION="1"

Or use postgresql_cluster to deploy the Postgres cluster (with vip-manager) or to deploy ETCD cluster only.

SDV109 · 2024-04-16T12:25:14Z

@pashagolub, Hi, there is new information.
Because of this problem, and until a fixed version is released, I turned off maintenance for etcd on our DB clusters today, and I discovered something interesting.
On the cluster running vip-manager 2.3.0, restarting etcd made vip-manager log the error described above, and the VIP disappeared until vip-manager was restarted. On the second cluster, which runs vip-manager 2.1.0, I did not find any problems with vip-manager. This may be due to hardware: the second cluster has better hardware, so the etcd restart was faster. Still, perhaps this gives a hint toward a solution.
I will try to reproduce this scenario in my test environment with different versions as soon as possible and will report back in an additional comment.

pashagolub · 2024-04-19T12:26:35Z

Hi people. Would you please try #217?

Thanks in advance!

SDV109 · 2024-04-21T08:44:02Z

@pashagolub Hi!
I've done several iterations of testing.
In the first case, I removed the old version with apt-get --purge --autoremove remove vip-manager and then installed the version from the 208-handle-etcd-compressed-revisions branch. In this case it was extremely difficult to reproduce the error, but in one of many attempts I managed to trigger it.
Next, I deployed the cluster from scratch and installed vip-manager directly from the 208-handle-etcd-compressed-revisions branch. This time, no matter how many different attempts I made, the error could not be reproduced. I tried stopping etcd on one of the 3 nodes and restarting etcd on each of the nodes, but the error did not reproduce.

Upd:
Perhaps the error in the first case is related to incomplete removal of vip-manager via apt-get --purge --autoremove remove vip-manager: after removal, even if I restart the VM, the vip-manager service continues to run although the binaries are gone.
After_remove_vip-manager.txt

pashagolub self-assigned this Mar 19, 2024

pashagolub added the bug label Mar 19, 2024

pashagolub added a commit that referenced this issue Apr 19, 2024

[-] add compacted revision handler, fixes #208

def5a49

pashagolub linked a pull request Apr 19, 2024 that will close this issue

[-] add compacted revision handler, fixes #208 #217

Merged

pashagolub closed this as completed in #217 Apr 22, 2024

pashagolub added a commit that referenced this issue Apr 22, 2024

[-] add compacted revision handler, fixes #208 (#217)

60d6db6

lukasertl mentioned this issue May 22, 2024

Handle etcd leader changes #228

Closed

github-project-automation bot added this to vip-manager Aug 28, 2024

github-project-automation bot moved this to Done in vip-manager Aug 28, 2024
