Reduce connect timeout to 5s and reduce reconnect timeout from 60s to 30s #165

davissp14 · 2023-03-08T23:16:24Z

This work reduces the time it takes to identify and fence a primary in the event of a network partition.

When a network partition occurs, a few things need to happen:

1. Repmgr attempts to connect to a registered standby with a 5s connect_timeout.
2. Repmgr waits up to 30s for the standby to reconnect before issuing a child_node_disconnect event.
3. The child_node_disconnect event is processed and triggers a cluster state evaluation.

The time the evaluation takes depends on how many of the nodes registered with the cluster are no longer reachable. In the worst case, expect 5s per registered standby.
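The evaluation step can be pictured as a sequential reachability check. This is a minimal sketch that assumes a plain TCP connect stands in for the real Postgres connection attempt. `tcp_probe`, `evaluate_cluster`, `worst_case_seconds` and the host names are hypothetical, not repmgr's API. Probing standbys one at a time is why the worst case grows by roughly 5s per unreachable standby.

```python
import socket
from typing import Callable, List, Tuple

# Assumption: mirrors the 5s connect_timeout described above.
CONNECT_TIMEOUT = 5.0

Node = Tuple[str, int]  # (host, port) of a registered standby (hypothetical shape)


def tcp_probe(host: str, port: int, timeout: float = CONNECT_TIMEOUT) -> bool:
    """Hypothetical reachability check: True if a TCP connect succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name resolution failure, or timed out
        return False


def evaluate_cluster(
    standbys: List[Node], probe: Callable[[str, int], bool] = tcp_probe
) -> List[Node]:
    """Probe each registered standby in turn and return the reachable ones."""
    reachable = []
    for host, port in standbys:
        # An unreachable standby can cost roughly CONNECT_TIMEOUT seconds here.
        if probe(host, port):
            reachable.append((host, port))
    return reachable


def worst_case_seconds(registered_standbys: int) -> float:
    """Approximate upper bound on evaluation time when every standby is unreachable."""
    return registered_standbys * CONNECT_TIMEOUT
```

The probe is injectable so the evaluation logic can be exercised without a network. For a 3-node cluster with both standbys unreachable, `worst_case_seconds(2)` returns 10.0.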

The split-brain detection window can be calculated using the following formula:

connect_timeout + standby_reconnect_timeout + (registered_standbys * 5s)
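The formula translates directly into a small helper. This is a minimal sketch: `split_brain_window` and its parameter names are hypothetical, and `per_standby_check` defaults to the 5s worst-case evaluation cost per registered standby described above.

```python
def split_brain_window(
    connect_timeout: float,
    standby_reconnect_timeout: float,
    registered_standbys: int,
    per_standby_check: float = 5.0,
) -> float:
    """Worst-case seconds before a partitioned primary is identified and fenced."""
    return (
        connect_timeout
        + standby_reconnect_timeout
        + registered_standbys * per_standby_check
    )


# Typical 3-node cluster: one primary plus two registered standbys.
print(split_brain_window(5, 30, 2))  # 45.0
```

Under the same assumptions, a 5-node cluster (four registered standbys) gives 5 + 30 + 4 * 5 = 55s, so the evaluation term grows linearly with cluster size.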

For a typical 3 node cluster we are looking at:

Connect timeout: 5s
Standby reconnect timeout: 30s
Registered standbys: (2 * 5s) = 10s

Total time: 45 seconds.

There are some optimizations we could make here to cut down on time. For example, we could get away with evaluating only a subset of the registered members and bail out once we know quorum can't be met. I'll have to think more about this.
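The early-bail idea can be sketched as follows. This is a sketch under stated assumptions, not the merged behavior: quorum is taken to be a strict majority of all members with the primary counted as reachable, and `quorum_size`, `can_meet_quorum` and the injected `probe` callable are hypothetical names. Once more standbys are unreachable than the cluster can afford to lose, the remaining probes (and their 5s timeouts) are skipped.

```python
from typing import Callable, List, Tuple

Node = Tuple[str, int]  # (host, port) of a registered standby (hypothetical shape)


def quorum_size(total_members: int) -> int:
    """Assumption: quorum is a strict majority of all members, primary included."""
    return total_members // 2 + 1


def can_meet_quorum(
    standbys: List[Node], probe: Callable[[str, int], bool]
) -> Tuple[bool, int]:
    """Probe standbys one at a time, bailing once quorum is provably unreachable.

    Returns (quorum_possible, probes_made).
    """
    total = len(standbys) + 1  # +1 for the primary, which can reach itself
    max_unreachable = total - quorum_size(total)
    unreachable = 0
    probes = 0
    for host, port in standbys:
        probes += 1
        if not probe(host, port):
            unreachable += 1
            if unreachable > max_unreachable:
                # Quorum can no longer be met; skip the remaining timeouts.
                return False, probes
    return True, probes
```

In a 3-node cluster this saves nothing, since quorum is only lost once both standbys have failed. In a 5-node cluster with every standby unreachable, the check stops after three probes instead of four, saving one 5s timeout.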

Reduce connect timeout to 5s and reduce reconnect timeout from 60 to 30s

d1bb7b4

davissp14 merged commit 5d21394 into master Mar 8, 2023

davissp14 deleted the reduce-dataloss-window branch March 20, 2023 22:44
