@davissp14 davissp14 commented Mar 19, 2023

A witness member holds no user data and exists only for voting purposes.

Motivation
This enables 2+1 setups, where the cluster runs two standard members (a "primary" and a "standby") plus an additional "witness" node that exists only for voting, ensuring quorum can be met.

This offers a more cost-effective way to achieve HA, as the witness member's resource requirements are minimal.

Things to note:

  1. In a 2+1 setup, it's important that the witness node is not placed on the same hardware as members 1 and 2.
  2. The witness node does not contain any user data, only repmgr metadata.
  3. The witness node should be removed when running 3 full members.
  4. 3-member setups should always be preferred for production-level deployments.
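
The voting arithmetic behind a 2+1 setup can be sketched roughly as follows. This is an illustration only, not the code from this PR: `Member`, `Role`, and `hasQuorum` are hypothetical names, and the sketch assumes quorum means a strict majority of all voting members (primary, standbys, and witness) being reachable.

```go
package main

import "fmt"

// Role is a hypothetical label for a cluster member's role.
type Role string

const (
	Primary Role = "primary"
	Standby Role = "standby"
	Witness Role = "witness"
)

// Member is a hypothetical, simplified view of a cluster member.
type Member struct {
	Hostname  string
	Role      Role
	Reachable bool
}

// hasQuorum reports whether a strict majority of all voting members
// (witnesses included) is reachable. Assumption: every member gets one vote.
func hasQuorum(members []Member) bool {
	reachable := 0
	for _, m := range members {
		if m.Reachable {
			reachable++
		}
	}
	return reachable > len(members)/2
}

func main() {
	// 2+1 setup with a failed primary: the standby and witness still form a majority.
	withWitness := []Member{
		{Hostname: "pg-1", Role: Primary, Reachable: false},
		{Hostname: "pg-2", Role: Standby, Reachable: true},
		{Hostname: "pg-w", Role: Witness, Reachable: true},
	}
	fmt.Println("2+1, primary down:", hasQuorum(withWitness))

	// The same failure without a witness: one of two members is not a majority.
	withoutWitness := withWitness[:2]
	fmt.Println("2 members, primary down:", hasQuorum(withoutWitness))
}
```

Under these assumptions, the witness is what lets the surviving standby reach a majority after losing the primary, which is also why it should not share hardware with either standard member.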

return false, nil
}

if n.Witness && strings.Contains(err.Error(), "42P01") {
Contributor Author

I need to re-evaluate whether this is actually needed or not.

@davissp14 davissp14 marked this pull request as ready for review March 19, 2023 22:20
cleanup

this connection is not needed

Consider witness nodes as well as standbys when calculating quorum

Remove unnecessary logic
log.Println("Registering standby")
if err := n.RepMgr.registerStandby(); err != nil {
return fmt.Errorf("failed to register new standby: %s", err)
if n.Witness {
Contributor

Out of curiosity, have you tested switching a node's role between member and witness? Does that break things?

Contributor Author

@davissp14 davissp14 Mar 19, 2023

I haven't, but it would break that node at the very least.

Comment on lines +378 to +397
return fmt.Errorf("failed to create required users: %s", err)
}

// Setup repmgr database and extension
if err := n.RepMgr.enable(ctx, conn); err != nil {
return fmt.Errorf("failed to enable repmgr: %s", err)
}

primary, err := n.RepMgr.ResolveMemberOverDNS(ctx)
if err != nil {
return fmt.Errorf("failed to resolve primary member: %s", err)
}

if err := n.RepMgr.registerWitness(primary.Hostname); err != nil {
return fmt.Errorf("failed to register witness: %s", err)
}
} else {
log.Println("Registering standby")
if err := n.RepMgr.registerStandby(); err != nil {
return fmt.Errorf("failed to register new standby: %s", err)
Contributor

I've been thinking for a while that we should set up a better logging system (obviously doesn't have to happen now) so we can annotate logs with the codepath they are hitting. For example, instead of just logging:

failed to register new standby: %s

we could do something like

[witness-registration] failed to register new standby: %s

since that error can possibly happen in a few different places.

Contributor Author

zap and other logging frameworks provide the codepath out of the box, but I'm not sure how useful it would be for the end user. I think the existing error hierarchy gets us pretty close though, or at least I haven't yet run into an issue matching an error to a specific condition.

Comment on lines 65 to 69
if os.Getenv("WITNESS") != "" {
node.Witness = true
} else {
node.Witness = false
}
Contributor

Extremely unimportant nit: if you're just checking for existence, you could use os.LookupEnv.

@davissp14 davissp14 merged commit 901612f into master Mar 20, 2023
@davissp14 davissp14 deleted the witness-support branch March 20, 2023 22:44