-
Notifications
You must be signed in to change notification settings - Fork 76
Witness support #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Witness support #174
Conversation
internal/flypg/registration.go
Outdated
return false, nil | ||
} | ||
|
||
if n.Witness && strings.Contains(err.Error(), "42P01") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to re-evaluate whether this is actually needed or not.
cleanup this connection is not needed Conside witness nodes as well as standbys when calculating quorum Remove unnecessary logic
eb9469b
to
ff14dcd
Compare
internal/flypg/node.go
Outdated
log.Println("Registering standby") | ||
if err := n.RepMgr.registerStandby(); err != nil { | ||
return fmt.Errorf("failed to register new standby: %s", err) | ||
if n.Witness { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity have you tested switching a nodes role between member and witness? Does that break things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't, but It would break that node at the very least.
return fmt.Errorf("failed to create required users: %s", err) | ||
} | ||
|
||
// Setup repmgr database and extension | ||
if err := n.RepMgr.enable(ctx, conn); err != nil { | ||
return fmt.Errorf("failed to enable repmgr: %s", err) | ||
} | ||
|
||
primary, err := n.RepMgr.ResolveMemberOverDNS(ctx) | ||
if err != nil { | ||
return fmt.Errorf("failed to resolve primary member: %s", err) | ||
} | ||
|
||
if err := n.RepMgr.registerWitness(primary.Hostname); err != nil { | ||
return fmt.Errorf("failed to register witness: %s", err) | ||
} | ||
} else { | ||
log.Println("Registering standby") | ||
if err := n.RepMgr.registerStandby(); err != nil { | ||
return fmt.Errorf("failed to register new standby: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking for a while we should set up a better logging system (obviously doesn't have to happen now) so we can annotate logs with the codepath they are hitting. Like instead of just loging:
failed to register new standby: %s
we could do something like
[witness-registration] failed to register new standby: %s
since that error can possibly happen in a few different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zap
and other logging frameworks provide codepath out of the box, but i'm not sure how useful it would be for the end-user. I think the existing error hierarchy gets us pretty close though, or at least I haven't run into an issue yet matching an error to a specific condition.
internal/flypg/node.go
Outdated
if os.Getenv("WITNESS") != "" { | ||
node.Witness = true | ||
} else { | ||
node.Witness = false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely unimportant nit: if just checking for existence os.LookupEnv
A witness member does not hold user-data and only exists for voting purposes.
Motivation
This enables 2+1 setups where the cluster runs 2 standard members "primary" and "standby" and then an additional "witness" node that is only there for voting purposes to ensure quorum can be met.
This offers a more cost effective way to achieve HA as the
witness
member's resource requirements are minimal.Things to note:
witness
node should be removed when running 3 full members.