Backport of Prevent split-brain active node writes when using Consul into release/1.14.x #23443
Backport
This PR is auto-generated from #23013 to be assessed for backporting due to the inclusion of the label backport/1.14.x.
The below text is copied from the body of the original PR.
This PR will add a test and then a fix for a correctness bug when using Consul as an HA backend.
This issue, while possible to hit in our Community Edition as shown by the test added, is much less likely to cause a noticeable problem in CE. It could at worst cause a major failure - say if the mount table was simultaneously modified by two active nodes in the small window of time during which they both think they are active. Chances are, even if you do manage to hit this, you will only lose a handful of updates, and even then only if multiple clients are writing to the same keys at the same time.
The issue is much more pronounced in Vault Enterprise, where the active node is responsible for managing replicated state and does so using an index that must remain perfectly consistent with the underlying state.
Test Scenario
This test is specially constructed to maximise the chance of detecting bad behaviour in the case that multiple nodes consider themselves to be active for a while.
We use a KVv2 mount and patch updates. Two separate client goroutines connect to two different servers (one of which starts as the active node) and then write updates to the same key, each with its own unique sub-keys. If Vault is correct, no matter what happens to the leader nodes, we should always end up with a single consistent record containing the full set of keys written by both. If we allow multiple leaders to overlap and still write (as we do before this fix), each active node is likely to overwrite updates from the other, resulting in gaps in one or both of the clients' sets of keys.
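For illustration only, here is a rough sketch of what one of those writer goroutines could look like using the Go API client (`github.com/hashicorp/vault/api`). The mount name (`kv`), key name (`shared-key`), sub-key prefix, token, and address below are made up for the sketch and are not taken from the actual test.

```go
package main

import (
	"context"
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

// writer repeatedly patches the same KVv2 key, adding sub-keys that only
// this writer uses; if Vault is correct, the final record should contain
// every sub-key written by both writers.
func writer(ctx context.Context, client *vault.Client, prefix string, n int) error {
	for i := 0; i < n; i++ {
		_, err := client.KVv2("kv").Patch(ctx, "shared-key", map[string]interface{}{
			fmt.Sprintf("%s-%d", prefix, i): i,
		})
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cfg := vault.DefaultConfig()
	cfg.Address = "http://127.0.0.1:8200" // one of the two servers (hypothetical address)
	client, err := vault.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}
	client.SetToken("root") // test-only token (hypothetical)

	ctx := context.Background()
	// Seed the key once so the subsequent patches have something to merge into.
	if _, err := client.KVv2("kv").Put(ctx, "shared-key", map[string]interface{}{"seed": true}); err != nil {
		log.Fatal(err)
	}
	if err := writer(ctx, client, "writer-a", 100); err != nil {
		log.Fatal(err)
	}
}
```

In the real test a second goroutine does the same against the other server with a different prefix, and the verification step reads the key back and checks that no sub-key from either writer is missing.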
We start writing, partition the leader from the rest of the Vault/Consul nodes, wait for more writes to complete on a new leader, then un-partition the old leader while it still has a client writing directly to it. This currently results in that leader completing at least one write that conflicts with the new writes from the new leader and "loses" some data.
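A hedged outline of that partition sequence is below. `Cluster` and `Node` here are hypothetical stand-ins for whatever harness actually drives the Vault and Consul processes in the test; they are not Vault APIs, and the sketch is only meant to show the ordering of the steps.

```go
package scenario

// Node and Cluster are hypothetical stand-ins for the test harness that
// controls the real Vault/Consul processes; they are not Vault APIs.
type Node interface {
	Partition() error   // cut this node off from the other Vault/Consul nodes
	Unpartition() error // restore connectivity
}

type Cluster interface {
	Leader() (Node, error)
	WaitForNewLeaderAndWrites(exclude Node) error // block until a new leader has completed further writes
}

// runScenario assumes both writer goroutines are already running, one of
// them pointed directly at the current leader.
func runScenario(c Cluster) error {
	oldLeader, err := c.Leader()
	if err != nil {
		return err
	}
	// Isolate the current leader from the rest of the cluster.
	if err := oldLeader.Partition(); err != nil {
		return err
	}
	// Wait for a new leader to take over and complete more writes.
	if err := c.WaitForNewLeaderAndWrites(oldLeader); err != nil {
		return err
	}
	// Heal the partition while the old leader still has a client writing
	// directly to it; before the fix it can complete a conflicting write here.
	return oldLeader.Unpartition()
}
```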
Initially this PR will contain just the new test, to ensure that it fails, and fails for the right reasons, in CI. Once I've seen that I'll push the fix too.
Overview of commits