
Prevent split-brain active node writes when using Consul #23013

Merged (3 commits) Sep 22, 2023

Conversation

@banks (Member) commented Sep 12, 2023

This PR will add a test and then a fix for a correctness bug when using Consul as an HA backend.

This issue, while possible to hit in our Community Edition as shown by the test added, is much less likely to cause noticeable problems in CE. It could at worst cause a major failure, say if the mount table were simultaneously modified by two active nodes in the small window of time in which they both think they are active. Chances are, even if you do manage to hit this, you will only lose a handful of updates, and even then only if multiple clients are writing to the same keys at the same time.

The issue is much more pronounced in Vault Enterprise, where the active node is responsible for managing replicated state and does so using an index that must remain perfectly consistent with the underlying state.

Test Scenario

This test is specially constructed to maximise the chance of detecting bad behaviour in the case that multiple nodes consider themselves to be active for a while.

We use a KVv2 mount and patch updates. Two separate client goroutines connect to two different servers (one starts as the active node) and then write updates to the same key, but with unique sub-keys. If Vault is correct, then no matter what happens to the leader nodes, we should always end up with a single consistent record containing the full set of keys written by both. If we allow multiple leaders to overlap and still write (as we do before this fix), then each active node is likely to overwrite updates from the other, resulting in gaps in one or both clients' sets of keys.
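For illustration, here is a minimal sketch of that write pattern (not the actual test code): two goroutines, each pointed at a different Vault server, patch the same KVv2 secret with unique sub-keys via the Go client's KVv2 helper. The mount path "kv", secret name, addresses, token and write counts below are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"

	vault "github.com/hashicorp/vault/api"
)

// writer patches the same KVv2 secret repeatedly, touching a unique sub-key
// per write (e.g. "client-0-00042"), so a correct Vault must keep them all.
func writer(ctx context.Context, addr, token string, clientID, numWrites int, wg *sync.WaitGroup) {
	defer wg.Done()

	cfg := vault.DefaultConfig()
	cfg.Address = addr // each goroutine talks to a different Vault node
	client, err := vault.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}
	client.SetToken(token)

	for i := 0; i < numWrites; i++ {
		subKey := fmt.Sprintf("client-%d-%05d", clientID, i)
		// Patch merges the new sub-key into the existing secret data
		// (this sketch assumes the secret has been seeded already).
		if _, err := client.KVv2("kv").Patch(ctx, "fencing-test", map[string]interface{}{
			subKey: i,
		}); err != nil {
			// In the real test, errors during the partition window are
			// expected and handled; here we simply log them.
			log.Printf("client %d write %d failed: %v", clientID, i, err)
		}
	}
}

func main() {
	ctx := context.Background()
	var wg sync.WaitGroup
	wg.Add(2)
	go writer(ctx, "http://vault-0:8200", "root", 0, 100, &wg) // starts as the active node
	go writer(ctx, "http://vault-1:8200", "root", 1, 100, &wg) // standby at the start
	wg.Wait()
	// After the partition/heal sequence, the test reads the secret back and
	// asserts that the full set of sub-keys from both clients is present.
}
```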

We start writing, partition the leader from the rest of the Vault/Consul nodes, wait for more writes to complete on a new leader, then un-partition the old leader while it still has a client writing directly to it. This currently results in that leader completing at least one write that conflicts with the new writes from the new leader and "loses" some data.

Initially this PR will contain just the new test, to ensure that it fails, and fails for the right reasons, in CI. Once I've seen that, I'll push the fix too.

@github-actions bot added the hashicorp-contributed-pr (If the PR is HashiCorp, i.e. not-community, contributed) label Sep 12, 2023
@github-actions bot commented Sep 12, 2023

CI Results:
All Go tests succeeded! ✅

@banks (Member, Author) commented Sep 12, 2023

Yay, the new test failed as expected:

        	Error:      	Not equal: 
        	            	expected: []int{0, 1, 2, 3, 4, 5, 6, 7, 8}
        	            	actual  : []int{0, 1, 2, 3, 4}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-([]int) (len=9) {
        	            	+([]int) (len=5) {
        	            	  (int) 0,
        	            	@@ -5,7 +5,3 @@
        	            	  (int) 3,
        	            	- (int) 4,
        	            	- (int) 5,
        	            	- (int) 6,
        	            	- (int) 7,
        	            	- (int) 8
        	            	+ (int) 4
        	            	 }
        	Test:       	TestConsulFencing_PartitionedLeaderCantWrite
        	Messages:   	Client 0 writes lost

In this case client 0, which is connected to the old, partitioned leader, managed to write some data that was acknowledged by that leader but then "overwritten" by a subsequent update from the new leader, which didn't know about those updates, so they end up missing from the final set of results in the key.

If you look at the state at intermediate points you can see the opposite happen: client 1 writes lots of new entries to the new leader during the partition, and they are then lost when the old leader writes back over them, knowing only about the writes from before the partition. But because we wait for the old leader to notice that the partition has resolved and step down, the last write in the test is almost always going to be from the new leader, so we'll detect the failure as the last few writes from the old leader (client 0) going missing rather than as a gap in client 1's set of writes.

@banks (Member, Author) commented Sep 12, 2023

CI passed the new test first time now that we have the fix in place.

The fix uses Consul's check-session operation and Txn API to turn every write into one that atomically checks that the writing node still holds the leader lock, preventing even a deposed leader that hasn't yet noticed it's no longer leader from being able to write at all.
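As an illustration of the mechanism (not Vault's actual storage backend code), here is a hedged sketch using Consul's Go API: each write is bundled into a transaction whose first operation is a check-session verb on the lock key, so the KV set only commits if this node's session still holds the leader lock. The lock key, data key and session ID below are placeholders.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

// fencedWrite writes value to key only if sessionID still holds lockKey.
// Both operations are part of one transaction, so a deposed leader whose
// session has lost the lock cannot complete the write at all.
func fencedWrite(client *api.Client, lockKey, sessionID, key string, value []byte) error {
	ops := api.TxnOps{
		{
			KV: &api.KVTxnOp{
				Verb:    api.KVCheckSession, // fail the txn unless the session holds lockKey
				Key:     lockKey,
				Session: sessionID,
			},
		},
		{
			KV: &api.KVTxnOp{
				Verb:  api.KVSet,
				Key:   key,
				Value: value,
			},
		},
	}

	ok, resp, _, err := client.Txn().Txn(ops, nil)
	if err != nil {
		return err
	}
	if !ok {
		// The check-session op failed: this node has been deposed, so the
		// write was rejected atomically instead of clobbering the new leader's data.
		return fmt.Errorf("lost leadership, write rejected: %v", resp.Errors)
	}
	return nil
}

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// sessionID would be the session created when this node acquired the
	// leader lock; the key names here are purely illustrative.
	if err := fencedWrite(client, "vault/core/lock", "leader-session-id",
		"vault/data/example", []byte("payload")); err != nil {
		log.Println(err)
	}
}
```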

I've run the new test scenario locally on my Mac 180+ times in a loop without a single failure, so I'm relatively confident it is a correct fix and a relatively reliable test despite its inherent non-determinism.

@banks (Member, Author) commented Sep 12, 2023

I've also performed some performance testing with this change to verify my assumption that the additional check-session operation would not impact write throughput or latencies significantly. I worked on Consul for 5.5 years, but it's still an assumption worth checking!

I was unable to measure a significant difference in throughput or latency between this branch and main last week. I tested with a few different disk configurations for Consul, including SSD-backed remote disks and NVMe local SSDs in GCP. My averaged results ended up with this branch being marginally quicker, but that was within the noise/variance of the results, which indicates no significant impact. I've reported those results in more detail internally on the Enterprise portion of this change.

@banks marked this pull request as ready for review September 12, 2023 20:01
@github-actions

Build Results:
All builds succeeded! ✅

@raskchanky (Contributor) left a comment


@banks changed the title from "Add test to demonstrate a split-brain active node when using Consul" to "Prevent split-brain active node writes when using Consul" on Sep 14, 2023
@rrjjvv mentioned this pull request Jun 8, 2024