
Add support for encrypting data at rest on server side #338


Merged
26 commits merged into liftbridge-io:master on May 10, 2021

Conversation

LaPetiteSouris
Contributor

@LaPetiteSouris LaPetiteSouris commented Mar 26, 2021

Context

To solve: #217

Encrypt message contents before storing them in the partition's segment on the server side. The option is opt-in: it can be enabled either by server configuration (for all streams) or per newly created stream.

Implementation

The general strategy for encryption:

  • Generate a random AES data key (DKS) for each incoming message.
  • Encrypt the message with the DKS key.
  • Wrap the DKS key with a Master Key using the RFC 3394 standard.
  • Concatenate the wrapped key and the encrypted message into a single byte array. The first byte indicates the length of the wrapped key.
  • Example: a wrapped key of n bytes followed by an encrypted message of m bytes (a sketch of this byte layout follows the diagrams below).

[Diagram: byte layout, one length byte followed by the wrapped key (n bytes) and the encrypted message (m bytes)]

The general strategy for decryption:

  • Read the first byte in the array to get the length of the wrapped DKS.
  • Read the following bytes in the array to get the wrapped DKS key.
  • Unwrap the DKS with the Master Key using the RFC 3394 standard.
  • Decode the message.
  1. For encryption:

[Diagram: encryption flow]

  2. For decryption:

[Diagram: decryption flow]
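For illustration, below is a minimal, self-contained Go sketch of the byte layout described above. It is not the PR's actual code: AES-GCM stands in for the per-message cipher, and the RFC 3394 wrapping and unwrapping of the DKS with the Master Key is assumed to happen elsewhere, so a pre-wrapped key is simply passed in.

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "errors"
    "fmt"
)

// seal encrypts msg with the data key (DKS) using AES-GCM, then prepends one
// length byte and the already-wrapped DKS, matching the layout above.
func seal(wrappedDKS, dks, msg []byte) ([]byte, error) {
    block, err := aes.NewCipher(dks)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    ciphertext := gcm.Seal(nonce, nonce, msg, nil)

    out := make([]byte, 0, 1+len(wrappedDKS)+len(ciphertext))
    out = append(out, byte(len(wrappedDKS))) // first byte: wrapped key length
    out = append(out, wrappedDKS...)
    return append(out, ciphertext...), nil
}

// open splits a stored record back into the wrapped DKS and the ciphertext,
// then decrypts the ciphertext with the (already unwrapped) data key.
func open(stored, dks []byte) (wrappedDKS, plaintext []byte, err error) {
    if len(stored) < 1 {
        return nil, nil, errors.New("empty record")
    }
    keyLen := int(stored[0])
    if len(stored) < 1+keyLen {
        return nil, nil, errors.New("record shorter than wrapped key length")
    }
    wrappedDKS = stored[1 : 1+keyLen] // would be unwrapped with the Master Key (RFC 3394)
    ciphertext := stored[1+keyLen:]

    block, err := aes.NewCipher(dks)
    if err != nil {
        return nil, nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, nil, err
    }
    if len(ciphertext) < gcm.NonceSize() {
        return nil, nil, errors.New("ciphertext too short")
    }
    nonce, ct := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
    plaintext, err = gcm.Open(nil, nonce, ct, nil)
    return wrappedDKS, plaintext, err
}

func main() {
    dks := make([]byte, 32)     // random AES-256 data key
    wrapped := make([]byte, 40) // stand-in for the RFC 3394 wrapped DKS
    rand.Read(dks)
    rand.Read(wrapped)

    stored, _ := seal(wrapped, dks, []byte("hello"))
    _, plain, _ := open(stored, dks)
    fmt.Println(string(plain)) // hello
}

Note that a single length byte caps the wrapped key at 255 bytes, which comfortably fits an AES-256 key wrapped per RFC 3394 (40 bytes).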

Requirements

liftbridge-io/liftbridge-api#52

liftbridge-io/go-liftbridge#111

- Add encryption on partition writes in messageProcessingLoop
- Add decryption on partition reads in api.subscribe
- Not yet handled: partition's data encryption-at-rest config
- Add default config
- Add config overwrite
- Add test case to verify server can handle both streams with encryption and no encryption at the same time
- Make sure that on stream creation, when encryption is enabled, the local variable LOCAL_KEY_MASTER is set
Contributor

@annismckenzie annismckenzie left a comment


I took the liberty of doing a first review. This is awesome. 🎉

Member

@tylertreat tylertreat left a comment


I need time to review more thoroughly, but one of the questions I have is around the key wrapping. In the PR description, you say every message is encrypted with its own data key, but in the code it looks like there is one data key that gets generated (defaultDKS) and used for all messages unless I missed something. However, generating a data encryption key for every message seems expensive. Would it make more sense to encrypt sets of messages using the same data encryption key and change the key periodically?

@LaPetiteSouris
Contributor Author

I need time to review more thoroughly, but one of the questions I have is around the key wrapping. In the PR description, you say every message is encrypted with its own data key, but in the code it looks like there is one data key that gets generated (defaultDKS) and used for all messages unless I missed something. However, generating a data encryption key for every message seems expensive. Would it make more sense to encrypt sets of messages using the same data encryption key and change the key periodically?

Actually, it is my PR description that is a bit confusing.

	// Generate a default Data Key (DKS) if not yet available
	if handler.defaultDKS == nil {
		dksKey, err := handler.generateDKS()

		if err != nil {
			return nil, err
		}
		handler.defaultDKS = dksKey
	}

The data key (DKS) is generated upon partition initialisation and can be regenerated at any other moment, for example at the encryption of each message.

Not sure about the performance of _, err := rand.Read(key), but it should not be that expensive.

The key data can be just anything; the only condition is that it should be stored in an easily accessible location to facilitate decryption.
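As a minimal illustration of that call (the 32-byte AES-256 key size is an assumption for the sketch, not taken from the PR):

package main

import (
    "crypto/rand"
    "fmt"
)

// generateDKS returns a fresh random data key read from crypto/rand.
func generateDKS() ([]byte, error) {
    key := make([]byte, 32) // AES-256 key size, assumed for illustration
    if _, err := rand.Read(key); err != nil {
        return nil, err
    }
    return key, nil
}

func main() {
    key, err := generateDKS()
    if err != nil {
        panic(err)
    }
    fmt.Printf("generated a %d-byte data key\n", len(key))
}

crypto/rand reads from the operating system's CSPRNG, so a single 32-byte read is inexpensive, though doing it per message would add a small per-message cost.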

@LaPetiteSouris
Contributor Author

Status update:

  • Good to review again
  • Some tests are failing (likely flaky tests)

Cheers

@LaPetiteSouris LaPetiteSouris force-pushed the master branch 2 times, most recently from 3715634 to 1eb8c22 on April 25, 2021 at 14:04
@LaPetiteSouris
Contributor Author

Some tests in activity_test.go are randomly failing.

=== RUN   TestActivityStreamPauseStream
panic: database not open

It's rather strange because those tests only fail when executed together, in sequence. They rarely fail when executed separately. I am a bit clueless here. Any suggestions?

@LaPetiteSouris
Contributor Author

LaPetiteSouris commented Apr 27, 2021

It looks like the test TestActivityStreamSetStreamReadwrite fails in isolation.

I figured out that maybe the server tries to access BoltDB after it is closed or before it is opened. See BoltError.

The full test:

func TestActivityStreamSetStreamReadwrite(t *testing.T) {
	defer cleanupStorage(t)

	// Configure server.
	s1Config := getTestConfig("a", true, 5050)
	s1Config.ActivityStream.Enabled = true
	s1Config.ActivityStream.PublishTimeout = time.Second
	s1Config.ActivityStream.PublishAckPolicy = liftApi.AckPolicy_LEADER
	s1 := runServerWithConfig(t, s1Config)
	defer s1.Stop()

	// Wait for server to elect itself leader.
	getMetadataLeader(t, 10*time.Second, s1)

	client, err := lift.Connect([]string{"localhost:5050"})
	require.NoError(t, err)
	defer client.Close()

	// Create stream.
	stream := "foo-stream"
	require.NoError(t, client.CreateStream(context.Background(), "foo", stream))

	// Set stream readonly.
	require.NoError(t, client.SetStreamReadonly(context.Background(), stream))

	// Set stream readwrite.
	require.NoError(t, client.SetStreamReadonly(context.Background(), stream, lift.Readonly(false)))

	// The first message read back should be the stream readwrite.
	msgs := make(chan *lift.Message, 1)
	ctx, cancel := context.WithCancel(context.Background())
	err = client.Subscribe(ctx, activityStream, func(msg *lift.Message, err error) {
		require.NoError(t, err)
		msgs <- msg
		cancel()
	}, lift.StartAtLatestReceived())
	require.NoError(t, err)

	// Wait to get the message.
	select {
	case msg := <-msgs:
		var se liftApi.ActivityStreamEvent
		err = se.Unmarshal(msg.Value())
		require.NoError(t, err)
		require.Equal(t, liftApi.ActivityStreamOp_SET_STREAM_READONLY, se.GetOp())
		require.Equal(t, stream, se.SetStreamReadonlyOp.GetStream())
		require.Equal(t, []int32{0}, se.SetStreamReadonlyOp.GetPartitions())
		require.Equal(t, false, se.SetStreamReadonlyOp.GetReadonly())
	case <-time.After(5 * time.Second):
		t.Fatal("Did not receive expected message")
	}
}

Any suggestion is welcome.

@tylertreat
Member

Will take a look at the flaky tests...

@LaPetiteSouris
Contributor Author

activity_test is now fixed. The fix is present in 6b29cbc. In short, we need to wait a few seconds for the server before executing tests in such cases.
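For illustration only (this is not the actual change in 6b29cbc), one common way to express that kind of wait in Go tests is a small polling helper rather than a fixed sleep; the sketch below is hypothetical:

package main

import (
    "errors"
    "fmt"
    "time"
)

// waitFor polls cond until it returns true or the timeout elapses. Tests can
// use it to give the server time to become ready before running assertions.
func waitFor(timeout, interval time.Duration, cond func() bool) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        if cond() {
            return nil
        }
        time.Sleep(interval)
    }
    return errors.New("condition not met before timeout")
}

func main() {
    start := time.Now()
    // Stand-in condition: pretend the server becomes ready after 300ms.
    err := waitFor(5*time.Second, 100*time.Millisecond, func() bool {
        return time.Since(start) > 300*time.Millisecond
    })
    fmt.Println("wait result:", err)
}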

The remaining failing tests seem to be the flaky ones mentioned in #311

- in flaky replication tests, make sure not to kill the server that is serving the test client
@LaPetiteSouris
Contributor Author

The tests shall be fixed (hopefully) by this PR: #341

@LaPetiteSouris
Contributor Author

Tests are finally fixed thanks to #341

@tylertreat tylertreat merged commit 8c3bb96 into liftbridge-io:master May 10, 2021