Skip to content

Conversation

@ggershinsky
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the core label Nov 4, 2021
@ggershinsky ggershinsky marked this pull request as draft November 4, 2021 14:58
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's beneficial to also include code about how this class will be used in EncryptionManager. Last time during the discussion I remember Ryan was for having these key generation logic directly in the manager itself as a BaseKmsEncryptionManager. I think adding that part of the logic here can facilitate the discussion.

I am personally more inclined right now with having a separated KmsClient interface, mostly because this is the service integration point with different KMS services on the market, they don't need to know any logic related to actual Iceberg encryption, this separated interface seems cleaner to me. I would expect only very advanced users to provide an entire custom EncryptionManager implementation. @rdblue do you have any comments around the approach to go here?

*
* @return true if KMS server supports key generation and KmsClient implementation
* is interested to leverage this capability. Otherwise, return false - Iceberg will
* then generate secret keys locally (using the SecureRandom mechanism) and call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: {@link java.security.SecureRandom}

ByteBuffer unwrapKey(String wrappedKey, String wrappingKeyId);

/**
* Returns the algorithm used by the provider when generating keys, such as AES_256_GCM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: {@link KeyGenerationAlgorithm#AES_256_GCM}

KeyGenerationAlgorithm keyGenerationAlgorithm();

/**
* Returns the spec of the key, such as AES_128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: {@link KeySpec#AES_128}

*
* @param properties kms client properties
*/
void setProperties(Map<String, String> properties);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the method that initializes the client? In that case we call it initialize(Map<String, String> properties); like https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/io/FileIO.java#L64-L69

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep; actually, it used to be named initialize in the MVP code, but then I changed it to setProperties in this PR split to make it more flexible and allow for runtime update of credentials (like tokens with expiration). Now I see though that it makes it more complex and less intuitive. And there are many other ways to update the credentials, with or without explicit API. So changing this back to 'initialize`.

import java.util.Map;
import javax.crypto.SecretKey;

public class DemoKmsClient implements KmsClient {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call it something more meaningful like LocalKeyFileKmsClient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. The main point of this sample is leveraging the standard keystore API, so lets call it KeystoreKmsClient.


public class DemoKmsClient implements KmsClient {

public static final String KEY_FILE_PATH_PROP = "demo.kms.client.keyfile.path";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these table properties / catalog properties? There are corresponding classes for them at TableProperties and CatalogProperties.

Copy link
Contributor Author

@ggershinsky ggershinsky Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are "kms specific" parameters, consumed by the custom KmsClient code. This parameter is relatively permanent and is not a secret, so can be set as a custom table property. But the other one (the password) can't. In this demo, it is passed as a Hadoop parameter, which can a fully in-memory transport; alternatively, it can be also passed via an env variable, etc.
Btw, now we can rename the file path parameter to something like keystore.kms.client.file.path - keeping in mind that this is not a standard table parameter (so shouldn't add it to the TableProperties class), but rather a custom parameter for a custom plug-in code.

@ggershinsky
Copy link
Contributor Author

I am personally more inclined right now with having a separated KmsClient interface, mostly because this is the service integration point with different KMS services on the market, they don't need to know any logic related to actual Iceberg encryption, this separated interface seems cleaner to me. I would expect only very advanced users to provide an entire custom EncryptionManager implementation.

I also think this is a simpler and cleaner approach for organizations and their users. One developer/ team would implement the KmsClient class for their organization, integrating with the local KMS and IAM services (the KmsClient interface should be able to support any). Once the implemented class is deployed, any authorized user in the org can write encrypted tables, passing only one parameter - a string with the table key name (later, we'll add column keys). The readers don't need to pass anything. Both writers and readers use auth credentials (KMS-specific).
Last but not least, this approach is based on the standard "envelope encryption" practice.

during the discussion I remember Ryan was for having these key generation logic directly in the manager itself as a BaseKmsEncryptionManager. I think adding that part of the logic here can facilitate the discussion.

One area where we might need to expand the basic approach, described above, is custom encrypted file streaming. By default, Iceberg will use its built-in encryption for the tables (PME/ORC for data; GcmStream for metadata, and potentially for Avro data; we can also add a built-in AwsS3ClientStream in Iceberg). But if an organization prefers to use its custom encryptors instead of those built in Iceberg, we can load them. I'm sending a new commit to this PR, that adds an interface currently called BaseKmsEncryptionManager. The name and content are a subject for discussion.

I think it's beneficial to also include code about how this class will be used in EncryptionManager.

We have an internal implementation of EncryptionManager, a class called EnvelopeEncryptionManager that uses a KmsClient plug-in, and leverages the "envelope encryption" practice.

/**
* Module for custom encrypting and decrypting streams for table files.
*/
public interface BaseKmsEncryptionManager {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't extend the original EncryptionManager, because this would allow for non-standard key metadata management; this should be done instead via envelope encryption and the KmsClient.

@ggershinsky
Copy link
Contributor Author

ggershinsky commented Nov 10, 2021

@rdblue do you have any comments around the approach to go here?

Ryan, your feedback will be appreciated. This doc section expands on this subject -
https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit#heading=h.bq3t7sd49j30

Or, if I try to compress this subject in a few sentences, it would be something like this:

  • For new usecases, we create a minimal interface, based on the standard "envelope encryption" practice, and designed to enable connection to KMS/IAM of any org.
  • For existing usecases (if any) and for orgs that for some reason don't want to use envelope encryption (I currently can't see why), we will continue to support the EncryptionManager interface, and will add a missing code that can actually load these custom implementations; but this support shouldn't affect the simplicity/cleanliness of the new interface.
  • Finally (a smaller point, later/advanced feature), the new interface will allow users to plug custom file encryptors into the envelope encryption system, if they are not satisfied with the Iceberg built-in file encryptors (PME, ORC, GcmStream, AwsS3ClientStream).

@ggershinsky
Copy link
Contributor Author

Ok, I'll proceed with the plan discussed at the slack channel - completing the code updates across the encryption PRs this quarter, so the community can review them (including the interface) next quarter.
cc @rdblue

@ggershinsky ggershinsky marked this pull request as ready for review January 20, 2022 11:54
@github-actions github-actions bot added the API label Jan 31, 2022
import java.util.Map;

/**
* a minimum client interface to connect to a key management service (KMS).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: a -> A

/**
* Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID.
* Wrapping means encryption of the secret key with the master key, and adding optional metadata
* that allows KMS to decrypt the secret key in an unwrap call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unwrap -> unwrapping?


/**
* Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID.
* Wrapping means encryption of the secret key with the master key, and adding optional metadata
Copy link
Contributor

@flyrain flyrain Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Does "optional metadata" refer to the wrappingKeyId? Probably not, right? Key is shouldn't be optional. What is it then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the "optional metadata" is not the wrappingKeyId . A KMS gets a request to wrap a "secret" using the wrappingKeyId. The returned String (a json, typically) contains everything this KMS will need to unwrap the "secret". Always KMS-specific; can be things like the version of the wrapping key, an IV, etc.

* @return true if KMS server supports key generation and KmsClient implementation
* is interested to leverage this capability. Otherwise, return false - Iceberg will
* then generate secret keys locally (using the SecureRandom mechanism) and call
* wrapKey to wrap them in KMS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: wrapKey -> {@link #wrapKey(ByteBuffer, String)}

ByteBuffer unwrapKey(String wrappedKey, String wrappingKeyId);

/**
* Set kms client properties
Copy link
Contributor

@flyrain flyrain Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: -> Initialize the KMS client with given properties

*/
void initialize(Map<String, String> properties);

class KeyGenerationResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a Java doc for this class as well.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a unit test for KeyStoreKmsClient? It can demo how the client is used.

@ggershinsky ggershinsky changed the title Core: KMS client interface API: KMS client interface Feb 2, 2022
@ggershinsky
Copy link
Contributor Author

Can we have a unit test for KeyStoreKmsClient? It can demo how the client is used.

We will have such a demo / unitest in a later PR, where a table is encrypted using a kms client. However, this rudimentary client won't be a good example for developers. I think the best KmsClient example would be based on Hashicorp Vault (a real KMS; open-source), similar to what we have in Apache Parquet - developed by @andersonm . Maya - will appreciate if you'd be ok with building/contributing such client to Iceberg too :), in a separate pull request.

@andersonm-ibm
Copy link

Can we have a unit test for KeyStoreKmsClient? It can demo how the client is used.

We will have such a demo / unitest in a later PR, where a table is encrypted using a kms client. However, this rudimentary client won't be a good example for developers. I think the best KmsClient example would be based on Hashicorp Vault (a real KMS; open-source), similar to what we have in Apache Parquet - developed by @andersonm . Maya - will appreciate if you'd be ok with building/contributing such client to Iceberg too :), in a separate pull request.

Yes, @ggershinsky , I am going to work on such an example in a separate PR.

* Wrapping means encryption of the secret key with the master key, and adding optional metadata
* that allows KMS to decrypt the secret key in an unwrap call.
*
* @param key a secret key being wrapped
Copy link
Member

@RussellSpitzer RussellSpitzer Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; no vertical alignment

@Override
public void initialize(Map<String, String> properties) {
String keyFilePath = properties.get(KEY_FILE_PATH_PROP);
if (null == keyFilePath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use Precondition.checkArg for things like this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preconditions.checkNotNull(properties.get(KEY_FILE_PATH_PROP), PROP + " cannot be null)


@Override
public String wrapKey(ByteBuffer key, String wrappingKeyId) {
// keytool keeps key names in lower case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is here but we don't actually make the key lower case here

* then generate secret keys locally (using the SecureRandom mechanism) and call
* wrapKey to wrap them in KMS.
*/
boolean supportsKeyGeneration();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a default implementation of false here?

* @param wrappingKeyId a key ID that represents a wrapping key stored in KMS
* @return a pair of plaintext and wrapped key encrypted with the given wrappingKeyId
*/
KeyGenerationResult generateKey(String wrappingKeyId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one maybe should have a default to "unsupported operation exception"?

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some style nits, No Vertical alignment, Use of precondition.

The only real concern I have is that the test class implementations are here but we don't make sure they work. That's probably fine since the actual usage of this interface will exercise those apis? If that's coming up then +1 from me

@ggershinsky
Copy link
Contributor Author

ggershinsky commented Feb 3, 2022

The only real concern I have is that the test class implementations are here but we don't make sure they work. That's probably fine since the actual usage of this interface will exercise those apis? If that's coming up then +1 from me

Yep, these implementations are placed in the test module; they'll serve as a util for a unitest that encrypts/decrypts a table using these test kms classes. This unitest is coming up in the 5th PR in this series, where we connect all parts to enable table encryption.

@ggershinsky
Copy link
Contributor Author

the commit addresses the comments in the last round. I also made an extra change in the Keystore client (using env var instead of hadoop prop for the password; this is safer and more intuitive).

@RussellSpitzer
Copy link
Member

@jackye1995 Are you +1 on this? I think we have consensus to start getting these PR's in

@ggershinsky looks like the tests are currently failing, could you rebase so we can be good to go?

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@ggershinsky
Copy link
Contributor Author

looks like the tests are currently failing

Hi @RussellSpitzer , this PR depends on #2638

* @param wrappingKeyId a key ID that represents a wrapping key stored in KMS
* @return wrapped key material
*/
String wrapKey(ByteBuffer key, String wrappingKeyId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder why we return String instead of ByteBuffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KMS systems typically return a JSON String with a wrapped key. If for some reason a KMS returns a raw byte buffer, it should be converted to a String via base64.

add interface for custom file encryptors

update the pr

refactor sample implementation

move KmsClient interface to API module

post-review changes
Preconditions.checkNotNull(keystorePath, KEYSTORE_FILE_PATH_PROP + " must be set in hadoop or table " +
"properties");

String keystorePassword = System.getenv(KEYSTORE_PASSWORD_ENV_VAR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is just a demo class, but won't this intern the keystorePassword in memory? Not sure if that's a concern or anything I am still an encryption novice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @RussellSpitzer , thanks for merging this . To answer the question - this is not a concern in a basic threat model, where we don't trust the storage domain, but we do trust the domain of compute nodes - we have to, because the node memory sees the sensitive data, the encryption keys, etc. The meaning of "trust" is a deep hole.., but to give a simple example, the compute nodes run in-house, while the storage backend is on another company's platform.

/**
* Some KMS systems support generation of secret keys inside the KMS server.
*
* @return true if KMS server supports key generation and KmsClient implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think the "and KmsClient Implementation .... " part makes sense to me here? Is the statement a "Return true IFF the Kms Server can generate keys and this client should use that capability?"

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question on this is whether or not we should have tests for this interface or do we not yet have the components required to do a meaningful test yet?

@RussellSpitzer
Copy link
Member

Oh sorry I forgot we talked about this before. Everyone is on board so let's move forward.

@RussellSpitzer RussellSpitzer merged commit a0e75ef into apache:master Apr 13, 2022
@RussellSpitzer
Copy link
Member

RussellSpitzer commented Apr 13, 2022

Thanks @ggershinsky for the pr and
@jackye1995 + @flyrain and @shangxinli for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants