-
Notifications
You must be signed in to change notification settings - Fork 2.9k
API: KMS client interface #3470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's beneficial to also include code about how this class will be used in EncryptionManager. Last time during the discussion I remember Ryan was for having these key generation logic directly in the manager itself as a BaseKmsEncryptionManager. I think adding that part of the logic here can facilitate the discussion.
I am personally more inclined right now with having a separated KmsClient interface, mostly because this is the service integration point with different KMS services on the market, they don't need to know any logic related to actual Iceberg encryption, this separated interface seems cleaner to me. I would expect only very advanced users to provide an entire custom EncryptionManager implementation. @rdblue do you have any comments around the approach to go here?
| * | ||
| * @return true if KMS server supports key generation and KmsClient implementation | ||
| * is interested to leverage this capability. Otherwise, return false - Iceberg will | ||
| * then generate secret keys locally (using the SecureRandom mechanism) and call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: {@link java.security.SecureRandom}
| ByteBuffer unwrapKey(String wrappedKey, String wrappingKeyId); | ||
|
|
||
| /** | ||
| * Returns the algorithm used by the provider when generating keys, such as AES_256_GCM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: {@link KeyGenerationAlgorithm#AES_256_GCM}
| KeyGenerationAlgorithm keyGenerationAlgorithm(); | ||
|
|
||
| /** | ||
| * Returns the spec of the key, such as AES_128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: {@link KeySpec#AES_128}
| * | ||
| * @param properties kms client properties | ||
| */ | ||
| void setProperties(Map<String, String> properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the method that initializes the client? In that case we call it initialize(Map<String, String> properties); like https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/io/FileIO.java#L64-L69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep; actually, it used to be named initialize in the MVP code, but then I changed it to setProperties in this PR split to make it more flexible and allow for runtime update of credentials (like tokens with expiration). Now I see though that it makes it more complex and less intuitive. And there are many other ways to update the credentials, with or without explicit API. So changing this back to 'initialize`.
| import java.util.Map; | ||
| import javax.crypto.SecretKey; | ||
|
|
||
| public class DemoKmsClient implements KmsClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call it something more meaningful like LocalKeyFileKmsClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. The main point of this sample is leveraging the standard keystore API, so lets call it KeystoreKmsClient.
|
|
||
| public class DemoKmsClient implements KmsClient { | ||
|
|
||
| public static final String KEY_FILE_PATH_PROP = "demo.kms.client.keyfile.path"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these table properties / catalog properties? There are corresponding classes for them at TableProperties and CatalogProperties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are "kms specific" parameters, consumed by the custom KmsClient code. This parameter is relatively permanent and is not a secret, so can be set as a custom table property. But the other one (the password) can't. In this demo, it is passed as a Hadoop parameter, which can a fully in-memory transport; alternatively, it can be also passed via an env variable, etc.
Btw, now we can rename the file path parameter to something like keystore.kms.client.file.path - keeping in mind that this is not a standard table parameter (so shouldn't add it to the TableProperties class), but rather a custom parameter for a custom plug-in code.
I also think this is a simpler and cleaner approach for organizations and their users. One developer/ team would implement the KmsClient class for their organization, integrating with the local KMS and IAM services (the KmsClient interface should be able to support any). Once the implemented class is deployed, any authorized user in the org can write encrypted tables, passing only one parameter - a string with the table key name (later, we'll add column keys). The readers don't need to pass anything. Both writers and readers use auth credentials (KMS-specific).
One area where we might need to expand the basic approach, described above, is custom encrypted file streaming. By default, Iceberg will use its built-in encryption for the tables (PME/ORC for data; GcmStream for metadata, and potentially for Avro data; we can also add a built-in AwsS3ClientStream in Iceberg). But if an organization prefers to use its custom encryptors instead of those built in Iceberg, we can load them. I'm sending a new commit to this PR, that adds an interface currently called BaseKmsEncryptionManager. The name and content are a subject for discussion.
We have an internal implementation of EncryptionManager, a class called EnvelopeEncryptionManager that uses a KmsClient plug-in, and leverages the "envelope encryption" practice. |
| /** | ||
| * Module for custom encrypting and decrypting streams for table files. | ||
| */ | ||
| public interface BaseKmsEncryptionManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't extend the original EncryptionManager, because this would allow for non-standard key metadata management; this should be done instead via envelope encryption and the KmsClient.
Ryan, your feedback will be appreciated. This doc section expands on this subject - Or, if I try to compress this subject in a few sentences, it would be something like this:
|
|
Ok, I'll proceed with the plan discussed at the slack channel - completing the code updates across the encryption PRs this quarter, so the community can review them (including the interface) next quarter. |
| import java.util.Map; | ||
|
|
||
| /** | ||
| * a minimum client interface to connect to a key management service (KMS). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: a -> A
| /** | ||
| * Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID. | ||
| * Wrapping means encryption of the secret key with the master key, and adding optional metadata | ||
| * that allows KMS to decrypt the secret key in an unwrap call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unwrap -> unwrapping?
|
|
||
| /** | ||
| * Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID. | ||
| * Wrapping means encryption of the secret key with the master key, and adding optional metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Does "optional metadata" refer to the wrappingKeyId? Probably not, right? Key is shouldn't be optional. What is it then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the "optional metadata" is not the wrappingKeyId . A KMS gets a request to wrap a "secret" using the wrappingKeyId. The returned String (a json, typically) contains everything this KMS will need to unwrap the "secret". Always KMS-specific; can be things like the version of the wrapping key, an IV, etc.
| * @return true if KMS server supports key generation and KmsClient implementation | ||
| * is interested to leverage this capability. Otherwise, return false - Iceberg will | ||
| * then generate secret keys locally (using the SecureRandom mechanism) and call | ||
| * wrapKey to wrap them in KMS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: wrapKey -> {@link #wrapKey(ByteBuffer, String)}
| ByteBuffer unwrapKey(String wrappedKey, String wrappingKeyId); | ||
|
|
||
| /** | ||
| * Set kms client properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: -> Initialize the KMS client with given properties
| */ | ||
| void initialize(Map<String, String> properties); | ||
|
|
||
| class KeyGenerationResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a Java doc for this class as well.
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a unit test for KeyStoreKmsClient? It can demo how the client is used.
We will have such a demo / unitest in a later PR, where a table is encrypted using a kms client. However, this rudimentary client won't be a good example for developers. I think the best KmsClient example would be based on Hashicorp Vault (a real KMS; open-source), similar to what we have in Apache Parquet - developed by @andersonm . Maya - will appreciate if you'd be ok with building/contributing such client to Iceberg too :), in a separate pull request. |
Yes, @ggershinsky , I am going to work on such an example in a separate PR. |
| * Wrapping means encryption of the secret key with the master key, and adding optional metadata | ||
| * that allows KMS to decrypt the secret key in an unwrap call. | ||
| * | ||
| * @param key a secret key being wrapped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; no vertical alignment
| @Override | ||
| public void initialize(Map<String, String> properties) { | ||
| String keyFilePath = properties.get(KEY_FILE_PATH_PROP); | ||
| if (null == keyFilePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use Precondition.checkArg for things like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preconditions.checkNotNull(properties.get(KEY_FILE_PATH_PROP), PROP + " cannot be null)
|
|
||
| @Override | ||
| public String wrapKey(ByteBuffer key, String wrappingKeyId) { | ||
| // keytool keeps key names in lower case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is here but we don't actually make the key lower case here
| * then generate secret keys locally (using the SecureRandom mechanism) and call | ||
| * wrapKey to wrap them in KMS. | ||
| */ | ||
| boolean supportsKeyGeneration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a default implementation of false here?
| * @param wrappingKeyId a key ID that represents a wrapping key stored in KMS | ||
| * @return a pair of plaintext and wrapped key encrypted with the given wrappingKeyId | ||
| */ | ||
| KeyGenerationResult generateKey(String wrappingKeyId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one maybe should have a default to "unsupported operation exception"?
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some style nits, No Vertical alignment, Use of precondition.
The only real concern I have is that the test class implementations are here but we don't make sure they work. That's probably fine since the actual usage of this interface will exercise those apis? If that's coming up then +1 from me
Yep, these implementations are placed in the test module; they'll serve as a util for a unitest that encrypts/decrypts a table using these test kms classes. This unitest is coming up in the 5th PR in this series, where we connect all parts to enable table encryption. |
|
the commit addresses the comments in the last round. I also made an extra change in the Keystore client (using env var instead of hadoop prop for the password; this is safer and more intuitive). |
|
@jackye1995 Are you +1 on this? I think we have consensus to start getting these PR's in @ggershinsky looks like the tests are currently failing, could you rebase so we can be good to go? |
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Hi @RussellSpitzer , this PR depends on #2638 |
| * @param wrappingKeyId a key ID that represents a wrapping key stored in KMS | ||
| * @return wrapped key material | ||
| */ | ||
| String wrapKey(ByteBuffer key, String wrappingKeyId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder why we return String instead of ByteBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KMS systems typically return a JSON String with a wrapped key. If for some reason a KMS returns a raw byte buffer, it should be converted to a String via base64.
add interface for custom file encryptors update the pr refactor sample implementation move KmsClient interface to API module post-review changes
| Preconditions.checkNotNull(keystorePath, KEYSTORE_FILE_PATH_PROP + " must be set in hadoop or table " + | ||
| "properties"); | ||
|
|
||
| String keystorePassword = System.getenv(KEYSTORE_PASSWORD_ENV_VAR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is just a demo class, but won't this intern the keystorePassword in memory? Not sure if that's a concern or anything I am still an encryption novice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @RussellSpitzer , thanks for merging this . To answer the question - this is not a concern in a basic threat model, where we don't trust the storage domain, but we do trust the domain of compute nodes - we have to, because the node memory sees the sensitive data, the encryption keys, etc. The meaning of "trust" is a deep hole.., but to give a simple example, the compute nodes run in-house, while the storage backend is on another company's platform.
| /** | ||
| * Some KMS systems support generation of secret keys inside the KMS server. | ||
| * | ||
| * @return true if KMS server supports key generation and KmsClient implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think the "and KmsClient Implementation .... " part makes sense to me here? Is the statement a "Return true IFF the Kms Server can generate keys and this client should use that capability?"
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only question on this is whether or not we should have tests for this interface or do we not yet have the components required to do a meaningful test yet?
|
Oh sorry I forgot we talked about this before. Everyone is on board so let's move forward. |
|
Thanks @ggershinsky for the pr and |
No description provided.