Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifest list encryption #7770

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

ggershinsky
Copy link
Contributor

No description provided.

@ggershinsky ggershinsky marked this pull request as draft June 5, 2023 06:10
@@ -162,6 +162,15 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
*/
String manifestListLocation();

/**
* Return the size of this snapshot's manifest list. For encrypted tables, a verified plaintext
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix comment

@ggershinsky ggershinsky marked this pull request as ready for review March 27, 2024 07:07
@@ -162,6 +162,25 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
*/
String manifestListLocation();

/**
* Return the size of this snapshot's manifest list file. Must be a verified value, taken from a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused here, we have a default of -1 as well set in base Snapshot which seemed to also be allowed as an "unset". Should we mention that here or is it always required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define this field to be required only for encrypted tables. It will be not set in the snapshot file for unencrypted tables - where this method can return 0 (or -1, I'll make it consistent across all implementation classes).

if (manifestListKeyMetadata != null) { // encrypted manifest list file
Preconditions.checkArgument(
fileIO instanceof EncryptingFileIO,
"No encryption in FileIO class " + fileIO.getClass());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot read manifest list (%s) because it is encrypted but the configured FileIO (%s) does not implement EncryptingFileIO)

EncryptingFileIO encryptingFileIO = (EncryptingFileIO) fileIO;
Preconditions.checkArgument(
encryptingFileIO.encryptionManager() instanceof StandardEncryptionManager,
"Encryption manager for encrypted manifest list files can currently only be an instance of "
Copy link
Member

@RussellSpitzer RussellSpitzer Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot decrypt manifest list (%s) because the encryption manager (%s) does not implement StandardEncryptionManager

generator.writeStringField(MANIFEST_LIST_KEY_METADATA, snapshot.manifestListKeyMetadata());
}

// TODO discuss: do we need to sign the size value? Or sign the whole snapshot?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this attack work? Wouldn't the user also need the key to encrypt the replacement files? I thought we were storing the metadata.json key in the catalog so an attacker could replace everything but still not be able to trick a client using the catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the question. Some thoughts on the scenarios and protection options:

  • currently, we don't have a metadata.json key. We have only a key for snapshot's manifest list file. Besides using it for encrypting the manifest list file, we can also use this key for signing snapshot's sensitive parts like the manifest list size field. Or for signing the whole metadata.json file (should be possible with some effort) - then we also protect the integrity of e.g. the table properties (like the table key id).
  • snapshot (metadata.json file) doesn't keep secret values, so encrypting it might not be required. The signatures, mentioned above, would be kept in added snapshot fields - sufficient for detecting the file modification attacks.
  • these protection techniques are not required with the REST catalog - because we trust the catalog service (we don't trust the storage service). Since the whole snapshot is stored in the REST catalog, we don't need to sign anything.
  • the manifest list key is not stored in the catalog. Instead, it is wrapped in a KMS with the table master key, and stored in the snapshot MANIFEST_LIST_KEY_METADATA field. Only the KMS-authorized (for the table key) users/processes will be able to get the manifest list key.
  • In catalogs other than the REST, the signatures provide a partial protection - because the metadata.json is kept in the untrusted storage. With the signatures, it can't be modified. But the whole folder can be replaced (e.g. a replay attack - where all table files are removed, and replaced with files of an older version of the table). To prevent this attack in non-REST catalogs, we will have to update the catalog per each table snapshot (setting eg the latest table version/sequence number, or a random AAD prefix)

&& encryptedManifestList.keyMetadata().buffer() != null) {
Preconditions.checkArgument(
encryptionManager instanceof StandardEncryptionManager,
"Encryption manager for encrypted manifest list files can currently only be an instance of "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above. "Cannot X because Y"

@@ -85,7 +85,7 @@ public class TestManifestEncryption {

private static final DataFile DATA_FILE =
new GenericDataFile(
0,
SPEC.specId(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completing the previous patch #8252 (comment)

private static final long EXISTING_ROWS = 857273L;
private static final int DELETED_FILES = 1;
private static final long DELETED_ROWS = 22910L;
private static final List<ManifestFile.PartitionFieldSummary> PARTITION_SUMMARIES =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need a test example which has a non-empty list of partition field summaries


@Test
public void testV2Write() throws IOException {
ManifestFile manifest = writeAndReadManifestList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: writeAndReadEncryptedManifestList

public void testV2Write() throws IOException {
ManifestFile manifest = writeAndReadManifestList();

// all v2 fields should be read correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert J has some helper for this, Not sure if it is correct

assertThat(actual).usingRecursiveComparison().isEqualTo(expected);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// TODO discuss: do we need to sign the size value? Or sign the whole snapshot?
// Or rely on REST catalog? - the only option that prevents "full folder replacement" attack.
if (snapshot.manifestListSize() >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another small question here, we essentially are doing a transform here

manifestlists sizes < 0 become 0.

Also nit: we are also ignoring 0's that get passed through although we will read this as 0 if it is missing.

Just wondering what the intent here is. I think it may be better to have a defined missing value? Not sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, part of this thread #7770 (comment)

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that we store the manifest list size? Won't the encryption be enough to prove the file is the right one?

@ggershinsky
Copy link
Contributor Author

Yep, this is due to https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length . There are options for table modification attacks if this field is not (safely) stored.

this.v1ManifestLocations = v1ManifestLocations;
this.manifestListKeyMetadata = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as comment above (group manifest vars)

* In encrypted tables, return the size of this snapshot's manifest list file. Must be a verified
* value, taken from a trusted source. In unencrypted tables, can return 0.
*/
default long manifestListSize() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding new methods for each piece of new information we want to pass, what about adding a ManifestList object that contains the location, size, and key metadata?

@@ -162,6 +162,16 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
*/
String manifestListLocation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is String manifestListLocation() same as what is stored in ManifestListFile? If so, do we still need it as a separate field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue @RussellSpitzer what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are dozens of calls to this method today (tests etc), so we'll likely need to keep it for a while.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw that. We can consolidate in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. We need to keep this because it is part of the public API and used in many cases.

ByteBuffer.wrap(
Base64.getDecoder().decode(snapshot.manifestListFile().wrappedKeyMetadata()));

NativeEncryptionKeyMetadata wrappedKeyMetadata =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll remove these lines.

Comment on lines 179 to 181
ByteBuffer manifestListKeyMetadata = null;
ByteBuffer wrappedManifestListKeyMetadata = null;
String wrappedKeyEncryptionKey = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Declare these inside if (node.has(MANIFEST_LIST_KEY_METADATA)) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used later, in the BaseSnapshot constructor. We can use a different constructor (if no manifest list encryption), but the code won't be more compact.

return size;
}

public String wrappedKeyEncryptionKey() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc: "Manifest list keys are encrypted with a table "key encryption key". This function returns a KMS wrap of the key encryption key.

@ggershinsky ggershinsky reopened this Aug 4, 2024
@@ -143,7 +172,13 @@ private void cacheManifests(FileIO fileIO) {

if (allManifests == null) {
// if manifests isn't set, then the snapshotFile is set and should be read to get the list
this.allManifests = ManifestLists.read(fileIO.newInputFile(manifestListLocation));
InputFile manifestListInputFile = fileIO.newInputFile(manifestListFile.location());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this under if else (called only for unencrypted)

@ggershinsky
Copy link
Contributor Author

@rdblue Thanks for the patch, I've merged it into this PR. We still need to sync on caching the unwrapped keys, I've added a commit that implements one way of doing this, will appreciate your review.

return keyBytes;
}

public void setUnwrappedKey(ByteBuffer key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not contain a key. It's fine to cache, but cache in unwrapKey instead.

update snapshot producer

update the patch

clean up

fix for previous patch

address review comments

move key wrapping to metadata encryption

encrypt manifest list key metadata

new aad util

null key needs no encryption

comment; clearer method/var names

use key encryption key for manifest list keys

add encryption util changes

update EncryptionTestHelpers

handle api change

remove unused lines

revert revapi.yml

KEK cache

unitest update

rename var

address review comments

fix timeout default

change writer kek timeout default

Updates from review.

cache unwrapped keys
@ggershinsky
Copy link
Contributor Author

@rdblue thanks for the caffeine-based cache for unwrapped keys, I've applied the patch.

@ggershinsky
Copy link
Contributor Author

ggershinsky commented Sep 12, 2024

Regarding the caching limit - 1 minute might be too short. If a reader (or writer) refreshes a table each few minutes, there could be many KMS unwrap calls. In the new commit, I've changed this to a combined limit of 1000 entries or 1 day. Each entry is comprised of a "wrapped key" (could be up to a few hundred bytes, depending on KMS), and "unwrapped key" (a few dozen bytes), so the 1000 entries / max size is something like 0.5MB.
@rdblue what do you think?

EDT: actually, we might be able to make the cache ~x10 smaller, if we use the "key id" as the cache key. Its size is a few dozen bytes.

new: "method java.lang.Long org.apache.iceberg.encryption.NativeEncryptionKeyMetadata::fileLength()"
justification: "New method in interface"
- code: "java.method.addedToInterface"
new: "method org.apache.iceberg.encryption.NativeEncryptionKeyMetadata org.apache.iceberg.encryption.NativeEncryptionKeyMetadata::copyWithLength(long)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than revapi exceptions, we should add default implementations.

}

@Override
public ByteBuffer decryptKeyMetadata(EncryptionManager em) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When was this added? I don't think this should be here because it just calls a method on EncryptionManager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added in PR11 (these lines: interface, implementation ). I think the main is reason is, the class EncryptingFileIO is in the api module, so it cannot access StandardEncryptionManager and EncryptionUtil classes.

@@ -160,4 +160,10 @@ private CatalogProperties() {}

public static final String ENCRYPTION_KMS_TYPE = "encryption.kms-type";
public static final String ENCRYPTION_KMS_IMPL = "encryption.kms-impl";
public static final String WRITER_KEK_TIMEOUT_SEC = "encryption.kek-timeout-sec";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using kek because it is not clear. Is there a good term for the key you're referring to here?

Copy link
Contributor Author

@ggershinsky ggershinsky Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have a reasonably well defined value of this parameter in the NIST spec, maybe we don't need to make it configurable in an initial version of table encryption. This parameter is also not easy to explain. So I'll remove this from the CatalogProperties for now. If, for some reason, a requirement comes later to make this configurable, we can add this back, with a proper name and documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great!

}

protected abstract ManifestFile prepare(ManifestFile manifest);

protected abstract FileAppender<ManifestFile> newAppender(
OutputFile file, Map<String, String> meta);
OutputFile outputFile, Map<String, String> meta);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like file was accidentally renamed to outputFile. Can you revert that change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverting it triggers the following build message:

> Task :iceberg-core:checkstyleMain
[ant:checkstyle] [ERROR] /<path>/iceberg/core/src/main/java/org/apache/iceberg/ManifestListWriter.java:66:18: 
'file' hides a field. [HiddenField]

> Task :iceberg-core:checkstyleMain FAILED

FAILURE: Build failed with an exception.

@@ -27,4 +27,15 @@ public interface NativeEncryptionKeyMetadata extends EncryptionKeyMetadata {

/** Additional authentication data as a {@link ByteBuffer} */
ByteBuffer aadPrefix();

/** Encrypted file length */
Long fileLength();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From revapi changes, I think these need default implementations.

import java.io.Serializable;
import java.nio.ByteBuffer;

public class WrappedEncryptionKey implements Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: needs Javadoc to explain what "Wrapped" means in this context.

@rdblue
Copy link
Contributor

rdblue commented Sep 16, 2024

@ggershinsky, I'm not too concerned with the size of the cache. I'm okay with 1 day, but that seems like a long time to have unencrypted key material in memory. I'll defer to your judgement here.

@ggershinsky
Copy link
Contributor Author

Ok. We don't have clear guidelines on key caching in memory (key copies are spread all over the process memory - cache, plug-in KMS client code, an HTTP library in the KMS client code; the Java GC - so there are no guarantees for when an uncached key is deleted from memory, if ever, before the process stops). But I agree a day could be too long. I'll change it to 1 hour - might be a reasonable trade-off between performance requirements (KMS call overhead), and safety requirements (there is a chance a key will be deleted from the memory within a business day).

@ggershinsky
Copy link
Contributor Author

Hi @rdblue , I've built the integration code with the latest version of this patch, works ok. Can we merge this PR?

@RussellSpitzer
Copy link
Member

@rdblue Did you have any more comments on this one? I can do another pass as well but I'd like to finish this up as well soon

@ggershinsky
Copy link
Contributor Author

@rdblue @RussellSpitzer All current comments should have been addressed in this thread and in the last commit. An additional review round is always welcome, I too would like to complete this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

4 participants