-
Notifications
You must be signed in to change notification settings - Fork 3k
Encryption: Simplify Hive key handling and add transaction tests #14752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -105,8 +105,8 @@ private static List<DataFile> currentDataFiles(Table table) { | |
|
|
||
| @TestTemplate | ||
| public void testRefresh() { | ||
| catalog.initialize(catalogName, catalogConfig); | ||
| Table table = catalog.loadTable(tableIdent); | ||
| validationCatalog.initialize(catalogName, catalogConfig); | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically, the catalog -> validationCatalog isn't needed here, but figure it doesn't hurt to set things up for #13225. |
||
| Table table = validationCatalog.loadTable(tableIdent); | ||
|
|
||
| assertThat(currentDataFiles(table)).isNotEmpty(); | ||
|
|
||
|
|
@@ -117,21 +117,73 @@ public void testRefresh() { | |
| } | ||
|
|
||
| @TestTemplate | ||
| public void testTransaction() { | ||
| catalog.initialize(catalogName, catalogConfig); | ||
| public void testAppendTransaction() { | ||
| validationCatalog.initialize(catalogName, catalogConfig); | ||
| Table table = validationCatalog.loadTable(tableIdent); | ||
|
|
||
| Table table = catalog.loadTable(tableIdent); | ||
| List<DataFile> dataFiles = currentDataFiles(table); | ||
| Transaction transaction = table.newTransaction(); | ||
| AppendFiles append = transaction.newAppend(); | ||
|
|
||
| // add an arbitrary datafile | ||
| append.appendFile(dataFiles.get(0)); | ||
| append.commit(); | ||
| transaction.commitTransaction(); | ||
|
|
||
| assertThat(currentDataFiles(table)).hasSize(dataFiles.size() + 1); | ||
| } | ||
|
|
||
| @TestTemplate | ||
| public void testConcurrentAppendTransactions() { | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This tests an append retry |
||
| validationCatalog.initialize(catalogName, catalogConfig); | ||
| Table table = validationCatalog.loadTable(tableIdent); | ||
|
|
||
| List<DataFile> dataFiles = currentDataFiles(table); | ||
| Transaction transaction = table.newTransaction(); | ||
| AppendFiles append = transaction.newAppend(); | ||
|
|
||
| // add an arbitrary datafile | ||
| append.appendFile(dataFiles.get(0)); | ||
|
|
||
| // append to the table in the meantime. use a separate load to avoid shared operations | ||
| validationCatalog.loadTable(tableIdent).newFastAppend().appendFile(dataFiles.get(0)).commit(); | ||
|
|
||
| append.commit(); | ||
| transaction.commitTransaction(); | ||
|
|
||
| assertThat(currentDataFiles(table).size()).isEqualTo(dataFiles.size() + 1); | ||
| assertThat(currentDataFiles(table)).hasSize(dataFiles.size() + 2); | ||
| } | ||
|
|
||
| // See CatalogTests#testConcurrentReplaceTransactions | ||
| @TestTemplate | ||
| public void testConcurrentReplaceTransactions() { | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This tests a replace retry - note that replace retries don't re-apply updates to refreshed metadata, because they'll anyway replace the whole table. #13225 didn't handle this case at one point, because it requires keys to be maintained on refreshing different metadata, so I think valuable to test |
||
| validationCatalog.initialize(catalogName, catalogConfig); | ||
|
|
||
| Table table = validationCatalog.loadTable(tableIdent); | ||
| DataFile file = currentDataFiles(table).get(0); | ||
| Schema schema = table.schema(); | ||
|
|
||
| // Write data for a replace transaction that will be committed later | ||
| Transaction secondReplace = | ||
| validationCatalog | ||
| .buildTable(tableIdent, schema) | ||
| .withProperty("encryption.key-id", UnitestKMS.MASTER_KEY_NAME1) | ||
| .replaceTransaction(); | ||
| secondReplace.newFastAppend().appendFile(file).commit(); | ||
|
|
||
| // Commit another replace transaction first | ||
| Transaction firstReplace = | ||
| validationCatalog | ||
| .buildTable(tableIdent, schema) | ||
| .withProperty("encryption.key-id", UnitestKMS.MASTER_KEY_NAME1) | ||
| .replaceTransaction(); | ||
| firstReplace.newFastAppend().appendFile(file).commit(); | ||
| firstReplace.commitTransaction(); | ||
|
|
||
| secondReplace.commitTransaction(); | ||
|
|
||
| Table afterSecondReplace = validationCatalog.loadTable(tableIdent); | ||
| assertThat(currentDataFiles(afterSecondReplace)).hasSize(1); | ||
| } | ||
|
|
||
| @TestTemplate | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can assume that encryption properties are unchanging (aside from keys) and therefore do not need to be updated on refresh, there's a actually simpler approach that just adds metadata keys to the current EM's transient state: https://github.com/apache/iceberg/compare/main...smaheshwar-pltr:iceberg:sm/add-keys?expand=1.
We could also go with that approach, and introduce setters in case we can't make that assumption. But then I suspect just re-initialising the EM on refresh makes more sense.
Curious though for thoughts on this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I like the idea of rather re-initializing the EM on refresh, so I'd rather stick to it. I suspect the transient state was never intended to be accesible from outside with set/put methods.
@ggershinsky let me know if you agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @smaheshwar-pltr and @szlta , I also think it's cleaner to re-init the EM upon refresh.