Ensure Delta Lake vacuum procedure runs on supported table versions #14579
alexjo2144 wants to merge 1 commit into trinodb:master from
Conversation
| private static final int MAX_SUPPORTED_WRITER_VERSION = 3; | ||
| private static final int MAX_SUPPORTED_READER_VERSION = 2; |
I've made these separate because I think we can advance them faster than we advance support for updates/deletes. As long as newer versions don't include new types of metadata files, we can vacuum them.
| import static java.lang.String.format; | ||
| import static org.assertj.core.api.Assertions.assertThatThrownBy; | ||
|
|
||
| public class TestDeltaLakeVacuumCompatibility |
No changes requested. I feel it would be better to have vacuum executed by both Trino and Delta if we call these "compatibility" tests.
| @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS}) | ||
| public void testVacuumOnUnsupportedTableVersion() | ||
| { | ||
| String tableName = "test_dl_create_table_compat_" + randomTableSuffix(); |
nit:
| String tableName = "test_dl_create_table_compat_" + randomTableSuffix(); | |
| String tableName = "test_dl_unsupported_vacuum_" + randomTableSuffix(); |
https://docs.delta.io/2.1.0/delta-change-data-feed.html#change-data-storage mentions the below and it's the current behavior as far as I tested in #14590. What's preventing vacuuming v4 tables?
|
|
After doing some testing with the Databricks implementation, I don't think we actually need to make this any smarter than it is for v4 tables. I thought there would be some logic needed to make sure that the CDF files for retained versions were also left in place, but Databricks doesn't seem to have implemented it this way either. Closing this issue; we can have a separate one for adding compatibility tests to ensure we have the same treatment of CDF files as Databricks does.
Description
Add version checks to the Delta Lake vacuum procedure. Version 4 of the Delta Lake writer specification adds new metadata files created by the change data feed; this prevents vacuuming v4 tables until the procedure is updated to account for those files.
Relates to: #12637
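For illustration only, a version check of this kind could look roughly like the sketch below. The two constants and their values come from the diff under review; the class name, method name, exception type, and error messages are hypothetical and are not taken from the actual change. It assumes the table's protocol entry exposes minReaderVersion and minWriterVersion as integers.

```java
// Hypothetical sketch: names other than the two constants are illustrative assumptions.
final class VacuumVersionCheck
{
    // Values as they appear in the diff under review.
    private static final int MAX_SUPPORTED_WRITER_VERSION = 3;
    private static final int MAX_SUPPORTED_READER_VERSION = 2;

    private VacuumVersionCheck() {}

    // Rejects tables whose protocol versions are newer than vacuum is known to handle.
    static void checkSupportedVersions(int minReaderVersion, int minWriterVersion)
    {
        if (minReaderVersion > MAX_SUPPORTED_READER_VERSION) {
            throw new UnsupportedOperationException(String.format(
                    "Cannot vacuum table using reader version %d; maximum supported is %d",
                    minReaderVersion,
                    MAX_SUPPORTED_READER_VERSION));
        }
        if (minWriterVersion > MAX_SUPPORTED_WRITER_VERSION) {
            throw new UnsupportedOperationException(String.format(
                    "Cannot vacuum table using writer version %d; maximum supported is %d",
                    minWriterVersion,
                    MAX_SUPPORTED_WRITER_VERSION));
        }
    }
}
```

Under these assumptions, a table at reader version 2 and writer version 3 passes the check, while a writer version 4 table is rejected before any files are deleted.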
Non-technical explanation
Add version checks to avoid corrupting newer Delta Lake tables.
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: