Add test for vacuum procedure with CDF table in Delta #14590
// Vacuum procedure should remove files in _change_data directory
// https://docs.delta.io/2.1.0/delta-change-data-feed.html#change-data-storage
onTrino().executeQuery("SET SESSION delta.vacuum_min_retention = '0s'");
onTrino().executeQuery("CALL delta.system.vacuum('default', '" + tableName + "', '0s')");

Assertions.assertThat(s3.listObjectsV2(bucketName, changeDataPrefix).getObjectSummaries()).hasSize(0);
My gut feeling is that CDF files which are referenced from transaction log versions which are retained should also be retained.
So in this case, where we vacuum everything except for the most recent version of the table, there should still be one CDF file left after the vacuum.
Does that seem like the right behavior?
Maybe we could see what Databricks is doing and do the same thing? Or have you checked it already?
> My gut feeling is that CDF files which are referenced from transaction log versions which are retained should also be retained.
I just tried this out on Databricks 10.4 and this is not what they do. If you have the retention time set to zero they delete all of the CDF files.
I guess this is fine then
> Or have you checked it already?
Yes, I confirmed the Databricks behavior before sending this PR. Added another test which runs on Databricks just in case.
> I just tried this out on Databricks 10.4 and this is not what they do. If you have the retention time set to zero they delete all of the CDF files.
😮
So they treat CDF files just like any other untracked files?
@vkorukanti can you please confirm?
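For readers following the thread, here is a minimal in-memory sketch of the two policies being compared. It is not Trino or Delta Lake code: the `CdfVacuumSketch` class, the `CdfFile` type, both method names, and the timestamps are hypothetical, and the model assumes each file under `_change_data` is referenced by exactly one transaction log version. `deleteUnlessReferenced` models the behavior expected in the first comment; `deleteByAge` models what was reported above for Databricks 10.4 with zero retention.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model for illustration only; none of these names exist in Trino or Delta Lake.
class CdfVacuumSketch {
    /** One file under _change_data, with the log version assumed to reference it. */
    static final class CdfFile {
        final String path;
        final Instant modified;
        final long logVersion;

        CdfFile(String path, Instant modified, long logVersion) {
            this.path = path;
            this.modified = modified;
            this.logVersion = logVersion;
        }
    }

    // A file is a deletion candidate when it was last modified at or before (now - retention).
    static boolean olderThanRetention(CdfFile file, Instant now, Duration retention) {
        return !file.modified.isAfter(now.minus(retention));
    }

    // Behavior reported in the thread: age alone decides, log references are ignored.
    static List<String> deleteByAge(List<CdfFile> files, Instant now, Duration retention) {
        return files.stream()
                .filter(f -> olderThanRetention(f, now, retention))
                .map(f -> f.path)
                .collect(Collectors.toList());
    }

    // Initially expected behavior: files referenced by a retained log version survive.
    static List<String> deleteUnlessReferenced(
            List<CdfFile> files, Set<Long> retainedVersions, Instant now, Duration retention) {
        return files.stream()
                .filter(f -> olderThanRetention(f, now, retention))
                .filter(f -> !retainedVersions.contains(f.logVersion))
                .map(f -> f.path)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-10-14T12:00:00Z"); // arbitrary fixed clock
        List<CdfFile> files = Arrays.asList(
                new CdfFile("_change_data/cdc-1.parquet", now.minusSeconds(7200), 1L),
                new CdfFile("_change_data/cdc-2.parquet", now.minusSeconds(3600), 2L));
        Set<Long> retainedVersions = Collections.singleton(2L); // only the latest version is kept

        System.out.println("age only:        " + deleteByAge(files, now, Duration.ZERO));
        System.out.println("keep referenced: "
                + deleteUnlessReferenced(files, retainedVersions, now, Duration.ZERO));
    }
}
```

With zero retention, the age-only policy selects both files for deletion, which matches the `hasSize(0)` assertion in the test. The reference-aware policy would leave the file tied to version 2 in place.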
...rc/main/java/io/trino/tests/product/deltalake/TestDeltaLakeWriteDatabricksCompatibility.java
e4faeb3 to fd7dcaa
c9bd3c8 to 4df2713
4df2713 to f124b39
Description
Relates to #12637
Release notes
(x) This is not user-visible or docs only and no release notes are required.