-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Add unit test for ExpireSnapshot with DeleteFile #4141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| sql("DELETE FROM %s WHERE id=1", tableName); | ||
|
|
||
| Table table = validationCatalog.loadTable(tableIdent); | ||
| table.refresh(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't need to refresh here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| sql("INSERT INTO TABLE %s VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')", tableName); | ||
| sql("DELETE FROM %s WHERE id=1", tableName); | ||
|
|
||
| Table table = validationCatalog.loadTable(tableIdent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know it may be easier to always just use the SparkUtil loadSparkTable class now. Wouldn't ever have to refresh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...tensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java
Show resolved
Hide resolved
...tensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java
Show resolved
Hide resolved
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small things, thanks for adding the test!
|
Let me take a look tomorrow too. Sorry for the delay! |
af3c6b5 to
aa9398f
Compare
|
@RussellSpitzer replied to comments , when you have time for another look. @aokolnychyi no problem, thanks for reviewing! |
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. LGTM. Thank you @szehon-ho!
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits
| Path deleteManifestPath = new Path(deleteManifests(table).iterator().next().path()); | ||
| Path deleteFilePath = new Path(String.valueOf(deleteFiles(table).iterator().next().path())); | ||
|
|
||
| sql("CALL %s.system.rewrite_data_files(table => '%s', options => map" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it would be easier to read if the args were on separate lines like in a few other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg TBLPROPERTIES" + | ||
| "('format-version'='2', 'write.delete.mode'='merge-on-read')", tableName); | ||
|
|
||
| sql("INSERT INTO TABLE %s VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')", tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid we don't know the number of files that this insert will produce. That's why ID = 1 may end up in a separate file (unlikely but possible). If we write just a single file with ID = 1, the DELETE operation below will be a metadata operation and the test will fail.
I think it would be safer to use a typed Dataset and SimpleRecord. That way, we can call coalesce(1) before writing to make sure we produce only 1 file and the subsequent DELETE operation will produce a delete file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good catch, done.
| Assert.assertFalse("Delete file should be removed", localFs.exists(deleteFilePath)); | ||
| } | ||
|
|
||
| private Set<ManifestFile> deleteManifests(Table table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this may be simplified a bit?
private List<ManifestFile> deleteManifests(Table table) {
return table.currentSnapshot().deleteManifests();
}
private Set<DeleteFile> deleteFiles(Table table) {
Set<DeleteFile> deleteFiles = Sets.newHashSet();
for (FileScanTask task : table.newScan().planFiles()) {
deleteFiles.addAll(task.deletes());
}
return deleteFiles;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
...tensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java
Show resolved
Hide resolved
f723e51 to
04365e3
Compare
|
Looks like everything has been addressed so I'll merge this. Thanks, @szehon-ho! |
|
Thanks @RussellSpitzer , @kbendick , @aokolnychyi for reviews, and @rdblue for the weekend merge ! |
|
whichi version suppose ExpireSnapshot delete with DeleteFile? i use iceberg 0.14.0, ExpireSnapshot cannot delete DeleteFile |
|
Hi, when i tested this, i think the issue is not that expireSnapshot doesnt remove delete files. Its that the delete files dont get removed from current snapshot. Issue is described here: #4127 Can you check that? You can query 'files' table, does that have the delete files? If so then its still on current snapshot). Im working on a design for fix for this, but havent had time yet. Hopefully next week will put a design doc up. |
|
yes, when i query files, there are delete files, but data files which are referenced are remove |
Adding a test to demonstrate how to age off delete files and eventually remove them by expire snapshots.