@aokolnychyi
Contributor

This PR is a follow-up to #4304. In that PR, we updated the method that checks whether a manifest has an entry to be deleted. However, we did not update the method that actually filters the manifest. As a consequence, the test added in this PR would fail with a runtime exception, because one delete file should be removed and the other kept. In addition, we should not introduce any extra calls to the evaluator, as they are expensive.

Overall, the problem is that whenever we overwrite data by filter, we should not fail operations when it is not possible to remove delete files. Removal of delete files is optional and is an optimization we should perform if possible.
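To make the rule above concrete, here is a minimal sketch of the per-file decision. It is not the Iceberg implementation: the class OverwriteFilterSketch, its Entry type, and the decide method are hypothetical names introduced only for illustration, and the assumption that a partially matching data file still raises an error is mine rather than something stated in this PR.

```java
// A simplified, hypothetical model of the filtering decision; not the Iceberg API.
final class OverwriteFilterSketch {
  enum Action { DELETE, KEEP }

  // Stand-in for a manifest entry together with the evaluator's two answers for it.
  static final class Entry {
    final String path;
    final boolean isDeleteFile;
    final boolean rowsMightMatch;
    final boolean rowsMustMatch;

    Entry(String path, boolean isDeleteFile, boolean rowsMightMatch, boolean rowsMustMatch) {
      this.path = path;
      this.isDeleteFile = isDeleteFile;
      this.rowsMightMatch = rowsMightMatch;
      this.rowsMustMatch = rowsMustMatch;
    }
  }

  static Action decide(Entry entry) {
    if (!entry.rowsMightMatch) {
      return Action.KEEP; // the filter cannot touch this file
    }
    if (entry.rowsMustMatch) {
      return Action.DELETE; // every row matches, so the whole file can be dropped
    }
    if (entry.isDeleteFile) {
      return Action.KEEP; // partial match on a delete file: removal is optional, keep it
    }
    // Assumption: a partial match on a data file cannot be expressed as a file-level delete.
    throw new IllegalStateException(
        "Cannot delete file where some, but not all, rows match: " + entry.path);
  }

  public static void main(String[] args) {
    Entry fullyCovered = new Entry("pos-deletes-1", true, true, true);
    Entry partlyCovered = new Entry("pos-deletes-2", true, true, false);
    System.out.println(fullyCovered.path + " -> " + decide(fullyCovered));
    System.out.println(partlyCovered.path + " -> " + decide(partlyCovered));
  }
}
```

In this sketch, the scenario from the new test corresponds to the two delete-file entries in main: one is fully covered by the filter and is removed, the other is only partly covered and is kept instead of failing the operation.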

@github-actions github-actions bot added the core label Mar 21, 2022
dropPartitions.contains(file.specId(), file.partition()) ||
(isDelete && entry.sequenceNumber() > 0 && entry.sequenceNumber() < minSequenceNumber);

boolean nonMatchingDeleteFile = !file.content().equals(FileContent.DATA) && !evaluator.rowsMustMatch(file);
Contributor Author

We should avoid extra calls to the evaluator if possible. Also, an isDelete variable was already defined.
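As a rough illustration of why the extra evaluator call matters, the sketch below counts evaluator invocations. The names markedForDelete and rowsMustMatch are borrowed from the diff, but CountingEvaluator, EvaluatorCallSketch, and the way the two values are combined are assumptions made for this example, not code from the PR.

```java
import java.util.function.Predicate;

// Hypothetical sketch: shows how a cheap flag can short-circuit an expensive check.
final class EvaluatorCallSketch {

  // Stand-in for an expensive evaluator; counts calls so the saving is visible.
  static final class CountingEvaluator {
    int calls = 0;
    private final Predicate<String> mustMatch;

    CountingEvaluator(Predicate<String> mustMatch) {
      this.mustMatch = mustMatch;
    }

    boolean rowsMustMatch(String path) {
      calls++;
      return mustMatch.test(path);
    }
  }

  // The || operator short-circuits, so the evaluator runs only when the cheap flag is false.
  static boolean allRowsMatch(boolean markedForDelete, CountingEvaluator evaluator, String path) {
    return markedForDelete || evaluator.rowsMustMatch(path);
  }

  public static void main(String[] args) {
    CountingEvaluator evaluator = new CountingEvaluator(path -> true);
    allRowsMatch(true, evaluator, "file-a");  // flag already decides, evaluator is skipped
    allRowsMatch(false, evaluator, "file-b"); // flag is false, evaluator is consulted once
    System.out.println("evaluator calls: " + evaluator.calls);
  }
}
```

Computing the result once and reusing it for both the validation and the delete decision keeps the number of evaluator calls per entry to at most one in this sketch.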

reader.entries().forEach(entry -> {
F file = entry.file();
- boolean fileDelete = deletePaths.contains(file.path()) ||
+ boolean markedForDelete = deletePaths.contains(file.path()) ||
Contributor Author

Renamed for consistency.

manifest.path(), wrapper.get());
duplicateDeleteCount += 1;
if (allRowsMatch) {
writer.delete(entry);
Contributor Author

Simply moved the existing logic under the new if.

.commit();

validateTableFiles(table, FILE_DAY_2, FILE_DAY_2_MODIFIED);
validateTableDeleteFiles(table, FILE_DAY_2_POS_DELETES);
Contributor Author

This would previously fail, because one delete file should be removed and the other kept.

Member

@szehon-ho szehon-ho left a comment

Thanks for this fix. It makes sense. You are right about the optimization and about the missing case where an entry passes the check but not the actual delete. Sorry I did not see this earlier.

hasDeletedFiles = allRowsMatch;

if (failAnyDelete) {
throw new DeleteException(reader.spec().partitionToPath(file.partition()));
Member

In the failAnyDelete case, this would trigger if we have a delete file whose rows do not all match. This probably never happens in a delete call, but I wonder whether we need to update the condition to (failAnyDelete && allRowsMatch).

Contributor Author

Good catch. I'll fix.
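A minimal sketch of the guard suggested in this thread, assuming the check is a simple conjunction of the two flags. FailAnyDeleteSketch and hasDeletedFile are hypothetical names, and the nested DeleteException is a local stand-in rather than the Iceberg class.

```java
// Hypothetical sketch of the suggested condition; not the actual ManifestFilterManager code.
final class FailAnyDeleteSketch {

  // Local stand-in for the exception thrown when a delete is not allowed.
  static final class DeleteException extends RuntimeException {
    DeleteException(String partitionPath) {
      super("Operation would delete existing data in partition: " + partitionPath);
    }
  }

  // Throws only when a file would really be deleted and deletes are forbidden.
  static boolean hasDeletedFile(boolean allRowsMatch, boolean failAnyDelete, String partitionPath) {
    if (failAnyDelete && allRowsMatch) {
      throw new DeleteException(partitionPath);
    }
    return allRowsMatch;
  }

  public static void main(String[] args) {
    // A delete file with only a partial match is kept, so nothing is deleted and nothing fails.
    System.out.println(hasDeletedFile(false, true, "day=2"));
    try {
      hasDeletedFile(true, true, "day=2");
    } catch (DeleteException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the combined condition, a partially matching delete file no longer triggers the exception, since it is kept rather than deleted.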

table.newOverwrite()
.overwriteByRowFilter(EXPRESSION_DAY_2_ANOTHER_ID_RANGE)
.addFile(FILE_DAY_2_MODIFIED)
.validateFromSnapshot(baseSnapshot.snapshotId())
Member

Optional: validateFromSnapshot seems unnecessary for reproducing the problem, especially since there should be no other transactions after this base snapshot. Would it be better to remove all the validateXX flags to simplify the test?

Contributor Author

Good call, will simplify.

Member

@szehon-ho szehon-ho left a comment

Yep, new changes look good to me.

@aokolnychyi aokolnychyi merged commit ab1a3a8 into apache:master Mar 23, 2022
@aokolnychyi
Contributor Author

Thanks, @szehon-ho!

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023