@szehon-ho
Member

Adds a Spark procedure, rewrite_position_delete_files, for #7389.

@github-actions github-actions bot added the spark label May 9, 2023
@szehon-ho szehon-ho added this to the Iceberg 1.3.0 milestone May 9, 2023
mapBuilder.put("register_table", RegisterTableProcedure::builder);
mapBuilder.put("publish_changes", PublishChangesProcedure::builder);
mapBuilder.put("create_changelog_view", CreateChangelogViewProcedure::builder);
mapBuilder.put("rewrite_position_delete_files", RewritePositionDeleteFilesProcedure::builder);
Member

nit: maybe shorten the name to rewrite_position_deletes?

Member Author

What does everyone think? I use RewritePositionDeletes and PositionDeletes in the code a lot for brevity, but I was not sure, as the existing procedure names all mention files.

Contributor

Normally I'd go for shorter names, but all the keys in the map here already have the files suffix, so I think it's fine as it is.

Member Author

Yeah, that's the primary reason I kept 'files', though I do think it is long. @aokolnychyi, what do you think?

Contributor

No strong opinion. I also like shorter names, but since we use the files suffix in other procedures, I'm inclined to keep the name as is.
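For readers following the naming thread, here is a minimal sketch of the name-to-builder registry pattern that the diff above extends. It is not Iceberg's code: `ProcedureRegistrySketch`, the `Procedure` interface, `describe`, `load`, and the description strings are hypothetical stand-ins, and the case-insensitive lookup is an assumption of the sketch. Only the four procedure names are taken from the diff.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

public class ProcedureRegistrySketch {
  // Hypothetical stand-in for a procedure; NOT Iceberg's interface.
  interface Procedure {
    String description();
  }

  // Procedure name -> builder. The keys are the names shown in the diff.
  private static final Map<String, Supplier<Procedure>> BUILDERS = initBuilders();

  private static Map<String, Supplier<Procedure>> initBuilders() {
    Map<String, Supplier<Procedure>> builders = new LinkedHashMap<>();
    builders.put("register_table", () -> describe("registers a table"));
    builders.put("publish_changes", () -> describe("publishes changes"));
    builders.put("create_changelog_view", () -> describe("creates a changelog view"));
    builders.put("rewrite_position_delete_files", () -> describe("rewrites position delete files"));
    return builders;
  }

  // Placeholder descriptions, made up for this sketch.
  private static Procedure describe(String text) {
    return () -> text;
  }

  // Assumption: names are matched case-insensitively.
  static Procedure load(String name) {
    Supplier<Procedure> builder = BUILDERS.get(name.toLowerCase(Locale.ROOT));
    if (builder == null) {
      throw new IllegalArgumentException("Unknown procedure: " + name);
    }
    return builder.get();
  }

  public static void main(String[] args) {
    System.out.println(load("rewrite_position_delete_files").description());
  }
}
```

Because the map key is the only name callers can use, a rename would have to be applied there, which is why consistency with the other keys carries weight in this thread.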

Long.valueOf(snapshotSummary.get(ADDED_FILE_SIZE_PROP)))),
output);

Assert.assertEquals(1, TestHelpers.deleteFiles(table).size());
Member

nit: Should we also validate the contents of the new delete file? (whether it really has all the rewritten files' contents?)

Member Author

@szehon-ho szehon-ho May 11, 2023

I wanted this test to be about the procedure code, as I have already added a delete file content check in the test of the Action itself, in TestRewritePositionDeleteFilesAction.

private Map<String, String> snapshotSummary() {
  return validationCatalog.loadTable(tableIdent).currentSnapshot().summary();
}
}
Member

can we also add a test case with dangling deletes?

Member Author

Same here: I already added tests for dangling deletes in TestRewritePositionDeleteFilesAction, and intended this to be a quicker test that validates only the procedure code.

@szehon-ho szehon-ho force-pushed the rewrite_deletes_procedures branch from 714c6dc to f8158e7 Compare May 11, 2023 00:16
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Looks good to me, thanks @szehon-ho !

@aokolnychyi
Contributor

Let me take a look as well.

});
}

private InternalRow[] toOutputRows(Result result) {
Contributor

Optional: Another way to write it.

private InternalRow toOutputRow(Result result) {
  return newInternalRow(
      result.rewrittenDeleteFilesCount(),
      result.rewrittenBytesCount(),
      result.addedDeleteFilesCount(),
      result.addedBytesCount());
}
...
return new InternalRow[] {toOutputRow(result)};

Up to you, though.

Member Author

Yep. I noticed the arguments in your snippet are in the wrong order, but done.

@szehon-ho szehon-ho force-pushed the rewrite_deletes_procedures branch from b22c46f to 12ba560 Compare May 16, 2023 23:38
return modifyIcebergTable(
tableIdent,
table -> {
RewritePositionDeleteFiles action =
Member Author

Just noticed I can remove this as well and just chain the action call.
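A minimal sketch of the refactor described in the comment above, assuming a fluent action whose configuration methods return the action itself. `ChainedActionSketch`, `RewriteAction`, `Result`, `options`, and the option key are hypothetical stand-ins for illustration, not the procedure's actual code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ChainedActionSketch {
  // Hypothetical result type: records the options the action ran with.
  static final class Result {
    final Map<String, String> appliedOptions;

    Result(Map<String, String> appliedOptions) {
      this.appliedOptions = appliedOptions;
    }
  }

  // Hypothetical fluent action: options(...) returns this so calls can be chained.
  static final class RewriteAction {
    private final Map<String, String> options = new HashMap<>();

    RewriteAction options(Map<String, String> newOptions) {
      options.putAll(newOptions);
      return this;
    }

    Result execute() {
      return new Result(new HashMap<>(options));
    }
  }

  // Before: the action is held in a local variable that is used only once.
  static Result withLocal(Map<String, String> options) {
    RewriteAction action = new RewriteAction().options(options);
    return action.execute();
  }

  // After: the local variable is dropped and the call is chained.
  static Result chained(Map<String, String> options) {
    return new RewriteAction().options(options).execute();
  }

  public static void main(String[] args) {
    // "some-option" is a made-up key used only to exercise the sketch.
    Map<String, String> options = Collections.singletonMap("some-option", "1");
    System.out.println(
        withLocal(options).appliedOptions.equals(chained(options).appliedOptions));
  }
}
```

Both forms behave the same; chaining only removes a variable that added no information.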

@szehon-ho szehon-ho force-pushed the rewrite_deletes_procedures branch from 12ba560 to 662e534 Compare May 16, 2023 23:52
@aokolnychyi aokolnychyi merged commit d65fa09 into apache:master May 17, 2023
@aokolnychyi
Contributor

Thanks everyone!

aokolnychyi pushed a commit that referenced this pull request May 23, 2023