-
Notifications
You must be signed in to change notification settings - Fork 3k
API: add an action API for rewrite deletes #2841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
aokolnychyi
merged 14 commits into
apache:master
from
chenjunjiedada:add-rewrite-delete-api
Nov 2, 2021
Merged
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
a2eee90
API: add an action API for rewrite deletes
chenjunjiedada ed51fc7
add more API functions
chenjunjiedada 0697cf5
update according to Jack's comments
chenjunjiedada 9537fa1
use more common api
chenjunjiedada c68303b
address comments
chenjunjiedada 6014276
address comments
chenjunjiedada 597e8f9
minor update
chenjunjiedada 82c8118
address comments
chenjunjiedada bc1a0b9
update definition accroding to proposal
chenjunjiedada dee5962
address comments
chenjunjiedada 70b7162
add partition filter and minor doc updates
chenjunjiedada dcb3201
address more comments
chenjunjiedada a6a2540
remove partition filter
chenjunjiedada 6d0853b
minor words update
chenjunjiedada File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
56 changes: 56 additions & 0 deletions
56
api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package org.apache.iceberg.actions; | ||
|
|
||
| import org.apache.iceberg.expressions.Expression; | ||
|
|
||
| /** | ||
| * An action for converting the equality delete files to position delete files. | ||
| */ | ||
| public interface ConvertEqualityDeleteFiles | ||
| extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> { | ||
|
|
||
| /** | ||
| * A filter for finding the equality deletes to convert. | ||
| * <p> | ||
| * The filter will be converted to a partition filter with an inclusive projection. Any file that may contain rows | ||
| * matching this filter will be used by the action. The matching delete files will be converted to position delete | ||
| * files. | ||
| * | ||
| * @param expression An iceberg expression used to find deletes. | ||
| * @return this for method chaining | ||
| */ | ||
| ConvertEqualityDeleteFiles filter(Expression expression); | ||
|
|
||
| /** | ||
| * The action result that contains a summary of the execution. | ||
| */ | ||
| interface Result { | ||
| /** | ||
| * Returns the count of the deletes that been converted. | ||
| */ | ||
| int convertedEqualityDeleteFilesCount(); | ||
|
|
||
| /** | ||
| * Returns the count of the added position delete files. | ||
| */ | ||
| int addedPositionDeleteFilesCount(); | ||
| } | ||
| } | ||
57 changes: 57 additions & 0 deletions
57
api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package org.apache.iceberg.actions; | ||
|
|
||
| import org.apache.iceberg.expressions.Expression; | ||
|
|
||
| /** | ||
| * An action for rewriting position delete files. | ||
| * <p> | ||
| * Generally used for optimizing the size and layout of position delete files within a table. | ||
| */ | ||
| public interface RewritePositionDeleteFiles | ||
| extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> { | ||
|
|
||
| /** | ||
| * A filter for finding deletes to rewrite. | ||
| * <p> | ||
| * The filter will be converted to a partition filter with an inclusive projection. Any file that may contain rows | ||
| * matching this filter will be used by the action. The matching delete files will be rewritten. | ||
| * | ||
| * @param expression An iceberg expression used to find deletes. | ||
| * @return this for method chaining | ||
| */ | ||
| RewritePositionDeleteFiles filter(Expression expression); | ||
|
|
||
| /** | ||
| * The action result that contains a summary of the execution. | ||
| */ | ||
| interface Result { | ||
| /** | ||
| * Returns the count of the position deletes that been rewritten. | ||
| */ | ||
| int rewrittenDeleteFilesCount(); | ||
|
|
||
| /** | ||
| * Returns the count of the added delete files. | ||
| */ | ||
| int addedDeleteFilesCount(); | ||
| } | ||
| } |
79 changes: 79 additions & 0 deletions
79
core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package org.apache.iceberg.actions; | ||
|
|
||
| import java.util.Map; | ||
| import java.util.Set; | ||
| import org.apache.iceberg.DeleteFile; | ||
| import org.apache.iceberg.FileScanTask; | ||
| import org.apache.iceberg.Table; | ||
|
|
||
| /** | ||
| * A strategy for the action to convert equality delete to position deletes. | ||
| */ | ||
| public interface ConvertEqualityDeleteStrategy { | ||
|
|
||
| /** | ||
| * Returns the name of this convert deletes strategy | ||
| */ | ||
| String name(); | ||
|
|
||
| /** | ||
| * Returns the table being modified by this convert strategy | ||
| */ | ||
| Table table(); | ||
|
|
||
| /** | ||
| * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not | ||
| * specified here will be rejected at runtime. | ||
| */ | ||
| Set<String> validOptions(); | ||
|
|
||
| /** | ||
| * Sets options to be used with this strategy | ||
| */ | ||
| RewritePositionDeleteStrategy options(Map<String, String> options); | ||
|
|
||
| /** | ||
| * Select the delete files to convert. | ||
| * | ||
| * @param deleteFiles iterable of delete files in a group. | ||
| * @return iterable of original delete file to be converted. | ||
| */ | ||
| Iterable<DeleteFile> selectDeleteFiles(Iterable<DeleteFile> deleteFiles); | ||
|
|
||
| /** | ||
| * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being | ||
| * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying | ||
| * Action. | ||
| * | ||
| * @param dataFiles iterable of data files that contain the DeleteFile to be converted | ||
| * @return iterable of lists of FileScanTasks which will be processed together | ||
| */ | ||
| Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: The order of methods doesn't match
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will update. |
||
|
|
||
| /** | ||
| * Define how to convert the deletes. | ||
| * | ||
| * @param deleteFilesToConvert a group of files to be converted together | ||
| * @return iterable of delete files used to replace the original delete files. | ||
| */ | ||
| Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert); | ||
| } | ||
78 changes: 78 additions & 0 deletions
78
core/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteStrategy.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package org.apache.iceberg.actions; | ||
|
|
||
| import java.util.Map; | ||
| import java.util.Set; | ||
| import org.apache.iceberg.DeleteFile; | ||
| import org.apache.iceberg.Table; | ||
|
|
||
| /** | ||
| * A strategy for an action to rewrite position delete files. | ||
| */ | ||
| public interface RewritePositionDeleteStrategy { | ||
|
|
||
| /** | ||
| * Returns the name of this rewrite deletes strategy | ||
| */ | ||
| String name(); | ||
|
|
||
| /** | ||
| * Returns the table being modified by this rewrite strategy | ||
| */ | ||
| Table table(); | ||
|
|
||
| /** | ||
| * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not | ||
| * specified here will be rejected at runtime. | ||
| */ | ||
| Set<String> validOptions(); | ||
|
|
||
| /** | ||
| * Sets options to be used with this strategy | ||
| */ | ||
| RewritePositionDeleteStrategy options(Map<String, String> options); | ||
|
|
||
| /** | ||
| * Select the delete files to rewrite. | ||
| * | ||
| * @param deleteFiles iterable of delete files in a group. | ||
| * @return iterable of original delete file to be replaced. | ||
| */ | ||
| Iterable<DeleteFile> selectDeleteFiles(Iterable<DeleteFile> deleteFiles); | ||
|
|
||
| /** | ||
| * Groups into lists which will be processed in a single executable unit. Each group will end up being | ||
| * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying | ||
| * Action. | ||
| * | ||
| * @param deleteFiles iterable of DeleteFile to be rewritten | ||
| * @return iterable of lists of FileScanTasks which will be processed together | ||
| */ | ||
| Iterable<Iterable<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles); | ||
|
|
||
| /** | ||
| * Define how to rewrite the deletes. | ||
| * | ||
| * @param deleteFilesToRewrite a group of files to be rewritten together | ||
| * @return iterable of delete files used to replace the original delete files. | ||
| */ | ||
| Iterable<DeleteFile> rewriteDeleteFiles(Iterable<DeleteFile> deleteFilesToRewrite); | ||
| } |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a partition filter and a row/data filter, or just a row filter? Since equality delete files are stored by partition, I think that we will actually convert a filter to a partition filter and then rewrite any equality delete file in that partition. So it may make sense to allow both. If, for example, you have a table that is bucketed and you rewrite the deletes in 1/10th of the buckets every day. Then you'd want to be able to specify
id_bucket IN (...)rather than supplying a data filter.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. Even a partition filter is enough, but let me keep both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue , Anton mentioned that we could always convert the data filter to a partition filter, just want to confirm that do we have any specific reason to keep the partition filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess you could use
in(bucket("id"), 1, 2, 3). That would be fine. We can remove the partition filter.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done!