API: add an action API for rewrite deletes #2841

chenjunjiedada · 2021-07-19T13:23:58Z

This defines an action API for rewriting deletes as we mentioned in here. With this API, users could define their own rewrite strategy to handle deletes according to different features of different storage or engine.

This reuses parts of the idea from @RussellSpitzer 's rewrite framework. It is a simple version so that it could be implemented with the previous rewrite delete PRs.

chenjunjiedada · 2021-07-19T13:32:47Z

@jackye1995 @RussellSpitzer Could you please take a look?

RussellSpitzer

Looks good to me! We are gonna have a lot of fun new rewrites!

jackye1995 · 2021-07-19T17:57:07Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+


I think we also need the methods similar to RewriteStrategy such as name, validOptions

I didn't add validOptions because we haven't finalized what options are valid and it can be added late. But I'm fine to add them now.

For name, It looks like the class name already includes the strategy name. Do we still need it?

name is mostly used for logging purpose. I agree it is already in the class name, but for consistency with RewriteStrategy I think we can still have it here.

jackye1995 · 2021-07-19T19:24:51Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();


could you explain a bit about how these 2 interfaces will be used? For RewriteStrategy the method Iterable<FileScanTask> selectFilesToRewrite(Iterable<FileScanTask> dataFiles) was quite straightforward, but here the method has no arguments so I am not very sure how would it be used to actually select deletes.

I think it goes to the fundamental question of what are some potential rewrite delete strategies? For rewrite data files we know there are well-known operations like bin-packing. For deletes, what are some strategies we plan to implement?

The argument of selectFilesToRewrite(Iterable<FileScanTask> dataFiles) comes from planFileGroups and the filter logic. I think we could build similar logic by storing the iterable of FileScanTask as a local variable in the concrete RewriteDeleteStrategy class, and grouping them in rewriteDeletes() API. As for filter logic, we could add it later and apply it in selectDeletes() API.

As for strategies, it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they could impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. So we are planning two strategies at first: 1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete. I think the first one is kind of bin-packing, the second one is convert-and-bin-packing. In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors could read only the delete contents that match the data files.

Thanks for the explanation. The case you described are also what I am interested in. I think for RewriteStrategy we first had a design doc so when there was the PR everything was in place. I am fine if we are taking incremental progress for delete strategies to gradually evolve the interface and add configuration options.

Hi @chenjunjiedada , thanks for tracking the v2's compaction for long time, It's really challenging. I think we reviewers were distracted by other internal or external things , which blocked the v2 compaction progress for a long time. While the fact is: the current v2 compaction feature is the biggest blocker for v2 right now, I'd like to push this feature forward, so that we don't block v2 forever. (And thanks for your bandwidth @jackye1995, you always come out to help when the things get stuck).

About this PR, since it's a API definition, I'd like to take a look the whole delete rewrite design or preview PR if you have one because in that case I could evaluate whether the API abstraction is perfectly matching the real implementation. Of course, we could introduce a basic API, and then iterate the interface design when we get more implementation. It's looks better for me to reach a global agreement before we starting to provide a specific design, that can help us reduce context sync on the core paths.

+1 for a general design. And please let me know if there is anything that can be parallelized for implementation once we reached an agreement on this. I am currently mostly spending time trying to push the work on Trino side, but should have some bandwidth to take some implementation work for this if necessary. As openinx said, this is the biggest blocker for v2.

I don't have a design doc right now, but most of the ideas come from discussions in #2216 and #2364. I'm working on rebasing #2216 to apply #2372 (API definition) and #2841 (delete row reader). That would be a full implementation and hopefully could give you the full picture of how these APIs are used. I will post in the following days.

great, let me read through those, thank you!

jackye1995 · 2021-07-21T06:15:34Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();


Thanks for the explanation. The case you described are also what I am interested in. I think for RewriteStrategy we first had a design doc so when there was the PR everything was in place. I am fine if we are taking incremental progress for delete strategies to gradually evolve the interface and add configuration options.

chenjunjiedada · 2021-07-22T07:48:05Z

@RussellSpitzer @jackye1995 Is this ready to merge now?

chenjunjiedada · 2021-07-23T23:49:54Z

@rdblue @aokolnychyi @openinx You may want to take a look as well.

chenjunjiedada · 2021-07-28T03:24:02Z

@openinx @jackye1995 I rebased the #2364, now you can take a quick look at how the APIs are used.

cc @rdblue @aokolnychyi @RussellSpitzer @yyanyy

jackye1995 · 2021-07-28T04:27:20Z

Thank you! I reviewed the PR. I will paste my major comment also here:

After reading ConvertEqDeletesStrategy.rewriteDeletes, I think we can make the RewriteDeleteStrategy interface closer to RewriteStrategy interface. What we have there is basically the equivalent of planFileGroups plus rewriteFiles in RewriteStrategy. So I would propose we have the following methods in RewriteDeleteStrategy to be more aligned:

Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Iterable<List<FileScanTask>> planDeleteGroups(Iterable<DeleteFile> deleteFiles);

Set<DeleteFile> rewriteDeletes(List<DeleteFile> deleteFilesToRewrite);

And we can get the partition StructLike directly from the list of scan tasks instead of passing it through the task pair in EqualityDeleteRewriter. In this way, we can also enable partial progress for commits.

Any thoughts?

yyanyy · 2021-07-28T21:31:28Z

+1 to Jack's comment on making the delete strategy API mimic the one for RewriteStrategy since I believe the intention there was so that the rewrites can happen in batches/people can decide if they want to rewrite a certain partition/subset of files. If we directly take no argument for select files, we are assuming users will always want to rewrite the entire table.

yyanyy · 2021-07-28T21:36:02Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java

+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes rewriteEqDeletes();


Curious on the reason for having rewriteEqDeletes and rewritePosDeletes here, looks like in RewriteDataFiles we define methods based on the exact strategy to use, and for rewriting delete files there could be strategies like eqDelete->posDelete, combine posDelete into one, and even take both delete and data files and return data files (I think the last one should be a separate interface). Here it seems like the user can decide whether to rewrite certain type of delete files, but I would think strategy might be the thing that the user want to control more? Sorry I've been away from the community conversations for a while so might lack a lot of context if this was previously discussed.

rewriteEqDeletes and rewritePosDeletes is a top-level choice for users to choose which kind of delete he or she wants to rewrite. Here's an example: actionsProvider().rewriteDeletes(table).rewriteEqDeletes().execute();, users could clearly know what kind of deletes are handled. But you remind me that, we might also don't need these two as well, use one strategy API to select its implementation is enough.

The strategy options are open and we could pass options from options interface from the specified strategy implementation.

jackye1995

Thank you for the change, this looks good to me!

yyanyy · 2021-08-04T02:25:03Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java

+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);


Is it possible for us to pass in RewriteDeleteStrategy here rather than a free formed string?

RewriteDeleteStrategy is defined in the core package, while this is in the api package. This is like the pattern for the Catalog, as the usage of loading catalog:

Catalog catalog = CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hadoopConf);

we could have a similar usage like:

actionsProvider() .rewriteDeletes(table) .strategy(ConvertEqDeletesStrategy.class.getName()) .execute();

For data file rewrites, we didn't allow custom extension because we don't think that there's a lot of value in it just yet and we want to keep things simple if there isn't a use for dynamically loaded classes. I think we should probably take the same approach here.

+1 for not having a custom impl here because if you think some strategy is valuable, it's likely also valuable for other users. Or maybe could you provide more context about what customization you would like to do in addition?

I think we all agree that eq->pos is the one that can share with others. Let me update this.

+1 here too. We debated this while designing the new data compaction API.

I think another common strategy would be to just compact delete files. For example, take 100 position deletes and write them as one (within a partition). Another use case is compacting a large number of small global deletes.

+1. That is the next action, I have verified internally that it could bring some benefit for query.

chenjunjiedada · 2021-08-04T15:26:27Z

@RussellSpitzer @jackye1995 @yyanyy @openinx , I think we are on the same page now, can we merge this to continue rewrite actions?

chenjunjiedada · 2021-08-04T23:44:11Z

cc @rdblue @aokolnychyi

chenjunjiedada · 2021-09-13T14:25:42Z

ping @rdblue @openinx @jackye1995 again... This might be lost in PRs.

rdblue · 2021-09-13T16:49:52Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);


Nit: it would be nice to use the same method order as RewriteStrategy for consistency.

rdblue · 2021-09-13T16:51:04Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);


Why accept FileScanTask here? Will the delete action only find delete files based on a scan?

This was discussed in #2841 (comment). It is in order to align with RewriteStrategy.

I think as long as we follow a similar pattern it's fine, but it's more important to make the interface suitable for deletes. Here we can consider 2 cases:

rewrite equality -> position: we actually need the reverse of the current information, which is a list of data files associated with the delete file.

rewrite position -> data: the current interface using FileScanTask seems suitable because we need to generate something similar for rewrite anyway.

I see a few different ways we can go with this:

use this current interface, the "scan task" is just a holder of delete and data files organized in the form ready for a scan. equality -> position can then reorganize information.

invent a new interface suitable for both use cases, something like a Iterable<RewriteDeleteTask>

instead of having a single rewrite strategy for both, having one for each, so that we can have something customized for equality delete.

I think there is a third use case: simply compacting position deletes.

Also, will position -> data be something we take here or in RewriteDataFiles? I can see it being part of the data compaction. Like if we detect that certain data files have too many matching delete files, we want to rewrite them.

I agree. This is the major compaction that we mentioned before. And I think this may be already available through RewriteDataFiles.

@jackye1995, In order to make API more generic, maybe we can go back to use empty args and let the strategy implementation maintain the map like the implementation here. So that users could choose the deletes in their own way, for example, read from manifest directly. And since we are rewriting deletes, it looks more natural to return Iterable<DeleteFile>.

@rdblue @jackye1995 WDYT?

@aokolnychyi thanks for the feedback!

I think there is a third use case: simply compacting position deletes.

Yes agree, that maps to the Hive minor compaction as well.

will position -> data be something we take here or in RewriteDataFiles? I can see it being part of the data compaction. Like if we detect that certain data files have too many matching delete files, we want to rewrite them.

I remember I had a discussion around this with Russell when he was implementing bin packing. When we rewrite data files, yes position delete information is merged to that, but delete files are not dropped, because the delete file might still reference other files that are not included during bin-packing. We discussed the possibility for also adding that feature in the data compaction action, the conclusion was to have another action specifically for this use case to (1) not make data compaction too complicated, (2) have separated actions for different purposes, (3) have better SLA control for different actions.

rdblue · 2021-09-13T16:52:09Z

core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java

+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);


How does this strategy handle sequence numbers? Are all of the new delete files committed to the table at the next sequence number? If so, wouldn't that mean that this may only return position delete files?

Good question! Besides converting to position deletes we could split equality delete into small ones so that each file scan can carry small equality deletes as Jack mentioned before. But as you pointed out, I think it would be hard to handle the sequence number since the committed snapshot should be immutable. Can we do the trick?

I'm not sure how much benefit that splitting can provide, but I believe small files bring worse problems. How about we make rewriteDeletes more specific? Like removeEqDeletes and rewritePosDeletes.

When I was talking about that use case I was also in experimental phase, but so far what I see is that in practice the 2 most commonly used ones are still (1) equality -> position, (2) position -> data. Splitting equality deletes is most of the time not worth the computation effort because it's already doing a planning and it's more cost efficient to just rewrite it to position deletes.

Going back to the conversation we are having for the interface for selectDeletesToRewrite, it feels like we should have 2 different interfaces for rewriting equality and position deletes so that their interfaces can be a bit more specialized for their own use cases?

Thanks for the verification! @jackye1995. Now, I'm inclined to use more specific ones. @rdblue WDYT?

Since we can choose a specific rewrite strategy in RewriteDeleteFiles, and we don't have another strategy to handle equality deletes, I think we could keep this one as it is. Does that sound reasonable to you? @rdblue @jackye1995 .

aokolnychyi · 2021-09-17T00:22:45Z

I'll do a pass today too.

aokolnychyi · 2021-09-17T00:33:36Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java

+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {


nit: shall we add some Javadoc similarly to RewriteDataFiles?

/** * An action for rewriting delete files according to a rewrite strategy. * Generally used for optimizing the sizing and layout of delete files within a table. */

aokolnychyi · 2021-09-17T00:35:48Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java

+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {


nit: RewriteDeletes -> RewriteDeleteFiles to match RewriteDataFiles for consistency?

aokolnychyi · 2021-10-15T22:29:22Z

api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java

+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {


Are there any other conversions we plan to support other than equality -> position? According to the doc, it seems to be the only conversion we consider.

If so, would it make sense to call it ConvertEqualityDeleteFiles? Just asking. Then we can drop the method below.

Make sense to me, the below function should be used in the original API definition which has both rewrite and converts stuff

aokolnychyi · 2021-10-15T22:30:58Z

api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java

+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.


I am not sure the second sentence is accurate. It applies more to the rewrite action.

Hm, I think the first one should be enough, let me delete them.

aokolnychyi · 2021-10-15T22:33:25Z

api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java

+    /**
+     * Returns the count of the added position delete files.
+     */
+    int addedDeleteFilesCount();


If we think this action will be always about converting equality to position, then these methods can be a bit more specific just like their Javadoc.

For example, we can call them convertedEqualityDeleteFilesCount and addedPositionDeleteFilesCount.

Make sense to me. Will do.

aokolnychyi · 2021-10-15T22:38:20Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java

+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.


nit: In the doc, we are still debating whether a rewrite of equality deletes makes sense. I agree it is useful to have a generic name for this action and I like RewriteDeleteFiles. Should we ensure the Javadoc is generic too? What about dropping position in the class doc?

Instead of make API and doc generic, how about make them more specific? I mean change RewriteDeleteFiles to RewritePositionDeleteFiles.

aokolnychyi · 2021-10-15T22:41:20Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java

+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();


Question. There is consensus that bin-packing position deletes is probably the most useful rewrite. However, we may add a rewrite that would also reshuffle deletes, not only bin-pack them. We don't know whether that will be needed and when we will implement it but what about calling this binPackPositionDeletes similar to binPack in data compaction? That way, we can add another strategy later as combineXXX seems vague.

I call this combineXXX because I think bin-pack is a kind of combined action. But this indeed may be vague if we have something like reshuffle in the future. Let me make this more specific.

aokolnychyi · 2021-10-15T22:46:16Z

api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java

+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {


I assume we will compact each partition separately so we may want to provide some file group stats in the future, just like we do in data compaction. Too early to tell whether that will be useful, though. Just something to keep in mind.

Thanks. We could always improve the API later.

aokolnychyi · 2021-10-15T22:53:00Z

core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java

+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {


@RussellSpitzer, we may want to rename RewriteStrategy into something data-specific since we are about to add more rewrite strategies.

aokolnychyi · 2021-10-15T23:04:25Z

Any other thoughts, @RussellSpitzer @rdblue @jackye1995?

rdblue · 2021-10-17T19:59:06Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.


I think that "find" would be a better verb than "choose". This doesn't really allow choice, it limits the parts of the table that are considered.

OK, will update.

rdblue · 2021-10-17T20:00:05Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.


I don't think that there is a conversion strategy here, right? We can probably remove that.

rdblue · 2021-10-17T20:04:11Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);


Should we have a partition filter and a row/data filter, or just a row filter? Since equality delete files are stored by partition, I think that we will actually convert a filter to a partition filter and then rewrite any equality delete file in that partition. So it may make sense to allow both. If, for example, you have a table that is bucketed and you rewrite the deletes in 1/10th of the buckets every day. Then you'd want to be able to specify id_bucket IN (...) rather than supplying a data filter.

Make sense to me. Even a partition filter is enough, but let me keep both.

@rdblue , Anton mentioned that we could always convert the data filter to a partition filter, just want to confirm that do we have any specific reason to keep the partition filter?

Yeah, I guess you could use in(bucket("id"), 1, 2, 3). That would be fine. We can remove the partition filter.

Thanks, done!

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java

rdblue · 2021-10-19T16:20:41Z

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java

+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.


Nit: should be "size" not "sizing"

rdblue · 2021-10-19T16:20:59Z

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java

+    extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
+
+  /**
+   * bin-pack the position deletes.


Should be capitalized.

rdblue · 2021-10-19T16:21:46Z

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java

+   *
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles binPackPositionDeletes();


Why add PositionDeletes to the end of the method name and not just binPack()? Also, if we aren't exposing multiple strategies in the near term, should we simply remove this?

Make sense to remove this, ConvertEqualityDeleteFiles already remove the convertEqualityDelete().

rdblue · 2021-10-19T16:23:34Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.


I think that this should mention that this filter will be converted to a partition filter with an inclusive projection and that matching delete files will be rewritten. It should be clear that it isn't possible to match individual deletes, only whole files.

will add doc.

rdblue · 2021-10-19T16:24:41Z

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java

+   * @param expression An iceberg expression used to find deletes.
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles rowFilter(Expression expression);


I think that we plan to convert this to a partition filter using an inclusive projection like the equality delete rewrite, correct? @jackye1995, is there a use case for matching files and then using stats to filter out position delete files that don't match? I can't think of one.

rdblue · 2021-10-19T16:28:48Z

core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java

+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();


In the data file rewrite strategy, the equivalent method accepts an Iterable<FileScanTask> Should this be passed an Iterable<DeleteFile> so that this is selecting from the already loaded files rather than responsible for finding delete files itself?

In my POC implementation, there is a local variable that holds the iterable. Accepting the Itrerable<FileScanTask> also sounds reasonable to me. Let up add it.

rdblue · 2021-10-19T16:29:44Z

core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java

+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);


Nit: The order of methods doesn't match RewriteStrategy for data files.

will update.

rdblue · 2021-10-25T21:18:58Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+   * A row filter for finding the equality deletes to convert.
+   * <p>
+   * The row filter will be converted to a partition filter with an inclusive projection, so that candidate deletes are
+   * selected if any row match the expression. The matching delete files will be converted to position delete files.


This isn't quite correct. An inclusive projection matches if a file can contain rows that match. But it doesn't necessarily. I would change this to "Any file that may contain rows matching this filter will be used by the action."

An inclusive metrics evaluator evaluates the file in that way. But when the expression was projected to partition filter, since the partition value of the file is determined, the matching logic should be determined. Please correct me if I understand wrongly.

rdblue · 2021-10-25T21:19:41Z

api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java

+   * @param expression An iceberg expression used to find deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles rowFilter(Expression expression);


Since we have just one filter, there is no need to add "row" in the method name. Let's change it back to filter to match RewriteDataFiles.

aokolnychyi · 2021-11-02T19:24:19Z

This looks good to me. Let's merge it and follow up with PRs that consume these interfaces. We can make adjustments as needed later.

aokolnychyi · 2021-11-02T19:25:23Z

Thanks @chenjunjiedada for the hard work. Thanks everyone for reviewing!

API: add an action API for rewrite deletes

a2eee90

github-actions bot added API core labels Jul 19, 2021

RussellSpitzer approved these changes Jul 19, 2021

View reviewed changes

jackye1995 reviewed Jul 19, 2021

View reviewed changes

add more API functions

ed51fc7

jackye1995 approved these changes Jul 21, 2021

View reviewed changes

yyanyy reviewed Jul 28, 2021

View reviewed changes

chenjunjiedada added 2 commits July 30, 2021 11:17

update according to Jack's comments

0697cf5

use more common api

9537fa1

jackye1995 approved these changes Aug 2, 2021

View reviewed changes

yyanyy reviewed Aug 4, 2021

View reviewed changes

yyanyy approved these changes Aug 4, 2021

View reviewed changes

rdblue reviewed Sep 13, 2021

View reviewed changes

address comments

c68303b

aokolnychyi reviewed Sep 17, 2021

View reviewed changes

aokolnychyi reviewed Oct 15, 2021

View reviewed changes

address comments

dee5962

rdblue reviewed Oct 17, 2021

View reviewed changes

add partition filter and minor doc updates

70b7162

rdblue reviewed Oct 19, 2021

View reviewed changes

api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java Outdated Show resolved Hide resolved

rdblue reviewed Oct 19, 2021

View reviewed changes

chenjunjiedada added 2 commits October 20, 2021 21:50

address more comments

dcb3201

remove partition filter

a6a2540

rdblue reviewed Oct 25, 2021

View reviewed changes

minor words update

6d0853b

aokolnychyi approved these changes Nov 2, 2021

View reviewed changes

aokolnychyi merged commit 1c62453 into apache:master Nov 2, 2021

		import org.apache.iceberg.Table;

		public interface RewriteDeleteStrategy {

API: add an action API for rewrite deletes #2841

API: add an action API for rewrite deletes #2841

Uh oh!

Conversation

chenjunjiedada commented Jul 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chenjunjiedada commented Jul 19, 2021

Uh oh!

RussellSpitzer left a comment

Choose a reason for hiding this comment

Uh oh!

jackye1995 Jul 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenjunjiedada Jul 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenjunjiedada commented Jul 22, 2021

Uh oh!

chenjunjiedada commented Jul 23, 2021

Uh oh!

chenjunjiedada commented Jul 28, 2021

Uh oh!

jackye1995 commented Jul 28, 2021

Uh oh!

yyanyy commented Jul 28, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jackye1995 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenjunjiedada Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aokolnychyi Sep 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenjunjiedada commented Aug 4, 2021

Uh oh!

chenjunjiedada commented Aug 4, 2021

Uh oh!

chenjunjiedada commented Sep 13, 2021

chenjunjiedada commented Jul 19, 2021 •

edited

Loading

jackye1995 Jul 19, 2021 •

edited

Loading

chenjunjiedada Jul 20, 2021 •

edited

Loading

chenjunjiedada Aug 4, 2021 •

edited

Loading

aokolnychyi Sep 17, 2021 •

edited

Loading

aokolnychyi Sep 17, 2021 •

edited

Loading