Spark: support replace equality deletes to position deletes #2216
Conversation
Hi @rdblue @aokolnychyi @openinx, this is a draft for replacing deletes. Could you please take a look and check whether this is the right direction? I'd also like to add another API in the action to compact multiple position delete files into one.
I'll try to make time this week. Thanks for working on this, @chenjunjiedada!
Map<StructLikeWrapper, Collection<FileScanTask>> groupedTasks = groupTasksByPartition(tasksWithEqDelete.iterator());

// Split and combine tasks under each partition
// TODO: can we split task?
Yes, we can split the task based on DataFile(s) here. But that introduces another issue: the current balance policy (for splitting tasks) only considers the DataFile, while the ideal approach would consider both the insert file size and the delete file size. I think there should be a separate issue to address this.
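A minimal sketch of that idea, assuming the split weight of a task is its data file size plus the total size of its delete files (the class and method names below are hypothetical, not the existing planner API):

```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileScanTask;

class SplitWeights {
  // Hypothetical helper: weight a scan task by its data file size plus the total
  // size of its delete files, so the balancing policy considers both sides.
  static long taskWeight(FileScanTask task) {
    long deleteBytes = task.deletes().stream()
        .mapToLong(DeleteFile::fileSizeInBytes)
        .sum();
    return task.file().fileSizeInBytes() + deleteBytes;
  }
}
```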
Filed an issue for this: #2298
Broadcast<EncryptionManager> encryption = sparkContext.broadcast(encryptionManager());

DeleteRewriter deleteRewriter = new DeleteRewriter(table, caseSensitive, io, encryption);
List<DeleteFile> posDeletes = deleteRewriter.toPosDeletes(taskRDD);
I'd like to move the RDD chaining out of the DeleteRewriter class, so that we could reuse that class for other compute engines' ReplaceDeleteAction:
List<DeleteFile> posDeletes = taskRDD.map(deleteRewriter::toPosDeletes)
    .collect()
    .stream()
    .flatMap(Collection::stream)
    .collect(Collectors.toList());
OK, the DeleteRewriter still uses a few other Spark classes such as SparkAppenderFactory. We may need to abstract that part of the logic so that we can reuse the rewrite logic between different engines.
+1
scan.deletes().stream().anyMatch(delete -> delete.content().equals(FileContent.EQUALITY_DELETES))
);

List<DeleteFile> eqDeletes = Lists.newArrayList();
Nit: I think eqDeletes would be better defined as a HashSet, because different FileScanTasks will share the same equality delete files. (Though we already use a HashSet to deduplicate the equality delete files in RewriteFiles, I still think it's better to do this before calling that API.)
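A minimal sketch of that suggestion, assuming tasksWithEqDelete is the already-filtered task collection from the action:

```java
// Use a Set so that equality delete files shared by several FileScanTasks are
// only collected once before calling the RewriteFiles API.
Set<DeleteFile> eqDeletes = Sets.newHashSet();
for (FileScanTask task : tasksWithEqDelete) {
  task.deletes().stream()
      .filter(delete -> delete.content() == FileContent.EQUALITY_DELETES)
      .forEach(eqDeletes::add);
}
```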
CloseableIterable.transform(CloseableIterable.concat(deleteRecords), Record::copy),
    deleteSchema.asStruct());

matchedRecords = CloseableIterable.concat(Lists.newArrayList(matchedRecords, Deletes.match(remainRecords,
Here I'm concerned it may not be worth taking on such high complexity. Let's define the whole data set as S: for the first equality field ids <1,2> the delete set is S1, for the second equality field ids <1,2> the delete set is S2, and for the third equality field ids <2,5> the delete set is S3.
Finally, the concatenated matchedRecords will be
(S ∩ S1) ∪ ((S − S1) ∩ S2) ∪ ((S − S1 − S2) ∩ S3)
(here S − S1 means all elements that are in set S but not in set S1).
Though the current code returns the correct converted positional deletes, won't it iterate the big data set S three times? That overhead would be very large...
I think it would not iterate the data set several times since these are iterable chains and should be computed lazily.
For a filter chain over a data set of N elements with filters (F1, F2, F3, F4) that filter out (N1, N2, N3, N4) items respectively, I think it iterates the data set once and the number of filter calls should be:
- F1: N times
- F2: (N - N1) times
- F3: (N - N1 - N2) times
- F4: (N - N1 - N2 - N3) times

For a matching chain over a data set of N elements with filters (F1, F2, F3, F4) that match out (N1, N2, N3, N4) items respectively, I think it iterates the data set once and the number of filter calls should be:
- F1: 2N times (filter and match)
- F2: 2(N - N1) times (filter and match)
- F3: 2(N - N1 - N2) times (filter and match)
- F4: 2(N - N1 - N2 - N3) times (filter and match)
@rdblue Could you please help to correct me if I am wrong?
Here is an alternative implementation that collects all delete sets in a list and does the projection in the filter. It doesn't depend on temporary iterables and looks a bit more straightforward. I could change to this one if you like it.
public static <T> CloseableIterable<T> match(CloseableIterable<T> rows,
                                             BiFunction<T, StructProjection, StructLike> rowToDeleteKey,
                                             List<Pair<StructProjection, StructLikeSet>> unprojectedDeleteSets) {
  if (unprojectedDeleteSets.isEmpty()) {
    return rows;
  }

  EqualitySetDeleteMatcher<T> equalityFilter = new EqualitySetDeleteMatcher<>(rowToDeleteKey, unprojectedDeleteSets);
  return equalityFilter.filter(rows);
}

private static class EqualitySetDeleteMatcher<T> extends Filter<T> {
  private final List<Pair<StructProjection, StructLikeSet>> deleteSets;
  private final BiFunction<T, StructProjection, StructLike> extractEqStruct;

  protected EqualitySetDeleteMatcher(BiFunction<T, StructProjection, StructLike> extractEq,
                                     List<Pair<StructProjection, StructLikeSet>> deleteSets) {
    this.extractEqStruct = extractEq;
    this.deleteSets = deleteSets;
  }

  @Override
  protected boolean shouldKeep(T row) {
    // Keep (i.e. emit as "matched") any row that hits at least one delete set.
    for (Pair<StructProjection, StructLikeSet> deleteSet : deleteSets) {
      if (deleteSet.second().contains(extractEqStruct.apply(row, deleteSet.first()))) {
        return true;
      }
    }
    return false;
  }
}

PS: For delete files with the same equality field IDs we will collect the deletes into one set.
Let me post it first.
> I think it would not iterate the data set several times since these are iterable chains and should be computed lazily.
That's incorrect. To analyze the complexity, we only need to consider the key statement:

Deletes.match(remainRecords, record -> projectRow.wrap(asStructLike(record)), deleteSet)

The final returned matchedRecords is composed of several of the above Iterables. When we iterate this Iterable, we will scan all the elements in remainRecords, so in the end we will scan the original data set multiple times. That's why I said the complexity is too high.
Thanks @openinx for reviewing! I will update ASAP.
 * @param deletesToAdd files that will be added, cannot be null or empty.
 * @return this for method chaining
 */
RewriteFiles rewriteDeletes(Set<DeleteFile> deletesToDelete, Set<DeleteFile> deletesToAdd);
Before we start replacing equality deletes with position deletes, I think we need to refactor the RewriteFiles API to cover more cases (a rough sketch of such an API follows this list):
- Rewriting data files and removing all the delete rows: the files to delete will be a set of data files and a set of delete files, and the files to add will be a set of data files.
- Replacing equality deletes with position deletes: the files to delete will be a set of equality delete files (will we need to ensure that all delete files are equality delete files?), and the files to add will be a set of position delete files.
- Merging small delete files into bigger delete files: the files to delete will be a set of equality/position delete files, and the files to add will be a set of equality/position delete files.
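A rough sketch of what such an extended interface could look like; the exact method names and signatures here are assumptions, not the final API:

```java
import java.util.Set;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.SnapshotUpdate;

public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
  // Case 1: rewrite data files and drop the delete rows that applied to them.
  RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete,
                            Set<DataFile> dataFilesToAdd);

  // Cases 2 and 3: replace equality deletes with position deletes, or merge
  // small delete files into bigger ones.
  RewriteFiles rewriteDeletes(Set<DeleteFile> deletesToDelete, Set<DeleteFile> deletesToAdd);
}
```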
That makes sense to me. I think we could parallelize the API refactoring and the implementation.
return filter.filter(rows);
}

public static <T> CloseableIterable<T> match(CloseableIterable<T> rows,
match doesn't seem like a good name to me for expressing the meaning of finding the existing row data that hits the equality delete sets. I may need a better name for this.
openinx left a comment:
The design looks good to me overall now. I think it's time to split the bigger PR into several smaller PRs for review. FYI @rdblue @aokolnychyi.
try (CloseableIterator<FileScanTask> iterator = tasksIter) {
  iterator.forEachRemaining(task -> {
    StructLikeWrapper structLike = StructLikeWrapper.forType(spec.partitionType()).set(task.file().partition());
    if (TableScanUtil.hasDeletes(task)) {
Here the task must have at least one delete file, because the Collection<FileScanTask> was obtained by filtering for EQUALITY_DELETES delete files.
deleteRowReader.close();
deleteRowReader = null;

posDeleteWriter.close();
We don't have to close the posDeleteWriter here because the following complete() will close it internally.
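A minimal sketch of the suggested flow, assuming complete() flushes and closes the underlying writer before returning the produced files (variable names taken from the quoted diff):

```java
deleteRowReader.close();
deleteRowReader = null;

// No explicit posDeleteWriter.close() is needed: complete() closes the writer
// internally and returns the position delete files that were written.
List<DeleteFile> completedDeletes = posDeleteWriter.complete();
```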
PartitionKey key = new PartitionKey(spec, schema);
key.partition(task.first());
Could we just pass the PartitionKey when calling groupTasksByPartition in ReplaceDeleteAction? Then we wouldn't have to compute the partition again here; task.first() is actually already the partition value for the current task.
You are right! The original logic here has a problem: it passes the partition value to PartitionKey, which expects a data row. I updated the writer constructor to accept a StructLike instead of a PartitionKey to fix this. Let me update the unit tests as well to cover this.
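A hypothetical sketch of that constructor change (the class name is made up for illustration): accept the already-computed partition value as a StructLike instead of re-deriving a PartitionKey from it.

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;

class PartitionedDeleteWriter {
  private final PartitionSpec spec;
  private final StructLike partition;

  // The caller passes task.first() (the grouped partition value) straight through,
  // instead of wrapping it in a PartitionKey that expects a full data row.
  PartitionedDeleteWriter(PartitionSpec spec, StructLike partition) {
    this.spec = spec;
    this.partition = partition;
  }
}
```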
public RewriteFiles rewriteDeletes(Set<DeleteFile> deletesToDelete, Set<DeleteFile> deletesToAdd) {
  Preconditions.checkArgument(deletesToDelete != null && !deletesToDelete.isEmpty(),
      "Files to delete cannot be null or empty");
  Preconditions.checkArgument(deletesToAdd != null && !deletesToAdd.isEmpty(),
This check is incorrect, because if none of the equality deletes hits the data files, then there will be no position deletes to produce.
I would suggest adding a unit test for this.
I understand your concern. The check is used to discard an invalid rewrite; we don't want to continue the rewrite if there are no position deletes produced, do we?
This kind of rewrite is actually valid because it replaces all of the useless equality delete files with empty position delete files. After the rewrite action, the normal read path doesn't have to filter against the useless equality deletes again, which is a great performance improvement. So we do have to commit the RewriteFiles transaction here.
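A minimal sketch of the relaxed validation being discussed (not necessarily the exact check in the linked PR): the set of deletes to remove must be non-empty, but the replacement set may be empty when no equality delete matches any row.

```java
Preconditions.checkArgument(deletesToDelete != null && !deletesToDelete.isEmpty(),
    "Files to delete cannot be null or empty");
// An empty deletesToAdd is allowed: the rewrite still drops useless equality deletes.
Preconditions.checkArgument(deletesToAdd != null, "Files to add cannot be null");
```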
You could see the validation in the extended RewriteFiles API here ( https://github.com/apache/iceberg/pull/2294/files#diff-b92a78b7fb207d4979d503a442189d9d096e4d19519a4b83eed9e1e779843810R68)
Makes sense to me! I will update then.
}

private CloseableIterable<T> applyEqDeletes(CloseableIterable<T> records) {
public CloseableIterable<T> matchEqDeletes(CloseableIterable<T> records) {
Looks like this method is mostly the same as applyEqDeletes except for the predicate evaluation, do we want to abstract the common logic out?
Yes, in this separate PR, we've abstracted them into a single method. https://github.com/apache/iceberg/pull/2320/files#diff-a6641d31cdfd66835b3447bef04be87786849126b07761e47b852837f67a988aR151
deleteSetFilters.add(predicate);
}

Filter<T> findDeleteRows = new ChainOrFilter<>(deleteSetFilters);
Do we need an extra class for this? This seems to be achievable via something like
return CloseableIterable.filter(records, record ->
    deleteSetFilters.stream().anyMatch(filter -> filter.test(record)));
We've removed the ChainOrFilter in the committed PR #2320; the reviewed patch should have addressed your concern.
DeleteRewriter deleteRewriter = new DeleteRewriter(table, caseSensitive, io, encryption);
List<DeleteFile> posDeletes = deleteRewriter.toPosDeletes(taskRDD);

if (!eqDeletes.isEmpty() && !posDeletes.isEmpty()) {
I think the comment "This kind of rewrite is actually valid because it replaces all the useless equality files with empty position delete files" also applies here, so we don't need to check for empty posDeletes?
Yeah, you're correct!
// update the current file for Spark's filename() function
InputFileBlockHolder.set(file.path().toString(), task.start(), task.length());

return matches.matchEqDeletes(open(task, requiredSchema, idToConstant)).iterator();
Looks like this is mostly the only line that differs between this class and RowDataReader, so I think we can abstract a lot of the code out.
The newly introduced EqualityDeleteReader should have addressed your comment: https://github.com/apache/iceberg/pull/2320/files#diff-6dc9ab9ec3abcb1972bc39e5c0f0fa95b00a822c0b8996b3c94d2dc702381fe4R34
Broadcast<EncryptionManager> encryption = sparkContext.broadcast(encryptionManager());

DeleteRewriter deleteRewriter = new DeleteRewriter(table, caseSensitive, io, encryption);
List<DeleteFile> posDeletes = deleteRewriter.toPosDeletes(taskRDD);
+1
@yyanyy Thanks a lot for your review! I will update ASAP.
@chenjunjiedada any update? Thanks.
Why was this MR closed?
This adds a Spark action to replace equality deletes with position deletes, which I think of as a minor compaction. The logic is:
This adds an API in RewriteFiles to rewrite equality deletes to position deletes. It should keep the same semantics as the current API: the rows must be the same before and after the rewrite. This could also be used to combine position deletes to reduce the number of small files.
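A minimal usage sketch, assuming the rewriteDeletes API proposed in this PR, where eqDeletes are the equality delete files being replaced and posDeletes are the newly written position delete files:

```java
table.newRewrite()
    .rewriteDeletes(Sets.newHashSet(eqDeletes), Sets.newHashSet(posDeletes))
    .commit();
```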