-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-5553] Prevent partition(s) from being dropped if there are pending… #7669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
ebdb3b6
d4feb2e
dae2ca6
0f6bfe7
45d75d1
431d249
c4ecd3d
b75ee03
c1896d9
cd84566
6e1d03f
343b22a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,6 +27,7 @@ | |
| import org.apache.hudi.common.model.WriteOperationType; | ||
| import org.apache.hudi.common.table.timeline.HoodieInstant; | ||
| import org.apache.hudi.common.table.timeline.TimelineMetadataUtils; | ||
| import org.apache.hudi.common.table.view.SyncableFileSystemView; | ||
| import org.apache.hudi.common.util.HoodieTimer; | ||
| import org.apache.hudi.common.util.collection.Pair; | ||
| import org.apache.hudi.config.HoodieWriteConfig; | ||
|
|
@@ -37,11 +38,13 @@ | |
| import org.apache.hudi.table.action.HoodieWriteMetadata; | ||
|
|
||
| import java.time.Duration; | ||
| import java.util.ArrayList; | ||
| import java.util.Collections; | ||
| import java.util.HashMap; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.stream.Collectors; | ||
| import java.util.stream.Stream; | ||
|
|
||
| import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED; | ||
| import static org.apache.hudi.common.table.timeline.HoodieTimeline.REPLACE_COMMIT_ACTION; | ||
|
|
@@ -62,6 +65,8 @@ public FlinkDeletePartitionCommitActionExecutor(HoodieEngineContext context, | |
|
|
||
| @Override | ||
| public HoodieWriteMetadata<List<WriteStatus>> execute() { | ||
| checkPreconditions(); | ||
|
|
||
| try { | ||
| HoodieTimer timer = new HoodieTimer().startTimer(); | ||
| context.setJobStatus(this.getClass().getSimpleName(), "Gather all file ids from all deleting partitions."); | ||
|
|
@@ -98,4 +103,43 @@ private List<String> getAllExistingFileIds(String partitionPath) { | |
| // because new commit is not complete. it is safe to mark all existing file Ids as old files | ||
| return table.getSliceView().getLatestFileSlices(partitionPath).map(FileSlice::getFileId).distinct().collect(Collectors.toList()); | ||
| } | ||
|
|
||
| /** | ||
| * Check if there are any pending table service actions (requested + inflight) on a table affecting the partitions to | ||
| * be dropped. | ||
| * <p> | ||
| * This check is to prevent a drop-partition from proceeding should a partition have a table service action in | ||
| * the pending stage. If this is allowed to happen, the filegroup that is an input for a table service action, might | ||
| * also be a candidate for being replaced. As such, when the table service action and drop-partition commits are | ||
| * committed, there will be two commits replacing a single filegroup. | ||
| * <p> | ||
| * For example, a timeline might have an execution order as such: | ||
| * 000.replacecommit.requested (clustering filegroup_1 + filegroup_2 -> filegroup_3) | ||
| * 001.replacecommit.requested, 001.replacecommit.inflight, 0001.replacecommit (drop_partition to replace filegroup_1) | ||
| * 000.replacecommit.inflight (clustering is executed now) | ||
| * 000.replacecommit (clustering completed) | ||
| * For an execution order as shown above, 000.replacecommit and 001.replacecommit will both flag filegroup_1 to be replaced. | ||
| * This will cause downstream duplicate key errors when a map is being constructed. | ||
| */ | ||
| private void checkPreconditions() { | ||
| List<String> instantsOfOffendingPendingTableServiceAction = new ArrayList<>(); | ||
| // ensure that there are no pending inflight clustering/compaction operations involving this partition | ||
| SyncableFileSystemView fileSystemView = (SyncableFileSystemView) table.getSliceView(); | ||
|
|
||
| // separating the iteration of pending compaction operations from clustering as they return different stream types | ||
| Stream.concat(fileSystemView.getPendingCompactionOperations(), fileSystemView.getPendingLogCompactionOperations()) | ||
| .filter(op -> partitions.contains(op.getRight().getPartitionPath())) | ||
| .forEach(op -> instantsOfOffendingPendingTableServiceAction.add(op.getLeft())); | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not only compaction, all the table service in pending should be avoided right ?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, compaction and clustering should be avoided. In line >133, the pending clustering groups are iterated over for validation.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1, for example, there are pending clustering instants for the partition to be deleted.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, is there any updates for pending clustering instants?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please refer to this comment: |
||
| fileSystemView.getFileGroupsInPendingClustering() | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @SteNicholas CMIIW, no additional commits were pushed for that as it is already evaluated in the original commit. Does this evaluation require any enhancement?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, could the pending compaction and clustering logic combine?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! Will change it to improve readability. The implementation here was copied over from
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, I got above point. IMO, this could improve the readability and performance because the operation of getting the pending compaction and clustering instants from FileSystemView consume a little performance and time.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I took a look at the code again, I forgot to mention that:
and
returns different I'll add a comment to note this. Not really sure how one can improve performance for this given. Are there any other APIs that we can use to improve the performance of this?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, you could introduce new interface to get all pending table service action instant. |
||
| .filter(fgIdInstantPair -> partitions.contains(fgIdInstantPair.getLeft().getPartitionPath())) | ||
| .forEach(x -> instantsOfOffendingPendingTableServiceAction.add(x.getRight().getTimestamp())); | ||
|
|
||
| if (instantsOfOffendingPendingTableServiceAction.size() > 0) { | ||
| throw new HoodieDeletePartitionException("Failed to drop partitions. " | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is any pending compaction or clustering action, is it better to wait the action completion and then drop partition?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In such a case, the Spark-SQL session might get stuck waiting for extended periods of time if there are multiple compaction/clustering plans involving the partition to be dropped. If there are 5 compaction/clustering jobs involving a partition, we will need to wait for all 5 pending jobs to finish before we can drop the partition.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, IMO, from user perspective, there are always compaction/clustering actions for inline table service in Flink jobs and users are hard to control when to drop partition. Therefore, this could wait for the completion of compaction/clustering and drop the partition immediately. WDYT?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. I have given this a thought before. There are a few ways around this issue. The list below is ranked in terms of difficulty to implement (easiest to hardest).
As such, i chose to implement the first option first. A "full" fix is definitely required, this PR is to address the immediate problem of correctness through adding and making limitations known.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, IMO, users could check the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeap!
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, we could introduce this validation in this pull request and create another ticket for waiting for the completion of pending table service. WDYT?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMHO, that would be most ideal! There are many ways around the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @voonhous, after improving above readability, I have no any problem for this pull request.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| + "Please ensure that there are no pending table service actions (clustering/compaction) for the partitions to be deleted: " + partitions + ". " | ||
| + "Instant(s) of offending pending table service action: " | ||
| + instantsOfOffendingPendingTableServiceAction.stream().distinct().collect(Collectors.toList())); | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this introduce an untility class into util package of hudi-client-common? And then the Flink and Spark client could invoke this method.