Core: Add validation for row-level deletes with rewrites #2865
BaseRewriteFiles.java
@@ -21,8 +21,12 @@ | ||||||||
|
|
|||||||||
| import java.util.Set; | |||||||||
| import org.apache.iceberg.relocated.com.google.common.base.Preconditions; | |||||||||
| import org.apache.iceberg.relocated.com.google.common.collect.Sets; | |||||||||
|
|
|||||||||
| class BaseRewriteFiles extends MergingSnapshotProducer<RewriteFiles> implements RewriteFiles { | |||||||||
| private final Set<DataFile> replacedDataFiles = Sets.newHashSet(); | |||||||||
| private Long startingSnapshotId = null; | |||||||||
|
|
|||||||||
| BaseRewriteFiles(String tableName, TableOperations ops) { | |||||||||
| super(tableName, ops); | |||||||||
|
|
|||||||||
|
|
@@ -63,15 +67,16 @@ private void verifyInputAndOutputFiles(Set<DataFile> dataFilesToDelete, Set<Dele | ||||||||
| } | |||||||||
|
|
|||||||||
| @Override | |||||||||
| public RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete, | |||||||||
| public RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace, | |||||||||
| Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd) { | |||||||||
| verifyInputAndOutputFiles(dataFilesToDelete, deleteFilesToDelete, dataFilesToAdd, deleteFilesToAdd); | |||||||||
| verifyInputAndOutputFiles(dataFilesToReplace, deleteFilesToReplace, dataFilesToAdd, deleteFilesToAdd); | |||||||||
|
Member
As we've already changed the parameter names to `dataFilesToReplace` and `deleteFilesToReplace`, should we also update the error messages to match?
Contributor
Author
@openinx, I had a version where I did that and fixed the error messages, but it ended up being too many unrelated test changes, so I'll do that in a follow-up.
|||||||||
| replacedDataFiles.addAll(dataFilesToReplace); | |||||||||
|
|
|||||||||
| for (DataFile dataFile : dataFilesToDelete) { | |||||||||
| for (DataFile dataFile : dataFilesToReplace) { | |||||||||
| delete(dataFile); | |||||||||
| } | |||||||||
|
|
|||||||||
| for (DeleteFile deleteFile : deleteFilesToDelete) { | |||||||||
| for (DeleteFile deleteFile : deleteFilesToReplace) { | |||||||||
| delete(deleteFile); | |||||||||
| } | |||||||||
|
|
|||||||||
|
|
@@ -85,4 +90,18 @@ public RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile | ||||||||
|
|
|||||||||
| return this; | |||||||||
| } | |||||||||
|
|
|||||||||
| @Override | |||||||||
| public RewriteFiles validateFromSnapshot(long snapshotId) { | |||||||||
| this.startingSnapshotId = snapshotId; | |||||||||
| return this; | |||||||||
| } | |||||||||
|
|
|||||||||
| @Override | |||||||||
| protected void validate(TableMetadata base) { | |||||||||
| if (replacedDataFiles.size() > 0) { | |||||||||
| // if there are replaced data files, there cannot be any new row-level deletes for those data files | |||||||||
| validateNoNewDeletesForDataFiles(base, startingSnapshotId, replacedDataFiles); | |||||||||
|
Member
I think we should also handle the case where a rewrite converts equality delete files into positional delete files while the data files they reference change concurrently. Do we plan to add that validation in a following PR, or was this case not considered because we currently haven't introduced any RewriteFiles action that converts equality delete files into positional delete files?
Member
In my view, the data conflicts between the REPLACE operation and the APPEND/OVERWRITE/DELETE operations come from the dependency of positional delete files on data files. That leads to two cases: new row-level deletes committed against data files that are being replaced, and data files being replaced while the delete files that reference them are being rewritten.
As a final solution, I think we should handle both cases, so that we won't encounter any unexpected interruption when reading the Iceberg table.
Contributor
Author
@openinx, I opened a draft PR, #2892, to fix this, but I'm not sure that we actually want to merge it. Originally, I copied the same kind of validation, but then I realized that not all of those cases apply. For example, if I delete a file that has row deltas, that's perfectly okay: the operation actually deletes all rows that weren't already deleted, and any dangling positional deletes will just never get applied. Similarly, an overwrite from, for example, a MERGE INTO operation that replaces a file will need to first read that file and then build its replacement. The read must merge the existing deletes, so the rows will already be removed from the replacement file. The deletes in that case are also fine to leave dangling, even if they are rewritten into a new file.

I ended up leaving a validation that the data files are not rewritten, but I also think that may not be necessary, because I don't think it would be reasonable to rewrite a data file without also applying the deletes. If we build a compaction operation, we will probably apply deletes during compaction. We could also build one that compacts just data files, but for that operation to be correct it would also need to rewrite the delete files that point to the compacted data files, or use an older sequence number with equality deletes. If it rewrote delete files, then the delete rewrite would fail. If it changed sequence numbers, then we would have a lot more validation to do when building that plan.

After thinking about it, I don't think that we actually need to handle any cases where a compaction and a delete file rewrite are concurrent, at least not from the delete file rewrite side. Any compaction should handle deletes as part of the compaction.
Member
You mean people will need to know whether a data-files-only compaction will break the existing data set? Doesn't that put too much burden on users? I think we still need to provide the validation, because we as developers are sure that those cases will break data correctness ...
Contributor
Author
The contract of the rewrite operation is that the caller accounts for any deletes that apply to the files it is rewriting, and I think that is a reasonable requirement for the operation. Compacting only data files without accounting for deletes would change the table data, even without a concurrent delete file rewrite. For example, compacting a data file while ignoring an equality delete that applies to its rows would bring the deleted rows back.

We have to trust the compaction to handle this. That means the caller must do one of the following:

1. Apply the existing deletes while rewriting the data files.
2. Rewrite the delete files so that they track the newly written data files.
3. Commit the compacted files with an older sequence number so that existing equality deletes still apply to them.

If the caller uses option 1, then it doesn't matter to concurrent operations that are rewriting the delete: it becomes a dangling delete that doesn't need to be applied. If the caller uses option 2, then there is a new file tracking the delete that would not be part of a concurrent rewrite operation; a concurrent rewrite may contain a dangling delete. The only case we would need to worry about is option 3, and only if a concurrent delete rewrite converts an equality delete into a positional delete.

To be careful, we could merge #2892 to be able to validate this case. But I'm skeptical that altering the sequence number of the compacted file is a good idea. It may not even be possible if the sequence number of a delete that must be applied is between the sequence numbers of the starting data files. As I said, I'm okay not merging the validation for option 3 because I think it is unlikely to be a viable strategy. But I'm fine merging #2892 if you think it is needed.
|||||||||
| } | |||||||||
| } | |||||||||
| } | |||||||||
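For reference, here is a minimal sketch of how a compaction caller could combine the new `validateFromSnapshot` hook with `rewriteFiles`. The `table`, `filesToReplace`, and `compactedFiles` names are assumed fixtures, not code from this PR.

```java
// Capture the snapshot the compaction was planned against, then ask the rewrite
// to verify that no new row-level deletes for the replaced files landed since then.
long startingSnapshotId = table.currentSnapshot().snapshotId();

// ... plan and write the compacted files against startingSnapshotId ...

table.newRewrite()
    .validateFromSnapshot(startingSnapshotId)
    .rewriteFiles(filesToReplace, ImmutableSet.of(),   // data and delete files to replace
        compactedFiles, ImmutableSet.of())             // data and delete files to add
    .commit();
```

If a conflicting delete was committed in between, the commit fails with the `ValidationException` added below instead of silently dropping the delete.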
DeleteFileIndex.java
@@ -312,6 +312,7 @@ static Builder builderFor(FileIO io, Iterable<ManifestFile> deleteManifests) { | |
| static class Builder { | ||
| private final FileIO io; | ||
| private final Set<ManifestFile> deleteManifests; | ||
| private long minSequenceNumber = 0L; | ||
|
Member
I remember that sequence number 0 is reserved for data files in Iceberg v1, so in theory the sequence numbers of delete files start from 1. Setting 0 as the default value sounds correct.
||
| private Map<Integer, PartitionSpec> specsById = null; | ||
| private Expression dataFilter = Expressions.alwaysTrue(); | ||
| private Expression partitionFilter = Expressions.alwaysTrue(); | ||
|
|
@@ -323,6 +324,11 @@ static class Builder { | |
| this.deleteManifests = Sets.newHashSet(deleteManifests); | ||
| } | ||
|
|
||
| Builder afterSequenceNumber(long seq) { | ||
| this.minSequenceNumber = seq; | ||
| return this; | ||
| } | ||
|
|
||
| Builder specsById(Map<Integer, PartitionSpec> newSpecsById) { | ||
| this.specsById = newSpecsById; | ||
| return this; | ||
|
|
@@ -357,8 +363,10 @@ DeleteFileIndex build() { | |
| .run(deleteFile -> { | ||
| try (CloseableIterable<ManifestEntry<DeleteFile>> reader = deleteFile) { | ||
| for (ManifestEntry<DeleteFile> entry : reader) { | ||
| // copy with stats for better filtering against data file stats | ||
| deleteEntries.add(entry.copy()); | ||
| if (entry.sequenceNumber() > minSequenceNumber) { | ||
| // copy with stats for better filtering against data file stats | ||
| deleteEntries.add(entry.copy()); | ||
| } | ||
| } | ||
| } catch (IOException e) { | ||
| throw new RuntimeIOException(e, "Failed to close"); | ||
|
|
||
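As a usage note for the new builder option, here is a short sketch mirroring the caller added in MergingSnapshotProducer below; `io`, `deleteManifests`, `specsById`, and `startingSequenceNumber` are assumed to be in scope. With the default of 0, nothing is filtered out, since delete files always have sequence numbers of at least 1.

```java
// Index only the delete files committed after the starting sequence number, so
// deletes that were already visible when the rewrite was planned are ignored.
DeleteFileIndex newDeletesOnly = DeleteFileIndex.builderFor(io, deleteManifests)
    .afterSequenceNumber(startingSequenceNumber)
    .specsById(specsById)
    .build();
```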
MergingSnapshotProducer.java
@@ -41,6 +41,7 @@ | |
| import org.apache.iceberg.relocated.com.google.common.collect.Lists; | ||
| import org.apache.iceberg.relocated.com.google.common.collect.Sets; | ||
| import org.apache.iceberg.util.CharSequenceSet; | ||
| import org.apache.iceberg.util.Pair; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
|
|
@@ -62,6 +63,9 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> { | |
| ImmutableSet.of(DataOperations.OVERWRITE, DataOperations.REPLACE, DataOperations.DELETE); | ||
| private static final Set<String> VALIDATE_DATA_FILES_EXIST_SKIP_DELETE_OPERATIONS = | ||
| ImmutableSet.of(DataOperations.OVERWRITE, DataOperations.REPLACE); | ||
| // delete files can be added in "overwrite" or "delete" operations | ||
| private static final Set<String> VALIDATE_REPLACED_DATA_FILES_OPERATIONS = | ||
|
Contributor
nit: I am not sure the var name conveys its purpose, and it does not follow the convention used in the two vars above. If I understand correctly, this is the set of operations that may produce delete files.
Contributor
Author
I'll update this, and I'll also add "delete" as an operation. Just because we only create "overwrite" commits for row-level deletes today doesn't mean that we won't create "delete" commits later. I considered updating that as well; we can discuss whether to convert overwrite commits to delete commits later, but for now I think that we should add "delete" to this list because it would be a valid commit. Just because the reference implementation doesn't produce them doesn't mean it isn't valid.
Contributor
I had the same thought about using the DELETE operation type when we just add delete files.
Contributor
Author
Actually, I think the variable name does follow the convention: if we are validating replaced data files, then those are the types of commits that are needed for validation. I'm going to update this by adding the "delete" operation.
||
| ImmutableSet.of(DataOperations.OVERWRITE, DataOperations.DELETE); | ||
|
|
||
| private final String tableName; | ||
| private final TableOperations ops; | ||
|
|
@@ -253,28 +257,10 @@ protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotI | |
| return; | ||
| } | ||
|
|
||
| List<ManifestFile> manifests = Lists.newArrayList(); | ||
| Set<Long> newSnapshots = Sets.newHashSet(); | ||
|
|
||
| Long currentSnapshotId = base.currentSnapshot().snapshotId(); | ||
| while (currentSnapshotId != null && !currentSnapshotId.equals(startingSnapshotId)) { | ||
| Snapshot currentSnapshot = ops.current().snapshot(currentSnapshotId); | ||
|
|
||
| ValidationException.check(currentSnapshot != null, | ||
| "Cannot determine history between starting snapshot %s and current %s", | ||
| startingSnapshotId, currentSnapshotId); | ||
|
|
||
| if (VALIDATE_ADDED_FILES_OPERATIONS.contains(currentSnapshot.operation())) { | ||
| newSnapshots.add(currentSnapshotId); | ||
| for (ManifestFile manifest : currentSnapshot.dataManifests()) { | ||
| if (manifest.snapshotId() == (long) currentSnapshotId) { | ||
| manifests.add(manifest); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| currentSnapshotId = currentSnapshot.parentId(); | ||
| } | ||
| Pair<List<ManifestFile>, Set<Long>> history = | ||
| validationHistory(base, startingSnapshotId, VALIDATE_ADDED_FILES_OPERATIONS, ManifestContent.DATA); | ||
| List<ManifestFile> manifests = history.first(); | ||
| Set<Long> newSnapshots = history.second(); | ||
|
|
||
| ManifestGroup conflictGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of()) | ||
| .caseSensitive(caseSensitive) | ||
|
|
@@ -297,6 +283,39 @@ protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotI | |
| } | ||
| } | ||
|
|
||
| /** | ||
| * Validates that no new delete files that must be applied to the given data files have been added to the table since | ||
| * a starting snapshot. | ||
| * | ||
| * @param base table metadata to validate | ||
| * @param startingSnapshotId id of the snapshot current at the start of the operation | ||
| * @param dataFiles data files to validate have no new row deletes | ||
| */ | ||
| protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, | ||
| Iterable<DataFile> dataFiles) { | ||
| // if there is no current table state, no files have been added | ||
| if (base.currentSnapshot() == null) { | ||
| return; | ||
| } | ||
|
|
||
| Pair<List<ManifestFile>, Set<Long>> history = | ||
| validationHistory(base, startingSnapshotId, VALIDATE_REPLACED_DATA_FILES_OPERATIONS, ManifestContent.DELETES); | ||
| List<ManifestFile> deleteManifests = history.first(); | ||
|
|
||
| long startingSequenceNumber = startingSnapshotId == null ? 0 : base.snapshot(startingSnapshotId).sequenceNumber(); | ||
| DeleteFileIndex deletes = DeleteFileIndex.builderFor(ops.io(), deleteManifests) | ||
| .afterSequenceNumber(startingSequenceNumber) | ||
| .specsById(ops.current().specsById()) | ||
| .build(); | ||
|
|
||
| for (DataFile dataFile : dataFiles) { | ||
|
Contributor
nit: Do we have to iterate through the data files if the index is empty and we did not find any delete manifests? This logic will be triggered on top of various operations, including copy-on-write MERGE.
Contributor
Author
The data files here are the ones that are in memory because they are being replaced. I don't think that this is going to be a significant slow-down since we're just doing an index check, but we can follow up with a couple of improvements to make it faster.

One improvement I'd opt for before avoiding this loop is to only read manifest files that were created in the new snapshots, that is, when the snapshot ID of the delete file is one of the snapshots newer than the starting snapshot. We don't currently do that because the delete file index builder doesn't support it and it looked more invasive to update the index builder (and we're trying to get 0.12.0 out).
Contributor
Oh, that would be way more important than skipping this loop.
||
| // if any delete is found that applies to files written in or before the starting snapshot, fail | ||
| if (deletes.forDataFile(startingSequenceNumber, dataFile).length > 0) { | ||
| throw new ValidationException("Cannot commit, found new delete for replaced data file: %s", dataFile); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| @SuppressWarnings("CollectionUndefinedEquality") | ||
| protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotId, | ||
| CharSequenceSet requiredDataFiles, boolean skipDeletes) { | ||
|
|
@@ -309,6 +328,31 @@ protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotI | |
| VALIDATE_DATA_FILES_EXIST_SKIP_DELETE_OPERATIONS : | ||
| VALIDATE_DATA_FILES_EXIST_OPERATIONS; | ||
|
|
||
| Pair<List<ManifestFile>, Set<Long>> history = | ||
| validationHistory(base, startingSnapshotId, matchingOperations, ManifestContent.DATA); | ||
| List<ManifestFile> manifests = history.first(); | ||
| Set<Long> newSnapshots = history.second(); | ||
|
|
||
| ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of()) | ||
| .filterManifestEntries(entry -> entry.status() != ManifestEntry.Status.ADDED && | ||
| newSnapshots.contains(entry.snapshotId()) && requiredDataFiles.contains(entry.file().path())) | ||
| .specsById(base.specsById()) | ||
| .ignoreExisting(); | ||
|
|
||
| try (CloseableIterator<ManifestEntry<DataFile>> deletes = matchingDeletesGroup.entries().iterator()) { | ||
| if (deletes.hasNext()) { | ||
| throw new ValidationException("Cannot commit, missing data files: %s", | ||
|
Member
"Cannot commit, deletes have been added to the table which refer to data files which no longer exist"? I think I got that right here ... It would help me a lot if the message here was a bit more explanatory about what went wrong and why.
Contributor
Author
This is an existing error message. We can fix it, but I'd prefer to do it in a separate commit so that we don't have changes to more test cases than necessary. I also plan to fix the initial validation in a follow-up.
||
| Iterators.toString(Iterators.transform(deletes, entry -> entry.file().path().toString()))); | ||
| } | ||
|
|
||
| } catch (IOException e) { | ||
| throw new UncheckedIOException("Failed to validate required files exist", e); | ||
| } | ||
| } | ||
|
|
||
| private Pair<List<ManifestFile>, Set<Long>> validationHistory(TableMetadata base, Long startingSnapshotId, | ||
| Set<String> matchingOperations, | ||
| ManifestContent content) { | ||
| List<ManifestFile> manifests = Lists.newArrayList(); | ||
| Set<Long> newSnapshots = Sets.newHashSet(); | ||
|
|
||
|
|
@@ -322,31 +366,25 @@ protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotI | |
|
|
||
| if (matchingOperations.contains(currentSnapshot.operation())) { | ||
| newSnapshots.add(currentSnapshotId); | ||
| for (ManifestFile manifest : currentSnapshot.dataManifests()) { | ||
| if (manifest.snapshotId() == (long) currentSnapshotId) { | ||
| manifests.add(manifest); | ||
| if (content == ManifestContent.DATA) { | ||
| for (ManifestFile manifest : currentSnapshot.dataManifests()) { | ||
| if (manifest.snapshotId() == (long) currentSnapshotId) { | ||
||
| manifests.add(manifest); | ||
| } | ||
| } | ||
| } else { | ||
| for (ManifestFile manifest : currentSnapshot.deleteManifests()) { | ||
| if (manifest.snapshotId() == (long) currentSnapshotId) { | ||
| manifests.add(manifest); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| currentSnapshotId = currentSnapshot.parentId(); | ||
| } | ||
|
|
||
| ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of()) | ||
| .filterManifestEntries(entry -> entry.status() != ManifestEntry.Status.ADDED && | ||
| newSnapshots.contains(entry.snapshotId()) && requiredDataFiles.contains(entry.file().path())) | ||
| .specsById(base.specsById()) | ||
| .ignoreExisting(); | ||
|
|
||
| try (CloseableIterator<ManifestEntry<DataFile>> deletes = matchingDeletesGroup.entries().iterator()) { | ||
| if (deletes.hasNext()) { | ||
| throw new ValidationException("Cannot commit, missing data files: %s", | ||
| Iterators.toString(Iterators.transform(deletes, entry -> entry.file().path().toString()))); | ||
| } | ||
|
|
||
| } catch (IOException e) { | ||
| throw new UncheckedIOException("Failed to validate required files exist", e); | ||
| } | ||
| return Pair.of(manifests, newSnapshots); | ||
| } | ||
|
|
||
| @Override | ||
|
|
||
I see all the `snapshotId` values are set to `ops.current().snapshotId()` except in the test cases? I'm thinking, is it necessary to introduce a method that validates since an older history snapshot id?

I was wondering if this needs to be an open API? It seems like we always want to set it to table.currentSnapshot.snapshotID?

This is necessary because the current snapshot's ID may not be the one that was used for the operation. This RewriteFiles operation may be created significantly later than the original planning that is done for compaction. For example, in the new Spark compaction code, the rewrite is created when a group is ready to commit. The table state may have changed since that time, so using the current snapshot ID could easily skip delta commits that were interleaved, leaving the same issue that we are fixing here.

You can also see this in the test case, where we actually create commits and then run the rewrite differently to simulate concurrent operations: https://github.com/apache/iceberg/pull/2865/files#diff-b914d99ace31a09150792c10e3b0f1e40331edb1c24233ee6912d4608cfde0c4R615-R643

OK, that makes sense!
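The linked test is not reproduced on this page, so here is a hedged sketch of the pattern it exercises, with assumed fixtures (`FILE_A`, `FILE_A_DELETES`, `FILE_A_PRIME`): capture the starting snapshot, let a concurrent row-level delete commit, then expect the rewrite to fail validation.

```java
// The rewrite validates from the snapshot it was planned against, not from the
// current snapshot, so the interleaved delete below is detected.
long startingSnapshotId = table.currentSnapshot().snapshotId();

// a concurrent commit adds positional deletes for the file being rewritten
table.newRowDelta()
    .addDeletes(FILE_A_DELETES)
    .commit();

AssertHelpers.assertThrows("Should reject rewrite of a file with new deletes",
    ValidationException.class, "found new delete for replaced data file",
    () -> table.newRewrite()
        .validateFromSnapshot(startingSnapshotId)
        .rewriteFiles(ImmutableSet.of(FILE_A), ImmutableSet.of(),
            ImmutableSet.of(FILE_A_PRIME), ImmutableSet.of())
        .commit());
```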