-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: support rewrite data files with starting sequence number #3480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * @param sequenceNumber a sequence number | ||
| * @return this for method chaining | ||
| */ | ||
| RewriteFiles overrideSequenceNumberForNewDataFiles(long sequenceNumber); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is probably a shorter, more descriptive name for this. Something like commitAtSequenceNumber?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasons I used this long name are:
- this is not actually committing with the given sequence number, the overall sequence number still increments, but the sequence number provided is used in the manifest entry of the newly added data files.
- I was thinking maybe we will need to do something similar for delete files, that's why there is a suffix of
ForNewDataFiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points about naming, but this still seems long to me. Maybe we should pass the sequence number in with the data files then?
We could add rewriteFiles(Iterable<DataFile> toRemove, Iterable<DataFile> toAdd, long sequenceNumber). I think that's a bit more clear because you don't remove files at a sequence number. We also don't have the problem that it is confusing for delete files because it is the data file method only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a fairly clean way to configure everything now that I'm looking at context. Up to you whether to use rewriteFiles to pass it or a new method, but I think we should come up with a shorter name if we go with the new method approach. What about dataFileSequenceNumber?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think the new rewriteFiles method sounds good to me, let me update with that.
| * Defaults to false. | ||
| */ | ||
| String USE_STARTING_SEQUENCE_NUMBER = "use-starting-sequence-number"; | ||
| boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we wouldn't use this as the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. And I have some concerns. In what situation do users need to use the new sequence number when commit the rewritten data file instead of using the starting sequence number? Maybe this configuration is not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I used false here is because otherwise it changes the existing behavior that has been released. Are we allowed to change this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to change this behavior. It doesn't affect correctness and it will avoid conflicts. That's a win. I also can't imagine people relying on the behavior of sequence numbers at this level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I will update
| validateNoNewDeletesForDataFiles( | ||
| base, startingSnapshotId, conflictDetectionFilter, | ||
| deletedDataFiles, caseSensitive); | ||
| deletedDataFiles, caseSensitive, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear what's happening here. If you're introducing a new method argument, can you add an override so that this doesn't need to change? If you think that the boolean is needed, then can you add an inline comment to explain what you're setting?
| return writer.length(); | ||
| } | ||
|
|
||
| void useSequenceNumber(long sequenceNumber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the other configuration is passed in when creating the writer. Is there a reason not to do that here? The benefit of doing that is that the sequence number would not be mutable. So you wouldn't be able to do something like this:
ManifestWriter<DataFile> writer = new ManifestWriter(...);
writer.add(dataFile1);
writer.setSequenceNumber(N);
writer.add(dataFile2);It isn't really clear what the correct behavior for that code would be.
Also, if the sequence number was fixed at create time, then we could write the sequence number into every file that doesn't have a sequence number instead of using inheritance. I think that we prefer setting the sequence number on entries rather than inheriting. And also the manifest itself could have the correct sequence number when it was added.
If we always have the correct sequence number for manifests themselves, then I think we get what we wanted with the proposal to use two sequence numbers. The manifest sequence number is always when the ADDED files were actually added. The file sequence number is always when the the is in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cleanest way to do this is to add a new add method:
/**
* Add an added entry for a file at a specific sequence number.
* <p>
* The entry's snapshot ID will be this manifest's snapshot ID.
*
* @param addedFile a data file
* @param sequenceNumber sequence number for the data file
*/
public void add(F addedFile, long sequenceNumber) {
addEntry(reused.wrapAppend(snapshotId, sequenceNumber, addedFile));
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, will do it through add
| DeleteFile[] deleteFiles = deletes.forDataFile(startingSequenceNumber, dataFile); | ||
| if (ignoreEqualityDeletes) { | ||
| ValidationException.check(Arrays.stream(deleteFiles) | ||
| .noneMatch(deleteFile -> deleteFile.content() == FileContent.POSITION_DELETES), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you wrap the line after check? I think it looks strange to have a double indent because the first part of the line wasn't wrapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, | ||
| Expression dataFilter, Iterable<DataFile> dataFiles, | ||
| boolean caseSensitive) { | ||
| boolean caseSensitive, boolean ignoreEqualityDeletes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense for the internal API, but I generally prefer new method names for booleans, like validateNoNewPositionDeletesForDataFiles and validateNoNewDeletesForDataFiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add some comment to indicate why we could ignore the equality deletes and when we should ignore or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new method and moved this to a private method, please let me know if that is enough
| long minSeqNumber = minSequenceNumber != null ? minSequenceNumber : UNASSIGNED_SEQ; | ||
| return new GenericManifestFile(file.location(), writer.length(), specId, content(), | ||
| UNASSIGNED_SEQ, minSeqNumber, snapshotId, | ||
| manifestSequenceNumber, minSeqNumber, snapshotId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also set sequenceNumber for manifestfile, so i think maybe it is same as #3204. As @rdblue
say in #3204 (comment) maybe we need one new property like dataSeqnumber in manifestfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But i also think dataSeqnumber in manifestfile will cause more complex when we read data. And it maybe will cause some problem in version compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some different between #3204 and this PR. In this PR, the specific seqNum will be set to manifest file, and manifest list file will still got a new seqNum when we commit the snapshot of rewrite. In the result, we could got the incremental seqNum from snapshot and got spercific seqNum form data files because data file will inherit seqNum from menifest file but not from snapshot. Then we can use the seqNum of the data file to verify whether there are deleted files to modify the rewritten data files.
But in #3204, the seqNum of snapshot will override by specific seqNum, and we will got two snapshot which have same seqNum. So that, the monotonicity of snapshot seqNum will be break and we can not recognize which snapshot is the new one because they have same seqNum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue @Reo-LEI As fllows, it is the manifestFile content produce by this PR, it is also inherit old sequence_number and set it in manifestFile, what get same result as #3204.
{"manifest_path":"hdfs://ns/group/user/root/meta/hive-temp-table/iceberg_hive_catalog_test1.db/sample35/metadata/ee0fd321-8178-4631-b62b-50d6bb365bc2-m1.avro","manifest_length":6674,"partition_spec_id":0,"content":0,"sequence_number":1837,"min_sequence_number":1839,"added_snapshot_id":5384193052566473584,"added_data_files_count":1,"existing_data_files_count":0,"deleted_data_files_count":0,"added_rows_count":14,"existing_rows_count":0,"deleted_rows_count":0,"partitions":{"array":[]}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there may be some confusion here. This PR should be updated so that the manifest list and manifest files use a newly assigned sequence number and just the data file entries use the specified sequence number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue I think @jackye1995 use the same idea as mine just because iceberg apply eq-delete by manifestfile seqnumber, so it is easy to resolve this problem by this way.
What you worry in this comment, I think the manifest file information has snapshot id , and every snapshot has different seqnumber in metadata json file, we can recover everything as we want. So can you give an example that you worry about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hameizi, the snapshot and manifest list should be written with the latest sequence number so that we can track it. Re-sequenced files can override the sequence number, but we should have correct values in metadata. We will probably add another field in the manifests to inherit it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR should be updated so that the manifest list and manifest files use a newly assigned sequence number and just the data file entries use the specified sequence number.
Yes I think that is the approach used in this PR
| Snapshot baseSnapshot = table.currentSnapshot(); | ||
|
|
||
| // add an equality delete file | ||
| DeleteFile deleteFile1 = FileMetadata.deleteFileBuilder(table.spec()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some helpers for this (at least with data files)
newDeleteFile and newDataFile
In testTableBase
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am good with this, I agree if we want a true "this was added" id we can just keep that at the manifest level and change the "sequence number" to a "deletes up to this number applied"
| sequenceNumberForNewDataFiles != null); | ||
| } | ||
|
|
||
| protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you copy the javadoc for the previous version? I think it's helpful to have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added back
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question about how this works with V1 but this looks good to me.
|
|
||
| /** | ||
| * Validates that no new delete files that must be applied to the given data files have been added to the table since | ||
| * a starting snapshot, with the option to ignore equality deletes during the validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nbd, but I would add a note here about why we want to ignore equality deletes, just so future readers could understand.
| * Defaults to true. | ||
| */ | ||
| String USE_STARTING_SEQUENCE_NUMBER = "use-starting-sequence-number"; | ||
| boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is true, do we have to ignore it with V1 Tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking about that right now. Technically I think it has no harm for v1 tables, because the sequence number is always 0, and it is not read or written anywhere. Let me add a unit test for v1. Do you see any place this might affect v1 table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's all I was thinking, V1 Tables don't have sequence numbers so I just wanted to make sure they don't break if we are trying to set them.
| } | ||
|
|
||
| @Override | ||
| public RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly unrelated concern: should we be using Set here? It seems needlessly restrictive. Plus, DataFile is an interface, so you could easily pass files that don't implement equals/hashSet and are always considered unique. Just one thing that's always made me wonder about this API.
Not a blocker for this PR though!
|
Thanks, @jackye1995! Great to have this work done! |
Add new hook in
RewriteFilessnapshot update to allow accepting a sequence number that is used for all new data files. That will force the manifest writer to produceManifestFilewith the provided sequence number instead of-1.Also add a new property in
RewriteDataFilesto use this feature through configuse-starting-sequence-number. When enabled, the sequence number when compaction starts will be used for commit.This whole mechanism solves the issue today in CDC where compaction has conflicts with new equality delete files. With this change,
RewriteFilescan go through as long as the newly added data files don't have new position deletes.