Core: Support delete file stats in partitions metadata table #6661
Conversation
cc: @szehon-ho, @RussellSpitzer, @rdblue, @flyrain
```java
Types.NestedField.required(1, "partition", Partitioning.partitionType(table)),
Types.NestedField.required(2, "record_count", Types.LongType.get()),
Types.NestedField.required(3, "file_count", Types.IntegerType.get()),
Types.NestedField.required(4, "spec_id", Types.IntegerType.get()));
```
I thought moving the spec-id to the end looks cleaner, as all the stats will be together.
As these are not stored tables and are computed freshly on each query, I thought there was no need to worry about compatibility. Let me know if I am wrong.
In the past we have generally said that readers should not rely on the position of fields in tables, only on names. That said, every time we do something like this I think we end up breaking somebody.
If we are going to move things, it should probably go before file and record count rather than after all the stats.
OK, thanks. I think we can move spec_id to field id = 2 (before file and record count).
@szehon-ho: What is your opinion on this?
@RussellSpitzer I think it was that we should keep ids/names but not necessarily order. Ref: #3015 (comment)
Didn't see us break anything that time; we eventually resolved #3015 (comment) as a classpath issue with old classes still in play.
That being said, we should not change the id of spec_id here, but it is OK to move it to after partition as you are suggesting.
Thanks for linking the previous discussions. I will keep the name and id as they are while moving the field.
@szehon-ho: b) I have split the refactoring work of this PR into a separate one, #6975, to reduce the review effort on this PR (we need to merge that one first).
```java
private int dataFileCount;

private final Set<DeleteFile> equalityDeleteFiles;
private final Set<DeleteFile> positionDeleteFiles;
```
Can you double-check that DeleteFile equals/hashCode work correctly if there is a conflict? Should we rather do something safer, like paths? Maybe it would even be better for memory?
```java
Set<DeleteFile> deleteFilesToReplace,
```
Set<DeleteFile> was already used in the existing code, as seen above.
But there are no equals() and hashCode() implementations for these classes, which means the default ones are used.
Note that Set<DataFile> also exists in the code and it also doesn't have an equals/hashCode implementation.
I am OK with using paths. But I am just wondering why it doesn't exist, and maybe we can handle these in a follow-up PR instead of this one.
So I am not so confident: if, for example, you have two FileScanTasks returning deletes() that are the same logical file, will the two DeleteFile objects be equal or not? Java's default equals() is instance equality, isn't it? And even if it does work, it may break if we implement those methods at some point. Hence the suggestion to use something with an established hashCode/equals.
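To make the concern concrete, here is a minimal, self-contained sketch (the LogicalFile class is hypothetical, not Iceberg's DeleteFile): without equals()/hashCode(), a HashSet deduplicates by object identity, so two instances describing the same logical file are counted twice, while value-based keys such as path strings deduplicate correctly.

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityEqualityDemo {
  // Hypothetical stand-in for a metadata class that does not override equals()/hashCode().
  static class LogicalFile {
    final String path;

    LogicalFile(String path) {
      this.path = path;
    }
  }

  public static void main(String[] args) {
    Set<LogicalFile> files = new HashSet<>();
    files.add(new LogicalFile("s3://bucket/data/delete-1.parquet"));
    files.add(new LogicalFile("s3://bucket/data/delete-1.parquet"));
    // Prints 2: the default equals() is reference equality, so the same logical
    // file ends up in the set twice when the instances differ.
    System.out.println(files.size());

    Set<String> paths = new HashSet<>();
    paths.add("s3://bucket/data/delete-1.parquet");
    paths.add("s3://bucket/data/delete-1.parquet");
    // Prints 1: String has value-based equals()/hashCode().
    System.out.println(paths.size());
  }
}
```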
The same DeleteFile object from the context is reused while building the file scan here:

```java
DeleteFile[] deleteFiles = ctx.deletes().forEntry(entry);
```

Hence, I believe the default equals() is enough.
I don't disagree about adding equals and hashCode for DataFile and DeleteFile. But the code is already widely using Set<DataFile> and Set<DeleteFile>, so if we are adding them, it should be in a separate PR/discussion.
Let us see what others think on this.
cc: @RussellSpitzer, @jackye1995
Yeah, it may work due to the cache inside DeleteFileIndex, but I don't know. I see Set<DeleteFile> elsewhere, but it seems those cases are limited to ones that cannot conflict with each other except by instance equality. I would also be interested in the best way here. Will ping @aokolnychyi to take a look as well.
I'd be more worried about the memory usage here; it seems like we have to keep the entire set of delete files in memory (for all partitions) while we are building this table?
I think I agree in this case; it's probably best to go forward with just storing their paths.
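A minimal sketch of the path-based accumulator being suggested here (class and field names are illustrative, not the PR's actual code): deduplicate on the delete file path, which has a well-defined equals()/hashCode() and avoids retaining whole DeleteFile objects.

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.iceberg.DeleteFile;

// Illustrative per-partition accumulator keyed on delete file paths.
class DeleteFileStats {
  private final Set<String> positionDeletePaths = new HashSet<>();
  private final Set<String> equalityDeletePaths = new HashSet<>();

  void update(DeleteFile file) {
    switch (file.content()) {
      case POSITION_DELETES:
        positionDeletePaths.add(file.path().toString());
        break;
      case EQUALITY_DELETES:
        equalityDeletePaths.add(file.path().toString());
        break;
      default:
        throw new IllegalArgumentException("Unexpected delete content: " + file.content());
    }
  }

  int positionDeleteFileCount() {
    return positionDeletePaths.size();
  }

  int equalityDeleteFileCount() {
    return equalityDeletePaths.size();
  }
}
```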
done.
Build failed due to a flaky test failure. Debugging it in a separate PR; will just retrigger the build.
```java
this.equalityDeleteFiles = Sets.newHashSet();
}

private void update(FileScanTask task) {
```
I think I would strongly want to consider an approach which either disposes of the sets when the partition info is done being constructed, or doesn't use this set approach at all. It looks like in the current implementation we end up keeping the entire set of delete file objects in memory indefinitely.
I also had a similar concern and thought of using PositionDeletesTableScan, but it seems there is no equivalent EqualityDeletesTableScan yet.
Maybe another approach here could be going through each partition and using DeleteFileIndex one by one.
Actually, I think that way will be quite expensive (two passes).
Probably the only way to do it effectively, until this whole table is migrated over to some kind of view of the 'files' table, is to rewrite the PartitionsTableScan to directly use the underlying code, ManifestReader.readDeleteManifest() / ManifestReader.read(), and then go through those iterators instead of using the ManifestGroup.planFiles() / FileScanTask way.
That way, we can iterate through the DataFile/DeleteFile entries and collect the number of delete files/data files in one pass without keeping them in memory. It's definitely doable, but it will be a bit more work. Any thoughts?
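A rough sketch of that direction, under the assumption that simple per-partition counters are enough (the Counts class is hypothetical, filtering is omitted, and the real table coerces partitions of different specs to a unified partition type rather than using raw partition structs as map keys): read data and delete manifests directly and update counters as entries stream by, so no file objects are retained.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.ManifestReader;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.FileIO;

class OnePassPartitionStats {
  // Hypothetical counter holder; the real PR tracks more stats per partition.
  static class Counts {
    long dataFileCount;
    long deleteFileCount;
  }

  static Map<StructLike, Counts> collect(Table table, Snapshot snapshot) throws IOException {
    FileIO io = table.io();
    Map<StructLike, Counts> stats = new HashMap<>();

    // one pass over data manifests
    for (ManifestFile manifest : snapshot.dataManifests(io)) {
      try (ManifestReader<DataFile> reader = ManifestFiles.read(manifest, io, table.specs())) {
        for (DataFile file : reader) {
          stats.computeIfAbsent(file.partition(), p -> new Counts()).dataFileCount += 1;
        }
      }
    }

    // one pass over delete manifests
    for (ManifestFile manifest : snapshot.deleteManifests(io)) {
      try (ManifestReader<DeleteFile> reader =
          ManifestFiles.readDeleteManifest(manifest, io, table.specs())) {
        for (DeleteFile file : reader) {
          stats.computeIfAbsent(file.partition(), p -> new Counts()).deleteFileCount += 1;
        }
      }
    }

    return stats;
  }
}
```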
> Probably the only way to do it effectively, until this whole table is migrated over to some kind of view of the 'files' table, is to rewrite the PartitionsTableScan to directly use the underlying code, ManifestReader.readDeleteManifest() / ManifestReader.read(), and then go through those iterators instead of using the ManifestGroup.planFiles() / FileScanTask way.

@szehon-ho: I have spent some time on it and realized that I am not super familiar with this side of the code. Would you like to contribute a PR that replaces ManifestGroup.planFiles() for data files? I can then extend it to delete files and handle the stats updates.
Yeah, I am thinking of something along the lines of what BaseFilesTable/ManifestsTable currently does. Sure, let me take a look when I get a chance.
Thanks. I will also explore those files.
BTW, I have updated it to use a Map<String, Integer> instead of a set and now clear it after use. But I am looking forward to solving it as you suggested, by changing planFiles.
| 3, "file_count", Types.IntegerType.get(), "Count of data files")); | ||
| 3, "file_count", Types.IntegerType.get(), "Count of data files"), | ||
| Types.NestedField.required( | ||
| 5, |
Note: spec_id (4) is present at the top.
| "pos_delete_record_count", | ||
| Types.LongType.get(), | ||
| "Count of records in position delete files"), | ||
| Types.NestedField.required(6, "pos_delete_file_count", Types.IntegerType.get()), |
Can we also add descriptions for the file count fields?
FYI, we are looking at a refactor of the existing partitions table to make this task easier and alleviate the concern about keeping a large hash set (@dramaticlly also mentioned interest in looking at it as well and may contribute).
@szehon-ho, @RussellSpitzer, @jackye1995: I have reworked the PR based on #7189 now. Please review this PR again.
```java
// cache a position map needed by each partition spec to normalize partitions to final schema
Map<Integer, int[]> normalizedPositionsBySpec =
    Maps.newHashMapWithExpectedSize(table.specs().size());
try (CloseableIterable<DataFile> datafiles = planDataFiles(scan);
```
I wanted to just keep planFiles(), which returns CloseableIterable<ContentFile<?>>, but I couldn't get it working while transforming it into a ParallelIterable.
I would say to try this if we can, both to clean up the code and because we don't want to do another scan for perf reasons.
I think it may work by doing CloseableIterable.transform() in the worst case to cast the type.
@ajantha-bhat can we see about this? Will it work to wrap the manifest iterable to get the right type in planDataFiles()?

```java
CloseableIterable.transform(
    ManifestFiles.read(manifest, table.io(), table.specs())
        .caseSensitive(scan.isCaseSensitive())
        .select(BaseScan.SCAN_COLUMNS)
        .entries(),
    t -> (ContentFile<?>) t);
```
updated
| 3, "file_count", Types.IntegerType.get(), "Count of data files"), | ||
| Types.NestedField.required( | ||
| 5, | ||
| "pos_delete_record_count", |
I think the full forms 'position_delete' and 'equality_delete' may be better, to match SnapshotSummary. Maybe even 'position_delete_count' and 'equality_delete_count' (SnapshotSummary has added_position_deletes, added_equality_deletes).
fixed.
Hi @ajantha-bhat, do you still plan to work on this? We are also interested in it and can give parameterizing the method a try.
I was out of office last week. Hopefully, I can work on it this week and get it merged.
@szehon-ho: PR is ready for review. Thanks.
szehon-ho left a comment:
Looks mostly good, some comments
```java
case POSITION_DELETES:
  this.posDeleteRecordCount = file.recordCount();
  this.posDeleteFileCount += 1;
  break;
```
How about specId here and below?
For this partition value, the spec id would already have been updated while updating the data file count.
I don't think there will be delete files without data file entries, so I assumed that updating it again here would be redundant. WDYT?
Updated it now, thinking that if the delete happens after partition evolution, it should reflect the latest spec id.
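A hedged sketch of what the updated accumulator looks like after this change (simplified; class and field names are illustrative and accumulation details may differ from the merged code): the spec id is refreshed for delete entries as well, so a delete written after partition evolution reflects the latest spec.

```java
import org.apache.iceberg.ContentFile;

// Illustrative per-partition stats holder; only the content switch is shown.
class PartitionStatsSketch {
  private int specId;
  private long dataRecordCount;
  private int dataFileCount;
  private long posDeleteRecordCount;
  private int posDeleteFileCount;
  private long eqDeleteRecordCount;
  private int eqDeleteFileCount;

  void update(ContentFile<?> file) {
    switch (file.content()) {
      case DATA:
        this.dataRecordCount += file.recordCount();
        this.dataFileCount += 1;
        this.specId = file.specId();
        break;
      case POSITION_DELETES:
        this.posDeleteRecordCount += file.recordCount();
        this.posDeleteFileCount += 1;
        this.specId = file.specId(); // deletes written after partition evolution carry the new spec
        break;
      case EQUALITY_DELETES:
        this.eqDeleteRecordCount += file.recordCount();
        this.eqDeleteFileCount += 1;
        this.specId = file.specId();
        break;
      default:
        throw new UnsupportedOperationException("Unsupported file content: " + file.content());
    }
  }
}
```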
```java
      partitionsTable.schema().asStruct(), expected.get(i), actual.get(i));
}

testDeleteStats(tableIdentifier, table, partitionsTable, builder, partitionBuilder, expected);
```
Could we make this another test? The existing test is already quite long and hard to read, and since the new method does more table modifications, it seems like it fits a separate test.
done.
I had to squash the commits for easy rebase and conflict resolution.
szehon-ho left a comment:
Thanks. Sorry about the rebase; I guess it was my intervening refactor. Anyway, looks good, just a minor comment.
```java
@Override
public Schema schema() {
  if (table().spec().fields().size() < 1) {
    return schema.select("record_count", "file_count");
```
I think there's actually a bug here: it only checks the latest spec for unpartitioned tables. If we had partition fields before but removed them, we will not show them. See other metadata tables like BaseFilesTable.schema.
Anyway, it's unrelated, but I made #7533 to track it.
Good catch. I will explore this in the follow-up.
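A minimal sketch of the kind of fix tracked in #7533 (a hypothetical helper, not the merged code): base the check on the unified partition type across all historical specs, the way other metadata tables do, instead of only the current spec.

```java
import org.apache.iceberg.Partitioning;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.types.Types;

class PartitionsSchemaSketch {
  // Hide the partition column only if no spec in the table's history ever had a partition field.
  static Schema partitionsSchema(Table table, Schema fullSchema) {
    Types.StructType unifiedPartitionType = Partitioning.partitionType(table);
    if (unifiedPartitionType.fields().isEmpty()) {
      // never-partitioned table; the real select list would also include the new delete-stats columns
      return fullSchema.select("record_count", "file_count");
    }

    return fullSchema;
  }
}
```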
```java
CloseableIterable.transform(
    ManifestFiles.open(manifest, table.io(), table.specs())
        .caseSensitive(scan.isCaseSensitive())
        .select(BaseScan.DELETE_SCAN_COLUMNS), // don't select stats columns
```
Nit: we can do a switch on content to get either DELETE_SCAN_COLUMNS or SCAN_COLUMNS to make it clearer. It looks a bit weird, but I guess it works since DELETE_SCAN_COLUMNS is a superset of SCAN_COLUMNS.
Updated it with the switch.
DELETE_SCAN_COLUMNS has only the content type as an extra field, which works for both data and delete files.
Agreed that the name looks weird when used for generic content; CONTENT_SCAN_COLUMNS would have been a more suitable name for it.
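The switch presumably ends up looking roughly like this (a sketch; it would live alongside PartitionsTable in the org.apache.iceberg package so BaseScan's column constants are visible):

```java
// Pick the manifest projection based on the manifest content.
private static List<String> scanColumns(ManifestContent content) {
  switch (content) {
    case DATA:
      return BaseScan.SCAN_COLUMNS;
    case DELETES:
      return BaseScan.DELETE_SCAN_COLUMNS;
    default:
      throw new UnsupportedOperationException("Cannot read unknown manifest type: " + content);
  }
}
```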
szehon-ho left a comment:
Will merge tomorrow if no further comments.
```java
    PartitionUtil.coercePartition(
        partitionType, table.specs().get(dataFile.specId()), dataFile.partition());
partitions.get(partition).update(dataFile);
try (CloseableIterable<ContentFile<?>> files = planFiles(scan)) {
```
Thanks for closing this
Merged, thanks @ajantha-bhat for the work!
Currently, the partitions metadata table only has data file stats.
When delete files are present, these stats are inaccurate (since we do not decrement the values).
So, capture the delete file stats to give a rough idea of why the data file stats are inaccurate.
Note that we are not applying the deletes to the data files and computing the effective result, as that would be a very expensive operation. Users are advised to run rewrite_data_files periodically to apply the delete files to the data files.
Delete file stats to be added:
- position_delete_record_count
- position_delete_file_count
- equality_delete_record_count
- equality_delete_file_count
Note:
Fixes #6042
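For reference, one way to inspect the new columns from Java once this is in (a sketch using the generic data reader; the column names follow the discussion above, and the table is assumed to be already loaded):

```java
import java.io.IOException;
import org.apache.iceberg.MetadataTableType;
import org.apache.iceberg.MetadataTableUtils;
import org.apache.iceberg.Table;
import org.apache.iceberg.data.IcebergGenerics;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.io.CloseableIterable;

public class PartitionsTableExample {
  // Prints the per-partition data and delete file counts from the partitions metadata table.
  static void printDeleteStats(Table table) throws IOException {
    Table partitions =
        MetadataTableUtils.createMetadataTableInstance(table, MetadataTableType.PARTITIONS);

    try (CloseableIterable<Record> rows = IcebergGenerics.read(partitions).build()) {
      for (Record row : rows) {
        System.out.printf(
            "partition=%s data_files=%s position_delete_files=%s equality_delete_files=%s%n",
            row.getField("partition"),
            row.getField("file_count"),
            row.getField("position_delete_file_count"),
            row.getField("equality_delete_file_count"));
      }
    }
  }
}
```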