
@ajantha-bhat
Member

@ajantha-bhat ajantha-bhat commented Jan 25, 2023

Currently, the partitions metadata table exposes only data file stats:

file_count
record_count

When delete files are present, these stats are inaccurate, because the values are not decremented for deleted rows.
So this PR captures the delete file stats as well, to give a rough idea of why the data file stats are inaccurate.
Note that the deletes are not applied to the data files to compute the effective result, as that would be a very expensive operation. Users are advised to run rewrite_data_files periodically to apply the delete files to the data files.

Delete file stats to be added:

pos_delete_file_count
pos_delete_record_count
eq_delete_file_count
eq_delete_record_count

Note:

  • Docs will be updated in a follow-up PR, probably after renaming file_count and record_count.
  • The same schema will also be used for the partition stats feature during implementation.

Fixes #6042
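
For illustration, here is a minimal sketch of one partitions-table row carrying the proposed counters. Only the column names come from the description above; the class `PartitionRowSketch` and the helper `countsMayBeOverstated` are hypothetical and not part of Iceberg.

```java
// Hypothetical model of one partitions-table row; not Iceberg code.
final class PartitionRowSketch {
  final long recordCount;          // record_count: rows in data files, deletes not applied
  final int fileCount;             // file_count: number of data files
  final long posDeleteRecordCount; // pos_delete_record_count
  final int posDeleteFileCount;    // pos_delete_file_count
  final long eqDeleteRecordCount;  // eq_delete_record_count
  final int eqDeleteFileCount;     // eq_delete_file_count

  PartitionRowSketch(
      long recordCount, int fileCount,
      long posDeleteRecordCount, int posDeleteFileCount,
      long eqDeleteRecordCount, int eqDeleteFileCount) {
    this.recordCount = recordCount;
    this.fileCount = fileCount;
    this.posDeleteRecordCount = posDeleteRecordCount;
    this.posDeleteFileCount = posDeleteFileCount;
    this.eqDeleteRecordCount = eqDeleteRecordCount;
    this.eqDeleteFileCount = eqDeleteFileCount;
  }

  // True when any delete file exists, i.e. record_count may overstate the live rows.
  boolean countsMayBeOverstated() {
    return posDeleteFileCount > 0 || eqDeleteFileCount > 0;
  }

  public static void main(String[] args) {
    PartitionRowSketch clean = new PartitionRowSketch(100L, 2, 0L, 0, 0L, 0);
    PartitionRowSketch withDeletes = new PartitionRowSketch(100L, 2, 5L, 1, 0L, 0);
    System.out.println(clean.countsMayBeOverstated());       // false
    System.out.println(withDeletes.countsMayBeOverstated()); // true
  }
}
```

The helper only flags a partition as possibly inaccurate; it does not try to compute the effective row count, which matches the decision above not to apply deletes.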

@github-actions github-actions bot added the core label Jan 25, 2023
@github-actions github-actions bot added the spark label Jan 25, 2023
@ajantha-bhat ajantha-bhat marked this pull request as draft January 25, 2023 16:55
@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 3 times, most recently from f5c14dc to 68571a1 Compare January 26, 2023 15:47
@ajantha-bhat ajantha-bhat changed the title [WIP] Core: Support delete files stats in partitions metadata table Core: Support delete file stats in partitions metadata table Jan 26, 2023
@ajantha-bhat ajantha-bhat marked this pull request as ready for review January 26, 2023 16:51
@ajantha-bhat
Member Author

Types.NestedField.required(1, "partition", Partitioning.partitionType(table)),
Types.NestedField.required(2, "record_count", Types.LongType.get()),
Types.NestedField.required(3, "file_count", Types.IntegerType.get()),
Types.NestedField.required(4, "spec_id", Types.IntegerType.get()));
Member Author

I thought moving spec_id to the end would look cleaner, since all the stats would then be together.

Since these tables are not stored and are computed fresh on every query, I assumed there is no need to worry about compatibility. Let me know if I am wrong.

Member

In the past we have generally said that readers should not rely on the position of fields in tables, only on names. That said, every time we do something like this I think we end up breaking somebody.

If we are going to move things, though, spec_id should probably go before the file and record counts rather than after all the stats.

Member Author

OK, thanks. I think we can move spec_id to field id = 2 (before the file and record counts).

@szehon-ho: What is your opinion on this?

Member

@RussellSpitzer I think the conclusion was that we should keep ids/names but not necessarily the order. Ref: #3015 (comment)

I didn't see that we broke anything that time; we eventually resolved #3015 (comment) as a classpath issue with old classes still in play.

Member

That being said, we should not change the id of spec_id here, but it is OK to move it to after partition, as you are both suggesting.

Member Author

Thanks for linking the previous discussions. I will keep the name and id as they are while moving the field.
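
To illustrate the outcome of this thread, the sketch below models a schema as an ordered list of (id, name) pairs and resolves a column by name, so that moving spec_id next to partition does not affect readers. `FieldSketch` and `SchemaOrderSketch` are hypothetical stand-ins, not Iceberg's `Types.NestedField` or `Schema`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema field; not Iceberg's Types.NestedField.
final class FieldSketch {
  final int id;
  final String name;

  FieldSketch(int id, String name) {
    this.id = id;
    this.name = name;
  }
}

final class SchemaOrderSketch {
  // Field order before the change: spec_id is last.
  static List<FieldSketch> before() {
    return Arrays.asList(
        new FieldSketch(1, "partition"),
        new FieldSketch(2, "record_count"),
        new FieldSketch(3, "file_count"),
        new FieldSketch(4, "spec_id"));
  }

  // Field order after the change: spec_id moved up, id 4 and name unchanged.
  static List<FieldSketch> after() {
    return Arrays.asList(
        new FieldSketch(1, "partition"),
        new FieldSketch(4, "spec_id"),
        new FieldSketch(2, "record_count"),
        new FieldSketch(3, "file_count"));
  }

  // Resolve a column position by name, so callers survive reordering.
  static int positionOf(List<FieldSketch> schema, String name) {
    for (int i = 0; i < schema.size(); i++) {
      if (schema.get(i).name.equals(name)) {
        return i;
      }
    }
    throw new IllegalArgumentException("No such field: " + name);
  }

  public static void main(String[] args) {
    System.out.println(positionOf(before(), "spec_id")); // 3
    System.out.println(positionOf(after(), "spec_id"));  // 1
    System.out.println(after().get(positionOf(after(), "spec_id")).id); // 4
  }
}
```

A reader that hard-codes position 3 for spec_id would break after the move, while a reader that looks the field up by name or id would not.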

@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 581643b to 0e05838 Compare February 20, 2023 13:24
@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from f157341 to 3134359 Compare March 1, 2023 12:34
@ajantha-bhat
Member Author

ajantha-bhat commented Mar 1, 2023

@szehon-ho:
a) I have addressed the duplicate delete file problem using a HashSet instead of PositionDeletesScan, because there is no similar logic for equality deletes and the code would look odd. Since this is internal logic, we can replace it later.

b) I have split the refactoring work of this PR into a separate one, #6975, to reduce the review effort on this PR (that one needs to be merged first).

private int dataFileCount;

private final Set<DeleteFile> equalityDeleteFiles;
private final Set<DeleteFile> positionDeleteFiles;
Member

@szehon-ho szehon-ho Mar 2, 2023

Can you double-check that DeleteFile equals/hashCode work correctly if there is a conflict? Should we rather do something safer, like using paths? Maybe that would even be better for memory?

Member Author

Set<DeleteFile> deleteFilesToReplace,

Set<DeleteFile> was already used in the existing code, as seen above.
But there are no equals() and hashCode() implementations for these classes, which means the default ones are used.

Note that Set<DataFile> also exists in the code, and DataFile doesn't have equals/hashCode implementations either.

I am OK with using paths. I am just wondering why these implementations don't exist; maybe we can handle this in a follow-up PR instead of this one.

Member

@szehon-ho szehon-ho Mar 2, 2023

I am not so confident. If, for example, you have two FileScanTasks whose deletes() return the same logical file, will the two DeleteFile objects be different or not? Java's default equals() is instance equality, isn't it? And even if it does work, it may break if we implement those methods at some point. Hence the suggestion to use something with established hashCode/equals.

Member Author

@ajantha-bhat ajantha-bhat Mar 3, 2023

The same DeleteFile object from the context is reused while creating the file scan task here:

DeleteFile[] deleteFiles = ctx.deletes().forEntry(entry);

Hence, I believe the default equals() is enough.

I don't disagree about having equals and hashCode for DataFile and DeleteFile. But the code already uses Set<DataFile> and Set<DeleteFile> widely, so if we add them, it should be in a separate PR/discussion.

Let us see what others think about this.
cc: @RussellSpitzer, @jackye1995

Member

@szehon-ho szehon-ho Mar 4, 2023

Yeah, it may work due to the cache inside DeleteIndex, but I don't know. I do see Set<DeleteFile> elsewhere, but those cases seem limited to ones where the files cannot conflict with each other except by instance equality. I would also be interested in the best way here. Will ping @aokolnychyi to take a look as well.

Member

I'd be more worried about the memory usage here. It seems we have to keep the entire set of all delete files (for all partitions) in memory while we are building this table?

Member

I think I agree in this case; it's probably best to go forward with just storing their paths.
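
The concern in this thread can be reproduced in isolation. The sketch below uses a hypothetical `DeleteFileSketch` class that, like the classes discussed, does not override equals()/hashCode(); it is not Iceberg's `DeleteFile`, and the path is made up. Deduplicating by object falls back to identity, while deduplicating by path relies on `String`'s well-defined equality.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical delete-file handle WITHOUT equals()/hashCode(); not Iceberg's DeleteFile.
final class DeleteFileSketch {
  final String path;

  DeleteFileSketch(String path) {
    this.path = path;
  }
}

final class DedupSketch {
  // Uses Object's identity-based equals()/hashCode().
  static int distinctByObject(DeleteFileSketch... files) {
    Set<DeleteFileSketch> seen = new HashSet<>();
    for (DeleteFileSketch file : files) {
      seen.add(file);
    }
    return seen.size();
  }

  // Uses String equality on the path instead.
  static int distinctByPath(DeleteFileSketch... files) {
    Set<String> seen = new HashSet<>();
    for (DeleteFileSketch file : files) {
      seen.add(file.path);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // Two objects describing the same logical file (illustrative path).
    DeleteFileSketch a = new DeleteFileSketch("s3://bucket/delete-1.parquet");
    DeleteFileSketch b = new DeleteFileSketch("s3://bucket/delete-1.parquet");
    System.out.println(distinctByObject(a, a)); // 1: same instance
    System.out.println(distinctByObject(a, b)); // 2: identity equality sees two files
    System.out.println(distinctByPath(a, b));   // 1: paths are equal
  }
}
```

Whether two scan tasks ever hand out distinct objects for the same logical file is exactly the open question above; keying on paths makes the answer irrelevant.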

Member Author

done.

@ajantha-bhat
Member Author

The build failed due to a flaky failure. Debugging it in a separate PR:
#7010

Will just retrigger the build.

@ajantha-bhat ajantha-bhat reopened this Mar 6, 2023
this.equalityDeleteFiles = Sets.newHashSet();
}

private void update(FileScanTask task) {
Member

I would strongly prefer an approach that either disposes of the sets once the partition info has been constructed or doesn't use sets at all. In the current implementation, it looks like we end up keeping the entire set of delete file objects in memory indefinitely.

Member

I had a similar concern and thought of using PositionDeletesTableScan, but it seems there is no equivalent EqualityDeletesTableScan yet.

Maybe another approach here could be to go through each partition and use DeleteFileIndex one by one.

Member

@szehon-ho szehon-ho Mar 11, 2023

Actually, I think that way would be quite expensive (two passes).

Probably the only way to do it effectively, until this whole table is migrated to some kind of view of the 'files' table, is to rewrite PartitionsTableScan to use the underlying code directly: ManifestReader.readDeleteManifest() / ManifestReader.read(), and then go through those iterators instead of using the ManifestGroup.planFiles() / FileScanTask way.

That way, we can iterate through the DataFile / DeleteFile entries and collect the number of delete files and data files in one pass without keeping them in memory. It's definitely doable but will be a bit more work. Any thoughts?
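
A minimal sketch of the one-pass idea, under the assumption that each live file is listed exactly once when manifests are read directly, so no set is needed for deduplication. All types here (`ContentSketch`, `FileEntrySketch`, `PartitionCounters`, `OnePassSketch`) are hypothetical stand-ins rather than Iceberg classes, and partitions are reduced to plain strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-ins; none of these types are Iceberg classes.
enum ContentSketch { DATA, POSITION_DELETES, EQUALITY_DELETES }

final class FileEntrySketch {
  final String partition;
  final ContentSketch content;
  final long recordCount;

  FileEntrySketch(String partition, ContentSketch content, long recordCount) {
    this.partition = partition;
    this.content = content;
    this.recordCount = recordCount;
  }
}

final class PartitionCounters {
  long dataRecordCount;
  int dataFileCount;
  long posDeleteRecordCount;
  int posDeleteFileCount;
  long eqDeleteRecordCount;
  int eqDeleteFileCount;

  // Adds one file to the running totals for its content type.
  void update(FileEntrySketch file) {
    switch (file.content) {
      case DATA:
        dataRecordCount += file.recordCount;
        dataFileCount += 1;
        break;
      case POSITION_DELETES:
        posDeleteRecordCount += file.recordCount;
        posDeleteFileCount += 1;
        break;
      case EQUALITY_DELETES:
        eqDeleteRecordCount += file.recordCount;
        eqDeleteFileCount += 1;
        break;
      default:
        throw new IllegalArgumentException("Unknown content: " + file.content);
    }
  }
}

final class OnePassSketch {
  // Consumes the iterator once; only the per-partition counters are retained.
  static Map<String, PartitionCounters> aggregate(Iterator<FileEntrySketch> files) {
    Map<String, PartitionCounters> byPartition = new HashMap<>();
    while (files.hasNext()) {
      FileEntrySketch file = files.next();
      byPartition.computeIfAbsent(file.partition, p -> new PartitionCounters()).update(file);
    }
    return byPartition;
  }

  public static void main(String[] args) {
    Iterator<FileEntrySketch> files = Arrays.asList(
        new FileEntrySketch("dt=2023-01-01", ContentSketch.DATA, 100L),
        new FileEntrySketch("dt=2023-01-01", ContentSketch.POSITION_DELETES, 5L),
        new FileEntrySketch("dt=2023-01-01", ContentSketch.POSITION_DELETES, 3L),
        new FileEntrySketch("dt=2023-01-02", ContentSketch.EQUALITY_DELETES, 2L)).iterator();
    Map<String, PartitionCounters> stats = aggregate(files);
    System.out.println(stats.get("dt=2023-01-01").posDeleteRecordCount); // 8
    System.out.println(stats.get("dt=2023-01-01").posDeleteFileCount);   // 2
  }
}
```

Memory use is proportional to the number of partitions rather than the number of delete files, which is the property the comments above are asking for.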

Member Author

@ajantha-bhat ajantha-bhat Mar 14, 2023

Probably the only way to do it effectively, until this whole table is migrated to some kind of view of the 'files' table, is to rewrite PartitionsTableScan to use the underlying code directly: ManifestReader.readDeleteManifest() / ManifestReader.read(), and then go through those iterators instead of using the ManifestGroup.planFiles() / FileScanTask way.

@szehon-ho: I have spent some time on this and realized that I am not very familiar with this part of the code. Would you like to contribute a PR that replaces ManifestGroup.planFiles() for data files? I can then extend it to delete files and handle these stats updates.

Member

Yeah, I am thinking of something along the lines of what BaseFilesTable / ManifestsTable currently do. Sure, let me take a look when I get a chance.

Member Author

Thanks. I will also explore those files.

BTW, I have now switched from a set to a Map<String, Integer> and clear it after use. But I am looking forward to solving this as you suggested, by changing planFiles.

3, "file_count", Types.IntegerType.get(), "Count of data files"));
3, "file_count", Types.IntegerType.get(), "Count of data files"),
Types.NestedField.required(
5,
Member Author

Note:
spec_id (4) is present at the top.

@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from b993ff0 to 1448763 Compare March 21, 2023 04:40
@ajantha-bhat ajantha-bhat reopened this Mar 21, 2023
"pos_delete_record_count",
Types.LongType.get(),
"Count of records in position delete files"),
Types.NestedField.required(6, "pos_delete_file_count", Types.IntegerType.get()),
Contributor

can we also add descriptions for the file count fields?

@szehon-ho
Member

szehon-ho commented Mar 22, 2023

FYI, we are looking at refactoring the existing partitions table to make this task easier and to alleviate the concern about keeping a large hash set (@dramaticlly also mentioned interest in looking at it and may contribute).

@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 68bec9e to 8efba58 Compare April 10, 2023 15:13
@ajantha-bhat
Member Author

@szehon-ho, @RussellSpitzer, @jackye1995: I have now reworked the PR on top of #7189. Please review this PR again.

// cache a position map needed by each partition spec to normalize partitions to final schema
Map<Integer, int[]> normalizedPositionsBySpec =
Maps.newHashMapWithExpectedSize(table.specs().size());
try (CloseableIterable<DataFile> datafiles = planDataFiles(scan);
Member Author

@ajantha-bhat ajantha-bhat Apr 10, 2023

I wanted to keep a single planFiles that returns CloseableIterable<ContentFile<?>>, but I couldn't get it to work when transforming it into a ParallelIterable.

Member

@szehon-ho szehon-ho Apr 18, 2023

I would say we should try this if we can, both to clean up the code and to avoid doing another scan, for performance reasons.

I think it may work by using CloseableIterable.transform() to cast the type, in the worst case.

Member

@ajantha-bhat can we look into this? Would it work to wrap the manifest iterable in planDataFiles() to get the right type?

CloseableIterable.transform(ManifestFiles.read(manifest, table.io(), table.specs())
        .caseSensitive(scan.isCaseSensitive())
        .select(BaseScan.SCAN_COLUMNS).entries(), t -> (ContentFile<?>) t);
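
The reason a transform is needed at all is that Java generics are invariant: an iterable of a concrete file type is not assignable to an iterable of `ContentFile<?>`. The sketch below shows the widening with a simplified, stdlib-only `transform` helper; `ContentFileSketch`, `DataFileSketch` and `TransformSketch` are hypothetical, and unlike a closeable iterable this helper does not deal with closing resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for the file interfaces; not Iceberg types.
interface ContentFileSketch<F> {
  String path();
}

final class DataFileSketch implements ContentFileSketch<DataFileSketch> {
  private final String path;

  DataFileSketch(String path) {
    this.path = path;
  }

  @Override
  public String path() {
    return path;
  }
}

final class TransformSketch {
  // Lazily maps each element as it is iterated; nothing is copied up front.
  static <I, O> Iterable<O> transform(Iterable<I> input, Function<? super I, ? extends O> fn) {
    return () -> {
      Iterator<I> delegate = input.iterator();
      return new Iterator<O>() {
        @Override
        public boolean hasNext() {
          return delegate.hasNext();
        }

        @Override
        public O next() {
          return fn.apply(delegate.next());
        }
      };
    };
  }

  public static void main(String[] args) {
    List<DataFileSketch> dataFiles =
        Arrays.asList(new DataFileSketch("a.parquet"), new DataFileSketch("b.parquet"));
    // Iterable<ContentFileSketch<?>> files = dataFiles; // does not compile: generics are invariant
    Iterable<ContentFileSketch<?>> files = transform(dataFiles, f -> (ContentFileSketch<?>) f);
    List<String> paths = new ArrayList<>();
    for (ContentFileSketch<?> file : files) {
      paths.add(file.path());
    }
    System.out.println(paths); // [a.parquet, b.parquet]
  }
}
```

With both data and delete files exposed under one element type, a single planning method can feed one aggregation loop instead of two.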

Member Author

updated

3, "file_count", Types.IntegerType.get(), "Count of data files"),
Types.NestedField.required(
5,
"pos_delete_record_count",
Member

I think 'position_delete' and 'equality_delete' may be better in the full form, to match SnapshotSummary. Maybe even 'position_delete_count' and 'equality_delete_count' (SnapshotSummary has added_position_deletes and added_equality_deletes).

Member Author

fixed.

@szehon-ho
Member

Hi @ajantha-bhat, do you still plan to work on this? We are also interested in this and can give parameterizing the method a try.

@ajantha-bhat
Member Author

I was out of office last week. Hopefully, I can work on it this week and get it merged.

@ajantha-bhat ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 55fbb86 to 07b0882 Compare May 3, 2023 10:42
@ajantha-bhat
Member Author

@szehon-ho: PR is ready for review. Thanks.

Member

@szehon-ho szehon-ho left a comment

Looks mostly good, some comments

case POSITION_DELETES:
  this.posDeleteRecordCount += file.recordCount();
  this.posDeleteFileCount += 1;
  break;
Member

How about specId here and below?

Member Author

For this partition value, the spec id would already have been updated while updating the data file count.
I don't think there will be delete files without data file entries, so I assumed that updating it again here would be redundant. WDYT?

Member Author

Updated it now, on the reasoning that if the delete happens after partition evolution, the row should reflect the latest spec id.

partitionsTable.schema().asStruct(), expected.get(i), actual.get(i));
}

testDeleteStats(tableIdentifier, table, partitionsTable, builder, partitionBuilder, expected);
Member

@szehon-ho szehon-ho May 4, 2023

Could we make this a separate test? The existing test is already quite long and hard to read, and the new method makes further table modifications, so it seems to belong in its own test.

Member Author

done.

@ajantha-bhat
Member Author

I had to squash the commits for easy rebase and conflict resolution.

Member

@szehon-ho szehon-ho left a comment

Thanks. Sorry about the rebase; I guess it was caused by my intervening refactor. Anyway, looks good, just a minor comment.

@Override
public Schema schema() {
if (table().spec().fields().size() < 1) {
return schema.select("record_count", "file_count");
Member

I think there's actually a bug here: it only checks the latest spec to decide whether the table is unpartitioned. If the table had partition fields before but they were removed, we will not show them. See other metadata tables, like BaseFilesTable.schema.

Anyway, it's unrelated, but I opened #7533 to track it.
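
A simplified model of the difference, assuming the eventual fix is to consult every spec rather than only the current one (the actual fix is tracked in #7533 and may differ). Specs are reduced to lists of field names keyed by spec id; `UnpartitionedCheckSketch` and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: spec id -> partition field names; not Iceberg's PartitionSpec.
final class UnpartitionedCheckSketch {
  // Looks only at the current spec, like the check quoted in the hunk above.
  static boolean unpartitionedByCurrentSpec(
      Map<Integer, List<String>> specsById, int currentSpecId) {
    return specsById.get(currentSpecId).isEmpty();
  }

  // Looks at every spec, so fields from older specs still count.
  static boolean unpartitionedAcrossAllSpecs(Map<Integer, List<String>> specsById) {
    for (List<String> fields : specsById.values()) {
      if (!fields.isEmpty()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> specsById = new LinkedHashMap<>();
    specsById.put(0, Arrays.asList("dt"));             // original spec, partitioned by dt
    specsById.put(1, Collections.<String>emptyList()); // evolved spec with the field removed
    System.out.println(unpartitionedByCurrentSpec(specsById, 1)); // true
    System.out.println(unpartitionedAcrossAllSpecs(specsById));   // false
  }
}
```

In this model, files written under spec 0 still carry a dt value, which is why treating the table as unpartitioned would hide their partition column.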

Member Author

Good catch. I will explore this in the follow-up.

CloseableIterable.transform(
ManifestFiles.open(manifest, table.io(), table.specs())
.caseSensitive(scan.isCaseSensitive())
.select(BaseScan.DELETE_SCAN_COLUMNS), // don't select stats columns
Member

@szehon-ho szehon-ho May 4, 2023

Nit: we can do a switch on content to pick either DELETE_SCAN_COLUMNS or SCAN_COLUMNS, to make it clearer. It looks a bit weird as is, but I guess it works because DELETE_SCAN_COLUMNS is a superset of SCAN_COLUMNS.

Member Author

Updated it with the switch.

DELETE_SCAN_COLUMNS has only the content type as an extra field, so it can work for both data and delete files.
I agree that the name looks weird when used for generic content. CONTENT_SCAN_COLUMNS would have been a more suitable name for it.
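
The switch being discussed could look roughly like the sketch below. The two column lists are abbreviated and illustrative, not the real constants from BaseScan, and `ManifestContentSketch` / `ScanColumnsSketch` are hypothetical names; the only property taken from the thread is that the delete list is the data list plus the content column.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the manifest content type.
enum ManifestContentSketch { DATA, DELETES }

final class ScanColumnsSketch {
  // Illustrative, abbreviated column lists; not the real Iceberg constants.
  static final List<String> SCAN_COLUMNS =
      Arrays.asList("file_path", "record_count", "partition");
  static final List<String> DELETE_SCAN_COLUMNS =
      Arrays.asList("file_path", "record_count", "partition", "content");

  // Picks the projection explicitly per manifest type instead of relying on the superset.
  static List<String> columnsFor(ManifestContentSketch content) {
    switch (content) {
      case DATA:
        return SCAN_COLUMNS;
      case DELETES:
        return DELETE_SCAN_COLUMNS;
      default:
        throw new IllegalArgumentException("Unknown manifest content: " + content);
    }
  }

  public static void main(String[] args) {
    System.out.println(columnsFor(ManifestContentSketch.DATA));
    System.out.println(columnsFor(ManifestContentSketch.DELETES));
  }
}
```

Using the delete list for both cases would also work in this model, since it contains every data column; the switch only makes the intent visible at the call site.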

Member

@szehon-ho szehon-ho left a comment

will merge tomorrow if no further comments

PartitionUtil.coercePartition(
partitionType, table.specs().get(dataFile.specId()), dataFile.partition());
partitions.get(partition).update(dataFile);
try (CloseableIterable<ContentFile<?>> files = planFiles(scan)) {
Member

Thanks for closing this

@szehon-ho szehon-ho merged commit 1cb1320 into apache:master May 5, 2023
@szehon-ho
Member

Merged, thanks @ajantha-bhat for the work!

manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023

Development

Successfully merging this pull request may close these issues.

Add delete file information to partitions table

5 participants