Core: Support delete file stats in partitions metadata table #6661

ajantha-bhat · 2023-01-25T10:26:19Z

Currently, the partitions metadata table only has the data file stats:

file_count
record_count

When delete files are present, these stats are inaccurate (as we don't decrement these values).
So, this PR captures the delete file stats to give a rough idea of why these stats are inaccurate.
Note that we are not applying the deletes to the data files and computing the effective result, as that would be a very expensive operation. Users are advised to execute rewrite_data_files periodically to apply the delete files to the data files.

Delete file stats to be added:

pos_delete_file_count
pos_delete_record_count
eq_delete_file_count
eq_delete_record_count

Note:

Docs will be updated in a follow-up PR, probably after renaming file_count and record_count.
The same schema will also be used for the partition stats feature during implementation.

Fixes #6042

ajantha-bhat · 2023-01-26T16:52:30Z

cc: @szehon-ho, @RussellSpitzer, @rdblue, @flyrain

ajantha-bhat · 2023-01-26T13:29:49Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

            Types.NestedField.required(1, "partition", Partitioning.partitionType(table)),
            Types.NestedField.required(2, "record_count", Types.LongType.get()),
            Types.NestedField.required(3, "file_count", Types.IntegerType.get()),
-            Types.NestedField.required(4, "spec_id", Types.IntegerType.get()));


I thought moving spec_id to the end would look cleaner, as all the stats would then be together.

As these tables are not stored and are computed fresh on each query, I thought there was no need to worry about compatibility. Let me know if I am wrong.

In the past we have generally said that readers should not rely on the position of fields in tables, only on names. That said, every time we do something like this, I think we end up breaking somebody.

If we are going to move things, it should probably go before file and record count rather than after all the stats.

OK, thanks. I think we can move spec_id to field id 2 (before file and record count).

@szehon-ho : What is your opinion on this?

@RussellSpitzer I think the rule was that we should keep ids/names, but not necessarily the order. Ref: #3015 (comment)

I don't think we broke anything that time; we eventually resolved #3015 (comment) as a classpath issue with old classes still in play.

That being said, we should not change the id of spec_id here, but it's OK to move it to after partition, as you are saying.

Thanks for linking the previous discussions. I will keep the name and id as they are while moving the field.
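To make the "rely on names, not positions" point concrete, here is a minimal Java sketch. It is not Iceberg code: rows are modeled as plain lists, and the two column layouts are hypothetical stand-ins for the partitions table before and after moving spec_id. (Iceberg itself tracks columns by field id; names are used here only to keep the sketch small.) A positional read changes meaning after the move, while a name-based lookup does not.

```java
import java.util.List;

class NameBasedAccessSketch {
  // Looks up a column's position by name instead of hard-coding an index.
  static int indexOf(List<String> columns, String name) {
    int index = columns.indexOf(name);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown column: " + name);
    }
    return index;
  }

  // Reads a value from a row by column name.
  static Object read(List<String> columns, List<Object> row, String name) {
    return row.get(indexOf(columns, name));
  }

  public static void main(String[] args) {
    // Hypothetical layouts before and after moving spec_id next to partition.
    List<String> before = List.of("partition", "record_count", "file_count", "spec_id");
    List<String> after = List.of("partition", "spec_id", "record_count", "file_count");
    List<Object> rowBefore = List.of("p=1", 100L, 2, 0);
    List<Object> rowAfter = List.of("p=1", 0, 100L, 2);

    // Positional access silently changes meaning: index 1 was record_count, now spec_id.
    System.out.println(rowBefore.get(1) + " vs " + rowAfter.get(1)); // 100 vs 0
    // Name-based access returns record_count for both layouts.
    System.out.println(
        read(before, rowBefore, "record_count") + " vs " + read(after, rowAfter, "record_count")); // 100 vs 100
  }
}
```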

core/src/main/java/org/apache/iceberg/PartitionsTable.java

ajantha-bhat · 2023-03-01T15:07:27Z

@szehon-ho:
a) I have addressed the duplicate delete file problem using a HashSet instead of PositionDeletesScan, because we don't have similar logic for equality deletes and the code would look odd. Maybe later we can replace this logic, as it is internal.

b) I have split the refactoring work of this PR into a separate one, #6975, to reduce the review effort on this PR (we need to merge that first).

szehon-ho · 2023-03-02T02:32:09Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+    private int dataFileCount;
+
+    private final Set<DeleteFile> equalityDeleteFiles;
+    private final Set<DeleteFile> positionDeleteFiles;


Can you double-check that DeleteFile equals/hashCode work correctly in case of conflicts? Should we rather do something safer, like using paths? Maybe it would even be better for memory?

iceberg/api/src/main/java/org/apache/iceberg/RewriteFiles.java

Line 71 in 76c1df0

Set<DeleteFile> deleteFilesToReplace,

Set<DeleteFile> is already used in the existing code, as seen above.
But there are no equals() and hashCode() implementations for these classes, which means they use the default ones.

Note that Set<DataFile> also exists in the code, and DataFile doesn't have an equals/hashCode implementation either.

I am OK with using paths. I'm just wondering why that isn't done already; maybe we can handle this in a follow-up PR instead of this one.

So I am not so confident: if, for example, you have two FileScanTasks whose deletes() return the same logical file, will the two DeleteFile objects be different or not? Java's default equals() is instance equality, isn't it? And even if it does work now, it may break if we implement those methods at some point. Hence the suggestion to use something with established hashCode/equals.
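A small, self-contained illustration of the concern (plain Java, not Iceberg code): a HashSet of objects that don't override equals()/hashCode() collapses repeated references to the same instance, but keeps two distinct instances that describe the same logical file. Keying by path avoids that. `FakeDeleteFile` and the example path are hypothetical stand-ins, not Iceberg's DeleteFile.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IdentityEqualitySketch {
  // Hypothetical stand-in for a delete file class that does NOT override
  // equals()/hashCode(), so java.lang.Object's identity semantics apply.
  static final class FakeDeleteFile {
    private final String path;

    FakeDeleteFile(String path) {
      this.path = path;
    }

    String path() {
      return path;
    }
  }

  // Deduplicates by object identity (the inherited Object.equals/hashCode).
  static int countByIdentity(List<FakeDeleteFile> files) {
    Set<FakeDeleteFile> seen = new HashSet<>(files);
    return seen.size();
  }

  // Deduplicates by path; String has value-based equals/hashCode.
  static int countByPath(List<FakeDeleteFile> files) {
    Set<String> seen = new HashSet<>();
    for (FakeDeleteFile file : files) {
      seen.add(file.path());
    }
    return seen.size();
  }

  public static void main(String[] args) {
    FakeDeleteFile shared = new FakeDeleteFile("deletes/d1.parquet");
    FakeDeleteFile copy = new FakeDeleteFile("deletes/d1.parquet");

    // The same instance attached to two tasks collapses to one entry.
    System.out.println(countByIdentity(List.of(shared, shared))); // 1
    // Two instances for the same logical file are NOT deduplicated by identity...
    System.out.println(countByIdentity(List.of(shared, copy))); // 2
    // ...but they are when keyed by path.
    System.out.println(countByPath(List.of(shared, copy))); // 1
  }
}
```

Whether the identity-based set is enough therefore depends on the same instance being reused across tasks, which is exactly the point debated in the replies.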

The same delete file object from the context is reused while creating the file scan tasks here:

iceberg/core/src/main/java/org/apache/iceberg/ManifestGroup.java

Line 351 in d42d1e8

DeleteFile[] deleteFiles = ctx.deletes().forEntry(entry);

Hence, I believe the default equals() is enough.

I don't disagree about adding equals and hashCode for DataFile and DeleteFile. But the code already widely uses Set<DataFile> and Set<DeleteFile>, so if we add them, it should be in a separate PR/discussion.

Let us see what others think of this.
cc: @RussellSpitzer, @jackye1995

Yeah, it may work due to the cache inside DeleteIndex, but I don't know. I do see Set<DeleteFile> elsewhere, but those seem to be cases where the entries cannot conflict with each other except by instance equality. I would also be interested in the best way here. Will ping @aokolnychyi to take a look as well.

I'd be more worried about the memory usage here. It seems like we have to keep the entire set of all delete files in memory (for all partitions) while we are building this table?

I think I agree in this case; it's probably best to go forward with just storing their paths.

core/src/main/java/org/apache/iceberg/PartitionsTable.java

ajantha-bhat · 2023-03-06T04:41:35Z

The build failed due to a flaky test. I'm debugging it in a separate PR:
#7010

Will just retrigger the build.

RussellSpitzer · 2023-03-10T17:33:20Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+      this.equalityDeleteFiles = Sets.newHashSet();
+    }
+
+    private void update(FileScanTask task) {


I would strongly prefer an approach that either disposes of the sets once the partition info has been constructed, or doesn't use this set approach at all. In the current implementation, it looks like we end up keeping the entire set of delete file objects in memory indefinitely.

I also had a similar concern and thought of using PositionDeletesTableScan, but there is no equivalent EqualityDeletesTableScan yet.

Maybe another approach here could be to go through each partition and use DeleteFileIndex one by one.

Actually, I think that approach would be quite expensive (two passes).

Probably the only way to do it effectively, until this whole table is migrated to some kind of view over the 'files' table, is to rewrite PartitionsTableScan to directly use the underlying code, ManifestFiles.readDeleteManifest() / ManifestFiles.read(), and iterate over those, instead of going the ManifestGroup.planFiles() / FileScanTask way.

That way, we can iterate through the DataFiles/DeleteFiles and collect the number of data files/delete files in one pass, without keeping them in memory. It's definitely doable, but it will be a bit more work. Any thoughts?
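A rough sketch of the one-pass idea, under stated assumptions: each manifest entry is modeled as a hypothetical `FileEntry` record (partition key, spec id, content type, record count) rather than Iceberg's real DataFile/DeleteFile, and only per-partition counters are kept, so no set of file objects or paths is retained. It also assumes each live file is visited once, as when reading manifest entries directly, unlike FileScanTasks, where one delete file can be attached to many tasks. How spec_id is chosen here (last file wins) is an assumption of the sketch, not the PR's logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionStatsSketch {
  // Hypothetical stand-in for the content type of a manifest entry.
  enum Content { DATA, POSITION_DELETES, EQUALITY_DELETES }

  // Hypothetical stand-in for one live manifest entry.
  record FileEntry(String partition, int specId, Content content, long recordCount) {}

  // Only counters are kept per partition; no file objects or paths are retained.
  static final class PartitionStats {
    int specId;
    long dataRecordCount;
    int dataFileCount;
    long posDeleteRecordCount;
    int posDeleteFileCount;
    long eqDeleteRecordCount;
    int eqDeleteFileCount;

    void update(FileEntry file) {
      switch (file.content()) {
        case DATA:
          dataRecordCount += file.recordCount();
          dataFileCount += 1;
          break;
        case POSITION_DELETES:
          posDeleteRecordCount += file.recordCount();
          posDeleteFileCount += 1;
          break;
        case EQUALITY_DELETES:
          eqDeleteRecordCount += file.recordCount();
          eqDeleteFileCount += 1;
          break;
        default:
          throw new IllegalArgumentException("Unsupported content: " + file.content());
      }
      // Assumption: every file, data or delete, overwrites the spec id,
      // so the result reflects the last file processed for the partition.
      specId = file.specId();
    }
  }

  // Single pass over the entries, accumulating counters per partition key.
  static Map<String, PartitionStats> aggregate(Iterable<FileEntry> entries) {
    Map<String, PartitionStats> stats = new HashMap<>();
    for (FileEntry entry : entries) {
      stats.computeIfAbsent(entry.partition(), key -> new PartitionStats()).update(entry);
    }
    return stats;
  }

  public static void main(String[] args) {
    List<FileEntry> entries =
        List.of(
            new FileEntry("p=1", 0, Content.DATA, 100),
            new FileEntry("p=1", 0, Content.DATA, 50),
            new FileEntry("p=1", 1, Content.POSITION_DELETES, 3),
            new FileEntry("p=2", 0, Content.EQUALITY_DELETES, 7));
    PartitionStats p1 = aggregate(entries).get("p=1");
    // prints "150 2 3 1"
    System.out.println(
        p1.dataRecordCount + " " + p1.dataFileCount + " " + p1.posDeleteRecordCount + " " + p1.specId);
  }
}
```

Memory then scales with the number of partitions rather than the number of files.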


@szehon-ho: I have spent some time on this and realized that I am not super familiar with this side of the code. Would you like to contribute a PR that replaces ManifestGroup.planFiles() for data files? I can then extend it to delete files and handle these stats updates.

Yeah, I'm thinking of something along the lines of what BaseFilesTable / ManifestsTable currently do. Sure, let me take a look when I get a chance.

Thanks. I will also explore those files.

BTW, I have switched to a Map<String, Integer> instead of a set, and I now clear it after use. But I'm looking forward to solving it as you suggested, by changing planFiles.

ajantha-bhat · 2023-03-21T04:27:32Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

-                3, "file_count", Types.IntegerType.get(), "Count of data files"));
+                3, "file_count", Types.IntegerType.get(), "Count of data files"),
+            Types.NestedField.required(
+                5,


Note:
spec_id (field id 4) is now at the top.

jackye1995 · 2023-03-21T21:01:02Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+                "pos_delete_record_count",
+                Types.LongType.get(),
+                "Count of records in position delete files"),
+            Types.NestedField.required(6, "pos_delete_file_count", Types.IntegerType.get()),


Can we also add descriptions for the file count fields?

szehon-ho · 2023-03-22T02:48:45Z

FYI, we are looking at refactoring the existing partitions table to make this task easier and to alleviate the concern about keeping a large hash set (@dramaticlly also mentioned interest in looking at it and may contribute).

ajantha-bhat · 2023-04-10T15:16:11Z

@szehon-ho, @RussellSpitzer, @jackye1995: I have reworked the PR based on #7189 now. Please review this PR again.

ajantha-bhat · 2023-04-10T15:18:47Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

-    // cache a position map needed by each partition spec to normalize partitions to final schema
-    Map<Integer, int[]> normalizedPositionsBySpec =
-        Maps.newHashMapWithExpectedSize(table.specs().size());
+    try (CloseableIterable<DataFile> datafiles = planDataFiles(scan);


I wanted to keep just a single planFiles that returns CloseableIterable<ContentFile<?>>, but I couldn't make it work when transforming it into a ParallelIterable.

I would say try this if we can, both to clean up the code and to avoid doing another scan, for performance reasons.

I think it may work by using 'CloseableIterable.transform()' to cast the type, in the worst case.

@ajantha-bhat, can we look into this? Will it work to wrap the manifest iterable to get the right type in planDataFiles()?

CloseableIterable.transform(
    ManifestFiles.read(manifest, table.io(), table.specs())
        .caseSensitive(scan.isCaseSensitive())
        .select(BaseScan.SCAN_COLUMNS),
    file -> (ContentFile<?>) file);
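The type problem behind the transform() suggestion is Java generics invariance: an iterable of DataFile is not an iterable of ContentFile<?>, even though every DataFile is a ContentFile. The sketch below shows the workaround in plain Java, assuming a consumer that, like the ParallelIterable mentioned above, needs every source to have exactly the same element type. `Content`, `Data`, `Delete`, `map`, and `flatten` are hypothetical stand-ins, not Iceberg classes or methods; `map` only plays the role that a transform over the manifest iterable would play.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class WidenIterableSketch {
  // Hypothetical stand-ins for ContentFile<?>, DataFile and DeleteFile.
  interface Content {
    String path();
  }

  record Data(String path) implements Content {}

  record Delete(String path) implements Content {}

  // Lazily applies fn to each element, in the spirit of a transform() over an iterable.
  static <I, O> Iterable<O> map(Iterable<I> source, Function<I, O> fn) {
    return () -> {
      Iterator<I> it = source.iterator();
      return new Iterator<O>() {
        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public O next() {
          return fn.apply(it.next());
        }
      };
    };
  }

  // Like a consumer that needs every source to share exactly one element type T.
  static <T> List<T> flatten(List<Iterable<T>> sources) {
    List<T> out = new ArrayList<>();
    for (Iterable<T> source : sources) {
      for (T item : source) {
        out.add(item);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Data> dataFiles = List.of(new Data("data-1.parquet"));
    List<Delete> deleteFiles = List.of(new Delete("delete-1.parquet"));

    // flatten(List.of(dataFiles, deleteFiles)) would not compile:
    // List<Data> is not an Iterable<Content> (generics are invariant).
    Iterable<Content> widenedData = map(dataFiles, file -> (Content) file);
    Iterable<Content> widenedDeletes = map(deleteFiles, file -> (Content) file);

    for (Content file : flatten(List.of(widenedData, widenedDeletes))) {
      System.out.println(file.path()); // data-1.parquet, then delete-1.parquet
    }
  }
}
```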

szehon-ho · 2023-04-18T05:25:59Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+                3, "file_count", Types.IntegerType.get(), "Count of data files"),
+            Types.NestedField.required(
+                5,
+                "pos_delete_record_count",


I think the full forms 'position_delete' and 'equality_delete' may be better, to match SnapshotSummary. Maybe even 'position_delete_count' and 'equality_delete_count' (SnapshotSummary has added_position_deletes, added_equality_deletes).

szehon-ho · 2023-04-28T23:03:38Z

Hi @ajantha-bhat, do you still plan to work on this? We are also interested in this and could try to parameterize the method.

ajantha-bhat · 2023-05-01T03:03:27Z

I was out of office last week. Hopefully, I can work on it this week and get it merged.

ajantha-bhat · 2023-05-03T12:38:33Z

@szehon-ho: PR is ready for review. Thanks.

szehon-ho

Looks mostly good, some comments

szehon-ho · 2023-05-03T23:56:16Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+        case POSITION_DELETES:
+          this.posDeleteRecordCount += file.recordCount();
+          this.posDeleteFileCount += 1;
+          break;


How about specId here and below?

For this partition value, the spec id would already have been updated while updating the data file count.
I don't think there will be delete files without data file entries, so I assumed that updating it again here would be redundant. WDYT?

Updated it now: if the delete happens after partition evolution, it should reflect the latest spec id.

szehon-ho · 2023-05-04T00:01:01Z

spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java

          partitionsTable.schema().asStruct(), expected.get(i), actual.get(i));
    }
+
+    testDeleteStats(tableIdentifier, table, partitionsTable, builder, partitionBuilder, expected);


Could we make this a separate test? The existing test is already quite long and hard to read, and the new method makes further table modifications, so it seems to fit better as another test.

ajantha-bhat · 2023-05-04T16:41:51Z

I had to squash the commits for easy rebase and conflict resolution.

szehon-ho

Thanks. Sorry about the rebase; I guess it was caused by my intervening refactor. Anyway, looks good, just a minor comment.

szehon-ho · 2023-05-04T21:45:09Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

  @Override
  public Schema schema() {
    if (table().spec().fields().size() < 1) {
-      return schema.select("record_count", "file_count");


I think there's actually a bug here: it only checks the latest spec to decide whether the table is unpartitioned. If we had partition fields before but removed them, we will not show them. See other metadata tables like BaseFilesTable.schema.
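A minimal model of that bug and of the "check all specs" direction, in plain Java. Specs are represented as a hypothetical map from spec id to partition field names; this does not reproduce PartitionsTable or BaseFilesTable code, it only shows why checking the current spec alone hides partition fields that existed in earlier specs.

```java
import java.util.List;
import java.util.Map;

class PartitionedCheckSketch {
  // Hypothetical model: spec id -> names of the partition fields in that spec.

  // Mirrors the reported bug: only the current spec decides whether the table
  // is treated as unpartitioned.
  static boolean unpartitionedByCurrentSpec(Map<Integer, List<String>> specs, int currentSpecId) {
    return specs.get(currentSpecId).isEmpty();
  }

  // Considers every spec the table has had, so fields that were later removed
  // still make the table count as partitioned.
  static boolean unpartitionedByAllSpecs(Map<Integer, List<String>> specs) {
    return specs.values().stream().allMatch(List::isEmpty);
  }

  public static void main(String[] args) {
    // Spec 0 partitioned by "category"; spec 1 (current) dropped that field.
    Map<Integer, List<String>> specs = Map.of(0, List.of("category"), 1, List.of());
    System.out.println(unpartitionedByCurrentSpec(specs, 1)); // true: partition column would be hidden
    System.out.println(unpartitionedByAllSpecs(specs)); // false
  }
}
```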

Anyway, it's unrelated, but I made #7533 to track it.

Good catch. I will explore this in the follow-up.

szehon-ho · 2023-05-04T21:47:36Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

+                CloseableIterable.transform(
+                    ManifestFiles.open(manifest, table.io(), table.specs())
+                        .caseSensitive(scan.isCaseSensitive())
+                        .select(BaseScan.DELETE_SCAN_COLUMNS), // don't select stats columns


Nit: you can do a switch on the content to get either DELETE_SCAN_COLUMNS or SCAN_COLUMNS, to make it clearer. It looks a bit weird, but I guess it works, as DELETE_SCAN_COLUMNS is a superset of SCAN_COLUMNS.

Updated it with the switch.

DELETE_SCAN_COLUMNS has only the content type as an extra field, so it works for both data and delete files.
Agreed that the name looks weird when used for generic content; CONTENT_SCAN_COLUMNS would have been a more suitable name.

szehon-ho

Will merge tomorrow if there are no further comments.

szehon-ho · 2023-05-05T04:49:43Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

-          PartitionUtil.coercePartition(
-              partitionType, table.specs().get(dataFile.specId()), dataFile.partition());
-      partitions.get(partition).update(dataFile);
+    try (CloseableIterable<ContentFile<?>> files = planFiles(scan)) {


Thanks for closing this

szehon-ho · 2023-05-05T18:24:00Z

Merged, thanks @ajantha-bhat for the work!


github-actions bot added the core label Jan 25, 2023

ajantha-bhat force-pushed the delete-stats branch from decc9b0 to 4fd9575 January 25, 2023 12:12

github-actions bot added the spark label Jan 25, 2023

ajantha-bhat force-pushed the delete-stats branch from 4fd9575 to 1cf5242 January 25, 2023 13:28

ajantha-bhat marked this pull request as draft January 25, 2023 16:55

ajantha-bhat force-pushed the delete-stats branch 3 times, most recently from f5c14dc to 68571a1 January 26, 2023 15:47

ajantha-bhat changed the title from "[WIP] Core: Support delete files stats in partitions metadata table" to "Core: Support delete file stats in partitions metadata table" Jan 26, 2023

ajantha-bhat marked this pull request as ready for review January 26, 2023 16:51

ajantha-bhat commented Jan 26, 2023


szehon-ho reviewed Feb 1, 2023


core/src/main/java/org/apache/iceberg/PartitionsTable.java (outdated, resolved)

ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 581643b to 0e05838 February 20, 2023 13:24

ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from f157341 to 3134359 March 1, 2023 12:34

ajantha-bhat mentioned this pull request Mar 1, 2023

Core: Minor refactoring of PartitionsTable #6975

Merged

szehon-ho reviewed Mar 2, 2023


ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 38db2b5 to d5fc274 March 2, 2023 14:35

szehon-ho mentioned this pull request Mar 4, 2023

How do I know which partition has delete files and the count? #6995

Closed

ajantha-bhat closed this Mar 6, 2023

ajantha-bhat reopened this Mar 6, 2023

RussellSpitzer reviewed Mar 10, 2023


ajantha-bhat force-pushed the delete-stats branch from c5532f4 to 42babde March 21, 2023 04:26

ajantha-bhat commented Mar 21, 2023


ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from b993ff0 to 1448763 March 21, 2023 04:40

ajantha-bhat closed this Mar 21, 2023

ajantha-bhat reopened this Mar 21, 2023

jackye1995 reviewed Mar 21, 2023


dramaticlly mentioned this pull request Mar 23, 2023

Refactor the planning in PartitionTable #7189

Closed

ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 68bec9e to 8efba58 April 10, 2023 15:13

ajantha-bhat commented Apr 10, 2023


ajantha-bhat force-pushed the delete-stats branch from 8efba58 to 4a2b37f April 10, 2023 16:34

szehon-ho reviewed Apr 18, 2023


ajantha-bhat force-pushed the delete-stats branch 2 times, most recently from 55fbb86 to 07b0882 May 3, 2023 10:42

szehon-ho reviewed May 4, 2023


Core: Support delete files stats in partitions metadata table

974643b

ajantha-bhat force-pushed the delete-stats branch from 4a23b1d to 974643b May 4, 2023 16:39

szehon-ho approved these changes May 4, 2023


Address nit

8f43f1e

ajantha-bhat force-pushed the delete-stats branch from b358603 to 8f43f1e May 5, 2023 04:17

szehon-ho approved these changes May 5, 2023


szehon-ho reviewed May 5, 2023


szehon-ho merged commit 1cb1320 into apache:master May 5, 2023

manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023

Core: Support delete file stats in partitions metadata table (apache#6661)

0c7c89d
