Fixes read metadata failed after dropped partition for V1 format #3411
Conversation
Hi @RussellSpitzer @rdblue @openinx @jackye1995, could you help to review this? Thanks a lot.
OK, I think I understand the issue now. This is specifically an error where the attempt to build a common partition spec builds a spec using void transforms in V1 tables; the void column has a different return type than the original partition transform and throws an error. Is that correct? I would recommend renaming this issue/PR to match the underlying issue. I am also guessing that, because this is the issue, there is no problem with V2 tables. @aokolnychyi should probably also take a look.
I may be able to take a look within the next few days.
Table manifestsTable = new ManifestsTable(table.ops(), table);
TableScan scan = manifestsTable.newScan()
    .filter(Expressions.lessThan("length", 10000L));
Why are we filtering?
It could be removed. The test is adjusted from testManifestsTableAlwaysIgnoresResiduals.
Force-pushed ec22082 to 1905c5e
Thanks @RussellSpitzer for the review.
Yes, that's right.
Any suggestion for this? What about
Force-pushed 1905c5e to c8aab84
@RussellSpitzer, I am sorry for the late update. Thanks for the review. Comments have been addressed. Please take another look when you are free.
Sorry for the delay on my side. This PR is the next one I'll look into. I took a quick look and I think I remember the context.
Okay, I think I got the issue. Essentially, we don't know the correct type produced by the void transforms that we assign in v1 tables when a partition field is dropped. We assume the output type matches the source field type, but that is not always the case. Let me take a look at the solution on Monday.
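The type mismatch described here can be illustrated with a minimal, self-contained sketch (plain Java with hypothetical names, not the Iceberg API): a bucket transform always produces int partition values, while a void replacement that naively reuses the source column's type reports string.

```java
// Plain-Java illustration only; the real Iceberg Transform API differs.
// It shows why reusing the source column type for a void transform breaks
// the combined partition type in v1 tables.
public class VoidTypeMismatch {
  interface Transform {
    String resultType(String sourceType);
  }

  // bucket(N, col) always produces an int partition value
  static final Transform BUCKET = sourceType -> "int";

  // a naive void replacement that just reports the source column's type
  static final Transform NAIVE_VOID = sourceType -> sourceType;

  public static void main(String[] args) {
    // before the drop: bucket(2, string_col) partitions with int values
    System.out.println(BUCKET.resultType("string"));      // int
    // after the drop in a v1 table: the void replacement claims string,
    // which disagrees with the int values already written in metadata
    System.out.println(NAIVE_VOID.resultType("string"));  // string
  }
}
```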
Map<Integer, PartitionField> fieldMap = Maps.newHashMap();
List<NestedField> structFields = Lists.newArrayList();
Map<Integer, Type> typeMap = Maps.newHashMap();
Map<Integer, String> nameMap = Maps.newHashMap();
I have run into similar issues, and I think this will help resolve the type change of columns in older specs.
Another thing I have seen, which is probably still a problem, is that this method may return a column name multiple times. Consider the following:
Table schema: a int, b date, c date
spec0: year(b), a
spec1: a
spec2: year(b), year(c)
then the result is something like this: 1000: b_year int, 1001: a int, 1002: b_year int, 1003: c_year int
Further down the line when we construct a Schema object, we will have a failure due to b_year name being present in two fields (1000, 1002).
How should this case be handled? Maybe appending _r [fieldId] to each column name?
cc: @RussellSpitzer , @aokolnychyi
@aokolnychyi I think you're referring to what's happening in V1 tables. For those, the spec is ever-growing in the sense that no partition fields/transforms are removed; rather, they are converted to void.
The rename logic is there: https://github.com/apache/iceberg/blob/master/core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Ficeberg%2FBaseUpdatePartitionSpec.java#L179
but I think this is not used for V2 tables, as per https://github.com/apache/iceberg/blob/master/core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Ficeberg%2FBaseUpdatePartitionSpec.java#L261-L268
Since in V2, specs don't retain old deleted partition fields this rename is not required for normal operations.
The problem I'm describing only affects metadata table queries: for V2, due to the lack of the above renames, Partitioning.partitionType() collects all partition fields from all previous specs too. This can result in the same field name being present multiple times, causing construction of the PartitionsTable's (or DataFilesTable's) schema to fail.
I guess you are right, @szlta. Here is an example to reproduce.
PartitionSpec initialSpec = PartitionSpec.builderFor(SCHEMA)
    .identity("data")
    .build();
TestTables.TestTable table = TestTables.create(tableDir, "test", SCHEMA, initialSpec, V2_FORMAT_VERSION);

table.updateSpec()
    .removeField("data")
    .commit();

table.updateSpec()
    .addField("data")
    .addField("id")
    .commit();

This produces the combined partition type:
struct<1000: data: optional string, 1001: data: optional string, 1002: id: optional int>
While it would be great to update the logic that evolves the spec, I think we have to adapt the method that builds a common representation too. Otherwise, existing tables may be broken. Maintaining a set of used names and appending a suffix of the field ID sounds like a reasonable approach.
Thoughts, @rdblue @RussellSpitzer @szehon-ho @flyrain?
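The suffixing idea above can be sketched as follows (a hypothetical helper in plain Java, not the Iceberg code): maintain a set of used names and append the field ID only when a collision occurs.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of de-duplicating partition field names across specs
// by appending the field ID on collision, as suggested above.
public class UniquePartitionNames {
  static Map<Integer, String> uniquify(Map<Integer, String> fieldIdToName) {
    Set<String> used = new HashSet<>();
    Map<Integer, String> result = new LinkedHashMap<>();
    fieldIdToName.forEach((fieldId, name) -> {
      // the first occurrence keeps its name; later duplicates get a suffix
      String unique = used.add(name) ? name : name + "_" + fieldId;
      used.add(unique);
      result.put(fieldId, unique);
    });
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, String> fields = new LinkedHashMap<>();
    fields.put(1000, "b_year");
    fields.put(1001, "a");
    fields.put(1002, "b_year");  // collides with field 1000
    fields.put(1003, "c_year");
    System.out.println(uniquify(fields));
    // {1000=b_year, 1001=a, 1002=b_year_1002, 1003=c_year}
  }
}
```

Projections would still be resolved by field ID, so the exact suffix chosen here does not matter for correctness.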
Looks like we are using field IDs while projecting values from the common representation in Spark.
At least, that part is working.
protected Map<Integer, StructProjection> buildPartitionProjections(Types.StructType partitionType,
                                                                   Map<Integer, PartitionSpec> specs) {
  Map<Integer, StructProjection> partitionProjections = Maps.newHashMap();
  specs.forEach((specID, spec) ->
      partitionProjections.put(specID, StructProjection.create(partitionType, spec.partitionType())));
  return partitionProjections;
}
I think we have two cases to solve this problem. The combined partition struct type is used in both metadata tables and when we need to project the _partition field for certain queries. For queries that use the _partition field, I think adding a suffix to make the names unique is the right approach. That's internal so it doesn't really matter what we rename to, as long as we get the projections that produce the partition tuple for a given spec right.
For metadata tables, we expect the field names to match the partition names. A simple example is that we use identity partitions using the original column name. So it would be weird to partition by category and need to query category_r1003 in the metadata table. I think that the right thing to do for metadata tables is to have a separate way to produce the combined struct that produces only one partition column per name.
In @aokolnychyi's example, 1001: data should be present and 1000: data should not be used for metadata tables.
This is also an area where we may want to bring back compatible partition columns. There's no reason why Anton's example couldn't detect that 1000: data was in an old spec and reuse the ID to avoid this problem.
What about partition transforms where the transform itself could change, but the field name remains the same?
E.g. bucket(data, 10) would be data_bucket; after dropping this from the spec and re-adding it in a new spec bucket(data, 8), id, we would have 1000: data_bucket, 1001: data_bucket, 1002: id, where the two data_bucket fields describe different things. I guess this is similar for truncate.
A partition in the old spec where data_bucket = 0 would be different from a partition in the new spec where data_bucket = 0
I agree with @rdblue that it's weird to have category_r1003 in queries, but if we want to avoid it, I see two ways of proceeding from metadata query perspective:
A metadata table should..
- either give back partition information as per the latest spec only,
- or combine the data returned for similarly named partition fields into just one field, but extend such tables with spec_id information in these cases.
So as per above example use case the partitions table would look like this for the two cases (with 1 partition in each spec):
1:
+---------------------------------+--------------------+------------------+
| test.partition | test.record_count | test.file_count |
+---------------------------------+--------------------+------------------+
| {data_bucket : 0, id : 1} | 1 | 1 |
+---------------------------------+--------------------+------------------+
2:
+---------------------------------+--------------------+------------------+--------------+
| test.partition | test.record_count | test.file_count | test.spec_id |
+---------------------------------+--------------------+------------------+--------------+
| {data_bucket : 0, id : null} | 1 | 1 | 0 |
| {data_bucket : 0, id : 1} | 1 | 1 | 1 |
+---------------------------------+--------------------+------------------+--------------+
Note this is how it would be for V2 tables. For V1, due to the renaming we're already doing, the renamed fields would be present too (unless we aim at changing that too):
1:
+---------------------------------+--------------------+------------------+
| test.partition | test.record_count | test.file_count |
+---------------------------------+--------------------+------------------+
| {data_bucket : 0, id : 1} | 1 | 1 |
+---------------------------------+--------------------+------------------+
2:
+----------------------------------------------------------+--------------------+------------------+--------------+
| test.partition | test.record_count | test.file_count | test.spec_id |
+----------------------------------------------------------+--------------------+------------------+--------------+
| {data_bucket_1000 : 0, data_bucket : null, id : null} | 1 | 1 | 0 |
| {data_bucket_1000 : null, data_bucket: 0, id : 1} | 1 | 1 | 1 |
+----------------------------------------------------------+--------------------+------------------+--------------+
I'm personally more for solution 2, where we don't have to omit the old partitions but at the same time we get a nice and coherent partition info result. It's kind of in league with what I'm proposing in issue #4292 (we could also continue the discussion of this problem there; I didn't mean to hijack @ConeyLiu's PR like this).
I'm also up for solution 2, where we rename just the columns from older specs. That sounds like a reasonable solution.
I'll catch up today. Sorry for the delay!
I think that the fixes in this PR look good other than a couple of minor updates that are needed. We should definitely follow up with a PR that fixes the names as suggested by @szlta.
@ConeyLiu, can you update the description? It is really helpful to not only say what you're trying to fix, but also to describe the approach and why the changes are needed. Thanks!

Updated.
.sorted(Comparator.comparingInt(NestedField::fieldId))

List<NestedField> sortedStructFields = fieldMap.keySet().stream()
    .sorted(Comparator.naturalOrder())
    .map(fieldId -> NestedField.optional(fieldId, nameMap.get(fieldId), typeMap.get(fieldId)))
I think the changes in this file are correct and fix the first issue, where the type may be incorrect for void transform fields in v1 tables.
Force-pushed 0e52d2f to 159a0d5
Rebased, thanks for the reminder. @aokolnychyi

Thanks @ConeyLiu!

Also thanks to all of the reviewers who took their time to check on this PR. I know this was a tricky issue and I'm glad we had so many eyes on it.

Thanks all!

Should this fix be included in 0.13.2? @rdblue @RussellSpitzer

@szehon-ho, that sounds fine to me.

👍 Looks like it has popped up in a few places. Thanks, @ConeyLiu. Let me know if you want to make a PR against the 0.13.x branch and I can review it; otherwise I will take a look. (Not sure exactly if that's the process.)

Thanks @szehon-ho, will do it.
…r V1 format (apache#3411) V1 tables replace dropped partition transforms with a void transform, which always returns null. The type of this void transform always matches the column's original type. This causes issues when the column's type differs from the result type of the transform previously used to partition the column. Here the issue is patched by retaining the correct type for the transform values when those values are read after the transform has been dropped.
…r V1 format (apache#3411) V1 tables replace dropped partition transforms with a void transform, which always returns null. The type of this void transform always matches the column's original type. This causes issues when the column's type differs from the result type of the transform previously used to partition the column. Here the issue is patched by retaining the correct type for the transform values when those values are read after the transform has been dropped. (cherry picked from commit 6441d6e)
This patch fixes two problems:

1. For the V1 format, a removed partition field is replaced with a VoidTransform, but the result type of the VoidTransform is the same as the type of the source field. For example, the result type of Bucket(2, string_type_field) is int, while it is string for VoidTransform(string_type_field). So we should use the original partition field type (int instead of string) to build the common partitioning type.
2. We should use the correct PartitionSpec, not the current table spec, to convert the PartitionFieldSummary to a human-readable string.

Closes #3374
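As a rough illustration of the first fix (a hypothetical toy model in plain Java, not Iceberg's actual classes), combining specs so that the earliest-recorded type for a field ID wins keeps the original transform's result type rather than the void replacement's source type:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: each spec maps field ID -> result type; in a v1 table
// a dropped field shows up in later specs as void with the source column type.
public class CommonPartitionType {
  static Map<Integer, String> commonType(Map<Integer, String>... specs) {
    Map<Integer, String> combined = new LinkedHashMap<>();
    for (Map<Integer, String> spec : specs) {
      // keep the type from the earliest spec that defined the field, so the
      // original transform's result type wins over the void replacement's
      // source type
      spec.forEach(combined::putIfAbsent);
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<Integer, String> spec0 = new LinkedHashMap<>();
    spec0.put(1000, "int");      // bucket(2, string_field) -> int
    Map<Integer, String> spec1 = new LinkedHashMap<>();
    spec1.put(1000, "string");   // void(string_field) reports string
    System.out.println(commonType(spec0, spec1));  // {1000=int}
  }
}
```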