Core: Fix querying metadata tables with multiple specs #2936

aokolnychyi · 2021-08-04T17:24:34Z

This PR adds a utility method to derive a common type for all partition specs.

Prior to this change, querying metadata tables in v2 format with evolved partitioning failed with runtime exceptions.

aokolnychyi · 2021-08-04T17:26:12Z

cc @RussellSpitzer @rdblue @openinx @kbendick @jackye1995 @yyanyy @flyrain @karuppayya

flyrain · 2021-08-04T18:47:17Z

core/src/test/java/org/apache/iceberg/TestPartitioning.java

+
+    TableOperations ops = ((HasTableOperations) table).operations();
+    TableMetadata current = ops.current();
+    ops.commit(current, current.updatePartitionSpec(newSpec));


Can we change the updatePartitionSpec method to avoid conflicts?

This API is hidden from users. The user-facing API is UpdatePartitionSpec accessible via Table. That one actually ensures we don't hit this case. The spec evolution in v1 tables is actually limited as described here.

There could be some tables where people evolved partitioning before the public API appeared. It is an edge case but this test ensures we get a reasonable exception for such tables.

I see, this part, https://iceberg.apache.org/spec/#partition-evolution.

flyrain

LGTM

flyrain · 2021-08-04T20:17:38Z

.../src/test/java/org/apache/iceberg/spark/source/TestMetadataTablesWithPartitionEvolution.java

+    table.updateSpec()
+        .removeField("data")
+        .commit();


Nit: could be in one line.

core/src/main/java/org/apache/iceberg/Partitioning.java

core/src/test/java/org/apache/iceberg/TestPartitioning.java

RussellSpitzer · 2021-08-04T21:19:39Z

Do we have to worry about the Manifests Table? Or is it ok because we are always displaying in the context of the current spec?

RussellSpitzer

A few minor comments, but looks good to me overall

jackye1995

looks good to me, thanks for the fix!

.../src/test/java/org/apache/iceberg/spark/source/TestMetadataTablesWithPartitionEvolution.java

aokolnychyi · 2021-08-05T16:08:59Z

@RussellSpitzer, somehow tests for all_data_files started to fail after #2877. It looks like we did not update that metadata table in the PR. Is that on purpose?

Checking what is actually going on.

aokolnychyi · 2021-08-05T16:11:28Z

@RussellSpitzer, I think simply using schema() instead of fileSchema in AllDataFilesTable fixes the problem.
Could you check?

RussellSpitzer · 2021-08-05T16:12:18Z

@aokolnychyi Sounds good to me. I thought I mimicked the way we were doing it in the FilesTable, using "fileSchema" as the projected schema. If that isn't the scan.schema, we should change it.

aokolnychyi · 2021-08-05T16:17:19Z

Well, I am not sure. I'll need your help to verify whether my assumption is correct. We are using the same ManifestReadTask in both but AllDataFilesTable still passes fileSchema instead of schema().

core/src/main/java/org/apache/iceberg/Partitioning.java

rdblue · 2021-08-09T21:54:05Z

core/src/main/java/org/apache/iceberg/Partitioning.java

+
+    List<NestedField> sortedCommonFields = commonFields.values().stream()
+        .sorted(Comparator.comparingInt(NestedField::fieldId))
+        .collect(Collectors.toList());


+1 for sorting by fieldId.
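As a rough illustration of why the sort matters: a map of fields keyed by ID has no guaranteed iteration order, so sorting by field ID gives the derived struct a stable field order. The sketch below is self-contained and does not use Iceberg classes; `Field` and `sortedNames` are hypothetical stand-ins for `NestedField` and the code in the diff above.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SortedFieldsSketch {
  // Hypothetical stand-in for a struct field: just an ID and a name.
  static final class Field {
    final int fieldId;
    final String name;

    Field(int fieldId, String name) {
      this.fieldId = fieldId;
      this.name = name;
    }

    int fieldId() {
      return fieldId;
    }

    String name() {
      return name;
    }
  }

  // Returns field names ordered by ascending field ID, regardless of map order.
  static List<String> sortedNames(Map<Integer, Field> commonFields) {
    return commonFields.values().stream()
        .sorted(Comparator.comparingInt(Field::fieldId))
        .map(Field::name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<Integer, Field> commonFields = new HashMap<>();
    commonFields.put(1002, new Field(1002, "category"));
    commonFields.put(1000, new Field(1000, "data"));
    commonFields.put(1001, new Field(1001, "ts_day"));

    System.out.println(sortedNames(commonFields)); // prints [data, ts_day, category]
  }
}
```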

kbendick

Overall, this looks good to me.

Same question that Ryan has regarding checking name during validation in partitionType, but overall this looks good to me. Thanks Anton!

aokolnychyi · 2021-08-10T16:58:03Z

core/src/main/java/org/apache/iceberg/Partitioning.java

+    List<NestedField> structFields = Lists.newArrayList();
+
+    // sort the spec IDs in descending order to pick up the most recent field names
+    List<Integer> specIds = table.specs().keySet().stream()


Added a sort by spec ID to make sure we pick up the most recent field name (see a dedicated test too).
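A minimal sketch of the idea, assuming each spec is simply a list of (field ID, name) pairs: walking the specs from the highest spec ID to the lowest and keeping the first name seen for each field ID means the most recent name wins. The class, `Field`, and `commonFieldNames` are hypothetical and are not the actual `Partitioning.partitionType` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommonPartitionFieldsSketch {
  // Hypothetical stand-in for a partition field: an ID and a name.
  static final class Field {
    final int fieldId;
    final String name;

    Field(int fieldId, String name) {
      this.fieldId = fieldId;
      this.name = name;
    }
  }

  // specs maps spec ID -> partition fields of that spec.
  // The result maps field ID -> name taken from the most recent spec containing it.
  static Map<Integer, String> commonFieldNames(Map<Integer, List<Field>> specs) {
    // Visit spec IDs in descending order so newer names are seen first.
    List<Integer> specIds = new ArrayList<>(specs.keySet());
    specIds.sort(Comparator.reverseOrder());

    Map<Integer, String> namesById = new HashMap<>();
    for (Integer specId : specIds) {
      for (Field field : specs.get(specId)) {
        // Keep the first (most recent) name for each field ID.
        namesById.putIfAbsent(field.fieldId, field.name);
      }
    }

    // TreeMap orders the result by field ID.
    return new TreeMap<>(namesById);
  }

  public static void main(String[] args) {
    Map<Integer, List<Field>> specs = new HashMap<>();
    specs.put(0, Arrays.asList(new Field(1000, "data")));
    specs.put(1, Arrays.asList(new Field(1000, "data_1000"), new Field(1001, "data")));

    System.out.println(commonFieldNames(specs)); // prints {1000=data_1000, 1001=data}
  }
}
```

The example data mirrors the v1 scenario discussed in this thread, where the same field ID appears under different names in different specs.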

aokolnychyi · 2021-08-10T17:00:14Z

core/src/test/java/org/apache/iceberg/TestPartitioning.java

+  }
+
+  @Test
+  public void testPartitionTypeWithAddingBackSamePartitionFieldInV1Table() {


This test validates we ignore field names when building the common type. The original spec will have 1000:data and the last spec will have 1000:data_1000 as the old field was renamed to avoid naming conflicts.

api/src/main/java/org/apache/iceberg/PartitionField.java

rdblue · 2021-08-13T21:33:33Z

core/src/main/java/org/apache/iceberg/Partitioning.java

+          structFields.add(structField);
+        } else {
+          // verify the fields are compatible as they may conflict in v1 tables
+          ValidationException.check(field.compatibleWith(existingField),


I would probably just make equivalentIgnoringName a private method in this class for this.
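One way such a helper might look, as a sketch only: it treats two partition fields as equivalent when everything except the name matches. Which attributes the real check compares is an assumption here (source ID, field ID, and a transform represented as a plain string); `Field` is a hypothetical stand-in for `PartitionField`.

```java
import java.util.Objects;

class PartitionFieldEquivalenceSketch {
  // Hypothetical stand-in for a partition field.
  static final class Field {
    final int sourceId;
    final int fieldId;
    final String name;
    final String transform;

    Field(int sourceId, int fieldId, String name, String transform) {
      this.sourceId = sourceId;
      this.fieldId = fieldId;
      this.name = name;
      this.transform = transform;
    }
  }

  // Assumption: equivalence means same source ID, same field ID, and same transform.
  // The name is deliberately not compared.
  static boolean equivalentIgnoringName(Field field, Field other) {
    return field.sourceId == other.sourceId
        && field.fieldId == other.fieldId
        && Objects.equals(field.transform, other.transform);
  }

  public static void main(String[] args) {
    Field original = new Field(2, 1000, "data", "identity");
    Field renamed = new Field(2, 1000, "data_1000", "identity");
    Field conflicting = new Field(2, 1000, "data", "bucket[16]");

    System.out.println(equivalentIgnoringName(original, renamed));     // prints true
    System.out.println(equivalentIgnoringName(original, conflicting)); // prints false
  }
}
```

Keeping the helper private to the class that builds the common type avoids adding a method to the public `PartitionField` API.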

rdblue · 2021-08-13T21:34:36Z

core/src/test/java/org/apache/iceberg/TestPartitioning.java

+        NestedField.optional(1001, "data", Types.StringType.get())
+    );
+    StructType actualType = Partitioning.partitionType(table);
+    Assert.assertEquals("Types must match", expectedType, actualType);


I think you could argue that adding data back should re-use the old ID. Not something to fix here, but we should probably fix it at some point.

It is indeed weird. However, I would not worry too much here, as we have already advised against renaming and dropping fields in v1 tables.

rdblue

Looks good to me, other than the compatibleWith method that conflicts with the one in PartitionSpec.

aokolnychyi · 2021-08-14T03:13:20Z

Yeah, I wasn't sure about the method name. Updated.

aokolnychyi · 2021-08-14T06:17:08Z

Thanks for reviewing, @flyrain @karuppayya @RussellSpitzer @rdblue @kbendick @jackye1995!

github-actions bot added core spark labels Aug 4, 2021

flyrain reviewed Aug 4, 2021

flyrain approved these changes Aug 4, 2021

RussellSpitzer reviewed Aug 4, 2021

core/src/main/java/org/apache/iceberg/Partitioning.java

RussellSpitzer reviewed Aug 4, 2021

core/src/main/java/org/apache/iceberg/Partitioning.java Outdated

RussellSpitzer reviewed Aug 4, 2021

core/src/test/java/org/apache/iceberg/TestPartitioning.java

RussellSpitzer approved these changes Aug 4, 2021

jackye1995 approved these changes Aug 4, 2021

karuppayya reviewed Aug 5, 2021

.../src/test/java/org/apache/iceberg/spark/source/TestMetadataTablesWithPartitionEvolution.java Outdated

.../src/test/java/org/apache/iceberg/spark/source/TestMetadataTablesWithPartitionEvolution.java Outdated

aokolnychyi force-pushed the metadata-table-specs branch from 3e5ba47 to 93a7fd8 Compare August 5, 2021 16:05

RussellSpitzer mentioned this pull request Aug 5, 2021

Core: Missed Nested Schema Projection in AllDataFiles Table #2941

Merged

Core: Fix querying metadata tables with multiple specs

ec1f573

aokolnychyi force-pushed the metadata-table-specs branch from 93a7fd8 to ec1f573 Compare August 5, 2021 17:53

rdblue reviewed Aug 9, 2021

core/src/main/java/org/apache/iceberg/Partitioning.java Outdated

rdblue reviewed Aug 9, 2021

kbendick approved these changes Aug 10, 2021

Review feedback

fbf7c4f

github-actions bot added the API label Aug 10, 2021

aokolnychyi commented Aug 10, 2021

aokolnychyi added 2 commits August 10, 2021 10:06

Two more tests

90e60fb

Fix tests

663855c

rdblue reviewed Aug 13, 2021

api/src/main/java/org/apache/iceberg/PartitionField.java Outdated

rdblue reviewed Aug 13, 2021

rdblue requested changes Aug 13, 2021

Review feedback

31a20d6

aokolnychyi requested a review from rdblue August 14, 2021 03:15

aokolnychyi merged commit 95cde3a into apache:master Aug 14, 2021

aokolnychyi mentioned this pull request Aug 18, 2021

Core: Validate transforms while building partition type #2992

Merged

RussellSpitzer mentioned this pull request Oct 26, 2021

select .files and rewrite data files failed after remove partition field #3374

Closed

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Nov 2, 2021

Core: Fix querying metadata tables with multiple specs (apache#2936)

2114474

kbendick mentioned this pull request Nov 2, 2021

Investigate amount of work needed to backport #3240 to 0.12.1 #3443

Closed

alexjo2144 mentioned this pull request May 11, 2022

Iceberg $partitions metadata table only uses the current Spec trinodb/trino#12323

Closed
