Core: Fix querying metadata tables with multiple specs #2936
TableOperations ops = ((HasTableOperations) table).operations();
TableMetadata current = ops.current();
ops.commit(current, current.updatePartitionSpec(newSpec));
Can we change the method updatePartitionSpec to avoid conflicts?
This API is hidden from users. The user-facing API is UpdatePartitionSpec accessible via Table. That one actually ensures we don't hit this case. The spec evolution in v1 tables is actually limited as described here.
There could be some tables where people evolved partitioning before the public API appeared. It is an edge case but this test ensures we get a reasonable exception for such tables.
I see, this part, https://iceberg.apache.org/spec/#partition-evolution.
flyrain
left a comment
LGTM
table.updateSpec()
    .removeField("data")
    .commit();
Nit: could be in one line.
Do we have to worry about the Manifests Table? Or is it ok because we are always displaying in the context of the current spec?
RussellSpitzer
left a comment
A few minor comments, but looks good to me overall
jackye1995
left a comment
looks good to me, thanks for the fix!
.../src/test/java/org/apache/iceberg/spark/source/TestMetadataTablesWithPartitionEvolution.java
3e5ba47 to 93a7fd8
@RussellSpitzer, somehow tests for Checking what is actually going on.
@RussellSpitzer, I think simply using
@aokolnychyi Sounds good to me. I thought I mimicked the way we were doing it in the FilesTable, using the "fileSchema" as the projected schema. If that isn't the scan.schema, we should change it.
Well, I am not sure. I'll need your help to verify whether my assumption is correct. We are using the same
93a7fd8 to ec1f573
List<NestedField> sortedCommonFields = commonFields.values().stream()
    .sorted(Comparator.comparingInt(NestedField::fieldId))
    .collect(Collectors.toList());
+1 for sorting by fieldId.
kbendick
left a comment
Overall, this looks good to me.
Same question that Ryan has regarding checking name during validation in partitionType, but overall this looks good to me. Thanks Anton!
List<NestedField> structFields = Lists.newArrayList();

// sort the spec IDs in descending order to pick up the most recent field names
List<Integer> specIds = table.specs().keySet().stream()
Added a sort by spec ID to make sure we pick up the most recent field name (see a dedicated test too).
}

@Test
public void testPartitionTypeWithAddingBackSamePartitionFieldInV1Table() {
This test validates we ignore field names when building the common type. The original spec will have 1000:data and the last spec will have 1000:data_1000 as the old field was renamed to avoid naming conflicts.
  structFields.add(structField);
} else {
  // verify the fields are compatible as they may conflict in v1 tables
  ValidationException.check(field.compatibleWith(existingField),
I would probably just make equivalentIgnoringName a private method in this class for this.
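A minimal sketch of what such a helper might look like. The `Field` class below is a hypothetical stand-in, not Iceberg's `PartitionField`, and the attributes being compared (field ID, source ID, transform) are an assumption for illustration rather than the code merged in this PR.

```java
import java.util.Objects;

final class FieldEquivalenceSketch {
  private FieldEquivalenceSketch() {
  }

  // Hypothetical stand-in for a partition field (not Iceberg's PartitionField).
  static final class Field {
    final int fieldId;
    final int sourceId;
    final String transform;
    final String name;

    Field(int fieldId, int sourceId, String transform, String name) {
      this.fieldId = fieldId;
      this.sourceId = sourceId;
      this.transform = transform;
      this.name = name;
    }
  }

  // Treats two fields as equivalent when everything except the name matches.
  static boolean equivalentIgnoringName(Field field, Field other) {
    return field.fieldId == other.fieldId
        && field.sourceId == other.sourceId
        && Objects.equals(field.transform, other.transform);
  }
}
```

Under these assumptions, `1000:data` and `1000:data_1000` compare as equivalent because only the name differs, while a field with the same ID but a different source column does not.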
    NestedField.optional(1001, "data", Types.StringType.get())
);
StructType actualType = Partitioning.partitionType(table);
Assert.assertEquals("Types must match", expectedType, actualType);
I think you could argue that adding data back should re-use the old ID. Not something to fix here, but we should probably fix it at some point.
It is indeed weird. However, I would not worry too much here as we have already stated not to rename and drop fields in v1 tables.
rdblue
left a comment
Looks good to me, other than the compatibleWith method that conflicts with the one in PartitionSpec.
Yeah, I wasn't sure about the method name. Updated.
Thanks for reviewing, @flyrain @karuppayya @RussellSpitzer @rdblue @kbendick @jackye1995! |
This PR adds a utility method to derive a common type for all partition specs.
Prior to this change, querying metadata tables in v2 format with evolved partitioning failed with runtime exceptions.
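The approach discussed in the review (visit spec IDs in descending order so the most recent field name wins, reject conflicting fields, then sort the result by field ID) can be sketched roughly as follows. This is an illustrative sketch only: the `Field` class, the `commonType` method, and the plain `Map` of specs are hypothetical stand-ins, not Iceberg's `Partitioning.partitionType` implementation.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CommonPartitionTypeSketch {
  private CommonPartitionTypeSketch() {
  }

  // Hypothetical stand-in for a partition field: ID, name, and result type name.
  static final class Field {
    final int id;
    final String name;
    final String type;

    Field(int id, String name, String type) {
      this.id = id;
      this.name = name;
      this.type = type;
    }
  }

  // specs maps a spec ID to the partition fields of that spec (assumed shape).
  static List<Field> commonType(Map<Integer, List<Field>> specs) {
    // Visit newer specs first so the most recent name for each field ID is kept.
    List<Integer> specIds = specs.keySet().stream()
        .sorted(Comparator.reverseOrder())
        .collect(Collectors.toList());

    Map<Integer, Field> commonFields = new HashMap<>();
    for (Integer specId : specIds) {
      for (Field field : specs.get(specId)) {
        Field existing = commonFields.get(field.id);
        if (existing == null) {
          commonFields.put(field.id, field);
        } else if (!existing.type.equals(field.type)) {
          // Names are ignored on purpose; only the type must agree.
          throw new IllegalStateException(
              "Conflicting partition fields for ID " + field.id);
        }
      }
    }

    // Sort by field ID so the resulting struct has a stable field order.
    return commonFields.values().stream()
        .sorted(Comparator.comparingInt((Field f) -> f.id))
        .collect(Collectors.toList());
  }
}
```

With spec 0 holding `1000:data` and spec 1 holding `1000:data_1000` and `1001:data`, this sketch keeps the newer name `data_1000` for field 1000 and returns the fields ordered by ID.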