Core: Only validate the current partition specs #5707
```java
@@ -421,6 +421,31 @@ public void testSparkTableAddDropPartitions() throws Exception {
        "spark table partition should be empty", 0, sparkTable().partitioning().length);
  }

  @Test
  public void testDropColumnOfOldPartitionFieldV1() {
```

Contributor: I don't think this is the right place to test this. There are partition spec evolution tests in core, which is a much better place than here.

```java
    // default table created in v1 format
    sql(
        "CREATE TABLE %s (id bigint NOT NULL, ts timestamp, day_of_ts date) USING iceberg PARTITIONED BY (day_of_ts)",
        tableName);
```

Collaborator: Shall we explicitly set `format-version=1`, in case the default changes in the future?

Contributor (author): Great suggestion, added! 👍🏻

Contributor: @Fokko, it doesn't look like this change was pushed?

Contributor (author): I was working on a follow-up to fix the other issue in the read path. I waited to push until I had a proper fix for this, but it turns out to be a bit more complicated than I originally anticipated. I've created a new PR: #5907

```java
    sql("ALTER TABLE %s REPLACE PARTITION FIELD day_of_ts WITH days(ts)", tableName);

    sql("ALTER TABLE %s DROP COLUMN day_of_ts", tableName);
```

Collaborator (marton-bod): Can we also include a SQL statement to read back the data from the table after the drop? We have noticed the same issue in Trino, but for us the drop column still succeeds; it's the subsequent read operations that start failing.

Contributor (author): Thanks @marton-bod, the [truncated]

Member: Hmm, interesting. This is what I assumed would happen, which is why I went with a spec change (a disruptive change). @marton-bod: I have tagged you in one of the Slack discussions where I proposed a spec change to handle this.

```java
  }

  @Test
  public void testDropColumnOfOldPartitionFieldV2() {
    sql(
        "CREATE TABLE %s (id bigint NOT NULL, ts timestamp, day_of_ts date) USING iceberg PARTITIONED BY (day_of_ts)",
        tableName);

    sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version' = '2');", tableName);

    sql("ALTER TABLE %s REPLACE PARTITION FIELD day_of_ts WITH days(ts)", tableName);

    sql("ALTER TABLE %s DROP COLUMN day_of_ts", tableName);
  }

  private void assertPartitioningEquals(SparkTable table, int len, String transform) {
    Assert.assertEquals("spark table partition should be " + len, len, table.partitioning().length);
    Assert.assertEquals(
```
singhpk234: [doubt] As per my understanding, I think this might not work with the v1 partition spec, considering we still have a `void` transform for dropped partition fields, which would be holding a reference to the dropped source field. Should we update `PartitionSpec#checkCompatibility` to handle void transforms? Thoughts?

singhpk234: Also, should we add a unit test to `TableMetadataParserTest`?

Fokko: @singhpk234 Can you elaborate a bit more? I'm not sure that I understand the issue. For V1, using the `spec` instead of `specs` changes nothing, since it takes the other branch: `core/src/main/java/org/apache/iceberg/TableMetadataParser.java`, lines 393 to 402 at 24d5a53.

singhpk234: Apologies for not being clear. What I meant was: mapping the current schema to the current spec might still cause an issue for tables in V1 format. Considering the UT in this PR (the last DDL), the current spec will have a `void` transform for `day_of_ts`, since `day_of_ts` is dropped from partitioning (in V1 format, we replace the dropped field's transform with a `void` transform). When we attempt to bind the current partition spec to the current schema, the current schema will not have `day_of_ts`, but `PartitionSpec#checkCompatibility` will still call `schema.findType(field.sourceId())` for the `void` transform of `day_of_ts`, and that lookup fails because `day_of_ts` has been removed from the current schema. A sample UT for the repro (modified from this PR): [snippet not preserved]
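The lost repro snippet aside, the failure mode described above can be modeled without Iceberg at all. Below is a self-contained toy sketch (the class, record, and map are my own stand-ins, not Iceberg's real `PartitionSpec` or `Schema` code) of a compatibility check that resolves the source column of every partition field, including `void` fields that point at a dropped column:

```java
import java.util.Map;

// Toy model of the failure described above. This is NOT Iceberg's real
// PartitionSpec#checkCompatibility; it only mimics its shape: resolve the
// source column of every partition field, with no special case for "void".
public class CheckCompatibilityBug {

  // A partition field: source column id plus transform name.
  record PartitionField(int sourceId, String transform) {}

  // schema maps live column ids to type names (stands in for Schema#findType)
  static void checkCompatibility(Map<Integer, String> schema, PartitionField... fields) {
    for (PartitionField field : fields) {
      if (schema.get(field.sourceId()) == null) {
        throw new IllegalStateException(
            "Cannot find source column for partition field: " + field);
      }
    }
  }

  public static void main(String[] args) {
    // Schema after `ALTER TABLE ... DROP COLUMN day_of_ts`: only
    // id (field id 1) and ts (field id 2) remain; day_of_ts had id 3.
    Map<Integer, String> schema = Map.of(1, "long", 2, "timestamp");

    // V1 spec after REPLACE PARTITION FIELD: the old day_of_ts field was
    // rewritten to void(3), and days(ts) was added on field 2.
    checkCompatibility(
        schema,
        new PartitionField(2, "day"),
        new PartitionField(3, "void")); // throws: source column 3 is gone
  }
}
```

The `void(3)` placeholder is exactly what trips the check: the field is inert, but its source id still resolves against the current schema.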
Fokko: @singhpk234 Thanks for the thorough explanation. I wasn't aware of the replacement by a void transform, and it does indeed have a similar issue. Thanks for raising this; let me think of a solution. Do you know the historical reason for replacing it with a void transform?

singhpk234: @Fokko, this explanation from @rdblue nicely lays out the motivation behind introducing void transforms in the v1 format.

Fokko: Thanks for linking, @singhpk234; that is indeed a very clear explanation and it makes a lot of sense. I've added a condition to skip the validation when we encounter a `VoidTransform`. Since it is almost always compatible, I think we should be okay with that, but I'm curious to hear what others think.
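The fix described here can be sketched with a small self-contained toy model (again, my own names, not the actual PR diff): the validation loop skips fields whose transform is void before resolving the source column, so a `void` placeholder for the dropped `day_of_ts` no longer fails the check:

```java
import java.util.Map;

// Toy model of the fix: a compatibility check that skips void transforms
// before resolving the source column, as described in the comment above.
// Not Iceberg's actual code; names and types are illustrative stand-ins.
public class CheckCompatibilityFixed {

  record PartitionField(int sourceId, String transform) {}

  static void checkCompatibility(Map<Integer, String> schema, PartitionField... fields) {
    for (PartitionField field : fields) {
      if ("void".equals(field.transform())) {
        continue; // placeholder for a dropped V1 partition field: skip it
      }
      if (schema.get(field.sourceId()) == null) {
        throw new IllegalStateException(
            "Cannot find source column for partition field: " + field);
      }
    }
  }

  public static void main(String[] args) {
    // Schema after dropping day_of_ts (field id 3): only id and ts remain.
    Map<Integer, String> schema = Map.of(1, "long", 2, "timestamp");

    // void(3) references the dropped column but is skipped, so validating
    // the current spec against the current schema now succeeds.
    checkCompatibility(
        schema,
        new PartitionField(2, "day"),
        new PartitionField(3, "void"));
    System.out.println("spec is compatible");
  }
}
```

Skipping rather than rewriting the field keeps the V1 spec's field ordering intact, which is the point of the void placeholder in the first place; non-void fields that reference a missing column still fail validation.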