Handle partition schema evolution in partitions metadata #12416
findepi merged 2 commits into trinodb:master
drop variable assignment.
Do we need to perform deduplication here?
Can you please add a test inspired by https://blog.starburst.io/trino-on-ice-ii-in-place-table-evolution-and-cloud-compatibility-with-iceberg which uses transforms https://trino.io/docs/current/connector/iceberg.html#partitioned-tables (e.g. : from …)? Also, adding a partition field, dropping it, and later adding it again would be welcome, to see whether the deduplication of the partition fields is needed.
This PR could build much easier tests upon the functionality exposed by PR #12259.
private List<PartitionField> getAllPartitionFields(Table icebergTable) -> private static List<PartitionField> getAllPartitionFields(Table icebergTable)
toUnmodifiableList() -> toImmutableList()
Isn't there a class level variable with these already?
return buildRecordCursor(getStatisticsByPartition(tableScan), getAllPartitionFields(icebergTable)); -> return buildRecordCursor(getStatisticsByPartition(tableScan), partitionFields);
it is not, but I can make it
c0f61a5 to e3854c2
@alexjo2144 @findinpath PTAL
unrelated to the current commit:
if (partitionFields.isEmpty()) {
return Optional.empty();
}
this can be replaced with a check
if (fields.isEmpty()) {
return Optional.empty();
}
at the beginning of the method.
I will add it as a separate commit
please check the partitioning before doing changes on the partition fields.
Why is the old_partition_key shown here?
The current snapshot of the table has only new_partition_key as partition key.
Checking the partitions metadata table with Iceberg Spark implementation shows also only new_partition_key.
Discussed offline. I will add more Spark queries to show that the behaviour is the same.
Can we use a Map from fieldId -> Type rather than doing the double iteration on the lists?
I may be missing something but I failed to create such a map.
The reason for this inner loop is not to find the type (it is found above) but to assign data from partitionStruct to the correct partition field. The problem is that partitionStruct comes from reading the files using fileScanTasks, and here I am trying to match that with the partitioning columns that come from reading the table spec.
My assumption is that neither of these places contains all the information, which is why I need to match them.
It'd be great if I'm wrong about that, please point it out.
Though it was done this way before I started changing this. Maybe that was wrong and I should have tried to change it, but I doubt it.
Hmm, I may have figured out how to improve this, though.
Can you add an insert along with each of these alter table sections so we can see the partition using this field show up?
e3854c2 to a8b1e3f
nit (optional): extract the logic for filtering duplicates to a different method. (just to improve the readability)
The field partitionColumnTypes can be dropped now.
It is used only once, in a code branch which depends on partitionColumnType.
this.fieldIdToIndex = fields.stream().collect(Collectors.toMap(NestedField::fieldId, Function.identity()));
This is incorrect. I need a mapping from fieldId -> its index; your code gives fieldId -> NestedField. I could probably do it with some zipWithIndex method, but I think the standard way is more readable.
Sorry, I didn't pay enough attention here. Thanks for the explanation.
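For reference, a minimal sketch of the fieldId -> index mapping described above, written as a plain loop. `Field` is a hypothetical stand-in for Iceberg's `NestedField` (only the ID and name are modelled), and the class and method names are illustrative assumptions, not the code from this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldIndexSketch
{
    // Hypothetical stand-in for Iceberg's NestedField: only the ID and name matter here.
    static final class Field
    {
        final int fieldId;
        final String name;

        Field(int fieldId, String name)
        {
            this.fieldId = fieldId;
            this.name = name;
        }
    }

    private FieldIndexSketch() {}

    // Maps each field ID to the position of that field in the given list,
    // so a value can later be read from a struct by position.
    static Map<Integer, Integer> indexByFieldId(List<Field> fields)
    {
        Map<Integer, Integer> fieldIdToIndex = new HashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            fieldIdToIndex.put(fields.get(i).fieldId, i);
        }
        return fieldIdToIndex;
    }
}
```

With such a map, a single lookup by field ID replaces the inner loop over the struct's fields. `Collectors.toMap(..., Function.identity())` would instead map each ID to the field object itself, which is the difference pointed out in the comment above.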
a8b1e3f to 30f0d72
alexjo2144 left a comment
Couple nits but looks good to me
partitionFields = getAllPartitionFields(icebergTable); -> this.partitionFields = getAllPartitionFields(icebergTable);
This is not possible right now, because of apache/iceberg#4563 right?
yes but not only, it is also to avoid name conflicts with columns that were renamed
I can't rely on PartitionField's implementation of hashCode and equals, as they take the transform into account and I really only care about IDs. I could maybe provide my own comparator or something, but I think this is cleaner now.
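As an illustration of deduplicating by ID rather than by equals/hashCode, here is a minimal sketch that collects partition fields from several specs and keeps the first field seen for each ID. `Field` is a hypothetical stand-in for Iceberg's `PartitionField`, and the class and method names are assumptions made for this example, not the code from this PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PartitionFieldDedupSketch
{
    // Hypothetical stand-in for Iceberg's PartitionField.
    static final class Field
    {
        final int fieldId;
        final String name;
        final String transform;

        Field(int fieldId, String name, String transform)
        {
            this.fieldId = fieldId;
            this.name = name;
            this.transform = transform;
        }
    }

    private PartitionFieldDedupSketch() {}

    // Walks every spec in order and keeps the first field seen for each field ID.
    // Name and transform are deliberately ignored when deciding what is a duplicate.
    static List<Field> distinctByFieldId(List<List<Field>> specs)
    {
        Set<Integer> seenIds = new HashSet<>();
        List<Field> result = new ArrayList<>();
        for (List<Field> spec : specs) {
            for (Field field : spec) {
                // Set.add returns false when the ID was already present
                if (seenIds.add(field.fieldId)) {
                    result.add(field);
                }
            }
        }
        return result;
    }
}
```

Tracking seen IDs in a separate set avoids wrapping the fields in a custom comparator or key class, at the cost of one extra collection.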
Let's rename this, now that the field is fieldIdToIndex instead of the structType itself
30f0d72 to 4d5e849
4d5e849 to 9d31fc7
Merged, thanks!
Description
Provides information about partitioning based on the set of all columns which were used in any spec.
Related issues, pull requests, and links
Fixes: #12323
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: