Handle partition schema evolution in partitions metadata by homar · Pull Request #12416 · trinodb/trino

homar · 2022-05-16T12:24:01Z

Description

Provides information about partitioning based on the set of all columns which were used in any spec.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

findinpath · 2022-05-16T13:40:23Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

drop variable assignment.

findinpath · 2022-05-16T13:46:29Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Do we need to perform deduplication here?

Yeah, I think we do

findinpath · 2022-05-16T13:54:05Z

Can you please add a test inspired by https://blog.starburst.io/trino-on-ice-ii-in-place-table-evolution-and-cloud-compatibility-with-iceberg which uses transforms https://trino.io/docs/current/connector/iceberg.html#partitioned-tables (e.g. from month(ts) to day(ts))?

Also, adding a partition field, dropping it, and later adding it again would be welcome, to see whether the deduplication of the partition fields is needed.

findinpath · 2022-05-16T13:55:12Z

This PR could build its tests much more easily on the functionality exposed by PR #12259

@alexjo2144

alexjo2144 · 2022-05-16T14:21:49Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Suggested change

- private List<PartitionField> getAllPartitionFields(Table icebergTable)
+ private static List<PartitionField> getAllPartitionFields(Table icebergTable)

alexjo2144 · 2022-05-16T14:22:14Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Yeah, I think we do

alexjo2144 · 2022-05-16T14:22:34Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

toUnmodifiableList() -> toImmutableList()

alexjo2144 · 2022-05-16T14:24:14Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Isn't there a class level variable with these already?

Suggested change

- return buildRecordCursor(getStatisticsByPartition(tableScan), getAllPartitionFields(icebergTable));
+ return buildRecordCursor(getStatisticsByPartition(tableScan), partitionFields);

It is not, but I can make it one.

findepi · 2022-05-19T15:08:48Z

@alexjo2144 @findinpath PTAL

findinpath · 2022-05-19T15:20:04Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

unrelated to the current commit:

if (partitionFields.isEmpty()) { return Optional.empty(); }

this can be replaced with a check

if (fields.isEmpty()) { return Optional.empty(); }

at the beginning of the method.

I will add it as a separate commit

findinpath · 2022-05-19T15:24:57Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

please check the partitioning before doing changes on the partition fields.

findinpath · 2022-05-19T19:30:59Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

Why is the old_partition_key shown here?

The current snapshot of the table has only new_partition_key as partition key.

Checking the partitions metadata table with Iceberg Spark implementation shows also only new_partition_key.

Discussed offline; I will add more queries in Spark to show that the behaviour is the same.

alexjo2144 · 2022-05-19T19:46:51Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

fieldIds?

alexjo2144 · 2022-05-19T19:53:16Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Can we use a Map from fieldId -> Type rather than doing the double iteration on the lists?

I may be missing something, but I failed to create such a map.
The reason for this inner loop is not to find the type (it is found above) but to assign data from partitionStruct to the correct partition field. The problem is that partitionStruct comes from reading the files using fileScanTasks, and here I am trying to match it with the partitioning columns that come from reading the table spec.
My assumption is that neither of these places contains all the information, which is why I need to match them.
It'd be great if I'm wrong about that; please point it out.
Though it was done this way before I started changing this. Maybe that was wrong and I should have tried to change it, but I doubt it.

Hmm, I may have figured out how to improve this, though.
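
The matching problem described in this thread can be illustrated with a small sketch. It is not the code from this PR: the class name, the method `project`, and the use of plain lists instead of Iceberg's struct types are assumptions made for illustration. It shows one way to place the partition values read from a single file into a row laid out for the union of all partition fields, with a precomputed fieldId -> index map replacing the inner loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class PartitionValueProjection
{
    // Hypothetical helper: fileFieldIds and fileValues describe the partition struct
    // of one data file (parallel lists); fieldIdToIndex gives the position of each
    // partition field in the unified output row.
    static Object[] project(List<Integer> fileFieldIds, List<?> fileValues, Map<Integer, Integer> fieldIdToIndex)
    {
        // Positions for fields the file was not partitioned by stay null.
        Object[] row = new Object[fieldIdToIndex.size()];
        for (int i = 0; i < fileFieldIds.size(); i++) {
            Integer index = fieldIdToIndex.get(fileFieldIds.get(i));
            if (index != null) {
                row[index] = fileValues.get(i);
            }
        }
        return row;
    }

    public static void main(String[] args)
    {
        // The file was written under a spec that only contains field 1001.
        Object[] row = project(List.of(1001), List.of("2022-05-19"), Map.of(1000, 0, 1001, 1));
        System.out.println(Arrays.toString(row));
    }
}
```

With a map lookup, each file's partition struct is traversed once, regardless of how many partition fields the table has accumulated over its history.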

alexjo2144 · 2022-05-19T19:55:43Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

Can you add an insert along with each of these alter table sections so we can see the partition using this field show up?

findinpath · 2022-05-23T08:11:13Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

nit (optional): extract the logic for filtering duplicates to a different method. (just to improve the readability)

findinpath · 2022-05-23T08:19:52Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

The field partitionColumnTypes can be dropped now.
It is used only once, in a code branch that depends on partitionColumnType.

findinpath · 2022-05-23T08:24:43Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

this.fieldIdToIndex = fields.stream().collect(Collectors.toMap(NestedField::fieldId, Function.identity()));

This is incorrect. I need a mapping from fieldId -> its index; your code gives fieldId -> NestedField. I could probably do it with some zipWithIndex method, but I think the standard way is more readable.

Sorry, I didn't pay enough attention here. Thanks for the explanation.
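
For readers following the thread, the difference between the two mappings can be sketched as below. This is an illustration under assumptions, not the merged code: the class `FieldIdIndex` and the method `fieldIdToIndex` are hypothetical names, and plain integer ids stand in for Iceberg's NestedField. A plain indexed loop is presumably what "the standard way" refers to.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldIdIndex
{
    // Hypothetical helper: maps each field id to its position in the given list.
    // Collectors.toMap(..., Function.identity()) would map the id to the element
    // itself, which is not what is needed here.
    static Map<Integer, Integer> fieldIdToIndex(List<Integer> fieldIds)
    {
        Map<Integer, Integer> result = new HashMap<>();
        for (int i = 0; i < fieldIds.size(); i++) {
            result.put(fieldIds.get(i), i);
        }
        return result;
    }

    public static void main(String[] args)
    {
        Map<Integer, Integer> index = fieldIdToIndex(List.of(1000, 1001, 1005));
        System.out.println(index.get(1005));
    }
}
```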

alexjo2144

Couple nits but looks good to me

alexjo2144 · 2022-05-23T15:19:11Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Suggested change

- partitionFields = getAllPartitionFields(icebergTable);
+ this.partitionFields = getAllPartitionFields(icebergTable);

alexjo2144 · 2022-05-23T15:22:11Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

This is not possible right now, because of apache/iceberg#4563 right?

Yes, but not only that; it is also to avoid name conflicts with columns that were renamed.

alexjo2144 · 2022-05-23T15:22:41Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Use Stream#distinct?

I can't rely on PartitionField's implementation of hashCode and equals, as they take the transform into account and I really care only about the IDs. I could maybe provide my own comparator or something, but I think this is cleaner as it is now.
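
The point about Stream#distinct can be shown with a minimal sketch. The record `Field` below is a hypothetical stand-in for Iceberg's PartitionField, reduced to an id, a name, and a transform, and `distinctByFieldId` is an illustrative name rather than the method in this PR. Because record equality compares all components, two entries with the same id but different transforms would both survive `distinct()`, so the sketch keys on the id explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionFieldDedup
{
    // Hypothetical stand-in for a partition field; equality includes the transform.
    record Field(int fieldId, String name, String transform) {}

    // Collects the fields of all specs, keeping the first field seen for each id
    // and preserving encounter order.
    static List<Field> distinctByFieldId(List<List<Field>> specs)
    {
        Map<Integer, Field> byId = new LinkedHashMap<>();
        for (List<Field> spec : specs) {
            for (Field field : spec) {
                byId.putIfAbsent(field.fieldId(), field);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args)
    {
        List<List<Field>> specs = List.of(
                List.of(new Field(1000, "ts_month", "month")),
                List.of(new Field(1000, "ts_month", "void"), new Field(1001, "ts_day", "day")));
        // Two fields remain: ids 1000 and 1001.
        System.out.println(distinctByFieldId(specs));
    }
}
```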

alexjo2144 · 2022-05-23T15:25:44Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Use ImmutableMap.Builder

sure but that's more code

alexjo2144 · 2022-05-23T15:26:11Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTable.java

Let's rename this, now that the field is fieldIdToIndex instead of the structType itself

findepi · 2022-05-25T20:03:30Z

Merged, thanks!
Thank you @findinpath @alexjo2144 for your review

cla-bot bot added the cla-signed label May 16, 2022

github-actions bot added the tests:hive label May 16, 2022

findepi requested review from alexjo2144 and findinpath May 16, 2022 13:26

findinpath reviewed May 16, 2022

alexjo2144 reviewed May 16, 2022

homar force-pushed the homar/fix_iceberg_partitions_metadata branch 3 times, most recently from c0f61a5 to e3854c2 Compare May 18, 2022 07:24

homar requested review from alexjo2144, findepi and findinpath May 18, 2022 10:02

findinpath reviewed May 19, 2022

alexjo2144 reviewed May 19, 2022

Refactor of io.trino.plugin.iceberg.PartitionTable#getPartitionColumnType

45b04b5

homar force-pushed the homar/fix_iceberg_partitions_metadata branch from e3854c2 to a8b1e3f Compare May 20, 2022 15:45

findinpath reviewed May 23, 2022

findinpath approved these changes May 23, 2022

homar force-pushed the homar/fix_iceberg_partitions_metadata branch from a8b1e3f to 30f0d72 Compare May 23, 2022 10:08

alexjo2144 approved these changes May 23, 2022

homar force-pushed the homar/fix_iceberg_partitions_metadata branch from 30f0d72 to 4d5e849 Compare May 25, 2022 07:57

Handle partition schema evolution in partitions metadata

9d31fc7

homar force-pushed the homar/fix_iceberg_partitions_metadata branch from 4d5e849 to 9d31fc7 Compare May 25, 2022 10:14

findepi merged commit db099cf into trinodb:master May 25, 2022

github-actions bot added this to the 382 milestone May 25, 2022

mosabua mentioned this pull request May 25, 2022

Add Trino 382 release notes #12440

Merged
