Skip to content

Store physical partition name in Delta CREATE TABLE AS SELECT#18662

Merged
ebyhr merged 1 commit intotrinodb:masterfrom
ebyhr:ebi/delta-ctas-partition
Aug 28, 2023
Merged

Store physical partition name in Delta CREATE TABLE AS SELECT#18662
ebyhr merged 1 commit intotrinodb:masterfrom
ebyhr:ebi/delta-ctas-partition

Conversation

@ebyhr
Copy link
Copy Markdown
Member

@ebyhr ebyhr commented Aug 14, 2023

Description

Fixes #18661

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Fix writing incorrect transaction log in case of partitioned tables with `id` or `name` column mapping mode. ({issue}`18661`)

@cla-bot cla-bot bot added the cla-signed label Aug 14, 2023
@github-actions github-actions bot added tests:hive delta-lake Delta Lake connector labels Aug 14, 2023
@ebyhr ebyhr requested review from findepi, findinpath and pajaks August 14, 2023 00:36
@ebyhr ebyhr force-pushed the ebi/delta-ctas-partition branch from c690881 to dbe6df0 Compare August 14, 2023 02:08
@ebyhr ebyhr marked this pull request as draft August 14, 2023 03:40
Also, fix dropColumn method to avoid name base matching.
@ebyhr ebyhr force-pushed the ebi/delta-ctas-partition branch from dbe6df0 to 4021c4b Compare August 14, 2023 03:51
@ebyhr ebyhr marked this pull request as ready for review August 14, 2023 05:44
@ebyhr ebyhr removed the request for review from pajaks August 14, 2023 05:44
ColumnMappingMode columnMappingMode = handle.getColumnMappingMode();
String schemaString = handle.getSchemaString();
List<String> columnNames = handle.getInputColumns().stream().map(DeltaLakeColumnHandle::getBaseColumnName).collect(toImmutableList());
List<String> physicalPartitionNames = handle.getInputColumns().stream()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be something we didn't consider in #12638
I don't see at the moment any further related open issues.

handle.getComment(),
handle.getProtocolEntry());
appendAddFileEntries(transactionLogWriter, dataFileInfos, handle.getPartitionedBy(), columnNames, true);
appendAddFileEntries(transactionLogWriter, dataFileInfos, physicalPartitionNames, columnNames, true);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a couple lines up in the appendTableEntries call, should we use physicalPartitionNames there too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean handle.getPartitionedBy() for partitionColumns in metaData? The current implementation is same as OSS Delta Lake. They store non-physical partition column names in the field.

@ebyhr ebyhr merged commit d26b72e into trinodb:master Aug 28, 2023
@ebyhr ebyhr deleted the ebi/delta-ctas-partition branch August 28, 2023 22:22
@ebyhr ebyhr self-assigned this Aug 28, 2023
@github-actions github-actions bot added this to the 426 milestone Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed delta-lake Delta Lake connector

Development

Successfully merging this pull request may close these issues.

Cannot read partition & id or name column mapping mode tables created by CREATE TABLE AS SELECT

5 participants