Add support for partition schema evolution for HUDI tables by imjalpreet · Pull Request #16348 · prestodb/presto

imjalpreet · 2021-06-28T09:02:21Z

This is a follow-up PR for #16011. This PR enables partition schema evolution for HUDI tables.

== RELEASE NOTES ==

Hive Changes
* Add support for matching columns between table and partition schemas by name for HUDI tables. This is enabled when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default, columns are mapped by index.
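
To illustrate the difference between the two mapping modes named in the release note, here is a minimal Java sketch. It is not the Presto implementation: the class `ColumnMappingSketch`, its methods, and the sample column names are hypothetical and exist only to show how an index-based mapping can pick the wrong partition column once the schema has evolved, while a name-based mapping does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from the Presto code base.
public class ColumnMappingSketch
{
    // Index-based: table column i reads partition column i, when that position exists.
    public static List<Optional<Integer>> mapByIndex(List<String> tableColumns, List<String> partitionColumns)
    {
        List<Optional<Integer>> mapping = new ArrayList<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            mapping.add(i < partitionColumns.size() ? Optional.of(i) : Optional.empty());
        }
        return mapping;
    }

    // Name-based: each table column reads the partition column with the same name, if any.
    public static List<Optional<Integer>> mapByName(List<String> tableColumns, List<String> partitionColumns)
    {
        List<Optional<Integer>> mapping = new ArrayList<>();
        for (String name : tableColumns) {
            int position = partitionColumns.indexOf(name);
            mapping.add(position >= 0 ? Optional.of(position) : Optional.empty());
        }
        return mapping;
    }

    public static void main(String[] args)
    {
        // Assumed example: the partition was written with a different column order
        // and before the "email" column was added to the table.
        List<String> table = Arrays.asList("id", "name", "email");
        List<String> partition = Arrays.asList("name", "id");

        System.out.println("by index: " + mapByIndex(table, partition));
        System.out.println("by name:  " + mapByName(table, partition));
    }
}
```

In this assumed example the index-based mapping would read the partition's `name` column as the table's `id`, whereas the name-based mapping resolves `id` to the partition's second column and leaves `email` unmapped.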

zhenxiao

one minor thing

zhenxiao · 2021-06-29T14:38:19Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java

how about:
s/finalHiveStorageFormat/resolvedStorageFormat/g

Sure, made the requested change.

zhenxiao · 2021-06-29T14:40:18Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java

Shall we check the session property or the configuration here? According to the release note:
This is enabled when configuration property hive.parquet.use-column-names or the hive catalog session property parquet_use_column_names is set to true. By default they are mapped by index.

@zhenxiao Lines 392 and 393 are required in any case. Only the next couple of lines are unnecessary when hive.parquet.use-column-names is not set to true. Either way, the value of this variable is used in only one method, getTableToPartitionMapping, which has the required check at line 528.

Anyway, I have added a check here as well; let me know if you feel it is not necessary.

imjalpreet

@zhenxiao Sorry for the delay, I was away for the past few days.

I have made the requested changes, please let me know your views.

zhenxiao

thank you, @imjalpreet
looks good. one minor issue

zhenxiao · 2021-07-13T04:25:24Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java

zhenxiao · 2021-07-13T04:26:10Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java

+        StorageFormat storageFormat = table.getStorage().getStorageFormat();
+        Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
+
+        Optional<HiveStorageFormat> resolvedHiveStorageFormat;


how about:

Optional<HiveStorageFormat> resolvedHiveStorageFormat = hiveStorageFormat;
if (isUseParquetColumnNames(session)) { ... }

@zhenxiao The variable resolvedHiveStorageFormat is used in a lambda function, so it has to be final or effectively final. If I set it before the if condition, it will no longer be effectively final, because its value would be assigned a second time inside the if block.

For this reason I had to use an else block.

Let me know if you think I can improve it some other way.
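
The effectively-final constraint described above can be sketched in isolation. This is a simplified illustration, not the code from HiveSplitManager: the class `EffectivelyFinalSketch`, the `describe` method, the use of plain strings for formats, and the "PARQUET" fallback are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical illustration only; names and fallback logic are not taken from the PR.
public class EffectivelyFinalSketch
{
    public static List<String> describe(List<String> partitions, Optional<String> tableFormat, boolean useColumnNames)
    {
        // Declared without an initializer and assigned exactly once on each branch,
        // so the compiler treats it as effectively final.
        Optional<String> resolvedFormat;
        if (useColumnNames) {
            // Assumed placeholder rule: fall back to "PARQUET" when no format is known.
            resolvedFormat = tableFormat.isPresent() ? tableFormat : Optional.of("PARQUET");
        }
        else {
            resolvedFormat = tableFormat;
        }

        // Initializing the variable at its declaration and then reassigning it inside
        // the if block would make the capture below a compile error:
        // "local variables referenced from a lambda expression must be final or effectively final".
        return partitions.stream()
                .map(partition -> partition + ":" + resolvedFormat.orElse("none"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        System.out.println(describe(Arrays.asList("p1", "p2"), Optional.empty(), true));
        System.out.println(describe(Arrays.asList("p1", "p2"), Optional.empty(), false));
    }
}
```

Another option in plain Java would be to mutate a working variable and copy it into a separate final local just before the lambda; the if/else form avoids the extra variable.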

I might be missing something. Which lambda function uses resolvedHiveStorageFormat? It seems to be used only in getTableToPartitionMapping.

imjalpreet

@zhenxiao I have left a comment at the line where the lambda function starts. Can you have a look and let me know if you have any concerns?

imjalpreet · 2021-07-14T06:44:42Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java

+        }

        Iterable<List<HivePartition>> partitionNameBatches = partitionExponentially(hivePartitions, minPartitionBatchSize, maxPartitionBatchSize);
        Iterable<List<HivePartitionMetadata>> partitionBatches = transform(partitionNameBatches, partitionBatch -> {


@zhenxiao I was talking about this. You are right that resolvedHiveStorageFormat is only used in the method getTableToPartitionMapping, but that method is called from this lambda function. The call is on line 466 in the updated code.

Got it. You are correct.

imjalpreet requested a review from zhenxiao June 28, 2021 09:03

zhenxiao reviewed Jun 29, 2021

Add support for partition schema evolution for HUDI tables

bc530d9

imjalpreet commented Jul 8, 2021

imjalpreet force-pushed the HUDISchemaEvolution branch from dc3c96a to bc530d9 Compare July 8, 2021 08:52

imjalpreet requested a review from zhenxiao July 8, 2021 08:56

zhenxiao approved these changes Jul 13, 2021

imjalpreet commented Jul 14, 2021

zhenxiao approved these changes Jul 14, 2021

zhenxiao merged commit 5090612 into prestodb:master Jul 14, 2021

swapsmagic mentioned this pull request Jul 19, 2021

Add release notes for 0.258 #16442

Merged
