Add support for partition schema evolution for HUDI tables #16348
zhenxiao merged 1 commit into prestodb:master
how about:
s/finalHiveStorageFormat/resolvedStorageFormat/g
Sure, made the requested change.
Shall we check the session property or the configuration here? According to the release note:
This is enabled when configuration property hive.parquet.use-column-names or the hive catalog session property parquet_use_column_names is set to true. By default they are mapped by index.
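For readers unfamiliar with the two modes, here is a toy sketch of the difference. It is not Presto's implementation; `ColumnMappingSketch`, `mapByIndex` and `mapByName` are hypothetical names. When a table's schema has evolved after a partition was written, index-based mapping can pair a table column with an unrelated partition column, whereas name-based mapping reports the column as missing (null) instead.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnMappingSketch {
    // Index-based (hypothetical helper): the i-th table column reads the i-th partition column.
    static List<String> mapByIndex(List<String> tableColumns, List<String> partitionColumns) {
        List<String> mapped = new ArrayList<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            // Positions beyond the partition schema's width have no source column (null).
            mapped.add(i < partitionColumns.size() ? partitionColumns.get(i) : null);
        }
        return mapped;
    }

    // Name-based (hypothetical helper): each table column reads the same-named partition column, if any.
    static List<String> mapByName(List<String> tableColumns, List<String> partitionColumns) {
        List<String> mapped = new ArrayList<>();
        for (String column : tableColumns) {
            mapped.add(partitionColumns.contains(column) ? column : null);
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Table evolved: "b" was dropped and "c" added after the old partition was written.
        List<String> table = List.of("a", "c");
        List<String> oldPartition = List.of("a", "b");
        System.out.println(mapByIndex(table, oldPartition)); // [a, b]  ("c" wrongly reads "b")
        System.out.println(mapByName(table, oldPartition));  // [a, null]
    }
}
```

The sketch ignores type coercion and column-name case handling, which a real reader also has to deal with.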
@zhenxiao Lines 392 and 393 are required in any case. Only the next couple of lines are unnecessary when hive.parquet.use-column-names is not set to true. In any case, the value of this variable is used in only one method, getTableToPartitionMapping, which already has the required check at line 528.
Anyway, I have added a check here as well; let me know if you feel it is not necessary.
imjalpreet left a comment
@zhenxiao Sorry for the delay; I was away for the past few days.
I have made the requested changes. Please let me know your views.
Updated from dc3c96a to bc530d9
zhenxiao left a comment
Thank you, @imjalpreet.
Looks good, with one minor issue:
StorageFormat storageFormat = table.getStorage().getStorageFormat();
Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);

Optional<HiveStorageFormat> resolvedHiveStorageFormat;
how about:
Optional<HiveStorageFormat> resolvedHiveStorageFormat = hiveStorageFormat;
if (isUseParquetColumnNames(session)) {
...
}
@zhenxiao The variable resolvedHiveStorageFormat is used inside a lambda function, so it has to be final or effectively final. If I initialize it before the if condition, it will no longer be effectively final, because its value is assigned a second time inside the if block.
That is why I used an else block instead.
Let me know if you think I can improve it some other way.
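A minimal, self-contained sketch of the constraint described above, assuming plain Java 9+. `Format` is a hypothetical stand-in for `HiveStorageFormat`, and the `if` branch uses a placeholder value because the real resolution logic is elided in this thread. A local variable captured by a lambda must be final or effectively final, so the if/else form compiles; the default-then-override form suggested in review also works if the result is copied into a final local before the lambda.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class EffectivelyFinalSketch {
    // Hypothetical stand-in for HiveStorageFormat.
    enum Format { PARQUET, ORC }

    // if/else form: `resolved` is assigned exactly once on every path, so it is effectively final.
    static List<String> withIfElse(List<String> batches, Optional<Format> detected, boolean useColumnNames) {
        Optional<Format> resolved;
        if (useColumnNames) {
            resolved = Optional.of(Format.PARQUET); // placeholder for the elided resolution logic
        }
        else {
            resolved = detected;
        }
        return batches.stream()
                .map(batch -> batch + ":" + resolved.map(Format::name).orElse("none"))
                .collect(Collectors.toList());
    }

    // Default-then-override form. Capturing `candidate` directly would fail with
    // "local variables referenced from a lambda expression must be final or effectively final",
    // because it is assigned twice; a final copy taken before the lambda avoids that.
    static List<String> withFinalCopy(List<String> batches, Optional<Format> detected, boolean useColumnNames) {
        Optional<Format> candidate = detected;
        if (useColumnNames) {
            candidate = Optional.of(Format.PARQUET); // same placeholder
        }
        final Optional<Format> resolved = candidate;
        return batches.stream()
                .map(batch -> batch + ":" + resolved.map(Format::name).orElse("none"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> batches = List.of("p1", "p2");
        System.out.println(withIfElse(batches, Optional.of(Format.ORC), true));     // [p1:PARQUET, p2:PARQUET]
        System.out.println(withFinalCopy(batches, Optional.of(Format.ORC), false)); // [p1:ORC, p2:ORC]
    }
}
```

Both forms behave identically. The if/else form avoids the extra local, while the final-copy form keeps the "default, then override" reading.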
I might be missing something. Which lambda function uses resolvedHiveStorageFormat? It seems to be used only in getTableToPartitionMapping.
imjalpreet left a comment
@zhenxiao I have left a comment at the line where the lambda function starts. Can you take a look and let me know if you have any concerns?
}

Iterable<List<HivePartition>> partitionNameBatches = partitionExponentially(hivePartitions, minPartitionBatchSize, maxPartitionBatchSize);
Iterable<List<HivePartitionMetadata>> partitionBatches = transform(partitionNameBatches, partitionBatch -> {
@zhenxiao This is the lambda I was referring to. You are right that resolvedHiveStorageFormat is used only in getTableToPartitionMapping, but that method is called from inside this lambda function (line 466 in the updated code), so the variable is captured by the lambda.
This is a follow-up PR to #16011. It enables partition schema evolution for HUDI tables.