Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
import com.facebook.presto.hive.metastore.Partition;
import com.facebook.presto.hive.metastore.PartitionStatistics;
import com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore;
import com.facebook.presto.hive.metastore.StorageFormat;
import com.facebook.presto.hive.metastore.Table;
import com.facebook.presto.spi.ColumnHandle;
import com.facebook.presto.spi.ConnectorSession;
Expand All @@ -55,7 +56,10 @@
import com.google.common.collect.Lists;
import com.google.common.collect.Ordering;
import io.airlift.units.DataSize;
import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hudi.hadoop.HoodieParquetInputFormat;
import org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat;
import org.weakref.jmx.Managed;
import org.weakref.jmx.Nested;

Expand Down Expand Up @@ -385,7 +389,18 @@ private Iterable<HivePartitionMetadata> getPartitionMetadata(
}
}

Optional<HiveStorageFormat> storageFormat = getHiveStorageFormat(table.getStorage().getStorageFormat());
StorageFormat storageFormat = table.getStorage().getStorageFormat();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we check session property or configuration here? according to the release note:
This is enabled when configuration property hive.parquet.use-column-names or the hive catalog session property parquet_use_column_names is set to true. By default they are mapped by index.

Copy link
Copy Markdown
Member Author

@imjalpreet imjalpreet Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao Line 392 and Line 393 are required in any case. Only next couple of lines are not mandatory if hive.parquet.use-column-names is not set to true. But in any case value of this variable is only used in one method getTableToPartitionMapping which has the required check at line 528.

Anyways, I have added a check here as well, let me know if you feel it is not necessary.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks nice

Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);

Optional<HiveStorageFormat> resolvedHiveStorageFormat;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

Optional<HiveStorageFormat> resolvedHiveStorageFormat = hiveStorageFormat;
if (isUseParquetColumnNames(session)) {
 ...
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao The variable resolvedHiveStorageFormat is being used in a lambda function so it has to be a final or an effectively final variable. If I set it before the if condition it won't remain an effectively final variable as it's value will change for the second time inside the if block.

Due to this reason I had to use an else block.

Let me know if you think I can improve it some other way.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might miss something. which lambda function is used for resolvedHiveStorageFormat? seems it is only used in getTableToPartitionMapping?


if (isUseParquetColumnNames(session)) {
// Use Hive Storage Format as Parquet if table is of HUDI format
resolvedHiveStorageFormat = (!hiveStorageFormat.isPresent() && isHudiFormat(storageFormat)) ? Optional.of(PARQUET) : hiveStorageFormat;
}
else {
resolvedHiveStorageFormat = hiveStorageFormat;
}

Iterable<List<HivePartition>> partitionNameBatches = partitionExponentially(hivePartitions, minPartitionBatchSize, maxPartitionBatchSize);
Iterable<List<HivePartitionMetadata>> partitionBatches = transform(partitionNameBatches, partitionBatch -> {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao I was talking about this. You are right that resolvedHiveStorageFormat is only being used in the method getTableToPartitionMapping but that method is being called from this lambda function. The call is on the line 466 in the updated code.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it. you are correct

Expand Down Expand Up @@ -448,7 +463,7 @@ private Iterable<HivePartitionMetadata> getPartitionMetadata(
if ((tableColumns == null) || (partitionColumns == null)) {
throw new PrestoException(HIVE_INVALID_METADATA, format("Table '%s' or partition '%s' has null columns", tableName, partitionName));
}
TableToPartitionMapping tableToPartitionMapping = getTableToPartitionMapping(session, storageFormat, tableName, partitionName, tableColumns, partitionColumns);
TableToPartitionMapping tableToPartitionMapping = getTableToPartitionMapping(session, resolvedHiveStorageFormat, tableName, partitionName, tableColumns, partitionColumns);

if (hiveBucketHandle.isPresent() && !hiveBucketHandle.get().isVirtuallyBucketed()) {
Optional<HiveBucketProperty> partitionBucketProperty = partition.getStorage().getBucketProperty();
Expand Down Expand Up @@ -603,6 +618,21 @@ private PrestoException tablePartitionColumnMismatchException(SchemaTableName ta
partitionType));
}

/**
* This method is used to check if a table is of HUDI format
*
* @param storageFormat Table Storage Format
* @return true if table is of HUDI format, else false
*/
private boolean isHudiFormat(StorageFormat storageFormat)
{
String serde = storageFormat.getSerDeNullable();
String inputFormat = storageFormat.getInputFormatNullable();
return serde != null && serde.equals(ParquetHiveSerDe.class.getName())
&& (inputFormat != null && (inputFormat.equals(HoodieParquetInputFormat.class.getName())
|| inputFormat.equals(HoodieParquetRealtimeInputFormat.class.getName())));
}

private Map<String, PartitionSplitInfo> getPartitionSplitInfo(
ConnectorSession session,
SemiTransactionalHiveMetastore metastore,
Expand Down