Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -383,12 +383,12 @@ private void updatePartitionStatisticsBatch(Table table, Map<String, Function<Pa
.collect(toImmutableMap(HiveUtil::toPartitionValues, identity()));

List<Partition> partitions = batchGetPartition(table, ImmutableList.copyOf(updates.keySet()));
Map<Partition, Map<String, HiveColumnStatistics>> statisticsPerPartition = columnStatisticsProvider.getPartitionColumnStatistics(partitions);
Map<String, PartitionStatistics> partitionsStatistics = getPartitionStatistics(table, partitions);

statisticsPerPartition.forEach((partition, columnStatistics) -> {
partitions.forEach(partition -> {
Function<PartitionStatistics, PartitionStatistics> update = updates.get(partitionValuesToName.get(partition.getValues()));

PartitionStatistics currentStatistics = new PartitionStatistics(getHiveBasicStatistics(partition.getParameters()), columnStatistics);
PartitionStatistics currentStatistics = partitionsStatistics.get(makePartitionName(table, partition));
PartitionStatistics updatedStatistics = update.apply(currentStatistics);

Map<String, String> updatedStatisticsParameters = updateStatisticsParameters(partition.getParameters(), updatedStatistics.getBasicStatistics());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -297,11 +297,53 @@ public void testUpdatePartitionColumnStatisticsEmptyOptionalFields()
// in order to avoid incorrect data we skip writes for statistics with min/max = null
}

@Override
public void testUpdateBasicPartitionStatistics()
throws Exception
{
SchemaTableName tableName = temporaryTable("update_basic_partition_statistics");
try {
createDummyPartitionedTable(tableName, STATISTICS_PARTITIONED_TABLE_COLUMNS);
testUpdatePartitionStatistics(
tableName,
PartitionStatistics.empty(),
ImmutableList.of(BASIC_STATISTICS_1, BASIC_STATISTICS_2),
ImmutableList.of(BASIC_STATISTICS_2, BASIC_STATISTICS_1));
}
finally {
dropTable(tableName);
}
}

@Override
public void testUpdatePartitionColumnStatistics()
throws Exception
{
SchemaTableName tableName = temporaryTable("update_partition_column_statistics");
try {
createDummyPartitionedTable(tableName, STATISTICS_PARTITIONED_TABLE_COLUMNS);
// When the table has partitions, but row count statistics are set to zero, we treat this case as empty
// statistics to avoid underestimation in the CBO. This scenario may be caused when other engines are
// used to ingest data into partitioned hive tables.
testUpdatePartitionStatistics(
tableName,
PartitionStatistics.empty(),
ImmutableList.of(STATISTICS_1_1, STATISTICS_1_2, STATISTICS_2),
ImmutableList.of(STATISTICS_1_2, STATISTICS_1_1, STATISTICS_2));
}
finally {
dropTable(tableName);
}
}

@Override
public void testStorePartitionWithStatistics()
throws Exception
{
testStorePartitionWithStatistics(STATISTICS_PARTITIONED_TABLE_COLUMNS, BASIC_STATISTICS_1, BASIC_STATISTICS_2, BASIC_STATISTICS_1, ZERO_TABLE_STATISTICS);
// When the table has partitions, but row count statistics are set to zero, we treat this case as empty
// statistics to avoid underestimation in the CBO. This scenario may be caused when other engines are
// used to ingest data into partitioned hive tables.
Comment on lines 343 to 345
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test override is in Glue tests, but comment wording doesn't sound Glue-specific.
Please explain.

Copy link
Copy Markdown
Member Author

@Dith3r Dith3r Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR which introduced this regression was about altering hive metastore response for partitioned tables when metastore returned zero row count and is consistent across all impacted implementations (excluding in memory, file). Provided link to PR in the description.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see similar overrides in other tests added in #17677.

shouldn't we rather change default test behavior to be what-we-want-most, and have the overrides for metastores that lag behind?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per offline discussion. override is needed only for those (broken) metastores that incorrectly return rowCount=0 when table/partition has not been analyzed.

testStorePartitionWithStatistics(STATISTICS_PARTITIONED_TABLE_COLUMNS, BASIC_STATISTICS_1, BASIC_STATISTICS_2, BASIC_STATISTICS_1, PartitionStatistics.empty());
}

@Override
Expand Down