Re-commit iceberg statistics and table handle revert #22666
ZacBlanco merged 3 commits into prestodb:master
This makes working with estimates friendlier and more ergonomic. The new methods mimic the style of the methods on java.util.Optional.
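As a rough illustration of what Optional-style methods on an estimate can look like, here is a minimal sketch. `SketchEstimate` and its `map` and `orElse` methods are hypothetical names chosen for this example; they are not taken from the PR and may not match the methods actually added to Presto.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a statistics estimate; NOT Presto's actual class.
// An unknown estimate is represented internally by NaN.
final class SketchEstimate {
    private static final SketchEstimate UNKNOWN = new SketchEstimate(Double.NaN);
    private final double value;

    private SketchEstimate(double value) {
        this.value = value;
    }

    static SketchEstimate of(double value) {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("value is NaN");
        }
        return new SketchEstimate(value);
    }

    static SketchEstimate unknown() {
        return UNKNOWN;
    }

    boolean isUnknown() {
        return Double.isNaN(value);
    }

    // Like Optional.map: transform a known value, pass an unknown one through.
    // If fn returns NaN, the result is treated as unknown.
    SketchEstimate map(DoubleUnaryOperator fn) {
        return isUnknown() ? this : new SketchEstimate(fn.applyAsDouble(value));
    }

    // Like Optional.orElse: fall back to a default when the value is unknown.
    double orElse(double fallback) {
        return isUnknown() ? fallback : value;
    }

    public static void main(String[] args) {
        System.out.println(SketchEstimate.of(2.0).map(x -> x * 3).orElse(-1.0));
        System.out.println(SketchEstimate.unknown().map(x -> x * 3).orElse(-1.0));
    }
}
```

The point of this style is that callers can chain transformations without an explicit `isUnknown()` check at every step.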
Previously, the data size statistic was computed from the data size field in the Iceberg data manifests. This value is misleading for Presto because it represents the compressed on-disk size. This change allows ANALYZE to read and write data size statistic values to Puffin files. It also updates the hive-statistics-merge-strategy config value in the Iceberg connector to accept a comma-separated list of valid values to override from the HMS instead of using an independent enum. This allows a wider variety of combinations with less code.
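A sketch of how a comma-separated config value can be turned into a set of flags instead of a single enum choice is shown below. The `StatKind` enum and its constants are placeholders invented for this example, and `MergeFlagsSketch.parse` is a hypothetical helper; the actual accepted values and parsing code in the Iceberg connector may differ.

```java
import java.util.EnumSet;
import java.util.Locale;

final class MergeFlagsSketch {
    // Hypothetical statistic kinds; the names are illustrative only.
    enum StatKind { NUMBER_OF_DISTINCT_VALUES, TOTAL_SIZE_IN_BYTES, NUMBER_OF_NULLS }

    // Parse a comma-separated list such as "A, b" into a set of enum values.
    // Blank tokens are skipped; an unrecognized token makes valueOf throw
    // IllegalArgumentException.
    static EnumSet<StatKind> parse(String raw) {
        EnumSet<StatKind> result = EnumSet.noneOf(StatKind.class);
        if (raw == null) {
            return result;
        }
        for (String token : raw.split(",")) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            result.add(StatKind.valueOf(trimmed.toUpperCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("NUMBER_OF_DISTINCT_VALUES, total_size_in_bytes"));
    }
}
```

With a set of flags, any combination of statistics can be selected without adding a new enum constant for each combination.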
The recently added `iceberg.pushdown_filter_enabled` flag changed how the Iceberg connector metadata generates table layouts. The previous fork in the logic caused layouts to be generated without a partition predicate constraint even when one existed. This can lead to incorrect table statistics, and hence poor query planning, when filtering on partitioned tables or when the config is disabled. The original logic made the codepath responsible for filling in the predicate when filter pushdown was disabled unreachable. With this change, predicates on partition columns now show up in the layout.
Is the root cause of the revert #22661 identified?
@jaystarshot Yes, it's identified and described in #22664. I'm looking to add some additional tests to verify that the change doesn't have other adverse effects when histograms are used.
TableMetadata tableMetadata = metadata.getTableMetadata(session, tableHandle);
List<ColumnHandle> nonHiddenColumns = ImmutableList.copyOf(tableMetadata.getColumns().stream().filter(column -> !column.isHidden())
        .map(ColumnMetadata::getName)
        .map(columnHandles::get)
        .filter(Objects::nonNull)
        .collect(Collectors.toList()));
TableStatistics tableStatistics = metadata.getTableStatistics(session, tableHandle, nonHiddenColumns, constraint);
I see that this part is originally from #22327.
Can you explain what this change is and how it affects non-Iceberg tables?
This prevents hidden columns from being queried when a user executes SHOW STATS. The HiveMetadata already implicitly filters out hidden columns when returning from getTableStatistics without a constraint. The line that renders the table also filters out the hidden columns.
presto/presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java, lines 845 to 848 in 80ac016
steveburnett left a comment
LGTM! (docs)
Pulled the branch and built the docs locally; everything looks good. Thanks!
Description
This PR adds a separate commit with the prerequisite changes that were required when #22661 was merged. It re-adds the two additional commits so that they don't depend on the histogram PR.
Motivation and Context
Re-commit some changes that were reverted in #22661
Impact
N/A
Test Plan
N/A
Contributor checklist
Release Notes