Skip to content

[Iceberg] Add predicate in layout without pushdown_filter_enabled#22395

Merged
ZacBlanco merged 1 commit intoprestodb:masterfrom
ZacBlanco:upstream-iceberg-stat-select-inconsistency
May 1, 2024
Merged

[Iceberg] Add predicate in layout without pushdown_filter_enabled#22395
ZacBlanco merged 1 commit intoprestodb:masterfrom
ZacBlanco:upstream-iceberg-stat-select-inconsistency

Conversation

@ZacBlanco
Copy link
Contributor

@ZacBlanco ZacBlanco commented Apr 1, 2024

Description

The iceberg.pushdown_filter_enabled flag added recently
changed how the iceberg connector metadata generates
table layouts. The previous fork in the logic would cause
layouts to be generated without a partition predicate constraint
even if it existed. This can lead to table stats being incorrect
and hence incorrect statistics and poor query planning when
filtering on partitioned tables or when the config is disabled.

The original logic led to the codepath responsible for
filling in the predicate for tables when filter pushdown was disabled
to be unreachable. This change will now allows the predicates to
show up on partition columns in the layout.

Motivation and Context

Incorrect stats for planning.

Fixes #22357

Impact

N/A

Test Plan

  • New test to ensure layouts are pushed into the connector table layout which is propagated to the stats collection code.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-stat-select-inconsistency branch 10 times, most recently from f5d4180 to 407a56a Compare April 3, 2024 15:36
@ZacBlanco ZacBlanco changed the title [Iceberg] Constraints in layouts for Java clusters [Iceberg] Add predicate in layout without pushdown_filter_enabled Apr 3, 2024
@ZacBlanco ZacBlanco marked this pull request as ready for review April 3, 2024 17:28
@ZacBlanco ZacBlanco requested review from a team and hantangwangd as code owners April 3, 2024 17:28
hantangwangd
hantangwangd previously approved these changes Apr 6, 2024
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only one nit.

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-stat-select-inconsistency branch from 407a56a to 5bf3f7c Compare April 8, 2024 16:07
hantangwangd
hantangwangd previously approved these changes Apr 8, 2024
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-stat-select-inconsistency branch from 5bf3f7c to ca1c9f5 Compare April 25, 2024 15:26
@tdcmeehan tdcmeehan self-assigned this Apr 25, 2024
The `iceberg.pushdown_filter_enabled` flag added recently
changed how the iceberg connector metadata generates
table layouts. The previous fork in the logic would cause
layouts to be generated without a partition predicate constraint
even if it existed. This can lead to table stats being incorrect
and hence incorrect statistics and poor query planning when
filtering on partitioned tables or when the config is disabled.

The original logic led to the codepath which was responsible for
filling in the predicate for tables when filter pushdown was disabled
to be unreachable. This change will now allows the predicates to
show up on partition columns in the layout.
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-stat-select-inconsistency branch from ca1c9f5 to 4053a18 Compare April 25, 2024 16:38
@ZacBlanco ZacBlanco merged commit b2d4532 into prestodb:master May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Iceberg] Distinct value statistics are wrong when filters are applied

3 participants