feat(optimizer): Support predicate stitching in MaterializedViewRewrite#26728
feat(optimizer): Support predicate stitching in MaterializedViewRewrite#26728tdcmeehan merged 9 commits intoprestodb:masterfrom
Conversation
There was a problem hiding this comment.
Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters
steveburnett
left a comment
There was a problem hiding this comment.
Nice work on the documentation! A few nits and suggestions, nothing major.
fae20f5 to
188b139
Compare
dc5813d to
b8e71ad
Compare
There was a problem hiding this comment.
Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters
57a3df9 to
cb620c1
Compare
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
Outdated
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
5f10aec to
5e519da
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
@tdcmeehan thanks for implementing this massive in scope but incredibly sophisticated feature. I've just finished going through the entire PR — overall looks great to me! Just had a few more small questions/comments on some specific parts.
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Outdated
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Outdated
Show resolved
Hide resolved
8edc873 to
4c87f53
Compare
4c87f53 to
248d61c
Compare
|
Thank you for your thorough review @hantangwangd. The CI is green now, would you please review again when you can? |
There was a problem hiding this comment.
@tdcmeehan thanks for the fix. Just a few final nits and minor issues, otherwise looks great to me!
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
...om/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
Outdated
Show resolved
Hide resolved
hantangwangd
left a comment
There was a problem hiding this comment.
@tdcmeehan thanks for the fix. Just a few final nits and minor issues, otherwise looks great to me!
a599f00 to
477dce7
Compare
477dce7 to
7d28cca
Compare
|
Thank you @hantangwangd, I've updated the PR. |
hantangwangd
left a comment
There was a problem hiding this comment.
LGTM! Thanks for this great feature.
aaneja
left a comment
There was a problem hiding this comment.
LGTM. Nothing major - just some ideas for future work + questions
| if (!status.isFullyMaterialized() && !status.getPartitionsFromBaseTables().isEmpty()) { | ||
| Map<SchemaTableName, MaterializedDataPredicates> constraints = status.getPartitionsFromBaseTables(); | ||
|
|
||
| if (shouldStitch && canUseDataTableWithSecurityChecks(node, metadataResolver, session, definition, context)) { |
| AND o.order_date >= '2024-01-01' -- Original filter preserved | ||
| ) | ||
|
|
||
| The partition predicate is propagated to equivalent columns in joined tables (in this case, |
There was a problem hiding this comment.
Is this done by hand ? The predicate pushdown rule would've done this too once the filter on the data query is established
| * <p>Only columns marked as {@code isDirectMapped=true} are included in equivalence classes. | ||
| * This is a safety constraint: passthrough columns contain exactly the same data | ||
| * as the base table columns, allowing predicates to be safely translated between them. | ||
| * | ||
| * <p>Columns that are transformed (e.g., {@code COALESCE(dt, '2024-01-01')}) are NOT | ||
| * included because predicate translation through transformations could produce incorrect | ||
| * results. For example, a base table row with {@code dt=NULL} would not match | ||
| * {@code dt='2024-01-01'}, but the MV row would match after the COALESCE transformation. |
There was a problem hiding this comment.
Yet another way to implement this could be to build the logical properties for the dataTablePlan and the viewQueryPlan plans. Then by comparing their equivalence properties we can infer if a predicate on the view plan can translate to a predicate on the data table plan
It removes the need to track if a MV column is isDirectMapped
| lessThan(VARCHAR, utf8Slice("2024-01-02")), | ||
| greaterThan(VARCHAR, utf8Slice("2024-01-02")))), false), |
There was a problem hiding this comment.
I think we have a bug in the Util#domainsMatch method - if one was to accidentally make this a Range lessThan(VARCHAR, utf8Slice("2025-01-02")), we would get an infinite range and the contains check in domainsMatch would let it pass through
Should we implement an exact matcher for the MV table scan constraints instead since it is key to this test ?
| assertUpdate("CREATE MATERIALIZED VIEW mv_agg_join " + | ||
| "WITH (partitioning = ARRAY['order_date', 'reg_date']) AS " + | ||
| "SELECT c.name, o.order_date, c.reg_date, SUM(o.amount) as total_amount, COUNT(*) as order_count " + | ||
| "FROM agg_orders o JOIN agg_customers c ON o.customer_id = c.customer_id " + | ||
| "GROUP BY c.name, o.order_date, c.reg_date"); |
There was a problem hiding this comment.
How about a test that proves that no stitching is possible when -
- The GROUP BY for the query on the MV is for a non-partitioning column
- The MV query is itself an aggregation but on a non-partitioning column
Note to reviewers: while this PR is large, a substantial portion of this PR is in test coverage. Please focus the review on
MaterializedViewRewrite,DifferentialPlanRewriterandPassthroughColumnEquivalences, which are more manageable (~`1500 LOCs).Description
Implement predicate stitching for materialized views in MaterializedViewRewrite. When a materialized view is partially stale, the optimizer can now generate a UNION query that reads fresh data from storage and recomputes only the stale portions from base tables.
Depends on: #26764
Motivation and Context
Fixes #26756
Large materialized views are expensive to fully recompute. When only some base table partitions have changed since the last refresh, this change enables the optimizer to selectively recompute only the stale data rather than either serving stale results or reprocessing terabytes of unchanged data.
Impact
USE_STITCHINGvalue formaterialized_view_stale_read_behaviorsession property (for default behavior when no table property value is present)materialized_view_staleness_window,materialized_view_force_staleiceberg.materialized-view-max-changed-partitions(default: 100)Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.