Skip to content

feat(optimizer): Support predicate stitching in MaterializedViewRewrite#26728

Merged
tdcmeehan merged 9 commits intoprestodb:masterfrom
tdcmeehan:mv-iceberg-disjuncts
Mar 10, 2026
Merged

feat(optimizer): Support predicate stitching in MaterializedViewRewrite#26728
tdcmeehan merged 9 commits intoprestodb:masterfrom
tdcmeehan:mv-iceberg-disjuncts

Conversation

@tdcmeehan
Copy link
Contributor

@tdcmeehan tdcmeehan commented Dec 2, 2025

Note to reviewers: while this PR is large, a substantial portion of this PR is in test coverage. Please focus the review on MaterializedViewRewrite, DifferentialPlanRewriter and PassthroughColumnEquivalences, which are more manageable (~`1500 LOCs).

Description

Implement predicate stitching for materialized views in MaterializedViewRewrite. When a materialized view is partially stale, the optimizer can now generate a UNION query that reads fresh data from storage and recomputes only the stale portions from base tables.

Depends on: #26764

Motivation and Context

Fixes #26756

Large materialized views are expensive to fully recompute. When only some base table partitions have changed since the last refresh, this change enables the optimizer to selectively recompute only the stale data rather than either serving stale results or reprocessing terabytes of unchanged data.

Impact

  • New USE_STITCHING value for materialized_view_stale_read_behavior session property (for default behavior when no table property value is present)
  • New session properties: materialized_view_staleness_window, materialized_view_force_stale
  • New Iceberg config: iceberg.materialized-view-max-changed-partitions (default: 100)
  • Iceberg connector now tracks changed partitions for staleness detection

Test Plan

  • Added unit tests and integration tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add ``USE_STITCHING`` mode for ``materialized_view_stale_read_behavior`` session property to selectively recompute stale data instead of full recomputation.
* Add ``materialized_view_staleness_window`` session property to configure acceptable staleness duration.
* Add ``materialized_view_force_stale`` session property for testing stale read behavior.

Iceberg Connector Changes
* Add ``iceberg.materialized-view-max-changed-partitions`` config property (default: 100) to limit partition tracking for predicate stitching.
* Add support for tracking changed partitions in materialized views to enable predicate stitching optimization.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 2, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on the documentation! A few nits and suggestions, nothing major.

@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 9 times, most recently from fae20f5 to 188b139 Compare December 10, 2025 21:12
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 2 times, most recently from dc5813d to b8e71ad Compare December 12, 2025 18:19
@tdcmeehan tdcmeehan requested a review from aaneja December 12, 2025 18:28
@tdcmeehan tdcmeehan marked this pull request as ready for review December 12, 2025 18:28
@prestodb-ci prestodb-ci requested review from a team and NivinCS and removed request for a team December 12, 2025 18:28
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters

@tdcmeehan tdcmeehan marked this pull request as draft December 13, 2025 04:01
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 3 times, most recently from 57a3df9 to cb620c1 Compare February 17, 2026 18:33
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch from 5f10aec to 5e519da Compare February 23, 2026 15:06
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan thanks for implementing this massive in scope but incredibly sophisticated feature. I've just finished going through the entire PR — overall looks great to me! Just had a few more small questions/comments on some specific parts.

@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 3 times, most recently from 8edc873 to 4c87f53 Compare March 5, 2026 02:28
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch from 4c87f53 to 248d61c Compare March 7, 2026 11:58
@tdcmeehan
Copy link
Contributor Author

Thank you for your thorough review @hantangwangd. The CI is green now, would you please review again when you can?

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan thanks for the fix. Just a few final nits and minor issues, otherwise looks great to me!

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan thanks for the fix. Just a few final nits and minor issues, otherwise looks great to me!

@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 2 times, most recently from a599f00 to 477dce7 Compare March 9, 2026 10:03
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch from 477dce7 to 7d28cca Compare March 9, 2026 15:48
@tdcmeehan tdcmeehan requested a review from a team as a code owner March 9, 2026 15:48
@tdcmeehan
Copy link
Contributor Author

Thank you @hantangwangd, I've updated the PR.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this great feature.

@tdcmeehan tdcmeehan merged commit 2a97656 into prestodb:master Mar 10, 2026
114 of 115 checks passed
Copy link
Contributor

@aaneja aaneja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nothing major - just some ideas for future work + questions

Comment on lines +129 to +132
if (!status.isFullyMaterialized() && !status.getPartitionsFromBaseTables().isEmpty()) {
Map<SchemaTableName, MaterializedDataPredicates> constraints = status.getPartitionsFromBaseTables();

if (shouldStitch && canUseDataTableWithSecurityChecks(node, metadataResolver, session, definition, context)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : if's can be combined

AND o.order_date >= '2024-01-01' -- Original filter preserved
)

The partition predicate is propagated to equivalent columns in joined tables (in this case,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this done by hand ? The predicate pushdown rule would've done this too once the filter on the data query is established

Comment on lines +41 to +48
* <p>Only columns marked as {@code isDirectMapped=true} are included in equivalence classes.
* This is a safety constraint: passthrough columns contain exactly the same data
* as the base table columns, allowing predicates to be safely translated between them.
*
* <p>Columns that are transformed (e.g., {@code COALESCE(dt, '2024-01-01')}) are NOT
* included because predicate translation through transformations could produce incorrect
* results. For example, a base table row with {@code dt=NULL} would not match
* {@code dt='2024-01-01'}, but the MV row would match after the COALESCE transformation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another way to implement this could be to build the logical properties for the dataTablePlan and the viewQueryPlan plans. Then by comparing their equivalence properties we can infer if a predicate on the view plan can translate to a predicate on the data table plan

It removes the need to track if a MV column is isDirectMapped

Comment on lines +1159 to +1160
lessThan(VARCHAR, utf8Slice("2024-01-02")),
greaterThan(VARCHAR, utf8Slice("2024-01-02")))), false),
Copy link
Contributor

@aaneja aaneja Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a bug in the Util#domainsMatch method - if one was to accidentally make this a Range lessThan(VARCHAR, utf8Slice("2025-01-02")), we would get an infinite range and the contains check in domainsMatch would let it pass through

Should we implement an exact matcher for the MV table scan constraints instead since it is key to this test ?

Comment on lines +1135 to +1139
assertUpdate("CREATE MATERIALIZED VIEW mv_agg_join " +
"WITH (partitioning = ARRAY['order_date', 'reg_date']) AS " +
"SELECT c.name, o.order_date, c.reg_date, SUM(o.amount) as total_amount, COUNT(*) as order_count " +
"FROM agg_orders o JOIN agg_customers c ON o.customer_id = c.customer_id " +
"GROUP BY c.name, o.order_date, c.reg_date");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a test that proves that no stitching is possible when -

  1. The GROUP BY for the query on the MV is for a non-partitioning column
  2. The MV query is itself an aggregation but on a non-partitioning column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disjunctive Predicate Stitching for Materialized Views and Partition Stitching for Iceberg

6 participants