feat: Cost-based MV candidate selection for query rewriting (#27222) #27222

Merged
ceekay47 merged 1 commit into prestodb:master from ceekay47:export-D92582456
Mar 27, 2026


@ceekay47 ceekay47 commented Feb 26, 2026

Summary:

Currently the MaterializedViewQueryOptimizer selects the first compatible materialized view for query rewriting. When multiple MVs exist for the same base table, this leads to suboptimal plan selection since the first MV found may not produce the cheapest execution plan.

This diff introduces cost-based MV selection by deferring the choice to the plan optimizer. Instead of committing to a single MV during AST rewriting, we collect all compatible candidates and let the cost-based optimizer compare their plans. The flow works as follows:

Step 1 — AST Rewriting (MaterializedViewQueryOptimizer)
When materialized_view_query_rewrite_cost_based_selection_enabled is true, the optimizer calls rewriteWithAllCandidates() instead of returning the first compatible MV. For each referenced MV on the base table, it attempts a rewrite and collects successful rewrites into QueryWithMVRewriteCandidates — a new QueryBody AST node that bundles the original QuerySpecification with all candidate MVRewriteCandidate entries (each containing the rewritten query and the MV's fully qualified name). MV data consistency checks are skipped at this stage since the optimizer will handle selection downstream.
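
As a rough illustration of Step 1, the sketch below tries a rewrite against every referenced MV and keeps each success instead of stopping at the first. The names `CandidateCollectionSketch`, `Candidate`, `QueryWithCandidates`, and the `tryRewrite` callback are hypothetical stand-ins, and queries are modeled as plain strings; none of this is the actual Presto code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical stand-ins for illustration only; these are not the Presto classes.
final class CandidateCollectionSketch
{
    // A rewritten query paired with the fully qualified name of the MV it reads from.
    record Candidate(String rewrittenQuery, String materializedViewName) {}

    // The original query bundled with every successful rewrite.
    record QueryWithCandidates(String originalQuery, List<Candidate> candidates) {}

    // tryRewrite(query, view) returns the rewritten query, or empty if the view is incompatible.
    static QueryWithCandidates collect(
            String originalQuery,
            List<String> referencedViews,
            BiFunction<String, String, Optional<String>> tryRewrite)
    {
        List<Candidate> candidates = new ArrayList<>();
        for (String view : referencedViews) {
            // Keep every compatible rewrite rather than returning on the first one.
            tryRewrite.apply(originalQuery, view)
                    .ifPresent(rewritten -> candidates.add(new Candidate(rewritten, view)));
        }
        return new QueryWithCandidates(originalQuery, List.copyOf(candidates));
    }
}
```

In this sketch an empty candidate list is simply returned inside the bundle; how the real optimizer represents the no-candidate case is not modeled here.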

Step 2 — Semantic Analysis (StatementAnalyzer)
A new visitQueryWithMVRewriteCandidates handler analyzes the original query to produce the output scope, then analyzes each candidate's rewritten query and stores its scope in Analysis.mvCandidateScopes. The original query's scope is returned as the node's output scope.

Step 3 — Logical Planning (RelationPlanner)
A new visitQueryWithMVRewriteCandidates handler plans the original query and each candidate query into separate RelationPlan trees. These are bundled into MVRewriteCandidatesNode — a new PlanNode in the SPI that holds the original plan, all candidate plans (each with its MV name metadata), and the output variables.

Step 4 — Cost-Based Optimization (SelectLowestCostMVRewrite)
A new IterativeOptimizer rule matches MVRewriteCandidatesNode. It uses the CostProvider to compute costs for the original plan and all candidate plans, then selects the lowest-cost option via CostComparator. Candidates with unknown cost components are skipped. If the selected plan's output variables differ from the expected outputs, a ProjectNode is added to align them. The rule records which plan was selected (original or which MV) for debugging via getStatsSource().
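
The selection logic in Step 4 can be sketched roughly as follows. This is a simplified model, not the rule itself: `LowestCostSelectionSketch` and `CostedPlan` are hypothetical names, and each plan's cost is collapsed into a single optional number, whereas the real rule compares `PlanCostEstimate` values through `CostComparator`. Keeping the original when its own cost is unknown is an assumption of this sketch; the description above only says that candidates with unknown costs are skipped.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical stand-ins for illustration only; these are not the Presto classes.
final class LowestCostSelectionSketch
{
    // A plan label ("original" or an MV name) with a scalar cost; empty means the cost is unknown.
    record CostedPlan(String label, OptionalDouble cost) {}

    static CostedPlan select(CostedPlan original, List<CostedPlan> candidates)
    {
        // Assumption of this sketch: with no known baseline cost, keep the original plan.
        if (original.cost().isEmpty()) {
            return original;
        }
        CostedPlan best = original;
        for (CostedPlan candidate : candidates) {
            if (candidate.cost().isEmpty()) {
                continue; // candidates with unknown cost are skipped
            }
            // Strict comparison: a candidate must be cheaper, so ties keep the current choice.
            if (candidate.cost().getAsDouble() < best.cost().getAsDouble()) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Because the comparison is strict, a candidate that merely ties with the original does not replace it, which matches the "original selection when cheaper/equal" case listed among the rule's unit tests.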

New types introduced:

  • QueryWithMVRewriteCandidates (presto-parser) — AST node bundling original query with MV rewrite candidates
  • MVRewriteCandidatesNode (presto-spi) — Plan node for cost-based MV selection
  • SelectLowestCostMVRewrite (presto-main-base) — Optimizer rule selecting lowest-cost plan
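
To make the shape of the new plan node more concrete, here is a minimal sketch of a node that holds an original plan plus a list of candidates and rebuilds itself from a replacement child list. Everything in it is hypothetical: plans are modeled as strings rather than `PlanNode`, the source ordering (original first, then candidates) is an assumption, and so is the `catalog.schema.name` format of the fully qualified name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; plans are modeled as strings, not PlanNode.
final class CandidatesNodeSketch
{
    record Candidate(String plan, String catalog, String schema, String name)
    {
        // Assumed format: catalog.schema.name
        String fullyQualifiedName()
        {
            return catalog + "." + schema + "." + name;
        }
    }

    record Node(String originalPlan, List<Candidate> candidates)
    {
        // Assumed ordering: the original plan first, then one source per candidate.
        List<String> sources()
        {
            List<String> result = new ArrayList<>();
            result.add(originalPlan);
            candidates.forEach(candidate -> result.add(candidate.plan()));
            return List.copyOf(result);
        }

        // Rebuilds the node from new children, rejecting a list of the wrong size.
        Node replaceChildren(List<String> newChildren)
        {
            int expected = candidates.size() + 1;
            if (newChildren.size() != expected) {
                throw new IllegalArgumentException(
                        "expected " + expected + " children, got " + newChildren.size());
            }
            List<Candidate> replaced = new ArrayList<>();
            for (int i = 0; i < candidates.size(); i++) {
                Candidate old = candidates.get(i);
                // Child i + 1 replaces candidate i; the MV name metadata is carried over.
                replaced.add(new Candidate(newChildren.get(i + 1), old.catalog(), old.schema(), old.name()));
            }
            return new Node(newChildren.get(0), List.copyOf(replaced));
        }
    }
}
```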

Gated behind session property materialized_view_query_rewrite_cost_based_selection_enabled (default: false).

Differential Revision: D92582456

== RELEASE NOTES ==

  General Changes
  * Add cost-based selection for materialized view query rewriting. When multiple
    materialized views exist for the same base table, the optimizer now evaluates
    all compatible rewrites and selects the lowest-cost plan. This can be enabled
    with the ``materialized_view_query_rewrite_cost_based_selection_enabled``
    session property.

ceekay47 requested review from a team, feilong-liu and jaystarshot as code owners February 26, 2026 21:59
prestodb-ci added the from:Meta ("PR from Meta") label Feb 26, 2026

sourcery-ai bot commented Feb 26, 2026

Reviewer's Guide

Implements cost-based selection among multiple materialized view rewrite candidates by threading a new AST node and plan node through analysis and planning, and adding an optimizer rule that compares costs of the original and MV-based plans, gated by a new session property.

Sequence diagram for cost-based MV rewrite selection flow

sequenceDiagram
    participant Client
    participant Session
    participant MaterializedViewQueryOptimizer
    participant StatementAnalyzer
    participant RelationPlanner
    participant IterativeOptimizer
    participant SelectLowestCostMVRewrite

    Client->>Session: submit_query
    Session->>MaterializedViewQueryOptimizer: optimize(QuerySpecification)
    MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: rewriteQuerySpecificationIfCompatible
    MaterializedViewQueryOptimizer->>Session: isMaterializedViewQueryRewriteCostBasedSelectionEnabled
    alt cost_based_enabled
        MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: rewriteWithAllCandidates
        MaterializedViewQueryOptimizer-->>StatementAnalyzer: QueryWithMVRewriteCandidates
    else disabled
        MaterializedViewQueryOptimizer-->>StatementAnalyzer: first_compatible_rewrite_or_original
        StatementAnalyzer-->>RelationPlanner: QuerySpecification
        RelationPlanner-->>IterativeOptimizer: logical_plan
        IterativeOptimizer-->>Client: optimized_plan
        Client-->>Client: return
    end

    StatementAnalyzer->>StatementAnalyzer: visitQueryWithMVRewriteCandidates
    StatementAnalyzer->>StatementAnalyzer: analyze originalQuery
    StatementAnalyzer->>StatementAnalyzer: analyze each MVRewriteCandidate.rewrittenQuery
    StatementAnalyzer-->>RelationPlanner: Scope for QueryWithMVRewriteCandidates

    RelationPlanner->>RelationPlanner: visitQueryWithMVRewriteCandidates
    RelationPlanner->>RelationPlanner: plan originalQuery -> RelationPlan originalPlan
    RelationPlanner->>RelationPlanner: plan each candidate.rewrittenQuery -> RelationPlan candidatePlan
    RelationPlanner-->>IterativeOptimizer: MVRewriteCandidatesNode

    IterativeOptimizer->>SelectLowestCostMVRewrite: apply on MVRewriteCandidatesNode
    SelectLowestCostMVRewrite->>Session: isMaterializedViewQueryRewriteCostBasedSelectionEnabled
    alt rule_enabled
        SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: compute costs via CostProvider
        SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: compare with CostComparator
        SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: pick lowest_cost_plan
        alt outputs_match
            SelectLowestCostMVRewrite-->>IterativeOptimizer: selected PlanNode
        else outputs_differ
            SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: build Assignments
            SelectLowestCostMVRewrite-->>IterativeOptimizer: ProjectNode(selected_plan)
        end
    else rule_disabled
        SelectLowestCostMVRewrite-->>IterativeOptimizer: no_change
    end

    IterativeOptimizer-->>Client: final_optimized_plan

Class diagram for new MV rewrite AST, plan node, and optimizer rule

classDiagram
    class MaterializedViewQueryOptimizer {
        +rewriteQuerySpecificationIfCompatible(querySpecification: QuerySpecification, baseTable: Table) Node
        -rewriteWithAllCandidates(originalQuery: QuerySpecification, referencedMaterializedViews: List~QualifiedObjectName~) QueryBody
    }

    class QueryBody

    class QuerySpecification

    class QueryWithMVRewriteCandidates {
        -QuerySpecification originalQuery
        -List~MVRewriteCandidate~ candidates
        +QueryWithMVRewriteCandidates(originalQuery: QuerySpecification, candidates: List~MVRewriteCandidate~)
        +QueryWithMVRewriteCandidates(location: NodeLocation, originalQuery: QuerySpecification, candidates: List~MVRewriteCandidate~)
        +getOriginalQuery() QuerySpecification
        +getCandidates() List~MVRewriteCandidate~
        +accept(visitor: AstVisitor, context: Object) Object
        +getChildren() List~Node~
    }

    class MVRewriteCandidate_Ast {
        <<static inner>>
        -QuerySpecification rewrittenQuery
        -String materializedViewCatalog
        -String materializedViewSchema
        -String materializedViewName
        +MVRewriteCandidate(rewrittenQuery: QuerySpecification, materializedViewCatalog: String, materializedViewSchema: String, materializedViewName: String)
        +getRewrittenQuery() QuerySpecification
        +getMaterializedViewCatalog() String
        +getMaterializedViewSchema() String
        +getMaterializedViewName() String
        +getFullyQualifiedName() String
    }

    class PlanNode {
        +getOutputVariables() List~VariableReferenceExpression~
    }

    class VariableReferenceExpression

    class MVRewriteCandidatesNode {
        -PlanNode originalPlan
        -List~MVRewriteCandidate_Plan~ candidates
        -List~VariableReferenceExpression~ outputVariables
        +MVRewriteCandidatesNode(sourceLocation: Optional~SourceLocation~, id: PlanNodeId, originalPlan: PlanNode, candidates: List~MVRewriteCandidate_Plan~, outputVariables: List~VariableReferenceExpression~)
        +MVRewriteCandidatesNode(sourceLocation: Optional~SourceLocation~, id: PlanNodeId, statsEquivalentPlanNode: Optional~PlanNode~, originalPlan: PlanNode, candidates: List~MVRewriteCandidate_Plan~, outputVariables: List~VariableReferenceExpression~)
        +getOriginalPlan() PlanNode
        +getCandidates() List~MVRewriteCandidate_Plan~
        +getOutputVariables() List~VariableReferenceExpression~
        +getSources() List~PlanNode~
        +accept(visitor: PlanVisitor, context: Object) Object
        +assignStatsEquivalentPlanNode(statsEquivalentPlanNode: Optional~PlanNode~) PlanNode
        +replaceChildren(newChildren: List~PlanNode~) PlanNode
    }

    class MVRewriteCandidate_Plan {
        <<static inner>>
        -PlanNode plan
        -String materializedViewCatalog
        -String materializedViewSchema
        -String materializedViewName
        +MVRewriteCandidate(plan: PlanNode, materializedViewCatalog: String, materializedViewSchema: String, materializedViewName: String)
        +getPlan() PlanNode
        +getMaterializedViewCatalog() String
        +getMaterializedViewSchema() String
        +getMaterializedViewName() String
        +getFullyQualifiedName() String
    }

    class Rule~MVRewriteCandidatesNode~

    class SelectLowestCostMVRewrite {
        -CostComparator costComparator
        +SelectLowestCostMVRewrite(costComparator: CostComparator)
        +getPattern() Pattern~MVRewriteCandidatesNode~
        +isEnabled(session: Session) boolean
        +isCostBased(session: Session) boolean
        +apply(node: MVRewriteCandidatesNode, captures: Captures, context: Context) Result
    }

    class CostComparator {
        +compare(session: Session, left: PlanCostEstimate, right: PlanCostEstimate) int
    }

    class CostProvider {
        +getCost(node: PlanNode) PlanCostEstimate
    }

    class ProjectNode {
        +ProjectNode(id: PlanNodeId, source: PlanNode, assignments: Assignments)
    }

    class Assignments {
        +builder() Assignments.Builder
    }

    MaterializedViewQueryOptimizer --> QueryWithMVRewriteCandidates : creates
    QueryWithMVRewriteCandidates --* MVRewriteCandidate_Ast : contains
    MVRewriteCandidatesNode --* MVRewriteCandidate_Plan : contains
    MVRewriteCandidatesNode ..|> PlanNode
    QueryWithMVRewriteCandidates ..|> QueryBody
    SelectLowestCostMVRewrite ..|> Rule~MVRewriteCandidatesNode~
    SelectLowestCostMVRewrite --> MVRewriteCandidatesNode : uses
    SelectLowestCostMVRewrite --> CostProvider : uses
    SelectLowestCostMVRewrite --> CostComparator : uses
    SelectLowestCostMVRewrite --> ProjectNode : may_wrap_selected_plan

File-Level Changes

Change: Collect all compatible MV rewrites in the analyzer as a new QueryBody instead of committing to the first compatible MV.
Details:
  • Change rewriteQuerySpecificationIfCompatible to optionally return a QueryWithMVRewriteCandidates node when cost-based MV selection is enabled.
  • Introduce rewriteWithAllCandidates to build MVRewriteCandidate entries for each successfully rewritten materialized view and fall back to the original query when none apply.
  • Skip materialized view data consistency checks when cost-based selection is enabled so all candidates can be considered downstream.
  • Minor cleanup of outdated comments in MaterializedViewQueryOptimizer.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/QueryWithMVRewriteCandidates.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java

Change: Thread the new QueryWithMVRewriteCandidates node through semantic analysis and logical planning into a new MVRewriteCandidatesNode plan node.
Details:
  • Add StatementAnalyzer.visitQueryWithMVRewriteCandidates to analyze the original query for output scope and analyze each candidate for planning scopes, storing scopes in Analysis.
  • Add RelationPlanner.visitQueryWithMVRewriteCandidates to build RelationPlans for the original query and each candidate, wrapping them in an MVRewriteCandidatesNode with the original field mappings as output variables.
  • Extend PlanVisitor/AstVisitor patterns and sanity checks (ValidateDependenciesChecker) to understand MVRewriteCandidatesNode, validate candidate outputs, and ensure node outputs are consistent with the original plan.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/sanity/ValidateDependenciesChecker.java
  • presto-spi/src/main/java/com/facebook/presto/spi/plan/MVRewriteCandidatesNode.java
  • presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanVisitor.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/Patterns.java

Change: Introduce a cost-based optimizer rule to choose the lowest-cost plan among the original and MV candidates, adding projections when necessary.
Details:
  • Add SelectLowestCostMVRewrite iterative rule that matches MVRewriteCandidatesNode, uses CostProvider and CostComparator to pick the cheapest plan (skipping unknown-cost candidates), and falls back to the original when appropriate.
  • Ensure output variables match the original by inserting a ProjectNode when the selected candidate’s outputs differ, with a position-based mapping after validating sizes.
  • Wire the rule into PlanOptimizers so it runs in the cost-based phase, and gate it on the new session property.
  • Add unit tests for the rule to cover cheaper candidate selection, original selection when cheaper/equal, handling unknown costs, multiple candidates, output-variable projection, and mixed known/unknown stats.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SelectLowestCostMVRewrite.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java

Change: Define and test the new MVRewriteCandidatesNode plan node for holding original and candidate MV plans.
Details:
  • Implement MVRewriteCandidatesNode with original plan, candidate list, and explicit output variables, including JSON annotations for serialization.
  • Implement getSources, replaceChildren with child-count validation, statsEquivalent propagation, and a nested MVRewriteCandidate carrying MV catalog/schema/name and fully qualified name helper.
  • Add focused unit tests to validate getSources ordering, outputVariables, replaceChildren behavior and error case, candidate metadata accessors, and stats-equivalent propagation.
Files:
  • presto-spi/src/main/java/com/facebook/presto/spi/plan/MVRewriteCandidatesNode.java
  • presto-spi/src/test/java/com/facebook/presto/spi/plan/TestMVRewriteCandidatesNode.java

Change: Expose and gate cost-based MV rewrite selection via configuration and session properties.
Details:
  • Add materializedViewQueryRewriteCostBasedSelectionEnabled to FeaturesConfig with config key materialized-view-query-rewrite-cost-based-selection-enabled.
  • Add SystemSessionProperties constant, metadata, and accessor isMaterializedViewQueryRewriteCostBasedSelectionEnabled, defaulting to false.
  • Use the new session property in MaterializedViewQueryOptimizer and SelectLowestCostMVRewrite.isEnabled to toggle the new behavior.
  • Allow MV rewrite to proceed without data-consistency checks when cost-based selection is enabled so candidates reach the optimizer.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
  • presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SelectLowestCostMVRewrite.java
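
The position-based output mapping mentioned for the selection rule can be pictured with the small sketch below. It is a simplified model under stated assumptions: `OutputAlignmentSketch` and `align` are hypothetical names, variables are plain strings, and an empty map stands for "no projection needed". The real rule builds `Assignments` and wraps the selected plan in a `ProjectNode`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Presto implementation.
final class OutputAlignmentSketch
{
    // Maps each expected output to the selected plan's output at the same position.
    // Returns an empty map when the outputs already match and no projection is needed.
    static Map<String, String> align(List<String> expectedOutputs, List<String> selectedOutputs)
    {
        if (expectedOutputs.equals(selectedOutputs)) {
            return Map.of();
        }
        // Sizes are validated before any position-based mapping is attempted.
        if (expectedOutputs.size() != selectedOutputs.size()) {
            throw new IllegalArgumentException(
                    "output count mismatch: expected " + expectedOutputs.size()
                            + ", got " + selectedOutputs.size());
        }
        Map<String, String> assignments = new LinkedHashMap<>();
        for (int i = 0; i < expectedOutputs.size(); i++) {
            assignments.put(expectedOutputs.get(i), selectedOutputs.get(i));
        }
        return assignments;
    }
}
```

A position-based mapping only makes sense if every candidate produces its columns in the same order as the original query, which is why the size check comes first in this sketch.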

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java" line_range="49" />
<code_context>
+ * getStats(node.getSource()), not the filter's own stats.
+ */
+@Test(singleThreaded = true)
+public class TestSelectLowestCostMVRewrite
+{
+    private static final CostComparator COST_COMPARATOR = new CostComparator(1, 1, 1);
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that verifies the rule is disabled when the session property is false

Currently all tests cover only the cost-based behavior with `materialized_view_query_rewrite_cost_based_selection_enabled` set to `true` in `RuleTester`. Please add a test that runs `SelectLowestCostMVRewrite` with this property disabled (or not set), and asserts that an `MVRewriteCandidatesNode` is not transformed. This will verify that `isEnabled(session)` correctly gates the rule when the feature is turned off.

Suggested implementation:

```java
    @BeforeClass
    public void setUp()
    {
        tester = new RuleTester(ImmutableList.of(), ImmutableMap.of(
                "materialized_view_query_rewrite_cost_based_selection_enabled", "true"), Optional.of(NODES_COUNT));
    }

    @Test
    public void testRuleDisabledWhenSessionPropertyFalse()
    {
        RuleTester disabledTester = new RuleTester(
                ImmutableList.of(),
                ImmutableMap.of("materialized_view_query_rewrite_cost_based_selection_enabled", "false"),
                Optional.of(NODES_COUNT));

        disabledTester.assertThat(new SelectLowestCostMVRewrite(COST_COMPARATOR))
                .on(this::buildPlanWithMvCandidates)
                .doesNotFire();
    }

```

To make this compile and to ensure the test actually verifies the behavior you described, you should:

1. Reuse the same plan shape you already use for the positive/“cost-based” tests:
   - Identify the method or lambda that currently builds a plan containing an `MVRewriteCandidatesNode` which is transformed by `SelectLowestCostMVRewrite` when the rule is enabled.
   - Extract that plan construction logic into a helper method with the following signature in this test class:
     ```java
     private PlanNode buildPlanWithMvCandidates(PlanBuilder planBuilder)
     ```
   - Move the existing plan-building code into this helper, and call it both from the existing “rule fires” test(s) and from `testRuleDisabledWhenSessionPropertyFalse`.

2. If you don’t currently have a helper, create it by lifting the body of the `.on(planBuilder -> { ... })` lambda from your main cost-based test, so that:
   - The cost-based test still asserts `.matches(...)` or equivalent.
   - This new `testRuleDisabledWhenSessionPropertyFalse` uses the exact same plan via `.on(this::buildPlanWithMvCandidates)` but asserts `.doesNotFire()` when the session property is set to `"false"`.

This way, `testRuleDisabledWhenSessionPropertyFalse` will explicitly confirm that `SelectLowestCostMVRewrite.isEnabled(session)` prevents the rule from firing when `materialized_view_query_rewrite_cost_based_selection_enabled` is turned off, while sharing the same MV rewrite candidate plan as the enabled tests.
</issue_to_address>

### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java" line_range="261" />
<code_context>
+    }
+
+    @Test
+    public void testAddsProjectionForDifferentOutputVariables()
+    {
+        PlanNode result = tester.assertThat(new SelectLowestCostMVRewrite(COST_COMPARATOR))
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen projection test by asserting the underlying chosen source as well as the projected outputs

In `testAddsProjectionForDifferentOutputVariables`, the assertions only confirm that a `ProjectNode` is added and that the output variable is re-aliased to `col`. To also validate that the correct MV candidate is chosen, consider asserting that:

- `((ProjectNode) result).getSource()` is a `FilterNode`, and
- The filter’s source `PlanNodeId` equals `"mv1Src"`.

This will ensure the test covers both candidate selection and projection behavior.

Suggested implementation:

```java
        assertTrue(result instanceof ProjectNode);
        ProjectNode projectNode = (ProjectNode) result;
        assertTrue(projectNode.getSource() instanceof FilterNode);
        FilterNode filterNode = (FilterNode) projectNode.getSource();
        assertEquals(filterNode.getSource().getId(), new PlanNodeId("mv1Src"));

```

1. Ensure `ProjectNode` and `FilterNode` are imported at the top of the file if they are not already:
   - `import com.facebook.presto.spi.plan.ProjectNode;`
   - `import com.facebook.presto.spi.plan.FilterNode;`
2. If the assertion line in this test is not exactly `assertTrue(result instanceof ProjectNode);`, adjust the SEARCH text to match the existing assertion on `result` being a `ProjectNode` and apply the same REPLACE block.
3. Keep any existing assertions about the projected outputs (e.g., aliasing to `col`) after this new block so the test still validates both the projection and the chosen MV candidate.
</issue_to_address>


ceekay47 changed the title from "[presto] Cost-based MV candidate selection for query rewriting" to "feat: Cost-based MV candidate selection for query rewriting" Feb 27, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
…odb#27222)

Summary:
Pull Request resolved: prestodb#27222

ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
…odb#27222)

@ceekay47 ceekay47 force-pushed the export-D92582456 branch 2 times, most recently from 37ff330 to 690f15c February 27, 2026 23:45
ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
…odb#27222)

ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
…odb#27222)

Summary:
Pull Request resolved: prestodb#27222

linux-foundation-easycla bot commented Feb 27, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ceekay47 / name: Chandrakant Vankayalapati (36d7806)

ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
…odb#27222)

@meta-codesync meta-codesync bot changed the title feat: Cost-based MV candidate selection for query rewriting [presto] Cost-based MV candidate selection for query rewriting (#27222) Mar 17, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 17, 2026
…odb#27222)

@ceekay47 ceekay47 changed the title [presto] Cost-based MV candidate selection for query rewriting (#27222) feat: Cost-based MV candidate selection for query rewriting (#27222) Mar 17, 2026
@meta-codesync meta-codesync bot changed the title feat: Cost-based MV candidate selection for query rewriting (#27222) [presto] Cost-based MV candidate selection for query rewriting (#27222) Mar 17, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 17, 2026
…odb#27222)

@ceekay47 ceekay47 changed the title [presto] Cost-based MV candidate selection for query rewriting (#27222) feat: Cost-based MV candidate selection for query rewriting (#27222) Mar 17, 2026
Contributor

@steveburnett steveburnett left a comment

"Gated behind session property materialized_view_query_rewrite_cost_based_selection_enabled (default: false)."

Is materialized_view_query_rewrite_cost_based_selection_enabled documented? If not, please add doc for it to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties-session.rst.

Member

@hantangwangd hantangwangd left a comment

Thanks for this change. I left a few initial comments. Also, I'm wondering if it's possible to add a few test cases in TestHiveLogicalPlanner to verify the execution plans generated for real query examples on Hive.

@ceekay47
Contributor Author

Thank you for the review @hantangwangd! Addressed the comments and added new test cases.

ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 24, 2026
…#27222)

ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 24, 2026
…#27222)

```
== RELEASE NOTES ==

  General Changes
  * Add cost-based selection for materialized view query rewriting. When multiple
    materialized views exist for the same base table, the optimizer now evaluates
    all compatible rewrites and selects the lowest-cost plan. This can be enabled
    with the ``materialized_view_query_rewrite_cost_based_selection_enabled``
    session property.
```

Differential Revision: D92582456
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 24, 2026
…#27222)

ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 24, 2026
…#27222)

Summary:
Pull Request resolved: prestodb#27222

Currently the MaterializedViewQueryOptimizer selects the first compatible materialized view for query rewriting. When multiple MVs exist for the same base table, this leads to suboptimal plan selection since the first MV found may not produce the cheapest execution plan.

This diff introduces cost-based MV selection by deferring the choice to the plan optimizer. Instead of committing to a single MV during AST rewriting, we collect all compatible candidates and let the cost-based optimizer compare their plans. The flow works as follows:

**Step 1 — AST Rewriting (`MaterializedViewQueryOptimizer`)**
When `materialized_view_query_rewrite_cost_based_selection_enabled` is true, the optimizer calls `rewriteWithAllCandidates()` instead of returning the first compatible MV. For each referenced MV on the base table, it attempts a rewrite and collects successful rewrites into `QueryWithMVRewriteCandidates` — a new `QueryBody` AST node that bundles the original `QuerySpecification` with all candidate `MVRewriteCandidate` entries (each containing the rewritten query and the MV's fully qualified name). MV data consistency checks are skipped at this stage since the optimizer will handle selection downstream.

**Step 2 — Semantic Analysis (`StatementAnalyzer`)**
A new `visitQueryWithMVRewriteCandidates` handler analyzes the original query to produce the output scope, then analyzes each candidate's rewritten query and stores its scope in `Analysis.mvCandidateScopes`. The original query's scope is returned as the node's output scope.
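As a rough illustration of Step 2, the sketch below records one scope per candidate and returns the original query's scope. It is not the `StatementAnalyzer` code: a scope is reduced to a list of output column names, the map is a hypothetical stand-in for `Analysis.mvCandidateScopes`, and `analyzeQuery` is an assumed callback.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MvCandidateScopeSketch {
    // Hypothetical stand-in for Analysis.mvCandidateScopes; a "scope" is just output column names here.
    final Map<String, List<String>> mvCandidateScopes = new LinkedHashMap<>();

    // Analyze the original first, record a scope per candidate, and return the original's scope.
    List<String> visitQueryWithCandidates(
            String originalQuery,
            List<String> candidateQueries,
            Function<String, List<String>> analyzeQuery) {
        List<String> originalScope = analyzeQuery.apply(originalQuery);
        for (String candidate : candidateQueries) {
            mvCandidateScopes.put(candidate, analyzeQuery.apply(candidate));
        }
        return originalScope;
    }

    public static void main(String[] args) {
        MvCandidateScopeSketch analysis = new MvCandidateScopeSketch();
        List<String> scope = analysis.visitQueryWithCandidates(
                "original",
                Arrays.asList("rewrite_on_mv1", "rewrite_on_mv2"),
                // Toy analyzer: every query exposes the same two columns.
                query -> Arrays.asList("a", "b"));
        System.out.println(scope + " with " + analysis.mvCandidateScopes.size() + " candidate scopes");
    }
}
```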

**Step 3 — Logical Planning (`RelationPlanner`)**
A new `visitQueryWithMVRewriteCandidates` handler plans the original query and each candidate query into separate `RelationPlan` trees. These are bundled into `MVRewriteCandidatesNode` — a new `PlanNode` in the SPI that holds the original plan, all candidate plans (each with its MV name metadata), and the output variables.

**Step 4 — Cost-Based Optimization (`SelectLowestCostMVRewrite`)**
A new `IterativeOptimizer` rule matches `MVRewriteCandidatesNode`. It uses the `CostProvider` to compute costs for the original plan and all candidate plans, then selects the lowest-cost option via `CostComparator`. Candidates with unknown cost components are skipped. If the selected plan's output variables differ from the expected outputs, a `ProjectNode` is added to align them. The rule records which plan was selected (original or which MV) for debugging via `getStatsSource()`.
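The selection in Step 4 can be sketched as follows. This is a simplified illustration, not the `SelectLowestCostMVRewrite` rule: `CostedPlan` is a hypothetical type, cost is collapsed to a single `double` (with `NaN` standing for an unknown cost) rather than the multi-component estimate that `CostComparator` compares, and the `ProjectNode` alignment is not modeled. The description above does not say what happens when the original plan's cost is unknown; keeping the original in that case is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class LowestCostMvSelectionSketch {
    // Hypothetical plan summary: a label ("original" or an MV name) and a scalar cost; NaN means unknown.
    static final class CostedPlan {
        final String label;
        final double cost;

        CostedPlan(String label, double cost) {
            this.label = label;
            this.cost = cost;
        }
    }

    // Keep the cheapest plan among the original and the candidates whose cost is known.
    static CostedPlan selectLowestCost(CostedPlan original, List<CostedPlan> candidates) {
        if (Double.isNaN(original.cost)) {
            // Assumption of this sketch: without a baseline to compare against, keep the original.
            return original;
        }
        CostedPlan best = original;
        for (CostedPlan candidate : candidates) {
            if (Double.isNaN(candidate.cost)) {
                continue; // candidates with unknown cost are skipped
            }
            if (candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CostedPlan selected = selectLowestCost(
                new CostedPlan("original", 100.0),
                Arrays.asList(
                        new CostedPlan("catalog.schema.mv1", 40.0),
                        new CostedPlan("catalog.schema.mv2", Double.NaN),
                        new CostedPlan("catalog.schema.mv3", 25.0)));
        System.out.println("selected plan: " + selected.label);
    }
}
```

Because the comparison is strict, a candidate that only ties with the original does not displace it in this sketch.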

New types introduced:
- `QueryWithMVRewriteCandidates` (presto-parser) — AST node bundling original query with MV rewrite candidates
- `MVRewriteCandidatesNode` (presto-spi) — Plan node for cost-based MV selection
- `SelectLowestCostMVRewrite` (presto-main-base) — Optimizer rule selecting lowest-cost plan

Gated behind session property `materialized_view_query_rewrite_cost_based_selection_enabled` (default: false).

```
== RELEASE NOTES ==

  General Changes
  * Add cost-based selection for materialized view query rewriting. When multiple
    materialized views exist for the same base table, the optimizer now evaluates
    all compatible rewrites and selects the lowest-cost plan. This can be enabled
    with the ``materialized_view_query_rewrite_cost_based_selection_enabled``
    session property.
```

Differential Revision: D92582456
@ceekay47 ceekay47 requested a review from hantangwangd March 25, 2026 00:51
Member

@hantangwangd hantangwangd left a comment

Thanks for adding the test cases. Overall this looks good to me; just a few more small things.

Comment on lines +408 to +439
if (!isMaterializedViewDataConsistencyEnabled(session)) {
    session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
    return rewrittenQuerySpecification;
}

// TODO: We should be able to leverage this information in the StatementAnalyzer as well.
MaterializedViewStatus materializedViewStatus = getMaterializedViewStatus(querySpecification);
if (materializedViewStatus.isPartiallyMaterialized() || materializedViewStatus.isFullyMaterialized()) {
    session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
    return rewrittenQuerySpecification;
}
session.getRuntimeStats().addMetricValue(MANY_PARTITIONS_MISSING_IN_MATERIALIZED_VIEW_COUNT, NONE, 1);
return querySpecification;
session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
return rewrittenQuerySpecification;
Member

If I understand correctly, not making any changes here means the test cases in TestMaterializedViewQueryOptimizer won't need any adjustments. My initial comment was intended to suggest removing the newly added condition and retaining the original logic. Please let me know if you see it differently.

Contributor Author

@ceekay47 ceekay47 Mar 26, 2026

Ah I see. My intention behind removing the condition entirely was that the MV status check also happens during analysis, which handles stitching for partially refreshed MVs and falls back to base tables for unmaterialized views. So the check here seemed redundant.

That said, I am happy to take this as a separate discussion and PR if you'd prefer.

Contributor Author

I've reverted this to the original logic and skip the check only for the new cost-based flow. We can revisit the broader change as a separate discussion. Please let me know if you're aligned.

if (materializedViewStatus.isPartiallyMaterialized() || materializedViewStatus.isFullyMaterialized()) {
    session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
    return rewrittenQuerySpecification;
}
session.getRuntimeStats().addMetricValue(MANY_PARTITIONS_MISSING_IN_MATERIALIZED_VIEW_COUNT, NONE, 1);
return querySpecification;

Member

Thanks for the clarification. Yes, I got your initial point. I agree that this part is worth a separate discussion, particularly around the validation behavior embedded in `getMaterializedViewStatus(querySpecification)`, as shown in the test case.

Just one last point to touch on: should the new cost-based flow also take the materialized view status into account when `isMaterializedViewDataConsistencyEnabled(session)` is true? In other words, what if we simply remove the newly added condition that bypasses the subsequent check, `|| isMaterializedViewQueryRewriteCostBasedSelectionEnabled(session)`? Curious to hear your thoughts.

Contributor Author

If we remove the isMaterializedViewQueryRewriteCostBasedSelectionEnabled(session) condition, the cost-based flow would make a redundant getMaterializedViewStatus call for each MV candidate here, same as the current flow.

But if we want to discuss whether the status check is needed here separately, I can remove the isMaterializedViewQueryRewriteCostBasedSelectionEnabled check for now and handle both flows together later. What do you think?

Member

> I can remove the isMaterializedViewQueryRewriteCostBasedSelectionEnabled check for now and handle both flows together later.

Thanks for the fix. Sounds good to me!
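For readers following the thread, the decision being discussed can be reduced to a small predicate. This is a sketch only: it mirrors the quoted snippet with no special case for the cost-based flow (the outcome agreed above), omits the runtime metrics, and uses a hypothetical `Status` enum as a simplification of what `MaterializedViewStatus` reports.

```java
final class MvConsistencyCheckSketch {
    // Hypothetical simplification of the states reported by MaterializedViewStatus.
    enum Status { FULLY_MATERIALIZED, PARTIALLY_MATERIALIZED, NOT_MATERIALIZED }

    // Returns true when the rewritten query should be used in place of the original.
    static boolean useRewrite(boolean dataConsistencyEnabled, Status status) {
        if (!dataConsistencyEnabled) {
            return true; // no consistency requirement: always accept the rewrite
        }
        // With consistency enabled, accept the rewrite only for fully or partially materialized views.
        return status == Status.FULLY_MATERIALIZED || status == Status.PARTIALLY_MATERIALIZED;
    }

    public static void main(String[] args) {
        for (Status status : Status.values()) {
            System.out.println(status + " -> use rewrite: " + useRewrite(true, status));
        }
    }
}
```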

ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 26, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 26, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 27, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 27, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 27, 2026
@ceekay47 ceekay47 merged commit acba1e1 into prestodb:master Mar 27, 2026
152 of 157 checks passed

4 participants