feat: Cost-based MV candidate selection for query rewriting (#27222)
ceekay47 merged 1 commit into prestodb:master
Reviewer's Guide
Implements cost-based selection among multiple materialized view rewrite candidates by threading a new AST node and plan node through analysis and planning, and adding an optimizer rule that compares costs of the original and MV-based plans, gated by a new session property.
Sequence diagram for cost-based MV rewrite selection flow
sequenceDiagram
participant Client
participant Session
participant MaterializedViewQueryOptimizer
participant StatementAnalyzer
participant RelationPlanner
participant IterativeOptimizer
participant SelectLowestCostMVRewrite
Client->>Session: submit_query
Session->>MaterializedViewQueryOptimizer: optimize(QuerySpecification)
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: rewriteQuerySpecificationIfCompatible
MaterializedViewQueryOptimizer->>Session: isMaterializedViewQueryRewriteCostBasedSelectionEnabled
alt cost_based_enabled
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: rewriteWithAllCandidates
MaterializedViewQueryOptimizer-->>StatementAnalyzer: QueryWithMVRewriteCandidates
else disabled
MaterializedViewQueryOptimizer-->>StatementAnalyzer: first_compatible_rewrite_or_original
StatementAnalyzer-->>RelationPlanner: QuerySpecification
RelationPlanner-->>IterativeOptimizer: logical_plan
IterativeOptimizer-->>Client: optimized_plan
Client-->>Client: return
end
StatementAnalyzer->>StatementAnalyzer: visitQueryWithMVRewriteCandidates
StatementAnalyzer->>StatementAnalyzer: analyze originalQuery
StatementAnalyzer->>StatementAnalyzer: analyze each MVRewriteCandidate.rewrittenQuery
StatementAnalyzer-->>RelationPlanner: Scope for QueryWithMVRewriteCandidates
RelationPlanner->>RelationPlanner: visitQueryWithMVRewriteCandidates
RelationPlanner->>RelationPlanner: plan originalQuery -> RelationPlan originalPlan
RelationPlanner->>RelationPlanner: plan each candidate.rewrittenQuery -> RelationPlan candidatePlan
RelationPlanner-->>IterativeOptimizer: MVRewriteCandidatesNode
IterativeOptimizer->>SelectLowestCostMVRewrite: apply on MVRewriteCandidatesNode
SelectLowestCostMVRewrite->>Session: isMaterializedViewQueryRewriteCostBasedSelectionEnabled
alt rule_enabled
SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: compute costs via CostProvider
SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: compare with CostComparator
SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: pick lowest_cost_plan
alt outputs_match
SelectLowestCostMVRewrite-->>IterativeOptimizer: selected PlanNode
else outputs_differ
SelectLowestCostMVRewrite->>SelectLowestCostMVRewrite: build Assignments
SelectLowestCostMVRewrite-->>IterativeOptimizer: ProjectNode(selected_plan)
end
else rule_disabled
SelectLowestCostMVRewrite-->>IterativeOptimizer: no_change
end
IterativeOptimizer-->>Client: final_optimized_plan
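The first `alt` branch in the diagram (collect every candidate when the property is on, otherwise fall back to the first compatible rewrite or the original) can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: `RewriteGatingSketch` and `queriesToPlan` are hypothetical names, queries are modeled as plain strings, and each rewrite attempt is modeled as an `Optional` that is empty when the MV is not compatible. Returning only the original when no rewrite succeeds in the enabled case is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the gating decision; not Presto's MaterializedViewQueryOptimizer.
final class RewriteGatingSketch
{
    private RewriteGatingSketch() {}

    // Returns the queries that would go on to analysis and planning.
    // Enabled:  the original followed by every successful rewrite.
    // Disabled: the first successful rewrite, or the original if none succeeded.
    static List<String> queriesToPlan(
            boolean costBasedEnabled,
            String originalQuery,
            List<Optional<String>> rewriteAttempts)
    {
        List<String> result = new ArrayList<>();
        if (costBasedEnabled) {
            result.add(originalQuery);
            for (Optional<String> attempt : rewriteAttempts) {
                attempt.ifPresent(result::add);
            }
            return result;
        }
        for (Optional<String> attempt : rewriteAttempts) {
            if (attempt.isPresent()) {
                result.add(attempt.get());
                return result;
            }
        }
        result.add(originalQuery);
        return result;
    }
}
```

With the property enabled the choice is deferred: every alternative is carried forward and the optimizer rule decides later. With it disabled the sketch commits to a single query up front, as in the `else disabled` branch.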
Class diagram for new MV rewrite AST, plan node, and optimizer rule
classDiagram
class MaterializedViewQueryOptimizer {
+rewriteQuerySpecificationIfCompatible(querySpecification: QuerySpecification, baseTable: Table) Node
-rewriteWithAllCandidates(originalQuery: QuerySpecification, referencedMaterializedViews: List~QualifiedObjectName~) QueryBody
}
class QueryBody
class QuerySpecification
class QueryWithMVRewriteCandidates {
-QuerySpecification originalQuery
-List~MVRewriteCandidate~ candidates
+QueryWithMVRewriteCandidates(originalQuery: QuerySpecification, candidates: List~MVRewriteCandidate~)
+QueryWithMVRewriteCandidates(location: NodeLocation, originalQuery: QuerySpecification, candidates: List~MVRewriteCandidate~)
+getOriginalQuery() QuerySpecification
+getCandidates() List~MVRewriteCandidate~
+accept(visitor: AstVisitor, context: Object) Object
+getChildren() List~Node~
}
class MVRewriteCandidate_Ast {
<<static inner>>
-QuerySpecification rewrittenQuery
-String materializedViewCatalog
-String materializedViewSchema
-String materializedViewName
+MVRewriteCandidate(rewrittenQuery: QuerySpecification, materializedViewCatalog: String, materializedViewSchema: String, materializedViewName: String)
+getRewrittenQuery() QuerySpecification
+getMaterializedViewCatalog() String
+getMaterializedViewSchema() String
+getMaterializedViewName() String
+getFullyQualifiedName() String
}
class PlanNode {
+getOutputVariables() List~VariableReferenceExpression~
}
class VariableReferenceExpression
class MVRewriteCandidatesNode {
-PlanNode originalPlan
-List~MVRewriteCandidate_Plan~ candidates
-List~VariableReferenceExpression~ outputVariables
+MVRewriteCandidatesNode(sourceLocation: Optional~SourceLocation~, id: PlanNodeId, originalPlan: PlanNode, candidates: List~MVRewriteCandidate_Plan~, outputVariables: List~VariableReferenceExpression~)
+MVRewriteCandidatesNode(sourceLocation: Optional~SourceLocation~, id: PlanNodeId, statsEquivalentPlanNode: Optional~PlanNode~, originalPlan: PlanNode, candidates: List~MVRewriteCandidate_Plan~, outputVariables: List~VariableReferenceExpression~)
+getOriginalPlan() PlanNode
+getCandidates() List~MVRewriteCandidate_Plan~
+getOutputVariables() List~VariableReferenceExpression~
+getSources() List~PlanNode~
+accept(visitor: PlanVisitor, context: Object) Object
+assignStatsEquivalentPlanNode(statsEquivalentPlanNode: Optional~PlanNode~) PlanNode
+replaceChildren(newChildren: List~PlanNode~) PlanNode
}
class MVRewriteCandidate_Plan {
<<static inner>>
-PlanNode plan
-String materializedViewCatalog
-String materializedViewSchema
-String materializedViewName
+MVRewriteCandidate(plan: PlanNode, materializedViewCatalog: String, materializedViewSchema: String, materializedViewName: String)
+getPlan() PlanNode
+getMaterializedViewCatalog() String
+getMaterializedViewSchema() String
+getMaterializedViewName() String
+getFullyQualifiedName() String
}
class Rule~MVRewriteCandidatesNode~
class SelectLowestCostMVRewrite {
-CostComparator costComparator
+SelectLowestCostMVRewrite(costComparator: CostComparator)
+getPattern() Pattern~MVRewriteCandidatesNode~
+isEnabled(session: Session) boolean
+isCostBased(session: Session) boolean
+apply(node: MVRewriteCandidatesNode, captures: Captures, context: Context) Result
}
class CostComparator {
+compare(session: Session, left: PlanCostEstimate, right: PlanCostEstimate) int
}
class CostProvider {
+getCost(node: PlanNode) PlanCostEstimate
}
class ProjectNode {
+ProjectNode(id: PlanNodeId, source: PlanNode, assignments: Assignments)
}
class Assignments {
+builder() Assignments.Builder
}
MaterializedViewQueryOptimizer --> QueryWithMVRewriteCandidates : creates
QueryWithMVRewriteCandidates --* MVRewriteCandidate_Ast : contains
MVRewriteCandidatesNode --* MVRewriteCandidate_Plan : contains
MVRewriteCandidatesNode ..|> PlanNode
QueryWithMVRewriteCandidates ..|> QueryBody
SelectLowestCostMVRewrite ..|> Rule~MVRewriteCandidatesNode~
SelectLowestCostMVRewrite --> MVRewriteCandidatesNode : uses
SelectLowestCostMVRewrite --> CostProvider : uses
SelectLowestCostMVRewrite --> CostComparator : uses
SelectLowestCostMVRewrite --> ProjectNode : may_wrap_selected_plan
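The `may_wrap_selected_plan` relation above refers to the case where the selected plan's output variables differ from the outputs expected of `MVRewriteCandidatesNode`. A rough sketch of how such an alignment could be computed is shown below. It is an illustration only: `OutputAlignmentSketch` and `buildAssignments` are hypothetical names, variables are modeled as strings rather than `VariableReferenceExpression`, and matching outputs by position is an assumption of the sketch, not something this page confirms.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of output alignment; not Presto's Assignments or ProjectNode.
final class OutputAlignmentSketch
{
    private OutputAlignmentSketch() {}

    // Returns an empty map when the outputs already match (no projection needed).
    // Otherwise maps each expected output name to the selected plan's output
    // at the same position (positional matching is an assumption).
    static Map<String, String> buildAssignments(List<String> expectedOutputs, List<String> selectedOutputs)
    {
        if (expectedOutputs.size() != selectedOutputs.size()) {
            throw new IllegalArgumentException("output arity mismatch");
        }
        Map<String, String> assignments = new LinkedHashMap<>();
        if (expectedOutputs.equals(selectedOutputs)) {
            return assignments;
        }
        for (int i = 0; i < expectedOutputs.size(); i++) {
            assignments.put(expectedOutputs.get(i), selectedOutputs.get(i));
        }
        return assignments;
    }
}
```

An empty result corresponds to the `outputs_match` branch in the sequence diagram, where the selected plan is returned unchanged. A non-empty result corresponds to `outputs_differ`, where a projection re-aliases the selected plan's outputs.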
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java" line_range="49" />
<code_context>
+ * getStats(node.getSource()), not the filter's own stats.
+ */
+@Test(singleThreaded = true)
+public class TestSelectLowestCostMVRewrite
+{
+ private static final CostComparator COST_COMPARATOR = new CostComparator(1, 1, 1);
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that verifies the rule is disabled when the session property is false
Currently all tests cover only the cost-based behavior with `materialized_view_query_rewrite_cost_based_selection_enabled` set to `true` in `RuleTester`. Please add a test that runs `SelectLowestCostMVRewrite` with this property disabled (or not set), and asserts that an `MVRewriteCandidatesNode` is not transformed. This will verify that `isEnabled(session)` correctly gates the rule when the feature is turned off.
Suggested implementation:
```java
@BeforeClass
public void setUp()
{
    tester = new RuleTester(ImmutableList.of(), ImmutableMap.of(
            "materialized_view_query_rewrite_cost_based_selection_enabled", "true"), Optional.of(NODES_COUNT));
}

@Test
public void testRuleDisabledWhenSessionPropertyFalse()
{
    RuleTester disabledTester = new RuleTester(
            ImmutableList.of(),
            ImmutableMap.of("materialized_view_query_rewrite_cost_based_selection_enabled", "false"),
            Optional.of(NODES_COUNT));
    disabledTester.assertThat(new SelectLowestCostMVRewrite(COST_COMPARATOR))
            .on(this::buildPlanWithMvCandidates)
            .doesNotFire();
}
```
To make this compile and to ensure the test actually verifies the behavior you described, you should:
1. Reuse the same plan shape you already use for the positive/“cost-based” tests:
- Identify the method or lambda that currently builds a plan containing an `MVRewriteCandidatesNode` which is transformed by `SelectLowestCostMVRewrite` when the rule is enabled.
- Extract that plan construction logic into a helper method with the following signature in this test class:
```java
private PlanNode buildPlanWithMvCandidates(PlanBuilder planBuilder)
```
- Move the existing plan-building code into this helper, and call it both from the existing “rule fires” test(s) and from `testRuleDisabledWhenSessionPropertyFalse`.
2. If you don’t currently have a helper, create it by lifting the body of the `.on(planBuilder -> { ... })` lambda from your main cost-based test, so that:
- The cost-based test still asserts `.matches(...)` or equivalent.
- This new `testRuleDisabledWhenSessionPropertyFalse` uses the exact same plan via `.on(this::buildPlanWithMvCandidates)` but asserts `.doesNotFire()` when the session property is set to `"false"`.
This way, `testRuleDisabledWhenSessionPropertyFalse` will explicitly confirm that `SelectLowestCostMVRewrite.isEnabled(session)` prevents the rule from firing when `materialized_view_query_rewrite_cost_based_selection_enabled` is turned off, while sharing the same MV rewrite candidate plan as the enabled tests.
</issue_to_address>
### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java" line_range="261" />
<code_context>
+ }
+
+ @Test
+ public void testAddsProjectionForDifferentOutputVariables()
+ {
+ PlanNode result = tester.assertThat(new SelectLowestCostMVRewrite(COST_COMPARATOR))
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen projection test by asserting the underlying chosen source as well as the projected outputs
In `testAddsProjectionForDifferentOutputVariables`, the assertions only confirm that a `ProjectNode` is added and that the output variable is re-aliased to `col`. To also validate that the correct MV candidate is chosen, consider asserting that:
- `((ProjectNode) result).getSource()` is a `FilterNode`, and
- The filter’s source `PlanNodeId` equals `"mv1Src"`.
This will ensure the test covers both candidate selection and projection behavior.
Suggested implementation:
```java
assertTrue(result instanceof ProjectNode);
ProjectNode projectNode = (ProjectNode) result;
assertTrue(projectNode.getSource() instanceof FilterNode);
FilterNode filterNode = (FilterNode) projectNode.getSource();
assertEquals(filterNode.getSource().getId(), new PlanNodeId("mv1Src"));
```
1. Ensure `ProjectNode` and `FilterNode` are imported at the top of the file if they are not already:
- `import com.facebook.presto.sql.planner.plan.ProjectNode;`
- `import com.facebook.presto.sql.planner.plan.FilterNode;`
2. If the assertion line in this test is not exactly `assertTrue(result instanceof ProjectNode);`, adjust the SEARCH text to match the existing assertion on `result` being a `ProjectNode` and apply the same REPLACE block.
3. Keep any existing assertions about the projected outputs (e.g., aliasing to `col`) after this new block so the test still validates both the projection and the chosen MV candidate.
</issue_to_address>
…odb#27222)

Summary: Pull Request resolved: prestodb#27222

Currently the MaterializedViewQueryOptimizer selects the first compatible materialized view for query rewriting. When multiple MVs exist for the same base table, this leads to suboptimal plan selection since the first MV found may not produce the cheapest execution plan.

This diff introduces cost-based MV selection by deferring the choice to the plan optimizer. Instead of committing to a single MV during AST rewriting, we collect all compatible candidates and let the cost-based optimizer compare their plans. The flow works as follows:

**Step 1: AST Rewriting (`MaterializedViewQueryOptimizer`)**
When `materialized_view_query_rewrite_cost_based_selection_enabled` is true, the optimizer calls `rewriteWithAllCandidates()` instead of returning the first compatible MV. For each referenced MV on the base table, it attempts a rewrite and collects successful rewrites into `QueryWithMVRewriteCandidates`, a new `QueryBody` AST node that bundles the original `QuerySpecification` with all candidate `MVRewriteCandidate` entries (each containing the rewritten query and the MV's fully qualified name). MV data consistency checks are skipped at this stage since the optimizer will handle selection downstream.

**Step 2: Semantic Analysis (`StatementAnalyzer`)**
A new `visitQueryWithMVRewriteCandidates` handler analyzes the original query to produce the output scope, then analyzes each candidate's rewritten query and stores its scope in `Analysis.mvCandidateScopes`. The original query's scope is returned as the node's output scope.

**Step 3: Logical Planning (`RelationPlanner`)**
A new `visitQueryWithMVRewriteCandidates` handler plans the original query and each candidate query into separate `RelationPlan` trees. These are bundled into `MVRewriteCandidatesNode`, a new `PlanNode` in the SPI that holds the original plan, all candidate plans (each with its MV name metadata), and the output variables.

**Step 4: Cost-Based Optimization (`SelectLowestCostMVRewrite`)**
A new `IterativeOptimizer` rule matches `MVRewriteCandidatesNode`. It uses the `CostProvider` to compute costs for the original plan and all candidate plans, then selects the lowest-cost option via `CostComparator`. Candidates with unknown cost components are skipped. If the selected plan's output variables differ from the expected outputs, a `ProjectNode` is added to align them. The rule records which plan was selected (original or which MV) for debugging via `getStatsSource()`.

New types introduced:
- `QueryWithMVRewriteCandidates` (presto-parser): AST node bundling original query with MV rewrite candidates
- `MVRewriteCandidatesNode` (presto-spi): Plan node for cost-based MV selection
- `SelectLowestCostMVRewrite` (presto-main-base): Optimizer rule selecting lowest-cost plan

Gated behind session property `materialized_view_query_rewrite_cost_based_selection_enabled` (default: false).

Differential Revision: D92582456
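The Step 4 selection can be sketched in isolation as follows. This is a minimal illustration, not the rule implemented in this PR: `PlanOption` and `LowestCostSelectionSketch` are hypothetical names, a single `double` stands in for `PlanCostEstimate`, and `Double.NaN` models an unknown cost. Two further behaviors are assumptions of the sketch rather than facts from the summary: ties keep the earlier option (so the original wins a tie), and a candidate with a known cost is preferred over an original whose cost is unknown.

```java
import java.util.List;

// Hypothetical stand-in for one planned alternative; not a Presto class.
final class PlanOption
{
    final String label; // "original" or an MV's fully qualified name
    final double cost;  // Double.NaN models an unknown cost (assumption)

    PlanOption(String label, double cost)
    {
        this.label = label;
        this.cost = cost;
    }
}

final class LowestCostSelectionSketch
{
    private LowestCostSelectionSketch() {}

    // Starts from the original plan and replaces it only with a candidate
    // whose cost is known and strictly lower than the current best.
    static PlanOption selectLowestCost(PlanOption original, List<PlanOption> candidates)
    {
        PlanOption best = original;
        for (PlanOption candidate : candidates) {
            if (Double.isNaN(candidate.cost)) {
                continue; // unknown cost: skip this candidate
            }
            if (Double.isNaN(best.cost) || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Seeding the loop with the original plan means the query is left unrewritten whenever no candidate has a usable cost, which keeps the sketch safe when statistics are missing.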
37ff330 to 690f15c
690f15c to 2ab2c0f
2ab2c0f to 8dccb59
8dccb59 to cb57755
…odb#27222)

Summary: Currently the `MaterializedViewQueryOptimizer` selects the first compatible materialized view for query rewriting. When multiple MVs exist for the same base table, this leads to suboptimal plan selection since the first MV found may not produce the cheapest execution plan.

This diff introduces cost-based MV selection by deferring the choice to the plan optimizer. Instead of committing to a single MV during AST rewriting, we collect all compatible candidates and let the cost-based optimizer compare their plans. The flow works as follows:

**Step 1: AST Rewriting (`MaterializedViewQueryOptimizer`)**

When `materialized_view_query_rewrite_cost_based_selection_enabled` is true, the optimizer calls `rewriteWithAllCandidates()` instead of returning the first compatible MV. For each referenced MV on the base table, it attempts a rewrite and collects successful rewrites into `QueryWithMVRewriteCandidates`, a new `QueryBody` AST node that bundles the original `QuerySpecification` with all candidate `MVRewriteCandidate` entries (each containing the rewritten query and the MV's fully qualified name). MV data consistency checks are skipped at this stage since the optimizer will handle selection downstream.

**Step 2: Semantic Analysis (`StatementAnalyzer`)**

A new `visitQueryWithMVRewriteCandidates` handler analyzes the original query to produce the output scope, then analyzes each candidate's rewritten query and stores its scope in `Analysis.mvCandidateScopes`. The original query's scope is returned as the node's output scope.

**Step 3: Logical Planning (`RelationPlanner`)**

A new `visitQueryWithMVRewriteCandidates` handler plans the original query and each candidate query into separate `RelationPlan` trees. These are bundled into `MVRewriteCandidatesNode`, a new `PlanNode` in the SPI that holds the original plan, all candidate plans (each with its MV name metadata), and the output variables.

**Step 4: Cost-Based Optimization (`SelectLowestCostMVRewrite`)**

A new `IterativeOptimizer` rule matches `MVRewriteCandidatesNode`. It uses the `CostProvider` to compute costs for the original plan and all candidate plans, then selects the lowest-cost option via `CostComparator`. Candidates with unknown cost components are skipped. If the selected plan's output variables differ from the expected outputs, a `ProjectNode` is added to align them. The rule records which plan was selected (original or which MV) for debugging via `getStatsSource()`.

New types introduced:
- `QueryWithMVRewriteCandidates` (presto-parser): AST node bundling original query with MV rewrite candidates
- `MVRewriteCandidatesNode` (presto-spi): Plan node for cost-based MV selection
- `SelectLowestCostMVRewrite` (presto-main-base): Optimizer rule selecting lowest-cost plan

Gated behind session property `materialized_view_query_rewrite_cost_based_selection_enabled` (default: false).

Differential Revision: D92582456
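The difference Step 1 introduces, returning the first compatible rewrite versus collecting every compatible one, can be illustrated with a small sketch. Everything here is a simplified assumption: queries are plain strings rather than AST nodes, `CandidateSketch` and `RewriteResultSketch` are hypothetical stand-ins for `MVRewriteCandidate` and `QueryWithMVRewriteCandidates`, and the `tryRewrite` callback is an invented placeholder for the optimizer's compatibility check. The method signatures are not those of the real `rewriteWithAllCandidates()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical stand-in for MVRewriteCandidate: the MV name plus the rewritten query.
final class CandidateSketch
{
    final String mvName;
    final String rewrittenQuery;

    CandidateSketch(String mvName, String rewrittenQuery)
    {
        this.mvName = mvName;
        this.rewrittenQuery = rewrittenQuery;
    }
}

// Hypothetical stand-in for QueryWithMVRewriteCandidates: original query plus all candidates.
final class RewriteResultSketch
{
    final String originalQuery;
    final List<CandidateSketch> candidates;

    RewriteResultSketch(String originalQuery, List<CandidateSketch> candidates)
    {
        this.originalQuery = originalQuery;
        this.candidates = candidates;
    }
}

final class CollectCandidatesSketch
{
    // Previous behaviour: commit to the first MV whose rewrite succeeds.
    static String rewriteWithFirstCompatible(
            String query,
            List<String> mvNames,
            BiFunction<String, String, Optional<String>> tryRewrite)
    {
        for (String mvName : mvNames) {
            Optional<String> rewritten = tryRewrite.apply(query, mvName);
            if (rewritten.isPresent()) {
                return rewritten.get();
            }
        }
        return query;
    }

    // Cost-based flow: keep the original and every successful rewrite for later comparison.
    static RewriteResultSketch rewriteWithAllCandidates(
            String query,
            List<String> mvNames,
            BiFunction<String, String, Optional<String>> tryRewrite)
    {
        List<CandidateSketch> candidates = new ArrayList<>();
        for (String mvName : mvNames) {
            tryRewrite.apply(query, mvName)
                    .ifPresent(rewritten -> candidates.add(new CandidateSketch(mvName, rewritten)));
        }
        return new RewriteResultSketch(query, candidates);
    }
}
```

Deferring the choice this way costs extra analysis and planning work per candidate, which is presumably one reason the behaviour sits behind a session property that defaults to false.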
Force-pushed from cb57755 to 1278e6a.
steveburnett left a comment:
"Gated behind session property materialized_view_query_rewrite_cost_based_selection_enabled (default: false)."
Is materialized_view_query_rewrite_cost_based_selection_enabled documented? If not, please add doc for it to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties-session.rst.
hantangwangd left a comment:
Thanks for this change. I left a few initial comments. Also, I'm wondering if it's possible to add a few test cases in TestHiveLogicalPlanner to verify the execution plans generated for real query examples on Hive.
Resolved review threads:
- ...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
- ...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
- ...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java (outdated)
- .../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java (outdated)
Thank you for the review @hantangwangd! Addressed the comments and added new test cases.
Force-pushed from 38b7282 to 83e50d6.
…#27222)

```
== RELEASE NOTES ==

General Changes
* Add cost-based selection for materialized view query rewriting. When multiple materialized views exist for the same base table, the optimizer now evaluates all compatible rewrites and selects the lowest-cost plan. This can be enabled with the ``materialized_view_query_rewrite_cost_based_selection_enabled`` session property.
```

Differential Revision: D92582456
Force-pushed from 83e50d6 to 4e764aa.
Force-pushed from 4e764aa to 20854f1.
Force-pushed from 20854f1 to 81c0072.
hantangwangd left a comment:
Thanks for adding the test cases, overall looks good to me. Just a few more little things.
    if (!isMaterializedViewDataConsistencyEnabled(session)) {
        session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
        return rewrittenQuerySpecification;
    }

    // TODO: We should be able to leverage this information in the StatementAnalyzer as well.
    MaterializedViewStatus materializedViewStatus = getMaterializedViewStatus(querySpecification);
    if (materializedViewStatus.isPartiallyMaterialized() || materializedViewStatus.isFullyMaterialized()) {
        session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
        return rewrittenQuerySpecification;
    }
    session.getRuntimeStats().addMetricValue(MANY_PARTITIONS_MISSING_IN_MATERIALIZED_VIEW_COUNT, NONE, 1);
    return querySpecification;
    session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
    return rewrittenQuerySpecification;
If I understand correctly, not making any changes here means the test cases in TestMaterializedViewQueryOptimizer won't need any adjustments. My initial comment was intended to suggest removing the newly added condition and retaining the original logic. Please let me know if you see it differently.
Ah I see. My intention behind removing the condition entirely was that the MV status check also happens during analysis, which handles stitching for partially refreshed MVs and falls back to base tables for unmaterialized views. So the check here seemed redundant.
That said, I am happy to take this as a separate discussion and PR if you'd prefer.
I've reverted this to the original logic and skipped the check only for the new cost-based flow. We can revisit the broader change as a separate discussion. Please let me know if you're aligned.
if (materializedViewStatus.isPartiallyMaterialized() || materializedViewStatus.isFullyMaterialized()) {
session.getRuntimeStats().addMetricValue(OPTIMIZED_WITH_MATERIALIZED_VIEW_SUBQUERY_COUNT, NONE, 1);
return rewrittenQuerySpecification;
}
session.getRuntimeStats().addMetricValue(MANY_PARTITIONS_MISSING_IN_MATERIALIZED_VIEW_COUNT, NONE, 1);
return querySpecification;
Thanks for the clarification. Yes, I got your initial point. I agree that this part is worth a separate discussion, particularly around the validation behavior embedded in `getMaterializedViewStatus(querySpecification)`, as shown in the test case.
Just one last point to touch on: should the new cost-based flow also take into account the materialized view status when `isMaterializedViewDataConsistencyEnabled(session)` is true? In other words, what if we simply remove the newly added condition that bypasses the subsequent check, `|| isMaterializedViewQueryRewriteCostBasedSelectionEnabled(session)`? Curious to hear your thoughts.
If we remove the `isMaterializedViewQueryRewriteCostBasedSelectionEnabled(session)` condition, the cost-based flow would make a redundant `getMaterializedViewStatus` call for each MV candidate here, same as the current flow.
But if we want to discuss whether the status check is needed here separately, I can remove the `isMaterializedViewQueryRewriteCostBasedSelectionEnabled` check for now and handle both flows together later. What do you think?
"I can remove the isMaterializedViewQueryRewriteCostBasedSelectionEnabled check for now and handle both flows together later."
Thanks for the fix. Sounds good to me!
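The outcome of this thread, keeping the original consistency check with no cost-based bypass, can be modelled as a small decision function. This is a sketch built only from the snippet quoted in the review: `MvStatusSketch` and `ConsistencyGateSketch` are hypothetical names, and the enum collapses the real `MaterializedViewStatus` (with its `isFullyMaterialized()` and `isPartiallyMaterialized()` methods) into three illustrative states.

```java
// Hypothetical simplification of MaterializedViewStatus into three states.
enum MvStatusSketch
{
    FULLY_MATERIALIZED,
    PARTIALLY_MATERIALIZED,
    NOT_MATERIALIZED
}

final class ConsistencyGateSketch
{
    // Returns true when the rewritten (MV-based) query should be used,
    // false when the optimizer should keep the original query.
    static boolean useRewrittenQuery(boolean dataConsistencyEnabled, MvStatusSketch status)
    {
        // With consistency checks off, any successful rewrite is accepted.
        if (!dataConsistencyEnabled) {
            return true;
        }
        // With consistency checks on, only fully or partially materialized views qualify.
        return status == MvStatusSketch.FULLY_MATERIALIZED
                || status == MvStatusSketch.PARTIALLY_MATERIALIZED;
    }
}
```

Under this reading, each candidate in the cost-based flow passes through the same gate as the single-MV flow, at the price of one status lookup per candidate, which is the redundancy raised earlier in the thread.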
Resolved review threads:
- ...-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java (outdated)
- ...-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java (outdated)
- presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java (outdated)
- presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java (outdated)
Force-pushed from 81c0072 to 46c00d2.
Force-pushed from 46c00d2 to 60cf276.
Summary:
Currently the `MaterializedViewQueryOptimizer` selects the first compatible materialized view for query rewriting. When multiple MVs exist for the same base table, this leads to suboptimal plan selection since the first MV found may not produce the cheapest execution plan.

This diff introduces cost-based MV selection by deferring the choice to the plan optimizer. Instead of committing to a single MV during AST rewriting, we collect all compatible candidates and let the cost-based optimizer compare their plans. The flow works as follows:

**Step 1: AST Rewriting (`MaterializedViewQueryOptimizer`)**

When `materialized_view_query_rewrite_cost_based_selection_enabled` is true, the optimizer calls `rewriteWithAllCandidates()` instead of returning the first compatible MV. For each referenced MV on the base table, it attempts a rewrite and collects successful rewrites into `QueryWithMVRewriteCandidates`, a new `QueryBody` AST node that bundles the original `QuerySpecification` with all candidate `MVRewriteCandidate` entries (each containing the rewritten query and the MV's fully qualified name). MV data consistency checks are skipped at this stage since the optimizer will handle selection downstream.

**Step 2: Semantic Analysis (`StatementAnalyzer`)**

A new `visitQueryWithMVRewriteCandidates` handler analyzes the original query to produce the output scope, then analyzes each candidate's rewritten query and stores its scope in `Analysis.mvCandidateScopes`. The original query's scope is returned as the node's output scope.

**Step 3: Logical Planning (`RelationPlanner`)**

A new `visitQueryWithMVRewriteCandidates` handler plans the original query and each candidate query into separate `RelationPlan` trees. These are bundled into `MVRewriteCandidatesNode`, a new `PlanNode` in the SPI that holds the original plan, all candidate plans (each with its MV name metadata), and the output variables.

**Step 4: Cost-Based Optimization (`SelectLowestCostMVRewrite`)**

A new `IterativeOptimizer` rule matches `MVRewriteCandidatesNode`. It uses the `CostProvider` to compute costs for the original plan and all candidate plans, then selects the lowest-cost option via `CostComparator`. Candidates with unknown cost components are skipped. If the selected plan's output variables differ from the expected outputs, a `ProjectNode` is added to align them. The rule records which plan was selected (original or which MV) for debugging via `getStatsSource()`.

New types introduced:
- `QueryWithMVRewriteCandidates` (presto-parser): AST node bundling original query with MV rewrite candidates
- `MVRewriteCandidatesNode` (presto-spi): Plan node for cost-based MV selection
- `SelectLowestCostMVRewrite` (presto-main-base): Optimizer rule selecting lowest-cost plan

Gated behind session property `materialized_view_query_rewrite_cost_based_selection_enabled` (default: false).

Differential Revision: D92582456
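The selection logic described in Step 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the rule from this PR: `LowestCostSelection`, `PlanOption`, and `selectLowestCost` are hypothetical names, and the cost is collapsed to a single number, whereas the description speaks of multiple cost components compared via `CostComparator`. The description does not say what happens when the original plan's cost is unknown or when costs tie; this sketch assumes the original plan is kept in both cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

public class LowestCostSelection {
    /** Hypothetical stand-in for a plan alternative: a label and an optional scalar cost. */
    static final class PlanOption {
        final String label;        // "original" or the MV name
        final OptionalDouble cost; // empty means the cost is unknown

        PlanOption(String label, OptionalDouble cost) {
            this.label = label;
            this.cost = cost;
        }
    }

    /**
     * Returns the label of the cheapest alternative.
     * Candidates with unknown cost are skipped. Assumption: if the original
     * plan's cost is unknown, or a candidate only ties it, the original is kept.
     */
    static String selectLowestCost(PlanOption original, List<PlanOption> candidates) {
        if (!original.cost.isPresent()) {
            return original.label;
        }
        String bestLabel = original.label;
        double bestCost = original.cost.getAsDouble();
        for (PlanOption candidate : candidates) {
            if (!candidate.cost.isPresent()) {
                continue; // unknown cost: cannot be compared, so skip it
            }
            double cost = candidate.cost.getAsDouble();
            if (cost < bestCost) {
                bestCost = cost;
                bestLabel = candidate.label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        PlanOption original = new PlanOption("original", OptionalDouble.of(100.0));
        List<PlanOption> candidates = Arrays.asList(
                new PlanOption("mv_daily", OptionalDouble.of(40.0)),
                new PlanOption("mv_unknown", OptionalDouble.empty()),
                new PlanOption("mv_hourly", OptionalDouble.of(25.0)));
        System.out.println("selected: " + selectLowestCost(original, candidates));
    }
}
```

Returning a label rather than a plan keeps the sketch short; it also loosely corresponds to the description's note that the rule records which plan (original or which MV) was selected.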