feat(optimizer): Add SimplifyAggregationsOverConstant iterative rule#27246
Conversation
Reviewer's GuideIntroduces a new iterative optimizer rule, SimplifyAggregationsOverConstant, gated by a session/system property, to fold certain aggregation functions over constant arguments into constants, with accompanying planner, config, session-property wiring, and comprehensive unit/integration tests. Sequence diagram for applying SimplifyAggregationsOverConstant during planningsequenceDiagram
participant Planner
participant PlanOptimizers
participant SimplifyAggregationsOverConstant
participant SystemSessionProperties
participant Session
Planner->>PlanOptimizers: optimize(plan, session)
loop each_rule
PlanOptimizers->>SimplifyAggregationsOverConstant: isEnabled(session)
SimplifyAggregationsOverConstant->>SystemSessionProperties: isSimplifyAggregationsOverConstant(session)
SystemSessionProperties->>Session: getSystemProperty(SIMPLIFY_AGGREGATIONS_OVER_CONSTANT)
Session-->>SystemSessionProperties: property_value
SystemSessionProperties-->>SimplifyAggregationsOverConstant: boolean
alt enabled
PlanOptimizers->>SimplifyAggregationsOverConstant: apply(aggregationNode, captures, context)
SimplifyAggregationsOverConstant-->>PlanOptimizers: Result(newPlanNode or empty)
else disabled
SimplifyAggregationsOverConstant-->>PlanOptimizers: Result.empty
end
end
PlanOptimizers-->>Planner: optimized_plan
Class diagram for SimplifyAggregationsOverConstant optimizer rule and wiringclassDiagram
class SimplifyAggregationsOverConstant {
-StandardFunctionResolution functionResolution
-FunctionAndTypeResolver functionAndTypeResolver
+SimplifyAggregationsOverConstant(FunctionAndTypeManager functionAndTypeManager)
+Pattern getPattern()
+boolean isEnabled(Session session)
+Result apply(AggregationNode node, Captures captures, Context context)
}
class Rule {
<<interface>>
+Pattern getPattern()
+boolean isEnabled(Session session)
}
SimplifyAggregationsOverConstant ..|> Rule
class FeaturesConfig {
-boolean simplifyAggregationsOverConstant
+boolean isSimplifyAggregationsOverConstant()
+FeaturesConfig setSimplifyAggregationsOverConstant(boolean simplifyAggregationsOverConstant)
}
class SystemSessionProperties {
<<final>>
+String SIMPLIFY_AGGREGATIONS_OVER_CONSTANT
+boolean isSimplifyAggregationsOverConstant(Session session)
}
class PlanOptimizers {
+PlanOptimizers(FunctionAndTypeManager functionAndTypeManager)
}
class AggregationNode
class ProjectNode
class ValuesNode
PlanOptimizers --> SimplifyAggregationsOverConstant : creates
SimplifyAggregationsOverConstant --> AggregationNode : transforms
SimplifyAggregationsOverConstant --> ProjectNode : produces
SimplifyAggregationsOverConstant --> ValuesNode : may_replace_with
SystemSessionProperties --> FeaturesConfig : uses_default_from
Flow diagram for SimplifyAggregationsOverConstant.apply logicflowchart TD
A["Start with AggregationNode"] --> B{"Step is SINGLE and has aggregations?"}
B -- "No" --> Z["Return empty result"]
B -- "Yes" --> C["Resolve source PlanNode"]
C --> D["Build ConstantResolver from ProjectNode or ValuesNode"]
D --> E["Iterate aggregations and tryFold each"]
E --> F{"Any aggregation folded?"}
F -- "No" --> Z
F -- "Yes" --> G{"All aggregations folded and no grouping keys?"}
G -- "Yes" --> H["Build single row for output variables using folded constants"]
H --> I["Return ValuesNode"]
G -- "No" --> J["Create new AggregationNode with remaining aggregations"]
J --> K["Build Assignments: pass through newAggregation outputs"]
K --> L["Add folded constants as projections"]
L --> M["Return ProjectNode over newAggregation"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Instead of manually checking the SUM function by comparing qualified names in
isSumFunction, consider usingStandardFunctionResolution(similar to the other aggregation checks) to keep the implementation consistent and less brittle to function renames or aliases. - The
ConstantResolverforProjectNodeandValuesNodeonly folds when the argument is a direct constant or variable mapped to a constant; if you want this rule to catch more cases (e.g., simple deterministic expressions like casts or arithmetic over constants), you could extend the resolver to evaluate such expressions as well.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Instead of manually checking the SUM function by comparing qualified names in `isSumFunction`, consider using `StandardFunctionResolution` (similar to the other aggregation checks) to keep the implementation consistent and less brittle to function renames or aliases.
- The `ConstantResolver` for `ProjectNode` and `ValuesNode` only folds when the argument is a direct constant or variable mapped to a constant; if you want this rule to catch more cases (e.g., simple deterministic expressions like casts or arithmetic over constants), you could extend the resolver to evaluate such expressions as well.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
6ba41f4 to
b76e144
Compare
b76e144 to
12d73c2
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
SimplifyAggregationsOverConstant.apply, you resolve the source viacontext.getLookup().resolve(node.getSource())for constant analysis, but when constructing the newAggregationNodeyou still usenode.getSource()rather than the resolved source, which can drop intermediate projections or other transformations; consider wiring the resolved source into the rebuilt node to preserve the original plan structure. - The
ConstantResolverforProjectNodeonly treats direct variable→constant assignments as constants; if you want to reliably fold patterns likeMIN(CAST(5 AS BIGINT))or simple deterministic expressions that are already known constants, you may want to extend the resolver to recognize constantRowExpressions beyond justConstantExpressionand simple variable lookups.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `SimplifyAggregationsOverConstant.apply`, you resolve the source via `context.getLookup().resolve(node.getSource())` for constant analysis, but when constructing the new `AggregationNode` you still use `node.getSource()` rather than the resolved source, which can drop intermediate projections or other transformations; consider wiring the resolved source into the rebuilt node to preserve the original plan structure.
- The `ConstantResolver` for `ProjectNode` only treats direct variable→constant assignments as constants; if you want to reliably fold patterns like `MIN(CAST(5 AS BIGINT))` or simple deterministic expressions that are already known constants, you may want to extend the resolver to recognize constant `RowExpression`s beyond just `ConstantExpression` and simple variable lookups.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyAggregationsOverConstant.java" line_range="148-154" />
<code_context>
+ }
+
+ // Otherwise, remove folded aggregations and project their constants on top
+ AggregationNode newAggregation = new AggregationNode(
+ node.getSourceLocation(),
+ context.getIdAllocator().getNextId(),
+ node.getSource(),
+ remainingAggregations,
+ node.getGroupingSets(),
+ ImmutableList.of(),
+ node.getStep(),
+ node.getHashVariable(),
</code_context>
<issue_to_address>
**issue (bug_risk):** Existing pre-grouped variables are dropped when building the new AggregationNode
Here `preGroupedVariables` is reset to `ImmutableList.of()` instead of reusing `node.getPreGroupedVariables()`. If the original node depended on pre-grouping (e.g., partially pre-aggregated input), this can change aggregation semantics and produce incorrect results or worse plans. Please preserve `node.getPreGroupedVariables()` when constructing the new `AggregationNode`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...in/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyAggregationsOverConstant.java
Outdated
Show resolved
Hide resolved
210d77f to
9075fad
Compare
|
Friendly ping @jaystarshot @feilong-liu @elharo — this PR has been approved by @steveburnett and all CI checks are passing. Could you take a look when you get a chance? Thanks! |
.claude/CLAUDE.md
Outdated
| @@ -0,0 +1,14 @@ | |||
| # Presto Development Notes | |||
There was a problem hiding this comment.
This needs to be removed
Add a new optimizer rule that folds aggregation functions to constants when the argument is a constant, regardless of source cardinality. Only functions whose result is independent of row count are folded: - MIN(constant) -> constant - MAX(constant) -> constant - ARBITRARY(constant) -> constant - APPROX_DISTINCT(constant) -> 1 (non-null) or 0 (null) SUM and COUNT are NOT folded since their results depend on row count (e.g., SUM(5) over N rows = 5*N). The rule works with any grouping (global or GROUP BY): foldable aggregations are removed and projected as constants on top. Bails out for aggregations with FILTER, mask, DISTINCT, or ORDER BY clauses. Gated behind session property simplify_aggregations_over_constant (default OFF).
9075fad to
fee40e3
Compare
|
Hi @kaikalur, thanks for this PR! As part of the release process — do you think this change warrants a release note? If so, would you like to add one? |
Summary
SimplifyAggregationsOverConstantthat folds aggregation functions to constants when the argument is a constant, regardless of source cardinalitysimplify_aggregations_over_constant(default OFF)Test plan
TestSimplifyAggregationsOverConstant— 18 tests covering MIN/MAX/ARBITRARY/APPROX_DISTINCT folding, SUM/COUNT non-folding, partial step, disabled session, filtered aggregation, non-constant args, mixed foldable/unfoldable, GROUP BY, non-scalar sourceTestFeaturesConfig— default and non-default config valuesTestPruneCountAggregationOverScalar— still passing (no conflicts)testSimplifyAggregationsOverConstantinAbstractTestQueries— tests with real tables comparing enabled vs disabled sessionsSummary by Sourcery
Add a gated optimizer rule that simplifies certain aggregations over constant expressions and validate it with planner and query tests.
New Features:
Enhancements:
Documentation:
Tests:
Chores: