
feat(optimizer): Add SimplifyAggregationsOverConstant iterative rule #27246

Merged
kaikalur merged 1 commit into prestodb:master from kaikalur:simplify-aggregations-over-scalar
Mar 7, 2026

Conversation

Contributor

@kaikalur kaikalur commented Mar 2, 2026

Summary

  • Add a new optimizer rule SimplifyAggregationsOverConstant that folds aggregation functions to constants when the argument is a constant, regardless of source cardinality
  • Only folds functions whose result is independent of row count: MIN, MAX, ARBITRARY, APPROX_DISTINCT
  • SUM and COUNT are NOT folded (their results depend on N rows)
  • Works with global and GROUP BY aggregations; bails out for FILTER/mask/DISTINCT/ORDER BY
  • Gated behind session property simplify_aggregations_over_constant (default OFF)
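The folding behaviour listed above can be sketched as a small decision function. This is a minimal illustration, not the code from this PR: `FoldSketch`, `Agg`, and `Fold` are hypothetical names, constants are modelled as nullable `Long` values, and empty inputs are not modelled.

```java
// Hypothetical, simplified model of the folding decision; not the Presto classes from this PR.
final class FoldSketch {
    enum Agg { MIN, MAX, ARBITRARY, APPROX_DISTINCT, SUM, COUNT }

    /** Outcome of a fold attempt. A null value models SQL NULL. */
    static final class Fold {
        final boolean folded;
        final Long value;

        private Fold(boolean folded, Long value) {
            this.folded = folded;
            this.value = value;
        }

        static Fold of(Long value) { return new Fold(true, value); }
        static Fold none() { return new Fold(false, null); }
    }

    /** Folds agg(constant) when the result does not depend on the row count. */
    static Fold tryFold(Agg agg, Long constant) {
        switch (agg) {
            case MIN:
            case MAX:
            case ARBITRARY:
                // The aggregate of a repeated constant is that constant (or NULL).
                return Fold.of(constant);
            case APPROX_DISTINCT:
                // One distinct value for a non-null constant, zero for NULL.
                return Fold.of(constant == null ? 0L : 1L);
            default:
                // SUM and COUNT depend on how many rows are aggregated.
                return Fold.none();
        }
    }
}
```

For example, `FoldSketch.tryFold(FoldSketch.Agg.MIN, 5L)` reports a fold to `5`, while `FoldSketch.tryFold(FoldSketch.Agg.SUM, 5L)` reports no fold.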

Test plan

  • Unit tests: TestSimplifyAggregationsOverConstant — 18 tests covering MIN/MAX/ARBITRARY/APPROX_DISTINCT folding, SUM/COUNT non-folding, partial step, disabled session, filtered aggregation, non-constant args, mixed foldable/unfoldable, GROUP BY, non-scalar source
  • Config tests: TestFeaturesConfig — default and non-default config values
  • Existing rule tests: TestPruneCountAggregationOverScalar — still passing (no conflicts)
  • E2E tests: testSimplifyAggregationsOverConstant in AbstractTestQueries — tests with real tables comparing enabled vs disabled sessions

Summary by Sourcery

Add a gated optimizer rule that simplifies certain aggregations over constant expressions and validate it with planner and query tests.

New Features:

  • Introduce the SimplifyAggregationsOverConstant iterative optimizer rule to fold eligible aggregations over constant arguments to constants.
  • Add a configurable feature flag and session property to enable or disable aggregation simplification over constants.

Enhancements:

  • Wire the new aggregation simplification rule into the standard optimizer pipeline.

Documentation:

  • Add internal development notes documenting expectations around optimizer rule gating and git workflow.

Tests:

  • Add planner rule tests for SimplifyAggregationsOverConstant covering foldable/non-foldable functions, grouping, partial steps, filters, and disabled sessions.
  • Add end-to-end query tests verifying consistent results with the aggregation simplification feature enabled and disabled.
  • Extend FeaturesConfig tests to cover defaults and explicit mappings for the new optimizer.simplify-aggregations-over-constant property.

Chores:

  • Add a Claude-specific development notes file under .claude with project guidelines.

Contributor

sourcery-ai bot commented Mar 2, 2026

Reviewer's Guide

Introduces a new iterative optimizer rule, SimplifyAggregationsOverConstant, gated by a session/system property, to fold certain aggregation functions over constant arguments into constants, with accompanying planner, config, session-property wiring, and comprehensive unit/integration tests.

Sequence diagram for applying SimplifyAggregationsOverConstant during planning

sequenceDiagram
participant Planner
participant PlanOptimizers
participant SimplifyAggregationsOverConstant
participant SystemSessionProperties
participant Session

Planner->>PlanOptimizers: optimize(plan, session)
loop each_rule
  PlanOptimizers->>SimplifyAggregationsOverConstant: isEnabled(session)
  SimplifyAggregationsOverConstant->>SystemSessionProperties: isSimplifyAggregationsOverConstant(session)
  SystemSessionProperties->>Session: getSystemProperty(SIMPLIFY_AGGREGATIONS_OVER_CONSTANT)
  Session-->>SystemSessionProperties: property_value
  SystemSessionProperties-->>SimplifyAggregationsOverConstant: boolean
  alt enabled
    PlanOptimizers->>SimplifyAggregationsOverConstant: apply(aggregationNode, captures, context)
    SimplifyAggregationsOverConstant-->>PlanOptimizers: Result(newPlanNode or empty)
  else disabled
    SimplifyAggregationsOverConstant-->>PlanOptimizers: Result.empty
  end
end
PlanOptimizers-->>Planner: optimized_plan
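The gating in the sequence diagram can be approximated with a toy optimizer loop. This is a sketch under stated assumptions: `GatingSketch`, its `Session`, and its `Rule` interface are hypothetical stand-ins (plans are plain strings), and the loop simply skips a disabled rule, whereas the diagram shows the real rule returning an empty result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not Presto's Session or Rule types.
final class GatingSketch {
    static final String SIMPLIFY_AGGREGATIONS_OVER_CONSTANT = "simplify_aggregations_over_constant";

    /** Session properties; an unset property is treated as OFF. */
    static final class Session {
        private final Map<String, Boolean> properties = new HashMap<>();

        Session set(String name, boolean value) {
            properties.put(name, value);
            return this;
        }

        boolean isOn(String name) {
            return properties.getOrDefault(name, false);
        }
    }

    interface Rule {
        boolean isEnabled(Session session);
        String apply(String plan);
    }

    /** Toy rule: rewrites the textual plan "min(5)" to "5". */
    static final class SimplifyRule implements Rule {
        @Override
        public boolean isEnabled(Session session) {
            return session.isOn(SIMPLIFY_AGGREGATIONS_OVER_CONSTANT);
        }

        @Override
        public String apply(String plan) {
            return plan.replace("min(5)", "5");
        }
    }

    /** Applies each enabled rule once; disabled rules leave the plan untouched. */
    static String optimize(String plan, Session session, List<Rule> rules) {
        for (Rule rule : rules) {
            if (rule.isEnabled(session)) {
                plan = rule.apply(plan);
            }
        }
        return plan;
    }
}
```

With a default session the plan comes back unchanged; after `set(SIMPLIFY_AGGREGATIONS_OVER_CONSTANT, true)` the toy rule rewrites it.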

Class diagram for SimplifyAggregationsOverConstant optimizer rule and wiring

classDiagram
class SimplifyAggregationsOverConstant {
  -StandardFunctionResolution functionResolution
  -FunctionAndTypeResolver functionAndTypeResolver
  +SimplifyAggregationsOverConstant(FunctionAndTypeManager functionAndTypeManager)
  +Pattern getPattern()
  +boolean isEnabled(Session session)
  +Result apply(AggregationNode node, Captures captures, Context context)
}

class Rule {
  <<interface>>
  +Pattern getPattern()
  +boolean isEnabled(Session session)
}

SimplifyAggregationsOverConstant ..|> Rule

class FeaturesConfig {
  -boolean simplifyAggregationsOverConstant
  +boolean isSimplifyAggregationsOverConstant()
  +FeaturesConfig setSimplifyAggregationsOverConstant(boolean simplifyAggregationsOverConstant)
}

class SystemSessionProperties {
  <<final>>
  +String SIMPLIFY_AGGREGATIONS_OVER_CONSTANT
  +boolean isSimplifyAggregationsOverConstant(Session session)
}

class PlanOptimizers {
  +PlanOptimizers(FunctionAndTypeManager functionAndTypeManager)
}

class AggregationNode
class ProjectNode
class ValuesNode

PlanOptimizers --> SimplifyAggregationsOverConstant : creates
SimplifyAggregationsOverConstant --> AggregationNode : transforms
SimplifyAggregationsOverConstant --> ProjectNode : produces
SimplifyAggregationsOverConstant --> ValuesNode : may_replace_with
SystemSessionProperties --> FeaturesConfig : uses_default_from

Flow diagram for SimplifyAggregationsOverConstant.apply logic

flowchart TD
  A["Start with AggregationNode"] --> B{"Step is SINGLE and has aggregations?"}
  B -- "No" --> Z["Return empty result"]
  B -- "Yes" --> C["Resolve source PlanNode"]
  C --> D["Build ConstantResolver from ProjectNode or ValuesNode"]
  D --> E["Iterate aggregations and tryFold each"]
  E --> F{"Any aggregation folded?"}
  F -- "No" --> Z
  F -- "Yes" --> G{"All aggregations folded and no grouping keys?"}
  G -- "Yes" --> H["Build single row for output variables using folded constants"]
  H --> I["Return ValuesNode"]
  G -- "No" --> J["Create new AggregationNode with remaining aggregations"]
  J --> K["Build Assignments: pass through newAggregation outputs"]
  K --> L["Add folded constants as projections"]
  L --> M["Return ProjectNode over newAggregation"]
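The branches of the flowchart can be traced with a small model that returns a description of the resulting plan shape. It is only a sketch: `ApplyFlowSketch` and `Call` are hypothetical, plan shapes are rendered as strings, only MIN/MAX/ARBITRARY are treated as foldable for brevity, and the bail-outs for FILTER, mask, DISTINCT, and ORDER BY are omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the decision flow; not the PR's implementation.
final class ApplyFlowSketch {
    static final Set<String> FOLDABLE = new HashSet<>(Arrays.asList("min", "max", "arbitrary"));

    static final class Call {
        final String function;
        final String constantArg; // SQL literal text, or null when the argument is not a known constant

        Call(String function, String constantArg) {
            this.function = function;
            this.constantArg = constantArg;
        }
    }

    static String apply(String step, List<String> groupingKeys, Map<String, Call> aggregations) {
        // Only SINGLE-step nodes with at least one aggregation are considered.
        if (!step.equals("SINGLE") || aggregations.isEmpty()) {
            return "empty";
        }
        Map<String, String> folded = new LinkedHashMap<>();
        Map<String, Call> remaining = new LinkedHashMap<>();
        for (Map.Entry<String, Call> entry : aggregations.entrySet()) {
            Call call = entry.getValue();
            if (call.constantArg != null && FOLDABLE.contains(call.function)) {
                folded.put(entry.getKey(), call.constantArg);
            }
            else {
                remaining.put(entry.getKey(), call);
            }
        }
        if (folded.isEmpty()) {
            return "empty";
        }
        // Everything folded and no grouping keys: a single row of constants.
        if (remaining.isEmpty() && groupingKeys.isEmpty()) {
            return "Values" + folded;
        }
        // Otherwise keep the remaining aggregations and project the constants on top.
        return "Project" + folded + " over Aggregation" + remaining.keySet();
    }
}
```

A global `min(5)` alone yields `Values{m=5}`; adding `sum(5)` yields `Project{m=5} over Aggregation[s]`, since the SUM has to stay in the aggregation.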

File-Level Changes

Change: Add SimplifyAggregationsOverConstant iterative rule to fold eligible aggregations over constant arguments into constants, including partial folding when only some aggregations qualify.
Details:
  • Implement Rule that matches SINGLE-step aggregations and checks a new session-gated isEnabled condition.
  • Introduce ConstantResolver helper logic to resolve constant arguments from ProjectNode assignments, single-row ValuesNode outputs, or literal constants.
  • Implement tryFold logic to fold MIN/MAX/ARBITRARY directly to the constant (or NULL) and APPROX_DISTINCT to 1 or 0 depending on argument nullability, while bailing out on filters, masks, DISTINCT, ORDER BY, or unsupported signatures.
  • Rewrite aggregation nodes by removing folded aggregations, projecting their constant results above the new aggregation, or replacing the whole node with a ValuesNode when everything folds and there are no grouping keys.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyAggregationsOverConstant.java

Change: Wire the new optimizer rule into the planner pipeline and guard it via a configurable feature flag and session property.
Details:
  • Register SimplifyAggregationsOverConstant in PlanOptimizers alongside other scalar/aggregation simplification rules.
  • Add simplifyAggregationsOverConstant boolean field, getter, and @Config mapping optimizer.simplify-aggregations-over-constant to FeaturesConfig with default false.
  • Expose a new system session property simplify_aggregations_over_constant backed by the FeaturesConfig flag via SystemSessionProperties, including a constant name and accessor method.
  • Ensure existing configuration mapping tests cover default and explicit property mappings for the new feature flag.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
  • presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java

Change: Add unit tests for the SimplifyAggregationsOverConstant rule to validate folding behavior, non-folding cases, grouping semantics, and interaction with the session flag.
Details:
  • Create TestSimplifyAggregationsOverConstant covering MIN/MAX/ARBITRARY/APPROX_DISTINCT folding over constants for scalar and non-scalar sources.
  • Verify that SUM, COUNT, COUNT(*), non-constant arguments, PARTIAL aggregation step, filters, and non-constant sources do not trigger the rule.
  • Test behavior when the session flag is disabled to ensure the rule does not fire.
  • Validate partial folding when a mixed set of aggregations contains both foldable and non-foldable functions, including preservation of remaining aggregations in the plan shape.
Files:
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSimplifyAggregationsOverConstant.java

Change: Add end-to-end style query tests to ensure correctness with real tables and session toggling for the new optimization.
Details:
  • Introduce testSimplifyAggregationsOverConstant in AbstractTestQueries that runs the same queries with the session property enabled and disabled using assertQueryWithSameQueryRunner.
  • Cover MIN/MAX/ARBITRARY/APPROX_DISTINCT over constant projections, including NULL constants and DOUBLE-type constants, both with and without GROUP BY.
  • Add cases with mixed foldable/unfoldable aggregations, filtered aggregations that must not fold, and fully non-constant aggregations to ensure behavior is unchanged.
  • Use orders table projections and scalar subqueries to validate the rule across different plan shapes and source cardinalities.
Files:
  • presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java

Change: Document optimizer rule gating expectations and workflow for future development.
Details:
  • Add a .claude/CLAUDE.md documenting that new optimizer rules must be gated by a session property defaulting to OFF and outlining the typical configuration pattern.
  • Include notes on the recommended git workflow (squash commits via rebase) for Presto development.
Files:
  • .claude/CLAUDE.md


@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • Instead of manually checking the SUM function by comparing qualified names in isSumFunction, consider using StandardFunctionResolution (similar to the other aggregation checks) to keep the implementation consistent and less brittle to function renames or aliases.
  • The ConstantResolver for ProjectNode and ValuesNode only folds when the argument is a direct constant or variable mapped to a constant; if you want this rule to catch more cases (e.g., simple deterministic expressions like casts or arithmetic over constants), you could extend the resolver to evaluate such expressions as well.

@kaikalur kaikalur marked this pull request as draft March 2, 2026 19:36
@kaikalur kaikalur force-pushed the simplify-aggregations-over-scalar branch from 6ba41f4 to b76e144 Compare March 2, 2026 20:37
@kaikalur kaikalur changed the title feat(optimizer): Add SimplifyAggregationsOverConstantScalar iterative rule feat(optimizer): Add SimplifyAggregationsOverConstant iterative rule Mar 2, 2026
@kaikalur kaikalur force-pushed the simplify-aggregations-over-scalar branch from b76e144 to 12d73c2 Compare March 3, 2026 00:06
@kaikalur kaikalur marked this pull request as ready for review March 3, 2026 00:08

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In SimplifyAggregationsOverConstant.apply, you resolve the source via context.getLookup().resolve(node.getSource()) for constant analysis, but when constructing the new AggregationNode you still use node.getSource() rather than the resolved source, which can drop intermediate projections or other transformations; consider wiring the resolved source into the rebuilt node to preserve the original plan structure.
  • The ConstantResolver for ProjectNode only treats direct variable→constant assignments as constants; if you want to reliably fold patterns like MIN(CAST(5 AS BIGINT)) or simple deterministic expressions that are already known constants, you may want to extend the resolver to recognize constant RowExpressions beyond just ConstantExpression and simple variable lookups.
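To make the limitation in the second point concrete, here is a minimal sketch of a resolver that accepts only direct constants and variables assigned a constant by the source projection. The expression classes and `ConstantResolverSketch` itself are hypothetical stand-ins for `RowExpression` and the PR's `ConstantResolver`, whose actual code is not shown in this thread.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical expression model; not Presto's RowExpression hierarchy.
final class ConstantResolverSketch {
    interface Expr {}

    static final class Constant implements Expr {
        final String literal;
        Constant(String literal) { this.literal = literal; }
    }

    static final class Variable implements Expr {
        final String name;
        Variable(String name) { this.name = name; }
    }

    static final class Cast implements Expr {
        final Expr operand;
        final String type;
        Cast(Expr operand, String type) {
            this.operand = operand;
            this.type = type;
        }
    }

    /**
     * Resolves an aggregation argument to a constant when it is either a
     * literal or a variable that the projection assigns directly to a literal.
     * Anything else, including a cast of a literal, is left unresolved.
     */
    static Optional<Constant> resolve(Expr argument, Map<String, Expr> projectAssignments) {
        if (argument instanceof Constant) {
            return Optional.of((Constant) argument);
        }
        if (argument instanceof Variable) {
            Expr assigned = projectAssignments.get(((Variable) argument).name);
            if (assigned instanceof Constant) {
                return Optional.of((Constant) assigned);
            }
        }
        return Optional.empty();
    }
}
```

Under this model a projection `x := 5` resolves, but `y := CAST(5 AS BIGINT)` does not, which is the gap the comment describes.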

## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyAggregationsOverConstant.java" line_range="148-154" />
<code_context>
+        }
+
+        // Otherwise, remove folded aggregations and project their constants on top
+        AggregationNode newAggregation = new AggregationNode(
+                node.getSourceLocation(),
+                context.getIdAllocator().getNextId(),
+                node.getSource(),
+                remainingAggregations,
+                node.getGroupingSets(),
+                ImmutableList.of(),
+                node.getStep(),
+                node.getHashVariable(),
</code_context>
<issue_to_address>
**issue (bug_risk):** Existing pre-grouped variables are dropped when building the new AggregationNode

Here `preGroupedVariables` is reset to `ImmutableList.of()` instead of reusing `node.getPreGroupedVariables()`. If the original node depended on pre-grouping (e.g., partially pre-aggregated input), this can change aggregation semantics and produce incorrect results or worse plans. Please preserve `node.getPreGroupedVariables()` when constructing the new `AggregationNode`.
</issue_to_address>
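One way to avoid this class of bug is to rebuild the node through a copy helper that changes only the aggregation list. The sketch below uses a hypothetical, trimmed-down `AggNode` with three fields; it is not Presto's `AggregationNode` and `withAggregations` is an illustrative name, not an existing API.

```java
import java.util.List;

// Hypothetical, trimmed-down stand-in for an aggregation plan node.
final class RebuildSketch {
    static final class AggNode {
        final List<String> aggregations;
        final List<String> groupingKeys;
        final List<String> preGroupedVariables;

        AggNode(List<String> aggregations, List<String> groupingKeys, List<String> preGroupedVariables) {
            this.aggregations = aggregations;
            this.groupingKeys = groupingKeys;
            this.preGroupedVariables = preGroupedVariables;
        }

        /** Copies every field except the aggregation list, so nothing is reset by accident. */
        AggNode withAggregations(List<String> remaining) {
            return new AggNode(remaining, groupingKeys, preGroupedVariables);
        }
    }
}
```

Calling `node.withAggregations(remaining)` keeps `preGroupedVariables` identical to the original node's value instead of replacing it with an empty list.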


@kaikalur kaikalur force-pushed the simplify-aggregations-over-scalar branch 4 times, most recently from 210d77f to 9075fad Compare March 3, 2026 14:34
steveburnett previously approved these changes Mar 3, 2026
Contributor Author

kaikalur commented Mar 3, 2026

Friendly ping @jaystarshot @feilong-liu @elharo — this PR has been approved by @steveburnett and all CI checks are passing. Could you take a look when you get a chance? Thanks!

@@ -0,0 +1,14 @@
# Presto Development Notes
Contributor


This needs to be removed

Add a new optimizer rule that folds aggregation functions to constants
when the argument is a constant, regardless of source cardinality.
Only functions whose result is independent of row count are folded:

- MIN(constant) -> constant
- MAX(constant) -> constant
- ARBITRARY(constant) -> constant
- APPROX_DISTINCT(constant) -> 1 (non-null) or 0 (null)

SUM and COUNT are NOT folded since their results depend on row count
(e.g., SUM(5) over N rows = 5*N).

The rule works with any grouping (global or GROUP BY): foldable
aggregations are removed and projected as constants on top. Bails out
for aggregations with FILTER, mask, DISTINCT, or ORDER BY clauses.

Gated behind session property simplify_aggregations_over_constant
(default OFF).
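The row-count argument in the commit message can be checked by aggregating N copies of a constant directly. This is an illustrative sketch only: `RowCountSketch` is a hypothetical helper, and an exact distinct count stands in for APPROX_DISTINCT.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper that evaluates aggregates over n copies of one constant.
final class RowCountSketch {
    static List<Long> rows(long constant, int n) {
        return Collections.nCopies(n, constant);
    }

    static long min(List<Long> rows) {
        return Collections.min(rows);
    }

    static long max(List<Long> rows) {
        return Collections.max(rows);
    }

    static long sum(List<Long> rows) {
        long total = 0;
        for (long value : rows) {
            total += value;
        }
        return total;
    }

    static long count(List<Long> rows) {
        return rows.size();
    }

    /** Exact distinct count, used here in place of APPROX_DISTINCT. */
    static long distinct(List<Long> rows) {
        return new HashSet<>(rows).size();
    }
}
```

For the constant 5, `min`, `max`, and `distinct` give the same answer for one row and for three rows, while `sum` goes from 5 to 15 and `count` from 1 to 3.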
@kaikalur kaikalur force-pushed the simplify-aggregations-over-scalar branch from 9075fad to fee40e3 Compare March 7, 2026 06:24
@kaikalur kaikalur merged commit c02223f into prestodb:master Mar 7, 2026
80 checks passed
@ethanyzhang ethanyzhang added the from:Meta PR from Meta label Mar 25, 2026
@hantangwangd
Member

Hi @kaikalur, thanks for this PR! As part of the release process — do you think this change warrants a release note? If so, would you like to add one?
