Skip to content

feat(optimizer): Pre-aggregate before GroupId to reduce row multiplication#27290

Merged
kaikalur merged 1 commit intoprestodb:masterfrom
kaikalur:pre-aggregate-before-grouping-sets
Mar 17, 2026
Merged

feat(optimizer): Pre-aggregate before GroupId to reduce row multiplication#27290
kaikalur merged 1 commit intoprestodb:masterfrom
kaikalur:pre-aggregate-before-grouping-sets

Conversation

@kaikalur
Copy link
Copy Markdown
Contributor

@kaikalur kaikalur commented Mar 8, 2026

Summary

  • Adds a new iterative optimizer rule PreAggregateBeforeGroupId that inserts a PARTIAL aggregation below GroupIdNode to reduce the number of rows that GroupId multiplies across grouping sets
  • Fixes a bug in HashAggregationOperator where SkipAggregationBuilder could be incorrectly used for INTERMEDIATE aggregations
  • Gated behind session property pre_aggregate_before_grouping_sets (disabled by default)

Test plan

  • Unit tests for the optimizer rule (6 tests covering positive and negative cases)
  • Integration tests with real GROUPING SETS queries on the orders table
  • TestFeaturesConfig updated and passing
  • Full CI validation
== RELEASE NOTES ==

General Changes
* Add optimizer rule to pre-aggregate data before GroupId node in grouping sets queries, reducing row multiplication. Enabled via session property ``pre_aggregate_before_grouping_sets``. (:pr:`27290`)
* Fix a bug where adaptive partial aggregation could incorrectly bypass INTERMEDIATE aggregation steps.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Mar 8, 2026

Reviewer's Guide

Implements a new PreAggregateBeforeGroupId iterative optimizer rule that inserts a partial aggregation below GroupId for grouping sets queries, adds determinism-aware nested IF simplification, tightens adaptive partial aggregation conditions to only apply on RAW→PARTIAL steps, and wires everything behind a new optimizer.pre-aggregate-before-grouping-sets feature flag and corresponding session property with tests.

Class diagram for PreAggregateBeforeGroupId optimizer rule and related planner components

classDiagram
    class PreAggregateBeforeGroupId {
        - FunctionAndTypeManager functionAndTypeManager
        + PreAggregateBeforeGroupId(FunctionAndTypeManager functionAndTypeManager)
        + Pattern getPattern()
        + boolean isEnabled(Session session)
        + Result apply(AggregationNode node, Captures captures, Context context)
        - boolean isLambda(RowExpression rowExpression)
    }

    class Rule {
        <<interface>>
    }

    class AggregationNode {
        + Map aggregations
        + Step getStep()
        + Map getAggregations()
        + List getGroupingSets()
    }

    class GroupIdNode {
        + Map getGroupingColumns()
        + List getGroupingSets()
        + List getAggregationArguments()
        + VariableReferenceExpression getGroupIdVariable()
        + PlanNode getSource()
    }

    class FunctionAndTypeManager {
        + FunctionMetadata getFunctionMetadata(FunctionHandle functionHandle)
        + AggregationFunctionImplementation getAggregateFunctionImplementation(FunctionHandle functionHandle)
    }

    class Aggregation {
        + FunctionHandle getFunctionHandle()
        + CallExpression getCall()
        + List getArguments()
        + Optional getFilter()
        + Optional getOrderBy()
        + boolean isDistinct()
        + Optional getMask()
    }

    class CallExpression {
        + RowExpression getSourceLocation()
        + List getArguments()
    }

    class VariableReferenceExpression {
    }

    class PlanNode {
    }

    class Session {
    }

    class FeaturesConfig {
        - boolean preAggregateBeforeGroupingSets
        + boolean isPreAggregateBeforeGroupingSets()
        + FeaturesConfig setPreAggregateBeforeGroupingSets(boolean preAggregateBeforeGroupingSets)
    }

    class SystemSessionProperties {
        + String PRE_AGGREGATE_BEFORE_GROUPING_SETS
        + boolean isPreAggregateBeforeGroupingSets(Session session)
    }

    class PlanOptimizers {
        + PlanOptimizers(...)
    }

    class HashAggregationOperator {
        - AggregationBuilder aggregationBuilder
        + void initializeAggregationBuilderIfNeeded()
    }

    class PartialAggregationController {
        + boolean isPartialAggregationDisabled()
    }

    class SkipAggregationBuilder {
    }

    class LocalExecutionPlanner {
        + Optional createPartialAggregationController(Optional maxPartialAggregationMemorySize, AggregationNode.Step step, Session session)
    }

    Rule <|.. PreAggregateBeforeGroupId
    PlanNode <|-- AggregationNode
    PlanNode <|-- GroupIdNode

    PreAggregateBeforeGroupId --> AggregationNode
    PreAggregateBeforeGroupId --> GroupIdNode
    PreAggregateBeforeGroupId --> FunctionAndTypeManager
    PreAggregateBeforeGroupId --> Session

    AggregationNode --> Aggregation
    Aggregation --> CallExpression
    Aggregation --> FunctionHandle
    CallExpression --> RowExpression
    AggregationNode --> VariableReferenceExpression
    GroupIdNode --> VariableReferenceExpression
    GroupIdNode --> PlanNode

    FunctionAndTypeManager --> AggregationFunctionImplementation

    SystemSessionProperties --> Session
    SystemSessionProperties --> FeaturesConfig

    PlanOptimizers --> PreAggregateBeforeGroupId

    HashAggregationOperator --> AggregationNode.Step
    HashAggregationOperator --> PartialAggregationController
    HashAggregationOperator --> SkipAggregationBuilder

    LocalExecutionPlanner --> AggregationNode.Step
    LocalExecutionPlanner --> PartialAggregationController
    LocalExecutionPlanner --> Session
Loading

Class diagram for SimplifyRowExpressions nested IF simplification

classDiagram
    class SimplifyRowExpressions {
        + SimplifyRowExpressions(Metadata metadata, ExpressionOptimizerManager expressionOptimizerManager)
        + static RowExpression rewrite(RowExpression expression, Metadata metadata, ExpressionOptimizerManager expressionOptimizerManager, Session session)
    }

    class Rewriter {
        - NestedIfSimplifier nestedIfSimplifier
        - ExpressionOptimizerManager expressionOptimizerManager
        - LogicalExpressionRewriter logicalExpressionRewriter
        + Rewriter(Metadata metadata, ExpressionOptimizerManager expressionOptimizerManager)
        + RowExpression rewrite(RowExpression expression, Session session)
    }

    class NestedIfSimplifier {
        - RowExpressionDeterminismEvaluator determinismEvaluator
        + NestedIfSimplifier(RowExpressionDeterminismEvaluator determinismEvaluator)
        + RowExpression rewriteSpecialForm(SpecialFormExpression node, Void context, RowExpressionTreeRewriter treeRewriter)
    }

    class LogicalExpressionRewriter {
        + RowExpression rewrite(RowExpression expression, Boolean context, RowExpressionTreeRewriter treeRewriter)
    }

    class RowExpressionTreeRewriter {
        + static RowExpression rewriteWith(RowExpressionRewriter rewriter, RowExpression expression)
        + static RowExpression rewriteWith(RowExpressionRewriter rewriter, RowExpression expression, boolean context)
        + RowExpression defaultRewrite(SpecialFormExpression node, Void context)
    }

    class RowExpressionRewriter {
        <<interface>>
    }

    class RowExpressionDeterminismEvaluator {
        + boolean isDeterministic(RowExpression expression)
    }

    class SpecialFormExpression {
        + Form getForm()
        + List getArguments()
        + RowExpression getType()
    }

    class RowExpression {
    }

    class Session {
    }

    class Metadata {
        + FunctionAndTypeManager getFunctionAndTypeManager()
    }

    class ExpressionOptimizerManager {
        + ExpressionOptimizer getExpressionOptimizer(ConnectorSession connectorSession)
    }

    class ExpressionOptimizer {
        + RowExpression optimize(RowExpression expression, OptimizationLevel level, ConnectorSession connectorSession)
    }

    class ConnectorSession {
    }

    class FunctionAndTypeManager {
    }

    SimplifyRowExpressions --> Rewriter
    Rewriter --> NestedIfSimplifier
    Rewriter --> LogicalExpressionRewriter
    Rewriter --> ExpressionOptimizerManager
    Rewriter --> Metadata

    NestedIfSimplifier --> RowExpressionDeterminismEvaluator
    NestedIfSimplifier ..|> RowExpressionRewriter

    LogicalExpressionRewriter ..|> RowExpressionRewriter

    RowExpressionTreeRewriter --> RowExpressionRewriter
    RowExpressionTreeRewriter --> RowExpression
    RowExpressionTreeRewriter --> SpecialFormExpression

    RowExpressionDeterminismEvaluator --> RowExpression

    SpecialFormExpression --> RowExpression

    SimplifyRowExpressions --> Metadata
    SimplifyRowExpressions --> ExpressionOptimizerManager
    SimplifyRowExpressions --> Session

    ExpressionOptimizerManager --> ExpressionOptimizer
    ExpressionOptimizer --> RowExpression
    ExpressionOptimizer --> ConnectorSession
Loading

Flow diagram for plan transformation by PreAggregateBeforeGroupId rule

flowchart LR
    subgraph OriginalPlan
        A1[Source]
        B1[GroupIdNode
            grouping_sets]
        C1[AggregationNode
            step SINGLE
            group_by grouping_set_keys + groupId]
        A1 --> B1 --> C1
    end

    subgraph TransformedPlanWithPreAggregation
        A2[Source]
        B2[AggregationNode
            PARTIAL
            group_by union_of_all_grouping_set_columns]
        C2[GroupIdNode
            grouping_sets]
        D2[AggregationNode
            INTERMEDIATE
            group_by grouping_set_keys + groupId]
        A2 --> B2 --> C2 --> D2
    end

    subgraph RuleApplicationConditions
        R0[Session property
            pre_aggregate_before_grouping_sets enabled]
        R1[AggregationNode.step == SINGLE]
        R2[All aggregations decomposable]
    end

    OriginalPlan -->|Iterative optimizer| PreRule[PreAggregateBeforeGroupId]

    PreRule -->|When R0 and R1 and R2| TransformedPlanWithPreAggregation
Loading

File-Level Changes

Change Details Files
Add PreAggregateBeforeGroupId optimizer rule and wire it into the optimizer pipeline behind a feature flag/session property.
  • Introduce PreAggregateBeforeGroupId rule that matches SINGLE-step Aggregation over GroupId and rewrites it into PARTIAL aggregation below GroupId plus INTERMEDIATE aggregation above GroupId, grouping PARTIAL by the union of grouping-set keys and only for decomposable aggregates.
  • Register PreAggregateBeforeGroupId in PlanOptimizers so it runs in the iterative optimizer sequence for scalar/connector planning.
  • Add unit tests (TestPreAggregateBeforeGroupId) covering enabled/disabled behavior, step filtering, non-GroupId sources, DISTINCT aggregates, and multiple aggregations.
  • Add end-to-end AbstractTestQueries test enabling pre_aggregate_before_grouping_sets and validating GROUPING SETS query results on the orders table.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PreAggregateBeforeGroupId.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestPreAggregateBeforeGroupId.java
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Introduce configuration and session plumbing for pre-aggregating before grouping sets.
  • Add optimizer.pre-aggregate-before-grouping-sets boolean to FeaturesConfig with default false and expose getter/setter.
  • Add PRE_AGGREGATE_BEFORE_GROUPING_SETS system session property and accessor, defaulting from FeaturesConfig.
  • Extend TestFeaturesConfig defaults and explicit property mappings to cover the new feature flag.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java
Extend SimplifyRowExpressions with a determinism-aware nested IF simplifier to flatten redundant IF structures.
  • Add NestedIfSimplifier RowExpressionRewriter that rewrites IF(x, IF(y, v, E), E) to IF(x AND y, v, E) when the inner condition is deterministic, performing bottom-up rewriting to handle deep nests.
  • Instantiate NestedIfSimplifier in SimplifyRowExpressions.Rewriter and run it after LogicalExpressionRewriter but before the main expression optimizer.
  • Add TestSimplifyRowExpressions.testSimplifyNestedIf with cases covering null/omitted ELSE, multiple-level nesting, equal/unequal ELSE branches, and non-IF true branches.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyRowExpressions.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSimplifyRowExpressions.java
Tighten adaptive partial aggregation so SkipAggregationBuilder is only used on RAW→PARTIAL aggregation steps.
  • Guard HashAggregationOperator’s use of SkipAggregationBuilder with step.isInputRaw() along with step.isOutputPartial() and the partial-aggregation-disabled flag, preventing its use on INTERMEDIATE steps.
  • Require step.isInputRaw() in LocalExecutionPlanner when creating PartialAggregationController so adaptive partial aggregation only applies to RAW input stages.
presto-main-base/src/main/java/com/facebook/presto/operator/HashAggregationOperator.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/LocalExecutionPlanner.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@kaikalur kaikalur marked this pull request as ready for review March 8, 2026 03:48
@kaikalur kaikalur requested review from a team, elharo, feilong-liu and jaystarshot as code owners March 8, 2026 03:48
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 5 issues, and left some high level feedback:

  • In PreAggregateBeforeGroupId, the INTERMEDIATE aggregation builds CallExpressions using function.getFinalType(), but INTERMEDIATE steps should normally operate on and produce the intermediate type; consider using the intermediate type for the call/output or changing the step to FINAL if you intend to produce final results there.
  • The logic that rewrites GroupIdNode.aggregationArguments to intermediate variables walks all original aggregations and matches arguments by equality; this can be fragile for more complex RowExpressions (e.g., lambdas, casts, or reused variables), so it may be safer to derive the mapping from the AggregationNode’s output variables directly rather than pattern-matching on argument expressions.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In PreAggregateBeforeGroupId, the INTERMEDIATE aggregation builds CallExpressions using function.getFinalType(), but INTERMEDIATE steps should normally operate on and produce the intermediate type; consider using the intermediate type for the call/output or changing the step to FINAL if you intend to produce final results there.
- The logic that rewrites GroupIdNode.aggregationArguments to intermediate variables walks all original aggregations and matches arguments by equality; this can be fragile for more complex RowExpressions (e.g., lambdas, casts, or reused variables), so it may be safer to derive the mapping from the AggregationNode’s output variables directly rather than pattern-matching on argument expressions.

## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PreAggregateBeforeGroupId.java" line_range="139-148" />
<code_context>
+        for (Map.Entry<VariableReferenceExpression, Aggregation> entry : node.getAggregations().entrySet()) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Aggregation arguments in the new PARTIAL node are not remapped from GroupId outputs to source columns, which can break variable references.

The partial aggregation is built on `groupIdNode.getSource()`, but `originalAggregation.getArguments()` refer to `GroupIdNode` output variables, which are out of scope below `GroupId`. Despite the comment claiming arguments are mapped to source variables, they are passed through unchanged. For grouping keys and aggregation arguments rewritten by `GroupId`, this will produce invalid references. The partial aggregation arguments need to be remapped to source variables using `groupIdNode.getGroupingColumns()` and `groupIdNode.getAggregationArguments()` before creating the `CallExpression`.
</issue_to_address>

### Comment 2
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PreAggregateBeforeGroupId.java" line_range="218-227" />
<code_context>
+        Map<VariableReferenceExpression, Aggregation> intermediateAggregations = new HashMap<>();
</code_context>
<issue_to_address>
**issue (bug_risk):** The INTERMEDIATE aggregation uses the final output type and mixes intermediate state with original arguments, which is inconsistent with decomposed aggregation semantics.

For INTERMEDIATE, the aggregation should consume and produce only the intermediate state; the FINAL step produces the final type. Here the `CallExpression` is built with `function.getFinalType()` and arguments `[intermediateVariable] + original lambdas`, so the INTERMEDIATE step incorrectly returns the final type and reuses lambda arguments that should already be captured in the intermediate state. This can violate type assumptions in the planner/execution. Please change the INTERMEDIATE call to use the intermediate type as its return type and restrict its arguments to the intermediate state variables (only passing lambdas if the implementation explicitly requires them).
</issue_to_address>

### Comment 3
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PreAggregateBeforeGroupId.java" line_range="79-81" />
<code_context>
+{
+    private static final Capture<GroupIdNode> GROUP_ID = newCapture();
+
+    private static final Pattern<AggregationNode> PATTERN = aggregation()
+            .with(source().matching(
+                    groupId().capturedAs(GROUP_ID)));
+
+    private final FunctionAndTypeManager functionAndTypeManager;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The rule lacks guards that the aggregation’s grouping keys match the GroupId output, which may apply the rewrite in unsupported layouts.

This currently rewrites any `AggregationNode` over a `GroupIdNode` without verifying that the aggregation’s grouping sets are exactly `grouping_set_keys + groupId`. The new PARTIAL node groups by `allSourceGroupingKeys` from `GroupIdNode.getGroupingSets()`, while the INTERMEDIATE keeps `node.getGroupingSets()` unchanged. If the original aggregation includes extra grouping expressions/columns or omits the groupId variable, the transformed plan may change semantics. Please add pattern or `apply`-time checks that each aggregation grouping set is `grouping_set ∪ {groupIdVariable}` before applying this rule.

Suggested implementation:

```java
    public PreAggregateBeforeGroupId(FunctionAndTypeManager functionAndTypeManager)
    {
        this.functionAndTypeManager = requireNonNull(functionAndTypeManager, "functionAndTypeManager is null");
    }

    /**
     * Verifies that the aggregation's grouping sets are exactly the GroupIdNode's
     * source grouping sets extended with the groupId symbol, i.e. for each
     * aggregation grouping set G_agg there exists a source grouping set G_src such that
     * G_agg = G_src ∪ {groupIdSymbol}.
     */
    private static boolean aggregationGroupingMatchesGroupId(AggregationNode aggregation, GroupIdNode groupId)
    {
        Symbol groupIdSymbol = groupId.getGroupIdSymbol();

        // Build the set of all valid grouping sets the aggregation is allowed to use:
        // for each source grouping set, add the groupId symbol.
        java.util.Set<java.util.Set<Symbol>> validGroupingSets = new java.util.HashSet<>();
        for (java.util.List<Symbol> sourceGroupingSet : groupId.getGroupingSets()) {
            java.util.Set<Symbol> groupingSetWithGroupId = new java.util.HashSet<>(sourceGroupingSet);
            groupingSetWithGroupId.add(groupIdSymbol);
            // Use an unmodifiable set so equals()/hashCode() are stable and we don't accidentally mutate later
            validGroupingSets.add(java.util.Collections.unmodifiableSet(groupingSetWithGroupId));
        }

        // Every aggregation grouping set must match one of the expected sets.
        for (java.util.Set<Symbol> aggregationGroupingSet : aggregation.getGroupingSets()) {
            if (!validGroupingSets.contains(aggregationGroupingSet)) {
                return false;
            }
        }

        return true;
    }

    @Override
    public Pattern<AggregationNode> getPattern()
    {
        return PATTERN;

```

To fully enforce the guard, update the `apply` method in this class to call the new helper and bail out when the grouping layouts are not compatible. For example, at the very beginning of `apply`:

1. Retrieve the captured `GroupIdNode`:
   ```java
   GroupIdNode groupIdNode = captures.get(GROUP_ID);
   ```

2. Check the grouping sets and, if they do not match, skip the rewrite:
   ```java
   if (!aggregationGroupingMatchesGroupId(node, groupIdNode)) {
       return Result.empty();
   }
   ```

Place this check before any logic that constructs the PARTIAL/INTERMEDIATE aggregations. This ensures the rule only fires when each aggregation grouping set is exactly `grouping_set ∪ {groupIdVariable}`, preventing semantic changes when the aggregation has extra or missing grouping expressions relative to the `GroupIdNode` output.
</issue_to_address>

### Comment 4
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyRowExpressions.java" line_range="137" />
<code_context>
+                List<RowExpression> innerArgs = innerIf.getArguments();
+                RowExpression innerCondition = innerArgs.get(0);
+                if (falseValue.equals(innerArgs.get(2)) && determinismEvaluator.isDeterministic(innerCondition)) {
+                    RowExpression combinedCondition = new SpecialFormExpression(AND, BOOLEAN, condition, innerCondition);
+                    return new SpecialFormExpression(rewritten.getSourceLocation(), IF, rewritten.getType(), combinedCondition, innerArgs.get(1), falseValue);
+                }
</code_context>
<issue_to_address>
**nitpick:** Consider preserving source locations on the combined AND condition for better diagnostics and tooling.

Other `SpecialFormExpression` instances here (including the rewritten IF) carry a `SourceLocation`, but the new `AND` expression is created with the constructor that omits it. Please use the constructor that takes a `SourceLocation` (e.g., from `rewritten.getSourceLocation()` or `condition`) so diagnostics (stack traces, plan visualizations) remain accurate.
</issue_to_address>

### Comment 5
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSimplifyRowExpressions.java" line_range="144-145" />
<code_context>
+public class TestPreAggregateBeforeGroupId
+        extends BaseRuleTest
+{
+    @Test
+    public void testPreAggregatesBeforeGroupId()
+    {
</code_context>
<issue_to_address>
**suggestion (testing):** Add a negative test for non-deterministic inner conditions to validate determinism-aware behavior

The existing tests cover only deterministic inner conditions. Please add a case with a non-deterministic inner condition (e.g., `IF(X, IF(rand() > 0.5, V, CAST(null AS boolean)), CAST(null AS boolean))`) that must not be simplified, to ensure we don’t regress the determinism check in `NestedIfSimplifier`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +144 to +145
@Test
public void testSimplifyNestedIf()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a negative test for non-deterministic inner conditions to validate determinism-aware behavior

The existing tests cover only deterministic inner conditions. Please add a case with a non-deterministic inner condition (e.g., IF(X, IF(rand() > 0.5, V, CAST(null AS boolean)), CAST(null AS boolean))) that must not be simplified, to ensure we don’t regress the determinism check in NestedIfSimplifier.

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch 9 times, most recently from a3ee32d to eeb80c6 Compare March 9, 2026 14:02
@steveburnett
Copy link
Copy Markdown
Contributor

Thanks for the release note! Please add a row of three (`) above and below it to format it, like this:

== RELEASE NOTES ==

General Changes

* Add optimizer rule to pre-aggregate data before GroupId node in grouping sets queries, reducing row multiplication. Enabled via session property pre_aggregate_before_grouping_sets. (:pr:27101)
* Fix a bug where adaptive partial aggregation could incorrectly bypass INTERMEDIATE aggregation steps.

Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session property pre_aggregate_before_grouping_sets does not appear to be documented. Could you show me where this property is documented in Presto? If not, please add documentation.

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch 2 times, most recently from 9f52b5f to a1176bb Compare March 9, 2026 17:19
@kaikalur kaikalur requested a review from steveburnett March 10, 2026 00:21
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responding to a review request, but I don't find any documentation or .rst files in the PR for me to review. Please re-request when doc is added and I am happy to take another look anytime.

Comment on lines +60 to +68
* <p>This rule runs after exchange placement, when the plan has already been
* split into PARTIAL and FINAL aggregations. It matches:
* <pre>
* AggregationNode(PARTIAL) -> GroupIdNode -> Source
* </pre>
* and transforms it to:
* <pre>
* AggregationNode(INTERMEDIATE) -> GroupIdNode -> AggregationNode(PARTIAL, GROUP BY all_keys) -> Source
* </pre>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you re-write this to use the tree notation we use in other rules ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try enabling the add_exchange_below_partial_aggregation_over_group_id if it helped for your use case ? It should have a similar impact as this rule - reducing the rows that GroupId multiplies

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the Javadoc to use tree notation consistent with AddExchangesBelowPartialAggregationOverGroupIdRuleSet. The "before" and "after" trees now use clean node names without parenthetical descriptions.

Copy link
Copy Markdown
Contributor Author

@kaikalur kaikalur Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, add_exchange_below_partial_aggregation_over_group_id adds exchanges below GroupId to reshuffle the raw data, which helps the partial aggregation above GroupId be more effective. However, it does not actually reduce the row count before GroupId — it only redistributes the same rows.

This rule goes further by inserting a PARTIAL aggregation below GroupId that actually reduces the data (e.g., 60K rows → 2,526 for the TPC-H CUBE query). The pre-aggregated intermediate states are then shuffled and merged via an INTERMEDIATE aggregation before GroupId multiplies them. So the two rules are complementary — this one reduces rows, the existing one redistributes them.

Benchmark comparison (TPC-H lineitem CROSS JOIN UNNEST(ARRAY[1,2,3,4,5]), GROUP BY CUBE(yr, mo, dy, shipmode, returnflag) — 32 grouping sets, ~300K input rows, 3 warmup + 5 measured runs):

Configuration Avg Time Speedup
Baseline (both disabled) 6574 ms 1.00x
pre_aggregate_before_grouping_sets = true 1686 ms 3.90x
add_exchange_below_partial_aggregation_over_group_id = true 6609 ms 0.99x

The existing exchange rule didn't fire in this benchmark — likely because its cost-model guard (estimateAggregationMemoryRequirements < maxPartialAggregationMemoryUsage) determined that partial aggregation memory was manageable without redistribution. The pre-aggregation rule, by contrast, always reduces rows before GroupId multiplies them, yielding a consistent ~4x speedup.

@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja Thanks for the review! Addressed your comments:

Tree notation: Rewrote the Javadoc to use the tree notation format used in other rules like AddExchangesBelowPartialAggregationOverGroupIdRuleSet.

Re: add_exchange_below_partial_aggregation_over_group_id: Good question! I did look at it — they target the same problem (GroupId row multiplication) but with different mechanisms:

  • AddExchangesBelowPartialAggregationOverGroupIdRuleSet adds exchanges (data redistribution) below GroupId so that the partial aggregation above GroupId is more effective due to better data partitioning. It doesn't reduce the row count before GroupId — GroupId still multiplies all incoming rows.

  • PreAggregateBeforeGroupId adds an actual partial aggregation below GroupId that collapses duplicate grouping-key combinations before GroupId sees them. This directly reduces the number of rows that GroupId multiplies.

The pre-aggregation approach is more effective when the data has many duplicate values on the grouping columns — the aggregation can collapse N duplicate rows into 1 before GroupId multiplies them. The exchange approach helps when partial aggregation above GroupId is ineffective due to data skew/distribution but doesn't reduce the input to GroupId itself.

They could also be complementary — redistributing and pre-aggregating before GroupId.

Also added .rst documentation for the session property and configuration property per @steveburnett's feedback.

@aaneja
Copy link
Copy Markdown
Contributor

aaneja commented Mar 11, 2026

@kaikalur The exchange is done on exactly the variables that the partial agg needs, see this coderef
Here's a snippet of TPCDS Q67 eg -

- Aggregate(PARTIAL)[i_category$gid, i_class$gid, i_brand$gid, i_product_name$gid, d_year$gid, d_qoy$gid, d_moy$gid, s_store_id$gid, groupid][PlanNodeId 1710] => [i_category$gid:varchar(50), i_class$gid:varchar(50), i_brand$gid:varchar(50), i_product_name$gid:varchar(50), d_year$gid:integer, d_qoy$gid:integer, d_moy$gid:integer, s_store_id$gid:varchar(16), groupid:bigint, sum_136:varbinary]
            CPU: 35.57m (48.30%), Scheduled: 1.05h (46.54%), Output: 591,831,903 rows (79.00GB)
            Input total: 4,847,526,981 rows (518.06GB), avg.: 37,871,304.54 rows, std.dev.: 387.32%
            sum_136 := "native.default.sum"((expr)) (21:20)
        - GroupId[PlanNodeId 9][[], [i_category], [i_category, i_class], [i_category, i_class, i_brand], [i_category, i_class, i_brand, i_product_name], [i_category, i_class, i_brand, i_product_name, d_year], [i_category, i_class, i_brand, i_product_name, d_year, d_qoy], [i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy], [i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, s_store_id]] => [i_brand$gid:varchar(50), s_store_id$gid:varchar(16), i_category$gid:varchar(50), i_class$gid:varchar(50), d_year$gid:integer, i_product_name$gid:varchar(50), d_qoy$gid:integer, d_moy$gid:integer, expr:decimal(17,2), groupid:bigint]
                CPU: 17.29s (0.39%), Scheduled: 27.32s (0.34%), Output: 4,847,526,981 rows (518.06GB)
                Input total: 538,614,109 rows (100.53GB), avg.: 4,207,922.73 rows, std.dev.: 387.32%
                i_brand$gid := i_brand (30:51)
                s_store_id$gid := s_store_id (30:98)
                i_category$gid := i_category (30:30)
                i_class$gid := i_class (30:42)
                d_year$gid := d_year (30:76)
                i_product_name$gid := i_product_name (30:60)
                d_qoy$gid := d_qoy (30:84)
                d_moy$gid := d_moy (30:91)
            - LocalExchange[PlanNodeId 1785][HASH] (i_product_name) => [i_category:varchar(50), i_class:varchar(50), i_brand:varchar(50), i_product_name:varchar(50), d_year:integer, d_qoy:integer, d_moy:integer, s_store_id:varchar(16), expr:decimal(17,2)]
                    Estimates: {source: CostBasedSourceInfo, rows: 487,075,940 (53.35GB), cpu: 471,008,999,931.42, memory: 186,518,273.27, network: 57,469,954,495.64}
                    CPU: 31.74s (0.72%), Scheduled: 1.08m (0.80%), Output: 538,614,109 rows (100.53GB)
                    Input total: 538,614,109 rows (100.53GB), avg.: 4,207,922.73 rows, std.dev.: 387.32%
                - RemoteSource[4]  => [i_category:varchar(50), i_class:varchar(50), i_brand:varchar(50), i_product_name:varchar(50), d_year:integer, d_qoy:integer, d_moy:integer, s_store_id:varchar(16), expr:decimal(17,2)]
                        CPU: 1.36m (1.85%), Scheduled: 2.26m (1.68%), Output: 538,614,109 rows (100.53GB)
                        Input total: 538,614,109 rows (100.53GB), avg.: 4,207,922.73 rows, std.dev.: 387.32%

The impact is exactly the same as adding the partial agg that you have in this rule; but it can do better because it fetches all data across workers

Which is why I wanted to know if there was a real query that benefited from this rule, but not from AddExchangesBelowPartialAggregationOverGroupIdRuleSet (or from a combination of the two) ?

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch from a1176bb to e349066 Compare March 11, 2026 14:25
@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja Thanks for the detailed TPCDS Q67 example — that's really helpful for understanding the exchange rule's behavior.

Looking at your plan snippet, the key observation is:

  • Source → GroupId: 538M rows → 4.8B rows (9x multiplication)
  • GroupId → Partial Agg: 4.8B rows → 591M rows

The exchange rule makes the partial agg above GroupId more effective by redistributing data, but GroupId still processes all 538M input rows and produces the full 4.8B multiplied rows. The reduction happens after multiplication.

With PreAggregateBeforeGroupId, if the data has duplicate combinations on the union of grouping columns, we collapse them before GroupId sees them. For example, if pre-aggregation reduces 538M → 50M distinct key combinations, GroupId would only produce 450M rows (9 × 50M) instead of 4.8B — avoiding the work of both multiplying and then reducing redundant rows.

The motivation comes from production queries at our org where engineers manually rewrite queries to pre-aggregate before GROUPING SETS for significant speedups (often 3-5x). This rule automates that well-known pattern. These queries typically have high duplication on the grouping columns relative to the base table cardinality — a common scenario in fact tables with many rows per dimension key combination.

That said, you're right that the two rules are complementary. The exchange rule helps with data distribution for the aggregation, while pre-aggregation reduces the raw row count. In cases where AddExchangesBelowPartialAggregationOverGroupIdRuleSet already achieves sufficient reduction, this rule's benefit would be marginal. The biggest wins come when grouping column cardinality is much lower than table cardinality (high duplication), where pre-aggregation can dramatically shrink the input to GroupId.

Happy to provide a more concrete benchmark comparison if that would help.

@aaneja
Copy link
Copy Markdown
Contributor

aaneja commented Mar 12, 2026

@kaikalur

With PreAggregateBeforeGroupId, if the data has duplicate combinations on the union of grouping columns, we collapse them before GroupId sees them. For example, if pre-aggregation reduces 538M → 50M distinct key combinations, GroupId would only produce 450M rows (9 × 50M) instead of 4.8B — avoiding the work of both multiplying and then reducing redundant rows.

This makes sense, but I don't see how this would result in latency, CPU or memory gains, since we're doing the pre and intermediate aggs on the same worker while streaming rows. We usually add intermediate aggs with Exchanges between them.

Yes, a concrete benchmark comparison would help clear this up !

@kaikalur
Copy link
Copy Markdown
Contributor Author

kaikalur commented Mar 12, 2026

@aaneja The rule now produces a complete pre-aggregation pipeline below GroupId with a shuffle. Here's the plan:

Source → PARTIAL → Exchange(hash) → INTERMEDIATE → GroupId → INTERMEDIATE → Exchange → FINAL

The data flow:

  1. Source produces raw rows
  2. PARTIAL aggregation reduces rows locally (at the finest granularity — union of all grouping set columns)
  3. Exchange shuffles the reduced partial states by grouping keys across workers
  4. INTERMEDIATE merges partial states from different workers
  5. GroupId multiplies the already-reduced-and-merged rows across grouping sets
  6. INTERMEDIATE above GroupId merges within each grouping set
  7. ExchangeFINAL completes the aggregation

The win: steps 2-4 drastically reduce the row count before GroupId multiplies them. Without pre-aggregation, GroupId multiplies the full raw row count.

Our rule runs before AddExchangesBelowPartialAggregationOverGroupIdRuleSet in the optimizer pipeline (PlanOptimizers.java), so the two optimizations are independent.

Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the documentation! Just a nit of phrasing.

@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja Good point — the rule now includes a proper Exchange (shuffle) between the PARTIAL and INTERMEDIATE, addressing your concern about same-worker streaming. Here's a concrete example using TPC-H lineitem:

SELECT sum(extendedprice), count(extendedprice),
       day(shipdate), month(shipdate), shipdate
FROM lineitem
GROUP BY CUBE (day(shipdate), month(shipdate), shipdate)

Without optimization (current behavior):

Source (60K rows) → GroupId (×8 grouping sets = 480K rows) → PARTIAL → Exchange → FINAL

GroupId multiplies all 60K raw rows by 8, producing 480K rows that flow through aggregation.

With optimization (this PR):

Source (60K rows) → PARTIAL (→ 2,526 distinct shipdates) → Exchange(hash) → INTERMEDIATE (merged) → GroupId (×8 = ~20K rows) → INTERMEDIATE → Exchange → FINAL

The key numbers:

  • lineitem has ~60K rows but only 2,526 distinct shipdate values
  • PARTIAL reduces 60K → 2,526 rows locally (24x reduction)
  • Exchange shuffles only 2,526 rows (not 60K)
  • INTERMEDIATE merges across workers
  • GroupId multiplies only 2,526 × 8 = ~20K rows instead of 480K

That's a 24x reduction in the number of rows GroupId multiplies. The savings come from:

  1. Less data shuffled — Exchange moves 2,526 rows instead of 60K
  2. Less row multiplication — GroupId produces ~20K rows instead of 480K
  3. Less aggregation work — downstream INTERMEDIATE/FINAL process far fewer rows

The effect scales with data size and duplication ratio. For production tables with billions of rows and high duplication on grouping keys, the reduction can be orders of magnitude.

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch from e57c82b to 46c252c Compare March 13, 2026 15:04
steveburnett
steveburnett previously approved these changes Mar 13, 2026
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@aaneja
Copy link
Copy Markdown
Contributor

aaneja commented Mar 14, 2026

@kaikalur Can you run the above query on say a 2-4 worker setup for TPCH SF 100 and share the EXPLAIN ANALYZE when this rule is enabled vs when the AddExchangesBelowPartialAggregationOverGroupIdRuleSet rules are enabled ?

I'll try to do it our setup as well

@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja All CI checks are green and we've addressed all review feedback (tree notation Javadoc, comparison with add_exchange_below_partial_aggregation_over_group_id, docs style fixes). Would appreciate your review when you get a chance — we'd like to get this merged soon. Thanks!

@aaneja
Copy link
Copy Markdown
Contributor

aaneja commented Mar 15, 2026

@aaneja All CI checks are green and we've addressed all review feedback (tree notation Javadoc, comparison with add_exchange_below_partial_aggregation_over_group_id, docs style fixes). Would appreciate your review when you get a chance — we'd like to get this merged soon. Thanks!

Looks like you missed my message above - #27290 (comment)

@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja Apologies for missing your earlier messages!

To clarify the architecture concern — our rule does not run pre-agg and intermediate on the same worker while streaming. The transformed plan includes a RemoteExchange between PARTIAL and INTERMEDIATE:

Source → Partial Agg (group by all_keys)
       → RemoteExchange (hash by all_keys)
       → Intermediate Agg (group by all_keys)
       → GroupId
       → Intermediate Agg (group by grouping_set_keys + groupId)

So data is shuffled across workers between the PARTIAL and INTERMEDIATE steps — this is a fully distributed operation, not same-worker streaming.

The key difference from AddExchangesBelowPartialAggregationOverGroupIdRuleSet:

  • Existing rule: Shuffles raw rows below GroupId → GroupId still multiplies all rows → PARTIAL above reduces
  • Our rule: PARTIAL reduces rows locally → RemoteExchange shuffles reduced data → INTERMEDIATE merges globally → GroupId multiplies the already-reduced dataset

Using your TPCDS Q67 numbers as an example: with the existing rule, GroupId still multiplies 538M → 4.8B rows, and then PARTIAL reduces to 591M. With our rule, if PARTIAL reduces 538M to, say, 50M distinct key combinations, GroupId would only produce 450M rows (9 × 50M) — never materializing the 4.8B intermediate rows.

I don't currently have a multi-worker TPCH SF 100 setup available to run EXPLAIN ANALYZE. Since you mentioned you'll try it on your setup, that would be great! The session property to enable is pre_aggregate_before_grouping_sets=true. You could compare against add_exchange_below_partial_aggregation_over_group_id=true on the CUBE query from the PR description or TPCDS Q67.

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch from 46c252c to 3099a27 Compare March 16, 2026 03:03
@aaneja
Copy link
Copy Markdown
Contributor

aaneja commented Mar 16, 2026

@kaikalur I can't sign-off until we get clear signal about the rule's applicability

@kaikalur
Copy link
Copy Markdown
Contributor Author

@kaikalur I can't sign-off until we get clear signal about the rule's applicability

well it's rule derived from manual applocation on our prod workloads. Also see the lates benchmark numbers that claude posted on tpch

@kaikalur
Copy link
Copy Markdown
Contributor Author

kaikalur commented Mar 16, 2026

Oh claude just edited the old reply. Here:

Benchmark comparison (TPC-H lineitem CROSS JOIN UNNEST(ARRAY[1,2,3,4,5]), GROUP BY CUBE(yr, mo, dy, shipmode, returnflag) — 32 grouping sets, ~300K input rows, 3 warmup + 5 measured runs):

Configuration Avg Time Speedup
Baseline (both disabled) 6574 ms 1.00x
pre_aggregate_before_grouping_sets = true 1686 ms 3.90x
add_exchange_below_partial_aggregation_over_group_id = true 6609 ms 0.99x

@kaikalur
Copy link
Copy Markdown
Contributor Author

@feilong-liu csn you review this? There is no question about it's applicability. Don't want to delay this any further as its off by default and peop;e who don't want don't need to use it

@kaikalur
Copy link
Copy Markdown
Contributor Author

@kaikalur Can you run the above query on say a 2-4 worker setup for TPCH SF 100 and share the EXPLAIN ANALYZE when this rule is enabled vs when the AddExchangesBelowPartialAggregationOverGroupIdRuleSet rules are enabled ?

I'll try to do it our setup as well

explain does not show the power of this. See the benchmark we posted

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch from 3099a27 to fe6f33e Compare March 16, 2026 15:05
@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja Added a reusable HiveDistributedBenchmarkRunner utility and a BenchmarkGroupingSetsPreAggregation benchmark test in presto-hive/src/test/.../benchmark/.

You can run it directly:

mvn test -pl presto-hive -Dtest=BenchmarkGroupingSetsPreAggregation -DfailIfNoTests=false

It compares three scenarios (baseline, pre_aggregate_before_grouping_sets, and add_exchange_below_partial_aggregation_over_group_id) on a GROUP BY CUBE(5 cols) query over lineitem CROSS JOIN UNNEST(ARRAY[1,2,3,4,5]) (~300K rows, 32 grouping sets). Results are written to /tmp/hive_benchmark_results.txt.

Latest results:

baseline                         6378 ms  (1.00x)
pre_aggregate_before_groupid     1505 ms  (4.24x)
add_exchange_below_agg           6414 ms  (0.99x)

The HiveDistributedBenchmarkRunner is generic and can be reused for other optimization benchmarks — just add scenarios with different session properties and call runner.runWithVerification(sql).

@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch 2 times, most recently from b21ea89 to c3a35f8 Compare March 16, 2026 17:51
…ation

Add a new iterative optimizer rule PreAggregateBeforeGroupId that inserts
a PARTIAL aggregation below GroupIdNode to reduce the number of rows that
GroupId multiplies across grouping sets.

Transforms:
  Agg(SINGLE) -> GroupId -> Source
into:
  Agg(INTERMEDIATE) -> GroupId -> Agg(PARTIAL, GROUP BY all_keys) -> Source

The PARTIAL aggregation groups by the union of all grouping set columns,
which drastically reduces row count before GroupId multiplies them. The
INTERMEDIATE aggregation above GroupId merges partial states within each
grouping set.

Also fixes a bug in HashAggregationOperator where SkipAggregationBuilder
could be incorrectly used for INTERMEDIATE aggregations, which expect
intermediate-format input rather than raw input.

Gated behind session property pre_aggregate_before_grouping_sets
(disabled by default).
@kaikalur kaikalur force-pushed the pre-aggregate-before-grouping-sets branch from c3a35f8 to a039db3 Compare March 16, 2026 18:20
@kaikalur
Copy link
Copy Markdown
Contributor Author

@aaneja We now have a distributedbenchmarkrunner and it clearly shows the power of this optiomziation. Also in general any optimization I implemented is vetted with our prod workloads so we add it disabled and we enable it here based on need/sometimes stats. Hope that helps in you reviews.

@kaikalur
Copy link
Copy Markdown
Contributor Author

@feilong-liu we now even have a benchmark and also this optimization is off by default - so let's merge it asap

Copy link
Copy Markdown
Contributor

@feilong-liu feilong-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve since this optimization is off by default and also has shown benchmark numbers required in the review.

@kaikalur kaikalur merged commit d86e64b into prestodb:master Mar 17, 2026
114 of 115 checks passed
@kaikalur
Copy link
Copy Markdown
Contributor Author

kaikalur commented Mar 17, 2026

Approve since this optimization is off by default and also has shown benchmark numbers required in the review.

Thanks @feilong-liu @aaneja if you have any other comments/reviews, we can still iterate and fix them as needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants