Skip to content

feat(optimizer): Merge multiple max_by/min_by aggregations with same comparison key#27417

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
kaikalur:merge-min-max-by-aggregations
Mar 27, 2026
Merged

feat(optimizer): Merge multiple max_by/min_by aggregations with same comparison key#27417
feilong-liu merged 1 commit intoprestodb:masterfrom
kaikalur:merge-min-max-by-aggregations

Conversation

@kaikalur
Copy link
Copy Markdown
Contributor

@kaikalur kaikalur commented Mar 23, 2026

Description

Adds a new MergeMinMaxByAggregations optimizer rule that merges multiple max_by or min_by aggregations sharing the same comparison key into a single aggregation with a ROW argument and DEREFERENCE projections.

This is a takeover of #26873, addressing all reviewer feedback from @kaikalur and Sourcery.

Key Changes

  • Unified MergeMinMaxByAggregations class handling both max_by and min_by (per kaikalur review)
  • Session property renamed to merge_max_by_and_min_by_aggregations (per kaikalur review)
  • AggregationKey includes orderBy to prevent merging aggregations with different ordering (per Sourcery review)
  • Instance-based pattern using FunctionResolution instead of string comparison (per Sourcery review)
  • LinkedHashMap for deterministic plan ordering (per Sourcery review)
  • 17 unit tests + 10 e2e integration tests

Example

-- Before optimization:
SELECT max_by(v1, k), max_by(v2, k), max_by(v3, k) FROM t

-- After optimization (internal rewrite):
SELECT merged[0], merged[1], merged[2]
FROM (SELECT max_by(ROW(v1, v2, v3), k) AS merged FROM t)

Motivation and Context

When a query contains multiple max_by/min_by aggregations with the same comparison key, each independently scans and compares values. By merging into a single max_by(ROW(...), key) call:

  1. Reduced computation: One comparison per row instead of N
  2. Better memory efficiency: Single aggregation state instead of N separate states
  3. Simpler plan: Fewer aggregation function calls

Impact

  • New session property merge_max_by_and_min_by_aggregations (boolean, default false)
  • No breaking changes — optimization is disabled by default

Test Plan

  • 17 unit tests in TestMergeMinMaxByAggregations covering positive rewrites (same key, GROUP BY, mixed aggs, multiple groups, both max_by+min_by, matching filters, different value types, DISTINCT) and negative cases (single agg, different keys, disabled, 3-arg, no max_by/min_by, different filters, mixed distinct)
  • 10 e2e integration tests in AbstractTestQueries#testMergeMinMaxByAggregations comparing results with optimization enabled vs disabled
  • All 523 TestLocalQueries pass

Contributor checklist

  • Complies with contributing guide
  • PR description addresses the issue accurately
  • Documented new session property
  • Adequate tests added
  • CI passed locally

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Introduce an optimizer rule to merge compatible max_by and min_by aggregations sharing the same comparison key into a single aggregation and wire it behind a configurable session property.

New Features:

  • Add MergeMinMaxByAggregations rule to rewrite multiple max_by or min_by aggregations with the same comparison key into a single ROW-based aggregation.
  • Introduce the merge_max_by_and_min_by_aggregations session property and corresponding features configuration flag.

Documentation:

  • Document the new merge_max_by_and_min_by_aggregations session property in the admin session properties documentation.

Tests:

  • Add rule unit tests covering positive and negative cases for merging max_by/min_by aggregations, including filters, GROUP BY, DISTINCT, and mixed types.
  • Add integration tests ensuring queries produce identical results with the merge_max_by_and_min_by_aggregations optimization enabled or disabled.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Mar 23, 2026

Reviewer's Guide

Introduces a new iterative optimizer rule MergeMinMaxByAggregations that, when enabled via a new session property, rewrites groups of 2‑arg max_by/min_by aggregations sharing the same comparison key into a single max_by/min_by over a ROW plus dereference projections, with comprehensive unit and integration tests and wiring into config, session properties, and the optimizer pipeline.

Sequence diagram for MergeMinMaxByAggregations rewrite during planning

sequenceDiagram
    participant Client
    participant Coordinator
    participant Planner
    participant IterativeOptimizer
    participant MergeMinMaxByAggregations

    Client->>Coordinator: submit SQL with multiple max_by/min_by
    Coordinator->>Planner: create logical plan
    Planner->>IterativeOptimizer: optimize(plan, session)

    loop apply_rules
        IterativeOptimizer->>MergeMinMaxByAggregations: isEnabled(session)
        MergeMinMaxByAggregations-->>IterativeOptimizer: return isMergeMaxByMinByAggregationsEnabled(session)
        alt rule_enabled
            IterativeOptimizer->>MergeMinMaxByAggregations: apply(AggregationNode)
            MergeMinMaxByAggregations->>MergeMinMaxByAggregations: hasMultipleMergeableAggregations(node)
            alt multiple_mergeable
                MergeMinMaxByAggregations->>MergeMinMaxByAggregations: findMergeableGroups(max_by)
                MergeMinMaxByAggregations->>MergeMinMaxByAggregations: findMergeableGroups(min_by)
                MergeMinMaxByAggregations->>MergeMinMaxByAggregations: mergeGroup(groups)
                MergeMinMaxByAggregations-->>IterativeOptimizer: return rewritten AggregationNode
            else no_mergeable_groups
                MergeMinMaxByAggregations-->>IterativeOptimizer: return no_change
            end
        else rule_disabled
            IterativeOptimizer-->>Planner: no_change
        end
    end

    IterativeOptimizer-->>Planner: optimized plan
    Planner-->>Coordinator: execution plan
    Coordinator-->>Client: query result
Loading

Class diagram for MergeMinMaxByAggregations rule and related planner wiring

classDiagram
    class MergeMinMaxByAggregations {
        - FunctionResolution functionResolution
        - FunctionAndTypeManager functionAndTypeManager
        - Pattern~AggregationNode~ pattern
        + MergeMinMaxByAggregations(FunctionAndTypeManager functionAndTypeManager)
        + boolean isEnabled(Session session)
        + Pattern~AggregationNode~ getPattern()
        + Result apply(AggregationNode node, Captures captures, Context context)
        - boolean hasMultipleMergeableAggregations(AggregationNode aggregationNode)
        - List findMergeableGroups(Map aggregations)
        - void mergeGroup(List group, String functionName, Map newAggregations, Assignments.Builder bottomProjectionsBuilder, Map originalToProjection, Set mergedVariables, Context context)
    }

    class AggregationKey {
        - RowExpression comparisonKey
        - Optional~RowExpression~ filter
        - Optional~VariableReferenceExpression~ mask
        - boolean isDistinct
        - Optional~OrderingScheme~ orderBy
        + AggregationKey(Aggregation aggregation)
        + boolean equals(Object o)
        + int hashCode()
    }

    class FeaturesConfig {
        - boolean mergeMaxByMinByAggregationsEnabled
        + FeaturesConfig setMergeMaxByMinByAggregationsEnabled(boolean mergeMaxByMinByAggregationsEnabled)
        + boolean isMergeMaxByMinByAggregationsEnabled()
    }

    class SystemSessionProperties {
        + static String MERGE_MAX_BY_AND_MIN_BY_AGGREGATIONS
        + static boolean isMergeMaxByMinByAggregationsEnabled(Session session)
    }

    class PlanOptimizers {
        + PlanOptimizers(...)
    }

    class IterativeOptimizer {
        + IterativeOptimizer(Metadata metadata, RuleStats ruleStats, StatsCalculator statsCalculator, CostCalculator costCalculator, Set rules)
    }

    class AggregationNode {
        + Map getAggregations()
        + List getOutputVariables()
        + PlanNode getSource()
        + List getGroupingSets()
    }

    class ProjectNode {
    }

    class FunctionResolution {
        + boolean isMaxByFunction(FunctionHandle functionHandle)
        + boolean isMinByFunction(FunctionHandle functionHandle)
    }

    class FunctionAndTypeManager {
        + FunctionAndTypeResolver getFunctionAndTypeResolver()
    }

    class FunctionAndTypeResolver {
        + FunctionHandle lookupFunction(String functionName, List argumentTypes)
    }

    MergeMinMaxByAggregations --> AggregationKey : uses
    MergeMinMaxByAggregations --> AggregationNode : matches_and_rewrites
    MergeMinMaxByAggregations --> ProjectNode : creates
    MergeMinMaxByAggregations --> FunctionResolution : uses
    MergeMinMaxByAggregations --> FunctionAndTypeManager : uses
    FunctionAndTypeManager --> FunctionAndTypeResolver : provides
    MergeMinMaxByAggregations ..> SystemSessionProperties : reads_session_property
    FeaturesConfig ..> SystemSessionProperties : supplies_default
    PlanOptimizers --> IterativeOptimizer : constructs
    IterativeOptimizer --> MergeMinMaxByAggregations : includes_rule
Loading

File-Level Changes

Change Details Files
Add MergeMinMaxByAggregations rule to merge compatible max_by/min_by aggregations into a single aggregation over a ROW plus dereference projections.
  • Implement Rule that matches aggregation nodes containing at least two 2‑arg max_by or min_by calls using FunctionResolution instead of string matching.
  • Group aggregations by an AggregationKey that captures comparison key expression, filter, mask, distinct flag, and ordering scheme, and only merge groups with size ≥ 2.
  • Rewrite each mergeable group into a bottom Project that builds a ROW of value arguments, a single merged aggregation call (max_by or min_by) over that ROW and the shared key, and a top Project that exposes original outputs via DEREFERENCE on the merged result.
  • Use LinkedHashMap to preserve deterministic aggregation ordering and pass through non‑merged aggregations and grouping behavior unchanged.
  • Gate the rule behind SystemSessionProperties.isMergeMaxByMinByAggregationsEnabled and register it in PlanOptimizers via an IterativeOptimizer instance.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MergeMinMaxByAggregations.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Introduce configuration and session plumbing for enabling the merge-max-by/min-by optimizer.
  • Add mergeMaxByMinByAggregationsEnabled field, config setter/getter, and default value in FeaturesConfig, plus tests for defaults and explicit mappings.
  • Define MERGE_MAX_BY_AND_MIN_BY_AGGREGATIONS system property constant, register it as a boolean session property with default from FeaturesConfig, and expose an isMergeMaxByMinByAggregationsEnabled helper.
  • Update TestFeaturesConfig to assert default and explicit mappings for the new optimizer.merge-max-by-and-min-by-aggregations property.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java
Add planner rule unit tests and end-to-end query tests validating the new optimization.
  • Create TestMergeMinMaxByAggregations to cover positive cases (multiple max_by, multiple min_by, mixed groups, grouping sets, filters, DISTINCT, different value types) and negative cases (single agg, different keys, disabled session property, 3‑arg max_by, no max/min_by, mismatched filters or DISTINCT).
  • Extend AbstractTestQueries with testMergeMinMaxByAggregations to compare results with the session property enabled vs disabled across various query shapes (global, GROUP BY, HAVING, mixed with other aggs, multiple key groups, single max_by).
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMergeMinMaxByAggregations.java
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The hasMultipleMergeableAggregations predicate duplicates some of the grouping logic in findMergeableGroups and can give false positives (e.g., two-arg max_by/min_by with mismatched filters/orderBy); consider simplifying it to reuse findMergeableGroups so the rule only fires when there is at least one actually mergeable group and to keep the eligibility logic in one place.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `hasMultipleMergeableAggregations` predicate duplicates some of the grouping logic in `findMergeableGroups` and can give false positives (e.g., two-arg max_by/min_by with mismatched filters/orderBy); consider simplifying it to reuse `findMergeableGroups` so the rule only fires when there is at least one actually mergeable group and to keep the eligibility logic in one place.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@kaikalur kaikalur force-pushed the merge-min-max-by-aggregations branch 2 times, most recently from 50d49f5 to ca7fdb8 Compare March 24, 2026 02:21
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

…comparison key

Add MergeMinMaxByAggregations optimizer rule that rewrites multiple
max_by or min_by aggregations sharing the same comparison key into a
single aggregation with a ROW argument and DEREFERENCE projections.

For example:
  SELECT max_by(v1, k), max_by(v2, k) FROM t
becomes:
  SELECT merged[0], merged[1]
  FROM (SELECT max_by(ROW(v1, v2), k) AS merged FROM t)

This reduces CPU (one comparison per row instead of N) and memory
(single aggregation state instead of N).

Gated by session property merge_max_by_and_min_by_aggregations
(default: false).
@kaikalur kaikalur force-pushed the merge-min-max-by-aggregations branch from ca7fdb8 to 812a5ee Compare March 25, 2026 17:54
@feilong-liu feilong-liu merged commit 9f2f9bd into prestodb:master Mar 27, 2026
114 of 118 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants