feat(optimizer): Merge multiple max_by/min_by aggregations with same comparison key #27417
Merged
feilong-liu merged 1 commit into prestodb:master on Mar 27, 2026
Contributor
Reviewer's Guide
Introduces a new iterative optimizer rule, MergeMinMaxByAggregations. When enabled via a new session property, it rewrites groups of two-argument max_by/min_by aggregations that share the same comparison key into a single max_by/min_by over a ROW plus dereference projections. The change includes unit and integration tests and wiring into config, session properties, and the optimizer pipeline.
Sequence diagram for MergeMinMaxByAggregations rewrite during planning
sequenceDiagram
participant Client
participant Coordinator
participant Planner
participant IterativeOptimizer
participant MergeMinMaxByAggregations
Client->>Coordinator: submit SQL with multiple max_by/min_by
Coordinator->>Planner: create logical plan
Planner->>IterativeOptimizer: optimize(plan, session)
loop apply_rules
IterativeOptimizer->>MergeMinMaxByAggregations: isEnabled(session)
MergeMinMaxByAggregations-->>IterativeOptimizer: return isMergeMaxByMinByAggregationsEnabled(session)
alt rule_enabled
IterativeOptimizer->>MergeMinMaxByAggregations: apply(AggregationNode)
MergeMinMaxByAggregations->>MergeMinMaxByAggregations: hasMultipleMergeableAggregations(node)
alt multiple_mergeable
MergeMinMaxByAggregations->>MergeMinMaxByAggregations: findMergeableGroups(max_by)
MergeMinMaxByAggregations->>MergeMinMaxByAggregations: findMergeableGroups(min_by)
MergeMinMaxByAggregations->>MergeMinMaxByAggregations: mergeGroup(groups)
MergeMinMaxByAggregations-->>IterativeOptimizer: return rewritten AggregationNode
else no_mergeable_groups
MergeMinMaxByAggregations-->>IterativeOptimizer: return no_change
end
else rule_disabled
IterativeOptimizer-->>Planner: no_change
end
end
IterativeOptimizer-->>Planner: optimized plan
Planner-->>Coordinator: execution plan
Coordinator-->>Client: query result
Class diagram for MergeMinMaxByAggregations rule and related planner wiring
classDiagram
class MergeMinMaxByAggregations {
- FunctionResolution functionResolution
- FunctionAndTypeManager functionAndTypeManager
- Pattern~AggregationNode~ pattern
+ MergeMinMaxByAggregations(FunctionAndTypeManager functionAndTypeManager)
+ boolean isEnabled(Session session)
+ Pattern~AggregationNode~ getPattern()
+ Result apply(AggregationNode node, Captures captures, Context context)
- boolean hasMultipleMergeableAggregations(AggregationNode aggregationNode)
- List findMergeableGroups(Map aggregations)
- void mergeGroup(List group, String functionName, Map newAggregations, Assignments.Builder bottomProjectionsBuilder, Map originalToProjection, Set mergedVariables, Context context)
}
class AggregationKey {
- RowExpression comparisonKey
- Optional~RowExpression~ filter
- Optional~VariableReferenceExpression~ mask
- boolean isDistinct
- Optional~OrderingScheme~ orderBy
+ AggregationKey(Aggregation aggregation)
+ boolean equals(Object o)
+ int hashCode()
}
class FeaturesConfig {
- boolean mergeMaxByMinByAggregationsEnabled
+ FeaturesConfig setMergeMaxByMinByAggregationsEnabled(boolean mergeMaxByMinByAggregationsEnabled)
+ boolean isMergeMaxByMinByAggregationsEnabled()
}
class SystemSessionProperties {
+ static String MERGE_MAX_BY_AND_MIN_BY_AGGREGATIONS
+ static boolean isMergeMaxByMinByAggregationsEnabled(Session session)
}
class PlanOptimizers {
+ PlanOptimizers(...)
}
class IterativeOptimizer {
+ IterativeOptimizer(Metadata metadata, RuleStats ruleStats, StatsCalculator statsCalculator, CostCalculator costCalculator, Set rules)
}
class AggregationNode {
+ Map getAggregations()
+ List getOutputVariables()
+ PlanNode getSource()
+ List getGroupingSets()
}
class ProjectNode {
}
class FunctionResolution {
+ boolean isMaxByFunction(FunctionHandle functionHandle)
+ boolean isMinByFunction(FunctionHandle functionHandle)
}
class FunctionAndTypeManager {
+ FunctionAndTypeResolver getFunctionAndTypeResolver()
}
class FunctionAndTypeResolver {
+ FunctionHandle lookupFunction(String functionName, List argumentTypes)
}
MergeMinMaxByAggregations --> AggregationKey : uses
MergeMinMaxByAggregations --> AggregationNode : matches_and_rewrites
MergeMinMaxByAggregations --> ProjectNode : creates
MergeMinMaxByAggregations --> FunctionResolution : uses
MergeMinMaxByAggregations --> FunctionAndTypeManager : uses
FunctionAndTypeManager --> FunctionAndTypeResolver : provides
MergeMinMaxByAggregations ..> SystemSessionProperties : reads_session_property
FeaturesConfig ..> SystemSessionProperties : supplies_default
PlanOptimizers --> IterativeOptimizer : constructs
IterativeOptimizer --> MergeMinMaxByAggregations : includes_rule
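The role of AggregationKey in the diagram can be pictured with a small standalone sketch. This is not the Presto implementation: the `Agg` class, the string-typed fields, and the `sample()` data are hypothetical stand-ins, and only the comparison key, filter, and distinct flag are modeled (mask and orderBy would join the key in the same way). The idea is that two calls land in the same group only when everything except the value argument matches, and only groups of two or more are worth merging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class MergeableGroupsSketch {
    // Hypothetical stand-in for a 2-arg max_by/min_by call; strings replace planner expressions.
    static final class Agg {
        final String output;
        final String value;
        final String comparisonKey;
        final Optional<String> filter;
        final boolean distinct;

        Agg(String output, String value, String comparisonKey, String filter, boolean distinct) {
            this.output = output;
            this.value = value;
            this.comparisonKey = comparisonKey;
            this.filter = Optional.ofNullable(filter);
            this.distinct = distinct;
        }

        @Override
        public String toString() {
            return output;
        }
    }

    // Groups calls by everything except the value argument and keeps groups of two or more.
    static List<List<Agg>> findMergeableGroups(List<Agg> aggregations) {
        // LinkedHashMap preserves insertion order, so the grouping is deterministic.
        Map<List<Object>, List<Agg>> byKey = new LinkedHashMap<>();
        for (Agg agg : aggregations) {
            List<Object> key = Arrays.<Object>asList(agg.comparisonKey, agg.filter, agg.distinct);
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(agg);
        }
        List<List<Agg>> groups = new ArrayList<>();
        for (List<Agg> group : byKey.values()) {
            if (group.size() >= 2) {
                groups.add(group);
            }
        }
        return groups;
    }

    // Illustrative input: a and b share a key; c differs by filter, d by comparison key.
    static List<Agg> sample() {
        return Arrays.asList(
                new Agg("a", "v1", "k", null, false),
                new Agg("b", "v2", "k", null, false),
                new Agg("c", "v3", "k", "x > 0", false),
                new Agg("d", "v4", "k2", null, false));
    }

    public static void main(String[] args) {
        System.out.println(findMergeableGroups(sample()));
    }
}
```

Running `main` prints `[[a, b]]`: only the two calls with an identical key form a mergeable group, while the call with a different filter and the call with a different comparison key are left alone.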
Contributor
Hey - I've left some high-level feedback:
- The `hasMultipleMergeableAggregations` predicate duplicates some of the grouping logic in `findMergeableGroups` and can give false positives (e.g., two-arg max_by/min_by with mismatched filters/orderBy); consider simplifying it to reuse `findMergeableGroups` so the rule only fires when there is at least one actually mergeable group and to keep the eligibility logic in one place.
50d49f5 to
ca7fdb8
steveburnett approved these changes Mar 24, 2026
Contributor
steveburnett left a comment
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
…comparison key
Add MergeMinMaxByAggregations optimizer rule that rewrites multiple max_by or min_by aggregations sharing the same comparison key into a single aggregation with a ROW argument and DEREFERENCE projections.
For example:
    SELECT max_by(v1, k), max_by(v2, k) FROM t
becomes:
    SELECT merged[0], merged[1] FROM (SELECT max_by(ROW(v1, v2), k) AS merged FROM t)
This reduces CPU (one comparison per row instead of N) and memory (single aggregation state instead of N).
Gated by session property merge_max_by_and_min_by_aggregations (default: false).
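The equivalence behind this rewrite can be checked outside of Presto with a small simulation. The sketch below is only an illustration under stated assumptions: the `Row` class, the table contents in `sample()`, and the method names are hypothetical, keys are non-null integers, and ties are resolved by keeping the first maximum in both variants. `separate` mimics two independent max_by states, each comparing the key on its own, while `merged` keeps one state holding the whole (v1, v2) pair and dereferences its fields at the end.

```java
import java.util.Arrays;
import java.util.List;

final class MaxByMergeSketch {
    // One input row of a hypothetical table t(v1, v2, k).
    static final class Row {
        final String v1;
        final String v2;
        final int k;

        Row(String v1, String v2, int k) {
            this.v1 = v1;
            this.v2 = v2;
            this.k = k;
        }
    }

    // Unmerged plan: max_by(v1, k) and max_by(v2, k) each track and compare the key separately.
    static String[] separate(List<Row> rows) {
        String best1 = null;
        String best2 = null;
        Integer key1 = null;
        Integer key2 = null;
        for (Row r : rows) {
            if (key1 == null || r.k > key1) {
                key1 = r.k;
                best1 = r.v1;
            }
            if (key2 == null || r.k > key2) {
                key2 = r.k;
                best2 = r.v2;
            }
        }
        return new String[] {best1, best2};
    }

    // Merged plan: one max_by over the pair (v1, v2), then one dereference per original output.
    static String[] merged(List<Row> rows) {
        Row best = null;
        for (Row r : rows) {
            if (best == null || r.k > best.k) {
                best = r;
            }
        }
        return best == null ? new String[] {null, null} : new String[] {best.v1, best.v2};
    }

    // Illustrative data only.
    static List<Row> sample() {
        return Arrays.asList(
                new Row("a1", "a2", 3),
                new Row("b1", "b2", 9),
                new Row("c1", "c2", 5));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(separate(sample())));
        System.out.println(Arrays.toString(merged(sample())));
    }
}
```

Both calls print `[b1, b2]` for the sample data. The merged variant performs one key comparison per row and keeps a single state, which is the CPU and memory saving the commit message describes.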
ca7fdb8 to
812a5ee
feilong-liu approved these changes Mar 27, 2026
This was referenced Mar 31, 2026
Description
Adds a new `MergeMinMaxByAggregations` optimizer rule that merges multiple `max_by` or `min_by` aggregations sharing the same comparison key into a single aggregation with a `ROW` argument and `DEREFERENCE` projections.
This is a takeover of #26873, addressing all reviewer feedback from @kaikalur and Sourcery.
Key Changes
- `MergeMinMaxByAggregations` class handling both `max_by` and `min_by` (per kaikalur review)
- `merge_max_by_and_min_by_aggregations` (per kaikalur review)
- `AggregationKey` includes `orderBy` to prevent merging aggregations with different ordering (per Sourcery review)
- `FunctionResolution` instead of string comparison (per Sourcery review)
- `LinkedHashMap` for deterministic plan ordering (per Sourcery review)
Example
Motivation and Context
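When a query contains multiple `max_by`/`min_by` aggregations with the same comparison key, each one independently scans and compares values. Merging them into a single `max_by(ROW(...), key)` call means the key is compared once per row instead of N times and a single aggregation state is kept instead of N.
Impact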
- `merge_max_by_and_min_by_aggregations` (boolean, default `false`)
Test Plan
- `TestMergeMinMaxByAggregations` covering positive rewrites (same key, GROUP BY, mixed aggs, multiple groups, both max_by+min_by, matching filters, different value types, DISTINCT) and negative cases (single agg, different keys, disabled, 3-arg, no max_by/min_by, different filters, mixed distinct)
- `AbstractTestQueries#testMergeMinMaxByAggregations` comparing results with optimization enabled vs disabled
Contributor checklist
Release Notes
Summary by Sourcery
Introduce an optimizer rule to merge compatible max_by and min_by aggregations sharing the same comparison key into a single aggregation and wire it behind a configurable session property.
New Features:
Documentation:
Tests: