Add optimizer to convert min_by/max_by to row number function #25190

feilong-liu · 2025-05-23T20:57:22Z

Description

This optimization converts queries like

select id, max(ds), max_by(feature1, ds), max_by(feature2, ds) from table group by id

to

select id, ds, feature1, feature2 from (select id, ds, feature1, feature2, row_number() over (partition by id order by ds desc) row_num from table) where row_num = 1

Here feature1 and feature2 are maps. The rewrite avoids the expensive aggregation over feature1 and feature2. This pattern is common in machine learning workloads that fetch the latest features for each id.
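
As a rough illustration of why the two query shapes agree, here is a minimal in-memory sketch in Java. It is not the Presto implementation: the class `MinMaxByRewriteSketch`, its `Row` type, and the sample data are hypothetical names made up for this example. It also assumes that `ds` is non-null and unique within each `id`; with ties on `ds`, either form may pick any of the tied rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch, not Presto code: compares the GROUP BY form with the row_number form.
final class MinMaxByRewriteSketch {
    /** One input row (id, ds, feature1, feature2); the feature columns are maps. */
    static final class Row {
        final String id;
        final String ds;
        final Map<String, Double> feature1;
        final Map<String, Double> feature2;

        Row(String id, String ds, Map<String, Double> feature1, Map<String, Double> feature2) {
            this.id = id;
            this.ds = ds;
            this.feature1 = feature1;
            this.feature2 = feature2;
        }
    }

    /** Models max_by(value, ds) ... GROUP BY id: one independent state per aggregate. */
    static <T> Map<String, T> maxBy(List<Row> rows, Function<Row, T> value) {
        Map<String, Row> winner = new TreeMap<>();
        for (Row r : rows) {
            // Keep the row with the larger ds for this id.
            winner.merge(r.id, r, (a, b) -> b.ds.compareTo(a.ds) > 0 ? b : a);
        }
        Map<String, T> out = new TreeMap<>();
        winner.forEach((id, r) -> out.put(id, value.apply(r)));
        return out;
    }

    /** Original query shape: max(ds), max_by(feature1, ds), max_by(feature2, ds). */
    static List<String> viaAggregation(List<Row> rows) {
        Map<String, String> maxDs = maxBy(rows, r -> r.ds);
        Map<String, Map<String, Double>> f1 = maxBy(rows, r -> r.feature1);
        Map<String, Map<String, Double>> f2 = maxBy(rows, r -> r.feature2);
        List<String> out = new ArrayList<>();
        for (String id : maxDs.keySet()) {
            out.add(id + "|" + maxDs.get(id) + "|" + f1.get(id) + "|" + f2.get(id));
        }
        return out;
    }

    /** Rewritten shape: row_number() over (partition by id order by ds desc), keep row_num = 1. */
    static List<String> viaRowNumber(List<Row> rows) {
        Map<String, List<Row>> partitions = new TreeMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(r.id, k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (List<Row> partition : partitions.values()) {
            partition.sort(Comparator.comparing((Row r) -> r.ds).reversed());
            int rowNum = 0;
            for (Row r : partition) {
                rowNum++;
                if (rowNum == 1) {
                    out.add(r.id + "|" + r.ds + "|" + r.feature1 + "|" + r.feature2);
                }
            }
        }
        return out;
    }

    /** Made-up sample data: three snapshots for u1, one for u2. */
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("u1", "2025-05-01", Collections.singletonMap("a", 1.0), Collections.singletonMap("b", 10.0)),
                new Row("u1", "2025-05-03", Collections.singletonMap("a", 3.0), Collections.singletonMap("b", 30.0)),
                new Row("u2", "2025-05-02", Collections.singletonMap("a", 5.0), Collections.singletonMap("b", 50.0)),
                new Row("u1", "2025-05-02", Collections.singletonMap("a", 2.0), Collections.singletonMap("b", 20.0)));
    }

    public static void main(String[] args) {
        System.out.println(viaAggregation(sampleRows()));
        System.out.println(viaRowNumber(sampleRows()));
    }
}
```

In this sketch the aggregation form keeps a separate state for every aggregate, while the row_number form picks one winning row per id and reads all columns from it.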

Motivation and Context

Query optimization to reduce cost.

Impact

Query optimization to reduce cost.

Test Plan

Unit tests

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add a new optimization `MinMaxByToWindowFunction` that rewrites `min_by`/`max_by` aggregations to use the `row_number` window function

presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java

jaystarshot · 2025-06-02T20:52:27Z

Maybe I am missing something, but in
select id, max(ds), max_by(feature1, ds), max_by(feature2, ds) from table group by id
feature1/feature2 are not aggregated, right? (They just fetch the feature from the max ds row?) So I don't understand this part:

Here feature1, feature2 are maps. This rewrite can avoid the expensive cost of aggregations on feature1 and feature2.

feilong-liu · 2025-06-02T22:31:38Z

Correct. Here is the definition of max_by(x, y) -> [same as x] (https://prestodb.io/docs/current/functions/aggregate.html#max_by-x-y-same-as-x): "Returns the value of x associated with the maximum value of y over all input values."
This is commonly used in feature selection, i.e. to select the latest feature in a table for a user.

feilong-liu · 2025-06-02T22:33:51Z

By "the expensive cost of aggregations" I mean the processing of the feature maps in the accumulator. Although it is not semantically aggregating them, it can still be expensive, especially for large maps.

jaystarshot · 2025-06-03T18:33:27Z

...e/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MinMaxByToWindowFunction.java

+import static com.facebook.presto.sql.relational.Expressions.comparisonExpression;
+import static com.google.common.collect.ImmutableMap.toImmutableMap;
+
+public class MinMaxByToWindowFunction


Can you add a small comment explaining the plan changes?

Sure, will add in a separate PR.

#25249 @jaystarshot

jaystarshot · 2025-06-03T18:33:44Z

presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java

    private int eagerPlanValidationThreadPoolSize = 20;
    private boolean innerJoinPushdownEnabled;
    private boolean inEqualityJoinPushdownEnabled;
+    private boolean rewriteMinMaxByToTopNEnabled;


Can this be on by default?

I guess row_number adds sorting, so it might not always be efficient, but if your performance numbers show otherwise, can we turn it on by default?

I want to be conservative for now. I will consider enabling it by default after getting more stats for this optimizer.

feilong-liu requested review from a team, elharo, jaystarshot and vivek-bharathan as code owners May 23, 2025 20:57

feilong-liu requested a review from ZacBlanco May 23, 2025 20:57

prestodb-ci added the from:Meta label May 23, 2025

feilong-liu marked this pull request as draft May 23, 2025 20:57

feilong-liu requested a review from kaikalur May 23, 2025 20:57

feilong-liu force-pushed the feature_dedup branch 3 times, most recently from dbc5e09 to f1cd3c3 Compare May 28, 2025 05:24

feilong-liu marked this pull request as ready for review May 28, 2025 16:36

feilong-liu requested a review from rschlussel May 28, 2025 16:36

kaikalur requested changes May 28, 2025

presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java

feilong-liu force-pushed the feature_dedup branch from f1cd3c3 to d928aef Compare May 28, 2025 23:47

Add optimizer to convert min_by/max_by to row number function

655ae6e

feilong-liu force-pushed the feature_dedup branch from d928aef to 655ae6e Compare May 28, 2025 23:51

feilong-liu requested a review from kaikalur May 28, 2025 23:51

kaikalur approved these changes Jun 2, 2025

feilong-liu requested a review from hantangwangd June 2, 2025 19:29

jaystarshot reviewed Jun 3, 2025

jaystarshot approved these changes Jun 3, 2025

feilong-liu merged commit 0729c1d into prestodb:master Jun 4, 2025
97 checks passed

feilong-liu deleted the feature_dedup branch June 4, 2025 00:08

feilong-liu mentioned this pull request Jun 4, 2025

Add description for MinMaxByToWindowFunction optimizer #25249

Merged

6 tasks

unidevel mentioned this pull request Jun 19, 2025

Add release notes for 0.294 unix280/presto#33

Closed

6 tasks

feilong-liu mentioned this pull request Jun 25, 2025

Extend max_by/min_by optimization to mix of array/map/scalar type #25435

Merged

6 tasks

This was referenced Jul 4, 2025

Add release notes for 0.294 unix280/presto#35

Merged

Add release notes for 0.294 unix280/presto#36

Closed

Add release notes for 0.294 unix280/presto#37

Closed

This was referenced Jul 24, 2025

Add release notes for 0.294 unix280/presto#39

Merged

Add release notes for 0.294 unix280/presto#40

Merged

prestodb-ci mentioned this pull request Jul 28, 2025

Add release notes for 0.294 #25633

Merged

6 tasks
