Fix min_by/max_by for window function evaluation #21793

Merged
feilong-liu merged 1 commit into prestodb:master from feilong-liu:fix_min_max_by
Jan 30, 2024


@feilong-liu
Contributor

@feilong-liu feilong-liu commented Jan 26, 2024

Description

Fix issue #21653

Motivation and Context

As described in #21653, min_by/max_by do not return the expected results because of an optimization in window function evaluation. A similar bug in the min/max functions was fixed in #18615.

Impact

Fixes a correctness issue in max_by/min_by when they are evaluated as window functions.
Queries that match both of the following conditions are affected:

  • The max_by/min_by call takes three arguments, i.e. it includes the n argument, as in max_by(x, y, n). This is because the two-argument max_by(x, y) has a different implementation from max_by(x, y, n).
  • The frame definition does not use “unbounded following”. This is because the optimization in window function evaluation only applies when the end of the frame changes between rows in the same partition.
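The second condition can be illustrated with a simplified, hypothetical model (not Presto's code). For a frame such as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, the frame end advances one row at a time, so a single accumulator is reused: only the newly included row is added, and a result is output for every row. The class `MaxByN` and its destructive `output()` are illustrative assumptions; the sketch shows that if output empties a top-n heap, every later row in the partition is computed from a truncated state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of incremental window evaluation for max_by(x, y, n);
// names are illustrative, not Presto's actual classes.
public class IncrementalWindowSketch {

    // Keeps the n (x, y) pairs with the largest y in a min-heap keyed on y.
    static final class MaxByN {
        private final int n;
        private final PriorityQueue<int[]> heap =
                new PriorityQueue<int[]>((a, b) -> Integer.compare(a[1], b[1]));

        MaxByN(int n) {
            this.n = n;
        }

        void add(int x, int y) {
            heap.add(new int[] {x, y});
            if (heap.size() > n) {
                heap.poll(); // evict the pair with the smallest y
            }
        }

        // Destructive output: popping every entry leaves the heap empty.
        List<Integer> output() {
            List<Integer> result = new ArrayList<>();
            while (!heap.isEmpty()) {
                result.add(0, heap.poll()[0]); // prepend, so the largest y comes first
            }
            return result;
        }
    }

    // Optimized path: one state is reused across rows and only the new row is added.
    public static List<List<Integer>> incremental(int[] xs, int[] ys, int n) {
        MaxByN state = new MaxByN(n);
        List<List<Integer>> rows = new ArrayList<>();
        for (int row = 0; row < xs.length; row++) {
            state.add(xs[row], ys[row]);
            rows.add(state.output());
        }
        return rows;
    }

    // Reference path: a fresh state per frame, so a destructive output is harmless.
    public static List<List<Integer>> fromScratch(int[] xs, int[] ys, int n) {
        List<List<Integer>> rows = new ArrayList<>();
        for (int row = 0; row < xs.length; row++) {
            MaxByN fresh = new MaxByN(n);
            for (int i = 0; i <= row; i++) {
                fresh.add(xs[i], ys[i]);
            }
            rows.add(fresh.output());
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3};
        int[] ys = {10, 30, 20};
        System.out.println("incremental:  " + incremental(xs, ys, 2)); // [[1], [2], [3]]
        System.out.println("from scratch: " + fromScratch(xs, ys, 2)); // [[1], [2, 1], [2, 3]]
    }
}
```

In this model the incremental path diverges from the from-scratch path starting at the second row, which matches the kind of incorrect result the issue describes.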

Note that this change also slightly degrades the performance of these aggregation functions. The fix restores the accumulator state after outputting the result, which is unnecessary when the function is not used as a window function. Avoiding that cost would require the implementation to know whether it is running inside a window function, which makes the code complex and is not worth the benefit.
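A minimal sketch of the "restore the state after output" idea, assuming a top-n heap keyed on y that is drained to produce the result. `RestoringMaxByN` and its methods are hypothetical names for illustration, not Presto's classes; the re-insertion at the end of `output()` is the extra work the paragraph above refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of restoring the accumulator after output for max_by(x, y, n);
// names are illustrative, not Presto's actual classes.
public class RestoringMaxByN {
    private final int n;
    private final PriorityQueue<int[]> heap =
            new PriorityQueue<int[]>((a, b) -> Integer.compare(a[1], b[1]));

    public RestoringMaxByN(int n) {
        this.n = n;
    }

    public void add(int x, int y) {
        heap.add(new int[] {x, y});
        if (heap.size() > n) {
            heap.poll(); // evict the pair with the smallest y
        }
    }

    public int size() {
        return heap.size();
    }

    // Pops entries in ascending y to build the result, then pushes them back,
    // so the heap holds the same entries as before the call.
    public List<Integer> output() {
        List<int[]> popped = new ArrayList<>();
        while (!heap.isEmpty()) {
            popped.add(heap.poll());
        }
        List<Integer> result = new ArrayList<>();
        for (int i = popped.size() - 1; i >= 0; i--) {
            result.add(popped.get(i)[0]); // largest y first
        }
        heap.addAll(popped); // restore step: extra work a plain aggregation does not need
        return result;
    }

    // Incremental evaluation over a frame whose end advances one row at a time.
    public static List<List<Integer>> runningMaxBy(int[] xs, int[] ys, int n) {
        RestoringMaxByN state = new RestoringMaxByN(n);
        List<List<Integer>> rows = new ArrayList<>();
        for (int row = 0; row < xs.length; row++) {
            state.add(xs[row], ys[row]);
            rows.add(state.output());
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(runningMaxBy(new int[] {1, 2, 3}, new int[] {10, 30, 20}, 2));
        // prints [[1], [2, 1], [2, 3]]
    }
}
```

Because `output()` leaves the heap with the same entries, its size is unchanged after the call, so memory accounting based on the heap contents would not need an update.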

Test Plan

Tested with the query in the issue.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix a bug in min_by/max_by when used as window functions, where results were incorrect if the function specifies the number of elements to keep and the window frame does not have “unbounded following”.

@feilong-liu feilong-liu requested a review from a team as a code owner January 26, 2024 06:34
@feilong-liu feilong-liu marked this pull request as draft January 26, 2024 06:34
Contributor Author

The heap is restored after the change, so the memory usage does not need to be updated.

Contributor Author

Restore states

@feilong-liu feilong-liu marked this pull request as ready for review January 26, 2024 20:52
Contributor

@pranjalssh pranjalssh left a comment

LGTM

@feilong-liu feilong-liu merged commit 288abe7 into prestodb:master Jan 30, 2024
@feilong-liu feilong-liu deleted the fix_min_max_by branch January 30, 2024 19:16
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024