Optimize map_normalize #22211

Merged
kaikalur merged 1 commit into prestodb:master from kaikalur:optimize_map_normalize
Mar 15, 2024

@kaikalur kaikalur commented Mar 14, 2024

Description

Optimize the MapNormalize function so that it does not call reduce for every element.

Motivation and Context

This function is a SQL function, and it calls a (nested) reduce on the values array for every element. We don't currently optimize nested lambdas via common subexpression elimination (CSE), see #22214, so we pull the reduce out and compute it only once for performance.
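The effect of pulling the reduce out can be illustrated with a small Python analogy. This is only a sketch of the idea, not the Presto SQL body of the function: `normalize_naive` and `normalize_hoisted` are hypothetical names, and a plain `dict` of floats stands in for a Presto map.

```python
from typing import Dict


def normalize_naive(m: Dict[str, float]) -> Dict[str, float]:
    # The sum is recomputed inside the per-element step, so a map with
    # n entries performs n full passes over the values (O(n^2) work).
    return {k: v / sum(m.values()) for k, v in m.items()}


def normalize_hoisted(m: Dict[str, float]) -> Dict[str, float]:
    # The sum is computed once, outside the per-element step (O(n) work).
    total = sum(m.values())
    return {k: v / total for k, v in m.items()}
```

Both versions return the same result for the same input; only the amount of repeated work differs, which is the kind of saving the description above is after.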

Impact

Improved UDF performance

Test Plan

Existing tests cover this function. Also added a couple of tests for NaN/Infinity/null results.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Optimized `map_normalize` builtin SQL UDF to avoid repeated reduce computation

@kaikalur kaikalur requested a review from a team as a code owner March 14, 2024 22:12
@kaikalur kaikalur requested a review from presto-oss March 14, 2024 22:12
@kaikalur kaikalur force-pushed the optimize_map_normalize branch 2 times, most recently from be7d9d7 to 1b1f005 Compare March 14, 2024 22:26
@mbasmanova mbasmanova changed the title Optimize map normalize Optimize map_normalize Mar 14, 2024
mbasmanova previously approved these changes Mar 14, 2024

@mbasmanova mbasmanova left a comment

Thanks. Please update the commit message to use map_normalize and add details about the optimization.

CC: @rschlussel @amitkdutta

@rschlussel

Can you also add tests for all the cases that @mbasmanova was trying in #22209, and update the documentation accordingly?

Also add a release note for improving the performance of map_normalize.

@kaikalur kaikalur force-pushed the optimize_map_normalize branch from 2ebc854 to a9625fa Compare March 15, 2024 19:03
@kaikalur kaikalur requested a review from rschlussel March 15, 2024 19:05

@rschlussel rschlussel left a comment

Looks good. The doc updates about the behavior when the values sum to 0 are missing, but I'm not sure that's important if we're also deprecating this function and moving it to internal.
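For readers wondering what "values sum to 0" can produce, here is a hedged Python sketch. The thread does not state the exact results of map_normalize in this case; the sketch only assumes that each value is divided by the total using plain IEEE 754 double division, which is consistent with the NaN/Infinity tests mentioned in the test plan. `ieee_divide` and `normalize_ieee` are hypothetical helpers, needed because Python itself raises `ZeroDivisionError` on float division by zero.

```python
import math
from typing import Dict


def ieee_divide(numerator: float, denominator: float) -> float:
    # Emulates IEEE 754 double division, which never raises on a zero
    # denominator (assumption: this is the arithmetic in play).
    if denominator != 0.0:
        return numerator / denominator
    if numerator == 0.0 or math.isnan(numerator):
        return math.nan  # 0/0 and NaN/0 are NaN
    # Non-zero over zero: infinity whose sign combines both operand signs.
    sign = math.copysign(1.0, numerator) * math.copysign(1.0, denominator)
    return sign * math.inf


def normalize_ieee(m: Dict[str, float]) -> Dict[str, float]:
    total = sum(m.values())  # computed once, as in the optimization
    return {k: ieee_divide(v, total) for k, v in m.items()}
```

Under these assumptions, a map whose values cancel out, such as `{"a": 1.0, "b": -1.0}`, would normalize to positive and negative infinity, and a map of all zeros would normalize to NaN for every key.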

3 participants