Skip to content

Optimize repeat function by creating runlength block#21984

Merged
pranjalssh merged 1 commit intoprestodb:masterfrom
feilong-liu:repeat_opt
Feb 29, 2024
Merged

Optimize repeat function by creating runlength block#21984
pranjalssh merged 1 commit intoprestodb:masterfrom
feilong-liu:repeat_opt

Conversation

@feilong-liu
Copy link
Copy Markdown
Contributor

@feilong-liu feilong-liu commented Feb 21, 2024

Description

Currently repeat function implementation is to write the result block multiple times. However this is not efficient, since the result block is a repeat of the same value multiple times, we can create a RunLengthBlock which is more efficient, both in memory and construction cost.

Motivation and Context

To improve performance of repeat function.
Also the result block created by repeat function will have small memory size, and it will be evaluated during query compilation (otherwise it will be skipped as the result block is too big

return value instanceof RowExpression || (isSupportedLiteralType(type) && estimatedSizeInBytes(value) <= MAX_SERIALIZABLE_OBJECT_SIZE);
)

Impact

Improve performance for repeat function. repeat(1, 1000) shows 35% performance improvement.

Current:

Benchmark                          (repeatArgument)  Mode  Cnt      Score       Error  Units
BenchmarkRepeatFunction.benchmark              1000  avgt   20  60774.020 ± 14228.873  ns/op

After optimization

Benchmark                          (repeatArgument)  Mode  Cnt      Score      Error  Units
BenchmarkRepeatFunction.benchmark              1000  avgt   20  39639.608 ± 8549.023  ns/op

Test Plan

Add unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve repeat function to create RunLengthEncodedBlock to improve performance

@feilong-liu feilong-liu requested a review from a team as a code owner February 21, 2024 23:50
@feilong-liu feilong-liu marked this pull request as draft February 21, 2024 23:50
@feilong-liu feilong-liu force-pushed the repeat_opt branch 4 times, most recently from e69a0c5 to d1b60d1 Compare February 22, 2024 03:27
@feilong-liu feilong-liu changed the title Repeat opt Optimize repeat function by creating runlength block Feb 22, 2024
@feilong-liu feilong-liu force-pushed the repeat_opt branch 2 times, most recently from 65504c0 to c2911e4 Compare February 22, 2024 05:45
@feilong-liu feilong-liu marked this pull request as ready for review February 22, 2024 05:47
Comment on lines 1708 to 1709
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions now succeeds because it will not exceed the memory limit after outputting RunLengthEncodedBlock.

kaikalur
kaikalur previously approved these changes Feb 22, 2024
Copy link
Copy Markdown
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

@kaikalur
Copy link
Copy Markdown
Contributor

Also can we do some similar for SEQUENCE - using literal encoder?

mlyublena
mlyublena previously approved these changes Feb 22, 2024
Copy link
Copy Markdown
Contributor

@mlyublena mlyublena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

rschlussel
rschlussel previously approved these changes Feb 22, 2024
Copy link
Copy Markdown
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it need to be < 10K?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is intentional to avoid generating too large array when this function was added. Since we never heard complaint about it, I think it's reasonable to keep this limit here.

@feilong-liu feilong-liu dismissed stale reviews from rschlussel, mlyublena, and kaikalur via c501027 February 22, 2024 18:29
@feilong-liu feilong-liu force-pushed the repeat_opt branch 2 times, most recently from c501027 to f72450c Compare February 22, 2024 18:31
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test for non constant function arguments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try non-const args here as well

@steveburnett
Copy link
Copy Markdown
Contributor

Nit: suggest changing the first word of the release note entry from "Optimize" to "Improve" following the Order of Changes in the [Release Note Guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines.

@feilong-liu feilong-liu force-pushed the repeat_opt branch 2 times, most recently from 4f0b6f9 to 48f3c99 Compare February 26, 2024 22:07
Copy link
Copy Markdown
Contributor

@pranjalssh pranjalssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff

@pranjalssh pranjalssh self-requested a review February 29, 2024 21:49
@pranjalssh pranjalssh merged commit 61794f0 into prestodb:master Feb 29, 2024
@feilong-liu feilong-liu deleted the repeat_opt branch February 29, 2024 23:01
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants