Improve performance of aggregation operator #19425 (Draft)
fgwang7w wants to merge 8 commits into prestodb:master
Conversation
Cherry-pick of trinodb/trino@0a70468 Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@301ff47 Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
If the number of combinations of all dictionaries in a page is below a certain threshold, we can store the results in a small array and reuse the groups already found. Cherry-pick of trinodb/trino@ffd1ee8 Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
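The small-array idea above can be sketched roughly as follows. This is an illustrative standalone sketch, not Presto's actual `MultiChannelGroupByHash` code: the class name, the threshold, and the sequential group-id assignment (a stand-in for the real hash-table lookup) are all assumptions.

```java
import java.util.Arrays;

public class DictionaryGroupIdCache {
    // Assumed threshold; the real implementation picks its own limit
    static final int MAX_CACHED_COMBINATIONS = 1024;

    // Resolve group ids for a dictionary-encoded block: each distinct
    // dictionary id is looked up once, then reused from the small cache.
    public static int[] groupIdsForDictionaryBlock(int[] rawIds, int dictionarySize) {
        int[] cache = new int[dictionarySize];
        Arrays.fill(cache, -1); // -1 means "group not resolved yet"
        int nextGroupId = 0;
        int[] result = new int[rawIds.length];
        for (int i = 0; i < rawIds.length; i++) {
            int id = rawIds[i];
            if (cache[id] == -1) {
                // Stand-in for the expensive hash-table probe; done once per dictionary id
                cache[id] = nextGroupId++;
            }
            result[i] = cache[id];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 0, 2, 1, 0};
        // Only 3 hash lookups happen for 6 rows
        System.out.println(Arrays.toString(groupIdsForDictionaryBlock(ids, 3)));
    }
}
```

With a dictionary of 3 entries and 6 positions, only 3 group resolutions are needed; the remaining rows hit the cache.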
For simplicity and a tiny performance gain. Cherry-pick of trinodb/trino@7ec3bd0 Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
Cherry-pick of trinodb/trino@7ee53ea Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
Cherry-pick of trinodb/trino@27e0c32 Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
Previously the hash table capacity was checked on every row to see whether a rehash was needed. Now the input page is split into batches, it is assumed that every row in a batch will create a new group (which is rarely the case), and rehashing is done in advance, before the batch is processed. This may slightly increase the memory footprint for a small number of groups, but there is a small performance gain because the capacity is no longer checked on every row. Cherry-pick of trinodb/trino@88cd492 Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
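The batched pre-reserve described above can be sketched like this. This is a minimal illustration, not Presto's actual code: `BATCH_SIZE`, the field names, and the `boolean[]` stand-in for "this row creates a new group" are all assumptions for the demo.

```java
public class BatchedRehash {
    static final int BATCH_SIZE = 4; // tiny for the demo; real code uses a larger batch
    int capacity = 4;
    int groupCount = 0;
    int rehashes = 0;

    // Process rows in batches; isNewGroup simulates which rows create groups.
    void addPage(boolean[] isNewGroup) {
        int pos = 0;
        while (pos < isNewGroup.length) {
            int batch = Math.min(BATCH_SIZE, isNewGroup.length - pos);
            // Pessimistic: reserve as if every row in the batch creates a new group,
            // so no capacity check is needed inside the per-row loop.
            ensureCapacity(groupCount + batch);
            for (int i = 0; i < batch; i++) {
                if (isNewGroup[pos + i]) {
                    groupCount++; // no rehash can happen mid-batch
                }
            }
            pos += batch;
        }
    }

    void ensureCapacity(int needed) {
        while (capacity < needed) {
            capacity *= 2; // the rehash happens here, in advance of the batch
            rehashes++;
        }
    }

    public static void main(String[] args) {
        BatchedRehash h = new BatchedRehash();
        h.addPage(new boolean[]{true, false, true, true, false, true, true, false});
        System.out.println(h.groupCount + " groups, capacity " + h.capacity);
    }
}
```

The cost is that capacity may be doubled slightly earlier than strictly necessary, which matches the "slightly increased memory footprint" trade-off noted in the commit message.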
There's an off-by-one error in the check that can cause a failure when the page is empty. Cherry-pick of trinodb/trino@08db4fb Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
3cb3d2a to cfcbaae
Contributor
@tdcmeehan Do you know how we solve the CLA problems?
Member
Author
@tdcmeehan @yingsu00 Gentle ping. We still have unresolved CLA compliance issues and need the community's support to figure out how to get the check to pass. Thanks.
Reduce large long[] memory usage and improve group-by performance
For memory optimization:
In MultiChannelGroupByHash, for example, we are looking at roughly 64MB of long[] per hash table; with 15 concurrent instances that is 960MB of allocation that can be avoided.
Cherry-pick of trinodb/trino#9514
Cherry-pick of trinodb/trino#10965
Cherry-pick of trinodb/trino#12336
Cherry-pick of trinodb/trino#12597
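The 64MB * 15 = 960MB figure above follows from plain array sizing arithmetic. A minimal worked example (the 8M-slot capacity and the count of 15 concurrent instances are illustrative, taken from the numbers in the description; object headers are ignored):

```java
public class GroupByMemoryEstimate {
    // Bytes occupied by the elements of a long[] of the given length
    // (8 bytes per long; array object header ignored for this estimate).
    public static long longArrayBytes(int entries) {
        return (long) entries * Long.BYTES;
    }

    public static void main(String[] args) {
        int capacity = 8 * 1024 * 1024;            // 8M hash-table slots (illustrative)
        long perTable = longArrayBytes(capacity);  // 64 MiB for one long[]
        long total = perTable * 15;                // e.g. 15 concurrent drivers
        System.out.println(perTable / (1024 * 1024) + " MiB per table, "
                + total / (1024 * 1024) + " MiB total");
    }
}
```

That is, a single 8M-entry long[] already costs 64 MiB, so shrinking or eliminating such arrays pays off multiplied by the query's concurrency.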
Test result (sample query from tpcds-q10 with multiple grouping sets):

| Metric | Before | After |
|---|---|---|
| Peak User Memory | 11.37MB | 5.65MB |
| Peak Total Memory | 78.63MB | 61.71MB |
| Elapsed Time | 7.68s | 2.08s |
Performance test on TPC-H 1TB benchmark: