Improve performance of grouping operations#18106
Merged
dain merged 12 commits into trinodb:master (Aug 19, 2023)
Conversation
Member: fyi @lukasz-stec @radek-starburst
lukasz-stec reviewed on Jul 3, 2023
Force-pushed from d0de237 to b0148aa
electrum approved these changes on Aug 18, 2023
core/trino-spi/src/main/java/io/trino/spi/function/FlatFixed.java
Outdated
core/trino-spi/src/main/java/io/trino/spi/function/InvocationConvention.java
Outdated
core/trino-spi/src/main/java/io/trino/spi/type/AbstractVariableWidthType.java
Outdated
Member: Does this need to be moved somewhere?

Member (Author): No. In the old days, type operators were limited to simple things like equals and hash code, which could only return a primitive, but with the introduction of the READ operator, the return type can be anything. This commit is the first case where we have a manually defined READ operator that needed this check removed.
Member: Please verify test code coverage for all of these cases.

Member (Author): There are existing tests that cover this codebase well.
...main/src/main/java/io/trino/operator/aggregation/builder/InMemoryHashAggregationBuilder.java
Outdated
core/trino-main/src/main/java/io/trino/SystemSessionProperties.java
Outdated
core/trino-main/src/test/java/io/trino/operator/BenchmarkGroupByHash.java
Outdated
ChannelSet is hard-coded to accept a simple page with the values in channel 0 and an optional precomputed hash in channel 1. This aligns with the assumptions of GroupByHash, so the hashChannels argument to the contains method is not needed.

GroupByHash now assumes that the input page contains only the group-by channels and an optional precomputed hash channel.
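To illustrate just the layout assumption (this is not the Trino API; `SimpleChannelSet` and the `long[][]` "page" model are invented for the example), a page can be modeled as an array of channels where channel 0 always holds the values, so no channel-selection argument is needed:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the fixed page layout assumed by ChannelSet/GroupByHash:
// channel 0 = group-by values, channel 1 = optional precomputed hash.
// A "page" is modeled as long[][] purely for illustration.
public class SimpleChannelSet
{
    private final Set<Long> values = new HashSet<>();

    // The page layout is fixed, so there is no hashChannels parameter:
    // values are always read from channel 0.
    public void addPage(long[][] page)
    {
        for (long value : page[0]) {
            values.add(value);
        }
    }

    public boolean contains(long value)
    {
        return values.contains(value);
    }
}
```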
Description
This PR adds a new flat memory calling convention and then uses this new capability to improve the performance of GroupByHash and some other custom data structures in Trino.
Please note that this is just the first working design for flat memory. There may be other, more optimal designs, but I believe this one is good enough for now.
FLAT and FLAT_RETURN calling conventions

Each type defines a new flat memory encoding which contains fixed-size data and optional variable-width data. The fixed-size data is always the same length for each type, and is declared by a method on the type. The variable-width data can be a different size for each value. In the current design the type is responsible for recording the offset and length of the variable-width data in the fixed-size data of the type. This design was chosen as it allows for C++-style optimizations where small variable-width data is stored directly in the fixed value section.
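The C++-style inlining mentioned above can be sketched as a toy encoding (the slot sizes, marker byte, and class name here are invented for the example and are not Trino's actual layout): a 13-byte fixed slot either inlines values of up to 12 bytes, or records the offset and length of the data in a separate variable-width buffer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of a flat encoding for a variable-width value. The fixed-size slot is
// 13 bytes: 1 length byte, followed by either up to 12 inlined value bytes or a
// 4-byte offset and 4-byte length pointing into the variable-width buffer.
public class FlatEncodingSketch
{
    static final int FIXED_SIZE = 13;
    static final int MAX_INLINE = 12;

    // Writes value into the fixed slot, spilling to variableData at
    // variableOffset if it does not fit inline. Returns the number of
    // variable-width bytes consumed.
    static int writeFlat(byte[] value, byte[] fixed, int fixedOffset, byte[] variableData, int variableOffset)
    {
        if (value.length <= MAX_INLINE) {
            fixed[fixedOffset] = (byte) value.length;
            System.arraycopy(value, 0, fixed, fixedOffset + 1, value.length);
            return 0;
        }
        fixed[fixedOffset] = -1; // marker: value is stored out of line
        ByteBuffer.wrap(fixed, fixedOffset + 1, 8).putInt(variableOffset).putInt(value.length);
        System.arraycopy(value, 0, variableData, variableOffset, value.length);
        return value.length;
    }

    static byte[] readFlat(byte[] fixed, int fixedOffset, byte[] variableData)
    {
        int length = fixed[fixedOffset];
        if (length >= 0) {
            // small value: bytes are inlined directly in the fixed slot
            return Arrays.copyOfRange(fixed, fixedOffset + 1, fixedOffset + 1 + length);
        }
        // large value: fixed slot holds a pointer into the variable-width buffer
        ByteBuffer pointer = ByteBuffer.wrap(fixed, fixedOffset + 1, 8);
        int offset = pointer.getInt();
        int valueLength = pointer.getInt();
        return Arrays.copyOfRange(variableData, offset, offset + valueLength);
    }
}
```

The fixed slot stays the same size for every value, which is what makes flat rows possible; only the spill buffer grows with the data.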
The `FLAT` argument convention has the following arguments:

- `@FlatFixed byte[] fixedSizeSlice`: the fixed-size data for the value
- `@FlatFixedOffset int fixedSizeOffset`: offset of the value in the fixed data
- `@FlatVariableWidth byte[] variableSizeSlice`: the variable-size data for the value

The `FLAT_RETURN` convention has the following arguments:

- `byte[] fixedSizeSlice`: the fixed-size data for the value
- `int fixedSizeOffset`: offset of the value in the fixed data
- `byte[] variableSizeSlice`: variable-width output
- `int variableSizeOffset`: starting offset of the variable-width data

A caller can use these calling conventions with the `READ_VALUE` operator to move data between stack values, blocks, and flat memory.
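A sketch of what operators matching these conventions might look like for a fixed-width, bigint-like type follows; the `@FlatFixed`-style annotations are shown only as comments so the example compiles without trino-spi on the classpath, and the class and method names are invented:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Sketch of READ_VALUE-style operators for a bigint-like fixed-width type in
// the FLAT and FLAT_RETURN shapes. In real code the parameters would carry
// @FlatFixed, @FlatFixedOffset, and @FlatVariableWidth annotations.
public class BigintFlatSketch
{
    private static final VarHandle LONG_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // FLAT argument convention: flat memory -> stack value.
    static long readFlat(
            byte[] fixedSizeSlice,    // @FlatFixed
            int fixedSizeOffset,      // @FlatFixedOffset
            byte[] variableSizeSlice) // @FlatVariableWidth (unused: bigint is fixed width)
    {
        return (long) LONG_HANDLE.get(fixedSizeSlice, fixedSizeOffset);
    }

    // FLAT_RETURN convention: stack value -> flat memory.
    static void writeFlat(
            long value,
            byte[] fixedSizeSlice,
            int fixedSizeOffset,
            byte[] variableSizeSlice,
            int variableSizeOffset) // unused for a fixed-width type
    {
        LONG_HANDLE.set(fixedSizeSlice, fixedSizeOffset, value);
    }
}
```

For a fixed-width type both variable-width parameters go unused; they exist so every type, fixed or variable width, can be called through the same convention.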
Flat values

Most types are fixed width, and the flat implementations are trivial. For fixed-width types, the type simply declares how many bytes it needs, and then implements `READ_VALUE` operators that convert to and from stack types. For fixed types that do not use a primitive stack type, the types typically add additional operators to convert to and from blocks for performance.

For variable-width, array, and map types, the design is a bit more complicated. The model does not define how variable-width memory is managed by the caller, but the calling convention does add some constraints. The `variableSizeSlice` must be large enough for the value. The `getFlatVariableWidthSize` method on the type is used to compute the length before the `READ_VALUE` operator is called. Since this method only works with a block and position, flat data can currently only be loaded from a block.
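The caller-side protocol — compute the variable-width size, allocate a large-enough buffer, then write — can be sketched like this. `VarcharLikeType`, `roundTrip`, and the method signatures are hypothetical stand-ins: the real SPI computes the size from a Block and position, not a raw byte array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the caller-side protocol for moving a variable-width value into
// flat memory: ask the type how much variable-width space the value needs,
// size the buffer accordingly, then call the write operator.
public class FlatWriteProtocol
{
    static class VarcharLikeType
    {
        int getFlatVariableWidthSize(byte[] value)
        {
            return value.length; // every byte goes to the variable-width section
        }

        void writeFlat(byte[] value, byte[] fixed, int fixedOffset, byte[] variable, int variableOffset)
        {
            // the fixed slot records the offset and length of the variable-width data
            ByteBuffer.wrap(fixed, fixedOffset, 8).putInt(variableOffset).putInt(value.length);
            System.arraycopy(value, 0, variable, variableOffset, value.length);
        }

        byte[] readFlat(byte[] fixed, int fixedOffset, byte[] variable)
        {
            ByteBuffer pointer = ByteBuffer.wrap(fixed, fixedOffset, 8);
            int offset = pointer.getInt();
            int length = pointer.getInt();
            return Arrays.copyOfRange(variable, offset, offset + length);
        }
    }

    static byte[] roundTrip(VarcharLikeType type, byte[] value)
    {
        byte[] fixed = new byte[8];
        // the variable-width buffer is sized BEFORE the write operator is called
        byte[] variable = new byte[type.getFlatVariableWidthSize(value)];
        type.writeFlat(value, fixed, 0, variable, 0);
        return type.readFlat(fixed, 0, variable);
    }
}
```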
Memory layout

In addition to the new `FlatGroupByHash`, this PR converts a few aggregation data structures to a flat design. In general, the code uses the following design:

The map data structures use a highly modified, append-only version of Swiss tables.

For variable-width data, all implementations use the `VariableWidthData` class, which manages a list of allocated chunks for the data. Except for the histogram classes, all implementations are append-only, so the implementation is simple. For the histogram code, I use a simple copy collector which rewrites the pointers in the histogram during collection.
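The chunked approach can be sketched as an append-only allocator that writes each value into the current chunk and opens a new one when full. This is an illustration of the idea only, not the actual VariableWidthData implementation; the class name, `Slot` record, and chunk size are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an append-only chunked allocator in the spirit of VariableWidthData:
// values are appended into fixed-capacity byte[] chunks, and the caller gets
// back the chunk plus the offset where its bytes were written.
public class ChunkedAllocator
{
    private static final int CHUNK_SIZE = 1024;

    public record Slot(byte[] chunk, int offset) {}

    private final List<byte[]> chunks = new ArrayList<>();
    private int currentOffset = CHUNK_SIZE; // forces a chunk on first allocation

    public Slot allocate(byte[] value)
    {
        if (value.length > CHUNK_SIZE) {
            throw new IllegalArgumentException("value is larger than a chunk");
        }
        if (currentOffset + value.length > CHUNK_SIZE) {
            // current chunk is full: open a new one (append-only, nothing is freed)
            chunks.add(new byte[CHUNK_SIZE]);
            currentOffset = 0;
        }
        byte[] chunk = chunks.get(chunks.size() - 1);
        int offset = currentOffset;
        System.arraycopy(value, 0, chunk, offset, value.length);
        currentOffset += value.length;
        return new Slot(chunk, offset);
    }

    public int chunkCount()
    {
        return chunks.size();
    }
}
```

Because allocations are never freed individually, there is no fragmentation to manage; only a copying pass (as in the histogram code) would need to rewrite slots.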
Benchmark

For the benchmark below, I started with `BenchmarkGroupByHash`, but I split the hash build phase from the reading phase as they have different performance characteristics.
Benchmark cases: varchar, bigint, and `BigintGroupByHash` (`BigintGroupByHash` is still used where possible).

Class: BenchmarkGroupByHash
GroupCount: 3,000,000
Warmup: 5
Run: 5
Forks: 1
Units: ns/op
JVM: JDK 17.0.4, OpenJDK 64-Bit Server VM, 17.0.4+8-LTS
CPU: Apple M1 Max
Benchmarked methods: addPages, writeData
Release notes
(X) Release notes are required, with the following suggested text: