Improve performance of grouping operations#18106

Merged
dain merged 12 commits into trinodb:master from
dain:flat
Aug 19, 2023
@dain dain commented Jul 3, 2023

Description

This PR adds a new flat memory calling convention and then uses this new capability to improve the performance of GroupByHash and some other custom data structures in Trino.

Please note that this is just the first working design for flat memory. There may be other, more optimal designs, but I believe this one is good enough for now.

FLAT and FLAT_RETURN calling convention

Each type defines a new flat memory encoding that contains fixed-size data and optional variable-width data. The fixed-size data always has the same length for a given type, and that length is declared by a method on the type. The variable-width data can be a different size for each value. In the current design, the type is responsible for recording the offset and length of the variable-width data in its fixed-size data. This design was chosen because it allows C++-style optimizations in which small variable-width data is stored directly in the fixed-size section.

The FLAT argument convention has the following arguments:

  • @FlatFixed byte[] fixedSizeSlice: the fixed sized data for the value
  • @FlatFixedOffset int fixedSizeOffset: offset of the value in the fixed data
  • @FlatVariableWidth byte[] variableSizeSlice: the variable size data for the value

The FLAT_RETURN convention has the following arguments:

  • byte[] fixedSizeSlice: the fixed sized data for the value
  • int fixedSizeOffset: offset of the value in the fixed data
  • byte[] variableSizeSlice: variable width output
  • int variableSizeOffset: starting offset of the variable width data

A caller can use these calling conventions with the READ_VALUE operator to move data between stack values, blocks, and flat memory.
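As a rough illustration of the two argument shapes, here is a minimal sketch of a varchar-like value written with FLAT_RETURN-shaped arguments and read back with FLAT-shaped arguments. The class name, the method names, and the 8-byte fixed layout (length followed by offset) are assumptions made for this example and are not taken from the PR; the real operators are bound through annotations and method handles, which the sketch omits.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Trino code: the fixed part records where the bytes live.
final class FlatVarcharSketch
{
    private static final VarHandle INT_HANDLE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Assumed fixed layout: 4-byte length, then 4-byte offset into the variable data
    static final int FIXED_SIZE = Integer.BYTES * 2;

    // FLAT_RETURN shape: the caller supplies both destinations and the variable-width start offset
    static void writeFlat(
            byte[] value,
            byte[] fixedSizeSlice,
            int fixedSizeOffset,
            byte[] variableSizeSlice,
            int variableSizeOffset)
    {
        INT_HANDLE.set(fixedSizeSlice, fixedSizeOffset, value.length);
        INT_HANDLE.set(fixedSizeSlice, fixedSizeOffset + Integer.BYTES, variableSizeOffset);
        System.arraycopy(value, 0, variableSizeSlice, variableSizeOffset, value.length);
    }

    // FLAT shape: no variable offset argument, because the fixed part already recorded it
    static byte[] readFlat(byte[] fixedSizeSlice, int fixedSizeOffset, byte[] variableSizeSlice)
    {
        int length = (int) INT_HANDLE.get(fixedSizeSlice, fixedSizeOffset);
        int offset = (int) INT_HANDLE.get(fixedSizeSlice, fixedSizeOffset + Integer.BYTES);
        return Arrays.copyOfRange(variableSizeSlice, offset, offset + length);
    }

    public static void main(String[] args)
    {
        byte[] fixed = new byte[FIXED_SIZE];
        byte[] variable = new byte[64];
        writeFlat("flat".getBytes(StandardCharsets.UTF_8), fixed, 0, variable, 10);
        System.out.println(new String(readFlat(fixed, 0, variable), StandardCharsets.UTF_8));
    }
}
```

Note that the read side needs only three arguments: the offset and length were recorded in the fixed data at write time, which matches the responsibility the design assigns to the type.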

Flat values

Most types are fixed width, and the flat implementations are trivial. For fixed width types, the type simply declares how many bytes it needs, and then implements READ_VALUE operators that convert to and from stack types. For fixed types that do not use a primitive stack type, the types typically add additional operators to convert to and from blocks for performance.
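A minimal sketch of what the trivial fixed-width case could look like for a bigint-like type follows. The names fixedSize, readFlat, and writeFlat are hypothetical and chosen for this example; the little-endian byte order is also an assumption.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch, not Trino code: a fixed-width type needs only its fixed bytes.
final class FlatBigintSketch
{
    private static final VarHandle LONG_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // The type declares how many fixed bytes each value occupies
    static int fixedSize()
    {
        return Long.BYTES;
    }

    // Flat memory to stack value; the variable-width array is unused for this type
    static long readFlat(byte[] fixedSizeSlice, int fixedSizeOffset, byte[] unusedVariableSizeSlice)
    {
        return (long) LONG_HANDLE.get(fixedSizeSlice, fixedSizeOffset);
    }

    // Stack value to flat memory; the variable-width arguments are ignored
    static void writeFlat(
            long value,
            byte[] fixedSizeSlice,
            int fixedSizeOffset,
            byte[] unusedVariableSizeSlice,
            int unusedVariableSizeOffset)
    {
        LONG_HANDLE.set(fixedSizeSlice, fixedSizeOffset, value);
    }

    public static void main(String[] args)
    {
        byte[] fixed = new byte[fixedSize() * 2];
        writeFlat(42L, fixed, fixedSize(), null, 0);
        System.out.println(readFlat(fixed, fixedSize(), null));
    }
}
```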

For variable width, array, and map types, the design is a bit more complicated. The model does not define how the caller manages variable width memory, but the calling convention does add some constraints. The variableSizeSlice must be large enough to hold the value. The getFlatVariableWidthSize method on the type is used to compute the required length before the READ_VALUE operator is called. Since this method only works with a block and position, flat data can currently only be loaded from a block.

Memory layout

In addition to the new FlatGroupByHash, this PR converts a few of the aggregation data structures to a flat design. In general, the code uses the following design:

  • an array of fixed sized records is encoded in a byte array
    • (optional) pointer to the variable width data for the record
    • field 0
    • field 1
    • ...
  • records are grouped into chunks of 1024
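The addressing this layout implies can be sketched as follows, assuming each chunk is one byte array holding 1024 fixed-size records. The class and method names are hypothetical, and the per-record field layout is left out.

```java
import java.util.Arrays;

// Hypothetical sketch, not Trino code: fixed-size records stored in chunks of 1024.
final class ChunkedRecordsSketch
{
    static final int RECORDS_PER_CHUNK = 1024;

    private final int recordSize;
    private byte[][] chunks = new byte[0][];
    private int recordCount;

    ChunkedRecordsSketch(int recordSize)
    {
        this.recordSize = recordSize;
    }

    // Append-only: reserves space for one more record and returns its index
    int allocateRecord()
    {
        int chunkIndex = recordCount / RECORDS_PER_CHUNK;
        if (chunkIndex == chunks.length) {
            // Growing adds a new chunk; existing records are never copied
            chunks = Arrays.copyOf(chunks, chunks.length + 1);
            chunks[chunkIndex] = new byte[RECORDS_PER_CHUNK * recordSize];
        }
        return recordCount++;
    }

    // The byte array that holds the record
    byte[] chunkFor(int recordIndex)
    {
        return chunks[recordIndex / RECORDS_PER_CHUNK];
    }

    // Byte offset of the record inside its chunk
    int offsetFor(int recordIndex)
    {
        return (recordIndex % RECORDS_PER_CHUNK) * recordSize;
    }
}
```

One plausible motivation for chunking is that growth allocates a new chunk rather than reallocating and copying one large array; the PR description does not state the reason, so treat that as a guess.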

The map data structures use a highly modified append only version of swiss tables.
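For readers unfamiliar with Swiss tables, the core idea is a separate array of one-byte control entries that hold a few hash bits per slot, so most non-matching slots are rejected without touching the key storage. The sketch below is a deliberately simplified, append-only set of long keys that checks one control byte at a time; real Swiss tables compare a whole group of control bytes at once, and none of the names or the hash function here come from the PR.

```java
import java.util.Arrays;

// Hypothetical sketch, not Trino code: append-only open addressing with control bytes.
final class ControlByteTableSketch
{
    // High bit set marks an empty slot; stored tags always have the high bit clear
    private static final byte EMPTY = (byte) 0x80;

    private final byte[] control;
    private final long[] keys;
    private final int mask;
    private int size;

    ControlByteTableSketch(int capacity)
    {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        control = new byte[capacity];
        Arrays.fill(control, EMPTY);
        keys = new long[capacity];
        mask = capacity - 1;
    }

    // Returns true if the key was added, false if it was already present
    boolean add(long key)
    {
        if (size == mask) {
            // Keep one slot empty so probing always terminates; this sketch never resizes
            throw new IllegalStateException("table is full");
        }
        int slot = find(key);
        if (control[slot] != EMPTY) {
            return false;
        }
        control[slot] = tag(mix(key));
        keys[slot] = key;
        size++;
        return true;
    }

    boolean contains(long key)
    {
        return control[find(key)] != EMPTY;
    }

    // Linear probe: returns the slot holding the key, or the first empty slot
    private int find(long key)
    {
        long hash = mix(key);
        byte tag = tag(hash);
        int slot = (int) (hash >>> 7) & mask;
        while (control[slot] != EMPTY) {
            // The cheap tag comparison filters most mismatches before the key is read
            if (control[slot] == tag && keys[slot] == key) {
                return slot;
            }
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private static byte tag(long hash)
    {
        return (byte) (hash & 0x7F);
    }

    private static long mix(long key)
    {
        key *= 0x9E3779B97F4A7C15L;
        return key ^ (key >>> 32);
    }
}
```

Because nothing is ever deleted, the sketch needs no tombstone state, which is one simplification an append-only table allows.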

For variable width data, all implementations use the VariableWidthData class, which manages a list of allocated chunks for the data. Except for the histogram classes, all implementations are append only, so the implementation is simple. For the histogram code, I use a simple copy collector which rewrites the pointers in the histogram during collection.
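The append-only case can be sketched as a bump allocator over a list of chunks. This is an illustration of the idea only: the class below is not VariableWidthData, and its name, its methods, and the pointer packing (chunk index in the high 32 bits, offset in the low 32 bits) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Trino code: append-only storage for variable width bytes.
final class VariableWidthChunksSketch
{
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int openChunkOffset;

    VariableWidthChunksSketch(int chunkSize)
    {
        this.chunkSize = chunkSize;
    }

    // Reserves length bytes and returns a packed pointer to them
    long allocate(int length)
    {
        if (chunks.isEmpty() || openChunkOffset + length > chunks.get(chunks.size() - 1).length) {
            // Start a new chunk; oversized values get a chunk of their own size
            chunks.add(new byte[Math.max(chunkSize, length)]);
            openChunkOffset = 0;
        }
        long pointer = ((long) (chunks.size() - 1) << 32) | openChunkOffset;
        openChunkOffset += length;
        return pointer;
    }

    // The array to pass as the variable width slice
    byte[] chunk(long pointer)
    {
        return chunks.get((int) (pointer >>> 32));
    }

    // The starting offset of the value inside that array
    int offset(long pointer)
    {
        return (int) pointer;
    }
}
```

A caller would compute the value's variable width size first, call allocate with that size, and then hand the resulting chunk and offset to the write operator. Space left at the end of a chunk is simply abandoned in this sketch.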

Benchmark

For the benchmark below, I started with BenchmarkGroupByHash, but I split the hash build phase (addPages) from the reading phase (writeData) because they have different performance characteristics.

varchar

  • hash build is always faster and gets significantly faster with more hash columns
  • reading is a bit slower (~10 ns/op) for a single value, but improves and becomes slightly faster with more values
  • the speedup in hash build outweighs the slowdown in reading

bigint

  • for a single bigint, the flat version is significantly slower than the optimized BigintGroupByHash
  • the optimized BigintGroupByHash is still used where possible
  • at 5 bigint values, the flat version becomes faster for hash build
  • for reading, the flat version is always slower, but the speedup in hash build outweighs the slowdown

Class: BenchmarkGroupByHash
GroupCount: 3,000,000
Warmup: 5
Run: 5
Forks: 1
Units: ns/op
JVM: JDK 17.0.4, OpenJDK 64-Bit Server VM, 17.0.4+8-LTS
CPU: Apple M1 Max

addPages

Channels dataType flat hashEnabled Score Error
1 VARCHAR true true 103.811 ± 25.466
1 VARCHAR true false 117.130 ± 5.245
1 VARCHAR false true 121.904 ± 4.668
1 VARCHAR false false 172.735 ± 22.398
1 BIGINT true true 75.452 ± 9.512
1 BIGINT true false 71.511 ± 6.594
1 BIGINT false true 27.372 ± 1.713
1 BIGINT false false 27.248 ± 0.980
5 VARCHAR true true 220.168 ± 3.572
5 VARCHAR true false 276.091 ± 6.893
5 VARCHAR false true 410.091 ± 29.373
5 VARCHAR false false 637.439 ± 28.930
5 BIGINT true true 175.844 ± 24.282
5 BIGINT true false 202.078 ± 12.286
5 BIGINT false true 207.114 ± 12.812
5 BIGINT false false 361.983 ± 25.841
10 VARCHAR true true 322.373 ± 74.132
10 VARCHAR true false 496.402 ± 69.530
10 VARCHAR false true 999.048 ± 19.745
10 VARCHAR false false 1387.814 ± 34.134
10 BIGINT true true 209.812 ± 10.570
10 BIGINT true false 350.102 ± 20.242
10 BIGINT false true 432.306 ± 11.102
10 BIGINT false false 591.831 ± 16.321
15 VARCHAR true true 399.927 ± 48.456
15 VARCHAR true false 578.235 ± 27.031
15 VARCHAR false true 1560.639 ± 28.782
15 VARCHAR false false 2183.956 ± 66.250
15 BIGINT true true 261.694 ± 6.521
15 BIGINT true false 359.665 ± 18.320
15 BIGINT false true 716.843 ± 7.994
15 BIGINT false false 894.774 ± 52.457
20 VARCHAR true true 430.406 ± 6.087
20 VARCHAR true false 735.724 ± 18.891
20 VARCHAR false true 2139.748 ± 8.435
20 VARCHAR false false 2972.520 ± 25.457
20 BIGINT true true 305.419 ± 10.262
20 BIGINT true false 513.633 ± 10.311
20 BIGINT false true 1017.530 ± 10.245
20 BIGINT false false 1495.760 ± 22.450

writeData

Channels dataType flat hashEnabled Score Error
1 VARCHAR true true 17.504 ± 1.039
1 VARCHAR true false 12.487 ± 1.439
1 VARCHAR false true 7.436 ± 0.497
1 VARCHAR false false 5.580 ± 0.157
1 BIGINT true true 9.790 ± 1.703
1 BIGINT true false 7.555 ± 0.596
1 BIGINT false true 1.965 ± 0.015
1 BIGINT false false 1.192 ± 0.027
5 VARCHAR true true 38.004 ± 3.256
5 VARCHAR true false 35.391 ± 3.224
5 VARCHAR false true 31.115 ± 0.370
5 VARCHAR false false 27.668 ± 1.669
5 BIGINT true true 22.892 ± 1.297
5 BIGINT true false 21.485 ± 2.003
5 BIGINT false true 8.018 ± 0.162
5 BIGINT false false 6.764 ± 0.103
10 VARCHAR true true 63.050 ± 2.950
10 VARCHAR true false 62.337 ± 3.360
10 VARCHAR false true 63.038 ± 3.552
10 VARCHAR false false 54.750 ± 1.206
10 BIGINT true true 37.644 ± 5.509
10 BIGINT true false 35.823 ± 5.388
10 BIGINT false true 14.773 ± 0.243
10 BIGINT false false 13.649 ± 0.931
15 VARCHAR true true 90.741 ± 3.962
15 VARCHAR true false 92.005 ± 4.998
15 VARCHAR false true 100.702 ± 3.140
15 VARCHAR false false 89.698 ± 8.379
15 BIGINT true true 50.360 ± 8.749
15 BIGINT true false 49.332 ± 8.752
15 BIGINT false true 24.758 ± 1.775
15 BIGINT false false 21.928 ± 0.556
20 VARCHAR true true 113.357 ± 2.322
20 VARCHAR true false 113.714 ± 7.999
20 VARCHAR false true 144.485 ± 3.490
20 VARCHAR false false 129.185 ± 3.950
20 BIGINT true true 62.644 ± 7.935
20 BIGINT true false 64.959 ± 4.246
20 BIGINT false true 40.574 ± 2.499
20 BIGINT false false 40.378 ± 3.469
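To make the "speedup in hash build outweighs the slowdown in reading" claim concrete, the snippet below combines one pair of rows from the two tables: 20 BIGINT channels with hashEnabled=true. It assumes the addPages and writeData scores are measured per the same unit of work and can therefore be added, which the benchmark output does not state explicitly. The class and method names are made up for this example.

```java
// Hypothetical helper, not part of the benchmark: net ns/op saved by the flat version.
final class NetEffectSketch
{
    // Build saving minus read slowdown; a positive result means flat is faster overall
    static double netSavingNsPerOp(double buildOld, double buildFlat, double readOld, double readFlat)
    {
        return (buildOld - buildFlat) - (readFlat - readOld);
    }

    public static void main(String[] args)
    {
        // 20 BIGINT, hashEnabled=true: addPages 1017.530 (flat=false) vs 305.419 (flat=true),
        // writeData 40.574 (flat=false) vs 62.644 (flat=true)
        double net = netSavingNsPerOp(1017.530, 305.419, 40.574, 62.644);
        System.out.printf("net saving: %.3f ns/op%n", net);
    }
}
```

For these rows the build phase saves 712.111 ns/op and the read phase costs an extra 22.070 ns/op, a net saving of about 690 ns/op, which is far larger than the reported error margins on either row.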

Release notes

(X) Release notes are required, with the following suggested text:

# General
* Improve performance of grouping operations. ({issue}`issuenumber`)

@dain dain requested review from electrum and martint July 3, 2023 00:24
@cla-bot cla-bot bot added the cla-signed label Jul 3, 2023
@github-actions github-actions bot added tests:hive delta-lake Delta Lake connector hive Hive connector mongodb MongoDB connector labels Jul 3, 2023
@raunaqmorarka
Member

fyi @lukasz-stec @radek-starburst

Member

Does this need to be moved somewhere?

Member Author

No. In the old days, type operators were limited to simple things like equals and hashcode, which could only return a primitive, but with the introduction of the READ operator, the return type can be anything. This commit is the first case where we have manually defined a READ operator that needed this check removed.

Member

Please verify test code coverage for all of these cases

Member Author

There are existing tests that cover this codebase well

@dain dain merged commit 873e50e into trinodb:master Aug 19, 2023
Labels

cla-signed delta-lake Delta Lake connector hive Hive connector mongodb MongoDB connector

4 participants