Improve performance of grouping operations#18106
Merged
dain merged 12 commits into trinodb:master (Aug 19, 2023)
Conversation
Member: fyi @lukasz-stec @radek-starburst
lukasz-stec reviewed on Jul 3, 2023
Force-pushed from d0de237 to b0148aa
electrum approved these changes on Aug 18, 2023
core/trino-spi/src/main/java/io/trino/spi/function/FlatFixed.java
Outdated
core/trino-spi/src/main/java/io/trino/spi/function/InvocationConvention.java
Outdated
core/trino-spi/src/main/java/io/trino/spi/type/AbstractVariableWidthType.java
Outdated
Member: Does this need to be moved somewhere?

Member (Author): No. In the old days, type operators were limited to simple things like equals and hash code, which could only return a primitive, but with the introduction of the READ operator, the return type can be anything. This commit is the first case where we have a manually defined READ operator that needed this check removed.
Member: Please verify test code coverage for all of these cases.

Member (Author): There are existing tests that cover this codebase well.
...main/src/main/java/io/trino/operator/aggregation/builder/InMemoryHashAggregationBuilder.java
Outdated
core/trino-main/src/main/java/io/trino/SystemSessionProperties.java
Outdated
core/trino-main/src/test/java/io/trino/operator/BenchmarkGroupByHash.java
Outdated
ChannelSet is hard-coded to accept a simple page with the values in channel 0 and an optional precomputed hash in channel 1. This aligns with the assumptions of GroupByHash, so the hashChannels argument to the contains method is not needed.

GroupByHash now assumes that the input page contains only the group-by channels and an optional precomputed hash channel.
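To illustrate just the layout assumption (this is not the Trino API; `SimpleChannelSet` and the `long[][]` "page" model are invented for the example), a page can be modeled as an array of channels where channel 0 always holds the values, so no channel-selection argument is needed:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the fixed page layout assumed by ChannelSet/GroupByHash:
// channel 0 = group-by values, channel 1 = optional precomputed hash.
// A "page" is modeled as long[][] purely for illustration.
public class SimpleChannelSet
{
    private final Set<Long> values = new HashSet<>();

    // The page layout is fixed, so there is no hashChannels parameter:
    // values are always read from channel 0.
    public void addPage(long[][] page)
    {
        for (long value : page[0]) {
            values.add(value);
        }
    }

    public boolean contains(long value)
    {
        return values.contains(value);
    }
}
```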
Description
This PR adds a new flat memory calling convention and then uses this new capability to improve the performance of GroupByHash and some other custom data structures in Trino.
Please note that this is just the first working design for flat memory. There may be other, more optimal designs, but I believe this one is good enough for now.
FLAT and FLAT_RETURN calling conventions

Each type defines a new flat memory encoding which contains fixed-size data and optional variable-width data. The fixed-size data is always the same length for each type, and is declared by a method on the type. The variable-width data can be a different size for each value. In the current design the type is responsible for recording the offset and length of the variable-width data in the fixed-size data of the type. This design was chosen as it allows for C++-style optimizations where small variable-width data is stored directly in the fixed value section.
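The C++-style inlining mentioned above can be sketched as a toy encoding (the slot sizes, marker byte, and class name here are invented for the example and are not Trino's actual layout): a 13-byte fixed slot either inlines values of up to 12 bytes, or records the offset and length of the data in a separate variable-width buffer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of a flat encoding for a variable-width value. The fixed-size slot is
// 13 bytes: 1 length byte, followed by either up to 12 inlined value bytes or a
// 4-byte offset and 4-byte length pointing into the variable-width buffer.
public class FlatEncodingSketch
{
    static final int FIXED_SIZE = 13;
    static final int MAX_INLINE = 12;

    // Writes value into the fixed slot, spilling to variableData at
    // variableOffset if it does not fit inline. Returns the number of
    // variable-width bytes consumed.
    static int writeFlat(byte[] value, byte[] fixed, int fixedOffset, byte[] variableData, int variableOffset)
    {
        if (value.length <= MAX_INLINE) {
            fixed[fixedOffset] = (byte) value.length;
            System.arraycopy(value, 0, fixed, fixedOffset + 1, value.length);
            return 0;
        }
        fixed[fixedOffset] = -1; // marker: value is stored out of line
        ByteBuffer.wrap(fixed, fixedOffset + 1, 8).putInt(variableOffset).putInt(value.length);
        System.arraycopy(value, 0, variableData, variableOffset, value.length);
        return value.length;
    }

    static byte[] readFlat(byte[] fixed, int fixedOffset, byte[] variableData)
    {
        int length = fixed[fixedOffset];
        if (length >= 0) {
            // small value: bytes are inlined directly in the fixed slot
            return Arrays.copyOfRange(fixed, fixedOffset + 1, fixedOffset + 1 + length);
        }
        // large value: fixed slot holds a pointer into the variable-width buffer
        ByteBuffer pointer = ByteBuffer.wrap(fixed, fixedOffset + 1, 8);
        int offset = pointer.getInt();
        int valueLength = pointer.getInt();
        return Arrays.copyOfRange(variableData, offset, offset + valueLength);
    }
}
```

The fixed slot stays the same size for every value, which is what makes flat rows possible; only the spill buffer grows with the data.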
The `FLAT` argument convention has the following arguments:

- `@FlatFixed byte[] fixedSizeSlice`: the fixed-size data for the value
- `@FlatFixedOffset int fixedSizeOffset`: offset of the value in the fixed data
- `@FlatVariableWidth byte[] variableSizeSlice`: the variable-size data for the value

The `FLAT_RETURN` convention has the following arguments:

- `byte[] fixedSizeSlice`: the fixed-size data for the value
- `int fixedSizeOffset`: offset of the value in the fixed data
- `byte[] variableSizeSlice`: variable-width output
- `int variableSizeOffset`: starting offset of the variable-width data

A caller can use these calling conventions with the `READ_VALUE` operator to move data between stack values, blocks, and flat memory.
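A sketch of what operators matching these conventions might look like for a fixed-width, bigint-like type follows; the `@FlatFixed`-style annotations are shown only as comments so the example compiles without trino-spi on the classpath, and the class and method names are invented:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Sketch of READ_VALUE-style operators for a bigint-like fixed-width type in
// the FLAT and FLAT_RETURN shapes. In real code the parameters would carry
// @FlatFixed, @FlatFixedOffset, and @FlatVariableWidth annotations.
public class BigintFlatSketch
{
    private static final VarHandle LONG_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // FLAT argument convention: flat memory -> stack value.
    static long readFlat(
            byte[] fixedSizeSlice,    // @FlatFixed
            int fixedSizeOffset,      // @FlatFixedOffset
            byte[] variableSizeSlice) // @FlatVariableWidth (unused: bigint is fixed width)
    {
        return (long) LONG_HANDLE.get(fixedSizeSlice, fixedSizeOffset);
    }

    // FLAT_RETURN convention: stack value -> flat memory.
    static void writeFlat(
            long value,
            byte[] fixedSizeSlice,
            int fixedSizeOffset,
            byte[] variableSizeSlice,
            int variableSizeOffset) // unused for a fixed-width type
    {
        LONG_HANDLE.set(fixedSizeSlice, fixedSizeOffset, value);
    }
}
```

For a fixed-width type both variable-width parameters go unused; they exist so every type, fixed or variable width, can be called through the same convention.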
Flat values

Most types are fixed width, and the flat implementations are trivial. For fixed-width types, the type simply declares how many bytes it needs, and then implements `READ_VALUE` operators that convert to and from stack types. For fixed types that do not use a primitive stack type, the types typically add additional operators to convert to and from blocks for performance.

For variable-width, array, and map types, the design is a bit more complicated. The model does not define how variable-width memory is managed by the caller, but the calling convention does add some constraints. The `variableSizeSlice` must be large enough for the value. The `getFlatVariableWidthSize` method on the type is used to compute the length before the `READ_VALUE` operator is called. Since this method only works with a block and position, flat data can currently only be loaded from a block.
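The caller-side protocol — compute the variable-width size, allocate a large-enough buffer, then write — can be sketched like this. `VarcharLikeType`, `roundTrip`, and the method signatures are hypothetical stand-ins: the real SPI computes the size from a Block and position, not a raw byte array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the caller-side protocol for moving a variable-width value into
// flat memory: ask the type how much variable-width space the value needs,
// size the buffer accordingly, then call the write operator.
public class FlatWriteProtocol
{
    static class VarcharLikeType
    {
        int getFlatVariableWidthSize(byte[] value)
        {
            return value.length; // every byte goes to the variable-width section
        }

        void writeFlat(byte[] value, byte[] fixed, int fixedOffset, byte[] variable, int variableOffset)
        {
            // the fixed slot records the offset and length of the variable-width data
            ByteBuffer.wrap(fixed, fixedOffset, 8).putInt(variableOffset).putInt(value.length);
            System.arraycopy(value, 0, variable, variableOffset, value.length);
        }

        byte[] readFlat(byte[] fixed, int fixedOffset, byte[] variable)
        {
            ByteBuffer pointer = ByteBuffer.wrap(fixed, fixedOffset, 8);
            int offset = pointer.getInt();
            int length = pointer.getInt();
            return Arrays.copyOfRange(variable, offset, offset + length);
        }
    }

    static byte[] roundTrip(VarcharLikeType type, byte[] value)
    {
        byte[] fixed = new byte[8];
        // the variable-width buffer is sized BEFORE the write operator is called
        byte[] variable = new byte[type.getFlatVariableWidthSize(value)];
        type.writeFlat(value, fixed, 0, variable, 0);
        return type.readFlat(fixed, 0, variable);
    }
}
```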
Memory layout

In addition to the new `FlatGroupByHash`, this PR converts a few aggregation data structures to a flat design. In general, the code uses the following design:

The map data structures use a highly modified, append-only version of Swiss tables.

For variable-width data, all implementations use the `VariableWidthData` class, which manages a list of allocated chunks for the data. Except for the histogram classes, all implementations are append-only, so the implementation is simple. For the histogram code, I use a simple copy collector which rewrites the pointers in the histogram during collection.
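The chunked approach can be sketched as an append-only allocator that writes each value into the current chunk and opens a new one when full. This is an illustration of the idea only, not the actual VariableWidthData implementation; the class name, `Slot` record, and chunk size are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an append-only chunked allocator in the spirit of VariableWidthData:
// values are appended into fixed-capacity byte[] chunks, and the caller gets
// back the chunk plus the offset where its bytes were written.
public class ChunkedAllocator
{
    private static final int CHUNK_SIZE = 1024;

    public record Slot(byte[] chunk, int offset) {}

    private final List<byte[]> chunks = new ArrayList<>();
    private int currentOffset = CHUNK_SIZE; // forces a chunk on first allocation

    public Slot allocate(byte[] value)
    {
        if (value.length > CHUNK_SIZE) {
            throw new IllegalArgumentException("value is larger than a chunk");
        }
        if (currentOffset + value.length > CHUNK_SIZE) {
            // current chunk is full: open a new one (append-only, nothing is freed)
            chunks.add(new byte[CHUNK_SIZE]);
            currentOffset = 0;
        }
        byte[] chunk = chunks.get(chunks.size() - 1);
        int offset = currentOffset;
        System.arraycopy(value, 0, chunk, offset, value.length);
        currentOffset += value.length;
        return new Slot(chunk, offset);
    }

    public int chunkCount()
    {
        return chunks.size();
    }
}
```

Because allocations are never freed individually, there is no fragmentation to manage; only a copying pass (as in the histogram code) would need to rewrite slots.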
Benchmark

For the benchmark below, I started with `BenchmarkGroupByHash`, but I split the hash build phase from the reading phase as they have different performance characteristics.
Benchmark cases: varchar, bigint, and `BigintGroupByHash` (`BigintGroupByHash` is still used where possible).

Class: BenchmarkGroupByHash
GroupCount: 3,000,000
Warmup: 5
Run: 5
Forks: 1
Units: ns/op
JVM: JDK 17.0.4, OpenJDK 64-Bit Server VM, 17.0.4+8-LTS
CPU: Apple M1 Max
Benchmarked methods: addPages, writeData
Release notes
(X) Release notes are required, with the following suggested text: