Optimize null representation in encoded VariableBlockWidthBlock#15760
Merged
raunaqmorarka merged 1 commit intotrinodb:masterfrom Jan 24, 2023
Conversation
lukasz-stec
reviewed
Jan 18, 2023
Member
lukasz-stec
left a comment
There was a problem hiding this comment.
mostly lgtm, I would make sure that test cover cases with and without nulls
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
Outdated
Show resolved
Hide resolved
60d4498 to
fff7d51
Compare
Contributor
Author
|
micro benchmark results: deserialization is slower as expected but serialization is much faster (relatively). In total, it should be even faster. macrobenchmark: ( macrobenchmark: ( |
Member
raunaqmorarka
left a comment
There was a problem hiding this comment.
Please add results of TPC benchmarks as well
core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java
Outdated
Show resolved
Hide resolved
fff7d51 to
8581782
Compare
core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java
Outdated
Show resolved
Hide resolved
8fbc6af to
ac3ad6f
Compare
raunaqmorarka
approved these changes
Jan 23, 2023
ac3ad6f to
b283d87
Compare
Currently, encoding of VariableBlockWidthBlock writes offsets (4 bytes per position) for every position, regardless of nullability of the position. Instead of that, it is sufficient to write lengths of non-null positions and null array. From that it is possible to get offsets. Benchmark (nullChance) Mode Cnt Score Error Units Before BenchmarkBlockSerde.deserializeSliceDirect 0 avgt 10 3.223 ± 0.068 ns/op 2.716 ± 0.150 ns/op BenchmarkBlockSerde.deserializeSliceDirect .01 avgt 10 4.499 ± 0.103 ns/op 3.725 ± 0.084 ns/op BenchmarkBlockSerde.deserializeSliceDirect .10 avgt 10 5.180 ± 0.159 ns/op 3.471 ± 0.075 ns/op BenchmarkBlockSerde.deserializeSliceDirect .50 avgt 10 5.819 ± 0.176 ns/op 2.678 ± 0.040 ns/op BenchmarkBlockSerde.deserializeSliceDirect .90 avgt 10 2.100 ± 0.050 ns/op 1.813 ± 0.017 ns/op BenchmarkBlockSerde.deserializeSliceDirect .99 avgt 10 1.104 ± 0.019 ns/op 1.553 ± 0.024 ns/op BenchmarkBlockSerde.serializeSliceDirect 0 avgt 10 2.324 ± 0.051 ns/op 5.436 ± 0.104 ns/op BenchmarkBlockSerde.serializeSliceDirect .01 avgt 10 3.360 ± 0.026 ns/op 5.900 ± 0.021 ns/op BenchmarkBlockSerde.serializeSliceDirect .10 avgt 10 4.127 ± 0.060 ns/op 6.453 ± 0.101 ns/op BenchmarkBlockSerde.serializeSliceDirect .50 avgt 10 2.870 ± 0.040 ns/op 4.948 ± 0.145 ns/op BenchmarkBlockSerde.serializeSliceDirect .90 avgt 10 2.740 ± 0.102 ns/op 4.954 ± 0.090 ns/op BenchmarkBlockSerde.serializeSliceDirect .99 avgt 10 1.951 ± 0.071 ns/op 4.315 ± 0.052 ns/op
b283d87 to
22b1aaa
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Currently, when block
VariableWidthBlockis encoded it is writing an array of offsets for all positions regardless of the fact whether a position is null or not. Instead we could save the lengths only for non-null positions and compute offsets from an array of lengths and the array of nullability (array that determines whether position isnullor not).This change should be tested by
io.trino.spi.block.TestVariableWidthBlockEncoding.The difference was tested on query with different value of
Xto have a control on null frequencyResults (cumulative size of exchanged GB via network)
Let's wait for benchmarks results.
Release notes
(*) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: