Optimize null representation in encoded VariableBlockWidthBlock by radek-kondziolka · Pull Request #15760 · trinodb/trino

radek-kondziolka · 2023-01-18T15:46:14Z

Description

Currently, when a VariableWidthBlock is encoded, it writes an array of offsets for all positions, regardless of whether a position is null. Instead, we could write the lengths only for non-null positions and compute the offsets from the array of lengths and the nullability array (the array that records whether each position is null).
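To illustrate the idea, here is a minimal sketch of how offsets could be rebuilt from the non-null lengths and the nullability array. It is not the code from this PR: the class and method names are hypothetical, and it assumes an offsets array with `positionCount + 1` entries in which a null position has zero length.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not Trino's actual implementation.
class OffsetsFromLengths {
    // Rebuilds the offsets array (positionCount + 1 entries) from the lengths
    // of non-null positions and a per-position null flag.
    static int[] computeOffsets(int[] nonNullLengths, boolean[] isNull) {
        int positionCount = isNull.length;
        int[] offsets = new int[positionCount + 1];
        int lengthIndex = 0;
        for (int position = 0; position < positionCount; position++) {
            // A null position contributes no bytes, so its offset does not advance.
            int length = isNull[position] ? 0 : nonNullLengths[lengthIndex++];
            offsets[position + 1] = offsets[position] + length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        boolean[] isNull = {false, true, false, true};
        int[] nonNullLengths = {3, 5};
        // prints [0, 3, 3, 8, 8]
        System.out.println(Arrays.toString(computeOffsets(nonNullLengths, isNull)));
    }
}
```

Only two lengths are stored for four positions here; the two null positions are recovered from the null flags alone.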

This change should be tested by io.trino.spi.block.TestVariableWidthBlockEncoding.

The difference was measured with the following query, varying the value of X to control the null frequency:

with cs as (select *,  case when rand(100) < X then null else '0' end nullek from catalog_sales cs)
SELECT  count_if(cs.nullek is null), count(*), cs.nullek FROM cs
RIGHT JOIN call_center cc ON cc.cc_call_center_sk =  cs.cs_call_center_sk
GROUP BY 3
ORDER BY 1, 2;

Results (cumulative size of data exchanged over the network, in GB)

           No nulls   50% nulls   99% nulls
baseline   28GB       27GB        26GB
change     28GB       24GB        21GB

Let's wait for the benchmark results.

Release notes

(*) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

lukasz-stec

Mostly LGTM. I would make sure that the tests cover cases both with and without nulls.

core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java

radek-kondziolka · 2023-01-20T10:36:04Z

micro benchmark results:

Benchmark                                   (nullChance)  Mode  Cnt  Score   Error  Units   Before (Score ± Error, Units)
BenchmarkBlockSerde.deserializeSliceDirect             0  avgt   10  3.223 ± 0.068  ns/op   2.716 ± 0.150  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .01  avgt   10  4.499 ± 0.103  ns/op   3.725 ± 0.084  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .10  avgt   10  5.180 ± 0.159  ns/op   3.471 ± 0.075  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .50  avgt   10  5.819 ± 0.176  ns/op   2.678 ± 0.040  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .90  avgt   10  2.100 ± 0.050  ns/op   1.813 ± 0.017  ns/op   
BenchmarkBlockSerde.deserializeSliceDirect           .99  avgt   10  1.104 ± 0.019  ns/op   1.553 ± 0.024  ns/op
BenchmarkBlockSerde.serializeSliceDirect               0  avgt   10  2.324 ± 0.051  ns/op   5.436 ± 0.104  ns/op
BenchmarkBlockSerde.serializeSliceDirect             .01  avgt   10  3.360 ± 0.026  ns/op   5.900 ± 0.021  ns/op
BenchmarkBlockSerde.serializeSliceDirect             .10  avgt   10  4.127 ± 0.060  ns/op   6.453 ± 0.101  ns/op
BenchmarkBlockSerde.serializeSliceDirect             .50  avgt   10  2.870 ± 0.040  ns/op   4.948 ± 0.145  ns/op
BenchmarkBlockSerde.serializeSliceDirect             .90  avgt   10  2.740 ± 0.102  ns/op   4.954 ± 0.090  ns/op
BenchmarkBlockSerde.serializeSliceDirect             .99  avgt   10  1.951 ± 0.071  ns/op   4.315 ± 0.052  ns/op

Deserialization is slower, as expected, but serialization is much faster in relative terms. Combined, the two should still come out faster overall.

Macrobenchmark (tpch/tpcds, orc, partitioned, sf1000):

wall time tpch:  -1.49%
wall time tpcds: -1.05% 
cpu time tpch: -2.97% 
cpu time tpcds: -2.64%

Macrobenchmark (tpch/tpcds, orc, unpartitioned, sf1000):

wall time tpch:  -2.37%
wall time tpcds: -1.68% 
cpu time tpch: -1.94% 
cpu time tpcds: -0.72%
network bytes tpch: 0
network bytes tpcds: -0.05%

raunaqmorarka

Please add results of TPC benchmarks as well

core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java

Currently, the encoding of VariableWidthBlock writes offsets (4 bytes per position) for every position, regardless of whether the position is null. Instead, it is sufficient to write the lengths of the non-null positions together with the null array; the offsets can be derived from those.
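As a rough illustration of the size effect described in the commit message, the sketch below collects lengths only for non-null values and compares the bytes needed for per-position offsets against per-non-null lengths. The class and method names are hypothetical, the lengths are assumed to be written as 4-byte ints, and the null flags are not counted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch for illustration only; not Trino's actual encoder.
class LengthsOnlyEncodingSketch {
    // Collects the UTF-8 byte length of every non-null value, skipping nulls.
    static int[] nonNullLengths(String[] values) {
        return Arrays.stream(values)
                .filter(Objects::nonNull)
                .mapToInt(value -> value.getBytes(StandardCharsets.UTF_8).length)
                .toArray();
    }

    // One 4-byte offset for every position, null or not.
    static long offsetBytes(int positionCount) {
        return 4L * positionCount;
    }

    // Assumption: one 4-byte length for every non-null position.
    static long lengthBytes(int nonNullCount) {
        return 4L * nonNullCount;
    }

    public static void main(String[] args) {
        String[] values = {"0", null, null, "0"};
        int[] lengths = nonNullLengths(values);
        System.out.println(Arrays.toString(lengths));     // prints [1, 1]
        System.out.println(offsetBytes(values.length));   // prints 16
        System.out.println(lengthBytes(lengths.length));  // prints 8
    }
}
```

With no nulls the two layouts need the same number of bytes under these assumptions, which is consistent with the unchanged 28GB figure in the no-nulls column of the results table.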

cla-bot bot added the cla-signed label Jan 18, 2023

radek-kondziolka requested review from lukasz-stec and raunaqmorarka January 18, 2023 15:59

radek-kondziolka marked this pull request as ready for review January 18, 2023 15:59

lukasz-stec reviewed Jan 18, 2023


raunaqmorarka reviewed Jan 19, 2023


radek-kondziolka force-pushed the rk/try_more_optimal_null_representation_in_varchar_block branch 2 times, most recently from 60d4498 to fff7d51 Compare January 20, 2023 10:03

radek-kondziolka requested review from lukasz-stec and raunaqmorarka January 20, 2023 10:36

raunaqmorarka reviewed Jan 20, 2023


core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java (outdated, resolved)

radek-kondziolka force-pushed the rk/try_more_optimal_null_representation_in_varchar_block branch from fff7d51 to 8581782 Compare January 20, 2023 12:19

raunaqmorarka reviewed Jan 20, 2023


core/trino-main/src/test/java/io/trino/execution/buffer/TestPagesSerde.java (outdated, resolved)

radek-kondziolka force-pushed the rk/try_more_optimal_null_representation_in_varchar_block branch 4 times, most recently from 8fbc6af to ac3ad6f Compare January 23, 2023 12:42

raunaqmorarka added the performance label Jan 23, 2023

raunaqmorarka approved these changes Jan 23, 2023


radek-kondziolka force-pushed the rk/try_more_optimal_null_representation_in_varchar_block branch from ac3ad6f to b283d87 Compare January 23, 2023 16:44

github-actions bot added the tests:hive label Jan 23, 2023

radek-kondziolka force-pushed the rk/try_more_optimal_null_representation_in_varchar_block branch from b283d87 to 22b1aaa Compare January 24, 2023 08:13

raunaqmorarka merged commit ca1ab5f into trinodb:master Jan 24, 2023

github-actions bot added this to the 406 milestone Jan 24, 2023

colebow mentioned this pull request Jan 25, 2023

Add Trino 406 release notes #15625

Merged

pettyjamesm mentioned this pull request Jan 27, 2023

Improve VariableWidthBlock deserialization #15883

Merged

pettyjamesm mentioned this pull request Nov 20, 2025

Improve VariableWidthBlock encoding #27377

Merged
