Skip to content

Conversation

@pettyjamesm
Copy link
Member

@pettyjamesm pettyjamesm commented Jan 27, 2023

Description

Follows up from #15760 to further optimize VariableWidthBlockEncoding#readBlock to avoid allocating an unnecessary additional lengths array, since the final offsets array can temporarily store the serialized lengths of non-null positions and transform them back into valid offsets in-place.

Avoiding the extra array allocation does not reduce CPU throughput, and generally also improve as a result of being able to trigger CMOV optimizations a special path for the "no-nulls" case.

Benchmarks:

Benchmark                                  (nulls)  Mode  Before         After          Units 
BenchmarkBlockSerde.deserializeSliceDirect       0  avgt  4.174 ± 0.455  3.700 ± 0.092  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .01  avgt  4.943 ± 0.340  4.575 ± 0.330  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .10  avgt  5.471 ± 0.329  5.252 ± 0.078  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .50  avgt  6.730 ± 0.815  3.592 ± 0.167  ns/op <- CMOV
BenchmarkBlockSerde.deserializeSliceDirect     .90  avgt  2.636 ± 0.265  2.646 ± 0.260  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .99  avgt  1.303 ± 0.068  1.708 ± 0.038  ns/op

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Jan 27, 2023
@pettyjamesm pettyjamesm force-pushed the further-improve-variablewidthblock-encoding branch 2 times, most recently from 19f13fa to ade22a2 Compare January 27, 2023 21:42
Copy link
Contributor

@radek-kondziolka radek-kondziolka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please run microbenchmarks

@pettyjamesm pettyjamesm force-pushed the further-improve-variablewidthblock-encoding branch 3 times, most recently from 250f2fb to cc6286f Compare January 30, 2023 19:28
@sopel39 sopel39 requested a review from raunaqmorarka January 31, 2023 11:21
@pettyjamesm pettyjamesm force-pushed the further-improve-variablewidthblock-encoding branch from cc6286f to 82b6d42 Compare January 31, 2023 23:48
@pettyjamesm
Copy link
Member Author

@raunaqmorarka - let me know if you have any questions when you get a chance to review this PR.

@raunaqmorarka
Copy link
Member

raunaqmorarka commented Feb 9, 2023

I haven't been able to reproduce improvements from this PR on my laptop
edit: managed to get similar results on my system as well after re-running

@radek-starburst could you please run this and post results from your system as well ?

@radek-kondziolka
Copy link
Contributor

radek-kondziolka commented Feb 9, 2023

My results:

OpenJDK 64-Bit Server VM Zulu17.36+13-CA (build 17.0.4+8-LTS, mixed mode, sharing)


                   
Benchmark                                                  (nullChance)  Mode  Cnt   Before          After          Units
BenchmarkBlockSerde.deserializeSliceDirect             0  avgt   10   3.550 ± 0.296  ns/op   3.188 ± 0.125  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .01  avgt   10   4.860 ± 0.105  ns/op    4.655 ± 0.188  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .10  avgt   10   5.443 ± 0.162  ns/op    4.991 ± 0.161  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .50  avgt   10   6.312 ± 0.295  ns/op    3.530 ± 0.029  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .90  avgt   10   2.482 ± 0.098  ns/op   2.522 ± 0.054  ns/op
BenchmarkBlockSerde.deserializeSliceDirect           .99  avgt   10   1.288 ± 0.036  ns/op   1.618 ± 0.029  ns/op

Avoids allocating an unnecessary additional array allocation when
deserializing VariableWidthBlocks, since we can use the final offsets
array to temporarily store the serialized lengths values and transform
them back into valid offsets in place.

Benchmark                                  (nulls)  Mode  Before         After          Units
BenchmarkBlockSerde.deserializeSliceDirect       0  avgt  4.174 ± 0.455  3.700 ± 0.092  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .01  avgt  4.943 ± 0.340  4.575 ± 0.330  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .10  avgt  5.471 ± 0.329  5.252 ± 0.078  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .50  avgt  6.730 ± 0.815  3.592 ± 0.167  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .90  avgt  2.636 ± 0.265  2.646 ± 0.260  ns/op
BenchmarkBlockSerde.deserializeSliceDirect     .99  avgt  1.303 ± 0.068  1.708 ± 0.038  ns/op
@pettyjamesm pettyjamesm force-pushed the further-improve-variablewidthblock-encoding branch from 82b6d42 to 3adf7c3 Compare February 9, 2023 16:16
@raunaqmorarka raunaqmorarka merged commit 513974c into trinodb:master Feb 10, 2023
@github-actions github-actions bot added this to the 407 milestone Feb 10, 2023
@pettyjamesm pettyjamesm deleted the further-improve-variablewidthblock-encoding branch February 10, 2023 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants