Produce RunLengthEncodedBlock in VariableWidthBlockBuilder when all values are null #12043

Merged
sopel39 merged 1 commit into trinodb:master from starburstdata:ls/011-varchar-has-non-null-row on Apr 21, 2022

@lukasz-stec
Member

Description

Add support in VariableWidthBlockBuilder for producing RunLengthEncodedBlock if all positions are null.
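
A minimal sketch of the idea, for illustration only. It does not use the Trino SPI: `Block`, `FlatBlock`, `RleBlock`, and `VarWidthBuilder` are hypothetical, simplified stand-ins. The only name taken from this PR is the `hasNonNullValue` flag from its original title. The builder tracks whether any non-null value was appended and, if none was, `build()` returns a run-length-encoded null block instead of a flat one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Trino's block types; not the real SPI classes.
final class NullRleSketch {
    interface Block {
        int getPositionCount();
        boolean isNull(int position);
    }

    // Flat block: stores one entry per position.
    static final class FlatBlock implements Block {
        private final String[] values;

        FlatBlock(String[] values) { this.values = values; }

        public int getPositionCount() { return values.length; }
        public boolean isNull(int position) { return values[position] == null; }
    }

    // RLE block: stores a single value plus a repeat count.
    static final class RleBlock implements Block {
        private final String value; // null means "every position is null"
        private final int positionCount;

        RleBlock(String value, int positionCount) {
            this.value = value;
            this.positionCount = positionCount;
        }

        public int getPositionCount() { return positionCount; }
        public boolean isNull(int position) { return value == null; }
    }

    static final class VarWidthBuilder {
        private final List<String> values = new ArrayList<>();
        private boolean hasNonNullValue; // flipped on the first non-null append

        VarWidthBuilder append(String value) {
            values.add(value);
            hasNonNullValue |= value != null;
            return this;
        }

        VarWidthBuilder appendNull() {
            values.add(null);
            return this;
        }

        Block build() {
            if (!hasNonNullValue) {
                // No non-null value was seen: one null repeated for every position.
                return new RleBlock(null, values.size());
            }
            return new FlatBlock(values.toArray(new String[0]));
        }
    }

    public static void main(String[] args) {
        Block allNull = new VarWidthBuilder().appendNull().appendNull().build();
        Block mixed = new VarWidthBuilder().appendNull().append("a").build();
        System.out.println(allNull.getClass().getSimpleName());
        System.out.println(mixed.getClass().getSimpleName());
    }
}
```

In this sketch the check is a single boolean test at build time, so the all-null case needs no extra pass over the data.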

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

only VariableWidthBlockBuilder class

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 20, 2022
@lukasz-stec lukasz-stec requested a review from sopel39 April 20, 2022 08:09
@findepi
Member

findepi commented Apr 20, 2022

> Add support in VariableWidthBlockBuilder for producing RunLengthEncodedBlock if all positions are null.

Or should the caller be responsible to do this instead?
If not the caller, are you planning on adding similar functionality to all the block builders?

cc @dain

@lukasz-stec
Member Author

> > Add support in VariableWidthBlockBuilder for producing RunLengthEncodedBlock if all positions are null.
>
> Or should the caller be responsible to do this instead? If not the caller, are you planning on adding similar functionality to all the block builders?
>
> cc @dain

It's already there for most of the BlockBuilders (all simple types except this one).
I'm adding RowBlockBuilder support in another PR.

@findepi findepi changed the title from "Add hasNonNullValue support to VariableWidthBlockBuilder" to "Produce RunLengthEncodedBlock in VariableWidthBlockBuilder when all values are null" Apr 20, 2022
Member

assertIsNullRle

@lukasz-stec lukasz-stec force-pushed the ls/011-varchar-has-non-null-row branch from 024f412 to 936a885 Compare April 20, 2022 09:16
Member

@sopel39 sopel39 left a comment

% comments

Member

testBuildProducesNullRle

Member

getRegion doesn't really copy anything.

Maybe just inline these two test methods and call the result testBuilderProducesNullRleForNullRows.

Member Author

OK, I merged the methods into testBuilderProducesNullRleForNullRows.

@lukasz-stec lukasz-stec force-pushed the ls/011-varchar-has-non-null-row branch from 936a885 to 62d6a8a Compare April 20, 2022 09:39
@lukasz-stec lukasz-stec force-pushed the ls/011-varchar-has-non-null-row branch from 62d6a8a to e947ea6 Compare April 20, 2022 13:26
@lukasz-stec
Member Author

io.trino.execution.buffer.TestPagesSerde#testVarcharSerializedSize was failing. I fixed the test, but it's interesting that for blocks containing only nulls, up to roughly 300 positions, a normal flat block serializes to fewer bytes than an RLE block (because of the RLE overhead plus the good compression of the flat block).
This means the change could, in theory, increase network traffic in some special cases (i.e. small, all-null blocks).
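
A rough way to see why a small all-null flat block compresses so well is sketched below. This is only an illustration under loud assumptions: it uses `java.util.zip.Deflater` as a stand-in compressor and a plain array of identical bytes as a stand-in for the null flags, neither of which is the codec or wire format used by Trino's page serialization. `NullBlockSizeSketch` and `compressedSize` are hypothetical names. It will not reproduce the ~300-position crossover mentioned above.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical illustration; not Trino's PagesSerde or its compressor.
final class NullBlockSizeSketch {
    // Compressed size in bytes of `positions` identical bytes,
    // modeling the null flags of an all-null flat block.
    static int compressedSize(int positions) {
        byte[] input = new byte[positions];
        Arrays.fill(input, (byte) 1); // every position marked as null

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();

        byte[] buffer = new byte[positions + 64];
        int size = 0;
        while (!deflater.finished()) {
            size += deflater.deflate(buffer); // buffer is reused; only the size matters
        }
        deflater.end();
        return size;
    }

    public static void main(String[] args) {
        for (int positions : new int[] {10, 100, 300, 1000}) {
            System.out.println(positions + " positions -> " + compressedSize(positions) + " compressed bytes");
        }
    }
}
```

Because a run of identical bytes compresses to a size that grows very slowly with its length, a fixed per-block overhead for the RLE representation can outweigh the savings when the block is small.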
