Push DictionaryBlock through remote partitioned exchange#14937
Push DictionaryBlock through remote partitioned exchange#14937raunaqmorarka merged 2 commits intotrinodb:masterfrom
Conversation
5124e8a to
fea1f25
Compare
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I need to calculate getRetainedSizeInBytes. For that, the actual size of the array is needed and IntArrayList does not expose that info.
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I need to calculate getRetainedSizeInBytes. For that, the actual size of the array is needed and IntArrayList does not expose that info.
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Rename this to DictionaryAwarePositionsAppender. We usually just have DictionaryAwareXXX which handles both RLE and dicts
There was a problem hiding this comment.
I would leave the name unchanged. Unnesting is a more important function of this class. Handling dictionaries is an additional feature.
There was a problem hiding this comment.
Unnesting is a more important function of this class.
Unnesting became almost irrelevant at this point since now RLE and dicts cannot be nested
There was a problem hiding this comment.
Keep state top-level as in io.trino.operator.output.RleAwarePositionsAppender
There was a problem hiding this comment.
Since this class has now two responsibilities (unnesting and building a dictionary) it's more readable to separate them. It also makes it easier to reset the state of the appender.
There was a problem hiding this comment.
Since this class has now two responsibilities (unnesting and building a dictionary)
This class doesn't really unnest much now. RleAwayPositionsAppender could do:
if (source instanceof RunLengthEncodedBlock) {
delegate.appendRle(((RunLengthEncodedBlock) source).getValue(), positions.size());
}
itself really at this point, so UnnestingPositionsAppender would be all about dictionaries
There was a problem hiding this comment.
RleAwayPositionsAppender is not always there. Unnesting part is actually to make sure only flat blocks are passed down to the flat appenders.
Maybe we should merge UnnestingPositionsAppender and RleAwayPositionsAppender into BlockTypeAwarePositionsAppender? although I fear it would make the code messier.
There was a problem hiding this comment.
RleAwayPositionsAppender is not always there.
Why wouldn't it be always there?
There was a problem hiding this comment.
it's not needed if the type is not comparable
There was a problem hiding this comment.
it's not needed if the type is not comparable
I don't think it's worth extra complexity since there are not many types like that.
However. You could then have minimal UnnestingPositionsAppender without rle or dict builder support.
I don't mixing of current UnnestingPositionsAppender with dictionary awareness is needed
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Since this class has now two responsibilities (unnesting and building a dictionary) it's more readable to separate them. It also makes it easier to reset the state of the appender.
There was a problem hiding this comment.
I would leave the name unchanged. Unnesting is a more important function of this class. Handling dictionaries is an additional feature.
ba0e02c to
a754634
Compare
|
Working of a performance regression so converted temporarily to a draft |
a754634 to
f33747e
Compare
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Show resolved
Hide resolved
a2e3297 to
afe4bd5
Compare
cbff50c to
41bf198
Compare
41bf198 to
a731d5f
Compare
a731d5f to
b4514ae
Compare
|
rebased on the master with the |
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Why is dictionary flushed in every case here but in the other append it is flushed only for non-dictionary cases
There was a problem hiding this comment.
I think we flush in every case other than we have the same dictionary (dictionaryBlockBuilder.canAppend to be precise).
We could also do it here as well at a cost of !closed && (dictionary == this.dictionary || this.dictionary == null) for every row. I will try to benchmark how much this impacts performance of the row by row processing
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
The DictionaryBlock is pushed through only if all the input blocks are DictionaryBlocks and use the same instance of the DictionaryBlock.dictionary.
This is to limit the negative impact of using dictionaries due to megamorphic calls but still getting the benefit of transporting dictionary blocks over network
b4514ae to
478b4f5
Compare
lukasz-stec
left a comment
There was a problem hiding this comment.
comments answered and addressed
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I think we flush in every case other than we have the same dictionary (dictionaryBlockBuilder.canAppend to be precise).
We could also do it here as well at a cost of !closed && (dictionary == this.dictionary || this.dictionary == null) for every row. I will try to benchmark how much this impacts performance of the row by row processing
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/output/UnnestingPositionsAppender.java
Outdated
Show resolved
Hide resolved
|
ci failed with |
| else { | ||
| newSize = initialEntryCount; | ||
| } | ||
| newSize = Math.max(newSize, capacity); |
There was a problem hiding this comment.
Is there any possibility that newSize will be lower than capacity?
There was a problem hiding this comment.
well yes, i.e., capacity can be bigger than initialEntryCount or bigger than 1.5 * dictionaryIds.length (calculateNewArraySize)
There was a problem hiding this comment.
Why not just do int newSize = calculateNewArraySize(max(initialEntryCount, capacity, dictionaryIds.length))
ofc this will work different that now. If capacity will 100, it will create an array of size 150.
There was a problem hiding this comment.
we do not want to go over initialEntryCount as the appender can be "pre-sized" in #reset
|
@lukasz-stec what are we waiting for here? |
|
stable benchmarks, mainly to see the impact of dictionary block flattening |
Description
Dictionary-encoded blocks are currently flatened by the partitioned exchange operator. This prevents dictionary based optimizations from taking advantage of the encoded blocks (or results in additional overhead).
This PR adds support for dictionary encoded blocks through partitioned exchange for a case where the same dictionary (the same java object) is used by subsequent Blocks sent through
PartitionedOutputOperatorOther than possible CPU optimizations, transmitting
DictionaryBlocks over network reduces is more efficient than flat blocks.As an example for query on tpch schema (sf10) encoded in orc files
mktsegment is dictionary encoded.
so when we look at the customer table scan we see
and with this optimization
so
Output: 1500000 rows (45.78MB)goes down toOutput: 1500000 rows (34.43MB).Non-technical explanation
Increase dictionary-encoded block usage in the engine.
Release notes
( X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: