Do not allow RLE or Dictionary to be nested in an RLE or Dictionary by dain · Pull Request #14092 · trinodb/trino

dain · 2022-09-11T01:34:42Z

Description

Simplify RLE and Dictionary blocks by not allowing the nested block to be an RLE or Dictionary block.
When an RLE or Dictionary block has zero or one positions, return getRegion over the nested block instead.
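
As a rough illustration of the factory behavior described above, here is a minimal sketch built on simplified stand-in types. `RleFactorySketch`, its `Block` interface, `LongArrayBlock`, `RleBlock`, and `createRle` are all hypothetical names for this example and are not the Trino SPI classes.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical, simplified stand-ins for block classes; not the real Trino SPI.
final class RleFactorySketch {
    interface Block {
        int positionCount();
        long getLong(int position);
        Block getRegion(int offset, int length);
    }

    // A plain, uncompressed block of long values.
    record LongArrayBlock(long[] values) implements Block {
        public int positionCount() { return values.length; }
        public long getLong(int position) { return values[position]; }
        public Block getRegion(int offset, int length) {
            return new LongArrayBlock(Arrays.copyOfRange(values, offset, offset + length));
        }
    }

    // A single-position value repeated positionCount times.
    record RleBlock(Block value, int positionCount) implements Block {
        public long getLong(int position) {
            Objects.checkIndex(position, positionCount);
            return value.getLong(0);
        }
        public Block getRegion(int offset, int length) {
            Objects.checkFromIndexSize(offset, length, positionCount);
            return createRle(value, length); // regions go through the factory too
        }
    }

    // Factory used instead of a public constructor so nesting can be flattened.
    static Block createRle(Block value, int positionCount) {
        if (value.positionCount() != 1 || positionCount < 0) {
            throw new IllegalArgumentException("expected a single-position value and a non-negative count");
        }
        if (positionCount <= 1) {
            return value.getRegion(0, positionCount); // bypass RLE for 0 or 1 positions
        }
        if (value instanceof RleBlock rle) {
            return new RleBlock(rle.value(), positionCount); // RLE of RLE is just an RLE
        }
        return new RleBlock(value, positionCount);
    }
}
```

Returning `Block` rather than `RleBlock` from the factory is what allows the short-circuit cases, which a constructor cannot express.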

Release notes

( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# SPI
* Replace DictionaryBlock constructors with factory method. ({issue}`14092`)
* Replace RunLengthEncodedBlock constructors with factory method. ({issue}`14092`)

core/trino-main/src/main/java/io/trino/operator/output/Int96PositionsAppender.java

core/trino-main/src/main/java/io/trino/operator/output/RleAwarePositionsAppender.java

core/trino-spi/src/main/java/io/trino/spi/block/RunLengthEncodedBlock.java

core/trino-spi/src/main/java/io/trino/spi/block/DictionaryBlock.java

findepi · 2022-09-17T16:18:07Z

What's the rationale or expected benefit of this change?

dain · 2022-09-17T20:07:29Z

@findepi in most cases it is nonsensical to wrap a performance block in a performance block, because many of these are really no-ops. For example, an RLE in an RLE or a dictionary is just an RLE, so this just adds an extra level of indirection for no gain. The only extra compute work here is the case where a dictionary is unwrapped, because you need to reindex. This case should be rare since most critical places that create dictionaries are already dictionary aware (and any that are not, we should make aware), and I believe this is well worth the reduced indirection cost and the reduced developer complexity of dealing with deeply nested perf blocks.

Also, bypass RLE when 0 or 1 positions are used.
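
The reindexing cost mentioned above can be sketched as follows. This is only an illustration on a simplified, hypothetical `Dict` record (not Trino's `DictionaryBlock`): unwrapping a dictionary whose dictionary is itself a dictionary block means composing the two id arrays, which takes one pass over the outer ids.

```java
// Hypothetical simplified model of a dictionary block over string values.
final class DictionaryUnwrapSketch {
    record Dict(String[] dictionary, int[] ids) {
        String get(int position) {
            return dictionary[ids[position]];
        }
    }

    // outerIds point at positions of `inner`; the result points directly at
    // inner's underlying dictionary, removing one level of indirection.
    static Dict unwrap(Dict inner, int[] outerIds) {
        int[] newIds = new int[outerIds.length];
        for (int i = 0; i < outerIds.length; i++) {
            newIds[i] = inner.ids()[outerIds[i]]; // the reindex step
        }
        return new Dict(inner.dictionary(), newIds);
    }
}
```

The extra work is linear in the number of outer positions, and it is paid once at construction time instead of on every lookup through the nested structure.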

sopel39 · 2022-09-26T09:24:57Z

core/trino-spi/src/main/java/io/trino/spi/block/DictionaryBlock.java

+            return new RunLengthEncodedBlock(rle.getValue(), positionCount);
+        }
+
+        // unwrap dictionary in dictionary


This is not a correct unwrap as you cannot preserve dictionarySourceId after unnest.

Take a look at:

topDictBlock1          topDictBlock2
sourceId:A             sourceId:A
dictionary:            dictionary:
  nestedDictBlock1       nestedDictBlock2
  sourceId:1             sourceId:2

(such a situation happens at join)

You cannot unwrap it to:

unwrappedDictBlock1    unwrappedDictBlock2
sourceId:A             sourceId:A

because you cannot, for example, compact them in the same way (as in the compactRelatedBlocks method).

You should assign a new sourceId if unwrapping is done implicitly by Dictionary constructor

martint · 2022-09-28

How can topDictBlock1 and topDictBlock2 have the same sourceId if the underlying dictionaries are different?

sopel39 · 2022-09-28

> How can topDictBlock1 and topDictBlock2 have the same sourceId if the underlying dictionaries are different?

The same sourceId implies the same ids, but it is stronger than that: two DictionaryBlocks might coincidentally have the same ids but different sourceIds.
If you have columns of

page=[
  dictA(source: S, ids: X, dict: nestedA),
  dictB(source: S, ids: X, dict: nestedB),
  dictC(source: S, ids: X, dict: nestedC)]

then you essentially process it like:

page=MultiChannelDict(
  source: S,
  ids: X,
  dict=[nestedA, nestedB, nestedC])

I hope this analogy makes it clearer
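
To make the analogy concrete, here is a hedged sketch of compacting several blocks that share a sourceId. It is only loosely modeled on the idea behind compactRelatedBlocks. The `Dict` record, the `compactRelated` helper, and the caller-supplied `newSourceId` are hypothetical and simplified, not the Trino implementation. Because the ids are shared, the set of used dictionary entries is computed once and then applied to every block's own dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplified model; sourceId is a plain string here.
final class RelatedBlocksSketch {
    record Dict(String sourceId, String[] dictionary, int[] ids) {
        String get(int position) {
            return dictionary[ids[position]];
        }
    }

    static List<Dict> compactRelated(List<Dict> blocks, String newSourceId) {
        Dict first = blocks.get(0);
        int dictionarySize = first.dictionary().length;

        // Computed once for all blocks: which old ids are used, and their new ids.
        int[] remap = new int[dictionarySize];
        Arrays.fill(remap, -1);
        int[] usedOldIds = new int[dictionarySize];
        int[] newIds = new int[first.ids().length];
        int used = 0;
        for (int i = 0; i < newIds.length; i++) {
            int oldId = first.ids()[i];
            if (remap[oldId] == -1) {
                usedOldIds[used] = oldId;
                remap[oldId] = used++;
            }
            newIds[i] = remap[oldId];
        }

        List<Dict> result = new ArrayList<>();
        for (Dict block : blocks) {
            if (!block.sourceId().equals(first.sourceId())
                    || block.dictionary().length != dictionarySize
                    || !Arrays.equals(block.ids(), first.ids())) {
                throw new IllegalArgumentException("blocks must share sourceId and ids");
            }
            // Only the dictionary differs per block; the new ids are shared.
            String[] compacted = new String[used];
            for (int j = 0; j < used; j++) {
                compacted[j] = block.dictionary()[usedOldIds[j]];
            }
            result.add(new Dict(newSourceId, compacted, newIds));
        }
        return result;
    }
}
```

If two blocks were each flattened against a different nested dictionary, their id arrays would generally differ, so the shared precondition checked above would no longer hold for them.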

sopel39 · 2022-09-26T12:01:20Z

core/trino-spi/src/main/java/io/trino/spi/block/DictionaryBlock.java

     * This should not only be used when creating a projection of another dictionary block.
     */
-    public DictionaryBlock(int positionCount, Block dictionary, int[] ids, DictionaryId dictionarySourceId)
+    public static Block createProjectedDictionaryBlock(int positionCount, Block dictionary, int[] ids, DictionaryId dictionarySourceId)


This method is very similar to DictionaryBlock#getPositions when it unwraps dictionaries, yet getPositions has more optimizations, such as taking compactness into account or evaluating uniqueIds.
These optimizations improve serialization, for example.

sopel39 · 2022-09-26T12:16:56Z

core/trino-spi/src/main/java/io/trino/spi/block/DictionaryBlock.java

+
+        // unwrap dictionary in dictionary
+        if (dictionary instanceof DictionaryBlock dictionaryBlock) {
+            int[] newIds = new int[positionCount];


Unnesting is not necessarily always beneficial without looking at context. For example, consider a join:

left_col1 | left_col2 | right_col1
===================================
row42     | row42     | rightRow1
row42     | row42     | rightRow2
...
row42     | row42     | rightRowN

row42 from the probe is repeated N times. Right now in join we use a dictionary (getPosition) to avoid copying row42 N times. This means that the dictionaryId for blocks left_col1 and left_col2 can be the same.

If there is now a dictionary-aware operator on left_col1, left_col2, then because left_col1 and left_col2 have the same dictionaryId, we can process it once rather than N times.

However, if you always unnest the dictionary, then you have to drop the common dictionaryId for left_col1, left_col2 in this method (see #14092 (comment))

martint · 2022-09-26

This seems like a very rare scenario compared to the benefits due to reduced complexity and being able to avoid megamorphic calls in certain places (all part of the effort tracked under #14237)
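
For readers weighing the trade-off discussed in this thread, the following sketch shows the kind of saving at stake. It assumes a simplified, hypothetical `Dict` record and a hypothetical `combine` helper, not Trino's operators. When two columns share a sourceId, a two-argument function can be evaluated once per dictionary entry and the ids reused. Without the shared sourceId it falls back to one evaluation per position.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical simplified model of a dictionary-aware two-column function.
final class SharedSourceSketch {
    record Dict(String sourceId, long[] dictionary, int[] ids) {
        long get(int position) {
            return dictionary[ids[position]];
        }
    }

    // The output block plus how many times the function was evaluated.
    record Result(Dict block, int evaluations) {}

    static Result combine(Dict left, Dict right, LongBinaryOperator function) {
        if (left.sourceId().equals(right.sourceId())) {
            // Same source implies same ids: evaluate per dictionary entry only.
            long[] values = new long[left.dictionary().length];
            for (int i = 0; i < values.length; i++) {
                values[i] = function.applyAsLong(left.dictionary()[i], right.dictionary()[i]);
            }
            return new Result(new Dict(left.sourceId(), values, left.ids()), values.length);
        }
        // No shared source: evaluate at every position.
        int positions = left.ids().length;
        long[] values = new long[positions];
        int[] ids = new int[positions];
        for (int i = 0; i < positions; i++) {
            values[i] = function.applyAsLong(left.get(i), right.get(i));
            ids[i] = i;
        }
        // "derived" is a placeholder source id for this sketch only.
        return new Result(new Dict("derived", values, ids), positions);
    }
}
```

With one probe row repeated N times, the shared-source path evaluates the function once, while the fallback path evaluates it N times.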

dain requested review from electrum and martint September 11, 2022 01:34

cla-bot bot added the cla-signed label Sep 11, 2022

github-actions bot added the tests:hive label Sep 11, 2022

martint reviewed Sep 14, 2022

core/trino-main/src/main/java/io/trino/operator/output/Int96PositionsAppender.java (outdated, resolved)

dain force-pushed the block-cleanup branch from ee0218c to 940c7e1 September 14, 2022 17:19

dain commented Sep 16, 2022

core/trino-main/src/main/java/io/trino/operator/output/RleAwarePositionsAppender.java (outdated, resolved)

dain commented Sep 16, 2022

core/trino-spi/src/main/java/io/trino/spi/block/RunLengthEncodedBlock.java (outdated, resolved)

core/trino-spi/src/main/java/io/trino/spi/block/DictionaryBlock.java (outdated, resolved)

martint approved these changes Sep 17, 2022

dain added 7 commits September 17, 2022 16:32

Remove outdated SPI change declarations

9ce4fbc

Unwrap a dictionary block in a run length encoded block

391dcc6

Move compactRelatedBlocks from Page to DictionaryBlock

68e8928

Reduce number of dictionary block constructors

2725642

Do not allow dictionary to be a dictionary or rle block

da58b3a

Also,

Replace direct usage of RleBlocks in PageAppenders

1dc1381

Do not allow rle value to be a dictionary or rle block

cf6cada

Bypass rle when 0 or 1 positions are used.

dain force-pushed the block-cleanup branch from 940c7e1 to cf6cada September 18, 2022 00:06

dain merged commit 52eb874 into trinodb:master Sep 18, 2022

github-actions bot added this to the 397 milestone Sep 18, 2022

dain deleted the block-cleanup branch September 18, 2022 16:35

colebow mentioned this pull request Sep 19, 2022

Add Trino 397 release notes #14194

Merged

sopel39 reviewed Sep 26, 2022

martint mentioned this pull request Sep 26, 2022

Project Hummingbird #14237

Open

19 tasks

skrzypo987 mentioned this pull request Sep 27, 2022

Empty RLE blocks mayHaveNulls method returns true #14312

Closed
