Skip to content

Avoid direct usage of DictionaryBlock constructor#17842

Closed
raunaqmorarka wants to merge 1 commit intotrinodb:masterfrom
raunaqmorarka:dict-block
Closed

Avoid direct usage of DictionaryBlock constructor#17842
raunaqmorarka wants to merge 1 commit intotrinodb:masterfrom
raunaqmorarka:dict-block

Conversation

@raunaqmorarka
Copy link
Copy Markdown
Member

Description

This ensures that dictionaries with RLE or another dictionary block inside them are unwrapped in all code paths

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this constructor is used in multiple places inside DictionaryBlok e.g. copyPositions that can make the dictionary to be RLE.
If we really want to protect ourselves from this case we need a precondition inside this constructor

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, I missed the usages within this class, we need some more changes

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking closer at the other usages, the only way we found to make RLE dictionary out of non-RLE dictionary through get/copyPositions is to use a BlockBuilder as the dictionary. Although it's technically allowed, I'm not sure we need to worry about that. It would be odd to create a DictionaryBlock with a BlockBuilder as the dictionary rather than a pre-built Block.

This ensures that dictionaries with RLE or another dictionary
block inside them are unwrapped in all code paths
Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better for all DictionaryBlock construction to go through the createInternal method but this is step into that direction

Page output = lookupJoinPageBuilder.build(probe);
assertEquals(output.getChannelCount(), 2);
assertTrue(output.getBlock(0) instanceof DictionaryBlock);
assertTrue(output.getBlock(0) instanceof LongArrayBlock);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this assert?

@pettyjamesm
Copy link
Copy Markdown
Member

The changes look ok to me, although the test failures seem related and suggest that something subtle is going on. However, the commit message isn't accurate because getPositions can definitely still create DictionaryBlock nested dictionaries (first call will wrap some block type with DictionaryBlock as its output, and calling getPositions again on that result will create a DictionaryBlock{dictionary=DictionaryBlock} nesting).

Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % there are test failures

Block block = DictionaryBlock.create(1, 2, dictionary, new int[] {1, 3, 6, 8});
assertThat(block).isInstanceOf(DictionaryBlock.class);
assertThat(block.getPositionCount()).isEqualTo(2);
assertThat(block.getSlice(0, 0, block.getSliceLength(0))).isEqualTo(expectedValues[3]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertBlock?

JoinProbe probe = joinProbeFactory.createJoinProbe(page);
Page output = lookupJoinPageBuilder.build(probe);
assertEquals(output.getChannelCount(), 2);
assertTrue(output.getBlock(0) instanceof DictionaryBlock);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why type changed here?

ColumnarRow columnarRow = toColumnarRow(rowIds);
checkArgument(!columnarRow.mayHaveNull(), "The rowIdsRowBlock may not have null rows");
int positionCount = rowIds.getPositionCount();
if (positionCount == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move before toColumnarRow(rowIds).

if (positionCount == 0) {
return EMPTY_PAGE;
}
checkArgument(!columnarRow.mayHaveNull(), "The rowIdsRowBlock may not have null rows");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undelated change?

Copy link
Copy Markdown
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a full rewrite blocks like these queued up behind the group by hash PR. In the rewrite I introduce the concept of a value block, all blocks other than dictionary, rle, and lazy. Then I update dictionary and RLE to only be able to contain value blocks, and then I change most of the code base to unwrap non-value blocks before invoking functions (on the outside of the loops).

I suggest we don't merge these changes, as they will just conflict and get undone by the changes in my branch.

@raunaqmorarka
Copy link
Copy Markdown
Member Author

I suggest we don't merge these changes, as they will just conflict and get undone by the changes in my branch.

Sure, I had put this aside anyway. There were some unexplained test failures which I didn't have the bandwidth to look into and the original problem we wanted to solve was tacked by #17844

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

5 participants