Skip to content

Add value block abstraction#19385

Merged
dain merged 36 commits intotrinodb:masterfrom
dain:value-block
Oct 21, 2023
Merged

Add value block abstraction#19385
dain merged 36 commits intotrinodb:masterfrom
dain:value-block

Conversation

@dain
Copy link
Copy Markdown
Member

@dain dain commented Oct 12, 2023

Description

This PR adds the ValueBlock interface to the SPI. A value block is a type of Block that contains actual values as opposed to the synthetic wrapper blocks DictionaryBlock, RunLengthEncodedBlock, and LaxyBlock. Each Type now declares the ValueBlock class used to represent the type. When interacting with data for a type, and implementation can safely cast a ValueBlock instance to the Types declared ValueBlockclass.

Additionally, there is a new VALUE_BLOCK_POSITION (and related VALUE_BLOCK_POSITION_NOT_NULL) calling convention that will ensure that synthetic blocks are unwrapped on the caller's side. This call convention is preferred as it empowers the caller to unwrap blocks outside of the core computation loop.

All aggregation functions have been updated to use ValueBlocks with the synthetic blocks unwrapped outside of the core loop. This change does require that each aggregation parameter have a separate position instead of a shared position as the unwrapped blocks can be unaligned.

This is part of #14237

Fixes #13267

Release notes

(X) Release notes are required, with the following suggested text:

# Section
* Add `ValueBlock` abstraction along with `VALUE_BLOCK_POSITION` and `VALUE_BLOCK_POSITION_NOT_NULL` calling convention. ({issue}`19385`)
* Aggregation functions now require a separate block position for each argument.  ({issue}`19385`)

@dain dain requested a review from electrum October 12, 2023 17:25
@cla-bot cla-bot bot added the cla-signed label Oct 12, 2023
@github-actions github-actions bot added tests:hive iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector bigquery BigQuery connector mongodb MongoDB connector labels Oct 12, 2023
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be

public interface ValueBlock<T extends ValueBlock>

Then the methods would be

@Override
T copyPositions(...)

This would require implementations to override all of the methods with the correct return type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I don't like making these interfaces generic as they aren't used in ways that the generic type would help.... basically all uses would be Value<?> or even worse Value<? extends Value>. I think we can satisfy this by adding a test that verifies no methods on an implementation return ValueBlock (so they are forced to override the return)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a byte array equals method to Slice

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep... we could also go the other way and unwarp here. For now, this follows the style of the other operators.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this version? Or can the engine invert the above?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but the conversion code isn't that bright right now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't all of these method additions need revapi changes? Maybe it doesn't flag on new methods?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only check for backwards incompatible changes

dain added 17 commits October 19, 2023 19:24
The aggregation framework already unwinds RLE and dictionary wrappers,
so this is mostly type changes.
The generated full loops have specialized paths for dictionary or run
length encoded data, and for masked or unmasked data.
The equals method on Field was complex and incomplete. Removing these
simplifies the code and removes some bugs.
Merge RleAwarePositionsAppender into UnnestingPositionsAppender and
simplify combined code
These implementations delegate to Block, but the block implementation
for these types do not support getSlice
@dain
Copy link
Copy Markdown
Member Author

dain commented Oct 20, 2023

/test-with-secrets sha=e872b5c29db025e8673099ff64f4f25e6d2b8aee

@github-actions
Copy link
Copy Markdown

github-actions bot commented Oct 20, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6583554942

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bigquery BigQuery connector cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector mongodb MongoDB connector

Development

Successfully merging this pull request may close these issues.

Make engine use concrete Block type depending on Column/Symbol type

2 participants