Draft of more join optimisations#13372
Conversation
Some tests in the non-spilling join operator still used the `LookupJoinPageBuilder` class from the spilling operator
At this point it is just default methods falling back to sequential processing and spreading work between partitions.
When spilling disabled the lookup source is set before the creation of JoinProbe and it can be passed in its constructor
Batching API + implementation in BigintPagesHash
If the positions in build side are not unique, i.e. a single position on the probe side matches more than one position on the build side, the `positionLinks` data structure holds the positions to match. If the build side is unique the `positionLinks` object is null and there is a single null-check every probe position. However, since lookup source is most likely partitioned, the null-check needs to go into a proper partition and the partition check and delegation is carried out. This is a relatively expensive operation, so this commit skips it altogether if the build side is unique.
When no filter is applied the `LookupSource::isJoinPositionEligible` method always return true. However, with `PartitionedLookupSource` the check must be delegated to a specific partition which is an expensive operation. This commit skips the check altogether when there is no filter in the lookup source
When join is carried out on a single bigint column, the `BigintPagesHash` implementation ignores hashes. The code in this commit skips splitting hash array in `PartitionedLookupSource in that case.`
This way the lookup source is asked only for a single position instead of iterating over the RLE block
lukasz-stec
left a comment
There was a problem hiding this comment.
lgtm, I like the ideas here (adapting to data shape). ping me once it's not a draft.
|
|
||
| private void processProbe(LookupSource lookupSource) | ||
| { | ||
| if (lookupSource.isMappingUnique()) { |
There was a problem hiding this comment.
Since you are splitting logic here, can you ensure both cases are tested in AbstractTestJoinQueries or a unit test for this?
core/trino-main/src/main/java/io/trino/operator/join/unspilled/PageJoiner.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/join/unspilled/JoinProbe.java
Show resolved
Hide resolved
| int matches = 0; | ||
| int mismatches = 0; | ||
|
|
||
| do { |
There was a problem hiding this comment.
do you think it makes sense to vectorize/batch this loop eventually?
|
Some of the code here is in #13747. |
|
👋 @skrzypo987 - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
The last four commits are relevant
Description
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: