Skip to content

Draft of more join optimisations#13372

Closed
skrzypo987 wants to merge 11 commits intotrinodb:masterfrom
skrzypo987:skrzypo/089-join-unique-positions
Closed

Draft of more join optimisations#13372
skrzypo987 wants to merge 11 commits intotrinodb:masterfrom
skrzypo987:skrzypo/089-join-unique-positions

Conversation

@skrzypo987
Copy link
Copy Markdown
Member

@skrzypo987 skrzypo987 commented Jul 27, 2022

The last four commits are relevant

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

skrzypo987 added 11 commits July 26, 2022 07:38
Some tests in the non-spilling join operator still used the
`LookupJoinPageBuilder` class from the spilling operator
At this point it is just default methods falling back to sequential processing
and spreading work between partitions.
When spilling disabled the lookup source is set before the creation of JoinProbe
and it can be passed in its constructor
Batching API + implementation in BigintPagesHash
If the positions in build side are not unique, i.e. a single position on the
probe side matches more than one position on the build side, the `positionLinks`
data structure holds the positions to match. If the build side is unique the
`positionLinks` object is null and there is a single null-check every probe
position. However, since lookup source is most likely partitioned, the null-check
needs to go into a proper partition and the partition check and delegation is
carried out. This is a relatively expensive operation, so this commit skips
it altogether if the build side is unique.
When no filter is applied the `LookupSource::isJoinPositionEligible` method
always return true. However, with `PartitionedLookupSource` the check must
be delegated to a specific partition which is an expensive operation.
This commit skips the check altogether when there is no filter in the
lookup source
When join is carried out on a single bigint column, the `BigintPagesHash`
implementation ignores hashes. The code in this commit skips splitting
hash array in `PartitionedLookupSource in that case.`
This way the lookup source is asked only for a single position instead of
iterating over the RLE block
@cla-bot cla-bot bot added the cla-signed label Jul 27, 2022
@skrzypo987 skrzypo987 requested a review from lukasz-stec July 27, 2022 08:24
@skrzypo987 skrzypo987 changed the title Draft of more join optimisation Draft of more join optimisations Jul 27, 2022
Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I like the ideas here (adapting to data shape). ping me once it's not a draft.


private void processProbe(LookupSource lookupSource)
{
if (lookupSource.isMappingUnique()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are splitting logic here, can you ensure both cases are tested in AbstractTestJoinQueries or a unit test for this?

int matches = 0;
int mismatches = 0;

do {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it makes sense to vectorize/batch this loop eventually?

@skrzypo987
Copy link
Copy Markdown
Member Author

Some of the code here is in #13747.
The rest will land in another PR soon

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jan 12, 2024

👋 @skrzypo987 - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@mosabua mosabua closed this Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants