Skip to content

Hard-fork join operator and remove spilling from it#12618

Merged
raunaqmorarka merged 9 commits intotrinodb:masterfrom
skrzypo987:skrzypo/071-remove-join-spilling-1
Jul 6, 2022
Merged

Hard-fork join operator and remove spilling from it#12618
raunaqmorarka merged 9 commits intotrinodb:masterfrom
skrzypo987:skrzypo/071-remove-join-spilling-1

Conversation

@skrzypo987
Copy link
Copy Markdown
Member

@skrzypo987 skrzypo987 commented May 31, 2022

Description

This PR hard-forkes the entire join operator. Some interfaces remain common as they are an external interface. Minor changes has been made in the original operator to make the fork possible.
The latter commits remove spilling from the forked operator. No observable gain will come from that right now.
This is a preparation for some optimization work in the non-spilling join operator.
Some dead/suboptimal code still exists in the forked operator after this PR. Some cleanup will follow with some more optimisations later on.

Is this change a fix, improvement, new feature, refactoring, or other?

Refactoring to simplify join operator for non-spilled cases

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Code refactoring to simplify join operator for non-spilled cases

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

() No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improved join performance when spill to disk is not used. The config property `force-spilling-join-operator` or session property `force_spilling_join` can be set to false to restore usage of spilling join operator. ({issue}`12618`)

@cla-bot cla-bot bot added the cla-signed label May 31, 2022
@skrzypo987 skrzypo987 requested a review from sopel39 May 31, 2022 17:09
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from 60fd49d to c0747d3 Compare June 1, 2022 09:05
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip OperatorFactories for "new" join. Won't serve any purpose

@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from c0747d3 to 69a8fb4 Compare June 1, 2022 13:06
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from 69a8fb4 to d6b0f07 Compare June 23, 2022 08:13
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch 2 times, most recently from 5a558ad to 65e6124 Compare June 29, 2022 08:06
@skrzypo987 skrzypo987 marked this pull request as ready for review June 29, 2022 08:10
@skrzypo987 skrzypo987 requested a review from sopel39 June 29, 2022 08:16
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from 65e6124 to cd24df1 Compare June 29, 2022 13:43
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from cd24df1 to b5b68ac Compare July 5, 2022 12:41
@skrzypo987
Copy link
Copy Markdown
Member Author

@raunaqmorarka comments addressed. Added q commit.

@skrzypo987
Copy link
Copy Markdown
Member Author

@raunaqmorarka Added 2 commits to resolve the remaining issues

@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from cab9836 to c915ecc Compare July 6, 2022 08:59
@skrzypo987
Copy link
Copy Markdown
Member Author

Added a session property and a config option to disable the forked operator

@skrzypo987
Copy link
Copy Markdown
Member Author

TPC benchmarks results. Roughly 1%-1.5% gain
Benchmark-non-spilling-join.pdf

skrzypo987 added 2 commits July 6, 2022 13:03
This commit clones most of the classes in the join operator without introducing
any major changes. The forked one will (in future commits) not support spilling,
yet at this point it is identical.
skrzypo987 added 7 commits July 6, 2022 13:03
This commit removes mainly dead code associated with no spilling. No major
logical changes are introduced
After the removal of spilling this class is fairly basic and in 1:1 relation to
LookupJoinOperator, so it can be easily inlined
Only the write lock is used at this point so it can be safely replaced with synchronize
Since there is only one implementation - DefaultPageJoiner, which is now called PageJoiner
Since there is only one implementation - PartitionedLookupSourceFactory
This is a rollback of changes made in 'hard-fork join operator' commit.
The changes were necessary form the fork, but after the spilling is removed
they can be rolled back.
@skrzypo987 skrzypo987 force-pushed the skrzypo/071-remove-join-spilling-1 branch from c915ecc to 3aa42d8 Compare July 6, 2022 10:41
@raunaqmorarka raunaqmorarka merged commit 4c95ee3 into trinodb:master Jul 6, 2022
@github-actions github-actions bot added this to the 389 milestone Jul 6, 2022
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some retrospection comments. Please address as follow-ups

* This page builder creates pages with dictionary blocks:
* normal dictionary blocks for the probe side and the original blocks for the build side.
* <p>
* TODO use dictionary blocks (probably extended kind) to avoid data copying for build side
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing reference to github issue

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not my todo, it's a forked code

for (int index : probe.getOutputChannels()) {
Block block = probe.getPage().getBlock(index);
// Estimate the size of the probe row
// TODO: improve estimation for unloaded blocks by making it similar as in PageProcessor
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing reference to Github issue. You probably should create umbrella issue

return new LookupJoinOperatorFactory(
operatorId,
planNodeId,
(JoinBridgeManager<? extends LookupSourceFactory>) lookupSourceFactoryManager,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should create a new type hierarchy or new operator factory methods rather than explicitly cast here

import static java.util.Objects.requireNonNull;

public final class PartitionedLookupSourceFactory
implements JoinBridge
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pretends to be LookupSourceFactory, but doesn't implement interface. Name is wrong and confusing

probeHashChannel,
partitioningSpillerFactory);
}
else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else is redundant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants