Skip to content

Batch join lookup source#13352

Merged
raunaqmorarka merged 3 commits intotrinodb:masterfrom
skrzypo987:skrzypo/088-batch-join-lookup-source
Oct 2, 2022
Merged

Batch join lookup source#13352
raunaqmorarka merged 3 commits intotrinodb:masterfrom
skrzypo987:skrzypo/088-batch-join-lookup-source

Conversation

@skrzypo987
Copy link
Copy Markdown
Member

@skrzypo987 skrzypo987 commented Jul 26, 2022

Handle probe side of join in batches, as opposed to row-by-row. Benchmarks in the comment below

Description

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Improve performance of join operator

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# General
* Improve performance of joins by using batched processing on the probe side of join operator. ({issue}`13352`)

@skrzypo987
Copy link
Copy Markdown
Member Author

Benchmark-joinbatching.pdf
Benchmarks for unpartitioned ORC show >8% tpch and >4% tpcds cpu gain.
Partitioned benchmarks failed due to spot loss, but the partial results were similar.
No visible regressions.

@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Jul 26, 2022

Could you fix #12618 (comment) and #12618 (comment) as a prerequisite?

Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look.
Generally, this looks like a step in the right direction, but I wonder if there is gonna be a follow-up to batch PageJoiner, e.g. getNextJoinPosition.

}

private boolean currentRowContainsNull()
private long[] fillCache()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a workaround for not batching PageJoiner. Do you plan to add batching there at some point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more complicated than it looks like. There is no 1:1 relation between probe and build side and the JoinProbe is the object responsible for handling that relation. So batching on any other level can be done only in certain conditions.
I do plan adding some quasi-batching capabilities to PageJoiner but do not consider this JoinProbe batching as a workaround.
I will post a draft PR tomorrow with some changes in PageJoiner so you can take a look.

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from 8b7b2fb to a5e2dde Compare July 26, 2022 19:18
@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from a5e2dde to 8e9bceb Compare August 1, 2022 05:04
@skrzypo987
Copy link
Copy Markdown
Member Author

I rebased the PR on top of #13432

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from 8e9bceb to 36cd16b Compare August 1, 2022 05:56
@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch 3 times, most recently from cd8837d to 3314d66 Compare August 3, 2022 06:01
Copy link
Copy Markdown
Contributor

@radek-kondziolka radek-kondziolka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AN extremely good results for benchmarks.
LGTM.

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from 3314d66 to f6d4ac3 Compare August 10, 2022 08:37
Copy link
Copy Markdown
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only reviewed first 4 commits, feel free to address them in #13432

int blockPosition = decodePosition(address);
long value = joinChannelBlocks.get(blockIndex).getLong(blockPosition, 0);

int pos = getHashPosition(value, mask);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help to store the hash position in an array here and do the lookups in a separate loop ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to optimize build side here, just probe side. But, indeed, this makes sense. I'll add it to my join todo list

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from f6d4ac3 to c1afc4c Compare August 11, 2022 17:42
@skrzypo987
Copy link
Copy Markdown
Member Author

benchmark-join-batching.pdf
Newest benchmarks on top of the newest version of #13432

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from c1afc4c to ee3821a Compare August 19, 2022 10:53
@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Aug 22, 2022

Could you fix #12618 (comment) and #12618 (comment) as a prerequisite?

ping

@@ -32,8 +32,28 @@ public interface LookupSource

long getJoinPosition(int position, Page hashChannelsPage, Page allChannelsPage, long rawHash);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this method for unspilled path.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartitionLookupSource uses it if the page is small.

}

@Override
public long[] getJoinPosition(int[] positions, Page hashChannelsPage, Page allChannelsPage, long[] rawHashes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add UT for this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the 8 threshold in PartitionedLookupSource so now it always do batching, regardless of the page size.
That makes TestHashJoinOperator a pretty good unit-ish test for this method.

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch 4 times, most recently from 27d1866 to cb9551f Compare August 24, 2022 08:10
@skrzypo987 skrzypo987 requested a review from lukasz-stec August 25, 2022 08:31
Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments to 'Calculate JoinProbe position in batch' commit.

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch 2 times, most recently from 94df5e6 to d34e52b Compare August 26, 2022 06:03
Copy link
Copy Markdown
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % minor comments


// At this point for any reasoable load factor of a hash array (< .75), there is no more than
// 10 - 15% of positions left. We search for them in a sequential order and update the result array.
for (int i = 0; i < remainingCount; i++) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it's a good idea to extract this part to a separate method? it would be useful to see the impact of this part in the profiler output

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will get inlined anyway. As of readability, I prefer those perf-optimised methods to be a bit longer but with some comments. I believe it is more readable that the standard clean code approach, given that the code is not easy to understand and never will be.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will get inlined anyway

Probably, but profilers can figure out the original method in most cases and attribute the time correctly e.g. in the flame graph.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYM? If the method is inlined it does not exist in the JFR. Or am I missing something?

int[] foundKeys = new int[positionCount];

// Search for positions in the hash array. This is the most CPU-consuming part as
// it relies on random memory accesses
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also values[foundKeys[index]] at line 228 that is random load, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. But it only fetches positions that has been found, which is often only a fraction.

int[] foundKeys = new int[positionCount];

// Search for positions in the hash array. This is the most CPU-consuming part as
// it relies on random memory accesses
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

positionEqualsCurrentRowIgnoreNulls can be actually way more expensive as depending on the number of channels it can have a lot more random memory accesses + it has to do the the equals logic

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Batching some methods from PagesHashStrategy might be a good follow-up

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from d34e52b to 3fb2393 Compare September 5, 2022 09:28
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm up to Implement batch API and PLS changes

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from 3fb2393 to a3a3b96 Compare September 6, 2022 09:37
@skrzypo987 skrzypo987 requested a review from sopel39 September 6, 2022 09:42
@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from a3a3b96 to 9487168 Compare September 8, 2022 07:43
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious, should this be Block block = nullableBlocks.get(i) ?
Would it be simpler to pass in list of indexes of nullable blocks of the probe page rather than the list of nullable blocks ?

Copy link
Copy Markdown
Member Author

@skrzypo987 skrzypo987 Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious, should this be Block block = nullableBlocks.get(i) ?

You are right. Fixed

Would it be simpler to pass in list of indexes of nullable blocks of the probe page rather than the list of nullable blocks ?

I prefer my way, because there is no two-level indexing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test which would've caught the above issue ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch 2 times, most recently from 0171069 to ad8ab1a Compare September 21, 2022 14:23
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test which would've caught the above issue ?

@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch 2 times, most recently from 5beca33 to 2733ba3 Compare September 23, 2022 05:40
Comment on lines 127 to 137
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to avoid duplicating this block of code in the else branch ? The only difference appears to be in the contents of positions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one has additional hashes argument

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changing ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests in this class initialize JoinProbe without really using it. After introducing cache to JoinProbe, the LookupSource is being used in the constructor and invokes this method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

skrzypo987 added 3 commits September 26, 2022 08:49
-Replace int[] -> List<Integer> in constructor
-Pass value by constructor rather than method
-Add requireNonNull in constructors
-Remove redundant field
Additionally add default implementations and calculate JoinProbe positions
using the new methods.
@skrzypo987 skrzypo987 force-pushed the skrzypo/088-batch-join-lookup-source branch from 2733ba3 to cd10ada Compare September 26, 2022 06:50
@skrzypo987
Copy link
Copy Markdown
Member Author

Can we add a test which would've caught the above issue ?

I just tried to test it and it appears that the bug would not return bad data. The only symptom is that rows containing null values may be propagated further and more rows will be eligible for a join. But those rows will no be joined anyway, since the have null values. So this will at most result in some performance degradation which is not really testable.

Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment

if (!nullableBlocks.isEmpty()) {
Arrays.fill(joinPositionCache, -1);
boolean[] isNull = new boolean[positionCount];
int nonNullCount = getIsNull(nullableBlocks, positionCount, isNull);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to go though boolean[] isNull array as intermediate?
We can't we immediately compute int[] positions array (especially if there is a single nullable block)?

We can pre-allocate int[] positions to be positions = new int[positionCount].

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had discussion along similar lines here #13352 (comment)
The code which uses the array of positions subsequently takes exactly sized array as input parameter, so an array copy is needed to satisfy current API. Maybe we can change that, but let's leave it as a follow-up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure follow-up will happen :). It is be better to have interfaces in more or less stable form form get go.

Going though boolean[] isNull seems like a waste.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe boolean[] isNull make sense for multi-channel joins (rare), but for single channel joins they are just an overhead

@raunaqmorarka raunaqmorarka merged commit 666498b into trinodb:master Oct 2, 2022
@github-actions github-actions bot added this to the 399 milestone Oct 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants