
Single column bigint join round 2 #13432

Merged
raunaqmorarka merged 3 commits into trinodb:master from
skrzypo987:skrzypo/095-bigint-join-round-2
Aug 19, 2022

Conversation

@skrzypo987
Member

@skrzypo987 skrzypo987 commented Aug 1, 2022

Description

This is the next approach to #13178, which has been reverted due to #13380.
This time the memory consumption is not increased as significantly. The additionally allocated long array is indexed by page positions instead of hash buckets, making it significantly smaller. Benchmarks show a 3x smaller increase in peak memory usage of the TPC benchmarks compared to #13178.
The performance gain should be smaller, because this approach leaves less room for the CPU's out-of-order execution. However, after #13352 is merged it should not matter.

For reviewers: The only change between this PR and #13178 is the addressing of the values array in the BigintPagesHash class.
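To make the addressing change concrete, here is a minimal, hypothetical sketch (class name, hashing, and sizing are illustrative — this is not Trino's actual BigintPagesHash): the values array is sized by the number of input positions, while the key array maps the larger power-of-two bucket space onto positions.

```java
import java.util.Arrays;

// Illustrative sketch only: values[] is indexed by position (small),
// key[] is indexed by hash bucket (power of two, at most half full here).
final class BigintPositionIndex
{
    private final int mask;       // bucketCount - 1, bucketCount is a power of two
    private final int[] key;      // bucket -> position, -1 when empty
    private final long[] values;  // position -> join-key value

    BigintPositionIndex(long[] joinKeys)
    {
        int buckets = Integer.highestOneBit(Math.max(4, joinKeys.length * 2 - 1)) * 2;
        mask = buckets - 1;
        key = new int[buckets];
        Arrays.fill(key, -1);
        values = joinKeys.clone();  // one entry per position, not per bucket
        for (int position = 0; position < joinKeys.length; position++) {
            int bucket = bucket(joinKeys[position]);
            while (key[bucket] != -1) {
                bucket = (bucket + 1) & mask;  // linear probing
            }
            key[bucket] = position;
        }
    }

    // returns the matching position, or -1 if the probe value is absent
    int getPosition(long probe)
    {
        int bucket = bucket(probe);
        while (key[bucket] != -1) {
            int position = key[bucket];
            if (values[position] == probe) {  // indexed by position, not bucket
                return position;
            }
            bucket = (bucket + 1) & mask;
        }
        return -1;
    }

    private int bucket(long value)
    {
        long h = value * 0x9E3779B97F4A7C15L;  // cheap multiplicative mixing
        return (int) (h >>> 32) & mask;
    }
}
```

With round 1's bucket-indexed layout, the long array would have had `buckets` entries; here it has only `joinKeys.length`, which is what shrinks the peak memory increase.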

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Improves the performance of joins on a single bigint column

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improve performance of joins over a single BIGINT column

@cla-bot cla-bot bot added the cla-signed label Aug 1, 2022
@skrzypo987 skrzypo987 requested a review from sopel39 August 1, 2022 04:55
@skrzypo987
Member Author

@arhimondr Can you check if this branch passes the verification that failed in #13380?

Member

@sopel39 sopel39 left a comment


Does it still give good benchmarking results?


private final int mask;
private final int[] key;
private final long[] values;
Member


this array is redundant to the one in pagesHashStrategy. I think we can get rid of the array in pagesHashStrategy.

Member Author


The PagesHash is still used to produce the output page in the appendTo method, so we can get rid of it only if it does not exist in the output. Working on that. I hope that this can be merged as it is, and then we can work on further reducing the memory footprint.

Member


I hope that this can be merged as it is and then we can work on further reducing the memory footprint.

I think we can merge it only if we use bigint path on small hash tables. Otherwise there will still be increased memory usage.

@skrzypo987
Member Author

Does it still give good benchmarking results?

Benchmarks are still running, but common sense tells me that the results are going to be worse. However, after adding batched execution it should get better again.

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 01dfbf2 to 8b91c80 Compare August 3, 2022 09:52
@skrzypo987
Member Author

I've added a cut-off. If the number of positions in a single PagesHash exceeds 2^19, values are not stored in a separate array.
The good news is that the memory consumption remains low, probably even slightly lower than previously.
The bad news is that for every position there is an additional branch, which destroys any perf gain observed before.
The good news is that this branch is no longer a problem once we introduce batch processing (#13352). However, this PR by itself does not present any gain.
@sopel39

@sopel39
Member

sopel39 commented Aug 3, 2022

Ok, so it's tabled after #13352

@skrzypo987
Member Author

This PR is a prerequisite for #13352
So we have a chicken and an egg problem

@skrzypo987
Member Author

We can also concat those two PRs and merge as one

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 8b91c80 to e90bb50 Compare August 3, 2022 13:50
@skrzypo987
Member Author

The cut-off now makes the join use the DefaultPagesHash, which means some perf gain is going to be visible after merging this PR
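The fallback being described could be sketched roughly as follows (the class, method, and constant names here are made up for illustration; the real factory code and threshold constant in Trino differ):

```java
// Hypothetical sketch of the cut-off: below the position threshold the
// specialized bigint hash is built; above it, the default implementation
// is used so memory consumption does not grow.
final class PagesHashChooser
{
    // mirrors the 2^19 cut-off mentioned in the discussion; illustrative only
    static final int MAX_BIGINT_POSITIONS = 1 << 19;

    static String choose(int positionCount, boolean singleBigintChannel)
    {
        if (singleBigintChannel && positionCount <= MAX_BIGINT_POSITIONS) {
            return "BigintPagesHash";
        }
        return "DefaultPagesHash";
    }
}
```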

Member

@sopel39 sopel39 left a comment


@arhimondr would you be able to test this version?

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from e90bb50 to fae76c4 Compare August 8, 2022 06:27
Member

@lukasz-stec lukasz-stec left a comment


Mostly LGTM.
There is one potential issue with the generated join classes cache.


// This implementation assumes:
// -There is only one join channel and it is of type bigint
// -arrays used in the hash are always a power of 2.
Member


Suppose we want to save as much memory as possible. In that case, we could use an approach similar to io.trino.operator.HashGenerator#getPartition to find the hash table slot and not rely on a power-of-two hash table size.

Member Author


Outside of scope here. The BigintPagesHash is a clone of the default one. Bigger changes may land in subsequent PRs.
BTW, this PR is definitely not about saving memory. Saving memory on hash tables will usually result in performance regressions.

Member Author


I made some benchmarks and there are two conclusions:

  • Calculating the hash bucket like in HashGenerator#getPartition is slightly slower than a simple bit mask. Slightly, but it is visible in benchmarks.
  • Having a hash table that is too big, instead of one that is perfectly sized, increases performance simply because the average load is usually smaller than the maximum load factor. So this is a simple tradeoff between performance and memory, and there is no incentive to change it to match the load factor perfectly.

I do like this way of thinking though.
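The two slot-mapping strategies under discussion can be sketched like this (general technique only — the exact Trino code in HashGenerator#getPartition may differ): a bit mask requires a power-of-two table, while the multiply-shift "fast range" reduction works for any table size at the cost of an extra multiply.

```java
// Illustrative sketch of the two bucket-mapping strategies.
final class SlotMapping
{
    // power-of-two table: mask off the low bits of the hash
    static int maskSlot(long hash, int mask)
    {
        return (int) hash & mask;
    }

    // arbitrary table size: scale a 32-bit hash onto [0, tableSize)
    // without a division ("fast range" reduction)
    static int fastRangeSlot(long hash, int tableSize)
    {
        return (int) ((Integer.toUnsignedLong(Long.hashCode(hash)) * tableSize) >>> 32);
    }
}
```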


private final int mask;
private final int[] key;
private final long[] values;
Member


why key is in singular and values in plural form?

Member Author


Good question. I'll make a separate PR to fix this in both implementations once this lands


// index pages
for (int position = 0; position < stepSize; position++) {
int realPosition = position + stepBeginPosition;
Member


At least to me, the name position collocates with Block position. What do you think about renaming position to batchIndex and realPosition to addressIndex?

Member Author


Again, out of scope

Contributor

@arhimondr arhimondr left a comment


Is this PR still being worked on? Please let me know when it's ready and I'll run a set of test queries. However, since there's a limit now, I'm pretty confident the memory footprint shouldn't change significantly.


// This implementation assumes arrays used in the hash are always a power of 2
public final class PagesHash
public interface PagesHash
Contributor


An interface call might prevent methods from being inlined. I wonder what the thought process around that is?

Member Author


Those classes are isolated per query and only one implementation is used so with any luck JIT will easily inline it.

* This value is purposefully identical to that of IncrementalLoadFactorHashArraySizeSupplier#THRESHOLD_50,
* as a higher load factor means excessive memory consumption
*/
private static final int BIGINT_SINGLE_CHANNEL_MAX_ADDRESSES = 1 << 20; // 1048576
Contributor


So in theory the memory utilization shouldn't increase by more than 8MB, right?

I wonder how difficult would it be to avoid storing values twice (once in a block and once in an array)?

Member Author


Actually a bit less. We stored a single byte of hash per hash bucket before this PR. Now we store 8 bytes per actual position. The load factor for < 1M positions is 0.5, meaning an average load of 0.375. So we swapped x bytes for 8 * x * 0.375 = 3 * x bytes.
With the number of positions > 1M, the load factor is 0.75, which translates to 4.5 * x bytes.
The previous version that failed the verifier added a constant 8 bytes per hash bucket, regardless of the load factor.
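As a sanity check, the arithmetic above can be reproduced with a tiny hypothetical helper (the assumption, taken from the numbers in the comment, is that average load ≈ 0.75 × the maximum load factor):

```java
// Hypothetical helper reproducing the memory arithmetic above; not Trino code.
final class JoinMemoryEstimate
{
    // extra bytes per hash bucket contributed by 8-byte long values,
    // given the average fraction of buckets that are actually occupied
    static double addedBytesPerBucket(double averageLoad)
    {
        return 8.0 * averageLoad;
    }
}
```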

Member Author


I wonder how difficult would it be to avoid storing values twice (once in a block and once in an array)?

This is, unfortunately, really tricky, since the value blocks are used by other things like filters and sorting and the addressing convention is consistent across all channels, not only joined ones.

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from fae76c4 to 63f70f5 Compare August 9, 2022 04:37
@skrzypo987
Member Author

Is this PR still being worked on?

Sorry about the mess. The benchmark results started going crazy at some point and I am still trying to figure out the cause.
I will ping you once it is ready

Member

@sopel39 sopel39 left a comment


Do you have newest benchmarks with lowered limit?

@skrzypo987
Member Author

Do you have newest benchmarks with lowered limit?

After running a gazillion benchmarks I finally figured out why the benchmarks are so bad. Isolating both implementations of PagesHash prevents inlining, and performance regresses significantly. Unfortunately, we do not know how many positions will be used at the time of isolation, so this totally ruins the cutoff feature as it is now.

@skrzypo987
Member Author

@arhimondr Can you verify the second-to-last commit, ae20b9f?
This is the one without the cut-off, but it should still use less memory than the one that got reverted.

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 63f70f5 to 7d6bde3 Compare August 10, 2022 06:15
LookupSourceSupplier.class,
JoinHashSupplier.class,
JoinHash.class,
PagesHash.class,
Member


Why is this needed? Or rather, where does it start to slow down without it? Optimizing call sites that use a single implementation of an interface is JIT ABC. If that doesn't work at some level, it breaks our assumptions about Java performance.

Member Author


I don't know. Just the fact that it works is enough for me

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 7d6bde3 to a0b6aa3 Compare August 10, 2022 14:48
@skrzypo987
Member Author

benchmark-bigint-join-tpcds.pdf
The tpcds benchmark finished. The version without the limit is slightly faster (~1%), but the limit 1M commit regresses about 5% cpu time.
I am going to cherry-pick the queries that suck and try to find a common pattern

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from a0b6aa3 to 4d35836 Compare August 11, 2022 16:28
@skrzypo987
Member Author

I added PartitionedLookupSource to isolated classes and the regression went away. The gain is minimal compared to master but there is no regression.
benchmark-bigint-join-tpcds.pdf

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 4d35836 to b9856da Compare August 11, 2022 17:26
@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from b9856da to 50263d8 Compare August 17, 2022 05:00
@skrzypo987
Member Author

Did some finishing (hopefully) touches:

  • The threshold is fixed, no config option anymore
  • Last two commits are squashed. The limit is introduced along with the actual change

The gains (for unpart orc) are:

  • tpch 6% cpu time gain
  • tpcds < 1% cpu time gain

Many small, cosmetic comments will be addressed in the follow-up PR

Member

@raunaqmorarka raunaqmorarka left a comment


lgtm % comments

@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 50263d8 to 63e5e61 Compare August 17, 2022 05:27
In the majority of cases the join is on a single bigint column. This commit
introduces a specific code path that will handle only single column bigint
joins. This way we can skip the logic behind value comparisons.

The new class holds bigint values in a long[] array. This provides far superior
value comparison performance. It comes, however, with a higher memory consumption.
That is why the limit of probe side positions is introduced, beyond which the old
implementation is chosen.
@skrzypo987 skrzypo987 force-pushed the skrzypo/095-bigint-join-round-2 branch from 63e5e61 to 1653c92 Compare August 17, 2022 08:42
@raunaqmorarka
Member

@arhimondr do we need to re-run your tests on this? Otherwise it looks good to land to me.

@arhimondr
Contributor

@raunaqmorarka The test run came out clean, no out of memory failures.

Member

@sopel39 sopel39 left a comment


nit: kill switch would still be nice

@raunaqmorarka raunaqmorarka merged commit 1b6724a into trinodb:master Aug 19, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 19, 2022
