Extend join prefilter optimizer to use hash for filter #23858

Merged
feilong-liu merged 1 commit into prestodb:master from feilong-liu:join_filter_hash
Oct 22, 2024
feilong-liu (Contributor) commented Oct 18, 2024

Description

The JoinPrefilter optimizer was added in #22667 and has shown significant improvements for a number of queries within Meta. However, the current implementation has two limitations:

  • When the join key is wide, for example a VARCHAR column, the added semi join has a high memory overhead.
  • It supports only a single join key.

To address these limitations, this PR:

  • Uses the hash of the join key for the semi join when the join key is wide.
  • When there are multiple join keys, hashes each key and combines the hashes into a single key for the semi join.

Hashing is safe here because the prefilter is opportunistic and does not need to be exact: a hash collision only lets extra rows through, and the original join still filters them out.
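
To illustrate why a hash-based prefilter stays correct, here is a minimal standalone Java sketch. It is not the optimizer code from this PR: the class `HashPrefilterSketch`, its methods, and the `31 * h + hashCode()` combiner are all hypothetical stand-ins chosen for illustration, and NULL-key semantics are not modeled. It shows that filtering on a combined hash of several keys never drops a row that has a real match, while a collision (here `"Aa"` and `"BB"`, which share the same `String.hashCode()`) only lets an extra row through for the real join to discard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashPrefilterSketch {
    // Hypothetical combiner: hash each key, then fold the hashes into one 64-bit value.
    // Assumes non-null keys; a real engine would use its own hash functions.
    public static long combinedHash(String... keys) {
        long h = 0;
        for (String k : keys) {
            h = 31 * h + k.hashCode();
        }
        return h;
    }

    // Build side: keep one fixed-width hash per row instead of the wide keys themselves.
    public static Set<Long> buildFilter(List<String[]> rightRows) {
        Set<Long> hashes = new HashSet<>();
        for (String[] row : rightRows) {
            hashes.add(combinedHash(row));
        }
        return hashes;
    }

    // Probe side: keep rows whose combined hash is present in the filter.
    // This may keep false positives, but it never drops a row with a real match.
    public static List<String[]> prefilter(List<String[]> leftRows, Set<Long> filter) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : leftRows) {
            if (filter.contains(combinedHash(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // The real join condition: exact equality on every key.
    public static List<String[]> exactJoin(List<String[]> leftRows, List<String[]> rightRows) {
        List<String[]> matched = new ArrayList<>();
        for (String[] l : leftRows) {
            for (String[] r : rightRows) {
                if (Arrays.equals(l, r)) {
                    matched.add(l);
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String[]> right = Arrays.asList(new String[] {"BB", "x"}, new String[] {"k1", "y"});
        List<String[]> left = Arrays.asList(
                new String[] {"k1", "y"},   // real match
                new String[] {"Aa", "x"},   // hash collision with ("BB", "x"), not a real match
                new String[] {"zz", "q"});  // no match, removed by the prefilter

        List<String[]> kept = prefilter(left, buildFilter(right));
        List<String[]> joined = exactJoin(kept, right);
        System.out.println("rows after prefilter: " + kept.size());
        System.out.println("rows after exact join: " + joined.size());
    }
}
```

In this sketch two of the three left rows survive the prefilter, and the exact join then keeps only the one real match, which is the same result the join would produce without the prefilter.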

Motivation and Context

To apply the join prefilter optimizer to more query shapes

Impact

Apply join prefilter to more query shapes

Test Plan

Add to existing plans
Also run verifier suites
This optimizer is disabled by default because it is not always beneficial; we are considering using HBO (history-based optimization) to enable it in the future.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve JoinPrefilter optimizer for wide join keys and multiple join keys :pr:`23858`

jaystarshot previously approved these changes Oct 18, 2024
* - InnerJoin
* leftKey = rightKey
* - scan l
* - semijoin
Member

nit: Can show the hash project here to explain

Contributor Author

Updated to cover all three cases

  • one join key (not varchar type)
  • one join key (varchar type)
  • multiple join keys

kaikalur previously approved these changes Oct 19, 2024
List<VariableReferenceExpression> rightKeyList = equiJoinClause.stream().map(EquiJoinClause::getRight).collect(toImmutableList());
checkState(IntStream.range(0, leftKeyList.size()).boxed().allMatch(i -> leftKeyList.get(i).getType().equals(rightKeyList.get(i).getType())));

boolean hashJoinKey = leftKeyList.size() > 1 || isWideColumn(leftKeyList.get(0));
Contributor

Keep it simple and just do it always for varchar/char type?

Contributor Author

Updated

jaystarshot previously approved these changes Oct 21, 2024
@feilong-liu
Contributor Author

@jaystarshot I have updated the code to address the test failure. Can you take another look? Thanks!

feilong-liu merged commit 1512671 into prestodb:master on Oct 22, 2024
feilong-liu deleted the join_filter_hash branch on October 22, 2024 17:38
jaystarshot mentioned this pull request on Nov 1, 2024
tdcmeehan added the from:Meta (PR from Meta) label on Dec 13, 2024