Skip to content

Add null stats for join probe#21203

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:null_salt_probe
Oct 28, 2023
Merged

Add null stats for join probe#21203
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:null_salt_probe

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Oct 20, 2023

Description

Part of #20355

Track number of probe keys and number of null probe keys in HBO, and use it to enable outer join null salt.

Motivation and Context

Currently HBO only records the null key stats for join build side, and use it to enable null salt in HBO. This PR also add probe side information and use it to enable null salt in HBO too.

Impact

Resolve the probe side skew in null join.

Test Plan

Add unit test
And end to end test on tracking

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Track null probe keys in join and use it to enable outer join null salt in history based optimization.

@feilong-liu feilong-liu requested a review from a team as a code owner October 20, 2023 20:19
@feilong-liu feilong-liu marked this pull request as draft October 20, 2023 20:19
@feilong-liu feilong-liu force-pushed the null_salt_probe branch 4 times, most recently from 8c2f86c to 4967e25 Compare October 20, 2023 21:39
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Piggyback the count here, without adding overhead in counting.

@feilong-liu feilong-liu marked this pull request as ready for review October 20, 2023 23:26
Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

High level observation - salting on the inner side (right side of left and left side of right) is unnecessary. We should instead be filtering them out.

@feilong-liu
Copy link
Contributor Author

Looks good.

High level observation - salting on the inner side (right side of left and left side of right) is unnecessary. We should instead be filtering them out.

Sure, I think this should be in a separate rule, and with this null filter, we will not hit the null skew in the inner side. Looks like that we already have one such rule AddNotNullFiltersToJoinNode which is default to false and we need to enable it.

@kaikalur
Copy link
Contributor

kaikalur commented Oct 24, 2023 via email

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be based on ratio as compared to counts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but counts are better because irrespective of how big the table is you have to shuffle these anyway so in some sense it's not related to ratio per se

@feilong-liu feilong-liu force-pushed the null_salt_probe branch 2 times, most recently from 18d5f8f to e9f398c Compare October 26, 2023 22:23
Collect number of null keys for join, and use it to trigger NULL salt optimization with HBO
@feilong-liu feilong-liu merged commit 2a18649 into prestodb:master Oct 28, 2023
@feilong-liu feilong-liu deleted the null_salt_probe branch October 28, 2023 05:58
@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants