Add null stats for join probe #21203
Force-pushed from 8c2f86c to 4967e25
Piggyback the count here, so counting adds no extra overhead.
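To make the piggybacking idea concrete, here is a minimal Java sketch. It assumes an operator that already loops over probe rows to compute partition hashes. `ProbeKeyCounter` and `partitionAndCount` are hypothetical names, not Presto classes. The point is that the total and null counts are two increments inside a loop that runs anyway.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Presto class): counts null join keys in the same
 * loop that already assigns each probe row to a partition, so no second pass
 * over the data is needed.
 */
final class ProbeKeyCounter {
    private long totalKeys;
    private long nullKeys;

    /** Assigns a partition to every row and counts null keys in the same pass. */
    int[] partitionAndCount(Long[] keys, int partitionCount) {
        int[] partitions = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            Long key = keys[i];
            totalKeys++;                 // piggybacked on the existing loop
            if (key == null) {
                nullKeys++;              // piggybacked on the existing loop
                partitions[i] = 0;       // every null maps to the same partition (the skew)
            } else {
                partitions[i] = Math.floorMod(Long.hashCode(key), partitionCount);
            }
        }
        return partitions;
    }

    long totalKeys() {
        return totalKeys;
    }

    long nullKeys() {
        return nullKeys;
    }

    public static void main(String[] args) {
        ProbeKeyCounter counter = new ProbeKeyCounter();
        int[] parts = counter.partitionAndCount(new Long[] {1L, null, 7L, null}, 4);
        System.out.println(Arrays.toString(parts)
                + " nulls=" + counter.nullKeys() + " total=" + counter.totalKeys());
    }
}
```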
kaikalur left a comment:
Looks good.
High-level observation: salting on the inner side (the right side of a left join and the left side of a right join) is unnecessary. We should instead filter those rows out.
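A small Java sketch of why filtering is enough, assuming SQL equality semantics where `NULL = x` is never true: in a LEFT JOIN, null keys on the right (inner) side can never match, so dropping them before the join leaves the output unchanged. `InnerSideNullFilter` and its naive nested-loop join are hypothetical and only illustrate the semantics, not how Presto executes joins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: in a LEFT OUTER JOIN, null keys on the right (inner)
 * side never satisfy "l.key = r.key", so removing them first does not change
 * the result; they only add work and skew.
 */
final class InnerSideNullFilter {
    /** Naive left outer join on keys; each output row is "left->right". */
    static List<String> leftJoin(List<Integer> left, List<Integer> right) {
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            boolean matched = false;
            for (Integer r : right) {
                // SQL equality: a comparison involving null is never true
                if (l != null && r != null && l.equals(r)) {
                    out.add(l + "->" + r);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(l + "->null"); // unmatched left row padded with null
            }
        }
        return out;
    }

    /** Drops null keys, like a NOT NULL filter on the inner side. */
    static List<Integer> dropNulls(List<Integer> keys) {
        List<Integer> out = new ArrayList<>();
        for (Integer k : keys) {
            if (k != null) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, null, 2);
        List<Integer> right = Arrays.asList(1, null, null, 3);
        List<String> full = leftJoin(left, right);
        List<String> filtered = leftJoin(left, dropNulls(right));
        System.out.println(full + " same result: " + full.equals(filtered));
    }
}
```

The same argument applies to the left side of a right join; only the outer (preserved) side needs its null rows.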
Sreeni Viswanadha reacted with a like (via email) to feilong-liu's reply of Tuesday, October 24, 2023:
Sure, I think this should be in a separate rule. With this null filter, we will not hit null skew on the inner side. It looks like we already have one such rule, AddNotNullFiltersToJoinNode, which defaults to false; we need to enable it.
Should this be based on a ratio rather than on counts?
Yeah, but counts are better: irrespective of how big the table is, you have to shuffle these rows anyway, so in some sense it's not related to the ratio per se.
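The count-versus-ratio argument can be sketched as follows. The `NullSkewTrigger` class and both thresholds are hypothetical illustration values, not Presto defaults. The sketch assumes null keys are hash-partitioned to a single worker, so that worker's load depends on the absolute number of null rows, not on their share of the table.

```java
/**
 * Hypothetical sketch contrasting a count-based and a ratio-based trigger for
 * null-key skew. Thresholds are made-up values, not Presto defaults.
 */
final class NullSkewTrigger {
    /** Fires when the absolute number of null keys is large. */
    static boolean byCount(long nullKeys, long threshold) {
        return nullKeys >= threshold;
    }

    /** Fires when null keys make up a large fraction of all keys. */
    static boolean byRatio(long nullKeys, long totalKeys, double threshold) {
        return totalKeys > 0 && (double) nullKeys / totalKeys >= threshold;
    }

    public static void main(String[] args) {
        long nulls = 50_000_000L;          // same number of null rows in both tables
        long smallTable = 60_000_000L;
        long hugeTable = 10_000_000_000L;
        long countThreshold = 10_000_000L; // hypothetical
        double ratioThreshold = 0.1;       // hypothetical
        // Both tables send the same 50M null rows to one worker, but the ratio
        // check only flags the small table.
        System.out.println("small: count=" + byCount(nulls, countThreshold)
                + " ratio=" + byRatio(nulls, smallTable, ratioThreshold));
        System.out.println("huge:  count=" + byCount(nulls, countThreshold)
                + " ratio=" + byRatio(nulls, hugeTable, ratioThreshold));
    }
}
```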
Force-pushed from 18d5f8f to e9f398c
Collect the number of null join keys and use it to trigger the NULL salt optimization with HBO
Force-pushed from e9f398c to c69a449
Description
Part of #20355
Track the number of probe keys and the number of null probe keys in HBO, and use them to enable null salting for outer joins.
Motivation and Context
Currently, HBO records null-key stats only for the join build side and uses them to enable null salting. This PR also adds probe-side information and uses it to enable null salting in HBO as well.
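A minimal sketch of the history-based decision, assuming stats from an earlier run are looked up by a canonical plan hash. `JoinNullHistory`, `JoinNodeStats`, the plan-hash strings, and the threshold are hypothetical names; Presto's real HBO classes and session properties are not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Presto's HBO classes): remembers per-join key and
 * null-key counts from earlier runs, keyed by a made-up plan hash, and uses
 * both build-side and probe-side null counts to decide on null salting.
 */
final class JoinNullHistory {
    /** Key and null-key counts observed for one join node in one run. */
    static final class JoinNodeStats {
        final long buildKeys;
        final long nullBuildKeys;
        final long probeKeys;
        final long nullProbeKeys;

        JoinNodeStats(long buildKeys, long nullBuildKeys, long probeKeys, long nullProbeKeys) {
            this.buildKeys = buildKeys;
            this.nullBuildKeys = nullBuildKeys;
            this.probeKeys = probeKeys;
            this.nullProbeKeys = nullProbeKeys;
        }
    }

    private final Map<String, JoinNodeStats> history = new HashMap<>();
    private final long nullKeyThreshold; // hypothetical tuning knob

    JoinNullHistory(long nullKeyThreshold) {
        this.nullKeyThreshold = nullKeyThreshold;
    }

    /** Stores the stats from the latest run of this join (latest run wins). */
    void record(String planHash, JoinNodeStats stats) {
        history.put(planHash, stats);
    }

    /** Enables salting if either side previously had too many null keys. */
    boolean shouldEnableNullSalt(String planHash) {
        JoinNodeStats stats = history.get(planHash);
        if (stats == null) {
            return false; // no history yet: keep the default plan
        }
        boolean buildSkewed = stats.nullBuildKeys >= nullKeyThreshold; // build-side signal
        boolean probeSkewed = stats.nullProbeKeys >= nullKeyThreshold; // probe-side signal
        return buildSkewed || probeSkewed;
    }

    public static void main(String[] args) {
        JoinNullHistory hbo = new JoinNullHistory(1_000_000L);
        // Probe side is heavily null, build side has no nulls.
        hbo.record("join-a", new JoinNodeStats(5_000L, 0L, 80_000_000L, 30_000_000L));
        System.out.println("join-a salt: " + hbo.shouldEnableNullSalt("join-a"));
        System.out.println("join-b salt: " + hbo.shouldEnableNullSalt("join-b"));
    }
}
```

With only the build-side check, `join-a` above would never be salted, which is the gap the probe-side stats close.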
Impact
Resolves probe-side skew caused by null join keys.
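To show where the probe-side skew comes from and how null salting removes it, here is a Java sketch under stated assumptions: rows are hash-partitioned on the join key, every null key maps to one partition, and salting replaces each null probe key with a random value from a key domain the build side never produces (here, a `salt` string prefix), so salted rows still find no match and come out as unmatched outer rows. `NullSaltDemo` is hypothetical and is not Presto's actual rewrite.

```java
import java.util.Random;

/**
 * Hypothetical sketch of null-key salting on an outer join's probe side.
 * Without salting, all null keys hash to one partition; with salting, each
 * null gets a random key from a domain ("salt...") that real keys ("k...")
 * never use, so the rows spread out but still match nothing.
 */
final class NullSaltDemo {
    static String joinKey(Long key, boolean salt, Random random) {
        if (key != null) {
            return "k" + key;
        }
        if (!salt) {
            return "null"; // every null gets the same key, hence one partition
        }
        return "salt" + random.nextInt(1_000_000); // disjoint from "k..." keys
    }

    /** Counts how many rows land in each hash partition. */
    static int[] partitionSizes(Long[] keys, int partitions, boolean salt, long seed) {
        Random random = new Random(seed);
        int[] sizes = new int[partitions];
        for (Long key : keys) {
            String k = joinKey(key, salt, random);
            sizes[Math.floorMod(k.hashCode(), partitions)]++;
        }
        return sizes;
    }

    static int max(int[] values) {
        int m = 0;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        Long[] keys = new Long[10_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = (i % 10 == 0) ? Long.valueOf(i) : null; // 90% null keys
        }
        System.out.println("max partition without salt: " + max(partitionSizes(keys, 16, false, 42)));
        System.out.println("max partition with salt:    " + max(partitionSizes(keys, 16, true, 42)));
    }
}
```

Without salting, one partition receives all 9,000 null rows; with salting, the largest partition is far smaller.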
Test Plan
Added a unit test and an end-to-end test for the stats tracking.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.