Choose broadcast join when one side is small and the other side is unknown #19978
feilong-liu merged 1 commit into prestodb:master
kaikalur left a comment:
Also add some tests with real queries, or at least a plan test?
I am thinking of defaulting it to false and rolling it out gradually.
Add plan test.
How is this different from sizeBasedJoin here?
The size-based join only considers the sizes of the input tables, even when estimates for the immediate probe/build inputs are available.
For example, take a query like:
with B as (select * from t1 join t2 using (key)) select * from A join B using (key)
If the estimate for A is unknown but the estimate for B is small, size-based join will not choose a broadcast join. Size-based join looks at the sizes of t1 and t2, which are not representative of the size of B after the join. This is one of the cases this PR is trying to solve.
I also thought about patching the size-based join instead. However, the size-based join produces query plans for cases I do not need here, I would still need a separate session parameter inside the size-based join implementation to control this behavior, and it would make the size-based join logic more complex. Hence I chose to implement this separately.
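A minimal sketch of the distinction drawn in this comment, in Python for illustration only. It contrasts a check on the base tables underneath a subtree with a check on the subtree's own estimated output. `PlanNode`, `source_table_bytes`, and `fits_broadcast` are hypothetical names, not Presto classes, and summing the base-table sizes is a simplified stand-in for the size-based rule as described above, not its actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    # Estimated output size in bytes; None means the optimizer has no estimate.
    estimated_bytes: Optional[float] = None
    # Size of the base table, only meaningful for a table-scan leaf.
    table_bytes: Optional[float] = None
    children: List["PlanNode"] = field(default_factory=list)


def source_table_bytes(node: PlanNode) -> Optional[float]:
    """Hypothetical stand-in for a size-based rule: add up the base tables
    under `node`, ignoring any estimate of the node's own output."""
    if not node.children:
        return node.table_bytes
    total = 0.0
    for child in node.children:
        size = source_table_bytes(child)
        if size is None:  # one unknown table makes the total unknown
            return None
        total += size
    return total


def fits_broadcast(size: Optional[float], limit: float) -> bool:
    """A side can be broadcast only if its size is known and within the limit."""
    return size is not None and size <= limit
```

With hypothetical sizes t1 = 50 GB, t2 = 40 GB, an estimated 5 MB for B, and a 100 MB broadcast limit, the source-table check sees 90 GB and rejects broadcasting B, while B's own output estimate of 5 MB fits.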
pranjalssh left a comment:
It looks good, but the only thing I don't like is that we have too many session parameters now.
Ideally, this should be enabled by default, and all breaking queries should override the necessary parameters. But migration is hard; for example, size-based join is disabled in our clusters as well.
Can we rename this to something like experimental-join-distribution-type-enabled, so it's clear we intend to fully roll this out and then eventually remove it?
Do you mean changing the session parameter name to something like "experiment_broadcast_when_buildsize_small_probeside_unknown"? Would it be better to add a TODO comment instead of changing the name? I feel that this naming could confuse users. But I definitely agree that we need to simplify our session parameters, and I will make this the default behavior after fully rolling it out.
Branch updated from c4c1a63 to 13d4275.
In the current cost-based join type selection, when the estimated size of one side of the join is small (i.e., within the broadcast limit) but the size of the other side is unknown, we end up with a partitioned join in syntactic order. This PR adds an option to choose a broadcast join with the smaller side as the build input. It is controlled by a session parameter that defaults to false.
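A minimal sketch of the selection rule described above, in Python for illustration and assuming an inner join, where swapping the probe and build inputs is always legal (outer joins restrict which side may be replicated). `choose_distribution`, `JoinPlan`, and `enable_small_side_broadcast` are hypothetical names; the flag stands in for the new session parameter, and `broadcast_limit_bytes` plays the role of the broadcast limit (presumably Presto's `join_max_broadcast_table_size` session property). `PARTITIONED` and `REPLICATED` mirror Presto's join distribution type names. A size of `None` means no estimate; when both sizes are known, the sketch returns `None` to signal that the existing cost-based comparison, not modeled here, decides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DistributionType(Enum):
    PARTITIONED = "PARTITIONED"  # both sides hash-partitioned on the join key
    REPLICATED = "REPLICATED"    # broadcast: build side copied to every worker


@dataclass(frozen=True)
class JoinPlan:
    distribution: DistributionType
    swapped: bool  # True if the original probe side becomes the build side


def choose_distribution(probe_bytes: Optional[float],
                        build_bytes: Optional[float],
                        broadcast_limit_bytes: float,
                        enable_small_side_broadcast: bool) -> Optional[JoinPlan]:
    """Hypothetical sketch; None for a size means the estimate is unknown."""
    if probe_bytes is not None and build_bytes is not None:
        # Both sides known: defer to the existing cost-based comparison.
        return None

    if enable_small_side_broadcast:
        # Build side known and small, probe side unknown: broadcast as is.
        if build_bytes is not None and build_bytes <= broadcast_limit_bytes:
            return JoinPlan(DistributionType.REPLICATED, swapped=False)
        # Probe side known and small, build side unknown: swap the inputs
        # so the small side becomes the replicated build input.
        if probe_bytes is not None and probe_bytes <= broadcast_limit_bytes:
            return JoinPlan(DistributionType.REPLICATED, swapped=True)

    # Option off, or no side is known to be small: partitioned join
    # in syntactic order.
    return JoinPlan(DistributionType.PARTITIONED, swapped=False)
```

The swap is the key step: a broadcast join replicates the build input, so the side with a small, known estimate has to end up on the build side, whichever position it had in the query.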
Test plan: tested locally end to end.