[Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested loop joins#23315
[Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested loop joins#23315aditi-pandit wants to merge 1 commit intomasterfrom
Conversation
c4f53c3 to
a2d90e6
Compare
a2d90e6 to
e8d286a
Compare
| .translate(column -> PropertyDerivations.filterIfMissing(outputs, column)) | ||
| .unordered(unordered); | ||
| if (nativeExecution && node.getCriteria().isEmpty()) { | ||
| // This maps to a NestedLoopJoin in Native engine. The NestedLoopJoin output is not |
There was a problem hiding this comment.
@rschlussel @feilong-liu Rebecca, Feilong, would you help take a first pass for this change? At a high level this seems reasonable, but I haven't been working in the optimizer code for quite some time.
There was a problem hiding this comment.
does only left join with no criteria produce nested loop join? What happens for inner join? Also, would it be enough to just change unordered to false for these cases, but otherwise keep the other things the same (does the stream distribution or partitioning change)?
There was a problem hiding this comment.
@rschlussel : That's a good point. I was trying to contain this fix, but this change would be applicable for inner too. This change has come by in correlated queries that seem to use LEFT join predominantly.
But just changing unordered to false didn't work. I had to change the partitioning as well for the change to take effect. The addLocalExchanges seems to leverage distribution as well. More at #22585 (comment)
| { | ||
| private final boolean nativeExecution; | ||
|
|
||
| public ValidateStreamingAggregations() |
There was a problem hiding this comment.
why do we need this zero argument constructor?
| { | ||
| private final boolean nativeExecution; | ||
|
|
||
| public ValidateStreamingJoins() |
There was a problem hiding this comment.
why do we need the zero argument constructor?
| // is very small or much smaller than the right, then flip the join. | ||
| if (rightSize > leftSize || (isSizeBasedJoinDistributionTypeEnabled(context.getSession()) && (Double.isNaN(leftSize) || Double.isNaN(rightSize)) && isLeftSideSmall(joinNode, context))) { | ||
| rewrittenNode = createRuntimeSwappedJoinNode(joinNode, metadata, sqlParser, context.getLookup(), context.getSession(), context.getVariableAllocator(), context.getIdAllocator()); | ||
| // This is never used for Prestissimo. |
There was a problem hiding this comment.
i don't like that this assumption isn't enforced by the code. Could we instead pass the native execution config to this optimizer like we do with other optimizers (may need to inject FeaturesConfig into adaptivePlanOptimizers, but that's fine).
|
@rschlussel : Thanks for your review. I hear more recently that Velox team is planning to fix NestedLoopJoin. This PR will not be required if the Velox change happens. This PR has wide changes in the optimizer. So I will resume work on it only if the Velox change will not happen. |
|
This PR should address the NLJ issue in Velox, so this probably won't be needed: |
|
Closing as the underlying NestedLoopJoin in Velox is fixed to maintain probe row ordering now. facebookincubator/velox#10651 |
Description
Nested joins do not maintain order of rows in Native execution. So do not carry the properties forward in StreamPropertyDerivations class.
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: