Conversation
964134f to
7995116
Compare
There was a problem hiding this comment.
else is not needed as the if above returns.
There was a problem hiding this comment.
Could partitioningColumns be empty?
There was a problem hiding this comment.
Let's also address Masha's comment in the merge join PR by adding comments about these assumptions to the merge join node:
https://github.com/facebookexternal/presto_cpp/pull/692
MergeJoinNode seems to be missing information about the sort order of the join keys. Is there an assumption that the sorting order is ASC and in case of multiple keys the order of the keys is the same as in "criteria"?
There was a problem hiding this comment.
There are cases where merge join can be planned but they cannot pass this check.
For example, if the two tables are both sorted by [A, B], and we join them on [B, A]. In this case, merge join should work.
The order in the leftJoinColumns and buildHashVariables doesn't matter.
We may need to add another field preSortedColumns to the merge join node(as Masha suggested) to indicate which columns are pre-sorted. The order of these columns could be different from the join key orders. In the above example, the join keys are [B, A], the preSortedColumns are [A, B].
Adding this field will also make it possible to support partial merge joins.
This can be done in a separate PR as it is a new feature.
There was a problem hiding this comment.
I addressed this comment by adding new logic; Can you help review if the new logic makes sense?
There was a problem hiding this comment.
The GroupedExecutionEnabled session property is just a hint right? Are there cases where it is set to true but group execution is not enabled?
7995116 to
de4728c
Compare
There was a problem hiding this comment.
This may not be supported at execution time. Currently we don't have a preSortedColumns to indicate the sort by columns. We are using the order of the join keys as the order of the sorted keys at the execution time. This plan means the left table is order by nationkey, custkey which is incorrect.
We need to either add a PreSortedColumns field or disable this case(join order doesn't match sort by column order) for now.
There was a problem hiding this comment.
Although I made this comment in the test file, this is actually about the new logic.
There was a problem hiding this comment.
Yes, sorry, I only added the new verification logic and didn't add the PreSortedColumns because I wanted to discuss with you about the official format in our 1-1
But just add two fields in the PR as initial plan, let me know what you think
|
@kewang1024 Ke, would you update commit message and PR description to explain the problem you are fixing and the solution you chose. |
|
@yuanzhanhku (the PR is not finalized) Can you help review if the new logic makes sense |
de4728c to
5e6eacb
Compare
5e6eacb to
b345cdc
Compare
| @JsonProperty ("left") PlanNode left, | ||
| @JsonProperty ("right") PlanNode right, | ||
| @JsonProperty ("criteria") List<JoinNode.EquiJoinClause> criteria, | ||
| @JsonProperty ("leftSortingProperties") List<SortingProperty<VariableReferenceExpression>> leftSortingProperties, |
There was a problem hiding this comment.
As discussed offline, let's separate this into two PRs:
One PR to fix the correctness issues. Make sure we only plan merge join for the supported use cases. Disable the merge join planning for cases like the join column order doesn't match sort column order and the cases with DESC orders. This PR should not introduce any new fields so it doesn't require CPP side changes. This can unblock the CPP benchmark.
Another PR to support more use cases as mentioned above by adding these additional fields. These are nice-to-have features(low priority) and are not blockers for the benchmark. It requires CPP side change to update the protocols.
Test plan - (Please fill in how you tested your changes)
Unit tests
depends on fix from #17458 for the unit test to pass