Fix merge join plan generation for scenarios of multi partitions and grouped-execution #17699
Closed
kewang1024 wants to merge 2 commits into prestodb:master
yuanzhanhku reviewed on May 9, 2022
presto-hive/src/test/java/com/facebook/presto/hive/TestMergeJoinPlan.java
1. Merge join requires grouped execution, so check that the grouped_execution session property is turned on before generating a MergeJoinNode.
2. Merge join also requires data to be sharded between splits/files, so check the stream properties of the left and right sides.
The visitMergeJoin function was missing from GroupedExecutionTagger, which made it impossible to obtain the grouped execution property for a MergeJoinNode.
yuanzhanhku approved these changes on May 12, 2022
Contributor
Can you explain why grouped execution is needed for merge join? I thought we just needed sorted input. Also, does that mean the table needs to be bucketed too?
Contributor
The current implementation of merge join only works for bucketed and sorted tables. We need to make sure the two join inputs have the same key spaces to make the merge join work.
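To make the reply above concrete, here is a minimal sketch of a sort-merge join over the keys of one bucket. It is a toy model, not Presto's operator: the class `MergeJoinSketch`, the method `mergeJoin`, and the sample buckets are all hypothetical. It shows that the single forward pass only finds matches when both inputs are sorted and cover the same key space, which is what reading matching buckets together is meant to guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Presto.
final class MergeJoinSketch {
    // Inner-joins two key arrays that are each sorted ascending.
    // Returns the matching key once per matching (left, right) pair.
    static List<Long> mergeJoin(long[] left, long[] right) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++; // left key has no partner yet, advance left
            } else if (left[i] > right[j]) {
                j++; // right key has no partner yet, advance right
            } else {
                long key = left[i];
                int iEnd = i;
                while (iEnd < left.length && left[iEnd] == key) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.length && right[jEnd] == key) {
                    jEnd++;
                }
                // Emit the cross product of the two runs of equal keys.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed bucketing: key % 2, so bucket 0 holds even keys, bucket 1 odd keys.
        long[] leftBucket0 = {2, 4};
        long[] rightBucket0 = {2, 4};
        long[] rightBucket1 = {3, 5};
        System.out.println(mergeJoin(leftBucket0, rightBucket0)); // [2, 4]
        System.out.println(mergeJoin(leftBucket0, rightBucket1)); // []
    }
}
```

Pairing bucket 0 with bucket 0 finds every match, while pairing mismatched buckets finds none, so the two sides have to be read bucket by bucket in lockstep.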
This PR fixes three issues:
When the join keys don't contain the partition key and the query reads multiple partitions of data, merge join can't be enabled.
We fix it by adding a stream property check.
When grouped execution is enabled, the GroupedExecutionProperties is missing when we generate the merge join node.
We fix it by adding the visitMergeJoin function to GroupedExecutionTagger.
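As a rough illustration of the checks described above, the sketch below models a bucket as the list of files read for it, one file per partition. Everything here is hypothetical (`MergeJoinEligibilitySketch`, `streamIsSorted`, `canUseMergeJoin`), and it inspects sample data, whereas the real planner reasons about stream properties at plan time. The point is only that several individually sorted files read back to back do not form a sorted stream, so merge join should be gated on both grouped execution and the ordering of each side.

```java
import java.util.List;

// Hypothetical toy model; these names are not Presto APIs.
final class MergeJoinEligibilitySketch {
    // True when the keys, read file after file, never decrease.
    static boolean streamIsSorted(List<long[]> filesInReadOrder) {
        boolean first = true;
        long previous = 0;
        for (long[] file : filesInReadOrder) {
            for (long key : file) {
                if (!first && key < previous) {
                    return false;
                }
                previous = key;
                first = false;
            }
        }
        return true;
    }

    // Assumed gate: grouped execution is on and both sides arrive sorted.
    static boolean canUseMergeJoin(
            boolean groupedExecutionEnabled,
            List<long[]> leftBucketFiles,
            List<long[]> rightBucketFiles) {
        return groupedExecutionEnabled
                && streamIsSorted(leftBucketFiles)
                && streamIsSorted(rightBucketFiles);
    }
}
```

For example, a bucket backed by a single file `{1, 3, 5}` passes the check, while the same bucket spread over two partitions as `{1, 3}` followed by `{2, 4}` does not, even though each file is sorted on its own.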
Next step