
Issues with repartition #9701

Answered by JasonLi-cn
matt-martin asked this question in Q&A
Mar 19, 2024 · 3 comments · 3 replies

About Q2:
I believe the purpose of the `EnforceDistribution` optimizer rule is to speed up the query while keeping the results correct. Although it replaces the `Hash` repartitioning with `RoundRobinBatch`, the final result is still correct: the output rows are the same, even though the per-partition layout you wanted is not preserved.
Conversely, if an aggregation operator sits downstream of `Repartition(Hash)`, the hash repartition would not be replaced, because that aggregation depends on all rows with the same key landing in the same partition.
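To make the distinction concrete, here is a minimal, self-contained Rust model of the two strategies. It is not DataFusion's implementation: it deals out individual rows rather than record batches, and `hash_partition`, `round_robin_partition`, `flatten_sorted` and `keys_colocated` are hypothetical helpers. It shows that both strategies emit the same rows overall, but only hash partitioning guarantees that every row for a given key ends up in one partition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A row in this toy model: (group key, value).
type Row = (&'static str, i64);

/// Hash partitioning: every row with the same key goes to the same partition.
fn hash_partition(rows: &[Row], n: usize) -> Vec<Vec<Row>> {
    let mut parts = vec![Vec::new(); n];
    for &row in rows {
        let mut h = DefaultHasher::new();
        row.0.hash(&mut h);
        parts[(h.finish() % n as u64) as usize].push(row);
    }
    parts
}

/// Round-robin partitioning: rows are dealt out in turn, ignoring the key.
fn round_robin_partition(rows: &[Row], n: usize) -> Vec<Vec<Row>> {
    let mut parts = vec![Vec::new(); n];
    for (i, &row) in rows.iter().enumerate() {
        parts[i % n].push(row);
    }
    parts
}

/// All rows across partitions, sorted, with the partition layout discarded.
fn flatten_sorted(parts: &[Vec<Row>]) -> Vec<Row> {
    let mut all: Vec<Row> = parts.iter().flatten().copied().collect();
    all.sort();
    all
}

/// True if every key appears in at most one partition.
fn keys_colocated(parts: &[Vec<Row>]) -> bool {
    let mut owner: HashMap<&str, usize> = HashMap::new();
    for (p, part) in parts.iter().enumerate() {
        for &(key, _) in part {
            if *owner.entry(key).or_insert(p) != p {
                return false;
            }
        }
    }
    true
}

fn main() {
    let rows: Vec<Row> = vec![("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)];
    let hashed = hash_partition(&rows, 2);
    let rr = round_robin_partition(&rows, 2);

    // Same rows overall: a consumer that ignores partitions sees identical data.
    assert_eq!(flatten_sorted(&hashed), flatten_sorted(&rr));
    // Only hash partitioning guarantees per-key co-location.
    assert!(keys_colocated(&hashed));
    println!("round-robin co-located: {}", keys_colocated(&rr));
}
```

With these five rows and two partitions, round-robin puts the two `"a"` rows in different partitions, so the program prints `round-robin co-located: false`. That lost co-location is exactly what a downstream per-key aggregation could not tolerate, and what a partition-aware consumer like yours notices.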
Therefore, to get the partitioning you want, the solution is to remove `EnforceDistribution` from the physical optimizer rules, as I mentioned earlier.
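Removing the rule amounts to running the query with a physical optimizer rule list that omits `EnforceDistribution`. The sketch below shows only the filtering step. It assumes a hypothetical `PhysicalRule` trait that, like DataFusion's `PhysicalOptimizerRule`, exposes a `name()` method; the rule names in the sample list are illustrative, and how a filtered list is installed into a session differs between DataFusion versions, so check the API docs for the version you are using.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an optimizer rule; only its name matters here.
trait PhysicalRule {
    fn name(&self) -> &str;
}

/// Hypothetical rule type that carries nothing but a name.
struct NamedRule(&'static str);

impl PhysicalRule for NamedRule {
    fn name(&self) -> &str {
        self.0
    }
}

/// Return a copy of `rules` without any rule whose name equals `skip`,
/// keeping the remaining rules in their original order.
fn without_rule(rules: &[Arc<dyn PhysicalRule>], skip: &str) -> Vec<Arc<dyn PhysicalRule>> {
    rules
        .iter()
        .filter(|r| r.name() != skip)
        .cloned()
        .collect()
}

fn main() {
    // Illustrative default list; the real rule set and order depend on the version.
    let defaults: Vec<Arc<dyn PhysicalRule>> = vec![
        Arc::new(NamedRule("AggregateStatistics")),
        Arc::new(NamedRule("EnforceDistribution")),
        Arc::new(NamedRule("EnforceSorting")),
    ];
    let custom = without_rule(&defaults, "EnforceDistribution");
    let names: Vec<&str> = custom.iter().map(|r| r.name()).collect();
    println!("{:?}", names);
}
```

Filtering by name rather than by index keeps the other rules in their original order, which matters because physical optimizer rules are applied one after another.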

Replies: 3 comments 3 replies

Answer selected by alamb
Category: Q&A · 2 participants