[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputPartitionings when some columns are dropped from projection #30762
Conversation
|
cc - @maropu @cloud-fan |
|
ok to test |
|
add to whitelist |
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Thanks @cloud-fan for the review. I have addressed your comments. |
|
Test build #132805 has finished for PR 30762 at commit
|
|
Test build #132808 has finished for PR 30762 at commit
|
|
thanks, merging to master! |
|
BTW, shall we do the same to output ordering? We can prune |
@cloud-fan Sure. I will make the changes for it and raise a PR in a couple of days. |
|
late lgtm. Thanks for fixing it, @prakharjain09 |
What changes were proposed in this pull request?
This PR prunes output partitionings that are no longer required because the columns they reference have been dropped by a Project, Aggregate, or similar node.
Why are the changes needed?
Consider this query:
select t1.id from t1 JOIN t2 on t1.id = t2.id
This query has a top-level Project node that projects only t1.id. However, the outputPartitioning of this Project node is: PartitioningCollection(HashPartitioning(t1.id), HashPartitioning(t2.id)).
Since the t2.id column is not propagated past the Project node, HashPartitioning(t2.id) can be dropped from its output partitioning.
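The pruning idea can be sketched with a small toy model in Python. This is not Spark's code or API: the classes below are hypothetical stand-ins named after the Catalyst classes mentioned above, `prune_partitioning` is an illustrative helper, and falling back to an unknown partitioning when nothing survives is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


# Hypothetical stand-ins for the Catalyst partitioning classes (not Spark's API).
@dataclass(frozen=True)
class HashPartitioning:
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class PartitioningCollection:
    partitionings: Tuple[HashPartitioning, ...]


@dataclass(frozen=True)
class UnknownPartitioning:
    pass


Partitioning = Union[HashPartitioning, PartitioningCollection, UnknownPartitioning]


def prune_partitioning(partitioning: Partitioning,
                       output_columns: Iterable[str]) -> Partitioning:
    """Keep only the partitionings whose columns all appear in the node's output."""
    output = set(output_columns)

    # Flatten the input into a sequence of candidate hash partitionings.
    if isinstance(partitioning, HashPartitioning):
        candidates = (partitioning,)
    elif isinstance(partitioning, PartitioningCollection):
        candidates = partitioning.partitionings
    else:
        # Nothing to prune for other kinds of partitioning.
        return partitioning

    # A partitioning survives only if every column it references is still projected.
    kept = tuple(p for p in candidates if set(p.columns) <= output)

    if not kept:
        # Assumption of this sketch: with no usable partitioning left, report unknown.
        return UnknownPartitioning()
    if len(kept) == 1:
        return kept[0]
    return PartitioningCollection(kept)


# The join from the example query: both sides are hash partitioned on their id.
joined = PartitioningCollection((
    HashPartitioning(("t1.id",)),
    HashPartitioning(("t2.id",)),
))

# The top-level Project keeps only t1.id, so HashPartitioning(t2.id) is dropped.
pruned = prune_partitioning(joined, ["t1.id"])
```

In this model, `pruned` is the single `HashPartitioning` on `t1.id`, while projecting both columns leaves the original collection unchanged.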
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UTs.