[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputPartitionings when some columns are dropped from projection #30762

prakharjain09 · 2020-12-14T12:39:45Z

What changes were proposed in this pull request?

This PR prunes output partitionings that are no longer required when the columns they reference are dropped by operators such as Project or Aggregate.

Why are the changes needed?

Consider this query:
SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id

This query has a top-level Project node that projects only t1.id. However, the outputPartitioning of this Project node is PartitioningCollection(HashPartitioning(t1.id), HashPartitioning(t2.id)).

Since the t2.id column is not propagated, HashPartitioning(t2.id) can be dropped from the output partitioning of the Project node.
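The pruning idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Spark implementation: the classes below are hypothetical Python stand-ins that borrow the names of Spark's partitioning classes, columns are plain strings, and the choice to collapse a one-element collection into its single member is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in: data is hash-partitioned on these column names."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class PartitioningCollection:
    """Toy stand-in: the data satisfies every partitioning listed here."""
    partitionings: Tuple["Partitioning", ...]


@dataclass(frozen=True)
class UnknownPartitioning:
    """Toy stand-in: nothing is known about how the data is partitioned."""


Partitioning = Union[HashPartitioning, PartitioningCollection, UnknownPartitioning]


def prune(partitioning: Partitioning, output_columns: FrozenSet[str]) -> Partitioning:
    """Drop partitionings that reference columns missing from the output."""
    if isinstance(partitioning, HashPartitioning):
        # Keep a hash partitioning only if all of its columns are still projected.
        if set(partitioning.columns) <= output_columns:
            return partitioning
        return UnknownPartitioning()
    if isinstance(partitioning, PartitioningCollection):
        kept = []
        for child in partitioning.partitionings:
            pruned = prune(child, output_columns)
            if not isinstance(pruned, UnknownPartitioning):
                kept.append(pruned)
        if not kept:
            return UnknownPartitioning()
        if len(kept) == 1:
            # Assumption of this sketch: unwrap a single surviving member.
            return kept[0]
        return PartitioningCollection(tuple(kept))
    return partitioning


# The join from the description: both sides are hash-partitioned on id.
join_output = PartitioningCollection(
    (HashPartitioning(("t1.id",)), HashPartitioning(("t2.id",)))
)
# The Project keeps only t1.id, so the t2.id partitioning is pruned.
project_output = prune(join_output, frozenset({"t1.id"}))
```

In this model, `project_output` is `HashPartitioning(("t1.id",))`, which mirrors the outcome the description asks for on the Project node.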

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

prakharjain09 · 2020-12-14T12:40:11Z

cc - @maropu @cloud-fan

cloud-fan · 2020-12-15T05:44:39Z

ok to test

cloud-fan · 2020-12-15T05:44:47Z

add to whitelist

SparkQA · 2020-12-15T06:53:49Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37407/

SparkQA · 2020-12-15T06:59:55Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37407/

SparkQA · 2020-12-15T07:52:49Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37410/

SparkQA · 2020-12-15T08:28:30Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37410/

prakharjain09 · 2020-12-15T08:55:26Z

Thanks @cloud-fan for the review. I have addressed your comments.

SparkQA · 2020-12-15T10:15:28Z

Test build #132805 has finished for PR 30762 at commit c33d418.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-15T11:13:03Z

Test build #132808 has finished for PR 30762 at commit 09b4673.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class SparkPod(pod: Pod, container: Container)
trait KubernetesFeatureConfigStep
public class Distributions
case class AlterTableRecoverPartitions(child: LogicalPlan) extends Command

cloud-fan · 2020-12-15T13:46:42Z

thanks, merging to master!

cloud-fan · 2020-12-17T12:50:12Z

BTW, shall we do the same to output ordering? We can prune SortOrder.sameOrderExpressions

prakharjain09 · 2020-12-17T13:57:00Z

> BTW, shall we do the same to output ordering? We can prune SortOrder.sameOrderExpressions

@cloud-fan Sure. I will make the changes for it and raise a PR in a couple of days.
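For illustration, the output-ordering follow-up suggested above might look roughly like the sketch below. It is a toy model, not Spark code: `SortOrder` here is a hypothetical Python stand-in with string column names, `same_order_expressions` only mirrors the field name mentioned in the comment, and the sketch assumes the primary `child` expression is still part of the output.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SortOrder:
    """Toy stand-in: sorted by `child`; the other expressions share that order."""
    child: str
    same_order_expressions: Tuple[str, ...] = ()


def prune_same_order(order: SortOrder, output_columns: FrozenSet[str]) -> SortOrder:
    """Keep only the same-order expressions that are still in the output."""
    kept = tuple(e for e in order.same_order_expressions if e in output_columns)
    return SortOrder(order.child, kept)


# After a join on t1.id = t2.id, both columns carry the same ordering.
join_ordering = SortOrder("t1.id", ("t2.id",))
# Projecting only t1.id leaves nothing to track for t2.id.
project_ordering = prune_same_order(join_ordering, frozenset({"t1.id"}))
```

The case where `child` itself is dropped while a same-order expression survives is deliberately left out of this sketch.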

maropu · 2020-12-18T01:40:05Z

late lgtm. Thanks for fixing it, @prakharjain09

Prune unnecessary partitionings

c33d418

prakharjain09 changed the title ~~[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputExpressions~~ [SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputPartitionings when some columns are dropped from projection Dec 14, 2020

github-actions bot added the SQL label Dec 14, 2020

cloud-fan reviewed Dec 15, 2020

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala

cloud-fan approved these changes Dec 15, 2020

prakharjain09 added 2 commits December 15, 2020 12:09

review comments addressed

6e748ba

Merge remote-tracking branch 'oss/master' into SPARK-33758-prune-part…

09b4673

…itioning

prakharjain09 requested a review from cloud-fan December 15, 2020 10:16

cloud-fan closed this in 23083aa Dec 15, 2020

4 participants
