@prakharjain09
Contributor

What changes were proposed in this pull request?

This PR prunes unneeded output partitionings when the columns they reference are dropped by Project, Aggregate, and similar nodes.

Why are the changes needed?

Consider this query:
SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id

This query has a top-level Project node that projects only t1.id. However, the outputPartitioning of that Project node is PartitioningCollection(HashPartitioning(t1.id), HashPartitioning(t2.id)).

Since t2.id is not propagated past the Project node, HashPartitioning(t2.id) can be dropped from the node's output partitioning.
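
To make the idea concrete, here is a minimal toy model of the pruning in Python. It is not Spark's implementation: the classes below only borrow the names HashPartitioning, PartitioningCollection and UnknownPartitioning, columns are plain strings, and `prune_partitioning` is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in: data is hash-partitioned on these column names."""
    columns: tuple


@dataclass(frozen=True)
class PartitioningCollection:
    """Toy stand-in: the data satisfies every partitioning in the collection."""
    partitionings: tuple


@dataclass(frozen=True)
class UnknownPartitioning:
    """Toy stand-in: nothing useful is known about the partitioning."""


def prune_partitioning(partitioning, output_columns):
    """Drop partitionings that reference columns missing from the output.

    Hypothetical helper for illustration only.
    """
    if isinstance(partitioning, HashPartitioning):
        # A hash partitioning is only meaningful if all of its columns survive.
        if set(partitioning.columns) <= set(output_columns):
            return partitioning
        return UnknownPartitioning()
    if isinstance(partitioning, PartitioningCollection):
        pruned = (prune_partitioning(p, output_columns)
                  for p in partitioning.partitionings)
        kept = tuple(p for p in pruned if not isinstance(p, UnknownPartitioning))
        if not kept:
            return UnknownPartitioning()
        if len(kept) == 1:
            # A collection with a single member collapses to that member.
            return kept[0]
        return PartitioningCollection(kept)
    return partitioning


# Output partitioning of the join in the query above.
joined = PartitioningCollection(
    (HashPartitioning(("t1.id",)), HashPartitioning(("t2.id",)))
)

# The top-level Project keeps only t1.id, so the t2.id partitioning is dropped.
projected = prune_partitioning(joined, {"t1.id"})
```

In this sketch, `projected` ends up as `HashPartitioning(("t1.id",))`, while projecting both join keys would leave the collection unchanged.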

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

@prakharjain09
Contributor Author

cc - @maropu @cloud-fan

prakharjain09 changed the title from "[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputExpressions" to "[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputPartitionings when some columns are dropped from projection" on Dec 14, 2020
github-actions bot added the SQL label on Dec 14, 2020
@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

add to whitelist

@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37407/

@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37407/

@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37410/

@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37410/

@prakharjain09
Contributor Author

Thanks @cloud-fan for the review. I have addressed your comments.

@SparkQA

SparkQA commented Dec 15, 2020

Test build #132805 has finished for PR 30762 at commit c33d418.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2020

Test build #132808 has finished for PR 30762 at commit 09b4673.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkPod(pod: Pod, container: Container)
    • trait KubernetesFeatureConfigStep
    • public class Distributions
    • case class AlterTableRecoverPartitions(child: LogicalPlan) extends Command

@cloud-fan
Contributor

thanks, merging to master!

cloud-fan closed this in 23083aa on Dec 15, 2020
@cloud-fan
Contributor

BTW, shall we do the same to output ordering? We can prune SortOrder.sameOrderExpressions
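
A rough sketch of what pruning sameOrderExpressions could look like, as a toy Python model rather than Spark code. The `SortOrder` class here only mirrors the name and the sameOrderExpressions idea (sort direction and null ordering are left out), and `prune_ordering` is a hypothetical helper. Promoting a surviving equivalent expression when the primary one is dropped is an assumption of this sketch, not something stated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortOrder:
    """Toy stand-in: rows are sorted by `child`, and the expressions in
    `same_order_expressions` are known to follow the same order."""
    child: str
    same_order_expressions: tuple = ()


def prune_ordering(ordering, output_columns):
    """Keep only sort expressions that are still present in the output.

    Hypothetical helper for illustration only.
    """
    pruned = []
    for order in ordering:
        candidates = [
            expr
            for expr in (order.child, *order.same_order_expressions)
            if expr in output_columns
        ]
        if not candidates:
            # An ordering is a prefix: once one sort key is lost,
            # the keys after it no longer describe the output.
            break
        # Assumption: the first surviving expression becomes the primary key.
        pruned.append(SortOrder(candidates[0], tuple(candidates[1:])))
    return pruned


# Ordering as it might look after a sort-merge join on t1.id = t2.id.
ordering = [SortOrder("t1.id", ("t2.id",)), SortOrder("t1.name")]

# Projecting away t2.id removes it from same_order_expressions.
projected = prune_ordering(ordering, {"t1.id", "t1.name"})
```

Here `projected` is `[SortOrder("t1.id"), SortOrder("t1.name")]`. If neither t1.id nor t2.id were projected, the result would be an empty ordering.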

@prakharjain09
Contributor Author

prakharjain09 commented Dec 17, 2020

> BTW, shall we do the same to output ordering? We can prune SortOrder.sameOrderExpressions

@cloud-fan Sure. I will make the changes for it and raise a PR in a couple of days.

@maropu
Member

maropu commented Dec 18, 2020

Late LGTM. Thanks for fixing it, @prakharjain09.
