[SPARK-28250][SQL] QueryPlan#references should exclude producedAttributes #25052

cloud-fan · 2019-07-04T13:30:05Z

What changes were proposed in this pull request?

This is a followup of the discussion in #24675 (comment)

QueryPlan#references is an important property: the ColumnPruning rule relies on it.

Some query plan nodes take a Seq[Attribute] parameter that is used as their output attributes, for example leaf nodes, Generate, MapPartitionsInPandas, etc. These nodes override producedAttributes to make missingInputs correct.

However, these nodes also need to override references to make column pruning work. This PR proposes to exclude producedAttributes from the default implementation of QueryPlan#references, so that we don't need to override references in all these nodes.

Note that, technically, we could remove producedAttributes and always require query plan nodes to override references. However, I find that the code is simpler with producedAttributes in some places, where there is a base class for a specific group of query plan nodes.
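The proposal above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Attr`, `Plan`, `Relation`, `parameterAttrs` and this simplified `Generate` are hypothetical stand-ins, not Catalyst's real classes or signatures. It shows the default `references` subtracting `producedAttributes`, so a node that carries its own output attributes as parameters does not need its own `references` override.

```scala
object ReferencesSketch {
  // Hypothetical stand-in for a Catalyst attribute.
  final case class Attr(name: String)

  // Hypothetical, heavily simplified stand-in for QueryPlan.
  abstract class Plan {
    def children: Seq[Plan]
    def output: Set[Attr]
    // Every attribute that appears in this node's parameters/expressions.
    def parameterAttrs: Set[Attr]
    // Attributes the node creates itself rather than reading from children.
    def producedAttributes: Set[Attr] = Set.empty
    def inputSet: Set[Attr] = children.flatMap(_.output).toSet
    // Default proposed in this PR (as modeled here): exclude produced attributes.
    def references: Set[Attr] = parameterAttrs -- producedAttributes
    def missingInput: Set[Attr] = references -- inputSet
  }

  // A leaf node: its output attributes are a constructor parameter.
  final case class Relation(output: Set[Attr]) extends Plan {
    def children: Seq[Plan] = Nil
    def parameterAttrs: Set[Attr] = output
    override def producedAttributes: Set[Attr] = output
  }

  // A Generate-like node: reads `input` from the child and produces `generated`.
  final case class Generate(child: Plan, input: Attr, generated: Set[Attr]) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def output: Set[Attr] = child.output ++ generated
    def parameterAttrs: Set[Attr] = generated + input
    override def producedAttributes: Set[Attr] = generated
  }

  val relation: Relation = Relation(Set(Attr("a"), Attr("b")))
  val generate: Generate = Generate(relation, Attr("a"), Set(Attr("x")))
}
```

In this model, `generate.references` contains only the attribute read from the child, which is the set a column pruning rule would need; neither node overrides `references`.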

How was this patch tested?

existing tests

cloud-fan · 2019-07-04T13:32:33Z

cc @gatorsmile @viirya @HyukjinKwon

SparkQA · 2019-07-04T14:49:38Z

Test build #107234 has finished for PR 25052 at commit 07cfb9d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-07-04T15:17:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

We can't override this now.

It seems like cheating if we override this method. We want to know if the plan is valid or not by this method, overriding this method means we skip this validation.

Ok. I think it makes sense to me. We should update this comment.

SparkQA · 2019-07-04T18:55:06Z

Test build #107242 has finished for PR 25052 at commit 34530d2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-07-05T00:35:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala


  override def outputPartitioning: Partitioning = child.outputPartitioning

-  override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)


Isn't outputObjAttr produced by FlatMapGroupsInRExec?

This is a no-op override, it's the same as ObjectProducerExec.producedAttributes
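The point about the no-op override can be sketched with a toy model. The names below (`Attr`, `Exec`, `ObjectProducer`, `WithOverride`, `WithoutOverride`) are hypothetical stand-ins and not Spark's real classes; the sketch assumes, as the reply above states, that the base trait already derives `producedAttributes` from `outputObjAttr`.

```scala
object NoOpOverrideSketch {
  // Hypothetical stand-in for a Catalyst attribute.
  final case class Attr(name: String)

  // Hypothetical stand-in for a physical plan node.
  trait Exec {
    def producedAttributes: Set[Attr] = Set.empty
  }

  // Stand-in for ObjectProducerExec: already reports outputObjAttr as produced.
  trait ObjectProducer extends Exec {
    def outputObjAttr: Attr
    override def producedAttributes: Set[Attr] = Set(outputObjAttr)
  }

  // Repeats the definition inherited from the base trait (the removed line).
  final case class WithOverride(outputObjAttr: Attr) extends ObjectProducer {
    override def producedAttributes: Set[Attr] = Set(outputObjAttr)
  }

  // Relies on the inherited definition (the code after the removal).
  final case class WithoutOverride(outputObjAttr: Attr) extends ObjectProducer
}
```

Both variants report the same produced attributes, so dropping the override does not change behavior in this model.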

SparkQA · 2019-07-05T02:15:14Z

Test build #107247 has finished for PR 25052 at commit e7fc7e9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-07-09T03:04:44Z

Merged to master.

dongjoon-hyun added the SQL label Jul 4, 2019

viirya reviewed Jul 4, 2019

QueryPlan#references should exclude producedAttributes

34530d2

cloud-fan force-pushed the minor branch from 07cfb9d to 34530d2 Compare July 4, 2019 15:43

update comment

e7fc7e9

viirya reviewed Jul 5, 2019

viirya approved these changes Jul 5, 2019

HyukjinKwon approved these changes Jul 9, 2019

HyukjinKwon closed this in 75ea02b Jul 9, 2019

HyukjinKwon mentioned this pull request Jul 11, 2019

[MINOR][SQL] Clean up ObjectProducerExec operators #25065

Closed

HyukjinKwon mentioned this pull request Jul 27, 2019

[SPARK-28441][SQL][TESTS][FOLLOW-UP] Skip Python tests if python executable and pyspark library are unavailable #25272

Closed

5 participants

