@cloud-fan
Contributor

@cloud-fan cloud-fan commented Jul 4, 2019

What changes were proposed in this pull request?

This is a follow-up to the discussion in #24675 (comment)

QueryPlan#references is an important property. The ColumnPruning rule relies on it.

Some query plan nodes take a Seq[Attribute] parameter that is used as their output attributes, for example leaf nodes, Generate, MapPartitionsInPandas, etc. These nodes override producedAttributes to make missingInputs correct.

However, these nodes also need to override references to make column pruning work. This PR proposes to exclude producedAttributes from the default implementation of QueryPlan#references, so that we don't need to override references in all these nodes.

Note that we could technically remove producedAttributes and always require query plan nodes to override references. However, I find the code simpler with producedAttributes in some places, where there is a base class for a specific group of query plan nodes.
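
To make the proposal concrete, here is a minimal sketch using a toy model rather than Spark's real classes. Attributes are plain strings, and the names Plan, Leaf, Relation, GenerateLike and parameterAttrs are hypothetical, introduced only for illustration. It assumes the default references is "all attributes in the node's parameters, minus producedAttributes", and shows how a base class can declare producedAttributes once for a whole group of nodes.

```scala
object ReferencesSketch {
  // Toy stand-in for Spark's Attribute: just a column name (assumption).
  type Attr = String

  trait Plan {
    def children: Seq[Plan]
    def output: Seq[Attr]
    // Every attribute that appears in this node's parameters (hypothetical helper).
    def parameterAttrs: Set[Attr]
    // Attributes the node creates itself instead of reading from a child.
    def producedAttributes: Set[Attr] = Set.empty
    // Attributes supplied by the children.
    def inputSet: Set[Attr] = children.flatMap(_.output).toSet
    // Proposed default: produced attributes do not count as references.
    def references: Set[Attr] = parameterAttrs -- producedAttributes
    // Referenced attributes that no child supplies; empty for a valid plan.
    def missingInput: Set[Attr] = references -- inputSet
  }

  // Base class for leaf nodes: declares producedAttributes once for all of them.
  trait Leaf extends Plan {
    def children: Seq[Plan] = Nil
    def parameterAttrs: Set[Attr] = output.toSet
    override def producedAttributes: Set[Attr] = output.toSet
  }

  // A leaf whose output attributes are passed in as a parameter.
  case class Relation(output: Seq[Attr]) extends Leaf

  // A Generate-like node: reads `input` from its child and produces new attributes.
  case class GenerateLike(input: Attr, generatorOutput: Seq[Attr], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def output: Seq[Attr] = child.output ++ generatorOutput
    def parameterAttrs: Set[Attr] = Set(input) ++ generatorOutput
    override def producedAttributes: Set[Attr] = generatorOutput.toSet
  }

  def main(args: Array[String]): Unit = {
    val scan = Relation(Seq("id", "arr", "unused"))
    val gen = GenerateLike("arr", Seq("elem"), scan)
    println(gen.references)   // only the attribute read from the child
    println(gen.missingInput) // nothing is missing, so the plan is valid
    println(scan.references)  // a leaf references nothing
  }
}
```

In this sketch neither node overrides references, yet a pruning rule that inspects gen.references would see that only "arr" is needed from the child, so "id" and "unused" could be pruned if nothing above requires them.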

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @gatorsmile @viirya @HyukjinKwon

@SparkQA

SparkQA commented Jul 4, 2019

Test build #107234 has finished for PR 25052 at commit 07cfb9d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

We can't override this now.

Contributor Author

It seems like cheating if we override this method. We use this method to check whether the plan is valid, so overriding it means we skip that validation.

Member

OK, that makes sense to me. We should update this comment.

@SparkQA

SparkQA commented Jul 4, 2019

Test build #107242 has finished for PR 25052 at commit 34530d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


override def outputPartitioning: Partitioning = child.outputPartitioning

override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)
Member

Isn't outputObjAttr produced by FlatMapGroupsInRExec?

Contributor Author

This is a no-op override; it's the same as ObjectProducerExec.producedAttributes.

@SparkQA

SparkQA commented Jul 5, 2019

Test build #107247 has finished for PR 25052 at commit e7fc7e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.
