-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28250][SQL] QueryPlan#references should exclude producedAttributes #25052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #107234 has finished for PR 25052 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't override this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like cheating if we override this method. We want to know if the plan is valid or not by this method, overriding this method means we skip this validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think it makes sense to me. We should update this comment.
|
Test build #107242 has finished for PR 25052 at commit
|
|
|
||
| override def outputPartitioning: Partitioning = child.outputPartitioning | ||
|
|
||
| override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't outputObjAttr produced by FlatMapGroupsInRExec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a no-op override, it's the same as ObjectProducerExec.producedAttributes
|
Test build #107247 has finished for PR 25052 at commit
|
|
Merged to master. |
What changes were proposed in this pull request?
This is a followup of the discussion in #24675 (comment)
QueryPlan#referencesis an important property. TheColumnPrunningrule relies on it.Some query plan nodes have
Seq[Attribute]parameter, which is used as its output attributes. For example, leaf nodes,Generate,MapPartitionsInPandas, etc. These nodes overrideproducedAttributesto makemissingInputscorrect.However, these nodes also need to override
referencesto make column pruning work. This PR proposes to excludeproducedAttributesfrom the default implementation ofQueryPlan#references, so that we don't need to overridereferencesin all these nodes.Note that, technically we can remove
producedAttributesand always ask query plan nodes to overridereferences. But I do find the code can be simpler withproducedAttributesin some places, where there is a base class for some specific query plan nodes.How was this patch tested?
existing tests