[SPARK-13549] [SQL] Refactor the Optimizer Rule CollapseProject#11427
[SPARK-13549] [SQL] Refactor the Optimizer Rule CollapseProject#11427gatorsmile wants to merge 4 commits intoapache:masterfrom
Conversation
|
Test build #52177 has finished for PR 11427 at commit
|
|
LGTM |
| projectList1.exists(_.collect { | ||
| case a: Attribute if aliasMap.contains(a) => aliasMap(a).child | ||
| }.exists(!_.deterministic)) | ||
| } |
There was a problem hiding this comment.
Further refactored your code a little bit:
def collectAliases(projectList: Seq[NamedExpression]): AttributeMap[Alias] = {
AttributeMap(projectList.collect {
case a: Alias => a.toAttribute -> a
})
}
def haveCommonNonDeterministicOutput(
upper: Seq[NamedExpression], lower: Seq[NamedExpression]): Boolean = {
val aliases = collectAliases(lower)
upper.exists(_.collect {
case a: Attribute if aliases.contains(a) => aliases(a).child
}).exists(!_.deterministic)
}
def buildCleanedProjectList(
upper: Seq[NamedExpression],
lower: Seq[NamedExpression]): Seq[NamedExpression] = {
val aliases = collectAliases(lower)
val rewrittenUpper = upper.map(_.transform {
case a: Attribute => aliases.getOrElse(a, a)
})
rewrittenUpper.map { p =>
CleanupAliases.trimNonTopLevelAliases(p).asInstanceOf[NamedExpression]
}
}And those inline comments need some rewording as they are now moved to different contexts.
|
Sorry for the late review. In general it looks good. Left some comments for better readability and testing. |
|
This is definitely a low priority issue. Thank you for your review! Will do the change based on your suggestions. @liancheng |
|
Test build #53934 has finished for PR 11427 at commit
|
|
Thanks! Merging to master. |
|
Thank you! @liancheng Now, this rule supports collapsing two continuous After rethinking it, maybe we also can collapse one upper |
|
Let me post an example here: SELECT sum(b) FROM parquet_t2 group by a, bThe current optimized plan is like Ideally, we can directly remove the Project. |
|
But, their physical plan is still the same. I guess, we do not need to do it. : ) |
|
@gatorsmile Yea, whole-stage codegen makes this kind of optimization not that appealing. |
|
I see. Thanks! @liancheng Let me find something else in Optimizer. Will do expression folding this weekend. : ) |
What changes were proposed in this pull request?
The PR #10541 changed the rule
CollapseProjectby enabling collapsingProjectintoAggregate. It leaves a to-do item to remove the duplicate code. This PR is to finish this to-do item. Also added a test case for covering this change.How was this patch tested?
Added a new test case.
@liancheng Could you check if the code refactoring is fine? Thanks!