[SPARK-15094][SPARK-14803][SQL] Remove extra Project added in EliminateSerialization #12926
|
This is another approach to #12898. |
|
Test build #57858 has finished for PR 12926 at commit
|
|
Test build #57873 has finished for PR 12926 at commit
|
…ialization-projection Conflicts: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
|
Test build #57955 has finished for PR 12926 at commit
|
|
I think a simpler and better approach is to just remove this alias-only project in the next batch. |
|
@cloud-fan But how about to preserve |
|
Sorry, when I say "remove", I mean a safe removal: we transform the plan tree and replace the attributes produced by the alias with the original attributes. |
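A minimal sketch of that "safe removal" idea, written against toy stand-ins rather than Catalyst itself. `Attr`, `Alias`, `Relation`, `Project`, `Filter` and `AliasOnlyProjectSketch` are all hypothetical names for illustration, and this is not the rule as merged. It only shows the shape of the transformation: drop a `Project` whose list merely re-aliases its child's output, and rewrite references to the alias ids so they point at the original attributes.

```scala
// Toy stand-ins (hypothetical), not Spark's Catalyst classes.
final case class Attr(name: String, exprId: Long)
final case class Alias(child: Attr, name: String, exprId: Long)

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
final case class Project(projectList: Seq[Alias], child: Plan) extends Plan {
  // Each alias exposes a new attribute carrying the alias name and id.
  def output: Seq[Attr] = projectList.map(a => Attr(a.name, a.exprId))
}
final case class Filter(references: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}

object AliasOnlyProjectSketch {
  // True when every entry aliases the child's attribute at the same position
  // and keeps its name, i.e. the Project changes nothing but expression ids.
  def isAliasOnly(p: Project): Boolean =
    p.projectList.length == p.child.output.length &&
      p.projectList.zip(p.child.output).forall { case (a, o) =>
        a.child == o && a.name == o.name
      }

  // Bottom-up rewrite. Returns the new plan plus a map from removed alias ids
  // to the original attributes, so parent operators can be re-pointed.
  def rewrite(plan: Plan): (Plan, Map[Long, Attr]) = plan match {
    case r: Relation => (r, Map.empty)
    case Project(list, child) =>
      val (newChild, m) = rewrite(child)
      val newList = list.map(a => a.copy(child = m.getOrElse(a.child.exprId, a.child)))
      val newProject = Project(newList, newChild)
      if (isAliasOnly(newProject)) (newChild, m ++ newList.map(a => a.exprId -> a.child))
      else (newProject, m)
    case Filter(refs, child) =>
      val (newChild, m) = rewrite(child)
      (Filter(refs.map(r => m.getOrElse(r.exprId, r)), newChild), m)
  }
}
```

In this toy model, a `Filter` that referenced the alias id ends up referencing the underlying relation's attribute once the `Project` is dropped, while a `Project` that actually renames a column is left in place.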
|
@cloud-fan ok. let me try it. |
|
|
/**
 * Removes extra Project added in EliminateSerialization rule.
 */
How about we make it more general? e.g. RemoveAliasOnlyProject, so that not only object operator can benefit from it.
|
Test build #58127 has finished for PR 12926 at commit
|
|
Test build #58326 has finished for PR 12926 at commit
|
  return false
} else {
  projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    a.child match {
isn't it just a semanticEquals attr?
|
Test build #58351 has finished for PR 12926 at commit
|
|
Test build #58353 has finished for PR 12926 at commit
|
 */
object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
  // Check if projectList in the Project node has the same attribute names and ordering
  // as its child node.
nit: isAliasOnly
|
mostly LGTM except some style comments |
|
Looks pretty good! |
|
@cloud-fan @zsxwing Thanks! I've addressed your comments now. |
|
Test build #58440 has finished for PR 12926 at commit
|
|
test this please. |
|
Test build #58447 has finished for PR 12926 at commit
|
}
// Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
// We will remove it later in RemoveAliasOnlyProject rule.
val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
nit: use Alias(s.child.output.head, s.child.output.head.name)(exprId = d.output.head.exprId) to make sure the alias name is the same as the attribute name
ok. update later.
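One way to see why the naming nit matters: if the later alias-only check compares names as well as attributes (as the code comment on `RemoveAliasOnlyProject` quoted earlier suggests), an alias called `obj` over an attribute with a different name would not be recognised as a pure rename. The sketch below uses hypothetical toy types (`Attr`, `Alias`, `AliasNameSketch`), not Catalyst's classes, and the attribute name `value` is only an assumed example.

```scala
// Toy stand-ins (hypothetical), not Spark's Catalyst classes.
final case class Attr(name: String, exprId: Long)
final case class Alias(child: Attr, name: String, exprId: Long)

object AliasNameSketch {
  // Build an alias that reuses the child's name but carries a chosen id,
  // mirroring the shape Alias(attr, attr.name)(exprId = ...) suggested above.
  def preservingId(child: Attr, exprId: Long): Alias =
    Alias(child, child.name, exprId)

  // A pure rename wraps exactly the child's output attribute and keeps its name.
  def isPureRename(a: Alias, childOutput: Attr): Boolean =
    a.child == childOutput && a.name == childOutput.name
}

// Assumed example values: a child attribute named "value" and a target id of 7.
val childAttr = Attr("value", 1L)
val hardCoded = Alias(childAttr, "obj", 7L)
val nameKept = AliasNameSketch.preservingId(childAttr, 7L)
```

Under these assumptions `hardCoded` fails the pure-rename check while `nameKept` passes it, even though both carry the same preserved expression id.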
|
Test build #58474 has finished for PR 12926 at commit
|
|
LGTM |
|
Thanks. Merging to master and branch 2.0. |
…teSerialization

## What changes were proposed in this pull request?

We will eliminate the pair of `DeserializeToObject` and `SerializeFromObject` in `Optimizer` and add extra `Project`. However, when DeserializeToObject's outputObjectType is ObjectType and its cls can't be processed by unsafe project, it will be failed.

To fix it, we can simply remove the extra `Project` and replace the output attribute of `DeserializeToObject` in another rule.

## How was this patch tested?

`DatasetSuite`.

Author: Liang-Chi Hsieh <[email protected]>

Closes #12926 from viirya/fix-eliminate-serialization-projection.

(cherry picked from commit 470de74)
Signed-off-by: Yin Huai <[email protected]>
## What changes were proposed in this pull request?

We eliminate the pair of `DeserializeToObject` and `SerializeFromObject` in `Optimizer` and add an extra `Project`. However, when `DeserializeToObject`'s `outputObjectType` is `ObjectType` and its `cls` can't be processed by an unsafe projection, it fails.

To fix it, we can simply remove the extra `Project` and replace the output attribute of `DeserializeToObject` in another rule.

## How was this patch tested?

`DatasetSuite`.