Conversation

@viirya
Member

@viirya viirya commented May 5, 2016

What changes were proposed in this pull request?

The Optimizer eliminates a paired DeserializeToObject and SerializeFromObject and adds an extra Project in their place. However, when DeserializeToObject's outputObjectType is an ObjectType whose cls can't be processed by an unsafe projection, the query fails.

To fix it, we can simply remove the extra Project and replace the output attribute of DeserializeToObject in another rule.

How was this patch tested?

DatasetSuite.

@viirya
Member Author

viirya commented May 5, 2016

This is another approach to #12898.

@SparkQA

SparkQA commented May 5, 2016

Test build #57858 has finished for PR 12926 at commit 48e6b6d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ObjectProject(
    • case class ObjectProject(

@SparkQA

SparkQA commented May 5, 2016

Test build #57873 has finished for PR 12926 at commit 737c518.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ialization-projection

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@SparkQA

SparkQA commented May 6, 2016

Test build #57955 has finished for PR 12926 at commit 3d0554d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented May 6, 2016

cc @cloud-fan @yhuai @rxin

@cloud-fan
Contributor

I think a simpler and better approach is to just remove this alias-only Project in the next batch.

@viirya
Member Author

viirya commented May 6, 2016

@cloud-fan But how do we preserve DeserializeToObject's output expr id?

@cloud-fan
Contributor

Sorry, when I say "remove", I mean a safe removal: we transform the plan tree and replace the attributes produced by the alias with the original attributes.
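
The "safe removal" described in this comment can be sketched on a toy plan tree. This is only an illustration under stated assumptions: `Attr`, `Alias`, `Relation`, `Project`, `Filter`, and `removeAliasOnlyProjects` are hypothetical stand-ins written for this example, not Spark's Catalyst classes, and the merged rule may differ in detail.

```scala
// Toy model: every type here is a hypothetical stand-in, not a Catalyst class.
object SafeRemovalSketch {
  final case class Attr(name: String, exprId: Long)
  final case class Alias(child: Attr, name: String, exprId: Long) {
    // The attribute that operators above the Project see.
    def toAttr: Attr = Attr(name, exprId)
  }

  sealed trait Plan { def output: Seq[Attr] }
  final case class Relation(output: Seq[Attr]) extends Plan
  final case class Project(projectList: Seq[Alias], child: Plan) extends Plan {
    def output: Seq[Attr] = projectList.map(_.toAttr)
  }
  final case class Filter(references: Seq[Attr], child: Plan) extends Plan {
    def output: Seq[Attr] = child.output
  }

  // Alias-only: each alias wraps the child's output attribute at the same
  // position and keeps its name, so only the expression id changes.
  def isAliasOnly(p: Project): Boolean =
    p.projectList.length == p.child.output.length &&
      p.projectList.zip(p.child.output).forall { case (a, o) =>
        a.child == o && a.name == o.name
      }

  // Bottom-up rewrite. Returns the new plan plus a map from attributes that
  // removed aliases used to produce to the original attributes.
  private def rewrite(plan: Plan): (Plan, Map[Attr, Attr]) = plan match {
    case r: Relation => (r, Map.empty[Attr, Attr])
    case Project(list, child) =>
      val (newChild, m) = rewrite(child)
      val project =
        Project(list.map(a => a.copy(child = m.getOrElse(a.child, a.child))), newChild)
      if (isAliasOnly(project)) {
        val removed = project.projectList.zip(newChild.output).map {
          case (a, o) => a.toAttr -> o
        }
        (newChild, m ++ removed) // drop the Project, remember the mapping
      } else {
        (project, m)
      }
    case Filter(refs, child) =>
      val (newChild, m) = rewrite(child)
      // Operators above a removed Project must point at the original attributes.
      (Filter(refs.map(a => m.getOrElse(a, a)), newChild), m)
  }

  def removeAliasOnlyProjects(plan: Plan): Plan = rewrite(plan)._1
}
```

In this sketch a `Filter` that referenced the alias's attribute ends up referencing the child's attribute once the `Project` is gone, which is why dropping the `Project` alone would not be safe.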

@viirya
Member Author

viirya commented May 8, 2016

@cloud-fan ok. let me try it.


/**
* Removes extra Project added in EliminateSerialization rule.
*/
Contributor

How about we make it more general, e.g. RemoveAliasOnlyProject, so that not only object operators can benefit from it?

@SparkQA

SparkQA commented May 9, 2016

Test build #58127 has finished for PR 12926 at commit 4b0773a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 11, 2016

Test build #58326 has finished for PR 12926 at commit 29a0c70.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title from "[SPARK-15094][SPARK-14803][SQL] Add ObjectProject for EliminateSerialization" to "[SPARK-15094][SPARK-14803][SQL] Remove extra Project added in EliminateSerialization" May 11, 2016
return false
} else {
projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
a.child match {
Contributor

@cloud-fan cloud-fan May 11, 2016

isn't it just a semanticEquals attr?
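
A minimal sketch of what this suggestion amounts to: the pattern match on `a.child` collapses into a single semantic-equality test per position. It assumes that semantic equality between attributes comes down to matching expression ids. `Attr`, `Alias`, and this `semanticEquals` are hypothetical stand-ins defined for the example, not the Catalyst implementations.

```scala
// Toy model: hypothetical stand-ins, not Catalyst classes.
object SemanticEqualsSketch {
  final case class Attr(name: String, exprId: Long) {
    // Assumption for this sketch: two attributes are semantically equal
    // when they carry the same expression id, whatever their names are.
    def semanticEquals(other: Attr): Boolean = exprId == other.exprId
  }
  final case class Alias(child: Attr, name: String, exprId: Long)

  // One equality test per (alias, child output) pair instead of a match.
  def isAliasOnly(projectList: Seq[Alias], childOutput: Seq[Attr]): Boolean =
    projectList.length == childOutput.length &&
      projectList.zip(childOutput).forall { case (a, o) =>
        a.child.semanticEquals(o)
      }
}
```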

@SparkQA

SparkQA commented May 11, 2016

Test build #58351 has finished for PR 12926 at commit 85fba17.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 11, 2016

Test build #58353 has finished for PR 12926 at commit ea55398.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
// Check if projectList in the Project node has the same attribute names and ordering
// as its child node.
Contributor

nit: isAliasOnly

@cloud-fan
Contributor

Mostly LGTM, except for some style comments.

@zsxwing
Member

zsxwing commented May 11, 2016

Looks pretty good!

@viirya
Member Author

viirya commented May 12, 2016

@cloud-fan @zsxwing Thanks! I've addressed your comments now.

@SparkQA

SparkQA commented May 12, 2016

Test build #58440 has finished for PR 12926 at commit c3748ba.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

test this please.

@SparkQA

SparkQA commented May 12, 2016

Test build #58447 has finished for PR 12926 at commit c3748ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
// Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
// We will remove it later in RemoveAliasOnlyProject rule.
val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
Contributor

nit: use Alias(s.child.output.head, s.child.output.head.name)(exprId = d.output.head.exprId) to make sure the alias name is the same as the attribute name

Member Author

ok. update later.
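
Why the alias name matters can be sketched as follows, assuming the removal rule compares attribute names as well as positions (as the code comment on `RemoveAliasOnlyProject` in this review states). The types and the helpers `hardCoded`, `derived`, and `keepsName` are hypothetical and exist only for this example.

```scala
// Toy model: hypothetical stand-ins, not Catalyst classes.
object AliasNameSketch {
  final case class Attr(name: String, exprId: Long)
  final case class Alias(child: Attr, name: String, exprId: Long)

  // Alias built with a fixed name, as in the line under review.
  def hardCoded(childOut: Attr, deserializerOut: Attr): Alias =
    Alias(childOut, "obj", deserializerOut.exprId)

  // Alias built as the nit suggests: reuse the child's attribute name.
  def derived(childOut: Attr, deserializerOut: Attr): Alias =
    Alias(childOut, childOut.name, deserializerOut.exprId)

  // Name half of a name-sensitive alias-only check.
  def keepsName(a: Alias): Boolean = a.name == a.child.name
}
```

With a child attribute named, say, `value`, the hard-coded alias would fail a name-sensitive check and the extra Project would stay in the plan, while the derived alias passes for any child name.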

@SparkQA

SparkQA commented May 12, 2016

Test build #58474 has finished for PR 12926 at commit 882fc66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

@yhuai
Contributor

yhuai commented May 12, 2016

Thanks. Merging to master and branch 2.0.

asfgit pushed a commit that referenced this pull request May 12, 2016
…teSerialization

## What changes were proposed in this pull request?

The `Optimizer` eliminates a paired `DeserializeToObject` and `SerializeFromObject` and adds an extra `Project` in their place. However, when DeserializeToObject's outputObjectType is an ObjectType whose cls can't be processed by an unsafe projection, the query fails.

To fix it, we can simply remove the extra `Project` and replace the output attribute of `DeserializeToObject` in another rule.

## How was this patch tested?
`DatasetSuite`.

Author: Liang-Chi Hsieh <[email protected]>

Closes #12926 from viirya/fix-eliminate-serialization-projection.

(cherry picked from commit 470de74)
Signed-off-by: Yin Huai <[email protected]>
@asfgit asfgit closed this in 470de74 May 12, 2016
@viirya viirya deleted the fix-eliminate-serialization-projection branch December 27, 2023 18:19
5 participants