Skip to content

Conversation

@sun-rui
Copy link
Contributor

@sun-rui sun-rui commented Apr 21, 2016

What changes were proposed in this pull request?

This PR fixes a bug in EliminateSerialization rule in Catalyst Optimizer. When there is a SerializeFromObject and DeserializeToObject pair with equal inputObjectType and outputObjectType, the pair will be eliminated and replaced by a Project operator. However, when the involved object type is Row, there will be a codegen compilation error, because UnsafeProjection does not support Row.

How was this patch tested?

Unit tests.

@sun-rui
Copy link
Contributor Author

sun-rui commented Apr 21, 2016

@cloud-fan.
It just skips optimization when the object type is an instance of ObjectType. Not sure if there is a better fix. For example, directly pass the object, or have a safeProject operator?

@gatorsmile
Copy link
Member

gatorsmile commented Apr 21, 2016

It sounds like you miss the test case. : ) Please add at least one into EliminateSerializationSuite

@SparkQA
Copy link

SparkQA commented Apr 21, 2016

Test build #56535 has finished for PR 12575 at commit cf11a75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

Can you add a test case to expose this bug?

And I think a better way to fix this bug is to remove the alias only project, but not stop the optimization.

@sun-rui
Copy link
Contributor Author

sun-rui commented Apr 25, 2016

I changed the code as follows:

    case d @ DeserializeToObject(_, _, s: SerializeFromObject)
        if d.outputObjectType == s.inputObjectType =>
        if (!d.outputObjectType.isInstanceOf[ObjectType]) {
          // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
          val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
          Project(objAttr :: Nil, s.child)
        } else {
          s.child
        }

This works for my cases.
Does "preserve the output expr id of DeserializeToObject." really matter?
@cloud-fan

@cloud-fan
Copy link
Contributor

It do matters, we can't change a plan's output while transforming it. And your change breaks most of the optimizable cases this rule can handle. This rule tries to eleminate object serialization and you filter out all ObjectType...

@sun-rui
Copy link
Contributor Author

sun-rui commented Apr 26, 2016

OK. I will study optimizer and try to have a better fix.

@JoshRosen
Copy link
Contributor

It looks like this issue has been resolved via #12926. @sun-rui, could you close this pull request? Thanks!

@sun-rui sun-rui closed this Sep 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants