Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Mar 7, 2018

What changes were proposed in this pull request?

Add interpreted execution for InitializeJavaBean expression.

How was this patch tested?

Added unit test.

method.getParameterTypes.length == 1
}
assert(foundMethods.length == 1,
throw new RuntimeException("The Java Bean instance should have only one " +
Copy link
Member Author

@viirya viirya Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codegen evaluation does not check method existence. But for non-codegen evaluation here, it is a bit weird to directly invoke first found method (we may not find it and cause non such element exception). cc @hvanhovell

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the compiler will check if the method exists. We could throw a NoSuchMethodError if plan to keep doing resolution in the eval method (see my other comment).

@SparkQA
Copy link

SparkQA commented Mar 7, 2018

Test build #88032 has finished for PR 20756 at commit 0c48a9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 7, 2018

Test build #88034 has finished for PR 20756 at commit b8f171e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def eval(input: InternalRow): Any = {
val instance = beanInstance.eval(input).asInstanceOf[Object]
if (instance != null) {
setters.foreach { case (setterMethod, fieldExpr) =>
Copy link
Contributor

@hvanhovell hvanhovell Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we resolving setters during eval? That seems a bit expensive. How about we create the setters before we execute eval? For example (I got a bit carried away):

  private lazy val resolvedSetters = {
    val ObjectType(beanClass) = beanInstance.dataType
    val lookup = MethodHandles.lookup()
    setters.map {
      case (name, expr) =>
        // Resolve expression type (should be better!)
        val fieldClass = CallMethodViaReflection.typeMapping(expr.dataType).head
        val handle = lookup.findVirtual(
          beanClass,
          name,
          MethodType.methodType(classOf[Unit], fieldClass))
        handle -> expr
    }
  }

  override def eval(input: InternalRow): Any = {
    val bean = beanInstance.eval(input)
    if (bean != null) {
      resolvedSetters.foreach {
        case (setter, expr) =>
          // nullability?
          setter.invoke(bean, expr.eval(input))
      }
    }
    bean
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookup.findVirtual or lookup.findSetter? Strictly I think InitializeJavaBean should be used only for setters not for general methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookup.findSetter means direct field access in the MethodHandle world. So you need to use lookup.findVirtual.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question: Do we need to find method with argument types? Codegen execution also does not do this check. I would think the caller of InitializeJavaBean should give correct arguments to the setters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do. The compiler will check the generated code right? It is just a matter of where you decide to fail. In a perfect world this would happen on the driver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm it should be lenient. Can you try MethodType.methodType(classOf[Object], fieldClass))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MethodType.methodType(classOf[Object], fieldClass)) does not work too.

Actually I am not sure if MethodHandle can find the def add(x$1: Int): Boolean of java.util.LinkedList.

scala> lookup.findVirtual(classOf[java.util.LinkedList[Int]], "add", MethodType.methodType(classOf[Boolean], classOf[Int]))

java.lang.NoSuchMethodException: no such method: java.util.LinkedList.add(int)boolean/invokeVirtual

Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we go back to reflection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks! :)

val serializer = new JavaSerializer(new SparkConf()).newInstance
val resolver = ResolveTimeZone(new SQLConf)
val expr = resolver.resolveTimeZones(serializer.deserialize(serializer.serialize(expression)))
// Make it as method to obtain fresh expression everytime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content of bean instance will be changed after first evaluation of interpreted execution. For example, in the added unit test, the input bean of the later evaluation will become [1] not []. So the later evaluation result will be [1, 1].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using a literal? Ok, makes sense.

val ObjectType(beanClass) = beanInstance.dataType

setters.map { case (setterMethod, fieldExpr) =>
val foundMethods = beanClass.getMethods.filter { method =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Picking up our earlier conversation) You are not checking the argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will updated later.

@SparkQA
Copy link

SparkQA commented Mar 7, 2018

Test build #88044 has finished for PR 20756 at commit b51303a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • throw new RuntimeException(\"The Java Bean class should have only one \" +

eval
}.getMessage
if (errMsg != expectedErrMsg) {
if (!errMsg.contains(expectedErrMsg)) {
Copy link
Member Author

@viirya viirya Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For codegen error, it has very verbose message like:

Code generation of initializejavabean([], (nonexisting,1)) failed:
[info]   java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22
: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22: A method named "nonexisting
" is not declared in any enclosing class nor any supertype, nor through a static import
[info]   java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22
: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22: A method named "nonexisting
" is not declared in any enclosing class nor any supertype, nor through a static import
[info]          at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
[info]          at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)

So changes it to test if it contains the given error message.

}
if (methods.isEmpty) {
throw new NoSuchMethodException(s"""A method named "$name" is not declared """ +
"in any enclosing class nor any supertype, nor through a static import")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is copied from compiling error thrown in codegen execution. So we can test the same error message for both execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the last bit , nor through a static import?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

case (name, expr) =>
// Looking for known type mapping first, then using Class attached in `ObjectType`.
// Finally also looking for general `Object`-type parameter for generic methods.
val paramTypes = CallMethodViaReflection.typeMapping.getOrElse(expr.dataType,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As #20753, we need to add other types into the mapping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, who is making those changes? You or @kiszk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have no time to do tonight, if @kiszk do not make those changes, I will do it tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the similar variable typeJavaMapping and typeJavaMapping. This is because typeMapping is used for CallMethodViaReflection whose comment says For now, only types defined in Reflect.typeMapping are supported (basically primitives and string) as input types.

If we can expand this support to additional classes (e.g. DateType, TimestampType, BinaryType, and CalendarIntervalType), I can merge three variables into one typeMapping.

WDYT? @hvanhovell and @viirya

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any special reason it only supports basically primitives and string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to expand the support, maybe we can have another PR to expand it in CallMethodViaReflection and merge those variables.

Copy link
Member

@kiszk kiszk Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is it better to have separate map variables, for now? cc @hvanhovell

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not coming back to this sooner. AFAIK CallMethodViaReflection expression was only designed to work with a couple of primitives. I think we are looking for something a little bit more complete here, i.e. support all types in Spark SQL's type system. I also don't think that we should put the mappings in CallMethodViaReflection because the mapping is now using in more expressions, ScalaReflection is IMO a better place for this logic.

And finally which PR will implement this. cc @maropu for visibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #20753 and #20797

@viirya
Copy link
Member Author

viirya commented Mar 8, 2018

also cc @kiszk @maropu

}

override def eval(input: InternalRow): Any = {
val instance = beanInstance.eval(input).asInstanceOf[Object]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: better to put the cast inside if for avoiding unnecessary casts in case of many null cases?;

    val instance = beanInstance.eval(input)
    if (instance != null) {
      val obj = instance.asInstanceOf[Object]
      resolvedSetters.foreach {
        case (setter: Method, expr) =>
          setter.invoke(obj, expr.eval(input).asInstanceOf[Object])
      }
    }
    instance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@maropu
Copy link
Member

maropu commented Mar 8, 2018

LGTM except for one minor comment

override def eval(input: InternalRow): Any =
throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
private lazy val resolvedSetters = {
val ObjectType(beanClass) = beanInstance.dataType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to put assert(beanInstance.dataType.isInstanceOf[ObjectType]) in the constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88074 has finished for PR 20756 at commit 970ed6c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88073 has finished for PR 20756 at commit 6372f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88080 has finished for PR 20756 at commit e20c207.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val bean = instance.asInstanceOf[Object]
resolvedSetters.foreach {
case (setter, expr) =>
setter.invoke(bean, expr.eval(input).asInstanceOf[Object])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the cast to .asInstanceOf[Object]? And why not use AnyRef?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Without cast:

type mismatch;
[error]  found   : Any
[error]  required: Object
[error]           setter.invoke(bean, expr.eval(input))

As you seen it is required Object, but I think AnyRef is equivalent and works too. Do you prefer AnyRef?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnyRef is a bit more scala like, that is all.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88092 has finished for PR 20756 at commit f3fdf57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 9, 2018

Test build #88113 has finished for PR 20756 at commit 1a78334.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Mar 9, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 9, 2018

Test build #88118 has finished for PR 20756 at commit 1a78334.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val bean = instance.asInstanceOf[Object]
resolvedSetters.foreach {
case (setter, expr) =>
setter.invoke(bean, expr.eval(input).asInstanceOf[AnyRef])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a subtle difference between code generation and interpreted mode here. A null value for an expression that maps to a java primitive will be some default value (e.g. -1) for code generation and null for interpreted mode, this can lead to different results.

I am not sure we should address this, because I am not 100% if this can ever happen. @cloud-fan could you shed some light on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong:
If the setter takes primitive types, like setAge(int i), and we pass a null via reflection, the actual passed value would be 0. This is different from the codegen version, seems like a potential bug.

IMO, I think the codegen version is wrong. In general we should not read the codegen value if it's marked as null.

This doesn't cause any problem, because we only use these object expressions to generate encoders, which means the parameter for a primitive setter won't be null. But if we treat these expressions as a general DSL, we should be careful about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya can you add a null check to both the interpreted and code generated version? Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell For non-primitive setter, seems it is valid to pass a null into a setter method. A null check means we don't allow such case at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya it might be valid to invoke method with a null reference. However let's - for now - make the behavior consistent, and avoid calling a method when its argument is null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Done.

checkEvaluation(initializeBean, list, InternalRow.fromSeq(Seq()))

val initializeWithNonexistingMethod = InitializeJavaBean(
Literal.fromObject(new java.util.LinkedList[Int]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test for when the parameter types do not match up?

Copy link
Member Author

@viirya viirya Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added below.

Note that because for generic method, its parameter type is Object, so for LinkedList[Int], it doesn't make sense to test it with something like Map("add" -> Literal("a string")). So I add TestBean to test this case.

Map("setX" -> Literal("1")))
intercept[Exception] {
evaluateWithoutCodegen(initializeWithWrongParamType, InternalRow.fromSeq(Seq()))
}.getMessage.contains(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For codegen the compile exception is like:

No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public void org.apache.spark.sql.catalyst.expressions.TestBean.setX(int)"

I'm not sure if we want to exactly match this kind of exception message from interpreted execution. Might be a little overkill to do that by looking methods with same name. So currently I only test interpreted execution.

@SparkQA
Copy link

SparkQA commented Mar 27, 2018

Test build #88618 has finished for PR 20756 at commit e7640e1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class TestBean extends Serializable

@viirya
Copy link
Member Author

viirya commented Mar 27, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 27, 2018

Test build #88623 has finished for PR 20756 at commit e7640e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class TestBean extends Serializable

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88929 has finished for PR 20756 at commit 573db59.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88927 has finished for PR 20756 at commit 96a8fe6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 5, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88931 has finished for PR 20756 at commit 573db59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 5, 2018

retest this please.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - merging to master. Thanks!

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88936 has finished for PR 20756 at commit 573db59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in c5c8b54 Apr 5, 2018
@hvanhovell
Copy link
Contributor

@viirya I thought it passed tests, but your last change is causing errors. I am going to revert this for now. Can you reopen? Sorry about this.

@viirya
Copy link
Member Author

viirya commented Apr 5, 2018

@hvanhovell Looks like we have tests using null as value for setters. So the added null check fails.

@hvanhovell
Copy link
Contributor

I think the semantic should be that we do not invoke the setter if the input is null.

@viirya
Copy link
Member Author

viirya commented Apr 5, 2018

We don't invoke the setter if the input is null, sounds like we simply skip calling the setter if the input is null?

@hvanhovell
Copy link
Contributor

Yes that would effectively be the same.

Do you think we should always call setters for reference types?

@viirya
Copy link
Member Author

viirya commented Apr 5, 2018

I'm not sure if this setter is always for Initialization of bean's property. Will it be possibly used to update the property with a null value? If no, I think skipping is safe.

robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 7, 2018
…xpression

## What changes were proposed in this pull request?

Add interpreted execution for `InitializeJavaBean` expression.

## How was this patch tested?

Added unit test.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#20756 from viirya/SPARK-23593.
@viirya viirya deleted the SPARK-23593 branch December 27, 2023 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants