-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23593][SQL] Add interpreted execution for InitializeJavaBean expression #20756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| method.getParameterTypes.length == 1 | ||
| } | ||
| assert(foundMethods.length == 1, | ||
| throw new RuntimeException("The Java Bean instance should have only one " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codegen evaluation does not check method existence. But for non-codegen evaluation here, it is a bit weird to directly invoke first found method (we may not find it and cause non such element exception). cc @hvanhovell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the compiler will check if the method exists. We could throw a NoSuchMethodError if plan to keep doing resolution in the eval method (see my other comment).
|
Test build #88032 has finished for PR 20756 at commit
|
|
Test build #88034 has finished for PR 20756 at commit
|
| override def eval(input: InternalRow): Any = { | ||
| val instance = beanInstance.eval(input).asInstanceOf[Object] | ||
| if (instance != null) { | ||
| setters.foreach { case (setterMethod, fieldExpr) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we resolving setters during eval? That seems a bit expensive. How about we create the setters before we execute eval? For example (I got a bit carried away):
private lazy val resolvedSetters = {
val ObjectType(beanClass) = beanInstance.dataType
val lookup = MethodHandles.lookup()
setters.map {
case (name, expr) =>
// Resolve expression type (should be better!)
val fieldClass = CallMethodViaReflection.typeMapping(expr.dataType).head
val handle = lookup.findVirtual(
beanClass,
name,
MethodType.methodType(classOf[Unit], fieldClass))
handle -> expr
}
}
override def eval(input: InternalRow): Any = {
val bean = beanInstance.eval(input)
if (bean != null) {
resolvedSetters.foreach {
case (setter, expr) =>
// nullability?
setter.invoke(bean, expr.eval(input))
}
}
bean
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookup.findVirtual or lookup.findSetter? Strictly I think InitializeJavaBean should be used only for setters not for general methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookup.findSetter means direct field access in the MethodHandle world. So you need to use lookup.findVirtual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question: Do we need to find method with argument types? Codegen execution also does not do this check. I would think the caller of InitializeJavaBean should give correct arguments to the setters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do. The compiler will check the generated code right? It is just a matter of where you decide to fail. In a perfect world this would happen on the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm it should be lenient. Can you try MethodType.methodType(classOf[Object], fieldClass))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MethodType.methodType(classOf[Object], fieldClass)) does not work too.
Actually I am not sure if MethodHandle can find the def add(x$1: Int): Boolean of java.util.LinkedList.
scala> lookup.findVirtual(classOf[java.util.LinkedList[Int]], "add", MethodType.methodType(classOf[Boolean], classOf[Int]))
java.lang.NoSuchMethodException: no such method: java.util.LinkedList.add(int)boolean/invokeVirtualAm I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or should we go back to reflection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #20753 (comment) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks! :)
| val serializer = new JavaSerializer(new SparkConf()).newInstance | ||
| val resolver = ResolveTimeZone(new SQLConf) | ||
| val expr = resolver.resolveTimeZones(serializer.deserialize(serializer.serialize(expression))) | ||
| // Make it as method to obtain fresh expression everytime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content of bean instance will be changed after first evaluation of interpreted execution. For example, in the added unit test, the input bean of the later evaluation will become [1] not []. So the later evaluation result will be [1, 1].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using a literal? Ok, makes sense.
| val ObjectType(beanClass) = beanInstance.dataType | ||
|
|
||
| setters.map { case (setterMethod, fieldExpr) => | ||
| val foundMethods = beanClass.getMethods.filter { method => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Picking up our earlier conversation) You are not checking the argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will updated later.
|
Test build #88044 has finished for PR 20756 at commit
|
| eval | ||
| }.getMessage | ||
| if (errMsg != expectedErrMsg) { | ||
| if (!errMsg.contains(expectedErrMsg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For codegen error, it has very verbose message like:
Code generation of initializejavabean([], (nonexisting,1)) failed:
[info] java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22
: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22: A method named "nonexisting
" is not declared in any enclosing class nor any supertype, nor through a static import
[info] java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22
: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 22: A method named "nonexisting
" is not declared in any enclosing class nor any supertype, nor through a static import
[info] at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
[info] at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
So changes it to test if it contains the given error message.
| } | ||
| if (methods.isEmpty) { | ||
| throw new NoSuchMethodException(s"""A method named "$name" is not declared """ + | ||
| "in any enclosing class nor any supertype, nor through a static import") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is copied from compiling error thrown in codegen execution. So we can test the same error message for both execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the last bit , nor through a static import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
| case (name, expr) => | ||
| // Looking for known type mapping first, then using Class attached in `ObjectType`. | ||
| // Finally also looking for general `Object`-type parameter for generic methods. | ||
| val paramTypes = CallMethodViaReflection.typeMapping.getOrElse(expr.dataType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As #20753, we need to add other types into the mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, who is making those changes? You or @kiszk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have no time to do tonight, if @kiszk do not make those changes, I will do it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the similar variable typeJavaMapping and typeJavaMapping. This is because typeMapping is used for CallMethodViaReflection whose comment says For now, only types defined in Reflect.typeMapping are supported (basically primitives and string) as input types.
If we can expand this support to additional classes (e.g. DateType, TimestampType, BinaryType, and CalendarIntervalType), I can merge three variables into one typeMapping.
WDYT? @hvanhovell and @viirya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any special reason it only supports basically primitives and string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to expand the support, maybe we can have another PR to expand it in CallMethodViaReflection and merge those variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is it better to have separate map variables, for now? cc @hvanhovell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not coming back to this sooner. AFAIK CallMethodViaReflection expression was only designed to work with a couple of primitives. I think we are looking for something a little bit more complete here, i.e. support all types in Spark SQL's type system. I also don't think that we should put the mappings in CallMethodViaReflection because the mapping is now using in more expressions, ScalaReflection is IMO a better place for this logic.
And finally which PR will implement this. cc @maropu for visibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| override def eval(input: InternalRow): Any = { | ||
| val instance = beanInstance.eval(input).asInstanceOf[Object] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: better to put the cast inside if for avoiding unnecessary casts in case of many null cases?;
val instance = beanInstance.eval(input)
if (instance != null) {
val obj = instance.asInstanceOf[Object]
resolvedSetters.foreach {
case (setter: Method, expr) =>
setter.invoke(obj, expr.eval(input).asInstanceOf[Object])
}
}
instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
|
LGTM except for one minor comment |
| override def eval(input: InternalRow): Any = | ||
| throw new UnsupportedOperationException("Only code-generated evaluation is supported.") | ||
| private lazy val resolvedSetters = { | ||
| val ObjectType(beanClass) = beanInstance.dataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to put assert(beanInstance.dataType.isInstanceOf[ObjectType]) in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
|
Test build #88074 has finished for PR 20756 at commit
|
|
Test build #88073 has finished for PR 20756 at commit
|
|
Test build #88080 has finished for PR 20756 at commit
|
| val bean = instance.asInstanceOf[Object] | ||
| resolvedSetters.foreach { | ||
| case (setter, expr) => | ||
| setter.invoke(bean, expr.eval(input).asInstanceOf[Object]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the cast to .asInstanceOf[Object]? And why not use AnyRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Without cast:
type mismatch;
[error] found : Any
[error] required: Object
[error] setter.invoke(bean, expr.eval(input))
As you seen it is required Object, but I think AnyRef is equivalent and works too. Do you prefer AnyRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnyRef is a bit more scala like, that is all.
|
Test build #88092 has finished for PR 20756 at commit
|
|
Test build #88113 has finished for PR 20756 at commit
|
|
retest this please |
|
Test build #88118 has finished for PR 20756 at commit
|
| val bean = instance.asInstanceOf[Object] | ||
| resolvedSetters.foreach { | ||
| case (setter, expr) => | ||
| setter.invoke(bean, expr.eval(input).asInstanceOf[AnyRef]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a subtle difference between code generation and interpreted mode here. A null value for an expression that maps to a java primitive will be some default value (e.g. -1) for code generation and null for interpreted mode, this can lead to different results.
I am not sure we should address this, because I am not 100% if this can ever happen. @cloud-fan could you shed some light on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong:
If the setter takes primitive types, like setAge(int i), and we pass a null via reflection, the actual passed value would be 0. This is different from the codegen version, seems like a potential bug.
IMO, I think the codegen version is wrong. In general we should not read the codegen value if it's marked as null.
This doesn't cause any problem, because we only use these object expressions to generate encoders, which means the parameter for a primitive setter won't be null. But if we treat these expressions as a general DSL, we should be careful about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya can you add a null check to both the interpreted and code generated version? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell For non-primitive setter, seems it is valid to pass a null into a setter method. A null check means we don't allow such case at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @hvanhovell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya it might be valid to invoke method with a null reference. However let's - for now - make the behavior consistent, and avoid calling a method when its argument is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Done.
| checkEvaluation(initializeBean, list, InternalRow.fromSeq(Seq())) | ||
|
|
||
| val initializeWithNonexistingMethod = InitializeJavaBean( | ||
| Literal.fromObject(new java.util.LinkedList[Int]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test for when the parameter types do not match up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added below.
Note that because for generic method, its parameter type is Object, so for LinkedList[Int], it doesn't make sense to test it with something like Map("add" -> Literal("a string")). So I add TestBean to test this case.
| Map("setX" -> Literal("1"))) | ||
| intercept[Exception] { | ||
| evaluateWithoutCodegen(initializeWithWrongParamType, InternalRow.fromSeq(Seq())) | ||
| }.getMessage.contains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For codegen the compile exception is like:
No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public void org.apache.spark.sql.catalyst.expressions.TestBean.setX(int)"
I'm not sure if we want to exactly match this kind of exception message from interpreted execution. Might be a little overkill to do that by looking methods with same name. So currently I only test interpreted execution.
|
Test build #88618 has finished for PR 20756 at commit
|
|
retest this please. |
|
Test build #88623 has finished for PR 20756 at commit
|
|
Test build #88929 has finished for PR 20756 at commit
|
|
Test build #88927 has finished for PR 20756 at commit
|
|
retest this please. |
|
Test build #88931 has finished for PR 20756 at commit
|
|
retest this please. |
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - merging to master. Thanks!
|
Test build #88936 has finished for PR 20756 at commit
|
|
@viirya I thought it passed tests, but your last change is causing errors. I am going to revert this for now. Can you reopen? Sorry about this. |
|
@hvanhovell Looks like we have tests using null as value for setters. So the added null check fails. |
|
I think the semantic should be that we do not invoke the setter if the input is null. |
|
We don't invoke the setter if the input is null, sounds like we simply skip calling the setter if the input is null? |
|
Yes that would effectively be the same. Do you think we should always call setters for reference types? |
|
I'm not sure if this setter is always for Initialization of bean's property. Will it be possibly used to update the property with a null value? If no, I think skipping is safe. |
…xpression ## What changes were proposed in this pull request? Add interpreted execution for `InitializeJavaBean` expression. ## How was this patch tested? Added unit test. Author: Liang-Chi Hsieh <[email protected]> Closes apache#20756 from viirya/SPARK-23593.
What changes were proposed in this pull request?
Add interpreted execution for
InitializeJavaBeanexpression.How was this patch tested?
Added unit test.