[SPARK-20253][SQL] Remove unnecessary nullchecks of a return value from Spark runtime routines in generated Java code #17569
kiszk wants to merge 9 commits into apache:master from
Test build #75611 has finished for PR 17569 at commit
Test build #75614 has finished for PR 17569 at commit
@cloud-fan, could you please review this?
  Nil,
  dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt()))
- Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+ Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil,
`deserialize` is implemented entirely by users. Can we guarantee that it does not return null?
I see. It is a UDT. I had only checked `deserialize` implementations in the Spark runtime.
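The reviewer's concern can be made concrete with a minimal Java sketch. It is not Spark code: `UserTypeCodec` and `ExprValue` are hypothetical names standing in for a user-implemented UDT and for the (isNull, value) pair that generated code maintains. Because nothing stops a user's `deserialize` from returning `null`, the call result still has to be null-checked.

```java
// Hypothetical stand-in for a user-defined type (UDT): deserialize() is user code,
// so the caller cannot assume its result is non-null.
interface UserTypeCodec<T> {
    T deserialize(Object datum);
}

// Hypothetical (isNull, value) pair, mirroring the two slots generated code fills in.
final class ExprValue<T> {
    final boolean isNull;
    final T value;

    ExprValue(boolean isNull, T value) {
        this.isNull = isNull;
        this.value = value;
    }
}

final class NullableInvokeDemo {
    // Shape of generated code for a call whose result MAY be null: the null bit is
    // derived from the value actually returned, so a null from user code is handled.
    static <T> ExprValue<T> evaluate(UserTypeCodec<T> codec, Object datum) {
        T funcResult = codec.deserialize(datum);
        if (funcResult == null) {
            return new ExprValue<>(true, null);
        }
        return new ExprValue<>(false, funcResult);
    }
}
```

If the check were dropped and `funcResult` copied straight into the value slot, a `null` from user code would flow through with the null bit still unset, which is the risk raised in the review.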
  Nil,
  dataType = ObjectType(udt.getClass))
- Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+ Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil,

  Nil,
  dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt()))
- Invoke(obj, "serialize", udt, inputObject :: Nil)
+ Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false)

  Nil,
  dataType = ObjectType(udt.getClass))
- Invoke(obj, "serialize", udt, inputObject :: Nil)
+ Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false)
  """,
  genValue => s"$convertedArray[$loopIndex] = $genValue;",
- s"new ${classOf[GenericArrayData].getName}($convertedArray);"
+ s"new ${classOf[GenericArrayData].getName}($convertedArray); /*###*/"
Oh, sorry. It was my debug code...
  if ($funcResult == null) {
  ${ev.isNull} = true;
  } else {
+ if (!returnNullable) {
since we have postNullCheck, can we always go to this branch?
  // If the function can return null, we do an extra check to make sure our null bit is still set
  // correctly.
- val postNullCheck = if (ctx.defaultValue(dataType) == "null") {
+ val postNullCheck = if (ctx.defaultValue(dataType) == "null" && returnNullable) {
actually, can we embed postNullCheck into evaluate?
  checkDataset(dsBoolean.map(e => !e), false, true)
  }

+ test("mapPrimitiveArray") {
do these tests fail before this PR?
No, I just added them to confirm that this check works correctly.
  case StringType =>
- Invoke(input, "toString", ObjectType(classOf[String]))
+ Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false)
can we check how many places we set returnNullable to true? If it's only a few, we can change the default value of returnNullable to false.
Here are the statistics for the 59 call sites of `Invoke()`:
18: `dataType` is a primitive type
21: `returnNullable` is true (not specified at the call site, so the default applies)
19: `returnNullable` is false
1: a variable is passed as `returnNullable`
What do you think?
ok let's keep the default value unchanged
Test build #75619 has finished for PR 17569 at commit
Test build #75621 has finished for PR 17569 at commit
Seems there are places (i.e.,
  val funcResult = ctx.freshName("funcResult")
  // If the function can return null, we do an extra check to make sure our null bit is still
  // set correctly.
+ val postNullCheck = if (!returnNullable) {
nit: rename postNullCheck. It not only performs a null check but also assigns the function result.
Test build #75625 has finished for PR 17569 at commit
Test build #75627 has finished for PR 17569 at commit
  val funcResult = ctx.freshName("funcResult")
  // If the function can return null, we do an extra check to make sure our null bit is still
  // set correctly.
+ val postNullCheckAndAssign = if (!returnNullable) {
how about just assignResult?
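The renamed fragment boils down to choosing between two code templates based on `returnNullable`. Below is a minimal Java sketch of that choice; `genAssignResult` and its parameters are hypothetical simplifications, not the actual `Invoke` code generation in Spark, which is written in Scala.

```java
final class AssignResultGen {
    /**
     * Hypothetical helper: returns the Java statement(s) that copy a method's result into
     * the expression's value slot. When the callee is known never to return null
     * (returnNullable == false), the null check is omitted entirely.
     */
    static String genAssignResult(String funcResult, String value, String isNull,
                                  String boxedType, boolean returnNullable) {
        if (!returnNullable) {
            // Known non-null callee: a single cast-and-assign, no branch.
            return value + " = (" + boxedType + ") " + funcResult + ";";
        }
        // Possibly-null callee: assign only when non-null, otherwise set the null bit.
        return "if (" + funcResult + " != null) {\n"
             + "  " + value + " = (" + boxedType + ") " + funcResult + ";\n"
             + "} else {\n"
             + "  " + isNull + " = true;\n"
             + "}";
    }
}
```

For example, `genAssignResult("funcResult", "value", "isNull", "double[]", false)` returns the single statement `value = (double[]) funcResult;`, while passing `true` wraps the same assignment in an `if (funcResult != null)` guard that sets `isNull` otherwise.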
LGTM
LGTM
Test build #75628 has finished for PR 17569 at commit
thanks, merging to master!
What changes were proposed in this pull request?
This PR eliminates unnecessary null checks of return values from known Spark runtime routines. We know whether a given Spark runtime routine can return `null` (e.g., `ArrayData.toDoubleArray()` never returns `null`), so we can eliminate the null check on the return value of such a routine.

When we run the following example program, we currently get the Java code shown under "Without this PR". In this code, since `ArrayData.toDoubleArray()` never returns `null`, we can eliminate the null checks at lines 90-92 and 97.

Without this PR

With this PR (most of lines 90-97 in the above code are removed)
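The following hand-written Java sketch illustrates the effect of this change on a call such as `ArrayData.toDoubleArray()`. It is not the code Spark generates: `FakeArrayData` and `NullCheckElision` are hypothetical names, and `FakeArrayData.toDoubleArray()` simply always returns an array, matching the non-null guarantee the description states for `ArrayData.toDoubleArray()`.

```java
// Hypothetical stand-in for Spark's ArrayData: toDoubleArray() here always returns
// an array (possibly empty), never null.
final class FakeArrayData {
    private final double[] values;

    FakeArrayData(double... values) {
        this.values = values.clone();
    }

    double[] toDoubleArray() {
        return values.clone();
    }
}

final class NullCheckElision {
    // Before: the result is null-checked even though toDoubleArray() cannot return null,
    // so the isNull branch is dead code.
    static double sumWithCheck(FakeArrayData data) {
        double[] funcResult = data.toDoubleArray();
        boolean isNull = false;
        double[] value = null;
        if (funcResult == null) {
            isNull = true;
        } else {
            value = funcResult;
        }
        if (isNull) {
            return Double.NaN; // never reached for FakeArrayData
        }
        return sum(value);
    }

    // After: the callee is known to be non-null, so the result is used directly.
    static double sumWithoutCheck(FakeArrayData data) {
        double[] value = data.toDoubleArray();
        return sum(value);
    }

    private static double sum(double[] xs) {
        double total = 0.0;
        for (double x : xs) {
            total += x;
        }
        return total;
    }
}
```

Both methods return `6.5` for `new FakeArrayData(1.0, 2.0, 3.5)`; the second simply omits the branch that can never be taken.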
How was this patch tested?
Added tests to `DatasetPrimitiveSuite`.