[SPARK-22695][SQL] ScalaUDF should not use global variables #19900
mgaido91 wants to merge 3 commits into apache:master from
Test build #84491 has finished for PR 19900 at commit
@cloud-fan @kiszk @viirya could you please review this? Thanks!
val expressionClassName = classOf[Expression].getName
val scalaUDFClassName = classOf[ScalaUDF].getName
private val converterClassName = classOf[Any => Any].getName
private val expressionClassName = classOf[Expression].getName
Expression is pre-imported, so we can just write Expression in the generated code.
ev: ExprCode): ExprCode = {
val thisClassName = this.getClass.getName
val scalaUDF = ctx.freshName("scalaUDF")
val scalaUDFRef = ctx.addReferenceMinorObj(this, thisClassName)
ctx.addReferenceMinorObj has a default value for the class name, which is obj.getClass.getName, so thisClassName is redundant.
Oh, I see, it's used later.
I also use thisClassName later (line 1045); that is why I passed it, even though it is not needed here. What do you suggest: just not passing it as a parameter, or getting rid of the thisClassName variable itself? Thanks.
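The exchange about thisClassName hinges on a default parameter. Below is a minimal sketch, assuming a method that stores the object in a references array and, when no class name is passed, falls back to obj.getClass.getName, as the review comment describes. ToyContext is a hypothetical stand-in, not Spark's CodegenContext. Because a Scala default argument cannot refer to another parameter in the same parameter list, the sketch uses a null default plus an Option fallback.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy stand-in for a codegen context; NOT Spark's CodegenContext.
class ToyContext {
  // Objects the generated code would read from a `references` array at runtime.
  val references: ArrayBuffer[Any] = ArrayBuffer.empty[Any]

  // Registers `obj` and returns an inline Java cast expression for it.
  // A null className means "use the runtime class name of obj".
  def addReferenceMinorObj(obj: Any, className: String = null): String = {
    val idx = references.length
    references += obj
    val clsName = Option(className).getOrElse(obj.getClass.getName)
    s"(($clsName) references[$idx])"
  }
}

object DefaultClassNameDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new ToyContext
    val udf = new java.lang.StringBuilder("udf")
    println(ctx.addReferenceMinorObj(udf))              // ((java.lang.StringBuilder) references[0])
    println(ctx.addReferenceMinorObj(udf, "MyUDFType")) // ((MyUDFType) references[1])
  }
}
```

In this sketch, passing the runtime class name explicitly yields the same cast as omitting it; the explicit argument only pays off when the caller needs the name elsewhere too, as with thisClassName in the diff.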
test("SPARK-22695: ScalaUDF should not use global variables") {
val ctx = new CodegenContext
ScalaUDF((s: String) => s + "x", StringType, Literal("a") :: Nil).genCode(ctx)
// we have one variable (globalIsNull) introduced by reduceCodeSize
Wow, does this simple UDF really trigger the code-splitting logic in reduceCodeSize?
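The test comment attributes the single remaining global (globalIsNull) to reduceCodeSize. The following is a minimal sketch of how threshold-based code splitting can introduce exactly one field. It assumes that an over-long expression body is moved into its own method and that, since a Java method returns only one value, a non-constant null flag is hoisted into a class field. SplittingContext, Eval and the threshold parameter are hypothetical, and the real conditions and threshold used by Spark are not reproduced here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model of threshold-based code splitting; not Spark's reduceCodeSize.
final case class Eval(code: String, isNull: String, value: String)

class SplittingContext(threshold: Int) {
  val globals = ArrayBuffer.empty[String]   // class-level field declarations
  val functions = ArrayBuffer.empty[String] // extra methods added to the class
  private var counter = 0

  def freshName(prefix: String): String = { counter += 1; s"${prefix}_$counter" }

  // Moves an over-long expression body into its own method. The method returns
  // the value, so a non-constant null flag must be hoisted into a field.
  def reduceCodeSize(eval: Eval, javaType: String): Eval =
    if (eval.code.trim.length <= threshold) eval
    else {
      val (isNullOut, setIsNull) =
        if (eval.isNull == "true" || eval.isNull == "false") (eval.isNull, "")
        else {
          val globalIsNull = freshName("globalIsNull")
          globals += s"private boolean $globalIsNull;"
          (globalIsNull, s"$globalIsNull = ${eval.isNull};")
        }
      val func = freshName("evalExpr")
      functions += s"private $javaType $func(InternalRow i) { ${eval.code} $setIsNull return ${eval.value}; }"
      val newValue = freshName("value")
      Eval(s"$javaType $newValue = $func(i);", isNullOut, newValue)
    }
}

object SplittingDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new SplittingContext(threshold = 20)
    val big = Eval("UTF8String v = udf.apply(a, b, c);", "isNull_0", "v")
    println(ctx.reduceCodeSize(big, "UTF8String").code) // UTF8String value_3 = evalExpr_2(i);
    println(ctx.globals.size)                           // 1
  }
}
```

Under these assumptions, an expression that stays below the threshold, or whose null flag is a literal, leaves no field behind, so a test can expect exactly one global after a single split.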
override def doGenCode(
ctx: CodegenContext,
ev: ExprCode): ExprCode = {
val thisClassName = this.getClass.getName
Isn't it just scalaUDFClassName?
Yes, nice catch! I am updating it. Thank you.
val scalaUDF = ctx.freshName("scalaUDF")
val scalaUDFRef = ctx.addReferenceMinorObj(this, thisClassName)
val scalaUDF = ctx.addReferenceObj("scalaUDF", this)
We may need to revisit all usages of ctx.addReferenceObj; I created https://issues.apache.org/jira/browse/SPARK-22716 for it. @mgaido91, are you interested?
Yes, sure, thanks. I would be happy to work on it.
LGTM
Test build #84555 has finished for PR 19900 at commit
Test build #84559 has finished for PR 19900 at commit
Thanks, merging to master!
What changes were proposed in this pull request?
ScalaUDF uses global variables (fields of the generated class) that are not needed. These fields can add unneeded entries to the constant pool of the generated class.
This PR replaces the unneeded global variables with local variables.
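The two styles that appear in the review can be contrasted with a toy generator: keeping the UDF reference in a class field (the addReferenceObj line) versus binding it to a fresh local variable (the freshName and addReferenceMinorObj lines). This is a minimal sketch under stated assumptions. ToyCodegen, referenceAsField and referenceAsLocal are hypothetical names, and the sketch only counts how many class-level fields each style leaves behind; it does not reproduce Spark's API or its generated code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy code generator; names are illustrative and do not
// reproduce Spark's CodegenContext API.
class ToyCodegen {
  val references = ArrayBuffer.empty[Any]    // objects handed to the generated class
  val fieldDecls = ArrayBuffer.empty[String] // class-level fields ("global variables")
  val initCode = ArrayBuffer.empty[String]   // statements run in the constructor
  private var nextId = 0

  // Unique identifier so that generated names never clash.
  def freshName(prefix: String): String = { nextId += 1; s"${prefix}_$nextId" }

  private def register(obj: Any): Int = { references += obj; references.length - 1 }

  // Field style: the reference is cast once and kept in a class field.
  def referenceAsField(prefix: String, obj: Any): String = {
    val cls = obj.getClass.getName
    val name = freshName(prefix)
    fieldDecls += s"private $cls $name;"
    initCode += s"$name = ($cls) references[${register(obj)}];"
    name
  }

  // Local style: no field; returns the local name and its declaration,
  // to be emitted inside the method that uses it.
  def referenceAsLocal(prefix: String, obj: Any): (String, String) = {
    val cls = obj.getClass.getName
    val name = freshName(prefix)
    (name, s"$cls $name = ($cls) references[${register(obj)}];")
  }
}

object ReferenceStylesDemo {
  def main(args: Array[String]): Unit = {
    val g = new ToyCodegen
    val udf = new java.lang.StringBuilder("udf")
    g.referenceAsField("scalaUDF", udf)
    val (_, decl) = g.referenceAsLocal("scalaUDF", udf)
    println(g.fieldDecls.size) // 1: only the field-style reference created a field
    println(decl) // java.lang.StringBuilder scalaUDF_2 = (java.lang.StringBuilder) references[1];
  }
}
```

In the sketch, both styles read the same references array; the difference is whether the cast result outlives the method as a field of the generated class.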
How was this patch tested?
Added a unit test.