Skip to content

[SPARK-22695][SQL] ScalaUDF should not use global variables#19900

Closed
mgaido91 wants to merge 3 commits intoapache:masterfrom
mgaido91:SPARK-22695
Closed

[SPARK-22695][SQL] ScalaUDF should not use global variables#19900
mgaido91 wants to merge 3 commits intoapache:masterfrom
mgaido91:SPARK-22695

Conversation

@mgaido91
Copy link
Copy Markdown
Contributor

@mgaido91 mgaido91 commented Dec 5, 2017

What changes were proposed in this pull request?

ScalaUDF is using global variables which are not needed. This can generate some unneeded entries in the constant pool.

The PR replaces the unneeded global variables with local variables.

How was this patch tested?

added UT

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 5, 2017

Test build #84491 has finished for PR 19900 at commit 4eaca3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 6, 2017

@cloud-fan @kiszk @viirya may you please review this? Thanks

val expressionClassName = classOf[Expression].getName
val scalaUDFClassName = classOf[ScalaUDF].getName
private val converterClassName = classOf[Any => Any].getName
private val expressionClassName = classOf[Expression].getName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expression is pre-imported, we can just write Expression in the generated code.

ev: ExprCode): ExprCode = {
val thisClassName = this.getClass.getName
val scalaUDF = ctx.freshName("scalaUDF")
val scalaUDFRef = ctx.addReferenceMinorObj(this, thisClassName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.addReferenceMinorObj has a default value for class name, which is obj.getClass.getNane, so the thisClassName is redundant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see, it's used later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using thisClassName also later (line 1045), that is why I passed it, despite it is not needed. What is your suggestion? Just not passing it as a parameter or getting rid of the thisClassName variable itself? Thanks,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok we can keep it.

test("SPARK-22695: ScalaUDF should not use global variables") {
val ctx = new CodegenContext
ScalaUDF((s: String) => s + "x", StringType, Literal("a") :: Nil).genCode(ctx)
// we have one variable (globalIsNull) introduced by reduceCodeSize
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow this simple UDF will trigger the code splitting logic in reduceCodeSize?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

override def doGenCode(
ctx: CodegenContext,
ev: ExprCode): ExprCode = {
val thisClassName = this.getClass.getName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it just scalaUDFClassName?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks, nice catch! I am updating it. Thank you.

val scalaUDF = ctx.freshName("scalaUDF")
val scalaUDFRef = ctx.addReferenceMinorObj(this, thisClassName)

val scalaUDF = ctx.addReferenceObj("scalaUDF", this)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to revisit all the usage of ctx.addReferenceObj, I created https://issues.apache.org/jira/browse/SPARK-22716 for it. @mgaido91 do you have interests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sure, thanks. I would be happy to work on it.

@cloud-fan
Copy link
Copy Markdown
Contributor

LGTM

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 6, 2017

Test build #84555 has finished for PR 19900 at commit eef8036.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 6, 2017

Test build #84559 has finished for PR 19900 at commit f188d55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master!

@asfgit asfgit closed this in e98f964 Dec 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants