@maryannxue
Contributor

@maryannxue maryannxue commented Jul 23, 2018

What changes were proposed in this pull request?

The `HandleNullInputsForUDF` rule adds a new `If` node every time it is applied, so it is not idempotent. A plan that is analyzed once therefore differs from the same plan analyzed twice (or more), which leads to issues such as the cache manager failing to match plans. The solution is to mark the arguments as null-checked by adding a `KnownNotNull` node above them when the UDF is placed under an `If` node, because the UDF will clearly not be called when any of those arguments is null.
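The idea can be illustrated with a minimal, self-contained sketch. This is a toy model, not Spark's Catalyst code: every class and method below (`Expr`, `Attr`, `Udf`, `handleNullInputs`, and so on) is a hypothetical stand-in, and unlike the real rule it treats every argument as needing a null check rather than only primitive-typed ones.

```scala
// Toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class IsNull(child: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case object NullLit extends Expr
case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr
case class Udf(name: String, args: Seq[Expr]) extends Expr
// Marker node: records that a null check already guards this argument.
case class KnownNotNull(child: Expr) extends Expr

object ToyNullHandling {
  // An argument needs a check unless it has already been marked.
  private def needsNullCheck(e: Expr): Boolean = e match {
    case KnownNotNull(_) => false
    case _               => true
  }

  // Wrap a UDF in `If(anyArgIsNull, null, udf)` and mark the checked arguments.
  def handleNullInputs(e: Expr): Expr = e match {
    case Udf(name, args) if args.exists(needsNullCheck) =>
      val unchecked = args.filter(needsNullCheck)
      val cond = unchecked.map(a => IsNull(a): Expr).reduce[Expr](Or(_, _))
      val marked = args.map(a => if (needsNullCheck(a)) KnownNotNull(a) else a)
      If(cond, NullLit, Udf(name, marked))
    case If(c, t, f) =>
      // Revisit the branches, as a repeated analysis pass would.
      If(c, handleNullInputs(t), handleNullInputs(f))
    case other => other
  }
}
```

In this sketch, applying `ToyNullHandling.handleNullInputs` to its own output returns the same tree, because the marked arguments no longer trigger the guard. Without the marker, each pass would wrap the UDF in another `If`.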

How was this patch tested?

Add new tests under sql/UDFSuite and AnalysisSuite.

// branch of `If` will be called if any of these checked inputs is null. Thus we can
// prevent this rule from being applied repeatedly.
val newInputs = parameterTypes.zip(inputs).map { case (cls, expr) =>
  if (needsNullCheck(cls, expr)) AssertNotNull(expr) else expr
}
Member

Could we introduce `KnownNotNull` instead of using `AssertNotNull`, which has a side effect?
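To make the distinction concrete, here is a small sketch contrasting an asserting node with a pure marker node. It is a toy evaluator under stated assumptions, not Spark's implementation: `Col`, `ToyAssertNotNull`, and `ToyKnownNotNull` are hypothetical names, and a row is modeled as a plain `Map`.

```scala
// Toy expressions with a nullability flag and an evaluator; NOT Spark's classes.
sealed trait Expr {
  def nullable: Boolean
  def eval(row: Map[String, Any]): Any
}

case class Col(name: String) extends Expr {
  val nullable = true
  def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
}

// Asserting node: evaluation has a side effect, it throws when the value is null.
case class ToyAssertNotNull(child: Expr) extends Expr {
  val nullable = false
  def eval(row: Map[String, Any]): Any = {
    val v = child.eval(row)
    if (v == null) throw new NullPointerException("null value in checked input")
    v
  }
}

// Marker node: only records that the value is known to be non-null in context;
// evaluation passes the child's value through unchanged.
case class ToyKnownNotNull(child: Expr) extends Expr {
  val nullable = false
  def eval(row: Map[String, Any]): Any = child.eval(row)
}
```

Both wrappers report `nullable = false`, but only the asserting one can fail at evaluation time. A marker is enough here because the enclosing `If` already guarantees the UDF branch is not taken for null inputs.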

SparkQA commented Jul 23, 2018

Test build #93459 has finished for PR 21851 at commit 62fa9cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 24, 2018

Test build #93469 has finished for PR 21851 at commit b499b97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

retest this please

@gatorsmile
Member

update the PR description?

SparkQA commented Jul 24, 2018

Test build #93503 has finished for PR 21851 at commit b499b97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

retest this please

SparkQA commented Jul 24, 2018

Test build #93514 has finished for PR 21851 at commit b499b97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

gatorsmile commented Jul 25, 2018

LGTM

Thanks! Merged to master/2.3

@asfgit asfgit closed this in c26b092 Jul 25, 2018
asfgit pushed a commit that referenced this pull request Jul 25, 2018
Author: maryannxue

Closes #21851 from maryannxue/spark-24891.