Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jul 10, 2019

What changes were proposed in this pull request?

There is a bug in ExtractPythonUDFs that produces wrong result attributes. It causes a failure when using PythonUDFs among multiple child plans, e.g., join. An example is using PythonUDFs in join condition.

>>> left = spark.createDataFrame([Row(a=1, a1=1, a2=1), Row(a=2, a1=2, a2=2)])                                                                                                                                                                                                                      
>>> right = spark.createDataFrame([Row(b=1, b1=1, b2=1), Row(b=1, b1=3, b2=1)])                                                                       
>>> f = udf(lambda a: a, IntegerType())                                                                                                               
>>> df = left.join(right, [f("a") == f("b"), left.a1 == right.b1])                                                
>>> df.collect()                                                                                                                                      
19/07/10 12:20:49 ERROR Executor: Exception in task 5.0 in stage 0.0 (TID 5)                                                                                                                                                 
java.lang.ArrayIndexOutOfBoundsException: 1                                                                                                           
        at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:201)                                                    
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getAs(rows.scala:35)               
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.isNullAt(rows.scala:36)               
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.isNullAt$(rows.scala:36)                             
        at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.isNullAt(rows.scala:195)
        at org.apache.spark.sql.catalyst.expressions.JoinedRow.isNullAt(JoinedRow.scala:70)
        ...

How was this patch tested?

Added test.

@viirya
Copy link
Member Author

viirya commented Jul 10, 2019

cc @HyukjinKwon @BryanCutler

@HyukjinKwon
Copy link
Member

Cool @viirya! I will take a closer look within 2 days and get this in

@viirya
Copy link
Member Author

viirya commented Jul 10, 2019

Thanks! @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107433 has finished for PR 25091 at commit 95231a6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jul 10, 2019

retest this please.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107445 has finished for PR 25091 at commit 95231a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107452 has finished for PR 25091 at commit 0c24787.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2019

Test build #107458 has finished for PR 25091 at commit fbf8be9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"Can only extract scalar vectorized udf or sql batch udf")

val resultAttrs = udfs.zipWithIndex.map { case (u, i) =>
val resultAttrs = validUdfs.zipWithIndex.map { case (u, i) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, nice catch!

@BryanCutler
Copy link
Member

merged to master, thanks @viirya !

@BryanCutler
Copy link
Member

jira seems down, so I wasn't able to resolve the issue. will try later

@HyukjinKwon
Copy link
Member

Yea, LGTM too!

@viirya
Copy link
Member Author

viirya commented Jul 11, 2019

Thanks! @HyukjinKwon @BryanCutler

@viirya viirya deleted the SPARK-28323 branch December 27, 2023 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants