Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Oct 29, 2019

What changes were proposed in this pull request?

This PR proposes to allow array_contains to take column instances.

Why are the changes needed?

For consistent support in Scala and Python APIs. Scala allows column instances at array_contains

Scala:

import org.apache.spark.sql.functions._
val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data")
df.select(array_contains($"data", lit("a"))).show()

Python:

from pyspark.sql.functions import array_contains, lit
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(array_contains(df.data, lit("a"))).show()

However, PySpark sides does not allow.

Does this PR introduce any user-facing change?

Yes.

from pyspark.sql.functions import array_contains, lit
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(array_contains(df.data, lit("a"))).show()

Before:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains
    return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
  File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
    raise TypeError("Column is not iterable")
TypeError: Column is not iterable

After:

+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
|                  false|
+-----------------------+

How was this patch tested?

Manually tested and added a doctest.

@SparkQA
Copy link

SparkQA commented Oct 29, 2019

Test build #112815 has finished for PR 26288 at commit 7ab0c76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Oct 29, 2019

Test build #112851 has finished for PR 26288 at commit 5f8b160.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Thanks, @MaxGekk

@HyukjinKwon
Copy link
Member Author

Merged to master.

@dongjoon-hyun
Copy link
Member

Late LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants