[SPARK-28439][PYTHON][SQL] Add support for count: Column in array_repeat #25193
Test build #107848 has finished for PR 25193 at commit
@zero323, although it's pretty straightforward, let's file a JIRA and add a simple test. (BTW, I am happy to see you back in Spark community :D)
Sorry, my bad. I don't know why I've included JIRA info. Fixed.
I'll try to add one later today.
Test build #107857 has finished for PR 25193 at commit
dongjoon-hyun left a comment
+1, LGTM.
This will work for int type column.
>>> sql("CREATE TABLE t(a STRING, b int)")
>>> sql("INSERT INTO t VALUES('ab', 4)")
>>> sql("SELECT a, b FROM t").show()
+---+---+
|  a|  b|
+---+---+
| ab|  4|
+---+---+
>>> sql("SELECT array_repeat(a, b) FROM t").show()
+------------------+
|array_repeat(a, b)|
+------------------+
|  [ab, ab, ab, ab]|
+------------------+
Please note that it will fail if we create the DataFrame as follows, because Python and Scala infer different types for the count column (long vs. integer).
>>> df = spark.createDataFrame([('ab',1)], ['data','c'])
>>> df.printSchema()
root
 |-- data: string (nullable = true)
 |-- c: long (nullable = true)
scala> val df = Seq(("ab", 3)).toDF("data", "c")
df: org.apache.spark.sql.DataFrame = [data: string, c: int]
scala> df.printSchema()
root
 |-- data: string (nullable = true)
 |-- c: integer (nullable = false)
Merged to master. Thank you, @zero323 and @HyukjinKwon.
If the column passed as count is not We may need to explicitly cast to int:
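A minimal sketch of such a cast, assuming a DataFrame with a string column `data` and a long column `c` as in the `printSchema()` output earlier in this thread. The helper name `repeat_with_int_count` and the output alias `repeated` are made up for illustration; only `array_repeat`, `col`, and `Column.cast` are PySpark calls.

```python
def repeat_with_int_count(df, value_col="data", count_col="c"):
    """Repeat value_col count_col times, casting the count to int first.

    Hypothetical helper for illustration; df is expected to be a PySpark
    DataFrame whose count column may have been inferred as long.
    """
    # Imported lazily so the sketch can be loaded without a Spark installation.
    from pyspark.sql.functions import array_repeat, col

    # Cast explicitly, since a long count column is reported to fail here.
    count_as_int = col(count_col).cast("int")
    return df.select(array_repeat(col(value_col), count_as_int).alias("repeated"))
```

With a running session this could be called as `repeat_with_int_count(spark.createDataFrame([('ab', 1)], ['data', 'c']))`. Passing a `Column` as the count relies on the change proposed in this PR.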
That seems like expected behavior here, but assuming it is to be modified, it should be done in the underlying expression, so that behavior is consistent across APIs (including SQL).
I see your point @zero323. I ran into this issue when I passed a column which was the result of a count aggregation and was typed as
@nucflash In general From the other hand
What changes were proposed in this pull request?
This adds a simple check for the count argument: if it is a Column, we apply _to_java_column before invoking the JVM counterpart.
How was this patch tested?
Manual testing.
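The check described under "What changes were proposed" can be sketched roughly as follows. This is a standalone illustration, not the patch itself: `Column` and `_to_java_column` below are simplified stand-ins for the PySpark class and internal helper of the same names, `normalize_count` is a hypothetical name, and passing non-Column values through unchanged is an assumption.

```python
class Column:
    """Simplified stand-in for pyspark.sql.Column; only holds a name."""

    def __init__(self, name):
        self.name = name


def _to_java_column(col):
    """Stand-in for PySpark's internal helper; here it just tags the name."""
    return ("jcol", col.name)


def normalize_count(count):
    """Convert a Column count for the JVM call; pass other values through.

    Hypothetical helper: the pass-through branch is an assumption about
    how a plain int count is handled.
    """
    if isinstance(count, Column):
        return _to_java_column(count)
    return count
```

In this sketch a plain integer such as `normalize_count(3)` comes back unchanged, while `normalize_count(Column("c"))` is converted before it would reach the JVM function.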