Closed
6 changes: 3 additions & 3 deletions R/pkg/R/functions.R
@@ -2632,8 +2632,8 @@ setMethod("date_sub", signature(y = "Column", x = "numeric"),

#' format_number
#'
-#' Formats numeric column y to a format like '#,###,###.##', rounded to x decimal places,
-#' and returns the result as a string column.
+#' Formats numeric column y to a format like '#,###,###.##', rounded to x decimal places
+#' with HALF_EVEN round mode, and returns the result as a string column.
#'
#' If x is 0, the result has no decimal point or fractional part.
#' If x < 0, the result will be null.
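
To make the documented rounding rule concrete, here is a minimal pure-Python sketch of HALF_EVEN ("banker's") rounding combined with the x = 0 and x < 0 rules stated above. The name `format_number_model` is hypothetical, and the sketch uses Python's `decimal` module on exact decimal strings; it is an illustration of the rule, not Spark's implementation.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_number_model(value, d):
    """Toy model of the documented behavior (hypothetical helper, not Spark code)."""
    if d < 0:
        # The docs say a negative number of decimal places yields null.
        return None
    # d=2 -> Decimal('0.01'), d=0 -> Decimal('1')
    quantum = Decimal(1).scaleb(-d)
    # HALF_EVEN: an exact tie rounds toward the neighbor whose last digit is even.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)
    # ',f' adds thousands separators and keeps fixed-point notation.
    return format(rounded, ",f")


print(format_number_model("1234567.125", 2))  # 1,234,567.12 (tie, 2 is even)
print(format_number_model("1234567.135", 2))  # 1,234,567.14 (tie, 3 is odd)
print(format_number_model("2.5", 0))          # 2 (no decimal point)
print(format_number_model("2.5", -1))         # None
```

Inputs are passed as strings so that the ties are exact; binary floating-point values may not sit exactly on a tie, so results for doubles can differ from this model.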
@@ -3548,7 +3548,7 @@ setMethod("row_number",

#' array_contains
#'
-#' Returns true if the array contain the value.
+#' Returns null if the array is null, true if the array contains the value, and false otherwise.
Member

For null, we need to be more careful: null in the JVM should show up as NA in R.
Also, should true be TRUE and false be FALSE to match the R type?

Member Author

@HyukjinKwon HyukjinKwon Mar 25, 2017

Yeah, I agree with being careful. For this PR, I just followed the existing documentation. I skimmed again, and it seems we have not used the notation None, True and False in functions.py, or NA, TRUE and FALSE in functions.R.

I can grep and replace.
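
For readers following the null/true/false discussion, the three-valued result described in the new doc line can be sketched in plain Python as below. The function name `array_contains_model` is hypothetical and this is only a model of the documented semantics; Python's `None` stands in for SQL null here, which an R client would see as NA.

```python
def array_contains_model(arr, value):
    """Toy model of the documented semantics (hypothetical helper, not Spark code)."""
    if arr is None:
        # A null array gives a null result rather than false.
        return None
    # Otherwise the result is a plain boolean membership test.
    return value in arr


rows = [[1, 2, 3], [4], None]
print([array_contains_model(r, 1) for r in rows])  # [True, False, None]
```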

#'
#' @param x A Column
#' @param value A value to be checked if contained in the column
8 changes: 4 additions & 4 deletions python/pyspark/sql/functions.py
@@ -1327,8 +1327,8 @@ def encode(col, charset):
@since(1.5)
def format_number(col, d):
"""
-    Formats the number X to a format like '#,--#,--#.--', rounded to d decimal places,
-    and returns the result as a string.
+    Formats the number X to a format like '#,--#,--#.--', rounded to d decimal places
+    with HALF_EVEN round mode, and returns the result as a string.

:param col: the column name of the numeric value to be formatted
:param d: the N decimal places
@@ -1675,8 +1675,8 @@ def array(*cols):
@since(1.5)
def array_contains(col, value):
"""
-    Collection function: returns True if the array contains the given value. The collection
-    elements and value must be of the same type.
+    Collection function: returns null if the array is null, true if the array contains the
+    given value, and false otherwise.
Member Author

@HyukjinKwon HyukjinKwon Mar 25, 2017

Other documentation in this file uses true rather than True, so I matched this to true. I am willing to sweep the file if anyone feels this should be fixed.
I removed "The collection elements and value must be of the same type" because it seems we can also provide values of other types that are implicitly castable.
This is not documented in Scala or R either, so I instead provided a doctest as an example below in the Python documentation.
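
The implicit-cast point can be illustrated with a small pure-Python model that mirrors the data used in this PR's test (string elements, integer value). The helper `contains_with_cast` is hypothetical, and coercing the value to the type of the first element is a simplifying assumption for illustration; it does not reproduce Spark's actual type-coercion rules.

```python
def contains_with_cast(arr, value):
    """Toy model: coerce `value` to the element type before comparing.

    Hypothetical helper; Spark's real coercion rules are more involved.
    """
    if arr is None:
        return None
    if not arr:
        # An empty array cannot contain anything.
        return False
    element_type = type(arr[0])
    try:
        candidate = element_type(value)
    except (TypeError, ValueError):
        # Assumption: a value that cannot be converted is treated as not found.
        return False
    return candidate in arr


rows = [["1", "2", "3"], []]
print([contains_with_cast(data, 1) for data in rows])  # [True, False]
```

Under this model the integer 1 matches the string element "1", which is the same outcome the added test asserts for `array_contains(df.data, 1)`.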

Member

As in my other comment, this should probably say True in Python. @holdenk?

Member Author

@HyukjinKwon HyukjinKwon Mar 25, 2017

I am willing to grep and replace too. Please let me know @holdenk.

FWIW, if deciding this takes a while and holds up this PR, I would like to ask that it be merged as is, provided you and @holdenk do not feel strongly about this.


:param col: name of column containing array
:param value: value to check for in array
8 changes: 8 additions & 0 deletions python/pyspark/sql/tests.py
@@ -1129,6 +1129,14 @@ def test_rand_functions(self):
         rndn2 = df.select('key', functions.randn(0)).collect()
         self.assertEqual(sorted(rndn1), sorted(rndn2))

+    def test_array_contains_function(self):
+        from pyspark.sql.functions import array_contains
+
+        df = self.spark.createDataFrame([(["1", "2", "3"],), ([],)], ['data'])
+        actual = df.select(array_contains(df.data, 1).alias('b')).collect()
+        # The value argument can be implicitly castable to the element's type of the array.
+        self.assertEqual([Row(b=True), Row(b=False)], actual)
+
     def test_between_function(self):
         df = self.sc.parallelize([
             Row(a=1, b=2, c=3),