[SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 7, ~30 functions) #37850
Conversation
Can one of the admins verify this patch?

@HyukjinKwon @srowen @itholic Please review
python/pyspark/sql/functions.py
Outdated
Returns
-------
:class:`~pyspark.sql.Column`
    a string representatio of a :class:`StructType` parsed from given JSON.
typo in a few places: representatio -> representation
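For reference, a minimal self-contained sketch of the `schema_of_json` example under review (assuming an active SparkSession named `spark`; the exact output formatting varies by Spark version):

```python
from pyspark.sql.functions import schema_of_json, lit

# assumes an active SparkSession named `spark`
df = spark.range(1)
df.select(schema_of_json(lit('{"a": 0}')).alias("json")).show(truncate=False)
# prints the inferred DDL string for the sample JSON,
# e.g. STRUCT<a: BIGINT> (exact casing varies by Spark version)
```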
python/pyspark/sql/functions.py
Outdated
Returns
-------
:class:`~pyspark.sql.Column`
    an array of values from first array that is not in the second.
nit: that are not
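For context, a minimal sketch of the `array_except` behavior this docstring describes (assuming an active `spark` session):

```python
from pyspark.sql import Row
from pyspark.sql.functions import array_except

# assumes an active SparkSession named `spark`
df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
# keeps the values of c1 that do not appear in c2, without duplicates
df.select(array_except(df.c1, df.c2)).collect()
# [Row(array_except(c1, c2)=['b'])]
```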
python/pyspark/sql/functions.py
Outdated
col : :class:`~pyspark.sql.Column` or str
    target column to work on.
delimiter : str
    delimiter to use concatanate elements
concatenate, here and below
Also: "delimiter used to concatenate..."
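A minimal sketch of `array_join` with a delimiter, matching the wording suggested above (assuming an active `spark` session):

```python
from pyspark.sql.functions import array_join

# assumes an active SparkSession named `spark`
df = spark.createDataFrame([(["a", "b", "c"],), (["a", None],)], ["data"])
# concatenates array elements with the delimiter; nulls are skipped
# unless a null_replacement is supplied
df.select(array_join(df.data, ",").alias("joined")).collect()
# [Row(joined='a,b,c'), Row(joined='a')]
df.select(array_join(df.data, ",", "NULL").alias("joined")).collect()
# [Row(joined='a,b,c'), Row(joined='a,NULL')]
```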
itholic
left a comment
Looks fine otherwise
python/pyspark/sql/functions.py
Outdated
Returns
-------
:class:`~pyspark.sql.Column`
    a column of `boolean` type.
nit: if we want to use single quotation for the type name, why don't we use it in other docstrings?
e.g.
- a column of array type.
+ a column of `array` type.

python/pyspark/sql/functions.py

Returns
-------
:class:`~pyspark.sql.Column`
    a column of array type. Subset of array.
nit: since we're here, can we also fix the minor mistake in the description?
I found there are two spaces between "containing" and "all".
- Collection function: returns an array containing  all the elements in `x` from index `start`
+ Collection function: returns an array containing all the elements in `x` from index `start`
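For context, a minimal sketch of the `slice` behavior that the description above refers to (assuming an active `spark` session):

```python
from pyspark.sql.functions import slice

# assumes an active SparkSession named `spark`
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["x"])
# elements of `x` from 1-based index `start` (negative counts from the end),
# for up to `length` elements
df.select(slice(df.x, 2, 2).alias("sliced")).collect()
# [Row(sliced=[2, 3]), Row(sliced=[5])]
```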
python/pyspark/sql/functions.py
Outdated
Concatenates multiple input columns together into a single column.
- The function works with strings, binary and compatible array columns.
+ The function works with strings, numeric, binary and compatible array columns.
+ Or any type that can be converted to string is good candidate as input value.
If there are supported types other than string, numeric, and binary, can we list them all ?
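A minimal sketch of `concat` with mixed string and numeric input, illustrating the wording under discussion (assuming an active `spark` session):

```python
from pyspark.sql.functions import concat

# assumes an active SparkSession named `spark`
df = spark.createDataFrame([("abcd", 123)], ["s", "d"])
# the numeric column is cast to string before concatenation
df.select(concat(df.s, df.d).alias("s")).collect()
# [Row(s='abcd123')]
```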
python/pyspark/sql/functions.py
Outdated
Returns
-------
:class:`~pyspark.sql.Column`
    concatatened values. Type of the `Column` depends on input columns' type.
Another typo here: "concatatened" -> "concatenated" :-)
python/pyspark/sql/functions.py
Outdated
See Also
--------
:meth:`pyspark.sql.functions.array_join` : to concatanate string columns with delimiter
ditto
>>> df.select(element_at(df.data, -4)).collect()
[Row(element_at(data, -4)=None)]
Can we add a short description of why this returns None?
e.g.
>>> df.select(element_at(df.data, -1)).collect()
[Row(element_at(data, -1)='c')]
Returns `None` if there is no value corresponding to the given `extraction`.
>>> df.select(element_at(df.data, -4)).collect()
[Row(element_at(data, -4)=None)]
python/pyspark/sql/functions.py
Outdated
Returns
-------
:class:`~pyspark.sql.Column`
    a string representatio of a :class:`StructType` parsed from given CSV.
typo in a few places: representatio -> representation
Here, too
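For reference, a minimal self-contained sketch of the `schema_of_csv` example being discussed (assuming an active `spark` session; the exact output formatting varies by Spark version):

```python
from pyspark.sql.functions import schema_of_csv, lit

# assumes an active SparkSession named `spark`
df = spark.range(1)
df.select(schema_of_csv(lit("1|a"), {"sep": "|"}).alias("csv")).show(truncate=False)
# prints the inferred DDL string for the sample CSV row,
# e.g. STRUCT<_c0: INT, _c1: STRING> (exact casing varies by Spark version)
```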
Merged to master.
…ples self-contained (part 7, ~30 functions)

### What changes were proposed in this pull request?
It's part of the Pyspark docstrings improvement series (apache#37592, apache#37662, apache#37686, apache#37786, apache#37797)
In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
Yes, documentation

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes apache#37850 from khalidmammadov/docstrings_funcs_part_7.

Lead-authored-by: Khalid Mammadov <[email protected]>
Co-authored-by: khalidmammadov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…ple in element_at

### What changes were proposed in this pull request?
This PR is a followup of #37850 that removes the non-ANSI-compliant example in `element_at`.

### Why are the changes needed?
The ANSI build fails to run the example.
https://github.com/apache/spark/actions/runs/3094607589/jobs/5008176959
```
Caused by: org.apache.spark.SparkArrayIndexOutOfBoundsException: [INVALID_ARRAY_INDEX_IN_ELEMENT_AT] The index -4 is out of bounds. The array has 3 elements. Use `try_element_at` to tolerate accessing element at invalid index and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidElementAtIndexError(QueryExecutionErrors.scala:264)
...
/usr/local/pypy/pypy3.7/lib-python/3/runpy.py:125: RuntimeWarning: 'pyspark.sql.functions' found in sys.modules after import of package 'pyspark.sql', but prior to execution of 'pyspark.sql.functions'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
/__w/spark/spark/python/pyspark/context.py:310: FutureWarning: Python 3.7 support is deprecated in Spark 3.4.
  warnings.warn("Python 3.7 support is deprecated in Spark 3.4.", FutureWarning)
**********************************************************************
1 of 6 in pyspark.sql.functions.element_at
```

### Does this PR introduce _any_ user-facing change?
No. The example added is not exposed to end users yet.

### How was this patch tested?
Manually tested with the ANSI configuration (`spark.sql.ansi.enabled`) enabled

Closes #37959 from HyukjinKwon/SPARK-40142-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
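For illustration, a minimal sketch of the ANSI behavior described in the commit message above (assuming an active `spark` session; the exception text is taken from the quoted log):

```python
from pyspark.sql.functions import element_at

# assumes an active SparkSession named `spark`
df = spark.createDataFrame([(["a", "b", "c"],)], ["data"])

# under ANSI mode an out-of-bounds index raises instead of returning NULL,
# which is why the doctest example had to be removed
spark.conf.set("spark.sql.ansi.enabled", "true")
# df.select(element_at(df.data, -4)).collect()
#   -> SparkArrayIndexOutOfBoundsException [INVALID_ARRAY_INDEX_IN_ELEMENT_AT]

# with ANSI disabled, the original example returns NULL
spark.conf.set("spark.sql.ansi.enabled", "false")
df.select(element_at(df.data, -4)).collect()
# [Row(element_at(data, -4)=None)]
```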
…ples self-contained (FINAL)

### What changes were proposed in this pull request?
It's part of the Pyspark docstrings improvement series (#37592, #37662, #37686, #37786, #37797, #37850)
In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.
I have also made all examples self-explanatory by providing the DataFrame creation command where it was missing, for clarity to the user.
This should complete "my take" on `functions.py` docstrings & example improvements.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
Yes, documentation

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37988 from khalidmammadov/docstrings_funcs_part_8.

Authored-by: Khalid Mammadov <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
It's part of the Pyspark docstrings improvement series (#37592, #37662, #37686, #37786, #37797)
In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.
Why are the changes needed?
To improve PySpark documentation
Does this PR introduce any user-facing change?
Yes, documentation
How was this patch tested?
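```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```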