[SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 6, ~50 functions) #37797
Conversation
@srowen @HyukjinKwon @itholic please review
python/pyspark/sql/functions.py
Outdated
>>> df.select(assert_true(df.a < df.b, 'error').alias('r')).collect()
[Row(r=None)]
>>> df.select(assert_true(df.a > df.b, 'My error msg').alias('r')).collect()  # doctest: +SKIP
22/09/03 20:18:15 ERROR Executor: Exception in task 15.0 in stage 45.0 (TID 383)
Let's probably replace this line to ...
python/pyspark/sql/functions.py
Outdated
--------
>>> df = spark.range(1)
>>> df.select(raise_error("My error message")).show()  # doctest: +SKIP
22/09/03 20:26:49 ERROR Executor: Exception in task 15.0 in stage 46.0 (TID 399)
ditto. probably let's replace this to ...
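For context on what these doctests demonstrate: `assert_true` evaluates to null when the condition holds and raises on the executor otherwise, while `raise_error` always raises. A rough plain-Python analogue of that behavior (hypothetical helper names for illustration only; the real Spark functions operate on Columns and surface a `java.lang.RuntimeException`):

```python
def assert_true_local(cond, err_msg):
    # Analogue of SQL assert_true: return None when cond holds,
    # otherwise raise with the given message.
    if not cond:
        raise RuntimeError(err_msg)
    return None


def raise_error_local(err_msg):
    # Analogue of SQL raise_error: unconditionally raise.
    raise RuntimeError(err_msg)


print(assert_true_local(1 < 2, "error"))  # None, mirroring Row(r=None)
```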
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
Nit, but shorter.
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["U3Bhcms=",
...     "UHlTcGFyaw==",
...     "UGFuZGFzIEFQSQ=="], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["U3Bhcms=",
- ...     "UHlTcGFyaw==",
- ...     "UGFuZGFzIEFQSQ=="], types.StringType())
+ >>> df = spark.createDataFrame(["U3Bhcms=",
+ ...     "UHlTcGFyaw==",
+ ...     "UGFuZGFzIEFQSQ=="], "STRING")
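For reference, the three literals in this example are just the base64 encodings of the library names used throughout these doctests; a quick standard-library check (plain Python, no Spark needed):

```python
import base64

# Decode each base64 literal back to its original string.
for encoded in ["U3Bhcms=", "UHlTcGFyaw==", "UGFuZGFzIEFQSQ=="]:
    print(base64.b64decode(encoded).decode("utf-8"))
# Prints: Spark, PySpark, Pandas API (one per line)
```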
Looks pretty good otherwise. cc @itholic @xinrong-meng @Yikun @zhengruifeng in case you guys find some time to review.
itholic
left a comment
Looks good otherwise
python/pyspark/sql/functions.py
Outdated
Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
    list of columns to work on.
Maybe "column or list of columns to compute on"?
And maybe we should mention `list` among the available types?
e.g.
cols : :class:`~pyspark.sql.Column`, list or str
python/pyspark/sql/functions.py
Outdated
Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(hash('a').alias('hash')).collect()
Can we also have one more example using list of columns ??
python/pyspark/sql/functions.py
Outdated
Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
    list of columns to work on.
ditto ?
python/pyspark/sql/functions.py
Outdated
Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(xxhash64('a').alias('hash')).collect()
ditto ?
Can one of the admins verify this patch?
Co-authored-by: Hyukjin Kwon <[email protected]>
HyukjinKwon
left a comment
LGTM from a cursory look. @xinrong-meng would you mind taking a quick look (and probably merging) this one please? 🙏
Yikun
left a comment
LGTM
itholic
left a comment
LGTM!
Merged to master
…ples self-contained (part 7, ~30 functions)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797). In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.

### Why are the changes needed?
To improve PySpark documentation.

### Does this PR introduce _any_ user-facing change?
Yes, documentation.

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37850 from khalidmammadov/docstrings_funcs_part_7.

Lead-authored-by: Khalid Mammadov <[email protected]>
Co-authored-by: khalidmammadov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…ples self-contained (FINAL)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797, #37850). In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed. I have also made all examples self-explanatory by providing the DataFrame creation command where it was missing, for clarity to the user. This should complete "my take" on `functions.py` docstrings & example improvements.

### Why are the changes needed?
To improve PySpark documentation.

### Does this PR introduce _any_ user-facing change?
Yes, documentation.

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37988 from khalidmammadov/docstrings_funcs_part_8.

Authored-by: Khalid Mammadov <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786).
In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.
Why are the changes needed?
To improve PySpark documentation
Does this PR introduce any user-facing change?
Yes, documentation
How was this patch tested?