Conversation

@khalidmammadov
Contributor

What changes were proposed in this pull request?

It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786).

In this PR I mainly covered the missing parts of the docstrings, adding more examples where needed.

Why are the changes needed?

To improve PySpark documentation

Does this PR introduce any user-facing change?

Yes, documentation

How was this patch tested?

PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build

@khalidmammadov
Contributor Author

@srowen @HyukjinKwon @itholic please review

>>> df.select(assert_true(df.a < df.b, 'error').alias('r')).collect()
[Row(r=None)]
>>> df.select(assert_true(df.a > df.b, 'My error msg').alias('r')).collect() # doctest: +SKIP
22/09/03 20:18:15 ERROR Executor: Exception in task 15.0 in stage 45.0 (TID 383)
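For context, the `# doctest: +SKIP` directive shown above tells Python's doctest runner not to execute that example, so its noisy, timestamped error output can never break a doctest run. A minimal plain-Python sketch of the directive's effect (this is illustrative standard-library code, not the PySpark docstring itself):

```python
import doctest

# Two examples: one real, one marked SKIP. The SKIP-ped example would
# raise ZeroDivisionError if executed, but doctest never runs it.
docstring = """
>>> 6 / 3
2.0
>>> 1 / 0  # doctest: +SKIP
ZeroDivisionError: division by zero
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(docstring, {}, "skip_example", None, 0)
runner = doctest.DocTestRunner()
results = runner.run(test)
print(results.failed)  # the skipped example does not count as a failure
```

This is why wrapping the `assert_true`/`raise_error` examples with `+SKIP` keeps the executor error log in the docs without making the doctest suite depend on it.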
Member

Let's probably replace this line with ...

--------
>>> df = spark.range(1)
>>> df.select(raise_error("My error message")).show() # doctest: +SKIP
22/09/03 20:26:49 ERROR Executor: Exception in task 15.0 in stage 46.0 (TID 399)
Member

Ditto. Let's probably replace this with ...

Comment on lines 4834 to 4835
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Member

Suggested change
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")

nit but shorter

Comment on lines 4866 to 4867
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Member

Suggested change
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")

Comment on lines 4898 to 4899
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Member

Suggested change
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")

Comment on lines 4930 to 4931
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Member

Suggested change
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")

Comment on lines 4962 to 4965
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["U3Bhcms=",
... "UHlTcGFyaw==",
... "UGFuZGFzIEFQSQ=="], types.StringType())
Member

Suggested change
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["U3Bhcms=",
... "UHlTcGFyaw==",
... "UGFuZGFzIEFQSQ=="], types.StringType())
>>> df = spark.createDataFrame(["U3Bhcms=",
... "UHlTcGFyaw==",
... "UGFuZGFzIEFQSQ=="], "STRING")
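For reference, those literals are ordinary base64 text (the example presumably feeds them to a decode function such as `unbase64`). A quick standard-library check, in plain Python with no Spark session needed, of what they contain:

```python
import base64

# The three base64 literals from the suggested docstring example.
encoded = ["U3Bhcms=", "UHlTcGFyaw==", "UGFuZGFzIEFQSQ=="]

# Decode each one back to a UTF-8 string.
decoded = [base64.b64decode(s).decode("utf-8") for s in encoded]
print(decoded)  # ['Spark', 'PySpark', 'Pandas API']
```

So the example is the base64-encoded counterpart of the `["Spark", "PySpark", "Pandas API"]` DataFrame used in the neighbouring docstrings.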

@HyukjinKwon
Member

Looks pretty good otherwise. cc @itholic @xinrong-meng @Yikun @zhengruifeng in case you guys find some time to review.

Contributor

@itholic itholic left a comment

Looks good otherwise

Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
list of columns to work on.
Contributor

@itholic itholic Sep 5, 2022

Maybe column or list of columns to compute on ??

Contributor

And maybe we should mention list for the available types ??

e.g.

cols : :class:`~pyspark.sql.Column`, list or str

Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(hash('a').alias('hash')).collect()
Contributor

Can we also have one more example using list of columns ??
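As a hedged plain-Python analogy of what a multi-column example could demonstrate (this uses `hashlib`, NOT Spark's Murmur3-based `hash` function, and the value-combining scheme here is illustrative only):

```python
import hashlib

def row_hash(*cols):
    """Hash several "column" values of a row into one digest.

    Illustrative stand-in for hash(col1, col2, ...); Spark's real
    `hash` uses Murmur3 and returns an int, not a hex digest.
    """
    h = hashlib.sha256()
    for value in cols:
        h.update(str(value).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()

# Hashing one column vs. a list of columns gives different results.
print(row_hash("ABC") != row_hash("ABC", 123))  # True
```

The point of the requested doctest addition is the same: show that passing multiple columns changes the hash input, e.g. `hash('a', 'b')` vs. `hash('a')`.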

Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
list of columns to work on.
Contributor

ditto ?

Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(xxhash64('a').alias('hash')).collect()
Contributor

ditto ?

@AmplabJenkins

Can one of the admins verify this patch?

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM from a cursory look. @xinrong-meng would you mind taking a quick look (and probably merging) this one please? 🙏

Member

@Yikun Yikun left a comment

LGTM

Contributor

@itholic itholic left a comment

LGTM !

@srowen srowen closed this in eaccadb Sep 8, 2022
@srowen
Member

srowen commented Sep 8, 2022

Merged to master

HyukjinKwon pushed a commit that referenced this pull request Sep 19, 2022
…ples self-contained (part 7, ~30 functions)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797)

In this PR I mainly covered the missing parts of the docstrings, adding more examples where needed.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
Yes, documentation

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37850 from khalidmammadov/docstrings_funcs_part_7.

Lead-authored-by: Khalid Mammadov <[email protected]>
Co-authored-by: khalidmammadov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Sep 20, 2022
srowen pushed a commit that referenced this pull request Sep 25, 2022
…ples self-contained (FINAL)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797, #37850)

In this PR I mainly covered the missing parts of the docstrings, adding more examples where needed.

I have also made all examples self-explanatory by providing the DataFrame creation command where it was missing, for clarity to the user.

This should complete "my take" on `functions.py` docstrings & example improvements.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
Yes, documentation

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37988 from khalidmammadov/docstrings_funcs_part_8.

Authored-by: Khalid Mammadov <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Sep 25, 2022

a0x8o added a commit to a0x8o/spark that referenced this pull request Dec 30, 2022

a0x8o added a commit to a0x8o/spark that referenced this pull request Dec 30, 2022