[SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 6, ~50 functions) #37797
Conversation
@srowen @HyukjinKwon @itholic please review
python/pyspark/sql/functions.py
Outdated
>>> df.select(assert_true(df.a < df.b, 'error').alias('r')).collect()
[Row(r=None)]
>>> df.select(assert_true(df.a > df.b, 'My error msg').alias('r')).collect()  # doctest: +SKIP
22/09/03 20:18:15 ERROR Executor: Exception in task 15.0 in stage 45.0 (TID 383)
Let's probably replace this line to ...
python/pyspark/sql/functions.py
Outdated
--------
>>> df = spark.range(1)
>>> df.select(raise_error("My error message")).show()  # doctest: +SKIP
22/09/03 20:26:49 ERROR Executor: Exception in task 15.0 in stage 46.0 (TID 399)
ditto. probably let's replace this to ...
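For context on what these doctests demonstrate: `assert_true` evaluates to null when the condition holds and raises on the executor otherwise, while `raise_error` always raises. A rough plain-Python analogue of that behavior (hypothetical helper names for illustration only; the real Spark functions operate on Columns and surface a `java.lang.RuntimeException`):

```python
def assert_true_local(cond, err_msg):
    # Analogue of SQL assert_true: return None when cond holds,
    # otherwise raise with the given message.
    if not cond:
        raise RuntimeError(err_msg)
    return None


def raise_error_local(err_msg):
    # Analogue of SQL raise_error: unconditionally raise.
    raise RuntimeError(err_msg)


print(assert_true_local(1 < 2, "error"))  # None, mirroring Row(r=None)
```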
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
Nit, but shorter.
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], types.StringType())
+ >>> df = spark.createDataFrame(["Spark", "PySpark", "Pandas API"], "STRING")
python/pyspark/sql/functions.py
Outdated
>>> from pyspark.sql import types
>>> df = spark.createDataFrame(["U3Bhcms=",
...     "UHlTcGFyaw==",
...     "UGFuZGFzIEFQSQ=="], types.StringType())
Suggested change:
- >>> from pyspark.sql import types
- >>> df = spark.createDataFrame(["U3Bhcms=",
- ...     "UHlTcGFyaw==",
- ...     "UGFuZGFzIEFQSQ=="], types.StringType())
+ >>> df = spark.createDataFrame(["U3Bhcms=",
+ ...     "UHlTcGFyaw==",
+ ...     "UGFuZGFzIEFQSQ=="], "STRING")
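For reference, the three literals in this example are just the base64 encodings of the library names used throughout these doctests; a quick standard-library check (plain Python, no Spark needed):

```python
import base64

# Decode each base64 literal back to its original string.
for encoded in ["U3Bhcms=", "UHlTcGFyaw==", "UGFuZGFzIEFQSQ=="]:
    print(base64.b64decode(encoded).decode("utf-8"))
# Prints: Spark, PySpark, Pandas API (one per line)
```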
Looks pretty good otherwise. cc @itholic @xinrong-meng @Yikun @zhengruifeng in case you guys find some time to review.
itholic
left a comment
Looks good otherwise
python/pyspark/sql/functions.py
Outdated
Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
    list of columns to work on.
Maybe "column or list of columns to compute on"?
And maybe we should mention `list` among the available types?
e.g.
cols : :class:`~pyspark.sql.Column`, list or str
python/pyspark/sql/functions.py
Outdated
Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(hash('a').alias('hash')).collect()
Can we also have one more example using list of columns ??
python/pyspark/sql/functions.py
Outdated
Parameters
----------
cols : :class:`~pyspark.sql.Column` or str
    list of columns to work on.
ditto ?
python/pyspark/sql/functions.py
Outdated
Examples
--------
>>> spark.createDataFrame([('ABC',)], ['a']).select(xxhash64('a').alias('hash')).collect()
ditto ?
Can one of the admins verify this patch?
Co-authored-by: Hyukjin Kwon <[email protected]>
HyukjinKwon
left a comment
LGTM from a cursory look. @xinrong-meng would you mind taking a quick look (and probably merging) this one please? 🙏
Yikun
left a comment
LGTM
itholic
left a comment
LGTM!
Merged to master
…ples self-contained (part 7, ~30 functions)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797). In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.

### Why are the changes needed?
To improve PySpark documentation.

### Does this PR introduce _any_ user-facing change?
Yes, documentation.

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37850 from khalidmammadov/docstrings_funcs_part_7.

Lead-authored-by: Khalid Mammadov <[email protected]>
Co-authored-by: khalidmammadov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…ples self-contained (FINAL)

### What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786, #37797, #37850). In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed. I have also made all examples self-explanatory by providing the DataFrame creation command where it was missing, for clarity to the user. This should complete "my take" on `functions.py` docstrings & example improvements.

### Why are the changes needed?
To improve PySpark documentation.

### Does this PR introduce _any_ user-facing change?
Yes, documentation.

### How was this patch tested?
```
PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
./python/run-tests --testnames pyspark.sql.functions
bundle exec jekyll build
```

Closes #37988 from khalidmammadov/docstrings_funcs_part_8.

Authored-by: Khalid Mammadov <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
It's part of the PySpark docstrings improvement series (#37592, #37662, #37686, #37786).
In this PR I mainly covered missing parts in the docstrings, adding some more examples where needed.
Why are the changes needed?
To improve PySpark documentation
Does this PR introduce any user-facing change?
Yes, documentation
How was this patch tested?