[SPARK-23633][SQL] Update Pandas UDFs section in sql-programming-guide #21887
Test build #93623 has finished for PR 21887 at commit
Test build #93624 has finished for PR 21887 at commit
docs/sql-programming-guide.md
Outdated
window operations. It defines an aggregation from one or more `pandas.Series`
to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.

Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
This looks like a typo: "doesn't not". (BTW, I usually avoid abbreviations in documentation.)
docs/sql-programming-guide.md
Outdated
to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.

Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
only unbounded window are supported with Grouped aggregate Pandas UDfs currently.
UDfs -> UDFs
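To illustrate the limitation this hunk documents, here is a plain-Python sketch. It is not Spark code: `partial_mean`, `merge`, and `whole_group_mean` are hypothetical names, and the partitions are assumed data. An aggregate that supports partial aggregation can be computed from small per-partition states that are merged, whereas an opaque function that takes a whole column for a group needs all of that group's values in memory at once.

```python
# Assumed data: the values of one group, split across two partitions.
partitions = [[3.0, 5.0], [10.0]]


def partial_mean(values):
    """Partial aggregation: reduce a partition to a small (sum, count) state."""
    return (sum(values), len(values))


def merge(a, b):
    """Combine two partial states without revisiting the raw values."""
    return (a[0] + b[0], a[1] + b[1])


def whole_group_mean(values):
    """Opaque aggregation: needs every value of the group at once."""
    return sum(values) / len(values)


# Partial path: only (sum, count) pairs are carried between partitions.
state = (0.0, 0)
for part in partitions:
    state = merge(state, partial_mean(part))
mean_from_partials = state[0] / state[1]

# Whole-group path: all values are gathered into one list first.
gathered = [v for part in partitions for v in part]
mean_from_whole_group = whole_group_mean(gathered)

print(mean_from_partials, mean_from_whole_group)
```

Both paths give the same mean here; the difference is only in how much data has to be held in one place, which is why the documentation warns about memory for large groups.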
docs/sql-programming-guide.md
Outdated
### Grouped Aggregate

Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy and
groupBy -> `groupBy().agg()`
window operations -> can we link normal window Python API doc here?
Seems fine otherwise.
Thanks @HyukjinKwon! I addressed the comments.
Test build #93681 has finished for PR 21887 at commit
# |  2| 5.0|   6.0|
# |  2|10.0|   6.0|
# +---+----+------+
# $example off:grouped_map_pandas_udf$
This looks like a typo; it should be `grouped_agg_pandas_udf`. @icexelloss, running `cd docs && SKIP_API=1 jekyll build` will build the docs, and it would be better to test this manually.
Ah good catch, my bad. Let me try building the doc.
@HyukjinKwon I manually generated the doc and it looks good to me.
Test build #93809 has finished for PR 21887 at commit
Test build #93811 has finished for PR 21887 at commit
Merged to master.
Thanks! @HyukjinKwon
What changes were proposed in this pull request?
Update Pandas UDFs section in sql-programming-guide. Add section for grouped aggregate pandas UDF.
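For context, the semantics that the new section describes can be modelled in plain Python. The sketch below is not the PySpark API: `collect_groups`, `grouped_agg`, and `unbounded_window` are hypothetical helpers, and the sample rows are assumed for illustration, chosen so that the window result for `id = 2` agrees with the output fragment quoted in the review. It shows an aggregation that takes all values of a column within one group and returns a scalar, applied once per group (as with `groupBy().agg()`) and once per row over an unbounded window.

```python
from collections import defaultdict
from statistics import mean

# Assumed sample rows of (id, v); not taken from the PR itself.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def collect_groups(rows):
    """Materialise every group's values in memory, keyed by id."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def grouped_agg(rows, func):
    """Model of groupBy().agg(): one output row per group."""
    groups = collect_groups(rows)
    return [(key, func(values)) for key, values in sorted(groups.items())]


def unbounded_window(rows, func):
    """Model of an unbounded window: every row gets its group's scalar."""
    groups = collect_groups(rows)
    per_group = {key: func(values) for key, values in groups.items()}
    return [(key, value, per_group[key]) for key, value in rows]


per_group_means = grouped_agg(rows, mean)
windowed_means = unbounded_window(rows, mean)
print(per_group_means)
print(windowed_means)
```

In this model the aggregation function only ever sees a complete list of values for one group, which mirrors the documented behaviour of receiving a whole `pandas.Series` per group or window.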
How was this patch tested?