@icexelloss
Contributor

What changes were proposed in this pull request?

Update Pandas UDFs section in sql-programming-guide. Add section for grouped aggregate pandas UDF.

How was this patch tested?

@SparkQA

SparkQA commented Jul 26, 2018

Test build #93623 has finished for PR 21887 at commit 2d30b2d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 26, 2018

Test build #93624 has finished for PR 21887 at commit 9669dad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

window operations. It defines an aggregation from one or more `pandas.Series`
to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.

Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
Member

Seems like a typo: `doesn't not`. (BTW, I usually avoid abbreviations in documentation.)

to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.

Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
only unbounded window are supported with Grouped aggregate Pandas UDfs currently.
Member

UDfs -> UDFs


### Grouped Aggregate

Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy and
Member

groupBy -> `groupBy().agg()`

window operations -> can we link normal window Python API doc here?
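
For readers following the doc text under review, here is a minimal plain-Python sketch of the semantics it describes. It is not Spark code and does not use the `pandas_udf` API: the helper names (`collect_groups`, `grouped_agg`, `unbounded_window_agg`) and the sample rows are assumptions made for illustration. It only models two points from the text: every value of a group is held in memory before the aggregate function runs (no partial aggregation), and with an unbounded window each row receives the aggregate of its whole group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows of (id, v); chosen for illustration only.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def collect_groups(rows):
    """Materialize every value of each group in memory.

    This mirrors the "no partial aggregation" note: the aggregate
    function only ever sees a complete group, never partial chunks.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def grouped_agg(rows, func):
    """Model of a grouped aggregation: one scalar per group."""
    return {key: func(values) for key, values in collect_groups(rows).items()}


def unbounded_window_agg(rows, func):
    """Model of an unbounded window: each row gets its whole group's aggregate."""
    per_group = grouped_agg(rows, func)
    return [(key, value, per_group[key]) for key, value in rows]


group_means = grouped_agg(rows, mean)        # one value per id
windowed = unbounded_window_agg(rows, mean)  # one value per input row
```

In this model `grouped_agg` returns one result per `id`, while `unbounded_window_agg` keeps the input row count and appends the group aggregate to each row, which is the shape of the `| 2| 5.0| 6.0|` rows quoted later in the thread.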

@HyukjinKwon
Member

Seems fine otherwise

@icexelloss
Contributor Author

Thanks @HyukjinKwon! I addressed the comments.

@SparkQA

SparkQA commented Jul 27, 2018

Test build #93681 has finished for PR 21887 at commit 5bb8729.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# | 2| 5.0| 6.0|
# | 2|10.0| 6.0|
# +---+----+------+
# $example off:grouped_map_pandas_udf$
Member

Seems a typo. It looks like it should be `grouped_agg_pandas_udf`. @icexelloss, `cd docs && SKIP_API=1 jekyll build` will build the doc; it would be better to test this manually.

Contributor Author

Ah good catch, my bad. Let me try building the doc.

@icexelloss
Contributor Author

@HyukjinKwon I manually generated the doc and it looks good to me.

@SparkQA

SparkQA commented Jul 30, 2018

Test build #93809 has finished for PR 21887 at commit 8395567.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2018

Test build #93811 has finished for PR 21887 at commit a79a1fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit closed this in 8141d55 on Jul 31, 2018
@icexelloss
Contributor Author

Thanks! @HyukjinKwon
