[SPARK-25328][PYTHON] Add an example for having two columns as the grouping key in group aggregate pandas UDF #22329

HyukjinKwon · 2018-09-04T10:05:56Z

What changes were proposed in this pull request?

This PR proposes adding another example, one that uses multiple grouping keys in a group aggregate pandas UDF, since this feature could still confuse users.

How was this patch tested?

Manually tested and documentation built.


SparkQA · 2018-09-04T10:47:17Z

Test build #95666 has finished for PR 22329 at commit 36a7ccc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

icexelloss · 2018-09-04T14:37:54Z

python/pyspark/sql/functions.py

       |  1|1.5|
       |  2|6.0|
       +---+---+
+       >>> @pandas_udf("id long, v1 double, v2 double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP


It took me a while to realize v1 is a grouping key. It's also a bit uncommon to use a double value as a grouping key. How about we do something like:

id long, additional_key long, v double

SparkQA · 2018-09-05T02:31:18Z

Test build #95690 has finished for PR 22329 at commit 2ad350c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-09-05T03:13:06Z

cc @gatorsmile and @BryanCutler

icexelloss · 2018-09-05T13:25:54Z

python/pyspark/sql/functions.py

       |  2|6.0|
       +---+---+
+       >>> @pandas_udf(
+       ...    "id long, additional_key double, v double",


Do you mind changing the type of additional_key to long? It seems like the type coercion here is not necessary.

Sorry, I know you just changed it, but I think just naming the column "ceil(v1 / 2)" with a type long would be a little more clear. Although "additional_key" is ok too, if you guys want to keep that.
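For readers following the naming discussion, here is a rough pure-Python sketch of what grouping on two columns means for a grouped map function: rows are bucketed by the pair (id, ceil(v / 2)), and the function receives that pair as a tuple key. This only illustrates the semantics; it is not the PySpark API and not the example added in this PR. The sample rows and the names `rows`, `sum_group`, and `apply_grouped` are assumptions made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical sample rows of (id, v); not taken from the PR.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def sum_group(key, group):
    # key is a tuple holding the values of both grouping columns,
    # here (id, ceil(v / 2)); group is the list of rows in that group.
    return key + (sum(v for _, v in group),)


def apply_grouped(rows, func):
    # Bucket rows by the two-part key, then call func once per group.
    groups = defaultdict(list)
    for id_, v in rows:
        groups[(id_, math.ceil(v / 2))].append((id_, v))
    # Sort only to make the output order deterministic.
    return sorted(func(key, group) for key, group in groups.items())


result = apply_grouped(rows, sum_group)
for row in result:
    print(row)
```

In PySpark the analogous function would take the key plus a pandas DataFrame for the group and be applied through `groupby(...).apply(...)`. Because the second key component is an integer here, declaring that column as long rather than double matches the point raised above.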

SparkQA · 2018-09-06T03:22:10Z

Test build #95734 has finished for PR 22329 at commit 1f342aa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

BryanCutler

LGTM

icexelloss · 2018-09-06T15:17:25Z

LGTM

BryanCutler · 2018-09-06T15:31:39Z

Merged to master, thanks @HyukjinKwon. I just saw branch-2.4 was cut already; I'll see if I can figure out how to merge there too.

…ouping key in group aggregate pandas UDF

## What changes were proposed in this pull request?

This PR proposes to add another example for multiple grouping key in group aggregate pandas UDF since this feature could make users still confused.

## How was this patch tested?

Manually tested and documentation built.

Closes #22329 from HyukjinKwon/SPARK-25328.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
(cherry picked from commit 7ef6d1d)
Signed-off-by: Bryan Cutler <[email protected]>

BryanCutler · 2018-09-06T17:06:38Z

merged to branch-2.4

HyukjinKwon · 2018-09-07T02:23:36Z

Thanks guys :-)

Add an example for having two columns as the grouping key in group aggregate pandas UDF

36a7ccc

icexelloss reviewed Sep 4, 2018


Address comments

2ad350c

icexelloss reviewed Sep 5, 2018


Address comments

1f342aa

BryanCutler approved these changes Sep 6, 2018


asfgit closed this in 7ef6d1d Sep 6, 2018

HyukjinKwon deleted the SPARK-25328 branch October 16, 2018 12:43
