Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Sep 4, 2018

What changes were proposed in this pull request?

This PR proposes to add another example for multiple grouping key in group aggregate pandas UDF since this feature could make users still confused.

How was this patch tested?

Manually tested and documentation built.

@SparkQA
Copy link

SparkQA commented Sep 4, 2018

Test build #95666 has finished for PR 22329 at commit 36a7ccc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

| 1|1.5|
| 2|6.0|
+---+---+
>>> @pandas_udf("id long, v1 double, v2 double", PandasUDFType.GROUPED_MAP) # doctest: +SKIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to realize v1 is a grouping key. It also a bit uncommon to use double value as a grouping key . How about we do sth like?

id long, additional_key long, v double

@SparkQA
Copy link

SparkQA commented Sep 5, 2018

Test build #95690 has finished for PR 22329 at commit 2ad350c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

cc @gatorsmile and @BryanCutler

| 2|6.0|
+---+---+
>>> @pandas_udf(
... "id long, additional_key double, v double",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind changing the type of additional_key to long? It seems like the type coercion here is not necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know you just changed it, but I think just naming the column "ceil(v1 / 2)" with a type long would be a little more clear. Although "additional_key" is ok too, if you guys want to keep that.

@SparkQA
Copy link

SparkQA commented Sep 6, 2018

Test build #95734 has finished for PR 22329 at commit 1f342aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@icexelloss
Copy link
Contributor

LGTM

@asfgit asfgit closed this in 7ef6d1d Sep 6, 2018
@BryanCutler
Copy link
Member

merged to master, thanks @HyukjinKwon . I just saw branch-2.4 was cut already, I'll see if I can figure out how to merge there too.

asfgit pushed a commit that referenced this pull request Sep 6, 2018
…ouping key in group aggregate pandas UDF

## What changes were proposed in this pull request?

This PR proposes to add another example for multiple grouping key in group aggregate pandas UDF since this feature could make users still confused.

## How was this patch tested?

Manually tested and documentation built.

Closes #22329 from HyukjinKwon/SPARK-25328.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
(cherry picked from commit 7ef6d1d)
Signed-off-by: Bryan Cutler <[email protected]>
@BryanCutler
Copy link
Member

merged to branch-2.4

@HyukjinKwon
Copy link
Member Author

Thanks guys :-)

@HyukjinKwon HyukjinKwon deleted the SPARK-25328 branch October 16, 2018 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants