Allow str and list in aggfunc in DataFrameGroupby.agg #828
I am not sure whether this will pass, since the group key order changes, as you can see from the picture I attached. Any suggestions for a fix are welcome.
emm, even with current setup,
@charlesdong1991 We don't guarantee the row order without a special reason, e.g.,
Thanks for your comment. I changed the test slightly to fix the ordering issue that made the tests fail, and added some docstrings to agg. Feel free to take a look. @ueshin
Any follow-up review would be much appreciated ^^ @ueshin @HyukjinKwon
databricks/koalas/groupby.py
else:
    group_keyname = [key.name for key in self._groupkeys]
    agg_cols = [key for key in self._kdf.columns if key not in group_keyname]
I guess self._agg_columns should work?
Nice one! thanks! @ueshin
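For readers following the thread, the idea in the snippet under review can be sketched in plain Python: when the aggregation argument is a str or a list of str rather than a dict, apply it to every column that is not a group key. This is only an illustration, not the koalas implementation; the helper name `normalize_aggfunc` and its parameters are hypothetical.

```python
def normalize_aggfunc(func_or_funcs, columns, group_keys):
    """Hypothetical helper: turn a str / list-of-str aggfunc into a per-column dict."""
    # A dict already maps columns to functions, so pass it through unchanged.
    if isinstance(func_or_funcs, dict):
        return dict(func_or_funcs)

    is_str = isinstance(func_or_funcs, str)
    is_str_list = isinstance(func_or_funcs, list) and all(
        isinstance(f, str) for f in func_or_funcs
    )
    if not (is_str or is_str_list):
        raise ValueError("aggfunc must be a dict, a str, or a list of str")

    # Aggregate every column except the group keys (mirrors the list
    # comprehension in the diff above).
    agg_cols = [col for col in columns if col not in group_keys]
    return {col: func_or_funcs for col in agg_cols}


spec = normalize_aggfunc(["min", "max"], ["kind", "height", "weight"], ["kind"])
print(spec)
```

Normalizing to a dict first means the existing dict-based code path can be reused for all three input shapes.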
LGTM, pending tests.
@charlesdong1991 oh, one more thing I'd like to ask.
@ueshin sure! Added and it passes tests locally, let's see if it's okay on CI.
@charlesdong1991 Thanks!
for aggfunc in agg_funcs:
    sorted_agg_kdf = kdf.groupby('kind').agg(aggfunc)
    sorted_agg_pdf = pdf.groupby('kind').agg(aggfunc)
We might need .sort_index()?
emm, it seems to have the same index order as pandas when the keys are numeric, but I could add one to be safe.
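A minimal sketch of why sorting before comparing helps, using pandas only. The frame below is made up for illustration, and the reversed result merely simulates an engine that returns groups in a different order; it is not what koalas actually returns.

```python
import pandas as pd

# Illustrative data; the 'kind' column name follows the test snippet above.
pdf = pd.DataFrame(
    {
        "kind": ["b", "a", "b", "a"],
        "height": [1.0, 2.0, 3.0, 4.0],
        "weight": [10, 20, 30, 40],
    }
)

expected = pdf.groupby("kind").agg("sum")

# Simulate a result whose group rows arrive in a different order.
shuffled = expected.iloc[::-1]

# The row order differs, so a direct comparison would fail...
same_order = shuffled.index.equals(expected.index)

# ...but sorting both sides by index makes the comparison order-independent.
pd.testing.assert_frame_equal(shuffled.sort_index(), expected.sort_index())
```

Sorting both frames keeps the test valid whether or not the backend happens to preserve key order.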
Softagram Impact Report for pull/828 (head commit: f1e3c6b)
The latest failure seems unrelated to this PR.
Thanks! Merging.
Right now, Groupby.agg does not accept a str or a list, although pandas allows both. So I think this is better dealt with before implementing named aggregation.
e.g. in pandas we could have:
Now koalas can also accept this:
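The screenshots from the original description are not preserved here. As a stand-in, here is a small pandas sketch of the two call shapes being discussed; the data is invented, and the koalas lines are left as comments because they need a Spark session and are an assumption about how the equivalent call would look.

```python
import pandas as pd

# Invented sample data for illustration only.
pdf = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# A single function name as a str: one result column per input column.
by_str = pdf.groupby("kind").agg("min")
print(by_str)

# A list of function names: columns become a (column, function) MultiIndex.
by_list = pdf.groupby("kind").agg(["min", "max"])
print(by_list)

# Assumed koalas equivalent (not run here, requires Spark):
# import databricks.koalas as ks
# kdf = ks.from_pandas(pdf)
# kdf.groupby("kind").agg(["min", "max"]).sort_index()
```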