-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable 'str' for 'SeriesGroupBy', disable 'DataFrame' for 'GroupBy' #1097
disable 'str' for 'SeriesGroupBy', disable 'DataFrame' for 'GroupBy' #1097
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1097 +/- ##
=========================================
- Coverage 95.2% 95.2% -0.01%
=========================================
Files 34 34
Lines 6889 6950 +61
=========================================
+ Hits 6559 6617 +58
- Misses 330 333 +3
Continue to review full report at Codecov.
|
databricks/koalas/generic.py
Outdated
@@ -1279,6 +1279,8 @@ def groupby(self, by, as_index: bool = True): | |||
col_by = [_resolve_col(df, col_or_s) for col_or_s in by] | |||
return DataFrameGroupBy(df_or_s, col_by, as_index=as_index) | |||
if isinstance(df_or_s, Series): | |||
if not isinstance(by[0], Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should remove if isinstance(by, str):
,.
We also should fix the error message raise ValueError('Not a valid index: TODO')
to match with pandas'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think maybe It's hard to remove if isinstance(by, str):
,
because str
type is valid for DataFrameGroupBy
, so we need it ??
>>> pdf
regiment company experience name preTestScore postTestScore
0 Nighthawks infantry veteran Miller 4 25
1 Nighthawks infantry rookie Jacobson 24 94
2 Nighthawks cavalry veteran Ali 31 57
3 Nighthawks cavalry rookie Milner 2 62
4 Dragoons infantry veteran Cooze 3 70
5 Dragoons infantry rookie Jacon 4 25
6 Dragoons cavalry veteran Ryaner 24 94
7 Dragoons cavalry rookie Sone 31 57
8 Scouts infantry veteran Sloan 2 62
9 Scouts infantry rookie Piger 3 70
10 Scouts cavalry veteran Riani 2 62
11 Scouts cavalry rookie Ali 3 70
>>> pdf.groupby('company')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x124f39780>
databricks/koalas/generic.py
Outdated
@@ -1279,6 +1279,8 @@ def groupby(self, by, as_index: bool = True): | |||
col_by = [_resolve_col(df, col_or_s) for col_or_s in by] | |||
return DataFrameGroupBy(df_or_s, col_by, as_index=as_index) | |||
if isinstance(df_or_s, Series): | |||
if not isinstance(by[0], Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also remove
elif isinstance(by, tuple):
by = [by]
Since tuple
is already Iterable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think they have slightly difference purpose,
if given by
is ('hello', 'koalas')
when elif isinstance(by, tuple)
handle this like below.
elif isinstance(by, tuple):
by = [by] # [('hello', 'koalas')]
whereas isinstance(by, Iterable)
like below.
elif isinstance(by, Iterable):
by = [key if isinstance(key, (tuple, Series)) else (key,) for key in by] # [('hello',), ('koalas',)]
we should not allow type of I added it to #1095 and this PR description. |
Softagram Impact Report for pull/1097 (head commit: bbf956b)
|
Resolve #1095