Implement Series.unstack #1501
Codecov Report
@@ Coverage Diff @@
## master #1501 +/- ##
==========================================
+ Coverage 93.93% 93.94% +0.01%
==========================================
Files 36 36
Lines 8445 8461 +16
==========================================
+ Hits 7933 7949 +16
Misses 512 512
Looks good otherwise.
@itholic can you rebase please?
Codecov Report
@@ Coverage Diff @@
## master #1501 +/- ##
=======================================
Coverage 94.00% 94.01%
=======================================
Files 36 36
Lines 8444 8458 +14
=======================================
+ Hits 7938 7952 +14
Misses 506 506
sdf = sdf.groupby(index_scol_names).pivot(pivot_col).sum(data_scol_name)
internal = _InternalFrame(
    spark_frame=sdf,
    index_map=OrderedDict((index_scol_name, None) for index_scol_name in index_scol_names),
I think it wouldn't but can you check if the name of the index is kept in pandas?
Ah, it's kept in pandas. 😓 I'll fix it soon.
>>> pser
hello  Koalas
one    a         10
       b         -2
two    a          4
       b          7
Name: 0, dtype: int64
>>> pser.unstack()
Koalas   a  b
hello
one     10 -2
two      4  7
Thanks for the catch!
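For reference, the pandas behavior being matched here can be checked with a small script. This is only a sketch that rebuilds the `pser` shown above in plain pandas (the construction via `MultiIndex.from_tuples` is an assumption, since the thread only shows the printed Series) and inspects the names on the unstacked result:

```python
import pandas as pd

# Rebuild the Series from the comment above (construction is assumed;
# only its printed form appears in the thread).
index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a"), ("two", "b")],
    names=["hello", "Koalas"],
)
pser = pd.Series([10, -2, 4, 7], index=index, name=0)

# unstack() moves the innermost index level ("Koalas") to the columns.
unstacked = pser.unstack()

# The remaining index level keeps its name, and the moved level's
# name becomes the name of the columns.
print(unstacked.index.name)
print(unstacked.columns.name)
```

So both level names survive in pandas, which is what the Koalas result would need to reproduce.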
Should we support non-numeric data types as well?
>>> pd.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']])).unstack()
     a  b
one  a  b
two  c  d
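One likely reason the `.sum(...)`-based pivot in the diff would not cover this case is that summation is only meaningful for numeric data. The sketch below illustrates the idea in plain pandas rather than Spark: it pivots string values with a `first` aggregate and compares the result with `unstack()`. The choice of `first`, the level names `A`/`B`, and the column name `value` are assumptions made for illustration, not necessarily what the PR ends up using.

```python
import pandas as pd

# String-valued Series with a two-level index, as in the comment above.
# The level names "A" and "B" are added here for illustration.
pser = pd.Series(
    list("abcd"),
    index=pd.MultiIndex.from_product([["one", "two"], ["a", "b"]], names=["A", "B"]),
)

# Reference result from pandas.
expected = pser.unstack()

# Group-and-pivot formulation: flatten the index into columns, then pivot.
# Each (A, B) pair holds exactly one value, so "first" just picks it and,
# unlike a sum, it also works for non-numeric data.
flat = pser.rename("value").reset_index()
pivoted = flat.pivot_table(index="A", columns="B", values="value", aggfunc="first")

print(pivoted)
```

Because every index/column pair has a single value, any aggregate that returns that value unchanged would do; `first` is simply one that is not restricted to numbers.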
@ueshin I'll address it, too. Thanks! 👍
…527)

According to databricks/koalas#1501 (comment) and databricks/koalas#1501 (comment), Fix `Series.unstack()` to support non-numeric type and keep the names of index and columns.

```python
>>> kser = ks.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']], names=["A", "B"]))
>>> kser
A    B
one  a    a
     b    b
two  a    c
     b    d
Name: 0, dtype: object
>>> kser.unstack()
B    a  b
A
two  c  d
one  a  b
```
This PR proposes `Series.unstack`.