Implement Series.unstack #1501

itholic · 2020-05-16T19:53:09Z

This PR adds Series.unstack.

>>> s = ks.Series([1, 2, 3, 4],
...               index=pd.MultiIndex.from_product([['one', 'two'],
...                                                 ['a', 'b']]))
>>> s
one  a    1
     b    2
two  a    3
     b    4
Name: 0, dtype: int64

>>> s.unstack(level=-1).sort_index()
     a  b
one  1  2
two  3  4

>>> s.unstack(level=0).sort_index()
   one  two
a    1    3
b    2    4
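To make the two calls above concrete, here is a minimal pure-Python sketch of what unstacking does: one level of each index tuple is pulled out and becomes the column label, and the remaining level stays as the row label. The `unstack` helper and the dict-based "series" are hypothetical illustrations, not the Koalas or pandas implementation.

```python
from collections import defaultdict


def unstack(series, level=-1):
    """Toy unstack. `series` maps index tuples to values (an assumed stand-in
    for a MultiIndex Series); returns {row_label: {column_label: value}}."""
    table = defaultdict(dict)
    for key, value in series.items():
        key = list(key)
        column = key.pop(level)  # the chosen level becomes the column label
        row = key[0] if len(key) == 1 else tuple(key)
        table[row][column] = value
    return dict(table)


s = {("one", "a"): 1, ("one", "b"): 2, ("two", "a"): 3, ("two", "b"): 4}

print(unstack(s, level=-1))  # rows: one/two, columns: a/b
print(unstack(s, level=0))   # rows: a/b, columns: one/two
```

With `level=-1` the inner level ('a', 'b') moves to the columns, and with `level=0` the outer level ('one', 'two') does, mirroring the two outputs in the description.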

codecov-io · 2020-05-16T20:16:17Z

Codecov Report

Merging #1501 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1501      +/-   ##
==========================================
+ Coverage   93.93%   93.94%   +0.01%     
==========================================
  Files          36       36              
  Lines        8445     8461      +16     
==========================================
+ Hits         7933     7949      +16     
  Misses        512      512

Impacted Files	Coverage Δ
databricks/koalas/missing/series.py	`100.00% <ø> (ø)`
databricks/koalas/series.py	`97.02% <100.00%> (+0.05%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6fbe6d0...3308478. Read the comment docs.

databricks/koalas/series.py

HyukjinKwon

Looks good otherwise.

HyukjinKwon · 2020-05-19T01:09:47Z

@itholic can you rebase please?

codecov-commenter · 2020-05-19T01:45:29Z

Codecov Report

Merging #1501 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1501   +/-   ##
=======================================
  Coverage   94.00%   94.01%           
=======================================
  Files          36       36           
  Lines        8444     8458   +14     
=======================================
+ Hits         7938     7952   +14     
  Misses        506      506

Impacted Files	Coverage Δ
databricks/koalas/missing/series.py	`100.00% <ø> (ø)`
databricks/koalas/series.py	`97.71% <100.00%> (+0.03%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0267539...1ced0ab. Read the comment docs.

HyukjinKwon · 2020-05-19T01:56:29Z

databricks/koalas/series.py

+        sdf = sdf.groupby(index_scol_names).pivot(pivot_col).sum(data_scol_name)
+        internal = _InternalFrame(
+            spark_frame=sdf,
+            index_map=OrderedDict((index_scol_name, None) for index_scol_name in index_scol_names),


I think it wouldn't but can you check if the name of the index is kept in pandas?

Ah, it's kept in pandas. 😓 I'll fix it soon.

>>> pser
hello  Koalas
one    a         10
       b         -2
two    a          4
       b          7
Name: 0, dtype: int64
>>> pser.unstack()
Koalas   a  b
hello
one     10 -2
two      4  7

Thanks for the catch!
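The point of this exchange can be sketched in a few lines: in the reviewed snippet every index name appears to be set to `None`, whereas the pandas output above shows the remaining level's name staying on the index and the unstacked level's name moving to the columns. The helper `split_names` below is a hypothetical illustration of that bookkeeping only; it is not code from the PR.

```python
def split_names(index_names, level=-1):
    """Return (remaining index names, columns name) after unstacking `level`.

    `index_names` is an assumed list of MultiIndex level names.
    """
    names = list(index_names)
    columns_name = names.pop(level)  # name of the level that moves to the columns
    return names, columns_name


index_names, columns_name = split_names(["hello", "Koalas"])
print(index_names, columns_name)
```

For the series above, unstacking the last level leaves `hello` as the index name and makes `Koalas` the columns name, which is what the pandas output shows.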

ueshin · 2020-05-19T20:41:23Z

We should support non-numeric datatypes as well?

>>> pd.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']])).unstack()
     a  b
one  a  b
two  c  d

itholic · 2020-05-19T22:44:20Z

@ueshin I'll address it, too. Thanks! 👍

According to #1501 (comment) and #1501 (comment), fix `Series.unstack()` to support non-numeric types and keep the names of index and columns.

```python
>>> kser = ks.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']], names=["A", "B"]))
>>> kser
A    B
one  a    a
     b    b
two  a    c
     b    d
Name: 0, dtype: object
>>> kser.unstack()
B    a  b
A
two  c  d
one  a  b
```


Implement Series.unstack (3308478)

HyukjinKwon reviewed May 18, 2020: databricks/koalas/series.py (outdated, resolved)

HyukjinKwon reviewed May 18, 2020: databricks/koalas/series.py (resolved)

HyukjinKwon reviewed May 18, 2020: databricks/koalas/series.py (resolved)

HyukjinKwon approved these changes May 18, 2020

Remove the duplicated check (1f3a680)

Resolve conflicts (1ced0ab)

HyukjinKwon merged commit ca7949e into databricks:master May 19, 2020

itholic deleted the s_unstack branch May 19, 2020 01:53

HyukjinKwon reviewed May 19, 2020

itholic mentioned this pull request May 22, 2020: Fix Series.unstack to support non-numeric type and keep the names #1527 (Merged)
