Implement Series.unstack #1501

Merged
merged 3 commits into from
May 19, 2020


@itholic (Contributor) commented May 16, 2020

This PR proposes `Series.unstack`.

```python
>>> import pandas as pd
>>> import databricks.koalas as ks
>>> s = ks.Series([1, 2, 3, 4],
...               index=pd.MultiIndex.from_product([['one', 'two'],
...                                                 ['a', 'b']]))
>>> s
one  a    1
     b    2
two  a    3
     b    4
Name: 0, dtype: int64

>>> s.unstack(level=-1).sort_index()
     a  b
one  1  2
two  3  4

>>> s.unstack(level=0).sort_index()
   one  two
a    1    3
b    2    4
```
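The following sketch makes the semantics of `level` concrete without a Spark session. It is plain Python, not Koalas' implementation, and it treats a dict keyed by `(outer, inner)` tuples as a Series with a two-level MultiIndex. The selected level becomes the columns and the other level stays as the row index. The function name `unstack` and its return shape are illustrative assumptions. Labels keep first-appearance order rather than being sorted, and missing pairs become `None` where pandas would use `NaN`.

```python
from typing import Dict, Hashable, List, Tuple


def unstack(data: Dict[Tuple[Hashable, Hashable], object], level: int = -1):
    """Pivot one level of a two-level key into columns (illustrative sketch).

    `data` maps (outer, inner) keys to values, mimicking a Series with a
    two-level MultiIndex. Returns (row_labels, column_labels, table) where
    table[row][col] is the value, or None if that pair is missing.
    """
    if level not in (0, 1, -1, -2):
        raise ValueError("level must be 0, 1, -1 or -2 for a two-level index")
    col_pos = level % 2    # key position that becomes the columns
    row_pos = 1 - col_pos  # the other position stays as the row index
    rows: List[Hashable] = []
    cols: List[Hashable] = []
    for key in data:
        if key[row_pos] not in rows:
            rows.append(key[row_pos])
        if key[col_pos] not in cols:
            cols.append(key[col_pos])
    # Start with every cell empty, then fill in the values that exist.
    table = {r: {c: None for c in cols} for r in rows}
    for key, value in data.items():
        table[key[row_pos]][key[col_pos]] = value
    return rows, cols, table
```

With the PR's data, `level=-1` yields rows `one`, `two` and columns `a`, `b`, while `level=0` swaps the two axes, matching the tables above.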

@codecov-io commented May 16, 2020

Codecov Report

Merging #1501 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##           master    #1501      +/-   ##
==========================================
+ Coverage   93.93%   93.94%   +0.01%
==========================================
  Files          36       36
  Lines        8445     8461      +16
==========================================
+ Hits         7933     7949      +16
  Misses        512      512
```

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/missing/series.py | 100.00% <ø> (ø) |
| databricks/koalas/series.py | 97.02% <100.00%> (+0.05%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 6fbe6d0...3308478.

@HyukjinKwon (Member) left a comment

Looks good otherwise.

@HyukjinKwon (Member)

@itholic, can you rebase, please?

@codecov-commenter commented May 19, 2020

Codecov Report

Merging #1501 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

```diff
@@           Coverage Diff           @@
##           master    #1501   +/-   ##
=======================================
  Coverage   94.00%   94.01%
=======================================
  Files          36       36
  Lines        8444     8458   +14
=======================================
+ Hits         7938     7952   +14
  Misses        506      506
```

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/missing/series.py | 100.00% <ø> (ø) |
| databricks/koalas/series.py | 97.71% <100.00%> (+0.03%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 0267539...1ced0ab.

@HyukjinKwon merged commit ca7949e into databricks:master on May 19, 2020
@itholic deleted the s_unstack branch on May 19, 2020 01:53
```python
sdf = sdf.groupby(index_scol_names).pivot(pivot_col).sum(data_scol_name)
internal = _InternalFrame(
    spark_frame=sdf,
    index_map=OrderedDict((index_scol_name, None) for index_scol_name in index_scol_names),
```
Member

I think it wouldn't be, but can you check whether the name of the index is kept in pandas?

@itholic (Contributor, Author) May 19, 2020

Ah, it's kept in pandas. 😓 I'll fix it soon.

```python
>>> pser
hello  Koalas
one    a         10
       b         -2
two    a          4
       b          7
Name: 0, dtype: int64

>>> pser.unstack()
Koalas   a  b
hello
one     10 -2
two      4  7
```

Thanks for the catch!
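The `pser` output above shows the naming rule pandas follows: the unstacked level's name (`Koalas`) labels the columns, and the remaining level names (`hello`) stay on the row index. In the review excerpt, each index column is mapped to `None` in `index_map`, which looks like where the names were lost. A minimal sketch of the naming rule alone, assuming index names are given as a list ordered by level (`unstack_names` is a hypothetical helper, not a Koalas function):

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def unstack_names(
    index_names: Sequence[Optional[Hashable]], level: int = -1
) -> Tuple[List[Optional[Hashable]], Optional[Hashable]]:
    """Return (remaining row-index names, columns name) after unstacking `level`."""
    n = len(index_names)
    if not -n <= level < n:
        raise IndexError(f"level {level} out of range for {n} index levels")
    pos = level % n                  # normalise negative levels such as -1
    columns_name = index_names[pos]  # the unstacked level's name labels the columns
    remaining = [name for i, name in enumerate(index_names) if i != pos]
    return remaining, columns_name
```

For `pser`, `unstack_names(["hello", "Koalas"])` gives `(["hello"], "Koalas")`; unnamed levels simply carry `None` through.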

@ueshin (Collaborator) commented May 19, 2020

Should we support non-numeric data types as well?

```python
>>> pd.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']])).unstack()
     a  b
one  a  b
two  c  d
```
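The review excerpt builds the result with `sdf.groupby(index_scol_names).pivot(pivot_col).sum(data_scol_name)`: group by the remaining index columns, turn each distinct value of the unstacked column into an output column, and aggregate each cell. The plain-Python sketch below illustrates that group/pivot/aggregate pattern (it is not Spark's implementation) and shows why the aggregate matters for the object-dtype example. A sum only makes sense for numbers, while an aggregate that returns the single value in each cell works for any type, since a unique MultiIndex puts at most one value in each (row, column) cell. `group_pivot_agg` and `take_single` are hypothetical helpers; this is not necessarily how the follow-up commit fixed it.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, Hashable, object]  # (row key, pivot key, value)


def group_pivot_agg(
    rows: Iterable[Row], agg: Callable[[List[object]], object]
) -> Dict[Hashable, Dict[Hashable, object]]:
    """Group by the row key, make one column per distinct pivot key,
    and reduce each cell's collected values with `agg`."""
    cells: Dict[Hashable, Dict[Hashable, List[object]]] = {}
    pivot_keys: List[Hashable] = []
    for row_key, pivot_key, value in rows:
        if pivot_key not in pivot_keys:
            pivot_keys.append(pivot_key)
        cells.setdefault(row_key, {}).setdefault(pivot_key, []).append(value)
    # Cells with no values become None, similar to a null in a pivoted result.
    return {
        r: {p: agg(vals[p]) if p in vals else None for p in pivot_keys}
        for r, vals in cells.items()
    }


def take_single(values: List[object]) -> object:
    """Hypothetical aggregate: return the only value in a cell, of any type."""
    if len(values) != 1:
        raise ValueError(f"expected exactly one value per cell, got {len(values)}")
    return values[0]
```

With the integer rows from the PR description, `group_pivot_agg(rows, sum)` reproduces the `level=-1` table. With `list('abcd')` as values, Python's `sum` raises `TypeError`, whereas `take_single` returns the letters in place, matching the pandas output above.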

@itholic (Contributor, Author) commented May 19, 2020

@ueshin I'll address it, too. Thanks! 👍

HyukjinKwon pushed a commit that referenced this pull request on May 25, 2020

Following #1501 (comment) and #1501 (comment), this fixes `Series.unstack()` to support non-numeric types and to keep the names of the index and columns.

```python
>>> import pandas as pd
>>> import databricks.koalas as ks
>>> kser = ks.Series(list('abcd'), index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']], names=["A", "B"]))
>>> kser
A    B
one  a    a
     b    b
two  a    c
     b    d
Name: 0, dtype: object

>>> kser.unstack()
B    a  b
A
two  c  d
one  a  b
```
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023