DataFrame.reindex supports koalas Index parameter #1741

Merged
7 commits merged on Oct 1, 2020

Conversation

LucasG0
Contributor

@LucasG0 LucasG0 commented Sep 1, 2020

This PR would close #1740.

Contributor

@itholic itholic left a comment

Otherwise, seems fine to me.

databricks/koalas/tests/test_dataframe.py (outdated, resolved)
@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 74daed9 to 3a9c0aa Compare September 2, 2020 06:53
self.assert_eq(
    pdf.reindex(index=pdf2.index, fill_value=0.0).sort_index(),
    kdf.reindex(index=kdf2.index, fill_value=0.0).sort_index(),
)
Collaborator

Also what if the given index is MultiIndex?

Contributor Author

Thanks for noticing. I updated the PR so that it matches pandas' behavior, given that we can currently only reindex a single-level index.

Comment on lines 2400 to 2401
pdf2 = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index2)
kdf2 = ks.from_pandas(pdf2)
Collaborator

I guess we can just use Index?

pindex2 = pd.Index(["A", "C", "D", "E", "0"])
kindex2 = ks.from_pandas(pindex2)
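
For reference, here is a minimal pandas-only sketch of the behavior this test compares against: reindexing with an Index object plus a fill_value. The contents of `pdf` are an assumption made for illustration (the original test data is not shown in this thread), and koalas is left out so the snippet runs without a Spark session.

```python
import pandas as pd

# Hypothetical frame for illustration; the real test data is not shown in this thread.
pdf = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=pd.Index(["A", "B", "C"]))

# The Index suggested in the review comment above.
pindex2 = pd.Index(["A", "C", "D", "E", "0"])

# Labels already in pdf keep their values; labels missing from pdf get fill_value.
result = pdf.reindex(index=pindex2, fill_value=0.0)
print(result)
```

The test in this PR then checks that the koalas call with `kindex2` gives the same frame as this pandas call once both sides are sorted with `sort_index()`.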

@LucasG0
Contributor Author

LucasG0 commented Sep 6, 2020

I updated the PR with named Index and MultiIndex support.
For MultiIndex, since only reindexing a single-level index is currently supported,
it just returns a filled DataFrame with the new index and the current object's columns.
Also, we can now reindex the columns of an empty DataFrame, as pandas supports.

Actually, I think we could also support reindex on a DataFrame that is already indexed by a MultiIndex, so I will make the changes.
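
As a rough illustration of the pandas behavior being matched here, the sketch below shows a named Index used as the reindex target and a column reindex on an empty DataFrame. The frame contents and the names `old` and `new` are made up for the example; only pandas is used.

```python
import pandas as pd

# Hypothetical data; "old" and "new" are illustrative index names.
pdf = pd.DataFrame({"a": [1.0, 2.0]}, index=pd.Index(["A", "B"], name="old"))
target = pd.Index(["B", "C"], name="new")

# The result carries the target Index, including its name.
out = pdf.reindex(index=target, fill_value=0.0)
print(out.index.name)

# Reindexing the columns of an empty DataFrame yields an empty frame with those columns.
empty = pd.DataFrame().reindex(columns=["a", "b"])
print(empty.shape)
```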

databricks/koalas/frame.py (outdated, 2 resolved review threads)
@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 876943f to 0292c95 Compare September 21, 2020 11:52
@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 0292c95 to edf9679 Compare September 21, 2020 11:55
@LucasG0
Contributor Author

LucasG0 commented Sep 21, 2020

Hi, I updated the PR to fully support MultiIndex.
To be consistent with pandas behavior:

  • Reindexing with a MultiIndex on a multi-indexed DataFrame is supported.
  • Reindexing with a single-level Index on a multi-indexed DataFrame raises an error.
  • Reindexing with a MultiIndex on a single-indexed DataFrame is supported and returns a DataFrame with the new index and the original columns filled with fill_value.
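
The first and third cases above can be sketched in plain pandas as follows. All data and level names are hypothetical. The second case is left out on purpose, because what pandas does there can vary between versions.

```python
import pandas as pd

# Hypothetical data; the level names "k" and "n" are illustrative.
midx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k", "n"])
new_midx = pd.MultiIndex.from_tuples([("a", 1), ("c", 3)], names=["k", "n"])

# Case 1: MultiIndex target on a multi-indexed DataFrame.
# ("a", 1) exists and keeps its value; ("c", 3) is new and gets fill_value.
multi = pd.DataFrame({"x": [10.0, 20.0]}, index=midx)
out_multi = multi.reindex(index=new_midx, fill_value=0.0)

# Case 3: MultiIndex target on a single-indexed DataFrame.
# No tuple matches a flat label, so every row is filled with fill_value.
# dtype=object keeps the example independent of the default string dtype.
flat = pd.DataFrame({"x": [1.0, 2.0]}, index=pd.Index(["a", "b"], dtype=object))
out_flat = flat.reindex(index=new_midx, fill_value=0.0)

print(out_multi)
print(out_flat)
```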

@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 006c13d to afeffb6 Compare September 22, 2020 10:48
Collaborator

@ueshin ueshin left a comment

Sorry for the delay. I left some comments. Thanks!

databricks/koalas/frame.py (outdated, 11 resolved review threads)
@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 1945dae to 071f812 Compare September 30, 2020 22:33
databricks/koalas/frame.py (outdated, resolved)
Collaborator

@ueshin ueshin left a comment

Otherwise, LGTM.

@LucasG0 LucasG0 force-pushed the reindex_koalas_index branch from 071f812 to fd1c41b Compare September 30, 2020 23:36
@ueshin
Collaborator

ueshin commented Oct 1, 2020

Thanks! merging.

@ueshin ueshin merged commit 2981b0f into databricks:master Oct 1, 2020
@LucasG0
Contributor Author

LucasG0 commented Oct 1, 2020

Thanks! :)

@LucasG0 LucasG0 deleted the reindex_koalas_index branch October 1, 2020 11:47
Development

Successfully merging this pull request may close these issues.

DataFrame.reindex with a koalas Index parameter is not implemented
4 participants