Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented intersection for Index & MultiIndex #1747

Merged
merged 15 commits into from
Nov 3, 2020

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Sep 3, 2020

This PR proposes the new API Index.intersection() and MultiIndex.intersection().

>>> idx1 = ks.Index([1, 2, 3, 4])
>>> idx2 = ks.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'd separate the function into ones for each class as there are so many if isinstance(self, MultiIndex): .. else: ...

databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_indexes.py Show resolved Hide resolved
databricks/koalas/tests/test_indexes.py Show resolved Hide resolved
databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_indexes.py Show resolved Hide resolved
databricks/koalas/indexes.py Outdated Show resolved Hide resolved
@itholic
Copy link
Contributor Author

itholic commented Sep 23, 2020

Actually I'd separate the function into ones for each class as there are so many if isinstance(self, MultiIndex): .. else: ...

Sounds good. I'll separate

@xinrong-meng xinrong-meng self-requested a review October 31, 2020 15:30
@itholic
Copy link
Contributor Author

itholic commented Nov 2, 2020

@ueshin ,

For #1747 (comment) and #1747 (comment),

The comparison between empty MultiIndex seems not work properly for now.

empty_pandas_multi_index = pmidx.intersection(pidx)  # MultiIndex([], )
empty_koalas_multi_index = ks.from_pandas(empty_pandas_multi_index)  # MultiIndex([], )
self.assert_eq(empty_pandas_multi_index, empty_koalas_multi_index)

The test above is failed with following error.

AssertionError: MultiIndex level [0] are different

MultiIndex level [0] classes are not equivalent
[left]:  Index([], dtype='object')
[right]: Float64Index([], dtype='float64')

Left:
MultiIndex([], )
object

Right:
MultiIndex([], )
object

Maybe let me investigate this in the separated PR ?

@codecov-io
Copy link

codecov-io commented Nov 2, 2020

Codecov Report

Merging #1747 into master will increase coverage by 0.00%.
The diff coverage is 94.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1747   +/-   ##
=======================================
  Coverage   94.19%   94.20%           
=======================================
  Files          40       40           
  Lines        9867     9915   +48     
=======================================
+ Hits         9294     9340   +46     
- Misses        573      575    +2     
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100.00% <ø> (ø)
databricks/koalas/indexes.py 97.05% <94.00%> (-0.22%) ⬇️
databricks/koalas/spark/accessors.py 94.96% <0.00%> (+0.62%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62fb01c...7197f62. Read the comment docs.

@HyukjinKwon
Copy link
Member

The empty one would likely be an issue from Spark IIRC.

@HyukjinKwon HyukjinKwon merged commit bb22748 into databricks:master Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants