Implement DataFrame.dot supporting Series parameter #1945

Merged
merged 9 commits into from
Dec 8, 2020

Conversation

Contributor

@xinrong-meng xinrong-meng commented Dec 1, 2020

Compute the matrix multiplication between the DataFrame and the `other` argument. For now, only a Series is supported as `other`.

        >>> kdf = ks.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
        >>> kser = ks.Series([1, 1, 2, 1])
        >>> kdf.dot(kser)
        0   -4
        1    5
        dtype: int64

        Note that reordering the Series does not change the result.

        >>> kser2 = kser.reindex([1, 0, 2, 3])
        >>> kdf.dot(kser2)
        0   -4
        1    5
        dtype: int64
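
To spell out why the reindexed Series gives the same answer, here is a minimal pure-Python sketch of a label-aligned dot product. It is not the Koalas implementation: `aligned_dot` is a hypothetical helper, a plain dict stands in for the Series, and the error message is only illustrative.

```python
def aligned_dot(rows, columns, other):
    """Dot each row with `other`, matching column labels to `other`'s labels.

    rows:    list of rows, each a list of numbers (the frame's data)
    columns: list of column labels, one per entry in a row
    other:   dict mapping label -> value (hypothetical stand-in for a Series)
    """
    if set(columns) != set(other):
        # Illustrative message; the labels on both sides must match.
        raise ValueError("matrices are not aligned")
    # Look values up by label, so the order of `other` is irrelevant.
    return [
        sum(value * other[label] for label, value in zip(columns, row))
        for row in rows
    ]


rows = [[0, 1, -2, -1], [1, 1, 1, 1]]
columns = [0, 1, 2, 3]
series = {0: 1, 1: 1, 2: 2, 3: 1}
shuffled = {1: 1, 0: 1, 2: 2, 3: 1}  # same labels, different order

result = aligned_dot(rows, columns, series)             # [-4, 5]
result_shuffled = aligned_dot(rows, columns, shuffled)  # [-4, 5]
```

Because every value is looked up by its label, both calls return the same two numbers as the examples above.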

@xinrong-meng xinrong-meng marked this pull request as ready for review December 4, 2020 19:30
@xinrong-meng xinrong-meng changed the title from "Implement DataFrame.dot" to "Implement DataFrame.dot supporting Series parameter" Dec 4, 2020

codecov-io commented Dec 4, 2020

Codecov Report

Merging #1945 (6ed71de) into master (901a6f0) will decrease coverage by 0.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #1945      +/-   ##
==========================================
- Coverage   94.64%   94.62%   -0.02%     
==========================================
  Files          49       49              
  Lines       10831    10834       +3     
==========================================
+ Hits        10251    10252       +1     
- Misses        580      582       +2     
Impacted Files                        Coverage Δ
databricks/koalas/missing/frame.py    100.00% <ø> (ø)
databricks/koalas/frame.py            96.72% <83.33%> (-0.04%) ⬇️
databricks/koalas/series.py           96.85% <0.00%> (-0.12%) ⬇️
databricks/koalas/namespace.py        84.19% <0.00%> (-0.04%) ⬇️
databricks/koalas/missing/indexes.py  100.00% <0.00%> (ø)
databricks/koalas/indexes.py          96.97% <0.00%> (+0.01%) ⬆️
databricks/koalas/internal.py         96.49% <0.00%> (+0.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@xinrong-meng xinrong-meng changed the title from "Implement DataFrame.dot supporting Series parameter" to "Implement DataFrame.dot" Dec 7, 2020
Comment on lines +4065 to +4066
if not isinstance(other, ks.Series):
    raise TypeError("Unsupported type {}".format(type(other).__name__))
Contributor

@itholic itholic Dec 8, 2020

It seems pandas also supports many other types, not only Series, as long as they are "aligned", as shown below.

For example,

>>> pdf
   0  1  2  3
0  0  1 -2 -1
1  1  1  1  1
>>> pser
0    1
1    1
2    2
3    1
dtype: int64

>>> pdf.dot(pser.to_frame())  # DataFrame
   0
0 -4
1  5

>>> pdf.dot(pser.index)  # Index
0   -6
1    6
dtype: int64

>>> pdf.dot([1, 2, 3, 4])  # list
0    -8
1    10
dtype: int64
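
For the Index and list cases above, the numbers are consistent with a plain positional product: the operand carries no labels to match against the columns, so its values are used in order. A minimal pure-Python sketch of that reading follows. `positional_dot` is a hypothetical helper for illustration only, and `range(4)` stands in for the values of `pser.index`.

```python
def positional_dot(rows, values):
    """Dot each row with `values` by position, ignoring any labels."""
    values = list(values)
    if any(len(row) != len(values) for row in rows):
        # Illustrative message; lengths must agree for a positional product.
        raise ValueError("shapes are not aligned")
    return [sum(a * b for a, b in zip(row, values)) for row in rows]


rows = [[0, 1, -2, -1], [1, 1, 1, 1]]

from_index = positional_dot(rows, range(4))     # values 0, 1, 2, 3
from_list = positional_dot(rows, [1, 2, 3, 4])
```

Here `from_index` comes out as `[-6, 6]` and `from_list` as `[-8, 10]`, matching the pandas output shown above.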

Contributor Author

Yeah, I should've clarified that in the PR description. This PR aims to support only a Series as `other`.

Considering the implementation complexity and user demand, we might want to support other data types later.

Let me modify the PR description and leave a comment for this.

Thanks for looking into this!
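
For illustration, the Series-only guard behaves roughly like the sketch below. `StandInSeries`, `check_dot_operand`, and `describe` are hypothetical names so the snippet runs without Koalas; the real check uses `ks.Series`, as in the lines quoted at the top of this thread.

```python
class StandInSeries:
    """Hypothetical placeholder for ks.Series (assumption, not the real class)."""

    def __init__(self, values):
        self.values = list(values)


def check_dot_operand(other):
    # Same shape as the guard under review: reject anything that is not a Series.
    if not isinstance(other, StandInSeries):
        raise TypeError("Unsupported type {}".format(type(other).__name__))
    return other


def describe(other):
    """Return 'accepted' or the TypeError message for a candidate operand."""
    try:
        check_dot_operand(other)
        return "accepted"
    except TypeError as exc:
        return str(exc)


list_outcome = describe([1, 2, 3, 4])                    # "Unsupported type list"
series_outcome = describe(StandInSeries([1, 1, 2, 1]))   # "accepted"
```

So a list, an Index, or a DataFrame passed as `other` would fail fast with a TypeError naming the offending type, rather than producing a partial result.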

Contributor

Oh, sure, I got it :)

@xinrong-meng xinrong-meng changed the title from "Implement DataFrame.dot" to "Implement DataFrame.dot supporting Series parameter" Dec 8, 2020
@xinrong-meng xinrong-meng requested a review from ueshin December 8, 2020 19:31
@xinrong-meng xinrong-meng requested a review from ueshin December 8, 2020 21:40
Collaborator

@ueshin ueshin left a comment

LGTM.

Collaborator

ueshin commented Dec 8, 2020

Thanks! Merging.

@ueshin ueshin merged commit 9e8d99b into databricks:master Dec 8, 2020