Enable index operations. #1955
Codecov Report
@@            Coverage Diff             @@
##           master    #1955      +/-   ##
==========================================
- Coverage    94.63    94.47    -0.17
==========================================
  Files          49       49
  Lines       10829    10855      +26
==========================================
+ Hits        10248    10255       +7
- Misses        581      600      +19
# only work between at most two `Index`s. We might need to fix it in the future.
self_len = len(self)
if any(len(col) != self_len for col in args if isinstance(col, IndexOpsMixin)):
    raise ValueError("operands could not be broadcast together with shapes")
How does length comparison relate to broadcast in the error message?
The term "broadcast" here is not the same as Spark's "broadcast".
Maybe the term should've been "broad-cast", which means cast to broader types, or in this case cast to larger size. I'd just follow pandas' error message.
I see, maybe that's what we are talking about:
>>> import numpy as np
>>> z = np.arange(12).reshape(3, 4)
>>> m = np.arange(9).reshape(3, 3)
>>> z * m
Traceback (most recent call last):
File "<input>", line 1, in <module>
z * m
ValueError: operands could not be broadcast together with shapes (3,4) (3,3)
The current error message also looks good.
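For reference, the NumPy-style shape rule behind that message can be sketched in plain Python. This is only an illustration of the general rule, not code from this PR, and `broadcast_shape` is a hypothetical helper name. The check in this PR appears to be stricter: it requires equal lengths, whereas the general rule also accepts a dimension of size 1.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, NumPy-style.

    Hypothetical helper for illustration only.
    """
    result = []
    # Compare dimensions from the trailing end; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                f"{tuple(a)} {tuple(b)}"
            )
    return tuple(reversed(result))


print(broadcast_shape((3, 4), (4,)))    # compatible shapes
print(broadcast_shape((3, 1), (1, 5)))  # compatible shapes
try:
    broadcast_shape((3, 4), (3, 3))     # trailing dimensions 4 and 3 conflict
except ValueError as exc:
    print(exc)
```

The last call fails for the same reason as `z * m` above: the trailing dimensions differ and neither is 1.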
"Cannot combine the series or dataframe because it comes from a different dataframe. "
"In order to allow this operation, enable 'compute.ops_on_diff_frames' option."
)
raise ValueError(ERROR_MESSAGE_CANNOT_COMBINE)


def align_diff_frames(
Great, thank you!
@@ -376,12 +385,11 @@ def align_diff_frames(
     return kdf


-def align_diff_series(func, this_series, *args, how="full"):
-    from databricks.koalas.base import IndexOpsMixin
+def align_diff_series(func, this_series: "Series", *args, how: str = "full") -> "Series":
I'm just wondering whether we should keep `align_diff_series` here or move it to `series.py`, since it is only for Series.
Yeah, maybe we can move it, but I'd leave it to the future PRs.
Sounds good!
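As a rough mental model of what `align_diff_series(func, this_series, *args, how="full")` does, here is a toy sketch over plain dicts. It assumes `how` selects a join type over the index and that `func` is applied to the aligned values. `align_and_apply` is a hypothetical name, and none of this is the actual Koalas implementation.

```python
import math


def align_and_apply(func, this, other, how="full"):
    """Align two index->value mappings, then apply func to each pair.

    Hypothetical toy model; values missing on one side are filled with NaN.
    """
    if how == "full":
        keys = list(this) + [k for k in other if k not in this]
    elif how == "left":
        keys = list(this)
    elif how == "inner":
        keys = [k for k in this if k in other]
    else:
        raise ValueError(f"unsupported how: {how!r}")
    return {k: func(this.get(k, math.nan), other.get(k, math.nan)) for k in keys}


a = {0: 1.0, 1: 2.0}
b = {1: 10.0, 2: 20.0}
print(align_and_apply(lambda x, y: x + y, a, b, how="full"))
print(align_and_apply(lambda x, y: x + y, a, b, how="inner"))
```

With `how="full"` the labels present on only one side yield NaN, while `how="inner"` keeps only the shared label.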
Thanks! I'd merge this now. Please feel free to leave comments if any. @HyukjinKwon @itholic
Currently Koalas can't handle index operations very well.
or
This PR enables those operations: