-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Frame.abs to support bool type and disallow non-numeric types. #1980
Conversation
} | ||
) | ||
kdf = ks.from_pandas(pdf) | ||
self.assert_eq(kdf.A.abs(), pdf.A.abs()) | ||
self.assert_eq(kdf.B.abs(), pdf.B.abs()) | ||
self.assert_eq(kdf.E.abs(), pdf.E.abs()) | ||
# pandas' bug? | ||
# self.assert_eq(kdf[["B", "C", "E"]].abs(), pdf[["B", "C", "E"]].abs()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pandas result is:
>>> pdf = pd.DataFrame(
... {
... "A": [1, -2, 3, -4, 5],
... "B": [1.0, -2, 3, -4, 5],
... "C": [-6.0, -7, -8, -9, 10],
... "D": ["a", "b", "c", "d", "e"],
... "E": [True, False, False, True, True],
... }
... )
>>> pdf[["B", "C", "E"]].abs()
B C E
0 1 6 1
1 2 7 0
2 3 8 0
3 4 9 1
4 5 10 1
whereas calculating separately:
>>> pdf[["B", "C"]].abs()
B C
0 1.0 6.0
1 2.0 7.0
2 3.0 8.0
3 4.0 9.0
4 5.0 10.0
>>> pdf[["E"]].abs()
E
0 True
1 False
2 False
3 True
4 True
I believe this is a pandas bug.
The current Koalas behavior:
>>> ks.from_pandas(pdf)[["B", "C", "E"]].abs()
B C E
0 1.0 6.0 True
1 2.0 7.0 False
2 3.0 8.0 False
3 4.0 9.0 True
4 5.0 10.0 True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pandas bug to me, too. Let me report this to pandas dev.
Codecov Report
@@ Coverage Diff @@
## master #1980 +/- ##
=======================================
Coverage 94.57% 94.57%
=======================================
Files 50 50
Lines 10946 10952 +6
=======================================
+ Hits 10352 10358 +6
Misses 594 594
Continue to review full report at Codecov.
|
return kser.spark.transform(F.abs) | ||
else: | ||
raise TypeError( | ||
"bad operand type for abs(): {}".format(kser.spark.data_type.simpleString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, should we show pandas' dtype instead of Spark data type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, shall we address it separately. The same error message will appear in the other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's address #1980 (comment) in separately.
Thanks! merging. |
Addressing #1980 (comment) to add pandas dtypes.
Addressing databricks/koalas#1980 (comment) to add pandas dtypes.
Fixes
Frame.abs
to support bool type and disallow non-numeric types.