-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Frame.abs to support bool type and disallow non-numeric types. #1980
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -94,16 +94,26 @@ def test_sum(self): | |
def test_abs(self): | ||
pdf = pd.DataFrame( | ||
{ | ||
"A": [1, -2, 3, -4, 5], | ||
"B": [1.0, -2, 3, -4, 5], | ||
"C": [-6.0, -7, -8, -9, 10], | ||
"D": ["a", "b", "c", "d", "e"], | ||
"A": [1, -2, np.nan, -4, 5], | ||
"B": [1.0, -2, np.nan, -4, 5], | ||
"C": [-6.0, -7, -8, np.nan, 10], | ||
"D": ["a", "b", "c", "d", np.nan], | ||
"E": [True, np.nan, False, True, True], | ||
} | ||
) | ||
kdf = ks.from_pandas(pdf) | ||
self.assert_eq(kdf.A.abs(), pdf.A.abs()) | ||
self.assert_eq(kdf.B.abs(), pdf.B.abs()) | ||
self.assert_eq(kdf.E.abs(), pdf.E.abs()) | ||
# pandas' bug? | ||
# self.assert_eq(kdf[["B", "C", "E"]].abs(), pdf[["B", "C", "E"]].abs()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The pandas result is: >>> pdf = pd.DataFrame(
... {
... "A": [1, -2, 3, -4, 5],
... "B": [1.0, -2, 3, -4, 5],
... "C": [-6.0, -7, -8, -9, 10],
... "D": ["a", "b", "c", "d", "e"],
... "E": [True, False, False, True, True],
... }
... )
>>> pdf[["B", "C", "E"]].abs()
B C E
0 1 6 1
1 2 7 0
2 3 8 0
3 4 9 1
4 5 10 1 whereas calculating separately: >>> pdf[["B", "C"]].abs()
B C
0 1.0 6.0
1 2.0 7.0
2 3.0 8.0
3 4.0 9.0
4 5.0 10.0
>>> pdf[["E"]].abs()
E
0 True
1 False
2 False
3 True
4 True I believe this is a pandas bug. The current Koalas behavior: >>> ks.from_pandas(pdf)[["B", "C", "E"]].abs()
B C E
0 1.0 6.0 True
1 2.0 7.0 False
2 3.0 8.0 False
3 4.0 9.0 True
4 5.0 10.0 True There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems pandas bug to me, too. Let me report this to pandas dev. |
||
self.assert_eq(kdf[["B", "C"]].abs(), pdf[["B", "C"]].abs()) | ||
self.assert_eq(kdf[["E"]].abs(), pdf[["E"]].abs()) | ||
|
||
with self.assertRaisesRegex(TypeError, "bad operand type for abs\\(\\): string"): | ||
kdf.abs() | ||
with self.assertRaisesRegex(TypeError, "bad operand type for abs\\(\\): string"): | ||
kdf.D.abs() | ||
|
||
def test_axis_on_dataframe(self): | ||
# The number of each count is intentionally big | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, should we show pandas' dtype instead of Spark data type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, shall we address it separately. The same error message will appear in the other places.