astype() handling of None #1806

Closed
amin-nejad opened this issue Sep 29, 2020 · 2 comments · Fixed by #1818
Labels
bug Something isn't working

amin-nejad commented Sep 29, 2020

Koalas appears to leave None values untouched when converting the type of a DataFrame or Series with .astype(), whereas pandas converts them. For instance, when converting to str, pandas turns None into 'None', but Koalas does not. The example below uses sorted(), which cannot compare None with str, to show the difference:

Example

```python
import databricks.koalas as ks
import pandas as pd

data = pd.Series(['a', 'b', 'c', None])
sorted(data.astype(str).tolist())
# => ['None', 'a', 'b', 'c']

data = ks.Series(['a', 'b', 'c', None])
sorted(data.astype(str).tolist())
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# <ipython-input-11-493f99e0fb6f> in <module>
#       1 data = ks.Series(['a', 'b', 'c', None])
# ----> 2 sorted(data.astype(str).tolist())
# 
# TypeError: '<' not supported between instances of 'NoneType' and 'str'
```
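
For reference, the TypeError itself can be reproduced in plain Python, without either library. The following is only a minimal sketch of why sorted() fails on the Koalas result and what the pandas-style conversion amounts to for this data. It is not how either library implements astype(), and the variable names are made up for illustration.

```python
# Illustrative data only: the same values as in the report above.
values = ['a', 'b', 'c', None]

# Sorting a list that mixes None and str fails, because Python 3
# defines no ordering between NoneType and str.
try:
    sorted(values)
    mixed_sort_error = None
except TypeError as exc:
    mixed_sort_error = str(exc)

# Applying str() to every element first turns None into the string
# 'None', which is the result the pandas snippet above reports.
as_strings = [str(v) for v in values]
sorted_strings = sorted(as_strings)

print(mixed_sort_error)
print(sorted_strings)  # ['None', 'a', 'b', 'c']
```

So the list coming back from Koalas still contains a real None rather than the string 'None', which is what trips up sorted().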

Is there a reason for this or can we bring this in line with pandas? Thanks

Ubuntu 18.04
python 3.7.6
koalas==1.2.0
pandas==1.0.5

@HyukjinKwon HyukjinKwon added the bug Something isn't working label Sep 29, 2020
@HyukjinKwon
Member

@itholic can you take a look when you find some time?

Contributor

itholic commented Oct 5, 2020

Sure, let me take a look.
Thanks for the report, @amin-nejad !

itholic added a commit that referenced this issue Oct 7, 2020
This should fix #1806 

```python
>>> data = ks.Series(['a', 'b', 'c', None])
>>> sorted(data.astype(str).tolist())
['None', 'a', 'b', 'c']
```

This also works for `DataFrame.astype`.

```python
>>> kdf
   A     B     C
0  3  10.0     a
1  4  20.0     b
2  5  30.0     c
3  6  40.0     d
4  7  50.0  None

>>> sorted(kdf.astype(str).C.tolist())
['None', 'a', 'b', 'c', 'd']
```