nullable ints interpreted as floats #1

Open
kokes opened this issue Jun 2, 2020 · 1 comment
kokes commented Jun 2, 2020

While pandas supports nullable ints via extension arrays, they are still not the default when reading data in, so a column of nullable ints easily ends up as float64. Memory could be saved by converting those floats to the nullable int types.

This depends on being able to accurately detect that those floats can be converted back to ints without loss (or at least without much loss, since only some floats map onto ints exactly). Then again, some precision was already lost in the automatic conversion to floats in the first place, so we'd effectively just be reverting that loss.

In [8]: data = list(range(1000)) + [None]
In [9]: s = pd.Series(data)   # defaults to float64 due to NaN
In [10]: s2 = s.astype('Int16')   # Int16 nullable dtype

In [11]: s.memory_usage()
Out[11]: 8136

In [12]: s2.memory_usage()
Out[12]: 3131
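
A minimal sketch of the kind of losslessness check described above, assuming it only needs to verify that every non-null value is integral and fits within the target width; the function name floats_are_integral is purely illustrative, not part of any existing tool:

import numpy as np
import pandas as pd

def floats_are_integral(s: pd.Series, target: str = "Int16") -> bool:
    """True if every non-null float maps exactly onto an integer that
    fits within the bounds of the target nullable integer dtype."""
    non_null = s.dropna()
    if not (non_null == non_null.round()).all():
        return False  # at least one value has a fractional part
    bounds = np.iinfo(target.lower())  # "Int16" -> numpy's int16 bounds
    return bool(non_null.between(bounds.min, bounds.max).all())

s = pd.Series(list(range(1000)) + [None])  # float64 because of the None
if floats_are_integral(s, "Int16"):
    s2 = s.astype("Int16")  # nullable Int16, as in the session above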
ianozsvald (Owner) commented

Much obliged for the feedback; I've updated the README to note convert_dtypes in Pandas for these. I figure I'll wait for some more feedback from others (especially for how this tool crashes on datasets I haven't considered!) before I make a first round of fixes. Cheers!
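
For reference, a rough illustration of what pandas' convert_dtypes() does with the example above (assuming that is the method the README note refers to); as far as I'm aware it infers nullable Int64 for integral floats with missing values, so a narrower width such as Int16 still needs an explicit astype:

import pandas as pd

s = pd.Series(list(range(1000)) + [None])  # float64 because of the None

# convert_dtypes() re-infers pandas' nullable extension types; integral
# floats with missing values become nullable Int64 instead of float64.
s_nullable = s.convert_dtypes()
print(s_nullable.dtype)  # Int64

# A narrower width such as Int16 still requires an explicit astype.
s_small = s_nullable.astype("Int16")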
