While pandas supports nullable ints via extension arrays, they are still not the default when reading data in, so you can easily end up with a float64 series that really holds nullable ints. Converting those floats to a nullable int type can save memory.
This depends on being able to accurately detect that the floats can be converted to ints without loss (well, without much loss — only some floats map to ints exactly). Then again, data was already lost in the automatic conversion to float in the first place, so the conversion effectively reverses that loss.
```python
In [8]: data = list(range(1000)) + [None]

In [9]: s = pd.Series(data)       # defaults to float64 due to NaN

In [10]: s2 = s.astype('Int16')   # Int16 nullable dtype

In [11]: s.memory_usage()
Out[11]: 8136

In [12]: s2.memory_usage()
Out[12]: 3131
```
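The lossless-conversion check described above could be sketched like this. This is a minimal illustration, not part of pandas: `downcast_to_nullable_int` is a hypothetical helper that converts a float series to a nullable integer dtype only when every non-null value is already a whole number, and otherwise returns the series unchanged.

```python
import pandas as pd

def downcast_to_nullable_int(s: pd.Series, dtype: str = "Int64") -> pd.Series:
    """Hypothetical helper: convert a float series to a nullable int dtype
    if no non-null value would lose precision; otherwise return s as-is."""
    non_null = s.dropna()
    # A finite float maps to an int exactly iff it has no fractional part.
    if (non_null % 1 == 0).all():
        return s.astype(dtype)
    return s

s = pd.Series(list(range(1000)) + [None])   # float64 because of the NaN
s2 = downcast_to_nullable_int(s, "Int16")   # Int16, None becomes pd.NA
```

Note this only checks for fractional parts; a fuller version would also verify the values fit the target dtype's range before converting.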
Much obliged for the feedback! I've updated the README to note pandas' `convert_dtypes` for these. I figure I'll wait for some more feedback from others (especially on how this tool crashes on datasets I haven't considered!) before I make a first round of fixes. Cheers!