@phofl phofl commented Aug 13, 2021

This ensures consistency with the DataFrame constructor

@phofl phofl added Constructors Series/DataFrame/Index/pd.array Constructors Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Series Series data structure labels Aug 13, 2021
# GH#43018
ser = Series(np.nan, dtype="object")
result = ser.astype("bool")
expected = Series(True, dtype="bool")
Contributor

this is True?

Member

bool(np.nan) returns True

Member Author

Yes exactly.
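
For readers following the thread, the truthiness rule being discussed can be checked directly. This is a minimal sketch using only plain Python and NumPy; the variable names are illustrative and not taken from the PR, and it does not exercise the pandas code path changed here.

```python
import numpy as np

# NaN is a non-zero float, so Python's truthiness rules treat it as True.
nan_as_bool = bool(np.nan)

# Casting an object array that holds NaN to bool applies the same rule
# element by element.
obj_arr = np.array([np.nan], dtype=object)
cast_arr = obj_arr.astype(bool)

print(nan_as_bool)   # True
print(cast_arr[0])   # True
```

This is why the expected value in the test above is True rather than False.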

def test_constructor_bool_dtype_missing_values(self):
    # GH#43018
    result = Series(index=[0], dtype="bool")
    expected = Series(True, index=[0], dtype="bool")
Contributor

this is True?

@jreback jreback added this to the 1.4 milestone Aug 19, 2021
@jreback jreback merged commit 098661e into pandas-dev:master Aug 19, 2021
Contributor

jreback commented Aug 19, 2021

thanks @phofl

@simonjayhawkins
Member

@phofl we have had reports regarding both these cases. Is the new behavior now consistent?

import pandas as pd

print(pd.__version__)
s = pd.Series(dtype="int", index=[0])
print(s)
s2 = pd.Series(dtype="bool", index=[0])
print(s2)

1.3.5
0    0
dtype: int64
0    False
dtype: bool

1.4.1
0   NaN
dtype: float64
0    True
dtype: bool

It appears to me (and maybe others from the issues) that the changes are confusing.

We have changed the int case and now create a float array instead, on the grounds that the missing value cannot be held in an integer array, and yet for the bool case we keep the bool dtype even though we cannot represent a missing value in a boolean array?

I appreciate that int(np.nan) raises ValueError: cannot convert float NaN to integer and that bool(np.nan) is True, but the users are not specifying np.nan; they are not supplying any data. I wonder whether we ought to revert this until we change the constructors to use nullable dtypes?
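
The asymmetry described above can be sketched as follows. This is only an illustration under stated assumptions: the variable names are hypothetical, and the nullable "Int64" and "boolean" dtypes are passed explicitly here, since the constructors discussed in this thread do not select them by default.

```python
import numpy as np
import pandas as pd

# int() has no representation for NaN, so the conversion fails outright.
try:
    int(np.nan)
    int_error = None
except ValueError as exc:
    int_error = str(exc)

# bool() only checks for non-zero, so NaN silently becomes True.
bool_result = bool(np.nan)

# The nullable extension dtypes can hold a missing value explicitly.
nullable_int = pd.Series([pd.NA], index=[0], dtype="Int64")
nullable_bool = pd.Series([pd.NA], index=[0], dtype="boolean")

print(int_error)
print(bool_result)
print(nullable_int.dtype, nullable_bool.dtype)
```

With the nullable dtypes, the missing entry stays missing in both the integer and the boolean case, which is the consistency the comment above is asking about.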

@jbrockmendel
Member

> I wonder whether we ought to revert this until we change the constructors to use nullable dtypes?

No opinion on the reversion, but I advise against the implicit assumption that constructors are going to default to nullable dtypes. Support has improved, but it's still a mess.

Development

Successfully merging this pull request may close these issues.

BUG: Default value for Series(dtype=int) is not pd.NA
