Open
Labels
Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); duplicated (duplicated, drop_duplicates)
Description
I found this while writing tests for .duplicated in #21645 (so far, .duplicated has been tested almost exclusively implicitly, through .drop_duplicates).
At first I thought it was intended behaviour that DataFrame.duplicated() treats None and np.nan as equal, but Series.duplicated() does not treat them as equal. The Series behaviour makes sense to me, since as objects, None is not np.nan; I have therefore labelled this as a bug.
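To make the "None is not np.nan" point concrete, here is a minimal sketch using only plain Python comparisons and pd.isna. The variable names are illustrative and not part of any pandas API.

```python
import numpy as np
import pandas as pd

# Identity and equality: None and np.nan are different objects,
# and np.nan does not even compare equal to itself.
none_is_nan = None is np.nan        # identity check
none_equals_nan = None == np.nan    # equality check
nan_equals_nan = np.nan == np.nan   # NaN is never equal to NaN

# pandas' missing-value check nevertheless flags both as missing.
both_missing = bool(pd.isna(None)) and bool(pd.isna(np.nan))

print(none_is_nan, none_equals_nan, nan_equals_nan, both_missing)
# prints: False False False True
```

So whether the two values count as duplicates of each other depends on whether the comparison works on the objects themselves or on "is missing" semantics.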
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3, 3, None, np.nan], dtype=object)
s
# 0 NaN
# 1 3
# 2 3
# 3 None
# 4 NaN
# dtype: object
s.duplicated()
# 0 False
# 1 False
# 2 True
# 3 False
# 4 True
# dtype: bool
s.to_frame().duplicated()
# 0 False
# 1 False
# 2 True
# 3 True <- CHANGED
# 4 True
# dtype: bool
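The comparison above could be turned into a small check along these lines. This is only a sketch: duplicated_mismatch is a hypothetical helper name, not a pandas function, and which positions (if any) it reports depends on the pandas version in use.

```python
import numpy as np
import pandas as pd


def duplicated_mismatch(s: pd.Series) -> pd.Series:
    """Boolean mask of positions where Series.duplicated() and
    DataFrame.duplicated() (on the one-column frame) disagree.

    Hypothetical helper for illustration only.
    """
    from_series = s.duplicated()
    from_frame = s.to_frame().duplicated()
    return from_series != from_frame


s = pd.Series([np.nan, 3, 3, None, np.nan], dtype=object)
mismatch = duplicated_mismatch(s)

# Index labels where the two methods disagree; an empty list means they agree.
print(mismatch[mismatch].index.tolist())
```

With the outputs reported in this issue, the only disagreement would be at index 3 (the None entry).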