Description
I was surprised to find that DataFrame.set_index has no analogue for Series. This is especially surprising because I remember reading in several comments here that it is strongly preferred (among others by @jreback) that users do not set attributes like .name or .index directly -- but currently, the only options for changing the index of a Series are either that, or reconstructing it with pd.Series(s.values, index=desired_index).
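For illustration, the two options mentioned above can be sketched as follows. This is a minimal sketch, not a recommendation: the labels in `desired_index` are made up for the example. Note that the constructor route only relabels when it is given the raw values; if the Series itself is passed together with `index=`, pandas aligns on the existing labels instead.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])
desired_index = pd.Index([10, 20, 30])  # hypothetical labels for the example

# Option 1: assign the attribute directly (mutates the object in place,
# so a copy is taken here to leave s untouched)
s_inplace = s.copy()
s_inplace.index = desired_index

# Option 2: rebuild the Series from its raw values with the new index
s_rebuilt = pd.Series(s.values, index=desired_index)

# Pitfall: passing the Series itself makes the constructor reindex
# (align on labels), so all values become NaN here because none of
# the new labels exist in s
s_aligned = pd.Series(s, index=desired_index)
```

Neither option can be chained after another method call as cleanly as DataFrame.set_index can, which is the gap this issue is about.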
This is relevant in many scenarios, but in case someone would like a more concrete example -- I'm currently working on having .duplicated be able to return an inverse for DataFrame / Series / Index, see #21645. This inverse needs to link two different indexes -- the one of the original object, and the index of the deduplicated one. To reconstruct from the unique values, one needs exactly such a .set_index operation, because .reindex in itself cannot read a set of indexes and assign them to a different set of indexes in one go (xref #21685).
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
# return_inverse is the keyword being proposed in #21645
isdup, inv = s.duplicated(keep='last', return_inverse=True)
isdup
# 0 True
# 1 True
# 2 True
# 3 False
# 4 False
# 5 False
# dtype: bool
inv
# 0 4
# 1 5
# 2 4
# 3 3
# 4 4
# 5 5
# dtype: int64
unique = s.loc[~isdup]
unique
# 3 c
# 4 a
# 5 b
# dtype: object
reconstruct = unique.reindex(inv)
reconstruct
# 4 a
# 5 b
# 4 a
# 3 c
# 4 a
# 5 b
# dtype: object
The values now match the original, but the index does not: it carries the labels from inv (4, 5, 4, 3, 4, 5) instead of 0 to 5. For a DataFrame, the reconstruction would be unique.reindex(inv.values).set_index(inv.index), so the same should be available for Series as well:
Desired:
reconstruct = unique.reindex(inv.values).set_index(inv.index)
reconstruct
# 0 a
# 1 b
# 2 a
# 3 c
# 4 a
# 5 b
# dtype: object
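Until something like Series.set_index exists, the same round trip can be approximated with existing API. The following is only a sketch under a few assumptions: since return_inverse is not available, the inverse is emulated here by mapping each value to the label of its last occurrence (the `label_of_last` helper is hypothetical), and the final relabelling uses either the constructor on raw values or `Series.set_axis`, which in recent pandas versions returns a relabelled copy.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Mark everything except the last occurrence of each value as a duplicate
isdup = s.duplicated(keep='last')
unique = s.loc[~isdup]

# Emulate the proposed inverse: map each value to the label of its
# last occurrence (label_of_last is a helper made up for this sketch)
label_of_last = pd.Series(unique.index, index=unique.values)
inv = s.map(label_of_last)

# Step 1: pull the values in the original order (index is still 4, 5, 4, ...)
reordered = unique.reindex(inv.values)

# Step 2a: relabel by rebuilding from raw values
reconstruct = pd.Series(reordered.values, index=inv.index)

# Step 2b: relabel with set_axis (assumes a pandas version where
# set_axis returns a new object rather than modifying in place)
reconstruct_alt = reordered.set_axis(inv.index)
```

Both variants need an intermediate name or an extra constructor call, which is what the proposed `.set_index(inv.index)` would avoid.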