Description
I discovered this while trying to tackle issue #32344, where @ryankarlos mentioned groupby.transform('tshift', ...) seems to behave incorrectly.
However, before we can address #32344, we probably need to address this.
```python
# on current master
>>> import pandas as pd
>>> import numpy as np
>>> pd.__version__
'1.1.0.dev0+1708.g043b60920'
>>> df = pd.DataFrame(
...     {
...     "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
...     "B": [1, 2, np.nan, 3, 3, np.nan, 4],
...     },
...     index=pd.date_range('2020-01-01', '2020-01-07')
... )
>>> df
              A    B
2020-01-01  foo  1.0
2020-01-02  foo  2.0
2020-01-03  foo  NaN
2020-01-04  foo  3.0
2020-01-05  bar  3.0
2020-01-06  bar  NaN
2020-01-07  baz  4.0
>>> df.groupby("A").tshift(1, "D")
                  B
A
bar 2020-01-06  3.0
    2020-01-07  NaN
baz 2020-01-08  4.0
foo 2020-01-02  1.0
    2020-01-03  2.0
    2020-01-04  NaN
    2020-01-05  3.0
>>> df.groupby("A").ffill()
              B
2020-01-01  1.0
2020-01-02  2.0
2020-01-03  2.0
2020-01-04  3.0
2020-01-05  3.0
2020-01-06  3.0
2020-01-07  4.0
>>> df.groupby("A").cumsum()
              B
2020-01-01  1.0
2020-01-02  3.0
2020-01-03  NaN
2020-01-04  6.0
2020-01-05  3.0
2020-01-06  NaN
2020-01-07  4.0
```
We can see that `groupby.tshift` is inconsistent with the other groupby transformations: it keeps the group labels (as the outer index level `A`) and, more importantly, reorders the rows by group.
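To make the consistency property concrete, here is a minimal sketch (not taken from pandas' own test suite). The helper `is_consistent_transform` is hypothetical and encodes our reading of the post-0.25 behavior: the result keeps the original index in the original row order and does not carry the grouping column along. `tshift_like` is a hand-built stand-in for the `groupby.tshift` output shown above; it uses `DataFrame.shift(..., freq="D")`, which shifts the index rather than the data, like `tshift`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
        "B": [1, 2, np.nan, 3, 3, np.nan, 4],
    },
    index=pd.date_range("2020-01-01", "2020-01-07"),
)


def is_consistent_transform(result, original, key):
    """Hypothetical check: does `result` look like a post-0.25 groupby transformation?

    It must keep the original index in the original row order and must not carry
    the grouping column along (neither as a column nor as an extra index level).
    """
    return result.index.equals(original.index) and key not in result.columns


ffilled = df.groupby("A").ffill()
cumsummed = df.groupby("A").cumsum()

# Hand-built stand-in for the groupby.tshift output above: each group shifted on its
# own, concatenated in sorted group order, with the labels as an outer index level "A".
tshift_like = pd.concat(
    {label: g.drop(columns="A").shift(1, freq="D") for label, g in df.groupby("A")},
    names=["A"],
)

print(is_consistent_transform(ffilled, df, "A"))      # True
print(is_consistent_transform(cumsummed, df, "A"))    # True
print(is_consistent_transform(tshift_like, df, "A"))  # False
```

`ffill` and `cumsum` pass the check, while the tshift-style result fails because its index carries the group labels and its rows are ordered by group instead of by original position.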
Since 0.25 there has been a deliberate effort to make all groupby transformations consistent; see https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#dataframe-groupby-ffill-bfill-no-longer-return-group-labels
Following this reasoning, I would expect the result to look more like:
```python
>>> df.groupby("A").tshift(1, "D")  # this is actually the result of df.tshift(1, "D").drop(columns='A')
              B
2020-01-02  1.0
2020-01-03  2.0
2020-01-04  NaN
2020-01-05  3.0
2020-01-06  3.0
2020-01-07  NaN
2020-01-08  4.0
```
However, if we make `groupby.tshift` consistent with the other groupby transformations as above, it becomes no different from `df.tshift(1, "D").drop(columns='A')`, and the `groupby` loses its meaning here.
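The claim that a consistent `groupby.tshift` reduces to the ungrouped shift can be checked with a sketch. `grouped_freq_shift` is a hypothetical helper, not a pandas API: it shifts each group's index separately, drops the key column, and puts the rows back in their original positions. It again assumes `DataFrame.shift(..., freq=...)` as the index-shifting operation in place of `tshift`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
        "B": [1, 2, np.nan, 3, 3, np.nan, 4],
    },
    index=pd.date_range("2020-01-01", "2020-01-07"),
)


def grouped_freq_shift(frame, key, periods, freq):
    """Hypothetical sketch of a 'consistent' groupby tshift.

    Shifts each group's index by periods * freq, drops the grouping column, and
    returns the rows in their original positions (no group labels, no reordering).
    """
    pieces, positions = [], []
    # .indices maps each group label to the integer positions of its rows.
    for _, idx in frame.groupby(key).indices.items():
        group = frame.iloc[idx].drop(columns=key)
        pieces.append(group.shift(periods, freq=freq))
        positions.append(idx)
    stacked = pd.concat(pieces)
    # Row j of `stacked` came from original position order[j]; argsort inverts that.
    order = np.concatenate(positions)
    return stacked.iloc[np.argsort(order)]


result = grouped_freq_shift(df, "A", 1, "D")
expected = df.shift(1, freq="D").drop(columns="A")
# The index `freq` attribute may be lost by concat/iloc, so it is not compared.
pd.testing.assert_frame_equal(result, expected, check_freq=False)
print(result)
```

Because a freq-based shift only relabels the index and never moves values between rows, the grouping has no effect on the outcome, which is why the `groupby` adds nothing here.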
Perhaps we should just deprecate `groupby.tshift` entirely? I know #11631 discussed deprecating `tshift`, but that discussion has stalled for a long time.